Debugging AR8035 ethernet on EAP120 which only transmits at 1000M, not 10M or 100M

I'm having some fun with the AR8035 ethernet interface on a TP-Link EAP120. All the hardware works fine with 18.06.5 and openwrt.git master, apart from eth0, which reliably autonegotiates at any speed but only successfully transmits packets on a gigabit link. On slower links, sent packets silently vanish although received packets arrive fine.

If I provide a 100M or 10M Linux host to cross-connect to, it brings up the link with the correct speed/duplex and the EAP120 receives packets fine (seen on tcpdump or just in /sys/class/net/eth0/statistics/*), but no outbound packets arrive at the destination host's NIC. (No RX errors of any kind, just silence.) Flip the destination interface to 1000M, it renegotiates up and everything starts working fine; drop it back to 100M, it renegotiates down correctly but all outbound packets vanish again.

I can replicate with a flashed stock 18.06.5 image, a flashed freshly built 18.06.5 with CONFIG_TARGET_ar71xx_generic_DEVICE_eap120-v1 and all options left untouched, and with kernel+initramfs builds of both 18.06.5 and openwrt.git master. (I have serial console and bootloader access, so testing hacked kernels with TFTPed initramfs images is easier than flashing every time.)

In each case, an unchanged factory-default configuration shows the problem: by default OpenWrt brings up a bridge with an address and eth0 enslaved, which I can prod at from an external Linux host (and a variety of other devices, to double-check) at various speeds. For completeness, I've also swapped the EAP120 for a different type of OpenWrt device running the same versions with default factory configs, and confirmed that OpenWrt devices normally behave as expected on the same test, autonegotiating and working at all speeds.

I've tried releasing eth0 from the bridge and configuring it manually to rule out problems with promiscuous mode: symptoms are identical.

I've tried (without much insight) twiddling the various options in target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c, such as disable_smarteee, enable_rgmii_tx_delay and fixup_rgmii_tx_delay. The values for pll_100 and pll_10 look sensible: they're identical to those for other machine types.

The manufacturer's firmware uses a truly ancient 2.6.31 kernel but does successfully transmit at 100M, as does the device's bootloader, so I can reasonably rule out a physically broken device.

What might be reasonable next steps in trying to pin down what's going wrong? I'm happy hacking some debug code into target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx, but I'm not really sure what to look for or try — I've not seen wired ethernet misbehaviour like this before.

The fault appears to lie with the pll_100 and pll_10 values after all. The following change is sufficient to get it working at all three speeds. However, I wonder whether there's some variation between devices labelled EAP120, since presumably the previous values worked for the author of mach-eap120.c.

diff --git a/target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c b/target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c
index 130c7706a6..2e6cd73aab 100644
--- a/target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c
+++ b/target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c
@@ -106,8 +106,8 @@ static void __init eap_setup(u8 *mac)
        ath79_eth0_data.phy_if_mode = PHY_INTERFACE_MODE_RGMII;
        ath79_eth0_data.phy_mask = BIT(EAP120_LAN_PHYADDR);
        ath79_eth0_pll_data.pll_1000 = 0x0e000000;
-       ath79_eth0_pll_data.pll_100 = 0x00000101;
-       ath79_eth0_pll_data.pll_10 = 0x00001313;
+       ath79_eth0_pll_data.pll_100 = 0x08000101;
+       ath79_eth0_pll_data.pll_10 = 0x08001313;

@ChrisW there's much more to the story than this. I have worked with several boards with the AR8035 this year. If you like, I can help you port this board to the new "ath79" kernel target, with all the link speeds working.

Hi @mpratt14, sorry for my slow response, didn't see your reply to this. Yes, I'd be very interested indeed - thanks!

I don't actually have the previous two EAP120s here to experiment any more: they're deployed as production access points now, with the above bodge to get the stripped-down image working correctly at Gigabit speeds.

Probably the best thing is for me to pick up an extra one on eBay or similar, so I have some convenient test hardware to play with while doing the port; I'll have a look now. Thanks once again!
