I'm having some fun with the AR8035 ethernet interface on a TP-Link EAP120. All the hardware works fine with 18.06.5 and openwrt.git master, apart from eth0, which reliably autonegotiates at any speed but only successfully transmits packets on a gigabit link. On slower links, sent packets silently vanish although received packets arrive fine.
If I provide a 100M or 10M Linux host to cross-connect to, the EAP120 brings up the link with the correct speed/duplex and receives packets fine (seen in tcpdump or just in /sys/class/net/eth0/statistics/*), but no outbound packets arrive at the destination host's NIC. (No RX errors of any kind there, just silence.) Flip the destination interface to 1000M and it renegotiates up and everything starts working; drop it back to 100M and it renegotiates down correctly, but all outbound packets vanish again.
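For anyone wanting to repeat the counter check, here is a minimal sketch of how the /sys/class/net/eth0/statistics/* comparison could be scripted. It assumes Python is available on the Linux test host (it usually is not on the EAP120 itself, where reading the same files by hand does the job). The function names `read_counters`, `diff_counters` and `watch` are made up for illustration.

```python
from pathlib import Path
import time


def read_counters(stats_dir):
    """Return {counter_name: int} for every numeric file in a statistics directory."""
    counters = {}
    for entry in sorted(Path(stats_dir).iterdir()):
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue  # skip anything unreadable or non-numeric
    return counters


def diff_counters(before, after):
    """Return only the counters whose value changed, as {name: delta}."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


def watch(iface="eth0", seconds=5):
    """Snapshot the counters, wait while traffic is generated, print what moved."""
    stats_dir = f"/sys/class/net/{iface}/statistics"
    before = read_counters(stats_dir)
    time.sleep(seconds)  # send pings from the other end during this window
    for name, delta in diff_counters(before, read_counters(stats_dir)).items():
        print(f"{name}: +{delta}")
```

Calling `watch("eth0")` on the receiving host while the EAP120 pings it should show whether rx_packets or any error counter moves at all. If tx_packets climbs on the EAP120 while nothing changes on the peer, the frames are presumably being lost somewhere between the MAC and the wire.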
I can replicate with a flashed stock 18.06.5 image, a flashed freshly built 18.06.5 with CONFIG_TARGET_ar71xx_generic_DEVICE_eap120-v1 and all options left untouched, and with kernel+initramfs builds of both 18.06.5 and openwrt.git master. (I have serial console and bootloader access, so testing hacked kernels with TFTPed initramfs images is easier than flashing every time.)
In each case, an unchanged factory-default configuration shows the problem: by default OpenWrt brings up a bridge with address 192.168.1.1/24 and eth0 enslaved, which I can prod at with an external Linux host (and a variety of other devices to double-check) at various speeds. For completeness, I've replaced the EAP120 with a different type of OpenWrt device with the same versions installed and factory-default configs, to confirm that OpenWrt devices normally behave as expected in the same test, autonegotiating and working at all speeds.
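One way to step the external Linux host through the different speeds while keeping autonegotiation on is to restrict what it advertises with ethtool. The sketch below only builds the command lines and is an assumption about how the test could be run, not a record of what was actually done here. The mask values are the ones listed for `advertise` in the ethtool man page, and `ethtool_advertise_cmd` is a made-up helper name.

```python
# Link-mode bitmask values accepted by "ethtool -s <iface> advertise <mask>".
ADVERTISE_MASKS = {
    (10, "half"): 0x001,
    (10, "full"): 0x002,
    (100, "half"): 0x004,
    (100, "full"): 0x008,
    (1000, "full"): 0x020,
}


def ethtool_advertise_cmd(iface, speed, duplex="full"):
    """Build the argv that limits autonegotiation to one speed/duplex mode."""
    mask = ADVERTISE_MASKS[(speed, duplex)]
    return ["ethtool", "-s", iface, "advertise", f"0x{mask:03x}"]


if __name__ == "__main__":
    # Print the commands for a 10M -> 100M -> 1000M sweep on a test host's eth1.
    for speed in (10, 100, 1000):
        print(" ".join(ethtool_advertise_cmd("eth1", speed)))
```

The printed commands can be run as root on the test host. The interface name `eth1` is only a placeholder for whichever port is cabled to the EAP120.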
I've tried releasing eth0 from the bridge and configuring it manually to rule out problems with promiscuous mode: symptoms are identical.
I've tried (without much insight) twiddling the various options in target/linux/ar71xx/files/arch/mips/ath79/mach-eap120.c such as disable_smarteee, enable_rgmii_tx_delay and fixup_rgmii_tx_delay. The values for pll_100 and pll_10 look sensible - identical to other machine types.
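To make the twiddling less haphazard, the three flags named above can be walked through exhaustively, which is only eight builds. This is a rough sketch that just enumerates the combinations. It assumes each flag is a plain 0/1 setting, and editing mach-eap120.c and rebuilding for each row is still a manual step.

```python
from itertools import product

# Flag names taken from mach-eap120.c as mentioned above; each assumed to be 0 or 1.
FLAGS = ("disable_smarteee", "enable_rgmii_tx_delay", "fixup_rgmii_tx_delay")


def flag_combinations(flags=FLAGS):
    """Yield every on/off assignment of the given flags as a dict."""
    for values in product((0, 1), repeat=len(flags)):
        yield dict(zip(flags, values))


if __name__ == "__main__":
    # Print a checklist: one line per kernel build to test at 10M, 100M and 1000M.
    for number, combo in enumerate(flag_combinations(), start=1):
        settings = ", ".join(f"{name}={value}" for name, value in combo.items())
        print(f"build {number}: {settings}")
```

Recording the 10M/100M/1000M transmit result against each row would at least show whether any of these flags affects the symptom.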
The manufacturer's firmware uses a truly ancient 2.6.31 kernel but does successfully transmit at 100M, as does the device's bootloader, so I can reasonably rule out a physically broken device.
What might be reasonable next steps in trying to pin down what's going wrong? I'm happy hacking some debug code into target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx, but I'm not really sure what to look for or try — I've not seen wired ethernet misbehaviour like this before.