Troubleshooting Transmit Failure/Silence

Hi,

Could anyone share any ideas on how I could further troubleshoot this problem I'm having with my OpenWRT devices? Heavy WiFi load kills the WiFi until a reboot. The same happens on two different OpenWRT devices.

Thanks!

Here are my notes so far:

  • Works great for days to weeks at a time under light WiFi load

  • Heavy WiFi load (streaming audio to 9 smart speakers concurrently for 1-10 minutes) puts the router in a state where everything looks fine over ethernet, but the WiFi packet transmission rate slows to one packet every few seconds, then after several minutes stops entirely.

  • In this error state, all WiFi clients lose their connection and the SSID is no longer detected.

  • In this state, the router remains fully responsive (SSH, routing, etc.) over ethernet.

  • Stopping the streaming and power cycling all WiFi clients has no effect. The router's WiFi remains failed.

  • tcpdump on the router's wlan interface shows ARP requests arriving frequently, and a reply being sent to each of them. If I try to contact a station on the WLAN, it also shows ARP requests going out and responses coming back.

  • However, the ifconfig "TX packets" counter increments only once every few seconds, and then stops completely. All TX error counters remain 0. (ifconfig also shows txqueuelen:1000, but that is the configured transmit queue length rather than an error count, so I don't believe it applies here.)

  • Setting the interfaces down/up with ifconfig and iwconfig doesn't help. Removing/adding the bridge config doesn't help.

  • There are no relevant messages in dmesg.

  • I tried rmmod and modprobe on each of the wlan/soc/80211/mt7620/ath9k modules in various orders. Several of the rmmods crash and reboot the system (which fixes it until the next time it happens). The ones that don't crash it also don't resolve this error state.

  • This is a relatively low-RF environment (only 5 other SSIDs visible around my home, few other electronics in my house, the closest electronic equipment from neighbors must be 75+ feet away).

  • Same symptoms on two different devices (ZBT WE826: MT7620 and TP-Link TL-WR841N/ND v9: Qualcomm Atheros QCA9533 ver 1 rev 1)

  • Using the latest available OpenWRT build for both devices.

  • My suspicion is that there's some kind of outgoing TX queue that's not getting serviced properly (possibly low-level in SOC or WLAN firmware or hardware). But I can't find anything that looks like that under the /sys/net or /proc/net trees.
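
One way to put numbers on the slow-then-stopped "TX packets" counter from the notes above is to sample it on a timer from an SSH session over ethernet. This is only a minimal sketch under some assumptions: Python 3 is installed on the router (it is not by default and may not fit on small-flash devices), the wireless interface is named wlan0, and the counter is read from /sys/class/net/<iface>/statistics/tx_packets. The function names are made up for illustration.

```python
import time
from pathlib import Path


def read_tx_packets(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's TX packet counter for one network interface."""
    path = Path(sysfs_root) / iface / "statistics" / "tx_packets"
    return int(path.read_text().strip())


def is_stalled(samples, min_delta=1):
    """True if the counter rose by less than min_delta across the samples.

    A negative delta means the counter was reset or wrapped (for example
    after the interface was recreated), so it is not reported as a stall.
    """
    if len(samples) < 2:
        return False
    delta = samples[-1] - samples[0]
    if delta < 0:
        return False
    return delta < min_delta


def watch(iface="wlan0", interval=5.0, window=6):
    """Print one line per sample; flag full windows with no TX progress.

    "wlan0" is an assumed interface name; substitute whatever ifconfig shows.
    Runs until interrupted with Ctrl-C.
    """
    samples = []
    while True:
        samples.append(read_tx_packets(iface))
        samples = samples[-window:]  # keep only the most recent samples
        full = len(samples) == window
        flag = "STALLED" if full and is_stalled(samples) else "ok"
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {iface} tx_packets={samples[-1]} {flag}")
        time.sleep(interval)
```

Calling watch("wlan0") and leaving it running while the speakers stream would give a timestamped log of when the counter first slows and when it stops, which is easier to line up against tcpdump output than eyeballing ifconfig. Without Python on the device, the same sysfs file can be read with cat in a shell loop.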

I've experienced the same problem with a TP-Link TL-WR841ND v9. When I connect an iPad, position it so that its reception is poor, and run iperf3, the router's WiFi eventually hangs exactly as you described (all clients disconnected, TX stops). Running /sbin/wifi gets things working again.
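
Since /sbin/wifi brought the radio back in my case, a crude watchdog could run it automatically whenever the TX counter stops moving. What follows is a rough sketch, not a tested fix: it assumes Python 3 on the router, the names should_restart, restart_wifi and watchdog are invented here, and a genuinely idle network would also look "stalled" to it, so the thresholds would need tuning.

```python
import subprocess
import time


def should_restart(prev_tx, curr_tx, stalled_polls, max_stalled_polls=3):
    """Return (restart_now, new_stalled_polls) for one poll of the counter."""
    if curr_tx != prev_tx:
        return False, 0  # counter moved, so clear the stall count
    stalled_polls += 1
    return stalled_polls >= max_stalled_polls, stalled_polls


def restart_wifi(run=subprocess.run):
    """Invoke /sbin/wifi, the command that recovered the radio in this thread."""
    return run(["/sbin/wifi"], check=False)


def watchdog(read_counter, interval=10.0, restart=restart_wifi,
             sleep=time.sleep, max_polls=None):
    """Poll read_counter() and call restart() after repeated stalled polls.

    read_counter is any zero-argument callable returning the TX packet count.
    max_polls=None loops forever; a number makes the loop finite.
    Returns how many restarts were triggered.
    """
    prev = read_counter()
    stalled = 0
    polls = 0
    restarts = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        curr = read_counter()
        fire, stalled = should_restart(prev, curr, stalled)
        if fire:
            restart()
            restarts += 1
            stalled = 0  # give the radio a fresh window after the restart
        prev = curr
        polls += 1
    return restarts
```

On the router, read_counter could be something like lambda: int(open("/sys/class/net/wlan0/statistics/tx_packets").read()), with wlan0 again being an assumed interface name. This only papers over the hang; it does nothing to explain why TX stops in the first place.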

The problem also reproduced with OpenWrt at factory default settings, with the stock OEM firmware, and with DD-WRT.

I'm no longer using the router, but if I were to try it again, I'd set it to B/G-only mode with no N. Maybe then it would be stable, at the expense of speed.