Xiaomi AX3600 / ath11k: Failed to flush transmit queue and wifi disruption

Hi there,

I acquired a used AX3600 and installed the most recent OpenWrt snapshot (OpenWrt SNAPSHOT r22496-d98c8fc06d).

I just ran a speed test with 2 devices (Intel Dual Band Wireless AC 8265 + 1x Google Pixel 3a, both Wifi 5 afaik) connected to it. In the middle of the speed test, one wifi connection (Google Pixel 3a) dropped and came back a few seconds later, while the second wifi connection (Intel Wifi 8265) only stopped transmitting data for a few seconds.
Logfile:

Tue Apr 4 16:35:07 2023 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED 96:72:1f:d5:46:d6
Tue Apr 4 16:35:07 2023 daemon.info hostapd: phy1-ap0: STA 96:72:1f:d5:46:d6 IEEE 802.11: disassociated
Tue Apr 4 16:35:12 2023 daemon.info hostapd: phy1-ap0: STA ae:94:1a:61:0f:85 IEEE 802.11: authenticated
Tue Apr 4 16:35:12 2023 kern.warn kernel: [11127.277821] ath11k c000000.wifi: failed to flush transmit queue, data pkts pending 1443
Tue Apr 4 16:35:12 2023 daemon.info hostapd: phy1-ap0: STA ae:94:1a:61:0f:85 IEEE 802.11: associated (aid 3)
Tue Apr 4 16:35:12 2023 daemon.notice hostapd: phy1-ap0: AP-STA-CONNECTED ae:94:1a:61:0f:85 auth_alg=open
Tue Apr 4 16:35:12 2023 daemon.info hostapd: phy1-ap0: STA ae:94:1a:61:0f:85 WPA: pairwise key handshake completed (RSN)
Tue Apr 4 16:35:12 2023 daemon.notice hostapd: phy1-ap0: EAPOL-4WAY-HS-COMPLETED ae:94:1a:61:0f:85

(The phone, ae:... and 96:..., has MAC randomization turned on; the computer is d4:... and did not create any log entries in that time frame.) There are no further errors above or below, and wifi works again afterwards. AP settings:

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/soc/c000000.wifi'
	option channel '36'
	option band '5g'
	option htmode 'HE80'
	option country 'DE'
	option cell_density '0'
	option he_bss_color '8'
	option he_su_beamformee '1'
	option txpower '27'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'the_beast5g'
	option encryption 'psk2'
	option key 'mywpakey'

(After I unpacked the AX3600, I immediately thought its test SSID must be named the_beast.)

I somehow got the impression this AP is production ready - it does not seem so :frowning: Can I do something?

I figured something out:
When I run speedtest (dslreports.com) on PC and ookla on phone and then turn off the wifi on the phone, I get an error:
[12912.077164] ath11k c000000.wifi: failed to flush transmit queue, data pkts pending 5
and at the same time there is no throughput on the computer for a few seconds.

This is 100% reproducible...

See discussion about ath11k flush bug and a hack fix by @Ansuel in

(The flush bug may affect sysupgrade negatively and prevent it due to a timeout.)

Thanks for pointing to that "bug", but I have my doubts whether it is related... As I don't have a build environment set up, I cannot easily try it out :frowning: If someone provides a sysupgrade image, I am happy to test. Otherwise I guess it might be firmware related, so we are in the hands of the gods? :smiley:

I disabled beamforming and BSS coloring again. I could still reproduce it: disconnecting the phone during a speedtest means the computer doesn't get any packets for 2-3 seconds.

I encounter the same issue. Perhaps @Ansuel can help?

I'd also be interested in knowing whether others just don't notice this problem or whether it is something new. To me, this makes wifi unusable, as I don't want interruptions in video conferences :slight_smile: Is there an archive of older snapshots available somewhere?

For now, I might get 2x Xiaomi AX3200, which seems to me to be the device with the best price tag (in Germany, where you pay only 58€ for it).

Wow, I downgraded to
r22446-1c552eb44d (2023-03-28)
from:

and the problem is gone here. From the changelog I can't tell what the cause could be, but I wonder if pointing somebody to the regression might be useful?

Can you elaborate on this? Also, I guess it's unrelated to the "failed to flush queue" bug.

I don't know exactly what your question is?

With r22496-d98c8fc06d I have the problem that disconnecting a client under load (my Google phone running "ookla speedtest") stops all traffic on the wifi for a few seconds (see opening post).
I could reproduce this 100% reliably.

After downgrading to r22446-1c552eb44d from the link above (my own build failed..), which looks like the corresponding snapshot build plus a few packages not related to wifi, I cannot reproduce the problem anymore: I can disconnect my phone during a speedtest repeatedly, other clients are not impacted, and I don't get a transmit queue flush failure in the log.

OK, and is the diff one patch, or was the mac80211 package bumped?

git diff 1c552eb44ddba4d8630eb3453b4f6dd8ef83b13a..d98c8fc06ddd36f4b47c0eef8094e68257b92f87

It is

  • the kernel bumped to .105 from .104
  • the ath11k patch that restores 160 MHz support (but I don't think this is the cause as I tried on 80 MHz with and without beamforming)
  • then there is commit d54c91bd9ab3c54ee06923eafbd67047816a37e4, which says it is fixing vulnerabilities but contains a whole bunch of patches to mac80211, including some "flush the tx buffer on client disconnect" change that sounds like it could be connected to my problem

Unfortunately I don't really have a build system set up, and on my laptop a build takes ages, but I could try to bisect the responsible commit...
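Since the suspicion falls on mac80211, the commit range can be narrowed to that package directory instead of reading the full diff. A minimal sketch, assuming a local clone of openwrt.git; the function name commits_touching and the checkout path used below are made up for illustration, while `git log` with a revision range and a path limit is standard git.

```shell
# List the commits in <old>..<new> that touched <path>.
# Arguments: <repo dir> <old rev> <new rev> <path>
# commits_touching is a hypothetical helper name.
commits_touching() {
    git -C "$1" log --oneline "$2..$3" -- "$4"
}
```

For the two snapshots discussed here, and assuming the clone lives in ~/openwrt, that would be `commits_touching ~/openwrt 1c552eb44ddba4d8630eb3453b4f6dd8ef83b13a d98c8fc06ddd36f4b47c0eef8094e68257b92f87 package/kernel/mac80211`.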

OK, by the name it's the flush patch that locks the tx ring and flushes it. I need to check why it was introduced, though.

I am now building an image without that patch. The reason is given in the commit message; it seems quite logical to me but might have side effects -_-
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/kernel/mac80211/patches/subsys/331-wifi-mac80211-flush-queues-on-STA-removal.patch;h=00232ec1b9e9f6d077c5b17827bde675c9fec2b6;hb=d54c91bd9ab3c54ee06923eafbd67047816a37e4

I think you are absolutely right and it was the cause of the issue.

The problem is how ath11k works with its buffer... I need to check what data ieee80211 provides.

The tx buffer in ath11k is global, so there isn't a clear separation of the buffers related to one client... Making that change is doable but requires a big rework of the driver... The good thing is that there is already a problem in how the tx ring works, so I already had to mess with that.

Thanks for taking this on! But actually this looks like a problem that should exist upstream as well.

I just built a recent snapshot with 2 patches removed:

deleted: package/kernel/mac80211/patches/subsys/331-wifi-mac80211-flush-queues-on-STA-removal.patch
deleted: package/kernel/mac80211/patches/subsys/333-wifi-mac80211-add-flush_sta-method.patch

and cannot reproduce my original issue, so these patches are the culprit :slight_smile:
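For anyone who wants to repeat the test, dropping the two patches from a source tree could look roughly like this. It is a sketch only: the function name drop_flush_patches and the tree location are assumptions, and the patch paths are the ones listed above.

```shell
# Delete the two mac80211 patches named above from an OpenWrt source tree.
# Argument: path to the source tree (hypothetical, adjust to your checkout).
drop_flush_patches() {
    tree=$1
    for p in \
        331-wifi-mac80211-flush-queues-on-STA-removal.patch \
        333-wifi-mac80211-add-flush_sta-method.patch
    do
        rm -f "$tree/package/kernel/mac80211/patches/subsys/$p"
    done
}
```

After something like `drop_flush_patches ~/openwrt` the image has to be rebuilt as usual. If the tree is a git checkout, `git status` should then report both files as deleted, as in the listing above.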

r22503-3b212db232

Buffer flush is really driver dependent

I retested current master and the issues are gone. I am a bit confused about whether the fw upgrade to 2.9 fixed it, as I thought I had tested that before. Anyway, the current snapshot runs great, I easily get 600 Mbps with my 2-stream AC laptop behind 1 wall... Nice :slight_smile: Thanks for all the great work, and I am happy that this radio is working great as well.

Taking into account the "802.11ax speed problems" that some people describe with the MediaTek radios, I wonder if we should actually recommend the AX3600 more: great price point & performance.


Did you try the current master with the 2 patches?