Ath11k c000000.wifi: failed to flush transmit queue, data pkts pending

I am opening a new topic because I posted about this roughly 2 months ago, but that topic was auto-closed after 10 days of inactivity.

So I am still getting these errors on my ath11k ipq807x devices.

I own 2 of these, one ipq8072 and one ipq8074. They both exhibit this behavior.

ath11k c000000.wifi: failed to flush transmit queue, data pkts pending X

This happens just after an STA has been disconnected unexpectedly. That is to say: if you start a transfer on a device and walk away from your AP while the transfer is running, and you keep walking until you are out of range and unexpectedly lose the connection, chances are you will see this in your logs.

So, now some further details:

If you see data pkts pending 1, chances are this was just the STA poll packet that was sent out to check whether you are still connected to the device.

If you see something over 1, then the ungraceful disconnect probably happened while a "real transfer" was taking place, e.g. you were pushing / pulling traffic to / from the AP when you disconnected ungracefully for whatever reason (signal loss? device battery died?).

Why this matters:

I have noticed the following behavior: If you see data pkts pending 1, your AP will most likely recover from this. Transfers to / from all other connected STAs will block for a second or two (e.g. ping loss)... But your AP will recover, and apart from a 2-3 second pause to / from ALL other connected devices, you will continue on without any issues.

Now once every month or so, I see this error with something like 100 or 200 data pkts pending. At this point, the AP will NOT be able to recover. The behavior I experience is that all other devices connected to the AP will exhibit INCREDIBLY high ping times (1000ms - 10000ms), including outright ping loss... And this will not clear until a full reboot. I have experienced this twice so far, once at about 25 days of uptime and once at about 35 days of uptime. There are no other messages in the logs. Everything else looks perfectly fine. I will add that it's quite possible that, had I waited for "some time", the XXX packets would have flushed and things would have recovered. But I only spent about 5-10 minutes before giving her the ole reboot. Restarting hostapd did not help.
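
To keep track of which kind of flush failure shows up, a small helper can pull the pending count out of each matching dmesg line and label it. This is only a minimal sketch: the function name classify_flush_failures, the two labels, and the cutoff at 1 are assumptions that mirror the observations above, not anything defined by ath11k or OpenWrt.

```shell
#!/bin/sh
# Reads kernel log lines on stdin and prints one line per flush failure:
#   <pending-count> <label>
# The labels and the cutoff at 1 are this sketch's own convention (assumption).
classify_flush_failures() {
    # Keep only the number that follows "data pkts pending" on matching lines.
    sed -n 's/.*failed to flush transmit queue, data pkts pending \([0-9][0-9]*\).*/\1/p' |
    while read -r pending; do
        if [ "$pending" -le 1 ]; then
            echo "$pending poll-only"
        else
            echo "$pending transfer-in-flight"
        fi
    done
}
```

On the AP this would be used as: dmesg | classify_flush_failures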

Potential fix:

I have just gone ahead and compiled 2 builds using openwrt snapshot (r23400 to be exact)... Without the 2 related patches that are quite likely the culprit:

331-wifi-mac80211-flush-queues-on-STA-removal.patch

and

333-wifi-mac80211-add-flush_sta-method.patch


Both my APs are now fully up and running and configured the way they are meant to be, with no reboots expected; that is to say, I do not plan on rebooting them manually from the state they are in now...

I will post every week or so to let you all know whether removing these patches fixes this or not.

Last but not least, I am well aware of the potential security implications of excluding these 2 patches.

Welp, it's been a week now. I said I would post an update in a week so here it is:

During the last week I migrated my last AP to an AX3600. So in total I have 3 APs running an ipq807x chipset.

Since compiling my own build WITHOUT those 2 patches I have not seen this error in my dmesg a single time:

AP 1:

root@OPENWRT-PRIMARY:~# dmesg | head -n1 ; dmesg | grep flush | wc -l ; dmesg | wc -l ; uptime ;
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
0
440
 17:09:31 up 7 days, 45 min,  load average: 0.03, 0.02, 0.00

AP 2:

root@OPENWRT-SALON:~# dmesg | head -n1 ; dmesg | grep flush | wc -l ; dmesg | wc -l ; uptime ;
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
0
502
 17:10:07 up 7 days,  6:14,  load average: 0.00, 0.00, 0.00

AP 3, this is the one I recently added, AX3600:

root@OPENWRT-UPSTAIRS:~# dmesg | head -n1 ; dmesg | grep flush | wc -l ; dmesg | wc -l ; uptime ;
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
0
477
 17:10:28 up 4 days,  1:57,  load average: 0.00, 0.00, 0.00

So far it looks like including the 2 patches I mention in my initial post at a minimum causes issues when flushing the transmit queue when a client (ungracefully?) disconnects.

But more importantly I have not had the issue I mentioned above where my AP will semi-hang and require a reboot. Bear in mind that in the past this popped up after 3-4 weeks of uptime, so it very well might still pop up.

Will post in another week.

Welp sure enough, it happened again.

Absolutely nothing in the logs, not even the flush transmit queue message this time.

I was able to run wifi down radio1 ; wifi up radio1 and bring it back to life.

I am seeing STA-OPMODE-SMPS-MODE-CHANGED in my logs... Although I see this frequently as my laptop(s) support this feature... Unfortunately there is no way to disable this via hostapd (or /etc/config/wireless) as far as I can see, so I will update my laptops' config to disable it, as there is a flag (in Windows anyway) to disable this feature under the adapter properties.

Back at step 1 :frowning:

in advanced settings, look at "Disable Inactivity Polling" and "Disassociate On Low Acknowledgement"
after disabling those settings i never encountered the same error again

even with "331-wifi-mac80211-flush-queues-on-STA-removal.patch" and "333-wifi-mac80211-add-flush_sta-method.patch" also applied
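
For anyone who prefers the config file over LuCI, those two checkboxes correspond to the skip_inactivity_poll and disassoc_low_ack options of a wifi-iface section. The fragment below is a sketch of one reading of "disable those settings" (inactivity polling off, disassociation on low ACK off); the section name default_radio0 is an assumption and will differ per device.

```
# /etc/config/wireless (fragment); 'default_radio0' is an assumed section name
config wifi-iface 'default_radio0'
	# LuCI "Disable Inactivity Polling" checked
	option skip_inactivity_poll '1'
	# LuCI "Disassociate On Low Acknowledgement" unchecked
	option disassoc_low_ack '0'
```

The change only takes effect once the wifi stack is restarted, for example with the wifi down / wifi up sequence used earlier in this thread.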

i don't think this is the reason, in my case anyway.

i have had disassoc on inactivity disabled since the start

for now i have created a powershell script that:

turns on my wifi adapter
runs iperf3 for 60 seconds
turns off my wifi adapter
waits 10 seconds
goto start

so it's essentially disconnect, connect, iperf3, and loop.
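
The PowerShell script itself was not posted. For anyone who wants to reproduce the test from a Linux client, the same steps could be sketched in shell roughly as below. Everything named here is an assumption: it uses nmcli to toggle the radio, IPERF_SERVER is a placeholder address for an iperf3 server behind the AP, and the 15 second wait for reassociation is not part of the original steps. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical names throughout: IPERF_SERVER, DRY_RUN, run, reconnect_cycle.
IPERF_SERVER="${IPERF_SERVER:-192.168.1.1}"   # assumed iperf3 server address
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, do not run them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One pass: adapter on, 60 s of iperf3, adapter off, 10 s pause.
reconnect_cycle() {
    run nmcli radio wifi on
    run sleep 15                          # assumed time to reassociate
    run iperf3 -c "$IPERF_SERVER" -t 60
    run nmcli radio wifi off
    run sleep 10
}
```

To loop forever as in the steps above, source the file with DRY_RUN=0 set and run: while :; do reconnect_cycle; done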

so far so good, i did not run it all night long yet as i forgot to turn it on but will tonight.

i have disabled smps on the 2 devices in my household that supported it, mainly our 2 laptops.

it's been about 7 days since i did so, no weird freeze so far.

will post in another week, so far so good.

sure enough, it happened again.

back at square one.

i have (unfortunately) added a 3:55 AM cronjob to reset the wifi stack.
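
The exact cron entry was not posted. A sketch of what such a nightly reset could look like on OpenWrt, reusing the wifi down / wifi up commands from earlier in the thread, is below. The 5 second pause and the choice to reset all radios rather than a single one are assumptions.

```
# /etc/crontabs/root (fragment)
# Every day at 03:55: take the radios down, pause briefly, bring them back up.
# The 5-second pause is an assumption, not something from the post above.
55 3 * * * /sbin/wifi down; sleep 5; /sbin/wifi up
```

The cron service typically has to be restarted before it picks up a new entry.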

we'll see if this is a bug that "builds up" over time requiring a reset, or something that will not be helped with the daily wifi reset.

will post in a week or two how it goes.

I also got this on rc2 with WAX620 IPQ807x. I'm now testing SNAPSHOT r23766-5356462ce5 / LuCI Master git-23.223.85458-f7583b6 to see if there's any difference. Build is from 2023-08-18.

From my log, this happened when a client disconnected.

Has anyone figured out a solution to this? It's starting to bother me a lot, since every time a client disconnects, it causes a 2-3 second pause for all other clients connected to the AP.

Just to keep this post alive, I see this error too all over dmesg output. My router is Xiaomi AX9000 with IPQ8074.

I haven't actively looked for unusual slow-downs / disconnections however. It may already be happening, but nothing has caught my attention...

Same on ipq8174

EDIT: Just FYI, I see the error message in dmesg from time to time (sometimes often)... My issue was the long ping times every 20-40 days. It's all mentioned above in my initial post.

I no longer have this issue BTW.

I use 2 ipq8074 devices.

I still see the message pop up in dmesg, frequently... But the issue I mentioned above with high ping times etc i have not experienced in quite a few months.

What has changed?

Well... All my clients are 802.11ax. I no longer have any 802.11ac clients on the 5 GHz band.

Also i was getting this issue when using the ax3600 which from what i recall is ipq8072... My 301Ws are 8074.

Is there any ongoing development on ath11k firmware updates for ipq8074?

--> http://lists.infradead.org/pipermail/ath11k/

The ath11k firmware is closed-source and proprietary, provided by QCA as-is; there is no OpenWrt development on it.

I am already following this ath11k list, but my point is that even though the firmware is provided by QCA as-is, it still gets updates, and those should be reflected in OpenWrt as well. OpenWrt is still using the March 2023 version, so apparently no newer build is coming on the OpenWrt side.

Here is some information about the newer firmware: https://git.codelinaro.org/clo/qsdk/oss/ath11k-bdf/-/tree/NHSS.QSDK.12.4.5.r3/IPQ8074/hw2.0/WLAN.HK.2.9.0.1/WLAN.HK.2.9.0.1-01977-QCAHKSWPL_SILICONZ-1

I got some new data. This doesn't happen on my other AP/Radios without WDS. I believe this error is related to WDS enabled on my 2.4 radio. Hope it helps...

[13265.582494] ath11k c000000.wifi: failed to flush transmit queue, data pkts pending 1
[18705.853807] ------------[ cut here ]------------
[18705.853847] wlan2g.sta2: Failed check-sdata-in-driver check, flags: 0x1
[18705.857545] WARNING: CPU: 0 PID: 1328 at __ieee80211_flush_queues+0x198/0x1b0 [mac80211]
[18705.863841] Modules linked in: nft_fib_inet nf_flow_table_inet ath11k_ahb ath11k nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mac80211 cfg80211 spi_gpio spi_bitbang qrtr_smd qrtr qmi_helpers nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat sha512_generic seqiv jitterentropy_rng drbg michael_mic hmac cmac leds_gpio qca_nss_dp qca_ssdk gpio_button_hotplug ext4 mbcache jbd2 aquantia hwmon crc_ccitt crc32c_generic
[18705.909396] CPU: 0 PID: 1328 Comm: hostapd Not tainted 6.1.67 #0
[18705.931631] Hardware name: Netgear WAX630 (DT)
[18705.937704] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[18705.941963] pc : __ieee80211_flush_queues+0x198/0x1b0 [mac80211]
[18705.948818] lr : __ieee80211_flush_queues+0x198/0x1b0 [mac80211]
[18705.955068] sp : ffffffc00b95b820
[18705.961052] x29: ffffffc00b95b820 x28: ffffff8002aca400 x27: ffffffc00b95bdc8
[18705.964271] x26: ffffff80020a8080 x25: ffffffc008c32940 x24: ffffff8012c34900
[18705.971390] x23: 0000000000000000 x22: ffffff8012c34900 x21: ffffff80060f08a0
[18705.978508] x20: 0000000000000007 x19: 0000000000000007 x18: 000000000000011a
[18705.985625] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc008b37240
[18705.992743] x14: 000000000000034e x13: 000000000000011a x12: 00000000ffffffea
[18705.999862] x11: 00000000ffffefff x10: ffffffc008b8f240 x9 : ffffffc008b371e8
[18706.006980] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
[18706.014099] x5 : ffffff803fda4708 x4 : 0000000000000000 x3 : 0000000000000027
[18706.021216] x2 : 0000000000000027 x1 : 0000000000000023 x0 : 000000000000003b
[18706.028334] Call trace:
[18706.035442]  __ieee80211_flush_queues+0x198/0x1b0 [mac80211]
[18706.037706]  ieee80211_flush_queues+0x18/0x24 [mac80211]
[18706.043608]  sta_set_sinfo+0xb78/0xc20 [mac80211]
[18706.048902]  sta_info_destroy_addr_bss+0x54/0x80 [mac80211]
[18706.053505]  ieee80211_color_change_finish+0x1518/0x1830 [mac80211]
[18706.058887]  cfg80211_check_station_change+0x1268/0x3530 [cfg80211]
[18706.065138]  genl_family_rcv_msg_doit+0xb8/0x11c
[18706.071384]  genl_rcv_msg+0x108/0x230
[18706.076244]  netlink_rcv_skb+0x5c/0x12c
[18706.079802]  genl_rcv+0x38/0x50
[18706.083446]  netlink_unicast+0x1e8/0x2d4
[18706.086574]  netlink_sendmsg+0x1a0/0x3d0
[18706.090742]  ____sys_sendmsg+0x1c8/0x270
[18706.094648]  ___sys_sendmsg+0x7c/0xc0
[18706.098552]  __sys_sendmsg+0x48/0xb0
[18706.102112]  __arm64_sys_sendmsg+0x24/0x30
[18706.105759]  invoke_syscall.constprop.0+0x5c/0x104
[18706.109667]  do_el0_svc+0x58/0x17c
[18706.114437]  el0_svc+0x18/0x54
[18706.117822]  el0t_64_sync_handler+0xf4/0x120
[18706.120863]  el0t_64_sync+0x174/0x178
[18706.125290] ---[ end trace 0000000000000000 ]---

Has there been any solution to the firmware issue? Should I try to update it manually?

There is newer ath11k firmware now.
Have you tried it?