Discussion of this first appeared in the release threads for 22.03.4/5, but I'm starting a separate thread here to collect further information and maybe eventually come to some conclusions.
In essence, kernel log entries such as this appear:
[05/02/2023 15:45:12] ath10k_pci 0000:00:00.0: mac flush vdev 0 drop 0 queues 0x1 ar->paused: 0x0 arvif->paused: 0x0
[05/02/2023 15:45:13] ath10k_pci 0000:00:00.0: mac flush vdev 0 drop 0 queues 0x1 ar->paused: 0x0 arvif->paused: 0x0
With corresponding system log entries following, such as:
Tue May 2 15:45:16 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED a8:93:4a:xx:xx:xx
Tue May 2 15:45:16 2023 kern.info kernel: [211425.958483] ath10k_pci 0000:00:00.0: mac flush vdev 0 drop 0 queues 0x1 ar->paused: 0x0 arvif->paused: 0x0
Tue May 2 15:45:16 2023 daemon.info hostapd: wlan0: STA a8:93:4a:xx:xx:xx IEEE 802.11: authenticated
Tue May 2 15:45:16 2023 daemon.info hostapd: wlan0: STA a8:93:4a:xx:xx:xx IEEE 802.11: associated (aid 1)
Tue May 2 15:45:16 2023 daemon.notice hostapd: wlan0: AP-STA-CONNECTED a8:93:4a:ac:89:fd
Tue May 2 15:45:16 2023 daemon.info hostapd: wlan0: STA a8:93:4a:xx:xx:xx WPA: pairwise key handshake completed (RSN)
Tue May 2 15:45:16 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED a8:93:4a:xx:xx:xx
Tue May 2 15:45:16 2023 kern.info kernel: [211426.166764] ath10k_pci 0000:00:00.0: mac flush vdev 0 drop 0 queues 0x1 ar->paused: 0x0 arvif->paused: 0x0
These seem to occur more frequently with increased wireless activity but appeared rarely if at all before the above releases.
I'll let others speak to the possible effects of these events on wireless stability.
I should note that it's not an Archer-only issue, as a few others have been mentioned, such as Turris Omnia. I'm assuming the common denominator is certain Qualcomm Atheros chipsets.
I'm using a TP-Link A7v5 and a Belkin F9K1115v2 as dumb APs, using 802.11r. They both use the ATH79/ath10k-firmware-qca988x.
I have encountered these error messages on 22.03.4 and 5 with the default wifi firmware/drivers
Since I have been experimenting with 802.11r/fast roaming and 802.11s/mesh with my ATH79/qca988x devices and the Linksys MR8300s, I have found less issues using non-ct firmware with mesh.
With switching from the default to non-ct drivers, no more of these error messages:
kmod-ath10k-ct --> kmod-ath10k
ath10k-firmware-qca988x-ct --> ath10k-firmware-qca988x
Very interesting. Some years ago I used to do that, but then I went to ath10k-ct-smallbuffers instead. There was a good reason at the time, I think tied to memory saving (as implied by the name), but I'm unsure now.
Question: Does a smallbuffers version of non-ct exist or is there no need for one for whatever reason? If it does exist, I'd be curious to try it.
I have actually been wondering lately, if we should again somehow compare the current ath10-mainline and the current ath10k-ct.
OpenWrt switched to -ct as default in 2018, five years ago, when it looked like the mainline ath10k was buggy and really slow progress in fixing those, while @greearb actively developed the -ct version and was responsive to bug reports. After that, greearb did develop/fix ath10k-ct during the first few years, but since late 2020 the development has been minimal (as the driver has been mature / good enough). During the same time, the mainline ath10k in the Linux kernel repo has seen roughly ~100 updates/fixes (a a total of some ~500 commits since our switching the OpenWrt default). The mainline version might again start to be a viable choice.
eh we might consider to switch again... for sure some perf comparison would help... main problem with ct thing was trash iot device making the driver or fw crash
So I should probably leave my updated experience here too (Archer A7 v5).
21.02.5 -> don't recall ever seeing these messages
22.03.4 -> spurious failed to flush transmit queue alongside some AP-CONNECT/DISCONNECT activity. Also, towards the end of my run with this version, some correlation with Spurious quick kickout for STA (which were the cases where the message correlated with actual network issues).
22.03.5 -> somewhat over a day of data now, they just started appearing again after initially working fine.
Events seem to correlate with certain devices more than others. However, there are also still plenty of cases where those devices drop/reconnect just fine without kernel message (see below).
Note that, contrary to @94121-usr, all my data were collected with non-ctkmod-ath10k-smallbuffers and ath10k-firmware-qca988x
Log excerpt of a particularly active minute just now on 22.03.5. Note that initially the more or less same phone activity doesn't cause the kernel issue, but then halfway down it suddenly does.
Thu May 4 21:02:09 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:09 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:09 2023 daemon.notice hostapd: wlan0: Prune association for XXX
Thu May 4 21:02:09 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:09 2023 daemon.notice hostapd: wlan1: AP-STA-CONNECTED XXX
Thu May 4 21:02:09 2023 daemon.info hostapd: wlan1: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:09 2023 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:17 2023 daemon.notice hostapd: wlan0: STA-OPMODE-N_SS-CHANGED XXX 1
Thu May 4 21:02:17 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:17 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:17 2023 daemon.notice hostapd: wlan1: Prune association for XXX
Thu May 4 21:02:17 2023 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:17 2023 daemon.notice hostapd: wlan0: AP-STA-CONNECTED XXX
Thu May 4 21:02:17 2023 daemon.info hostapd: wlan0: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:17 2023 daemon.notice hostapd: wlan0: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan0: Prune association for XXX
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan1: AP-STA-CONNECTED XXX
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan1: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan0: STA-OPMODE-N_SS-CHANGED XXX 1
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan1: Prune association for XXX
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan0: AP-STA-CONNECTED XXX
Thu May 4 21:02:18 2023 daemon.info hostapd: wlan0: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:18 2023 daemon.notice hostapd: wlan0: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:19 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:19 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:19 2023 daemon.notice hostapd: wlan0: Prune association for XXX
Thu May 4 21:02:19 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:26 2023 kern.warn kernel: [41127.980395] ath10k_pci 0000:00:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
Thu May 4 21:02:26 2023 daemon.notice hostapd: nl80211: nl80211_recv_beacons->nl_recvmsgs failed: -5
Thu May 4 21:02:26 2023 daemon.notice hostapd: wlan0: STA-OPMODE-N_SS-CHANGED XXX 1
Thu May 4 21:02:26 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:26 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:26 2023 daemon.notice hostapd: wlan1: Prune association for XXX
Thu May 4 21:02:26 2023 daemon.notice hostapd: wlan0: AP-STA-CONNECTED XXX
Thu May 4 21:02:26 2023 daemon.info hostapd: wlan0: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:26 2023 daemon.notice hostapd: wlan0: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:27 2023 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(br-lan) 192.168.1.40 XXX
Thu May 4 21:02:27 2023 daemon.info dnsmasq-dhcp[1]: DHCPACK(br-lan) 192.168.1.40 XXX t-phone
Thu May 4 21:02:27 2023 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Thu May 4 21:02:27 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:27 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:27 2023 daemon.notice hostapd: wlan0: Prune association for XXX
Thu May 4 21:02:27 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:27 2023 daemon.notice hostapd: wlan1: AP-STA-CONNECTED XXX
Thu May 4 21:02:27 2023 daemon.info hostapd: wlan1: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:27 2023 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:28 2023 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(br-lan) 192.168.1.40 XXX
Thu May 4 21:02:28 2023 daemon.info dnsmasq-dhcp[1]: DHCPACK(br-lan) 192.168.1.40 XXX t-phone
Thu May 4 21:02:28 2023 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan
Thu May 4 21:02:34 2023 kern.warn kernel: [41135.980355] ath10k_pci 0000:00:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
Thu May 4 21:02:34 2023 daemon.notice hostapd: nl80211: nl80211_recv_beacons->nl_recvmsgs failed: -5
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan0: STA-OPMODE-N_SS-CHANGED XXX 1
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan0: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan1: Prune association for XXX
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan1: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan0: AP-STA-CONNECTED XXX
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan0: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan0: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: authenticated
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan1: STA XXX IEEE 802.11: associated (aid 4)
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan0: Prune association for XXX
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED XXX
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan1: AP-STA-CONNECTED XXX
Thu May 4 21:02:51 2023 daemon.info hostapd: wlan1: STA XXX WPA: pairwise key handshake completed (RSN)
Thu May 4 21:02:51 2023 daemon.notice hostapd: wlan1: EAPOL-4WAY-HS-COMPLETED XXX
Thu May 4 21:02:58 2023 kern.warn kernel: [41160.220256] ath10k_pci 0000:00:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
Yes, that works (used to do the equivalent of that via LuCI in some of the early 21.02 builds), plus reboot of course. In practice, though, I now do make image [...] PACKAGES="-ath10k-firmware-qca988x-ct -kmod-ath10k-ct kmod-ath10k-smallbuffers ath10k-firmware-qca988x [...]"
Also, just now started getting the ath10k_pci 0000:00:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0 again (updated above with log). No obvious correlation to anything - it's just now always tied to my phone connecting/dropping, but the phone was also present on the network during the first 24h of my current-version test.
I'm the one bringing this up earlier in the release thread.
My network is a home network, with a DOCSIS Cable modem providing the net connection, a mini x86 box on OpenWRT as the central router, and 1-2 Tp-link C7/A7 as the AP's. (also running OpenWRT)
Have 20-25 clients of various desktops, phones, TV's, IOT gizmos...
So, my behavior is as shown in the first post, the "mac flush vdev..." error. It nearly always occurs after a disconnect. In my home network I seem to have a few devices that have issues in rekeying, they take a few tries to get it done, and sometimes drop in the process, and then one of these log messages occurs.
I was having short network connectivity drops, where devices are still connected to the AP, but lose internet connection. These would last for seconds to a minute or so, then "heal" themselves without any action needed. I HAD thought that they were corresponding to the log messages, but never fully verified that. So, that had me pretty concerned with what these things are.
Now, the net dropouts have pretty much disappeared... and I still see the occasional message, so I'm wondering if they were ever were connected. I have been trying different settings for rekey time and association, to see the influence there, so a slight chance I got rid of the net drops that way. Will do a control test of full standard settings for a while to be sure. I also usually run a different bunch of settings for beacon, DTIM, rekey, and association time, so I was starting from a non stock place...
What I did was to slow down rekeying to 12hrs and turn off dissociate on low acks, this lowered the number of log events, by lowering the number of dissociations, in my network.
If these errors are just some greater level of reporting of a common thing, then they are just an annoyance to be dealt with, or ignored. Eventually, they should be masked by normal vs debugging log reporting in the driver.
But, we really need to find out what this is telling us, and how important it is.
I have also noticed a similar error on my Linksys MR8300 running snapshot builds kern.info kernel: [47337.586961] ath10k_ahb a000000.wifi: mac flush vdev 0 drop 0 queues 0x1 ar->paused: 0x0 arvif->paused: 0x0
In my experience over the past month or so i have had devices being kicked off the network requiring them to manually reconnect to the wifi with this error showing up after the ap disconnect
i have made a forum topic about it which will be linked here
josteink - I'm having good performance as well, possibly better(?) or at least the same. I was experiencing short drops in connection, on the 5ghz radio (AC) that seemed to coincide with the log entries, but then they disappeared, with the log entries continuing... so it would seem they weren't connected. For me, the log entries are connected with a client device getting dropped/dropping, but not every time.
DJMalachite - Unlike yours, where you seem to need to make them reconnect, I was getting a short dropoff, that would always return on it's own.
I should also add, that I had originally seen this when moving to 22.03.04, and the net drops stopped while I was still using that. I've now gone on to 22.03.05, with no return of net dropouts. The log entries are continuing. My reduction for that is to change my rekey time to 12hrs. Many of the events were occuring where a couple of my devices have trouble rekeying on the first try. I still get them for other disconnects.
That does sound similar to what i have experienced, for example i was in the middle of a speedtest on my phone and i lost connection for a few seconds before it recovered unlike on my moms phone which will not recover unless they reconnect.
One reason i havent tried 22.03.05 is because my config is from SNAPSHOT which uses DSA and will not work