TP-Link Archer C7 v2 AC1750 ath10 firmware crashes (2)

I'm watching you guys... :wink:

Been following along, trying to log, learn, and prod others to dig deep on a few problems that look like this.
I've been a C7 owner since the 17.x.x.x days, and they used to be solid as rocks. Oddly, when the problems started, some people seemed to have more issues with the 2.4 GHz radio (ath9k) than with the 5 GHz one (ath10k). I've had both, and at times more of one than the other.

In the past year or so, it's mostly been the ath9k radio having issues, usually a halt in communications while stations still show as connected. Sometimes it fixes itself; sometimes it needs a reset of the radio, if not the router. Other times I see a kernel warning from ath10k with a short outage. But recently it's almost always the ath9k having problems. It's now more like 2-4 days between lockups, versus the late 19.x.x.x through early 21.x.x.x days, when you'd have a cron script reset the radios every night and maybe still see a lockup in between. Fewer full hangs too, usually just a short outage.

Confounders...
I have been using the C7 as a dumb AP only for 2-3 years now; a tiny x86 box does OWT router duties.

Lately I have been running a snapshot from 2-3 months ago. It seems to be doing better; problems are rarer. So I haven't run stable for a while. The snapshot had later releases of the ath10k drivers and firmware, and of other pieces of the networking code, so maybe those updates have improved things. I think a lot of that may now be in 21.02.2, though some of it may not be...

And finally, my home network has changed. 3-4 months back, a family member who was far back and a floor up in the house, and who did a lot of gaming and voice/video calls, convinced me to run a cable to him, so we no longer have the weak-signal, heavy-UDP-streaming condition on the wifi.
Things got better, but that makes it hard to say: was it the new firmware revision, the absence of a certain kind of traffic, or the change in density?

So I'm not typical, and have had a lot of variables change. This is why these things are hard to figure out...

Now that 21.02.2 is out (or is 21.02.3 about to hit?), I intend to go back to the stable version and see how it flies...

Edit: Question: does anyone know whether the smallbuffers version is supposed to help with memory on the wifi chip, or with the system memory being used? My thought was that a C7 has a larger amount of RAM, so would it really need or benefit from smaller allocated buffers (unless something else was using excessive amounts)?

The difference for me in "Total available memory" in LuCI, on 21.02.n running on a C7 v2:

CT - 32%

CT-smallbuffers - 51%
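
For anyone who wants to compare the two builds without going through LuCI, here is a rough Python sketch of how a figure like this can be derived. It assumes the LuCI number roughly corresponds to MemAvailable divided by MemTotal from /proc/meminfo, and that the C7 v2 has about 128 MB of RAM; the function name and the sample values are made up for illustration, not taken from a real device.

```python
def available_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal.

    Expects text in the /proc/meminfo format ("Name:   value kB").
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # kB for the fields used here
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


# Hypothetical numbers, chosen only to land near the 51% reported above.
SAMPLE_MEMINFO = """\
MemTotal:         122880 kB
MemFree:           40000 kB
MemAvailable:      62669 kB
"""

print(round(available_percent(SAMPLE_MEMINFO)))

# Rough size of the 32% -> 51% gap, assuming about 128 MB of RAM (assumption).
ASSUMED_RAM_MB = 128
gap_mb = (51 - 32) / 100 * ASSUMED_RAM_MB
print(round(gap_mb, 1))
```

Under that 128 MB assumption, the 19-point gap between CT and CT-smallbuffers is on the order of 24 MB of system RAM.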

Well, having upgraded to 22.03, the default ath10k driver still has problems. Extract from kernel logs on two devices below. Other forums point to it perhaps being a problem caused by Apple devices on the 5GHz band behaving in unexpected ways. Since I can't get the Apple client devices to modify their behaviour, I need to find APs with different hardware that can cope. Any suggestions?

Archer C7 v2

[230328.087201] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[230328.138398] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[230328.165966] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36954:23002882 36904:23002866 36904:23002866 36954:23002864, jiffies: 23003168, attempting restart restart firmware, dev-flags: 0 x142
[230328.186486] ath10k_pci 0000:00:00.0: failed to transmit management frame via WMI: -11
[230328.194580] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[230328.202266] ath10k_pci 0000:00:00.0: failed to transmit management frame via WMI: -143
[230328.210516] ath10k_pci 0000:00:00.0: failed to transmit management frame via WMI: -143
[230328.240801] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[230328.292001] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[230328.326395] ath10k_pci 0000:00:00.0: failed to send wmi nop: -143
[230328.332685] ath10k_pci 0000:00:00.0: could not request stats (type -268435456 ret -143 specifier 1)
[230328.342146] ath10k_pci 0000:00:00.0: failed to send pdev bss chan info request: -143
[230328.350194] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[230328.376392] ath10k_pci 0000:00:00.0: failed to send pdev bss chan info request: -143
[230328.394415] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[230328.402108] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer cc9e0744 vdev: 0 addr: 18:cf:5e:d1:13:07 
[230328.413371] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 739ead96 vdev: 1 addr: 32:b5:c2:d7:8d:fd 
[230328.424591] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer e904c84d vdev: 0 addr: 30:b5:c2:d7:8d:fd 
[230328.570213] ieee80211 phy0: Hardware restart was requested
[230328.577721] ath10k_pci 0000:00:00.0: failed to send pdev bss chan info request: -143
[230329.555035] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
[230329.572556] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[230329.579019] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
[230329.587291] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24984 iram: 38672'
[230329.657532] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
[230329.672768] ath10k_pci 0000:00:00.0: rts threshold -1
[230329.678670] ath10k_pci 0000:00:00.0: rts threshold -1
[230329.696602] ath10k_pci 0000:00:00.0: device successfully recovered
[276259.093538] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.101099] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.108682] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.116273] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.123838] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.131391] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.138945] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.146551] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.154111] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.161666] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.169228] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.176811] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.184372] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.191938] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.199501] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.207046] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.214598] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.222172] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.229729] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.237307] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.244862] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.252448] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.259970] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.267529] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.275077] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.282662] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.290185] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.297797] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.305366] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.312949] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.320475] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.328073] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.335633] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.343219] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 1, skipped old beacon
[276259.350746] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[276259.362823] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped new beacon
[276259.370404] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped new beacon

Archer C7 v4

[956822.785217] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956822.887619] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956822.990019] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.092418] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.194825] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.297226] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.399627] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.502008] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.604412] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.706818] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.809216] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956823.911614] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.014028] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.116420] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.218821] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.321232] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.423628] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.526030] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.628435] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.730837] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.833236] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956824.935649] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.038035] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.140436] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.242840] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.345241] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.447652] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.550079] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.652460] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.664740] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36954:95652264 36904:95652260 36954:95652246 36954:95652228, jiffies: 95652568, attempting restart restart firmware, dev-flags: 0 x142
[956825.685721] ath10k_pci 0000:00:00.0: failed to send pdev bss chan info request: -143
[956825.694877] ath10k_pci 0000:00:00.0: failed to send wmi nop: -143
[956825.701495] ath10k_pci 0000:00:00.0: could not request stats (type -268435456 ret -143 specifier 1)
[956825.719018] ath10k_pci 0000:00:00.0: failed to set beacon mode for vdev 0: -143
[956825.726777] ath10k_pci 0000:00:00.0: failed to set dtim period for vdev 0: -143
[956825.737372] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 804607f0 vdev: 0 addr: 70:4f:57:8a:75:5b 
[956825.879493] ieee80211 phy0: Hardware restart was requested
[956825.885417] ath10k_pci 0000:00:00.0: failed to set cts protection for vdev 0: -143
[956825.893374] ath10k_pci 0000:00:00.0: failed to recalculate rts/cts prot for vdev 0: -143
[956825.901894] ath10k_pci 0000:00:00.0: failed to set preamble for vdev 0: -143
[956825.909395] ath10k_pci 0000:00:00.0: failed to set mgmt tx rate -143
[956826.888460] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
[956826.905351] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[956826.911885] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
[956826.920300] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24984 iram: 38672'
[956826.985163] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
[956826.999669] ath10k_pci 0000:00:00.0: rts threshold -1
[956827.014013] ath10k_pci 0000:00:00.0: device successfully recovered
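
For comparing extracts like these across devices, a small Python sketch along the following lines could count the SWBA overruns per vdev and time each firmware restart. It only matches the message strings that appear in the logs above; the function name and return shape are my own choices, and the sample is three lines copied from the C7 v4 extract.

```python
import re
from collections import Counter

# Kernel log prefix such as "[956825.652460] " followed by the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
VDEV_RE = re.compile(r"SWBA overrun on vdev (\d+)")


def summarize(log_text):
    """Return (overrun counts per vdev, list of (restart_ts, recovered_ts))."""
    overruns = Counter()
    restarts = []
    pending = None  # timestamp of a restart that has not recovered yet
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a timestamped kernel line
        ts, msg = float(m.group(1)), m.group(2)
        vdev = VDEV_RE.search(msg)
        if vdev:
            overruns[int(vdev.group(1))] += 1
        elif "Cannot communicate with firmware" in msg:
            pending = ts
        elif "device successfully recovered" in msg and pending is not None:
            restarts.append((pending, ts))
            pending = None
    if pending is not None:
        restarts.append((pending, None))  # log ended before recovery
    return overruns, restarts


SAMPLE = """\
[956825.652460] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[956825.664740] ath10k_pci 0000:00:00.0: Cannot communicate with firmware, previous wmi cmds: 36954:95652264 36904:95652260 36954:95652246 36954:95652228, jiffies: 95652568, attempting restart restart firmware, dev-flags: 0 x142
[956827.014013] ath10k_pci 0000:00:00.0: device successfully recovered
"""

overruns, restarts = summarize(SAMPLE)
print(dict(overruns))
for start, end in restarts:
    outage = None if end is None else round(end - start, 3)
    print("restart at", start, "recovered after", outage, "s")
```

Going by the timestamps in the C7 v4 extract (956825.664740 to 956827.014013), the gap between the firmware restart and "device successfully recovered" is about 1.35 seconds, not counting the run of skipped beacons that precedes it.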

A few people (including me) have also seen this same error, "SWBA overrun on vdev 0, skipped old beacon" (along with "ieee80211 phy0: Hardware restart"), on the R7800 running the latest ath10k (non-ct) firmware v157. However, it's very rare, so it's fine for us.

It's not so rare for me. It's almost certainly something to do with the environment the routers are in (about 50 different WiFi networks 'visible' at this site - high density residential), but working out precisely what causes the problem, and what can mitigate it appears rather difficult.
It produces connectivity drop-outs, experienced when using streamed services and video/audio-conferencing, which is very irritating for the users concerned.

I had been having this problem, but after switching to the non-ct version of ath10k it went away. You might give it a try.

I have had the same error on Archer C7v4 and v5 with -ct firmware in the past. Also very rare.