Ipq806x NSS build (Netgear R7800 / TP-Link C2600 / Linksys EA8500)

I used the ath-10k driver and firmware.

If you are affected by the recent Wi-Fi issue in the latest 21.02 builds and are doing your own builds, would you like to test the patch posted here?

My R7800 is not available to test at the moment, so I can't test it myself.

I'm currently building based off ACwifidude's master branch. Would the patch work there as well?

In theory it should patch cleanly, as I think the ath10k driver has not been touched for a long while.

You may want to place the patch file in the ath10k patch folder, named so that it is applied as the last patch.
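
For anyone unsure how to make a patch "the last patch file", here is a minimal sketch. It assumes the folder uses the usual numbered `NNN-name.patch` naming and that patches are applied in file name order. The file names and the helper `name_as_last_patch` are hypothetical and only for illustration.

```python
import re

def name_as_last_patch(existing, description, step=1):
    """Return a file name that sorts after every numbered patch in `existing`."""
    prefixes = []
    for name in existing:
        match = re.match(r"(\d+)-", name)
        if match:
            prefixes.append(int(match.group(1)))
    next_prefix = (max(prefixes) if prefixes else 0) + step
    if next_prefix > 999:
        # A four-digit prefix would sort before "999-..." as a string.
        raise ValueError("no three-digit prefix left after the last patch")
    return f"{next_prefix:03d}-{description}.patch"

# Hypothetical contents of the patch folder, not the real file list.
existing_patches = [
    "080-hypothetical_first.patch",
    "921-hypothetical_second.patch",
    "960-hypothetical_third.patch",
]
new_name = name_as_last_patch(existing_patches, "ath10k_txq_schedule_fix")
print(new_name)
```

Copy the downloaded patch into the folder under the printed name, then rebuild.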

Edit: I assume that you are also affected and can reliably reproduce the issue?

@quarky
Why don't you try the latest ath-10k firmware?
One R7800 in service has at least 6 WLAN clients constantly connected to the 2.4 and 5GHz WLANs, and most of the time there are around 10 or more. I cannot see any issues with Wi-Fi. It is running an older master build from @ACwifidude.


I keep an eye on 5 other R7800 routers for any issues with Wi-Fi. Touch wood, for now all are OK. They use the default ath-10k firmware in an environment with heavily mixed WLAN clients: AN, AC and AX clients using WPA2/WPA3.

The firmware used in the openwrt tree seems to work fine for me.

I'm using ath10k because I'm affected by this bug https://github.com/greearb/ath10k-ct/issues/139 with ath10k-ct.

With ath10k I'm experiencing a complete shutdown of the 5GHz SSIDs once in a while.
I'm assuming that is the recent issue?

Edit: compiling now

If the issue is caused by the new airtime scheduler, it should have started sometime around the end of Nov 21.

Edit: the above timeline is for 21.02. For master it started at the end of Oct 21, when master's backports switched to 5.15.

That sounds about right yeah.

It's compiled and flashed now so let's see.

The patch did not shut down transmission completely? :stuck_out_tongue:

Haha.

Do let me know how it goes.

On my R7800 I always use the current master build from @ACwifidude with the ath-10k drivers. Since last week I have been running the latest ath-10k firmware too. I have up to 10 WLAN clients (WPA2/WPA3) connected to the Wi-Fi during the day.
For now I cannot see any abnormal behaviour.

@sppmaster do your 10 WLAN clients connect and disconnect frequently?

For my R7800, the issue starts to manifest itself after 3-4 days of uptime. The usage pattern is that my router's clients connect and disconnect over the course of the day as I move my devices between locations. I have multiple APs at home. When this happens, as I understand the new airtime scheduler's behaviour, it will try to insert / remove the clients' transmit queues from their RB tree data structure.

Now, the way the new scheduler is coded, when a new transmit round is initiated, the left-most node (txq) of the RB tree is the first to be selected for transmission, if it satisfies the airtime limit imposed. Otherwise the next node to its right is selected, and so on until the round is done.
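
To make the selection order above concrete, here is a minimal sketch of my reading of it. This is a toy model, not the mac80211 code: a sorted list stands in for the RB tree, and the fields `vtime` (the sort key) and `pending_airtime` (what is checked against the limit) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Txq:
    name: str
    vtime: int            # hypothetical tree key: lower values sit further left
    pending_airtime: int  # hypothetical value compared against the airtime limit

def pick_next_txq(txqs, airtime_limit):
    """Walk the queues left to right and return the first one within the limit."""
    for txq in sorted(txqs, key=lambda q: q.vtime):
        if txq.pending_airtime <= airtime_limit:
            return txq
    return None  # no queue is eligible in this round

txqs = [
    Txq("phone", vtime=10, pending_airtime=900),
    Txq("laptop", vtime=25, pending_airtime=100),
    Txq("tv", vtime=40, pending_airtime=50),
]
picked = pick_next_txq(txqs, airtime_limit=500)
print(picked.name)
```

Here the left-most queue ("phone") is over the limit, so the walk moves one node to the right and selects "laptop".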

So what I found is that the ath10k driver seems to be doing something funny (hence my suggested patch): it does not schedule the txq that mac80211 asks it to schedule, but instead finds another txq to schedule. This in itself seems to be a bug to me. In addition, the driver syncs with the firmware on transmit txq accounting. I guess it probably didn't manifest itself with the old round-robin scheduler, as every txq would get its fair share of transmit time eventually. With the new scheduler, since it always starts from the left-most node of the RB tree, starvation of transmit time may occur, hence the high latency seen by myself and others affected.
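
A toy simulation of why the two scheduling policies could behave differently. It assumes, purely for illustration, that one queue (called "stuck" here) never has its airtime charged because of some accounting mismatch. That is a hypothetical scenario and not a confirmed description of what ath10k does. All names are made up.

```python
def simulate(policy, rounds=12):
    """Count how often each queue is served under the given policy."""
    # Airtime charged so far. "stuck" is never charged (hypothetical mismatch).
    airtime = {"stuck": 0, "a": 0, "b": 0}
    served = {name: 0 for name in airtime}
    order = list(airtime)
    for i in range(rounds):
        if policy == "round_robin":
            pick = order[i % len(order)]  # rotate through all queues
        else:  # "leftmost_first"
            pick = min(order, key=lambda n: airtime[n])  # least airtime first
        served[pick] += 1
        if pick != "stuck":
            airtime[pick] += 1
    return served

round_robin_result = simulate("round_robin")
leftmost_result = simulate("leftmost_first")
print(round_robin_result)
print(leftmost_result)
```

With round-robin every queue still gets a turn, so the mismatch stays hidden. With left-most-first, the queue whose airtime never grows stays at the front and the others are never served.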

I still do not understand why some folks like yourself have not encountered such issues, though.

Okay, so this morning I turned off airplane mode on my phone. After connecting, Wi-Fi initially worked, and then after about 10 minutes it ground to a halt. Reconnecting didn't fix anything either. So I ran /etc/init.d/network restart on the router and everything is working again. I'll keep an eye on this.

Hmm ... I'll have to dig deeper into the issue then, but I'm quite confident that the patch I proposed should make things better, as the original code does not make much sense to me.

Did you notice anything different this time round?

Yes, usually my 5GHz Wi-Fi dies completely, which didn't happen this time around, so I'm hoping this was just a one-time occurrence and that the patch actually does help. Fingers crossed :crossed_fingers:

Actually most of the time I have 6-7 clients connected and they only disconnect/connect when leaving/coming home (at least several times a day) or are turned off/on. I have only one AP (R7800 in router mode) with two different SSIDs for 2.4 and 5GHz.
My phone switches very frequently between 2.4 and 5GHz because when I move to a distant room the 2.4GHz signal is better and it automatically switches from 5 to 2.4GHz (with different SSIDs) and vice versa.
Other clients are constantly connected to 2.4 or 5GHz.
For almost a year I was running a WDS connection via 2.4GHz to another OpenWRT client router that supported another VLAN. Never seen problems there either.
And I have one guest WLAN set up on 2.4GHz that I shut down when it's not used.
I use mixed WPA2/WPA3 encryption.
There may be rarer conditions in your Wi-Fi setup that I don't have.

In the last hour I've tried to put my Wi-Fi through its paces. I ran multiple speed tests on six devices, played YouTube videos, downloaded lots of updates from the Play Store, and browsed web pages with Chrome.
All of this was accompanied by over a hundred Wi-Fi disconnects/connects from all six devices (which I initiated on purpose by turning the devices' Wi-Fi on and off all the time). I couldn't see any delay, at least with this setup.

Are there tools like Wireshark that can be used for troubleshooting similar network issues?

Wireshark or tcpdump is useful only if you can get packets into / out of the network interface reliably. If the packets cannot get out, a capture probably won't show anything useful.

From my experience, if your client is constantly busy transmitting, it will probably be less affected, as its txq will likely always be scheduled for transmission. IIRC from the forums, it seems to affect iOS devices more, and I have many of those connected to my APs. I guess it's because iOS devices are more aggressive in power save mode, which makes the scheduler drop the txq from its data structure. It also affected my Dell work notebook though, so it's still kind of a mystery why it is not affecting everyone.

Before generating a lot of traffic, I tried multiple disconnects/connects from several devices while the others were just idling.
I don't have iOS devices though.
Can tcpdump be used to troubleshoot the issue with poor WLAN performance that I wrote about some time ago?

I wonder what triggers your issue. I'm still on the non-NSS build after I switched to it.
I can't say I've experienced any issues, other than that the wireless interfaces got disabled at one point.
I also switched to the ath10k driver and firmware at the same time, and I have to say Wi-Fi is much more stable now.

OpenWrt SNAPSHOT, r18942-cbfce92367
11:17:53 up 76 days, 14:19

For reference, I have 3 5GHz SSIDs and 2 2.4GHz SSIDs, for example. There are also iOS devices on the network.