I also had the issue of both the 2.4GHz and 5GHz wireless interfaces being unexpectedly disabled at random when running the previous 22.03 NSS ath10k-ct image on the R7800. Since then, I have switched to the non-ct ath10k driver/firmware, but it will take some time for me to see whether the condition happens again.
I have also noticed that the non-ct firmware/driver has better range and higher throughput as well. Definitely a big improvement for me. I just need to wait and monitor its reliability over time.
I use tcpdump to analyse ingress or egress traffic routing issues. AFAIK it’s not suitable for analysing performance issues.
Still a mystery for me at this point in time tho. I’m hoping what I found recently is the root cause.
I have issues with Wi-Fi roaming with 22.03 and 21.02 builds for R7800. When switching to the nearest AP device, the internet on my iPad (VPN on) disconnects. This also happens when switching between 5 and 2.4GHz networks under the same ESSID. To get internet, I need to manually reconnect VPN network. This does not happen with hnyman's builds with the same settings. I'm using a mainline ath10k driver.
What’s the behavior with the VPN turned off on this NSS-capable build?
I also have multiple AP’s using this build, roaming works fine. But I don’t use VPN on my WiFi clients.
Everything works fine without active VPN connections. The type of VPN protocol (WireGuard, IKE2 or OpenVPN) does not matter - 100% reproducible from the first try to switch to another AP. I had to rename all ESSIDs, which is a big inconvenience.
I’m not an expert, but you would think that this might be caused by offloading to the NSS cores. But then I wonder if you would have the same behavior when using original OEM image from the hardware vendor?
What VPN provider are you using? I use one that has a client that will automatically reconnect if the connection is dropped for any reason other than me purposely disconnecting.
If that OEM image from a vendor would not let VPN’s disconnect when roaming then I’d guess the cause for disconnect-vpn-on-roaming could be related to the way NSS offloading is implemented in OpenWRT. But I’m no expert on this matter.
- Are there developers/contributors that have experience with VPN active on client devices with this or any NSS enabled OpenWRT build?
- Are there experiences from others that use VPN on client devices, that roam to another AP but all is on OEM-builds? (I think this particular use case is hard to find on this forum)
- Are there developers/contributors that have an understanding of how VPN traffic in particular could be forced to reconnect when roaming to a different AP, even from one band to another on the same device when NSS cores are in the chain?
Hi everyone, I'm tired of random Wi-Fi loss or router crashes (R7800) at least once per week.
I decided to upgrade to R7800-20220508-Stable2203NSS-ath10k-sysupgrade.bin today, thinking some improvements might make it better, but I notice NSS does not seem to be enabled: on speed tests I only get around 400 Mbps with the CPU at 100%. On my last installation (21.03) I got around 900 Mbps with no CPU consumption, but of course the router was not stable.
Do I need to enable something on this new version?
This issue happens with different VPN providers. I performed reset to defaults and it did not help. I do not have such problems with other non NSS builds, but I do need hardware offloading (500 Mbit network).
Okay, so I did some further testing, and while I didn't have a complete "shutdown" of my 5GHz Wi-Fi SSIDs, I experienced some regular slowdowns. To make sure that this wasn't something else, I compiled two firmwares that are identical except that one has your patch and one doesn't. Then I used both for a while.
So far without your patch I don't have these slowdowns.
And without total shutdown of the 5GHz interface!
Did you actually mean "So far WITH quarky's patches, I don't have these slowdowns." ?
I am trying to run qsdk11 on kernel 5.15. My machine is Netgear R7800 (ipq806x).
I downloaded all the patches from the repo https://github.com/bitthief/openwrt/tree/ipq807x-5.15 for ipq807x and added the missing ones for ipq806x. Everything builds. The router starts up, but communication on the switch does not work. I need a specialist to help me patch the qca-nss-gmac driver. The other drivers probably work, because I have access to the router over Wi-Fi and I didn't see anything special in the logs.
Kernel logs https://paste.in/xpJOTT
̶M̶y̶ ̶s̶o̶u̶r̶c̶e̶s̶ ̶h̶e̶r̶e̶ ̶>̶ ̶h̶t̶t̶p̶s̶:̶/̶/̶g̶i̶t̶h̶u̶b̶.̶c̶o̶m̶/̶S̶q̶T̶E̶R̶-̶P̶L̶/̶o̶p̶e̶n̶w̶r̶t̶/̶t̶r̶e̶e̶/̶o̶p̶e̶n̶w̶r̶t̶_̶t̶e̶s̶t̶ ̶<̶
I've put the project on hold.
No these are two separate issues:
- The original issue. 5GHz Wi-Fi shuts down completely, meaning clients lose their connection and can't find the network anymore. In LuCI the interface is shown as inactive. This is what quarky's patch is supposed to solve. Mostly happens after several days of usage.
- The issue apparently introduced by quarky's patch. Wifi slows down, meaning pages barely load, videos stall or buffer a lot. This can happen after a few hours of usage and occurs much more frequently.
The plot thickens!
Quarky's approach (Keep AQL / Change virtual time-based scheduler to round-robin)
Con: WIFI slowdown (YES according to th3voic3, but NO according to quarky)
Pro: No random WIFI interface shutdown
Kong's approach (Remove AQL altogether)
Con: None (according to Kong. He's the only one using his special build for now)
Pro: Total reliability for many days
Put all the blame on Google:
To be honest, I'm kind of confused what has been tested at this point though.
To keep the record straight, what I found was that:
1. The switch over to the new virtual time-based airtime scheduler at the end of 2021 has affected my ipq806x routers (R7800 and Askey RT4230W). Both routers exhibit high latency after a period of use, usually after 3-4 days of uptime.
2. Due to this high latency issue, instead of using the new virtual time-based airtime scheduler, I reverted my builds (as explained in this post) to the previous round-robin airtime scheduler. This completely resolved the high latency issue that I was facing. If you want to test whether the new virtual time-based scheduler is causing Wi-Fi problems for you, you would need to build a clean pull of openwrt-21.02 and remove the two patch files as suggested in my post.
3. More recently, I came to suspect that the mainline ath10k driver used by OpenWrt may be the root cause of the high latency issue when using the new virtual time-based airtime scheduler. So I proposed a patch, as explained in this post. To test this patch, you would need to build a clean pull of openwrt-21.02, apply the suggested patch to it, and see whether the Wi-Fi issues you may encounter with recent OpenWrt builds are resolved. I have not tested this theory myself, as both of my ipq806x routers are on production duty and not available for testing. That's why I'm asking for volunteers in the forum to help with the test.
Hope the above clears the air on the issues and suggested solutions that I'm facing.
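For anyone unfamiliar with the two scheduler styles mentioned above, here is a toy Python sketch of how a round-robin (deficit-based) scheduler and a virtual time-based scheduler can each pick the next station to transmit. This is only an illustration and not the mac80211 or ath10k code: the station names, per-transmission airtime costs, the `quantum` value and the function names are all made up for the example, and it does not attempt to reproduce the latency problem discussed in this thread.

```python
from collections import deque
import heapq


def deficit_round_robin(airtime_cost, transmissions, quantum=1000):
    """Toy deficit round-robin. Returns airtime used per station (microseconds)."""
    deficit = {sta: 0 for sta in airtime_cost}
    used = {sta: 0 for sta in airtime_cost}
    queue = deque(sorted(airtime_cost))
    sent = 0
    while sent < transmissions:
        sta = queue[0]
        if deficit[sta] <= 0:
            # Out of budget: top up the deficit and move the station to the back.
            deficit[sta] += quantum
            queue.rotate(-1)
            continue
        # Station at the head still has budget: let it transmit once.
        deficit[sta] -= airtime_cost[sta]
        used[sta] += airtime_cost[sta]
        sent += 1
    return used


def virtual_time(airtime_cost, transmissions, weights=None):
    """Toy virtual-time scheduler. Returns airtime used per station (microseconds)."""
    weights = weights or {sta: 1.0 for sta in airtime_cost}
    used = {sta: 0 for sta in airtime_cost}
    # Each heap entry is (virtual time, station); the smallest virtual time goes first.
    heap = [(0.0, sta) for sta in sorted(airtime_cost)]
    heapq.heapify(heap)
    for _ in range(transmissions):
        vtime, sta = heapq.heappop(heap)
        used[sta] += airtime_cost[sta]
        # Advance the station's virtual time by the airtime it used, scaled by weight.
        heapq.heappush(heap, (vtime + airtime_cost[sta] / weights[sta], sta))
    return used


if __name__ == "__main__":
    # Hypothetical stations: "slow" needs 4x the airtime of "fast" per transmission.
    costs = {"fast": 250, "slow": 1000}
    print("round-robin :", deficit_round_robin(costs, 1000))
    print("virtual time:", virtual_time(costs, 1000))
```

In this simplified model both approaches aim to share airtime evenly between a fast and a slow station; they differ in bookkeeping (a per-station deficit and a rotating queue versus a per-station virtual clock and a priority queue).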
@th3voic3 Thank you for taking the time to test out my suggested patch. It looks like the root cause may not be what I had imagined. Back to the drawing board then. I'm quite certain though that my suggested patch in 3. should be beneficial, as the original code doesn't make sense.
No problem. Keep in mind though that all my testing was done with the most recent master not 21.02. As you said it shouldn't make a difference, but yeah.
Rebased the master build yesterday. Includes some changes for the rt4230w that should properly enable NSS hardware offloading.
I see that we now have ath10k firmware ver 10.4-3.9.0.2-00156 in the master build.
Sounds good. Let's see what the people with Wi-Fi issues will report hereafter.
With this firmware version 00156 I can confirm once again that I have considerable (20-30%) Wi-Fi performance increase on my Laptop that is in a distant room as I have written in a previous post.