@dtaht
It seems that sjpacket was still using the old mainline QCA9984 firmware v131 (several years old) that is currently available in the 22.03 branch. I had all kinds of weird problems with that old firmware v131, including total loss of WIFI connectivity overnight. The new board-2.bin and mainline firmware v157 have gotten rid of all such problems for me.
@sjpacket: please upgrade to the new board-2.bin and firmware v157.
Or you can just use a recent master snapshot since the master branch has been recently updated with the latest board-2.bin and QCA9984 v157 firmware.
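For anyone who wants to stay on their current image rather than flashing a snapshot, the swap can be done via opkg. This is only a sketch: the exact package name is an assumption for QCA9984 boards like the R7800, so check what `opkg list ath10k-firmware-*` actually offers on your build before running it.

```shell
# Hypothetical sketch: pull the newer QCA9984 firmware + board-2.bin via opkg.
# Package name is an assumption; verify with `opkg list ath10k-firmware-*`.
opkg update
opkg install --force-reinstall ath10k-firmware-qca9984
# Reload the radios so the new firmware/board files are picked up
# (a full reboot also works and is the safer option).
wifi reload
```

Note this only helps if the repo your image points at already carries the updated linux-firmware package; on the stock 22.03 feeds it will just reinstall the old v131 files.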
Can someone with some authority within the OpenWrt developer community help convince the OpenWrt developers to commit the latest linux-firmware package to the 22.03 branch?
Also, committing the recent ATF/RRS + multicast latency fixes to the 21.02 branch would give the next 21.02.4 release nice WIFI again. In the 21.02 branch, only the 21.02.1 release has decent WIFI, because 21.02.2 and 21.02.3 carried the troublesome virtual time-based airtime scheduler commit.
I'm not a hundred percent sure we are out of the woods yet. A couple of days ago I had to roll back my 21.02-rc5 + multicast patches because my partner was constantly complaining that she couldn't open SSH tunnels to connect to the Citrix farm she works on.
After rolling back to 21.02.1 everything is fine again. I've been thinking about how to test what's going on: I tried rebooting (just in case it was uptime degradation), removing DSCP rules, disabling SQM, disabling the firewall, and nothing helped. However, flent tests were showing good latency and nothing weird.
I use multiple different VPN/tunnelling methods (Wireguard, OpenVPN, Strongswan/IPsec, SSLVPN, Websocket/WebTransport tunnelling, Reverse SSH tunnelling) frequently for connectivity to different network environments and I don't have any problem with them when connecting over WIFI. I'm using R7800 master snapshot with NSS acceleration and the latest mainline QCA9984 FW v157.
Yes, @vochong, we did that to rule out a problem with the tunnels. Rolling back is the only thing that worked. To note, we are using the mt76 driver in our APs. I'll try a few more things this weekend to see if I can pin it on the driver or something else.
If this frushinluggin thing is finally stable I have several patches intended to demonstrate less latency... queued up.
You are also showing 100 ms of rrul_be latency, which I hope is coming from your client? (Yes, I'm happy it's stable, but...)
Quick test:
T=mt76 # and other test params
for i in 1 2 4 8 16
do
    flent -l 30 --socket-stats -x --step-size=.05 --te=upload_streams=$i -t $T-$i tcp_nup
    flent -l 30 --socket-stats -x --step-size=.05 --te=download_streams=$i -t $T-$i tcp_ndown
done
A packet capture at the server for the single-flow tests would be good.
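One way to grab that capture on the server side is a sketch like the following. The interface name, client address, and output filename are all placeholders, not anything from this thread; adjust them for your setup.

```shell
# Hypothetical sketch: capture the single-flow flent runs at the server.
# IFACE and CLIENT are assumptions for your network.
IFACE=eth0
CLIENT=192.168.1.100
# -s 128 truncates each packet to 128 bytes (headers only), keeping the file small.
tcpdump -i "$IFACE" -s 128 -w flent-singleflow.pcap host "$CLIENT" &
CAP=$!
# ... run the 1-stream tcp_nup / tcp_ndown tests from the client here ...
kill "$CAP"
```

Capturing at the server rather than the client avoids perturbing the station under test, and header-only snaplen is usually enough to inspect ECN marks, retransmits, and RTT from the TCP timestamps.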
Or we can all declare victory, clink glasses, and take August off if @sjpacket checks in positive. Anyone going to Battlemesh in September? (It's in Rome.) I'm buying the first round....
It goes up to 80-100 ms during bi-directional load. ECN was on in the client's configuration. I'll perform a new test with it on in the server/router and run the 5 rounds you requested too.
I am not sure yet, but initial testing looks promising! I'm hoping to test this week with more macOS clients, around 30 of them. I only had 4 clients available to test for the past few days.
You can find the packet capture in the shared folder with all the data from the tests. Below you can see the console output, just in case it is somehow useful.
Command:
T="v22.03-rc6 mt76 WLAN ECN-on loc servs off"
for i in 1 2 4 8 16
do
    flent -l 30 --socket-stats -x --step-size=.05 --te=upload_streams=$i -H openwrt.lan -t tcp_nup-$i-threads-$T tcp_nup
    flent -l 30 --socket-stats -x --step-size=.05 --te=download_streams=$i -H openwrt.lan -t tcp_ndown-$i-threads-$T tcp_ndown
done
But the down-vs-up throughput difference (about 470 vs. about 600) could be any of a number of things: inefficient txop packing, beamforming, retries, signal strength (RSSI), minstrel bandwidth probes, but probably not the phase of the moon.
This (2016) set of benchmarks was between an OS X and a Linux station vs. the AP -> server, and in general the scheduler should force lower latency than a single-station test.
Honestly, I don't know what it can be. It's been like this forever. I'm starting to think it has to do with the APs. If I connect my laptop using a USB dongle to another AP, which connects to the main AP as a client, I can see the same difference between up and down. It's like the AP favors downloading to itself vs. uploading to a client. Here are some more configuration params; hope it helps.
Are the wifi lag/stall issues for R7800 fixed in 22.03.0-rc6? What about porting the fixes to 21.02.x? I lost track where we are, lots of discussion in this thread. Thanks for catching me up!
Yes, no more lag/stall issue for R7800 in 22.03.0-rc6.
Now give yourself the challenge to persuade @nbd to commit the same fixes to 21.02.x so the next 21.02.4 release will be golden again.
That would also be a great consolation prize for people running an NSS image with PPPoE WAN. PPPoE/NSS is completely broken in 22.03 and master (both on kernel 5.10 for the R7800), so the only way to get NSS acceleration for PPPoE WAN is to revert back to 21.02 (kernel 5.4). But there you are with laggy WIFI issues. It's like being a little kitten trapped from both ends by two raccoon comrades-in-crime.