AQL and the ath10k is *lovely*

You can download the latest 21.02-SNAPSHOT image from https://asu.aparcar.org/ or https://downloads.openwrt.org/releases/21.02-SNAPSHOT/targets/ipq806x/generic/ (ipq806x, for example). The 21.02-SNAPSHOT build contains the latest commits from the 21.02 git branch, so it is a bit more stable than the git "master" SNAPSHOT build.

OK, good to know.

I guess I'm basically wondering about the differences I see between the shortlog lists for "Heads" and the corresponding "Tags". In other words, under "Tags" for, say, 19.07.8, I see all the commits up to the release date. Under "Heads" for the same release, I see commits that were submitted past the release date.

Presumably, those are later commits that could apply to that release (seeing different sets for different versions), and maybe one could build a release with them in it. I don't know about that.

This is all about educating myself on what's up to what, which versions have which level of a particular feature's development, etc.

And, I used to run the latest nightly snapshots, usually with great results, till they occasionally didn't... and there was great gnashing of teeth in the happy home... Then, there's the added package incompatibility if you want to add something later.

So back to my question about how much of the latest stuff is in 21.02.0: I am opening windows on different versions' shortlogs and comparing... one way to do it, I guess. Between 19.07.8, 21.02.0 and master, we have 3 sets of commits. ka2017's commits are all from earlier last year, and those are in 21.02.0, except for that last one, but I'm wondering about castie's statement. I see it in master, but not 21.02.0. Please show me if I'm wrong...

Lastly, I believe I've read that the virtual time developments are just coming out, and some or all may not be in the current release version. And maybe a bugfix in the most recent hasn't migrated to master yet, as well? :wink: It would be nice if there were a clearer "status of the project", but I also don't want to joggle the elbows of the developers who devote their time to working this stuff out...

Oh! So, is that everything in that branch, basically what I was asking for? Doh! Difference between "snapshot" and "master"? If so, sorry for subjecting everyone to the verbosity...

Getting back to the topic at hand: I loaded the 21.02.0 snapshot on my C7 AP, and am seeing latency improvements on ath10k over what I had been seeing on 19.07.7-19.07.8.

The commit "mac80211: merge the virtual time based airtime scheduler" has been added to the git "openwrt-21.02" branch, which means it will be in the next 21.02.2 release.

"master" branch: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=a5888ad6b33840d913438ce664c0e7da7e7f53e6

"openwrt-21.02" branch: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=68886f301850c6c35bdff002a719fed1ded584c9

I had to shut down my lab a while back, so I'm not in a position to test. I keep hoping to find someone to repeat the tests this thread started with, not just on this, but on the mt76.

[slightly scared posting here, but it was the start of this thread that inspired me in the first place so hopefully that's OK, and am also a big fan of SQM which transformed our lockdown life, so anything I can do to contribute...]

Not quite the same tests you wanted @dtaht, and not from Head, but I have been testing OpenWRT 21.02.1 vs the stock firmware on a Unifi 6 Lite (mt76).

Setup is a linux box running variations on flent rrul_be, wired to an OpenWRT router which in turn is wired to the AP. Ran netserver on an M1 MacBook Air to act as the target, so all LAN for the test.

I also tried some (not very scientific) additional loading of the AP by having a few other devices connected to it streaming videos while the tests were running.

BEFORE - Unifi stock firmware (current at time of writing).
unifi_6_lite_stock_firmware

AFTER - OpenWRT 21.02.1
unifi_6_lite_openwrt_21.02.1

I was amazed by the difference to be honest - so I then spent a happy day or two reflashing all the other APs in the house.

Happy to run others if the test setup is adequate, and/or to make adjustments to tests. Could even reflash one of the APs back to Unifi stock if that's a useful comparison.

Very nice validation of @nbd 's work on the mt76!

The latest OSX has a rrul-like /usr/bin/networkQuality test. I'd be very interested in the output from that.

Thanks @dtaht! I've been using networkQuality on and off but haven't kept thorough records. The stock Unifi firmware on my APs (mix of nanoHD and 6 Lite) was showing anything from 2000-3000 RPM to the default Apple test server, and I'm seeing similar results after flashing to OpenWRT but:

  1. networkQuality normally tests to an external server of course (where my previous tests were LAN only), so I guess my router in the middle and the usual vagaries of the internet will play a part as well; I've therefore tried something to focus on the wireless leg of the journey (see below)

  2. the RPM metric seems to vary a fair bit at the top end, so without collecting many more samples I can't be certain of a difference and I don't think there's a command-line switch to gather data for longer. (I guess I should write a script and store a collection of test runs.)
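Something like the following is what I have in mind for that script — entirely a sketch of my own, so all the names here (the CSV file, the variables) are made up; only the `networkQuality -v` output format is taken from the real runs:

```shell
#!/bin/sh
# Sketch of the logging script mentioned above (names are my own).
# Runs networkQuality a number of times and appends "timestamp,RPM" to a
# CSV so the spread at the top end can be judged over many samples.

RUNS=${RUNS:-10}
LOG=${LOG:-nq_samples.csv}

# Extract the number from a "Responsiveness: High (3254 RPM)" line.
parse_rpm() {
    sed -n 's/.*Responsiveness:.*(\([0-9][0-9]*\) RPM).*/\1/p'
}

if command -v networkQuality >/dev/null 2>&1; then
    i=1
    while [ "$i" -le "$RUNS" ]; do
        rpm=$(networkQuality -v | parse_rpm)
        printf '%s,%s\n' "$(date +%s)" "$rpm" >>"$LOG"
        i=$((i + 1))
    done
else
    echo "networkQuality not found (it ships with macOS 12+)" >&2
fi
```

With enough rows in the CSV, the variance between runs should be much easier to judge than from three ad-hoc samples.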

For instance, three runs in quick succession to the Apple servers gave RPMs of 3254, 2546 and 2742:

% networkQuality -v                                          
==== SUMMARY ====                                                                                         
Upload capacity: 10.514 Mbps
Download capacity: 53.944 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (3254 RPM)
Base RTT: 11
Start: 19/12/2021, 15:18:48
End: 19/12/2021, 15:18:58
OS Version: Version 12.1 (Build 21C52)

% networkQuality -v
==== SUMMARY ====                                                                                         
Upload capacity: 10.679 Mbps
Download capacity: 54.232 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (2546 RPM)
Base RTT: 11
Start: 19/12/2021, 15:19:00
End: 19/12/2021, 15:19:10

% networkQuality -v
==== SUMMARY ====                                                                                         
Upload capacity: 10.192 Mbps
Download capacity: 54.281 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (2742 RPM)
Base RTT: 8
Start: 19/12/2021, 15:19:12
End: 19/12/2021, 15:19:22

To focus more on LAN-side wifi performance, I used the networkQuality server which the Apple folk have conveniently open-sourced, to run a LAN only test, though again I'm seeing high variability.

Here, for instance, is a run of four tests in succession to a local wired machine (note the higher throughput), which gave RPMs of 2990, 2455, 1365 and 2064.

% networkQuality -C https://my.local.server:443/config -v
==== SUMMARY ====                                                                                         
Upload capacity: 341.823 Mbps
Download capacity: 440.442 Mbps
Upload flows: 12
Download flows: 20
Responsiveness: High (2990 RPM)
Base RTT: 3
Start: 19/12/2021, 15:15:24
End: 19/12/2021, 15:15:39
OS Version: Version 12.1 (Build 21C52)

% networkQuality -C https://my.local.server:443/config -v
==== SUMMARY ====                                                                                         
Upload capacity: 342.780 Mbps
Download capacity: 257.074 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (2455 RPM)
Base RTT: 11
Start: 19/12/2021, 15:15:44
End: 19/12/2021, 15:15:54
OS Version: Version 12.1 (Build 21C52)

% networkQuality -C https://my.local.server:443/config -v
==== SUMMARY ====                                                                                         
Upload capacity: 371.912 Mbps
Download capacity: 345.958 Mbps
Upload flows: 12
Download flows: 20
Responsiveness: High (1365 RPM)
Base RTT: 11
Start: 19/12/2021, 15:15:59
End: 19/12/2021, 15:16:15
OS Version: Version 12.1 (Build 21C52)

% networkQuality -C https://my.local.server:443/config -v
==== SUMMARY ====                                                                                         
Upload capacity: 378.863 Mbps
Download capacity: 210.255 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (2064 RPM)
Base RTT: 3
Start: 19/12/2021, 15:17:55
End: 19/12/2021, 15:18:05
OS Version: Version 12.1 (Build 21C52)

I ran the go version of networkQualityd on a wired ethernet my.local.server, and then networkQuality on the same Mac laptop (M1 Air) used for the previous tests. Invocation on the server was:

sudo ./networkqualityd --cert-file /etc/letsencrypt/live/my.local.server/fullchain.pem --key-file /etc/letsencrypt/live/my.local.server/privkey.pem --public-name my.local.server -domain my.local.server -listen-addr 192.168.1.2 -base-port 443

Happy to do tests comparing stock Unifi f/w against OpenWRT on identical APs, but wanted to share this approach first to check the methodology is sound.

Well... we are laboring forward towards methodologies that work. For starters, the apple folk are attempting to rigorously define and refine the methodology for the IETF over here: https://github.com/network-quality/draft-cpaasch-ippm-responsiveness and I'd recommend giving them direct feedback.

Happy with the wifi test. Would love to see any difference on the openwrt one. The big thing worth varying, to me, is to test closer to the edge of your coverage range - 50' away, or through a wall or two. There, latencies should still stay flat ( https://blog.cerowrt.org/flent/airtime-c2/latency_flat_at_all_rates_cdf.svg ) and go completely to hell with ubnt. But they might also go bad on the m1 side, and testing just uploads or just downloads separately helps pinpoint things.

Is sqm on, on your to-apple test?

EDIT - I completely misread the test: it was wired -> wifi -> m1, but I thought it was all wired, and thus wrote the following:

Your results for the local wired rpm test are puzzling.

  1. One test reported a 3ms baseline rtt, the others, 11. On wifi I can see an error like that, but wired??
  2. One test had 20 download flows, the others, 12.
  3. A wired test should be able to achieve, oh, about 880Mbit simultaneously in both directions, unless it has run out of cpu (for crypto), or hit some other limit in the stack on either side, including being unable to context switch fast enough... or a limit in go itself. Re-running rrul_be, wired, might help separate out the crypto cost; monitoring cpu would too (flent can gather cpu_stats, btw, at least on linux).
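The kind of flent invocation I mean — a sketch only, where "netserver.example" is a placeholder for the machine running netserver, and `cpu_stats_hosts` is the flent test parameter for gathering CPU stats from Linux endpoints, as I understand it:

```shell
#!/bin/sh
# Sketch: re-run rrul_be wired while gathering CPU stats from both ends.
# Build the command line first so it can be logged/inspected.
FLENT_CMD="flent rrul_be -l 60 -H netserver.example \
  --test-parameter cpu_stats_hosts=localhost,netserver.example \
  -t wired-baseline"

if command -v flent >/dev/null 2>&1; then
    $FLENT_CMD
else
    echo "flent not installed; see flent.org" >&2
fi
```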

"my.local.server" is a?

EDIT: oh, that was a m1 -> ubnt wifi -> local-server test. :faceplant:

Can ya put in an ethernet on the m1? The comments above and below were also based on me thinking you were going wired to wired, and I have to go think about the shape of the numbers over wifi instead; it would be nice to have an ethernet baseline to reason from.

  1. Any way that goes down, responsiveness, in my mind, should go up. Way up.

Theoretically, if we didn't have all the buffering we do, transit of a single gbit packet takes 13us. 12 flows, well, call it 160us*2 (with acks), so a theoretically achievable revolutions per minute would be: (leaving blank because I fear I've dropped a decimal point)
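For what it's worth, here is my own attempt at that arithmetic, under exactly the assumptions stated above (~13us per full-size packet at 1 Gbit, 12 flows ≈ 160us one way, doubled for the ack path ≈ 320us per revolution) — so treat it with the same suspicion about dropped decimal points:

```shell
# 12 flows * ~13us per 1500-byte packet at 1 Gbit/s ~= 160us one way;
# double it for the return/ack path -> ~320us per "revolution".
# Revolutions per minute = 60 s / 320 us:
awk 'BEGIN { per_round_us = 320; printf "%.0f\n", 60 * 1e6 / per_round_us }'  # -> 187500
```

So the theoretical ceiling under those assumptions would be on the order of 187,500 RPM, i.e. roughly two orders of magnitude above the ~2000-3000 RPM figures reported above.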

A packet capture of the to-apple test with ecn on and off would be interesting. (tests sqm)

same question for m1 -> wireless openwrt -> local server

to enable ecn on osx: sudo sysctl -w net.inet.tcp.disable_tcp_heuristics=1
on linux it's:

sudo sysctl -w net.ipv4.tcp_ecn=1

I use ecn primarily as a debugging tool. "Did that drop in throughput come from the aqm?"

thx so much for whatever you can coax yourself into doing for the sake of the internet. Over here there's someone also going to town testing mikrotik's new release, where I had a lot of teaching (and being taught!) moments:

https://forum.mikrotik.com/viewtopic.php?t=179307

The latest openwrt is out. I am wondering if y'all could upgrade and retest. Pretty sure nothing broke...

I also have a new optimization after that that will reduce bandwidth slightly but improve latency by a lot more. I think.

And this causes issues on R7800 for some of us :frowning:

If I go back to 21.02.1, no wifi problems - R7800 uptime 38 days. I think this patch needs to be reverted until we figure out what is going on. FWIW, my R7800 is on the mainline ath10k firmware and driver, not on CT.

Thx for the heads up on this one. I don't have this hardware. Which of the -ct firmwares is this on?

It is not on CT; it is on the mainline ath10k firmware and driver. It may have to do with devices leaving the network, e.g. a phone leaving the house. It feels like the AP is then stuck, perhaps still trying to send to that device, but after a few minutes the device is dropped from the AP and the rest of the devices seem to recover.

Have we ruled out that this WiFi stall-issue only occurs on ath10k (old, mainline) driver and not on ath10k-ct drivers?

I do wonder if a simple fix would be to bypass AQL for devices with very low signal... I mean, the problem really seems to be when a device disconnects because it's too far from the router. In that specific case the algo tries to optimize everything and fails, as the device never actually responds back with the correct stats... @quarky, any hint about this?

I would like to do some testing; is it possible to reduce the CoDel target to 10 ms (currently it is 20 ms)? I know this might cause unnecessary drops, but I'm running a base rate of 24 Mbps without issues, and this is for testing; I don't mind reduced throughput at this stage.

Can it be done by changing parameters through debugfs?
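For reference, here's what I can find so far — to my knowledge the mac80211 debugfs interface exposes the AQL airtime limits, but not the CoDel target itself. The paths below are from my own reading and may differ by kernel version or phy name:

```shell
#!/bin/sh
# Hedged sketch: inspect (and possibly adjust) mac80211 AQL limits via
# debugfs. As far as I can tell the CoDel target is NOT exposed here;
# only the AQL airtime limits are. Paths may vary by kernel/phy.
PHY=/sys/kernel/debug/ieee80211/phy0

if [ -r "$PHY/aql_txq_limit" ]; then
    cat "$PHY/aql_txq_limit"   # per-AC low/high airtime limits, in us
    # Write format is "<ac> <low> <high>"; e.g. to tighten AC_BE (ac=2):
    # echo "2 3000 10000" > "$PHY/aql_txq_limit"
else
    echo "mac80211 debugfs not available on this machine" >&2
fi
```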

Sorry to bring this back to life. Was this change made at compile time? Or, is it exposed to be configured?

Well, my test with my E8450 didn't reveal anything in particular, though. I do notice that when my device is connected to the 2.4GHz interface, it tends to lag more than on the 5GHz interface. Maybe we should have different AQL limits for different-speed interfaces?

When I'm testing on my R7800, the signal strength is strong and I'm testing on the 5GHz interface only, and after a few days uptime, with client connecting and disconnecting, I hit extreme lag, and only a restart of WiFi or router will fix it.

So it looks like a combination of AQL and the new virtual time-based scheduler, and it looks like it's particularly bad with the ath10k driver. Switching back to the round-robin scheduler stopped all complaints about Wi-Fi connectivity for the R7800 (21 days uptime and counting).

So at the moment it is still a mystery as far as the root cause is concerned - well, at least for me.

Are we sure the different algo is the only change made to the code?