AQL and the ath10k is *lovely*

I'll give it a shot when I have time.

This is probably naive and may be the wrong way to do this, but I simply ran a capture in Wireshark and added the following display filter to see how many retries I could spot: wlan.fc.retry == 1

But just to be clear, I didn't do it for this round of tests; I just did a quick test a while back to check how low the signal strength would have to be in order to get a lot of entries with that filter enabled.

What do you mean by this? In theory AQL is completely orthogonal to the station scheduler (except that if there are no packets queued we obviously won't schedule that station)...

aql is per station? aql_threshold also?

Ran another test with aql_threshold values of 3000 and 1500, and also included 6000 as a reference point. The results seem to indicate that going lower than 6000 doesn't have any benefit, but it also doesn't seem to have any negative impact in my test setup. Again, these results are based on a single test for each threshold value, and finer details may have been lost in the noise that is inherent to these sorts of tests.
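
For anyone wanting to reproduce this: I changed the threshold by echoing the value (in µs) into the debugfs file, something like this (phy0 standing in for the right radio):

# echo 3000 > /sys/kernel/debug/ieee80211/phy0/aql_threshold
# cat /sys/kernel/debug/ieee80211/phy0/aql_threshold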

First laptop (768 weight)

Second laptop (256 weight)

Flent files:

Wireshark captures:

@tohojo, do you find these tests useful? Is there anything else you would suggest testing?

your throughput numbers are BETTER with less AQL

You guys are not testing what you think you're testing :wink:

The explanation requires diving into how AQL works a bit: The queue limits are per-station, but there are actually two per-station limits. A low limit (default 5ms) and a high limit (default 12ms). The low limit is a lower bound on throttling (meaning a station will always get to queue up to the low limit), and the high limit is applied if the total airtime for all stations is below aql_threshold. I.e., with the defaults, two stations can queue up to their high limit (12ms) before hitting the threshold, so the low limit is only applied if more than three stations are active at the same time.

This means that when you're lowering the aql_threshold, you're just tightening the bounds within which the low limit will get applied, you're not actually changing the limit. Given that you're only testing with two stations this has the immediate effect of pushing the two stations down to their low limit most of the time, and as you lower the threshold further towards 5ms, you'll get the low limit an increasing fraction of the time. And of course this explains why you're not seeing any difference after lowering the threshold below 6ms: below that, a single station will use up to the threshold, so you're effectively disabling the high limit entirely. But lowering the threshold below that won't make any difference, and any variation between those low values is just noise...
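
For reference, the threshold itself is the aql_threshold file sitting next to the other AQL knobs in debugfs (value in µs; 24 ms by default, going by the numbers above), and it can be read and written like the limits shown below - phy0 assumed:

# cat /sys/kernel/debug/ieee80211/phy0/aql_threshold
# echo 12000 > /sys/kernel/debug/ieee80211/phy0/aql_threshold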

If you want to change the actual queue limits, you'll want to fiddle with the aql_txq_limit debugfs file, which will also show you the actual limits:

# cat /sys/kernel/debug/ieee80211/phy0/aql_txq_limit 
AC	AQL limit low	AQL limit high
VO	5000		12000
VI	5000		12000
BE	5000		12000
BK	5000		12000

As you can see, the limits are set per AC, and you can change them by echoing 'ac low high' into that file. So if you just want to test with a fixed queue limit for all stations, you could do something like:

# LIMIT=5000 ; for ac in 0 1 2 3; do echo $ac $LIMIT $LIMIT > /sys/kernel/debug/ieee80211/phy0/aql_txq_limit; done
# cat /sys/kernel/debug/ieee80211/phy0/aql_txq_limit 
AC	AQL limit low	AQL limit high
VO	5000		5000
VI	5000		5000
BE	5000		5000
BK	5000		5000

These values can also be set per-station (in the 'aql' file beneath the station dir in debugfs), but if you haven't customised the per-station values, changing the global values will also update all stations.
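
For example (interface name and station MAC are placeholders):

# cat /sys/kernel/debug/ieee80211/phy0/netdev:wlan0/stations/aa:bb:cc:dd:ee:ff/aql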

Also, I do agree that testing with lower values is worthwhile - the defaults are probably too high for general usage. I believe they are set to avoid throughput drops at very high rates (where it's difficult to fill the buffer fast enough to form big aggregates) and at very low rates (where single packets take up a lot of time). So verifying results at those extremes is probably going to be necessary to gain any traction with changing the defaults upstream :slight_smile:


Thx for the detailed explanation. Me... I'd like to shoot for a 2ms TXOP using every tuning param we can try...

aql is in time, not bytes?

Well, using my example above with LIMIT=2000 should limit it to 2ms as an upper bound...
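
I.e., reusing the loop from above, just with a 2 ms limit (phy0 assumed):

# LIMIT=2000 ; for ac in 0 1 2 3; do echo $ac $LIMIT $LIMIT > /sys/kernel/debug/ieee80211/phy0/aql_txq_limit; done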

Yup, hence the 'a' in AQL instead of the 'b' from BQL :wink:

usecs, specifically...

BTW, finally posted the updated virtual airtime-patch upstream: https://lore.kernel.org/linux-wireless/20210318213142.138707-1-toke@redhat.com/T/

It's mostly identical to what you've been testing, but here's an updated OpenWrt repo with the exact same version, just in case: https://github.com/tohojo/openwrt/commit/8d7ae1f6b1fa776280c3bb23681a07c0e42879cf


Can someone run this iperf test on fq_codeled 5GHz wireless?

iperf -s on server.

ping 192.168.x.x -t (to server)

iperf -c 192.168.x.x -w 1M -P 5
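
(Side note: if you have flent around, a roughly equivalent run that also records latency under load would be something like the line below - the IP is a placeholder, and tcp_nup/upload_streams are the test and test-parameter names as I remember them from the flent docs:)

flent tcp_nup -H 192.168.x.x -l 30 --test-parameter upload_streams=5 -t aql-iperf-comparison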

As far as I know, fq_codel is implemented in OpenWrt's wireless drivers.
AQM plays a great role on wired connections, but it is weaker on high-speed wireless connections.

In the case of multiple simultaneous TCP connections from one client - for example Internet Download Manager, JDownloader, Steam, the Blizzard Battle.net client, or torrents - wireless shows poor latency management.

I work around this by DSCP tagging in the Windows QoS policy and putting these flows into the WMM bulk (BK) queue.

It works fine, but I am wondering if there is a device that does a good job of managing latency at the wireless driver level without this process.

Here is what I have done on my Asus RT-AC86U:
J4105 openwrt box (iperf server)
RT-AC86U (AP)
Windows 10 laptop Intel AC8260 (iperf client)

Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
Reply from 192.168.50.1: bytes=32 time<1ms TTL=64
Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
Reply from 192.168.50.1: bytes=32 time<1ms TTL=64
Reply from 192.168.50.1: bytes=32 time=91ms TTL=64
Reply from 192.168.50.1: bytes=32 time=94ms TTL=64
Reply from 192.168.50.1: bytes=32 time=82ms TTL=64
Reply from 192.168.50.1: bytes=32 time=80ms TTL=64
Reply from 192.168.50.1: bytes=32 time=84ms TTL=64
Reply from 192.168.50.1: bytes=32 time=97ms TTL=64
Reply from 192.168.50.1: bytes=32 time=84ms TTL=64
Reply from 192.168.50.1: bytes=32 time=14ms TTL=64
Reply from 192.168.50.1: bytes=32 time=87ms TTL=64
Reply from 192.168.50.1: bytes=32 time=61ms TTL=64
Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
Reply from 192.168.50.1: bytes=32 time=1ms TTL=64
iperf-2.0.9-win64>iperf -c 192.168.50.1 -w 1M -P 5
Client connecting to 192.168.50.1, TCP port 5001
TCP window size: 1.00 MByte
[  7] local 192.168.50.125 port 8772 connected with 192.168.50.1 port 5001
[  6] local 192.168.50.125 port 8771 connected with 192.168.50.1 port 5001
[  4] local 192.168.50.125 port 8769 connected with 192.168.50.1 port 5001
[  5] local 192.168.50.125 port 8770 connected with 192.168.50.1 port 5001
[  3] local 192.168.50.125 port 8768 connected with 192.168.50.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  7]  0.0-10.0 sec   115 MBytes  96.0 Mbits/sec
[  6]  0.0-10.1 sec   115 MBytes  95.9 Mbits/sec
[  4]  0.0-10.0 sec   112 MBytes  93.7 Mbits/sec
[  5]  0.0-10.0 sec   138 MBytes   115 Mbits/sec
[  3]  0.0-10.0 sec   111 MBytes  93.3 Mbits/sec
[SUM]  0.0-10.1 sec   591 MBytes   493 Mbits/sec

I don't know why someone has hidden my post.
I'm wondering how well the fq_codeled ath10k driver manages latency in a single best effort queue.

Since this is a forum for developers, are posts only allowed about improvements or debugging?

I am puzzled as to why this message is obscured. The variability you see is typical of wifi when subject to interference and retries. However, I don't know whether the chipset in the Asus RT-AC86U has the code we are talking about in it. You can tell if AQM is enabled with: cat /sys/kernel/debug/ieee80211/phy*/aqm

It's stock firmware.
OpenWrt is not supported on it, and I was wondering how much better the numbers are with your efforts on the mt76, ath9k and ath10k chipsets.
All I can do on the stock firmware is adjust the ring buffer size.
I have found that reducing this number reduces latency, but throughput also decreases.
The default setting is profile id 3, where best effort has a ring buffer size of 2048.
I am currently using 1024.
If I go down to 256 or 512, I can see a significant improvement in latency.

dhd -i eth6 rnr_flowring_profile
[id] [ ac_bk ] [ ac_be ] [ ac_vi ] [ ac_vo ] [ bc_mc ]
 *0   01:1024   -1:1024   -1:1024   -1:0512   00:0512
  1   01:1024   -1:2048   -1:1024   -1:0512   01:0512
  2   01:1024   -1:2048   -1:1024   -1:0512   01:0512
  3   01:1024   -1:2048   -1:1024   -1:0512   00:0512
  4   -1:1024   -1:2048   -1:1024   -1:0512   01:0512
  5   01:1024   01:2048   01:1024   01:0512   01:0512
  6   01:1024   02:2048   04:1024   08:0512   01:0512
  7   01:2048   01:2048   01:2048   01:2048   01:2048

Should I see this as a limitation of the wireless connection?
Internet speeds are getting faster, and programs that use simultaneous TCP connections are also increasing, especially as the file sizes of games grow.
Would it be better on WiFi 6?

One of today's commits to master:

mac80211: merge the virtual time based airtime scheduler

Awesome!


Hello everyone, sorry for joining the party late!

I am running several routers and APs and suffer from latency and packet loss under some load. All my devices are running recent snapshots, and I see the AQL nodes in debugfs, like so:

/sys/kernel/debug/ieee80211/phy1/aql_threshold
/sys/kernel/debug/ieee80211/phy1/aql_txq_limit
/sys/kernel/debug/ieee80211/phy1/aql_enable
/sys/kernel/debug/ieee80211/phy0/netdev:wlan0-1/stations/a4:34:d9:29:e9:56/aql
/sys/kernel/debug/ieee80211/phy0/aql_threshold
/sys/kernel/debug/ieee80211/phy0/aql_txq_limit
/sys/kernel/debug/ieee80211/phy0/aql_enable
root@openwrt-c6v2:~#

I read some pages saying to put SQM on the wifi interface, some others talking about just codel, some about airtime fairness, some about AQL. So, since AQL seems to be integrated in the snapshots, how do I make it solve the latency/packet loss? Do I need to enable anything? Some echo command? Is there any way to see if it is working, whether there is a tuning parameter, or any way to test the results?
For the testing, I installed flent + flent-gui on a Linux box, installed netperf + flent-tools on the routers, and executed some tests, but they seem incomplete.

I have a central router (c7 v5) + 3 APs (re450, c6v3.2, c6v2), all connected to each other with 802.11s mesh on both bands. I am not at all satisfied with the overall result.

I think it's enabled by default
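
You can check via the aql_enable node from your list; it should show 1 when AQL is active (phy0/phy1 depending on the radio):

# cat /sys/kernel/debug/ieee80211/phy0/aql_enable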

The netperf data rate might be limited by the router's CPU, so I suggest setting up netperf on another Linux box instead. You could still instruct flent to collect some stats from the router via SSH, but I have not tried this.

For a start, I suggest testing only a single wireless hop with each run, like this (an example flent run follows the list):

  • netperf box --(wireless)-- AP --(wire)-- netperf box
  • netperf box --(wire)-- AP --(wireless)-- AP --(wire)-- netperf box
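
For the first setup, for example, you would run flent on the wireless box against the wired netperf box, something like the line below (hostnames are placeholders, and the wifi_stats_* test parameters are what I recall flent documenting for SSH stats collection - double-check against the flent manual):

flent rrul -l 60 -H wired-netperf-box -t laptop-via-ap-5ghz \
    --test-parameter wifi_stats_hosts=root@ap \
    --test-parameter wifi_stats_interfaces=wlan0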

In addition, make sure you know which frequency band you are testing - perhaps disable the other band during the test. Ensure the channel is as free from interference as possible and that the signal is strong enough.

Is there any way to check that? Some counters? Specific tests? Some tuning options?

You could try the cloud-based test at https://www.waveform.com/tools/bufferbloat when connected via WiFi.
This test shows high HTTP ping times (they are not real ICMP pings) during an HTTP load test generated by the web app, if buffer management somewhere on the connection has issues.

But it's an end-to-end test. Any lag you may or may not notice in it can be due to SQM, AQM or client-side buffer management (cFos, Killer network card…), or a combination of these.

Usually, ping times under this load test spike by more than ~300ms if buffer management somewhere on the connection is poor.
For practical reasons, an additional device without buffer management might be helpful for comparison.

@Pico thanks for the link! I was already using dslreports, but this one is pretty cool too!

Apparently my bufferbloat is fixed: I got an A, and my gateway is an RPi4 with SQM and cake, which seems to work well. My problem is more local, related to wifi, because I don't have tri-band APs or an industrial mesh solution, only cheap devices, and I suspect some saturation of the wireless channels and the need for some local tuning. As those devices are only dual-band, they obviously use the same channel both for AP-to-station traffic and for AP-to-rest-of-the-mesh traffic. This would be bad if I used a lot of bandwidth, which is not the case: we are only 5 people at home doing regular Netflix/Meet/gaming, and I barely use more than 50Mbps of WAN traffic overall. So there is something wrong with queues, buffers, softirqs etc... which those timers, AQL and the rest may help to fix! But how do I see what causes the latency to spike and the packets to get lost? Where do I see any log about the decision to drop a packet, or the reason for the high lag?