AQL and the ath10k is *lovely*

I will try to test in both directions:

```
flent -H some_server_on_the_other_side -t sometitle -s .05 --te=upload_streams=4 -X --socket-stats tcp_nup
flent -H some_server_on_the_other_side -t sometitle -s .05 --te=download_streams=4 -X --socket-stats tcp_ndown
```

I will probably increase it from 4 to 8 and finally to 16 streams. By the way, I'm not experiencing packet loss on my only ath10k device with ath10k-firmware-qca988x-ct and kmod-ath10k-ct-smallbuffers, but its chipset is different, so it's probably not a good example.
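
Something like this little loop would cover the sweep (just a sketch; the server name and title prefixes are placeholders, and it assumes flent locally plus netserver already running on the far side):

```
#!/bin/sh
# Sweep the upload and download tests over 4, 8 and 16 streams.
SERVER=some_server_on_the_other_side   # placeholder
for n in 4 8 16; do
    flent -H "$SERVER" -t "up-${n}streams" -s .05 \
        --te=upload_streams=$n -X --socket-stats tcp_nup
    flent -H "$SERVER" -t "down-${n}streams" -s .05 \
        --te=download_streams=$n -X --socket-stats tcp_ndown
done
```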

The first is with ath10k-firmware-qca4019-ct-full-htt and the second is with ath10k-firmware-qca4019-ct. It might just be a weird quirk that randomly happens after some time and/or with the full firmware and these Tasmota devices. Note: this uses the same configuration as above, so 5 GHz isn't in use here - only 2.4 GHz.

First: [graph attachment]

Second: [graph attachment]

```
Summary of tcp_ndown test run from 2022-09-11 02:58:19.392528
  Title: 'first-download'

                             avg       median          # data pts
 Ping (ms) ICMP   :        28.68        29.35 ms              910
 TCP download avg :        13.65          N/A Mbits/s        1277
 TCP download sum :        54.59          N/A Mbits/s        1277
 TCP download::1  :        12.57        17.55 Mbits/s        1277
 TCP download::2  :        15.24        22.21 Mbits/s        1277
 TCP download::3  :        13.58        16.85 Mbits/s        1277
 TCP download::4  :        13.20        18.25 Mbits/s        1277
Summary of tcp_ndown test run from 2022-09-11 03:10:25.597397
  Title: 'second-download'

                             avg       median          # data pts
 Ping (ms) ICMP   :        21.30        21.10 ms             1399
 TCP download avg :        21.27          N/A Mbits/s        1399
 TCP download sum :        85.08          N/A Mbits/s        1399
 TCP download::1  :        21.05        20.28 Mbits/s        1399
 TCP download::2  :        20.97        20.62 Mbits/s        1399
 TCP download::3  :        24.21        23.16 Mbits/s        1399
 TCP download::4  :        18.85        18.44 Mbits/s        1399
Summary of tcp_nup test run from 2022-09-11 02:55:26.109736
  Title: 'first'

                                             avg       median          # data pts
 Ping (ms) ICMP                   :        60.71        59.55 ms             1144
 TCP upload avg                   :        12.13          N/A Mbits/s        1380
 TCP upload sum                   :        48.54          N/A Mbits/s        1380
 TCP upload::1                    :        12.86        18.10 Mbits/s        1380
 TCP upload::1::tcp_cwnd          :        75.92        79.00                 917
 TCP upload::1::tcp_delivery_rate :        14.80        14.82                 916
 TCP upload::1::tcp_pacing_rate   :        21.56        21.49                 916
 TCP upload::1::tcp_rtt           :        67.18        59.27                 915
 TCP upload::1::tcp_rtt_var       :         6.29         5.43                 915
 TCP upload::2                    :        11.67        16.18 Mbits/s        1380
 TCP upload::2::tcp_cwnd          :        70.47        75.00                 917
 TCP upload::2::tcp_delivery_rate :        12.43        10.79                 917
 TCP upload::2::tcp_pacing_rate   :        19.65        19.83                 917
 TCP upload::2::tcp_rtt           :        68.74        58.44                 913
 TCP upload::2::tcp_rtt_var       :         8.35         4.85                 913
 TCP upload::3                    :        11.02        14.88 Mbits/s        1380
 TCP upload::3::tcp_cwnd          :        64.00        70.00                 917
 TCP upload::3::tcp_delivery_rate :        12.19        10.36                 917
 TCP upload::3::tcp_pacing_rate   :        19.22        19.47                 917
 TCP upload::3::tcp_rtt           :        66.59        56.96                 916
 TCP upload::3::tcp_rtt_var       :         8.27         5.23                 916
 TCP upload::4                    :        12.99        17.93 Mbits/s        1380
 TCP upload::4::tcp_cwnd          :        74.89        81.00                 917
 TCP upload::4::tcp_delivery_rate :        15.23        16.24                 917
 TCP upload::4::tcp_pacing_rate   :        22.24        23.05                 917
 TCP upload::4::tcp_rtt           :        65.54        57.33                 917
 TCP upload::4::tcp_rtt_var       :         6.79         5.50                 917
Summary of tcp_nup test run from 2022-09-11 03:08:31.711736
  Title: 'second'

                                             avg       median          # data pts
 Ping (ms) ICMP                   :        45.22        45.45 ms             1392
 TCP upload avg                   :        18.94          N/A Mbits/s        1400
 TCP upload sum                   :        75.76          N/A Mbits/s        1400
 TCP upload::1                    :        18.29        18.79 Mbits/s        1400
 TCP upload::1::tcp_cwnd          :       100.29       101.00                 892
 TCP upload::1::tcp_delivery_rate :        17.16        16.98                 892
 TCP upload::1::tcp_pacing_rate   :        24.49        23.34                 892
 TCP upload::1::tcp_rtt           :        60.59        58.63                 889
 TCP upload::1::tcp_rtt_var       :         4.03         3.34                 889
 TCP upload::2                    :        20.06        20.04 Mbits/s        1400
 TCP upload::2::tcp_cwnd          :       115.51       109.00                 891
 TCP upload::2::tcp_delivery_rate :        18.71        18.43                 891
 TCP upload::2::tcp_pacing_rate   :        26.97        25.07                 891
 TCP upload::2::tcp_rtt           :        62.51        60.92                 890
 TCP upload::2::tcp_rtt_var       :         4.08         3.44                 890
 TCP upload::3                    :        18.45        19.24 Mbits/s        1400
 TCP upload::3::tcp_cwnd          :       101.08       104.00                 890
 TCP upload::3::tcp_delivery_rate :        17.18        17.57                 889
 TCP upload::3::tcp_pacing_rate   :        24.20        24.14                 889
 TCP upload::3::tcp_rtt           :        60.79        59.74                 887
 TCP upload::3::tcp_rtt_var       :         4.22         3.64                 887
 TCP upload::4                    :        18.96        19.64 Mbits/s        1400
 TCP upload::4::tcp_cwnd          :       104.76       106.00                 890
 TCP upload::4::tcp_delivery_rate :        17.70        17.86                 889
 TCP upload::4::tcp_pacing_rate   :        25.10        24.53                 889
 TCP upload::4::tcp_rtt           :        60.92        59.16                 886
 TCP upload::4::tcp_rtt_var       :         3.93         3.26                 886
```

What's the wifi chip on the client? The second run is better across the board; are you sure you weren't hitting the bugs we had pre-August?
Try `-l 300` for a longer run.
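
For example, just the earlier upload invocation with the length bumped (server name still a placeholder):

```
flent -H some_server_on_the_other_side -t sometitle-300s -l 300 -s .05 \
    --te=upload_streams=4 -X --socket-stats tcp_nup
```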

This Chromebook seems to be using iwlwifi; the spec sheet suggests it has an Intel Wireless-AC 9560 card.

I am now wondering if I need to reset this Tasmota device, though...

```
29 packets transmitted, 14 received, 51.7241% packet loss, time 28557ms
rtt min/avg/max/mdev = 4.984/1881.031/4272.207/1385.128 ms, pipe 5
```

Interestingly, this seems to relate to switching from psk2+ccmp to sae-mixed, and it seems to be fixed by switching back. This is kind of weird, as the other devices do not have this issue. Anyway - thanks, I think we can rule this out as being related to AQL.
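
For anyone else chasing the same symptom, the change in question is just the per-SSID encryption option in /etc/config/wireless; roughly (the section reference is a placeholder, use whatever your wifi-iface is called):

```
# sketch: switch the affected SSID back from WPA2/WPA3 mixed mode to plain WPA2
uci set wireless.@wifi-iface[0].encryption='psk2+ccmp'   # was 'sae-mixed'
uci commit wireless
wifi reload
```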

3 Likes

You are also showing 40+ ms of bloat on your client's wifi chip. We just got that down to 8 ms on the mt76; the hope was that by reducing AQL's limits, your current 20+ ms on the ath10k could also come down to 8 ms.
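
For anyone following along, the current AQL settings can be read out of mac80211's debugfs (assuming debugfs is mounted and your radio is phy0):

```
# per-station pending-airtime threshold, in microseconds
cat /sys/kernel/debug/ieee80211/phy0/aql_threshold
# per-AC limits, one line per AC: "<AC> <low limit> <high limit>", also airtime in microseconds
cat /sys/kernel/debug/ieee80211/phy0/aql_txq_limit
```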

1 Like

I was testing out a cAP ac with OpenWrt 22.03 (as an AP, running irqbalance). Paired with an Intel AX210 on a Windows client, it performed better than RouterOS and had lower latency, but I noticed it had quite a bit of packet loss. Here's the measurement I did with crusader, where the down direction is from the AP:

My 2013 MacBook Pro also seems to have the high packet loss. My Samsung A41 phone does not have the packet loss, but its latency is higher than under RouterOS, ~40 ms vs ~20 ms. These tests were done on 5 GHz with an 80 MHz channel.

1 Like

This is what we were achieving on a different test, back in 2016: http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf

2 Likes

@zoxc you are measuring the measurement flow's packet loss, not TCP packet loss, yes?

1 Like

Yeah. It's the packet loss of the separate UDP flow doing the latency measurement. I was expecting the FQ component to keep that loss down.
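
One way to cross-check that outside of crusader (a sketch, assuming iperf3 is installed on both ends) is to run a low-rate UDP flow alongside the load and look at the datagram loss it reports at the end:

```
# 1 Mbit/s UDP probe flow for 60 s; the summary line shows lost/total datagrams
iperf3 -c <server> -u -b 1M -t 60
```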

1 Like

I wanted to add to this old thread because, to me, it's not very clear from the results posted so far that changing the AQL TX queue limit does in fact have a significant effect on both latency and throughput, and that it might be worthwhile to spend time tuning it.

Below are some examples:

TX queue limit of 5000:

TX queue limit of 1500, which seems to be a good compromise between latency and throughput:

TX queue limit of 500, even lower latency but now throughput takes a big hit:

I didn't use rrul tests because I've found the latency in the rrul test is basically meaningless on macOS, as there appears to be so much buffering going on on the Mac itself: if I test latency from another station while rrul is running, the latency is much, much lower than what flent reports. (Ideally we'd plot that in these rrul graphs!) But the general result is: the lower the AQL limit, the bigger the down/up disparity, with only marginal further improvements in latency.
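
The second-station check itself is nothing fancy (a sketch; the AP address is a placeholder): just ping the AP from another wireless client at a short interval while rrul is running on the Mac, e.g.:

```
# run from a second wireless station while the Mac is running rrul
ping -i 0.2 -c 300 192.168.1.1   # replace with your AP's address
```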

For the record: these tests were done on an R7800 running an NSS-enabled snapshot build, with the "do_codel_right" patch set to a 5 ms target and 50 ms interval, and NAPI_POLL_WEIGHT and ATH10K_NAPI_BUDGET patched to 8, all of which seem to cause no adverse effects in a typical household of 4 - in fact, it's great :slight_smile:

A quick way to change the queue length yourself is:

```
for ac in 0 1 2 3; do echo $ac 1500 1500 > /sys/kernel/debug/ieee80211/phy0/aql_txq_limit; done
```
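
If you want it applied to every radio and reapplied at boot, something along these lines in /etc/rc.local (before the final exit 0) should work; treat it as a sketch, since the debugfs layout can differ between builds:

```
# set the AQL low/high limits to 1500 µs of airtime for all ACs on all radios
for phy in /sys/kernel/debug/ieee80211/phy*; do
    for ac in 0 1 2 3; do
        echo $ac 1500 1500 > "$phy/aql_txq_limit"
    done
done
# verify
cat /sys/kernel/debug/ieee80211/phy*/aql_txq_limit
```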

8 Likes

@rickkz0r I came to the same conclusion in an old post:

LGTM! I am always willing to sacrifice a little throughput for better latency, particularly with multiple users on the link.

1 Like

Just thought I'd try this for a quick test on a TP-Link A7 that is one of my APs...

  1. It works.
  2. WOW! My lowly old A7 goes from +17 ms/+1 ms to +1 ms/+1 ms (added download/upload latency), without much drop in DL speed.

Stock 22.03.03; the A7 is about 20' and through a wall away from the test PC; 940/35 Mbit DOCSIS cable, with an x86 box handling upstream routing and SQM at 700/30 Mbit; usual wifi speeds are in the 220-280 Mbit range (not the best client wifi hardware at the moment).

Results hacked up from a couple of quick Waveform Bufferbloat test run .csv files:

```
====== RESULTS SUMMARY ======                        Stock AQL    Lowered AQL
Bufferbloat Grade                                        A             A+

Mean Unloaded Latency (ms)                             15.01         14.50
Increase In Mean Latency During Download Test (ms)     17.07          0.77
Increase In Mean Latency During Upload Test (ms)        1.22          1.48

Download speed (Mbps)                                 213.71        191.15
Upload speed (Mbps)                                    30.32         30.26

====== LATENCY TEST DETAIL ======
Unloaded - Median Latency (ms)                         14.72         13.98
Unloaded - Mean Latency (ms)                           15.01         14.50
Unloaded - 95th %ile Latency (ms)                      17.95         18.41

During Download - Median Latency (ms)                  32.63         14.82
During Download - Mean Latency (ms)                    32.08         15.27
During Download - 95th %ile Latency (ms)               46.91         20.49

During Upload - Median Latency (ms)                    16.14         15.89
During Upload - Mean Latency (ms)                      16.23         15.98
During Upload - 95th %ile Latency (ms)                 19.55         19.57
```

Forgive my funky formatting, it took almost as long to beat that into shape as it did to do the graphs in Excel!

Pretty wild to see 0 or +1 ms on a wifi bufferbloat test! I'm going to leave this in place and see how it goes for a while... this is the three-person household's main AP, so some "real world" mileage will occur.

3 Likes

Nice! But now I'm wondering where my ~10ms of additional latency is coming from :thinking:

Looks like something else is buffering in my setup. In any case, I think it's related to the relatively low speed your client is at: if I lower the channel width and/or disable beamforming, throughput drops and I get less latency. Also, when another station is active and stealing bandwidth, latency actually improves. But I can't get near 1 ms quite yet.

For reference, which channel width, firmware and driver (ath10k or ath10k-ct?) are you running?

1 Like

For dumb APs that are connected to routers with SQM on the WAN: what qdisc are you all using (or would you recommend) on the dumb AP's LAN ethernet?

Different ones may change the test results.

The qdisc is only hit when the interface is saturated. So unless you've connected your dumb AP using 100Mb, the qdisc will never be hit. And the default fq_codel will do a fine job if that's the case.
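
You can confirm what's actually attached (and whether it ever drops or builds a backlog) with tc:

```
# show the qdisc plus its drop/backlog counters on the AP's uplink port
tc -s qdisc show dev eth0   # adjust the interface name for your device
```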

1 Like

In general, I believe that aiming for low latency and being willing to sacrifice even as much as 10% of the bandwidth has major benefits for multiple users on a link, in eliminating jitter, and in enabling the 97% or more of all flows that don't need bandwidth to always achieve low latency.

1 Like

Yes, I'm totally with you, but in this case I need to go from ~700 to ~200 Mbit to see significant benefits, and that's a bit much. But it makes me think there's another layer somewhere (it could very well be in the client) that starts buffering only at those higher rates. I'll do some more experimenting.

Channel width is 80 MHz.
Edit: AP is running stock 22.03.03, with the stock FW and drivers below.
Firmware: ath10k-board-qca988x 20220411-1, ath10k-firmware-qca988x-ct 2020-11-08-1
Driver: kmod-ath10k-ct 5.10.161+2022-05-13-f808496f-1.
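
(Those versions were just pulled from the package list on the AP:)

```
# list the installed ath10k driver and firmware packages
opkg list-installed | grep ath10k
```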

The C7/A7s run the same Wave 1 (not Wave 2) chipsets (slightly different base chip for 2.4 GHz, same? chip for 5 GHz), and I am running a TP-Link T4U USB dongle (2x2, not 3x3?) in the main PC, so on both ends there may be reasons preventing great speed. I've never seen much above 280 Mbit under the best conditions, despite occasionally seeing 877 Mbit link rates on both sides.

At the moment I'm getting as much as 3-5 ms of latency, occasionally lower, but that's still a lot lower than the +17 ms or more that was usual on the download. Last night was probably pretty low load on ol' Ch 36.

Just did some testing with the family TV playing a YouTube video... latency was maybe nearly as low, except for a few very delayed samples in the 125 and 250 ms range.

For reference, the two zero-value samples are graph artifacts of the blank lines and/or comments between the unloaded, download and upload sections. I left them in this graph; they provide markers showing where the different test stages are. So it's only in the download test where I'm getting the long delay events.

Second edit:
I enabled the "Multi to Unicast" selection that appeared in very recent 22.03.x (just .03?). I haven't A/B tested it to see if it's an influence on the long delays. Short-term testing seems to indicate those long samples are shorter and fewer, and they are now scattered between upload and download rather than download only.
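
(For reference, I believe that checkbox corresponds to the per-SSID multicast_to_unicast option in /etc/config/wireless - the option name here is my assumption from LuCI, so double-check it on your build:)

```
# assumed option name; verify against your build before relying on it
uci set wireless.@wifi-iface[0].multicast_to_unicast='1'
uci commit wireless
wifi reload
```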

3 Likes

Which current versions of OpenWrt for the Archer C2600 have AQL enabled in the wifi drivers? I've tried the latest default 23.x build, and the /sys/kernel/debug/ieee80211/phy0 and phy1 directories don't have any aql sysfs files. Is it the snapshots that have it enabled?
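
(The check I'm using is simply looking for the debugfs entries, so correct me if there's a better way:)

```
# list any AQL knobs mac80211 exposes for each radio
ls /sys/kernel/debug/ieee80211/phy*/aql_* 2>/dev/null
```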