AQL and the ath10k is *lovely*

@dtaht, here you go—same test variables, MCS, rate, channel, etc. The only difference is that I did the previous tests on macOS 12.5, and these were performed on macOS 12.5.1, so I hope that won't make a big difference.

Without further ado, a flent rrul_be test with 300 s duration follows:
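
For reproducibility, the invocation was roughly this (hostname is from my setup; the title and output names are just illustrative):

flent rrul_be -p all_scaled -l 300 -H openwrt.lan -t rrul_be-300s -o rrul_be-300s.png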

Below, you can see the TCP download and upload ping test results (invocation sketch after the note):

  • Note: I redid this test 3 times, and on all occasions a 1-threaded download provided better ping values. Not sure if this is a glitch.
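
The up/down tests were along these lines (a sketch; the stream count is set via flent's download_streams/upload_streams test parameters, if I recall the names right):

flent tcp_ndown --test-parameter download_streams=1 -l 60 -H openwrt.lan
flent tcp_nup --test-parameter upload_streams=1 -l 60 -H openwrt.lan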

And, just for completeness, I ran an iperf -c openwrt.lan -e -z --bounceback test:

------------------------------------------------------------
Client connecting to openwrt.lan, TCP port 5001 with pid 67119 (1 flows)
Write buffer size:  100 Byte
Bursting:  100 Byte writes 10 times every 1.00 second(s)
Bounce-back test (size= 100 Byte) (server hold req=0 usecs)
TOS set to 0x0 and nodelay (Nagle off)
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.151%en0 port 54280 connected with 192.168.1.1 port 5001 (bb len/hold=100/0) (sock=5) (icwnd/mss/irtt=11/1448/3000) (ct=2.65 ms) on 2022-09-03 06:08:14 (AEST)
[ ID] Interval            Transfer    Bandwidth         BB cnt=avg/min/max/stdev         Rtry  Cwnd/RTT    RPS
[  1] 0.0000-1.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.856/2.425/3.961/0.514 ms    0   12K/3000 us    350 rps
[  1] 1.0000-2.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.794/2.447/3.750/0.380 ms    0   13K/3000 us    358 rps
[  1] 2.0000-3.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.052/2.482/6.341/1.184 ms    0   14K/3000 us    328 rps
[  1] 3.0000-4.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.854/2.510/4.140/0.531 ms    0   15K/2000 us    350 rps
[  1] 4.0000-5.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.887/2.314/4.334/0.644 ms    0   16K/3000 us    346 rps
[  1] 5.0000-6.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.175/2.522/7.153/1.435 ms    0   17K/3000 us    315 rps
[  1] 6.0000-7.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.731/2.522/3.877/0.413 ms    0   18K/2000 us    366 rps
[  1] 7.0000-8.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.718/2.537/3.785/0.378 ms    0   19K/3000 us    368 rps
[  1] 8.0000-9.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.759/2.534/5.029/0.971 ms    0   20K/3000 us    266 rps
[  1] 9.0000-10.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.208/2.517/7.084/1.401 ms    0   21K/3000 us    312 rps
[  1] 0.0000-10.0222 sec  19.7 KBytes  16.1 Kbits/sec    101=3.039/2.314/7.153/0.958 ms    0   21K/12000 us    329 rps
[  1] 0.0000-10.0222 sec BB8-PDF: bin(w=100us):cnt(101)=24:1,25:5,26:31,27:25,28:10,29:4,30:2,31:1,34:1,35:1,37:3,38:3,39:2,40:1,42:2,43:1,44:1,46:1,50:1,51:1,64:1,66:1,71:1,72:1 (5.00/95.00/99.7%=25/50/72,Outliers=0,obl/obu=0/0)

Below you can find the link to the usual flent data files. Would you be so kind as to render the ping_cdf graph yourself? I'm not very confident my matplotlib is working correctly.
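
(In case it saves you a step, re-rendering should just be the following, assuming the data file name:)

flent -i rrul_be-300s.flent.gz -p ping_cdf -o ping_cdf.png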

I'd like someone else with an mt76 device to perform these tests. I keep seeing a difference of 15 MiB/s versus 20 MiB/s when comparing upload and download bandwidth.


@amteza you are the king of WiFi testing at the moment! I've so enjoyed seeing your diligent test results and correspondence with @dtaht.


I try to help a little bit, like you, mate.


The test that actually exercises the ssh queue problem a little better is rrul, not rrul_be. Somewhere in this thread there was also a test that exercised the queues differently. Thanks as always for testing and providing full data; I will look at 'em harder in the morning, PST.
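
i.e. something like this, which runs the four diffserv-marked flows instead of best-effort only (a sketch, same host assumptions as the earlier tests):

flent rrul -l 300 -H openwrt.lan -t rrul-300s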

I keep thinking that putting the codel target 8 ms patch back in would be good, but I suspect we have some other set of buffering somewhere eating that 20 ms. AQL?


I was unhappy at seeing 350 rps and wondering what was wrong, and then I realized that I was thinking in terms of "rpm". 21000 RPM (admittedly without load) is quite good... I'm so done for the week...


Dave, forgive my ignorance on this matter. Are we talking about a former patchset, or about something like this: https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002/304?page=16

If you're referring to the source code, can you point me in the right direction? Is it in mac80211? Long ago (3 years or so), I recall seeing the CoDel target defined as MS2TIME(20), i.e. 20 ms. Is that what you're referring to?

I don't know why AQL is even needed on the mt76, and I always thought it was too high by default for the ath10k. @nbd?

Since the original patches in 2016, MS2TIME(8) worked fine in testing across the board for 5 GHz, and I think OpenWrt has mostly disabled legacy rates for 2.4 GHz.

And prior to all this kerfuffle, I was trying to aim for a max txop under multi-station contention of ~2.5 ms or less.
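
For reference, hostapd's wmm_ac_*_txop_limit knobs are in units of 32 µs, so ~2.5 ms works out to roughly 78 units (a sketch, not a tested config):

# 78 * 32 µs ≈ 2.5 ms TXOP cap for the best-effort AC
wmm_ac_be_txop_limit=78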

@tohojo

sta->local->num_sta should have been "most recently active stations", not total stations.

    if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
            sta->cparams.target = MS2TIME(50);
            sta->cparams.interval = MS2TIME(300);
            sta->cparams.ecn = false;
    } else {
            sta->cparams.target = MS2TIME(20);
            sta->cparams.interval = MS2TIME(100);
            sta->cparams.ecn = true;
    }

It's just not needed anymore.

EDIT: or so I thought before doing the ton of fruitless testing we just did.

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index fe8702d92892..762542fbd2e4 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -632,7 +632,7 @@ __sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 	sta->sta.max_rc_amsdu_len = IEEE80211_MAX_MPDU_LEN_HT_BA;
 
 	sta->cparams.ce_threshold = CODEL_DISABLED_THRESHOLD;
-	sta->cparams.target = MS2TIME(20);
+	sta->cparams.target = MS2TIME(8);
 	sta->cparams.interval = MS2TIME(100);
 	sta->cparams.ecn = true;
 	sta->cparams.ce_threshold_selector = 0;
@@ -2697,15 +2697,6 @@ static void sta_update_codel_params(struct sta_info *sta, u32 thr)
 	if (!sta->sdata->local->ops->wake_tx_queue)
 		return;
 
-	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
-		sta->cparams.target = MS2TIME(50);
-		sta->cparams.interval = MS2TIME(300);
-		sta->cparams.ecn = false;
-	} else {
-		sta->cparams.target = MS2TIME(20);
-		sta->cparams.interval = MS2TIME(100);
-		sta->cparams.ecn = true;
-	}
 }
 
 void ieee80211_sta_set_expected_throughput(struct ieee80211_sta *pubsta,
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 1be8c9d83d6a..d467b98b6e84 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1625,7 +1625,7 @@ int ieee80211_txq_setup_flows(struct ieee80211_local *local)
 
 	codel_params_init(&local->cparams);
 	local->cparams.interval = MS2TIME(100);
-	local->cparams.target = MS2TIME(20);
+	local->cparams.target = MS2TIME(8);
 	local->cparams.ecn = true;
 
 	local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),


AQL can be disabled manually, by the way. I am going to compile an image with this patch; I'm eager to see how it goes. 🙂 But it will be after lunchtime AEDT today, Dave. Have a fantastic night.
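
(For anyone following along, the manual toggle is a debugfs write; phy1 is my radio, yours may differ:)

# 0 disables AQL, 1 re-enables it
echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable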

Thx. The righter thing, after the above patch went in, was to reduce the max txop size in the beacon under contention instead of adding more buffering to the AP, as we sort of tested somewhere in this thread in the past few months... but a slew of other bugs, then as now, stopped us from pursuing that.

Sure, can do.

Update: I have some time right now and my network is not in use, so the following graph and data are with the MS2TIME(8) patch applied; nothing else changed.

A flent rrul_be test follows (latency is very high and a tad rocky, and speed is a bit low):

flent tcp_ndown / tcp_nup tests follow:

The iperf2 bounceback is a little worse, too:

------------------------------------------------------------
Client connecting to openwrt.lan, TCP port 5001 with pid 25113 (1 flows)
Write buffer size:  100 Byte
Bursting:  100 Byte writes 10 times every 1.00 second(s)
Bounce-back test (size= 100 Byte) (server hold req=0 usecs)
TOS set to 0x0 and nodelay (Nagle off)
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.151%en0 port 59847 connected with 192.168.1.1 port 5001 (bb len/hold=100/0) (sock=5) (icwnd/mss/irtt=11/1448/7000) (ct=6.48 ms) on 2022-09-03 16:24:59 (AEST)
[ ID] Interval            Transfer    Bandwidth         BB cnt=avg/min/max/stdev         Rtry  Cwnd/RTT    RPS
[  1] 0.0000-1.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.161/2.599/4.659/0.667 ms    0   12K/2000 us    316 rps
[  1] 1.0000-2.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.263/2.551/5.019/0.755 ms    0   13K/3000 us    306 rps
[  1] 2.0000-3.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.343/2.646/5.629/0.925 ms    0   14K/3000 us    299 rps
[  1] 3.0000-4.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.829/2.515/3.632/0.404 ms    0   15K/3000 us    353 rps
[  1] 4.0000-5.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.284/2.489/6.884/1.366 ms    0   16K/3000 us    304 rps
[  1] 5.0000-6.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=4.024/2.519/7.832/1.697 ms    0   17K/3000 us    249 rps
[  1] 6.0000-7.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.520/2.617/6.784/1.264 ms    0   18K/2000 us    284 rps
[  1] 7.0000-8.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.796/2.427/4.509/0.610 ms    0   19K/3000 us    358 rps
[  1] 8.0000-9.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=2.742/2.491/3.517/0.281 ms    0   20K/2000 us    365 rps
[  1] 9.0000-10.0000 sec  1.95 KBytes  16.0 Kbits/sec    10=3.138/2.614/6.736/1.267 ms    0   21K/3000 us    319 rps
[  1] 0.0000-10.0225 sec  19.5 KBytes  16.0 Kbits/sec    100=3.210/2.427/7.832/1.040 ms    0   21K/13000 us    312 rps
[  1] 0.0000-10.0225 sec BB8-PDF: bin(w=100us):cnt(100)=25:4,26:7,27:29,28:17,29:9,30:1,31:1,32:1,33:1,34:1,35:3,36:4,37:5,38:2,39:2,41:1,42:1,43:1,44:1,46:1,47:1,51:1,57:1,60:1,68:2,69:1,79:1 (5.00/95.00/99.7%=26/60/79,Outliers=0,obl/obu=0/0)

What happened at t+180? Do you have location services/awdl on?

And the baseline latency doubled. Whoa... that doesn't seem possible...

An issue was that AQL reduction or elimination, plus managing the txop, is required for this concept to work, but... hmmm...

Are you doing the burst_3 on the latest test?

I'm as puzzled as you are. I repeated it twice to ensure nothing was wrong, and double-checked that location services are off and the awdl interface was down.

Okay, so it's not an issue if we know what to do. Shall we reduce TXOP?

It's always been on in my system lately. By the way, I think something is odd with my matplotlib... can you please draw the graphs from my data points?

Since the patch went in, I'm observing a percentage of packet loss during pings not seen before, as you may recall:

fping -A -l -m -p 20 nanohd-upstairs.lan

2403:5804:6e::3   : xmt/rcv/%loss = 81/81/0%, min/avg/max = 4.39/6.09/12.6
192.168.1.3       : xmt/rcv/%loss = 81/79/2%, min/avg/max = 3.57/6.07/19.3
fd57:11da:b11c::3 : xmt/rcv/%loss = 81/79/2%, min/avg/max = 4.40/6.13/12.4

Please, yes: reduce txop, kill burst_3, and reduce AQL. I really wanted that 20 ms baseline to move... down, not up!

A packet capture of a dual download test would be useful.
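
Something like this would do it (a sketch; I may be misremembering the flent parameter name, and en0 is your Mac's Wi-Fi interface):

tcpdump -i en0 -s 128 -w tcp_2down.pcap &
flent tcp_ndown --test-parameter download_streams=2 -l 60 -H openwrt.lan
kill %1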

So, here goes:

wmm_ac_be_txop_limit=32
tx_queue_data2_burst=0
MS2TIME(8)
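
(For clarity on units, per my read of the hostapd docs: wmm_ac_be_txop_limit is in 32 µs units, so 32 ≈ 1.02 ms, and tx_queue_data2_burst is in milliseconds, so 0 disables bursting entirely.)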

flent rrul_be 300 s test graph results:

flent TCP up/down test ping CDF graph results:

Wow. I've got no idea. The cap I wanted, though, was 2 down. I have no idea if fq is working at all at this point.

To set AQL, there's a file in the debugfs dir; it's set to 5000/12000 by default. 2000/2000?
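
If I remember the write format right, it's "<ac> <low> <high>" per line, with BE being AC 2, e.g.:

# sketch: set the BE (AC 2) AQL limits to 2000/2000; repeat per AC as needed
echo "2 2000 2000" > /sys/kernel/debug/ieee80211/phy1/aql_txq_limit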

I added a 2-threaded flent tcp_ndown with its tcpdump packet capture inside the same Drive folder.

root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
0
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC	AQL limit low	AQL limit high
VO	5000		12000
VI	5000		12000
BE	5000		12000
BK	5000		12000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000

As you can see, I disabled AQL, too, via debugfs.

It doesn't look like you are fq-ing worth a darn. What's your quantum set to?
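
(It should show up in the per-phy aqm node in debugfs, something like:)

# sketch: the fq parameters (quantum, limit, flow counts) live here
grep quantum /sys/kernel/debug/ieee80211/phy1/aqm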

Does this device have GRO?

Are you perhaps not running cubic on the server?

Also, tcpdump -s 128 is enough to capture the headers in the future.

Please turn AQL back on, but with vastly lower values.

Is the server the AP in this case, or a separate server?