AQL and the ath10k is *lovely*

And the last one for today.

WMM test parameters:

tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data2_burst=0

wmm_ac_be_txop_limit=0

AQL test parameters:

root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC	AQL limit low	AQL limit high
VO	2000		2000
VI	2000		2000
BE	2000		2000
BK	2000		2000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
1

Please note that aql_threshold is effectively nullified because the low and high AQL TX limits are identical.
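
For reference, a quick sketch of how those per-AC limits can be set through the same debugfs node; the write format ("<ac> <low> <high>", with ACs numbered 0=VO, 1=VI, 2=BE, 3=BK) is my reading of mac80211's debugfs handler, so double-check it on your tree:

for ac in 0 1 2 3
do
        echo "$ac 2000 2000" > /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
done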

flent rrul_be 300 s test graph:

flent tcpnup test with 1, 2, 4, 8 and 16 threads ping cdf graph:

flent tcpndown test with 1, 2, 4, 8 and 16 threads ping cdf graph:

And, as usual, click here to download everything, including tcpdump captures of the 2-thread upload and download runs.

I think we are onto something, but why is downloading so good and uploading so bad?! Can you think of anything @dtaht?

3 Likes

We aren't winning the election often enough. Distributing a txop in the beacon of 2 ms or less might help. Nice to see progress (without y'all's help and interest I'd have given up multiple times on these fronts. Actually, I DO give up periodically in the hope that a zen-like moment will yield inspiration, or that someone else will have it).

I still need to look at these packet captures closely.

A key thing to remember is that we got about 1/3 ms RTT with no load. What we achieve now is pretty miserable compared to that.

Fiddling with NAPI_POLL_WEIGHT=16 might pull some latency out of the ethernet interfaces.

On the OSX upload side, there are essentially no drops. My guess is that all the latency is coming from a fixed-length queue there, and when those buffers run out, they push back on the stack.


Two ways to test that: I can't find a way to lock the MCS rate in OSX with a few web searches, but narrowing the channel width at the AP to HT20 might be revealing. Going from VHT80 to HT20 would probably quadruple the latency observed on the OSX upload, if that theory is correct.
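
On the OpenWrt side that's a one-liner; a sketch assuming the 5 GHz radio is radio1 (adjust the name to your config):

uci set wireless.radio1.htmode='HT20'
uci commit wireless
wifi reload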

Polling netstat -qq -I the_interface during that upload test and dumping the output somewhere might also be revealing:

for i in `seq 1 60`
do
        netstat -qq -I the_interface
        sleep 1
done > fq_codel_osx.log
1 Like

I will try this.

By the way, I just did a quick new test on Waveform, and with Apple's tool, letting the WLAN be the choke point... quite an improvement on these two tests; previously the results were +20-40 ms and < 1000 RPM.

==== SUMMARY ====
Upload capacity: 34.504 Mbps
Download capacity: 418.589 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: High (3157 RPM)
Base RTT: 11
Start: 6/9/2022, 5:34:57 am
End: 6/9/2022, 5:35:07 am
OS Version: Version 12.5.1 (Build 21G83)

Let me run the tests before starting my daily fight against Gravity. Stay tuned.

1 Like

Here you go. I ran two sets of tests, upload and download, from 1 to 16 threads with the AP in HT20 mode. In parallel, I ran a version of your script to capture fq_codel stats.

You can download all the files by clicking here.

Note that I didn't see the same kind of "contention" between upload and download. I think it might be that we are not hitting the download limit, so latency does not increase. Another of my naïve queries, I reckon.

I will patch this next. Tests to come tomorrow, though, Dave.

Update: I changed my mind; I will fight against Gravity in the arvo. See below.

WMM test parameters:

tx_queue_data2_aifs=3
tx_queue_data2_cwmin=15
tx_queue_data2_cwmax=63
tx_queue_data2_burst=0

wmm_ac_be_txop_limit=0

AQL test parameters:

root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_txq_limit
AC	AQL limit low	AQL limit high
VO	2000		2000
VI	2000		2000
BE	2000		2000
BK	2000		2000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_threshold
24000
root@nanohd-downstairs:~# cat /sys/kernel/debug/ieee80211/phy1/aql_enable
1

Kernel patches:

MS2TIME(8)
NAPI_POLL_WEIGHT=16
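
(For context: MS2TIME(8) is presumably a lowered per-station CoDel target; stock mac80211 sets these parameters in net/mac80211/sta_info.c, so the patch is likely the one-liner sketched below. Treat the exact location as an assumption.)

        /* net/mac80211/sta_info.c, sta_info_alloc() - assumed spot */
        sta->cparams.target = MS2TIME(8);       /* was MS2TIME(20) */
        sta->cparams.interval = MS2TIME(100);   /* unchanged */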

flent rrul_be 300 s test graph:

flent tcpnup test with 1, 2, 4, 8 and 16 threads ping cdf (median ≈70 ms) graph:

flent tcpndown test with 1, 2, 4, 8 and 16 threads ping cdf (median ≈8 ms) graph:

Click me to download flent data and tcpdump capture.

Note that I didn't see any kernel warning; hence the driver is not calling netif_napi_add() with a weight value higher than the one defined (16).

        if (weight > NAPI_POLL_WEIGHT)
                netdev_err_once(dev, "%s() called with weight %d\n", __func__,
                                weight);
2 Likes

The NAPI poll weight affects the whole system. You should see an increase in context switches due to it. It's been 64 for two decades, and I've always felt it was too high for modern (especially arm) multicore systems. It's better to do less work, more often, to have a more fluid experience.

Since your test result was identical, it's not clear if it was actually applied. A printk in init from the mt76 ethernet driver, printing out what it's set to, would validate that it changed; see the sketch below.
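
A minimal sketch of what I mean, assuming the MT7621 ethernet driver is drivers/net/ethernet/mediatek/mtk_eth_soc.c and that it registers its NAPI contexts in mtk_probe() (struct and field names from memory; adjust to your tree):

        /* in mtk_probe(), right after the netif_napi_add() calls: */
        dev_info(eth->dev, "napi weight: tx=%d rx=%d (NAPI_POLL_WEIGHT=%d)\n",
                 eth->tx_napi.weight, eth->rx_napi.weight, NAPI_POLL_WEIGHT);

If dmesg still reports 64 after boot, the =16 change never made it into the image.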

For all I know, 8 is closer to a good number on arm.

Very happy with the download result (we were at VHT80(?) 430 Mbit, though; what were we getting before? 80 Mbit mt76 vs 120 OSX or so...). To me, this ratio we've had since the beginning of this thread STILL points maybe to an A-MPDU sizing or MCS-rate problem more than anything else, since we now have plenty of CPU left over.

FQ is working just fine and dandy with these reduced AQL values. I still don't get why AQL needs to be enabled at all... (could you do another test with AQL disabled, at HT20?)
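
Disabling it for that test should just be the debugfs knob from earlier:

echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable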

If you don't mind, I'd like to stay at HT20 for a while - it makes the packet captures smaller! I ended up 9 GB into swap on the last capture....

To make 'em bigger... to be able to actually look at what's on the wireless... sigh... do you have a USB stick available for the router? Does a wifi monitor interface work on this chipset with tcpdump?
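
If monitor mode does work, something along these lines (mon0 and the USB mount point are placeholders):

iw phy phy1 interface add mon0 type monitor
ip link set mon0 up
tcpdump -i mon0 -s 256 -w /mnt/usb/air.pcap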

I just swapped some email with @sultannaml - working on the latest mt79 chipset - who told me:

Other mt76 changes I made AP-side include making mt7915 construct larger A-MSDUs
when A-MSDUs are hardware offloaded (which it has been for over a year now), and
working around a weird firmware bug where mt7915 couldn't transmit frames with
MCS 10-11 when using 160 MHz bandwidth with 2 spatial streams over DFS spectrum.

I also made a handful of other changes to mt76 to fix mt7922 bugs and
performance issues, such as how mt7922 would never TX at 160 MHz bandwidth to my
mt7915 AP out of the box — TX would be limited to 80 MHz — despite working just
fine with a Broadcom AP.

[1] https://github.com/openwrt/openwrt/commit/f338f76a66a50d201ae57c98852aa9c74e9e278a
[2] https://github.com/kerneltoast/kernel_x86_laptop/commit/ca89780690f7492c2d357e0ed2213a1d027341ae

You went from VHT80 or 160 down to 20?

1 Like

Can I have a quick pointer to it, please?

I read those patches. I've moved from VHT80 to HT20 here. My devices are MT7621 CPUs with MT7615e chipsets.

also, I no longer care about upload testing on osx. we know it's overbuffered, and that problem I reported to apple 1.5 years ago. we also don't need really long rrul tests now that things are stabler. Addressing the fixes to packet aggregation in the mt76 vi and vo queues... somewhere on this thread sebastian leveraged an 8 flow thing

2 Likes

git, ssh, and android are no fun, tomorrow for a patch?

1.5 years ago! :roll_eyes:

I've got a spare device that might run properly with Linux.

1 Like

I feel your pain; I just need a pointer on where to insert the printk() in the right place. Trying to find it will take me a little while otherwise.

1 Like

Quick teaser, after having an idea:

Download:

Upload:

5 Likes

That's really pretty.

Pretty huge throughput hit, though. Unless that's VHT80? I'm not complaining! Given a choice between household members being able to game, videoconference, watch movies and surf the web with zero glitches, or the maximum throughput possible, I'll always choose the former. Ripping latency out usually ends up with finding ways to get more bandwidth back, just with different techniques like zero-copy or FQ, or stuff we haven't thunk of yet.

I'd like us to move back over here:

and have some feedback on just the ath10k on this bug from other folk.

1 Like

So let me explain the speed hit. These tests were done with the same parameters, including MS2TIME and the poll weight both set to 8. However, I had an idea after reading your feedback about Apple's fq_codel upload buffering (which can be seen clearly in the log with the list of flows). Usually, I work in a different room where my computer (macOS) is connected to a small switch (ERX running OpenWrt), which is connected to another NanoHD that connects via WDS to my main AP, as follows:

macOS <--USB dongle/eth--> switch <--eth--> NanoHD <--4x4 MIMO WDS--> NanoHD <--eth--> RPi4 (irtt/netserver)

With this setup I'm not using the macOS wireless but the NanoHD-to-NanoHD link. Since I thought our latency problem was in the macOS wifi CoDel, this takes it out of the path, and it seems to have worked. But the wireless connection between these two devices is not as good due to distance, so the rate varies (that's the bandwidth hit), for example right now it is:

780.0 Mbit/s, 80 MHz, VHT-MCS 4, VHT-NSS 4, Short GI
650.0 Mbit/s, 80 MHz, VHT-MCS 7, VHT-NSS 2, Short GI

Can you point me to where I should put the printk() or dev_info() to ensure the poll value is correct in my image? I wasn't able to make it work.
And with this bombshell, I'm moving on to the other thread. I guess our work with AQL is done here, as 22.03.0 works flawlessly, at least according to my tests.

what does the rrul_be test look like on this topology?

still far from something I can git on

Hi all, I am not sure if this is related or not... but the "vanilla" ath10k-firmware-qca4019-ct firmware, in the setup described below with ~3 Tasmota clients, was resulting in large latency spikes and dropped packets on OpenWrt 22.03.0. Switching to ath10k-firmware-qca4019-ct-full-htt seems to fix it for my device (GL-B1300).
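
For anyone wanting to try the same swap, roughly (the two -ct firmware variants overlap, so remove the stock one first; package names as above, exact steps an assumption):

opkg update
opkg remove ath10k-firmware-qca4019-ct
opkg install ath10k-firmware-qca4019-ct-full-htt
reboot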

Setup ->

Openwrt version: 22.03.0
Device: GL.iNet GL-B1300

wifi config


config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11g'
        option path 'platform/soc/a000000.wifi'
        option htmode 'HT20'
        option channel '11'
        option country 'AU'
        # option disabled '1'

config wifi-device 'radio1'
        option type 'mac80211'
        option hwmode '11na'
        option path 'platform/soc/a800000.wifi'
        option htmode 'VHT80'
        option channel '36'
        option country 'AU'
        option disabled '1'


config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option key 'REDACTED'
        option ssid 'mainline'
        option encryption 'sae-mixed'
        option ieee80211w '1'
        option macaddr '<REDACTED>:01'
        option disassoc_low_ack '0'



config wifi-iface 'wifi_iot_2_4'
        option ssid 'internetofthings'
        option encryption 'psk2+ccmp'
        option device 'radio0'
        option mode 'ap'
        option ieee80211w '1'
        option key 'REDACTED'
        option network 'iot'
        option macaddr 'REDACTED:02'
        option disassoc_low_ack '0'


config wifi-iface 'wifi_guest_24'
        option network 'guest'
        option ssid 'ourguestnetwork'
        option encryption 'sae-mixed'
        option device 'radio0'
        option mode 'ap'
        option ieee80211w '1'
        option key 'REDACTED'
        option macaddr 'REDACTED:03'
        option disassoc_low_ack '0'

The relevant package information, I hope, is as follows ->

opkg list-installed|grep ath
ath10k-board-qca4019 - 20220411-1
ath10k-firmware-qca4019-ct-full-htt - 2020-11-08-1
kmod-ath - 5.10.138+5.15.58-1-1
kmod-ath10k-ct - 5.10.138+2022-05-13-f808496f-1
opkg list-installed|grep host
hostapd-common - 2022-01-16-cff80b4f-12
opkg list-installed|grep openssl
libopenssl1.1 - 1.1.1q-1
wpad-openssl - 2022-01-16-cff80b4f-12

I don't know if it is related or not. We've had to look at the long-term behavior of each chipset and multiple driver combinations, using flent to drive tests, over the past few months of development. Any chance you could run those?

2 Likes

Sure. Do you suggest testing using flent as per AQL and the ath10k is *lovely* - #36 by dtaht

flent -H some_server_on_the_other_side -t sometitle --te=upload_streams=4 --socket-stats tcp_nup

or

flent -H the_server_ip --step-size=.04 --socket-stats --te=upload_streams=16 tcp_nup
(AQL and the ath10k is *lovely* - #63 by dtaht)

or

flent -x --socket-stats --step-size=.04 -t whatever_is_under_test --te=upload_streams=1 tcp_nup # upload_streams=4, upload_streams=16

(AQL and the ath10k is *lovely* - #181 by dtaht)

?