Reducing multiplexing latencies still further in wifi

dtaht · August 1, 2022, 11:17pm

I have been sucked into helping stabilize the current and next releases of openwrt wifi, and it's beginning to look like many of the power, multicast, and reliability problems have all been solved, thx to the enthusiastic testing and commitment of openwrt's wonderful community. I really hope rc6 rocks, and that the hundreds of wifi bugs all over that were side-effects of the mess can be closed.

But!

What I'd actually set out to do months ago was test implementing more than a few latency reducing ideas left over from the original make-wifi-fast project: https://www.youtube.com/watch?v=Rb-UnHDw02o - of note were ack-filtering, reducing the intervals announced in the beacon, reducing the codel target, l4s-style ecn, better controlling the multicast queue, and fiddling with understanding the real effects of qosify on modern wifi stacks.

The simplest of these is to try improving multiplexing behavior by reducing the maximum txop size. There are four places to do this nowadays; this one is in hostapd.conf...

The simplest test is to verify that the AP chipset can use some reduced parameters, mimicking the VI queue thusly.

tx_queue_data2_aifs=1
tx_queue_data2_cwmin=7
tx_queue_data2_cwmax=15
tx_queue_data2_burst=3.0 # this is a really low value for modern ac but

This will clobber performance from the AP even more than what we are seeing today, and is undesirable for most apps. (But does it work in x chipsets? What does it look like?) CORRECTION: in saying that, I assumed this was the max txop size (5.7ms), and it turned out that openwrt was already tuning it down to 2. Putting it up to 3 got a better down/up ratio. Still not the right thing, but (see a few more posts below)

More interesting is observing whether or not announcing this reduced interval in the beacon is respected by the client, which should improve up/down fairness in the rrul_be test, and is probably desirable behavior for most.

wmm_ac_be_aifs=2
wmm_ac_be_cwmin=3
wmm_ac_be_cwmax=4
wmm_ac_be_txop_limit=94 # arguably we can just change this
wmm_ac_be_acm=0
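
As a units check on the two blocks above: per the comments in hostapd.conf, wmm_ac_*_txop_limit is given in units of 32 microseconds and wmm_ac_*_cwmin/cwmax are exponents (window = 2^n - 1), while tx_queue_*_burst is in milliseconds and tx_queue_*_cwmin/cwmax are the literal window sizes. A minimal shell sketch of the conversion (the function names are made up for illustration):

```shell
# Convert a wmm_ac_*_txop_limit value (units of 32 us) to microseconds.
txop_limit_to_us() {
	echo $(( $1 * 32 ))
}

# Convert a wmm_ac_*_cwmin/cwmax exponent n to the window size 2^n - 1.
cw_exp_to_window() {
	echo $(( (1 << $1) - 1 ))
}

txop_limit_to_us 94     # 3008 us, close to the 3.0 ms burst above
txop_limit_to_us 32     # 1024 us
cw_exp_to_window 3      # 7, matches tx_queue_data2_cwmin=7
cw_exp_to_window 4      # 15, matches tx_queue_data2_cwmax=15
```

So the wmm_ac_be_* values here announce roughly the same contention window and airtime limits to clients that the tx_queue_data2_* values apply on the AP itself.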

There are a lot of other mysterious parameters in https://w1.fi/cgit/hostap/plain/hostapd/hostapd.conf -

Now, an algorithm to do this more right (reducing the txop size in beacon announcements in response to the number of stations with data outstanding) is in the cards, with a ceiling of say, 1.1ms instead of 5.7, but first up is merely verifying that drivers and clients obey the rruls. There is also a potential impact on the ATF calculations...
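
To make the idea concrete, here is a rough sketch of one possible policy, not the algorithm that is "in the cards": split a fixed airtime budget across the stations that currently have data queued, clamp the result between a floor and a ceiling, and convert it to the 32 microsecond units that wmm_ac_be_txop_limit uses. The function name, the budget, and the floor are all hypothetical; only the 1.1ms ceiling comes from the paragraph above.

```shell
# txop_limit_for <busy_stations> <budget_us> <floor_us> <ceiling_us>
# Hypothetical policy: divide an airtime budget between busy stations,
# clamp it, and print the result in 32 us units.
txop_limit_for() {
	n=$1
	[ "$n" -lt 1 ] && n=1            # treat "no busy stations" as one
	us=$(( $2 / n ))
	[ "$us" -gt "$4" ] && us=$4      # never announce more than the ceiling
	[ "$us" -lt "$3" ] && us=$3      # never announce less than the floor
	echo $(( us / 32 ))
}

txop_limit_for 1 4000 256 1100     # 34 (clamped to the 1.1 ms ceiling)
txop_limit_for 4 4000 256 1100     # 31
txop_limit_for 100 4000 256 1100   # 8  (clamped to the floor)
```

Actually pushing a new value into the beacon at runtime is a separate problem that this sketch does not address.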

Assuming that doesn't actually crash things, verifying that the VI queue is working as designed vis-a-vis the BE queue would be next, then merely reducing wmm_ac_be_txop_limit sanely for BE.

Anyway, we could get a 4-8 fold reduction in latency and jitter by fiddling with these, at some cost in bandwidth that those with a lot of stations would probably find acceptable. I routinely ran with 2.1ms as my outer limit on my old mesh.

amteza · August 3, 2022, 2:58am

@dtaht, I've got a mac80211.sh patched and ready to go. What should be tested, what should be observed? Is this something only for APs? Should STAs be patched too?

dtaht · August 3, 2022, 3:25am

One or both of us should figure out where our OCD meds got to...

The first set of options I described tells the AP or STA to use limited airtime, the second announces in the beacon that the clients should use limited airtime. Since you have the up/down difference, simplest test is 2) a short rrul_be against:

wmm_ac_be_txop_limit=94

(If I was unclear, the first test was to see if changing those parameters actually did anything on the local AP or STA, before trying to command the client to obey them also.)
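
A sketch of how such a run might be scripted, assuming flent is installed on a wifi client and a netperf server is reachable on the wired side. The wrapper name, the host name netperf.example, and the 30 second length are placeholders; -H, -l and -t are flent's host, length and title options.

```shell
# run_rrul_be <server> <seconds> <title>
# Runs the rrul_be test and tags the result with a title so that
# before/after data files are easy to tell apart.
run_rrul_be() {
	flent rrul_be -H "$1" -l "$2" -t "$3"
}

# Example (not executed here):
# run_rrul_be netperf.example 30 "baseline"
# run_rrul_be netperf.example 30 "wmm_ac_be_txop_limit=94"
```

Running once before and once after the change, with nothing else altered, keeps the up/down comparison honest.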

There are two approaches to life: scientifically and rigorously change one variable at a time, or go "all up", changing as much as possible and iterating rapidly backwards from the RUD. Both approaches are valid and needed; the latter got us to the moon in 8 years.

Bonus link: https://longnow.org/essays/richard-feynman-connection-machine/

amteza · August 3, 2022, 4:44am

Me, mostly certain. I promise not to read something that must be comprehended with only two and a half hours of sleep. Let's first try with wmm_ac_be_txop_limit=94. I'll do some testing once my significant other finishes work today.

Ta, @dtaht!

Gingernut · August 3, 2022, 5:11am

Are you modifying these?

		append base_cfg "he_default_pe_duration=4" "$N"
		append base_cfg "he_rts_threshold=1023" "$N"
		append base_cfg "he_mu_edca_qos_info_param_count=0" "$N"
		append base_cfg "he_mu_edca_qos_info_q_ack=0" "$N"
		append base_cfg "he_mu_edca_qos_info_queue_request=0" "$N"
		append base_cfg "he_mu_edca_qos_info_txop_request=0" "$N"
		append base_cfg "he_mu_edca_ac_be_aifsn=8" "$N"
		append base_cfg "he_mu_edca_ac_be_aci=0" "$N"
		append base_cfg "he_mu_edca_ac_be_ecwmin=9" "$N"
		append base_cfg "he_mu_edca_ac_be_ecwmax=10" "$N"
		append base_cfg "he_mu_edca_ac_be_timer=255" "$N"
		append base_cfg "he_mu_edca_ac_bk_aifsn=15" "$N"
		append base_cfg "he_mu_edca_ac_bk_aci=1" "$N"
		append base_cfg "he_mu_edca_ac_bk_ecwmin=9" "$N"
		append base_cfg "he_mu_edca_ac_bk_ecwmax=10" "$N"
		append base_cfg "he_mu_edca_ac_bk_timer=255" "$N"
		append base_cfg "he_mu_edca_ac_vi_ecwmin=5" "$N"
		append base_cfg "he_mu_edca_ac_vi_ecwmax=7" "$N"
		append base_cfg "he_mu_edca_ac_vi_aifsn=5" "$N"
		append base_cfg "he_mu_edca_ac_vi_aci=2" "$N"
		append base_cfg "he_mu_edca_ac_vi_timer=255" "$N"
		append base_cfg "he_mu_edca_ac_vo_aifsn=5" "$N"
		append base_cfg "he_mu_edca_ac_vo_aci=3" "$N"
		append base_cfg "he_mu_edca_ac_vo_ecwmin=5" "$N"
		append base_cfg "he_mu_edca_ac_vo_ecwmax=7" "$N"
		append base_cfg "he_mu_edca_ac_vo_timer=255" "$N"

amteza · August 3, 2022, 5:14am

No, those are for AX connections. It's very simple, let me paste a snippet of code, inside the same function mac80211_hostapd_setup_base():

[...]
		append base_cfg "he_mu_beamformer=1" "$N"
	fi

	# Tweak txop limit
	append base_cfg "wmm_ac_be_txop_limit=94" "$N"	

	hostapd_prepare_device_config "$hostapd_conf_file" nl80211
[...]

To note: this is an example to perform test 2). I didn't test anything yet! So I don't know if it works, but at least you should see the parameter getting added to /var/run/hostapd-phy*.conf.
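
One way to check that the parameter really landed in the generated config is a plain grep over those files. A minimal sketch (the helper name is made up; the path is the one mentioned above):

```shell
# show_hostapd_param <key> <file>...
# Print every line in the given hostapd config files that sets <key>,
# prefixed with the file name.
show_hostapd_param() {
	key=$1
	shift
	grep -H "^${key}=" "$@"
}

# Example (on the router, not executed here):
# show_hostapd_param wmm_ac_be_txop_limit /var/run/hostapd-phy*.conf
```

This only shows what was written to the config file; whether the driver and the clients honour it still has to be verified on the air.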

amteza · August 3, 2022, 6:36am

BTW, reviewing the default parameters in my /var/run/hostapd-phy1.conf, which is my 802.11ac 5 GHz network: tx_queue_data2_burst is 2.0 by default, even lower than the default 3.0 for AC_VI.

Gingernut · August 3, 2022, 6:46am

FWIW a quick search on the web and the tx_queue_data2_burst value is found many times with a value of 0.

amteza · August 3, 2022, 6:48am

Yes, you are correct, that's the default value for AC_BE. Not sure why OpenWrt has it at 2.0; just wondering if someone knows why.

amteza · August 3, 2022, 7:32pm

@dtaht, I was able to run some tests modifying these parameters. It looks like the mt76 chipset is able to use the reduced parameters, and it helps with the balance between upload and download.

dtaht:

The simplest test is to verify that the AP chipset can use some reduced parameters, mimicking the VI queue thusly.
tx_queue_data2_aifs=1
tx_queue_data2_cwmin=7
tx_queue_data2_cwmax=15
tx_queue_data2_burst=3.0 # this is a really low value for modern ac but
This will clobber from the AP performance even more than what we are seeing today, and is undesirable for most apps. (but does it work in x chipsets? What does it look like?)

It clearly has an impact, see test below with above parameters:

To note: up vs down bandwidth utilisation almost swapped, with a reduction of ≈5 ms in average latency.

Click me to download flent data (as usual)

Next test includes previous one plus wmm_ac_be_txop_limit=94:

Click me to download flent data

I can confirm new TXOP limit is pushed to clients:

dtaht · August 3, 2022, 8:02pm

long-standing mystery solved!!! assuming that bump was from the (previously unknown) openwrt txop limit of 2 to 3. Still total throughput of ~600Mbits! It still might not be the "right thing" - goal (for me at least) is to reduce latencies for the multi-station case....

Anyway, another test would be wmm_ac_be_txop_limit=32, which cuts the txop to 1 TU (about 1ms). I'm now unsure how to cut the AP's own txop to the same! I no longer remember what the burst value is supposed to configure, but it's not the same as the txop limit.

amteza · August 3, 2022, 8:21pm

I guess you refer to tx_queue_data2_burst not TXOP limit, right?

See below, my last one for today. Parameters tested:

tx_queue_data2_aifs=1
tx_queue_data2_cwmin=7
tx_queue_data2_cwmax=15
tx_queue_data2_burst=3.0

wmm_ac_be_txop_limit=32

Click me as usual to download

dtaht · August 3, 2022, 9:42pm

up/downs?

It would be good to see if these changes make any difference on the ath10k, too.

amteza · August 3, 2022, 11:31pm

Has to be by tomorrow early AEST, sorry.

dtaht · August 4, 2022, 2:03am

No worries. I'm in PDT, typically working 7-7 with a nap in the middle.

On a down test...

tx_queue_data2_burst=5.0

might make your inner bandwidth junkie happy.

tx_queue_data2_burst=.5 # (hopefully this will round up to TU properly)

Might make the dream of playing a twitch game over wifi less distant. The problem is, I think there is easily another 10-15ms of latency lurking somewhere else in the stack, be it NAPI (ethernet or wifi), AQL, BQL, rx rings, the cpu scheduler, reorder buffer, or what have you...

dtaht · August 4, 2022, 2:21am

@Gingernut No, but I do hope we fool with the ax equivalents. What hardware do you have? Is it possible for you to repeat some of the tests we've been using?

amteza · August 4, 2022, 4:49am

Found when and why; see Felix's patch below:

From 8650201f10afe83387fd6cde00b08172172eeba3 Mon Sep 17 00:00:00 2001
From: Felix Fietkau <nbd@nbd.name>
Date: Wed, 19 Jun 2019 12:32:20 +0200
Subject: [PATCH] mac80211: add config tweak for tx bursting when using VHT

By default, set BE tx queue TXOP limit to 1.0 in the hostapd config
Many vendor drivers are doing similar things to boost throughput.
On MT7612 under ideal conditions, it improves tx throughput from 470 Mbit/s
to about 570 Mbit/s.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 .../kernel/mac80211/files/lib/netifd/wireless/mac80211.sh   | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/package/kernel/mac80211/files/lib/netifd/wireless/mac80211.sh b/package/kernel/mac80211/files/lib/netifd/wireless/mac80211.sh
index 0426cb60f7..6dc4e5bf5f 100644
--- a/package/kernel/mac80211/files/lib/netifd/wireless/mac80211.sh
+++ b/package/kernel/mac80211/files/lib/netifd/wireless/mac80211.sh
@@ -23,6 +23,7 @@ drv_mac80211_init_device_config() {
 
 	config_add_string path phy 'macaddr:macaddr'
 	config_add_string hwmode
+	config_add_string tx_burst
 	config_add_int beacon_int chanbw frag rts
 	config_add_int rxantenna txantenna antenna_gain txpower distance
 	config_add_boolean noscan ht_coex
@@ -97,9 +98,10 @@ mac80211_hostapd_setup_base() {
 	[ "$auto_channel" -gt 0 ] && json_get_values channel_list channels
 
 	json_get_vars noscan ht_coex
-	json_get_values ht_capab_list ht_capab
+	json_get_values ht_capab_list ht_capab tx_burst
 
 	[ -n "$noscan" -a "$noscan" -gt 0 ] && hostapd_noscan=1
+	[ "$tx_burst" = 0 ] && tx_burst=
 
 	ieee80211n=1
 	ht_capab=
@@ -229,6 +231,7 @@ mac80211_hostapd_setup_base() {
 			vht_link_adapt:3 \
 			vht160:2
 
+		set_default tx_burst 2.0
 		append base_cfg "ieee80211ac=1" "$N"
 		vht_cap=0
 		for cap in $(iw phy "$phy" info | awk -F "[()]" '/VHT Capabilities/ { print $2 }'); do
@@ -310,6 +313,7 @@ mac80211_hostapd_setup_base() {
 ${channel:+channel=$channel}
 ${channel_list:+chanlist=$channel_list}
 ${hostapd_noscan:+noscan=1}
+${tx_burst:+tx_queue_data2_burst=$tx_burst}
 $base_cfg
 
 EOF
-- 
2.30.2

At least this means that adding in /etc/config/wireless under config wifi-device something like option tx_burst '5.0' will work. No need for patching anything if this is the only configuration parameter to change. I hope this helps testing it.

Update: There is a bug in how mac80211.sh reads option tx_burst. This line in the patch

+	json_get_values ht_capab_list ht_capab tx_burst

should be

	json_get_values ht_capab_list ht_capab 
+	json_get_vars tx_burst

See PR: https://github.com/openwrt/openwrt/pull/10395
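
Assuming a build where option tx_burst is read correctly (see the bug above), a small helper along these lines would set it from the shell instead of editing /etc/config/wireless by hand. The helper name and the radio0 section name are assumptions; check your own config for the actual wifi-device section.

```shell
# set_tx_burst <wifi-device section> <burst in ms>
# Stores the value in /etc/config/wireless via uci, then reloads wifi
# so that the hostapd config is regenerated.
set_tx_burst() {
	uci set "wireless.$1.tx_burst=$2" &&
	uci commit wireless &&
	wifi reload
}

# Example (on the router, not executed here):
# set_tx_burst radio0 5.0
```

Afterwards the generated hostapd config should contain a matching tx_queue_data2_burst line.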

Gingernut · August 4, 2022, 4:52am

At the moment I have mt76 and ath10k, stock OpenWRT ath10k-ct.

If I get some spare time I will get some tests done on the ath10k platform, no promises though.

P.S. This is focusing on the 5GHz band, correct?

Gingernut · August 4, 2022, 4:56am

Nice find.

What I don't get is that it says to set the BE tx queue TXOP limit to 1.0 but then explicitly sets it to 2.0.

Or at least it looks like that.

amteza · August 4, 2022, 5:00am

Clearly a typo, the 1 key and 2 key are very close.