Reducing multiplexing latencies still further in wifi

The interval should be around 60-70ms for most modern networks.
I regularly run the target at 8ms on good wifi networks. Regrettably, few are good, and there seem to be other problems deep in the mt76 stack, totally unrelated to codel, that are causing jitter today (and that I am totally stumped on).

this is so broken: if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta)

num_sta is the number of stations associated with the network, not the number that are active.
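
To make the complaint concrete, here is a minimal userspace sketch of that branch, using the target/interval/ECN values visible in the patch quoted later in this thread. The struct and function names are made up for illustration, and the threshold value (6000, taken to be kbit/s) and the unit of `thr` are assumptions from memory rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 6000 (kbit/s), recalled from net/mac80211/sta_info.h. */
#define STA_SLOW_THRESHOLD 6000u

/* Hypothetical container for the three values the kernel code sets. */
struct codel_choice {
    unsigned target_ms;
    unsigned interval_ms;
    bool ecn;
};

/*
 * Model of the branch in sta_update_codel_params().
 * thr_kbps: expected throughput of this one station (assumed kbit/s).
 * num_sta:  number of ASSOCIATED stations, active or not.
 */
static struct codel_choice pick_codel_params(uint32_t thr_kbps, uint32_t num_sta)
{
    struct codel_choice slow = { 50, 300, false };
    struct codel_choice fast = { 20, 100, true };

    /* The threshold grows with every associated station, idle ones included. */
    if (thr_kbps && thr_kbps < STA_SLOW_THRESHOLD * num_sta)
        return slow;
    return fast;
}
```

Under these assumptions, a station with an expected throughput of 50 Mbit/s gets 20ms/100ms when it is alone, but is pushed to 50ms/300ms with ECN off as soon as 20 stations are associated, even if the other 19 are idle.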

been trying to rip this out for years.

setting the interval to 10 is only sane if all your traffic is local to your own network.

thank you for the laugh.

(my wifi story started with that, here: http://www.rage.net/wireless/wireless_howto.html)

I would rather like to resurrect that floppy, and that build, and that hardware, to put into a museum somewhere.

I saw your mail to the wireless mailing list, where you explained this in more detail:

https://lore.kernel.org/linux-wireless/CAA93jw6NJ2cmLmMauz0xAgC2MGbBq6n0ZiZzAdkK0u4b+O2yXg@mail.gmail.com/#r

As far as I can see, you still need people to test this? I personally would, but I have yet to compile OpenWrt from source. I am more of a user than a programmer, so I am not sure I would manage it myself. Just sharing this here for anybody who is not yet aware :slight_smile:

I keep hoping @nbd will just slam that patch in mainline openwrt and see if anyone notices.

Thanks a lot! This is really handy for tuning and testing. I also like the dashboard.

I've updated your patches for master:

(note that I did rip out the STA_SLOW_THRESHOLD and changed the defaults to 5ms/50ms)

Also, nl80211.h in master mac80211 has different attribute numbers compared to the one included in iw-5.19, which caused weird behavior when trying to change parameters (target would change interval, interval would try to change ECN, etc.), so I copied the one from mac80211.

Thanks again!

Thank you for making progress here!

Is there any chance you can get this upstream? I can help a little, net-next window opens in a week or so, I do not know what schedule iw is on...

@stintel as this is your work, are you willing to take this upstream or shall I give it a shot? I can help with testing and preparing the patch if you like. I guess we'll have to put the STA_SLOW_THRESHOLD back in, and set the defaults back to 20/100.

So I've been playing around with the codel target today and it became clear to me that at least in my situation (high rates, low number of active clients) it really only has an effect with multiple stations generating load, and even then, it's pretty subtle.

If it's just one station generating load, it's really only hitting the AQL limit and codel won't drop anything. If you add more stations, I see codel start dropping, but even then it seems that AQL does most of the work keeping latency down. It seems that codel only works on the packets that make it through AQL. Fortunately, AQL on its own seems to do a good job.

Anyway, for me changing the AQL TX queue limit makes a far bigger difference in latency under load.

Feel free to give it a shot, I currently have other priorities.

I see that you actively update your sqm-scripts-nss and that you have other additions to your R7800 OpenWrt fork, such as a crash counter script, rc.once and a lot of other tunings in rc.local.

I've been running my own build, which I compiled with a little help from @nihilt; it includes your commits:

git cherry-pick ff6a6f02bb37266a7eec29e5ad8b144ef0c5d8e7
git cherry-pick d6a922c46729f2cda39b9ced4e1145be26e7f4a1
git cherry-pick 6d1e3c04ce3b4d3347860dea4a51af77febeb5f0
git cherry-pick 3129d41625a5f52b6139d6008597fdf9b211dc1a
git cherry-pick 184031503ba449349a63cea45ea6a0c06ffc10f1
git cherry-pick e49d2a62d05e5d85808f0827907bf6bee6368b3b
git cherry-pick b5df6c2b6fa7742c6fb64209634b481f54859098
git cherry-pick 7e2ca95836e14014f854d0a9bc7fbb450acf90b7
git cherry-pick ed168baa783491c9eeefab9b8c65d38235689c7e
git cherry-pick 0162fd32537a5b0f85770d4aef43e85996d2bf25
git cherry-pick 4487ecb473ac2889064bfa0598ea16d9103337d9
git cherry-pick 4734c6de3cd2d2f739bba9fcc7e9b7910381ea65
git cherry-pick b1608fc9b160b6ad47f31123ffb71f6056ecfe9d

and I test sqm-scripts-nss with some optimizations in rc.local.
But I am still trying to fully understand how all the commits work, and I only partially understand the code (even though your comments are really helpful), so I am unsure whether those commits are enough to try all the settings that you use in rc.local. Maybe I need some additional tools added, like ethtool and probably others.
I want to test even more from your settings. I've installed the

too.
Honestly, I don't know what to look for in the data that the dashboard shows. I'm trying and willing to learn new things that can hopefully further improve the wi-fi experience. Any advice from @dtaht would be welcome.
Thank you all here for your work.

I am deeply heads-down in shipping libreqos 1.4, and hope to return to some openwrt wifi stuff in mid april. I kind of hope that at least some of the analytical tools we have developed as part of that might help us, https://github.com/LibreQoE/LibreQoS/wiki/1.3.1-to-1.4-Change-Summary

And hope that a few of you can toss a libreqos box inband to look at stuff this way.

Please keep banging the rocks together?

@moeller0 May I ask you a question about hw_queues in ieee80211? Looking at this output

root@wax620:~# cat /sys/kernel/debug/ieee80211/phy0/netdev:5GHz/hw_queues
AC queues: VO:0 VI:1 BE:2 BK:0
cab queue: 1

Does that mean that VO and BK have no queue?
From your qos_map_set post earlier:

UP      DSCP   AC     PHBs(decDSCP)
Ex0     BE     BE(0)  BE/CS0(0)
Range0  2-16   BE     CS1(8)**, AF11(10), AF12(12), AF13(14), CS2(16)
Range1  1-1    BK     LE(1)
Range2  -
Range3  18-22  BE     AF21(18), AF22(20), AF23(22)
Range4  24-38  VI     CS3(24), AF31(26), AF32(28), AF33(30), CS4(32), AF41(34), AF42(36), AF43(38)
Range5  40-40  VI     CS5(40)
Range6  44-46  VO     VA(44), EF(46)
Range7  48-56  VO     CS6(48), CS7(56)
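
This does not answer the hw_queues question, but for reference the DSCP-to-AC mapping in the table above can be written as a small lookup. This is only a sketch of the table as quoted: the enum, struct and function names are made up for illustration, and returning `AC_UNMAPPED` for DSCP values the table does not list is an assumption of this sketch, not a description of what mac80211 does with them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; AC_UNMAPPED marks DSCPs the table does not cover. */
enum wmm_ac { AC_BK, AC_BE, AC_VI, AC_VO, AC_UNMAPPED };

struct dscp_range {
    uint8_t lo;
    uint8_t hi;
    enum wmm_ac ac;
};

/* One entry per populated row of the table (Range2 is empty). */
static const struct dscp_range dscp_map[] = {
    {  0,  0, AC_BE },  /* Ex0:    BE/CS0 */
    {  1,  1, AC_BK },  /* Range1: LE */
    {  2, 16, AC_BE },  /* Range0: CS1 .. CS2 */
    { 18, 22, AC_BE },  /* Range3: AF21 .. AF23 */
    { 24, 38, AC_VI },  /* Range4: CS3 .. AF43 */
    { 40, 40, AC_VI },  /* Range5: CS5 */
    { 44, 46, AC_VO },  /* Range6: VA, EF */
    { 48, 56, AC_VO },  /* Range7: CS6, CS7 */
};

/* Return the access category the table assigns to a decimal DSCP value. */
static enum wmm_ac dscp_to_ac(uint8_t dscp)
{
    for (size_t i = 0; i < sizeof(dscp_map) / sizeof(dscp_map[0]); i++) {
        if (dscp >= dscp_map[i].lo && dscp <= dscp_map[i].hi)
            return dscp_map[i].ac;
    }
    return AC_UNMAPPED;
}
```

For example, `dscp_to_ac(46)` (EF) yields `AC_VO` and `dscp_to_ac(1)` (LE) yields `AC_BK`, matching the rows above.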

I am sorry, I can not answer your question, as I do not know.

@rickkz0r and @dtaht I stumbled across this yesterday and immediately gravitated toward it as I am keenly interested in improving latency.

I am presently running with this patch, though I know there are debates as to the efficacy of the values therein:

The Patch (Do not blindly copy and use, people)
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -685,8 +685,8 @@ __sta_info_alloc(struct ieee80211_sub_if
 	}

 	sta->cparams.ce_threshold = CODEL_DISABLED_THRESHOLD;
-	sta->cparams.target = MS2TIME(20);
-	sta->cparams.interval = MS2TIME(100);
+	sta->cparams.target = MS2TIME(5);
+	sta->cparams.interval = MS2TIME(50);
 	sta->cparams.ecn = true;
 	sta->cparams.ce_threshold_selector = 0;
 	sta->cparams.ce_threshold_mask = 0;
@@ -2878,15 +2878,7 @@ unsigned long ieee80211_sta_last_active(

 static void sta_update_codel_params(struct sta_info *sta, u32 thr)
 {
-	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
-		sta->cparams.target = MS2TIME(50);
-		sta->cparams.interval = MS2TIME(300);
-		sta->cparams.ecn = false;
-	} else {
-		sta->cparams.target = MS2TIME(20);
-		sta->cparams.interval = MS2TIME(100);
-		sta->cparams.ecn = true;
-	}
+	return;
 }

 void ieee80211_sta_set_expected_throughput(struct ieee80211_sta *pubsta,
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1607,8 +1607,8 @@ int ieee80211_txq_setup_flows(struct iee
 		fq->memory_limit = 4 << 20; /* 4 Mbytes */

 	codel_params_init(&local->cparams);
-	local->cparams.interval = MS2TIME(100);
-	local->cparams.target = MS2TIME(20);
+	local->cparams.interval = MS2TIME(50);
+	local->cparams.target = MS2TIME(5);
 	local->cparams.ecn = true;

 	local->cvars = kcalloc(fq->flows_cnt, sizeof(local->cvars[0]),

@rickkz0r I tried to modify these patches to work with my APs (GL-MT6000 devices) running snapshot builds with the testing kernel (currently 6.6). While I got a clean build, I was not able to get any data back from my get codel calls:

root@AP-Office:~# iw phy0 get codel
root@AP-Office:~# iw phy1 get codel
root@AP-Office:~# iw phy0 get | grep codel
	phy <phyname> get codel
	phy <phyname> set codel [ecn <0|1>] [ interval <time in ms> ] [ target <time in ms> ]

For grins, I tried the set codel command with even less success... :upside_down_face:

root@AP-Office:~# iw phy phy0 set codel target 10
kernel reports: NLA_F_NESTED is missing
command failed: Invalid argument (-22)
root@AP-Office:~# iw phy phy1 set codel target 10
kernel reports: NLA_F_NESTED is missing
command failed: Invalid argument (-22)

I started researching the error and it seems that NLA_F_NESTED has to be passed in newer kernels (5.14+ ??) with each modification due to:

This is needed to make cfg80211 allow the nl80211 command
NL80211_ATTR_TID_CONFIG in the new kernel versions that enforce netlink
attribute policy validation.

(from https://lore.kernel.org/all/20210910141618.1594617-1-gokulkumar792@gmail.com/)
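
For anyone unfamiliar with the flag: NLA_F_NESTED is a bit in the 16-bit type field of a netlink attribute header that marks the attribute as a container for other attributes. The sketch below only shows the bit arithmetic. The flag values mirror linux/netlink.h as far as I know, the helper names are made up, and the attribute number 42 in the usage note is purely hypothetical, not the real codel attribute.

```c
#include <stdint.h>

/* Flag bits in nla_type; values believed to match linux/netlink.h. */
#ifndef NLA_F_NESTED
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       (~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))
#endif

/* Mark an attribute type as carrying nested attributes. */
static uint16_t nla_type_nested(uint16_t attr)
{
    return (uint16_t)(attr | NLA_F_NESTED);
}

/* What a strict receiver would check before parsing a nest. */
static int nla_is_nested(uint16_t nla_type)
{
    return (nla_type & NLA_F_NESTED) != 0;
}

/* Strip the flag bits to recover the plain attribute number. */
static uint16_t nla_base_type(uint16_t nla_type)
{
    return (uint16_t)(nla_type & NLA_TYPE_MASK);
}
```

With a hypothetical attribute number 42, `nla_type_nested(42)` gives 0x802A and `nla_base_type()` turns it back into 42. If the patched iw opens its nest for the codel parameters without OR-ing in this flag, that would be consistent with the "NLA_F_NESTED is missing" rejection above, though I have not confirmed that this is where the error comes from.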

Just curious if you've run into the same and have, perhaps, crossed this bridge already. Thanks!