Reducing multiplexing latencies still further in wifi

I'm testing this to align WMM to CAKE diffserv4:
option iw_qos_map_set '8,1,0,2,10,2,12,2,18,3,20,3,22,3,24,3,26,3,44,4,46,4,48,4,56,4,63,4'

+----------------------+------------------------+--------------------------+
| WiFi Priority Level  | DSCP Values            | Description              |
+----------------------+------------------------+--------------------------+
| Priority 1 (AC_BK)   | 8                      | Background Traffic (CS1) |
+----------------------+------------------------+--------------------------+
| Priority 2 (AC_BE)   | 0, 10, 12              | Best Effort (CS0, AF11,  |
|                      |                        | AF12)                    |
+----------------------+------------------------+--------------------------+
| Priority 3 (AC_VI)   | 18, 20, 22, 24, 26     | Streaming Media (AF21,   |
|                      |                        | AF22, AF23, CS3, AF31)   |
+----------------------+------------------------+--------------------------+
| Priority 4 (AC_VO)   | 44, 46, 48, 56, 63     | Latency Sensitive (VA,   |
|                      |                        | Expedited Forwarding,    |
|                      |                        | CS6, CS7)                |
+----------------------+------------------------+--------------------------+

Our beloved Dave Täht has passed away, leaving a void that can never be filled. Your kindness, wisdom, and passion will forever be remembered. We will always miss you. Rest in peace.

Rest in peace :heart:

Wow, this is very sad news. It was only a few days ago when I was reading some posts from Dave that were posted up on GitHub some years back, right as I was tinkering with some stuff for my home network.

I hope Dave rests in peace, and that his family and close friends give themselves enough time to process the sudden loss of a loved one. Dave Täht will never be forgotten.

Oh my word. :cry: When I first saw this, I was desperately hoping it was a [sick] April Fools trick. Unfortunately it seems that is not so.

I'm shocked. RIP, Dave.

Will be missed

I liked Dave's enthusiasm. I never met him, but he struck a chord with me. I would have liked to say "fare thee well", but when I looked up the meaning of that phrase online, I learned that it can apparently be confused with "to a fare thee well", which seems an even better fit in this context. Here are my findings:

[ ... 'to a fare thee well' ] came to mean "done perfectly". I think it is clear that meaning came from the idea something was done to a point there was nothing left but to say "goodbye". (Sam beat Fred "to a fare thee well", (perfectly beat him), meaning there was nothing left to do but say goodbye.) - J. Taylor, 2018

Except that there is more left to do than to say goodbye.

This proposal of his on the Linux mailing list still hasn't received a single reply or review: the "nuking the mac80211 changing codel parameters" patch. Unless I missed somebody addressing it, some testing, a review and some thoughts on the matter might be the least we could do (or at least those of us with the knowledge and skills, or the willingness to dig into it).

Here is a short excerpt of his proposal:

This is the single, most buggy, piece of code in "my" portion of wifi
today. It is so wrong, yet thus far I cannot get it out of linux or
find an acceptable substitute. It makes it hard to sleep at night
knowing this code has been so wrong... and now in millions , maybe
even 10s of millions, of devices by now.... Since I've been ranting
about the wrongness of this for years, I keep hoping that we can
excise it, especially for wifi6 devices and even more especially on
6ghz spectrum... but just about everything, somehow, would benefit
hugely if we could somehow do more of the right thing here.

Regardless of whether it's true, and whatever the implications of his patch, isn't the wording just a work of art?

Nice post.

Yes, though I'd encourage anyone curious to click on the patch link provided and read it in its entirety.

OpenWrt users could surely do more to test and improve WiFi.

Given that the patch in question has been stale for ~2.5 years by now, it's safe to assume that it won't move anywhere.

Someone who knows what they're talking about would need to rebase/update it, propose it again, and defend the reasoning/viability. The original author can't do it anymore, for obvious reasons, and for one reason or another (maybe there were more important things, maybe it was factually incorrect, maybe other topics took precedence; it doesn't matter for potential merging) didn't feel a need to follow it up over the last 2.5 years. So someone else would need to fully adopt it and restart the discussion.

@tohojo was the original author of that block of code (at least per the commit message). Perhaps he has an opinion on the proposed change and might be the best person to see Dave’s patch through to the end?

Sure, I can rebase and submit it upstream. Did anyone actually test the patch? I glanced through the thread, but it's hard to tell without going through all 130 posts in detail...

As an alternative to nuking, based on the reasoning:

1. The STA_SLOW_THRESHOLD was completely arbitrary in 2016.

	sta->cparams.target = MS2TIME(50);

This, by itself, was probably not too bad. 30ms might have been better, at the time, when we were battling powersave etc, but 20ms was enough, really, to cover most scenarios, even where we had low rate 2Ghz multicast to cope with. Even then, codel has a hard time finding any sane drop rate at all, with a target this high.

	sta->cparams.interval = MS2TIME(300);

But this was horrible, a total mistake, that is leading to codel being completely ineffective in almost any scenario on clients or APs. 100ms, even 80ms, here, would be vastly better than this insanity. I'm seeing 5+ seconds of delay accumulated in a bunch of otherwise happily fq-ing APs....100ms of observed jitter during a flow is enough. Certainly (in 2016) there were interactions with powersave that I did not understand, and still don't, but if you are transmitting in the first place, powersave shouldn't be a problemmmm.....

In production, on p2p wireless, I've had 8ms and 80ms for target and
interval for years now, and it works great.

I wonder whether another option to consider might simply have been to change the slow-station target and interval values from 50ms and 300ms to 20ms and 80ms:

	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
		/* "slow" station: 20ms target, 80ms interval, no ECN */
		sta->cparams.target = MS2TIME(20);
		sta->cparams.interval = MS2TIME(80);
		sta->cparams.ecn = false;
	} else {
		/* everyone else: the existing 20ms/100ms with ECN */
		sta->cparams.target = MS2TIME(20);
		sta->cparams.interval = MS2TIME(100);
		sta->cparams.ecn = true;
	}

Regarding:

static void sta_update_codel_params(struct sta_info *sta, u32 thr)
{
	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {

1. sta->local->num_sta is the number of associated, rather than active, stations. "Active" stations in the last 50ms or so, might have been a better thing to use, but as most people have far more than that associated, we end up with really lousy codel parameters, all the time. Mistake numero uno!

Would there be an easy way to limit this to 'stations active in the last 50ms'?

If this one can be considered the same commit then yes, I compile from that @asvio repo and use the build on several routers. It looks like it is the same code to me but I may be wrong.

How does it perform?

@_FailSafe ran with this patch for a while, IIRC. A modified version made it into @pesa1234's community build.

Spot-on! Not to further muddy the waters, but I had also been testing the patch in this link for a while. It was more in tune with the sentiment of Dave's "nuking the mac80211 changing codel parameters" patch. However, instead of his suggested 8ms/80ms target and interval, I chose 5ms/50ms for my testing.

I think 8ms/80ms, as Dave suggested, is probably the safer bet. That said, the 5ms/100ms that @pesa1234 has had in his build for quite a while now has "a lot of miles" of testing on it, and I haven't heard any complaints about the target and interval he settled on.

Alright, submitted a revert upstream: https://lore.kernel.org/r/20250403183930.197716-1-toke@toke.dk

Thank you! BTW, I noticed this in your revert:

Suggested-By: Dave Taht <dave.taht@gmail.com>
In-memory-of: Dave Taht <dave.taht@gmail.com>

Here's to Dave :clinking_beer_mugs::pleading_face:

Well, my approach is far from scientific enough to draw any significant conclusions, and all I can say for sure at this moment is that I haven't found any shortcomings or issues running the wifi with that patch.
I don't see any significant drop in throughput compared to running with AQL set to 1500.
But my observations may be wrong, and I didn't do any extensive testing.
I've just followed the suggestions made here on the forum by reputable people like the honored pioneer @dtaht (RIP).

Same code, I've been using it for years now in my routers.