Reducing multiplexing latencies still further in wifi

This is only active in the first round; in the last round, with tx_queue_data2_burst=5.0 and 0.5, those 3 parameters are left at their defaults in the BE queue, so the tests should be valid.

Testing BE vs VI... I knew I was forgetting something; I'll see what I can do.

That's what I was expecting too... oh, well, I'm in no hurry at all :wink:

rrul (not rrul_be), at least, used to exercise the BK, BE, and VI queues.

Arguably rrulv2, if we ever get around to it, will exercise all four.

There's a pic in the above preso of what that used to look like in the ath9k before all the new stuff landed. Even after it landed... it was still pretty miserable.

#% flent markings seem to be full decimal TOS byte values
#% conversion: TOS(dec) = DSCP(dec) * 4
#dscp(dec): EF:46 -> 184
#markings=CS0,CS1,CS2,CS3,CS4,CS5,CS6,CS7
#markings=0,32,64,96,128,160,192,224


date ; ping -c 10 netperf-eu.bufferbloat.net ; ./run-flent --ipv4 -l 300 -H netperf-eu.bufferbloat.net rrul_var --remote-metadata=root@192.168.42.1 --test-parameter=cpu_stats_hosts=root@192.168.42.1 --step-size=.05 --socket-stats --test-parameter bidir_streams=8 --test-parameter markings=0,32,64,96,128,160,192,224 --test-parameter ping_hosts=1.1.1.1 -D . -t IPv4_SQM_cake_layer-cake_LLA-ETH_OH34_U097pct34500of35483K-D090pct105000of116797K_work-horse-eth0_2_TurrisOmnia-TurrisOS.5.7.2-pppoe-wan-eth2.7_2_bridged-BTHH5A-OpenWrt-r17498-07203cb253-Hvt-VDSL100_2_netperf-eu.bufferbloat.net --log-file

rrul_var to the rescue: just define the number of flows and which DSCPs to use. See above for all 8 class selectors; just pick which DSCPs you want to include...

hostapd: add wmm qos map set by default

author    Felix Fietkau <nbd@nbd.name>    Wed, 3 Nov 2021 22:40:53 +0100
committer Felix Fietkau <nbd@nbd.name>    Wed, 3 Nov 2021 22:47:55 +0100
commit    a5e3def1822431ef6436cb493df77006dbacafd6
parent    b14f0628499142a718a68be7d1a7243f7f51ef0a

This implements the mapping recommendations from RFC8325, with an
update from RFC8622. This ensures that DSCP marked packets are properly
sorted into WMM classes.
The map can be disabled by setting iw_qos_map_set to something invalid
like 'none'

Signed-off-by: Felix Fietkau <nbd@nbd.name>

Which introduces the following new RFC 8325-inspired DSCP-to-AC mappings:
set_default iw_qos_map_set 0,0,2,16,1,1,255,255,18,22,24,38,40,40,44,46,48,56

Which translates into the following mappings (according to the hostapd rules below*):

Unraveling this gets us to the following (DSCP 0 is coded as a DSCP exception, the rest as DSCP ranges):

| UP | DSCP | AC | PHBs (dec DSCP) |
|---|---|---|---|
| Exception | 0 | BE | BE/CS0 (0) |
| Range 0 | 2-16 | BE | CS1 (8)**, AF11 (10), AF12 (12), AF13 (14), CS2 (16) |
| Range 1 | 1-1 | BK | LE (1) |
| Range 2 | - | - | - |
| Range 3 | 18-22 | BE | AF21 (18), AF22 (20), AF23 (22) |
| Range 4 | 24-38 | VI | CS3 (24), AF31 (26), AF32 (28), AF33 (30), CS4 (32), AF41 (34), AF42 (36), AF43 (38) |
| Range 5 | 40-40 | VI | CS5 (40) |
| Range 6 | 44-46 | VO | VA (44), EF (46) |
| Range 7 | 48-56 | VO | CS6 (48), CS7 (56) |
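
To make the unraveling concrete, here's a minimal shell sketch (my own illustration, not the hostapd code; the dscp_to_up helper name is hypothetical) that resolves a decimal DSCP against that default map:

```sh
#!/bin/sh
# Illustrative only: look up the UP for a decimal DSCP against the default map
# 0,0,2,16,1,1,255,255,18,22,24,38,40,40,44,46,48,56
dscp_to_up() {
    dscp=$1
    # one DSCP exception pair: DSCP 0 -> UP 0
    [ "$dscp" -eq 0 ] && { echo "UP 0"; return; }
    # eight (low,high) DSCP ranges, one per UP 0..7; 255,255 marks that UP as unused
    up=0
    for range in "2 16" "1 1" "255 255" "18 22" "24 38" "40 40" "44 46" "48 56"; do
        low=${range% *}; high=${range#* }
        if [ "$dscp" -ge "$low" ] && [ "$dscp" -le "$high" ]; then
            echo "UP $up"; return
        fi
        up=$((up + 1))
    done
    echo "no match"
}

dscp_to_up 8    # CS1 -> UP 0 (AC_BE)
dscp_to_up 40   # CS5 -> UP 5 (AC_VI)
dscp_to_up 46   # EF  -> UP 6 (AC_VO)
```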

So e.g. 4,4,0,0,160,160,184,184 (LE,LE,BE,BE,CS5,CS5,EF,EF) should put two flows into each AC of current OpenWrt...

date ; ping -c 10 netperf-eu.bufferbloat.net ; ./run-flent --ipv4 -l 300 -H netperf-eu.bufferbloat.net rrul_var --remote-metadata=root@192.168.42.1 --test-parameter=cpu_stats_hosts=root@192.168.42.1 --step-size=.05 --socket-stats --test-parameter bidir_streams=8 --test-parameter markings=4,4,0,0,160,160,184,184 --test-parameter ping_hosts=1.1.1.1 -D . -t IPv4_SQM_cake_layer-cake_LLA-ETH_OH34_U097pct34500of35483K-D090pct105000of116797K_work-horse-eth0_2_TurrisOmnia-TurrisOS.5.7.2-pppoe-wan-eth2.7_2_bridged-BTHH5A-OpenWrt-r17498-07203cb253-Hvt-VDSL100_2_netperf-eu.bufferbloat.net --log-file

Thanks heaps for taking the time to post this, mate. I'll use it in my next tests, tomorrow, tho'.

@dtaht

Next round of tests, I hope these are useful.

Test rrul, AC_BE all default:

Test rrul, AC_VI all default:

Test rrul, AC_BE and AC_VI, all default:

Test rrul, AC_BE and AC_VI, tx_burst=5.0:

Test rrul, AC_BE and AC_VI, tx_burst=0.5:

Test rrul, AC_BE and AC_VI, BE parameters equal to VI:

Test rrul, AC_BE and AC_VI, BE parameters equal to VI and BE TXOP=94:

  • Please find all the data for this 2nd round by clicking here

Updated: I mistakenly used the DSCP value 40 in place of the TOS value 160.
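
For reference, a hedged guess at how those variants might map onto hostapd.conf knobs; these are not necessarily the exact settings used above, and the wmm_ac_be_* values shown are simply hostapd's documented AC_VI defaults copied into AC_BE:

```
# sketch of the test variants above (not the poster's exact config)
tx_queue_data2_burst=5.0     # the "tx_burst=5.0" run; 0.5 for the other run (data2 = the AP's own BE tx queue)
# "BE parameters equal to VI": copy the WMM AC_VI defaults into AC_BE
wmm_ac_be_aifs=2
wmm_ac_be_cwmin=3
wmm_ac_be_cwmax=4
# "... and BE TXOP=94" variant (94 * 32 microseconds, roughly 3 ms)
wmm_ac_be_txop_limit=94
```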

I do not think 40 is AC_VI. As far as I understand, flent accepts and prints TOS values, so 40 would be TOS 40, i.e. DSCP 40/4 = 10, which still maps to AC_BE... However, flent might print decimal DSCP values while requiring decimal TOS values for configuration, so you might have done the right thing and I am just confused...

But the fact that all flows get the same throughput indicates that the marking might not be as intended.

(Why is TOS = DSCP * 4? Because this is essentially a shift by two bits to get from the 6-bit DSCP to the 8-bit TOS byte, with the two ECN bits zero by default.)
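
If it helps, a tiny sketch of that conversion for building flent markings strings (the dscp_to_tos helper name is just mine):

```sh
# TOS(dec) = DSCP(dec) * 4, i.e. the 6-bit DSCP shifted left past the two ECN bits
dscp_to_tos() { echo $(( $1 * 4 )); }

dscp_to_tos 1    # LE  -> 4
dscp_to_tos 40   # CS5 -> 160
dscp_to_tos 46   # EF  -> 184
```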


You are right! The lack of sleep is clearly affecting me. Going to redo them. Argh.

Update: it should be fixed now. I'm going to go mind my Saturday and have another coffee to see if I can wake up properly!


Thank you, especially for showing how badly the BE queue performs under contention vs VI. BK ought to be worse! There is a lot of traffic mismarked as CS1 out there, in the vain hope that background actually means what L3 protocol designers meant by background; 4 seconds of delay with only one station on the case is well beyond what we meant by background. Trying to spit TCP through there, which has typical timeouts at essentially 250ms, 1s, and 2s, means we end up sending more packets in a somewhat futile manner. If we could force upon application designers the idea that the BK queue might be delayed by tens of seconds, and restrict its usage to just those apps, that would be great.

Back when we were thinking about 802.11e, the problem as seen then (2003!) was that VOIP really, really, really wanted a 10ms interval (now it's 20ms), we didn't have good jitter buffers, and ulaw and gsm encodings were the law of the land. So a limited number of VOIP phones on an AP worked better: ship it! (And again, this was a client option at the time, not so much an AP one. The APs were supposed to figure out how to schedule responses, and many (enterprise) APs actually did do some of the right things here...)

VI ended up as a bucket for where videoconferencing was to go. It seemed to make sense... to some...

But the complexities of 802.11e's bus arbitration don't make a lot of sense, period, IMHO.

After 802.11n showed up with aggregation, which was vastly superior in terms of fitting packets into a txop (if you managed the queues right)... and Atheros sold out to Qualcomm... most of the detailed AP knowledge began to fade from the field.

I turned off mappings via qos-map almost entirely (EF-only) years ago, and have in general not looked back. WMM is required to work in order to pass the Wi-Fi Alliance's tests! And thus it's on by default for nearly everybody else still, and the effect on real traffic, well... I'm in general thankful that so few applications have tried to use it to date. Used carefully from certain kinds of STAs it still seems to be a decent idea. Note "carefully". There are a few wifi joystick/game controllers that use VI or VO...

Despite my opinion, I never got sufficient data from real-world usage to convince enough people I was right.

With enough data, perhaps we can convince the OpenWrt folk to obsolete the qos-map into the BK queue, at least. The VI queue isn't looking all that good either. Scheduling smartly, and intelligently reducing txop size under contention, seemed the best strategy to me (in 2016).

I've sometimes hoped we could find another use for the 4 hardware queues. Or that they would work better in a mu-mimo situation. I keep hoping we find a benchmark that shows a demonstrable benefit for some form of real world traffic for the VI and VO queues for some generation of wifi.

I should probably also note that there are all sorts of other possible sources for the 4-second spikes on ICMP seen here...


Hence my principled objection against the harebrained idea of making "NQB" inhabit AC_VI... clearly nobody in the IETF WG bothers to look at actual data...

I thought 802.11e was finalized in 2005?

You convinced me; am I not enough people? :wink:

Oh I think there is, but it requires that you have <= 4 different levels of priority traffic and are willing to accept that higher priority, if not rate-limited sufficiently*, will severely choke lower priority traffic.

*) not that rate limiting on a variable-rate link like WiFi is conceptually all that "simple" in the first place

I'd worked on wifi from 1998 to 2005 - http://the-edge.blogspot.com/2010/10/who-invented-embedded-linux-based.html - as well as on various VOIP products like Asterisk and the Sofia SIP stack. I tapered off after 2005. So I was aware that what became 802.11e was kind of a brain-damaged idea, except for voip. I didn't really grok the real damage of 802.11n packet aggregation until 2012? 2013? All I really understood was that sometime around 2008 or so, wondershaper had stopped working worth a darn. Looking back in history (now), txqueuelens had grown to 1000 packets and GSO and GRO had become a thing, and nobody else had noticed either (and I was still doing things like SFQ by default and Vegas, not realizing nobody else was doing that; I didn't get out much), until I believed Jim enough to repeat his experiments in 2010? 2011?

I didn't get how big the problem was for everyone, either. I just thought it was my tin cans and string connecting me to the internet.

Anyway, a little more data on VI vs BE: just a BE flow competing with a high-rate irtt -i3ms --dscp 160 (we really need a test that integrates that sort of thing directly into flent), plotting irtt loss and marks...

My hope was that a test downloading via the VI queue exclusively would have, oh, no more than 4-8ms observed latency on this chipset. 20ms seems really excessive, and must be coming from ... AQL? the hardware? don't know.


A great deal of the testing I'd wanted to do on this thread took place over here: AQL and the ath10k is *lovely* - #859 by dtaht

I'd prefer to try and close out the AQL and ath10k discussion over there and move it to here.

So, @dtaht, what feedback do you have about that ath10k bug?

My ath10k is in a storage unit 200 miles from here, as are the remains of my lab. On my little boat I am using an ath9k/LTE device, and recently picked up a Starlink. I'm tempted to hack into the Starlink and fix it ( https://www.youtube.com/watch?v=c9gLo6Xrwgw ). Anyway, the best I can do is help analyze tests, at the moment, until I find a cheap place to have a lab on land... or get a bigger boat.

Yeah, all right, I interpreted your post incorrectly. I'll test rrul_be later. I reckon it was mostly fine. Most of my "toys" are 18,000 km from here too. :wink:

@dtaht, see below a quick rrul_be test with the new topology, as promised in the ath10k thread (this is VHT80).

Any future tests will be done on HT20. I think it will help with connection stability, right?
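
(For reference, switching would be something like this on OpenWrt; a sketch, assuming UCI and that radio0 is the radio in question:)

```sh
uci set wireless.radio0.htmode='HT20'   # the test above was run at VHT80
uci commit wireless
wifi reload
```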


Really lovely. 4x1 bandwidth disparity. I'm really puzzled by this in general. I felt, after looking over 802.11ac and later in 2015, that ack-filtering was going to be needed, but never got around to it: https://github.com/dtaht/sch_cake/blob/master/sch_cake.c#L1254

Generic rrul: if it blows up, you can fix it by quashing the qos_map. I have limited joy in seeing it blow up, but...

Do you mean by this porting it from cake to fq_codel?

I'll do it a bit later during lunchtime, as my network is under heavy use right now. By the way, I'm using the following qos_map_set in my network, i.e., re-mapping UP 6 and UP 7 to UP 5:

option iw_qos_map_set '1,1,8,1,18,3,20,3,22,3,24,4,26,4,28,4,30,4,32,4,34,4,36,4,38,4,40,5,44,5,46,5,48,5,56,5,0,63,255,255,255,255,255,255,255,255,255,255,255,255,255,255'
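
For completeness, that option lives in the AP's wifi-iface section; a minimal sketch, assuming UCI and that the section is named default_radio0:

```sh
uci set wireless.default_radio0.iw_qos_map_set='1,1,8,1,18,3,20,3,22,3,24,4,26,4,28,4,30,4,32,4,34,4,36,4,38,4,40,5,44,5,46,5,48,5,56,5,0,63,255,255,255,255,255,255,255,255,255,255,255,255,255,255'
uci commit wireless
wifi reload
```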

cake was in parallel development with fq_codel. The intent was to try out some ideas in cake first, and then compare with fq_codel, and merge the best of them. Over time cake grew to eat wayyyy too much cpu to want to port over to the wifi stack, and fq_codel runs faster on gigE and higher interfaces.

So it may be we go nuts and try to port most of cake over to the wifi stack (which might solve the 802.11e problems), or pieces of it, but until the last 6 months, most of our efforts were directed at very different stuff, like the L4S vs SCE fight in the IETF. In my case, politics, outreach, and MikroTik, Apple, and now LibreQoS eat most of my time.

the ack-filter port would be so much easier if the fq_codel implementation hadn't grown hairy include files.

Another thing we needed in wifi was the drop-batch facility that's in the main qdisc... or cobalt... and increasingly GSO splitting seems sane. I'd come up with an fq_codel that was saner in a couple respects over here:

https://lists.bufferbloat.net/pipermail/cake/2018-September/004345.html

before getting sucked into the sce thing.

Other wifi factors besides ack-filtering dominate; as we've discussed, dealing with powersave, multicast, RSSI, and buggy drivers has made up most of the real latency- and jitter-inducing problems we've faced.


Just want to say thank you all for all the effort you guys put into this. Upgraded my access point and router to 22.03 and everything seems to be performing even better than before.
