RFC9330 / RFC9331 / RFC9332 for lower latency

Mainly a note to self.

Aims to eliminate bufferbloat; good for video chat. L4S is an evolution of the existing ECN mechanism.

Something for the kernel?
See also https://github.com/L4STeam/linux
Earlier ECN discussion here: Effect of "Set tcp_ecn to off" on ECN


Any experts who know about this: will the (OpenWrt) kernel get Accurate ECN (sysctl -w net.ipv4.tcp_ecn=3) or Prague (sysctl -w net.ipv4.tcp_congestion_control=prague)?
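Before flipping those sysctls it is worth checking whether the kernel even offers the Prague module; on a stock (non-L4STeam) kernel it will normally be absent. A small sketch of that check from userspace (the helper function name is my own invention):

```python
# Check whether a TCP congestion-control module (e.g. "prague") is available.
# On Linux, /proc/sys/net/ipv4/tcp_available_congestion_control lists the
# modules the kernel currently offers, space-separated.

def cc_available(name: str, available: str) -> bool:
    """`available` is the contents of tcp_available_congestion_control."""
    return name in available.split()

if __name__ == "__main__":
    try:
        with open("/proc/sys/net/ipv4/tcp_available_congestion_control") as f:
            avail = f.read()
    except OSError:           # not on Linux, or /proc unavailable
        avail = "reno cubic"  # a typical stock default, for illustration
    print("prague available:", cc_available("prague", avail))
```

If "prague" is not listed, setting net.ipv4.tcp_congestion_control=prague will simply fail, so this is a cheap sanity check before experimenting.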

Possible to use L4S on OpenWRT?

Today, no.

In the future, quite likely. I don't think it's been merged into the kernel yet.

Pretty interested in all this, frankly. Has anyone tried to introduce the changes from here into the kernel for OpenWrt?

@dtaht might have an idea where the kernel is with these.

DOCSIS modem makers got together and implemented necessary changes a few years ago apparently. https://www.cablelabs.com/blog/l4s-interop-lays-groundwork-for-10g-metaverse

Apple says they're ready (iOS 17 and macOS Sonoma): https://developer.apple.com/videos/play/wwdc2023/10004

But they point to the above git repo "if you use TCP". I did not see a whole lot in the kernel mailing lists.

Comcast has ongoing tests https://datatracker.ietf.org/meeting/118/materials/slides-118-tsvwg-sessa-61-l4s-experience-00 until Mar 2024. You can still sign up if they're your provider.

See also jlivingood deployment repo https://github.com/jlivingood/IETF-L4S-Deployment

Basically, a router needs to set the ECN congestion mark on a packet when that packet starts to queue; deciding when to mark is the AQM's job (DualPI2 in the L4S architecture). The sender-side response to those marks is TCP Prague, I think. Read the summary here.
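The marking side can be sketched as a simple step AQM: once a packet's queueing delay exceeds a threshold, rewrite its ECN field to CE instead of dropping it. This is only a toy model with invented names; the real L4S AQM (DualPI2, RFC 9332) is a coupled dual-queue PI2 controller, not a bare step:

```python
# Toy step-marking AQM: mark CE once queueing delay exceeds a threshold.
# Only a sketch of the "mark instead of drop" idea, not DualPI2 itself.

CE = 0b11           # ECN field: Congestion Experienced
ECT1 = 0b01         # ECN field: ECT(1), the L4S identifier (RFC 9331)
THRESHOLD_MS = 1.0  # L4S aims for roughly 1 ms of queueing delay

def on_dequeue(ecn_bits: int, queue_delay_ms: float) -> int:
    """Return the (possibly rewritten) ECN field for this packet."""
    if queue_delay_ms > THRESHOLD_MS and ecn_bits == ECT1:
        return CE  # signal congestion without dropping
    return ecn_bits

print(on_dequeue(ECT1, 2.5))  # delayed ECT(1) packet gets marked: 3
print(on_dequeue(ECT1, 0.3))  # under threshold, left alone: 1
```

Note that only ECT(1) packets are marked here; non-L4S traffic would go through the classic queue with its own (drop-based or classic-ECN) treatment.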

Had some discussion on this today as well, starting here:

I'm going to set this right here for edification purposes (link courtesy of @dtaht):

That's a bit to digest, but I don't see any problems. All of those graphs tell me that Prague is working as intended. You can see that Prague is superior. Some points to note: QUIC traffic volume is gradually replacing TCP. UDP is improved by Prague. Carrier and transit networks seem largely ready for ECN and Prague.

Whoa, that seems like a rather quick assessment. Prague really is heavily based on DCTCP, so it is expected to work equally well in non-adversarial scenarios. However, DCTCP is not suitable for use over the existing internet: in FIFO bottlenecks it gets clobbered by non-DCTCP traffic, and in DCTCP-style ECN-marking bottlenecks it clobbers non-DCTCP traffic. So literally the main effort of L4S is making DCTCP fit for the internet by removing those two clobbering conditions, and the data shows it does a lackluster job at both.
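The clobbering asymmetry follows directly from the two window responses: a classic sender halves its window on any congestion signal, while a DCTCP/Prague-style sender backs off in proportion to the fraction of packets marked in the last RTT. In a shallow-threshold marking queue nearly every packet gets marked, so the halving flow starves; in a FIFO the gentle backoff plays by different rules than loss-based flows. A toy comparison of the two responses (simplified; DCTCP's alpha is really an EWMA per RFC 8257):

```python
# Compare per-RTT window responses: classic (Reno-style) halving vs a
# DCTCP/Prague-style proportional backoff. `alpha` stands in for DCTCP's
# moving average of the fraction of packets marked in the last RTT.

def reno_backoff(cwnd: float) -> float:
    return cwnd / 2.0                  # halve on any congestion signal

def dctcp_backoff(cwnd: float, alpha: float) -> float:
    return cwnd * (1.0 - alpha / 2.0)  # shrink in proportion to marking

cwnd = 100.0
# With a light 10% marking fraction the DCTCP-style flow barely yields,
# while a classic flow halves on the very same signal:
print(reno_backoff(cwnd))        # halves: 50.0
print(dctcp_backoff(cwnd, 0.1))  # barely yields (~95)
```

Only at alpha = 1 (every packet marked) does the DCTCP-style response reduce to the classic halving, which is why the two families need to be segregated or coupled by the bottleneck AQM.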

EDIT: remove unsolicited grumpiness


I am deeply scarred by the L4S battle, which unfortunately included far more Orwellian language and far less science than I would have preferred. In general I decided to wait until the rubber met the road, to where the proponents were actually attempting implementations on real hardware, which is actually, finally, happening. The early public reports are all rather biased towards positive results, and they have met a few bitter realities that the RED (SCE) team pointed out along the way. One of the big ones was that in a software implementation, Linux is far too batchy for anything less than about a 2 ms mark rate.

I've been asked quite a few times in the last few months why cake does not support L4S. I explain that it is a very simple patch (boosting all ECN-marked packets into the VI queue, recognizing NQB as well, and marking ECT(1) properly), and anyone is welcome to write one. But with none of the L4S stack (TCP Prague and DualPI2) submitted to the Linux kernel, despite 6+ years of "development", I have not felt compelled to work on their behalf for free to support their vision of how the internet should work. I HAVE expressed willingness to do the work, so long as the testing was unbiased and fair, and heard nothing but crickets in response. I laid out a large list of features cake could use in the CAKEMQ project, and have been seeking funding for it for months; principal among them is scaling better to multiple cores, which is something all qdiscs could use. But I am actually willing to implement and support all the L4S features (including the strict-priority footgun) if someone pays me for it, and to also participate in the transport-protocol development side, which is what actually needs the most work.
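The classification half of such a patch is indeed simple: ECT(1) is the low two bits of the TOS/traffic-class byte equal to 0b01 (RFC 9331), and NQB's recommended DSCP is 45 in the upper six bits. A sketch of the two checks (function names are my own, not from cake):

```python
# Classify packets the way an L4S-aware cake patch would need to:
# ECT(1) in the ECN field identifies L4S traffic (RFC 9331), and DSCP 45
# is the recommended NQB codepoint. Names here are illustrative only.

ECN_MASK = 0b11   # low two bits of the IPv4 TOS / IPv6 TC byte
ECT1 = 0b01
NQB_DSCP = 45     # DSCP occupies the high six bits of the byte

def is_l4s(tos: int) -> bool:
    return tos & ECN_MASK == ECT1

def is_nqb(tos: int) -> bool:
    return tos >> 2 == NQB_DSCP

tos = (NQB_DSCP << 2) | ECT1       # an NQB-marked, ECT(1) packet
print(is_l4s(tos), is_nqb(tos))    # True True
```

The hard part, as the rest of the post argues, is not this classification but what the scheduler and the transports then do with it.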

The DualPI2 code for Linux that I reviewed a few years back is fatally flawed in multiple ways that, so far as I know, remain unaddressed:

A) It did not work with GSO properly.
B) It did not scale correctly across multiple cores in any way.
C) It did not contain the infrastructural changes needed to distinguish the L4S bit from the other ECN bits across the stack and all the other drivers.
D) It has a packet limit, rather than a byte limit, which is too small for 10 Gbit or more.
E) TCP Prague's DCTCP-style backoff cannot possibly work with wifi, or indeed most wireless transports.
F) It had enormous convergence problems with multiple flows in slow start.
G) It is DoSed by ping -f -Q 1, and despite specs for them, most of the protections designed against that are not public.
H) It is horribly RTT-unfair (where cake and fq_codel are nearly perfect).

I regarded all these flaws as fatal at the time, and I am somewhat perversely watching attempts to make it work with some wifi chips in the hope that they do come up with something that works.

Additionally, PIE is so seriously outperformed by fq_codel or cake for normal traffic in nearly any scenario (even according to their own benchmarks from 2013) as to make L4S's priority queue look much better than it is by comparison. I have yet to see a head-to-head test of cake diffserv4 vs L4S + NQB. I also keep hoping for a head-to-head test across a variety of traffic with the fq_codel patches (in Linux 6.1) that extended the ce_threshold parameter to support the L4S mark.

To be fair, if L4S deploys, it will have PIE enabled for normal traffic as well, which will be an improvement. I had hoped that DOCSIS-PIE would have become universal by now; so far only Cox and Comcast have deployed it.

I regard the principal benefit of L4S (and the associated NQB) as actually being an underlying DOCSIS-4-LL feature that improves the request/grant cycle DOCSIS has from about 10 ms to about 1 ms, making it almost competitive with PON. Honestly, fixing the r/g cycle in DOCSIS is a great feature! In general, given the bandwidth surplus DOCSIS-4 users will soon have, not differentiating traffic and just giving everyone that feature, without L4S being needed, would be ideal.

IF I have any vision of a positive outcome, it would be BBRv3 (which treats marks as advisory) plus the 6.1 fq_codel ce_threshold extension being tested, and then applied to wifi. A little bit of marking of any sort, combined with a delay-based metric, would be the most backward-compatible and useful way forward.
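A delay-based response of the kind hinted at here can be sketched Vegas-style: estimate queueing delay as RTT minus the minimum RTT, grow while it stays under a target, and back off gently when it does not, treating an occasional CE mark as a hint rather than a halving trigger. This is entirely illustrative (it is not BBRv3, and all the parameters are invented):

```python
# Toy delay-based controller: treat CE marks as advisory alongside a
# queueing-delay signal. Parameters and names are illustrative only.

def next_cwnd(cwnd: float, rtt_ms: float, min_rtt_ms: float,
              ce_marked: bool, target_ms: float = 2.0,
              gentle: float = 0.9) -> float:
    """Grow while estimated queueing delay stays under target;
    shrink gently on excess delay or a CE mark."""
    queue_delay = rtt_ms - min_rtt_ms
    if queue_delay > target_ms or ce_marked:
        return cwnd * gentle   # mild multiplicative decrease, not a halving
    return cwnd + 1.0          # additive increase per RTT

print(next_cwnd(100.0, 10.0, 9.5, False))  # low delay, no mark: 101.0
print(next_cwnd(100.0, 15.0, 9.5, False))  # delay over target: shrinks
```

The point of the sketch is backward compatibility: a sender like this coexists with drop-based flows (it never needs a special codepoint) while still benefiting from any marking the bottleneck provides.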

Given the top-down push to make this pig fly, the best I can do is keep pointing people at benchmarks that demonstrate the flaws of the DualPI2+Prague approach, and hope that enough engineers with taste exist to run screaming from it. I would also very much like L4S enthusiasts to be testing fq_codel and cake at the same time, but as I said, I am unwilling to pay much attention until the patches hit the Linux kernel.

My biggest dream was that some rebellious DOCSIS vendor would actually implement and ship CAKE.

For everyone else, there's LibreQoS. I LIKE LibreQoS in that, if L4S actually becomes a thing, we are in a position to develop and push an L4S patch in a matter of days, versus years and years for embedded, even OpenWrt.


They ended up recommending disabling these kinds of offloads pretty thoroughly in their GitHub wiki, IIRC, ignoring that at high rates these offloads make a ton of sense...

Fair criticism, but that is inherited from the Linux qdisc design and affects all/most qdiscs, no?

Sad fact: Windows' ping stopped honoring specific DSCP/TOS requests...


Well, I always thought that this truly is the relevant part of the Low Latency DOCSIS spec; the rest is mostly window dressing...

However this paper lists some XGS-PON and DOCSIS latency data (LLD and "normal" docsis):


Note: while the table header says GPON, the paper says XGS-PON; I think we should trust the rest of the paper here. Also, the relevant lines are S9_UDP and SU_TCP, the scenarios in which traffic is artificially clamped below segment capacity (I am not sure about the rationale for testing five below-load scenarios, but that is what they did). The DOCSIS data was not measured with the same set-up (occasionally real-life constraints ruin the best-planned experiment; I am glad they included DOCSIS data at all). Anyway, the gist of the whole thing is that LLD helps a bit in reducing the latency gap relative to XGS-PON, but does not close it fully... (The TCP test with AQM, and with LLD (which implies AQM, as far as I understand LLD), shows that without an L4S protocol the LLD scheduler/AQM really operates similarly to the old D3.1 AQM, with 2.5-3 times worse P99 latency compared to XGS-PON, which also operates a request/grant mechanism.)
Anyway, what this shows, IMHO, is that LLD is not bad, but it is clearly well above L4S' 1 millisecond queueing-delay goal... I wonder where the DOCSIS L4S latency target will end up being set...
