CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

DigitalLabs · October 15, 2022, 5:35am

Sorry to jump in, but I saw the mention of hardware and I'm having a bit of grief with x86.

After jumping from an AC2100 with a MediaTek MT7621, I attempted to see what I could get with an i3 7100u, which in theory is a significantly more powerful architecture.

However, where I could get roughly 110 Mbit/s on the AC2100, I can only achieve 80 Mbit/s on the i3. Looking at htop, the CPU usage is 100%, which is kind of baffling. I recall @Lynx suggesting installing irqbalance, which I did, but I didn't see any load balancing as a result.

I would love someone to point me in the right direction, like I've seen Celerons able to sustain greater throughput. I'm just confused more than anything lol

I don't have SQM, or Autorate running, there surely must be something wrong lol.

My only thought would be set IRQ affinity, but I have very little experience in those regards.

Again, thanks for the insight. Tearing my hair out ):

Edit 1 - I can achieve full Gbit via iperf3... and this WWAN connection via x86 is using USB tethering instead of the AC2100's WiFi client

DigitalLabs · October 15, 2022, 8:25am

Nvm, from what I can deduce, running phone tethering via OpenWrt results in all the traffic being handled on a single core.

moeller0 · October 15, 2022, 8:30am

Try enabling the detailed view for the CPU bars in htop, as that splits out soft interrupts in pink.

moeller0 · October 15, 2022, 8:35am

Seems reasonable, after all it is a single USB link that is used, no?

Lynx · October 15, 2022, 10:00am

Having relied upon a Huawei B818-263 positioned indoors for almost two years now, I have finally decided to upgrade my setup and have purchased a Zyxel NR7101, which is an outdoor integrated 5G/4G router and directional antenna:

It's German-made and obscenely expensive when bought new in the UK, but this is fine as a business purchase. Apparently OpenWrt can be run on it, albeit I'm not entirely sure yet of the benefit in doing so. Since a benefit may become apparent in the future, I think I might just set up OpenWrt on it to begin with. @Lochnair I believe you have some experience with this device?

This will be operated in bridge mode and coupled via ethernet to my RT3200 as my main router. It gets power using a POE injector that will be coupled to my (also German-made) UPS.

This outdoor device is overkill because there are no 5G cell towers in the vicinity and the nearest 4G mast (Vodafone) is very close at only 1.4 km away, and is only band 20 (10 MHz of bandwidth). So neither 5G nor carrier aggregation is available. Perhaps one day this mast will get upgraded though, either to 5G or at least such that it provides more bands for carrier aggregation. Other masts in the vicinity (EE) already offer band 20 and band 3, but they are much further away (circa 4 km) without direct line of sight. So I imagine the Vodafone tower still wins.

In retrospect, I ought to have switched over to an outdoor antenna sooner. Hitherto I had assumed that during the low bandwidth periods, which I have always thought relate to congestion, there would be no bandwidth gain from improving the connection with the mast. I tested this theory yesterday by getting a long extension cable and sitting my B818-263 on the windowsill with the window fully open, and to my surprise the stronger connection with the mast resulted in a significant bandwidth gain. I do not understand this fully, but perhaps during congestion periods, with the mast's signal shared out amongst many users, there is a deterioration in signal quality in terms of wall penetration, which in turn affects bandwidth.

The improvement was so dramatic that I was able to run a demanding 4K video without any stuttering (impossible with the B818-263 only indoors during the same time period).

And yet with cake-autorate active the 4K video would stutter unless I significantly relaxed the sensitivity.

So once I get my new NR7101 set up (and assuming all goes well with that and I retain at least as good performance as with the B818-263 on the windowsill behind an open window), the challenge for me will be to see if I can tweak cake-autorate to permit a 4K video to operate during peak congestion smoothly, and yet at the same time still be sensitive enough to keep Teams and Zoom working well without stutter.

Certainly my settings so far have been extremely tight - with merely 2 out of the last 4 ping samples at 12.5ms OWD delay from baseline triggering a bufferbloat event. To get the 4K video to work stutter free I needed to relax this to more like 5 out of 10 ping samples at 50ms OWD delay from baseline, and increase the bufferbloat refractory period from 300ms to 500ms. Whether that would also permit Zoom and Teams to work stutter free remains to be seen. My gut is that this may not work, but I'm not sure.

Here is what Microsoft state in terms of latency requirements:

Not as tight as you might think: 100ms RTT + 30ms jitter over 15s interval.

@moeller0 any idea how this relates to our:

# delay threshold in ms is the extent of OWD increase to classify as a delay
# these are automatically adjusted based on maximum on the wire packet size
# (adjustment significant at sub 12Mbit/s rates, else negligible)
dl_delay_thr_ms=50 # (milliseconds)
ul_delay_thr_ms=50 # (milliseconds)

# bufferbloat is detected when (bufferbloat_detection_thr) samples
# out of the last (bufferbloat detection window) samples are delayed
bufferbloat_detection_window=10  # number of samples to retain in detection window
bufferbloat_detection_thr=5      # number of delayed samples for bufferbloat detection
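
The "N delayed samples out of the last M" rule that these settings describe can be sketched in a few lines of Python. This is only an illustration, not cake-autorate's actual bash code: `make_detector` is a hypothetical name, the strict `>` comparison is an assumption, and neither the automatic threshold adjustment nor the refractory period is modelled.

```python
from collections import deque


def make_detector(window=10, threshold=5, delay_thr_ms=50.0):
    """Return a function that takes one OWD increase (ms) and reports bufferbloat."""
    recent = deque(maxlen=window)  # 1 for a delayed sample, 0 otherwise

    def add_sample(owd_delta_ms):
        # A sample counts as delayed when its OWD increase exceeds the threshold
        # (assumption: strict comparison).
        recent.append(1 if owd_delta_ms > delay_thr_ms else 0)
        # Bufferbloat is flagged once enough of the retained samples are delayed.
        return sum(recent) >= threshold

    return add_sample


# The two settings discussed in this post:
tight = make_detector(window=4, threshold=2, delay_thr_ms=12.5)
relaxed = make_detector(window=10, threshold=5, delay_thr_ms=50.0)
```

With the same series of 20 ms spikes, the tight detector fires on the second spike inside its window while the relaxed one never fires, which is roughly the trade-off being asked about here.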

If adjusting cake-autorate to handle both 4K during peak congestion yet also permit smooth Teams and Zoom is not possible, I wonder if we perhaps ought to consider the possibility of making cake-autorate alter its behaviour in dependence upon the type of traffic in use at the present time. I already have some nftables experience and I imagine it may be possible to somehow feed in information to cake-autorate so that, e.g., cake-autorate can see the present set of active DSCPs in the traffic network (and alter its behaviour accordingly).

Lochnair · October 15, 2022, 11:34am

Yes, I actually have two of these now, got another one cheaply for the vacation home.
I do run OpenWrt on them, the biggest downside of doing this is that there isn't any "bridge" mode in OpenWrt for LTE, so you can't easily get the public IP on any downstream device. So to avoid double NAT I've slapped a static route for my local subnet on the NR7101 and called it a day.

Admittedly I just flashed OpenWrt on them without much thought, due to the warnings that a future firmware update could potentially lock down the bootloader, so switching to OpenWrt even over serial would be impossible.

moeller0 · October 15, 2022, 12:47pm

This sounds hard to get right automatically*. However zoom/teams and friends have to deal with 'shitty' links already and hence try to allow quite some latency variability out of the box, so I am somewhat confident that you will find a setting that is acceptable for both use cases.

Alternatively we could think of a simple mechanism to switch easily between config files.

*) In all classification issues, one important question is what the consequences of false classifications are. So punting that to the user to switch between different configurations ("profiles") at least makes sure the user is in control and needs to set a policy explicitly.

The other option is to simply download 4K material before viewing (youtube-dl comes to mind, or for example Apple allows downloads of movies, so you can simply start downloading and start watching after a sufficient buffer has accumulated on disk).

Lynx · October 15, 2022, 6:44pm

Looks like nftables' logging capability could be leveraged to write out DSCPs:

https://wiki.nftables.org/wiki-nftables/index.php/Logging_traffic

And as you mention this could be read in and used to switch to different cake-autorate profiles like:

videoconferencing: tight latency settings
videostreaming: relaxed latency settings
normal: medium latency settings
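
A minimal sketch of how a set of observed DSCPs might be mapped onto those three profiles. Everything here is hypothetical: the DSCP-to-profile table is invented for illustration (real traffic may not carry these markings at all), and how the set of active DSCPs would be extracted from nftables logs is left out.

```python
# Tightest profile first: if both kinds of traffic are active, the tighter one wins.
PROFILE_PRIORITY = ["videoconferencing", "videostreaming", "normal"]

# Illustrative mapping only; which markings real applications use is an assumption.
DSCP_TO_PROFILE = {
    "EF": "videoconferencing",
    "AF41": "videoconferencing",
    "AF31": "videostreaming",
}


def choose_profile(active_dscps):
    """Pick the tightest profile required by any currently active DSCP."""
    wanted = {DSCP_TO_PROFILE.get(dscp, "normal") for dscp in active_dscps}
    for profile in PROFILE_PRIORITY:
        if profile in wanted:
            return profile
    return "normal"  # nothing observed at all
```

Letting the tightest profile win is a design choice: a misclassification then costs some streaming throughput rather than a broken call.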

But all of this adds yet more complexity.

I wonder if in some cases such complexity may be warranted though. Like especially for really weedy connections like sub 10Mbit/s where finding settings that work for all use cases is impossible.

Seems quite cool, like: ah ok, videoconferencing network activity detected so let's switch to the videoconferencing profile with tight settings. Or oh, just videostreaming right now so let's relax and let the buffers fill up.

Given the complexity here, unless lots of users jump in and say they desperately need this, I think I'll first try and see if I can make things work on my own connection (once I have the outdoor antenna) without any of this.

moeller0 · October 15, 2022, 7:11pm

On such a link I would not leave it to chance, but would like to switch between dedicated "VC" and "non-VC uses" profiles, to avoid having the heuristic misfire either within an important video conference or hurting video quality in a "streaming" evening (even though on such a link I would try to preload stuff before watching it).

But that means all users can now initiate such behavior switches. This can be nice if everybody agrees and plays nice.

I think having an easy way to switch profiles would help (it could be as simple as a shell script that symlinks a named configuration to the canonical name and restarts the autorate service).
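
The symlink-and-restart idea could look roughly like the sketch below, written in Python rather than shell. The function name, the file layout and the restart hook are all assumptions for illustration; nothing here is part of cake-autorate itself.

```python
import os


def switch_profile(profile_path, canonical_path, restart):
    """Point canonical_path at profile_path via a symlink, then call restart()."""
    if not os.path.isfile(profile_path):
        raise FileNotFoundError(profile_path)
    tmp_link = canonical_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a stale temporary link from an earlier run
    os.symlink(os.path.abspath(profile_path), tmp_link)
    # Renaming the link over the canonical name swaps the config in one step on POSIX.
    os.replace(tmp_link, canonical_path)
    restart()  # hypothetical hook, e.g. a call that restarts the autorate service
```

The restart action is passed in as a callable so the same function can be tried out with a dummy hook before wiring it to whatever command restarts the service on a given router.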

dlakelan · October 15, 2022, 10:45pm

Hey Dave I'd be happy to write you some code in Julia to plot stuff. Pretty sure Julia JSON parsing will work fine. Hit me with a DM and then bug me every 2 days if you don't hear from me

exokinetic · October 18, 2022, 8:07pm

Possibly related to the total transmit power available to the tower:
at max user density it may be limiting transmit power to the tower's transmit radios

I have noticed the same with my setup;

Teltonika RUT240 Category 4 LTE Modem (using external antennas)
-bridge mode

Into the WAN port of an ASUS rt-n66u running latest OpenWrt w/luci-app-sqm and cake-autorate running on the WAN interface
-whose CPU routinely maxes out under full load with scripted shaping averaging somewhere around 10down/8up

the lua implementation barely hits 60% CPU load under the same conditions, but for some reason I can eke out slightly more bandwidth at the same "A" bufferbloat rating using the bash script implementation

With no SQM enabled I can single stream a 4K video seamlessly, with dl speedtests reaching as high as 40-50 Mbps
(even though I will get a "C" grade on bufferbloat with no SQM enabled, so obviously this would not be workable for Zoom/Teams calls...)

...but if, at that same time, I enable SQM + the autorate script, it will shape me down to somewhere around 10 Mbps and that same 4K vid will be unplayable
(but, with the A bufferbloat grade I get shaped down to, all the VoIP and video conference stuff is awesome)

I use two (wideband) yagi antennas mounted right on top of each other in a cross polarization pattern (required for LTE MIMO); they have 5 dBi of gain at 700 MHz, which is where Band 12 is, which I use consistently from an ATT tower that is 12 km away.

I do NOT have direct line of sight, I am at elevation, but there is topography between me and the tower +trees
-full Fresnel zone blockage, totally NLOS (non-line-of-sight)

I average -65 RSSI -10 RSRQ -95 RSRP and +15 SINR

There is a T-Mobile tower right next to that ATT tower and I can pick up Band 2 and Band 4 (1900 and 2100 MHz respectively) from the same distance with usable signal as well, slightly worse than the 700 MHz band 12, but still well within the "green" signal zone. (my antennas have ~8 dBi of gain in this freq range)

So I would re-evaluate the thought that your other towers are "too far away" to use with good signal.
It will depend on how they are deployed and what direction the cells emitting your desired bands are facing -obviously-, but at 4 km away I would strongly advise pointing at them to see if you can take advantage of that carrier aggregation

I feel like toying around with this...
perhaps I can relax my sensitivity a bit and get some more "usable" bandwidth out of the shaper

I wouldn't exactly call myself desperate...

But the use-case for this particular setup of mine would likely benefit from this kind of feature;

I hold events in remote locations that require internet access for all event attendees for a few things.

1. "Official" live-streaming of the event (YouTube via OBS Studio)
-lots of people watching, we really want a nice smooth stream with good audio/no buffering (doesn't everybody?)
-one can see where SQM is going to help here

2. An event-related app that all event attendees have access to and will be using throughout the event
-this app requires internet access to function
-it is essentially relaying low level CSV text data from the app's cloud based relay servers
-almost nothing as far as bandwidth is concerned, besteffort is fine, very insensitive to latency spikes
-BUT, if I have 100's of attendees' apps trying to access this relay server... SQM is likely going to help prevent this traffic from buffering out the "official" live-stream

3. Social media engagement baby
-it is beneficial to provide "social media" access to all the event attendees
-this is going to have a ...dynamic... impact on bandwidth utilization
-I have no way of preventing anyone else from firing up their own livestream(s) at the event, so this needs to be managed somehow from a bandwidth sharing perspective (and I kind of want to encourage attendees to engage on social media however they "like" to; if there is capacity overhead to allow it, there is no reason they can't livestream too)

And those three things need to co-exist nicely on a cellular backhaul (4G LTE) connection in a remote rural area that is likely 10 km or greater distance from the nearest tower.

OpenWrt has been critical to my initial deployment of this "solution"

Teltonika RUT240 (bridged) -> ASUS rt-n66u (the "gateway" and core network switch) -> Ruckus Unleashed R600 APs distributed for even 5g cell coverage

Two separate SSIDs:

-One for the event livestream feeds (remote wifi IP cameras) with highest airtime fairness priority

-Second for the event attendees with low airtime fairness priority
I am also considering rate limiting this SSID to 80% of the LOWEST observed throughput while using sqm-scripts + cake-autorate
(essentially, I "think" I want ~20% 'goodput' headroom above whatever my event attendees are allowed to saturate, because I am not sure how much I trust cake's per-host fairness)

Thoughts on that?
Do I really need to reserve headroom for my "mission critical" livestream flow if I have cake enforcing dual-srchost/dual-dsthost fairness with nat?

Last question:

Can multiple instances of cake and cake-autorate run on the same router, pointing at their own interfaces?

I am thinking of going multi-WAN with multiple LTE modems using different carriers' SIMs (ATT, T-Mobile, Verizon) and load balancing them all together inside one OpenWrt device.

Would I use three separate instances of cake pointed at the 3 separate "wan" interfaces?
Or would I just point cake at the "multiwan" interface that I think gets created when using the mwan3 package...? ...basically, is anyone here familiar with applying cake to a multiwan situation?

And finally...

A huge thank you to everyone involved in this autorate project, and everyone involved in the wider bufferbloat project as well!

moeller0 · October 18, 2022, 8:20pm

Headroom always helps, other than that I think cake will hold its own pretty well, but you really should test it for your use-case. If you do please report success or failure back here and also on the cake mailing list over at https://lists.bufferbloat.net/listinfo/cake

For cake and SQM sure they can, for autorate not out of the box (as you would need to separate out files/logs per interface) but getting this to work should not be too hard either.

That is probably the cleanest way.

That should also work to some degree but this will add additional delay variance which will probably force you to relax the temporal thresholds to not fully tank throughput... but try it out, it might work well enough...

juliank · October 19, 2022, 2:19am

Given that my internet is faster than 2.4GHz WIFI I get bufferbloat on the AP (or laptop) not the WAN connection. Would it be proper to use CAKE with Adaptive Bandwidth on Access Point's 2.4GHz interface?

Btw, the cake graphs in this thread are great. How is that done, i.e. how is the data extracted and what tools are used to plot it?

dlakelan · October 19, 2022, 3:55am

Yeah, this is rapidly becoming a thing, and even some people getting 10Gbps WANs and so gigabit switches on the LAN are becoming the bottleneck. I think this is a good reason to tag DSCP on game and video call traffic etc, and use switches with basic QoS built in... Switching fabric can switch usually Gbps between any two ports, so more than 1Gbps on the wan doesn't necessarily immediately bottleneck, but it's entirely possible for gigabit switches on the LAN to be the bottleneck these days... crazy.

exokinetic · October 19, 2022, 5:01am

Same here.

Short answer:

Yes; absolutely.

OpenWrt already applies this (fq_codel, right?) to the WiFi interface on certain supported chipsets (mostly Atheros chips, if I recall).

Longer answer...:

I have symmetric 1gig fiber for the house, that I am about to downgrade to 500/500 -probably even lower...

Not realizing the need for queue management, I spec'd my router based on simple NAT performance alone.

Mikrotik hEX rb750gr3, which will NAT 1gig all day long... with hardware offloading.

Turn off hardware offloading, and then, on top of that, throw cake at one of my 880 MHz MIPS 32-bit cores and, well...

100 Mbps is about the limit of cake performance; above that, the peaking CPU load will start to induce its own latency.

I can actually get closer to 500 Mbps with fq_codel.... but I really want the per host isolation cake offers.

So, swallowing my 1gig-fiber-to-the-house pride, I ran cake @100 Mbps symmetric.

Damn.

EVERYTHING got better.

No more buffering on the IP TV service (multiple HD TVs in the house, DirecTV 'Stream' service)
No more VoIP headaches
web pages much snappier on all devices

And; to your question...

Because I was now shaping the WAN port to a rate lower than my Access Points' link rates, I never hit the buffers on the APs.

I am running Ruckus R600 APs on Unleashed firmware, which, although employing 'airtime fairness' algorithms, are still susceptible to bufferbloat when you saturate the airtime.

Unfortunately I made my AP decision (which was between Ubiquiti and Ruckus at the time...) before I discovered the concept of bufferbloat.

One lives and one learns.

It seems to me that, given how many BAD BUFFERS -without proper queue management- exist all along the general network stack, it becomes advantageous to create an ARTIFICIAL BOTTLENECK at the location of your gateway router, EVEN IF you don't have an "actual" performance bottleneck right there.

Creating an artificial bottleneck that is actually the 'narrowest' bottleneck in the chain gives you COMPLETE queue control.

So long as this bottleneck is still large enough for the bulk rate you actually NEED TO USE, you will never notice it being there, other than the improvement in round trip times.
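
As a rough sketch of that reasoning: the shaper rate has to sit below the narrowest real link in the chain and also below what the router's CPU can shape. The function name, the 10% margin and the example link rates are assumptions for illustration; only the ~100 Mbps cake limit comes from the description above.

```python
def artificial_bottleneck_mbps(link_rates_mbps, shaper_cpu_limit_mbps, margin=0.9):
    """Pick a shaper rate that stays the narrowest point in the path."""
    narrowest = min(link_rates_mbps)  # slowest real link (WAN, AP airtime, ...)
    # Stay a margin below that link, but never above what the CPU can shape.
    return min(narrowest * margin, shaper_cpu_limit_mbps)


# Example with assumed numbers: 1000 Mbps fiber, ~300 Mbps of usable WiFi airtime,
# and a router that can run cake at about 100 Mbps.
rate = artificial_bottleneck_mbps([1000, 300], shaper_cpu_limit_mbps=100)
```

In this example the CPU limit is the binding constraint, which matches the 100 Mbps setting described in the post.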

And I am very quickly discovering that when round trip times are UNIVERSALLY "fast" (perhaps consistent is a better word?) that less throughput is actually required for a given use case.

dtaht · October 19, 2022, 5:18am

fq_codel is on by default on the mt76, ath9k and ath10k chips in OpenWrt. It's not as polished as I would like, and we've been trying to tune it up; the latest work was promising, shown here: AQL and the ath10k is *lovely* - #908 by amteza

dtaht · October 19, 2022, 5:19am

Nice to hear, and yeah, if you can make sure your wifi never buffers at the router, the fq_codel on that is definitely overbuffered compared to what cake can do.

exokinetic · October 19, 2022, 5:28am

My local ISP is bragging right now about being the first 'home' ISP to offer 2gig symmetric fiber service.

As I am becoming aware of "the problem" with the internet right now, this really is quite funny.

In the commercials they are touting it as the "solution" to...

...DRUM ROLL PLEASE...

"buffering issues"

I really can't lol harder.

2gig service is going to make it "exponentially" worse.

TCP slow start (which is what an FTP transfer rides on) is going to ramp all the way up to 2 Gbps before it drops a packet to signal rate reduction?

...but, that packet won't get dropped until all the buffers fill up...

Throw some nice fat buffers in there and you can guarantee your latency sensitive packets will never meet an empty queue.

I really don't think many people understand how TCP "rate sensing" works.

Go ahead; make the pipe wider.

It ramps EXPONENTIALLY...

User: "How much bandwidth will my file transfer use?"

TCP Slow Start: "All of it."

User: "I didn't even tell you how big it was, though... what if I transfer a smaller file?"

TCP Slow Start: "All of it. And all of your buffers are belong to me too."
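
The exponential ramp described above is TCP slow start (file transfers such as FTP ride on TCP). A back-of-the-envelope sketch of how few round trips it needs to fill a pipe, assuming the congestion window simply doubles every RTT with no loss, an initial window of 10 segments and a 1460-byte MSS; these are all simplifying assumptions and the function name is made up for illustration.

```python
import math


def slow_start_rtts(link_mbps, rtt_ms, mss_bytes=1460, initial_window=10):
    """Round trips until a doubling window covers the bandwidth-delay product."""
    # Bandwidth-delay product expressed in segments.
    bdp_segments = (link_mbps * 1e6 / 8) * (rtt_ms / 1000) / mss_bytes
    if bdp_segments <= initial_window:
        return 0  # the initial window already fills the pipe
    # Each RTT doubles the window, so the count grows with log2 of the BDP.
    return math.ceil(math.log2(bdp_segments / initial_window))


rtts_100m = slow_start_rtts(100, 20)   # 100 Mbps link, 20 ms RTT
rtts_2g = slow_start_rtts(2000, 20)    # 2 Gbps link, 20 ms RTT
```

Under these assumptions the 100 Mbps link is filled after 5 round trips and the 2 Gbps link after 9, so a twenty times wider pipe buys only a handful of extra RTTs before the sender starts filling whatever buffer sits at the bottleneck.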

exokinetic · October 19, 2022, 5:43am

brainstorming a bit...

let's say I have a bunch of APs with the "biggest buffers in the industry"

Let's also say, hypothetically, that I had an x86-based OpenWrt gateway running cake at 1 Gbps

Now I am in that unfortunate situation where I will get to "experience the grandeur" of those 'big buffers' on my barking dog APs.

So, let's say I had a hypothetical PoE switch that was running OpenWrt...
(I suppose there is nothing stopping you from doing this on the x86 gateway given it has sufficient ports and processor capacity....)

Would it be a "good idea" to run, for example, 4 instances of cake concurrently, each pointed at a different ethernet port on the switch?

Plug the APs into these 'caked' switch ports.

And set the 'shaper rate' to something "just short" of engaging the APs' Big Beautiful Buffers.
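
To make the idea concrete, here is a sketch that only builds the per-port commands and executes nothing. The port names and rates are made up, and whether per-port shaping like this is advisable is exactly the open question being asked.

```python
def cake_commands(port_rates_mbit):
    """Build one tc command per port, attaching cake as the root qdisc at the given rate."""
    return [
        f"tc qdisc replace dev {port} root cake bandwidth {rate}mbit"
        for port, rate in sorted(port_rates_mbit.items())
    ]


# Hypothetical ports and rates, one entry per AP-facing port.
for cmd in cake_commands({"lan1": 250, "lan2": 250, "lan3": 300, "lan4": 300}):
    print(cmd)
```

Each command shapes egress on one port, i.e. the traffic heading towards the AP plugged into it.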

Thoughts?

moeller0 · October 19, 2022, 11:02am

Fun fact: during a multiday absence with autorate running and automatic speedtests, I see that short speedtests seem not to be enough to ramp up fully to the upper limit (as I only see ~73-80 Mbps download reported instead of the 94, and 27-29 Mbps upload instead of the 32, I typically see in this test). This mostly implies that most speedtests, with their run duration of ~10 seconds per direction, are simply not thorough enough...