CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

Yes, I actually have two of these now, got another one cheaply for the vacation home.
I do run OpenWrt on them, the biggest downside of doing this is that there isn't any "bridge" mode in OpenWrt for LTE, so you can't easily get the public IP on any downstream device. So to avoid double NAT I've slapped a static route for my local subnet on the NR7101 and called it a day.

Admittedly I just flashed OpenWrt on them without much thought, due to the warnings that a future firmware update could potentially lock down the bootloader, so switching to OpenWrt even over serial would be impossible.

This sounds hard to get right automatically*. However, zoom/teams and friends have to deal with 'shitty' links already and hence tolerate quite a bit of latency variability out of the box, so I am somewhat confident that you will find a setting that is acceptable for both use cases.

Alternatively we could think of a simple mechanism to switch easily between config files.

*) In all classification problems, one important question is: what are the consequences of a false classification? So punting that to the user to switch between different configurations ("profiles") at least makes sure the user is in control and needs to set a policy explicitly.

The other option is to simply download 4K material before viewing (youtube-dl comes to mind; or, for example, Apple allows downloads of movies, so you can simply start downloading and start watching after a sufficient buffer has accumulated on disk).


Looks like nftables' logging capability could be leveraged to write out DSCPs:

https://wiki.nftables.org/wiki-nftables/index.php/Logging_traffic
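For instance, a ruleset along these lines (the table/chain names and the chosen DSCP marks are invented for illustration) would emit rate-limited log lines that a watcher script could follow:

```
# Hypothetical /etc/nftables.d/dscpwatch.nft -- names and marks are assumptions
table inet dscpwatch {
    chain fwd {
        type filter hook forward priority 0; policy accept;
        # rate-limited so the log is a signal, not a firehose
        ip dscp ef   limit rate 1/second log prefix "dscp=ef "
        ip dscp af41 limit rate 1/second log prefix "dscp=af41 "
    }
}
```

A watcher could then follow `logread -f` (or the kernel log) for `dscp=` prefixes and switch profiles accordingly.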

And as you mention this could be read in and used to switch to different cake-autorate profiles like:

  • videoconferencing: tight latency settings
  • videostreaming: relaxed latency settings
  • normal: medium latency settings

But all of this adds yet more complexity.

I wonder if in some cases such complexity may be warranted though. Especially for really weedy connections, like sub-10Mbit/s, where finding settings that work for all use cases is impossible.

Seems quite cool: ah, OK, videoconferencing network activity detected, so let's switch to the videoconferencing profile with tight settings. Or: oh, just videostreaming right now, so let's relax and let the buffers fill up.

Given the complexity here, unless lots of users jump in and say they desperately need this, I think I'll first try and see if I can make things work on my own connection (once I have the outdoor antenna) without any of this.

On such a link I would not leave it to chance, but would like to switch between dedicated "VC" and "non-VC uses" profiles, to avoid having the heuristic misfire either within an important video conference or hurting video quality in a "streaming" evening (even though on such a link I would try to preload stuff before watching it).

But that means all users can now initiate such behavior switches. This can be nice if everybody agrees and plays nice.

I think having an easy way to switch profiles would be useful (it could be as simple as a shell script that symlinks a named configuration to the canonical name and restarts the autorate service).
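A minimal sketch of that idea -- the directory layout and the init script name here are assumptions, not cake-autorate's actual conventions:

```shell
#!/bin/sh
# Hypothetical profile switcher: symlink a named config over the canonical one
# and restart the service. Paths below are invented for illustration.
PROFILE_DIR="${PROFILE_DIR:-/root/cake-autorate/profiles}"
ACTIVE_CONF="${ACTIVE_CONF:-/root/cake-autorate/config.primary.sh}"

# switch_profile <name>: activate profiles/<name>.sh
switch_profile() {
    profile="$1"
    if [ ! -f "$PROFILE_DIR/$profile.sh" ]; then
        echo "unknown profile: $profile" >&2
        return 1
    fi
    ln -sf "$PROFILE_DIR/$profile.sh" "$ACTIVE_CONF"
    # restart the service only if we are actually on the router
    if [ -x /etc/init.d/cake-autorate ]; then
        /etc/init.d/cake-autorate restart
    fi
    echo "switched to $profile"
}
```

Usage would be e.g. `switch_profile videoconferencing` before an important call and `switch_profile normal` afterwards.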


Hey Dave I'd be happy to write you some code in Julia to plot stuff. Pretty sure Julia JSON parsing will work fine. Hit me with a DM and then bug me every 2 days if you don't hear from me

Possibly related to the total transmit power available to the tower:
at max user density it may be limiting the transmit power of the tower's radios.

I have noticed the same with my setup;

Teltonika RUT240 Category 4 LTE Modem (using external antennas)
-bridge mode

Into the WAN port of an ASUS rt-n66u running latest OpenWRT w/luci-app-sqm and cake-autorate running on the WAN interface
-whose CPU routinely maxes out under full load with scripted shaping averaging somewhere around 10down/8up

The lua implementation barely hits 60% CPU load under the same conditions, but for some reason I can eke out slightly more bandwidth at the same "A" bufferbloat rating using the bash implementation.

With no sqm enabled I can single-stream a 4K video seamlessly, with dl speedtests reaching as high as 40-50 Mbps
(even though I will get a "C" grade on bufferbloat with no SQM enabled, so obviously this would not be workable for Zoom/Teams calls...)

...but if, at that same time, I enable sqm + the autorate script, it will shape me down to somewhere around 10 Mbps and that same 4K vid will be unplayable
(but, with the A bufferbloat grade I get shaped down to, all the VoIP and video conference stuff is awesome)

I use two (wideband) yagi antennas mounted right on top of each other in a cross-polarization pattern (required for LTE MIMO); they have 5 dBi of gain at 700 MHz, which is where Band 12 sits. I use it consistently from an AT&T tower that is 12 km away.

I do NOT have direct line of sight; I am at elevation, but there is topography between me and the tower, plus trees
-full Fresnel zone blockage, totally NLOS (non-line-of-sight)

I average -65 RSSI -10 RSRQ -95 RSRP and +15 SINR

There is a T-Mobile tower right next to that AT&T tower and I can pick up Band 2 and Band 4 (1900 and 2100 MHz respectively) from the same distance with usable signal as well; slightly worse than the 700 MHz Band 12, but still well within the "green" signal zone. (My antennas have ~8 dBi of gain in this freq range.)

So I would re-evaluate the thought that your other towers are "too far away" to use with good signal.
It will depend on how they are deployed and what direction the cells emitting your desired bands are facing -obviously- but at 4 km away I would strongly advise pointing at them to see if you can take advantage of that carrier aggregation.

I feel like toying around with this...
Perhaps I can relax my sensitivity a bit and get some more "usable" bandwidth out of the shaper.

I wouldn't exactly call myself desperate...

But the use case for this particular setup of mine would likely benefit from this kind of feature:

I hold events in remote locations that require internet access for all event attendees for a few things.

  1. "official" live-streaming of the event (YouTube via OBS studio)
    -lots of people watching, we really want a nice smooth stream with good audio/ no buffering (doesn't everybody?)
    -one can see where SQM is going to help here

  2. an event-related app that all event attendees have access to and will be using throughout the event.
    -this app requires internet access to function
    -it is essentially relaying low-level CSV text data from the app's cloud-based relay servers
    -almost nothing as far as bandwidth is concerned, best-effort is fine, very insensitive to latency spikes
    -BUT, if I have hundreds of attendees' apps trying to access this relay server... SQM is likely going to help prevent this traffic from buffering out the "official" live-stream

  3. Social media engagement baby
    -it is beneficial to provide "social media" access to all the event attendees
    -this is going to have a ...dynamic... impact on bandwidth utilization
    -I have no way of preventing anyone else from firing up their own livestream(s) at the event, so this needs to be managed somehow -from a bandwidth-sharing perspective (and I kind of want to encourage attendees to engage on social media however they "like" to; if there is capacity overhead to allow it, no reason they can't livestream too)

And those three things need to co-exist nicely on a cellular backhaul (4G LTE) connection in a remote rural area that is likely 10 km or greater distance from the nearest tower.

OpenWRT has been critical to my initial deployment of this "solution"

Teltonika RUT240 (bridged) -> ASUS RT-N66U (the "gateway" and core network switch) -> Ruckus Unleashed R600 APs distributed for even 5 GHz cell coverage

Two separate SSIDs:

-One for the event livestream feeds (remote wifi IP cameras) with highest airtime fairness priority

-Second for the event attendees with low airtime-fairness priority
I am also considering rate-limiting this SSID to 80% of the LOWEST observed throughput while using sqm-scripts + cake-autorate
(essentially, I "think" I want ~20% 'goodput' headroom above whatever my event attendees are allowed to saturate, because I am not sure how much I trust cake's per-host fairness)

Thoughts on that?
Do I really need to reserve headroom for my "mission critical" livestream flow if I have cake enforcing dual-srchost/dual-dsthost with nat?

Last question:

Can multiple instances of cake and cake-autorate run on the same router, pointing at their own interfaces?

I am thinking of going multi-wan with multiple LTE modems using different carriers' SIMs (AT&T, T-Mobile, Verizon) and load-balancing them all together inside one OpenWRT device.

Would I use three separate instances of cake pointed at the 3 separate "wan" interfaces?
Or would I just point cake at the "multiwan" interface that I think gets created when using the mwan3 package...? Basically, is anyone here familiar with applying cake to a multi-wan situation?

And finally...

A huge thank you to everyone involved in this autorate project, and everyone involved in the wider bufferbloat project as well!


Headroom always helps, other than that I think cake will hold its own pretty well, but you really should test it for your use-case. If you do please report success or failure back here and also on the cake mailing list over at https://lists.bufferbloat.net/listinfo/cake

For cake and SQM sure they can, for autorate not out of the box (as you would need to separate out files/logs per interface) but getting this to work should not be too hard either.

That is probably the cleanest way.

That should also work to some degree but this will add additional delay variance which will probably force you to relax the temporal thresholds to not fully tank throughput... but try it out, it might work well enough...
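For the three-instances route, a raw-tc sketch of what SQM would set up per interface (interface names, rates, and the chosen cake options here are illustrative assumptions; these commands need root on the router):

```shell
# one egress + one ingress (via ifb) cake instance per WAN; all values invented
for dev in wan_att wan_tmo wan_vzw; do
    # egress: shape what we send out of this modem
    tc qdisc replace dev "$dev" root cake bandwidth 20Mbit diffserv3 nat
    # ingress: redirect incoming traffic to an ifb and shape it there
    ip link add "ifb_$dev" type ifb 2>/dev/null
    ip link set "ifb_$dev" up
    tc qdisc replace dev "$dev" handle ffff: ingress
    tc filter add dev "$dev" parent ffff: matchall \
        action mirred egress redirect dev "ifb_$dev"
    tc qdisc replace dev "ifb_$dev" root cake bandwidth 18Mbit diffserv3 nat ingress
done
```

Each shaper then only ever sees its own link, which is exactly what autorate wants when it adjusts the rates independently.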

Given that my internet is faster than 2.4GHz WiFi, I get bufferbloat on the AP (or laptop), not the WAN connection. Would it be proper to use CAKE with Adaptive Bandwidth on the Access Point's 2.4GHz interface?

Btw, the cake graphs in this thread are great. How is that done, i.e how is the data extracted and what tools are used to plot it?

Yeah, this is rapidly becoming a thing; some people are even getting 10Gbps WANs, so gigabit switches on the LAN are becoming the bottleneck. I think this is a good reason to tag DSCP on game and video call traffic etc., and use switches with basic QoS built in... A switching fabric can usually switch a gigabit between any two ports, so more than 1Gbps on the wan doesn't necessarily immediately bottleneck, but it's entirely possible for gigabit switches on the LAN to be the bottleneck these days... crazy.

Same here.

Short answer:

Yes; absolutely.

OpenWRT already applies this (fq_codel, right?) to the WiFi interface on certain supported chipsets (mostly Atheros chips, if I recall).

Longer answer...:

I have symmetric 1 gig fiber for the house, which I am about to downgrade to 500/500 -probably even lower...

Not realizing the need for queue management, I spec'd my router based on simple NAT performance alone.

Mikrotik hEX rb750gr3, which will NAT 1gig all day long... with hardware offloading.

Turn off hardware offloading, and then, on top of that, throw cake at one of my 880mhz MIPS 32bit cores and, well...

100 Mbps is about the limit of cake performance; maxing out the CPU will start to induce its own latency above that.

I can actually get closer to 500 Mbps with fq_codel... but I really want the per-host isolation cake offers.

So, sucking up my 1 gig fiber-to-the-house pride, I ran cake @ 100 Mbps symmetric.

Damn.

EVERYTHING got better.

No more buffering on the IP TV service (multiple HD tv's in the house, DirecTV 'Stream' service)
No more VoIP headaches
web pages much snappier on all devices

And; to your question...

Because I was now shaping the WAN port to a rate lower than my Access Points link rates, I never hit the buffers on the AP's.

I am running Ruckus R600 APs on Unleashed firmware, which, although employing 'airtime fairness' algorithms, are still susceptible to bufferbloat when you saturate the airtime.

Unfortunately I made my AP decision (which was between Ubiquiti and Ruckus at the time...) before I discovered the concept of bufferbloat.

One lives and one learns.

It seems to me that, given how many BAD BUFFERS -without proper queue management- exist all along the general network stack, it becomes advantageous to create an ARTIFICIAL BOTTLENECK at the location of your gateway router, EVEN IF you don't have an "actual" performance bottleneck right there.

Creating an artificial bottleneck that is actually the 'narrowest' bottleneck in the chain gives you COMPLETE queue control.

So long as this bottleneck is still sufficiently large for the bulk rate you actually NEED TO USE, you will never notice it being there, other than the improvement in round-trip times.

And I am very quickly discovering that when round-trip times are UNIVERSALLY "fast" (perhaps consistent is a better word?), less throughput is actually required for a given use case.


fq_codel is on by default on the mt76, ath9k and ath10k chips in openwrt. It's not as polished as I would like, and we've been trying to tune it up, the latest work was promising, shown here: AQL and the ath10k is *lovely* - #908 by amteza

Nice to hear, and yeah, if you can make sure your WiFi never buffers at the router, great -- the fq_codel on the WiFi side is definitely overbuffered compared to what cake can do.

My local ISP is bragging right now about being the first 'home' ISP to offer 2 gig symmetric fiber service.

As I am becoming aware of "the problem" with the internet right now, this really is quite funny.

In the commercials they are touting it as the "solution" to...

...DRUM ROLL PLEASE...

"buffering issues"

I really can't lol harder.

2gig service is going to make it "exponentially" worse.

TCP slow start is going to ramp all the way up to 2 Gbps before it drops a packet to signal a rate reduction?

...but that packet won't get dropped until all the buffers fill up...

Throw some nice fat buffers in there and you can guarantee your latency sensitive packets will never meet an empty queue.

I really don't think many people understand how TCP "rate sensing" works.

Go ahead; make the pipe wider.

It ramps EXPONENTIALLY...

User: "How much bandwidth will my file transfer use?"

TCP Slow Start: "All of it."

User: "I didn't even tell you how big it was, though... what if I transfer a smaller file?"

TCP Slow Start: "All of it. And all of your buffers are belong to me too."
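Back-of-the-envelope, with assumed numbers (10-segment initial window, 1500-byte packets, 20 ms RTT), the exponential ramp to 2 Gbit/s really does take only a handful of round trips:

```shell
# slow start roughly doubles the sending rate every RTT until loss;
# count the doublings needed to blow past a 2 Gbit/s pipe
rate_bps=$(( 10 * 1500 * 8 * 1000 / 20 ))   # initial rate: ~6 Mbit/s
rtts=0
while [ "$rate_bps" -lt 2000000000 ]; do
    rate_bps=$(( rate_bps * 2 ))
    rtts=$(( rtts + 1 ))
done
echo "$rtts doublings (~$(( rtts * 20 )) ms) to exceed 2 Gbit/s"
```

That is well under a quarter of a second before the overshoot starts landing in whatever buffers are in the path.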

brainstorming a bit...

Let's say I have a bunch of APs with the "biggest buffers in the industry".

Let's also say, hypothetically, that I had an x86-based OpenWRT gateway running cake at 1 Gbps.

Now I am in that unfortunate situation where I will get to "experience the grandeur" of those 'big buffers' on my barking-dog APs.

So, let's say I had a hypothetical PoE switch that was running OpenWRT...
(I suppose there is nothing stopping you from doing this on the x86 gateway, given it has sufficient ports and processor capacity...)

Would it be a "good idea" to run, for example, 4 instances of cake, concurrently, each pointed at a different ethernet port on the switch?

Plug the APs into these 'caked' switch ports.

And set the 'shaper rate' to something "just short" of engaging the APs' Big Beautiful Buffers.

Thoughts?


Fun fact: during a multiday absence with autorate running and automatic speedtests, I see that short speedtests seem not to be enough to ramp up fully to the upper limit (as I only see ~73-80 Mbps download reported instead of the 94, and 27-29 Mbps upload instead of the 32, I typically see in this test). This mostly shows how most speedtests, with their run duration of ~10 seconds per direction, are simply not thorough enough...


Would it be a "good idea" to run, for example, 4 instances of cake, concurrently, each pointed at a different ethernet port on the switch?

I have also wondered this.

I'm not qualified to answer, but I can speculate.

Naively, I would think the theoretical "optimal case" (optimal - with respect to queueing, assuming infinite CPU resources) would be a single, omniscient instance of cake managing all traffic.

And that dividing up the same traffic among several instances of cake would reduce total computational overhead at the cost of less optimal queueing.

Since each instance of cake would be blind to what's happening in each other instance, queueing decisions would be dumber - but much easier to crunch.

But I'm just a lurker and know nothing about how cake works under the hood.
There are people on here much smarter than me who would know better. :man_shrugging:

Ideally for WiFi we would like the scheduler and AQM to live inside the WiFi stack/devices. Running fq_codel or cake in unlimited mode on multiple interfaces is a well-supported configuration already, but running multiple traffic shapers gets costly quickly.

If you have a port on your router that is directly connected just to an AP, and this AP is dumb and doesn't do any kind of shaping/queue management, then it can make sense to have a cake instance sending towards this AP with a max bandwidth of ... whatever is typical for your AP depending on your channel and band settings... maybe 200Mbps or 400 Mbps or whatever.

This won't eliminate buffering, because depending on the radio connection your individual device may connect at different speeds, from 6 Mbps to several hundred. So just moving around your house will change the speed for this device. Because of this variable-speed connection, it's best if the shaper/queue manager is in the WiFi device, which knows what connection speed it has to the other radio.
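As a sketch of the "port feeding a dumb AP" case (the port name and the 300 Mbit figure are placeholders for whatever your AP actually sustains over the air):

```shell
# shape router -> AP traffic below the AP's realistic WiFi capacity, so the
# queue builds here, under cake's control, instead of in the AP's dumb buffers
tc qdisc replace dev lan2 root cake bandwidth 300Mbit besteffort ethernet
```

Note this only controls the downstream (router-to-AP) direction; upstream from clients still queues in the AP.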

@exokinetic I have read your posts with interest and you seem to me like a perspicacious individual. I am sure you can contribute a lot to this thread and project. Perhaps your event management scenario presents a good use case for one of those local 5G deployments(?), but probably the latter would eat up any profits!

As you have identified the bash script control seems to work well, albeit its CPU use is not insignificant. I have worked hard to try to get its CPU use down using various techniques in the bash code, and at 20 ping responses per second on my RT3200 I think it consumes about 5% of the total available CPU usage. Dropping the number of ping responses per second by reducing the ping interval or number of pingers drastically reduces CPU usage. It could be that there are still ways to further optimize the bash code in terms of CPU usage, but I think I've caught most of the low hanging fruit (avoiding any external binaries, using inbuilt functions wherever possible, etc.).
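To illustrate the "avoid external binaries" point (the log-line format below is invented, purely for demonstration):

```shell
# parsing one field out of a log line, as done thousands of times in a hot loop
line="1697058000;PING;8.8.8.8;rtt=23.4"

# the slow way forks two processes per line, which murders a small router CPU:
# rtt=$(echo "$line" | cut -d';' -f4 | cut -d'=' -f2)

# pure parameter expansion does the same work entirely inside the shell:
rtt=${line##*;}   # keep only the text after the last ';'
rtt=${rtt#rtt=}   # strip the 'rtt=' prefix
echo "$rtt"
```

At 20 lines per second the difference between zero forks and two forks per line is exactly the kind of CPU saving described above.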

As you have already tried the lua code, you may also want to try out the perl code linked at the top. That is far less CPU hungry.

It strikes me that your ASUS RT-N66U with its 600MHz CPU is not the strongest out there. I wonder whether you might consider upgrading to something a bit beefier like an RPi4 (I think this might be best in your case)?

The NR7101 (to act as my new outdoor antenna) has a MediaTek MT7621AT 880 MHz CPU and the RT3200 (to be my main router) has a MediaTek MT7622BV 1350 MHz CPU. I think even the latter in your case might be suboptimal compared with an RPi4, but others can probably better advise on that front.

Thank you for your comments about trying to see if I can reach those EE masts and leverage carrier aggregation. I will give that a shot once my NR7101 arrives.