CAKE w/ Adaptive Bandwidth

The "CAKE w/ Adaptive Bandwidth" OpenWrt project addresses the application of the CAKE algorithm for mitigating bufferbloat to variable bandwidth connections such as 4G, 5G, LTE and Starlink.

An outcome of this project has been: cake-autorate - a bash script that automatically adjusts CAKE bandwidth settings based on traffic load and one-way-delay or round-trip-time measurements.

cake-autorate is intended for variable bandwidth connections and is not generally required for use on connections that have a stable, fixed bandwidth.

cake-autorate is primarily for use on top of an existing application of the CAKE qdiscs on the relevant interfaces (whether through the OpenWrt SQM packages or an alternative like cake-qos-simple - the latter supports DSCPs, which may be of particular value for connections that exhibit periods of lower bandwidth); cake-autorate then dynamically adjusts the CAKE bandwidth for those pre-existing CAKE qdiscs.

Alternatively, cake-autorate can also be used to monitor, log and analyse the achieved rates and latency in whatever resolution is desired of any connection (that is, it does not seek to alter the CAKE bandwidths of any pre-existing CAKE qdiscs).
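A minimal config sketch for monitor-only use. It relies on the adjust_dl_shaper_rate and adjust_ul_shaper_rate options that appear in the sample configuration posted further down this thread; the interface names are placeholders and need to be replaced with your own.

```shell
# Monitor-only sketch: measure and log, but never change the CAKE bandwidth.
dl_if=ifb4eth1 # download interface (placeholder, use your own)
ul_if=eth1     # upload interface (placeholder, use your own)

adjust_dl_shaper_rate=0 # 0 = do not change the dl shaper rate
adjust_ul_shaper_rate=0 # 0 = do not change the ul shaper rate
```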

cake-autorate includes the following features:

  • simple control algorithm that adjusts the shaper rate depending on monitored load and latency, increasing the shaper rate as much as possible before latency begins to suffer;
  • can be used to adjust cake shaper rates on the fly or in a monitor mode just to monitor the underlying connection in very fine detail (latency and achieved rates);
  • highly configurable using a config file - many aspects of the underlying control algorithm or other functionality are configurable;
  • supports use of multiple ping binaries (ping, fping and tsping);
  • active reflector maintenance including rotating out bad reflectors and converging on an optimal set from any initial set provided;
  • utilises multiple states like running, idle and stall, with appropriate handling for each;
  • advanced logging and log rotation features; and
  • plotting tools for analyzing results using Matlab or Grafana.
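The first feature above (raise the shaper rate under load until latency starts to suffer) can be pictured with a rough sketch like the one below. This is not the cake-autorate code: the function name, the 75% load threshold, the drift back towards the base rate under low load and all the adjustment percentages are assumptions chosen purely for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: names, thresholds and factors are assumptions,
# not taken from the cake-autorate source.

# next_shaper_rate CUR_KBPS ACHIEVED_KBPS BLOAT MIN_KBPS BASE_KBPS MAX_KBPS
# BLOAT is 1 when latency has risen past the configured threshold, else 0.
next_shaper_rate() {
    local cur=$1 achieved=$2 bloat=$3 min=$4 base=$5 max=$6
    local load_pct=$(( 100 * achieved / cur ))
    local next=$cur

    if [ "$bloat" -eq 1 ]; then
        # Latency is suffering: back off by 10%.
        next=$(( cur * 90 / 100 ))
    elif [ "$load_pct" -ge 75 ]; then
        # High load without bloat: probe upwards by 4%.
        next=$(( cur * 104 / 100 ))
    elif [ "$cur" -gt "$base" ]; then
        # Low load above base: decay towards the base rate.
        next=$(( cur * 98 / 100 ))
        if [ "$next" -lt "$base" ]; then next=$base; fi
    elif [ "$cur" -lt "$base" ]; then
        # Low load below base: recover towards the base rate.
        next=$(( cur * 102 / 100 ))
        if [ "$next" -gt "$base" ]; then next=$base; fi
    fi

    # Never leave the configured [min, max] range.
    if [ "$next" -lt "$min" ]; then next=$min; fi
    if [ "$next" -gt "$max" ]; then next=$max; fi
    printf '%d\n' "$next"
}

# High load, no bloat: the rate is nudged up.
next_shaper_rate 100000 90000 0 10000 100000 200000
# Same load, but bloat detected: the rate is cut.
next_shaper_rate 100000 90000 1 10000 100000 200000
```

The min, base and max arguments correspond to the min_*, base_* and max_*_shaper_rate_kbps settings in the config file.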

cake-autorate is now in a fairly mature state in the sense that the underlying control aspects, which have been found to work generally well, have not changed in some time.

For convenience, an installer script is available.

cake-autorate has been designed for use on OpenWrt, but it can be run with only minor modification on other platforms like Asus Merlin too.

The cake-autorate GitHub repository resides at:

https://github.com/lynxthecat/cake-autorate

Current Status

Project History

The "CAKE w/ Adaptive Bandwidth" OpenWrt project builds upon the efforts of various experts on this forum spanning several years. For details concerning its early development, see these archived threads:

This is a new thread for continued discussion of the same.


@moeller0 to kick off some new discussion, I have been keeping myself busy recently with the following:

2024-03-09 - Version 3.2.0

  • This version focuses on reducing CPU usage.
  • Fold pinger parsing and pinger maintenance processes into main process. This involved a significant restructure of the code.
  • Improve IPC read efficiency of the main process.
  • Use buffered log file writes.
  • Replace costly regex with alternatives.
  • Other minor fixes and improvements.

The underlying control aspects and basic functionality in cake-autorate have not changed in some time now, and working on making the bash script more efficient seems like a good thing to do. With my sub 100Mbit/s 4G connection, I think my RT3200 already had plenty of breathing room, but giving it some more will not hurt. With the above changes and default settings (20Hz reflector response rate) and full logging, I'm seeing circa 18% total CPU use now - this is the total CPU use by all the processes whilst not in sleep mode. In sleep mode the total CPU use is around 2-3%. I'm keen to see how much room for improvement remains. I'd like to see if I can get the total CPU use on my RT3200 to below 10%.

I discovered using 'strace' that much CPU time was eaten up by byte-by-byte read calls and so generally endeavouring to get rid of the code responsible for those has helped significantly reduce CPU usage.

A quick question for you is the following. Right now LOAD lines include the shaper rates tagged on at the end (see here). Would it be so bad if they didn't? Would that break the present Matlab analysis script? I'd like to remove the shaper rates from the LOAD lines if possible because doing so would help me free up the CPU cycles eaten up by passing those through to the monitoring process.


Not sure how you did that, but that was IMHO a cleaner design, with these as separate entities...

Maybe it is time to think about translating the design to C, Rust, or Go, which should solve some of the efficiency issues quite well...

I think the idea was to be able to plot the shaper rates even if a log starts with an idle/pinger sleep epoch...

Well, you would not see the shaper setting in the condition above. That does have some utility in making it easier to read the graphs, but that is mostly a debugging exercise, so your call.

Probably not, and even if it did, that would be trivial to fix (all I would need is a new-style logfile).

Yeah, that indicates that it is time to move to some other language that does not require such workarounds to achieve good enough performance... (switching from independent processes to threads should work wonders here)

I'd welcome anyone porting cake-autorate to C (@Lochnair's tsping might present a nice starting point and springboard) or another speedy alternative language, but this is not something that I would want to embark on myself. My original motivation was to enable me to effectively leverage CAKE on my own sub 100Mbit/s 4G connection, and the bash implementation has achieved that for me. I've come to like bash a lot and how rapidly we have been able to radically change things to try out different approaches and add new features. I know that we haven't been changing the fundamentals or adding many new features of late, but I like how fast we can, if we want to. And finally I'm enjoying my present work in optimizing the bash code!

I personally think bash is just fine for many (perhaps even most) use cases. I mean 4G connections are mostly sub 100Mbit/s and so devices like the RT3200 have plenty of unused horsepower that can be leveraged, right? Those with higher bandwidth connections above 350Mbit/s will need to have much more powerful devices anyway, won't they? Also I think the default reflector rate of 20Hz is quite high - if that is reduced the CPU use decreases dramatically.
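As a back-of-the-envelope sketch of why the reflector rate matters: if responses arrive from several pingers in parallel, the effective response rate is roughly the number of pingers divided by the per-reflector ping interval. The function name and the example numbers below are assumptions for illustration (six pingers at 300 ms each would give the 20Hz figure quoted above); check the values in your own config before relying on them.

```shell
#!/usr/bin/env bash
# effective_rate_hz PINGERS INTERVAL_MS
# Approximate number of reflector responses per second that must be processed.
effective_rate_hz() {
    local pingers=$1 interval_ms=$2
    printf '%d\n' "$(( pingers * 1000 / interval_ms ))"
}

effective_rate_hz 6 300 # prints 20
effective_rate_hz 6 600 # prints 10: doubling the interval halves the work
```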

I noticed that the Lua spin-off has seen some recent development:


I've also got an old branch where I'd attempted to port the Lua variant to C (since I knew better how that one works): https://github.com/Lochnair/sqm-autorate/tree/develop/c-port. I don't remember how well it worked, but it might be possible to use parts of it for a cake-autorate fork.

As for alternative languages, from what I've seen it generally comes down to space. I had reasonable results using Rust: an 872k binary for mips (388k after upx -9 compression, which shouldn't be far from what adding it straight into the OpenWrt build would do). Neither C nor Rust is the easiest language to get started with, though.

And yes, the Lua variant has gotten some updates, but the fixed lualanes version that's required is still only in master I believe, which unfortunately makes it a bit more difficult to get running.

I understand, however you now seem to be at the stage where you trade in simple (elegant) modular design for a monolithic design that avoids some CPU cost... and switching implementation languages might avoid this necessity :wink:
Other than that I am fine with bash...

Sure, but since the actual controller logic seems pretty stable this might be an interesting little project...

This is what I've converged towards:

So the main process:

  • starts/stops/replaces pingers;
  • reads from a FIFO and receives either reflector responses or achieved rate updates;
  • reflector responses are read and parsed in accordance with the chosen ping binary;
  • the usual latency checking, rate, and idle/stall logic is applied; and
  • if reflector interval health check is due, it applies the health check.

And there are separate processes for monitoring the achieved rates and maintaining the log file.
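The shape of that main loop can be sketched roughly as follows. The message tags (REFLECTOR_RESPONSE, ACHIEVED_RATE), the field layout and the function names are hypothetical and only illustrate the idea of one process reading tagged lines from a FIFO and dispatching on the tag; the real script's message format may differ.

```shell
#!/usr/bin/env bash
# Illustrative sketch: message tags, fields and function names are hypothetical.

# Dispatch one message according to its leading tag.
handle_message() {
    local tag=$1
    shift
    case $tag in
        REFLECTOR_RESPONSE)
            # Fields assumed here: reflector address, delay in microseconds.
            printf 'latency sample from %s: %s us\n' "$1" "$2"
            ;;
        ACHIEVED_RATE)
            # Fields assumed here: download and upload rate in Kbit/s.
            printf 'achieved rates: dl=%s ul=%s kbps\n' "$1" "$2"
            ;;
        *)
            printf 'ignoring unknown message: %s\n' "$tag" >&2
            ;;
    esac
}

# Read whitespace-separated messages from the FIFO until the writer closes it.
main_loop() {
    local fifo=$1 tag arg1 arg2
    while read -r tag arg1 arg2; do
        handle_message "$tag" "$arg1" "$arg2"
    done < "$fifo"
}

# Demo: a background writer stands in for the pingers and the rate monitor.
fifo=$(mktemp -u)
mkfifo "$fifo"
{
    printf 'REFLECTOR_RESPONSE 1.1.1.1 23456\n'
    printf 'ACHIEVED_RATE 42000 3100\n'
} > "$fifo" &
main_loop "$fifo"
wait
rm -f "$fifo"
```

In a long-running script the FIFO would be kept open rather than read until end-of-file, and the latency, rate and idle/stall logic would sit where the printf calls are.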

I found log file handling could be significantly sped up by reading in new log data and writing it out in 512 byte chunks.
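A minimal sketch of chunked copying in bash, assuming text log data: read -N asks for a fixed number of characters at a time instead of stopping at every newline. The function name is made up for this example, and whether this actually reduces the number of read calls on a given bash version is worth confirming with strace on the target system.

```shell
#!/usr/bin/env bash
# copy_in_chunks: copy stdin to stdout in blocks of up to 512 characters.
# Hypothetical helper for illustration; NUL bytes are not preserved,
# which is fine for plain-text logs.
copy_in_chunks() {
    local chunk
    # read returns non-zero on the final short block, so flush whatever
    # was read before leaving the loop.
    while IFS= read -r -N 512 chunk || [[ -n $chunk ]]; do
        printf '%s' "$chunk"
    done
}

# Usage: pass log lines through unchanged, 512 characters at a time.
printf 'first log line\nsecond log line\n' | copy_in_chunks
```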

This arrangement seems fairly logical in my mind at least.

I don't know about "little", but yes :wink:

Of course, there's nothing to stop anyone from ripping out the controller in the Rust port and building upon that. Most of the plumbing should be good, although using Netlink for the interface stats might've been a step too far :smile:


The reduction in CPU usage in version 3.2.0 is significant, and as far as I can tell there is zero difference in performance, it works just as great as always.


I really appreciate the feedback. Yes the day-to-day performance in terms of rate changes should be identical, and if not then there are issues needing ironed out.

I have been persevering to see how low I can get the CPU usage by modifying the code without altering the day-to-day performance, and this has involved significant restructuring of the code and consulting with bash experts on IRC from time to time. With default settings including the usual effective 20Hz reflector rate and minimal logging I am now down to circa 10-12% active and 1-2% idle CPU usage on my RT3200.

Most of the low hanging fruit (avoiding code that sets up byte-by-byte reads and redundant checks) has probably been mopped up already, but I think I might squeeze out another percentage or two.

For example, apparently double quoting variables results in excess work in most situations (this can be easily verified by comparing x=${EPOCHREALTIME/.} with x="${EPOCHREALTIME/.}"). If I'm remembering correctly, bash then does extra work along the lines of expanding the string with escape characters inserted before every character, so the string is doubled in size during processing. Unless the quotes are absolutely needed, removing the unnecessary double quotes throughout the code should therefore also give a small gain.
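A rough way to run that comparison is a micro-benchmark along these lines. It needs bash 5.0 or later for EPOCHREALTIME; the function names and the iteration count are arbitrary choices for this sketch, and the timings will vary by machine and bash version, so no particular result is implied.

```shell
#!/usr/bin/env bash
# Micro-benchmark sketch: unquoted vs double-quoted assignment.
# Requires bash 5.0+ (EPOCHREALTIME). Function names are made up.
LC_ALL=C # keep '.' as the decimal separator so ${EPOCHREALTIME/.} is in microseconds

# Print the elapsed microseconds for N unquoted assignments.
time_unquoted() {
    local n=$1 i x start_us end_us
    start_us=${EPOCHREALTIME/.}
    for ((i = 0; i < n; i++)); do
        x=${EPOCHREALTIME/.}
    done
    end_us=${EPOCHREALTIME/.}
    printf '%d\n' "$((end_us - start_us))"
}

# Print the elapsed microseconds for N double-quoted assignments.
time_quoted() {
    local n=$1 i x start_us end_us
    start_us=${EPOCHREALTIME/.}
    for ((i = 0; i < n; i++)); do
        x="${EPOCHREALTIME/.}"
    done
    end_us=${EPOCHREALTIME/.}
    printf '%d\n' "$((end_us - start_us))"
}

n=100000
printf 'unquoted: %s us\n' "$(time_unquoted "$n")"
printf 'quoted:   %s us\n' "$(time_quoted "$n")"
```

Running it several times and comparing the two figures gives a feel for whether the difference is measurable on a particular device.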


I think this current approach makes perfect sense. The code is rock stable, it has run 24/7 for years now on my rpi4 system, and I have just installed it on my new x86 and it is the same story, it just works. The reduction in CPU usage will matter quite a bit for systems with less horsepower I imagine, but I appreciate it too for the potential headroom it gives me. And free efficiency for the same performance always seems a bit like a no-brainer.

Excellent! Yes I run it 24/7 too, albeit I have been constantly tweaking it over the years. Out of curiosity, what settings do you run it with? Are you using the 'fping' variant? Am I right in remembering that you have a rather high bandwidth 4G/5G connection?

On my way to work right now; when I get home I can give you the details. Running close to stock, but just slightly looser with more allowance in ms latency, is the best for my line. And yeah, I have good connectivity to the tower, but the local LTE network is strained, so the speeds vary a lot from day to night time. I am waiting for 5G to open up here during the next couple of months; then the script and my new router and modem will be put to the test. Speeds sound promising from the towers near me that already went online.


Interesting. I also run with more relaxed settings than the default. My bandwidth/latency compromise ended up looking like this:

dl_owd_delta_thr_ms=80.0 # (milliseconds)
ul_owd_delta_thr_ms=80.0 # (milliseconds)

dl_avg_owd_delta_thr_ms=150.0 # (milliseconds)
ul_avg_owd_delta_thr_ms=150.0 # (milliseconds)

So I tolerate a little bloat, but not too much bloat, in order not to sacrifice too much bandwidth.
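To picture why a larger average threshold sacrifices less bandwidth, here is a hedged sketch of how a rate reduction might scale with the average OWD delta. The linear ramp, the function name and the 0.99 and 0.75 end points (expressed per mille) are assumptions for illustration only and are not taken from the cake-autorate source; the one grounded idea is that the maximum reduction applies once the average delta reaches the avg threshold.

```shell
#!/usr/bin/env bash
# bloat_adjust_permille AVG_OWD_DELTA_MS AVG_OWD_DELTA_THR_MS
# Prints the factor (in per mille) to multiply the shaper rate by on bufferbloat.
# Assumed shape: ramps linearly from 990 (gentle) down to 750 (strongest),
# reaching the strongest reduction when the average delta hits the threshold.
bloat_adjust_permille() {
    local avg_delta=$1 avg_thr=$2
    local min_adjust=990 max_adjust=750 # hypothetical end points
    if [ "$avg_delta" -ge "$avg_thr" ]; then
        printf '%d\n' "$max_adjust"
    else
        printf '%d\n' "$(( min_adjust - (min_adjust - max_adjust) * avg_delta / avg_thr ))"
    fi
}

# The same 75 ms of average bloat under two different thresholds:
bloat_adjust_permille 75 80  # prints 765: rate cut to 76.5%
bloat_adjust_permille 75 150 # prints 870: rate cut to only 87.0%
```

Under these assumptions, raising the threshold from 80 ms to 150 ms makes the same amount of bloat trigger a noticeably smaller rate cut.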

Here are some illustrative tests:

  • without cake and cake-autorate:
  • with cake and cake-autorate (using default settings):
  • with cake and cake-autorate (using the above-identified less-aggressive settings):

Nim is another language that might be of interest. It offers modern language features that are much more convenient than C, is somewhat easier to get started with than Rust I think, and compiles to machine code through C as an intermediate, so it should be able to produce embedded-sized binaries on all platforms.

Interesting, yet if I start to play with that I will pick something boring with a predicted long future... Rust, due to it starting to seep into the Linux kernel, is a decent candidate... (C and C++ obviously are as well, but I will only start down this route if it promises some fun, and while I respect C, 'fun' is not the quality I associate with it).

I mean, Nim first appeared in 2008 so it's about twice as old as Rust and about as mature as Python was in 2007 :wink: but I understand it's not something massively popular and on the front page of every programming blog.

It is however kind of targeting embedded and realtime space, and mentions those on its front page.


Hello everyone, on Sunday I will take pictures of my Starlink connection. Do you think this is as effective? I also see this:
cake-autorate provides an installation script that installs all the required tools. To use it:

  • Install SQM (luci-app-sqm) and enable and configure cake Queue Discipline on the interface(s) as described in the OpenWrt SQM documentation

So in SQM, do I set the download and upload values to 0 and let cake-autorate do the work? I'm not sure.
Thank you

I think I will use this for a test, just replacing eth0 with eth1 (so ifb4eth0 becomes ifb4eth1) ...

#!/usr/bin/env bash

# *** STANDARD CONFIGURATION OPTIONS ***

### For multihomed setups, it is the responsibility of the user to ensure that the probes
### sent by this instance of cake-autorate actually travel through these interfaces.
### See ping_extra_args and ping_prefix_string

dl_if=ifb4eth1 # download interface
ul_if=eth1     # upload interface

# Set either of the below to 0 to adjust one direction only
# or alternatively set both to 0 to simply use cake-autorate to monitor a connection
adjust_dl_shaper_rate=1 # enable (1) or disable (0) actually changing the dl shaper rate
adjust_ul_shaper_rate=1 # enable (1) or disable (0) actually changing the ul shaper rate

min_dl_shaper_rate_kbps=10000  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=100000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=200000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=2000  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=10000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=30000  # maximum bandwidth for upload (Kbit/s)

# *** OVERRIDES ***

### See defaults.sh for additional configuration options
### that can be set in this configuration file to override the defaults.
### Place any such overrides below this line.

# owd delta threshold in ms is the extent of OWD increase to classify as a delay
# these are automatically adjusted based on maximum on the wire packet size
# (adjustment significant at sub 12Mbit/s rates, else negligible)
dl_owd_delta_thr_ms=40.0 # (milliseconds)
ul_owd_delta_thr_ms=40.0 # (milliseconds)

# average owd delta threshold in ms at which maximum adjust_down_bufferbloat is applied
dl_avg_owd_delta_thr_ms=80.0 # (milliseconds)
ul_avg_owd_delta_thr_ms=80.0 # (milliseconds)

# Starlink satellite switch (sss) compensation options
sss_compensation=1
# satellite switch compensation start times in seconds of each minute
#sss_times_s=("12.0" "27.0" "42.0" "57.0")
#sss_compensation_pre_duration_ms=300
#sss_compensation_post_duration_ms=200

for you is the max result you are ?

I have set up SQM like this on my Starlink v2. Is this good?

config queue
	option enabled '1'
	option interface 'eth1'
	option download '0'
	option upload '0'
	option debug_logging '0'
	option verbosity '5'
	option qdisc 'cake'
	option script 'layer_cake.qos'
	option linklayer 'none'