CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

Did you make any further progress with this @anon98444528?

The script collection on GitHub seems a bit too expansive for my needs. I'm looking for a simple time-based switch of the settings: from 00:00 to 12:00 (AM period) I have 100 Mbit/s and from 12:00 to 00:00 (PM) I have 70 Mbit/s. So essentially I just want to run a script that changes the download/upload rate in the SQM config.

I'm willing to eat the memory cost of running crond 24/7, but I don't know how to change and then apply the configured rate for SQM. I guess I could just make 2 copies and copy them over the current config with a basic cp.

Looking at the script I see
tc qdisc change root dev $interface cake bandwidth ${shaper_rate_kbps}Kbit 2> /dev/null
command; however, it does not make much sense to me. What does "bandwidth" stand for? Upload? Download? Combined? Does the command only change the value without applying the new settings, or does it apply them instantly?

So shaper rate change is instantaneous. You just set the bandwidth for the appropriate interfaces (check output of 'tc qdisc ls') using that command you correctly identified from the code - one call for each interface.
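
For a purely time-based switch, that tc call could be wrapped in a small script and run from cron. A minimal sketch, assuming the ingress (download) cake instance sits on an interface called ifb4br-wan and using placeholder rates; rate_for_hour, set_rate and DRY_RUN are hypothetical names for illustration, not part of sqm or the autorate script:

```shell
#!/bin/sh
# Sketch only: switch the cake download shaper rate by time of day.
# Interface name and rates are assumptions; check 'tc qdisc ls' for yours.

DL_IF="${DL_IF:-ifb4br-wan}"   # assumed ingress (download) cake instance

# Print the download shaper rate in kbit/s for an hour given as 0-23.
rate_for_hour() {
    if [ "$1" -lt 12 ]; then
        echo 100000   # 00:00-11:59
    else
        echo 70000    # 12:00-23:59
    fi
}

# Apply a rate to one interface. With DRY_RUN=1 (the default in this sketch)
# the tc command is only printed, not executed.
set_rate() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "tc qdisc change root dev $1 cake bandwidth ${2}Kbit"
    else
        tc qdisc change root dev "$1" cake bandwidth "${2}Kbit"
    fi
}

hour=$(date +%H)
hour=${hour#0}                 # strip a leading zero: "08" -> "8", "00" -> "0"
set_rate "$DL_IF" "$(rate_for_hour "$hour")"
```

Saved under a path of your choosing and called with DRY_RUN=0 from two cron entries (one at 00:00, one at 12:00), this would apply the new rate immediately. An upload rate could be handled the same way with a second set_rate call on the egress interface.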

Does your ISP really provide exactly those rates at those times?

So, as I understand it, I don't need to restart any service to apply these changes?
tc qdisc ls gives me my current setup:

tc qdisc ls
qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth0.1 root refcnt 2
qdisc noqueue 0: dev br-guest root refcnt 2
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc cake 8015: dev br-wan root refcnt 2 bandwidth 48Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 44 mpu 84
qdisc ingress ffff: dev br-wan parent ffff:fff1 ----------------
qdisc noqueue 0: dev eth0.2 root refcnt 2
qdisc cake 8016: dev ifb4br-wan root refcnt 2 bandwidth 48Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100.0ms noatm overhead 44 mpu 84
  1. Is ifb4br-wan an "upload interface"?
  2. Why is it different from the download one? Why is there even a separation by interface for the download/upload settings? It seems like an extreme anti-pattern and counter-intuitive.
  3. Is it enough for me to just specify the rate if I want to change the rate only, or do I need to copy the rest of the settings: besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms noatm overhead 44 mpu 84?
  4. What is refcnt 2?
  5. So the final command is
    tc qdisc change root dev br-wan cake bandwidth 68000Kbit 2 > /dev/null for download? What is the 2 for?

I'm not exactly sure about the actual meaning behind the question. These are the plan's conditions, and I assume the limits will be applied at or close to the specified times. A few minutes won't kill me.
If you are asking "does your ISP provide these speeds 100% of the time", then I think the answer is obvious for any ISP in any country. However, I doubt it is going to be as unstable as a mobile network, so some occasional dips are okay/expected (hence I don't think the autorate script is really relevant to my case; it would probably do more harm than good).

Hard to tell from the outside. What does ifstatus wan | grep -e device return?

Because this is how Linux interfaces operate; each interface only allows a qdisc (like cake) to be instantiated on its egress side. So we need to play some tricks to get a qdisc instantiated for incoming/ingress traffic. The trick sqm uses is an intermediate functional block device, IFB for short. The IFB is essentially an abstraction that allows us to redirect all incoming packets from the true ingress interface to it; the IFB then sends these same packets on to the kernel, but that sending happens on the IFB's egress side, where Linux allows us to instantiate a qdisc like cake.
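
The steps described above can be sketched with plain ip and tc commands. This is a simplified illustration, not what sqm literally runs: the interface names and rate are assumptions taken from the output posted earlier, and run, setup_ingress_shaper and DRY_RUN are hypothetical helpers that print the commands by default so nothing is changed on the system:

```shell
#!/bin/sh
# Sketch only: roughly what it takes to shape ingress traffic via an IFB.
# WAN_IF, IFB_IF and RATE are assumptions; sqm sets up more than this.

WAN_IF="${WAN_IF:-br-wan}"
IFB_IF="${IFB_IF:-ifb4br-wan}"
RATE="${RATE:-48Mbit}"

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

setup_ingress_shaper() {
    # 1. Create the IFB and bring it up.
    run ip link add name "$IFB_IF" type ifb
    run ip link set dev "$IFB_IF" up
    # 2. Attach the special ingress qdisc to the real interface.
    run tc qdisc add dev "$WAN_IF" handle ffff: ingress
    # 3. Redirect every incoming packet to the IFB.
    run tc filter add dev "$WAN_IF" parent ffff: protocol all prio 10 \
        u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev "$IFB_IF"
    # 4. Shape on the IFB's egress side, where a qdisc like cake is allowed.
    run tc qdisc add dev "$IFB_IF" root cake bandwidth "$RATE"
}

setup_ingress_shaper
```

The ingress qdisc in step 2 is the "qdisc ingress ffff:" entry visible in the tc qdisc ls output, and the cake instance from step 4 is the one listed on the ifb4br-wan device.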

You can call this an anti-pattern if you will, as well as counter-intuitive (I agree with the latter), but that is a discussion you need to have with the upstream Linux kernel network maintainers; sqm is in no real position to change that.

IIRC just specifying the rate is enough, after all you instruct a change and you specify what to change....

That means reference count 2, which probably means two other objects hold references to the qdisc, so it probably should not be deleted without first taking care of those dependencies.

The important part is 2> /dev/null, which is described here; in short, it will silence any error output from that command.
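
A small demonstration of why the spacing matters, as a sketch that is safe to run anywhere; probe is a hypothetical helper defined only for this example:

```shell
#!/bin/sh
# Sketch: why the spacing in "2> /dev/null" matters.

# Hypothetical helper: prints its argument count to stdout and a note to stderr.
probe() {
    echo "$#"
    echo "note on stderr" >&2
}

tmp=$(mktemp)

# "2>" (no space) redirects file descriptor 2, i.e. stderr; stdout is untouched
# and probe sees one argument.
quiet=$(probe a 2> /dev/null)

# "2 >" (with a space) passes "2" as an extra argument and redirects stdout
# instead. The trailing 2> /dev/null only keeps this demo's stderr note quiet.
probe a 2 > "$tmp" 2> /dev/null
spaced=$(cat "$tmp")
rm -f "$tmp"

echo "quiet=$quiet spaced=$spaced"   # prints: quiet=1 spaced=2
```

So in the tc command the 2 belongs to the redirection, not to the bandwidth argument, as long as it is written directly in front of the > sign.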

ifstatus wan | grep -e device
        "l3_device": "br-wan",
        "device": "br-wan",

I know what /dev/null redirection means; I just thought the 2 was part of the previous command's arguments.

Thanks, that does not help much, but it seems to imply that ifb4br-wan is your ingress/download interface while br-wan is your egress/upload interface.
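
To see at a glance which cake instances exist and what rate each is currently set to, the tc qdisc ls output can be filtered. A minimal sketch; list_cake is a hypothetical helper, and the sample text is abridged from the output posted above so the example runs without a router:

```shell
#!/bin/sh
# Sketch: list each cake instance and its current shaper rate.

# Reads 'tc qdisc ls' style text on stdin, prints "<device> <bandwidth>".
list_cake() {
    awk '$1 == "qdisc" && $2 == "cake" {
        dev = ""; bw = ""
        for (i = 3; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "bandwidth") bw = $(i + 1)
        }
        print dev, bw
    }'
}

# Abridged sample input; on a router you would run:  tc qdisc ls | list_cake
sample='qdisc noqueue 0: dev lo root refcnt 2
qdisc cake 8015: dev br-wan root refcnt 2 bandwidth 48Mbit besteffort triple-isolate nonat nowash
qdisc ingress ffff: dev br-wan parent ffff:fff1 ----------------
qdisc cake 8016: dev ifb4br-wan root refcnt 2 bandwidth 48Mbit besteffort triple-isolate nonat wash'

printf '%s\n' "$sample" | list_cake
```

With the sample above this prints one line for br-wan (egress/upload) and one for ifb4br-wan (ingress/download), each followed by 48Mbit.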

No. A combination of other interests taking priority, wanting to see how the AQL/ATF debugging turns out, and my initial inspection indicating that sch_arl will likely need changes for my intended use will make this slow going. As I mentioned above, measuring RTT may not be what I'm looking for - tweaking how a wifi AP adjusts rates may be where I end up. I can already demonstrate that by slowing clients down a bit, I seem to fix some issues.

If I was to go forward with passive measurement of RTTs, I think I would use pping to make an assessment and then look at how to implement something closer to pping in sch_arl, i.e. pping apparently can measure RTT at arbitrary "capture" points while sch_arl seems to be built for an end point.

If you're interested in moving sch_arl forward, don't wait for me - I could take years (and never finish).

HTH

I haven't had much time to test the latest version of the Starlink autorate code (though I am running it), but I did spend a little time looking in more detail at my previous flent runs and found a couple of interesting things. I feel like this information could better inform how the autorate script is set up specifically for Starlink, although I'm not exactly sure what to change to improve it.

My observations are:

Starlink's download bandwidth fluctuates wildly, going from < 20 Mbps to > 120 Mbps in a matter of seconds. We already knew this. But looking at these flent graphs it is very obvious that the download bandwidth moves inversely to the ping latency. I'll have some examples where they are practically a mirror image of each other. This means there should be a lot of improvement from the autorate script, if the correct parameters are dialed in.

I had already found that sudden changes in latency occurred at those 15 second Starlink transitions, but flent makes it clear that sudden changes in download bandwidth also occur at those same increments. There are some 15 second windows where the average download bandwidth is over 100 Mbps but the latency is fairly low. There are also 15 second windows where the average download bandwidth is < 10 Mbps and the latency is spiking to more than 300 ms.

It makes me wonder if, for Starlink, basically treating each 15 second block as a separate autorate period would be the way to go. Reset bandwidth back to the base shaper rate at the beginning of each one and be prepared to throttle back hard if the latency starts to spike? Not sure.

Of course then there is the question, as @dtaht brought up, how does Starlink decide how much bandwidth to give you in the next 15 second cycle (I'm making an assumption that it decides that every 15 seconds but that's just an assumption)? There likely is a fairness algorithm; will the autorate interfere with that?

Anyway, here are some graphs so you can see what I'm talking about. This was done with no autorate so that doesn't interfere, it is just using fixed CAKE with 200/30 down/up rate set so there shouldn't be many packets CAKE is dropping.

Thanks a lot @gba. Hopefully others can give their thoughts on the extra and helpful looking data you have provided.

So how about in addition to setting the upload shaper rate to minimum, we set the download shaper rate to the base rate? The base rate is intended to be the safe harbour bandwidth. It's not the minimum and is what we always decay back to absent load. So I think it makes sense to go back to base given your indication that bandwidth seems to kind of reset.

@moeller0 given what @gba states above this seems like the best we can get, or?

That could be achieved by just adding in the appropriate case statement:

case $load_condition in

	# Starlink satellite switching compensation

	dl*sss)
		shaper_rate_kbps=$base_shaper_rate_kbps
	;;

	ul*sss)
		shaper_rate_kbps=$min_shaper_rate_kbps
	;;

	# (other load conditions omitted)

esac
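
To illustrate how the two glob patterns pick a rate, here is a self-contained sketch. The load_condition strings passed in (such as dl_high_sss) and the helper sss_rate are hypothetical and only serve the example; the rates are placeholders:

```shell
#!/bin/sh
# Sketch only: how the glob patterns in the case statement select a rate.
# The load_condition strings used below (e.g. "dl_high_sss") are hypothetical.

min_shaper_rate_kbps=1000
base_shaper_rate_kbps=2500

# Print the shaper rate to use at a satellite switch, or nothing if the
# load condition is not a satellite-switch one.
sss_rate() {
    case $1 in
        dl*sss) echo "$base_shaper_rate_kbps" ;;  # download: back to base
        ul*sss) echo "$min_shaper_rate_kbps" ;;   # upload: down to minimum
    esac
}

sss_rate dl_high_sss   # prints 2500
sss_rate ul_low_sss    # prints 1000
sss_rate dl_high       # prints nothing: not a satellite switch condition
```

Any condition that starts with dl and ends in sss falls back to the base rate, any that starts with ul and ends in sss drops to the minimum, and everything else is left to the other branches.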

I've got a question. If you enable sqm on upload only, does hardware offloading work on download?

As far as I can tell sqm does not interfere with "hardware offloading" at all, rather the opposite, packets being handled by hardware offloading will not be visible to and hence not be handled by sqm.

@gba any updates? Would you like me to implement the change above regarding setting download rate to base rate in addition to setting upload rate to minimum on satellite switch times?

Unfortunately I haven't had much time to do testing this week, although I've been running with it and I haven't noticed any issues. If you want to make that change I think it would be a good one to test, I think I would probably be able to run a few tests this week comparing the two versions.

OK, ready with this commit:

I used my free day to deploy some cake today and I must say it works.

What I haven't figured out yet is how exactly sqm and cake autorate "work" together. I can already see improvements regarding bufferbloat when running sqm only. Also I'm not sure how the "base" settings I have in sqm influence cake autorate.

My bufferbloat rating without sqm and/or cake autorate is typically between C and F

With sqm and/or cake autorate it is usually rated A or A+


But let's go in to detail of my settings quickly:
[3 images: SQM settings screenshots]

I'm especially unsure about the last setting. My uplink is LTE and depends heavily on the weather. Now in the dry season we are blessed with something like 10 Mbit/s down and something like 20 Mbit/s upstream.

In particular, how do these settings influence the cake autorate config, which as a first try looks as follows:

min_dl_shaper_rate_kbps=1000  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=2500 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=10000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=1000  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=2500 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=10000  # maximum bandwidth for upload (Kbit/s)

When running the script I can see how it toggles between idle, low and high.

The responsiveness of "normal" web :surfing_man: increased, but at the same time streaming video now tends to be interrupted by buffering, which wasn't the case before.

Guess I need to tune the minimum bandwidth a bit upwards. Also any other suggestions?

So autorate is taking over the task of setting the shaper rate for (up to two) cake instances, sqm itself is only used for setting up the rest, like the IFB for ingress traffic and hotplug handling. autorate does not really need to work with sqm, all it needs are two cake instances (sqm is just one way of setting that up conveniently within OpenWrt). But autorate will not touch any setting other than the shaper rates so the "base settings" will still fully control cake's behaviour in their respective feature-dimensions.
If your link is more or less fixed rate, you can get away with just configuring static shaper rates in SQM and you are set. If, however, your link is more variable (be it because it is wireless and link speed varies with atmospheric/RF noise conditions, or because in a shared segment like DOCSIS or LTE the load generated by other users affects the capacity share available to your link), then autorate can really save the day.

These then would be decent candidates for your max_XX_shaper_rate_kbps settings if you know that is the maximum you will ever get.

Probably. In a sense, the min_XX_shaper_rate_kbps settings should be understood as "below this rate I value throughput over low latency". If your link routinely drops down to the minimum, that might explain your streaming problems.
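
One way to picture the role of the min and max settings: whatever rate the controller would like to set, the result stays inside the configured range. A minimal sketch of that idea, using the values from the config posted above; clamp_rate is a hypothetical helper and not taken from the autorate script:

```shell
#!/bin/sh
# Sketch only (not the autorate code): keep a proposed rate within [min, max].

min_dl_shaper_rate_kbps=1000
max_dl_shaper_rate_kbps=10000

# clamp_rate <proposed> <min> <max>: print the proposed rate limited to the range.
clamp_rate() {
    if [ "$1" -lt "$2" ]; then
        echo "$2"
    elif [ "$1" -gt "$3" ]; then
        echo "$3"
    else
        echo "$1"
    fi
}

clamp_rate 800   "$min_dl_shaper_rate_kbps" "$max_dl_shaper_rate_kbps"   # prints 1000
clamp_rate 4200  "$min_dl_shaper_rate_kbps" "$max_dl_shaper_rate_kbps"   # prints 4200
clamp_rate 20000 "$min_dl_shaper_rate_kbps" "$max_dl_shaper_rate_kbps"   # prints 10000
```

With a floor of 1000 kbit/s, a link that really only delivers 800 kbit/s would still be shaped at 1000, which is where latency is traded for throughput.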

There are also other parameters to potentially fine tune, the first of which would be the threshold parameter.

I just set up a new OpenWrt installation for our neighbours who also have LTE (currently experiencing cell tower issues) with rates around 1-10 Mbit/s, and the bash CAKE-autorate just worked rather well out of the box with no adjustment of the defaults other than the min, base and max rates. They were experiencing latencies of up to 2000 ms and now latency is kept well in check at circa 50-70 ms despite the capacity fluctuation. It was the first time I tried installing this tool on something other than my own setup.

@MangoMan if you set:

output_processing_stats=1

and grab some data during a speed test we can have a look and see if it's doing the right thing and whether the defaults look like they can be improved.

Earlier on in this thread @moeller0 gives a great way to test RTT to help with verifying a good delta to work with for delays.

incoming :train2:

with these settings:

min_dl_shaper_rate_kbps=2500  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=3500 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=12000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=2500  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=3500 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=14000  # maximum bandwidth for upload (Kbit/s)

and doing a libre speed test :running_man:

[image: LibreSpeed test result]

And the log output can be found in this paste :point_left:

That makes sense and is probably needed when doing things like video streaming in my case. I have now raised it from 1 Mbit/s to 2.5 Mbit/s, which looks like a compromise to me at this stage.

I have also matched the sqm maximum settings to the cake autorate max values now:

[image: SQM settings]

Does anyone want to drop a line about the "Link Layer Adaptation" settings in sqm?

[image: SQM Link Layer Adaptation settings]

The wiki doesn't mention what values would be "good" for an LTE connection :thinking: