CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

I've been running cake-autorate on an EdgeRouter X running OpenWrt for maybe a year? I'm currently on 22.03.3. I think I tried getting it to work on EdgeOS at the very beginning but had some reason to switch to OpenWrt; I can't remember now what the problem was, though.

Ah sweet - @moeller0 to the rescue - I see what you mean now. So actually we should fold the dl and ul baseline comparisons with their minimums into a single comparison of the combined (summed) baseline with its minimum.

Should we still consider the ewma of the deltas separately? I don't think those will be affected by the rollover issue.

Have not thought about this yet, and I do not have enough brain cycles left today to think over it :wink:

Hey @gba are you using a fairly recent version of cake-autorate now? How's performance these days on Starlink? Could you share your latest config since I want to update the readme for Starlink users. Are you bothering with the satellite switching compensation? We never truly bottomed that out. Maybe that should be deleted if it's not offering any benefit now.

You might like to try the tsping iteration since it uses OWDs and will give higher performance for mixed download and upload. On the other hand, I'd understand: if it ain't broke, don't fix it!

I think we should reconstitute the RTT for all delay sources and operate on the RTTs here (so one method works for true OWD sources like tsping and fake OWD (really RTT/2) sources like fping)...

I think the deltas should be immune, after all we remove the constant (changing) baseline exactly so we look at something more stable... BTW I am still amazed how well that heuristic works :wink:

Which, the baseline one? That does work well. The reflector convergence in general seems to work well too. I guess the fundamentals of cake-autorate haven't really changed very much for some time now. Although of course actually using OWDs is a big change.

Nah, mostly just looking at the deltas to abstract over path differences between different reflectors... (not that the replacement heuristic does not also seem to do the right thing.)
I am slowly warming to your idea about trying to make the shaper change somewhat proportional to the amount of delay increase... (one precondition is that all/most reflectors show pretty similar deltaDelay behaviour/CDFs).

Yes, that is quite a big item! Now we need to get tsping integrated as normal part of the OpenWrt packages :wink: (after some more testing that is...)

I have been running the development commit from March 6 since then, and I have continued to use the Starlink compensation functionality, although I haven't done enough testing to really determine how useful it is. Unfortunately (or fortunately?) I haven't really had many loads lately that are very latency sensitive, so cake-autorate probably hasn't been strictly necessary for me. Case in point, I just logged in to check cake-autorate after not having done so in perhaps a month and I see that the script isn't running right -- it got in that state where all the pingers weren't working but the script was continuing to run. I have no idea how long it had been doing that, so I may not have actually been using cake-autorate for some time.

I just upgraded to the latest version and will monitor over the next couple of days to make sure it stays stable. I'm just running fping for now. I plan on switching to tsping once it has a package in OpenWrt, but I don't have time to compile it myself at the moment.

Thanks for your work!

I guess one thing I notice right away is that I see in the changelog "significantly reduced CPU consumption" so I wanted to check that. I did a speed test while cake-autorate is running on the new version.

When the download is going and gets up to about 75 Mbps, my router's CPU is pretty close to pegged. I can see in htop that all 4 cores are around 85-95%. This is on the EdgeRouter X, which isn't a super fast modern router by any means. But this is mostly cake-autorate, since if I stop cake-autorate and do the speed test, I'm getting around 85 Mbps down with 20-40% CPU usage.

Do you have any suggestions of anything to try to adjust? My config is pretty straightforward right now:

#!/bin/bash

# *** STANDARD CONFIGURATION OPTIONS ***

### For multihomed setups, it is the responsibility of the user to ensure that the probes
### sent by this instance of cake-autorate actually travel through these interfaces.
### See ping_extra_args and ping_prefix_string

dl_if=ifb4eth0 # download interface
ul_if=eth0     # upload interface

# Set either of the below to 0 to adjust one direction only
# or alternatively set both to 0 to simply use cake-autorate to monitor a connection
adjust_dl_shaper_rate=1 # enable (1) or disable (0) actually changing the dl shaper rate
adjust_ul_shaper_rate=1 # enable (1) or disable (0) actually changing the ul shaper rate

min_dl_shaper_rate_kbps=10000  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=100000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=200000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=2000  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=10000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=30000  # maximum bandwidth for upload (Kbit/s)


dl_delay_thr_ms=40 # (milliseconds)
ul_delay_thr_ms=40 # (milliseconds)


sss_compensation=1


config_file_check="cake-autorate"

I don't remember the CPU going so high on the older version but I'm not 100% sure, maybe it always did.

Does that use:

MediaTek MT7621A Wi-Fi SoC contains a powerful 880 MHz MIPS® 1004KEc™ dual-core CPU

https://www.mediatek.com/products/home-networking/mt7621

And is that with the scaling governor set to 'performance' and with 'irqbalance' installed and enabled? The former ensures that the CPU is not being scaled down (which would render CPU utilisation statistics not so meaningful), whilst the latter supposedly spreads interrupt handling out over cores.

I would have thought that the CPU utilisation of cake-autorate shouldn't scale with download rate because the computation relates to processing ping responses, the rate of which is not altered by downloading or not downloading.

Calls to 'tc' may well be excessive and there would be more calls to that associated with rates increasing. If you set 'adjust_dl_shaper_rate' and 'adjust_ul_shaper_rate' to 0, does that alter the CPU usage significantly? If so we should limit the rate of 'tc' calls.

Otherwise, the easiest way to reduce CPU utilisation of cake-autorate is simply to reduce the frequency of ICMP responses. The default is 20Hz (6 pingers with 0.3s spacing each).

Maybe @moeller0 has some other ideas.
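
On the idea above of limiting the rate of 'tc' calls, here is a minimal sketch of one way it could be done. It is not how cake-autorate does it: the function name, the state variables and the 500 ms default are all made up for illustration, the caller is assumed to supply the current time in milliseconds, and the sketch only prints the 'tc' command rather than running it.

```bash
# Sketch only: maybe_set_shaper_rate and its state variables are hypothetical.
last_rate_kbps=0
last_change_ms=0

maybe_set_shaper_rate() {
    local interface=$1 rate_kbps=$2 now_ms=$3 min_interval_ms=${4:-500}

    # Skip the call when the requested rate is unchanged
    if (( rate_kbps == last_rate_kbps )); then
        return 1
    fi
    # Skip the call when the previous change was too recent
    if (( now_ms - last_change_ms < min_interval_ms )); then
        return 1
    fi

    # Printed rather than executed, so the sketch is safe to run anywhere
    echo "tc qdisc change root dev ${interface} cake bandwidth ${rate_kbps}Kbit"
    last_rate_kbps=${rate_kbps}
    last_change_ms=${now_ms}
}
```

A return status of 1 means the call was suppressed, so the caller can simply retry with the then-current target rate on a later tick.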

Yes this is MediaTek MT7621. irqbalance is enabled although I've tried it both ways and it seems similar. It doesn't seem like the cpu governor can be changed on this CPU unless I'm missing it. Shows it is set to "teo".

I tried setting 'adjust_dl_shaper_rate' and 'adjust_ul_shaper_rate' both to 0 but that doesn't change the CPU much. I also tried disabling logging, which made perhaps a small difference but not much.

It looks like the trouble is with the frequency of pingers as if I set "reflector_ping_interval_s=10.0" (just for testing obviously) then the CPU is significantly reduced.

I'm not sure that the CPU utilization of cake-autorate is actually scaling with the download rate but it might just be adding on to the router's own CPU utilization for cake itself which does scale with the download rate. When my network is quiet but pingers are not idle then the CPU utilization on all 4 cores is around 30%. When cake-autorate is stopped it is about 1%. So cake-autorate just doing the pingers takes about 30% and then when I do the download cake itself adds 20-40%, and with process swapping that probably can explain my results.

So then I guess the question should be whether I should adjust the ping frequency. My performance is certainly acceptable, but it isn't ideal having the router's CPU bump up around its maximum, as that in and of itself could probably introduce latency.

I've just implemented the switch over to the sum of OWD baselines with this commit:

New REFLECTOR header and example data:

REFLECTOR_HEADER LOG_DATETIME LOG_TIMESTAMP PROC_TIME_US REFLECTOR MIN_SUM_OWD_BASELINES_US SUM_OWD_BASELINES_US SUM_OWD_BASELINES_DELTA_US SUM_OWD_BASELINES_DELTA_THR_US MIN_DL_DELTA_EWMA_US DL_DELTA_EWMA_US DL_DELTA_EWMA_DELTA_US DL_DELTA_EWMA_DELTA_THR MIN_UL_DELTA_EWMA_US UL_DELTA_EWMA_US UL_DELTA_EWMA_DELTA_US UL_DELTA_EWMA_DELTA_THR
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 9.9.9.9 45151 45164 13 20000 553 553 0 10000 7955 8860 905 10000
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 9.9.9.10 45151 45151 0 20000 553 830 277 10000 7955 7955 0 10000
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 9.9.9.11 45151 45516 365 20000 553 572 19 10000 7955 8669 714 10000
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 94.140.14.15 45151 51597 6446 20000 553 1648 1095 10000 7955 10885 2930 10000
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 94.140.14.140 45151 51709 6558 20000 553 1068 515 10000 7955 8512 557 10000
REFLECTOR 2023-05-09-19:37:32 1683657452 1683657452 94.140.14.141 45151 52180 7029 20000 553 1567 1014 10000 7955 9494 1539 10000

@gba it seems apt to test with a reduced ping frequency. You could simply reduce the default number of pingers from 6 to 3 or keep them the same and increase the interval from 0.3s to 0.6s.
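
For reference, the aggregate ping rate is just the number of pingers divided by the per-pinger interval. A quick sketch of the arithmetic follows; the helper name is made up, and the interval is passed in milliseconds purely so that it stays within bash integer arithmetic.

```bash
# Hypothetical helper (not part of cake-autorate): aggregate ping rate in Hz.
aggregate_ping_rate_hz() {
    local pingers=$1 interval_ms=$2
    echo $(( pingers * 1000 / interval_ms ))
}

aggregate_ping_rate_hz 6 300   # default: 6 pingers, 0.3s spacing
aggregate_ping_rate_hz 3 300   # halve the number of pingers
aggregate_ping_rate_hz 6 600   # or keep 6 pingers and double the interval
```

This prints 20, 10 and 10, so either change halves the number of ICMP responses to process each second.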

Would this require TSping to be installed in order for it to work?

No, we have retained backwards compatibility with both fping and iputils-ping.

But tsping is better because it allows one way delays (OWDs) to be determined, facilitating better performance.

You might like to give it a try.

tsping is available here:

And to use it:

These MIPSen are getting long in the tooth... sure, it's a dual core with 4 threads, but each individual thread simply is not that powerful. It shows that the MIPS cores themselves have not seen major rearchitecting in the last decade; IIUC 1004K is really only little more than 24K with SMT and SMP.

Yes, I guess that is to be expected from your CPU. I am amazed that cake only adds 40%; I would have expected cake @100Mbps to fully max out one CPU...

I would consider switching routers :wink: (I understand that that might not be a realistic option).

BTW, could you do a test with the old version again just to check whether current cake-autorate got leaner on your hardware as well?

Presumably Starlink warrants something more powerful like an RPi4.

I like what @schubsi did with putting it into a drawer.

For me that would definitely help with the Wife Acceptance Factor (WAF).

Agreed! Let me find the best commit.

Here are the files with the commit right before the huge restructure to use FIFOs rather than temporary files:

@moeller0 does this wording seem OK to you:

# tsping employs ICMP type 13 and works with timestamps: Originate, Received, Transmit, Finished such that:
# dl_owd_us = Finished - Transmit
# ul_owd_us = Received - Originate
#
# The timestamps relate to milliseconds past midnight and hence timestamps can rollover at the local or remote ends,
# and the rollover may not be entirely synchronized.
#
# Such an event would result in a huge spike in dl_owd_us or ul_owd_us and a large delta relative to the baseline.
#
# So, to compensate, check for delta > 50 mins and immediately reset the baselines to the new dl_owd_us and ul_owd_us.
#
# Happily, the sum of dl_owd_baseline_us and ul_owd_baseline_us will roughly equal rtt_baseline_us.
# And since Transmit is approximately equal to Received, RTT is approximately equal to Finished - Originate.
# And thus the sum of dl_owd_baseline_us and ul_owd_baseline_us should not be affected.
# Therefore in maintain_pingers(), working with the sum of dl_owd_baseline_us and ul_owd_baseline_us is
# unaffected by the above-described rollover compensation.
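
To make the arithmetic in that comment concrete, here is a small sketch. It is not the cake-autorate code: the function names are made up, values are kept in milliseconds (rather than the microseconds the script uses) for readability, and treating a large delta of either sign as a suspected rollover is my assumption. The first sample uses the four timestamps from the REFLECTOR_RESPONSE line quoted further down the thread; the second sample is invented to show a remote clock that has rolled over while the local one has not.

```bash
# Sketch only; function names are hypothetical. Values are milliseconds
# past midnight, as carried in ICMP timestamp (type 13/14) messages.
rollover_thr_ms=$(( 50 * 60 * 1000 ))   # the 50 minute threshold

compute_owds() {
    local originate=$1 received=$2 transmit=$3 finished=$4
    dl_owd_ms=$(( finished - transmit ))
    ul_owd_ms=$(( received - originate ))
    sum_owd_ms=$(( dl_owd_ms + ul_owd_ms ))
}

rollover_suspected() {
    local delta=$(( $1 - $2 ))   # OWD minus its baseline
    # Assumption: a huge delta of either sign is treated as a rollover
    if (( delta < 0 )); then delta=$(( -delta )); fi
    (( delta > rollover_thr_ms ))
}

# Normal sample
compute_owds 44911330 44911352 44911352 44911376
echo "dl=${dl_owd_ms} ul=${ul_owd_ms} sum=${sum_owd_ms}"

# Invented sample: the remote clock has rolled over, the local clock has not
compute_owds 86399990 2 2 86399999
echo "dl=${dl_owd_ms} ul=${ul_owd_ms} sum=${sum_owd_ms}"
```

In the second sample each individual OWD is off by roughly a day in opposite directions, so both would trip the 50 minute check, while their sum stays at a few milliseconds.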

@anon58727419 made the point on GitHub that I should really comment these changes because in the future we might forget what on earth they relate to.

"past midnight UTC" so conceptually the clocks should be reasonably in sync...

However in reality it seems not uncommon to return local time past midnight, so roll-overs can truly happen any time... (nobody is going to fix their ICMP type 13/14 responses no matter what is documented, so we should accept what we get and not be overly fussy :wink: )

Maybe add "... for the reflector recycling check..." or similar?

Yes, adding meaningful commit messages is a great idea :wink:

I just realised that the latest cake-autorate code suffers from a flaw, and I have already thought through and coded up an attempted fix.

With monitor_achieved_rates() and maintain_pingers(), for each loop we:

  • read from the fd pointing to an anonymous FIFO for that process with timeout set based on time until next tick
  • process any IPC command sent from other processes
  • process the regular commands for that tick on timeout

This suffered from a significant flaw in that a timeout could result in a partial read and hence data loss in a command sent from one process to another. Mostly that's fine since the data is non-critical, but some data like CHANGE_STATE is critical, e.g. to start the pingers*.

Another flaw relates to the timing given the use of the timeouts - namely I'm not sure that the tick intervals were always correct.

So I think a better approach is as follows.

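Firstly, consider the bash "read -t 0" functionality: with a timeout of 0, read returns immediately without trying to read any data, and its exit status indicates whether input is available on the specified file descriptor.

This allows a determination to be made as to whether there is data in the FIFO ready to be read.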

Making use of this, I propose the following new sequence for each loop:

  • read with timeout of 0 to see whether data exists in the FIFO
  • whilst data exists in the FIFO, read it in and process each outstanding IPC command
  • once data no longer exists in the FIFO, process the regular commands for that tick
  • sleep until the next tick

This modified approach (which I am testing here) seems to benefit from slightly reduced CPU use as well.
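
A minimal sketch of that loop shape follows. It is not the actual cake-autorate code: handle_command, drain_fifo and run_ticks are hypothetical names, the handler just records what it received, and a fixed sleep stands in for "sleep until the next tick". It also assumes each command arrives as one complete line, since a partial line would make the inner read block until the newline turns up.

```bash
# Sketch only; handle_command, drain_fifo and run_ticks are hypothetical names.
processed=()

handle_command() {
    # Placeholder: a real handler would dispatch on the command name
    processed+=("$*")
}

drain_fifo() {
    local fd=$1 line
    # read -t 0 reads nothing; it only reports whether input is waiting on fd
    while read -t 0 -u "${fd}"; do
        # Blocking read of one whole line, so no command is cut short by a timeout
        IFS= read -r -u "${fd}" line || break
        handle_command "${line}"
    done
}

run_ticks() {
    local fd=$1 ticks=$2 interval_s=$3 tick
    for (( tick = 0; tick < ticks; tick++ )); do
        drain_fifo "${fd}"       # process every outstanding IPC command
        : "regular per-tick work would go here"
        sleep "${interval_s}"    # simplification: fixed sleep, not time to next tick
    done
}
```

To try it, open a FIFO read-write on a file descriptor (for example `mkfifo /tmp/ipc; exec {fd}<> /tmp/ipc`), write a line such as `CHANGE_STATE RUNNING` to it, and call `run_ticks "${fd}" 2 0.1`.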

*) This could result in a bad situation in which when moving from IDLE to RUNNING the pingers didn't start and the code then went into STALL. And during STALL we just wait for new ping responses or increased load. And increased load would transition from STALL to RUNNING, but then just STALL again given no ping responses.

Does this all seem reasonable to you @moeller0?

I also just realised that if I kill 'tsping' and then start it again, I can end up with a partial line write from a previous line write getting mixed in with a new line write, e.g. see double timestamp in the following:

DEBUG; 2023-05-16-13:28:31; 1684240111.379680; REFLECTOR_RESPONSE 1684240105.316459 1684240111.376439 94.140.15.15 0 44911330 44911352 44911352 44911376 46 24 22

but I can address this appropriately.