CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

Great, thanks for your work on this! I updated to the latest commit and your sleep fix definitely made a difference. Now when it gets above 100 Mbps down, one core is still at 90-100% CPU but the other 3 are more like 60-70%, so there is a little breathing room on my little router. The Waveform test is sometimes showing above 130 Mbps, which I didn't see before with cake-autorate running, although once it settled down it ended up right around 100 Mbps download. I'll have to try it earlier in the morning sometime when Starlink tends to be a little faster, but of course 100 Mbps is not a problem at all if we can keep latency down. Previously it seemed like the CPU was limiting the speed when cake-autorate was running, but now I think I might be getting the full bandwidth.

I'm still using fping. I'd like to try tsping once it is in the OpenWrt package repository.

I've just been letting cake-autorate run so in the latest run it had been going for 11 days before it got in that weird state. Do you think I should be restarting it every day?

Cool. Is that with the default 20Hz ping frequency (or a reduced frequency)?

Low-latency 100 Mbit/s is surely all anyone needs. I'd be delighted with that, but my 4G tops out at 80 Mbit/s at the absolute theoretical maximum without carrier aggregation.

You could radically cut down CPU use by setting:

no_pingers=4 # number of pingers to maintain 

And to work well with this:

bufferbloat_detection_window=4   # number of samples to retain in detection window
bufferbloat_detection_thr=2      # number of delayed samples for bufferbloat detection
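As a minimal sketch of how I understand these two settings interacting (the counting logic below is my assumption, not the actual implementation):

```shell
# Assumed logic: bufferbloat is flagged when at least
# bufferbloat_detection_thr of the last bufferbloat_detection_window
# delay samples were classified as delayed.
bufferbloat_detection_window=4
bufferbloat_detection_thr=2
delayed_in_window=2   # hypothetical count over the last 4 samples
if [ "${delayed_in_window}" -ge "${bufferbloat_detection_thr}" ]; then
    echo "bufferbloat_detected"
fi
```

So with 4 pingers and a window of 4, two delayed samples out of the last four are enough to trigger.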

Regarding stability, cake-autorate should, of course, just keep running in the background and do whatever it needs to do to manage the pingers and keep going. Between my latest round of testing and fixes, and your help with testing, we can hopefully iron out any remaining issues.

And so with respect to:

Please don't - please keep checking that all is well and let me know if it goes into a bad state, hopefully with a log covering the period. If you have spare router memory, please significantly increase the log duration to as long as possible subject to memory, keeping in mind that the maximum memory footprint will be 2x the maximum size set, given rotation and retention of the previous .old log file (or perhaps you can use USB?). I have set:

log_file_max_time_mins=60  # maximum time between log file rotations
log_file_max_size_KB=50000  # maximum KB (i.e. bytes/1024) worth of log lines between log file rotations
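To spell out the worst-case footprint mentioned above (a sketch; the 2x factor comes from rotation retaining the previous .old file alongside the live log):

```shell
# Worst-case log footprint: the live log plus the rotated .old copy,
# each up to log_file_max_size_KB.
log_file_max_size_KB=50000
max_footprint_KB=$((2 * log_file_max_size_KB))
echo "${max_footprint_KB}"   # 100000 KB, i.e. roughly 98 MB
```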

However, for the long term, we can as a failsafe have the procd service respawn cake-autorate instance(s) on failure. But of course that's no good for when we are trying to identify and iron out any issues.

I've made an attempt to address it here: https://github.com/Lochnair/tsping/commit/6ac8401cd906c439d990666cb57d827288dfc92a

Not sure if this is a correct way to handle it though.

Excellent! Also, we haven't quite bottomed out the timing discussion yet.

As I understand it, right now tsping adopts the following approach:

But after the last reflector ping, rather than just sleep, we have target + sleep. I think it should just be sleep if we keep the existing approach - as I wrote here:

As @patrakov identified on GitHub:

the present implementation is arguably broken if one pinger is used - giving an infinitesimal spacing.

As I wrote earlier, I like the fping approach.

Regarding time interval, fping uses:

-p, --period=MSEC

In looping or counting modes (-l, -c, or -C), this parameter sets the time in milliseconds that fping waits between successive packets to an individual target. Default is 1000 and minimum is 10.

-i, --interval=MSEC

The minimum amount of time (in milliseconds) between sending a ping packet to any target (default is 10, minimum is 1).

So with:

fping ${ping_extra_args} --timestamp --loop --period "${reflector_ping_interval_ms}" --interval "${ping_response_interval_ms}" --timeout 10000 "${reflectors[@]:0:${no_pingers}}"

as I understand it, this means that ${reflector_ping_interval_ms} separates ping sends to a particular target, and ${ping_response_interval_ms} separates ping sends to any target.
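As a quick sanity check of how those two knobs relate (my understanding, with hypothetical numbers rather than the cake-autorate defaults): evenly staggering sends across targets means the any-target interval is simply the per-target period divided by the number of pingers.

```shell
# Hypothetical numbers: 4 pingers with an 800 ms per-target period
# (--period) give an evenly staggered 200 ms any-target spacing
# (--interval), i.e. period / no_pingers.
no_pingers=4
reflector_ping_interval_ms=800
ping_response_interval_ms=$((reflector_ping_interval_ms / no_pingers))
echo "${ping_response_interval_ms} ms between sends to any target"
```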


Right now I have coded in cake-autorate the following.

For fping:

${ping_prefix_string} fping ${ping_extra_args} --timestamp --loop --period "${reflector_ping_interval_ms}" --interval "${ping_response_interval_ms}" --timeout 10000 "${reflectors[@]:0:${no_pingers}}" 2> /dev/null >&"${parse_preprocessor_fd}" &

For tsping:

${ping_prefix_string} tsping ${ping_extra_args} --print-timestamps --machine-readable=' ' --sleep-time "0" --target-spacing "${ping_response_interval_ms}" "${reflectors[@]:0:${no_pingers}}" 2>/dev/null >&"${parse_preprocessor_fd}" &

Still working, I haven't restarted. And yes, the around 100 Mbps download speed is with the default pinger frequency. I will note for you that I disabled the Starlink compensation feature for now as I noticed that the upload speed is better without it on, and Starlink may have been making changes as there doesn't seem to be as much bufferbloat in the upload direction anymore. Perhaps once I am using tsping I will experiment with this more.


How about now? My tsping variant has been running 24/7 for quite some time now. All seems well so far with the code on the 'master' branch right now.

And how is CPU use on your EdgeRouter X with the MT7621A at max throughput on your Starlink connection?

On my RT3200 with the MT7622BV at max throughput on my LTE connection, I think there are still plenty of spare CPU cycles.

Here is an htop snapshot under saturating load, at around the worst loading I could generate with many concurrent speed tests (I switched the CPU frequency governor from 'ondemand' to 'performance' right before the speed tests):

That's not too bad right?

It seems that we could potentially reduce CPU cycles a little further by sending data between processes in, say, 250-character chunks and then splitting them up, like so:

root@OpenWrt-1:~# time for ((i=80000; i--;)); do printf "%s\n" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -u "${fd}" -a a; done

real    0m42.387s
user    0m28.998s
sys     0m12.756s

root@OpenWrt-1:~# time for ((i=80000; i--;)); do printf "%-250s" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -N 250 -u "${fd}" a; a=(${a}); done

real    0m33.782s
user    0m30.728s
sys     0m1.389s

That is, assuming the above test is applicable to the asynchronous write and read case.

It'd be helpful to see the output of the above two timed commands on a bash shell from your router.

root@OpenWrt-1:~# bash
root@OpenWrt-1:~# exec {fd}<> <(:)
root@OpenWrt-1:~# time for ((i=80000; i--;)); do printf "%s\n" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -u "${fd}" -a a; done
root@OpenWrt-1:~# time for ((i=80000; i--;)); do printf "%-250s" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -N 250 -u "${fd}" a; a=(${a}); done

It's still running without a restart after about 6 days. I think in the past it has made it to at least 10 days, though, before something happened to get in that weird state. So I'll keep watching it.

When the download speed test gets around 100 Mbps I'm seeing one core in htop get to around 95% and the other 3 cores get to around 60-70%.

Here's the result of your benchmark on this slower CPU (fairly idle network at the time but cake-autorate was still running):

root@OpenWrt:~/cake-autorate# bash
root@OpenWrt:~/cake-autorate# exec {fd}<> <(:)
root@OpenWrt:~/cake-autorate# time for ((i=80000; i--;)); do printf "%s\n" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -u "${fd}" -a a; done
real    2m15.011s
user    1m43.416s
sys     0m31.544s
root@OpenWrt:~/cake-autorate# 
root@OpenWrt:~/cake-autorate# time for ((i=80000; i--;)); do printf "%-250s" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -N 250 -u "${fd}" a; a=(${a}); done
real    1m59.650s
user    1m56.367s
sys     0m3.237s

So ya, that does seem like it would be an improvement. But I also wouldn't spend a ton of time making those optimizations if they are going to take you a while -- I know my current router is old and I'll upgrade at some point when I have time. Thanks.


Thanks for the update and for running that test.

The improvement over 80000 iterations from 2m15.011s to 1m59.650s is not exactly earth-shattering. I suppose this circa 10% efficiency gain from reading larger chunks rather than individual bytes would only be realized for the fraction of time spent reading, and thus if implemented in cake-autorate would presumably give less than a 10% gain in overall CPU usage? Albeit I am only guessing here, and I am not sure things would scale this way in the real-world context with multiple writers and readers. So the real-world difference could in practice be more or less.

The significance of the change seems like it may be device specific. Here is the same from my desktop:

:~$ time for ((i=80000; i--;)); do printf "%s\n" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -u "${fd}" -a a; done

real    0m4.453s
user    0m2.462s
sys     0m1.991s

:~$ time for ((i=80000; i--;)); do printf "%-250s" "DATA; 2023-05-25-08:15:30; 1684998930.549480; 1684998930.547750; 10; 8; 0; 0; 1684998930.529694; 94.140.15.15; 9; 25002; 25000; 1397; -2; 10600; 11407; 17000; 8862; 5593; 30600; 0; 0; dl_idle; ul_idle; 20000; 20000"  >&"${fd}"; read -N 250 -u "${fd}" a; a=(${a}); done

real    0m2.223s
user    0m2.093s
sys     0m0.131s

That's a 2x speedup.

Even if the change is small, I suppose every little helps for the cases in which the CPU is saturated.

I'm waiting for @richb-hanover-priv to do some testing on his old Archer C7. I am pretty sure that will require a reduced pinger frequency compared to the defaults. @richb-hanover-priv perhaps you could try defaults, and then:

no_pingers=3 # number of pingers to maintain 

Hmmm, when do you think tsping will be available through opkg?
It looks extremely promising, however even with the explanation of how to compile it, it may be too complicated for me, to the point where I'm starting to feel like I might have a learning disability :confused:

Is there an easier way to install tsping for someone who doesn't have a Linux VM configured to compile the files needed? I'm not entirely sure how to add a repository to OpenWrt or how to use the OpenWrt SDK.

Hopefully soon - @Lochnair?

Which device do you have again? I think it's more or less just a case of copying and pasting the commands that @Lochnair wrote in his README into an SSH client.

Unfortunately I don't appear to have enough storage space on the device, which I forgot to mention.

I'm using a Raspberry Pi 4B 4GB (with about 60 MB of space left on the SD card).

I didn't need to go away this weekend, so I decided to have some OpenWrt fun...

By George, he's got it! (Apologies to Prof. Henry Higgins...) I reset my Archer C7 v2 back to factory defaults, upgraded to OpenWrt 22.03.5 r20134-5f15225c1e, configured SQM, then used the install process documented in the README. I set the interfaces and the rate limits, left logging options and number of pingers at defaults, and it worked fine. I'll submit a github issue with comments on the installation process, but...

Most importantly, cake-autorate seems to consume a small amount of CPU - when running as a service, it burst to 100% (via htop) for a dozen seconds at startup and thereafter ran at around 10-20% of the CPU. Even when running betterspeedtest.sh or speedtest.net, the CPU never got very high for very long. There were sporadic bursts of higher CPU utilization, but they didn't seem related to traffic.

NICE JOB!


@Lynx, it seems I have somehow borked my router install and will need to set it up from scratch. Unfortunately, this currently has higher priority than cake-autorate testing (and figuring out how to compile tsping). However, that testing is still on my TODO list :wink: (also, we are moving house next week, so everything will be on hold anyway).

Question: did you actually move the different processes into different files (not sure whether you ever wanted to do so)? While it looks like busy work, the consequence is that the processes in bash would be able to have meaningful names instead of all being called cake-autorate.
Alternatively, we might pass a dummy parameter to each function containing just the name, so htop would report e.g.:
cake-autorate.sh monitor_pingers
cake-autorate.sh parse_pingers
Again, this might already be the case...

I did not, but by default htop actually shows multiple entries for several individual processes anyway:

At the moment cake-autorate maintains a 'proc_pids' file in its run directory with the correspondence between the processes and their pids as follows:

root@OpenWrt-1:~# cat /var/run/cake-autorate/primary/proc_pids
parse_tsping_preprocessor = 32517
parse_tsping = 3482
intercept_stderr = 3355
maintain_log_file = 3381
monitor_achieved_rates = 3464
maintain_pingers = 3469
parse_tsping_pinger = 32518

Hi all

I have been running @Lynx's commit d175bf9 from last weekend for the past week and cake-autorate has not crashed once. Previously it was crashing multiple times a week, but since updating it has been working really well with no issues. Still using your tsping binary.

CPU usage is pretty good on my Linksys E8450. The lower load stats below are from before updating last weekend, when cake-autorate was crashing; the plot covers the last 2 weeks:

Off topic, but a while ago I split up my cake-autorate OWD plots to make them more readable. I just uploaded a new screenshot - check it out: https://github.com/bairhys/prometheus-cake-autorate-exporter


Well, my goal is to be able to see how many CPU cycles each 'function' is eating up just from looking at htop output.
This makes casual diagnostics easier and user reports potentially more informative.

@rb1 sweet! I'm super happy to read your report because I have been working hard to try to iron out any remaining stability issues. On my system things seem super stable now too.


@moeller0 it only requires another terminal on the side to cross-reference against htop. Easily the hungriest processes are: 1) the main process; and 2) parse_tsping, which presumably relates to a mixture of the frequency of the reads and the associated processing for each read.

We could cut down processing a little by switching from byte-by-byte reads to fixed-width reads. But I wouldn't know how big to set the width. Mostly the lines are less than 250 characters, but what happens if users set domains rather than IP addresses? Then the size could exceed 250 characters.
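For reference, the fixed-width mechanics I have in mind look like this (a sketch only; 250 is just the tentative width, and the `<(:)` trick is the same anonymous FIFO setup as in the benchmarks above):

```shell
bash <<'EOF'
# Pad each record to a fixed width on write, then read back exactly that
# many characters: no byte-by-byte delimiter scanning needed.
record_width=250
exec {fd}<> <(:)   # fd open read-write on an anonymous FIFO
printf "%-${record_width}s" "DATA; 1; 2; 3" >&"${fd}"
read -N "${record_width}" -u "${fd}" rec
echo "${#rec}"     # read -N returns exactly record_width characters
EOF
```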

Any thoughts on that?


That depends on the details: we really have records of different lengths and can have multiple records in the FIFO, so we expect each read to give us a full line/record. But what happens if, say, a 20 byte control record is followed by a 100 byte data record in the FIFO and we read 100 bytes?
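A quick illustration of that hazard (a sketch with made-up record contents): a fixed-size read happily runs past the newline that ends a short record and swallows part of the next one.

```shell
bash <<'EOF'
# Two newline-delimited records of different lengths sit in the FIFO;
# a fixed 20-character read crosses the boundary between them.
exec {fd}<> <(:)
printf "%s\n" "CONTROL; stop" "DATA; 1; 2; 3" >&"${fd}"
read -N 20 -u "${fd}" chunk   # -N ignores the newline delimiter
printf '%s' "${chunk}"        # the whole first record plus "DATA; "
EOF
```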

So I would not go down that route....

Does this mean Lynx is gone/banned/smited?

I guess he decided to leave the forum. I still hope this is not a permanent decision....