CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

With the Settings from Lynx.


That is an amazing result for Starlink. Would it be possible for you to try a more difficult test, like the flent rrul, tcp_nup and tcp_ndown tests?


Hi,

Apologies for the long message, but I wanted to share my anecdotal experience with cake-autorate on a 5G internet connection, in case it helps with further development (or with tweaking the default parameters etc).

So, I have Verizon 5G Home Internet and OpenWRT w/ Cake installed on a Raspberry Pi 4 that sits between the Verizon router (not actually acting as a router) and my Wifi Access Point.

I initially installed Cake (without autorate) due to bufferbloat that was easily seen in bufferbloat tests and empirically when playing online games requiring low latency (mainly Rocket League). Once I configured SQM to about half of the top download and upload speeds (so in my case to 50Mbps down and 5Mbps up, when the max speed should be double that), it solved the bufferbloat problem pretty well and made online gaming enjoyable.

Then after learning about this project (Cake w/ adaptive bandwidth) I gave it a go (I believe it was v. 3.0.0), trying to get more out of my connection when needed. However, with only the min/base/max bandwidth settings changed and all other parameters left at their defaults, the experience was quite horrible.

On bufferbloat/speed tests the upload and download speeds were much lower than without the adaptive features, and the empirical tests playing Rocket League were even worse, as the lag was worse than it has ever been. It is possible that latency improved slightly, but since that wasn't a problem even at higher speeds I didn't pay too much attention to latency at this point.

As I monitored the logs during bufferbloat tests, I saw that the download and upload speeds were being adjusted extremely fast both up and down (especially down when bufferbloat was detected, with the recovery adjustments being slower than that of course). Since the bufferbloat issue was already sufficiently well resolved with fixed SQM bandwidth settings, it seems like these speed adjustments were made more or less randomly rather than in response to a kind of bufferbloat that can actually be controlled via SQM adjustments.

That also seems to explain why online games became unplayable, as the wildly fluctuating bandwidth seems to throw the game (Rocket League) completely off, causing way more lag than the actual bufferbloat would have caused.

By adjusting the parameters that control how quickly the traffic shaper adjusts up and down, I was able to resolve a lot of the issues: Rocket League is playable again, and I get good speed test/bufferbloat results with low latency and better upload and download speeds than I had before using cake-autorate.

Here are my current settings in case anyone is interested. I'm sure they are still far from optimal even for my situation, but to me they work infinitely better than the default values.

Finally, thanks to everyone who has been working on this project, as when configured well, it can be really useful for variable speed connections like the 5G home internet I have.

min_dl_shaper_rate_kbps=25000
base_dl_shaper_rate_kbps=50000
max_dl_shaper_rate_kbps=85000

min_ul_shaper_rate_kbps=4000
base_ul_shaper_rate_kbps=5000
max_ul_shaper_rate_kbps=10000

dl_owd_delta_thr_ms=30.0
ul_owd_delta_thr_ms=30.0

dl_avg_owd_delta_thr_ms=90.0
ul_avg_owd_delta_thr_ms=90.0

shaper_rate_min_adjust_down_bufferbloat=0.998
shaper_rate_max_adjust_down_bufferbloat=0.90
shaper_rate_adjust_up_load_high=1.001
shaper_rate_adjust_down_load_low=0.999
shaper_rate_adjust_up_load_low=1.002
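
To get a feel for how gentle these factors are, here is a rough sketch. It assumes each adjustment simply multiplies the current shaper rate by the factor, and it does not model how often autorate applies a step. The helper name steps_to_reach is made up for illustration and is not part of cake-autorate.

```shell
#!/bin/sh
# Hypothetical helper (not part of cake-autorate): count how many
# multiplicative steps of size "factor" it takes to climb from one
# shaper rate to another.
steps_to_reach() {
    # $1 = start rate (kbps), $2 = target rate (kbps), $3 = per-step factor (> 1)
    awk -v start="$1" -v target="$2" -v factor="$3" 'BEGIN {
        if (factor <= 1) exit 1          # would never reach the target
        rate = start; steps = 0
        while (rate < target) { rate *= factor; steps++ }
        print steps
    }'
}

# Base to max on the download side with shaper_rate_adjust_up_load_high=1.001
steps_to_reach 50000 85000 1.001
# Same climb with a hypothetical, more aggressive factor for comparison
steps_to_reach 50000 85000 1.01
```

Under that simplification, the climb from 50000 to 85000 kbps takes several hundred steps at 1.001 versus a few dozen at 1.01, which is consistent with the slow recovery described above.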

Thanks for sharing. Would you be willing to reset the log file, then run a couple of speed tests, then export the log file and upload it for us to look at?

Certainly. I can get it done over the weekend.

Do you want it to be run with the default settings instead of my customized ones, or does it matter for the data you are looking for? What about the logging parameters, anything extra that should be enabled to capture the full picture?

Also, as I upgraded from version 3.0 to 3.1 during my tests, I'm not 100% sure if that could also change the results from my earlier experience. I think the worst results I saw were with 3.0 before I started tweaking the settings.

I think I'd like to see how it performs with your settings. Maybe @moeller0 has some ideas as he's better in respect of this sort of thing.

I agree that is the most interesting starting point. As convenient as it would be if no toggles were required, at the moment autorate does not do any meaningful auto-tuning of the control variables. (Not because we/I would not like that, but because that is a hard problem with no obvious solution; might be worth exploring though, now that the code seems relatively stable :wink: )

Got it. I will run the tests and upload the logs in a day or two.

@Lynx and @moeller0 Here is the log file: https://easyupload.io/mfv5ha

The file should cover the 3 times I ran the Waveform bufferbloat test back-to-back (first one has slower average speed b/c it started from 50/5 Mbps and my settings make the speed adjustments much slower):

Bufferbloat and Internet Speed Test - Waveform

Bufferbloat and Internet Speed Test - Waveform

Bufferbloat and Internet Speed Test - Waveform

I don't think any bufferbloat issues happened during these tests though, so I am not sure how useful this will be, as even the default settings might have worked fine during this test.

It might be that the issues I had seen before happened when my 5G Home Internet happened to use 4G LTE as a fallback either for the download or the upload link (or both), while now it stayed on 5G and seemed to be able to support even higher speeds than the maximum download and upload speeds I had configured it for.

Still, for my use case these results are excellent. Latency is very good for online gaming, and both the upload and download speeds climb much higher than I would be able to use without the adaptive bandwidth functionality.

THANKS!

Here are plots from your log files:



Analysis: No noticeable rate reduction step occurred during the speedtests, indicating that the configured maximum rate stayed below the minimum rate of the link during the test. But we still see differences in delays between low and high loads separately for both directions:

Samples per reflector:
ReflectorID: 156.154.70.2; N: 295
ReflectorID: 156.154.70.5; N: 295
ReflectorID: 185.228.168.10; N: 294
ReflectorID: 208.67.220.123; N: 296
ReflectorID: 208.67.222.123; N: 295
ReflectorID: 9.9.9.9; N: 291
DL: maximum 95.000%-ile delta delay over all 6 reflectors: 10.665 ms.
DL: maximum 99.000%-ile delta delay over all 6 reflectors: 39.985 ms.
DL: maximum 99.500%-ile delta delay over all 6 reflectors: 43.425 ms.
DL: maximum 99.900%-ile delta delay over all 6 reflectors: 53.760 ms.
DL: maximum 99.950%-ile delta delay over all 6 reflectors: 53.760 ms.
DL: maximum 99.990%-ile delta delay over all 6 reflectors: 53.760 ms.
DL: maximum 99.999%-ile delta delay over all 6 reflectors: 53.760 ms.
UL: maximum 95.000%-ile delta delay over all 6 reflectors: 10.665 ms.
UL: maximum 99.000%-ile delta delay over all 6 reflectors: 39.985 ms.
UL: maximum 99.500%-ile delta delay over all 6 reflectors: 43.425 ms.
UL: maximum 99.900%-ile delta delay over all 6 reflectors: 53.760 ms.
UL: maximum 99.950%-ile delta delay over all 6 reflectors: 53.760 ms.
UL: maximum 99.990%-ile delta delay over all 6 reflectors: 53.760 ms.
UL: maximum 99.999%-ile delta delay over all 6 reflectors: 53.760 ms.

My guess is that a threshold of 40 ms might be a bit high, at least under optimal conditions on your link. But that test probably merits repetition under less fortunate conditions (or if conditions stay stubbornly great, just increase the maximum rate for the shapers to something well above your contracted rates to "force" some rate reduction steps*).

*) I am not saying you should do this permanently, but for testing only to see how robust your current configuration might be under less fortunate conditions.
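
For readers who want to compute similar percentile figures from their own delay samples, here is a minimal sketch. It assumes the delta delays have already been extracted to one number per line; the log parsing itself is not shown, and the helper name percentile and the sample values are made up for illustration. It uses the nearest-rank method, which may differ slightly from whatever the plotting script does.

```shell
#!/bin/sh
# Hypothetical helper: nearest-rank percentile of one-number-per-line input.
percentile() {
    # $1 = percentile in (0, 100]; samples are read from stdin
    LC_ALL=C sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = int(p * NR / 100)
            if (rank < p * NR / 100) rank++   # round up to the next rank
            if (rank < 1) rank = 1
            print v[rank]
        }'
}

# Made-up delta delays in ms (not taken from the log discussed above)
printf '%s\n' 3.1 4.0 2.7 41.5 5.2 3.9 4.4 6.0 3.3 12.8 | percentile 95
```

A threshold can then be compared against, say, the 99th percentile seen on an idle link to judge how often it would trigger on noise alone.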


Thank you, that is super informative!

I might be able to force suboptimal conditions by moving the Verizon 5G router away from the windows and to a location with a bad 5G signal (e.g. I was just able to force 4G LTE on my phone by putting it in the fridge and running speed tests there, LOL).

Meanwhile, I reran the tests many times (until it looked like the achieved speeds were not increasing any more) with max speeds set to 1Gbps down and 100 Mbps up, far exceeding any realistic maximum speeds.

Logs are here:

The last of the waveform tests looked like this: Bufferbloat and Internet Speed Test - Waveform

Interestingly, these speeds do exceed my contracted rates, since I have Verizon's 5G Home with 100/10 Mbps speeds, while their more expensive 5G Home Plus offers the 300/20 Mbps speeds seen here.

Is this referring to the xx_owd_delta_thr_ms parameters? I had them at the default value of 30, but based on your analysis of the logs, I was actually thinking of increasing them to 40.

Assuming I'm interpreting the data correctly, my reasoning is that I would like to eliminate false positive bufferbloat signals as much as possible. I thought that maybe the 30 ms threshold had been causing false positives before, leading to the "random speed adjustments" I mentioned earlier, which I think were the main issue impacting performance for online gaming.

And if I was able to eliminate false positives, then perhaps I wouldn't need to mess with the other parameters as much. Since if the connection ever actually changes from 5G to 4G, then I probably would want it to adjust the speeds faster than what my current parameters do. I just don't want sharp sawtooth patterns due to normal randomness in the individual latency measurements.

Does this make sense, or do you have a different opinion?

Mmmh, some mobile operators seem to play games with speedtests, and might not enforce your contracted speed when they detect a speed test... (the offered rationale is they want users to see how much would be possible with their open-ended tariffs....)

Mainly to xl_owd_delta_thr_ms; this is the measured delta delay threshold that autorate uses to detect actionable high latency. Since we switched to xl_owd_delta_thr_ms, xl_avg_owd_delta_thr_ms, shaper_rate_min_adjust_down_bufferbloat and shaper_rate_max_adjust_down_bufferbloat, we have considerably more ways to express policies. Still, xl_owd_delta_thr_ms is the threshold at which rate reduction engages, so it is IMHO the first threshold to set.

What values you plug in is really only your business; if you are happy with X it does not matter at all what I think. I can try to give explanations and rationale for setting things one way or the other, but in the end it is each network's admin who needs to tailor the policy to the local requirements/desires.

So the trade-offs are:
lower thresholds -> false positive bufferbloat detections:
unnecessary shaper rate reductions that cost throughput and might even throttle game control traffic noticeably

higher thresholds -> detection misses:
the shaper rate is kept too high, and with sufficient traffic this will clog the unmanaged buffers of the true bottleneck on the path, resulting in increased delay for all traffic (which typically is fine for bulk transfers, but interactive traffic like game control does not really harmonize well with that)

Which point in that continuum you consider best for your use case is something you will need to decide for yourself, and if false positive bufferbloat responses are your main focus, then I agree that more relaxed thresholds are a decent way to express that policy to cake-autorate.

Well, some level of saw-toothing is unavoidable simply because at its core that is how the control loop operates: it increases the shaper rate more slowly than it lowers it.... but sure, you can change a lot of the dynamics of the controller, and I have no simple recipe to offer for how to do that well (my own internet access link is stubbornly uneventful, I get my contracted rate pretty much all of the time, so a simple static sqm-scripts/cake configuration is all I need, and the best autorate can do is not get in the way :wink: ).

Oh that makes sense, just keep an eye on the increasing likelihood of extended true bufferbloat events with higher thresholds, but if your applications/games work well, then I guess you are fine.

Regarding your latest tests:


We see the delay increase during the 300 Mbps downloads, but not really crossing the threshold, so what you mostly see is that your ISP manages to deliver somewhat decent latency under load... if you disable cake-autorate completely, how usable is your link, and what happens if you start bulk data transfers while playing a game?


Thank you again for the insights!

Based on this, I decided against increasing the xl_owd_delta_thr_ms threshold from 30ms to 40ms after all. The default 30ms would indeed seem to be quite a perfect balance between false positives and false negatives in my recent test data too. (I'll also drop the xl_avg_owd_delta_thr_ms back to the default 60 ms, as I don't really see a need to mess with that default either.)

(4G LTE is known to have higher latency than 5G and perhaps 40ms would have worked better there, but since I don't seem to be able to reproduce that behavior anymore and the 40ms was just a guess anyways, I'll start with parameters that can be tested and confirmed to work well with the current performance of my link.)

I'm also going to test increasing maximum DL and UL speeds closer to the observed speeds from speed tests, even if they exceed what I should be getting. I'll start with min/base/max 50/85/250 down and 5/10/20 up, with maximums still lower than where the first instances of bufferbloat adjustments occurred in the previous test, as I'm happy to give up some of the top speed in exchange for more stable speeds. In fact, since this too is designed to keep autorate out of the way, I too would otherwise prefer the static sqm, but I'm still a little too worried about if/when 5G starts having issues again.
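
Written out in the config's kbps units, the plan above would look roughly like this. It is a sketch that reuses the parameter names from the settings posted earlier and assumes the xl_ prefix stands for the dl_ and ul_ variants; the adjustment factors are left as previously posted and are not repeated here.

```shell
# Download: min/base/max of 50/85/250 Mbps
min_dl_shaper_rate_kbps=50000
base_dl_shaper_rate_kbps=85000
max_dl_shaper_rate_kbps=250000

# Upload: min/base/max of 5/10/20 Mbps
min_ul_shaper_rate_kbps=5000
base_ul_shaper_rate_kbps=10000
max_ul_shaper_rate_kbps=20000

# Thresholds kept at / returned to the values discussed above
dl_owd_delta_thr_ms=30.0
ul_owd_delta_thr_ms=30.0
dl_avg_owd_delta_thr_ms=60.0
ul_avg_owd_delta_thr_ms=60.0
```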

Lastly, I'm still really happy with my custom settings for how quickly (meaning how slowly) it adjusts the speeds up and down, as that strikes a perfect balance between being able to transfer large files at high speeds and keeping the speeds stable for an optimal online gaming experience. And unlike the other speed settings that were quite specific to my 5G connection, these customizations might actually be useful for others too, in case they provide a better "out of the box" experience for users with a similar use case that includes low latency online gaming.

And yes, I can certainly test turning cake-autorate off during game play and try playing while speed tests are running to generate high loads in the background. I'll report back once done (may take until next weekend before I get to it). That is in fact how I started when I switched from Spectrum cable internet to Verizon 5G Home, before I learned about Cake, and the gaming experience was significantly worse with 5G Home. Installing Cake (without autorate) made a big improvement, though it is possible that I had more flip-flopping between 5G and 4G LTE than I have now, so I'm no longer sure if the experience will be the same as it was then.

Edit: It may be worth mentioning that before I started using Cake, I used Verizon's 5G router as the actual router too. So, I wonder if that alone could have been causing more bufferbloat compared to my current setup, where the Verizon router just has a DMZ set for the Raspberry Pi 4 that is now the actual router running OpenWRT? Both scenarios were using LAN (not wifi) connections to the gaming consoles etc.


Please do not read too much into the default values. These are more or less values that work reasonably well for one single link, and we expect some tuning to be required on each individual link, so having to change these a bit is "par for the course".

These operate in coordination with shaper_rate_max_adjust_down_bufferbloat: if congestion is detected and the current delta delay estimate is >= xl_avg_owd_delta_thr_ms, autorate will use shaper_rate_max_adjust_down_bufferbloat as the rate reduction factor. So the closer xl_avg_owd_delta_thr_ms gets to xl_owd_delta_thr_ms, the more often you apply the larger shaper_rate_max_adjust_down_bufferbloat rate reduction step, or put differently, the stronger the proportional response to delay will be.
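
As a rough illustration of that proportional response, the sketch below picks a reduction factor between the min and max adjust-down values depending on where the average delta delay sits between the two thresholds. The linear interpolation is an assumption made for this example and is not quoted from the cake-autorate source; the function name adjust_down_factor is made up as well.

```shell
#!/bin/sh
# Sketch only: linear interpolation between the two factors is an
# assumption, not a quote of the cake-autorate source.
adjust_down_factor() {
    # $1 = average OWD delta (ms)
    # $2 = xl_owd_delta_thr_ms, $3 = xl_avg_owd_delta_thr_ms
    # $4 = shaper_rate_min_adjust_down_bufferbloat
    # $5 = shaper_rate_max_adjust_down_bufferbloat
    awk -v d="$1" -v thr="$2" -v avg_thr="$3" -v fmin="$4" -v fmax="$5" 'BEGIN {
        if (avg_thr <= thr) t = 1            # degenerate case: always the largest step
        else t = (d - thr) / (avg_thr - thr)
        if (t < 0) t = 0                     # below the detection threshold
        if (t > 1) t = 1                     # at or above the average threshold
        printf "%.3f\n", fmin + t * (fmax - fmin)
    }'
}

# Same 60 ms average delta, two choices of xl_avg_owd_delta_thr_ms
adjust_down_factor 60 30 90 0.998 0.90
adjust_down_factor 60 30 60 0.998 0.90
```

With the average threshold at 90 ms, a 60 ms delta lands midway between the two factors; with it at 60 ms, the same delta already gets the full shaper_rate_max_adjust_down_bufferbloat step.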

This is mostly marketing. It is possible within the LTE framework to deliver low latency (carriers simply do not configure their systems for low latency), and it is also possible to deliver crappy latency over 5G (and some of the most promising options for keeping latency low that 5G added are not used in normal carrier networks either). However, with the advent of 5G, carriers clearly acknowledge that low latency is a desirable and marketable feature and put more emphasis on it.

Make sure to also try saturating your link with non-speedtest traffic, just to check whether your ISP might play tricks with speedtests that do not reflect normal performance and latency...

+1, by all means once you are happy with your config, post it here with a quick recap of your link, I am sure this can/will be helpful for others.

Sure, take your time! Establishing some sort of baseline is IMHO a good thing to do (and also something worth re-doing every 12 months or so; sometimes ISPs change things for the better or worse, and it helps to check this occasionally).

Hard to say, I have zero insight into Verizon 5G routers and their capabilities and limitations.


Very knowledgeable answer again, thank you for that.

Good advice. And my decision to keep the 30 ms threshold instead of raising it to 40 was indeed based on looking at the data, and I just wanted to add another datapoint in support of the existing defaults by emphasizing how well they work for my link too.

That's good to know too, as it may imply that the 30 ms threshold might also work well enough in the scenario where my 5G connection drops to 4G LTE, so I might not even need to worry about it as much. (I'd need to be able to test it of course to know for sure, but the fear of higher latency and higher jitter in 4G LTE was one of the reasons why I was considering increasing the threshold from 30 ms as a way to reduce false positives.)

Will do.

As anecdotal initial results for my newest settings, I have already tested them with about an hour of online gaming under the normal load generated by others in the household, and it continued to perform excellently, which is the main goal of my tinkering. However, I was not monitoring what the actual load was and will need to perform more tests to see whether the ISP-enforced speed limits are different for other types of traffic than for speed test traffic.

Cool. I will do that after some more testing with different load scenarios etc.

Consider stress testing with gaming whilst downloading a couple of random huge files. If it handles that, it'll probably manage the light usage by the household.


DEBUG; 2023-10-02-23:33:59; 1696289639.233951; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:09; 1696289649.245363; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:19; 1696289659.251098; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:29; 1696289669.263101; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:39; 1696289679.275226; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:49; 1696289689.287040; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:34:59; 1696289699.290944; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.
DEBUG; 2023-10-02-23:35:09; 1696289709.303365; Warning: The configured download interface: 'ifb4eth0' does not appear to be present. Waiting 10.0 seconds for the interface to come up.

So why does the interface not exist? Not set up yet?
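
A quick way to check this from the router's shell is sketched below. It only tests whether Linux currently knows a network device by that name via sysfs; the helper name iface_present is made up for illustration. On a typical sqm-scripts setup the ifb4... device exists only while an SQM instance is running on the underlying interface, though that is an assumption about this particular setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel currently has a network
# device with the given name (checked via sysfs on Linux).
iface_present() {
    [ -e "/sys/class/net/$1" ]
}

# Interface name taken from the warning in the log above
if iface_present ifb4eth0; then
    echo "ifb4eth0 is present"
else
    echo "ifb4eth0 is missing: check that SQM is enabled and started on eth0"
fi
```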

Hi,

I'm running this app with the following fairly pessimistic settings and it's been able to tame the bufferbloat on the unpredictable beast that is T-Mobile home internet.


dl_if=ifb4wan # download interface
ul_if=wan     # upload interface

# Set either of the below to 0 to adjust one direction only
# or alternatively set both to 0 to simply use cake-autorate to monitor a connection
adjust_dl_shaper_rate=1 # enable (1) or disable (0) actually changing the dl shaper rate
adjust_ul_shaper_rate=1 # enable (1) or disable (0) actually changing the ul shaper rate

min_dl_shaper_rate_kbps=18000  # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=50000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=120000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=6000  # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=7000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=16000  # maximum bandwidth for upload (Kbit/s)

Without SQM I see between 150-350 down and 8-25 up with an unloaded ping of ~20ms. I've been pretty happy with the service aside from having to use PBR and a VPS with a VPN to split tunnel traffic to allow some services to be exposed.

My issue arises when I try to use Parsec for RDP: sometimes it's alright, other times it's awful. Playing with the settings seems to help a bit, but it's always such a headache. Additionally, regardless of time of day, VOIP suffers significantly when heavy downloading occurs, such as downloading a game on Steam.

I get a D or F rating on Waveform's bufferbloat test and can see similar behavior on speedtest.net etc. With this package I see an A or B rating (30-60 ms), which would be great... if it didn't reduce my max speed with even one device to just 20-30 Mbps down.

I tried to read the docs but am a little confused about the parameters available as defined in defaults.sh. I'd like to understand how I can tune this to prioritize download speed over the aggressive attempts at optimization of my WISP (immediately dropping it down to ~20 Mbps).

Does anyone else have an FWA ISP and have some settings they can recommend?

Are there any simple settings/rate multipliers I should adjust to nudge this package to prioritize download speeds over reducing bufferbloat? For upload I'd definitely prefer that it maintain decent ping for VOIP and Parsec.

Am I basically just limited to turning off DL sqm or adjusting these?

# average owd delta threshold in ms at which maximum adjust_down_bufferbloat is applied
dl_avg_owd_delta_thr_ms=60.0 # (milliseconds)
ul_avg_owd_delta_thr_ms=60.0 # (milliseconds)

Thanks!
