CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

I would probably install and run gping on one of your end devices and simply monitor the RTT to, say, 8.8.8.8 for a minute of idleness, a minute of upload saturation, and a minute of download saturation, each individually*. Take a screenshot of each and post them for discussion here....

*) gping helpfully includes some summary statistics, which are only really interpretable if the three conditions are recorded separately. For the saturation plots it is fine (actually preferable) to have a few seconds of idleness before and after the load, just not so much that it swamps the reported statistics.

1 Like

OK, I made some plots with SQM disabled.

First, idle:

Download (with some idle at the beginning and end):

Upload (the flattish bit at the end is the idle period):

Thanks.

Thanks, this is pretty bleak. I guess I would start with delay_thr_ms=55.
This is quite a lot, but it will still catch an occasional idle-latency spike (and hence throttle down towards the minimum rates). This threshold applies on top of the base RTT, so it will result in around 55 + 37 = 92 ms, but that seems required to keep the "jumping" baseline in check (the baseline adaptation code will be too slow to adapt to the steps seen in your idle plot; that slowness is on purpose, so these jumps probably need to be accounted for in your delay_thr_ms setting).
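To make the arithmetic concrete, here is a minimal sketch of how such a per-sample delay check works. This is an illustration only, not the actual CAKE-autorate code: the function name and the 37 ms baseline (read off the idle plot) are assumptions.

```shell
# Sketch: classify one RTT sample against baseline + delay_thr_ms
# (hypothetical helper, not CAKE-autorate's real implementation).

baseline_ms=37     # assumed long-run idle RTT estimate
delay_thr_ms=55    # configured threshold on top of the baseline

classify_sample() {
    rtt_ms=$1
    # a sample counts as delayed only above baseline + threshold,
    # i.e. around 37 + 55 = 92 ms here
    if [ "$rtt_ms" -gt $((baseline_ms + delay_thr_ms)) ]; then
        echo delayed
    else
        echo ok
    fi
}

classify_sample 60    # 60 ms <= 92 ms  -> ok
classify_sample 120   # 120 ms > 92 ms -> delayed
```

So a ~50 ms baseline jump like the ones in the idle plot stays just under the effective 92 ms trip point.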

I guess I would start with that and look at the debug output while the network is idle, checking whether the rates more or less stabilize at base_dl_shaper_rate_kbps and base_ul_shaper_rate_kbps (you do not want to overdo this; occasional deviations to lower rates at idle are IMHO OK as long as they are rare enough for you to tolerate). Then you can play with delay_thr_ms and

# bufferbloat is detected when (bufferbloat_detection_thr) samples
# out of the last (bufferbloat detection window) samples are delayed
bufferbloat_detection_window=4  # number of samples to retain in detection window
bufferbloat_detection_thr=2     # number of delayed samples for bufferbloat detection
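The window logic described in those comments can be sketched as follows. This is a simplified illustration under my own variable names, not the script's actual implementation: keep a flag per recent sample and declare bufferbloat once enough flags in the window are set.

```shell
# Sliding-window bufferbloat detection sketch (illustration only).
bufferbloat_detection_window=4
bufferbloat_detection_thr=2

window=""   # space-separated 0/1 delay flags, newest last

record_sample() {
    # append the new flag and trim to the last N flags
    window="${window:+$window }$1"
    n=$(echo $window | wc -w)
    while [ "$n" -gt "$bufferbloat_detection_window" ]; do
        window="${window#* }"
        n=$((n - 1))
    done
}

bufferbloat_detected() {
    # true when at least thr of the retained samples were delayed
    delayed=0
    for flag in $window; do
        delayed=$((delayed + flag))
    done
    [ "$delayed" -ge "$bufferbloat_detection_thr" ]
}

for s in 0 1 0 1; do record_sample "$s"; done
bufferbloat_detected && echo "bufferbloat"   # 2 of the last 4 delayed -> triggers
```

Raising the window to 8 and the threshold to 5 therefore demands more sustained delay before the shaper reacts, which filters out Starlink's short spikes at the cost of a slower response.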

to fine-tune the sensitivity. As far as I can tell you are the first Starlink user posting, so this is exciting, but also unknown territory.

(Caveat emptor: I only have a rock-solid VDSL2 link that stubbornly delivers a reliable fixed rate, so unlike @Lynx I am not running this myself and am arguing from first principles; take my advice with a sack of salt please.)

1 Like

Seems like a good suggestion.

For these two detection parameters you may want to try something like 8 and 5.

And if you can show us lines from the debug output right before and during download saturation, that would be ace. You can either post here using the code snippet facility or use pastebin for larger segments of data.

1 Like

Thanks. It has been great how much faster Starlink is than my old, slow DSL, but it is just so hugely variable, so this adaptive bandwidth for CAKE project seems ideal.

I tried the config:

delay_thr_ms=55

min_dl_shaper_rate_kbps=10000 # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=50000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=300000 # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=2000 # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=8000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=40000 # maximum bandwidth for upload (Kbit/s)

bufferbloat_detection_window=8 # number of samples to retain in detection window
bufferbloat_detection_thr=5 # number of delayed samples for bufferbloat detection

and ran a download saturation twice.

Here are the results from the first run:

and the log corresponding to this time period:

https://pastebin.com/23L7qVHW

And here is the second run (sorry it's not letting me put more than one image per post since I'm a new user to this forum):

and the log:

https://pastebin.com/pviz4Zv5

The first run averaged 24 Mbps for the 60 s download; the second run averaged 34 Mbps for that time period. I tried these both around 4 pm, so it is likely that Starlink was getting congested for the day. I can try running it again in the morning, which would likely give faster speeds. Sometimes I get over 200 Mbps from Starlink, but sometimes it's more like 10 Mbps, so it is hugely variable.

I'm not exactly sure how to read the debug log so if you have suggestions of tweaks I can make to the config variables, I'm all ears! Thanks.

I will study this later and make some Excel plots, as I have a hearing for work today. But my first impression is that the code is working and the parameters need optimisation.

What's your router hardware? It looks like the rate of change of bandwidth may be pretty high, so I'm thinking high granularity may be useful. I see lengthy ascents followed by a huge chop.

The average and p95 values for RTT were considerably lowered.

Your thoughts @moeller0?

2 Likes

I agree, it looks like the code is working already, albeit the residual RTT fluctuation is still high; I can't say anything about the rates....

Good point, it might be helpful to increase the increase and decrease factors a bit to make the code more reactive (which might introduce more undesired overshoots and oscillations, so change these carefully).

I've been playing with this more this morning, as the Starlink speeds are a lot better in the morning when things aren't as congested. I thought I would start with some baseline data about how Starlink performs.

Here is a plot when things are idle and no SQM enabled:

And here is a 60s download from fast.com with SQM disabled. The plot is 70s long so there is some idle time at the beginning and end:

These ping spikes are quite short-lived.

I also wanted to try just using CAKE without the adaptive bandwidth script, with different bandwidths set, in order to make sure it will be possible to control with the autorate script.

Here is a table of the results. This was using Waveform. The first column is the target download/upload speed set in CAKE. I repeated each run several times. Then come the unloaded 25th- and 95th-percentile latencies in ms, then the same for download, followed by the download speed in Mbps; then the same is repeated for upload. Waveform then assigns a score based on how good it thinks the results are:

CAKE speed  Run  Unloaded 25%  Unloaded 95%  Down 25%  Down 95%  Down Speed  Up 25%  Up 95%  Up Speed  Score
300/10        1            41            55        52        94        83.5      43      67       5.3  A
300/10        2            41            66        89       253       113.9      56      85       6.6  C
300/10        3            49            68        90       216       116.2      57      80       9.7  C
300/10        4            52            92       127       296        76.6      56      95       3.7  C
300/10        5            48            76        61       150        66.3      52     108       3.5  A
300/10        6            39            59        58       156        88.7      50      91       5.9  B
100/15        1            34            50        71       148        47.2      55     121       5.2  B
100/15        2            47            73        55       124        68.4      41      94       7.9  A
100/15        3            51            74        55       174        45.0      48      95       9.8  A
80/12         1            44            71        58       169        58.2      55      98       2.3  B
80/12         2            29            57        72       225        42.1      67     144       3.9  C
80/12         3            40            67        75       188        38.0      53      76       2.7  C
60/10         1            38            53        51       140        36.3      39      72       6.6  A
60/10         2            51            78        54       156        29.5      48      75       5.6  A
60/10         3            38            58        37        73        48.8      56     107       3.1  A
40/8          1            48            68        54       106        31.9      51      83       3.9  A
40/8          2            58            78        62        90        29.3      47      89       4.0  A
40/8          3            48            69        51        84        32.1      43      87       4.2  A+
30/6          1            51           101        60       118        18.7      68     166       1.9  A
30/6          2            55            78        57        87        22.5      56      90       3.1  A
30/6          3            44            67        44        90        23.9      40      72       4.2  A+
20/4          1            40            66        43        62        17.9      41      77       3.1  A+
20/4          2            60            95        43        77        15.4      49      81       3.1  A+
20/4          3            48            86        62        81        16.0      56      84       2.6  A+

My observation from this is that, yes, bufferbloat should be controllable with autorate, as it gets markedly better with lower speeds set in CAKE. But there is so much variability in Starlink that coming up with the right control parameters seems tricky. In 300/10 Run 1 the bufferbloat wasn't bad at all, and control wouldn't have been needed; in Run 4 it was quite bad. Speeds are likewise all over the place.

I haven't dived into the code of the autorate script too much, but it seems to me that it will need to be highly reactive for Starlink. Running with the default parameters, it ends up cutting a lot of speed while reducing the latency somewhat, but not as much as would be ideal for how much bandwidth is sacrificed.

This is running OpenWrt on an EdgeRouter X.

The delay measurements are already quite fast... the problem I see is the relatively large variability in the idle RTT, which forces you either to use a high RTT threshold (which reacts more sluggishly than a lower one) or to accept noticeable shaper excursions towards the configured minimum with a lower threshold.... There is a compromise to be struck here; you need to decide how much latency-under-load increase and how much speed sacrifice you are comfortable with.

I wonder whether there might be a better hop to ping, though, maybe within Starlink's own infrastructure. Could you post the output of:
mtr -ezbw -c 100 8.8.8.8
please? Maybe we will see some pingable hops along the path.

1 Like

Here are the results (from when the connection was idle):

HOST: OpenWrt                                    Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    100.64.0.1 (100.64.0.1)             0.0%   100   46.5  50.8  36.3  74.5   8.1
  2. AS???    172.16.249.2 (172.16.249.2)         0.0%   100   40.0  52.2  34.4  96.0  10.8
  3. AS14593  149.19.108.23 (149.19.108.23)       1.0%   100   38.4  53.2  34.4  92.7   9.8
  4. AS15169  142.250.171.128 (142.250.171.128)   0.0%   100   74.1  55.7  37.9  95.0  11.8
  5. AS15169  209.85.142.117 (209.85.142.117)     0.0%   100   68.9  52.6  35.9  87.7   9.3
  6. AS15169  216.239.47.87 (216.239.47.87)       0.0%   100   68.8  52.0  37.0  74.6   9.3
  7. AS15169  dns.google (8.8.8.8)                0.0%   100   71.4  51.7  35.9  74.4   8.5

1 Like

Thanks! Both 100.64.0.1 and 172.16.249.2 appear to be in private address blocks that Starlink uses. Both seem to respond well and with a considerably tighter maximum than 8.8.8.8.
EDIT: I misread the chart and interpreted Last as Worst... with the full table in view, 100.64.0.1 seems like the best of the lot, but not by much, so potentially just by chance; a longer measurement run should help pick the best candidate out of this uninspiring lot ;).

Could you repeat that measurement overnight, for an hour:
mtr -ezbw -c 3600 8.8.8.8
to see whether these two stay nice and responsive (although, reading the table properly, they are barely better than the final hop)? If yes, they would be candidates to include in your reflector mix.
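If they do hold up over the longer run, adding them to the reflector mix could look something like the sketch below. This is an assumption about how you would edit config.sh, not a tested recommendation; keeping a couple of external reflectors as a sanity check seems prudent in case the internal hops deprioritise ICMP under load.

```shell
# Hypothetical reflector mix including Starlink-internal hops
# (sketch only; verify these hops stay responsive first).
reflectors=("100.64.0.1" "172.16.249.2" "1.1.1.1" "8.8.8.8")
no_pingers=4
```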

2 Likes

Thanks, I'll plan on running that tonight. It does look like every address I run mtr on goes through 100.64.0.1 and 172.16.249.2 as the first two hops.

1 Like

Still recovering from COVID, having just finished a hearing for work today, but I wonder if increasing the granularity by reducing the time between pings and increasing the number of concurrent reflectors would help (if the router has spare CPU cycles)?

reflector_ping_interval_s=0.2 # (seconds, e.g. 0.2s or 2s)

so perhaps 0.1 or even lower?

and/or:

reflectors=("1.1.1.1" "1.0.0.1" "8.8.8.8" "8.8.4.4" "9.9.9.9" "9.9.9.10")
no_pingers=4

more pingers?

And then more aggressive bandwidth increases via:

# rate adjustment parameters 
# bufferbloat adjustment works with the lower of the adjusted achieved rate and adjusted shaper rate
# to exploit that transfer rates during bufferbloat provide an indication of line capacity
# otherwise shaper rate is adjusted up on load high, and down on load idle or low
# and held the same on load medium
achieved_rate_adjust_bufferbloat=0.9 # how rapidly to reduce achieved rate upon detection of bufferbloat 
shaper_rate_adjust_bufferbloat=0.9   # how rapidly to reduce shaper rate upon detection of bufferbloat 
shaper_rate_adjust_load_high=1.01    # how rapidly to increase shaper rate upon high load detected 
shaper_rate_adjust_load_low=0.98     # how rapidly to return to base shaper rate upon idle or low load detected

so maybe e.g. shaper_rate_adjust_load_high=1.05 or higher?

Also with:

achieved_rate_adjust_bufferbloat=0.9 # how rapidly to reduce achieved rate upon detection of bufferbloat

should we perhaps, exceptionally, increase it slightly, maybe even to >1.0? The bandwidth cuts seem pretty drastic, so maybe a compromise is in order here with Starlink? And in the end we still take the lower of the achieved rate * factor and the punished shaper rate with this line:

https://github.com/lynxthecat/CAKE-autorate/blob/main/CAKE-autorate.sh#L51
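The "take the lower of the two" safeguard can be sketched like this. It is a simplified illustration of the idea behind the linked line, not the script's actual code; the function name and the integer scaling in thousandths (so 1.1 becomes 1100) are my own assumptions for a pure-shell sketch.

```shell
# Sketch of the bufferbloat rate cut (illustration only).
# On bufferbloat, the new shaper rate is the lower of:
#   achieved rate scaled by achieved_rate_adjust_bufferbloat, and
#   current shaper rate scaled by shaper_rate_adjust_bufferbloat.
achieved_rate_adjust_bufferbloat=1100   # 1.1 expressed in thousandths (assumption)
shaper_rate_adjust_bufferbloat=900      # 0.9 expressed in thousandths

cut_on_bufferbloat() {
    achieved_rate_kbps=$1
    shaper_rate_kbps=$2
    a=$(( achieved_rate_kbps * achieved_rate_adjust_bufferbloat / 1000 ))
    s=$(( shaper_rate_kbps * shaper_rate_adjust_bufferbloat / 1000 ))
    # even with the achieved-rate factor above 1.0, the minimum still wins
    if [ "$a" -lt "$s" ]; then echo "$a"; else echo "$s"; fi
}

cut_on_bufferbloat 20000 100000   # 20000*1.1 = 22000 < 100000*0.9 = 90000 -> 22000
```

So nudging achieved_rate_adjust_bufferbloat above 1.0 only softens the cut when the achieved rate is close to the shaper rate; when bufferbloat strikes at a low achieved rate, the achieved-rate term still dominates.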

Please take all of this with a pinch of salt since I'm in recovery mode :smiley:.

2 Likes

Crap! Hope you and your family are recovering well.

3 Likes

Yes, hope you recover quickly!

If anyone really feels like diving into more detail about how Starlink works: I first heard about this autorate project from Dave Taht, who put out a call for flent benchmarks from Starlink users. I did some and he asked me to post them to this thread.

https://drive.google.com/drive/folders/1zYi8xBqUDkzMtZwKMIM7WbjAyocd-hG2?usp=sharing

In the "nosqm" subfolder are several runs from different times of day with SQM disabled.

Then I did a couple runs using this autorate script (this was several days ago), with the following config.sh parameters changed from default:

delay_thr_ms=50 # (milliseconds)

min_dl_shaper_rate_kbps=10000 # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=50000 # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=300000 # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=2000 # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=8000 # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=40000 # maximum bandwidth for upload (Kbit/s)

I can do more flent tests if anyone finds them interesting.