CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

If the network path did not change, the baseline from before going to sleep should be a decent estimate after wake-up, potentially for long periods of time (network paths can change quickly, but not all do).
Now, our approach of letting the baseline increase slowly but decrease rapidly is intended to deal gracefully with changing network paths. So if the read-back value is decent, loading it brings us a benefit; if, however, it is out of date and no longer reflects the path, we will not be much worse off than with any other arbitrary value like 0 or 1.5 seconds. So in summary I think storing and reading baseline/EWMA values is a nice little optimization.
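To make the "increase slowly but decrease rapidly" idea concrete, here is a minimal sketch of such an asymmetric update in integer bash arithmetic. The function name and the two weights are illustrative assumptions, not the values or code cake-autorate actually uses.

```shell
#!/usr/bin/env bash
# Hypothetical weights, scaled by 1000 so everything stays in integer arithmetic.
alpha_baseline_increase=1    # 0.001: creep up slowly when the RTT rises
alpha_baseline_decrease=900  # 0.9: drop quickly when the RTT falls

# update_baseline_us <current_baseline_us> <rtt_sample_us>
# Prints the new baseline in microseconds.
update_baseline_us() {
    local baseline_us=$1 rtt_us=$2 alpha
    if (( rtt_us >= baseline_us )); then
        alpha=$alpha_baseline_increase
    else
        alpha=$alpha_baseline_decrease
    fi
    # EWMA: new = old + alpha * (sample - old), with alpha expressed in 1/1000
    echo $(( (1000 * baseline_us + alpha * (rtt_us - baseline_us)) / 1000 ))
}
```

With these weights, a stale 100 ms baseline that meets a 20 ms sample falls most of the way in a single step, while a 20 ms baseline that meets a 100 ms sample barely moves. That is why a wrong read-back value should not hurt for long.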

Are ifs really that noticeable a cost? If so, only read the files on startup or on restart after sleep?

I've avoided them wherever possible. So reading in files is fine on startup.

But how do I initialize where no file exists? What if I just knock a zero off so that baseline is set to 100ms and delta to 150ms?

        # Seed each reflector's baseline and delta EWMA from the stored files, or fall back to static defaults.
        for (( reflector=0; reflector<$no_reflectors; reflector++ ))
        do
                if [[ -f "$run_path/reflector_${reflectors[$reflector]//./-}_baseline_us" ]]; then
                        read -r "rtt_baselines_us[${reflectors[$reflector]}]" < "$run_path/reflector_${reflectors[$reflector]//./-}_baseline_us"
                else
                        rtt_baselines_us[${reflectors[$reflector]}]=100000 # 100 ms
                fi
                if [[ -f "$run_path/reflector_${reflectors[$reflector]//./-}_delta_ewma_us" ]]; then
                        read -r "rtt_delta_ewmas_us[${reflectors[$reflector]}]" < "$run_path/reflector_${reflectors[$reflector]//./-}_delta_ewma_us"
                else
                        rtt_delta_ewmas_us[${reflectors[$reflector]}]=150000 # 150 ms
                fi
        done
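For completeness, here is a rough sketch of the save/load round trip for one value. The helper names `save_baseline` and `load_baseline` are made up for illustration, and the numeric check on read-back is my own addition (an assumption, not something the snippet above does) so that an empty or corrupt file falls back to the default rather than leaving an empty value.

```shell
#!/usr/bin/env bash
declare -A rtt_baselines_us

# save_baseline <run_path> <reflector>: write the current baseline to its per-reflector file
save_baseline() {
    local run_path=$1 reflector=$2
    printf '%s\n' "${rtt_baselines_us[$reflector]}" > "$run_path/reflector_${reflector//./-}_baseline_us"
}

# load_baseline <run_path> <reflector>: read the stored baseline, or use 100000 us (100 ms)
load_baseline() {
    local run_path=$1 reflector=$2 file value
    file="$run_path/reflector_${reflector//./-}_baseline_us"
    # Accept the stored value only if the file exists and holds a plain non-negative integer
    if [[ -f $file ]] && read -r value < "$file" && [[ $value =~ ^[0-9]+$ ]]; then
        rtt_baselines_us[$reflector]=$value
    else
        rtt_baselines_us[$reflector]=100000
    fi
}
```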

Hanging around to set up the baseline introduces complexity in terms of the health check monitoring, which wants to start seeing timestamps; otherwise it gets unhappy.

100ms is at least the generic "internet" RTT that a lot of other pieces of code use as well (100ms certainly is not magic, but it seems to be in the right order of magnitude).

So this is likely to be rare; maybe just run the configured ping binary 3-5 times quickly and generate a baseline estimate?
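A rough sketch of what such a quick estimate could look like: take the minimum RTT from a handful of ping replies. The function name `estimate_baseline_us` is hypothetical, and the parsing assumes the usual `time=12.3 ms` reply format printed by common ping implementations.

```shell
#!/usr/bin/env bash
# estimate_baseline_us: read ping output on stdin, print the minimum RTT in microseconds.
# Returns non-zero if no reply lines could be parsed.
estimate_baseline_us() {
    awk -F'time=' '
        /time=/ {
            us = ($2 + 0) * 1000        # "12.3 ms" -> 12300; +0 keeps only the leading number
            if (min == "" || us < min) min = us
        }
        END {
            if (min == "") exit 1       # no replies seen
            printf "%.0f\n", min
        }'
}
```

It could then be fed with something like `ping -c 5 "$reflector" | estimate_baseline_us`, falling back to the static default when the function fails.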

However the fallout of getting the baseline wrong is going to be relatively short anyway, so maybe your 100ms approach is simply good enough... (it certainly will make autoscaling less of an issue, sure with RTTs >= 10 ms even 100ms is large, but at least not unreasonably large :wink: )

Well, or make the main loop ignore delay samples with sequence numbers below 10 for setting the shaper but still evaluate them for the baseline, then the baseline would ripen during the first few samples.
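As a sketch of that warm-up idea: every sample still feeds the baseline, but only samples past a threshold are allowed to influence the shaper. The names `warmup_samples` and `sample_affects_shaper` are invented for this example, and the threshold of 10 is simply the number mentioned above.

```shell
#!/usr/bin/env bash
warmup_samples=10  # hypothetical threshold: sequence numbers below this only train the baseline

# sample_affects_shaper <seq>
# Succeeds (exit status 0) once the sample is past the warm-up window.
sample_affects_shaper() {
    local seq=$1
    (( seq >= warmup_samples ))
}

# Usage inside a (hypothetical) main loop:
#   update the baseline for every sample, then
#   if sample_affects_shaper "$seq"; then adjust the shaper rate; fi
```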

But hey, this condition is hopefully rare enough that static initialization works. And in all honesty even the current value works for autorate's control loop, it is really just the plotting part that has issues.

OK thanks - I think I'll stick with just setting baseline/delta ewma to 100/150ms for now. I have also forced requiring an instance identifier. Default is now cake-autorate_config.primary.sh.

If by any chance you'd have any time to test the latest code in the 'testing' branch that would be very helpful.

The rate of change of this code has been pretty high recently!

Excellent!

Will try to find time tonight. I can not do this from work :wink:

Indeed it has :wink:

Thanks for the detailed explanation.
@Lynx not sure if this can be an issue:

cake-autorate.sh: line 368: (( 150000 >= )) ? 1000 : 900000 : syntax error: operand expected (error token is ")) ? 1000 : 900000 ")
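This kind of "operand expected" error is what bash prints when a variable inside an arithmetic expression expands to nothing. Below is a minimal sketch of guarding against that with a default value. The helper names are hypothetical; the constants 1000 and 900000 are copied from the error message and 100000 is the fallback discussed earlier. This is not the actual fix from the repository.

```shell
#!/usr/bin/env bash
declare -A rtt_baselines_us=( ["1.1.1.1"]=20000 )
default_baseline_us=100000   # fallback when a reflector has no baseline yet

# baseline_for <reflector>: print the stored baseline, or the fallback if none exists
baseline_for() {
    echo "${rtt_baselines_us[$1]:-$default_baseline_us}"
}

# choose_alpha <rtt_us> <reflector>: never evaluates an empty operand
choose_alpha() {
    local rtt_us=$1 baseline_us
    baseline_us=$(baseline_for "$2")
    echo $(( rtt_us >= baseline_us ? 1000 : 900000 ))
}
```

The `${var:-default}` expansion means a freshly rotated-in reflector with no stored baseline still yields a valid number instead of an empty string.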

cake-autorate commit f562786

Log file: https://drive.google.com/file/d/16z4bzeEx2ePpSvuf91cdoNWfQvrWKZj3/view?usp=share_link

I think that issue - presumably missing baseline on reflector rotation (which should not have stopped running, just error and continue) is fixed by this commit:

See explanation.

Please can you test the latest code in the 'testing' branch?

This is the latest and greatest code, as it were.

This is about to get pulled to 'main' and form a new release version given so many changes since the last version. Please observe that the cake-autorate_config.sh file has now changed to cake-autorate_config.primary.sh because we now support multiple cake-autorate instances.

The default run directory is now /var/run/cake-autorate/primary and the log file output is /var/log/cake-autorate.primary.log.
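To illustrate how the three names line up, here is a small sketch that derives the instance identifier from the config file name and builds the run directory and log file paths from it. The function `instance_from_config` is made up for this example; the real script may derive these differently.

```shell
#!/usr/bin/env bash
# instance_from_config <config_path>: print the instance identifier embedded in the file name
instance_from_config() {
    local name=${1##*/}                  # strip any leading directory
    name=${name#cake-autorate_config.}   # strip the fixed prefix
    echo "${name%.sh}"                   # strip the .sh extension
}

instance_id=$(instance_from_config "cake-autorate_config.primary.sh")
run_path="/var/run/cake-autorate/$instance_id"
log_file="/var/log/cake-autorate.$instance_id.log"
```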

If I run only one instance, can I skip running the launcher script? I ask this because the service currently enabled executes cake-autorate itself.

EDIT: Oh, man! I just lost everything in the directory where I was previously running autorate :cold_sweat: ... well, I guess this is a small sacrifice for the giant FOSS leap... without looking at the code, it probably deletes everything in the current directory... ok, let's proceed...

Yikes (and I'm very sorry if this is my fault) - is this an issue with the code that needs fixing very urgently? It is just supposed to perform a cleanup on the:

/var/run/cake-autorate/$instance_identifier

directory.

@moeller0 or @patrakov can you suggest a way to make this safer:

Perhaps by checking $run_path is a subdirectory of /var/run/cake-autorate?
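One possible shape for such a check, as a sketch only: refuse to delete unless the path is strictly below the expected base directory. The function name `is_safe_run_path` is hypothetical. The check is purely string based, so it does not resolve symlinks, and it deliberately rejects any path containing `..`.

```shell
#!/usr/bin/env bash
# is_safe_run_path <base_dir> <run_path>
# Succeeds only if run_path is a proper subdirectory path of base_dir.
is_safe_run_path() {
    local base=${1%/} path=${2%/}        # ignore a single trailing slash on either argument
    [[ -n $base && $path == "$base"/?* && $path != *..* ]]
}

# Usage (not executed here):
#   if is_safe_run_path /var/run/cake-autorate "$run_path"; then
#       rm -r -- "$run_path"
#   else
#       echo "refusing to remove unexpected run_path: $run_path" >&2
#   fi
```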

Nah... Don't waste your time on that.
It was my fault. Every time I upgraded the script, I would manually set the run path to the same path as the source code (remember, I'm not on OpenWrt but Debian, and I never run the setup script).

Really, don't waste time on that.

Mmmh safest would be to explicitly only delete the full path including the instance_id, but

Mmmh run_path is already hard-coded anyway, so I wonder whether that is what wiped @gadolf's directory in the first place...

@gadolf, could you tell us what directory you used, please?

This has made me feel uneasy about that 'rm -r' line though. I think we should make this safer out of paranoia.

I have a ~/bin directory where I save scripts, which is where cake-autorate was, and I changed the run_path var to point to it as well. That was dumb and unnecessary.

Now about autorate with this new version: I noticed speedtest is reporting some unusually high download latency-under-load timings. In twelve tests, three of them rose above 200ms (the others showed the "normal" values I'm used to with cake).

The log file: https://drive.google.com/file/d/1mg1IfIAAuTnijmGJqyYaAblmD-1wJt9c/view?usp=share_link

Then maybe just make it clearer in the instructions that the run_path content can be wiped.
Really, I'm fine; apparently none of my stuff has broken. I keep this server lean, just to run iptables, routing and dhcp (and cake of course), so it seems that this bin directory wasn't being used for anything important, other than cake-autorate.

@Lynx let's add a comment where run_path is assigned in the script that this directory will get wiped on script shutdown, so "change at own risk".

BTW, with instance_id this will leave /var/run/cake-autorate behind. Should this not also be deleted iff it is empty, that is, when the last instance is being shut down?
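A sketch of how that could work: `rmdir` only removes empty directories, so calling it on the parent after removing the instance directory does nothing while other instances still have their directories there. The function name `cleanup_run_dirs` is hypothetical.

```shell
#!/usr/bin/env bash
# cleanup_run_dirs <base_dir> <instance_dir>
# Removes the instance directory, then the base directory only if it is now empty.
cleanup_run_dirs() {
    local base=$1 instance_dir=$2
    rm -rf -- "$instance_dir"
    # rmdir fails (harmlessly) if other instance directories are still present
    rmdir -- "$base" 2>/dev/null || true
}
```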

Forgot to mention: this is really cool!

DL: maximum 95.000%-ile delta delay over all 10 reflectors: 4.330 ms.
DL: maximum 99.000%-ile delta delay over all 10 reflectors: 43.960 ms.
DL: maximum 99.500%-ile delta delay over all 10 reflectors: 90.490 ms.
DL: maximum 99.900%-ile delta delay over all 10 reflectors: 300.590 ms.
DL: maximum 99.950%-ile delta delay over all 10 reflectors: 314.780 ms.
DL: maximum 99.990%-ile delta delay over all 10 reflectors: 314.780 ms.
DL: maximum 99.999%-ile delta delay over all 10 reflectors: 314.780 ms.
UL: maximum 95.000%-ile delta delay over all 10 reflectors: 4.330 ms.
UL: maximum 99.000%-ile delta delay over all 10 reflectors: 43.960 ms.
UL: maximum 99.500%-ile delta delay over all 10 reflectors: 90.490 ms.
UL: maximum 99.900%-ile delta delay over all 10 reflectors: 300.590 ms.
UL: maximum 99.950%-ile delta delay over all 10 reflectors: 314.780 ms.
UL: maximum 99.990%-ile delta delay over all 10 reflectors: 314.780 ms.
UL: maximum 99.999%-ile delta delay over all 10 reflectors: 314.780 ms.

I always wanted to see it for real

So a quick test of "testing" seems to show it working as expected...

After posting I suddenly noticed that the DL and UL delays are exactly the same.
Should they be?

This is simply because we can currently only use ping and fping to collect delay data, and both only report round trip time, that is DL+UL. So for now we simply assign DL=RTT/2 and UL=RTT/2, hence they are identical. This is in preparation for optionally using one-way delay measurements, which will allow us to detect the direction experiencing congestion, so we do not need to reduce the shaper rate in both directions if only one direction is "slow".
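In code, the split described here amounts to something like the following sketch (the function name `split_rtt_us` is made up for illustration):

```shell
#!/usr/bin/env bash
# split_rtt_us <rtt_us>: print "dl_owd_us ul_owd_us", each half of the round trip time
split_rtt_us() {
    local rtt_us=$1
    local half_us=$(( rtt_us / 2 ))   # integer division, microsecond resolution
    echo "$half_us $half_us"
}

read -r dl_owd_us ul_owd_us < <(split_rtt_us 8660)
```

Since both values come from the same number, any statistic computed on them, percentiles included, will match between DL and UL until real one-way delays are available.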

Understood.
Anyway, so cool to see delays up to the 99.999th percentile.
Thank you all for this great work!
