CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

Mmmh, you are right, let's start simple....

Hehe, ingress shaping has gotten rather good, I'll admit, but on low-bandwidth links like the ADSL line we had, CAKE on ingress couldn't handle things like Steam downloads that fire up multiple connections without the RTT going through the roof. I likely could've gotten ingress mode to work alright, but that would have meant sacrificing a big percentage of the bandwidth.

They did, and they also switched automatically between fastpath and interleave modes depending on line conditions, so when that happened I could lose/gain a fair bit of bandwidth. I should add that we're 4.5 km from the central office, so it was a miracle really that it worked as well as it did.

Could very well work; if it does, it'd save me a lot of time figuring out another way : )

I do note that as long as I'm not limited by radio conditions, the PGW does a pretty alright job shaping the link to the subscription rate (obviously not a FIFO there), so as long as the signal is good, I could probably get away with just running CAKE on ingress. It'd be easier if I could get a modem that doesn't mess me about by changing LTE bands on its own.

Is that when using TCP? Vodafone UK applies 10 Mbit/s throttling on port 443, and otherwise I think many LTE providers employ various forms of traffic shaping in times of congestion. Since I don't want to be subjected to restrictions like the 10 Mbit/s on port 443, I use a VPN (NordVPN). But WireGuard is UDP only. So I think I'm in this weird situation where I need a VPN to circumvent overly restrictive / dubious throttling practices by my ISP, and then my own SQM to address the increased bufferbloat that comes with UDP traffic not benefitting from TCP's congestion control. That stated, I think I'd still want to use SQM anyway, because although the latency increase without the VPN was not as bad, it was still there.

How do you signal to your sender the bandwidth for it to use?

Also why don't you try sprout?

No, I haven't tested that. I currently run everything over WireGuard, so my ISP only sees UDP.
I would've done more testing without a tunnel, but I've had some weird issues where connections start failing after a while if I do.

You'd still benefit from TCP congestion control though, no? Sure, the encapsulated packets won't directly, but everything using TCP inside the tunnel still would. I might be wrong, but my understanding is that it shouldn't behave any differently than if it weren't encapsulated.

But you are of course subject to any buffering WireGuard might do; I'm not sure if it does anything that could affect latency.

If you mean the rate for CAKE on the VPS, I currently do not.
Since I'm mostly hitting the limit of the PGW shaper on the ISP side, I can just set it to the subscription rate and it'll work fine for the most part.
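
For reference, the VPS side is just a plain fixed-rate CAKE instance on egress, roughly like this (the interface name and rate are placeholders for illustration, not my actual values):

    # Fixed-rate CAKE on the VPS egress towards the tunnel (sketch only;
    # "wg0" and 30Mbit are placeholders).
    tc qdisc replace dev wg0 root cake bandwidth 30Mbit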

The problem I'm seeing is mostly that the rate can sometimes be too low for it to ramp up speed, so sqm-autorate should be great for this, but I'd have to change it to allow for egress only. And of course, ideally it should be able to tell whether it's the download or the upload rate that needs changing; I haven't read the whole discussion on that.

I might've missed it, but I haven't seen anything that's "ready to use"?
Is one of the examples in https://github.com/keithw/sprout/tree/master/src/examples something worth testing? Didn't see any documentation anywhere on what's what.

@moeller0 any thoughts? I found that on my connection, by not going over WireGuard, I didn't get such high bufferbloat. It is true that my ISP seems to do a lot of funky management, like the 10 Mbit/s throttling per port 443 stream, which presumably could actually help with bufferbloat for those individual streams (not that I am happy with that, because it limits downloads and I think even OneDrive activity). But I think there was more to it than that which I do not understand. Maybe the traffic management my ISP puts in has a good component and an evil component, and by removing the evil component with the VPN I lose out on the good component.

I am not sure. It is just that the MIT paper and video looked promising. But it might just be me being gullible. As @moeller0 points out, if sprout were that good, why didn't it get widely adopted, and why has it stayed as obscure as it has? Then again, it could still be amazing. They set MIT students the task of addressing this problem and found a frontier of the latency/bandwidth tradeoff, with sprout sitting in a nice place.

Not really... as discussed:

  • no multiple interface handling
  • no sqm iface awareness (up, down, rate change etc)

feel free to unpack my (well, based on your) tar.gz and give it a try... it has enough comments and such, and most of the foundation logic to work around all these things...

be aware tho' it is far from finished... but it's enough to grapple with what's involved...


This is a case where @Lynx's autorate script should work very well, since you only need your own home router as the ICMP reflector, which you can control well (disable rate limiting)...

Simple hack: just configure the min and max rate for ingress statically to your desired rate...
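
In other words, something along these lines in the script's config section (the variable names here are illustrative only; the actual names in the script may differ):

    # Illustrative sketch -- check the script for the real variable names.
    # Pinning min = base = max for ingress effectively freezes that direction.
    min_dl_rate=30000    # kbit/s
    base_dl_rate=30000   # kbit/s
    max_dl_rate=30000    # kbit/s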

Again since you are testing against your home network you should be able to get reliable one-way delay measurements going which would allow to assess bufferbloat for each direction individually.

Maybe this can help?

Throttling individual TCP flows to 10 Mbps will certainly help to keep bufferbloat lower, albeit at a throughput cost.

Yes, definitely one of the advantages of doing it this way.

Not sure we're talking about the same thing. If I run @Lynx's autorate script on the VPS, there won't be any download (no IFB or veth) device for it to set rates on. I could just create a dummy interface of course, and set sqm-autorate to use that for download, thus avoiding having to modify the script.
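
Creating the dummy would just be something like this (the interface name is arbitrary):

    # Throwaway interface for the script to "manage" as the download side
    ip link add dummy0 type dummy
    ip link set dummy0 up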

I probably misunderstood earlier posts; I thought changes to the script were discussed so that it could better detect whether it's egress or ingress causing the bufferbloat.

Thanks!
So this repo is what we need to look at: https://github.com/anirudhSK/alfalfa
I see there's also a fork with some examples for how to use it added to the readme as well: https://github.com/HenkPoley/alfalfa

Not surprisingly, it doesn't compile at the moment. Need to update it to compile with newer OpenSSL versions at the very least.

make[3]: Entering directory '/home/lochnair/Downloads/alfalfa/src/crypto'
  CXX      base64.o
base64.cc: In function 'bool base64_decode(const char*, size_t, char*, size_t*)':
base64.cc:48:40: error: invalid conversion from 'const BIO_METHOD*' {aka 'const bio_method_st*'} to 'BIO_METHOD*' {aka 'bio_method_st*'} [-fpermissive]
   48 |   BIO_METHOD *b64_method = BIO_f_base64();
      |                            ~~~~~~~~~~~~^~
      |                                        |
      |                                        const BIO_METHOD* {aka const bio_method_st*}
base64.cc: In function 'void base64_encode(const char*, size_t, char*, size_t)':
base64.cc:101:40: error: invalid conversion from 'const BIO_METHOD*' {aka 'const bio_method_st*'} to 'BIO_METHOD*' {aka 'bio_method_st*'} [-fpermissive]
  101 |   BIO_METHOD *b64_method = BIO_f_base64(), *mem_method = BIO_s_mem();
      |                            ~~~~~~~~~~~~^~
      |                                        |
      |                                        const BIO_METHOD* {aka const bio_method_st*}
base64.cc:101:67: error: invalid conversion from 'const BIO_METHOD*' {aka 'const bio_method_st*'} to 'BIO_METHOD*' {aka 'bio_method_st*'} [-fpermissive]
  101 |   BIO_METHOD *b64_method = BIO_f_base64(), *mem_method = BIO_s_mem();
      |                                                          ~~~~~~~~~^~
      |                                                                   |
      |                                                                   const BIO_METHOD* {aka const bio_method_st*}

Edit: Didn't take much to get it to compile: https://github.com/Lochnair/alfalfa/commit/514ae63267b46610bb6fb8e451f6e7fdbf2c4196
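
For anyone hitting the same errors: OpenSSL >= 1.1.0 returns const BIO_METHOD * from BIO_f_base64() / BIO_s_mem(), so the local declarations need the const qualifier too. Something like this covers the two errors above (a sketch only; see the commit for the actual change):

    # Sketch only -- the real fix is in the commit linked above.
    sed -i 's/BIO_METHOD \*b64_method/const BIO_METHOD *b64_method/' src/crypto/base64.cc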


I'm excited...

Ah, okay, clearly the script needs to learn to enable interfaces individually.

Yes, one-way delay measurements are under discussion. In the generic case this is hard, because there are only a few nodes out there that will return time measurements precise enough to be used for OWD estimates, but in your case you can set one up in your internal network, so that should work better...
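
One way to get at this with your own router as the reflector would be ICMP timestamp requests (type 13), which carry originate/receive/transmit stamps and therefore let you split the RTT into its two legs. A quick sketch, assuming hping3 is installed (not part of the script, just for eyeballing):

    # ICMP timestamp requests (type 13) to the home router; the reply's
    # Originate/Receive/Transmit stamps let you estimate each direction's delay,
    # provided the two clocks are reasonably well synchronised. Needs root.
    hping3 --icmp-ts -c 5 192.168.1.1    # 192.168.1.1 = your own router (example)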

Most of the issues that you may experience with cable/DOCSIS (Virgin Media, UPC) are caused by the shared access infrastructure. If you're in an overloaded CMTS cluster, there's not much you can do. Of course you can be in a small cable cluster that lets you always hit your full bandwidth, no problem; then you're lucky, but that won't happen often. That's the nature of it: it was never designed to carry individual users' internet data.

Neither was phone wire, but with xDSL (FTTC/H/B) only the core network is shared, which is rarely a problem. With a dedicated twisted pair you are better off when it comes to stability and reliability of the internet access, imho.

@Lynx ISPs do cache things like YouTube and Netflix and can limit bandwidth per session; I have noticed that Zen (a UK ISP) limits these to about 20 Mbps even at 4K.

Living in the Scottish Highlands I don't have much choice: it's either a 6 Mbit/s max copper ADSL line or LTE. Happily I live next to a Vodafone cell tower that doesn't seem to be terribly loaded, and I get enough bandwidth that with SQM I can keep a decent amount of it at low latency.

The sqm-autorate script seems to do a pretty good job for my connection in terms of recovering a lot of otherwise lost bandwidth for my file transfers whilst keeping latency low. I now have it running all the time from a service file, and it seems to work fine on my RT3200.
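
The service file is just a minimal procd wrapper, roughly along these lines (the script path here is an assumption; adjust to wherever the script actually lives):

    #!/bin/sh /etc/rc.common
    # Minimal procd wrapper for the autorate script (illustrative sketch;
    # /root/sqm-autorate.sh is a placeholder path).
    START=95
    USE_PROCD=1

    start_service() {
            procd_open_instance
            procd_set_param command /root/sqm-autorate.sh
            procd_set_param respawn
            procd_close_instance
    }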

Still room for a lot of improvement. And my hope is that ultimately everyone with a variable connection wanting to use CAKE can benefit from an adaptive solution that just works; whether that is based on the shell script or something else, I don't mind. It's just that for long enough there has not been one go-to solution (only a few sketchy DIY hacks), which is why I started this thread.

About Zen, whatever happened to net neutrality? At least you can purchase a VPN for not very much.

But now given @Lochnair's posts above I am rather curious about using a VPS.

And @Lochnair I am extremely curious to see if you get sprout to work. Please keep us posted.

No idea. I think they either limit it to 20 Mbit/s to stop it overloading their CDN servers, or they reckon someone streaming Netflix doesn't need full speed, since it just streams and buffers in the background. Not an issue with Akamai, as that maxes out my VDSL connection speed.

Mobile/cell data can be VERY hard to do! Peak times especially.

Experimenting with just working from a single rate.

        # in case of supra-threshold RTT spikes decrease the rate unconditionally
        if awk "BEGIN {exit !($cur_delta_RTT >= $cur_max_delta_RTT)}"; then
            next_rate=$( call_awk "int( ${cur_rate}*(1-${cur_rate_adjust_RTT_spike}) )" )
        else
            # ... otherwise take the current load into account
            if awk "BEGIN {exit !($cur_load >= $cur_load_thresh)}"; then
                # high load, so we would like to increase the rate
                next_rate=$( call_awk "int( ${cur_rate}*(1+${cur_rate_adjust_load_high}) )" )
            else
                # low load, so determine whether to decay down, decay up, or settle at the base rate
                cur_rate_decayed_down=$( call_awk "int( ${cur_rate}*(1-${cur_rate_adjust_load_low}) )" )
                cur_rate_decayed_up=$( call_awk "int( ${cur_rate}*(1+${cur_rate_adjust_load_low}) )" )

                if awk "BEGIN {exit !($cur_rate_decayed_down > $cur_base_rate)}"; then
                    # low load, still above the base rate: gently decrease towards it
                    next_rate=$cur_rate_decayed_down
                elif awk "BEGIN {exit !($cur_rate_decayed_up < $cur_base_rate)}"; then
                    # low load, still below the base rate: gently increase towards it
                    next_rate=$cur_rate_decayed_up
                else
                    # close enough: pin to the base rate
                    next_rate=$cur_base_rate
                fi
            fi
        fi

Working well here, so far!


The new single rate experimental version? For me it scales down too quickly and too much.

I would concur with that assessment, for sure. But it maintains a much flatter RTT/delta_RTT for me as compared to the "main" version I was testing.

What kind of connection do you have? @richb-hanover-priv suggested getting the script to work from a single rate to simplify everything. So I am trying to experiment to see if I can make that happen. But a drawback is that it will KEEP lowering and KEEP raising. That may be fine, but just not sure yet.

Alright, so don't boot me from this virtual room, please :wink:

I am actually a US cable (DOCSIS 3.1) user with 400/20 Mbps on the carrier rate card. But as is common for cable, I actually pull around 480/24 Mbps with SQM disabled. That said, my go-to SQM settings have been 462500 kbps for ingress and 24500 kbps for egress. This works well about 70-80% of the time, but since COVID increased the WFH and school-from-home populations around me, I have been seeing some significant swings in bandwidth/latency depending on the time of day.
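
For reference, those static values just sit in the usual sqm-scripts config; assuming the default section name 'eth1' in /etc/config/sqm, that would be something like:

    # Adjust the section name to match /etc/config/sqm on your router.
    uci set sqm.eth1.download='462500'
    uci set sqm.eth1.upload='24500'
    uci commit sqm
    /etc/init.d/sqm restart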

Hence, my interest in this script. :slight_smile: I realize the main intent of the script was for potentially lower-speed connection types, but I figured why not give it a shot. If my feedback will skew your testing, let me know and I'll happily convert to "observer" mode from a forum perspective.

Right now, as it's the weekend, I am sitting at the 462500/24500 kbps values. But under load (running a speed test) I do see both values get adjusted downward and fluctuate quite a bit before returning to the configured settings. But honestly, given the much flatter RTT/delta_RTT, I would say that's a good thing in my case. Your thoughts?
