CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

btw a thought: rather than changing the shaper rate, we could plug up the send when we know a handoff is approaching:
https://www.infradead.org/~tgr/libnl/doc/api/group__qdisc__plug.html#gac99edea24c26b1d67f764f55d6f23a3a

Also, I am re-reading this really great book on Iridium: https://www.amazon.com/dp/B01AGZ8M3A/ and wondering how different the L2 here is from way back then.

Sure, I'll have a couple different irtt jobs run overnight when Starlink's network should be about as clear as it gets and get you the data. If there's anything else you'd like just let me know.

I started to look into irtt's help and found an interesting option, --sfill, which allows requesting a specific fill pattern from the server. I tried:

bash-3.2$ irtt client --dscp=0xfe --fill-one --fill=pattern:fe --sfill=pattern:fe  -i3ms -d20m de.starlink.taht.net
[Connecting] connecting to de.starlink.taht.net
[ServerRestriction] server restricted fill from pattern:fe to pattern:69727474

Could you enable that on your servers? From irtt help server:

--fill=fill    payload fill if not requested (default pattern:69727474)
               none: echo client payload (insecure on public servers)
               rand: use random bytes from Go's math.rand
               pattern:XX: use repeating pattern of hex (default 69727474)
--allow-fills= comma separated patterns of fill requests to allow (default rand)
  fills        see options for --fill
               allowing non-random fills insecure on public servers
               use --allow-fills="" to disallow all fill requests
               note: patterns may contain * for matching

My goal here is simply to store the TOS byte value used by each side in the irtt payload, so that from packet captures it would be possible to compare the initial TOS/DSCP value with what survives of it over the network path. I see Pete's comment about security, but how insecure would --allow-fills="**" be, i.e. basically allowing all 256 potential TOS byte values?

@moeller0 any chance you might be able to explain the difference between dropping the shaper rates to the minimum and buffering up data between such 'plug' and 'release' calls? Why might plugging the send work better? Would this be in addition to, or an alternative to, reducing the shaper rates?

Also how do we use this? I tried:

root@OpenWrt:~# opkg install libnl
Installing libnl200 (3.5.0-1) to root...
Downloading https://downloads.openwrt.org/releases/22.03-SNAPSHOT/packages/aarch64_cortex-a53/base/libnl200_3.5.0-1_aarch64_cortex-a53.ipk
Installing libnl-core200 (3.5.0-1) to root...
Downloading https://downloads.openwrt.org/releases/22.03-SNAPSHOT/packages/aarch64_cortex-a53/base/libnl-core200_3.5.0-1_aarch64_cortex-a53.ipk
Installing libnl-genl200 (3.5.0-1) to root...
Downloading https://downloads.openwrt.org/releases/22.03-SNAPSHOT/packages/aarch64_cortex-a53/base/libnl-genl200_3.5.0-1_aarch64_cortex-a53.ipk
Installing libnl-route200 (3.5.0-1) to root...
Downloading https://downloads.openwrt.org/releases/22.03-SNAPSHOT/packages/aarch64_cortex-a53/base/libnl-route200_3.5.0-1_aarch64_cortex-a53.ipk
Installing libnl-nf200 (3.5.0-1) to root...
Downloading https://downloads.openwrt.org/releases/22.03-SNAPSHOT/packages/aarch64_cortex-a53/base/libnl-nf200_3.5.0-1_aarch64_cortex-a53.ipk
Configuring libnl-core200.
Configuring libnl-route200.
Configuring libnl-genl200.
Configuring libnl-nf200.
Configuring libnl200.
root@OpenWrt:~# nl-qdisc-add --dev=ifb0 --parent=root plug --limit=32768\n
-ash: nl-qdisc-add: not found

I do not know exactly what Dave is referring to here, so I am probably misunderstanding things. That said, IIUC "plugging" a qdisc might make sense if the information that something is plugged is fed back to the actual data producers, so they can simply stop pushing packets into the network for as long as it is plugged. But I do not understand how that would work on a router, where the sending processes might be on different machines and hence do not see the back pressure building up "above" the plug. I am not sure what would happen, but I would guess we would just drop a ton of packets...

But I am likely misunderstanding the whole line of Dave's reasoning here :wink:

I do note, however, that @gba's irtt data suggests it might be sufficient to reduce the egress rate to the minimum while leaving the ingress rate alone, as it appears that the egress side accumulates most of the delay.


From the code I looked at, I thought plugging means buffering up the data from the point of 'plug' and then sending it later from the point of 'release' (so no dropping of packets), whereas I thought the shaper stuff works by dropping packets? But I assume my understanding is not strong enough to see why buffering would nevertheless end up in the dropping you refer to.


As an aside, I am thinking of merging the Starlink commits to main and also nuking the medium rate logic entirely. I tested the latter on my LTE connection and it doesn't seem helpful. Is there ever a scenario where it could make sense? Could there be a situation where the shaper rate is held at, say, 30 Mbit/s, there is a 15 Mbit/s stream, capacity varies between 45-80 Mbit/s, and everyone is happy, or is this just wishful thinking? From my testing it seems streams just don't work like that; they fluctuate up and down a lot. If there is a situation where it could make sense, I could keep it in, since it can be disabled by just setting the medium rate threshold to the high rate threshold.

So one potential reason to keep the capability (not arguing for enabling it by default) would be something like Starlink, where we have a hard time actually moving from the minimum to the maximum rate in the <15 seconds we have available. However, I am not convinced that there are no better ways than an intermediate hold logic.

One other rationale for a "mild" hold (say high at 75%, hold at 70%) would be that it could help dampen the shaper oscillations, except I am not sure whether these oscillations of the shaper rate above the true achieved rate actually have any negative side effects (i.e. I am not sure the oscillations we see are problematic).

Not sure that is going to help all that much; we could achieve something similar by throttling cake down hard while at the same time increasing the interval (which will also result in us buffering without dropping).
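
For illustration, something along these lines from the shell (a sketch only; the interface name and numbers are placeholders, and cake's 'rtt' keyword is what controls its internal interval):

# Illustrative only: clamp cake hard and lengthen its interval over a
# predicted switch window, then restore; device and rates are placeholders.
tc qdisc change root dev wan cake bandwidth 1Mbit rtt 300ms
sleep 0.6
tc qdisc change root dev wan cake bandwidth 12Mbit rtt 100ms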

I was thinking of "corking", in the sense that if a qdisc accepts no packets whatsoever, incoming packets can only be dropped. But I was clearly speaking from a lack of knowledge here.

BTW, is your thinking that we should only reduce the upload shaper rate? If so, I'll effect that change now.

I'm making stuff up here, but could it be that for download data the satellite transition is handled by the previous satellite passing a buffer of data to the new satellite, which then works from that, whereas for upload data there is a blackout period where there is no satellite to upload to?

So by buffering upload data at the router we are sort of mirroring what Starlink does?

Plug seems to involve buffering:

"Usage: nl-qdisc-add [...] plug [OPTIONS]...\n"
"\n"
"OPTIONS\n"
"     --help                Show this help text.\n"
"     --limit               Maximum queue length in bytes.\n"
"     --buffer              create a new buffer(plug) and queue incoming traffic into it.\n"
"     --release-one         release traffic from previous buffer.\n"
"     --release-indefinite  stop buffering and release all (buffered and new) packets.\n"
"\n"
"EXAMPLE"
"    # Attach plug qdisc with 32KB queue size to ifb0\n"
"    nl-qdisc-add --dev=ifb0 --parent=root plug --limit=32768\n"
"    # Plug network traffic arriving at ifb0\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --buffer\n"
"    # Unplug traffic arriving at ifb0 indefinitely\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --release-indefinite\n\n"
"    # If operating in output buffering mode:\n"
"    # at time t=t0, create a new output buffer b0 to hold network output\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --buffer\n\n"
"    # at time t=t1, take a checkpoint c0, create a new output buffer b1\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --buffer\n"
"    # at time t=t1+r, after c0 is committed, release b0\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --release-one\n\n"
"    # at time t=t2, take a checkpoint c1, create a new output buffer b2\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --buffer\n"
"    # at time t=t2+r, after c1 is committed, release b1\n"
"    nl-qdisc-add --dev=ifb0 --parent=root --update plug --release-one\n");
}

This will make more sense to you than me:

https://www.infradead.org/~tgr/libnl/doc/api/group__qdisc__plug.html

So I think it's like a dam for water. We erect a dam (plug) and data builds up in a buffer. Then we open the flood gates (release) and the data is released from the buffer.

But I'm just guessing that from descriptions like the one above.
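
As a rough sketch of how the dam might be operated around a predicted switch, using only the commands from the usage text above (the plug qdisc is assumed to have been attached beforehand with --limit, and the 0.3s timing and ifb0 device are placeholders):

# Erect the dam just before the predicted switch...
nl-qdisc-add --dev=ifb0 --parent=root --update plug --buffer
# ...ride out the switch...
sleep 0.3
# ...then open the flood gates and release everything buffered
nl-qdisc-add --dev=ifb0 --parent=root --update plug --release-indefinite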

I guess we could do that, but that requires more data to confirm that our hypothesis is correct.

OK, I ran 2 irtt tests overnight at 4am local. SQM was disabled, network was quiet.

First test was run with --dscp=0xfe -i3ms -d5m, so default (small packets):

Total RTT latency is red, receive latency is blue, send latency is green, and packet loss % is orange (I did a shifted rolling average here for visualization). The small brown dots along the bottom indicate the seconds of the Starlink optimization/switch (although that is pretty obvious from the graph).


                         Min     Mean   Median      Max   Stddev
                         ---     ----   ------      ---   ------
                RTT  57.76ms  84.43ms  83.12ms    200ms  12.71ms
         send delay  15.19ms  33.28ms  31.52ms  146.5ms  10.49ms
      receive delay  39.45ms  51.15ms   51.2ms  105.7ms   5.78ms
                                                                
      IPDV (jitter)    142µs   4.12ms   2.99ms  83.71ms   3.75ms
          send IPDV   6.54µs   3.96ms   2.99ms  81.36ms   3.29ms
       receive IPDV       0s    732µs   26.5µs  46.18ms    2.3ms
                                                                
     send call time   5.68µs   54.6µs            1.77ms     25µs
        timer error       0s   29.3µs           14.58ms   76.8µs
  server proc. time    620ns      3µs            86.5µs   3.48µs

                duration: 5m1s (wait 599.9ms)
   packets sent/received: 99670/87038 (12.67% loss)
 server packets received: 87089/99670 (12.62%/0.06% loss up/down)
     bytes sent/received: 5980200/5222280
       send/receive rate: 159.5 Kbps / 139.3 Kbps
           packet length: 60 bytes
             timer stats: 328/99998 (0.33%) missed, 0.98% error

Then here is the second run with the packet size at its maximum, --dscp=0xfe -i3ms -d5m -l 1472:

                         Min     Mean   Median      Max   Stddev
                         ---     ----   ------      ---   ------
                RTT  59.64ms  115.7ms  82.63ms  501.4ms   97.9ms
         send delay  14.22ms  66.15ms  33.48ms  448.7ms  96.96ms
      receive delay  40.54ms  49.55ms  49.24ms  107.7ms   5.38ms
                                                                
      IPDV (jitter)   35.6µs   4.01ms   2.98ms  111.2ms   3.63ms
          send IPDV     60ns   3.61ms   2.97ms  99.64ms    3.4ms
       receive IPDV       0s    997µs   48.1µs  52.83ms   2.16ms
                                                                
     send call time   9.68µs   66.7µs             2.5ms   22.5µs
        timer error      1ns   36.8µs            8.12ms   51.6µs
  server proc. time    620ns   2.85µs             199µs    2.8µs

                duration: 5m2s (wait 1.5s)
   packets sent/received: 99704/94276 (5.44% loss)
 server packets received: 94335/99704 (5.38%/0.06% loss up/down)
     bytes sent/received: 146764288/138774272
       send/receive rate: 3.91 Mbps / 3.70 Mbps
           packet length: 1472 bytes
             timer stats: 296/100000 (0.30%) missed, 1.23% error

In hindsight I'm wondering if I should have used a smaller packet size than the maximum. From my Starlink data logger I can see that it was uploading a little over 4 Mbps at that time, so perhaps it was hitting the upper limit of the upload bandwidth available on a couple of satellites, hence the send latency spike? But that's also really interesting: toward the end it just spiked in one 15-second block and then immediately came back down.

What if Starlink optimizes bandwidth levels at those 15-second intervals, so that when dishy switches to a satellite it has a set bandwidth level (based on TDMA timeslots or however Starlink operates) for that entire 15-second interval?

If you or anyone wants to play with the data more, you can download the irtt json output here and the script I used to generate these graphs:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import math
import json
import zipfile
import sys


if len(sys.argv) < 3:
    print('Usage:', sys.argv[0], 'input_filename.json output_filename.svg')
    exit(-1)
    
input_filename = sys.argv[1]
output_filename = sys.argv[2]

f = open(input_filename)
data = json.load(f)

round_trips = data['round_trips']
rtts = []
receive_latency = []
send_latency = []
ts = []
index = []
lost_packet = []
count = 0
for round_trip in round_trips:
    ts.append(round_trip['timestamps']['client']['send']['wall'])
    if round_trip['lost'] == 'false':
        rtts.append(round_trip['delay']['rtt']/1000000)
        receive_latency.append(round_trip['delay']['receive']/1000000)
        send_latency.append(round_trip['delay']['send']/1000000)
        lost_packet.append(0)
    else:
        #rtts.append(-1)
        rtts.append(np.nan)
        receive_latency.append(np.nan)
        send_latency.append(np.nan)
        lost_packet.append(1)
    index.append(count)
    count = count + 1

df = pd.DataFrame()
df['rtts'] = rtts
df['receive_latency'] = receive_latency
df['send_latency'] = send_latency
df['ts'] = ts
df['lost_packet'] = lost_packet
df['rolling_min'] = df['rtts'].rolling(100, 10).min().shift(-100)
df['rolling_max'] = df['rtts'].rolling(100, 10).max().shift(-100)
df['rolling_mean'] = df['rtts'].rolling(100, 10).mean().shift(-100)
df['date'] = df['ts'].astype('datetime64[ns]')
#df['usecs_past_minute'] = df['ts'] % 60000000
#df['secs_past_minute'] = df['usecs_past_minute'] / 1000000
df['usecs_past_minute'] = df['date'].dt.microsecond
df['secs_past_minute'] = df['date'].dt.second
df['tenths_past_minute'] = df['secs_past_minute'] + round(df['usecs_past_minute'] / 1000000, 1)

df.loc[df['secs_past_minute'].isin([12,27,42,57]), 'starlink_switch'] = 1

print(df)

#timeData = df.groupby('secs_past_minute')['rtts'].sum()
timeData = df.groupby('tenths_past_minute')[['rtts', 'lost_packet', 'receive_latency', 'send_latency']].mean()
with pd.option_context('display.max_rows', None,):
    print(timeData)

plt.figure()
plt.scatter(df['date'], df['rtts'], s=0.5, color='red')
#plt.scatter(df['date'], df['rolling_min'], color='red')
#plt.scatter(df['date'], df['rolling_max'], color='blue')
#plt.scatter(df['date'], df['rolling_mean'], color='green')
#plt.scatter(df['date'], df['tenths_past_minute'], color='blue')
plt.scatter(df['date'], df['receive_latency'], s=0.5, color='blue')
plt.scatter(df['date'], df['send_latency'], s=0.5, color='green')
plt.scatter(df['date'], df['lost_packet'].rolling(100).sum().shift(-100), s=0.5, color='orange')
plt.scatter(df['date'], df['starlink_switch'], s=2, color='brown')
plt.title('gba Atlanta Starlink RTT')
plt.xlabel('Time')
plt.ylabel('Latency (ms)')
#plt.xticks(rotation=45)
#plt.xticks(rotation=45)
plt.grid()
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.SecondLocator(interval=1))
plt.savefig(output_filename)
plt.show()


Sorry to leave y'all in the dark re "plugging". I was always demonstrating how well fq_codel worked with pause frames on DSL. In the case of wifi, and (I think, but actually getting some plots of the analog energy being sent vis-à-vis the packet burden would be good) Starlink, beams are not continuous and not duplex, and I think (please note how speculative this is) there is also a tight interval where transmit and receive happen separately. I also tend to think the encoding "style" is different between up and down.

I look at the array of starlink antennas on their recent cruise ship demo, and scratch my head. There's NO WAY given my admittedly 90s knowledge of how wireless technologies work that they could be that close together and still transmit at the same time! As for downwards reception... I can imagine getting some leverage from that (and regardless of our efforts IMHO they really have to fix their bufferbloat to work well on a cruise ship or airplane)...

Anyway, on "plugging". Making uploads better is simpler than tackling the whole enchilada. The purpose of shaping is to "move control of the queue to your own hardware where you can manage it better". So if we know the schedule tightly enough, we can initiate a virtual pause frame: plug up all packets for 40ms, then release. The FQ aspect makes the small flows go out first, and the AQM will automatically kick in to drop a packet from a fat flow, if needed.

Another big unknown for us is how Starlink allocates beams and bandwidth - there is a control loop of theirs that triggers - is it based on the bandwidth in use? An EWMA over what period? If we clamp the bandwidth to 2 Mbit/s 300ms early and their algorithm is sampling stats during that 300ms, what decision will Starlink make next?
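
(Just to make "an EWMA over what period" concrete, this is the sort of thing meant, in the same integer-arithmetic style the autorate shell code uses; the weight and sampling period here are exactly the unknowns - purely illustrative, not anything we know about Starlink's control loop:)

# Hypothetical smoothing of the observed rate; alpha is in parts per thousand.
alpha=100   # e.g. 0.1 per sample - the real weight/period are unknown
ewma_rate_kbps=$(( (alpha*achieved_rate_kbps + (1000-alpha)*ewma_rate_kbps) / 1000 ))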

Another example: a typical web page takes 3s to load and doesn't even need a megabit in ACKs to do that... (optimizing on a 15s interval is silly on Starlink's part in this respect - but DASH traffic and BBR optimize on a 10s interval, which is also silly) - so do they make decisions based on bytes AND packets?

Life would be so much easier with source code.


Really awesome plots, thx. I would love to be doing this high-res stuff with other tech (ethernet, wifi, LTE, 5G) too.

As for your second plot, the earlier spikes look a lot like the impact of real TCP CUBIC traffic on the test run, to me. You can see this kind of thing with a simple single-flow tcp_nup or tcp_ndown test, too.

I make the analogy a lot that by creating a level of background noise like the "cosmic background radiation", it's possible to see patterns over what was formerly "vacuum", especially on wireless tech.

As one example, fire up that long-duration irtt thing and have a videoconference; there are distinct patterns vis-à-vis Zoom and Galene that you can now see, and - oy, FaceTime...


The loss rate for tiny packets was enormous (12%!?) compared to big packets.

Thank you very much for your observations @dtaht.

What specific recommendations would you make for the handling of these transitions, or things to test? I have some time this evening and then am off on holiday for a bit, so I'm keen to make any changes for @gba to test to keep up the momentum.

At the moment we drop BOTH upload and download shaper rates to minimums for Xms (300ms) prior to and Yms (300ms) after our estimated satellite transition times.

Is this the correct approach, do you think, or should we only reduce the upload shaper rate?

Should plugging be implemented in addition to, or to replace, cutting down on the shaper rates?

I realise this is all guesswork, and perhaps you are thinking we need more data, but I am eager to make some changes to my code for @gba to test whilst we have this momentum.

There seems to be a difference in behavior between up and down by packet size - could you repeat those irtt tests on LTE/5G? (Note it makes me twitchy to NOT have a 1x1 correspondence of datapoints to pixels.) I had irtt running natively under iOS at some point...

Need more data. Let the code rest. Focusing on just an upload for a while would be good.

Where are you going for vacation?


OK - I'll just code up a facility to independently change the download and upload shaper rates during our compensation period, and @gba can test that.

Just to the Netherlands to visit family (my wife is Dutch). They seem to have rather better internet infrastructure (and infrastructure in general) compared to what we are stuck with in Scotland (the SNP, just like our NHS, leaves a lot to be desired!).

OK @gba and others, I have merged the changes from the 'starlink-testing' branch to the main branch. Clearly more work is to be done in terms of figuring out how to best optimize for Starlink, but I like these changes so far to the overall flow anyway.

I now prepend 'dl' or 'ul' to the 'load_condition' identifiers, as follows:

1657044342.191073 25874  1296   97  5   [1657044342.173162] 8.8.8.8         53     36441  46800  10369  25990  1 dl_high_sss    ul_low_sss     26695  25000
1657044342.249335 23124  1444   86  5   [1657044342.230145] 8.8.4.4         52     38338  49200  10872  25986  0 dl_high        ul_low         26961  25250
1657044342.293023 23124  1444   85  5   [1657044342.269984] 1.1.1.1         56     39484  46900  7423   25976  0 dl_high        ul_low         27230  25250

Thus we have the following form now for the load identifiers:

[dl|ul]_[low|med|high]_[bb|bb_sss]

This allows the Starlink satellite switch compensation to differentiate based on whether the load is download or upload.

For now I have amended the get_next_shaper_rate() function such that we only drop down to the minimum shaper rate for upload:

	case $load_condition in

		# Starlink satellite switching compensation, so drop down to the minimum rate through the switching period
		ul*sss)
			shaper_rate_kbps=$min_shaper_rate_kbps
			;;
		# bufferbloat detected, so decrease the rate providing not inside bufferbloat refractory period
		*bb*)
			if (( $t_next_rate_us > ($t_last_bufferbloat_us+$bufferbloat_refractory_period_us) )); then
				adjusted_achieved_rate_kbps=$(( ($achieved_rate_kbps*$achieved_rate_adjust_down_bufferbloat)/1000 ))
				adjusted_shaper_rate_kbps=$(( ($shaper_rate_kbps*$shaper_rate_adjust_down_bufferbloat)/1000 ))
				shaper_rate_kbps=$(( $adjusted_achieved_rate_kbps < $adjusted_shaper_rate_kbps ? $adjusted_achieved_rate_kbps : $adjusted_shaper_rate_kbps ))
				t_last_bufferbloat_us=${EPOCHREALTIME/./}
			fi
			;;
		# high load, so increase rate providing not inside bufferbloat refractory period
		*high*)
			if (( $t_next_rate_us > ($t_last_bufferbloat_us+$bufferbloat_refractory_period_us) )); then
				shaper_rate_kbps=$(( ($shaper_rate_kbps*$shaper_rate_adjust_up_load_high)/1000 ))
			fi
			;;
		# medium load, so just maintain rate as is, i.e. do nothing
		*med*)
			:
			;;
		# low or idle load, so determine whether to decay down towards base rate, decay up towards base rate, or set as base rate
		*low*|*idle*)
			if (( $t_next_rate_us > ($t_last_decay_us+$decay_refractory_period_us) )); then

				if (( $shaper_rate_kbps > $base_shaper_rate_kbps )); then
					decayed_shaper_rate_kbps=$(( ($shaper_rate_kbps*$shaper_rate_adjust_down_load_low)/1000 ))
					shaper_rate_kbps=$(( $decayed_shaper_rate_kbps > $base_shaper_rate_kbps ? $decayed_shaper_rate_kbps : $base_shaper_rate_kbps ))
				elif (( $shaper_rate_kbps < $base_shaper_rate_kbps )); then
					decayed_shaper_rate_kbps=$(( ($shaper_rate_kbps*$shaper_rate_adjust_up_load_low)/1000 ))
					shaper_rate_kbps=$(( $decayed_shaper_rate_kbps < $base_shaper_rate_kbps ? $decayed_shaper_rate_kbps : $base_shaper_rate_kbps ))
				fi

				t_last_decay_us=${EPOCHREALTIME/./}
			fi
			;;
	esac

@gba you can tweak this behaviour as desired for testing. If you want to drop down to the minimum shaper rate for both upload and download, then just replace 'ul*sss' with '*sss'. Or alternatively we can split things out and provide separate patterns 'ul*sss' and 'dl*sss' to match upload and download separately, and then we can, for example, drop down to the minimum shaper rate for upload and to min(base shaper rate, previous shaper rate) for download.
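
For example, the split version might look something like this against the case statement above (a sketch only; the dl branch just encodes the min(base shaper rate, previous shaper rate) idea and is untested):

		ul*sss)
			shaper_rate_kbps=$min_shaper_rate_kbps
			;;
		dl*sss)
			# hold download at whichever is lower: base rate or current rate
			shaper_rate_kbps=$(( $shaper_rate_kbps < $base_shaper_rate_kbps ? $shaper_rate_kbps : $base_shaper_rate_kbps ))
			;;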

Latest code for testing is in the main branch here:


Thanks for making the updates. I'm running it right now and will try to do some tests when I get a chance. Enjoy your vacation!


Testing as well. The ul*sss setting appears to improve download bandwidth without affecting my latency, but I'll need to test more to validate that.

After this thunderstorm passes though, so maybe not until tomorrow.


Excellent - that was my hope. I'm eager to see the findings from the further testing.