CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

EDIT: this post is not testing the relevant variant of the read-method and hence can be ignored

moeller@work-horse:/usr/lib/bash$ which sleep
/usr/bin/sleep
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m9.616s
user	0m6.666s
sys	0m2.983s
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 <><(:); done

real	0m2.496s
user	0m5.727s
sys	0m2.382s
moeller@work-horse:/usr/lib/bash$ enable sleep
bash: enable: sleep: not a shell builtin
moeller@work-horse:/usr/lib/bash$ enable -f ./sleep sleep
moeller@work-horse:/usr/lib/bash$ enable sleep
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m0.045s
user	0m0.045s
sys	0m0.000s

0.000000001* 10000 = 1e-05s
Hmm, at least on Ubuntu 22.04 LTS on x86_64, the bash sleep loadable builtin seems to absolutely wipe the floor with both the coreutils sleep and the read method... that is orders of magnitude faster...

EDIT: again not the relevant optimized variant of the read-method... so wiping of floors not confirmed

Well, feed me the exact test loop you want me to run, please, not just snippets.

$ exec {fd}<><(:)
$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 <&$fd; done

real	0m0.032s
user	0m0.018s
sys	0m0.014s
$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 -u $fd; done

real	0m0.025s
user	0m0.020s
sys	0m0.005s

Thanks:

moeller@work-horse:/usr/lib/bash$ exec {fd}<><(:)
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 -u $fd; done

real	0m0.064s
user	0m0.050s
sys	0m0.014s
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m0.043s
user	0m0.042s
sys	0m0.002s
moeller@work-horse:/usr/lib/bash$ 

That is certainly much closer, but still not a winner... so I still think that comment might need some tuning.
That is, this difference is small enough to ignore IMHO, one way or the other.


Another attempt at capturing the Discord issue.

Baseline just before the meeting, without SQM:

Speedtest: https://www.speedtest.net/result/14450146014

Waveform bufferbloat: https://www.waveform.com/tools/bufferbloat?test-id=ab7cd8e9-ccc8-4b20-ab62-3b028e0ece41

I would say again that the experiment is not successful, as the link is marginal today, and I wouldn't say that Discord worked well without SQM this time. There were some legitimate bufferbloat episodes that cake-autorate was supposed to react to. Maybe it even should have killed Discord this time, as the available bandwidth was indeed too low - but I am not sure whether it should have rolled down to the minimum. I would say that the link properties were NOT the same as two days ago - then it was mostly about spikes, dropouts, and undetected stalls that resolved instantly, and this time there are smooth humps of bufferbloat.

EDIT: at 2023-03-08-13:52:05 (1678283525) there was an undetected, but correctly ignored de-facto stall. I don't know how to interpret tsping output:

1678283524.841369,9.9.9.9,290,49924758,49924810,49924810,49924841,83,31,52
1678283524.976588,9.9.9.9,291,49924883,49924939,49924939,49924976,93,37,56
1678283525.087461,9.9.9.9,292,49925008,49925055,49925055,49925087,79,32,47
1678283525.214940,9.9.9.9,293,49925134,49925183,49925183,49925214,80,31,49
1678283525.344932,9.9.9.9,294,49925259,49925314,49925314,49925344,85,30,55
1678283525.475276,9.9.9.9,295,49925384,49925444,49925444,49925475,91,31,60
1678283525.605449,9.9.9.9,296,49925509,49925564,49925564,49925605,96,41,55
1678283526.273186,9.9.9.9,297,49925634,49925691,49925691,49926273,639,582,57
1678283526.356772,9.9.9.9,298,49925759,49925815,49925815,49926356,597,541,56
1678283526.429059,9.9.9.9,299,49925885,49925947,49925947,49926429,544,482,62
1678283526.462460,9.9.9.9,300,49926010,49926065,49926065,49926462,452,397,55
1678283526.487517,9.9.9.9,301,49926135,49926180,49926180,49926487,352,307,45
1678283526.515939,9.9.9.9,302,49926260,49926307,49926307,49926515,255,208,47
1678283526.559143,9.9.9.9,303,49926385,49926431,49926431,49926559,174,128,46
1678283526.587424,9.9.9.9,304,49926511,49926554,49926554,49926587,76,33,43
1678283526.726691,9.9.9.9,305,49926636,49926693,49926693,49926726,90,33,57

But again, without cake-autorate, it recovers, and with it, it doesn't recover after being throttled. But it may be my config mistake, maybe the default shaper_rate_adjust_up_load_high=1.01 is too low with the reduced ping rate.

The next Discord meeting (and thus a chance to record the bug properly) is on Friday.

I did not use tsping, because I was afraid that it would only impact the upload, thus making me unaware that others can't see and hear me (and yes I warned everyone). The cake-autorate version tested was 32b0bac129b3f265558b3580b3f01db0f266afcd.

Anyway, this time, it was at least recorded properly. The logs are here if you still want to look:

Files:

  • cake-autorate_config.lte.sh: config active during the first part of the meeting. Note that it quickly reduced the bandwidth to the minimum.
  • cake-autorate.lte.log.bad: the corresponding log
  • cake-autorate_config.lte.sh.new: config active during the second part of the meeting - it doesn't control the shaper, which is set statically to 15 Mbps in both directions (so that it is irrelevant)
  • cake-autorate.lte.log: the corresponding log
  • tsping.log: the log of tsping, which covers both parts of the meeting
  • hcsq-2.log: the signal strength and quality log, explained below

The columns in the last log are the timestamp, the constant string "LTE", and the four numbers (r1, r2, r3, r4) that "AT^HCSQ?" returns after it on a Huawei E3372s (well, actually a reflashed Huawei E3372h) modem. Interpretation:

RSSI_dBm = -120 + r1 
RSRP_dBm = -140 + r2
SINR_dB = -20 + (r3 * 0.2)
RSRQ_dB = -19.5 + (r4 * 0.5)
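
For convenience, the four formulas above can be applied with a small helper. This is only a sketch: the function name hcsq_to_db and the sample values in the call are made up for illustration, and it takes r1..r4 as arguments rather than parsing hcsq-2.log, since the exact log layout is not shown here.

```bash
# Convert the four raw AT^HCSQ? values (r1..r4) using the formulas above.
# awk is used because shell arithmetic is integer-only.
hcsq_to_db() {
    awk -v r1="$1" -v r2="$2" -v r3="$3" -v r4="$4" 'BEGIN {
        printf "RSSI=%d dBm RSRP=%d dBm SINR=%.1f dB RSRQ=%.1f dB\n",
            -120 + r1, -140 + r2, -20 + r3 * 0.2, -19.5 + r4 * 0.5
    }'
}

# Hypothetical sample values, not taken from the real log
hcsq_to_db 45 40 110 20
```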

The old logs from Monday are still at https://u.pcloud.link/publink/show?code=kZufOPVZw7Pt1fRuLuy6DUH5YV7VpjksN8bV

Feel free to analyze and make suggestions.


So I see a , comma as separator instead of whitespace, so I should be happy. Except that a number of locales use the comma as decimal separator and might misparse this. There is a reason why I keep pushing for using ; semicolons as separators/delimiters: these are easy to parse and, to my knowledge, are used nowhere as decimal separators.

		int32_t down_time = result.finishedTime - result.transmitTime;
		int32_t up_time = result.receiveTime - result.originateTime;
		rtt = result.finishedTime - result.originateTime;
[...]
				printf(FMT_OUTPUT, ip, result.sequence, result.originateTime, result.receiveTime, result.transmitTime, result.finishedTime, rtt, down_time, up_time);

so the column types are:

timestamp; ip; sequence#; local.send; remote.receive; remote.send; local.receive; rtt; receiveOWD; sendOWD

The high receiveOWD shows that this "glitch" was caused by problems sending from the base station to your LTE modem (@Lochnair OWDs are great for diagnosis/debugging, kudos!), while you could still send stuff out... ICMP and UDP can hobble over some stalls like this, but TCP with its reliance on reverse ACKs will not be a happy camper on a network that stalls in the >> 300ms range...
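
To double-check that reading of the columns, here is a minimal sketch that recomputes rtt and both OWDs from the four timestamps of one logged line (the first spike line, sequence 297). The function name parse_tsping_line and the output format are my own invention; the column order is assumed from the printf call quoted above.

```bash
# Recompute RTT and one-way delays from a single tsping CSV line.
parse_tsping_line() {
    # Fields: timestamp, ip, seq, originate, receive, transmit, finished, ...
    IFS=, read -r ts ip seq orig recv xmit fin rest <<EOF
$1
EOF
    rtt=$(( fin - orig ))     # local.send    -> local.receive
    down=$(( fin - xmit ))    # remote.send   -> local.receive (receiveOWD)
    up=$(( recv - orig ))     # local.send    -> remote.receive (sendOWD)
    printf '%s seq=%s rtt=%s down=%s up=%s\n' "$ts" "$seq" "$rtt" "$down" "$up"
}

parse_tsping_line "1678283526.273186,9.9.9.9,297,49925634,49925691,49925691,49926273,639,582,57"
# prints: 1678283526.273186 seq=297 rtt=639 down=582 up=57
```

The recomputed values match the last three columns of that line, so the spike sits entirely in the download direction.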

I would not call that your fault, but that is certainly a set of conditions on your link that require us to come up with better ideas how the controller should behave. And I agree increasing the rate by a single percent every 130ms might be too sluggish...

With your superior data analysis skills, could you please align and plot together the OWD and signal quality datasets? Does this produce anything useful or at least insightful?

Don't know yet, maybe I find time later tonight to create some plots...


I honestly cannot see much in these plots: no obvious change in the LTE signal data that is aligned with the latency spikes, at least as far as is visible to the naked eye. So: no smoking gun. There still might be some correlation there, but nothing easy to untangle...
Maybe scaling the 4 r values to dB/dBm is not the ideal approach here, but I know too little about LTE to even venture a guess here.

So at least the dominant multi second delay spikes seem not directly related to LTE signal quality, but it is possible that smaller (and still significant) delay changes are correlated to LTE signal variation.


@moeller0 I have also noticed myself that there can be a tendency to throttle down in a way that can be hard to recover from. I'm not sure what the dynamics are in that respect. Maybe something slightly different should be done right after a bufferbloat throttling event in which we reduce rate to the achieved rate. It's as if we should cut the connection some slack after having punished it. But then that slack could be the reason your hand gets bitten.

@patrakov you could try perversely setting the bufferbloat factor to greater than 1 so that when a bufferbloat event happens it actually sets the shaper rate slightly above the achieved rate.

Is there any way you can increase the ICMP frequency e.g. by pushing through wireguard? Having to work with such a low frequency also seems awkward.

Found a possible cause of this spiral-down. It's a confusion around rx_bytes_path.

By default, with an ifb interface, it is set to /sys/class/net/${dl_if}/statistics/tx_bytes. In other words, monitoring of the achieved rates uses the rates past the shaper. For TCP, it doesn't really matter, because it will quickly settle down to the rate that it is being shaped to. But for media streams over UDP, they are not that responsive, and the difference can be significant, and an overshoot is possible.

Quick demo:

Set both SQM rates to 800 kbps statically, stop cake-autorate. Run these two commands in two terminals in parallel:

iperf3 -u -c speedtest.shinternet.ch -R -t 60

and

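# Print per-second byte deltas: physical interface (pre-shaper) vs. IFB (post-shaper)
r=0
ri=0
while true ; do
    read -r r1 < /sys/class/net/wwan0/statistics/rx_bytes
    read -r ri1 < /sys/class/net/ifb4wwan0/statistics/tx_bytes
    # The first output line shows the absolute counters, because r and ri start at 0
    echo -e "$(( r1 - r ))\t$(( ri1 - ri ))"
    r=$r1
    ri=$ri1
    sleep 1
done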

Result: the iperf server will send 1 Mbit/s of UDP traffic towards your host. The shaper will shape this, but not to 800 kbit/s as configured, but to about 500 kbit/s. OK, doesn't matter for the end result. And the end result, for the monitored rates, is:

135908  66622
134778  66860
132928  66558
134609  66737
134304  67860
141108  66250
135876  66590
134419  66561
135347  67545

In other words, to understand that the download speed of 1 Mbit/s is achieved, despite the shaper being set to a lower value, we really need to use /sys/class/net/wwan0/statistics/rx_bytes.
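
Since the loop samples once per second, each number above is a byte count per second. A tiny sketch to turn those into kbit/s (the helper name bytes_per_s_to_kbps is made up, and the result is truncated to an integer):

```bash
# Convert a one-second byte delta into kbit/s (integer, truncated).
bytes_per_s_to_kbps() {
    echo $(( $1 * 8 / 1000 ))
}

bytes_per_s_to_kbps 135908   # first wwan0 rx_bytes delta     -> 1087
bytes_per_s_to_kbps 66622    # first ifb4wwan0 tx_bytes delta -> 532
```

So the first sample corresponds to roughly 1.09 Mbit/s arriving on wwan0 and roughly 0.53 Mbit/s leaving the IFB.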

Now see what happens. The script sets the shaper rate to 90% of the achieved rate on bufferbloat, with this wrong definition of "achieved". But, if somebody persistently overloads the link with UDP, it will progressively set the speed to 90%, then notice that the bufferbloat hasn't gone, and set it to 90% of the achieved rate again (which includes the shaper, so really 81%), and so on, which makes it too hard to recover from.
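
To illustrate the spiral, here is a toy model, not the actual cake-autorate logic. All names and numbers are assumptions: an unresponsive flow offering 1000 kbit/s, a starting shaper rate of 800 kbit/s, a floor of 200 kbit/s, bufferbloat detected on every step, and a plain "90% of achieved rate" rule. In "post" mode the achieved rate is measured after the shaper, in "pre" mode before it.

```bash
# Toy model of repeated "set shaper to 90% of achieved rate" reductions.
simulate() {
    mode=$1              # "post" = measure after the shaper, "pre" = before it
    shaper_kbps=800
    offered_kbps=1000    # unresponsive UDP flow, never backs off
    min_kbps=200
    step=0
    while [ "$step" -lt 20 ]; do
        if [ "$mode" = post ] && [ "$shaper_kbps" -lt "$offered_kbps" ]; then
            achieved_kbps=$shaper_kbps    # the shaper caps what gets counted
        else
            achieved_kbps=$offered_kbps
        fi
        shaper_kbps=$(( achieved_kbps * 90 / 100 ))
        if [ "$shaper_kbps" -lt "$min_kbps" ]; then
            shaper_kbps=$min_kbps
        fi
        step=$(( step + 1 ))
    done
    echo "$shaper_kbps"
}

simulate post   # prints 200: each step compounds on the previous reduction
simulate pre    # prints 900: the estimate stays tied to what actually arrives
```

In this simplified model the post-shaper measurement hits the floor after 13 steps, while the pre-shaper measurement settles at 90% of the incoming rate.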

I think that we should completely delete special-casing of IFB/veth interfaces when figuring out rx_bytes_path and tx_bytes_path, and always use statistics of the upload interface, unless overridden. That is:

# Initialize rx_bytes_path and tx_bytes_path if not set
if [[ -z "${rx_bytes_path:-}" ]]; then
        rx_bytes_path="/sys/class/net/${ul_if}/statistics/rx_bytes"
fi
if [[ -z "${tx_bytes_path:-}" ]]; then
        tx_bytes_path="/sys/class/net/${ul_if}/statistics/tx_bytes"
fi

Well, I am not sure about the tx_bytes_path change, but I don't know what a setup with ifb or veth would look like in this case, and am not sure whether there is a universal rule based solely on the interface type. Yes, I know that this will break someone's setup, but the current heuristic is too smart, in a wrong way, and I believe that for such non-standard cases a manual override is the correct solution. But then the setting and its use case should be documented in cake-autorate_defaults.sh.

P.S. One of the consequences of this bug is that all graphs posted so far are wrong regarding the download speeds - they don't include overshoots over the rate set by the shaper.

P.P.S. From a cybersecurity perspective, if my analysis is correct, this qualifies as a vulnerability: an attacker who can send a UDP stream to the target router with enough bitrate to cause bufferbloat, can cause the reduction of usable bandwidth down to the minimal rate, i.e. far more than without the cake-autorate script. Perhaps we need to make an official announcement here and on GitHub, and maybe even release 1.2.1?


Keep in mind that at the point of shaper reduction we likely still have some data in flight sent at the old rate that will accumulate in the queue and will need some time to drain, so reducing the rate to below the achieved rate IMHO is still the right thing.

Confusion indeed.

As far as I can tell we:
a) understand that the achieved rate is only meaningful for the download direction (unless Linux has direct control over the uplink-bottleneck interface, but in that case we should just enable BQL and would have solved the issue).
b) want the achieved rate to reflect the actually achievable goodput (modulo the gross rate versus net rate overhead)
c) use the achieved rate as a helper in deciding which rate to reduce our shaper to (IIRC we take the minimum of our normal reduction step calculation and the achieved rate)

Yes this is due to b) above, so I argue this is pretty much as intended.

But due to c) the fact that our shaper might drop a lot of packets does not really matter that much: we only act on the achieved rate if we have evidence that the shaper rate is too high already, and only use the achieved rate if that gets us lower than our normal heuristic. In that case achieved_rate < shaper_rate by necessity, so it is still a useful proxy for what the link can deliver...
In the increase rate direction, we already increase when we are at 75% (is that the current default still?), so again slight imprecisions in achieved_rate measurements (e.g. from taking the shaper's egress instead of ingress rate) are not going to affect the control loop significantly.
Keep in mind we are dealing with a set of heuristics here, not hard and fast facts...

To increase the reporting precision here we would need to have additional definitions for the interfaces to collect the traffic data from. Or use something like sqm-script does:

# find the ifb device associated with a specific interface, return nothing of no
# ifb is associated with IF
get_ifb_associated_with_if() {
    local CUR_IF
    local CUR_IFB
    local TMP
    CUR_IF=$1
    # Stray ' in the comment is a fix for broken editor syntax highlighting
    CUR_IFB=$( $TC_BINARY -p filter show parent ffff: dev ${CUR_IF} | grep -o -E ifb'[^)\ ]+' )    # '
    sqm_debug "ifb associated with interface ${CUR_IF}: ${CUR_IFB}"

    # we could not detect an associated IFB for CUR_IF
    if [ -z "${CUR_IFB}" ]; then
        TMP=$( $TC_BINARY -p filter show parent ffff: dev ${CUR_IF} )
        if [ ! -z "${TMP}" ]; then
            # oops, there is output but we failed to properly parse it? Ask for a user report
            sqm_error "#---- CUT HERE ----#"
            sqm_error "get_ifb_associated_with_if failed to extrect the ifb name from:"
            sqm_error $( $TC_BINARY -p filter show parent ffff: dev ${CUR_IF} )
            sqm_error "Please report this as an issue at https://github.com/tohojo/sqm-scripts"
            sqm_error "Please copy and paste everything below the cut-here line into your issue report, thanks."
        else
            sqm_debug "Currently no ifb is associated with ${CUR_IF}, this is normal during starting of the sqm system."
        fi
    fi
    echo ${CUR_IFB}
}

to automatically get the underlying true interface (assuming that actually works in your wwan0 case). That is a lot of complication; add to this that due to b) we would need to grab these rates in addition to the rates we currently collect...

No, the script sets the rate to min(achieved_rate * factor1, shaper_rate * factor2) at a point at which we know no matter what the achieved rate over the last interval was, it was too large. So I argue, this being a heuristic, we do not gain all that much by obsessing about shaper ingress/egress rates.
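
Spelled out as a sketch (the function and variable names are hypothetical and not copied from cake-autorate; the factors are passed as integer percentages, and the 90% values in the call are only an example):

```bash
# Pick the lower of the two candidate rates on a bufferbloat event.
reduced_shaper_rate() {
    achieved_kbps=$1
    shaper_kbps=$2
    achieved_pct=$3    # factor applied to the achieved rate, in percent
    shaper_pct=$4      # factor applied to the current shaper rate, in percent
    from_achieved=$(( achieved_kbps * achieved_pct / 100 ))
    from_shaper=$(( shaper_kbps * shaper_pct / 100 ))
    if [ "$from_achieved" -lt "$from_shaper" ]; then
        echo "$from_achieved"
    else
        echo "$from_shaper"
    fi
}

reduced_shaper_rate 500 800 90 90   # min(450, 720) -> 450
```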

This is effectively a DOS-attack, and yes, autorate does not solve this problem; as far as I can tell that is unsolvable from the endpoint... there really is nothing we can do here: even if we reduce the shaper more gently, that unrelenting UDP flow will still crowd out all usable traffic. So this is pretty much out of scope for autorate, sorry.

That is assuming that this interface handles both ingress and egress.

I believe that the current heuristic works well enough and your concern (while not wrong) is a case of wanting perfect, while we already have "good enough". But hey, humor me, change this on your link and see if that noticeably improves things...

Again, not a "bug" but a design goal, have the achieved rates correlate with measurable goodput.

This is an unavoidable consequence of doing ingress shaping after, instead of egress shaping before, the bottleneck link. We can be DOSed one way or the other; yes, it is not ideal that an attacker will not need to use > 100% of the link rate, but can get away with a bit less*... that is what the minimal rate definitions are there for... I know you dislike them, but they are exactly the kind of back-stop against such shenanigans that we can use....

*) Note that we only persist on low rates if we actually experience bufferbloat, so when our controller arguably should engage. If there is no bufferbloat but a high load percentage we increase the rate again, and if there is no load we will also increase the rate again. So to be constantly limited to the minimum rate we need to have persistent above-threshold latency (effectively, spread around our reflector window), at which point keeping the shaper at the minimum rate is pretty much the right thing to do. I note the reports of "stuck on minimum", so there could be a bug in the implementation or the logic, but on principle I do not see as catastrophic an issue as you seem to do.


Oh boy, this seems complicated. Is the issue that we are setting the shaper rate based on 90% of the post-shaper rate rather than 90% of the pre-shaper rate? Am I correct in thinking that ideally we would switch to the latter? But this is ONLY for the special case of setting the shaper rate based on line capacity estimation, right? This might be why setting to 120% makes sense? For the general case of measuring achieved rates for monitoring and plotting, we should go on using post-shaper rates, right? Or should achieved rates in general be pre-shaper, including for general monitoring and plotting?

Does it? The point is we only reduce rate if we experience increased latency, if that latency increase persists our shaper rate is too high and then reducing in slightly larger steps is not going to hurt much, if the bufferbloat persists we might have reduced the shaper a bit more than we might have, but normal rate increase rules still apply, that is we are likely experiencing high load so we increase the rate again.

That is @patrakov's argument in a nut-shell. I am less concerned about this, as all of this is a "tower of heuristics" and improving/changing one of the steps gently is IMHO not going to have a big effect on the whole. His attack model essentially is a sustained, below bottleneck rate unrelenting flow above its fair capacity share that uses up link capacity but is unresponsive to cake's signal so will result in pushing the shaper rate down (and successively getting an ever larger share of the ingress capacity). But that IMHO is the problem with unresponsive flows and ingress shaping, and that is something we will not be able to fix from our side. Or to put differently, if we switch the achieved_rate measurement method as proposed, all our attacker needs to do is gently increase its sending rate again and we are back at square one.
I do not dispute that, as a reference for true achieved throughput over the link, cake's ingress rate is a better proxy than cake's egress rate; I just do not think that this is going to be a big factor. However, I am open to being convinced otherwise by data.


No, this does not make sense generally; we really need to set the shaper a bit below the actual bottleneck rate to allow our queue to drain and the increased latency to go away... there might be links where the modem or base station needs to see some persistent queue to schedule more capacity to a user, so it might be helpful on special links to try something like 120%, but generally this is the wrong thing to do.

I think so, as that is what correlates (modulo the gross versus net difference) with what users can actually measure via speedtests.

In reality on normal links there is going to be a small difference between the two, hardly worth worrying about, after all our main controller really only acts on delta_delay, so we are talking about the margins here how fast to reduce or increase the shaper rate.

But I do understand that on very slow links this might look a bit more dramatic.

This is a partial misunderstanding, the problem is that you did not complete the thought experiment. Yes, the attacker with an unresponsive flow will be able to cause bufferbloat, and we can do nothing about it. Let's say (sorry for the unreasonably-low numbers, scale them as appropriate) the attacker uses up 1.1 Mbit/s via UDP out of 1 Mbit/s available, and there is also a legitimate TCP flow that would like to use as much as possible. The ISP's router will mix the two, and perhaps drop 54% of the attacker's packets to fit the link bandwidth. Without further shaping, we get 500 Kbit/s of the attack and 500 Kbit/s of legitimate traffic. But with the current cake-autorate, its logic will drive the shaper rate to the minimum, let's say 200 Kbit/s, thus throttling the legitimate traffic, too - that's the first concern. The second concern is the time to recover after the attack ends - overshaping obviously increases it.

Another way to view the proposal is: if it is clear that shaping below 90% of the actual incoming rate through wwan0 does not help against bufferbloat, then shaping even further won't help either, so stop persisting and please treat the latency incident as unfixable - i.e. optimize throughput, not latency.

So either we would switch to monitoring achieved rates based on shaper ingress (which might be confusing because speed tests would show values lower than that seen in our monitoring) or additionally monitor rates based on shaper ingress and then use the latter specifically when testing whether to punish on the lower of shaper rate * 0.9 or achieved rate * 0.9?

Regarding the confusing results of speedtests if we log the real achieved rate of wwan0: due to the way TCP works, I don't consider this a big problem, but logging both pre- and post-shaper rates, and only using the real wwan0 achieved rate as the estimate of the link capacity (and in all other decisions), would be a solution.

I would also say that the very use of "shaper rate * 0.9" for ingress is another root cause of the issue. Drop it.

For egress, of course the logic must take the shaper rate into account, as we don't have any estimate how much is dropped/buffered upstream.

But an attacker is just going to flood packets willy nilly; we are pitting a tsunami against a wooden hut and wondering whether we should use beech or ash.

Working out whether to persist or stop, as you write, is easier said than done. It's always tempting to add in extra heuristics like "in event A do X, and in event B do Y", but we need something that works all the time in every situation.

But I do find it intellectually unsatisfying that we base the shaper rate = 0.9 * achieved rate on cake egress when it should be cake ingress. Assuming that I am understanding this issue correctly. It's sufficiently irksome to me to try to think about how to fix this.

Forget about the attacker then :) Think about a 900 Kbit/s video stream.