CAKE w/ Adaptive Bandwidth

I believe these kinds of things are where I can best contribute to this effort, so I'm happy to be able to help :slight_smile:

Also pushed the fixes I've got so far as v0.2.3:

  • Fixed the memory corruption issue @patrakov found
  • Made the "human-readable" output match the machine output
  • Disabled buffering on standard output
  • Plus hopefully improved error messages when things go wrong

I saw that same drift in the Julia code. Julia parses the JSON virtually instantaneously (once the initial call, which involves a compile, is done), so you can easily plot many datasets. The first plot takes 23s (to load all the packages); the next plot takes 0.12s to read the data and make the plot.

Julia 1.9 is about to come out and it should dramatically lower startup times. I would guess maybe 3-5 seconds to start up based on the good stuff I'm hearing about 1.9.

I didn't spend much time designing a good plot visualization, and I find the offsets confusing... what is a -400ms delay (the red line), and what is +500ms (the green line)? What are those measured relative to? I think for interpretability they should probably have some offset added to them, but I don't know what. If someone can figure out what is going on there, I'm happy to do the calcs, though.

It kind of looks to me like rtt = send + receive; why receive is negative, I don't know.

That is because one-way delay measurements require synchronized clocks...
Let's assume a true RTT of 100ms on a symmetric path (so both OWDs equal 50ms), with the client clock 1 second ahead of the server's:

  1. The client sends a packet at local time 0, which is time -1000ms at the server.
  2. The packet arrives at the server at 50ms client time, but the server stamps it with receive time -950ms and sends it back with send timestamp -949ms (1ms local processing time).
  3. The client receives the packet at local time 101ms.

Client now calculates OWDs:
up: server receive - client send: -950 - 0 = -950
down: client receive - server send: 101 - (-949) = 1050
rtt: up + down: -950 + 1050 = 100 (the client's raw 0-to-101 measurement is 1ms larger, since it includes the server's 1ms of processing)

For truly symmetric paths one can try to assume the path delay to be symmetric as well and correct for it (NTP does that under the hood), but that is suboptimal if done during load tests like the speedtest here, where the link experiences variable directional queueing delay, violating the equality assumption above....
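To replay that arithmetic, here is a minimal bash sketch (the variable names are mine for illustration, not tsping's):

offset=1000                                  # client clock minus server clock, ms
client_send=0                                # client stamps transmission
server_recv=$(( client_send + 50 - offset )) # -950, stamped in server time
server_send=$(( server_recv + 1 ))           # -949, after 1 ms of processing
client_recv=$(( server_send + offset + 50 )) # 101, back in client time

up=$(( server_recv - client_send ))          # -950: the offset dominates
down=$(( client_recv - server_send ))        # 1050: offset with opposite sign
echo "up=${up} down=${down} sum=$(( up + down ))"   # sum=100: the offsets cancel

The offset cancels in the sum, which is why the RTT stays trustworthy even when the individual OWDs look absurd.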

Thanks to @dlakelan, @moeller0, and @Lynx for your help today. A draft of my preso is available for further comment here:

The key question for you guys: do you want me to point at the code, or at the conversation? The conversation here is often fascinating. I also need to add a ton more links that I still have to find.

If you have any other nightmares worth sharing, I intend to pound through the last 18 slides like the last scene in Blade Runner.

It is very late here in the US; I am presenting at around, oh, 8:30? 9? AM PST.


For slide 35, depending on what you're trying to show, you might consider instead using e.g. this:

done! thank you!


@moeller0 I looked through some of the time-course plots in this thread and there are so many. Like this one, too:

I'm not sure which is best. Maybe it doesn't matter as all of them show the general idea.


Dave, seems like a good talk. Is there a video? Glad you finally got something from the Julia code to put in your slides (slide 12).


Fantastic - really looking forward to testing. I imagine ultimately this will replace fping as the default ping binary in cake-autorate. Tremendous work in piecing this together.


EDIT: this post is not testing the relevant variant of the read-method and hence can be ignored

moeller@work-horse:/usr/lib/bash$ which sleep
/usr/bin/sleep
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m9.616s
user	0m6.666s
sys	0m2.983s
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 <><(:); done

real	0m2.496s
user	0m5.727s
sys	0m2.382s
moeller@work-horse:/usr/lib/bash$ enable sleep
bash: enable: sleep: not a shell builtin
moeller@work-horse:/usr/lib/bash$ enable -f ./sleep sleep
moeller@work-horse:/usr/lib/bash$ enable sleep
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m0.045s
user	0m0.045s
sys	0m0.000s

0.000000001 * 10000 = 1e-05s
Hmm, at least on Ubuntu 22 LTS on x86_64, the bash sleep builtin seems to absolutely wipe the floor with both the external sleep binary as well as the read method... that is orders of magnitude faster...

EDIT: again not the relevant optimized variant of the read-method... so wiping of floors not confirmed

You should create a file descriptor and then test read. Your method is flawed because you're spawning a subshell every time.

Do:

exec {fd}<> <(:)
read -t 0.0001 <&$fd

Well, feed me the exact test loop you want me to run, please, not just snippets.

$ exec {fd}<><(:)
$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 <&$fd; done

real	0m0.032s
user	0m0.018s
sys	0m0.014s
$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 -u $fd; done

real	0m0.025s
user	0m0.020s
sys	0m0.005s
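For reference, the same trick packaged as a reusable function (one possible way to do it; sleep_fd and sleep_s are my names, not from cake-autorate):

exec {sleep_fd}<> <(:)    # open the dummy fd once, reuse it for every sleep

# read always hits the timeout here and returns nonzero, hence the return 0
sleep_s() { read -t "$1" -u "$sleep_fd"; return 0; }

sleep_s 0.5               # e.g. half a second without forking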

Thanks:

moeller@work-horse:/usr/lib/bash$ exec {fd}<><(:)
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do read -t 0.000000001 -u $fd; done

real	0m0.064s
user	0m0.050s
sys	0m0.014s
moeller@work-horse:/usr/lib/bash$ time for ((i=1; i<=10000; i++)); do sleep 0.000000001; done

real	0m0.043s
user	0m0.042s
sys	0m0.002s
moeller@work-horse:/usr/lib/bash$ 

That is certainly much closer, but still not a winner... so I still think that comment might need some tuning.
That is, this difference is small enough to ignore IMHO, one way or the other.


Another attempt at capturing the Discord issue.

Baseline just before the meeting, without SQM:

Speedtest: https://www.speedtest.net/result/14450146014

Waveform bufferbloat: https://www.waveform.com/tools/bufferbloat?test-id=ab7cd8e9-ccc8-4b20-ab62-3b028e0ece41

I would say again that the experiment is not successful, as the link is marginal today, and I wouldn't say that Discord worked well without SQM this time. There were some legitimate bufferbloat episodes that cake-autorate was supposed to react to. Maybe it even should have killed Discord this time, as the available bandwidth was indeed too low - but I am not sure whether it should have rolled down to the minimum. The link properties were NOT the same as two days ago: then it was mostly about spikes, dropouts, and undetected stalls that resolved instantly, while this time there are smooth humps of bufferbloat.

EDIT: at 2023-03-08 13:52:05 (1678283525) there was an undetected, but correctly ignored, de facto stall. I don't know how to interpret the tsping output:

1678283524.841369,9.9.9.9,290,49924758,49924810,49924810,49924841,83,31,52
1678283524.976588,9.9.9.9,291,49924883,49924939,49924939,49924976,93,37,56
1678283525.087461,9.9.9.9,292,49925008,49925055,49925055,49925087,79,32,47
1678283525.214940,9.9.9.9,293,49925134,49925183,49925183,49925214,80,31,49
1678283525.344932,9.9.9.9,294,49925259,49925314,49925314,49925344,85,30,55
1678283525.475276,9.9.9.9,295,49925384,49925444,49925444,49925475,91,31,60
1678283525.605449,9.9.9.9,296,49925509,49925564,49925564,49925605,96,41,55
1678283526.273186,9.9.9.9,297,49925634,49925691,49925691,49926273,639,582,57
1678283526.356772,9.9.9.9,298,49925759,49925815,49925815,49926356,597,541,56
1678283526.429059,9.9.9.9,299,49925885,49925947,49925947,49926429,544,482,62
1678283526.462460,9.9.9.9,300,49926010,49926065,49926065,49926462,452,397,55
1678283526.487517,9.9.9.9,301,49926135,49926180,49926180,49926487,352,307,45
1678283526.515939,9.9.9.9,302,49926260,49926307,49926307,49926515,255,208,47
1678283526.559143,9.9.9.9,303,49926385,49926431,49926431,49926559,174,128,46
1678283526.587424,9.9.9.9,304,49926511,49926554,49926554,49926587,76,33,43
1678283526.726691,9.9.9.9,305,49926636,49926693,49926693,49926726,90,33,57

But again, without cake-autorate, it recovers, and with it, it doesn't recover after being throttled. But it may be my config mistake; maybe the default shaper_rate_adjust_up_load_high=1.01 is too low with the reduced ping rate.

The next Discord meeting (and thus a chance to record the bug properly) is on Friday.

I did not use tsping, because I was afraid that it would only impact the upload, thus making me unaware that others can't see and hear me (and yes I warned everyone). The cake-autorate version tested was 32b0bac129b3f265558b3580b3f01db0f266afcd.

Anyway, this time, it was at least recorded properly. The logs are here if you still want to look:

Files:

  • cake-autorate_config.lte.sh: config active during the first part of the meeting. Note that it quickly reduced the bandwidth to the minimum.
  • cake-autorate.lte.log.bad: the corresponding log
  • cake-autorate_config.lte.sh.new: config active during the second part of the meeting - it doesn't control the shaper, which is set statically to 15 Mbps in both directions (so that it is irrelevant)
  • cake-autorate.lte.log: the corresponding log
  • tsping.log: the log of tsping, which covers both parts of the meeting
  • hcsq-2.log: the signal strength and quality log, explained below

The columns in the last log are the timestamp, the constant string "LTE", and the four numbers (r1, r2, r3, r4) that "AT^HCSQ?" returns after it on a Huawei E3372s (well, actually a reflashed Huawei E3372h) modem. Interpretation:

RSSI_dBm = -120 + r1 
RSRP_dBm = -140 + r2
SINR_dB = -20 + (r3 * 0.2)
RSRQ_dB = -19.5 + (r4 * 0.5)
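In case it helps, here is a small awk sketch applying those conversions to hcsq-2.log (I am assuming whitespace- and/or comma-separated fields; adjust -F if the actual format differs):

awk -F'[ ,]+' '{
    # fields: 1=timestamp 2="LTE" 3=r1 4=r2 5=r3 6=r4
    printf "%s RSSI=%d dBm RSRP=%d dBm SINR=%.1f dB RSRQ=%.1f dB\n",
           $1, -120 + $3, -140 + $4, -20 + $5 * 0.2, -19.5 + $6 * 0.5
}' hcsq-2.log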

The old logs from Monday are still at https://u.pcloud.link/publink/show?code=kZufOPVZw7Pt1fRuLuy6DUH5YV7VpjksN8bV

Feel free to analyze and make suggestions.


So I see a comma as separator instead of whitespace, so I should be happy. Except that a number of locales use the comma as decimal separator and might misparse this. There is a reason why I keep pushing for semicolons as separators/delimiters: they are easy to parse and to my knowledge are used nowhere as decimal separators.
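A quick illustration of the locale hazard (assumes the de_DE locale is generated on the system):

LC_NUMERIC=C           printf '%.1f\n' 0.5   # prints 0.5
LC_NUMERIC=de_DE.UTF-8 printf '%.1f\n' 0.5   # prints 0,5
# with comma decimals, splitting "a,b,c" records on commas becomes ambiguous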

		int32_t down_time = result.finishedTime - result.transmitTime;
		int32_t up_time = result.receiveTime - result.originateTime;
		rtt = result.finishedTime - result.originateTime;
[...]
				printf(FMT_OUTPUT, ip, result.sequence, result.originateTime, result.receiveTime, result.transmitTime, result.finishedTime, rtt, down_time, up_time);

so the column types are:

timestamp; ip; sequence#; local.send; remote.receive; remote.send; local.receive; rtt; receiveOWD; sendOWD
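So a quick way to pull the delay spikes out of tsping.log with those columns (a sketch; the 300 threshold is arbitrary, and the values are in ms):

awk -F',' '$9 > 300 { printf "%s seq=%s receiveOWD=%s ms sendOWD=%s ms\n", $1, $3, $9, $10 }' tsping.log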

The high receiveOWD values show that this "glitch" was caused by problems sending from the base station to your LTE modem (@Lochnair, OWDs are great for diagnosis/debugging, kudos!), while you could still send stuff out... for ICMP and UDP this can hobble over some stalls, but TCP, with its reliance on reverse ACKs, will not be a happy camper on a network that stalls in the >>300ms range...

I would not call that your fault, but that is certainly a set of conditions on your link that requires us to come up with better ideas for how the controller should behave. And I agree, increasing the rate by a single percent every 130ms might be too sluggish...
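Back-of-the-envelope for that sluggishness (5 and 15 Mbit/s are just placeholder values for the minimum and base rates):

awk 'BEGIN {
    n = log(15/5) / log(1.01)   # ~110 one-percent steps to triple the rate
    printf "%.0f adjustments, ~%.1f s at one adjustment per 130 ms\n", n, n * 0.130
}'

So a full recovery from a deep throttle takes on the order of 14 seconds, which squares with "too sluggish".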

With your superior data analysis skills, could you please align and plot together the OWD and signal quality datasets? Does this produce anything useful or at least insightful?

Don't know yet; maybe I'll find time later tonight to create some plots...


I honestly cannot see much in these plots: no obvious change in the LTE signal data that is aligned with the latency spikes, at least as visible to the naked eye. So: no smoking gun. There still might be some correlation there, but nothing easy to untangle...
Maybe scaling the 4 r values to dB/dBm is not the ideal approach here, but I know too little about LTE to even venture a guess.

So at least the dominant multi second delay spikes seem not directly related to LTE signal quality, but it is possible that smaller (and still significant) delay changes are correlated to LTE signal variation.


@moeller0 I have also noticed myself that there can be a tendency to throttle down in a way that can be hard to recover from. I'm not sure what the dynamics are in that respect. Maybe something slightly different should be done right after a bufferbloat throttling event in which we reduce rate to the achieved rate. It's as if we should cut the connection some slack after having punished it. But then that slack could be the reason your hand gets bitten.

@patrakov you could try perversely setting the bufferbloat factor to greater than 1, so that when a bufferbloat event happens it actually sets the shaper rate slightly above the achieved rate.

Is there any way you can increase the ICMP frequency, e.g. by pushing it through WireGuard? Having to work with such a low frequency also seems awkward.