CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

Good question. I guess the question is one of convenience: do you want to be able to specify a "normal" continuous sampling with just one number, or do you essentially always want to have to configure sleep >= target? I honestly see only two use cases:
a) equitemporal sampling like autorate does now
b) batched sampling where you query with target 0 and then use sleep to define the effective period

Since we mostly use a), I am fine with not having to configure sleep, but the behaviour and help text should be adjusted to:

  -s, --sleep-time=TIME      Time to wait between each round of pinging in ms
                             (default to target for equidistant sampling)
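To make the two use cases concrete, here is a minimal C sketch of the proposed semantics. It is an interpretation of the proposal, not tsping's actual code: the function names, the millisecond units and the "negative sleep means not configured" convention are assumptions. Within a round, targets are spaced target-spacing apart; after the last target, sleep elapses before the next round starts, and an unset sleep defaults to the target spacing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a negative sleep_ms means "not configured" and falls back
 * to the target spacing, which gives equidistant sampling. */
static long effective_sleep_ms(long spacing_ms, long sleep_ms)
{
    return sleep_ms < 0 ? spacing_ms : sleep_ms;
}

/* Send time (ms from start) of the k-th probe overall, k = 0, 1, 2, ...
 * Assumed model: period = (n_targets - 1) * spacing + sleep. */
static long send_offset_ms(size_t k, size_t n_targets, long spacing_ms,
                           long sleep_ms)
{
    assert(n_targets > 0);
    long sleep = effective_sleep_ms(spacing_ms, sleep_ms);
    size_t round = k / n_targets;  /* which round this probe belongs to */
    size_t slot = k % n_targets;   /* position of the target in the round */
    long period = (long)(n_targets - 1) * spacing_ms + sleep;
    return (long)round * period + (long)slot * spacing_ms;
}
```

With 2 targets, a spacing of 1000 ms and sleep unset, probes go out at 0, 1000, 2000 and 3000 ms (case a); with a spacing of 0 and a sleep of 1000 ms they go out at 0, 0, 1000 and 1000 ms (case b).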

I have already filed a bug with a different (more consistent) proposal: https://github.com/Lochnair/tsping/issues/7

@patrakov any clue why this happens:

root@OpenWrt-1:~# tsping --print-timestamps --machine-readable=' ' --sleep-time 100 --target-spacing 1000 9.9.9.10 9.9.9.9
Starting tsping 0.2.2 - pinging 2 targets
1678132116.263362 9.9.9.10 0 71316226 71316239 71316239 71316263 37 24 13
1678132118.388443 9.9.9.10 1 71318327 71318364 71318364 71318388 61 24 37
1678132119.377351 9.9.9.9 1 71319327 71319353 71319353 71319377 50 24 26
1678132120.490509 9.9.9.10 2 71320428 71320464 71320464 71320490 62 26 36

root@OpenWrt-1:~# stdbuf -oL tsping --print-timestamps --machine-readable=' ' --sleep-time 100 --target-spacing 1000 9.9.9.10 9.9.9.9
Starting tsping 0.2.2 - pinging 2 targets
1678132123.268491 9.9.9.10 0 71323218 71323244 71323244 71323268 50 24 26
Something went wrong while sending: -1
1678132125.367498 9.9.9.10 1 71325318 71325343 71325343 71325367 49 24 25
Something went wrong while sending: -1
1678132127.468529 9.9.9.10 2 71327419 71327443 71327443 71327468 49 25 24
Something went wrong while sending: -1

For some reason the address family is corrupted:

[pid 26979] sendto(3, "\r\0\346\253ia\0\0\4B\236\260\0\0\0\0\0\0\0\0", 20, 0, {sa_family=0x7c60 /* AF_??? */, sa_data="\36\205\t\t\t\t\270\352'\205\177\0\0\0"}, 16) = -1 EAFNOSUPPORT (Address family not supported by protocol)

I have compiled it on the desktop, and valgrind complains:

==25044== Thread 3:
==25044== Syscall param socketcall.sendto(to.sa_family) points to uninitialised byte(s)
==25044==    at 0x498DEAC: sendto (sendto.c:27)
==25044==    by 0x10AB28: send_icmp_timestamp_request (main.c:94)
==25044==    by 0x10B60C: sender_loop (main.c:304)
==25044==    by 0x4909BB4: start_thread (pthread_create.c:444)
==25044==    by 0x498BCB3: clone (clone.S:100)
==25044==  Address 0x4a6e3d0 is 0 bytes inside a block of size 32 alloc'd
==25044==    at 0x4846CC3: realloc (vg_replace_malloc.c:1437)
==25044==    by 0x10A76B: parse_opt (args.h:103)
==25044==    by 0x499846C: group_parse (argp-parse.c:257)
==25044==    by 0x499846C: parser_parse_arg (argp-parse.c:693)
==25044==    by 0x499846C: parser_parse_next (argp-parse.c:865)
==25044==    by 0x499846C: argp_parse (argp-parse.c:921)
==25044==    by 0x10B747: main (main.c:341)

Simply do not do this: whitespace is arguably the worst possible delimiter. Use something explicit here like ; which will allow you to detect empty fields. All you need to do is something like:

orig_IFS=${IFS}
IFS=";" read -t ...
IFS=${orig_IFS}

to have read use proper delimiters...

root@OpenWrt-1:~# stdbuf -oL tsping --print-timestamps --machine-readable='=' --sleep-time 100 --target-spacing 1000 9.9.9.10 9.9.9.9
Starting tsping 0.2.2 - pinging 2 targets
1678133127.068839=9.9.9.10=0=72327019=72327044=72327044=72327068=49=24=25
Something went wrong while sending: -1
1678133129.167795=9.9.9.10=1=72329120=72329143=72329143=72329167=47=24=23

This was a general remark about not using ' ' or '\t' (tab) as delimiters: delimiters need to be human-readable and allow for unambiguous encoding of empty fields.
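A minimal C sketch of why an explicit delimiter helps: splitting on a single character while keeping empty fields (which strtok() would silently merge, since it treats runs of delimiters as one) makes a missing value visible. The helper name split_fields() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split line in place on delim, storing up to max_fields pointers.
 * Consecutive delimiters yield empty fields instead of being merged.
 * Returns the total number of fields found (may exceed max_fields).
 */
static size_t split_fields(char *line, char delim, char **fields,
                           size_t max_fields)
{
    assert(line != NULL && delim != '\0');
    size_t n = 0;
    char *start = line;
    for (;;) {
        char *end = strchr(start, delim);
        if (end != NULL)
            *end = '\0';          /* terminate the current field */
        if (n < max_fields)
            fields[n] = start;
        n++;
        if (end == NULL)
            break;                /* no more delimiters: last field */
        start = end + 1;
    }
    return n;
}
```

Splitting "a;;c" on ';' yields three fields with an empty second one, and the '='-delimited tsping line above yields ten fields.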

diff --git a/args.h b/args.h
index 3296020..6262cbd 100644
--- a/args.h
+++ b/args.h
@@ -101,6 +101,8 @@ parse_opt (int key, char *arg, struct argp_state *state)
                
                case ARGP_KEY_ARG:
                        arguments->targets = realloc(arguments->targets, (arguments->targets_len + 1) * sizeof(struct sockaddr_in));
+                       arguments->targets[arguments->targets_len].sin_family = AF_INET;
+                       arguments->targets[arguments->targets_len].sin_port = 0;
 
                        if (inet_pton(AF_INET, arg, &arguments->targets[arguments->targets_len].sin_addr) != 1) {
                                printf("Invalid IP address: %s\n", arg);

Tested on desktop and on the router, this gets rid of the error.
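For comparison, a slightly more defensive variant of the same fix, written as a hypothetical helper rather than tsping's actual code: zero the whole new struct sockaddr_in with memset() instead of setting fields one by one, and keep the realloc() result in a temporary so the old array is not leaked if the allocation fails.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append one IPv4 target. Returns 0 on success, -1 on
 * allocation failure or an invalid address (which is then not counted). */
static int add_target(struct sockaddr_in **targets, size_t *len,
                      const char *ip)
{
    assert(targets != NULL && len != NULL && ip != NULL);
    struct sockaddr_in *grown = realloc(*targets, (*len + 1) * sizeof **targets);
    if (grown == NULL)
        return -1;                 /* *targets is still valid and owned */
    *targets = grown;

    struct sockaddr_in *t = &grown[*len];
    memset(t, 0, sizeof *t);       /* realloc() leaves new memory uninitialised */
    t->sin_family = AF_INET;       /* sin_port and sin_zero stay 0 */
    if (inet_pton(AF_INET, ip, &t->sin_addr) != 1)
        return -1;
    (*len)++;
    return 0;
}
```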

@Lochnair could you please release a fixed version? I don't care for #7 (sleep duration syntax) right now, but uninitialized variables are important. And this is a good time to add the setbuf() call.
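The stdbuf -oL workaround used above can be replaced by a call inside the program. A minimal sketch (the helper name is hypothetical; setvbuf() itself is standard C) that makes a stream line-buffered, which is what a consumer reading tsping output through a pipe needs:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Line-buffer a stream so every complete line is written out immediately,
 * even when the stream is a pipe rather than a terminal. Must be called
 * before any other operation on the stream. Use _IONBF instead of _IOLBF
 * to disable buffering entirely. Returns setvbuf()'s result (0 = success).
 */
static int set_line_buffered(FILE *stream)
{
    assert(stream != NULL);
    return setvbuf(stream, NULL, _IOLBF, 0);
}
```

Calling set_line_buffered(stdout) at the top of main(), before anything is printed, gives roughly the same effect as running the binary under stdbuf -oL.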

I had a go at one of your irtt data sets (with Octave*) and I am probably not understanding the data all that well, but could it be that the clocks of both sides are badly synchronized and both drifting? That, or I am looking at the wrong fields of irtt's JSON output :wink:

*) I use MATLAB at work, so this is the only thing I am somewhat fluent in, but boy, compared to commercial MATLAB, Octave is a bumpy ride...

P.S.: The JSON parsing is still dog-slow, but you only need to do it once per input file. I just needed to find a loading function that actually works and then massage irtt's output into a simple 2D table to make plotting easy; now that this is done, refining the plot should not be witchcraft.

P.P.S.: Here is the current state of the plotting:


Since the clocks are out of sync, this only shows the RTT, but it adds the interquartile mean (the number we assume Ookla's speedtest.net reports) and the number of lost probes, separated by direction. What is still missing are CDF and distribution plots for each of RTT, send and receive delay. Unlike with the autorate data, we have no achieved-throughput or CPU-load numbers to show concurrently. However, irtt timestamps appear to be Unix epoch in nanoseconds, so this could be co-plotted with autorate data that contains throughput.
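A minimal C sketch of the interquartile mean, assuming the simple variant that drops the lowest and highest quarter of the sorted samples. This is exact when the sample count is divisible by 4; other definitions weight the boundary samples fractionally. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Simplified IQM: sort a copy, drop floor(n/4) samples at each end and
 * average the rest. Returns 0 on success, -1 on empty input or if the
 * allocation fails. The input array is left untouched.
 */
static int interquartile_mean(const double *samples, size_t n, double *out)
{
    assert(out != NULL);
    if (samples == NULL || n == 0)
        return -1;
    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    size_t cut = n / 4;                 /* samples dropped at each end */
    double sum = 0.0;
    for (size_t i = cut; i < n - cut; i++)
        sum += sorted[i];
    *out = sum / (double)(n - 2 * cut);
    free(sorted);
    return 0;
}
```

For the samples 1 to 8 it returns 4.5, and for {10, 11, 12, 500} the 500 ms outlier is discarded and it returns 11.5.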

Good catch. Will get this fixed in a moment.

I did try to use Valgrind to catch issues like this, but ran into missing debug symbols in glibc and didn't get that fixed before tonight.

Really neat that you've written this. I'm not aware of any decent ping binary that offers round-robin ICMP type 13 (timestamp) requests, so this is a nice addition to the OpenWrt landscape.
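A minimal C sketch of what such a request looks like on the wire, following RFC 792: a 20-byte ICMP message of type 13, code 0, with an identifier, a sequence number and three 32-bit timestamps in milliseconds since midnight UT, of which the sender fills only the originate timestamp. The helper names are hypothetical, and identifier and sequence number are assumed to be in network byte order. Fed with the values from the 20-byte payload in the strace output earlier in the thread (identifier bytes "ia", sequence 0, originate 0x04429EB0), it reproduces the checksum bytes \346\253 (0xE6 0xAB) seen there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define ICMP_TIMESTAMP_LEN 20

/* RFC 792 timestamps count milliseconds since midnight UT. */
static uint32_t ms_since_midnight_utc(time_t sec, long nsec)
{
    assert(sec >= 0 && nsec >= 0 && nsec < 1000000000L);
    return (uint32_t)((sec % 86400) * 1000 + nsec / 1000000);
}

/* Internet checksum (RFC 1071): one's-complement sum of 16-bit words. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len % 2)
        sum += (uint32_t)buf[len - 1] << 8;   /* pad a trailing odd byte */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)~sum;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build an ICMP timestamp request in network byte order. */
static void build_timestamp_request(uint8_t pkt[ICMP_TIMESTAMP_LEN],
                                    uint16_t id, uint16_t seq,
                                    uint32_t originate_ms)
{
    memset(pkt, 0, ICMP_TIMESTAMP_LEN); /* checksum, receive, transmit = 0 */
    pkt[0] = 13;                         /* type: timestamp request */
    pkt[1] = 0;                          /* code */
    pkt[4] = (uint8_t)(id >> 8);
    pkt[5] = (uint8_t)id;
    pkt[6] = (uint8_t)(seq >> 8);
    pkt[7] = (uint8_t)seq;
    put_be32(pkt + 8, originate_ms);     /* originate timestamp */
    uint16_t csum = inet_checksum(pkt, ICMP_TIMESTAMP_LEN);
    pkt[2] = (uint8_t)(csum >> 8);
    pkt[3] = (uint8_t)csum;
}
```

As a cross-check of the time base, ms_since_midnight_utc(1678132116, 263362000) gives 71316263, the same value as the last timestamp field of the first tsping output line above.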

I believe these kinds of things are where I can best contribute to this effort, so I'm happy to be able to help :slight_smile:

Also pushed the fixes I've got so far as v0.2.3:

  • Fixed the memory corruption issue @patrakov found
  • Made the "human-readable" output match the machine output
  • Disabled buffering on standard output
  • Plus hopefully improved error messages when things go wrong

I saw that same drift in the Julia code. Julia parses the JSON virtually instantaneously (once the initial call, which involves compilation, is done), so you can easily plot many datasets. The first plot takes 23 s (to load all the packages); the next plot takes 0.12 s to read the data and make the plot.

Julia 1.9 is about to come out and it should dramatically lower startup times. I would guess maybe 3-5 seconds to start up based on the good stuff I'm hearing about 1.9.

I didn't spend much time designing a good plot visualization, and I find the offsets confusing: what is the -400 ms delay (the red line), and what is the +500 ms (green line)? What are those measured relative to? I think for interpretability they should probably have some offset added to them, but I don't know what. If someone can figure out what is going on there, I'm happy to do the calcs, though.

It kind of looks to me like rtt = send + receive; why receive is negative, I don't know.

That is because one-way delay measurements require synchronized clocks...
Let's assume a true RTT of 100ms on a symmetric path so both OWDs equal 50ms:
client clock 1 second ahead of server

  1. client sends packet at local time 0, which is time -1000 at the server
  2. packet arrives at server at 50 ms client time, but the server stamps it with receive time -950 ms and sends it back 1 ms later (local processing time) with send timestamp -949 ms
  3. client receives packet at local time 101 ms.

Client now calculates OWDs:
up: server receive - client send: -950 - 0 = -950
down: client receive - server send: 101 - (-949) = 1050
rtt: up + down: -950 + 1050 = 100 (the clock offset cancels, and the 1 ms server processing time drops out of the 101 ms the client sees between send and receive)
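The same bookkeeping as a minimal C sketch (struct and function names are hypothetical): t1 and t4 are read from the client clock, t2 and t3 from the server clock. Because up + down = (t4 - t1) - (t3 - t2), the clock offset cancels in the sum and the server processing time drops out, while each OWD on its own carries the full offset.

```c
#include <assert.h>
#include <stddef.h>

/* One ICMP timestamp exchange, all values in ms. */
struct ts_exchange {
    long t1; /* client send,    client clock */
    long t2; /* server receive, server clock */
    long t3; /* server send,    server clock */
    long t4; /* client receive, client clock */
};

/* Each OWD mixes both clocks, so it includes the full clock offset. */
static long owd_up(const struct ts_exchange *e)
{
    assert(e != NULL);
    return e->t2 - e->t1;
}

static long owd_down(const struct ts_exchange *e)
{
    assert(e != NULL);
    return e->t4 - e->t3;
}

/* (t4 - t1) - (t3 - t2): offset cancels, server processing is excluded. */
static long owd_rtt(const struct ts_exchange *e)
{
    return owd_up(e) + owd_down(e);
}
```

With the client 1 s ahead, 50 ms each way and 1 ms of server processing (t1 = 0, t2 = -950, t3 = -949, t4 = 101), owd_up() returns -950, owd_down() returns 1050 and owd_rtt() returns 100.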

For truly symmetric paths one can assume the path delay to be symmetric as well and correct for the clock offset (NTP does that under the hood), but that is suboptimal during load tests like the speedtest here, where the link experiences variable, direction-dependent queueing delay that violates the equal-delay assumption above.
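A minimal C sketch of that NTP-style correction (the offset formula is the standard NTP estimate; the function names are hypothetical): the offset estimate ((t2 - t1) + (t3 - t4)) / 2 is exact only if both directions take equally long and is otherwise off by half the up/down difference, so the corrected OWDs simply split the RTT in half.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated server-minus-client clock offset, all times in ms.
 * t1 = client send, t2 = server receive, t3 = server send, t4 = client receive. */
static double ntp_offset_ms(double t1, double t2, double t3, double t4)
{
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}

/* OWDs after moving the server timestamps onto the client clock using the
 * estimated offset; only correct if the path delay really is symmetric. */
static void corrected_owds_ms(double t1, double t2, double t3, double t4,
                              double *up, double *down)
{
    assert(up != NULL && down != NULL);
    double offset = ntp_offset_ms(t1, t2, t3, t4);
    *up = (t2 - offset) - t1;
    *down = t4 - (t3 - offset);
}
```

In the symmetric case (t1 = 0, t2 = -950, t3 = -949, t4 = 101) the estimate is -1000 ms, the true offset. If 200 ms of queueing is added on the uplink only (t2 = -750, t3 = -749, t4 = 301), the estimate becomes -900 ms and both corrected OWDs come out as 150 ms instead of the true 250 ms up and 50 ms down.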

Thanks @dlakelan @moeller0 @Lynx for your help today. A draft of my preso is available for further comment here:

The key question for you guys: do you want me to point at the code or at the conversation? The conversation here is often fascinating. I also need to add a ton more links that I still have to find.

If you have any other nightmares worth sharing, I intend to pound through the last 18 slides like the last scene in Blade Runner.

It is very late here in the US; I am presenting at around, oh, 8:30? 9? AM PST.

For slide 35, depending on what you're trying to show, you might consider using e.g. this instead:

Done! Thank you!

@moeller0 I looked through some of the time-course plots in this thread and there are so many, like this one too:

I'm not sure which is best. Maybe it doesn't matter as all of them show the general idea.

Dave, seems like a good talk. Is there a video? Glad you finally got something from the Julia code into your slides (slide 12).

Fantastic, really looking forward to testing. I imagine this will ultimately replace fping as the default ping binary in cake-autorate. Tremendous work piecing this together.
