CAKE w/ Adaptive Bandwidth [October 2021 to September 2022]

Well, we know from @tievolu that it works, and I just ran a quick test, manually monitoring the output of:

# one ICMP timestamp request every 200 ms, 1000 in total; each reply
# spans two lines, hence RS = "" (paragraph mode)
hping3 9.9.9.9 --icmp --icmp-ts -i u200000 -c 1000 2> /dev/null | tail -n+2 | \
    awk 'BEGIN { RS = ""; FS = " "
                 OOB_val = 3600000   # out-of-band starting minimum
                 min_uplink_time = OOB_val; min_downlink_time = OOB_val; min_rtt = OOB_val }
         { rtt = $6;  sub(/rtt=/, "", rtt);      orig = $10; sub(/Originate=/, "", orig)
           rx = $11;  sub(/Receive=/, "", rx);   tx = $12;   sub(/Transmit=/, "", tx)
           uplink_time = rx - orig               # OWD towards the reflector
           downlink_time = orig + rtt - tx       # OWD back from the reflector
           min_uplink_time   = uplink_time < min_uplink_time ? uplink_time : min_uplink_time
           min_downlink_time = downlink_time < min_downlink_time ? downlink_time : min_downlink_time
           min_rtt           = rtt < min_rtt ? rtt : min_rtt
           print orig, rtt, uplink_time, downlink_time }
         END { print "minima (rtt, up, down):", min_rtt, min_uplink_time, min_downlink_time }'

while running a speedtest, and I mostly saw changes in the expected uplink or downlink time. It is just that my ISP has somehow improved its bufferbloat, so the spikes are not as extreme any more.
But I also see shifts in the baseline values reported without load (by a few ms). I have no plots to show for it, though :wink:

As I wrote to @dlakelan, for some ISPs the first hops are already shared with thousands of users, so I am not sure whether that solves our problem, but it certainly helps enlarge the pool of potential reflectors....

That just keeps the DoS inside your ISP's network... The point is that publicly facing anycasted IPs might be better prepared for unusual traffic than gear in your ISP's network that really only responds to ICMP as a courtesy. But then I might have this backwards, and one's own ISP might actually be willing to configure robust timestamp servers if that improves user satisfaction. And the ISP will certainly know which users to contact if the traffic is considered too much.

That part is potentially not considered to be acceptable by an ISP.
Side-note: for LTE/5G it would be excellent if the carrier offered a reliable timestamp server as close to the base station as possible.

That's a good point.
However, another beauty of keeping the reflector targets inside the first hop is simply that it reduces causes of bufferbloat that have nothing to do with the end user's connection itself. For example, if there is a problem between peering points, there is little we can do to reduce that, yet with big anycast targets our script would still unnecessarily throttle us down.

tl;dr: reduce the variables we can't do anything about

Only from the standpoint that this would be a security risk, which it would be if you did a full probing port scan. But it isn't; all it does is send pings with ICMP timestamp requests. If that were illegal, then many customers who accidentally open the network browser on Windows (which actually does more than nmap in the configuration above) against anything outside their home network could be accused of the same.

Not saying illegal here, but it might get your ISP's intrusion detection team on your phone :wink: That is not really a show-stopper, just something to keep in mind. Also keep in mind that the hosts in a subnet are not necessarily physically close. For my ISP, the members of the first hop's /24 seem to be distributed across Germany (judging from their DNS names, they are located in ~5-7 large cities that host my ISP's PoPs). Sure, that would still be inside my ISP's backbone and hence not suffer from peering issues, but all of these nodes seem to be shared by 1000s-100000s of users.

That said, I think we should keep overload in mind, but not let that stop us from figuring out whether this approach actually works reliably. If it does, it might be possible to convince carriers to host timestamp reflectors for their own mobile customers. So let's confirm that we have a robust and reliable solution at hand :wink:

The timestamp stuff definitely works - I've been doing it for about a year now (I started playing with it last September). I posted a couple of example plots from my script a few weeks ago where you can see it successfully mitigating bufferbloat by adjusting the upload bandwidth while leaving download alone.

A few months ago I found that the very first hop on my Virgin Media connection answers timestamped pings, albeit with a crazy offset of over 30 days (-2886528499ms).

I was concerned that using this as a reflector wouldn't detect bufferbloat happening elsewhere in Virgin Media's network though, which is why I opted for external DNS servers. I reasoned that these would provide the best model of responsiveness to other external sites.

Well, the standard allows time bases other than milliseconds past midnight UTC, but demands that the highest bit of the 32-bit timestamp value be set; maybe that is what they are doing here?
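
For reference, a tiny sketch of that rule: the "non-standard time" marker is just bit 31 of the 32-bit field, so it can be tested and stripped with plain shell arithmetic. The timestamp value below is invented for illustration.

```shell
# Bit 31 of the 32-bit ICMP timestamp flags a non-standard timebase
# (RFC 792); the example value below is hypothetical.
is_nonstandard_ts() {
    # true (exit 0) if the high-order bit is set
    [ $(( $1 & 0x80000000 )) -ne 0 ]
}

strip_ts_flag() {
    # mask off the flag bit, leaving the 31-bit payload
    echo $(( $1 & 0x7FFFFFFF ))
}

ts=3229000000   # hypothetical reply value, >= 2^31
if is_nonstandard_ts "$ts"; then
    echo "non-standard timebase; raw value $(strip_ts_flag "$ts")"
fi
```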

That is true, but only helpful if you know exactly which network paths you want to optimize bufferbloat over. As an example, my former ISP was known for its stingy peering with Cogent, which during peak times only allowed a few Kbps/Mbps of traffic to my home; had I targeted a reflector there with this script, all my traffic would have been throttled to that ridiculously low rate as well. I guess the point is that one needs to decide which link one wants to optimize latency over, so users will need to be able to fill in their own reflectors. For a link like @Lynx's, however, where it is predominantly the access link itself that is the culprit, picking close-by reflectors seems sane, no?

For my understanding (and I'm happy to be corrected if I'm wrong): if a decent number of ping spikes occurs anywhere after the first hop, there is nothing you can do about them, because your setup is not the cause of the spikes, and reducing bandwidth no longer correlates directly with reducing those spikes.

Probably. Who knows what time they're putting in there! But as discussed previously it doesn't matter as long as the offset is consistent.

Agreed. This is why multiple reflectors hosted by different companies are needed too. One of the ones I use was completely unresponsive for about an hour a couple of days ago.

I don't know to be honest. I reasoned that reducing your bandwidth usage in response to a bottleneck anywhere in your ISP would probably be better than doing nothing at all, but I can totally imagine situations where this would be utterly futile.

EDIT: I also only touch the bandwidth if I'm actually using it. High latency that isn't associated with significant bandwidth usage is obviously outside of my control.
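
That gating can be sketched with the kernel's byte counters; the interface (here `lo` purely for the demo, in practice the WAN interface) and the threshold are placeholders, not values from any of the scripts discussed here.

```shell
# Only count latency spikes as bufferbloat when we are actually
# moving traffic. Interface and threshold are hypothetical; a real
# setup would use the WAN interface instead of lo.
IFACE=${IFACE:-lo}
LOAD_THRESH_KBPS=1000      # placeholder "link is in use" threshold

achieved_rx_kbps() {
    # achieved download rate over a 1 s window, in kbit/s
    b0=$(cat /sys/class/net/"$IFACE"/statistics/rx_bytes)
    sleep 1
    b1=$(cat /sys/class/net/"$IFACE"/statistics/rx_bytes)
    echo $(( (b1 - b0) * 8 / 1000 ))
}

rate=$(achieved_rx_kbps)
if [ "$rate" -gt "$LOAD_THRESH_KBPS" ]; then
    echo "busy at ${rate} kbit/s: latency spikes count"
else
    echo "idle (${rate} kbit/s): ignore latency, it is outside our control"
fi
```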

Yes, one needs to figure out which path one wants the bufferbloat controlled over, but it is always going to adjust to the smallest pipe across the path (which might change depending on load: at night the access link might be the limit, while at peak time the ISP's peering link might actually be the issue). For most users, I would assume controlling the access link is the approach that best matches what they expect; for everybody else there is the list of reflectors ;).

Sure, that makes sense.

The approach of steering bandwidth toward services you are actually using is totally logical.

The problem I see with this approach is more that, with the current abilities of this script, limiting bandwidth based on several target reflectors limits the bandwidth to that of the poorest reflector.

So for example: you use 3 services/reflectors, 2 of them show no latency problems up to a 200 Mbit/s connection, but one service is bottlenecked to, say, ~30 Mbit/s. Then this script limits your bandwidth to that (if your minimum bandwidth is set below it), or to the set minimum bandwidth.

I see this as a suboptimal solution, so I suggested reflectors that target ping spikes coming directly from the "last mile" to the modem.

In an ideal world, the bandwidth steering would be done per connection, not per interface, but currently I see no realistic solution, neither performance-wise nor reliability-wise (only a few targets respond with timestamps).

Not sure about the script you guys are developing, but in my own script I query 3 reflectors and I only modify the bandwidth if all 3 of them exhibit high latency during the same 20 second period, with "high latency" defined as at least 3 of the most recent 10 pings being above a threshold, with pings sent out every 2 seconds (upload and download pings counted separately).
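
As a rough sketch of that rule (the limit, window, and RTT samples below are invented for illustration, not @tievolu's actual numbers): count how many of the most recent 10 samples exceed a limit and flag the period when at least 3 do.

```shell
# Flag "high latency" when >= 3 of the most recent 10 RTT samples
# exceed the limit; all numbers here are hypothetical.
rtts='20 22 21 80 95 23 19 110 21 20'   # invented RTT samples in ms
out=$(printf '%s\n' $rtts | awk -v limit=50 -v window=10 -v need=3 '
    {
        buf[(NR - 1) % window] = $1       # ring buffer of recent samples
        n = NR < window ? NR : window
        bad = 0
        for (i = 0; i < n; i++)
            if (buf[i] > limit) bad++
        print NR, $1, bad, (bad >= need ? "HIGH" : "ok")
    }')
echo "$out"
```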

ATM we select the minimum delays, so this will be biased towards the least congested path, which with sufficient diversity in reflectors should be mostly correlated with the actual access link bufferbloat.

Except we select the minimum deltaOWDs, which will be dominated by the least congested reflector, no?

In theory I agree, but without buy in from an ISP I am not sure how willing such nodes will be to keep responding to our probes.

Ah, but that is "simple": as long as the bottleneck link employs competent AQM, this will work out of the box without further need for action on our side. The whole issue crops up because our own competent AQM needs to make sure that the over-sized and under-managed buffers along the path never over-fill and bestow unwanted latency under load on every packet :wink:

With our current approach we can, there we agree, at best control a single path, and only if all reflectors are reached via that specific path, which again comes to our help, because the one, likely problematic, segment all paths share is the access link itself.

We aim at picking the smallest deltaOWD (per direction) across all the reflectors, which in essence is another way of requiring that all reflectors show a deltaOWD above threshold. But other consensus mechanisms are certainly possible....
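
A minimal sketch of that consensus rule (reflector names, delays, and the threshold are all invented): taking the per-direction minimum across reflectors trips the trigger only when every reflector is above threshold.

```shell
# columns: reflector delta_up_ms delta_down_ms (all values invented)
out=$(printf '%s\n' \
    'refl_a 12 45' \
    'refl_b  3 48' \
    'refl_c 15 51' | awk -v thr=10 '
    NR == 1 { min_up = $2; min_down = $3; next }
    { if ($2 < min_up) min_up = $2
      if ($3 < min_down) min_down = $3 }
    END {
        # the minimum only exceeds thr if *all* reflectors did
        print "min deltaUp:", min_up, (min_up > thr ? "BLOAT" : "ok")
        print "min deltaDown:", min_down, (min_down > thr ? "BLOAT" : "ok")
    }')
echo "$out"
```

Here only the downlink trips, because one uncongested reflector (refl_b, deltaUp of 3 ms) pulls the uplink minimum below the threshold.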

Then my apologies, I misunderstood the actual approach.

So there is only a need for a minimum number of "good" reflectors then, with the approach of using the minimum delta from all of these.

Seems like a good compromise, I agree.

In addition to this, it’s also possible to move this awk command outside of the shell script and into a .awk file that can be invoked via awk from within any script(s). That can help with maintenance and allows the awk command itself to be syntactically written in a cleaner format. Happy to share an example if my description isn’t clear.
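
A sketch of that suggestion (the file path is just an example, and the awk body is trimmed down to the per-record OWD computation): keep the program in its own file so it can be formatted normally and shared between scripts via `awk -f`.

```shell
# Keep the awk program in its own file; path and trimmed-down body
# here are illustrative only.
cat > /tmp/owd.awk <<'EOF'
# owd.awk: one-way delay estimates from hping3 --icmp-ts records
BEGIN { RS = ""; FS = " " }          # each reply is a two-line record
{
    rtt = $6;   sub(/rtt=/, "", rtt)
    orig = $10; sub(/Originate=/, "", orig)
    rx = $11;   sub(/Receive=/, "", rx)
    tx = $12;   sub(/Transmit=/, "", tx)
    print orig, rtt, rx - orig, orig + rtt - tx
}
EOF

# any script can now do:  hping3 ... | tail -n+2 | awk -f /tmp/owd.awk
# demo with a synthetic two-line record instead of live hping3 output:
out=$(printf 'len=46 ip=9.9.9.9 ttl=245 id=1 icmp_seq=0 rtt=10.0 ms\nICMP timestamp: Originate=100 Receive=103 Transmit=104\n\n' \
      | awk -f /tmp/owd.awk)
echo "$out"
```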

In the spirit of:

@moeller0 what do you reckon? Download saturation looks great (haha, in a perverse sense of course!), but upload saturation doesn't do so much. But perhaps that is as expected?

The downlink OWD also wavers a little more; it's a bit shakier.

I saw something similar yesterday when playing with gping while running speedtests with and without SQM active. I would really like to know which TCP variant is used in these uploads. It could well be BBR, which tries to adjust based on its estimates of capacity and RTT, and so might avoid the worst bufferbloat.