CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

None of the CDNs or content providers guarantee to respond to ICMP requests at all; most do so out of courtesy (and because some ICMP processing is required for network stability, though ICMP echoes and timestamps are mostly used for debugging). So it really is up to the server operator to decide what is OK and what is not.

This is different in NTP, where the kiss-o'-death (KoD) codes can, for example, tell the client to lower its polling frequency or to go away entirely...

Of course it is subjective, but it is based on seeing the impact of automated pings on servers I am responsible for. Web crawlers and automated pings are a major performance drain.


Understood @slh. Do you think we should abandon pinging public reflectors or reduce the rate even further below 5 Hz?

What about all the hundreds/thousands of "Speed Test" and "Ping Test" sites out there?

Their servers exist to be constantly hammered by these kinds of requests.
I'm guessing our traffic would be much less likely to bother them.

Could they be used as reflectors instead?


Sure, but without first asking for permission/consent we are no better off than with the current set of reflectors.
In the old thread we discussed the possibility of having OpenWrt users volunteer their systems as reflectors, but in addition to the overload problem that has a chicken-and-egg problem on top...


I found some comments about pinging Google's DNS servers HERE and HERE

...Google Public DNS is a Domain Name System service, not an ICMP network testing service.

While Google does not block ICMP or random UDP to 8.8.8.8 or other Google Public DNS IP addresses, there are rate limits on ICMP error replies from Google networking equipment, and ICMP error replies are de-prioritized within Google networks and are thus more likely to be dropped. Even a complete lack of response to traceroute after a certain point may be related to ICMP handling by firewalls...

I also found this white paper which could be of interest: Detecting ICMP Rate Limiting in the Internet

It outlines how to detect rate limiting:

Our first contribution is to provide a new lightweight algorithm to detect ICMP rate-limiting and estimate rate limit across the Internet. Our approach is based on two insights about how rate-limiting affects traffic: first, a rate-limiting will cause probe loss that is visible when we compare slower scans with faster scans, and second, this probe loss is randomized. As a result, we can analyze two ICMP scans taken at different rates to identify rate limiting at any rate less than the faster scan.
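The two-scan comparison can be sketched roughly as follows (a toy illustration of the insight quoted above, not the paper's actual algorithm; the function name and the 0.95 threshold are invented):

```python
# Toy illustration: if a /24 is rate limited at L packets/s with
# L < fast_rate, the fast scan's replies are thinned by roughly
# L/fast_rate, while a slow scan probing below L sees no such loss.

def estimate_rate_limit(slow_rate, slow_replies, fast_rate, fast_replies, probes_per_scan):
    """Return the estimated rate limit in packets/s, or None if no
    rate limiting below fast_rate is apparent. Assumes slow_rate
    itself is below any limit in effect."""
    slow_frac = slow_replies / probes_per_scan
    fast_frac = fast_replies / probes_per_scan
    if slow_frac == 0:
        return None  # block unresponsive; nothing to infer
    ratio = fast_frac / slow_frac
    if ratio >= 0.95:  # comparable response rates: no limiting detected
        return None
    return ratio * fast_rate

# 1000 probes per scan; the fast scan at 1.0 packets/s loses half its
# replies relative to the slow scan -> estimated limit of 0.5 packets/s.
print(estimate_rate_limit(0.1, 900, 1.0, 450, 1000))  # -> 0.5
```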

And it shares their findings for apparent rate-limiting practices (at that time; paper is from 2017):

First, we use random samples of about 40k /24 blocks to show that ICMP Rate limiting is very rare in the general Internet for rates up to 0.39 packets/s per /24: only about 1 in 10,000 /24 blocks are actually rate limited. Thus it is almost always safe to probe in this range.

Second, we look at higher rate scans (up to 0.97 packets/s) and show the fall-off of responses in higher probing rates is consistent with rate limits at rates from 0.28 to 0.97 packets/s per /24 in parts of the Internet.

Finally, although low-rate scans do not usually trigger rate limiting, we show that rate limiting explains results for error replies when Internet censuses cover non-routed address space.


Your turn Lynx, I made some tests yesterday with the stall detection script. Here are the results:
https://paste.3cmr.fr/lufi/r/ST3brkFK4p#zyHjpZxul2p80MQaiuJIs2GEWyVPrR+u52VLaIgH0U4=

An interesting find, but I am not sure how to interpret the results. The reported rates seem awfully low, which makes me believe these are not simply the rate limits of individual devices but something aggregated over the /24s they operate in.


I have some relevant operational experience, too. When I worked for SafeDNS, Inc., somebody badly abused one of our servers by using it as an ICMP reflector, with the source IP address of the packets likely spoofed, and literally tens of megabits per second of such ICMP traffic. We found out from a large bill - the service itself was not impacted at all. Such spoofing (and therefore using the reflector to hide the original source of a DDoS attack), not the actual server resource use, is the main motivation behind server-side ICMP rate limiting.


Mmmh, since we do not spoof the source address (after all, we very much want the reply packets back), would we still cause concerns? Or is it simply that past abuse has led to fixed rate limits for all sources? (I would guess the latter, as trying to limit by endpoint address means that mischievous endpoints can cause the servers to spend resources on keeping track, while a simple rate limit needs only minimal state.)

We don't spoof the sources, but there is no way for the other side to know that - hence the limits on the server side. I don't know why the consumer-side ISPs impose limits.


Well, intermediate hops are often routers, and these have a day job: processing ICMP requests is a distraction that is apparently often not handled by the routing ASIC but punted to the often somewhat slow CPU, which also handles more important duties for the router than replying to optional ICMP requests. Or so I have deduced from some past NANOG posts and the Steenbergen/Roisman traceroute presentation.

Done! Any more changes @richb-hanover-priv or @moeller0 before I pull all these changes into main and tag '1.1'?

Nah, I owe you some testing of whatever the current state is. Unless you intend to stop development completely upon releasing version 1.1, I think there are enough changes in there to be tested by a larger community. If things crop up in testing, there is always time for version 1.1.1 :wink:


Yes, I think this is a concern. Suppose we're fabulously successful. Suppose we get a million routers to embrace this technology. That's 5 million pings/second, spread across the anycast network of the big providers, which we have conveniently provided as the default four reflectors. I bet there are 200 million routers in the world. If they all implemented some form of ping-based latency control, that would be a bunch of traffic, and may well elicit a response.

I see a few workarounds/alternatives:

  1. It's obviously critical not to ping when there isn't any traffic. CAKE-autorate already does this.
  2. In my Instrumenting CAKE-autorate question, I wanted to see if it might be possible to document the behavior of underlying links (fiber, cable modem, DSL, cell modems, etc.) to see the actual rate of speed changes. That could inform a proper ping interval for that technology. (Lacking any data, I could imagine that cable modems change slowly over the course of an hour. A couple of pings per minute could be sufficient.)
  3. We could spend more time finding suitable reflectors. Obviously, testing to 1.1.1.1 or 9.9.9.9 gets us past all the bottlenecks into the core of the network. But my ISP has equipment a few traceroute hops into their network that honors type 13 pings. That device could easily survive the latency traffic from customer devices at the edge.
  4. What else?

Thanks.

I haven't tried it (and won't have a chance to try today). I like the updates to the CHANGELOG and README. A couple comments:

  1. Fussy... I would put the "Present version is 1.1.0..." in italics below the first two paragraphs. (The current version information is less important than those two explanatory paragraphs.)

  2. Have you tested the cake-autorate-setup.sh from a clean install? (I created the commented-out SRC_DIR to pull from the proper branch...)

Thanks!

OK well for whatever it is worth, I went ahead and merged the changes from the 'next-generation' branch to the 'main' branch and tagged release v.1.1.0.

As detailed in the changelog, this version incorporates the following changes:

Implemented several new features such as:

  • Switched the default pinger binary to fping - it was identified that using concurrent instances of iputils-ping resulted in drift between ICMP requests, and fping solves this because it offers round-robin pinging of multiple reflectors with tightly controlled timing between requests
  • Generalised pinger functions to support wrappers for different ping binaries - fping and iputils-ping are now specifically supported and handled, and new ping binaries can easily be added by including appropriate wrapper functions
  • Generalised code to work with one-way delays (OWDs) rather than just RTTs, in preparation for using ICMP type 13 requests
  • Only use the capacity estimate on bufferbloat detection where the adjusted shaper rate based thereon would exceed the minimum configured shaper rate (avoiding the situation where, e.g., idle load on download during upload-related bufferbloat would cause the download shaper rate to be punished all the way down to the minimum)
  • Stall detection and handling
  • Much better log file handling, including logging by default, supporting logging even when running from the console, and log file rotation after a configured time has elapsed or a configured number of bytes has been written
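For illustration, the minimum-rate guard on bufferbloat detection might look something like this (a plausible reading of the behaviour described above, not the project's actual shell code; the function name, the fallback decay, and the 0.9 factor are invented):

```python
# Hypothetical sketch (cake-autorate itself is a shell script; names and
# the fallback behaviour here are illustrative assumptions).

def on_bufferbloat(current_rate_kbps, achieved_rate_kbps, min_rate_kbps, bb_factor=0.9):
    """Return the new shaper rate after a bufferbloat event."""
    candidate = achieved_rate_kbps * bb_factor
    if candidate > min_rate_kbps:
        # The achieved rate is a meaningful capacity estimate: adopt it.
        return candidate
    # The achieved rate is too low to be informative (e.g. this direction
    # was idle while the other direction caused the bufferbloat), so decay
    # gently from the current rate instead of slamming down to the minimum.
    return max(current_rate_kbps * bb_factor, min_rate_kbps)

# Busy download during download bufferbloat: adopt the capacity estimate.
print(on_bufferbloat(50000, 40000, 10000))  # ~36000 kbps
# Idle download during upload bufferbloat: do not punish to the minimum.
print(on_bufferbloat(50000, 500, 10000))    # ~45000 kbps
```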

https://github.com/lynxthecat/cake-autorate/releases/tag/v.1.1.0

@richb-hanover-priv could you possibly update the second post on this thread to reflect the new version?

Since the code now accommodates:

  • OWDs;
  • a generic ICMP response format; and
  • the use of different pinger binaries using appropriate wrapper functions

and given the discussion above with @Lochnair, I am hopeful that sometime soon, in a newer version, we might be able to work with a new C-based custom pinger binary (see here and here) supporting both ICMP type 13 requests for OWDs and round-robin pinging of multiple reflectors with tight control over timing.
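For what it is worth, the OWD arithmetic behind an ICMP type 13/14 exchange can be sketched like this (an illustrative toy, not cake-autorate's actual code; per RFC 792 the timestamps are milliseconds since midnight UT, and the reflector's clock offset is folded into the raw OWDs, which is fine for our purposes because only changes relative to a baseline matter):

```python
def owds_from_timestamps(originate_ms, receive_ms, transmit_ms, finish_ms):
    """originate: request sent (our clock); receive: request received
    (reflector's clock); transmit: reply sent (reflector's clock);
    finish: reply received (our clock)."""
    owd_up = receive_ms - originate_ms    # true uplink OWD + clock offset
    owd_down = finish_ms - transmit_ms    # true downlink OWD - clock offset
    rtt = owd_up + owd_down               # offsets cancel; excludes reflector processing time
    return owd_up, owd_down, rtt

# Reflector clock 100 ms ahead of ours, 20 ms each way, 1 ms processing:
print(owds_from_timestamps(1000, 1120, 1121, 1041))  # -> (120, -80, 40)
```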


Done. CAKE w/ Adaptive Bandwidth - #2 by richb-hanover-priv


I am pretty sure lwn.net would publish an article on this. Linux Magazine (print) was also interested in an article on CAKE. I can put whoever is interested in touch with the editors.


Yes, I hope real life calms down a bit soon so I'll have some time to look at this.
By the way, do you rely on getting timeouts? It's currently stateless, so I'd have to implement some way to track that.