Set CAKE (SQM) RTT automatically

Note

I'm using Gemini to correct my grammar mistakes, so sorry if this sounds like something written by AI :sweat_smile:

Here's a simpler version to set up in OpenWrt compared to the previous project: cake-autortt.

Motivation

After using CAKE (and fq_codel) for probably at least two years across different hardware and software (sometimes OpenWrt & MikroTik), I've found that the bandwidth and RTT parameters provide the most immediately noticeable differences for CAKE. While DiffServ can sometimes be helpful, it highly depends on how packets are marked. I've encountered edge cases where applications likely mismarked packets, causing some devices to be classified as "Bulk" instead of "Video" or "Best Effort." For my specific use case, DiffServ4, DiffServ8, or Best Effort don't really matter, as the per-flow isolation probably already helps significantly in the background.

Overhead parameters likely offer some benefit, but for a normal user like me, raw, conservative, and other overhead parameters don't yield noticeable differences. Regardless, I'm still using "overhead 44 mpu 96 noatm" today.

I mentioned that the two parameters providing the most immediate results for me are bandwidth and RTT. This is because you can instantly observe bandwidth changes when running speed tests. As for RTT, according to the man page, it's related to the AQM (Active Queue Management), so it probably plays a significant role in the background. However, since the services you use daily (YouTube, Netflix, Twitch, etc.) likely employ CDNs (Content Delivery Networks), you probably won't notice the effect of adjusting the RTT parameter.

The easiest way to benchmark whether the RTT parameter makes CAKE react differently is to change it while performing speed tests to servers far from your country (e.g., if you're in Singapore, try speed testing to servers in Germany or the US). I've noticed that adjusting the RTT parameter can further enable CAKE to handle connections to different servers more smoothly. Also, while dropping packets can prevent bufferbloat, for some real-time applications, packet drops can cause ping spikes, so it's a trade-off. Since AQM handles packet drops, that's likely why correctly adjusting the RTT parameter can yield meaningful results for the services you use daily.

Ookla is great for this since you can manually select the speed test server location. Also, as I mentioned before, the services you use daily are probably large enough to utilize CDNs. However, when you need to access resources like research papers or similar use cases, their websites usually don't employ CDNs. Not all universities, for instance, consider using CDNs to speed up access to PDFs and so forth. Therefore, experimenting with the RTT parameter can sometimes help with these scenarios.

cake-autortt is a simple script that automatically handles RTT checking for you. The defaults are sane and work great for my home network. I'm using the latest OpenWrt version (see the README for further details) on an x86/64 platform, but it should work on other platforms since it's just a shell script.
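To make the general idea concrete, here is a rough sketch of how a set of measured RTTs could be turned into a CAKE setting. This is not how cake-autortt itself is implemented; the function name, the use of the median, the clamping limits, and the `eth0` interface name are all assumptions made up for illustration. Only the `tc qdisc change ... cake rtt <time>` form comes from tc-cake.

```python
import statistics

def cake_rtt_command(rtt_samples_ms, iface="eth0", floor_ms=5, ceil_ms=1000):
    """Build a tc command that sets CAKE's rtt from measured samples.

    Hypothetical helper: median and clamping are illustrative choices,
    not the logic used by cake-autortt.
    """
    rtt = statistics.median(rtt_samples_ms)
    # Clamp so a single odd measurement cannot push CAKE to an extreme value.
    rtt = min(max(rtt, floor_ms), ceil_ms)
    return ["tc", "qdisc", "change", "dev", iface, "root", "cake",
            "rtt", f"{round(rtt)}ms"]

# Example with made-up samples (milliseconds); the command is only printed, not run.
cmd = cake_rtt_command([18.2, 21.7, 250.0])
print(" ".join(cmd))
```

The command is returned as a list so it could be handed to `subprocess.run` on a router that has Python, but on most OpenWrt devices the same thing would simply be a line in a shell script.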

This script is likely beneficial for people who want to further tweak CAKE to adapt to whatever websites or services they're connecting to.


continents away (100-500ms RTT)

Must be Atlantis, since light travels half the equator in 0.067 s.
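For anyone who wants to check the arithmetic, a quick back-of-the-envelope calculation. The 0.067 s figure is a one-way trip in vacuum; the 2/3 factor for light in glass fiber is a common rule of thumb that I am assuming here, and real paths add routing detours and queueing on top.

```python
# Propagation delay over half the equator (rough numbers).
EQUATOR_KM = 40_075.0          # Earth's equatorial circumference
C_VACUUM_KM_S = 299_792.458    # speed of light in vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in fiber travels at roughly 2/3 c

half_equator_km = EQUATOR_KM / 2
one_way_vacuum_s = half_equator_km / C_VACUUM_KM_S
rtt_fiber_s = 2 * half_equator_km / (C_VACUUM_KM_S * FIBER_FACTOR)

print(f"one-way, vacuum:   {one_way_vacuum_s:.3f} s")
print(f"round trip, fiber: {rtt_fiber_s:.3f} s")
```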

As expected... the thing is, slightly overestimating the per-packet overhead is benign (that is, it will not increase latency under load, but will have a small cost in potential throughput); underestimating it, however, can result in unexpected latency spikes if the link is transporting a large fraction of small packets during saturation... so you are doing the right thing there. The other keywords only exist for convenience and for conditions in which every bit of throughput counts... (think ADSL/ATM links).
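A small sketch of why small packets are the problem case. The accounting model here (IP length plus a fixed overhead, rounded up to the mpu) follows how tc-cake describes `overhead` and `mpu`; the function names and the example numbers (a true overhead of 44 bytes versus an assumed 0 or 64) are purely illustrative.

```python
def wire_size(ip_len, overhead, mpu=0):
    """Bytes the shaper accounts for one packet: IP length plus overhead, at least mpu."""
    return max(ip_len + overhead, mpu)

def accounting_error(ip_len, assumed_overhead, true_overhead, mpu=0):
    """Relative error of the shaper's accounting versus the true on-wire size.

    Negative means the shaper undercounts, so the real link can saturate
    before the shaper notices.
    """
    assumed = wire_size(ip_len, assumed_overhead, mpu)
    true = wire_size(ip_len, true_overhead, mpu)
    return (assumed - true) / true

for ip_len in (100, 1500):
    under = accounting_error(ip_len, assumed_overhead=0, true_overhead=44)
    over = accounting_error(ip_len, assumed_overhead=64, true_overhead=44)
    print(f"{ip_len:>5} B packet: underestimate {under:+.1%}, overestimate {over:+.1%}")
```

With full-size packets either mistake is a few percent, but with small packets an underestimate means the shaper undercounts by a large fraction, which is where the latency spikes come from. An overestimate only gives away some throughput.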

There really is no "correct" here... the rtt parameter controls both the cake AQM's interval (which is identical to the configured rtt value) and the target (the amount of standing queue that is deemed acceptable even under continuous load; cake sets this to 1/20th of the rtt value, and a bit higher for the background/bulk tins in the diffserv modes, which according to theory optimizes network power). The way this works is that cake (like all CoDel variants) measures the sojourn time of packets and keeps track of the minimum sojourn time over an interval's duration. If that stays above the target, cake will schedule a drop/mark and also reduce the interval, so the next drop/mark will be scheduled sooner if the minimum delay stays above the target; that way CoDel variants adapt their signaling intensity to the traffic's responsiveness. The minimum time it takes before the result of such a drop or mark can be expected to become visible at the AQM location is >= 1 RTT, as the signal needs to be received by the receiver and reflected back to the sender, which then adjusts its congestion window and reduces the amount of data in flight, and that will be felt at the AQM only after all traffic already in the pipe between sender and AQM has been serviced.
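The mechanism above can be sketched roughly like this. It is a heavily simplified, textbook-style CoDel loop, not cake's actual code (cake's AQM differs in detail); the class name and the simulation numbers are made up. The interval/sqrt(count) spacing is the classic CoDel control law, and target = rtt/20 is taken from the description above.

```python
import math

class CoDelSketch:
    """Simplified CoDel-style signalling logic, for illustration only."""

    def __init__(self, rtt_ms=100.0):
        self.interval = rtt_ms        # interval equals the configured rtt
        self.target = rtt_ms / 20     # acceptable standing queue delay
        self.deadline = None          # when a persistent queue becomes actionable
        self.count = 0                # signals sent in the current episode
        self.next_signal = None

    def on_dequeue(self, now_ms, sojourn_ms):
        """Return True if this packet should be dropped or marked."""
        if sojourn_ms < self.target:
            # Delay dipped below target: the episode is over, reset.
            self.deadline, self.count, self.next_signal = None, 0, None
            return False
        if self.deadline is None:
            # First packet above target: give the flow one interval to react.
            self.deadline = now_ms + self.interval
            return False
        due = self.deadline if self.count == 0 else self.next_signal
        if now_ms < due:
            return False
        # Delay stayed above target long enough: signal, and shorten the gap.
        self.count += 1
        self.next_signal = now_ms + self.interval / math.sqrt(self.count)
        return True

# Toy run: one packet dequeued per ms, each having waited 20 ms in the queue.
aqm = CoDelSketch(rtt_ms=100.0)
signals = [t for t in range(0, 401) if aqm.on_dequeue(t, sojourn_ms=20.0)]
gaps = [b - a for a, b in zip(signals, signals[1:])]
print("signals at (ms):", signals)
print("gaps (ms):", gaps)
```

In this toy run the first signal comes one interval after the delay first exceeded the target, and the gaps between later signals keep shrinking for as long as the queue does not drain.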

So conceptually a flow-queueing variant of CoDel would best use the appropriate RTT of each flow as its interval, but there is no really robust and reliable way to get that for all traffic (for TCP it can be estimated reasonably well, especially if TCP timestamps are used, but other protocols do not allow measuring it). That way each flow would be given enough time after a signal to actually react to it, and consecutive signals would only be sent if the flow did not react "enough". The consequence of setting the RTT too high in cake (or rather, higher than the true RTT) is that it takes a bit longer for cake to rein in a flow exceeding its capacity, and that can create latency spikes for all packets/flows mapped to the same hash bin (in cake that typically is a single flow); flows in other bins are typically not affected. The consequence of setting the RTT too low is that flows with considerably higher RTTs will not be given enough time to reduce their rate and will get more slow-down signals than merited, resulting in lower throughput for those flows. The problem now is to find an RTT value that works for most traffic without too many side effects... for normal internet traffic, RTT 100ms is a reasonable compromise that keeps intra-flow latency spikes for short-RTT flows somewhat constrained while still allowing decent throughput for long-range bulk traffic. Generally the observation is that the RTT does not need to be perfectly matched, but just of the right order of magnitude (probably base 2, not base 10).
Most real-time applications are application-limited in their throughput and, on say a 300 Mbps link, mostly stay in the regime where the AQM does not engage at all, so changing the RTT should have little effect on those flows... but if rate-adaptive real-time traffic gets above its "fair capacity share", sure, reducing the RTT might help a bit to signal such a flow more quickly to scale back. But make no mistake: if the signalling uses drops, you will either have a glitch in the traffic or an additional delay required for retransmission of the dropped packet.

That said, if that works well for you, by all means use it... But I would guess that simply setting RTT to, say, 50 or 25 would work well for most users, as really long-RTT bulk transfers are typically rare, and the few MB a research paper might take will be downloaded quickly enough even if the true RTT is 200ms and cake is configured for 20ms.
