CAKE w/ Adaptive Bandwidth [August 2022 to March 2024]

Thanks. So the whole thing starts with a massive RTT increase that lasts for several seconds, so the drop to the minimum rate is as expected. The question now is: why does it stay there for so long, even though both rates stay well below the minimum rates?

I think you may be oversimplifying the calculation. The chance of getting 2 or more weird things in 4 independent tries is given by 1 - p(0) - p(1), where p(k) is the binomial PMF with n = 4 and a per-try probability of 0.05. But the bigger issue is the assumption of independence: when congestion occurs due to random internet activity, it will likely come in bursts, and further delays become much more likely after the first one.
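For reference, the independent-trials figure can be checked with a few lines of Python. This is a sketch assuming n = 4 samples and a per-sample probability of 0.05, as in the post above; the helper name is made up for illustration.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p), computed as 1 - sum of P(X = k) for k < k_min."""
    below = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min))
    return 1.0 - below


# 2 or more "weird" samples in 4 independent tries, each with probability 0.05
print(round(prob_at_least(2, 4, 0.05), 6))  # 0.014019
```

Under the independence assumption this comes out at about 1.4%, i.e. roughly one window of 4 samples in 70. Bursty congestion will push the real figure higher.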

I tried to recreate a better example of this issue of sticking to low rates, but it has been harder to reproduce than I expected.

I set the base rate to 5 Mbit/s and it struggles to break out significantly. But this is perhaps not a fair test. Does it reveal anything to you, I wonder?

The second log is missing the header, so I cannot read it in easily, sorry.

Yes, I was certainly assuming a spherical internet with unit mass and zero friction... or, essentially, pure random noise as the driver of high RTTs. But conceptually it does not seem unreasonable to ask how often to expect certain occurrences, even if my estimate was pretty naive.

I also agree that issues are likely to come in bursts, but that makes the numbers harder to estimate :wink:
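To get a feel for how burstiness changes the estimate, here is a sketch using a simple two-state Markov chain (a Gilbert-Elliott-style model; this model is my assumption, not something measured on a real link). The long-run fraction of delayed samples is held at 0.05, while `stay` is a hypothetical probability that a delayed sample is followed by another delayed one. Setting `stay` equal to the long-run fraction recovers independent samples.

```python
from itertools import product


def prob_at_least_markov(k_min: int, n: int, p_delay: float, stay: float) -> float:
    """P(at least k_min delayed samples among n) for a stationary two-state Markov chain.

    p_delay: long-run (stationary) probability that a sample is delayed.
    stay:    P(delayed | previous sample delayed); stay == p_delay means independence.
    """
    # Pick P(delayed | previous not delayed) so that the stationary probability equals p_delay:
    # p_delay = enter / (enter + 1 - stay)  =>  enter = p_delay * (1 - stay) / (1 - p_delay)
    enter = p_delay * (1 - stay) / (1 - p_delay)
    total = 0.0
    for seq in product((0, 1), repeat=n):  # enumerate all 2**n delay patterns
        prob = p_delay if seq[0] else 1 - p_delay  # first sample drawn from the stationary distribution
        for prev, cur in zip(seq, seq[1:]):
            p_next = stay if prev else enter
            prob *= p_next if cur else 1 - p_next
        if sum(seq) >= k_min:
            total += prob
    return total


for stay in (0.05, 0.3, 0.6):
    print(stay, round(prob_at_least_markov(2, 4, 0.05, stay), 4))
```

With `stay = 0.05` the result matches the binomial figure of about 1.4%; larger values of `stay` cluster the delayed samples, so "2 out of 4" events become considerably more likely at the same overall delay rate.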

Hi folks,

I am a new OpenWrt user and just installed SQM, because I have an LTE wireless connection with Xplornet (50 down / 10 up).

I have set up the autorate bandwidths as:

min_dl_shaper_rate_kbps=15000 
base_dl_shaper_rate_kbps=25000 
max_dl_shaper_rate_kbps=55000  

min_ul_shaper_rate_kbps=3000  
base_ul_shaper_rate_kbps=6000 
max_ul_shaper_rate_kbps=10000

Before I installed autorate, speed tests showed over 30000 down and 8000 up.
With autorate it is constantly at the low end of the range.

The latest tc qdisc ls output:

qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc cake 8009: dev eth0.2 root refcnt 2 bandwidth 3060Kbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 34
qdisc ingress ffff: dev eth0.2 parent ffff:fff1 ----------------
qdisc noqueue 0: dev br-lan root refcnt 2
qdisc noqueue 0: dev eth1.1 root refcnt 2
qdisc noqueue 0: dev wlan0 root refcnt 2
qdisc noqueue 0: dev wlan1 root refcnt 2
qdisc cake 800a: dev ifb4eth0.2 root refcnt 2 bandwidth 15454Kbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 34

I have set up SQM with 50000 download and 9900 upload using cake/piece_of_cake, and under Link Layer I selected the option "Ethernet with overhead: select for VDSL2" with an overhead of 34. (I don't know if this is the right one to choose for an LTE wireless connection, but the manual is vague imo.)

What can I do to improve speed?

Welcome to OpenWrt!

The bandwidth measured without SQM represents what is possible if you accept massive bufferbloat. What cake-autorate lets through depends on your connection and settings. By default it is tuned rather aggressively towards low latency, and this can be relaxed if you can tolerate more latency.

Can you run a few speed tests, then export and upload the log file (the README details how to export it)? Then we can get a picture of what is going on.

And could you please also paste your config?

I will have a go at a considerably less ambitious, but also less complicated, approach: just extract the delay samples taken at low absolute load from each log file and report the maximum 95th and 99th percentiles over all reflectors (as well as the number of samples in that group). That should allow us to figure out whether these values can serve as the basis for a threshold.
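A minimal sketch of that approach, assuming each log record has already been parsed into (reflector, load in kbit/s, delay in ms) tuples. The tuple layout, the 500 kbit/s low-load cutoff and the function names are hypothetical; this is not the actual cake-autorate log format or the parser mentioned in this thread. Percentiles use the nearest-rank definition.

```python
import math
from collections import defaultdict


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def low_load_delay_stats(samples, max_load_kbps=500):
    """Return (max p95, max p99, sample count) over all reflectors, using low-load samples only.

    samples: iterable of (reflector, load_kbps, delay_ms) tuples (hypothetical format).
    Returns None if no sample passes the low-load filter.
    """
    per_reflector = defaultdict(list)
    for reflector, load_kbps, delay_ms in samples:
        if load_kbps <= max_load_kbps:  # keep only samples taken at low absolute load
            per_reflector[reflector].append(delay_ms)
    if not per_reflector:
        return None
    p95s, p99s = [], []
    for delays in per_reflector.values():
        delays.sort()
        p95s.append(nearest_rank(delays, 95))
        p99s.append(nearest_rank(delays, 99))
    count = sum(len(d) for d in per_reflector.values())
    return max(p95s), max(p99s), count
```

Taking the maximum over reflectors is the conservative choice: a threshold based on it should rarely be crossed by idle noise from any single reflector.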

I appreciate the help.
I have included the link.

Normally speedtest.net shows an unloaded latency of 50-60 ms, and speeds vary at different times of the day: sometimes they reach 55 Mbps and other times 10-15, but most of the time they are within the 25-35 range.
Upload is pretty consistent in the 7-8 range.

These results were also obtained with SQM/cake alone.

With cake-autorate, upload does not go over 3k and download stays around 15k.
At such low speeds the bufferbloat is enormous, and it doesn't feel right.

Quick note: the parser will choke on the SHAPER records, as until now it had never seen log files containing them. Will fix.

I think your delay thresholds are simply too low: note that even without much load you stay close to the minimum rate. Try increasing the thresholds by, say, a factor of 1.5.

What should happen is that the rates slowly inch up to your configured baseline, but even outside of the download peaks of around 15 Mbps, we see sharp rate reduction steps...

I started autorate in manual mode to look at the output:

DEBUG; 2022-11-14-00:29:44; 1668400184.214326; Starting CAKE-autorate 1.1.0
DEBUG; 2022-11-14-00:29:44; 1668400184.216444; Down interface: ifb4eth0.2 (15000 / 25000 / 50000)
DEBUG; 2022-11-14-00:29:44; 1668400184.218474; Up interface: eth0.2 (3000 / 5000 / 10000)
DEBUG; 2022-11-14-00:29:44; 1668400184.220657; rx_bytes_path: /sys/class/net/ifb4eth0.2/statistics/tx_bytes
DEBUG; 2022-11-14-00:29:44; 1668400184.222640; tx_bytes_path: /sys/class/net/eth0.2/statistics/tx_bytes
DEBUG; 2022-11-14-00:29:44; 1668400184.224788; log_file_path: /var/log
DEBUG; 2022-11-14-00:29:44; 1668400184.313899; Warning: bufferbloat refractory period: 300000 us.
DEBUG; 2022-11-14-00:29:44; 1668400184.335878; Warning: but expected time to overwrite samples in bufferbloat detection window is: 1500000 us.
DEBUG; 2022-11-14-00:29:44; 1668400184.338189; Warning: Consider increasing bufferbloat refractory period or decreasing bufferbloat detection window.
DEBUG; 2022-11-14-00:30:03; 1668400203.763477; no ping response from reflector: 1.0.0.1 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:30:03; 1668400203.785959; reflector=1.0.0.1, sum_reflector_offences=0 and reflector_misbehaving_detection_thr=3
DEBUG; 2022-11-14-00:30:22; 1668400222.029982; no ping response from reflector: 8.8.8.8 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:30:22; 1668400222.032505; reflector=8.8.8.8, sum_reflector_offences=0 and reflector_misbehaving_detection_thr=3
DEBUG; 2022-11-14-00:30:40; 1668400240.314502; no ping response from reflector: 8.8.4.4 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:30:40; 1668400240.327080; reflector=8.8.4.4, sum_reflector_offences=0 and reflector_misbehaving_detection_thr=3
DEBUG; 2022-11-14-00:31:55; 1668400315.362999; no ping response from reflector: 8.8.4.4 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:31:55; 1668400315.366103; reflector=8.8.4.4, sum_reflector_offences=0 and reflector_misbehaving_detection_thr=3
DEBUG; 2022-11-14-00:31:56; 1668400316.392848; no ping response from reflector: 8.8.4.4 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:31:56; 1668400316.406329; reflector=8.8.4.4, sum_reflector_offences=0 and reflector_misbehaving_detection_thr=3
DEBUG; 2022-11-14-00:32:12; 1668400332.653389; no ping response from reflector: 1.1.1.1 within reflector_response_deadline: 1s
DEBUG; 2022-11-14-00:32:12; 1668400332.667514; reflector=1.1.1.1, sum_reflector_offences=1 and reflector_misbehaving_detection_thr=3

What are the warnings about, and do I need to change anything?
I also noticed that it is getting no ping response from some reflectors... Is this normal?

No, typically not. The code tries to replace unresponsive reflectors, but the initial list is not that long, so it will bring these reflectors back into service quite quickly...

@Lynx, maybe we should increase the default list of reflectors to, say, 10 or so, so that the health check has a chance of improving things?

Or we could take inspiration from @tievolu and create a separate script that prunes a large list of reflector candidates down to a reasonable set (say 3 times the number of concurrently used reflectors)?
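A sketch of such a pruning step, assuming a handful of RTT probes have already been collected per candidate (None marks a lost reply). The ranking rule (response rate first, then median RTT) and all names are my own assumptions, not a description of how @tievolu's script works.

```python
from statistics import median


def prune_reflectors(probe_results, n_concurrent, factor=3):
    """Keep the best factor * n_concurrent reflector candidates.

    probe_results: dict mapping reflector IP -> list of RTTs in ms, None for a lost reply.
    Candidates are ranked by response rate (higher is better), then median RTT (lower is better).
    Candidates that never replied are dropped.
    """
    scored = []
    for ip, rtts in probe_results.items():
        answered = [r for r in rtts if r is not None]
        if not answered:
            continue  # never replied: not worth keeping
        response_rate = len(answered) / len(rtts)
        scored.append((-response_rate, median(answered), ip))
    scored.sort()
    return [ip for _, _, ip in scored[: factor * n_concurrent]]
```

Ranking on response rate first favours reflectors that do not rate-limit ICMP, which matters more for delay measurements than a few milliseconds of base RTT.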

Yes, it seems like we should. At the moment we use Google, Cloudflare and Quad9. Do you happen to know any other big hitters we should add?

By the way, shouldn't we find it odd that @hammerjoe sees so many missed ICMP responses from Google and Cloudflare?

This could be similar to @patrakov's ISP, which has some general ICMP limits, so maybe all that is needed is to ease up on the aggregate measurement frequency a bit. For testing, I would reduce the frequency by at least a factor of two and see whether that results in fewer missing replies.
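The aggregate ICMP rate is the number of concurrently pinged reflectors divided by the per-reflector ping interval, so doubling that interval halves the load an ISP-side ICMP limiter sees. A tiny sketch; the example values (4 reflectors, each pinged every 0.2 s) are assumptions, so check your own config for the actual numbers.

```python
def aggregate_ping_rate(n_reflectors: int, per_reflector_interval_s: float) -> float:
    """Total ICMP echo requests per second sent across all reflectors."""
    return n_reflectors / per_reflector_interval_s


before = aggregate_ping_rate(4, 0.2)  # assumed example values
after = aggregate_ping_rate(4, 0.4)   # per-reflector interval doubled
print(before, after)  # 20.0 10.0
```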

I have not thought about this deeply; anything heavily anycasted should do...
Maybe @tievolu or @_FailSafe could offer a proposal, please?

A few options:

94.140.14.14     AdGuard DNS
64.6.64.6        Neustar DNS 
208.67.222.222   OpenDNS 
185.228.168.168  CleanBrowsing DNS
149.112.112.112  Alternative Quad 9 DNS

There are lots of other alternate IPs for these providers too:

AdGuard

94.140.14.15
94.140.14.140
94.140.14.141
94.140.15.15
94.140.15.16

Neustar

64.6.65.6
156.154.70.1
156.154.70.2
156.154.70.3
156.154.70.4
156.154.70.5
156.154.71.1
156.154.71.2
156.154.71.3
156.154.71.4
156.154.71.5

OpenDNS

208.67.220.2
208.67.220.123
208.67.220.220
208.67.222.2
208.67.222.123

CleanBrowsing

185.228.168.9
185.228.168.10
185.228.169.11
185.228.169.9
185.228.169.168

Quad 9

9.9.9.10
9.9.9.11
149.112.112.10
149.112.112.11

Great. I think we should add about two IPs per public DNS provider, as these are likely anycasted and will expect some (ICMP) traffic. Or we could add all of them to the candidate list, simply pick N at random from the set, and afterwards just do round-robin replacements if necessary?
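A sketch of the second option: draw N active reflectors at random from a flat candidate list (here a few of the IPs listed above), and swap a misbehaving one for the next spare in round-robin order. The class and method names are hypothetical and do not reflect cake-autorate's actual code.

```python
import random
from collections import deque


class ReflectorPool:
    """Randomly pick n_active reflectors; replace misbehaving ones round-robin from the spares."""

    def __init__(self, candidates, n_active, seed=None):
        rng = random.Random(seed)
        shuffled = list(candidates)
        rng.shuffle(shuffled)
        self.active = shuffled[:n_active]
        self.spares = deque(shuffled[n_active:])

    def replace(self, bad):
        """Swap `bad` for the next spare and return it, or return None if no swap is possible."""
        if bad not in self.active or not self.spares:
            return None
        new = self.spares.popleft()
        self.active[self.active.index(bad)] = new
        self.spares.append(bad)  # back of the queue: retried only after all other spares
        return new


candidates = ["1.1.1.1", "1.0.0.1", "8.8.8.8", "8.8.4.4", "9.9.9.9",
              "149.112.112.112", "94.140.14.14", "208.67.222.222"]
pool = ReflectorPool(candidates, n_active=4, seed=1)
print(pool.active)
```

Putting a replaced reflector at the back of the spare queue means it is not brought back into service until every other candidate has had a turn.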

I think whatever autorate is doing is making Xplornet throttle my speeds. Yesterday I tested it all day and speeds would not go over 15-20 down and 3 up.
I stopped autorate, and after a couple of hours it went back up to 40-50 down and 8 up.
I deduce that whatever autorate is doing is deemed too aggressive by them.

One thing about Xplornet wireless internet (and I think it's the same for others as well) is that although the speeds do vary throughout the day, they do not constantly swing.
I.e., if the tower is congested, it reduces the speed to, say, 50% of the plan, which means it will hover around 25 Mbps with a bit of fluctuation for quite some time.
It will not swing between 30 Mbps one second, 15 the next, then 40, and so on.
So I think there is no need to check the download and upload speeds every second.
I am thinking 5 seconds or even longer is probably more than enough, because SQM/cake should still be able to handle a sudden change of speed for that amount of time, imo.

So what settings do I need to change so that autorate only checks every 5 seconds?
Is it reflector_ping_interval_s? I changed it to 2.

@hammerjoe, the first thing you need to do is to increase the delay thresholds in the config file:

# delay threshold in ms is the extent of OWD increase to classify as a delay
# these are automatically adjusted based on maximum on the wire packet size
# (adjustment significant at sub 12Mbit/s rates, else negligible)  
dl_delay_thr_ms=20.0 # (milliseconds)
ul_delay_thr_ms=20.0 # (milliseconds)

(This is still a small change, so you could try larger numbers if these do not work.)
According to your cake-autorate_config.sh, you had these at the default value of 12.5 each, which is too low for your link (this is in addition to any other issue). To elaborate: your idle RTTs cross these thresholds often enough that autorate essentially gets stuck at the configured minimum rates, whereas under idle conditions without load we actually expect the shaper to slowly creep up to the configured baseline rates. The fact that it does not do so even outside the bigger pink load spikes on your link implies you need to adjust the thresholds.
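To make the stickiness mechanism concrete, here is a deliberately simplified toy controller; it is not cake-autorate's actual algorithm (which, among other things, uses a detection window and refractory periods). Each sample whose delay increase exceeds the threshold cuts the rate by a fixed factor, floored at the minimum; otherwise the rate creeps back up towards the base rate. The step factors (0.9 down, 1.01 up) and the idle delay trace are made-up assumptions; the 15000/25000 kbit/s rates and the 12.5 vs 20 ms thresholds come from this thread.

```python
def simulate(delays_ms, thr_ms, min_rate, base_rate, down=0.9, up=1.01):
    """Toy shaper-rate controller driven by per-sample delay increases (ms) at idle load.

    The rate starts at min_rate (as after an initial RTT spike). A sample above thr_ms
    cuts the rate by `down` (not below min_rate); otherwise the rate grows by `up`,
    capped at base_rate. Returns the rate after each sample.
    """
    rate = min_rate
    history = []
    for d in delays_ms:
        if d > thr_ms:
            rate = max(min_rate, rate * down)
        else:
            rate = min(base_rate, rate * up)
        history.append(rate)
    return history


# Hypothetical idle trace: delay increases of 6-9 ms, plus a 14 ms excursion every 5th sample
idle = [14.0 if i % 5 == 0 else 5.0 + (i % 5) for i in range(200)]
low_thr = simulate(idle, 12.5, 15000, 25000)
high_thr = simulate(idle, 20.0, 15000, 25000)
print(round(low_thr[-1]), round(high_thr[-1]))  # 15609 25000
```

With the 12.5 ms threshold, every idle excursion knocks the rate back to the floor before it can recover, so it never leaves the neighbourhood of 15000; with 20 ms the same trace never triggers a cut and the rate climbs to the 25000 baseline.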

Now, the threshold is something that needs to be adjusted for each link anyway... We might come up with a more automatic way of proposing threshold values, but in the end a network's administrator (aka you, for your own network) will need to set these values, as there is not going to be one size that fits all in all circumstances.

I would propose we try this first and only embark on the ping frequency question once we know whether adjusting the thresholds helped or not.