Issues with latency spikes when using SQM/Cake (slooow ramp-up during speedtests without SQM to blame?)

wumme · June 16, 2025, 2:54pm

I've just done this. These are the results:

That file doesn't seem to get created on my machine?

That test seems to be particularly bad at catching the initial latency spike, which is the sole reason I created this thread in the first place.

With SQM still active:

With SQM temporarily disabled:

Lynx · June 16, 2025, 2:56pm

Oh sorry that should be:

tail -f /var/log/cake-autorate.primary.log | grep SUMMARY

Lynx · June 16, 2025, 3:00pm

Are those funky new charts generated by AI code?

Lynx · June 16, 2025, 3:01pm

To me this looks problematic:

Namely those huge latency spikes.

Can you upload the data file that this relates to?

wumme · June 16, 2025, 3:05pm

Gah, that's something that even I should've caught! Sorry...

Of course, here's the output of the command you've provided me with (the one showing only the SUMMARY lines) and here's the whole log file. If you want me to upload the files somewhere, instead of just the content, please tell me so.

Yes. At some point, it asked me if I want it to add even more graphs that could make debugging the algorithm easier and I replied with "okay". Please don't ask me what any of this means.

Right?!

Lynx · June 16, 2025, 3:12pm

cake-autorate is acting as expected, but I'm not sure whether it's helping or not in terms of these latency spikes:

Does the drop in the cake download rate actually help prevent the latency from spiralling out of control at the spike events?

How does the connection feel with and without cake or cake-autorate?

I hope @moeller0 can chime in here. I'm struggling for time at the moment.

wumme · June 16, 2025, 3:44pm

I think so?

Neither my roommate nor I really care whether we get 150, 120 or 100 Mbps of bandwidth. All we care about is that one of us can play online games (Call of Duty specifically) while the other watches videos on YouTube or downloads stuff. For this, CAKE seems to work, mostly. Except for that initial, massive latency spike and that "extrapolation" warning for as long as the bandwidth is somewhat saturated. Whenever that warning shows up in-game, there's a perceived delay that both of us have noticed. But that's incredibly hard to test for and put into numbers, as it doesn't seem to be reflected in ping at all, at least as long as SQM is active. Without SQM, the ping reflects more closely what the gaming experience feels like.

In that regard, I'm sorry to report we couldn't feel (I feel so stupid for using this word when talking to experts) much of a difference when using autorate or not.

I appreciate it so very much that you two even replied to this at all. At this point I assume the source of this problem lies somewhere outside of our LAN. I'll probably call the ISP again tomorrow... really not looking forward to that.

Lynx · June 16, 2025, 4:00pm

A crude test would be to set the cake bandwidth to a fixed 10Mbit/s download and upload. If you still see big latency issues - the initial spike and trouble on saturation - with that then cake-autorate can't fix things.

wumme · June 16, 2025, 5:02pm

Curiously, there's no latency spike. I've used FLENT again, because it best illustrates the problem.

Lynx · June 16, 2025, 5:28pm

In that case I believe this issue would be solvable using cake bandwidth adjustments (cake-autorate) in principle.

Can you show me the same flent run but with cake-autorate on your present settings?

Can you also try setting the base cake-autorate bandwidth to 10Mbit/s download and upload? This would allow controlled bandwidth excursions (from 10Mbit/s to, say, 100 or 200Mbit/s) for downloading big files, subject to latency not spiking, but it would keep the initial bandwidth in safer territory, which might work better with sudden loads.

min_dl_shaper_rate_kbps=5000    # minimum bandwidth for download (Kbit/s)
base_dl_shaper_rate_kbps=10000  # steady state bandwidth for download (Kbit/s)
max_dl_shaper_rate_kbps=100000  # maximum bandwidth for download (Kbit/s)

min_ul_shaper_rate_kbps=5000    # minimum bandwidth for upload (Kbit/s)
base_ul_shaper_rate_kbps=10000  # steady state bandwidth for upload (Kbit/s)
max_ul_shaper_rate_kbps=100000  # maximum bandwidth for upload (Kbit/s)

Maybe this change would be enough to solve your issue.

I am speculating that the issue arises because your connection cannot tolerate a very sudden increase in bandwidth usage:

and that the increase in bandwidth usage has to be more gentle:

wumme · June 16, 2025, 5:36pm

Thanks for the suggestion!

However, with this, with Download speed (ingress) and Upload speed (egress) in the LuCI SQM settings both set to 10000, and with your previous suggestions applied, I unfortunately still get this dreaded latency spike:

Lynx · June 16, 2025, 6:31pm

Doesn’t make much sense to me. Try setting min and max to 10Mbit/s too. This should be identical to setting fixed bandwidth at 10Mbit/s.

wumme · June 16, 2025, 9:16pm

Glad I'm not alone

Done that (and of course I restarted the cake-autorate service, too).

So... yay, I guess, the latency spike is gone (again) and it only costs us about 95% of the bandwidth that we pay for

Btw., using those settings, this is what the latest iteration of the AI-generated Python script shows:

Lynx · June 16, 2025, 10:02pm

OK, I suggest reducing the rate of bandwidth increase on high load:

shaper_rate_max_adjust_up_load_high=1.02 # how rapidly to increase shaper rate upon high load detected (max increase)
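
To get a feel for what this setting costs in ramp-up, here is a rough back-of-the-envelope sketch in Python. It assumes every adjustment applies the full maximum factor, so it gives a lower bound on the number of adjustment steps rather than a prediction. The function name is made up for illustration, the 1.04 value is only a hypothetical comparison point that does not come from this thread, and how long a single step takes is not covered here.

```python
import math


def min_steps_to_reach(start_kbps: float, target_kbps: float, factor: float) -> int:
    """Smallest number of multiplicative increases needed to get from start to target."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    if target_kbps <= start_kbps:
        return 0
    return math.ceil(math.log(target_kbps / start_kbps) / math.log(factor))


# Base 10 Mbit/s up to max 100 Mbit/s, as in the settings posted earlier.
steps_gentle = min_steps_to_reach(10_000, 100_000, 1.02)
# Hypothetical larger factor, purely for comparison.
steps_faster = min_steps_to_reach(10_000, 100_000, 1.04)

print(f"1.02 per step: at least {steps_gentle} steps")
print(f"1.04 per step: at least {steps_faster} steps")
```

Halving the per-step increase roughly doubles the number of steps, so a gentler factor trades ramp-up speed for smaller bandwidth jumps.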

wumme · June 17, 2025, 8:23am

But with option download and max_dl_shaper_rate_kbps set to 135000, and option upload and max_ul_shaper_rate_kbps set to 35000 again, correct?

When using autorate, do the bandwidth settings in /etc/config/sqm even matter anymore?

Lynx · June 17, 2025, 10:16am

Yes.

The bandwidth in /etc/config/sqm just sets the initial bandwidth, but cake-autorate will overwrite it as soon as it starts running.

All cake-autorate does is adjust the cake bandwidth depending on measured load and latency: if load increases without a latency increase, the cake bandwidth is allowed to increase; if latency increases, the cake bandwidth is reduced; and when there is no load, the bandwidth returns to base. This is a very crude explanation.
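
As a minimal sketch of that rule in Python (this is not cake-autorate's actual code, which has more states and tunables): the function name, its parameters, and the 0.9 and 0.95 factors are all invented for illustration. Only the 1.02 increase factor and the min/base/max idea come from the settings discussed in this thread.

```python
def next_shaper_rate(rate_kbps, *, load_high, latency_high,
                     min_rate, base_rate, max_rate,
                     up=1.02, down=0.9, decay=0.95):
    """Return the next shaper rate in kbit/s for one control step (illustrative only)."""
    if latency_high:
        # Latency went up: back off, regardless of load.
        rate_kbps = round(rate_kbps * down)
    elif load_high:
        # Link is busy and latency is fine: allow a bit more bandwidth.
        rate_kbps = round(rate_kbps * up)
    elif rate_kbps > base_rate:
        # No load: drift back down towards the base rate.
        rate_kbps = max(base_rate, round(rate_kbps * decay))
    elif rate_kbps < base_rate:
        # No load: drift back up towards the base rate.
        rate_kbps = min(base_rate, round(rate_kbps / decay))
    # Never leave the configured [min, max] window.
    return max(min_rate, min(max_rate, rate_kbps))


limits = dict(min_rate=5_000, base_rate=10_000, max_rate=100_000)
busy = next_shaper_rate(10_000, load_high=True, latency_high=False, **limits)
spike = next_shaper_rate(50_000, load_high=True, latency_high=True, **limits)
idle = next_shaper_rate(10_100, load_high=False, latency_high=False, **limits)
print(busy, spike, idle)
```

Setting min, base and max to the same value pins the result to that value on every step, which is why that configuration behaves like a fixed bandwidth.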

wumme · June 17, 2025, 11:45am

Thank you so much again for taking the time to explain so even I can understand, I really appreciate it!

With these settings, the latency spike still seems to be gone, but bandwidth ramp-up is (as expected) pretty slow.

I'll try to use this as a starting point and try to tweak the parameters a little more.

wumme · June 17, 2025, 2:40pm

Just one more question: when using the default diffserv3 instead of besteffort, most of the (outgoing) traffic still gets put into the Best Effort tin:

qdisc cake 804e: dev ifb4pppoe-wan root refcnt 2 bandwidth 35Mbit diffserv3 dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100ms noatm overhead 34 mpu 90 memlimit 32Mb
Sent 254147570 bytes 315043 pkt (dropped 185, overlimits 549086 requeues 0)
backlog 0b 0p requeues 0
memory used: 1057600b of 32Mb
capacity estimate: 35Mbit
min/max network layer size:           36 /    1492
min/max overhead-adjusted size:       90 /    1526
average network hdr offset:            0

                 Bulk  Best Effort        Voice
thresh       2187Kbit       35Mbit     8750Kbit
target          8.3ms          5ms          5ms
interval        103ms        100ms        100ms
pk_delay          0us        446us        798us
av_delay          0us         66us         41us
sp_delay          0us          3us         30us
backlog            0b           0b           0b
pkts                0       315133           95
bytes               0    254402022        19940
way_inds            0          617            0
way_miss            0         1183           48
way_cols            0            0            0
drops               0          185            0
marks               0            0            0
ack_drop            0            0            0
sp_flows            0            3            8
bk_flows            0            1            0
un_flows            0            0            0
max_len             0         1492          387
quantum           300         1068          300

Shouldn't the majority of packets be in Bulk? How does OpenWrt decide what traffic gets put into what tin? Can I nudge it somehow to make more appropriate decisions?

Lynx · June 17, 2025, 3:25pm

The obvious next tweak for me is to try increasing the ping frequency. But I hope @moeller0 can advise on ideas here.

To work on this you need to start marking traffic using DSCPs. This is a later optimisation to work on. Like the cherry on the cake, whereas right now you have a mud pie.

moeller0 · June 17, 2025, 3:33pm

No, Bulk is the name for a lower-priority class: if you send Bulk traffic, all you care about is that it arrives eventually. Cake will just look at each packet's DSCP header field and sort (a few) DSCP values into its priority tins. It is up to you to make sure that the packets you send out have the desired DSCP marking. There are a few projects out there that help with that and that also tackle the harder problem of making sure that incoming packets have meaningful DSCP values for your network.
Just google for cake-qos-simple, qosify or qosmate (all for OpenWrt) and pick your preferred one. cake-qos-simple is @Lynx's brainchild, like cake-autorate. Currently, I believe qosmate is under the most active development, but maybe the other options are simply already mature enough.
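
To illustrate why nearly everything in the output above sits in Best Effort, here is a small Python sketch of the sorting idea. The table is deliberately partial and only lists a few code points; treat it as an assumption to check against the tc-cake man page for your kernel, not as cake's full mapping. The helper names are made up. The 1/16 and 1/4 fractions are consistent with the thresholds shown above (2187Kbit and 8750Kbit at 35Mbit).

```python
# Partial, illustrative DSCP -> tin table for diffserv3 (not the complete mapping).
DIFFSERV3_TINS = {
    "CS1": "Bulk",
    "EF": "Voice",
    "CS6": "Voice",
    "CS7": "Voice",
}


def tin_for(dscp_name: str) -> str:
    """Anything not explicitly listed, including unmarked CS0, lands in Best Effort."""
    return DIFFSERV3_TINS.get(dscp_name, "Best Effort")


def tin_thresholds_kbit(bandwidth_kbit: int) -> dict:
    """Per-tin thresholds as fractions of the shaper bandwidth."""
    return {
        "Bulk": bandwidth_kbit // 16,
        "Best Effort": bandwidth_kbit,
        "Voice": bandwidth_kbit // 4,
    }


print(tin_for("CS0"))               # unmarked traffic
print(tin_for("CS1"))               # explicitly marked as low priority
print(tin_thresholds_kbit(35_000))  # same bandwidth as in the output above
```

So unless something marks packets with a suitable DSCP, the Bulk tin stays empty no matter how "bulky" the traffic actually is.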