SQM Cake poor perfs (250Mbps, TPLink C7V5)

kpoman · May 11, 2021, 9:52am

Hello everyone,
I configured SQM with piece_of_cake in its simplest form on a C7 v5 with a 250/25 Mbps link, and I am getting 120 Mbps (dl) and 24 Mbps (ul). With SQM disabled, I get the nominal values. I tried many tuning parameters, overheads, etc., and I can't seem to get anything better than 140 Mbps.
I checked with 'top' in a terminal while running the DSLReports speed test, and I see ksoftirqd going to ~35% CPU usage.
Does this mean my device is causing the slowness? Does it mean using SQM on a C7 v5 is useless on fast links?
Thank you !

moeller0 · May 11, 2021, 10:06am

Yes and no. It seems clear that this hardware (single core MIPS @ 750 MHz) is not performant enough for traffic shaping at 250+25 = 275 Mbps, but tops out at around 144 Mbps (120+24 combined), as you observed.
The question then is: is your link more usable with SQM at 120/24 or without SQM at 250/25? If the former, then SQM is not useless; if the latter, I would say that SQM is not helpful.
I was in a similar situation and ended up using SQM at 49/31, because for my use cases this performed better than 100/31 without SQM, but that is a judgement call/policy you need to decide on yourself.

BTW, with faster access links becoming more and more common (with extremes like 10 Gbps home links in Switzerland or France, just to name a few European countries), this issue of home routers being too underpowered to robustly and reliably meet their owners' expectations will become more prominent.

In your case, I guess there are a number of cheapish routers for OpenWrt that will allow traffic shaping in the required range, albeit most of those are either ARM or x86 based...

kpoman · May 11, 2021, 1:30pm

Thanks moeller0 for the insights! I'd love to keep using SQM, as I saw much better performance for live video calls. I do have some devices that are kind of passive, like TVs watching Netflix or YouTube. Is there a way to have these devices (TVs with fixed IPs at home) bypass SQM with, say, a fixed bandwidth of 120 Mbps dedicated to them, while the more active devices like computers, phones, etc. still go through SQM? This way, I could get the max out of my connection. What do you think?
If not possible, is there a way to overclock easily or to patch/tune my kernel/firmware (that I am already building with imagebuilder) to get more out of my router ?
Thank you a lot !

moeller0 · May 11, 2021, 2:52pm

Mmmh, in theory, you can use multiple parallel shapers, say SQM for interactive traffic and a "simple" HTB for TV use, as long as the sum of the two shapers does not exceed your link speed. But that will not help much, as you are essentially still trying to shape 250 Mbps in a device that tops out at 140... Now, you could run the shaper for the TV on a different device and use some iptables/nftables and virtual interface magic to split traffic into TV and interactive on the router, and only shape the interactive traffic there.
IMHO that will be rather painful to configure and relatively brittle (if any of the shapers fails, latency under load goes to hell).
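To make the two constraints concrete, here is a minimal sketch of the budget check. The function name `check_split` and the 130/120 split are made up for illustration, and it assumes (simplistically) that both shapers draw on the same CPU budget at the same cost per Mbps:

```python
def check_split(link_mbps, cpu_ceiling_mbps, shaper_rates_mbps):
    """Check a set of parallel shaper rates against the link speed and
    against what the router's CPU can shape in total (hypothetical helper)."""
    total = sum(shaper_rates_mbps.values())
    return {
        "total": total,
        "fits_link": total <= link_mbps,        # sum must not exceed link speed
        "fits_cpu": total <= cpu_ceiling_mbps,  # sum must not exceed CPU ceiling
    }

# Illustrative split: cake for interactive traffic, HTB for the TVs.
result = check_split(
    link_mbps=250,
    cpu_ceiling_mbps=140,  # roughly where this router tops out
    shaper_rates_mbps={"cake_interactive": 130, "htb_tv": 120},
)
print(result)
```

With these numbers the split fits the link but not the CPU, which is the problem described above: running both shapers on the same router does not get around its ~140 Mbps ceiling.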
But maybe you could think about investing in, say, a Raspberry Pi 4B with a USB3 gigabit ethernet dongle (should be < 100 EUR for everything), which will easily traffic shape your link with all bells and whistles. You can still use the existing router as wireless access point and switch (the Pi does not have a switch and its WiFi is not really suited to act as an AP, but as a wired-only router it works pretty well). Now, there are other cheapish single board computers like the Pi around; it is just that there are a number of positive case reports on using OpenWrt on the Pi, and searching the forum should find some.

In theory traffic shaping is amenable to overclocking, but I am not sure whether that will suffice to basically double your router's CPU speed, as that is what seems needed to saturate your link, no?
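As a rough back-of-the-envelope check of the "basically double" estimate, using only the figures from this thread and assuming (simplistically) that shaping throughput scales linearly with CPU clock and that the combined down+up rate is what counts:

```python
# Figures quoted in this thread.
link_down, link_up = 250, 25        # Mbps, nominal link rates
shaped_down, shaped_up = 120, 24    # Mbps, measured with SQM enabled
base_clock_mhz = 750                # single-core MIPS clock

required = link_down + link_up      # 275 Mbps to shape in total
achieved = shaped_down + shaped_up  # 144 Mbps the router manages today

# Assumption: throughput is proportional to clock speed (a simplification).
speedup = required / achieved
needed_clock_mhz = base_clock_mhz * speedup

print(f"speedup needed: {speedup:.2f}x")             # speedup needed: 1.91x
print(f"clock needed: {needed_clock_mhz:.0f} MHz")   # clock needed: 1432 MHz
```

So under that linear assumption the CPU would need to run at roughly 1.9 times its stock clock, which is far more than a typical overclock delivers.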

kpoman · May 11, 2021, 8:20pm

I'm living in Brazil, where prices and availability of newer devices are a problem. I was thinking about a Pi 4B but will wait until I have some more $.
What I was thinking of is having half of my bandwidth not shaped at all, with all the bulk crap like TVs and passive bandwidth consumers using that "channel" (maybe via some ipset with all these devices), and having the rest go through SQM. Maybe (just imagining) have my WAN split into 2 virtual interfaces with equal bandwidth, with some routing that sends the crap through vlink1 and the stuff to be shaped through vlink2; something like this, maybe with some low-level Linux driver able to create a virtual device with a low-level bandwidth limit.

moeller0 · May 11, 2021, 8:37pm

Ah, okay, makes sense.

That does, unfortunately, not work. SQM needs to be in full control of all data that enters the bottleneck link, otherwise bufferbloat shows up again.
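A toy fluid model shows why. This is only a sketch: the function name, the 50 Mbit buffer and the 275 Mbps burst are made-up illustration values, and it assumes a single FIFO buffer in front of the bottleneck that all traffic shares:

```python
def standing_delay_ms(arrival_mbps, capacity_mbps, seconds, buffer_mbit, dt=0.01):
    """Queueing delay at the bottleneck after `seconds` of constant offered
    load, for a simple fluid FIFO model (hypothetical helper)."""
    backlog = 0.0  # Mbit waiting in the bottleneck buffer
    for _ in range(int(round(seconds / dt))):
        backlog += (arrival_mbps - capacity_mbps) * dt
        backlog = min(max(backlog, 0.0), buffer_mbit)  # clamp to [0, buffer]
    return 1000.0 * backlog / capacity_mbps

# Everything shaped to slightly below the 250 Mbps link: no queue builds up.
print(standing_delay_ms(240, 250, seconds=1, buffer_mbit=50))

# Half shaped (125 Mbps) plus unshaped TV traffic bursting to 150 Mbps:
# the offered load exceeds the link, and the shared buffer fills.
print(standing_delay_ms(275, 250, seconds=1, buffer_mbit=50))
```

In this model the second case builds up about 100 ms of queueing delay within one second, and because the buffer is shared, the carefully shaped interactive traffic waits in that queue as well.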

Yep, that would be nice, but it really will not work well, as you still need to traffic shape a total of ~250 Mbps, which your router's CPU will not handle. Traffic policing tends to be computationally cheaper than traffic shaping, but the results are also much choppier...
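To illustrate the difference, here is a minimal sketch of the two ideas (not how any real qdisc is implemented; the function names and numbers are made up). The policer only keeps a token counter and drops whatever exceeds it, while the shaper has to queue packets and release them on schedule:

```python
def police(packets, rate_bps, burst_bytes):
    """Token-bucket policer: forward immediately if tokens suffice, else drop.
    packets: list of (arrival_time_s, size_bytes), sorted by arrival time.
    Returns departure times, with None for dropped packets."""
    tokens, last, out = float(burst_bytes), 0.0, []
    for t, size in packets:
        tokens = min(burst_bytes, tokens + (t - last) * rate_bps / 8)
        last = t
        if tokens >= size:
            tokens -= size
            out.append(t)     # forwarded with no added delay
        else:
            out.append(None)  # dropped, sender has to retransmit
    return out


def shape(packets, rate_bps):
    """Simple shaper: queue packets and send them back to back at rate_bps.
    Returns the time at which each packet finishes transmission."""
    free_at, out = 0.0, []
    for t, size in packets:
        start = max(t, free_at)                # wait behind earlier packets
        free_at = start + size * 8 / rate_bps  # serialization time at the shaped rate
        out.append(free_at)
    return out


# Four 1500-byte packets arriving at once, limit 12 Mbps (1500 bytes per ms).
burst = [(0.0, 1500)] * 4
print(police(burst, rate_bps=12_000_000, burst_bytes=3000))
print(shape(burst, rate_bps=12_000_000))
```

Here the policer lets two packets through and drops the other two, while the shaper delivers all four, spaced about 1 ms apart. The drops are what make policed traffic choppier, and the queue plus timing is what costs the shaper more CPU.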

Also, it would be nice to be able to offload traffic shapers onto network cards, but at the moment that is not an option.