SQM Download Speed slower than expected

I upgraded my internet connection to 400/40 Mbit/s (DOCSIS 3.0).
The modem syncs at 450560000 bps / 41984000 bps.

SQM is configured to ~95% of those values.
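For reference, here is how a ~95% target works out from the sync rates. This is a minimal sketch: the helper name is hypothetical, and it assumes SQM's upload/download options take kbit/s, as the config below does. The posted config actually uses about 94.8% down and 92.9% up.

```python
# Hypothetical helper, not part of sqm-scripts: shaper rate as an integer
# percentage of the modem sync rate, in kbit/s, rounded down.
def shaper_rate(sync_kbit: int, percent: int = 95) -> int:
    return sync_kbit * percent // 100  # integer math avoids float rounding surprises

SYNC_DOWN_KBIT = 450_560  # 450560000 bps
SYNC_UP_KBIT = 41_984     # 41984000 bps

print(shaper_rate(SYNC_DOWN_KBIT), shaper_rate(SYNC_UP_KBIT))  # 428032 39884

# Fractions actually used by the posted config (download '427000', upload '39000')
print(f"{427_000 / SYNC_DOWN_KBIT:.3f} {39_000 / SYNC_UP_KBIT:.3f}")  # 0.948 0.929
```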

Speedtest shows around 350-365 Mbit/s.
Without SQM I get ~430 Mbit/s (430 × 1518/1448 ≈ 450 Mbit/s, close to the sync rate).
Raising the SQM download rate up to 450 Mbit/s makes no difference in download speed; it is always around ~350 Mbit/s.
Only setting the SQM download rate well above 450 Mbit/s, e.g. 500 Mbit/s, makes a difference in download speed.
CPU usage is around 30%/80% (dual core).
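The 1518/1448 factor converts the measured TCP goodput back to a rate at the Ethernet frame level. A small sketch, assuming 1518 is the full Ethernet frame (without preamble) and 1448 is the TCP payload of a 1500-byte IPv4 packet carrying TCP timestamps; the post does not spell this breakdown out, and the names are illustrative:

```python
# Assumed reading of the 1518/1448 factor (illustrative names):
#   1518 = 1500-byte MTU + 14-byte Ethernet header + 4-byte FCS (bytes on the wire)
#   1448 = 1500 - 20 (IPv4 header) - 20 (TCP header) - 12 (TCP timestamp option),
#          i.e. the TCP payload carried per full-size packet
WIRE_BYTES = 1500 + 14 + 4
PAYLOAD_BYTES = 1500 - 20 - 20 - 12

def goodput_to_wire_rate(goodput_mbit: float) -> float:
    """Scale a measured TCP goodput up to the Ethernet-level rate it implies."""
    return goodput_mbit * WIRE_BYTES / PAYLOAD_BYTES

print(f"{goodput_to_wire_rate(430):.1f} Mbit/s")  # 450.8 Mbit/s
```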

sqm config:

config queue 'eth1'
	option interface 'eth1'
	option debug_logging '0'
	option verbosity '5'
	option qdisc 'cake'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option ingress_ecn 'NOECN'
	option script 'piece_of_cake.qos'
	option upload '39000'
	option download '427000'
	option iqdisc_opts 'nat ingress docsis dual-dsthost'
	option eqdisc_opts 'nat egress docsis dual-srchost'
	option linklayer 'none'
	option enabled '1'

I also have a different question about overhead (again :|).
I know I already had a long discussion on the cake mailing list about why overhead 18 is correct for DOCSIS, but my modem shows this:
Maximum Concatenated Burst: 1522 bytes
Why did they choose 1522 bytes here and not 1518 bytes?
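The two figures differ by exactly 4 bytes. One plausible explanation, which this thread does not confirm, is that 1522 is the maximum Ethernet frame size with an 802.1Q VLAN tag (IEEE 802.3ac raised the maximum frame size to 1522 bytes for tagged frames). The arithmetic under that assumption:

```python
# Ethernet frame sizes in bytes (IEEE 802.3). The VLAN tag is an ASSUMED
# explanation for the modem's 1522-byte figure, not something the thread confirms.
MTU = 1500
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
FCS = 4          # frame check sequence
VLAN_TAG = 4     # 802.1Q tag

untagged = MTU + ETH_HEADER + FCS          # maximum untagged frame
tagged = untagged + VLAN_TAG               # maximum 802.1Q-tagged frame
docsis_overhead = ETH_HEADER + FCS         # the 18 bytes of per-packet overhead

print(untagged, tagged, docsis_overhead)  # 1518 1522 18
```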

Thank you.

What device are you using? What is the idle percentage during a speed test? I'm quite sure you are CPU-limited; there have been about a hundred posts like yours over the last year. Handling those speeds with SQM requires maybe a top-of-the-line ARM device, but most likely an x86. You can probably shape just the uplink, but downlink speeds require some serious cycles.

I have the feeling that something else is going on here.
Is it possible that the CMTS is somehow throttling the download rate (or the number of channels used?) when it notices that the client is not able to achieve the full download speed?

The most important question in determining whether the CPU is the issue is: what is the idle percentage during a speed test? Once we establish whether a single core is saturated or not, we can move on.
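Per-core idle can be computed from two snapshots of /proc/stat, which avoids relying on an aggregate figure that can hide one saturated core. A minimal sketch, assuming the standard Linux /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names and the sample snapshots are hypothetical. Time spent in softirq, where routed and shaped traffic is typically processed, counts as busy here, matching top's "id" column.

```python
import time
from typing import Dict, Tuple

# Hypothetical helpers, not from any OpenWrt package.
def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {"cpu0": (idle_ticks, total_ticks), ...} for the per-core lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines (intr, ctxt, ...).
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(x) for x in fields[1:9]]  # user nice system idle iowait irq softirq steal
        result[fields[0]] = (ticks[3], sum(ticks))
    return result

def idle_percent(before: str, after: str) -> Dict[str, float]:
    """Per-core idle percentage between two /proc/stat snapshots."""
    b, a = parse_proc_stat(before), parse_proc_stat(after)
    out = {}
    for cpu, (idle_after, total_after) in a.items():
        idle_before, total_before = b[cpu]
        d_total = total_after - total_before
        out[cpu] = 100.0 * (idle_after - idle_before) / d_total if d_total else 0.0
    return out

def sample(interval: float = 1.0) -> Dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return idle_percent(before, after)

# Synthetic snapshots: cpu0 mostly idle, cpu1 busy (partly in softirq).
BEFORE = """cpu  100 0 50 850 0 0 0 0 0 0
cpu0 50 0 25 425 0 0 0 0 0 0
cpu1 50 0 25 425 0 0 0 0 0 0"""
AFTER = """cpu  200 0 150 950 0 0 100 0 0 0
cpu0 60 0 30 510 0 0 0 0 0 0
cpu1 140 0 120 440 0 0 100 0 0 0"""

print(idle_percent(BEFORE, AFTER))  # {'cpu0': 85.0, 'cpu1': 5.0}
```

Running `sample()` on the router during a speed test shows whether one core sits near 0% idle while the other still has headroom.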

It could be useful to install irqbalance as well.

The CPU is a 1.3 GHz ARM dual core.
CPU usage during the speed test is 30% on one core and 80% on the other.
irqbalance doesn't work on this device because the IRQ driver doesn't allow remapping of IRQ affinities.

Caiman? It should work, I think.

What's your idle %? Is it hitting 0%?

When I first started playing with this stuff, I was surprised to see speed limits without ever seeing a really high CPU %.

Yes, idle is near 0%.
I think my ISP is doing something there.

There you have it: you need more CPU. Idle is the reliable indicator here.

Hmm, I think I got it wrong.

I thought you meant the CPU usage while the CPU is idle.
But obviously you meant the idle value as seen in top.
That is around ~50%.

OK, so that means one of your cores is saturated. I'm pretty sure cake only operates on one core, so although your CPU has spare cycles on the other core, they aren't available for SQM purposes.

It's plausible that irqbalance would help if you could make it work. Maybe someone else knows more about that; @anomeome seems to have some ideas?

In the end, I've been saying for a while that people should move to x86 for anything over 200 Mbit/s. SQM is, in my opinion, an essential feature for a router, and at high speeds it requires considerable horsepower, more than even ARM can handle.

I dropped SQM after upgrading to a DOCSIS 3.1 modem, but in the past I used SQM with irqbalance on a Rango (a little more horsepower there, though).

You might also test the fq_codel-based simple.qos instead of cake. A few weeks ago I noticed that cake consumed quite a lot of CPU on the ipq806x-based R7800.


If you do this, please try the upstream sqm-scripts: we changed how we calculate the HTB burst buffer and quantum to save a few CPU cycles at higher bandwidths (but since we have not had enough meaningful testing yet, this is not yet in the OpenWrt packages). So if anybody tests this and notices robustly higher bandwidth utilization with the new code, please open an issue at https://github.com/tohojo/sqm-scripts to report it (and if it does not help, please report that too ;)).
But please note that you will not get all the goodies that cake offers.

I note that this could still mean one core fully pegged, or any combination. But here is another thing: even if top reports only 50% utilisation on a single-core router, cake might still choke, because top reports at a granularity of >= 1 second. If the core is at 0% for 500 ms and at 100% for the other 500 ms, top will truthfully report 50%, and yet SQM will be severely under-shaping...
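A toy illustration of the sampling argument, using a made-up busy/idle trace at 1 ms resolution rather than real measurements (the helper name is hypothetical): averaged over one second the core looks 50% busy, while 10 ms windows show it fully saturated for half of that second.

```python
from typing import List

# Hypothetical helper: average a 1 ms-resolution busy trace (1 = busy, 0 = idle)
# over fixed, non-overlapping windows of window_ms milliseconds.
def windowed_utilisation(busy: List[int], window_ms: int) -> List[float]:
    return [sum(busy[i:i + window_ms]) / window_ms
            for i in range(0, len(busy), window_ms)]

# Made-up trace for one second: idle for 500 ms, then pegged for 500 ms.
trace = [0] * 500 + [1] * 500

coarse = windowed_utilisation(trace, 1000)  # one sample per second, like top
fine = windowed_utilisation(trace, 10)      # 10 ms windows

print(coarse)     # [0.5]
print(max(fine))  # 1.0
```

During the pegged half the shaper cannot dequeue on time, which is the under-shaping described above, even though the one-second average looks comfortable.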

So I tried to overclock the CPU by 300 MHz, to 1600 MHz.
Cake download limit is set to 420000 kbit/s.
Speedtest shows ~300 Mbit/s.
CPU idle is around ~60% now.

When I disable cake:
Speedtest shows ~430 Mbit/s.
CPU idle around ~85%.

Setting cake to 500000 kbit/s (way too high):
Speedtest shows ~385 Mbit/s.
CPU idle around ~40%.

fq_codel is the same.

Could you post the output of "tc -s qdisc" for the cake(420000) and the cake(500000) cases? I want to see if/how the quantum variable changes in relation to the set speed. Like the HTB/TBF burst buffer, these can help cake keep the real interface fed during CPU shortages (but at the cost of more induced latency, so this needs to be seen as a trade-off).

Have you manually installed the most recent upstream version of sqm-scripts before this test? If not, please try it. Also have a look at /usr/lib/sqm/defaults.sh: you will find variables that let you influence the buffer sizing via the duration required to empty that buffer at the configured bandwidth. It would be quite interesting to see whether increasing the buffer duration gives you back the apparently lost bandwidth.
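Sizing a buffer by its drain duration is simple arithmetic. A minimal sketch, assuming a buffer meant to hold 1 ms (1000 us) of traffic at the download rate from the posted config; the 1 ms figure and the function names are illustrative, not sqm-scripts' actual defaults or variable names.

```python
# Hypothetical helpers; names and the 1 ms example duration are illustrative only.
def burst_bytes(rate_kbit: int, duration_us: int) -> int:
    """Bytes a shaper transmits at rate_kbit (kbit/s) in duration_us microseconds."""
    # rate_kbit * 1000 bit/s * duration_us / 1_000_000 s = rate_kbit * duration_us / 1000 bits,
    # and dividing by 8 for bytes gives rate_kbit * duration_us / 8000.
    return rate_kbit * duration_us // 8000

def drain_time_us(buffer_bytes: int, rate_kbit: int) -> float:
    """Time (us) to empty buffer_bytes at rate_kbit: the extra delay a full buffer can add."""
    return buffer_bytes * 8000 / rate_kbit

down = burst_bytes(427_000, 1000)   # 1 ms worth of data at the configured download rate
print(down, round(down / 1518, 1))  # 53375 35.2  (bytes, and roughly how many 1518-byte frames)
print(drain_time_us(down, 427_000))  # 1000.0
```

Doubling the duration doubles both the burst that can cover a CPU stall and the latency a full buffer can add, which is the trade-off in question.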

BTW, it is not the leaf qdisc component of cake or fq_codel that actually drags in the high computational demands, but either cake's built-in shaper component (for layer_cake/piece_of_cake) or HTB/TBF (for simple/simplest)...

Does this change work with cake, or does it only work with HTB?

My upload speed also seems much slower than configured.
Cake is set to ~40 Mbit/s; Speedtest shows ~30 Mbit/s.

tc -s qdisc show dev ifb4eth1
qdisc cake 8008: root refcnt 2 bandwidth 420Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 Sent 566707793 bytes 387600 pkt (dropped 36, overlimits 470689 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 3210304b of 15140Kb
 capacity estimate: 420Mbit
 min/max network layer size:           46 /    1460
 min/max overhead-adjusted size:       64 /    1478
 average network hdr offset:           14

                  Tin 0
  thresh        420Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay         23us
  av_delay          5us
  sp_delay          1us
  backlog            0b
  pkts           387636
  bytes       566760857
  way_inds           13
  way_miss          482
  way_cols            0
  drops              36
  marks               0
  ack_drop            0
  sp_flows            0
  bk_flows            1
  un_flows            0
  max_len         14560
  quantum          1514

tc -s qdisc show dev ifb4eth1
qdisc cake 800b: root refcnt 2 bandwidth 500Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 Sent 3748 bytes 11 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 1984b of 15140Kb
 capacity estimate: 500Mbit
 min/max network layer size:           46 /     913
 min/max overhead-adjusted size:       64 /     931
 average network hdr offset:            1

                  Tin 0
  thresh        500Mbit
  target          5.0ms
  interval      100.0ms
  pk_delay          3us
  av_delay          0us
  sp_delay          0us
  backlog            0b
  pkts               11
  bytes            3748
  way_inds            0
  way_miss            7
  way_cols            0
  drops               0
  marks               0
  ack_drop            0
  sp_flows            1
  bk_flows            1
  un_flows            0
  max_len           927
  quantum          1514
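To compare the two runs, a few fields can be pulled out of the posted `tc -s qdisc` output. A rough sketch, assuming the output format shown above; the regex patterns and helper names are illustrative, and only a trimmed excerpt of the 420Mbit run is embedded. It also checks that the overhead-adjusted sizes equal max(network size + overhead, mpu), which holds for both posted runs (1460 + 18 = 1478 and 913 + 18 = 931). Note that quantum reads 1514 in both runs.

```python
import re

# Illustrative patterns for a few fields of cake's `tc -s qdisc` output.
INT_FIELDS = {
    "overhead": r"overhead (\d+)",
    "mpu": r"mpu (\d+)",
    "quantum": r"quantum\s+(\d+)",
    "drops": r"drops\s+(\d+)",
}
SIZE_RE = r"min/max {} size:\s+(\d+)\s*/\s*(\d+)"

def cake_summary(tc_output: str) -> dict:
    """Extract bandwidth, overhead/mpu, quantum, drops and packet sizes (hypothetical helper)."""
    bw = re.search(r"bandwidth (\S+)", tc_output)
    summary = {"bandwidth": bw.group(1) if bw else None}
    for name, pattern in INT_FIELDS.items():
        m = re.search(pattern, tc_output)
        summary[name] = int(m.group(1)) if m else None
    for key, label in (("net", "network layer"), ("adj", "overhead-adjusted")):
        m = re.search(SIZE_RE.format(label), tc_output)
        summary[key + "_min"] = int(m.group(1)) if m else None
        summary[key + "_max"] = int(m.group(2)) if m else None
    return summary

def overhead_consistent(s: dict) -> bool:
    """Check adjusted size == max(network size + overhead, mpu) at both ends."""
    return all(
        s["adj_" + end] == max(s["net_" + end] + s["overhead"], s["mpu"])
        for end in ("min", "max")
    )

SAMPLE_420 = """\
qdisc cake 8008: root refcnt 2 bandwidth 420Mbit besteffort dual-dsthost nat wash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 min/max network layer size:           46 /    1460
 min/max overhead-adjusted size:       64 /    1478
  drops              36
  quantum          1514
"""

s = cake_summary(SAMPLE_420)
print(s["bandwidth"], s["quantum"], overhead_consistent(s))  # 420Mbit 1514 True
```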

Is it possible that the CMTS can detect when a device can't reach full speed and then throttles the download? Like some kind of threshold, where the number of download channels used only increases once certain threshold(s) are exceeded?

This only works with HTB. Cake is much better at autoscaling things, but if the new buffer-sizing code in sqm-scripts solves the problem for simple.qos, it might be possible to make this configurable in cake as well (even though, since it comes at the cost of more latency, it will never be the preferred solution).

Well, which speed test are you using?

No idea, sorry, but my guess is that it is micro-stalls that keep cake from actually using the required bandwidth...