SQM/QoS can saturate the CPU: is this expected or can the code be improved?

Assuming NSS/NPU offloading were supported, enabling it would make SQM impossible (same story as hardware flow-offloading on mt7621): the whole trick of h/w acceleration is to make large parts of your packet flow bypass the main SoC and the kernel's/netfilter's supervision, while SQM needs exactly this per-packet supervision to function.

--
Yes, the proprietary NSS firmware does implement a crude form of QoS (streamboost) itself, which can run hardware-accelerated on the NSS/NPU cores, but that's distinct from SQM and is black(box) magic.

As mentioned, the router workload depends mostly on the packet rate (pps), not on Mbps.

Packet sizes vary by a factor of >15!
Average packet size is mostly around 500 bytes (more if you stream/torrent, less if you use VoIP and gaming), so a Mbps->pps factor of ~3 on average (relative to full-size 1500-byte packets).

Hence a router should be able to process ~3x line rate (in Mbps) to cope with a reasonable amount of small packets at line rate; better 5x, ideally 10x.
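
For anyone who wants to play with the numbers, here is a rough sketch of the Mbps-to-pps relationship described above. The function name pps is made up for illustration, and the math ignores Ethernet framing overhead, so treat the results as ballpark figures only.

```shell
# pps RATE_MBPS PACKET_BYTES
# Packets per second needed to fill RATE_MBPS with packets of PACKET_BYTES
# each (integer math, no framing overhead; a back-of-the-envelope helper).
pps() {
    echo $(( $1 * 1000000 / 8 / $2 ))
}

# Same 100 Mbps link, three packet sizes: full-size, "average", small.
for size in 1500 500 100; do
    echo "100 Mbps at $size bytes/packet: $(pps 100 "$size") pps"
done
```

The 500-byte case needs three times the packet rate of the 1500-byte case at the same Mbps, which is where the ~3x factor comes from.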

That seems especially relevant since the traditional speedtests basically all use maximally sized packets and hence emphasize achievable bandwidth while not saturating PPS at all.

Reasoning about speedtests, I reckon that a "normal download" (two full-size packets plus one ACK in return) already gives a baseline average of around (2*1500+100)/3=1033 bytes per packet, so my factors are a bit off...

Running a router that can just about reach line rate when downloading is ill-advised.
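
The (2*1500+100)/3 estimate above can be reproduced with a small sketch like this one. The helper name avg_packet_bytes is hypothetical, and the second call uses a purely illustrative ratio of one ACK per ten data packets to show how a lower ACK rate shifts the average.

```shell
# avg_packet_bytes DATA_PACKETS DATA_BYTES ACK_BYTES
# Average packet size when one ACK of ACK_BYTES is sent for every
# DATA_PACKETS packets of DATA_BYTES each (integer division).
avg_packet_bytes() {
    echo $(( ($1 * $2 + $3) / ($1 + 1) ))
}

avg_packet_bytes 2 1500 100    # one ACK per two full-size packets
avg_packet_bytes 10 1500 100   # assumed lower ACK rate, for comparison only
```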

That is often not true anymore: with Linux hosts, GRO/GSO will result in a considerably lower ACK rate (but for estimating a reasonable-case scenario, one ACK per two full-MTU packets still has merit). But note that data and ACK packets for speedtests typically travel in opposite directions (which is good in that it is easier to spread over multiple CPUs if available).

Yes, at least one needs to be conscious that in that configuration achieving line rate is pretty much the best-case scenario and will not be terribly robust against any interference.

I will jump into this conversation as it also affects my current router and network speed. At the moment I have a TP-Link C2600 (IPQ8064) running OpenWrt 18.06.5.
My ISP recently upgraded my connection from 100 to 200 Mbit/s download/upload. If I enable SQM with the recommended scheduler (cake/piece_of_cake) and test with speedtest, it is limited to 100 Mbit/s. If SQM is disabled I can get around 170-180 Mbit/s.
I am not sure what could be the cause. Would this mean, as suggested in the 2nd post, that for QoS/SQM to work I need at least an x86-64 router because of the per-packet processing?
At the moment I am looking at MikroTik offerings (such as the hAP ac2), but I use WireGuard, which is not available there.

EDIT: My question is: what device would be sufficient for SQM/QoS to sustain 200 Mbit/s?

Maybe ... have you tried an alternative queuing discipline and script? fq_codel and simple.qos gives me throughput >215 Mbps and good bufferbloat scores on my R7800.

Unfortunately I did not, I tested my network speed with SQM disabled. I will give it a try. Thanks!

A typical x86 mini PC would do it, no question, and a lot more. I think the Raspberry Pi 4 would do it no problem, even if you don't use an extra USB NIC; in that case you'd need a smart switch and VLANs. I also think the Linksys WRT32X or 3200 would do it fine. The espressobin would do it as well.

That seems a bit odd, but could be caused by interference between frequency scaling/power saving and the low-latency CPU demand of traffic shapers.

You could try to disable frequency scaling (assuming IPQ8064 does that in the first place) and/or you could switch to fq_codel/simple.qos. On OpenWrt 19.07-RC you can also edit /usr/lib/sqm/defaults.sh. Change
[ -z "$SHAPER_BURST_DUR_US" ] && SHAPER_BURST_DUR_US=1000
to, say,
[ -z "$SHAPER_BURST_DUR_US" ] && SHAPER_BURST_DUR_US=10000
to allow for 10 ms CPU latency. That will cause an additional 9 ms increase in delay, but might get you back more bandwidth (but first try fq_codel/simple.qos without the edit).
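
If you would rather script that edit than do it by hand, something along these lines might work. This is only a sketch: the helper name set_burst_dur is made up, and it assumes the default is written in defaults.sh exactly as quoted above. It keeps a .bak copy so the change is easy to revert.

```shell
# set_burst_dur FILE MICROSECONDS
# Rewrite the numeric default assigned to SHAPER_BURST_DUR_US in FILE,
# leaving a backup copy at FILE.bak. Hypothetical helper, not part of
# sqm-scripts.
set_burst_dur() {
    file="$1"
    us="$2"
    cp "$file" "$file.bak" &&
    sed "s/SHAPER_BURST_DUR_US=[0-9][0-9]*/SHAPER_BURST_DUR_US=$us/" \
        "$file.bak" > "$file"
}

# On the router, followed by a restart of SQM:
# set_burst_dur /usr/lib/sqm/defaults.sh 10000
```

To undo it, copy the .bak file back over defaults.sh. A package upgrade will most likely overwrite the file anyway, so the edit would need to be reapplied.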

That depends a bit on your traffic mix / packet size distribution, but an x86 or even an mvebu-based ARM router will let you do that for normal cases (for worst-case saturating loads with minimal packet sizes, x86_64 is the only affordable game in town).

Can you please explain how to do this? I'm trying to fully utilize the underwhelming power of my R7800.

Are you using any of the R7800 recommended (according to some other threads) CPU on-demand scaling settings in /etc/rc.local?

echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

I have set my scaling_governor to performance, so the CPU always runs at maximum frequency.
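
For reference, a minimal sketch of how that could be done from /etc/rc.local. It assumes the build actually offers the performance governor (check scaling_available_governors in the same cpufreq directory), and the function name set_governor is just an illustration. The optional second argument exists only so the function can be tried against a scratch directory instead of the real sysfs tree.

```shell
# set_governor GOVERNOR [CPU_ROOT]
# Write GOVERNOR into every cpuN/cpufreq/scaling_governor file found under
# CPU_ROOT (default: the standard sysfs location). Files that are missing
# or not writable are skipped silently.
set_governor() {
    gov="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$f" ]; then
            echo "$gov" > "$f"
        fi
    done
}

# On the router (as root), e.g. in /etc/rc.local:
# set_governor performance
```

With the performance governor the ondemand tunables above (up_threshold, sampling_down_factor) no longer matter, since the CPU stays at its maximum frequency.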

I can't test max speed with cake as my internet isn't that fast. Sub-200 does seem low though. For discussion's sake, Kong's R7800 19.07 changelog says:

09/10/19:
-another network throughput optimization e.g. cake can now shape up to 600Mbps (depends on type of wan)

Can you provide some details? My downstream will be upgraded to 600 Mbps shortly but I believe my R7800 with SQM is limiting it currently to about 200 Mbps. Thanks.

I'm successfully using an nbg6817 (so basically the same device) on a 400/200 MBit/s FTTH connection (~420/~220 effectively). Without SQM (and without software flow-offloading) it can deal with that easily; there is some healthy headroom left, but not that much (600 MBit/s might just work, but not much more). The situation with SQM will be considerably different, but I haven't seen a need for that yet.

That's all the details I can provide. I just quoted directly from kong's changelog...

I’ve been able to get fq_codel / simplest to 500mbps flat.

I’ve searched through Kong’s site and haven’t found an explanation or proof of how he is able to squeeze out more - especially with cake (I’m getting a max of ~upper 200’s mbps).

With the NSS cores you can get a little more squeeze potentially.

You can get fq_codel / simplest to 500mbps? I'm getting 350mbps fq_codel + simplest.qos

This is the best I could get under ideal conditions:

600000 for download speed, 34000 for upload, link layer adaptation is ethernet + 22 per-packet overhead. No advanced settings, wifi turned off, running hnyman’s master build (I have two APs, this is testing my R7800 main router dedicated to wired only).

fq_codel + simplest_tbf.qos + software offloading enabled + performance CPU governor: