SQM/QoS can saturate the CPU/is this expected or can the code be improved?

Unfortunately I did not; I tested my network speed with SQM disabled. I will give it a try. Thanks!

A typical x86 mini PC would do it, no question, and a lot more. I think the Raspberry Pi 4 would do it no problem, even without an extra USB NIC, though in that case you'd need a smart switch and VLANs. I also think the Linksys WRT32X or 3200 would do it fine. The ESPRESSObin would do it as well.

That seems a bit odd, but could be caused by interference between frequency scaling/power saving and the low-latency CPU demand of traffic shapers.

You could try to disable frequency scaling (assuming the IPQ8064 does that in the first place), and/or you could switch to fq_codel/simple.qos. On OpenWrt 19.07-RC you can also edit /usr/lib/sqm/defaults.sh. Change
[ -z "$SHAPER_BURST_DUR_US" ] && SHAPER_BURST_DUR_US=1000
to, say,
[ -z "$SHAPER_BURST_DUR_US" ] && SHAPER_BURST_DUR_US=10000
to allow for 10 ms of CPU latency. That will add about 9 ms of delay, but might get you back more bandwidth (but first try fq_codel/simple.qos without the edit).
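The defaults.sh edit described above could be scripted roughly like this. It is only a sketch: it assumes the line in the file matches the text quoted in the post exactly (no trailing comment or whitespace), and the `bump_burst_dur` helper and its optional file argument are made up for illustration. A backup copy is kept next to the file.

```sh
#!/bin/sh
# Sketch: raise the SQM shaper burst duration default from 1000 us to 10000 us.
# bump_burst_dur is a hypothetical helper; the optional argument exists only so
# the edit can be tried on a copy of the file first.
bump_burst_dur() {
    file="${1:-/usr/lib/sqm/defaults.sh}"
    cp "$file" "$file.bak" || return 1
    # Only rewrite the assignment that ends the line with exactly =1000.
    sed 's/SHAPER_BURST_DUR_US=1000$/SHAPER_BURST_DUR_US=10000/' "$file.bak" > "$file"
    # Show the result so the change can be checked by eye.
    grep -n 'SHAPER_BURST_DUR_US' "$file"
}
```

Calling `bump_burst_dur` with no argument edits the real file; SQM then needs a restart (for example `/etc/init.d/sqm restart`) before the new value is used.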

That depends a bit on your traffic mix / packet size distribution, but an x86 or even an mvebu-based ARM router will let you do that in normal cases (for worst-case saturating loads with minimal packet sizes, x86_64 is the only affordable game in town).

Can you please explain how to do this? I'm trying to fully utilize the underwhelming power of my R7800.

Are you using any of the CPU on-demand scaling settings recommended for the R7800 (according to some other threads) in /etc/rc.local?

echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 35 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

I have set my scaling_governor to performance, so the CPU always runs at maximum frequency.
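For anyone wondering how to do that, here is a minimal sketch of switching all cores to the performance governor. The `set_governor` helper and its second argument are hypothetical; the sysfs path is the same cpufreq tree used in the rc.local lines above, and whether a given governor is available depends on the build.

```sh
#!/bin/sh
# Sketch: switch every CPU core to one cpufreq governor.
# set_governor is a hypothetical helper; the second argument (sysfs base
# directory) is only there so it can be dry-run against a scratch directory.
set_governor() {
    gov="${1:-performance}"
    base="${2:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        echo "$gov" > "$f"
    done
}
```

To make it stick across reboots, the function and a `set_governor performance` call could go into /etc/rc.local before the final `exit 0`. The governors a kernel offers are listed in `scaling_available_governors` in the same directory.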

I can't test max speed with cake as my internet isn't that fast. Sub-200 does seem low though. For discussion's sake, Kong's R7800 19.07 changelog says:

09/10/19:
-another network throughput optimization e.g. cake can now shape up to 600Mbps (depends on type of wan)

Can you provide some details? My downstream will be upgraded to 600 Mbps shortly but I believe my R7800 with SQM is limiting it currently to about 200 Mbps. Thanks.

I'm successfully using an nbg6817 (so basically the same device) on a 400/200 MBit/s ftth connection (~420/~220 effectively), without SQM (and without software flow-offloading) it can deal with that easily - there is some healthy headroom left, but not that much (600 MBit/s might just work, but not much more). The situation with SQM will be considerably different, but I haven't seen a need for that yet.

That's all the details I can provide. I just quoted directly from kong's changelog...

I’ve been able to get fq_codel / simplest to 500mbps flat.

I’ve searched through Kong’s site and haven’t found an explanation or proof of how he is able to squeeze out more - especially with cake (I’m getting a max of ~upper 200’s mbps).

With the NSS cores you can get a little more squeeze potentially.

You can get fq_codel / simplest to 500mbps? I'm getting 350mbps fq_codel + simplest.qos

This is the best I could get under ideal conditions:

600000 for download speed, 34000 for upload, link layer adaptation is ethernet + 22 bytes per-packet overhead. No advanced settings, wifi turned off, running hnyman's master build (I have two APs; this is testing my R7800 main router, dedicated to wired only).
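For reference, the values quoted in this post could be applied from the command line roughly as follows. This is a sketch under assumptions: `apply_sqm_settings` is a hypothetical helper, it assumes the first `queue` section in /etc/config/sqm is the WAN instance, and the qdisc and script are the fq_codel + simplest_tbf.qos combination named in the same post.

```sh
#!/bin/sh
# Sketch: apply the shaper values quoted in the post through uci.
# apply_sqm_settings is a hypothetical helper. It assumes the first "queue"
# section in /etc/config/sqm belongs to the WAN interface.
apply_sqm_settings() {
    uci set "sqm.@queue[0].download=600000"     # kbit/s
    uci set "sqm.@queue[0].upload=34000"        # kbit/s
    uci set "sqm.@queue[0].linklayer=ethernet"
    uci set "sqm.@queue[0].overhead=22"         # bytes per packet
    uci set "sqm.@queue[0].qdisc=fq_codel"
    uci set "sqm.@queue[0].script=simplest_tbf.qos"
    uci commit sqm
}
```

After calling `apply_sqm_settings` on the router, SQM has to be restarted for the change to take effect. Lowering the download value a little below what the CPU can sustain is an easy follow-up experiment.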

fq_codel + simplest_tbf.qos + software offloading enabled + performance CPU governor:

My connection has been upgraded to 600 Mbps downstream. I cannot reach it with fq_codel + simplest.qos or simplest_tbf.qos. It seems to max out around 430 Mbps.

Your screenshot is broken. What are your results? Also, I understood that software offloading was incompatible with SQM. Is that not the case? Have you tried comparing results with and without it?

Software offloading seems to help and yields better results.

500mbps was just to see how fast the cpu goes. Turning down the SQM to about 470000 yielded better latency and was more consistent.

I am getting some variability with my setup, but I cannot see a difference with software offloading and sqm enabled.

WITH software offloading, downstream:
520 Mbps (speedtest.net)
490 Mbps (speedtest.net)
518 Mbps (dslreports)
320 Mbps (dslreports)
515 Mbps (dslreports)

WITHOUT software offloading, downstream:
482 Mbps (speedtest.net)
510 Mbps (speedtest.net)
527 Mbps (dslreports)
392 Mbps (dslreports)
350 Mbps (dslreports)

Statistically, those two sets are not significantly different (ANOVA), even excluding the outlier value of 320 in the first set. The stddev of the first set is tighter (excluding the outlier), so I could argue that offloading gives a more consistent result.
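The comparison can be re-run on the router itself with nothing but awk. This is a sketch of a two-group one-way ANOVA, not necessarily the exact procedure the poster used; `anova2` is a hypothetical helper that prints each group's mean and sample standard deviation plus the F statistic.

```sh
#!/bin/sh
# Sketch: mean, sample standard deviation and one-way ANOVA F statistic
# for two groups of throughput readings (Mbps), using only awk.
# anova2 is a hypothetical helper; each argument is a space-separated list.
anova2() {
    awk -v a="$1" -v b="$2" '
    function stats(s, out,    v, n, i, sum, ss) {
        n = split(s, v, " ")
        for (i = 1; i <= n; i++) sum += v[i]
        out["n"] = n; out["mean"] = sum / n
        for (i = 1; i <= n; i++) ss += (v[i] - out["mean"]) ^ 2
        out["ss"] = ss                    # within-group sum of squares
        out["sd"] = sqrt(ss / (n - 1))    # sample standard deviation
    }
    BEGIN {
        stats(a, A); stats(b, B)
        grand = (A["n"] * A["mean"] + B["n"] * B["mean"]) / (A["n"] + B["n"])
        ssb = A["n"] * (A["mean"] - grand) ^ 2 + B["n"] * (B["mean"] - grand) ^ 2
        ssw = A["ss"] + B["ss"]
        f = ssb / (ssw / (A["n"] + B["n"] - 2))   # between-group df is 1
        printf "mean1=%.1f sd1=%.1f mean2=%.1f sd2=%.1f F=%.2f\n", A["mean"], A["sd"], B["mean"], B["sd"], f
    }'
}

anova2 "520 490 518 320 515" "482 510 527 392 350"   # all readings
anova2 "520 490 518 515"     "482 510 527 392 350"   # outlier 320 dropped
```

With the numbers above, F comes out at about 0.16 for the full sets and about 2.18 with the outlier dropped. Both are well below the 5% critical values from an F table (roughly 5.32 for 1 and 8 degrees of freedom, 5.59 for 1 and 7), which matches the "not significantly different" reading. With the outlier dropped, the standard deviation of the offloading set is about 14 Mbps against about 77 Mbps without offloading.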

What does this file do exactly?

/etc/hotplug.d/net/20-smp-tune

I had to delete this config in order for my manual irq assignments to take hold.
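For context, a manual IRQ assignment of the kind mentioned here might look like the sketch below. The `pin_irqs` helper, the `eth0` name in the usage note and the mask value are illustrative assumptions, not the poster's actual settings. As the post notes, the hotplug script may need to be out of the way for such assignments to persist.

```sh
#!/bin/sh
# Sketch: pin every IRQ whose /proc/interrupts line mentions a given name
# to a CPU mask. pin_irqs is a hypothetical helper; the third argument
# (proc directory) exists only so it can be tried on a scratch copy.
pin_irqs() {
    name="$1"; mask="$2"; proc="${3:-/proc}"
    # The first column of /proc/interrupts is "<irq>:"; strip the colon.
    awk -v n="$name" 'index($0, n) { sub(":", "", $1); print $1 }' "$proc/interrupts" |
    while read -r irq; do
        # Rows without a matching /proc/irq/<n> entry are skipped.
        [ -w "$proc/irq/$irq/smp_affinity" ] || continue
        echo "$mask" > "$proc/irq/$irq/smp_affinity"   # hex CPU bitmask, 2 = CPU1
    done
}
```

For example, `pin_irqs eth0 2` would move the interrupts whose line contains `eth0` to the second core; `cat /proc/interrupts` before and after shows whether the counters start growing on the intended CPU.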

No idea... As an update, I am finding that the NSS code @quarky @Ansuel (and others?) made available to the community performs as well as SQM as measured by bufferbloat numbers, yet allows for faster download speeds on WiFi devices.

See here for key steps to build the NSS code (assuming you have a compatible device). See here for setup of fq_codel on it. You do not use the LuCI package. That thread is pretty large.

That's really impressive! If there's one thing I've learned about Linux, it's that roadblocks don't exist for very long.