Netgear R7800 Stuck at 110 Mbps with SQM CAKE

Without SQM I can hit 200+ Mbps down.

With SQM CAKE I'm at 111.

Switch VLAN looks like the most "correct" interface option, but none of them seem to help. "@wan" (not wan6) has the fastest performance, but I assume it does not shape IPv6 at all.

This is a new (used) router freshly installed with OpenWRT 21.02. I haven't done anything else to it. The wifi isn't even on.

I replaced my old router partially because SQM (codel, not CAKE) was stuck in that same 100 Mbps range.

Maybe I need one of the custom builds? Any advice?

edit: fq_codel + simple.qos brings performance closer to line speed but has room for improvement. 200 Mbps input leads to 183 Mbps on waveform.com. Base OpenWRT build (no forks).

No advice, but I can at least corroborate the bad SQM performance you are seeing in 21.02. I don't think it is specific to your setup. With 19.07 my ER-X could do 150-190 Mbps with CAKE. With 21.02 that dropped to 130-150 Mbps at best, and with a recent snapshot on kernel 5.10, 110 Mbps on a good day is all I get.

CAKE SQM performance appears to have nosedived over the last few releases.

It is interesting that your R7800 is stuck at 110 Mbps too. Obviously it has a lot more CPU power than my ER-X, so it's odd that it is stuck at the same place.

Edit: fq_codel/simple.qos improves download to ~155 Mbps from ~110 Mbps on my ER-X. Appreciate the suggestions later in this thread.

This is a dual-core router, and it is most likely using a single core at this time. Try enabling packet steering and see if that makes a difference.

uci -q get network.globals.packet_steering
# nothing here
uci set network.globals.packet_steering=1
uci commit

Helped a ton for codel. With CAKE around 128 Mbps.

Best for codel seems around 220M -> 206M on Waveform.
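One way to confirm that packet steering actually took effect is to look at the per-queue RPS masks in sysfs. The sketch below assumes that the packet steering option works by writing CPU masks to the kernel's `rps_cpus` files, which is my understanding of the OpenWrt implementation rather than something confirmed in this thread; `show_rps` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list the RPS CPU mask of every receive queue.
# Assumption: packet steering is implemented through these rps_cpus files.

# show_rps [BASE] -> prints "<device> <queue> <mask>" per receive queue.
# BASE defaults to /sys/class/net; the parameter exists only to ease testing.
show_rps() {
    base="${1:-/sys/class/net}"
    for f in "$base"/*/queues/rx-*/rps_cpus; do
        [ -r "$f" ] || continue      # skip if the glob matched nothing
        q=${f%/rps_cpus}             # .../eth0/queues/rx-0
        dev=${q%/queues/*}           # .../eth0
        echo "${dev##*/} ${q##*/} $(cat "$f")"
    done
}

# Example on the router: show_rps
```

A mask of 0 means RPS is disabled for that queue, so the receive work stays on whichever core handles the interrupt. On a two-core router a mask of 1, 2 or 3 means the queue may be steered to core 0, core 1 or either.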

How is packet steering decided to be on or off by default?

Well, the UI has a comment that “it may improve performance or make it worse” and it is always off by default. What is your CPU utilization with CAKE? You are better off using “htop -d 1” to see utilization per core.
Two more things I suggest you do: 1) CPU frequency transitions are expensive, so use the performance governor (it is set per core on this router) and 2) try manual IRQ assignments per core (not irqbalance). Wifi2, wifi5, eth0 and eth1 can all be controlled, I think. I would suggest moving eth0 to the second core. If eth0/eth1 are not movable, then move both wifi interfaces to the second core.
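For anyone unsure how to do the manual IRQ assignment, here is a minimal sketch. It relies on the standard Linux interface of writing a hexadecimal CPU bitmask to `/proc/irq/<n>/smp_affinity`. The helper names (`cpu_mask`, `irqs_for`, `pin_irqs`) are made up for illustration, and the name match is a plain substring match, so check the list of IRQs it finds before pinning anything.

```shell
#!/bin/sh
# Sketch only: pin all IRQs whose /proc/interrupts line mentions a name
# (for example eth0) to a single core. Helper names are hypothetical.

# cpu_mask CORE -> hex bitmask for smp_affinity (core 0 -> 1, core 1 -> 2)
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# irqs_for NAME [FILE] -> IRQ numbers whose line contains NAME.
# FILE defaults to /proc/interrupts; the parameter exists to ease testing.
irqs_for() {
    awk -v name="$1" '
        index($0, name) && $1 ~ /^[0-9]+:$/ { sub(":", "", $1); print $1 }
    ' "${2:-/proc/interrupts}"
}

# pin_irqs NAME CORE -> write the core's mask for each matching IRQ (needs root)
pin_irqs() {
    mask=$(cpu_mask "$2")
    for irq in $(irqs_for "$1"); do
        echo "$mask" > "/proc/irq/$irq/smp_affinity" \
            || echo "IRQ $irq could not be moved" >&2
    done
}

# Example on the router, moving eth0 to the second core:
#   irqs_for eth0      # check which IRQs match first
#   pin_irqs eth0 1
```

The setting does not survive a reboot, so it would have to be repeated from a startup script. Some interrupts reject the write, which is the "not movable" case mentioned above.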

I was able to go over 300 Mbps with fq_codel/simple.qos on this router when I was using it.

Did you restart the router or run “reload_config”?

Many of the performance tweaks in the R7800 performance thread are still likely to be useful for 21.02 - especially as the cpu frequency management is essentially unchanged and, unfortunately, not optimal for best performance.

Thanks for the advice.

No, but I did now (reload_config).

I did something like this on 21.02:

echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor	

I didn't verify that it was set to ondemand, however.
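Checking which governor is active only takes a read of sysfs. This is a small sketch, assuming the usual cpufreq layout with one `policyN` directory per core; `show_governors` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the active cpufreq governor for each policy.

# show_governors [BASE] -> prints "policyN: <governor>" per policy.
# BASE defaults to the cpufreq sysfs directory; the parameter eases testing.
show_governors() {
    base="${1:-/sys/devices/system/cpu/cpufreq}"
    for f in "$base"/policy*/scaling_governor; do
        [ -r "$f" ] || continue      # skip if the glob matched nothing
        p=${f%/scaling_governor}     # .../policy0
        echo "${p##*/}: $(cat "$f")"
    done
}

# Example on the router: show_governors
```

The `up_threshold` and `sampling_down_factor` tunables only apply to the ondemand governor, so they have no effect if this prints something else.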

I also tried @ACwifidude's build and had similar results with hardware-accelerated (HW) fq_codel.

No SQM, ACwifidude's build:

The best setting for latency is somewhere around 200-220 Mbit.

These are my settings in local startup, and I rebooted every time I changed something. I also changed to "Bootstrap" from their default Material theme. I cleared out everything when I did the firmware install (no config carryover).

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.
echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo 600000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor	
uci set irqbalance.irqbalance.enabled=1; uci set network.globals.packet_steering=1; uci commit
modprobe nss-ifb

ip link set up nssifb

# Shape ingress traffic with chained NSSFQ_CODEL
tc qdisc add dev nssifb root handle 1: nsstbl rate 220Mbit burst 1Mb
tc qdisc add dev nssifb parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default

# Shape egress traffic with chained NSSFQ_CODEL
tc qdisc add dev eth0 root handle 1: nsstbl rate 11.4Mbit burst 1Mb
tc qdisc add dev eth0 parent 1: handle 10: nssfq_codel limit 10240 flows 1024 quantum 1514 target 5ms interval 100ms set_default
exit 0

I might go back to 21.02 and try max performance and see if it can keep up with CAKE. I'm not hopeful based on my other results. There are 3 devices hardwired to the r7800 but I don't think any of them are causing problems. Wifi remains off while I test SQM.

I ran top during one of the tests and CPU remained >40% idle but I didn't get a per-core breakdown.
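An aggregate figure hides the interesting case here: on a two-core router, one core pinned at 100% with the other idle still shows up as 50% idle overall. Where htop is not installed, a per-core figure can be derived from two readings of `/proc/stat`. The following is a rough sketch with hypothetical helper names (`snapshot`, `busy_pct`); it counts idle plus iowait as idle and everything else, including softirq time, as busy.

```shell
#!/bin/sh
# Sketch only: per-core busy percentage from two /proc/stat readings.

# snapshot [FILE] -> prints "cpuN <total> <idle>" for each core.
# Sums the first eight counters (user .. steal); guest time is already
# included in user, so the guest columns are skipped.
snapshot() {
    awk '/^cpu[0-9]/ {
        t = 0
        for (i = 2; i <= 9 && i <= NF; i++) t += $i
        print $1, t, $5 + $6
    }' "${1:-/proc/stat}"
}

# busy_pct BEFORE AFTER -> prints "cpuN <busy>%" between two snapshot files
busy_pct() {
    awk 'NR == FNR { total[$1] = $2; idle[$1] = $3; next }
         {
             dt = $2 - total[$1]
             di = $3 - idle[$1]
             if (dt > 0) printf "%s %d%%\n", $1, 100 * (dt - di) / dt
         }' "$1" "$2"
}

# Example, run while a speed test is in progress:
#   snapshot > /tmp/a; sleep 5; snapshot > /tmp/b; busy_pct /tmp/a /tmp/b
```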

Back on 21.02, cleared all settings.

echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor

If I was starting from scratch I might try a longer term bufferbloat test to give the governor time to shift into faster CPU mode. But I'm already sick of configuring for marginal power usage gains.

Installed and enabled irqbalance. Setting the assignments was beyond my skills with Google. Set packet steering in the luci web UI (network-interfaces-global).

Looks like I'm maxing out one of two cores in htop when I run a speedtest. Other core around 40-60%.

Settled on fq_codel/simple.qos. Apparently codel isn't designed for near-zero latency, so I'm happy with 200+ Mbps down and <30 ms latency. I'd use CAKE if it only took off 20 Mbps, but beyond that I prefer speed. CAKE still can't crack 160 Mbps.

Wifi is on now, though not heavily used. Performance holds up. Wifi bufferbloat tests indicate some <100ms download latency but that's fine. (58 on waveform). Still need to enable the other wifi radio.

This is one of the reasons I am running a WRT3200ACM: it can shape 300+ Mbps bidirectionally and still have some CPU to spare, as long as both cores are used.