Fine-tuning SQM/QoS for high bandwidth and CPU utilization

Hey there!

Been running OpenWRT for what feels like 3 months now and I'm finally getting acclimated to its boons and quirks, but right now it seems I might have run into a bit of a snag.

I was lucky enough to get upgraded to a FTTH Gigabit connection, and after fine-tuning the IRQ affinities on my router (Raspberry Pi 4B w/ 4 GB RAM), it seems I've hit a natural limit in reducing CPU utilization without SQM: a speedtest currently averages 745 Mbps download and 200 Mbps upload with 87% utilization of CPU core 1.

The service I'm running is through Telmex Mexico, on a GPON ONT VDSL2 PPPoE asymmetrical connection, and I'm trying to fine-tune cake for my current DL/UL speeds minus 2%. So far, running the simple.qos queue setup script with the cake qdisc and a Link Layer Adaptation overhead of 22 gives me reduced latency and speeds within margin of error, but core 1 again spikes to 100%. So I was wondering if there was a way to spread the load over the other cores I'm equipped with, or how to further fine-tune my setup.
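For reference, the "current speeds minus 2%" shaper targets can be computed directly from the speedtest numbers above (a quick sketch; plug in your own measurements):

```shell
# Shaper targets at 98% of measured throughput (values in Mbit/s)
dl=745
ul=200
echo "download: $(( dl * 98 / 100 )) Mbit/s"
echo "upload:   $(( ul * 98 / 100 )) Mbit/s"
```

That works out to roughly 730 Mbit/s down and 196 Mbit/s up as starting points for the SQM download/upload fields.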

If anyone has any pointers, they'd be much appreciated.

If this is 21.02 or master, set this option and then run reload_config. See if this helps.

/etc/config/network

config globals 'globals'
        option packet_steering '1'
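If you'd rather not edit the file by hand, the same change can be made from the shell with uci (a sketch; reload_config is the standard OpenWrt helper that reloads services whose config changed):

```
uci set network.globals.packet_steering='1'
uci commit network
reload_config
```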

or manually experiment with similar

Manual commands:
echo -n 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus   # CPU 0
echo -n 2 > /sys/class/net/eth0/queues/tx-1/xps_cpus   # CPU 1
echo -n 4 > /sys/class/net/eth0/queues/tx-2/xps_cpus   # CPU 2
echo -n 4 > /sys/class/net/eth0/queues/tx-3/xps_cpus   # CPU 2
echo -n 2 > /sys/class/net/eth0/queues/tx-4/xps_cpus   # CPU 1

echo -n 7 > /sys/class/net/eth0/queues/rx-0/rps_cpus   # CPUs 0-2
echo -n 7 > /sys/class/net/eth1/queues/rx-0/rps_cpus   # CPUs 0-2
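The values written to xps_cpus and rps_cpus are hexadecimal CPU bitmasks: bit 0 is CPU 0 (mask 1), bit 1 is CPU 1 (mask 2), bit 2 is CPU 2 (mask 4), and 7 covers CPUs 0-2. A small sketch for building a mask from a list of CPU numbers:

```shell
# Build a hex CPU bitmask from a list of CPU numbers
mask=0
for cpu in 0 1 2; do
    mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
done
printf '%x\n' "$mask"               # prints 7
```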

What does this show for you?

find /sys/class/net/eth*/queues/. | grep 'cpus$'

Okay, this seems to have helped a lot. Thanks!

/sys/class/net/eth0/queues/./tx-4/xps_cpus
/sys/class/net/eth0/queues/./tx-2/xps_cpus
/sys/class/net/eth0/queues/./tx-0/xps_cpus
/sys/class/net/eth0/queues/./tx-3/xps_cpus
/sys/class/net/eth0/queues/./tx-1/xps_cpus
/sys/class/net/eth0/queues/./rx-0/rps_cpus
/sys/class/net/eth1/queues/./tx-0/xps_cpus
/sys/class/net/eth1/queues/./rx-0/rps_cpus
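To see the current mask values rather than just the paths, a variant like this can help (grep prints filename:value when given multiple files):

```shell
# Print each queue's current CPU mask alongside its path
grep . /sys/class/net/eth*/queues/*/*ps_cpus
```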

On my dual-core ARM device (1.3 GHz), cake limits bandwidth to around ~400 Mbit/s.
Without it I get the same numbers as you, ~750 Mbit/s (even with fq_codel + HTB).

Why does cake have so much CPU overhead compared to fq_codel? Sure, it has a more advanced feature set, but almost a 50% reduction?


All your SQM-related processing is happening on one single core, because the locks/semaphores/what-have-you associated with multi-threaded processing would increase processing latency (and packet latency, which goes against the point of using SQM in the first place).

Hence balancing IRQs across cores and using packet steering can help with performance. This thread might explain more:

What is the bandwidth bottleneck for you, download or upload? You can disable SQM on ingress or egress selectively to get some cycles back. If you're using layer_cake, piece_of_cake might use less CPU since it doesn't rely as heavily on DSCP tags. You can also selectively disable DSCP priorities for ingress/egress with the besteffort keyword while still keeping flow fairness etc. available.
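For reference, a sketch of what that could look like in /etc/config/sqm (option names from the sqm-scripts package; interface and rate values here are illustrative, not your actual settings):

```
config queue 'eth1'
        option interface 'pppoe-wan'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'   # single-tin setup, lighter than layer_cake
        option download '0'                 # 0 disables the ingress shaper entirely
        option upload '196000'              # shape egress only, in kbit/s
```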

1.3 GHz sounds like a lot, but at the end of the day these older ARM cores simply don't have the IPC of x86 parts of a similar age.


Yes, I know that cake is limited to a single core, but why is fq_codel so much more performant?
I mean, cake is basically an improved fq_codel...

That is what I currently do: only shaping on egress.
Congestion on a 1 Gbit ingress link is quite rare...

Packet steering is already enabled.


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.