I've been running OpenWrt for what feels like 3 months now and I'm finally getting used to its boons and quirks, but I seem to have run into a bit of a snag.
I was lucky enough to get upgraded to an FTTH Gigabit connection. After fine-tuning the IRQ affinities on my router (Raspberry Pi 4B w/ 4GB RAM), it seems I've hit the limit of how far I can reduce CPU utilization without SQM: a speedtest currently gives me on average 745 Mbps download and 200 Mbps upload with 87% utilization of CPU core 1.
My service is through Telmex Mexico, on a GPON ONT VDSL2 PPPoE asymmetrical connection, and I'm trying to fine-tune cake for my current DL/UL speeds minus 2%. So far, running the simple.qos Queue Setup Script with the cake qdisc and a Link Layer Adaptation of 22 gives reduced latency and speeds within margin of error, but core 1 spikes to 100% again. So I was wondering if there is a way to spread the load over the other cores I have, or how to further fine-tune my setup.
If anyone has any pointers, they'd be much appreciated.
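For reference, the "speeds minus 2%" arithmetic can be sketched as below. This is only an illustration: the function name `shaper_rate_kbit` is made up, and it assumes the shaper rates are entered in kbit/s and that the speedtest averages above are the right baseline.

```python
def shaper_rate_kbit(measured_mbps: int, margin_percent: int = 2) -> int:
    """Return the measured rate minus a safety margin, in kbit/s.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding.
    """
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in the range 0..99")
    measured_kbit = measured_mbps * 1000
    return measured_kbit * (100 - margin_percent) // 100


if __name__ == "__main__":
    # Speedtest averages quoted in the post (Mbps).
    print("download:", shaper_rate_kbit(745), "kbit/s")
    print("upload:  ", shaper_rate_kbit(200), "kbit/s")
```

With the figures above this works out to 730100 kbit/s down and 196000 kbit/s up.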
On my dual-core ARM device (1.3 GHz) cake limits bandwidth to around 400 Mbit/s.
Without it I get the same numbers as you, ~750 Mbit/s (even with fq_codel + htb).
Why does cake have so much CPU overhead compared to fq_codel?
Well, yeah, it has a more advanced feature set, but almost a 50% reduction?
All your SQM-related processing is happening on a single core, because the locks, semaphores, and what have you associated with multi-threaded processing would increase processing latency (and therefore packet latency, which goes against the point of using SQM in the first place).
That's why balancing IRQs across cores and using packet steering can help with performance: they move the rest of the packet handling off the core that runs the shaper. This thread might explain more:
What is the bandwidth bottleneck for you? Download or upload? You can disable SQM on ingress or egress selectively to get some cycles back. If you're using layer_cake, piece_of_cake might use less CPU since it doesn't rely as heavily on DSCP tags. You can also selectively disable DSCP priorities for ingress/egress with the besteffort keyword and still have fairness and so on available.
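On the IRQ balancing and packet steering mentioned above: both are controlled by hexadecimal CPU bitmasks, written to `/proc/irq/<n>/smp_affinity` for an IRQ and to `/sys/class/net/<iface>/queues/rx-0/rps_cpus` for receive packet steering. A small sketch of how such a mask is built follows; the helper name `cpu_mask` is made up for illustration, and which cores to pick depends on where your shaper is already running.

```python
def cpu_mask(cpus):
    """Build the hex bitmask string for a set of CPU indices.

    Bit i of the mask is set when CPU i is allowed, e.g. CPUs 1, 2
    and 3 give binary 1110, which is hex "e".
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    # Example: keep core 0 free and steer to cores 1-3 on a 4-core board.
    print(cpu_mask([1, 2, 3]))
    # Example: allow all four cores.
    print(cpu_mask([0, 1, 2, 3]))
```

This only spreads the surrounding packet handling; the cake instance for a given direction still runs on one core.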
1.3 GHz sounds like a lot, but at the end of the day these old ARM cores simply don't have the IPC of x86 cores of a similar age.