I'm using OpenWrt 23.05 on a relatively powerful x86 box - a Qotom Q350G4 with an Intel i5-4200U CPU and four Intel I211 NICs.
I've recently upgraded to a symmetric 1Gb FTTP internet connection, and when downloading at full speed the CPU usage on one core jumps to nearly 100%. top and htop both show that this is due to software interrupts:
(eth0 is WAN, eth2 is the guest network, and eth3 is the main LAN)
I've tried enabling packet steering, irqbalance, and even manual balancing as described in this doc, but none of it makes any difference to the sirq load.
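For reference, manual balancing boils down to writing a CPU bitmask into each queue's `smp_affinity` file. This is a sketch, not taken from the thread: the queue names assume the igb driver's usual `eth0-TxRx-N` naming, and the IRQ numbers have to be looked up on your own box.

```shell
# Helper: build the hex affinity mask for a list of CPU ids.
cpumask() {
  local mask=0 cpu
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '%x\n' "$mask"
}

# Example (IRQ numbers are placeholders — find the real ones first):
#   grep eth0 /proc/interrupts
#   echo "$(cpumask 0)" > /proc/irq/<irq-of-eth0-TxRx-0>/smp_affinity
#   echo "$(cpumask 1)" > /proc/irq/<irq-of-eth0-TxRx-1>/smp_affinity
```

`cpumask 0` prints `1`, `cpumask 1` prints `2`, and `cpumask 0 1` prints `3` (both cores allowed).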
I also thought this might be due to SQM, but nothing changes even if I stop the sqm service.
So while the box can handle a 1Gb WAN, it seems that it can only just manage it, which is surprising to me. Is there something else I can do to distribute the sirq load across cores?
It turns out the speed test I was using sends everything over a single UDP flow, despite the interface indicating that the number of "threads" is 5.
When I run a reverse iperf test with 10 streams (on the router itself) the results are much nicer:
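For anyone wanting to reproduce this: a reverse multi-stream test can be run from the router itself against an iperf3 server elsewhere on the network. The server address below is a placeholder, not from the thread.

```shell
# Reverse test: the server sends, the router receives (download direction).
# 192.0.2.10 is a placeholder iperf3 server address.
iperf3 -c 192.0.2.10 -R -P 10 -t 30
# -R     reverse direction
# -P 10  ten parallel streams, so flows can spread across rx queues
# -t 30  run for 30 seconds
```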
Yes, roughly, assuming an IP packet length of 1200 bytes (the extra 38 bytes per frame are Ethernet overhead: interframe gap, preamble, header, and FCS):
0.9 * 1000^3 / ((1200 + 38) * 8) = 90872 pps
But with full MTU packets that would still be
0.9 * 1000^3 / ((1500 + 38) * 8) = 73147 pps
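The packet-rate arithmetic above can be reproduced with a small awk helper (a sketch; the 38 bytes are the per-frame Ethernet overhead of 12 interframe gap + 8 preamble + 14 header + 4 FCS):

```shell
# Packets per second for 0.9 Gbit/s on the wire at a given IP packet size;
# 38 bytes of Ethernet framing overhead are added to each packet.
pps() {
  awk -v len="$1" 'BEGIN { printf "%.0f\n", 0.9 * 1000^3 / ((len + 38) * 8) }'
}

pps 1200   # prints 90872
pps 1500   # prints 73147
```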
But maybe the difference is packet steering: with multiple flows, rx-0 and rx-1 get distributed to different CPUs, but with the single-stream test all processing ends up on the same CPU?
If that is true, you should see similarly abysmal results when doing a single-flow iperf test?
Well, here roughly the same SIRQ load exists (as expected for the same throughput) as in the UDP test, albeit distributed over two CPUs, which, as far as I understand packet steering, is as good as it gets for your NIC.
I am a bit puzzled that a "normal" 900 Mbps load should tax a decent x86 CPU that badly... especially since Intel NICs are often claimed to be quite efficient.
If that is true, you should see similarly abysmal results when doing a single-flow iperf test?
Yes, a single-flow iperf test shows a very similar sirq load on a single CPU.
Well, here roughly the same SIRQ load exists (as expected for the same throughput) as in the UDP test, albeit distributed over two CPUs, which, as far as I understand packet steering, is as good as it gets for your NIC.
Agreed. My understanding is that this is because there are only two physical cores.
But maybe the difference is packet steering: with multiple flows, rx-0 and rx-1 get distributed to different CPUs, but with the single-stream test all processing ends up on the same CPU?
I was wondering the same thing. I'm not sure exactly how packet steering works though.
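For what it's worth, software packet steering on Linux (RPS, which is what OpenWrt's packet steering option configures) works per receive queue: each flow is hashed and steered to one CPU from a bitmask, so all packets of a single flow always land on the same core. A minimal sketch, assuming eth0 with queues rx-0 and rx-1 on a two-core box:

```shell
# RPS hashes each flow and assigns it to a CPU from the rps_cpus bitmask;
# a single flow therefore cannot be spread across cores.
# Allow both cores (mask 0x3) for each receive queue of eth0:
echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3 > /sys/class/net/eth0/queues/rx-1/rps_cpus
```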
I am a bit puzzled that a "normal" 900 Mbps load should tax a decent x86 CPU that badly... especially since Intel NICs are often claimed to be quite efficient.
Me too.
It's probably worth mentioning that the core clock frequency doesn't max out during the test (it spends most of its time under 2 GHz, and the max is 2.6 GHz), so there's probably slightly more headroom here than the htop output suggests, but it still seems like a very high sirq load.
I personally run an x86 box (i7-6700T) with the performance governor all the time: the additional power consumption is not detectable, and the CPU doesn't need to spend time switching frequencies.
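Setting that up is a one-liner over sysfs (a sketch; the change resets at reboot, so on OpenWrt you'd make it persistent via an init script or /etc/rc.local):

```shell
# Switch every core to the performance cpufreq governor.
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
  echo performance > "$g"
done
```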