Hi OpenWrt developers, I found that with the following configuration I was able to achieve consistently higher throughput on my Raspberry Pi 4 when it is running as a router, without any noticeable change to latency (SQM remained running) & without spreading the load in an odd fashion across all of its cores.
- Enable irqbalance
irqbalance.irqbalance=irqbalance
irqbalance.irqbalance.enabled='1'
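For reference, roughly the shell steps to get there (the usual opkg/uci/init.d dance, so something along these lines should work):
opkg update && opkg install irqbalance
uci set irqbalance.irqbalance.enabled='1'
uci commit irqbalance
/etc/init.d/irqbalance enable
/etc/init.d/irqbalance start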
- Enable packet steering
network.globals.packet_steering='1'
& then modify /usr/libexec/network/packet-steering.uc like so:
  rx_queue ??= "rx-*";
  let queues = glob(`/sys/class/net/${dev}/queues/${rx_queue}/rps_cpus`);
  let val = cpu_mask(cpu);
+ if (dev == "eth1")
+ 	val = 5;
(Ideally I would assign the value of 5 based upon the driver in use instead of matching directly against dev == "eth1", but this was the easiest solution; see the sketch just below.)
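For anyone who wants to try the driver-based match instead, the driver name is visible in sysfs; a quick shell sketch of the idea (prints "r8152" for my wan dongle):
basename "$(readlink /sys/class/net/eth1/device/driver)"
I believe the same lookup could be done inside packet-steering.uc via ucode's fs module, and that is what I would key the override off.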
What this does is enable cores 0 & 2 for eth1 (bit n of the mask = CPU n, so 5 = 0b0101 = CPU0 + CPU2), which in my case is the wan & an r8152 device. I also happen to have another USB Ethernet style device for a backup link (cdc_ether).
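(If you want to experiment before touching the script, the mask can also be set by hand; this assumes the device exposes a single rx-0 queue, which the r8152 does on my box:)
echo 5 > /sys/class/net/eth1/queues/rx-0/rps_cpus
cat /sys/class/net/eth1/queues/rx-0/rps_cpus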
For some reason packet steering improves performance on my RPi 4 router even without any manual tweaking beyond enabling irqbalance ... I think soft & hard IRQs still sometimes tend towards CPU2. Specifying that all cores can handle packets via RPS doesn't seem to help: load is distributed across more cores, but throughput drops. Here is what /proc/interrupts looks like:
CPU0 CPU1 CPU2 CPU3
11: 397818 55838 225440 60740 GICv2 30 Level arch_timer
14: 18034 0 0 0 GICv2 65 Level fe00b880.mailbox
15: 30 0 0 0 GICv2 114 Level DMA IRQ
26: 0 0 0 0 GICv2 175 Level PCIe PME, aerdrv
27: 1956 627533 0 0 GICv2 189 Level eth0
28: 3725 0 0 808164 GICv2 190 Level eth0
29: 593900 0 0 0 BRCM STB PCIe MSI 524288 Edge xhci_hcd
30: 12923 0 0 0 GICv2 158 Level mmc1, mmc0
31: 1 0 0 0 GICv2 66 Level VCHIQ doorbell
32: 10 0 0 0 GICv2 153 Level uart-pl011
IPI0: 2929 3109 3001 3439 Rescheduling interrupts
IPI1: 220321 31515 140398 42191 Function call interrupts
IPI2: 0 0 0 0 CPU stop interrupts
IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts
IPI4: 0 0 0 0 Timer broadcast interrupts
IPI5: 28467 16594 14669 19909 IRQ work interrupts
IPI6: 0 0 0 0 CPU wake-up interrupts
Err: 0
I'll be honest and say that I am not entirely sure whether my setup is "correct", but the idea I had was that:
- CPU0 was for eth1 and eth2 (via USB); the related xhci_hcd IRQ CPU affinity is f, but it mostly seems almost entirely pinned to CPU0
- CPU1 handles eth0 IRQ 27 (GICv2 189 Level)
- CPU3 handles eth0 IRQ 28 (GICv2 190 Level)
- CPU2 handles arch_timer (more so than the other cores) + misc other things
So it made sense to me to add CPU2 to eth1's rps_cpus to spread the RX load for USB receive IRQs (which land on CPU0) across CPU0 & CPU2.
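To check where an IRQ is allowed to run versus where it actually fires, you can compare the affinity files (29 is the xhci_hcd IRQ on my box, adjust to match your own /proc/interrupts; effective_affinity is only present if your kernel exposes it):
cat /proc/irq/29/smp_affinity        # configured mask, f = all four cores
cat /proc/irq/29/effective_affinity  # where it is actually being delivered
grep xhci /proc/interrupts           # per-CPU counts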
Here is what /proc/softirqs looks like for my router -
cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI: 154546 0 0 0
TIMER: 33277 21143 12542 22958
NET_TX: 397203 15010 150211 60152
NET_RX: 579858 832927 282525 1690024
BLOCK: 0 0 0 0
IRQ_POLL: 0 0 0 0
TASKLET: 738565 856 160987 16464
SCHED: 74418 51104 42057 48432
HRTIMER: 0 0 0 0
RCU: 55674 43136 45572 39496
In short, it would be great to be able to specify an overriding value for the CPU mask for a given interface's rx and tx flows when packet steering is enabled; something like the hypothetical config below.
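To be concrete, and to be clear this is purely made-up syntax that does not exist today, I am imagining something like this in /etc/config/network:
config device
	option name 'eth1'
	option rps_cpus '5'    # hypothetical: override mask for rx queues
	option xps_cpus '5'    # hypothetical: override mask for tx queues
packet-steering.uc could then prefer these values over its computed cpu_mask() whenever they are present.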