OK, to follow up: I figured out how to enable Receive Packet Steering (RPS) and turned it on for my ethernet devices:
echo 2 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
so that CPU1 handles received packets on eth1 (the USB NIC) and CPU0 handles received packets on eth0.
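For anyone following along: rps_cpus takes a hexadecimal CPU bitmask, where bit n selects CPU n (1 = CPU0, 2 = CPU1, 4 = CPU2, 8 = CPU3). So, for example, to steer an interface's receive processing onto CPU2 and CPU3 instead, the mask would be c:

echo c > /sys/class/net/eth1/queues/rx-0/rps_cpus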
The result was 65% CPU idle, with one ksoftirqd thread (ksoftirqd/2, oddly) using 100% of a core, and 35.5% softirq shown in top -d 1.
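To see which CPUs that softirq time actually lands on, a per-CPU view helps; assuming the sysstat package is installed:

mpstat -P ALL 1

(pressing 1 inside top gives a similar per-CPU breakdown)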
Bandwidth is now ~830 Mbps:
[ 5] 5.00-6.00 sec 99.2 MBytes 832 Mbits/sec
[ 5] 6.00-7.00 sec 99.0 MBytes 830 Mbits/sec
[ 5] 7.00-8.00 sec 99.0 MBytes 830 Mbits/sec
[ 5] 8.00-9.00 sec 99.0 MBytes 830 Mbits/sec
So that feels like progress...
I bumped this to 860 Mbps or so by adjusting the interrupt affinity for eth0, and briefly saw about 920 at one point, so the potential to route at line speed is there. At no point does the device go below about 50% idle, so there are basically two cores sitting unused even while routing almost a full gigabit.
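The affinity adjustment was something along these lines (a sketch, not my exact commands; IRQ 48 is eth0's busy interrupt in the /proc/interrupts dump below):

echo 2 > /proc/irq/48/smp_affinity    # mask 0x2 = CPU1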
EDIT: further info.
I installed the simple nftables firewall from my other thread "QoS and nftables ... some findings to share", and added a custom HFSC-based shaper on both eth0 and eth1 that I have used before. With those in place, the Pi will route 575 Mbps, showing 67% idle and 34% softirq.
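The shaper is roughly this shape (a minimal sketch, not my actual script; the rate and interface are placeholders):

tc qdisc add dev eth0 root handle 1: hfsc default 10
tc class add dev eth0 parent 1: classid 1:10 hfsc sc rate 500mbit ul rate 500mbit
tc qdisc add dev eth0 parent 1:10 fq_codel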
The big issue seems to be a failure to multi-thread the packet handling. I would have thought the two ethernet devices would split across two CPUs and we'd see higher throughput, but apparently not.
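One thing worth checking is how many hardware queues each NIC exposes; if an interface has only a single RX queue, its interrupt can only fire on one CPU at a time, and RPS is the software workaround:

ls /sys/class/net/eth0/queues/
# a single rx-0 entry means one RX queue, hence one CPU taking that IRQ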
Here is the /proc/interrupts output:
           CPU0       CPU1       CPU2       CPU3
 17:          0          0          0          0  GICv2  29 Level  arch_timer
 18:      87012    5643314      31811      46223  GICv2  30 Level  arch_timer
 23:        339          0          0          0  GICv2 114 Level  DMA IRQ
 31:       3560          0          0          0  GICv2  65 Level  fe00b880.mailbox
 34:       6554          0          0          0  GICv2 153 Level  uart-pl011
 37:          0          0          0          0  GICv2  72 Level  dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb3
 38:          0          0          0          0  GICv2 169 Level  brcmstb_thermal
 39:      25015          0          0          0  GICv2 158 Level  mmc1, mmc0
 45:          0          0          0          0  GICv2 106 Level  v3d
 47:    4251576          0          0          0  GICv2 189 Level  eth0
 48:   20076230          0          0          0  GICv2 190 Level  eth0
 54:         51          0          0          0  GICv2  66 Level  VCHIQ doorbell
 55:          0          0          0          0  GICv2 175 Level  PCIe PME, aerdrv
 56:    8839918          0          0          0  Brcm_MSI 524288 Edge  xhci_hcd
FIQ:                                              usb_fiq
IPI0:         0          0          0          0  CPU wakeup interrupts
IPI1:         0          0          0          0  Timer broadcast interrupts
IPI2:      9669      13473     199712      95848  Rescheduling interrupts
IPI3:       474       2470       1309       1222  Function call interrupts
IPI4:         0          0          0          0  CPU stop interrupts
IPI5:     27560       1708       1753       2936  IRQ work interrupts
IPI6:          0          0          0          0  completion interrupts
Err:          0
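Notably, all the hot sources (eth0's IRQs 47/48 and the USB controller's IRQ 56) land on CPU0. A quick way to check where each of them is currently allowed to run:

for irq in 47 48 56; do
    printf '%s: ' $irq; cat /proc/irq/$irq/smp_affinity_list
done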
I wonder if someone like @moeller0 has an idea (or knows someone who does) about how to improve the interrupt handling and speed up the shaping, etc.