I got myself an RPi 4, since I've been recommending it to people, and had the following experience routing iperf3 traffic over IPv6 in a test network:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  50.0 MBytes   419 Mbits/sec
[  5]   1.00-2.00   sec  60.6 MBytes   508 Mbits/sec
[  5]   2.00-3.00   sec  60.4 MBytes   506 Mbits/sec
[  5]   3.00-4.00   sec  60.6 MBytes   509 Mbits/sec
[  5]   4.00-5.00   sec  61.3 MBytes   514 Mbits/sec
[  5]   5.00-6.00   sec  61.2 MBytes   514 Mbits/sec
[  5]   6.00-7.00   sec  61.2 MBytes   513 Mbits/sec
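For the record, this came from a plain TCP test (the exact invocation below is from memory, something like iperf3 -6 -c <server>); the per-second bitrates can be averaged straight out of a saved log:

```shell
# Average the per-second Bitrate column (field 7, Mbits/sec) from saved
# iperf3 client output. Operates on the per-interval lines only, skipping
# the final sender/receiver summary lines. Assumes the default iperf3
# text layout shown above.
avg_bitrate() {
  awk '/sec/ && !/sender|receiver/ { sum += $7; n++ } END { printf "%.1f\n", sum / n }' "$@"
}

# Usage (iperf3.log is a hypothetical saved run):
#   iperf3 -6 -c <server> -t 10 | tee iperf3.log
#   avg_bitrate iperf3.log
```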
The setup was as follows:
An AmazonBasics USB 3.0 Ethernet adapter with an ASIX-based chipset:
Jan 27 21:34:30 pitest1 kernel: [ 827.375819] usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
Jan 27 21:34:31 pitest1 kernel: [ 827.412265] usb 2-1: New USB device found, idVendor=0b95, idProduct=1790, bcdDevice= 1.00
Jan 27 21:34:31 pitest1 kernel: [ 827.412273] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jan 27 21:34:31 pitest1 kernel: [ 827.412278] usb 2-1: Product: AX88179
Jan 27 21:34:31 pitest1 kernel: [ 827.412283] usb 2-1: Manufacturer: ASIX Elec. Corp.
The adapter was attached to a laptop, and the Pi's built-in Ethernet was attached to a switch connected to my main network.
I configured IPv6 only to avoid complexity, installed dnsmasq on the Pi to provide router advertisements, and ran the test between a beefy server on my regular network and my laptop. The Pi was running Raspbian with nftables loaded and a couple of empty default tables, so this is pure routing: no firewall rules and no queueing/SQM.
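For anyone wanting to reproduce the setup, the dnsmasq side only needs a few lines. A minimal sketch, assuming the test segment is on eth1 (the interface name is an assumption; adjust for your system):

```
# /etc/dnsmasq.conf sketch: router advertisements / SLAAC only,
# no stateful DHCPv6, on the test-network interface
interface=eth1
enable-ra
dhcp-range=::,constructor:eth1,ra-only
```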
Installing irqbalance on the Pi got me up to this 500+ Mbit/s, from about 400 before.
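irqbalance spreads interrupts across cores automatically, but the same effect can be had by hand if you want to experiment. A sketch under assumptions (the IRQ number and interface name below are hypothetical; check /proc/interrupts on your own box):

```shell
# Hex affinity mask for a single CPU core: core 0 -> 1, core 2 -> 4, ...
cpu_mask() {
  printf '%x\n' $((1 << $1))
}

# Find the IRQ lines for a NIC (interface name is an assumption):
#   grep eth0 /proc/interrupts
# Then pin a hypothetical IRQ 38 to core 2:
#   echo "$(cpu_mask 2)" | sudo tee /proc/irq/38/smp_affinity
```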
During this run, CPU idle was about 80-85% overall, whereas 75% idle would mean one of the four cores was fully saturated. The kernel softirq thread was taking something like 50-60% of one core.
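The softirq load can also be broken down per CPU from /proc/softirqs; sampling the NET_RX row twice shows which core the receive-side work lands on. A sketch, assuming the row layout current kernels use:

```shell
# Sum the per-CPU NET_RX counters from a /proc/softirqs-style file.
# Sampling this twice, a second apart, gives the NET_RX event rate;
# the per-column deltas show which core is doing the receive work.
net_rx_total() {
  grep NET_RX "${1:-/proc/softirqs}" |
    awk '{ for (i = 2; i <= NF; i++) sum += $i } END { print sum }'
}

# Usage on a live system:
#   net_rx_total; sleep 1; net_rx_total
```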
I don't know what's causing it to fall short of full line speed. When I plugged the ASIX USB adapter directly into my laptop and went laptop -> server directly, I got 900+ Mbit/s, so the adapter itself can handle near line speed.
It seems like something is slow about moving packets across USB 3 on the Pi. If that could be debugged, routing and SQM at a full gigabit could well be possible, given that about 80% of the available cycles are unused. Of course, that assumes the load can be split across CPUs, for example with queueing for upload and download on separate cores, and/or a separate core per network interface.
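One knob for that kind of splitting is Receive Packet Steering (RPS), which fans receive processing out to chosen cores via a per-queue CPU bitmask in sysfs. A sketch of what I have in mind (the interface names and core assignments are assumptions):

```shell
# Build a hex CPU bitmask from a list of core numbers, e.g. cores 1 2 3 -> e.
rps_mask() {
  local mask=0 c
  for c in "$@"; do
    mask=$(( mask | (1 << c) ))
  done
  printf '%x\n' "$mask"
}

# Steer receive processing for each interface onto its own core:
#   echo "$(rps_mask 1)" | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
#   echo "$(rps_mask 2)" | sudo tee /sys/class/net/eth1/queues/rx-0/rps_cpus
```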
Any thoughts on what might be causing the slowdown?