Slow bidirectional routing speed on Raspberry Pi 4

TLDR:

  1. If you do not own any hardware yet, consider alternatives to the Raspberry Pi 4 (see posts by @fakemanhk and @M10)
  2. With some tweaks, >= 900 Mbits/second with SQM can be reached on the Raspberry Pi

Has anybody managed to run simultaneous Gbit up- and download with the Raspberry Pi 4?
If so, what is needed to make it happen?

Backstory:

I got symmetrical gigabit fibre, and my trusty Archer C7v2 cannot handle SQM any more.

Following recommendations on the forum, I opted for a Pi 4B with a Realtek USB 3.0 Ethernet dongle.

Before putting it into production, I want to find the optimal SQM speeds. As a test setup, I have two laptops:

  1. one connected to the built-in (LAN) Ethernet port of the Raspberry acting as the iperf3 client (iperf3 -c ip.of.the.server)
  2. the other connected to the USB 3.0 Ethernet dongle (WAN) on the Raspberry acting as the iperf3 server (iperf3 -s).

Testing with iperf3 reveals the following speeds:

Unidirectional (LAN - WAN) (~830 Mbits/sec, 3858 Retransmissions)
Unidirectional (WAN - LAN) (~820 Mbits/sec, 0 Retransmissions)
Bidirectional (LAN - WAN) (~225 Mbits/sec, 473 Retransmissions) iperf3 -c ip.of.the.server --bidir
Bidirectional (WAN - LAN) (~810 Mbits/sec, 0 Retransmissions) iperf3 -c ip.of.the.server --bidir

The way I interpret this is that the Pi 4 is not capable of running full duplex. To make sure my test setup is capable of running full duplex, I connected both laptops directly and got the following values:

Unidirectional (LAN - WAN) (~915 Mbits/sec, 3 Retransmissions)
Unidirectional (WAN - LAN) (~767 Mbits/sec, 0 Retransmissions)
Bidirectional (LAN - WAN) (~912 Mbits/sec, 11 Retransmissions) iperf3 -c ip.of.the.server --bidir
Bidirectional (WAN - LAN) (~792 Mbits/sec, 0 Retransmissions) iperf3 -c ip.of.the.server --bidir

Further information

I have already tried the following optimizations with no success:

  • Overclocking the Pi
  • Using irqbalance
  • Enabling Packet steering
  • Disabling EEE on the Pi for both interfaces

Raspberry info:

Model Raspberry Pi 4 Model B Rev 1.5
Firmware Version OpenWrt 23.05.2 r23630-842932a63d
Kernel Version 5.15.137
USB 3.0 Ethernet Dongle TP-Link UE300

Have you tried changing the "Receive Packet Steering" settings so that the two NICs use different CPU cores? Also check the CPU usage during the bidirectional test.

And meanwhile, I hope you didn't purchase a new RPi4 because of this. As I mentioned in some other posts/forums, the RPi4 solution is kind of outdated these days, and it's not really nice if you want to get the best results with a USB NIC (a CM4-based setup will be better because it uses a PCIe NIC).

For the fun of it, try setting up SQM on the USB interface and try shaper rates of 500, 600, 700, 800 and look at the bidirectional test again... one observation with these USB3 dongles was/is that the USB network stack is full of under-managed buffering and lacks BQL, so a traffic shaper could help with that (but is not guaranteed).
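A sweep like that could be scripted roughly as below. This is a dry-run sketch that only prints the commands. It assumes the SQM config section is named eth1 (check /etc/config/sqm), that the rates above are meant in Mbit/s, and that SQM expects kbit/s; the helper name sqm_cmds is made up.

```shell
#!/bin/sh
# Dry run: print (do not execute) the uci commands for one shaper rate.
# Assumptions: the SQM section is named "eth1"; the argument is in Mbit/s
# and is converted to the kbit/s values that SQM uses.
sqm_cmds() {
    rate_kbit=$(( $1 * 1000 ))
    echo "uci set sqm.eth1.download='$rate_kbit'"
    echo "uci set sqm.eth1.upload='$rate_kbit'"
    echo "uci commit sqm"
    echo "/etc/init.d/sqm restart"
}

for rate in 500 600 700 800; do
    echo "# shaper rate $rate Mbit/s, then rerun: iperf3 -c ip.of.the.server --bidir"
    sqm_cmds "$rate"
done
```

Pasting the printed commands one rate at a time and rerunning the bidirectional test after each step should show whether a given shaper rate keeps the USB queues from filling.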

Thank you both!

I managed to solve it following the suggestion from @fakemanhk.
I also tested Transmit Packet Steering (XPS) but it did not make a difference.
In essence, the important thing is to pin all processing for one interface to the same core and all processing for the second interface to another.

Now I get the following values (with SQM on eth1 set to 980000 in both directions, cake with piece_of_cake, and link layer Ethernet with overhead 44):
Bidirectional (LAN - WAN) (~915 Mbits/sec, 5 Retransmissions) iperf3 -c ip.of.the.server --bidir
Bidirectional (WAN - LAN) (~915 Mbits/sec, 6 Retransmissions) iperf3 -c ip.of.the.server --bidir

The Raspberry has no heatsink, fan or case. The CPU does get warm. In a 5-minute bidirectional iperf3 test @ 21 °C room temperature, the CPU reaches a max of 78.5 °C towards the end. Power consumption averages around 6.5 Watts and peaks at 7 Watts.

I also used a more powerful laptop on the WAN side, which is probably the reason why retransmissions are lower now. On a direct connection between the two, I get ~938 Mbits/sec and 0 Retransmissions.

My setup is as follows:

/boot/config.txt

over_voltage=6
arm_freq=2000
dtparam=eee=off

/etc/rc.local

# turn energy efficient ethernet off for all NICs because it caused problems:
ethtool --set-eee eth0 eee off
ethtool --set-eee eth1 eee off

# set cpu affinity for eth0 (irq35,36) to third core (#2)
echo 4 > /proc/irq/35/smp_affinity
echo 4 > /proc/irq/36/smp_affinity

# receive queues for eth0 to third core (#2) and eth1 to fourth core (#3):
echo 4 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 8 > /sys/class/net/eth1/queues/rx-0/rps_cpus
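The values 4 and 8 above are hexadecimal CPU bitmasks (bit N selects core N), and the IRQ numbers 35 and 36 may differ on other kernels or boards. As a sketch, both can be derived instead of hard-coded. The helper names cpu_mask and irqs_for are made up, and it is an assumption that the interface name appears as the last column of /proc/interrupts for the relevant IRQs.

```shell
#!/bin/sh
# Hex bitmask selecting a single CPU core (core numbers start at 0).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Read /proc/interrupts-style text on stdin and print the IRQ numbers
# whose last column equals the given device name.
irqs_for() {
    awk -v dev="$1" '$NF == dev { sub(/:$/, "", $1); print $1 }'
}

# Example for rc.local (commented out because it writes to /proc and /sys):
#   for irq in $(irqs_for eth0 < /proc/interrupts); do
#       cpu_mask 2 > "/proc/irq/$irq/smp_affinity"
#   done
#   cpu_mask 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
#   cpu_mask 3 > /sys/class/net/eth1/queues/rx-0/rps_cpus
```

Here cpu_mask 2 prints 4 and cpu_mask 3 prints 8, matching the values written by hand above.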

@moeller0 what do you mean by setting SQM on the USB interface? Just eth1 (which is the USB Ethernet dongle)?

Edit: The following quote is no longer true or needed after correctly setting RPS as shown above.

One extra observation I made is that when setting SQM on the LAN interface, this performs better. Does anybody know whether this will have the same effect as having SQM on the WAN interface?

An excellent video explaining the concepts used above can be found here: Network Performance in the Linux Kernel, Getting the most out of the Hardware

Here is some documentation on Receive Packet Steering (RPS):

And here some documentation for Transmit Packet Steering (XPS) in addition:

Yes, that. USB might be over-buffered, and putting better queue management in front of the USB device can help: by never filling the USB queues too much, the bad queue management of USB is simply never exercised.

Partly... you will need to keep in mind that on LAN interfaces interface-ingress is equivalent to internet-upload and interface-egress is internet-download (this is inverted compared to the WAN interface). Other than that, all traffic generated by the router that does not traverse the LAN interface will not be traffic shaped in such a configuration. If there is no such traffic, fine, but if there is, e.g. from WiFi, then the internet shaping should happen on the WAN interface.
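To make the inversion concrete, here is a small sketch of how internet-facing rates map onto SQM's two fields, where "download" shapes the interface's ingress and "upload" its egress. The helper name sqm_fields and the 900000/500000 kbit/s rates are purely illustrative.

```shell
#!/bin/sh
# Map internet-facing rates onto SQM's per-interface fields.
# In SQM, "download" shapes interface ingress and "upload" interface egress.
# Usage: sqm_fields ROLE INTERNET_DOWN_KBIT INTERNET_UP_KBIT   (ROLE: wan | lan)
sqm_fields() {
    case "$1" in
        wan) echo "download=$2 upload=$3" ;;
        lan) echo "download=$3 upload=$2" ;;   # directions are inverted on LAN
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
}

sqm_fields wan 900000 500000   # prints: download=900000 upload=500000
sqm_fields lan 900000 500000   # prints: download=500000 upload=900000
```

With symmetric rates the swap makes no visible difference, which is probably why it is easy to overlook.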

Buy a dedicated router with more than one NIC instead of an SBC meant for watering plants in the garden.
Buying a device with only one NIC and adding crippled USB NICs is a dumb idea because of the problems with those NICs. There are plenty of cheap yet powerful solutions on the market, like retired UTMs, Firewalla, and tiny computers like the Lenovo M920q, where you can put in a dual 10-gig NIC or even a dual 40-gig one and have enough packet processing power to run nicely.

That's the reason why I said "RPi 4 solution is outdated". At the time of release it was actually good: there weren't many good and cheap mini PCs (not to mention ones with dual NICs), nor many other good SBCs. However, we are now seeing a lot of cheap mini PCs and router-purpose SBCs that come with more NICs, so I can't see much value in using an RPi4 as a router now (unless you already own one).

Should the pinned post So you have 500Mbps-1Gbps fiber and need a router READ THIS FIRST in this category, which says

maybe be unpinned or updated based on

Since it was pinned and addressed my exact question, it was an important factor for me to decide on a solution. I saw that it was from 2021 but assumed since it is still pinned, it is still relevant today.

That's the point: it was really a good one in 2021. At that time the other alternative, the NanoPi R4S, wasn't even getting any official release support (it has been supported since 22.03), and there were almost no cheap dual-NIC mini PCs either.

Also, these days there are so many SoCs with built-in network hardware acceleration that you can get very high speed routing/NAT without an extremely high clock rate, like the IPQ807x/Filogic 830/880. Of course, for SQM you still need raw CPU power, but not everyone needs it.

Why? There have always been fans of alternative SBCs, and they always had reasonable arguments, but that does not diminish the fact that the rpi4B has sufficient CPU cycles for routing and traffic shaping at around 1 Gbps. The point is rpi4B plus UE300 does work pretty much as expected with a correct configuration. The article does not push the raspberry as the best SBC solution, just as an example of a viable one, and that IMHO has not changed. But what also did not change is that there are similar ARM based devices around for a similar price point or lower that offer a more complete package of features for a router.

As it stands, however, it is not a "buy this" kind of article and needs to be read as general guidance.

However, you may, like me, dislike hardware acceleration, as it typically comes with a glass jaw: operate just barely outside the accelerator's capabilities and you are dropped back to using the CPUs, and many of these devices come with unimpressive CPUs like the ARM A53*.
That said, I do help out with sqm and as part of the "eat your own dogfood" tradition I tend to run sqm and hence have higher CPU demands than others.

*) Unimpressive for a router, a53 is a plenty fine CPU for what it is, so I do not fault ARM for the design, but the companies putting these into routers.

Exactly - I used to have a RPi4 model B with TP-Link UE300 for about one year but eventually switched to RPi CM4 + DFRobot Routerboard. Reasons:

  • TP-Link UE300 added 1.6ms latency
  • TP-Link UE300 started crashing more and more after some time (maybe I had a bad batch - don't know)

Have you tried telling OpenWrt to use all cores for packet steering?

network/interfaces/global network options.

Packet Steering

Enable packet steering across all CPUs. May help or hinder network speed.
