PPPoE and Receive Packet Steering

I have an x86 mini PC (with an i5-4200U CPU) running OpenWrt 23.05 on bare metal. The CPU has two physical cores and four threads.

I have a symmetric 1Gb fibre internet connection, and until recently I was with an ISP who used simple DHCP / IPoE, and I could easily get the full 1Gb download and upload bandwidth.

In February I switched to a different ISP which uses PPPoE, and with default settings my router cannot attain the full 1Gb download speed - it tops out at about 800-850Mbps. The issue appears to be PPPoE encapsulation.

The NICs in my router have two RX and two TX hardware queues, and my (fairly limited) understanding is that the NIC attempts to distribute packets (and the associated interrupts) between the queues based on a hash calculated from certain properties of the packet - this is Receive Side Scaling (RSS). For TCPv4 packets, for example, the hash is calculated from the source and destination IP addresses and ports.

To calculate the hash, the NIC extracts the relevant values from the packet headers at known byte offsets. With my old DHCP ISP this worked as expected, with interrupts for incoming packets being distributed fairly evenly between the RX queues on CPUs 0 and 1.
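
For anyone who wants to check this on their own hardware, the per-queue interrupt counts and the configured hash fields can be inspected like this (ethtool is an optional package on OpenWrt, and queue naming varies by driver):

# Per-queue interrupt counts - run during a download and watch which counters grow
grep eth0 /proc/interrupts

# Which packet fields the NIC hashes for TCPv4 flows
ethtool -n eth0 rx-flow-hash tcp4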

I believe the problem is that PPPoE encapsulation shifts the IPs and port numbers away from the expected offsets in the incoming packets - PPPoE session frames carry EtherType 0x8864 rather than 0x0800, so the NIC presumably doesn't recognise them as IPv4 at all and can't compute a useful hash. The end result appears to be that all of the interrupts (and presumably further processing of the packet) land on one RX queue, on CPU 0, and this CPU doesn't have enough grunt on its own to achieve a gigabit download transfer rate.

Uploads are not affected - packets are distributed between the RX and TX queues on all other physical interfaces as expected, and my upload transfer rate is absolutely fine.

I have been reading around this topic for some time, but I couldn't find a way to address the problem. The offsets used by the NIC's hash function cannot be changed, irqbalance just moves the problem to a different CPU, and enabling packet steering in the OpenWrt GUI doesn't make any noticeable difference. I also tried manually enabling Receive Packet Steering (RPS) on my physical WAN interface (eth0), but this didn't help either.
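
For reference, that manual attempt on eth0 looked something like this (one rps_cpus mask per hardware RX queue):

echo e > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo e > /sys/class/net/eth0/queues/rx-1/rps_cpus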

However, yesterday I finally found something that does seem to help. I manually enabled RPS on the pppoe-wan virtual interface (which has a single RX queue), like this:

echo e > /sys/class/net/pppoe-wan/queues/rx-0/rps_cpus

(CPUs 1, 2, and 3 => binary 1110 => 8 + 4 + 2 = 14 => 0xe)

I think this means that packets on the pppoe-wan interface will be processed on CPUs 1, 2, and 3 - i.e. every CPU apart from CPU 0, which handles the RX interrupts for the underlying WAN hardware device eth0.
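
A quick way to confirm the steering is actually happening (besides htop) is /proc/net/softnet_stat, which has one row per CPU; the first hex field is the number of packets that CPU has processed, so with the 0xe mask the counters on rows 2-4 (CPUs 1-3) should climb during a download:

cat /proc/net/softnet_stat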

With this setting in place I can now achieve the full gigabit transfer rate on downloads (~940Mbps), so this seems to work :tada:

My question is: is this actually the "correct" thing to do in this scenario? I can't find any information in this forum or elsewhere about anyone else having done this, and the script that implements the packet steering option in the OpenWrt GUI (/usr/libexec/network/packet-steering.sh) seems to explicitly ignore virtual interfaces such as pppoe-wan (I don't know why).

If this is the right thing to do, should the OpenWrt packet steering option be updated to do this automatically?

1 Like

Yeah, enabling RPS on the pppoe-wan interface is a smart fix in your case. Your network hardware gets confused by the PPPoE wrapper and dumps all incoming traffic onto one CPU core. Your command tells the kernel to spread the processing of the decapsulated traffic (on the pppoe-wan interface, after the PPPoE wrapper is removed) across your other CPU cores. This bypasses the hardware limitation and lets the other cores help out to reach full gigabit speeds.

It's a reasonable suggestion, yes.

2 Likes

Thanks for the reassurance :+1:

I was just surprised that I couldn't find anyone else describing the same approach. I guess most people are running OpenWrt on devices that handle PPPoE encapsulation in hardware, so it isn't usually a big concern.

I found lots of threads on other forums describing similar issues with pfSense/OPNsense which also run on x86 boxes, but these sources mostly state confidently that PPPoE performance issues like this are specific to FreeBSD, and that the PPPoE implementation on Linux is inherently multi-threaded, which is, at best, not the whole story. I did find a couple of blog posts (example) mentioning enabling RPS for PPPoE connections on Linux, but with no details about which interface to enable it on.

Anyway, if anyone else wants to try the same thing, there's an additional wrinkle because the RPS settings get wiped out whenever the PPPoE connection is re-established, so a hotplug script is required to re-apply the setting when pppoe-wan comes up.
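
A minimal sketch of such a hotplug script (the filename is arbitrary, and it assumes the logical interface is named wan):

# /etc/hotplug.d/iface/99-pppoe-rps
# Re-apply the RPS mask whenever the PPPoE interface comes back up
[ "$ACTION" = ifup ] && [ "$INTERFACE" = wan ] || exit 0
[ -e /sys/class/net/pppoe-wan/queues/rx-0/rps_cpus ] && \
    echo e > /sys/class/net/pppoe-wan/queues/rx-0/rps_cpus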

1 Like

Maybe just modify the script OpenWrt already uses when RPS is enabled so that it includes pppoe-wan? That already has hotplug support, IIRC.

Sidenote: I think there are two scripts to modify - one shell and one uc ...

Yes, that's an option, but having been burned in the past I have a personal rule not to mess with OpenWrt's internal scripts unless I have no other choice :slight_smile:

A simple hotplug script is trivial to implement and it keeps my hackery separate.

2 Likes

For testing that is A-OK; however, assuming that your hack works for others as well, maybe we really want diffs against the canonical scripts so we can offer a patch as a pull request?

2 Likes

Yes, that would be the end goal.

I don't feel confident enough yet to modify that script - I assume there was a good reason for explicitly excluding virtual interfaces. Perhaps PPPoE just wasn't considered at the time, or maybe there's a downside to enabling RPS on pppoe-wan (seems unlikely?).

I'm also wondering whether RPS should be enabled for WireGuard / OpenVPN interfaces as well - as virtual interfaces those too are currently excluded from OpenWrt's global packet steering.
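
The same sysfs knob exists on those interfaces too, so it's easy to experiment - e.g. for a WireGuard interface (assuming it's named wg0):

echo e > /sys/class/net/wg0/queues/rx-0/rps_cpus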

Today I turned off RPS on eth0 and eth1, and it reduced my latency. That makes it better suited for gaming.

For what it's worth, enabling RPS for pppoe-wan on my box has no noticeable effect on latency, but YMMV on different hardware.

Just note that with PPPoE, all traffic on the WAN is going to be a single thread/process...

I'm not sure what you mean. When I enable RPS on pppoe-wan the RX packets on this interface are definitely processed on multiple cores. This can be seen very clearly using htop.

Here's some example htop output (screenshots).

[screenshot: htop without RPS on pppoe-wan]

[screenshot: htop with RPS set to CPUs 1-3 (0xe) on pppoe-wan]

2 Likes

Assuming there is a good reason for the stock packet steering to deliberately ignore pppoe-wan, and that a fix isn't coming anytime soon anyway, where would be a good place to remedy this for the time being? A custom hotplug script targeting pppoe-wan after bringup?

That script, in its last iteration, has been abandoned. The current implementation is in uc and way more elaborate (but still ignores "device-less" virtual devices).

Probably a good time to clean this feature up - there have been more than a few major changes in implementation. For a lot of folks there are obvious benefits, some more than others, and for the most part even single-threaded tasks likely don't suffer...

RPS was originally intended more for endpoints like servers with many cores/threads (and CPU sockets) - and once we add acceleration via HFO (hardware flow offload) for some MediaTek and QC-Atheros switching (and even offloads for WiFi these days), what would the level of benefit be, as the flows are not as "hands on" for the host CPU cores...

Just out of curiosity - what are the Network Interfaces being used?

Intel, Realtek, and Broadcom are fairly common - and even the specific models have their own characteristics. As a recovering pfSense guy, even Intel was somewhat SKU-specific because of errata...

OpenWrt 24.10.0 checks whether /usr/libexec/platform/packet-steering.sh exists and executes it; otherwise it picks /usr/libexec/network/packet-steering.uc. There is no explicit hotplug script for this anymore; instead it uses procd to get re-run on interface changes (required for PPPoE).

It also allows qdiscs to be handled on different CPUs, which is especially relevant if one wants traffic shaping (quite CPU-intensive) for ingress and egress simultaneously; without RPS, OpenWrt will run both traffic shapers on the same CPU, which can easily result in that CPU becoming overloaded.

Intel I211-AT.

There's also the threaded NAPI option to try:
echo 1 > /sys/class/net/[interface]/threaded

I'm not sure if it works with PPPoE interfaces though.
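
One way to tell whether it took effect: threaded NAPI spawns kernel threads named napi/<device>-<id>, which show up in the process list:

ps w | grep napi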

I just gave it a whirl and it looks like it won't work for pppoe-wan:

# echo 1 > /sys/class/net/pppoe-wan/threaded
ash: write error: Not supported

However, I tried enabling it on eth0 instead:

# echo 1 > /sys/class/net/eth0/threaded

And this does seem to do something - CPU usage is spread across cores even when RPS on pppoe-wan is disabled. Curiously, however, it doesn't seem to improve the transfer rate :thinking: