I recently got myself a Ubiquiti Edgerouter 4 for experimentation. Installing OpenWrt was very easy. So I did that, then threw the router on the workbench to run some tests. And right away, there's something I don't quite understand.
According to its ToH page, the ER-4 has a quad-core OCTEON CN7130 processor running at 1 GHz. On a fresh install, a LAN-to-WAN iperf3 test showed throughput of approximately 600 Mbps. Enabling offloading brought it to about 930 Mbps, which I like to call "the practical Gigabit".
So maybe someone could help me understand why offloading helped so much despite the seemingly ample processor power... Does it have anything to do with how the internal switch is designed? I'd really like to understand why the device functions the way it does...
The slow path is bound mostly by memory copies, assuming the CPU keeps up with the interrupts that move data into memory. The fast path does just one memory copy (changing a few packet fields) instead of the equivalent of 3-4 full copies in the best case of the slow path.
OK, your setup is sane. If you compile your own image, you can include firewall4 master, which removes some brakes in the default firewall paths.
For iperf3, you could try --bidir; the combined throughput should exceed a gigabit, getting close to a gig in and a gig out. On a very weak CPU, that kills any performance expectations.
The main thing to see via htop is whether the CPU load from IRQs (red, network cards) and softirqs (lilac, firewall and qdisc) is evenly balanced between the CPU cores.
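If htop is not installed, the same balance can be read from the kernel's per-CPU counters. The following is only a rough sketch: percpu_counts is a made-up helper name, and it assumes the usual layout of /proc/interrupts and /proc/softirqs (a header row with one column per CPU, then one labelled row per counter).

```shell
# Hypothetical helper: print the per-CPU counts for one row of a
# /proc/interrupts-style file (defaults to /proc/interrupts).
percpu_counts() {
    label="$1"
    file="${2:-/proc/interrupts}"
    awk -v label="$label:" '
        NR == 1 { ncpu = NF; next }   # header row: one field per CPU
        $1 == label {
            for (i = 1; i <= ncpu; i++)
                printf "CPU%d=%s%s", i - 1, $(i + 1), (i < ncpu ? " " : "\n")
        }
    ' "$file"
}
```

For example, percpu_counts 123 prints the counts for IRQ 123, and percpu_counts NET_RX /proc/softirqs prints the receive softirq counts. Running it before and after an iperf3 test shows which cores did the work.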
I think I've changed both settings you mentioned. I've run iperf3 for 600 seconds and noticed that some time during the test, the command with the highest CPU% changed from ksoftirqd/1 to ksoftirqd/3. I don't know if this is significant.
In a minute, I will post another message with a couple of screen dumps I can't quite make sense of...
Note the imbalance in the 123: line... So I thought I'd check the value in /proc/irq/123/smp_affinity:
root@EdgeRouter4:/# cat /proc/irq/123/smp_affinity
f
If I understand correctly, the value in /proc/irq/123/smp_affinity says that interrupt 123 should run on all four cores (1 + 2 + 4 + 8 = 15, aka f in hex), but /proc/interrupts tells me that it actually runs on one core only... Am I reading this right and if so, does it make any sense?
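The bit arithmetic above can be checked with a few lines of shell. This is a minimal sketch: mask_to_cpus is a made-up name, and it assumes a single hex word like the f shown here (comma-separated masks on machines with many CPUs are not handled).

```shell
# Hypothetical helper: decode a hex affinity mask (as read from
# /proc/irq/N/smp_affinity) into a space-separated list of CPU numbers.
mask_to_cpus() {
    mask=$((0x$1))   # hex string -> integer
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Bit n set means CPU n is allowed to handle the interrupt.
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}
```

Calling mask_to_cpus f lists CPUs 0 through 3, matching the 1 + 2 + 4 + 8 reading. The mask only says which cores are allowed to take the interrupt, so it is worth comparing against the actual counts in /proc/interrupts.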
[Later addition]
There's a similar imbalance in the 121: line, so I ran another check:
root@EdgeRouter4:/# cat /proc/irq/121/smp_affinity
f
Added option packet_steering '1' to the network config. There's definitely an improvement (throughput is up to about 750 Mbps compared to 650 before), but still below what I've seen with offloading (930 Mbps).
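For anyone following along, this is roughly what the change looks like in /etc/config/network. I am assuming the option sits in the globals section, which is where it is on my install; any other options already in that section stay as they are.

```
config globals 'globals'
	option packet_steering '1'
```

The network service needs a reload (or the router a reboot) before the setting takes effect.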
Also, when I run service packet_steering status, I get active with no instances. Is this normal?
Yes, there is no process instance; it is just a script that sets the affinity of the network queues to all CPUs in the system.
What does htop show when it runs at max speed, without and with offload?