Ubiquiti Edgerouter 4: what drives performance?

I recently got myself a Ubiquiti Edgerouter 4 for experimentation. Installing OpenWrt was very easy. So I did that, then threw the router on the workbench to run some tests. And right away, there's something I don't quite understand.

According to its ToH page, the ER-4 has a quad-core OCTEON CN7130 processor running at 1 GHz. On a fresh install, a LAN-to-WAN iperf3 test showed throughput of approximately 600 Mbps. Enabling offloading brought it to about 930 Mbps, which I like to call "the practical Gigabit".
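For reference, a back-of-the-envelope sketch of where "the practical Gigabit" comes from. The function name and the overhead figures are my own assumptions (IPv4, TCP with timestamp options, standard Ethernet framing), not something measured on the ER-4:

```shell
# Rough upper bound for TCP goodput on an Ethernet link.
# Assumed per-packet overheads (not taken from the thread):
#   IPv4 header 20 B, TCP header 20 B, TCP timestamp options 12 B,
#   Ethernet on-wire framing 38 B (preamble 8, header 14, FCS 4, inter-frame gap 12).
tcp_goodput_mbps() {
    link_mbps=$1          # raw link speed in Mbps, e.g. 1000
    mtu=${2:-1500}        # IP MTU in bytes, defaults to 1500
    awk -v link="$link_mbps" -v mtu="$mtu" 'BEGIN {
        payload = mtu - 20 - 20 - 12   # TCP payload bytes per packet
        wire    = mtu + 38             # bytes the packet occupies on the wire
        printf "%.1f\n", link * payload / wire
    }'
}

# Example: tcp_goodput_mbps 1000
```

Under those assumptions the ceiling is a bit above 940 Mbps, so the roughly 930 Mbps seen with offloading is close to what the wire can carry.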

So maybe someone could help me understand why offloading helped so much despite the seemingly ample processor power... Does it have anything to do with how the internal switch is designed? I'd really like to understand why the device functions the way it does...

https://forum.openwrt.org/t/is-it-possible-to-add-block-ciphers-for-cavium-octeon/97133
Sooner or later, this question will end up at the same point as that thread anyway.

The slow path is mostly bound by memory copies, provided the CPU keeps up with the interrupts that get the data into memory. The fast path does just one memory copy (changing a few packet fields) instead of the equivalent of 3-4 full copies in the best case of the slow path.

Are you using irqbalance?

If not, it is something you can try

Sounds interesting... Will try it out... Thank you for the suggestion!

Sadly, no performance improvement with irqbalance. Here are my iperf3 test results:

  • Stock setup: 590 Mbps
  • irqbalance enabled: 566 Mbps
  • Offloading enabled: 933 Mbps

I didn't do any configuration of irqbalance though; just installed and enabled it. Were there any specific settings I should have tried?

It would be nice to state the OpenWrt version...
Where is iperf server?

Install htop.
In there, go to F2 (Setup), unhide kernel threads and enable detailed CPU time, then take a screenshot during a speed test.

23.05.3

It's a test bench type of setup. iperf3 client is connected to one of the LAN ports, iperf3 server is connected to the WAN port.

Will do when I get a chance and report back...

OK, your setup is sane :wink: If you compile your own image, you can include firewall4 from master, which should remove some brakes in the default firewall paths.

For iperf3, you could try --bidir; the combined throughput should exceed a gigabit, getting close to a gig in and a gig out. On a very weak CPU, that kills any performance expectations.

The main thing to see via htop is whether CPU load from IRQs (red, netcards) and softirqs (lilac, firewall and qdisc) is evenly balanced between cpu cores.
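If a screenshot is inconvenient, the same balance can be roughly checked from /proc/stat. This is only a sketch: the function name is made up, and it assumes the usual Linux field order on the per-CPU lines (user, nice, system, idle, iowait, irq, softirq, ...):

```shell
# Print cumulative hard-IRQ and softirq time (in clock ticks since boot) per CPU.
# Takes an optional file argument so it can be tried on a saved copy of /proc/stat.
irq_softirq_per_cpu() {
    stat_file=${1:-/proc/stat}
    # Only the per-CPU lines (cpu0, cpu1, ...) are used; the aggregate "cpu" line is skipped.
    awk '/^cpu[0-9]+ / { printf "%s irq=%s softirq=%s\n", $1, $7, $8 }' "$stat_file"
}

# Example: run it before and after an iperf3 run and compare the numbers:
#   irq_softirq_per_cpu
```

The counters are totals since boot, so the difference between two runs is what matters. If one core collects nearly all of the softirq increase during the test, the load is not balanced.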

OK, here's the htop screenshot:

I think I've changed both settings you mentioned. I've run iperf3 for 600 seconds and noticed that some time during the test, the command with the highest CPU% changed from ksoftirqd/1 to ksoftirqd/3. I don't know if this is significant.

In a minute, I will post another message with a couple of screen dumps I can't quite make sense of...

Now, the screen dumps. First, this:

root@EdgeRouter4:/# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  8:     190544     188742     190804     188909      Core      timer
 24:        342        289        288        288       CIU  15  Ethernet
 25:          0          0          0          0       CIU 127  cib
 26:          0          0          0          0       CIU 144  cib
 27:          0          0          0          0       CIU 145  cib
 44:          0          0          0          0       CIB   9  xhci-hcd:usb1
 45:       1441       1437       1437       1436       CIU  34  ttyS0
 73:          3          0          0          0     CIU-W      octeon_wdt
 74:          0          3          0          0     CIU-W      octeon_wdt
 75:          0          0          3          0     CIU-W      octeon_wdt
 76:          0          0          0          3     CIU-W      octeon_wdt
 89:          0          0          0          0       CIU  88  cib
 90:          0          0          0          0       CIU  87  cib
 91:       2129        348        347        347       CIU  83  octeon_mmc
 97:          0          0          0          0       CIU  97  cib
105:       7969       7983       9856       8457     CIU-M      SMP-IPI
117:          0          0          0          0       CIU  45  i2c-octeon
121:        869          0          0          0       CIU  53  Ethernet
123:    1881284          0          0          0       CIU  55  oct_ilm
124:          0          0          0          0       CIU 116  cib
ERR:          0

Note the imbalance in the 123: line... So I thought I'd check the value in /proc/irq/123/smp_affinity:

root@EdgeRouter4:/# cat /proc/irq/123/smp_affinity
f

If I understand correctly, the value in /proc/irq/123/smp_affinity says that interrupt 123 should run on all four cores (1 + 2 + 4 + 8 = 15, aka f in hex), but /proc/interrupts tells me that it actually runs on one core only... Am I reading this right and if so, does it make any sense?
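The mask arithmetic above can be checked with a small helper. This is a sketch with a made-up function name; it assumes a plain hex mask without commas, which is what a four-core box prints:

```shell
# Decode a hex smp_affinity mask into the list of CPU numbers it allows.
mask_to_cpus() {
    mask=$((0x$1))      # e.g. "f" -> 15
    cpu=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Bit N set means CPU N is allowed to handle the interrupt.
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "$out"
}

# Example: mask_to_cpus "$(cat /proc/irq/123/smp_affinity)"
```

As far as I know, the mask only says which cores the interrupt may be delivered to; it does not promise that the interrupt controller will actually spread it across them.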

[Later addition]

There's a similar imbalance in the 121: line, so I ran another check:

root@EdgeRouter4:/# cat /proc/irq/121/smp_affinity
f
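To spot lines like 121: and 123: without reading the whole table, something along these lines could help. It is a sketch only; the function name is hypothetical and it assumes the layout shown in the dump above (a header line with the CPU names, then one line per interrupt):

```shell
# For every numbered IRQ with a non-zero count, print the share handled by the busiest CPU.
# Takes an optional file argument so it can be tried on a saved copy of /proc/interrupts.
irq_imbalance() {
    file=${1:-/proc/interrupts}
    awk '
        NR == 1 { ncpu = NF; next }            # header line: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ {                     # numbered IRQ lines only (skips ERR: etc.)
            total = 0; max = 0
            for (i = 2; i <= ncpu + 1; i++) {  # the per-CPU counters
                total += $i
                if ($i > max) max = $i
            }
            if (total > 0)
                printf "%s %d%% on busiest CPU (total %d)\n", $1, 100 * max / total, total
        }
    ' "$file"
}

# Example: irq_imbalance
```

A value near 25% on a four-core system means the interrupt is spread evenly; 100% means a single core takes all of it.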

Check for packet steering. It should distribute packets across all CPUs

(LuCI / Network / Interfaces / Globals)
or
/etc/config/network:

config globals 'globals'
        option packet_steering '1'

service packet_steering restart

then re-do the tests.
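To see whether the setting took effect, the per-queue steering masks can be listed. This is a sketch under the assumption that packet steering works through the kernel's RPS masks in sysfs (the rps_cpus files); the function name is made up:

```shell
# List the receive packet steering (RPS) CPU mask of every RX queue.
# Takes an optional sysfs root so it can be tried against a copy of the tree.
show_rps() {
    sysnet=${1:-/sys/class/net}
    for f in "$sysnet"/*/queues/rx-*/rps_cpus; do
        [ -e "$f" ] || continue            # the glob matched nothing
        echo "${f#"$sysnet"/}: $(cat "$f")"
    done
}

# Example: show_rps
```

A mask of 0 on a queue means no steering for that queue; a non-zero hex mask lists the CPUs that may process its packets.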

Added option packet_steering '1' to the network config. There's definitely an improvement (throughput is up to about 750 Mbps compared to 650 before), but still below what I've seen with offloading (930 Mbps).

Also, when I run service packet_steering status, I get active with no instances. Is this normal?

Yes, there is no process instance; it is just a script that sets the affinity of the network queues to all CPUs in the system.
What does htop look like when it runs at max speed, without and with offload?

EDIT:
like this

Here's the picture without offloading:

Looks like it all piles onto a single core...

Here's what happens with offloading:

oct_ilm is RNG; you need to move IRQ 122 / Ethernet, which unlike the :smiley: serial :smiley: port :smiley: does not balance across CPUs...
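Moving an interrupt by hand could look roughly like this. It is a sketch with hypothetical helper names; the IRQ number has to be taken from /proc/interrupts on the box itself, and whether the OCTEON interrupt controller honours the new mask is untested here:

```shell
# Hex affinity mask that selects exactly one CPU (CPU 0 -> 1, CPU 2 -> 4, ...).
cpu_to_mask() {
    printf '%x\n' $((1 << $1))
}

# Pin an IRQ to a single CPU by writing the mask to its smp_affinity file.
# The optional third argument replaces /proc, which allows a dry run in a scratch directory.
set_irq_cpu() {
    irq=$1
    cpu=$2
    proc=${3:-/proc}
    cpu_to_mask "$cpu" > "$proc/irq/$irq/smp_affinity"
}

# Example (needs root): move IRQ 121 to CPU 2, then re-check /proc/interrupts
#   set_irq_cpu 121 2
```

The change does not survive a reboot, so it is easy to back out if it makes no difference.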

You can try installing a snapshot briefly:

  • prepare the current image with auc -n -f and download it
  • save a config backup
  • take a snapshot and add htop and luci in the firmware selector
  • upgrade to the snapshot and check the htop picture
  • flash back with sysupgrade, resetting the config, then restore the config via 192.168.1.1

It looks like something very platform specific; the blank squares in htop show that 75% of the CPU is idle while the router is still not able to transfer a gigabit.

OK, I have sysupgraded to snapshot, and here's htop output with offloading still active (throughput 900+ Mbps):

Note high usage of core 0...

Here's the picture without offloading (throughput ~600 Mbps):

I have no idea what's happening... :thinking: