A WireGuard comparison DB

I somehow missed the N100 in the table, thanks for pointing it out.
BTW, I found some reports on Google saying the J4125 can't handle 1 Gbps WireGuard (though they were running OPNsense, so I'm not sure if that's relevant).

For some newer Intel CPUs this could be a bit tricky: pfSense/OPNsense might not handle the dynamic clocking very well. I've seen people force performance mode and their WireGuard results jump up quickly. I also noticed that older OPNsense versions ran WireGuard in userspace, which is slower than the preferred kernel mode.

But of course, the test in this thread skips things that would happen in the real world (as already mentioned at the beginning), like WAN-to-LAN traffic. Handling that needs extra hardware, and if it's not server grade, more CPU may be consumed. So I wouldn't be surprised if the actual build ends up slower than what you see here.

Looking at some real-world test results, devices like the RPi 4B and NanoPi R4S are still capable of >800 Mbps WireGuard speeds, which already impresses me. I'd guess the J4125 probably won't deliver the full 1 Gbps with WAN-to-LAN traffic in place, but it shouldn't be too far off.


Just for comparison's sake, I ran it directly on Proxmox:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.95 GBytes  4.25 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  4.94 GBytes  4.25 Gbits/sec                  receiver

Your VM is configured with i440FX, which looks pretty old. I think there's a newer Q35 machine type? Would that one be faster?
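
If you want to try it, switching an existing VM's machine type should be a one-line change on the Proxmox host (VM ID 100 is just an example; double-check that NIC and passthrough settings still apply afterwards):

qm set 100 --machine q35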


Kernel 6.6 has been available in some snapshots for a week now.


Yep, turns out it is a limitation of OPNsense (or FreeBSD in general). (Relevant thread from the OPNsense forum)

I guess I will install OpenWrt on my mini PC now...

Is there a drop in speed for real world usage? Or just during this artificial testing?

In theory, if WireGuard is slower in this test then it could be slower for real-world usage too. I don't have gigabit internet to prove that, though.

Either way, the benchmark proves that something isn't quite right with kernel 6.6.

That's doing a lot of heavy lifting. This 'benchmark' is questionable at best when it comes to assessing real world performance.

It proves that something might have changed. But given this benchmark relies on an artificial environment and adds a number of (significant) overheads that wouldn't be present in a real world scenario it's difficult to definitively say the kernel change is problematic.

This issue requires investigation though, since the results show that kernel 6.6 is slower than both 6.1 and 5.15 with this test. And since nobody knows why it's slower, we can't rule out that it will cause issues elsewhere.

Right now I'd just like to know if things are worse for lower end MediaTek devices and if it affects other targets.

When I've got a bit more time I plan on rolling back to when 6.6 was first added as an option, just to see if the regression has been there from the start. And that type of information is useful in a bug report.
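
One way to find that starting point (assuming the bump is done via KERNEL_PATCHVER in target/linux/mediatek/Makefile, which is how OpenWrt targets usually declare their kernel version) is a pickaxe search in the source tree:

git log -S 'KERNEL_PATCHVER:=6.6' --oneline -- target/linux/mediatek/Makefile
# the commit just before the first hit should be the last pre-6.6 revision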

It'd probably be more useful to determine whether it has any impact in the real world. How high up on the list of work do you think a bug report about a slowdown in a synthetic benchmark (which is questionable about what it tests as it is) is going to be?

Not to say don't do whatever testing with the benchmark you think is appropriate, but you may want to temper your expectations around what may come of it.

I disagree, since with real-world usage you have to factor in the reliability of your chosen VPN and how good your own connection is. But of course if I had gigabit internet then I'd perform speed tests while using WireGuard and OpenVPN.

I think it'd be foolish to rule out issues simply because you might only see them when running a synthetic benchmark. Especially when we don't know if there's another way to see the regression, and historically this benchmark has given somewhat consistent results.

At least if I report the issue with plenty of data then it'll be investigated and we'll either be told why it's slower with kernel 6.6 or a bug might be patched.

I have a J4125 mini PC running Proxmox 8.2.2.

It can run ~900 Mbps with WireGuard. htop shows that every core of the J4125 is 30%-40% loaded when I run iperf3 against the mini PC (through the WireGuard tunnel). Your mileage might vary.
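
In case anyone wants to repeat that kind of check, it's just plain iperf3 through the tunnel (10.0.0.1 below is a placeholder for the mini PC's WireGuard address on my setup):

iperf3 -s                    # on the mini PC
iperf3 -c 10.0.0.1 -t 30     # from a WireGuard client; add -R for the other direction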


Nice to see this. I also found a pull request for x86 to switch to Linux 6.6.
Not sure if Raspberry Pi will get this as well, since the official Raspberry Pi OS already introduced 6.6 two months ago.

I have gigabit internet and would be happy to do real-world testing, but I've always considered WireGuard the most convoluted, confusing, and pointless thing to set up, so I never have. I just connect to a VPN on my PC if I need it and run this synthetic benchmark for tests. If anyone has a sensible guide, let me know; the wiki is such a mess for it that I won't go near it lol.
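
For what it's worth, a minimal client setup on OpenWrt can be done from the CLI. A sketch (keys, addresses, and endpoint are placeholders you'd get from your provider or peer, and wg0 still needs to be added to a firewall zone afterwards):

opkg update && opkg install wireguard-tools luci-proto-wireguard
uci set network.wg0=interface
uci set network.wg0.proto='wireguard'
uci set network.wg0.private_key='<your-private-key>'
uci add_list network.wg0.addresses='10.0.0.2/32'
uci add network wireguard_wg0                          # peer section for the wg0 interface
uci set network.@wireguard_wg0[-1].public_key='<peer-public-key>'
uci set network.@wireguard_wg0[-1].endpoint_host='vpn.example.com'
uci set network.@wireguard_wg0[-1].endpoint_port='51820'
uci add_list network.@wireguard_wg0[-1].allowed_ips='0.0.0.0/0'
uci set network.@wireguard_wg0[-1].route_allowed_ips='1'
uci commit network && /etc/init.d/network restart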

Just out of curiosity, how can flushing the nft ruleset increase performance by 20%-30%!?
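
For reference, the flush in question is presumably just the nftables CLI (it wipes all firewall rules, so only do it in a throwaway test setup):

nft list ruleset > /tmp/nft-backup.conf   # save the current rules first
nft flush ruleset                         # remove them all
# ... run the benchmark, then restore with: nft -f /tmp/nft-backup.conf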

Finally resolved this error. It was due to missing NETNS support in my OpenWrt custom build. Look here for more info.
The solution is to enable NETNS in the build; OpenWrt has this option enabled by default.
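
If anyone else runs a custom build, a quick sanity check of the build .config (option names assumed from OpenWrt's config/Config-kernel.in) looks like:

grep -E 'KERNEL_NAMESPACES|KERNEL_NET_NS' .config
# expect CONFIG_KERNEL_NAMESPACES=y and CONFIG_KERNEL_NET_NS=y; without network
# namespaces, "ip netns add" fails on the device and wg-bench can't set up its test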


Does that mean the NSS build is missing something?

No, the NSS build isn't missing anything.
By default it is enabled in OpenWrt menuconfig.
I purposely disabled the setting to try other things, not knowing the consequences for wg-bench.
Now everything is clear.

With the latest snapshot on the BPI-R4:

root@R4:~/wg-bench# ubus call system board
{
	"kernel": "6.6.30",
	"hostname": "R4",
	"system": "ARMv8 Processor rev 0",
	"model": "Bananapi BPI-R4",
	"board_name": "bananapi,bpi-r4",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "SNAPSHOT",
		"revision": "r26302-4f87a4d84f",
		"target": "mediatek/filogic",
		"description": "OpenWrt SNAPSHOT r26302-4f87a4d84f"
	}
}
root@R4:~/wg-bench# ./benchmark.sh 
Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 37570 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   138 MBytes  1.16 Gbits/sec    0    902 KBytes       
[  5]   1.00-2.00   sec   134 MBytes  1.12 Gbits/sec    0    902 KBytes       
[  5]   2.00-3.00   sec   137 MBytes  1.15 Gbits/sec    0    902 KBytes       
[  5]   3.00-4.00   sec   134 MBytes  1.13 Gbits/sec    0    902 KBytes       
[  5]   4.00-5.00   sec   135 MBytes  1.13 Gbits/sec    0    902 KBytes       
[  5]   5.00-6.00   sec   134 MBytes  1.12 Gbits/sec    0    902 KBytes       
[  5]   6.00-7.00   sec   135 MBytes  1.14 Gbits/sec    0   1014 KBytes       
[  5]   7.00-8.00   sec   134 MBytes  1.13 Gbits/sec    0   1014 KBytes       
[  5]   8.00-9.00   sec   135 MBytes  1.13 Gbits/sec    0   1.30 MBytes       
[  5]   9.00-10.00  sec   135 MBytes  1.13 Gbits/sec    0   1.30 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.32 GBytes  1.13 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.32 GBytes  1.13 Gbits/sec                  receiver

iperf Done.
root@R4:~/wg-bench# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[  5] local 169.254.200.1 port 56548 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   127 MBytes  1.07 Gbits/sec                  
[  5]   1.00-2.00   sec   127 MBytes  1.07 Gbits/sec                  
[  5]   2.00-3.00   sec   126 MBytes  1.06 Gbits/sec                  
[  5]   3.00-4.00   sec   128 MBytes  1.07 Gbits/sec                  
[  5]   4.00-5.00   sec   126 MBytes  1.06 Gbits/sec                  
[  5]   5.00-6.00   sec   126 MBytes  1.06 Gbits/sec                  
[  5]   6.00-7.00   sec   127 MBytes  1.07 Gbits/sec                  
[  5]   7.00-8.00   sec   128 MBytes  1.07 Gbits/sec                  
[  5]   8.00-9.00   sec   128 MBytes  1.08 Gbits/sec                  
[  5]   9.00-10.00  sec   126 MBytes  1.05 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.24 GBytes  1.07 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.24 GBytes  1.07 Gbits/sec                  receiver

iperf Done.

Then I found that forcing the CPU governor to performance mode gives a significant jump:
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
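
If the SoC exposes more than one cpufreq policy, the same can be applied to all of them in one go (a small sketch; on this board there may only be policy0):

for p in /sys/devices/system/cpu/cpufreq/policy*/scaling_governor; do echo performance > "$p"; done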

root@R4:~/wg-bench# ./benchmark.sh 
Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 44072 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   156 MBytes  1.31 Gbits/sec    0    716 KBytes       
[  5]   1.00-2.00   sec   152 MBytes  1.28 Gbits/sec    0    716 KBytes       
[  5]   2.00-3.00   sec   160 MBytes  1.34 Gbits/sec    0   1.22 MBytes       
[  5]   3.00-4.00   sec   154 MBytes  1.29 Gbits/sec    0   1.22 MBytes       
[  5]   4.00-5.00   sec   154 MBytes  1.29 Gbits/sec    0   1.22 MBytes       
[  5]   5.00-6.00   sec   154 MBytes  1.30 Gbits/sec    0   1.22 MBytes       
[  5]   6.00-7.00   sec   154 MBytes  1.29 Gbits/sec    0   1.22 MBytes       
[  5]   7.00-8.00   sec   156 MBytes  1.31 Gbits/sec    0   1.34 MBytes       
[  5]   8.00-9.00   sec   154 MBytes  1.30 Gbits/sec    0   1.34 MBytes       
[  5]   9.00-10.00  sec   152 MBytes  1.27 Gbits/sec    0   1.34 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.51 GBytes  1.30 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.51 GBytes  1.29 Gbits/sec                  receiver

iperf Done.
root@R4:~/wg-bench# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[  5] local 169.254.200.1 port 45368 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   145 MBytes  1.21 Gbits/sec                  
[  5]   1.00-2.00   sec   145 MBytes  1.22 Gbits/sec                  
[  5]   2.00-3.00   sec   145 MBytes  1.21 Gbits/sec                  
[  5]   3.00-4.00   sec   144 MBytes  1.21 Gbits/sec                  
[  5]   4.00-5.00   sec   144 MBytes  1.21 Gbits/sec                  
[  5]   5.00-6.00   sec   146 MBytes  1.23 Gbits/sec                  
[  5]   6.00-7.00   sec   145 MBytes  1.21 Gbits/sec                  
[  5]   7.00-8.00   sec   145 MBytes  1.22 Gbits/sec                  
[  5]   8.00-9.00   sec   145 MBytes  1.22 Gbits/sec                  
[  5]   9.00-10.00  sec   144 MBytes  1.21 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.42 GBytes  1.22 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.42 GBytes  1.22 Gbits/sec                  receiver

iperf Done.