A WireGuard comparison DB

|NanoPi R5C | RK3568B2 (Quad Core A55, 2.0GHz) | 24.10rc4 | 339 Mbps|

ubus call system board

{
        "kernel": "6.6.67",
        "hostname": "R5C",
        "system": "ARMv8 Processor rev 0",
        "model": "FriendlyElec NanoPi R5C",
        "board_name": "friendlyarm,nanopi-r5c",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "24.10.0-rc4",
                "revision": "r28211-d55754ce0d",
                "target": "rockchip/armv8",
                "description": "OpenWrt 24.10.0-rc4 r28211-d55754ce0d",
                "builddate": "1734915335"
        }
}

./benchmark.sh

Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 35312 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  43.8 MBytes   367 Mbits/sec    0    331 KBytes
[  5]   1.00-2.00   sec  42.9 MBytes   360 Mbits/sec    0    346 KBytes
[  5]   2.00-3.00   sec  44.9 MBytes   376 Mbits/sec    0    363 KBytes
[  5]   3.00-4.00   sec  43.9 MBytes   368 Mbits/sec    0    363 KBytes
[  5]   4.00-5.00   sec  42.8 MBytes   359 Mbits/sec    0    363 KBytes
[  5]   5.00-6.00   sec  43.8 MBytes   367 Mbits/sec    0    363 KBytes
[  5]   6.00-7.00   sec  44.1 MBytes   370 Mbits/sec    0    363 KBytes
[  5]   7.00-8.00   sec  42.6 MBytes   358 Mbits/sec    0    363 KBytes
[  5]   8.00-9.00   sec  44.0 MBytes   369 Mbits/sec    0    379 KBytes
[  5]   9.00-10.00  sec  42.9 MBytes   360 Mbits/sec    0    379 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   436 MBytes   366 Mbits/sec    0             sender
[  5]   0.00-10.01  sec   435 MBytes   365 Mbits/sec                  receiver

iperf Done.

./benchmark.sh -R

Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[  5] local 169.254.200.1 port 59536 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  40.6 MBytes   340 Mbits/sec
[  5]   1.00-2.00   sec  39.8 MBytes   333 Mbits/sec
[  5]   2.00-3.00   sec  40.6 MBytes   341 Mbits/sec
[  5]   3.00-4.00   sec  40.1 MBytes   337 Mbits/sec
[  5]   4.00-5.00   sec  40.1 MBytes   337 Mbits/sec
[  5]   5.00-6.00   sec  42.1 MBytes   353 Mbits/sec
[  5]   6.00-7.00   sec  41.0 MBytes   344 Mbits/sec
[  5]   7.00-8.00   sec  40.5 MBytes   340 Mbits/sec
[  5]   8.00-9.00   sec  39.9 MBytes   335 Mbits/sec
[  5]   9.00-10.00  sec  39.1 MBytes   328 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   405 MBytes   340 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   404 MBytes   339 Mbits/sec                  receiver

iperf Done.
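
For anyone collecting these runs into the comparison table, the two summary lines can be pulled out of a saved log mechanically. This is only a rough Python sketch that assumes the iperf3 text layout shown above; the function name `summary_bitrates` and the regular expression are my own and are not part of wg-bench.

```python
import re

# Matches only the final summary lines, which end in "sender" or "receiver".
SUMMARY_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+([KMG])bits/sec.*\b(sender|receiver)\s*$"
)

# Factors to convert the reported value to Mbit/s.
SCALE_TO_MBPS = {"K": 1e-3, "M": 1.0, "G": 1e3}


def summary_bitrates(output):
    """Return {"sender": Mbit/s, "receiver": Mbit/s} from iperf3 text output."""
    rates = {}
    for line in output.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            value, prefix, role = match.groups()
            rates[role] = float(value) * SCALE_TO_MBPS[prefix]
    return rates


# Summary lines copied from the reverse-mode run above.
sample = """\
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 405 MBytes 340 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 404 MBytes 339 Mbits/sec receiver
"""

print(summary_bitrates(sample))  # {'sender': 340.0, 'receiver': 339.0}
```

The receiver figure from the reverse run (339 Mbps) is the one that appears in the table row at the top of this post.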

Edit/Comment:
R5C performance relative to other hardware with comparable specifications is very underwhelming, just like with the R2S.

@1715173329 this issue, which affects multiple Rockchip targets (here for the R3S), looks like the culprit.

Yes, I can confirm your observations. My R2S/R4S/R6S all give me very low numbers, which contradicts some YouTubers' real-world tests (for example, the R4S was tested at ~800 Mbps, while the R6S can reach ~1 Gbps with FriendlyElec's build). At first I thought it was a CPU affinity issue, but I wasn't able to fix it by changing CPU affinity. The author of wg-bench also noticed that this issue happens on big.LITTLE ARM platforms, so yes, it could be a kernel bug.
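
For reference, pinning a user-space process to chosen cores looks roughly like the sketch below. It uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which are Linux-only; the helper name `pin_to_cpus` is made up for illustration, and which CPU numbers belong to which cluster is board-specific and not assumed here. Keep in mind that this only moves the calling process and anything it launches. As far as I know, WireGuard does its encryption in kernel worker threads, which this does not touch, so that may be part of why affinity changes made no difference.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the calling process to the given CPU numbers (Linux only).

    Returns the affinity set that is actually in effect afterwards.
    """
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    target = set(cpus) & allowed        # drop CPUs that are not available
    if not target:
        raise ValueError(
            f"none of {sorted(cpus)} is in the allowed set {sorted(allowed)}"
        )
    os.sched_setaffinity(0, target)     # pid 0 means "this process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Hypothetical choice: pin to the lowest-numbered CPU we are allowed on.
    first = min(os.sched_getaffinity(0))
    print(pin_to_cpus([first]))
```

Child processes inherit the affinity mask, so a benchmark started from the same process after the call would run on the selected cores.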

Thanks for the GitHub issue link; I just added my comments there as well.

tbh I'm not interested in WireGuard, and the issue is more likely WireGuard-specific. Anyway the Kconfig change looks lovely, feel free to create a PR.

This thread is WireGuard-related, but my R5C tops out around 410 Mbps with CAKE, which also seems a bit slow for a quad-core A55 at 2 GHz. But regardless, interest is noted.

On the NanoPi R6S, if you just run the wg-bench test on the FriendlyWrt firmware, the result is about 1.04 Gbits/sec.
But if, after starting the test with ./benchmark.sh -R, you use the LuCI interface at the same time (open and switch tabs, or switch to the live graphs of processor load or temperature), the result is very surprising: almost 3.5 Gbits/sec!
I repeated this experiment several times on the FriendlyWrt-2024-12-09 firmware -> https://github.com/friendlyarm/Actions-FriendlyWrt/releases
Try to reproduce the test conditions I described.
Obviously, this is the same kind of kernel configuration error as in OpenWrt, only with a different kernel configuration flag -> CONFIG_PREEMPT_VOLUNTARY=y.
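
To see which preemption model a given kernel was built with, something like the following Python sketch can be used. It only reports which options are set and does not show that CONFIG_PREEMPT_VOLUNTARY is actually the cause. The function names are hypothetical, and reading `/proc/config.gz` assumes the kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case on a given firmware.

```python
import gzip
from pathlib import Path

# Preemption-model options to look for in a kernel .config.
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
    "CONFIG_PREEMPT_RT",
    "CONFIG_PREEMPT_DYNAMIC",
)


def enabled_preempt_options(config_text):
    """Return the preemption-model options that are set to 'y'."""
    enabled = {
        line.strip()[:-2]                      # strip the trailing "=y"
        for line in config_text.splitlines()
        if line.strip().endswith("=y")
    }
    return [opt for opt in PREEMPT_OPTIONS if opt in enabled]


def running_kernel_config(path="/proc/config.gz"):
    """Return the running kernel's config text, or None if it is not exposed."""
    config_file = Path(path)
    if not config_file.is_file():
        return None
    with gzip.open(config_file, "rt") as handle:
        return handle.read()


# Made-up config fragment for illustration, not taken from any firmware.
sample = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_NOTIFIERS=y
"""

print(enabled_preempt_options(sample))  # ['CONFIG_PREEMPT_VOLUNTARY']
```

On a live system the input would be `running_kernel_config()` instead of the sample string, provided it does not return None.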

P.S. I recorded a video of my computer screen. The file is small; you can download it within 6 days here -> https://dropmefiles.com/o9Fh6
or here (4 days) -> https://drop.chapril.org/download/051b47806e086991/#xhlD5xocnd6c8oI7YFwpvQ

Maybe this is a consequence of the R6S's big.LITTLE CPU setup?
Quad-core ARM Cortex-A76 (up to 2.4 GHz) and quad-core Cortex-A55 (up to 1.8 GHz)
So maybe it depends on which CPUs WireGuard ends up running on?

I did not notice any change in how the load is spread across the CPU cores. It is not as if only one cluster is working: all 8 cores are loaded, which can be seen in the load graphs. The only difference is that when the test gives 1 Gbit/s the cores are not heavily loaded, and when the result is 3.5 Gbit/s the core load is about 80%.

I do wonder if this is core-clock/memory-subsystem scaling at work, or sub-optimal memory saturation causing a bottleneck at that transaction insertion rate.

For example, on one RPi3 that I use as a remote node, to get Tailscale/WireGuard up to the ~100 Mbit line speed on such a cheap device, I cap the CPU min and max frequencies between 900 MHz and 1.2 GHz, and also keep the DDR3 memory scaled up and overclocked via config.txt.

If I don't peg the DDR and uncore/memory frequency, I see 100% CPU utilization with lower throughput. That is because the CPU stalls at the bottleneck waiting for data, which keeps it 'loaded' without making forward progress at the same rate.

Since your R6S has a far newer microarchitecture, it might not be apples to apples, but I am certain there has to be a frequency sweet spot where you could raise the floor and see a consistent 3.5 Gbit WG rate at 80% OS CPU utilization. Opening the GUI and using LuCI might be causing those clocks to bump up naturally, but that leaves things to chance.
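
One way to check the clock theory would be to sample the cpufreq values while the benchmark runs, once with LuCI idle and once while clicking around in it. Below is a rough Python sketch that reads the standard Linux cpufreq sysfs files (values are in kHz). The function name `read_cpufreq` is my own, and whether a given board exposes these files depends on its cpufreq driver.

```python
from pathlib import Path

# Standard per-CPU cpufreq attributes; frequencies are reported in kHz.
CPUFREQ_FILES = (
    "scaling_governor",
    "scaling_min_freq",
    "scaling_cur_freq",
    "scaling_max_freq",
)


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {attribute: value}} for every CPU with a cpufreq dir."""
    result = {}
    for freq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        entry = {}
        for name in CPUFREQ_FILES:
            attr = freq_dir / name
            if attr.is_file():
                text = attr.read_text().strip()
                # Frequencies are integers; the governor is a plain string.
                entry[name] = int(text) if text.isdigit() else text
        result[freq_dir.parent.name] = entry
    return result


if __name__ == "__main__":
    for cpu, values in read_cpufreq().items():
        print(cpu, values)
```

If `scaling_cur_freq` sits near `scaling_min_freq` during the 1 Gbit/s runs and near `scaling_max_freq` during the 3.5 Gbit/s runs, that would point towards frequency scaling; if both look the same, it would point elsewhere.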

I saw that the fix for the OpenWrt GitHub issue has been accepted and will be backported to 24.10.0.


i5-8400T with a single stick of 2666 MHz memory (single channel) in Proxmox 8.3:

Debian 12.9 (Proxmox 8.3 with test kernel 6.11).

root@coffeelake:~/wg-bench# uname -a
Linux coffeelake 6.11.0-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.11.0-2 (2024-12-04T10:29Z) x86_64 GNU/Linux
root@coffeelake:~/wg-bench# ./benchmark.sh 
Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 36430 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   414 MBytes  3.47 Gbits/sec    0   1.00 MBytes       
[  5]   1.00-2.00   sec   409 MBytes  3.43 Gbits/sec    0   1.08 MBytes       
[  5]   2.00-3.00   sec   408 MBytes  3.42 Gbits/sec    0   1.14 MBytes       
[  5]   3.00-4.00   sec   408 MBytes  3.42 Gbits/sec    0   1.20 MBytes       
[  5]   4.00-5.00   sec   408 MBytes  3.42 Gbits/sec    0   1.20 MBytes       
[  5]   5.00-6.00   sec   406 MBytes  3.41 Gbits/sec    0   1.20 MBytes       
[  5]   6.00-7.00   sec   408 MBytes  3.42 Gbits/sec    0   1.20 MBytes       
[  5]   7.00-8.00   sec   406 MBytes  3.41 Gbits/sec    0   1.20 MBytes       
[  5]   8.00-9.00   sec   406 MBytes  3.41 Gbits/sec    0   1.26 MBytes       
[  5]   9.00-10.00  sec   408 MBytes  3.42 Gbits/sec    0   1.26 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.98 GBytes  3.42 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  3.98 GBytes  3.42 Gbits/sec                  receiver

iperf Done.

With the -R parameter I get 3.1 Gbps.
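
A note on units, since they are easy to mix up: in the output above, the Transfer column is in bytes with 1024-based prefixes, while the Bitrate column is in bits per second with 1000-based prefixes. A small Python sketch of the arithmetic, with a helper name (`bitrate_mbps`) that is mine and not part of iperf3:

```python
# iperf3 prints Transfer with 1024-based byte units.
BYTE_UNITS = {"KBytes": 1024, "MBytes": 1024**2, "GBytes": 1024**3}


def bitrate_mbps(amount, unit, seconds):
    """Convert a Transfer figure over a time span into Mbit/s (10^6 bits/s)."""
    return amount * BYTE_UNITS[unit] * 8 / seconds / 1e6


# 3.98 GBytes in 10 s, as in the summary line above.
print(round(bitrate_mbps(3.98, "GBytes", 10.0) / 1000, 2))  # 3.42 (Gbits/sec)
```

The result only matches the printed bitrate approximately, because the Transfer figure is itself rounded in the output.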

Yes, I saw it.
I hope that was the only problem and we won't have to solve any more. :smiley:

Did you pass all cores to the OpenWrt VM?

I ran the test on the Proxmox host, not in the OpenWrt VM.

I forgot about my little Celeron J4125 mini PC (with dual 1GbE NICs!); time to test it with the latest RC5.

root@OpenWrt:~# ubus call system board
{
        "kernel": "6.6.69",
        "hostname": "OpenWrt",
        "system": "Intel(R) Celeron(R) J4125 CPU @ 2.00GHz",
        "model": "BESSTAR TECH LIMITED GK41",
        "board_name": "besstar-tech-limited-gk41",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "24.10.0-rc5",
                "revision": "r28304-6dacba30a7",
                "target": "x86/64",
                "description": "OpenWrt 24.10.0-rc5 r28304-6dacba30a7",
                "builddate": "1736026537"
        }
}

root@OpenWrt:~# ./benchmark.sh 
Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 37474 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   255 MBytes  2.13 Gbits/sec    0   1.34 MBytes       
[  5]   1.00-2.00   sec   245 MBytes  2.06 Gbits/sec    0   1.34 MBytes       
[  5]   2.00-3.00   sec   246 MBytes  2.06 Gbits/sec    0   1.53 MBytes       
[  5]   3.00-4.00   sec   243 MBytes  2.04 Gbits/sec    0   1.60 MBytes       
[  5]   4.00-5.00   sec   243 MBytes  2.04 Gbits/sec    0   1.60 MBytes       
[  5]   5.00-6.00   sec   244 MBytes  2.05 Gbits/sec    0   1.60 MBytes       
[  5]   6.00-7.00   sec   242 MBytes  2.03 Gbits/sec    0   1.60 MBytes       
[  5]   7.00-8.00   sec   242 MBytes  2.03 Gbits/sec    0   1.60 MBytes       
[  5]   8.00-9.00   sec   246 MBytes  2.06 Gbits/sec    0   1.69 MBytes       
[  5]   9.00-10.00  sec   245 MBytes  2.05 Gbits/sec    0   1.69 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.39 GBytes  2.06 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.39 GBytes  2.05 Gbits/sec                  receiver

iperf Done.
root@OpenWrt:~# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[  5] local 169.254.200.1 port 56082 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   211 MBytes  1.77 Gbits/sec                  
[  5]   1.00-2.00   sec   210 MBytes  1.76 Gbits/sec                  
[  5]   2.00-3.00   sec   212 MBytes  1.78 Gbits/sec                  
[  5]   3.00-4.00   sec   210 MBytes  1.76 Gbits/sec                  
[  5]   4.00-5.00   sec   208 MBytes  1.75 Gbits/sec                  
[  5]   5.00-6.00   sec   208 MBytes  1.75 Gbits/sec                  
[  5]   6.00-7.00   sec   209 MBytes  1.75 Gbits/sec                  
[  5]   7.00-8.00   sec   208 MBytes  1.75 Gbits/sec                  
[  5]   8.00-9.00   sec   208 MBytes  1.75 Gbits/sec                  
[  5]   9.00-10.00  sec   208 MBytes  1.75 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.05 GBytes  1.76 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  2.04 GBytes  1.76 Gbits/sec                  receiver

iperf Done.

Just acquired a PC with some broken ports cheaply (well, I don't mind broken DP port and 1 or 2 USB ports not working).

The HP ProDesk 400 G6 SFF with i3-9100 seems to be on par with the N100 (but you know the power consumption)

root@OpenWrt:~/wg-bench# ubus call system board
{
        "kernel": "6.6.69",
        "hostname": "OpenWrt",
        "system": "Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz",
        "model": "HP HP ProDesk 400 G6 SFF",
        "board_name": "hp-hp-prodesk-400-g6-sff",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "24.10.0-rc5",
                "revision": "r28304-6dacba30a7",
                "target": "x86/64",
                "description": "OpenWrt 24.10.0-rc5 r28304-6dacba30a7",
                "builddate": "1736026537"
        }
}
root@OpenWrt:~/wg-bench# ./benchmark.sh 
Connecting to host 169.254.200.2, port 5201
[  5] local 169.254.200.1 port 45178 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   568 MBytes  4.76 Gbits/sec    0   1.09 MBytes       
[  5]   1.00-2.00   sec   558 MBytes  4.68 Gbits/sec    0   1.09 MBytes       
[  5]   2.00-3.00   sec   559 MBytes  4.69 Gbits/sec    0   1.09 MBytes       
[  5]   3.00-4.00   sec   560 MBytes  4.70 Gbits/sec    0   1.09 MBytes       
[  5]   4.00-5.00   sec   561 MBytes  4.71 Gbits/sec    0   1.09 MBytes       
[  5]   5.00-6.00   sec   559 MBytes  4.69 Gbits/sec    0   1.15 MBytes       
[  5]   6.00-7.00   sec   559 MBytes  4.69 Gbits/sec    0   1.15 MBytes       
[  5]   7.00-8.00   sec   560 MBytes  4.69 Gbits/sec    0   1.22 MBytes       
[  5]   8.00-9.00   sec   560 MBytes  4.70 Gbits/sec    0   1.22 MBytes       
[  5]   9.00-10.00  sec   558 MBytes  4.68 Gbits/sec    0   1.22 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.47 GBytes  4.70 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  5.47 GBytes  4.69 Gbits/sec                  receiver

iperf Done.
root@OpenWrt:~/wg-bench# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[  5] local 169.254.200.1 port 45650 connected to 169.254.200.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   535 MBytes  4.49 Gbits/sec                  
[  5]   1.00-2.00   sec   532 MBytes  4.47 Gbits/sec                  
[  5]   2.00-3.00   sec   532 MBytes  4.46 Gbits/sec                  
[  5]   3.00-4.00   sec   534 MBytes  4.47 Gbits/sec                  
[  5]   4.00-5.00   sec   532 MBytes  4.46 Gbits/sec                  
[  5]   5.00-6.00   sec   532 MBytes  4.46 Gbits/sec                  
[  5]   6.00-7.00   sec   528 MBytes  4.43 Gbits/sec                  
[  5]   7.00-8.00   sec   529 MBytes  4.43 Gbits/sec                  
[  5]   8.00-9.00   sec   528 MBytes  4.43 Gbits/sec                  
[  5]   9.00-10.00  sec   526 MBytes  4.42 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.19 GBytes  4.46 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  5.18 GBytes  4.45 Gbits/sec                  receiver

iperf Done.