It's a bit surprising: the two CPUs belong to the same family and the J1900 has the higher clock rate, yet its result is slightly slower than the N2930 (mine is on a Jetway NF9HG-N2930 industrial ITX board).
Doesn't work
root@QNAP:~# cd /tmp
root@QNAP:/tmp# ./clean-up.sh
Cannot remove namespace file "/var/run/netns/wg-bench": No such file or directory
Cannot find device "wg-bench"
Cannot find device "wg-bench-wg"
root@QNAP:/tmp# ./setup-netns.sh
Cannot open init namespace: No such file or directory
RTNETLINK answers: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
setting the network namespace "wg-bench" failed: Invalid argument
root@QNAP:/tmp#
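One possible explanation for the "Cannot open init namespace" error, which nobody in this thread has confirmed, is that the QNAP kernel was built without network namespace support (CONFIG_NET_NS). On Linux, the namespace handle `/proc/self/ns/net` is only present when that support is enabled. Below is a minimal sketch of a pre-flight check; the function name `netns_supported` and its `proc_root` parameter are made up for illustration and are not part of the wg-bench scripts.

```python
import os


def netns_supported(proc_root: str = "/proc") -> bool:
    """Return True if the running kernel exposes a network-namespace handle.

    Assumption: a missing <proc_root>/self/ns/net means the kernel lacks
    network namespace support, which would make setup-netns.sh fail.
    """
    return os.path.exists(os.path.join(proc_root, "self", "ns", "net"))


if __name__ == "__main__":
    if netns_supported():
        print("Network namespaces look available.")
    else:
        print("No /proc/self/ns/net: the netns-based benchmark cannot run here.")
```

If the check fails, the benchmark cannot run on that firmware without a kernel that provides namespaces.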
root@OpenWrt:~# ubus call system board
{
"kernel": "6.1.86",
"hostname": "OpenWrt",
"system": "ARMv8 Processor rev 0",
"model": "Bananapi BPI-R4",
"board_name": "bananapi,bpi-r4",
"rootfs_type": "squashfs",
"release": {
"distribution": "OpenWrt",
"version": "SNAPSHOT",
"revision": "r25942-12137cb460",
"target": "mediatek/filogic",
"description": "OpenWrt SNAPSHOT r25942-12137cb460"
}
}
root@OpenWrt:~/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 41866 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 144 MBytes 1.21 Gbits/sec 0 1.58 MBytes
[ 5] 1.00-2.00 sec 140 MBytes 1.17 Gbits/sec 0 1.75 MBytes
[ 5] 2.00-3.00 sec 137 MBytes 1.15 Gbits/sec 0 1.99 MBytes
[ 5] 3.00-4.00 sec 138 MBytes 1.16 Gbits/sec 0 2.09 MBytes
[ 5] 4.00-5.00 sec 136 MBytes 1.14 Gbits/sec 0 2.09 MBytes
[ 5] 5.00-6.00 sec 137 MBytes 1.15 Gbits/sec 0 2.32 MBytes
[ 5] 6.00-7.00 sec 137 MBytes 1.15 Gbits/sec 0 2.45 MBytes
[ 5] 7.00-8.00 sec 137 MBytes 1.15 Gbits/sec 0 2.45 MBytes
[ 5] 8.00-9.00 sec 136 MBytes 1.14 Gbits/sec 0 2.45 MBytes
[ 5] 9.00-10.00 sec 136 MBytes 1.14 Gbits/sec 0 2.45 MBytes
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.35 GBytes 1.16 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.34 GBytes 1.15 Gbits/sec receiver
root@OpenWrt:~/wg-bench# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[ 5] local 169.254.200.1 port 39390 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 147 MBytes 1.23 Gbits/sec
[ 5] 1.00-2.00 sec 149 MBytes 1.25 Gbits/sec
[ 5] 2.00-3.00 sec 147 MBytes 1.23 Gbits/sec
[ 5] 3.00-4.00 sec 149 MBytes 1.25 Gbits/sec
[ 5] 4.00-5.00 sec 149 MBytes 1.25 Gbits/sec
[ 5] 5.00-6.00 sec 148 MBytes 1.25 Gbits/sec
[ 5] 6.00-7.00 sec 148 MBytes 1.24 Gbits/sec
[ 5] 7.00-8.00 sec 146 MBytes 1.23 Gbits/sec
[ 5] 8.00-9.00 sec 151 MBytes 1.26 Gbits/sec
[ 5] 9.00-10.00 sec 151 MBytes 1.26 Gbits/sec
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.45 GBytes 1.25 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.45 GBytes 1.25 Gbits/sec receiver
root@OpenWrt:~#
Now it is better
|fujitsu-futro-s920 | AMD GX-415GA SOC with Radeon(tm) HD Graphics | 23.05.3 | 799 Mbps|
ubus call system board
{
"kernel": "5.15.150",
"hostname": "OpenWrt",
"system": "AMD GX-415GA SOC with Radeon(tm) HD Graphics",
"model": "FUJITSU FUTRO S920",
"board_name": "fujitsu-futro-s920",
"rootfs_type": "ext4",
"release": {
"distribution": "OpenWrt",
"version": "23.05.3",
"revision": "r23809-234f1a2efa",
"target": "x86/64",
"description": "OpenWrt 23.05.3 r23809-234f1a2efa"
}
}
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 94.1 MBytes 789 Mbits/sec 0 502 KBytes
[ 5] 1.00-2.00 sec 95.2 MBytes 799 Mbits/sec 0 538 KBytes
[ 5] 2.00-3.00 sec 96.0 MBytes 805 Mbits/sec 0 573 KBytes
[ 5] 3.00-4.00 sec 95.8 MBytes 803 Mbits/sec 0 573 KBytes
[ 5] 4.00-5.00 sec 94.8 MBytes 795 Mbits/sec 0 573 KBytes
[ 5] 5.00-6.00 sec 95.2 MBytes 799 Mbits/sec 0 573 KBytes
[ 5] 6.00-7.00 sec 94.8 MBytes 795 Mbits/sec 0 573 KBytes
[ 5] 7.00-8.00 sec 95.6 MBytes 802 Mbits/sec 0 573 KBytes
[ 5] 8.00-9.00 sec 96.0 MBytes 805 Mbits/sec 0 573 KBytes
[ 5] 9.00-10.00 sec 95.4 MBytes 800 Mbits/sec 0 573 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 953 MBytes 799 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 952 MBytes 798 Mbits/sec receiver
With a 50% clock rate jump plus 2 more cores, this result is a lot better.
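For anyone collecting results for the table, the figure that gets reported is the bitrate on the final "sender" line of the iperf3 output. Here is a rough sketch of pulling that number out of a pasted log. The helper name `summary_mbps` and the regular expression are my own, and the sketch assumes the plain-text summary format seen in the logs in this thread.

```python
import re

# Matches iperf3 summary lines such as:
#   [ 5] 0.00-10.00 sec 953 MBytes 799 Mbits/sec 0 sender
# Per-second interval lines do not end in "sender"/"receiver" and are skipped.
SUMMARY_RE = re.compile(
    r"(?P<rate>\d+(?:\.\d+)?)\s+(?P<unit>[KMG])bits/sec\b.*\b(?P<role>sender|receiver)\s*$"
)

# Conversion factors from the printed unit to Mbits/sec.
UNIT_TO_MBPS = {"K": 0.001, "M": 1.0, "G": 1000.0}


def summary_mbps(output: str) -> dict:
    """Return {'sender': Mbps, 'receiver': Mbps} parsed from iperf3 text output."""
    result = {}
    for line in output.splitlines():
        match = SUMMARY_RE.search(line)
        if match:
            result[match["role"]] = float(match["rate"]) * UNIT_TO_MBPS[match["unit"]]
    return result


# Summary lines copied from the FUTRO S920 run above.
sample = """[ 5] 0.00-10.00 sec 953 MBytes 799 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 952 MBytes 798 Mbits/sec receiver"""
print(summary_mbps(sample))
```

Normalizing everything to Mbits/sec keeps the Gbits/sec results from faster machines comparable with the slower ones.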
There seems to be a regression with kernel 6.6 and before I report it I'd like to know how widespread the issue is. So could a few people compare a snapshot or two with kernel 6.1 to kernel 6.6?
My images were compiled with the same packages and commits. No settings were changed.
r26156-fb2475e6bd (kernel 6.1.89)
root@GL-MT6000:~/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 55986 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 95.6 MBytes 801 Mbits/sec 0 492 KBytes
[ 5] 1.00-2.00 sec 95.8 MBytes 803 Mbits/sec 0 516 KBytes
[ 5] 2.00-3.00 sec 95.1 MBytes 798 Mbits/sec 0 540 KBytes
[ 5] 3.00-4.00 sec 95.6 MBytes 802 Mbits/sec 0 540 KBytes
[ 5] 4.00-5.00 sec 96.2 MBytes 807 Mbits/sec 0 566 KBytes
[ 5] 5.00-6.00 sec 96.6 MBytes 810 Mbits/sec 0 566 KBytes
[ 5] 6.00-7.00 sec 95.2 MBytes 799 Mbits/sec 0 566 KBytes
[ 5] 7.00-8.00 sec 96.0 MBytes 805 Mbits/sec 0 566 KBytes
[ 5] 8.00-9.00 sec 96.0 MBytes 805 Mbits/sec 0 566 KBytes
[ 5] 9.00-10.00 sec 96.0 MBytes 805 Mbits/sec 0 566 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 958 MBytes 804 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 957 MBytes 803 Mbits/sec receiver
r26156-fb2475e6bd (kernel 6.6.29)
root@GL-MT6000:~/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 40992 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 91.8 MBytes 769 Mbits/sec 0 434 KBytes
[ 5] 1.00-2.00 sec 91.8 MBytes 770 Mbits/sec 0 537 KBytes
[ 5] 2.00-3.00 sec 90.6 MBytes 760 Mbits/sec 0 564 KBytes
[ 5] 3.00-4.00 sec 90.6 MBytes 760 Mbits/sec 0 564 KBytes
[ 5] 4.00-5.00 sec 88.9 MBytes 745 Mbits/sec 0 633 KBytes
[ 5] 5.00-6.00 sec 88.2 MBytes 741 Mbits/sec 0 633 KBytes
[ 5] 6.00-7.00 sec 88.9 MBytes 745 Mbits/sec 0 633 KBytes
[ 5] 7.00-8.00 sec 88.8 MBytes 744 Mbits/sec 0 633 KBytes
[ 5] 8.00-9.00 sec 89.0 MBytes 747 Mbits/sec 0 633 KBytes
[ 5] 9.00-10.00 sec 89.2 MBytes 749 Mbits/sec 0 633 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 898 MBytes 753 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 897 MBytes 752 Mbits/sec receiver
r26199+5-0b0e3e22f8 (kernel 6.1.89)
root@GL-MT6000:~/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 40830 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 95.4 MBytes 799 Mbits/sec 0 481 KBytes
[ 5] 1.00-2.00 sec 96.0 MBytes 805 Mbits/sec 0 502 KBytes
[ 5] 2.00-3.00 sec 95.5 MBytes 801 Mbits/sec 0 502 KBytes
[ 5] 3.00-4.00 sec 94.8 MBytes 795 Mbits/sec 0 529 KBytes
[ 5] 4.00-5.00 sec 96.0 MBytes 805 Mbits/sec 0 529 KBytes
[ 5] 5.00-6.00 sec 95.9 MBytes 804 Mbits/sec 0 529 KBytes
[ 5] 6.00-7.00 sec 95.9 MBytes 804 Mbits/sec 0 553 KBytes
[ 5] 7.00-8.00 sec 97.0 MBytes 814 Mbits/sec 0 553 KBytes
[ 5] 8.00-9.00 sec 96.1 MBytes 807 Mbits/sec 0 553 KBytes
[ 5] 9.00-10.00 sec 96.0 MBytes 805 Mbits/sec 0 553 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 958 MBytes 804 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 957 MBytes 803 Mbits/sec receiver
r26199+5-0b0e3e22f8 (kernel 6.6.30)
root@GL-MT6000:~/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 55376 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 92.9 MBytes 778 Mbits/sec 0 564 KBytes
[ 5] 1.00-2.00 sec 90.0 MBytes 755 Mbits/sec 0 564 KBytes
[ 5] 2.00-3.00 sec 90.5 MBytes 759 Mbits/sec 0 564 KBytes
[ 5] 3.00-4.00 sec 90.5 MBytes 759 Mbits/sec 0 564 KBytes
[ 5] 4.00-5.00 sec 91.5 MBytes 768 Mbits/sec 0 632 KBytes
[ 5] 5.00-6.00 sec 89.1 MBytes 748 Mbits/sec 0 632 KBytes
[ 5] 6.00-7.00 sec 88.5 MBytes 742 Mbits/sec 0 632 KBytes
[ 5] 7.00-8.00 sec 88.0 MBytes 739 Mbits/sec 0 632 KBytes
[ 5] 8.00-9.00 sec 88.4 MBytes 741 Mbits/sec 0 632 KBytes
[ 5] 9.00-10.00 sec 89.0 MBytes 746 Mbits/sec 0 632 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 898 MBytes 753 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 897 MBytes 752 Mbits/sec receiver
Just to back this up, I see the same regression on my GL-MT6000. When I ran this script last week, 6.1.88 averaged 805 Mbps; the new snapshot on 6.6.30 averages 755 Mbps.
Meanwhile performance for all other tests (SQM, Ksmbd, etc.) is the same or better.
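To put a number on the regression, here is a small sketch that computes the relative drop from the sender-side averages posted above for the GL-MT6000. The dictionary layout and the `percent_drop` helper are only illustrative; the Mbits/sec values are copied from the four runs.

```python
# Sender-side averages (Mbits/sec) copied from the four GL-MT6000 runs above.
runs = {
    "r26156-fb2475e6bd": {"6.1": 804, "6.6": 753},
    "r26199+5-0b0e3e22f8": {"6.1": 804, "6.6": 753},
}


def percent_drop(before: float, after: float) -> float:
    """Relative throughput loss going from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


for revision, mbps in runs.items():
    drop = percent_drop(mbps["6.1"], mbps["6.6"])
    print(f"{revision}: {mbps['6.1']} -> {mbps['6.6']} Mbits/sec ({drop:.1f}% lower)")
```

Both snapshot pairs come out at roughly 6% lower on kernel 6.6, which is consistent with the 805 vs 755 Mbps averages reported in the follow-up post.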
This CPU should have better numbers with wg; also, Intel QAT should speed up wg a lot.
By default OpenWrt doesn't have QAT compiled in AFAIK.
ubus call system board
{
"kernel": "6.6.30",
"hostname": "Dell_x64",
"system": "Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz",
"model": "QEMU Standard PC (i440FX + PIIX, 1996)",
"board_name": "qemu-standard-pc-i440fx-piix-1996",
"rootfs_type": "ext4",
"release": {
"distribution": "OpenWrt",
"version": "SNAPSHOT",
"revision": "r0+26238-b1cb9a0713",
"target": "x86/64",
"description": "OpenWrt SNAPSHOT r0+26238-b1cb9a0713"
}
}
root@Dell_x64:/tmp/wg-bench# ./benchmark.sh
Connecting to host 169.254.200.2, port 5201
[ 5] local 169.254.200.1 port 36754 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 385 MBytes 3.22 Gbits/sec 0 991 KBytes
[ 5] 1.00-2.00 sec 389 MBytes 3.26 Gbits/sec 0 1.12 MBytes
[ 5] 2.00-3.00 sec 390 MBytes 3.27 Gbits/sec 0 1.12 MBytes
[ 5] 3.00-4.00 sec 391 MBytes 3.28 Gbits/sec 0 1.26 MBytes
[ 5] 4.00-5.00 sec 386 MBytes 3.24 Gbits/sec 0 1.26 MBytes
[ 5] 5.00-6.00 sec 374 MBytes 3.14 Gbits/sec 0 1.26 MBytes
[ 5] 6.00-7.00 sec 388 MBytes 3.26 Gbits/sec 0 1.26 MBytes
[ 5] 7.00-8.00 sec 382 MBytes 3.21 Gbits/sec 0 2.36 MBytes
[ 5] 8.00-9.00 sec 387 MBytes 3.25 Gbits/sec 0 2.36 MBytes
[ 5] 9.00-10.00 sec 387 MBytes 3.25 Gbits/sec 0 2.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 3.77 GBytes 3.24 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 3.77 GBytes 3.23 Gbits/sec receiver
iperf Done.
root@Dell_x64:/tmp/wg-bench# ./benchmark.sh -R
Connecting to host 169.254.200.2, port 5201
Reverse mode, remote host 169.254.200.2 is sending
[ 5] local 169.254.200.1 port 51960 connected to 169.254.200.2 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 343 MBytes 2.88 Gbits/sec
[ 5] 1.00-2.00 sec 350 MBytes 2.94 Gbits/sec
[ 5] 2.00-3.00 sec 349 MBytes 2.93 Gbits/sec
[ 5] 3.00-4.00 sec 349 MBytes 2.93 Gbits/sec
[ 5] 4.00-5.00 sec 350 MBytes 2.93 Gbits/sec
[ 5] 5.00-6.00 sec 355 MBytes 2.98 Gbits/sec
[ 5] 6.00-7.00 sec 352 MBytes 2.95 Gbits/sec
[ 5] 7.00-8.00 sec 356 MBytes 2.99 Gbits/sec
[ 5] 8.00-9.00 sec 358 MBytes 3.00 Gbits/sec
[ 5] 9.00-10.00 sec 357 MBytes 3.00 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 3.44 GBytes 2.95 Gbits/sec 63 sender
[ 5] 0.00-10.00 sec 3.44 GBytes 2.95 Gbits/sec receiver
iperf Done.
I see that you are running QEMU, so how many cores/threads are in use? Or did you give all resources to the VM?
P.S. Is the snapshot on kernel 6.6 now, or did you build it manually?
All CPU resources given, using Proxmox.
Haven't got it into production just yet. Still tinkering.
Built manually.
Thanks for the confirmation. Wow... comparing the same 4C4T x86-64 class, the i5-7500T loses to the N100, and not by just a little. Quite a surprise to me.
Not really. QAT doesn't offer any acceleration for WireGuard.
Off-topic, but anyway: QAT is pretty slow compared to the CPU at AES, so I'm running my C3558 SSD NAS with QAT disabled. With QAT it's so slow that I can't saturate these drives; there's no problem with CPU encryption. I've read somewhere that higher core count Atom parts have faster QAT, so maybe it makes sense to enable it there.
Yes, it does ChaCha20, but only in gen4 QAT engines, as I have now found out, so not on 2***/3*** Atoms. This means it is worthless for us, since we don't have cheap hardware with those accelerators.
Yeah. The concern is that if it impacts the GL-MT6000 this much then it might be even worse for lower-end routers. But so far nobody else has shared benchmarks from both 6.1 and 6.6.
It's probably safe to assume that the issue can be seen with all MediaTek devices, so maybe @nbd would be able to look into this.
Does anyone have statistics on how the Intel J4125/N100 performs with WireGuard? I'm looking to maximize WireGuard throughput on a 1 Gbps internet connection.
You didn't check the table? There is one for N100 already.
For the J4125, let me find some time to run the test. I've got a mini PC but haven't tested it yet. However, you can see that even my very old Celeron N2930 can do 700 Mbps+, and the J4125 is much faster, so I don't think it will manage less than 1 Gbps.
Kernel 6.6 needs manual compiling, and not many people want to (or know how to) do it. I am also waiting for the master snapshot to move to 6.6.
I somehow missed the N100 in the table, thanks for pointing it out.
BTW, I found some reports on Google that say the J4125 can't handle 1 Gbps WireGuard (though they are running OPNsense, so I'm not sure how relevant that is).