WireGuard performance numbers on WRT1900AC and Xeon server

So I've been trying out Davidc502's build on my WRT1900AC. The router had been collecting dust for a while since the mwlwifi driver was a bit unstable and the router would periodically reboot on account of the 4.9 kernel issues. It looks like with the release of 18.06.1 and the latest mwlwifi driver, all of those issues have been resolved.

I'm now playing with WireGuard as it has some promising performance numbers compared to OpenVPN. It appears the limit of the WRT1900AC is about 300 Mbits/sec over a WireGuard link.

I'm sure that with a WRT3200ACM, or perhaps the R7800 with its better CPU, I could reach near-Gbit speeds. Has anyone else tried testing WireGuard on either of these two routers?

Here are some results. The VMs are running on a DL380 G7 w/ a Xeon L5640 and vSphere 6.7.
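
For anyone wanting to reproduce this, a minimal stanza along these lines in /etc/config/network is enough on the OpenWrt end (just a sketch; the key and peer values below are placeholders, not my actual ones):

config interface 'wg0'
        option proto 'wireguard'
        option private_key '<router_private_key>'
        option listen_port '51820'
        list addresses '10.20.40.1/24'

config wireguard_wg0
        option public_key '<peer_public_key>'
        list allowed_ips '10.20.40.4/32'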

From Linux VM to WRT1900AC (via 1Gb network, no WireGuard tunnel)

root@ubuntu:~# iperf3 -c 192.168.1.1
Connecting to host 192.168.1.1, port 5201
[  4] local 192.168.1.94 port 38290 connected to 192.168.1.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   109 MBytes   918 Mbits/sec   75    232 KBytes
[  4]   1.00-2.00   sec   109 MBytes   913 Mbits/sec   42    158 KBytes
[  4]   2.00-3.00   sec   110 MBytes   919 Mbits/sec   81    267 KBytes
[  4]   3.00-4.00   sec   108 MBytes   907 Mbits/sec   73    170 KBytes
[  4]   4.00-5.00   sec   107 MBytes   899 Mbits/sec   28    211 KBytes
[  4]   5.00-6.00   sec  97.1 MBytes   814 Mbits/sec   26    334 KBytes

From Linux VM to WRT1900AC (via 1Gb network, over the WireGuard tunnel)

root@ubuntu:~# iperf3 -c 10.20.40.1 -t 10
Connecting to host 10.20.40.1, port 5201
[  4] local 10.20.40.4 port 56268 connected to 10.20.40.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  35.9 MBytes   301 Mbits/sec   44    198 KBytes
[  4]   1.00-2.00   sec  37.5 MBytes   315 Mbits/sec    4    218 KBytes
[  4]   2.00-3.00   sec  35.9 MBytes   301 Mbits/sec    4    248 KBytes
[  4]   3.00-4.00   sec  36.9 MBytes   309 Mbits/sec   16    194 KBytes
[  4]   4.00-5.00   sec  35.9 MBytes   301 Mbits/sec    8    222 KBytes
[  4]   5.00-6.00   sec  36.7 MBytes   308 Mbits/sec   19    231 KBytes
[  4]   6.00-7.00   sec  36.9 MBytes   310 Mbits/sec    3    250 KBytes
[  4]   7.00-8.00   sec  37.0 MBytes   311 Mbits/sec   15    259 KBytes

During the second test, the CPUs on the WRT1900AC are pegged at 100%.

Next, I thought I'd try a test between two VMs on the DL380 over a 10Gb vSwitch (VMXNET3).

The first VM is running Ubuntu; the second one is OpenWrt x86_64.
First, a direct link over the virtual network:

root@ubuntu:~# iperf3 -c 192.168.1.254
Connecting to host 192.168.1.254, port 5201
[  4] local 192.168.1.94 port 49170 connected to 192.168.1.254 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.31 GBytes  11.2 Gbits/sec  168    518 KBytes
[  4]   1.00-2.00   sec   944 MBytes  7.92 Gbits/sec    0    836 KBytes
[  4]   2.00-3.00   sec  1.15 GBytes  9.89 Gbits/sec   68    851 KBytes
[  4]   3.00-4.00   sec  1.01 GBytes  8.66 Gbits/sec    0    851 KBytes
[  4]   4.00-5.00   sec  1.15 GBytes  9.85 Gbits/sec  598    798 KBytes
[  4]   5.00-6.00   sec  1019 MBytes  8.54 Gbits/sec    0    884 KBytes
[  4]   6.00-7.00   sec  1.18 GBytes  10.1 Gbits/sec  118    846 KBytes
[  4]   7.00-8.00   sec  1.01 GBytes  8.67 Gbits/sec  378    625 KBytes

Finally, the same two VMs over the WireGuard tunnel via the virtual network:

root@ubuntu:~# iperf3 -c 10.20.40.1
Connecting to host 10.20.40.1, port 5201
[  4] local 10.20.40.4 port 56280 connected to 10.20.40.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   190 MBytes  1.59 Gbits/sec  786    350 KBytes
[  4]   1.00-2.00   sec   188 MBytes  1.58 Gbits/sec    0    625 KBytes
[  4]   2.00-3.00   sec   217 MBytes  1.82 Gbits/sec   48    506 KBytes
[  4]   3.00-4.00   sec   222 MBytes  1.86 Gbits/sec    1    612 KBytes
[  4]   4.00-5.00   sec   225 MBytes  1.89 Gbits/sec   64    492 KBytes
[  4]   5.00-6.00   sec   221 MBytes  1.85 Gbits/sec    1    625 KBytes
[  4]   6.00-7.00   sec   228 MBytes  1.92 Gbits/sec    0    848 KBytes
[  4]   7.00-8.00   sec   229 MBytes  1.92 Gbits/sec  103    824 KBytes
[  4]   8.00-9.00   sec   230 MBytes  1.93 Gbits/sec  163    768 KBytes
[  4]   9.00-10.00  sec   212 MBytes  1.78 Gbits/sec   78    387 KBytes

This last test pegged the two vCPUs assigned to the OpenWrt VM running WireGuard. I suppose I could double the vCPUs and see if that improves performance.

Update

So I raised the number of vCPUs to 6 on the OpenWrt VM. This did not seem to make a difference, as the CPUs were hovering around 50% each. Could there be a limit within WireGuard or the kernel network stack?

root@ubuntu:~# iperf3 -c 10.20.40.1 -t 10000
Connecting to host 10.20.40.1, port 5201
[  4] local 10.20.40.4 port 56300 connected to 10.20.40.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   202 MBytes  1.70 Gbits/sec  262    592 KBytes
[  4]   1.00-2.00   sec   211 MBytes  1.77 Gbits/sec  136    436 KBytes
[  4]   2.00-3.00   sec   207 MBytes  1.74 Gbits/sec   75    534 KBytes
[  4]   3.00-4.00   sec   206 MBytes  1.72 Gbits/sec  197    546 KBytes
[  4]   4.00-5.00   sec   237 MBytes  1.99 Gbits/sec   28    601 KBytes
[  4]   5.00-6.00   sec   237 MBytes  1.99 Gbits/sec   36    639 KBytes
[  4]   6.00-7.00   sec   248 MBytes  2.08 Gbits/sec  339    525 KBytes
[  4]   7.00-8.00   sec   226 MBytes  1.90 Gbits/sec  272    476 KBytes
[  4]   8.00-9.00   sec   240 MBytes  2.01 Gbits/sec   28    639 KBytes
[  4]   9.00-10.00  sec   215 MBytes  1.80 Gbits/sec    0    847 KBytes
[  4]  10.00-11.00  sec   210 MBytes  1.76 Gbits/sec   30    782 KBytes
[  4]  11.00-12.00  sec   211 MBytes  1.77 Gbits/sec  225    494 KBytes
[  4]  12.00-13.00  sec   211 MBytes  1.77 Gbits/sec   48    478 KBytes
[  4]  13.00-14.00  sec   208 MBytes  1.75 Gbits/sec  255    534 KBytes
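
Next I may try iperf3 with parallel streams (the -P flag) to see whether a single TCP flow is the limit, e.g.:

root@ubuntu:~# iperf3 -c 10.20.40.1 -t 30 -P 4

Though since all tunnel traffic between one pair of peers rides a single UDP flow on the outside, it may not spread across cores regardless.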

What is your throughput with two desktop OSes in a similar configuration? That would help determine whether something is different about OpenWrt.
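
For an apples-to-apples test, a minimal tunnel between two stock Linux machines can be brought up with the wg(8) and ip(8) tools, roughly like this on one end (the key file, addresses, and peer endpoint below are placeholders; mirror the values on the other host):

ip link add dev wg0 type wireguard
ip addr add 10.20.40.1/24 dev wg0
wg set wg0 private-key /etc/wireguard/private.key listen-port 51820
wg set wg0 peer <peer_public_key> endpoint 192.168.1.2:51820 allowed-ips 10.20.40.2/32
ip link set wg0 up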

Perhaps of interest is that WireGuard's gigabit-rate benchmarks were performed on Intel Core i7 processors, which also have AES support.

If I were promoting something like WireGuard and there were a clear "good enough" point (1 Gbps throughput), I'd want my public-facing documentation to show how that can be achieved with commodity-grade clients, if it were achievable. Saying "Product X goes as fast as you need on an m3 (or i3)" would be much more compelling than quoting results on a $500 CPU.

I’m going to assume this is a v1 unit.

The v1 has no NEON support, which WireGuard can take advantage of. I’m guessing it’s falling back to the C implementations of the crypto.

In any case, getting NEON on any Cortex-A9 device in OpenWrt currently requires you to compile your own build. Note that every mvebu device except the WRT1900AC v1 supports NEON.

Also note that as far as I know, there hasn’t been much work on getting WireGuard very well integrated with the kernel’s networking subsystem to take advantage of stuff like GRO and GSO.

Speed will improve over time. I don’t know if it’s possible to take advantage of VFP to speed anything up. Maybe Poly1305, since it uses floating point.

In addition to the above, if you have a NEON-capable CPU you need to manually change the profile. Also, setting -O2 instead of -Os might help performance a bit.

Thanks for the replies. The WRT1900AC is a v1 (mamba series). My reason for the testing is that I will be upgrading my Internet from 180/20 to Gigabit in the next few months. At the end of the day I'm trying to determine whether I can get away with purchasing a newer SoC-based, OpenWrt-compatible router that can handle near-Gigabit speeds with WireGuard, or whether I should look at a low-power x86 solution. I know the latter is almost mandatory for better OpenVPN performance.

Is there a tutorial somewhere explaining how to compile with NEON support? I have my own build but don't see any obvious options anywhere.

Edit target/mvebu/cortexa9/...something. Look for vfpv3 and replace that with neon.
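
From memory, the relevant bit of target/linux/mvebu/cortexa9/target.mk looks roughly like this (a sketch; double-check the exact spelling in your tree):

CPU_TYPE:=cortex-a9
# stock value is a vfpv3 subtype; switch it to neon:
CPU_SUBTYPE:=neon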

Or set options in menuconfig, for example:

CONFIG_EXTRA_OPTIMIZATION="-O2 -fno-caller-saves -fno-plt -mfpu=neon"
CONFIG_TARGET_OPTIMIZATION="-O2 -pipe -mcpu=cortex-a9 -mfpu=neon"

and there is a bit more to be had in the individual makefiles of packages.


:)

It looks like NEON support was added a month ago, so no need to do anything?

Can you be a bit more specific?

If building from trunk, it seems from the above that NEON support has already been added and enabled for the WRT1900ACS.

Oops, I misread...

No it isn't, https://github.com/openwrt/openwrt/blob/master/target/linux/mvebu/cortexa9/target.mk#L13

So there is no easy way to enable this currently.

Three ways above, none are that difficult.

You still need to recompile the rest of the packages if you want to enable NEON, as the default CPU flags disable the use of NEON instructions. Do note that not everything uses NEON, however.
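
In practice that means a full rebuild after changing the flags, e.g. (assuming a standard buildroot checkout):

# wipe previous object files so everything picks up the new flags
make dirclean
make -j$(nproc)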

What about " to be had with the individual make files on packages."

How do I tackle that? Or is it not required, or does it mean some packages will stop working with NEON?

Thanks btw, dizzy, for the intel.

No packages will stop working, it's just "convenience" I guess...

If you want to pick up the discussion again, here's the old PR: https://github.com/lede-project/source/pull/1211

It's going to depend on what you include in your build and on what each package's makefile currently supports (i.e., does it support NEON out of the box, or do you need to un-comment a line). It's mostly AV-type transcoding packages; it just requires some investigation.
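
As a purely hypothetical illustration of the kind of line involved, a package Makefile might carry something like:

# hypothetical per-package tweak; the exact variable and flag
# depend on the package's build system
TARGET_CFLAGS += -mfpu=neon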
