WireGuard performance

FYI - I have created https://openwrt.org/inbox/wireguard_performance (similar to https://openwrt.org/inbox/openvpn_performance).

Feel free to add more data, it's easy.

Cool!
Question: do the Up/Down columns denote bidirectional saturating traffic, or the "uni-directional" maximum for each direction? By "uni-directional" I mean that there is still the required reverse traffic (ACKs and such), but no considerable data traffic in the reverse direction.

Despite a fair amount of knowledge of networking, I was never able to set up WireGuard. The interface gets RX but no actual traffic happens.

Same as in the OpenVPN performance page: unidirectional

Since WireGuard can reach the limits of GigE on an x86_64/AMD64 machine with AES-NI, here is an excerpt from my WIP notes on testing:

Technology Limits

Packets vs. Bits

Commercial routers are generally rated in terms of packets per second (PPS), rather than bits per second of throughput. This is probably a better metric when one considers that the majority of the processing load is related to understanding the headers and making decisions about them, rather than managing the payload itself. The payload is generally copied to a buffer from the interface and later written out to another interface, without modification (with minor exceptions for the headers and checksums, especially when NAT is in play).

As many home users think in terms of bandwidth rather than switching speed, and their packets are often large or at the MTU of their upstream link, these tests will use bandwidth as a measure.

Ethernet interfaces are left at their default MTU of 1500.

GigE Throughput

A GigE link can't provide a throughput of 1,000 Mbps using TCP or UDP. There are headers, inter-packet gaps, and other overhead at the various layers that limit throughput. For typical IPv4 links, 940-950 Mbps is the highest achievable throughput for GigE without using "jumbo frames". See, for example, http://rickardnobel.se/actual-throughput-on-gigabit-ethernet/

For the purposes of this discussion, the rough numbers of

  • TCP -- 940 Mbps throughput limit
  • UDP -- 950 Mbps throughput limit

will be used.

IPv6 vs. IPv4 and Other Effects

IPv6 has a larger base header than IPv4 (40 bytes versus 20), so an IPv6 link will have slightly lower throughput than the IPv4 links tested here.

VLAN tagging, QinQ, or the like often add a few bytes to the on-wire packet. These impacts are on the order of a percent and are not examined in this study. For example, an 802.1Q (VLAN) tag adds 4 bytes to the over 1500 bytes of a "full" Ethernet frame, a fraction of a percent.

This isn't a "scholarly research paper", but more intended to provide general guidance. If you're within, say, 10% of the limits, you're probably too close for robust operation.

WireGuard

WireGuard adds its own encapsulation, which reduces the achievable bandwidth further.

WireGuard sets the interface MTU to 1420. This reduces the throughput by a factor of roughly 1420/1500 ≈ 95% (ignoring fragmentation overhead).

  • WireGuard -- 900 Mbps throughput limit
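
As a rough sketch of where the 1420 figure comes from: the per-packet sizes below are not stated in these notes. They assume an outer IP header, an 8-byte UDP header, a 16-byte WireGuard data header and a 16-byte authentication tag, with the default MTU sized for an IPv6 outer header. The function and constant names are made up for illustration.

```python
# Assumed per-packet overhead of a WireGuard data packet, in bytes.
OUTER_IP = {"ipv4": 20, "ipv6": 40}   # outer IP header
UDP_HEADER = 8                        # WireGuard runs over UDP
WG_DATA_HEADER = 16                   # type, reserved, receiver index, counter
WG_AUTH_TAG = 16                      # Poly1305 authentication tag


def tunnel_mtu(carrier_mtu=1500, outer="ipv6"):
    """Largest inner packet that fits in one carrier packet without fragmentation."""
    return carrier_mtu - OUTER_IP[outer] - UDP_HEADER - WG_DATA_HEADER - WG_AUTH_TAG


def scaled_limit_mbps(plain_limit_mbps, mtu=1420, carrier_mtu=1500):
    """Scale a non-tunnelled throughput limit by the MTU ratio used in the notes."""
    return plain_limit_mbps * mtu / carrier_mtu


print(tunnel_mtu(outer="ipv6"))       # MTU that is safe over an IPv6 carrier
print(tunnel_mtu(outer="ipv4"))       # an IPv4-only carrier leaves 20 bytes more
print(round(scaled_limit_mbps(950)))  # UDP limit scaled by 1420/1500
```

Under these assumptions the IPv6 case gives 1420, and scaling the 950 Mbps UDP figure by 1420/1500 lands just under 900 Mbps.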

I've written a quick n' dirty tutorial here which might help

Great summary. The actual limits are relatively easy to calculate though:

IPv4, Ethernet, TCP/IPv4 goodput:
1000 * ((1500-20-20)/(1500+38)) = 949.28 Mbps
IPv6, Ethernet, TCP/IPv6 goodput:
1000 * ((1500-40-20)/(1500+38)) = 936.28 Mbps
That is a reduction by ~1.3 percentage-points. Any additional "games" like VLAN tags or RFC 1323 timestamps will result in lower throughput.
e.g. for IPv4
1000 * ((1500-20-20-12)/(1500+38+4)) = 939.04 Mbps
and IPv6
1000 * ((1500-40-20-12)/(1500+38+4)) = 926.07 Mbps

Any VPN will essentially add one more layer of TCP/IP (or, in WireGuard's case, UDP/IP) headers as well as its own headers...
With a payload MTU of 1420 on an MTU 1500 carrier, I would expect
1000 * ((1420-20-20)/(1500+38)) = 897.27 Mbps for IPv4
1000 * ((1420-40-20)/(1500+38)) = 884.27 Mbps for IPv6
as best-case payload goodput through the VPN tunnel.
That essentially confirms your numbers (but also shows how to calculate them)

Why 1500 + 38? Well, that is simply the sum of:
7 bytes preamble
1 byte start of frame delimiter
6 bytes destination MAC address
6 bytes source MAC address
2 bytes Ethertype header
4 bytes frame check sequence
12 bytes equivalent inter frame gap
7+1+6+6+2+4+12 = 38 (see e.g. https://en.wikipedia.org/wiki/Ethernet_frame)
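
For anyone who wants to play with the numbers, the formulas above can be wrapped in a few lines of Python. This is only a sketch: the function and parameter names are my own, and it assumes full-size packets and a 20-byte TCP header plus optional bytes (12 for timestamps), exactly as in the hand calculations.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + SFD + dst MAC + src MAC + Ethertype + FCS + inter-frame gap
ETH_OVERHEAD = 7 + 1 + 6 + 6 + 2 + 4 + 12  # = 38


def tcp_goodput_mbps(payload_mtu, ip_header, tcp_options=0, vlan_tag=0,
                     carrier_mtu=1500, link_mbps=1000):
    """Best-case TCP goodput in Mbps.

    payload_mtu -- MTU seen by the TCP/IP stack (1500 plain, 1420 in the tunnel)
    ip_header   -- 20 for IPv4, 40 for IPv6
    tcp_options -- extra TCP header bytes, e.g. 12 for timestamps
    vlan_tag    -- 4 if the carrier frames carry an 802.1Q tag
    """
    payload = payload_mtu - ip_header - 20 - tcp_options
    on_wire = carrier_mtu + ETH_OVERHEAD + vlan_tag
    return link_mbps * payload / on_wire


cases = [
    ("IPv4", 1500, 20, 0, 0),
    ("IPv6", 1500, 40, 0, 0),
    ("IPv4 + timestamps + VLAN", 1500, 20, 12, 4),
    ("IPv6 + timestamps + VLAN", 1500, 40, 12, 4),
    ("IPv4 through MTU 1420 tunnel", 1420, 20, 0, 0),
    ("IPv6 through MTU 1420 tunnel", 1420, 40, 0, 0),
]
for name, mtu, ip, opts, vlan in cases:
    print(f"{name:30s} {tcp_goodput_mbps(mtu, ip, opts, vlan):7.2f} Mbps")
```

Changing `carrier_mtu` or `link_mbps` gives the same estimate for other link types, as long as the 38 bytes of Ethernet framing still apply.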

GL.iNet B1300 (IPQ4029) as receiver, desktop computer as sender: could reach 320 Mbit/s.
B1300 as receiver, iPhone 8 as sender: could reach 220 Mbit/s.

According to MikroTik's test result table, about 80 kpps is sufficient to reach gigabit speeds at a 1500 MTU:

Ethernet test results
RB750GL AR7242 1G all port test

Mode      Configuration             1518 byte       512 byte        64 byte
                                    kpps    Mbps    kpps    Mbps    kpps    Mbps
Bridging  none (fast path)          81.2    986.1   178.4   730.9   194     99.3
Bridging  25 bridge filter rules    51.5    625.5   52.3    214.3   53.7    27.5
Routing   none (fast path)          81.2    986.1   167     684     183.7   94.1
Routing   25 simple queues          81.2    986.1   88.5    362.4   92.8    47.5
Routing   25 ip filter rules        37.6    457     38.4    157.1   37.5    19.2
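
The relation between the kpps and Mbps columns can be checked with a short sketch. It assumes that the quoted frame sizes (1518, 512, 64 bytes) include the MAC header and FCS but not the preamble, start-of-frame delimiter and inter-frame gap, and that the Mbps column counts frame bytes only. The function names are made up for illustration.

```python
# Bytes each frame occupies on the wire beyond its quoted size:
# 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap.
WIRE_EXTRA = 7 + 1 + 12


def line_rate_kpps(frame_bytes, link_mbps=1000):
    """Maximum frames per second (in thousands) the link can carry."""
    bits_per_frame = (frame_bytes + WIRE_EXTRA) * 8
    return link_mbps * 1e6 / bits_per_frame / 1e3


def frame_mbps(kpps, frame_bytes):
    """Throughput in Mbps counted over frame bytes, as in the table's Mbps column."""
    return kpps * frame_bytes * 8 / 1e3


for size in (1518, 512, 64):
    print(f"{size:5d} byte frames: line rate {line_rate_kpps(size):7.1f} kpps")

# Cross-check two table entries: kpps in, Mbps out.
print(round(frame_mbps(81.2, 1518), 1))
print(round(frame_mbps(194, 64), 1))
```

With these assumptions the 1518-byte fast-path rows sit at line rate, while the 64-byte rows are far below the roughly 1488 kpps that a gigabit link could carry, which is the packets-versus-bits point in a nutshell.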

The graph https://openwrt.org/inbox/wireguard_performance is filled quite sparsely...
Does someone have data for the missing targets?

Feel free to add more data, it's easy.

I have a few wg tunnels to a few different providers, how can I measure the max throughput?

With iperf3.
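
A minimal sketch of scripting this, assuming `iperf3 -s` is already running on a machine reachable at the far end of the tunnel and that a TCP test is wanted. The function names are made up, and the JSON keys (`end`, `sum_received`, `bits_per_second`) are what I would expect from a TCP run with `-J`; check them against the output of your iperf3 version.

```python
import json
import subprocess
import sys


def received_mbps(report):
    """Extract the receiver-side TCP throughput in Mbps from an iperf3 JSON report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def run_iperf3(server, seconds=10, reverse=False):
    """Run one iperf3 TCP test against `server` and return the received Mbps."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")  # server sends, client receives (download direction)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return received_mbps(json.loads(result.stdout))


if __name__ == "__main__" and len(sys.argv) > 1:
    peer = sys.argv[1]  # tunnel-side address of the iperf3 server
    print(f"upload:   {run_iperf3(peer):.1f} Mbps")
    print(f"download: {run_iperf3(peer, reverse=True):.1f} Mbps")
```

Run it once per tunnel with the peer's tunnel address as the argument. Each direction is measured separately, which matches the unidirectional numbers on the wiki page.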

This is something of a challenge with WireGuard, as its performance on even MIPS-based processors can exceed a user's ISP bandwidth or hit caps that the VPN provider may have in place. "Internet weather" also makes repeatable measurements difficult when using a remote peer/server. As one example, a QCA9563 at 775 MHz with a single NIC is quoted at 68 Mbps (assumed to be one way) by a manufacturer that supplies OEM firmware very closely based on OpenWrt 18.06. My measurements on an IPQ4019 are around 280 Mbps (iperf3, one way).

I've been measuring with two mid-/upper-range desktops, Intel i3-7100T and AMD Ryzen 5 2600X. These two can handle 900 Mbps in one direction over WireGuard (two-way not yet tested). I'm planning on using flent's RRUL test as well.

Edit: For "amusement" I just ran the RRUL (bidirectional, multi-stream, multi-protocol) test with the EA8300:

Started Flent 1.3.0 using Python 3.7.3.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2019-08-23T093152.509288.flent.gz.

Summary of rrul test run from 2019-08-23 16:31:52.509288

                             avg       median          # data pts
 Ping (ms) ICMP   :        27.24        30.40 ms              349
 Ping (ms) UDP BE :         2.07         0.37 ms               27
 Ping (ms) UDP BK :         2.11         0.37 ms               27
 Ping (ms) UDP EF :         2.13         0.37 ms               27
 Ping (ms) avg    :        27.32        30.39 ms              350
 TCP download BE  :        66.52        67.13 Mbits/s         298
 TCP download BK  :        52.13        51.24 Mbits/s         296
 TCP download CS5 :        56.14        46.42 Mbits/s         298
 TCP download EF  :        70.29        70.43 Mbits/s         297
 TCP download avg :        61.12        60.86 Mbits/s         301
 TCP download sum :       244.25       243.43 Mbits/s         301
 TCP totals       :       462.96       464.84 Mbits/s         301
 TCP upload BE    :        85.84        76.22 Mbits/s         225
 TCP upload BK    :        51.40        48.45 Mbits/s         261
 TCP upload CS5   :        43.19        44.83 Mbits/s         258
 TCP upload EF    :        42.56        41.62 Mbits/s         255
 TCP upload avg   :        54.86        55.18 Mbits/s         301
 TCP upload sum   :       218.71       220.42 Mbits/s         301

so an IPQ4019 can push WireGuard at some significant rates, over 200 Mbps in both directions simultaneously.

You don't necessarily need to make it overly complicated.
Connect a reasonably fast computer (i3 or such) to the WAN port on the router.
Connect a client to one of the LAN ports on the router and run iperf3 between the two machines, which would give you a "good enough" result.

That’s basically what I’ve got. The client doesn’t need to be able to do more than run flent at rates able to saturate the link, but the upstream server needs to be powerful enough not to be the limit on VPN throughput. Those two machines happen to be two I had available with a Linux-based distro installed. Testing the J4105 is likely going to push the VPN server pretty hard. (OpenVPN over a wire is capped around 500 Mbps with those two desktops.)