Which hardware for x Mbit OpenVPN

The question of which hardware to buy for an x Mbit/s line with OpenVPN / SQM comes up fairly frequently. Would it make sense to provide this information in the wiki as a graphic or a table?

Such a graphic / table could help users choose the right device for their use case.

Quick and dirty draft (no guarantee of correctness):
[graphic]
[graphic]


In terms of routing performance, mvebu should be significantly above ipq806x, close to 1 Gbit/s line speed (at least for as long as ipq806x's NSS cores aren't supported in OpenWrt; the same applies to SQM). For VPN use, ipq806x and mvebu should be quite similar.

On x86_64 hardware, I'd further split into with and without AES acceleration (AES-NI) when you get to VPN performance.

Also, I'm guessing you'll find that you need a minimum of "a core per thing", perhaps plus one.

I like the "blob" view, or a variant of it, or using ranges of some sort, since there certainly will be people who say "well, the graph said that I could get 300 Mbps with an IPQ806x and it hits 100% on a 250 Mbps line".


In real-world use, MT7621 manages about 30 Mbit/s and ar71xx about 25 Mbit/s.
I would expect IPQ4XXX to be about twice as fast.

WNDR3700 V4

Doing aes-256-cbc for 3s on 16 size blocks: 928142 aes-256-cbc's in 2.96s
Doing aes-256-cbc for 3s on 64 size blocks: 249804 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 77217 aes-256-cbc's in 2.95s
Doing aes-256-cbc for 3s on 1024 size blocks: 18419 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 2550 aes-256-cbc's in 2.97s
OpenSSL 1.0.2o  27 Mar 2018
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: mips-openwrt-linux-musl-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/mate/openwrt/staging_dir/target-mips_24kc_musl/usr/include -I/home/mate/openwrt/staging_dir/target-mips_24kc_musl/include -I/home/mate/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/usr/include -I/home/mate/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/include/fortify -I/home/mate/openwrt/staging_dir/toolchain-mips_24kc_gcc-7.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/mate/openwrt/build_dir/target-mips_24kc_musl/openssl-1.0.2o:openssl-1.0.2o -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/mate/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       5016.98k     5382.98k     6700.87k     6350.52k     7033.54k
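To compare a summary row like this against a line speed, the figures can be converted to Mbit/s. A minimal sketch, assuming "k" means 1000 bytes per second as the output states; the function name is made up for illustration.

```python
def openssl_k_to_mbit(k_value: float) -> float:
    """Convert an `openssl speed` figure (1000s of bytes/s) to Mbit/s."""
    bytes_per_second = k_value * 1000
    return bytes_per_second * 8 / 1_000_000


# 8192-byte column of the WNDR3700 V4 result above
print(round(openssl_k_to_mbit(7033.54), 1))
```

That works out to roughly 56 Mbit/s for the cipher alone on one core, so it should be read as a ceiling rather than as expected VPN throughput.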

J4105 Intel X86-64 with openwrt 18.06

root@OpenWrt:~# openssl speed -evp aes-256-cbc

Doing aes-256-cbc for 3s on 16 size blocks: 85740063 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 28905987 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 7736828 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 2011451 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 254323 aes-256-cbc's in 3.00s
OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(16x,int) des(idx,cisc,2,int) aes(partial) blowfish(idx) 

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     457280.34k   616661.06k   660209.32k   686575.27k   694471.34k
root@OpenWrt:~#

Orange Pi Zero Plus H5 Quad-core 64-bit Cortex-A53

root@OpenWrt:~# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 16699551 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 9812427 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 3634138 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1061078 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 139464 aes-256-cbc's in 3.00s
OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr) 


The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      89064.27k   209331.78k   310113.11k   362181.29k   380829.70k
root@OpenWrt:~#

While I don't think that this was intended to be a thread around determining "the numbers", the idea that there is a consistent benchmark for "VPN performance" is valuable over "with XXXXX VPN provider I'm getting YY Mbps throughput". Especially with WireGuard and OpenVPN presently using very different ciphers when it comes to computational speed (and OpenVPN likely to be able to use the "faster" ciphers when OpenSSL v1.1 becomes widely available), I'd favor "encryption speed" measures over in situ measurements of VPN throughput.

ChaCha20, used by WireGuard, is available in OpenSSL v1.1.0c. At least as I read the WireGuard whitepaper, quite a bit of the gains in performance over OpenVPN come from the use of ChaCha20 rather than the AES encryption typically used by OpenVPN.

While encryption does affect speed, it's not the primary speed blocker. Test it yourself by turning off encryption.

As far as I can tell, what does affect speed is interfacing with the kernel through the TUN/TAP interface. IPsec is orders of magnitude faster, and I reckon WireGuard is as well, although I haven't tested it.


ipq8065

root@syno1:/# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 7155616 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 2144111 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 576481 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 145743 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 18331 aes-256-cbc's in 3.00s

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      38419.41k    45741.03k    49193.05k    49746.94k    50055.85k

I humbly suggest a "speedtester" script/URL that benchmarks devices, optionally submits the results (via JSON?), and plots them dynamically on the server side.

Each new device's results get marked on the plot and, shazam, the proof is in the pudding. There are too many variables to advise on chipset alone: a poor heatsink, etc.
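A rough sketch of what the client side of such a script could look like: parse the two summary lines of openssl speed and serialize them as JSON. The function name and the JSON field names are invented for illustration, and the submission step is left out entirely.

```python
import json


def parse_speed_row(header: str, row: str) -> dict:
    """Map block sizes to throughput (1000s of bytes/s) from the two summary lines."""
    # Header looks like "type  16 bytes  64 bytes ..."; keep only the numbers.
    sizes = [int(tok) for tok in header.split() if tok.isdigit()]
    # Row looks like "aes-256-cbc  38419.41k  45741.03k ..."
    fields = row.split()
    values = [float(tok.rstrip("k")) for tok in fields if tok.endswith("k")]
    cipher = " ".join(tok for tok in fields if not tok.endswith("k"))
    return {"cipher": cipher, "k_per_s": dict(zip(sizes, values))}


# Summary lines copied from the ipq8065 result above
header = "type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes"
row = "aes-256-cbc      38419.41k    45741.03k    49193.05k    49746.94k    50055.85k"
print(json.dumps(parse_speed_row(header, row)))
```

Submitting the parsed table rather than a single number would let the server decide which block size to plot.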

With mvebu, the WRT3200ACM was able to exceed your x86-64 benchmark:

Doing aes-256-cbc for 3s on 16 size blocks: 290904 aes-256-cbc's in 0.05s
Doing aes-256-cbc for 3s on 64 size blocks: 285983 aes-256-cbc's in 0.09s
Doing aes-256-cbc for 3s on 256 size blocks: 247111 aes-256-cbc's in 0.05s
Doing aes-256-cbc for 3s on 1024 size blocks: 162460 aes-256-cbc's in 0.11s
Doing aes-256-cbc for 3s on 8192 size blocks: 33972 aes-256-cbc's in 0.02s
OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: ccache_cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/rmandrad/openwrt/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/usr/include -I/home/rmandrad/openwrt/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/include -I/home/rmandrad/openwrt/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-8.2.0_musl_eabi/usr/include -I/home/rmandrad/openwrt/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-8.2.0_musl_eabi/include/fortify -I/home/rmandrad/openwrt/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-8.2.0_musl_eabi/include -znow -zrelro -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -fmacro-prefix-map=/home/rmandrad/openwrt/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/openssl-1.0.2p=openssl-1.0.2p -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -I/home/rmandrad/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      93089.28k   203365.69k  1265208.32k  1512354.91k 13914931.20k

Please add the -elapsed parameter, otherwise the results are inaccurate.

here you go

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 270951 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 261191 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 228262 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 153599 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 33611 aes-256-cbc's in 3.00s

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       1445.07k     5572.07k    19478.36k    52428.46k    91780.44k
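The gap between the two runs comes down to the divisor. openssl speed reports blocks × block size ÷ measured time, and a sketch of that arithmetic reproduces the 8192-byte column of both tables (the function name is illustrative). My reading is that with the /dev/crypto engine doing the work, the process accumulates almost no user CPU time, so dividing by it inflates the figure.

```python
def speed_k(blocks: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, as openssl speed prints it."""
    return blocks * block_size / seconds / 1000


# Without -elapsed: 33972 blocks of 8192 bytes over 0.02 s of user CPU time
print(f"{speed_k(33972, 8192, 0.02):.2f}k")
# With -elapsed: 33611 blocks of 8192 bytes over 3.00 s of wall-clock time
print(f"{speed_k(33611, 8192, 3.00):.2f}k")
```

The block counts in the two runs are nearly identical; only the time they are divided by changes.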

Also from a rango, but with PR1547 in play, which would indicate there are some extra cycles to be had:

********  Test openssl  *********
Tue Jan  8 10:09:24 MST 2019

*********************************
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='SNAPSHOT'
DISTRIB_REVISION='r9007-529c95cc15'
DISTRIB_TARGET='mvebu/cortexa9'
DISTRIB_ARCH='arm_cortex-a9_vfpv3'
DISTRIB_DESCRIPTION='OpenWrt SNAPSHOT r9007-529c95cc15'
DISTRIB_TAINTS='no-all busybox'
Linux bsaedgy 4.14.91 #0 SMP Mon Jan 7 16:13:59 2019 armv7l GNU/Linux

*********************************
(devcrypto) /dev/crypto engine
 [DES-CBC, DES-EDE3-CBC, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR, AES-128-ECB, AES-192-ECB, AES-256-ECB, MD5, SHA1, SHA224, SHA256, SHA384, SHA512]
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]

*********************************

Running *--> time -v openssl speed -elapsed -evp AES-256-CBC <--*

You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 291174 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 284503 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 248323 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 162513 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 34142 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 18064 aes-256-cbc's in 3.00s
OpenSSL 1.1.1a  20 Nov 2018
built on: Thu Jan  1 00:00:01 1970 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: ccache_cc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DZLIB -DZLIB_SHARED -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc       1552.93k     6069.40k    21190.23k    55471.10k    93230.42k    98653.53k
	Command being timed: "openssl speed -elapsed -evp AES-256-CBC"
	User time (seconds): 0.41
	System time (seconds): 4.88
	Percent of CPU this job got: 29%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 18.03s
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 13440
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 167
	Voluntary context switches: 1038976
	Involuntary context switches: 108
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
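The time -v block is worth a second look. A small sketch of how the 29% CPU figure falls out of the reported times, assuming the usual (user + system) / elapsed definition; the function name is made up.

```python
def cpu_percent(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Share of wall-clock time the process spent on the CPU, in percent."""
    return (user_s + system_s) / elapsed_s * 100


# Values from the time -v output above
print(int(cpu_percent(0.41, 4.88, 18.03)))
```

Almost all of the CPU time is system time (4.88 s of 5.29 s), which would fit the work being handed to the kernel through /dev/crypto while the process mostly waits.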

@rmandra, on my WRT3200 the results are different:

OpenWrt
Model	Linksys WRT3200ACM
Architecture	ARMv7 Processor rev 1 (v7l)
Firmware Version	OpenWrt SNAPSHOT r9008-ff62e83211 / LuCI Master (git-19.007.66460-4edac36)
Kernel Version	4.14.91

root@OpenWrt:~# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 7878406 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 2259065 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 590554 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 149235 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 18732 aes-256 cbc's in 2.99s
OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      42018.17k    48193.39k    50393.94k    50938.88k    51321.92k
  
root@OpenWrt:~# openssl speed aes-128-cbc

OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      59213.73k    66783.47k    69758.20k    70582.61k    70836.22k

This graphic raises more questions than it answers.

I recently got an internet upgrade to 100/40 Mbit/s from my ISP, so I thought it was time to upgrade my old TP-Link WDR3600 (ar71xx) to a ZyXEL NBG6617 (ipq40xx) for the better SQM performance (at least that's what I expected).

Now the graph shows both at the same level when it comes to SQM performance? Is this really true, or am I misunderstanding something?

I don't think the graph is too precise.

@mezo
It will be faster, but probably not by much: ipq4*** isn't that fast per core, but if you combine all 4...

I'm thinking of taking this to the next level by adding performance indicator numbers to the data entries, so that users can easily filter the devices according to their criteria.

If you were to choose 3 to 5 performance indicators to help users search for a device suitable for their needs, which indicators would they be?


Well, in the case of SQM performance, it would be great to filter devices by WAN download/upload rate.
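As a sketch of how such a filter could work: each data entry would carry an SQM ceiling for download and upload, and a device qualifies if both cover the user's line rate. The field names, device names and numbers below are placeholders invented for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    sqm_down_mbit: float  # hypothetical field: max shaped download rate
    sqm_up_mbit: float    # hypothetical field: max shaped upload rate


def suitable(devices, wan_down_mbit, wan_up_mbit):
    """Return devices whose SQM ceiling covers both WAN directions."""
    return [d for d in devices
            if d.sqm_down_mbit >= wan_down_mbit and d.sqm_up_mbit >= wan_up_mbit]


# Placeholder entries, invented purely to exercise the filter.
devices = [Device("device-a", 60, 60), Device("device-b", 150, 150)]
# A 100/40 line like the one mentioned earlier in the thread
print([d.name for d in suitable(devices, 100, 40)])
```

Keeping download and upload as separate indicators matters for asymmetric lines, where a device may cope with the upload but not the download.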