BT Home Hub 5A: configuring protonVPN via openVPN

Hi,
I made some progress. ProtonVPN support told me that the servers do not accept reconnection requests that are too frequent (you have to wait at least 2 minutes between attempts). This could be the reason why we get so many AUTH_FAILED errors.
Now the script provided by protonVPN works on Ubuntu, while on openWRT it is executed up to:

up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf

Then the update-resolv-conf script needs to run the /sbin/resolvconf program. However, openWRT does not have any such program in that folder and I could not find it in any package.
I tried commenting out these lines and the router then established a stable connection, but DNS does not work as expected and I could not browse the Web.
Do you have any idea? We are close to a solution and then we can have a working guide.

P.S. The first procedure does not use file.openvpn and so does not require resolvconf. What about the procedure at https://openwrt.ebilan.co.uk/viewtopic.php?f=7&t=279, the one that works? How does it manage resolvconf? It is very strange...

Could it be the remote DNS servers are hard coded and provided by DHCP server as mentioned in section 2.3 of the HH5A guide?

eg. this is pushed out to connected clients:
DHCP-Options 6,8.8.8.8,8.8.4.4

Hi,
I finally solved all the problems. I am summarising the procedure so that other users can avoid wasting time.

  1. You can follow the procedure posted by @bill888 here. Remember to set up the VPN for all networks and not only for a specific one.
  2. You have to manually edit the protonVPN configuration file server-name.ovpn and comment out the following lines, since openWRT does not have the resolvconf program:
up /etc/openvpn/update-resolv-conf
down /etc/openvpn/update-resolv-conf
  3. You have to modify the DNS-DHCP server by adding your DNS entries (suggested DNS: 1.1.1.1, 208.67.222.222, 208.67.220.220)
    LUCI->network->DNS and DHCP->DNS forwardings
  4. You have to disable the resolv file
    LUCI->network->DNS and DHCP->resolv and hosts files->ignore resolve file
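
For anyone who prefers the command line over LuCI, the config edit and the two DNS settings above could be sketched roughly like this. It is only a sketch under a few assumptions: the function names are made up for illustration, and the uci options `server` and `noresolv` in the `dhcp` config are, as far as I know, what the "DNS forwardings" and "Ignore resolve file" fields map to. Check your own /etc/config/dhcp before committing anything.

```shell
#!/bin/sh
# Comment out the two resolvconf hooks in an OpenVPN config read from stdin.
# All other lines pass through unchanged.
comment_resolvconf_hooks() {
    sed -e 's|^up /etc/openvpn/update-resolv-conf|# &|' \
        -e 's|^down /etc/openvpn/update-resolv-conf|# &|'
}

# Apply the DNS settings with uci instead of LuCI (only meaningful on the router).
# Assumption: 'server' = "DNS forwardings", 'noresolv' = "Ignore resolve file".
apply_dns_settings() {
    for dns in 1.1.1.1 208.67.222.222 208.67.220.220; do
        uci add_list "dhcp.@dnsmasq[0].server=$dns"
    done
    uci set "dhcp.@dnsmasq[0].noresolv=1"
    uci commit dhcp
    /etc/init.d/dnsmasq restart
}

# Usage on the router (server-name.ovpn is the placeholder name used above):
#   comment_resolvconf_hooks < server-name.ovpn > /etc/openvpn/server-name.ovpn
#   apply_dns_settings
```

The filter only touches lines that start with the exact up/down hooks, so the rest of the provider's file stays as downloaded.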

Then I ran several benchmarks with speedtest and nperf: without protonVPN, with protonVPN on my PC, and finally with protonVPN on openWRT.
I noticed that after boot, once openWRT had finished some operations that used all the CPU, the router was idle. OpenVPN reached about 90% CPU usage during the benchmark.
The values seem to confirm that the maximum speed should be about 9 Mbit/s, as suggested by @bill888. However, adding option engine 'dynamic' to /etc/config/openvpn does not have any effect @mpa @drbrains .
Do you have any further suggestions to improve the connection speed?
Thank you

[Screenshots: speedtest and nperf results without protonVPN, with protonVPN on the PC, and with protonVPN on openWRT; htop CPU load after boot and during the benchmark]

Your /proc/crypto shows you have hardware crypto available. But...in order for you to use that you need to have “cryptodev” loaded AND OpenSSL needs to be compiled with the cryptodev feature.

lsmod should show cryptodev loaded (or a message in dmesg).

You can test if OpenSSL has the cryptodev enabled with OpenVPN itself.

ssh into your router
cd /tmp
openvpn --genkey --secret key
openvpn --test-crypto --secret key --cipher aes-256-cbc --engine cryptodev

The output will do a selftest and give you all the information about OpenSSL.
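
A plain grep for "crypto" in the lsmod output also matches unrelated modules, so an exact-name check can be less confusing. A minimal sketch, assuming the usual lsmod layout with the module name in the first column; the helper name module_loaded is made up for illustration.

```shell
#!/bin/sh
# Succeed only if the exact module name ($1) appears in the first column
# of lsmod output read from stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the router:
#   lsmod | module_loaded cryptodev && echo "cryptodev is loaded"
```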

It seems that cryptodev is not loaded.

lsmod | grep crypto
crypto_null             2544  1 aead
cryptomgr               2080  0
dmesg | grep crypto
[   13.040312] ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 1

openvpn test

openvpn --test-crypto --secret key --cipher aes-256-cdb --engine cryptodev
Tue Jul  3 10:45:35 2018 disabling NCP mode (--ncp-disable) because not in P2MP client or server mode
Tue Jul  3 10:45:35 2018 OpenVPN 2.4.4 mips-openwrt-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD]
Tue Jul  3 10:45:35 2018 library versions: OpenSSL 1.0.2o  27 Mar 2018, LZO 2.10
Tue Jul  3 10:45:35 2018 OpenVPN 2.4.4 mips-openwrt-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD]
Tue Jul  3 10:45:35 2018 OpenSSL: error:25066067:lib(37):func(102):reason(103)
Tue Jul  3 10:45:35 2018 OpenSSL: error:25070067:lib(37):func(112):reason(103)
Tue Jul  3 10:45:35 2018 OpenSSL: error:260B6084:lib(38):func(182):reason(132)
Tue Jul  3 10:45:35 2018 OpenSSL: error:2606A074:lib(38):func(106):reason(116)
Tue Jul  3 10:45:35 2018 OpenSSL: error:25066067:lib(37):func(102):reason(103)
Tue Jul  3 10:45:35 2018 OpenSSL: error:25070067:lib(37):func(112):reason(103)
Tue Jul  3 10:45:35 2018 OpenSSL: error:260B6084:lib(38):func(182):reason(132)
Tue Jul  3 10:45:35 2018 OpenSSL error: cannot load engine 'cryptodev'
Tue Jul  3 10:45:35 2018 Exiting due to fatal error

Is there any guide or readme to follow?
Thank you

P.S.

openssl engine
(dynamic) Dynamic engine loading support
openssl version -f
compiler: ccache_cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/staging_dir/target-mips_24kc_musl-1.1.16/usr/include -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/staging_dir/target-mips_24kc_musl-1.1.16/include -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/staging_dir/toolchain-mips_24kc_gcc-5.4.0_musl-1.1.16/usr/include -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/staging_dir/toolchain-mips_24kc_gcc-5.4.0_musl-1.1.16/include/fortify -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/staging_dir/toolchain-mips_24kc_gcc-5.4.0_musl-1.1.16/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/build_dir/target-mips_24kc_musl-1.1.16/openssl-1.0.2o:openssl-1.0.2o -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/build/lede-17.01/slaves/phase2/mips_24kc/build/sdk/feeds/base/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM

This is how I prepared my Home Hub 5a for an OpenSSL benchmark with hardware crypto acceleration:

# download the SDK matching the OpenWrt version installed on your router:
wget https://downloads.openwrt.org/releases/18.06.0-rc1/targets/lantiq/xrx200/openwrt-sdk-18.06.0-rc1-lantiq-xrx200_gcc-7.3.0_musl.Linux-x86_64.tar.xz
tar xf openwrt-sdk-18.06.0-rc1-lantiq-xrx200_gcc-7.3.0_musl.Linux-x86_64.tar.xz
cd openwrt-sdk-18.06.0-rc1-lantiq-xrx200_gcc-7.3.0_musl.Linux-x86_64
scripts/feeds update base packages
scripts/feeds install openssl cryptodev-linux
sed -i -e 's/^PKG_RELEASE:=1$/PKG_RELEASE:=1.1/' feeds/base/package/libs/openssl/Makefile
make menuconfig
  Global build settings -> # disable all 4 options
  Kernel modules -> Cryptographic API modules -> <M> kmod-cryptodev
  # return to top level menu: "Linux Kernel Configuration"
  Libraries -> SSL -> <M> libopenssl -> 
    [*] Crypto acceleration support
    [*] Digests acceleration support
  Utilities -> <M> openssl-util
make download
make -j5
# find build results under bin/
bin/targets/lantiq/xrx200/packages/kmod-cryptodev_4.9.109+1.9.git-2017-10-04-lantiq-1_mips_24kc.ipk
bin/packages/mips_24kc/base/libopenssl_1.0.2o-1.1_mips_24kc.ipk
bin/packages/mips_24kc/base/openssl-util_1.0.2o-1.1_mips_24kc.ipk
# copy packages to router, install with opkg

Now to the benchmark results. First with OpenSSL built-in crypto (no acceleration):

# rmmod cryptodev
# openssl engine -c -t
(dynamic) Dynamic engine loading support
     [ unavailable ]

# openssl speed md5 sha1 sha256 aes-128-cbc aes-256-cbc 
[...]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5               1004.44k     3689.23k    11875.46k    26415.79k    40771.58k
sha1              1221.11k     4219.29k    11907.67k    21875.71k    28983.30k
aes-128 cbc       5853.58k     6478.76k     6663.00k     6718.12k     6725.63k
aes-256 cbc       4589.95k     4977.60k     5083.90k     5106.35k     5114.54k
sha256            2342.86k     5436.57k     9545.22k    11838.67k    12626.60k

Second, with hardware acceleration through cryptodev:

# modprobe cryptodev
# openssl engine -c -t
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, DES-CBC, AES-128-CBC, AES-192-CBC, AES-256-CBC, hmacWithMD5, hmacWithSHA1, MD5, SHA1]
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]

# openssl speed -engine cryptodev md5 sha1 sha256 aes-128-cbc aes-256-cbc
engine "cryptodev" set.
[...]
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                358.83k     1438.15k     5096.50k    19664.49k   163187.98k
sha1               385.79k     1484.09k     6103.97k    23343.18k   211105.93k
aes-128 cbc       5834.75k     6467.58k     6670.76k     6727.68k     6745.39k
aes-256 cbc       4592.57k     4973.93k     5084.07k     5131.64k     5134.38k
sha256            2344.61k     5447.53k     9553.07k    11782.83k    12679.79k

Bulk data transfers over VPN have a payload near 1500 bytes, so the most relevant column for this use case should be 1024 bytes. For md5, cryptodev even brings a slowdown, and for sha1, a small speedup.
[EDIT: Please don't rely on these numbers. openssl speed should be invoked with option -elapsed as suggested by @drbrains below.]
However, aes-cbc and sha256 seem to be completely unaffected by the acceleration, perhaps it was not used at all. Maybe something about my setup is wrong, or maybe cryptodev is not fully supported by OpenSSL, I don't know.
You can also see that encryption is multiple times faster than 9 Mbit/s. The low performance of OpenVPN is likely also caused by other processing, for example by copying data from and to userspace.

Use faster hardware, or use VPN software which encrypts/decrypts in kernel space. With IPsec on the TP-Link TD-W8980, I get a throughput of 12 to 15 Mbit/s without acceleration, and 27 Mbit/s with crypto acceleration enabled (all measurements with conntrack disabled). WireGuard might also have good performance, but I have not tried it.

IPsec should use the hardware crypto by default. It doesn’t need cryptodev. The same goes for dm-crypt, used to encrypt storage devices attached via e.g. USB.

As for AES and OpenSSL, try with the “-evp” switch. With cryptodev loaded it should give better performance:

time -v openssl speed -elapsed -evp aes-256-cbc

On the improved MT7628 driver that I’m optimizing I use a fallback to software below a block size of 200 bytes (that’s about the break-even point). I got the idea from the OMAP driver, which does the same thing.

Once you have your device setup properly to use the driver with VPN you might look at the possibility to patch the driver to do the same.

One note on the cryptodev driver. The standard package with OpenWRT is from last year. It’s missing some patches from this April. Specifically a patch to have the zero-copy work properly which improves throughput.

Thank you, this looks much better now:

# modprobe cryptodev
# openssl speed -elapsed -evp aes-128-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       1310.57k     3403.29k    10522.97k    22138.95k    31667.54k

# openssl speed -elapsed -evp aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       1292.61k     3334.76k    10087.17k    20084.39k    27779.07k

/proc/crypto does not show any hardware support for sha256, so it cannot be offloaded.

Here's the complete benchmark redone with -elapsed, without and with acceleration.

# rmmod cryptodev
# openssl speed -elapsed md5 sha1 sha256 aes-128-cbc aes-256-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                995.42k     3676.82k    11776.43k    26266.28k    40716.97k
sha1              1198.31k     4153.34k    11765.33k    21776.38k    28915.03k
aes-128 cbc       5840.58k     6461.93k     6663.08k     6702.76k     6733.82k
aes-256 cbc       4597.81k     4983.08k     5092.69k     5124.10k     5117.27k
sha256            2342.94k     5439.96k     9554.09k    11801.94k    12642.99k

# modprobe cryptodev
# for a in md5 sha1 sha256 aes-128-cbc aes-256-cbc; do openssl speed -elapsed -evp $a; done
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                149.26k      559.04k     2142.72k     7494.31k    26681.34k
sha1               149.08k      563.35k     2173.27k     7643.14k    28999.68k
aes-128-cbc       1305.31k     3407.77k    10531.67k    22115.67k    31612.93k
aes-256-cbc       1275.35k     3265.73k     9839.96k    19615.74k    27699.88k
sha256             540.41k     1831.89k     5089.62k     9220.78k    12050.43k
(output edited for readability)

I wonder why accelerated md5 and sha1 are so slow, since the hardware does support them.

For “light” encryption or hashing, the overhead of context switching and copying buffers from user space to kernel space will actually make the hardware slower. It depends on the hardware and the driver.

At first glance it seems the hardware can’t do scatter/gather itself, so the driver has to provide the blocks one segment at a time. There is also the way its queue manager is written: it waits for one request to finish before it can process the next one (it is not possible to queue in hardware).

Even with the same overall performance, take into consideration CPU usage. While the encryption is offloaded your CPU can do something else. Since resources and performance are limited on the average router, every little bit can help.

On different hardware, the general improvement for OpenVPN is only between 10 and 15%. This says a lot about OpenVPN. Like I mentioned: if you can, try changing to IPsec. This should show much better performance.

Thank you for the help, the procedure and the benchmark. The crypto accelerator should help a lot, at least with aes-256-cbc and block sizes in the 256-8192 byte range.

You can also see that encryption is multiple times faster than 9 Mbit/s. The low performance of OpenVPN is likely also caused by other processing, for example by copying data from and to userspace.

I came to the same conclusion in a previous message. Even using only the CPU without the crypto accelerator, the worst openSSL performance is about 4526.87k ~ 4.5 MByte/s ~ 36 Mbit/s. So the 9 Mbit/s limit is caused by something else...

On the improved MT7628 driver that I’m optimizing I use a fallback to software below a block size of 200 bytes (that’s about the break-even point). I got the idea from the OMAP driver, which does the same thing.
Once you have your device setup properly to use the driver with VPN you might look at the possibility to patch the driver to do the same.
One note on the cryptodev driver. The standard package with OpenWRT is from last year. It’s missing some patches from this April. Specifically a patch to have the zero-copy work properly which improves throughput.

Does the upcoming 18.06 release of openWRT include the updated driver version?

Unfortunately, protonVPN does not provide IPsec, and this protocol is often blocked by firewalls.
The TD-W8980 has the same Lantiq XWAY VRX268 SoC as the BT Home Hub 5A. I read about WireGuard, but at the moment it is not well supported.

Even with the same overall performance, take into consideration CPU usage. While the encryption is offloaded your CPU can do something else. Since resources and performance are limited on the average router, every little bit can help.

You are right, and this is the reason why I'm spending a lot of time on this.

4526.87k is actually 4.5 Mbps, not MB/s, so 9 Mbps looks like a CPU limitation

I upgraded to 18.06-rc1 and repeated the benchmark to check the differences from the previous version, 17.01.

openssl speed -elapsed md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc rsa2048 dsa2048
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                396.41k     1770.35k     3212.71k     9018.37k    15018.67k
des ede3          1223.26k     1394.79k     1397.33k     1404.59k     1401.62k
aes-192 cbc       5197.93k     5690.41k     5824.94k     5859.33k     5849.09k
aes-256 cbc       4642.82k     5023.49k     5124.18k     5153.79k     5147.31k
sha256             845.22k     2547.22k     3001.75k     5256.42k     3140.27k
sha512             143.48k      623.78k      842.84k     1657.45k     1723.05k

Then I followed the procedure of @mpa (I also had to install kmod-crypto-authenc_4.9.109-1_mips_24kc) and repeated the benchmarks.

rmmod cryptodev
openssl speed -elapsed md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc rsa2048 dsa2048
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                993.93k     3667.86k    11772.25k    26215.08k    40949.08k
des ede3          1380.57k     1399.94k     1401.51k     1394.01k     1355.35k
aes-192 cbc       5157.39k     5663.87k     5775.96k     5835.09k     5860.01k
aes-256 cbc       4627.19k     5013.55k     5127.08k     5132.29k     5158.23k
sha256            2350.09k     5415.74k     9545.39k    11777.37k    12670.29k
sha512             495.86k     1976.55k     2748.25k     3709.27k     4134.23k

and

modprobe cryptodev
for a in md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc; do openssl speed -elapsed -evp $a; done
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5                158.66k      596.12k     2288.81k     7885.48k    27560.62k
des-ede3          1349.11k     1375.53k     1383.34k     1382.40k     1381.72k
aes-192-cbc       1334.17k     3452.44k    10609.83k    22038.19k    31812.27k
aes-256-cbc       1325.06k     3382.61k    10154.84k    20216.83k    28292.44k
sha256             547.17k     1854.81k     5163.35k     9316.35k    12119.26k
sha512             287.68k     1179.24k     2206.29k     3408.21k     4033.40k

As expected, the improvements appear for data sizes of 256 bytes and above, and only for the supported algorithms (aes-192-cbc and aes-256-cbc).
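
The 256-byte threshold can be read straight off the two aes-256-cbc rows in the tables above (software with cryptodev unloaded, and -evp with cryptodev loaded). A minimal sketch of that comparison; the function name break_even is made up for illustration, and the numbers are copied from those two rows without the trailing "k".

```shell
#!/bin/sh
# Print the smallest block size at which the accelerated figure beats software.
# $1: block sizes, $2: software throughput, $3: accelerated throughput
# (each a space-separated list, same length and same order).
break_even() {
    awk -v sizes="$1" -v sw="$2" -v hw="$3" 'BEGIN {
        n = split(sizes, s, " "); split(sw, a, " "); split(hw, b, " ")
        for (i = 1; i <= n; i++) {
            # "+ 0" forces a numeric comparison
            if (b[i] + 0 > a[i] + 0) { print s[i]; exit }
        }
        print "none"
    }'
}

# aes-256-cbc on the Home Hub 5A, values from the tables above
break_even "16 64 256 1024 8192" \
           "4627.19 5013.55 5127.08 5132.29 5158.23" \
           "1325.06 3382.61 10154.84 20216.83 28292.44"   # prints 256
```

This only compares the openssl speed figures; it says nothing about what OpenVPN itself achieves with the same engine.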

4526.87k is actually 4.5 Mbps, not MB/s, so 9 Mbps looks like a CPU limitation

No, the values are in bytes; you can read about it here.
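
Since openssl speed itself prints "The 'numbers' are in 1000s of bytes per second processed", the conversion for the 4526.87k figure quoted above is simple arithmetic. A small sketch; the helper name kbytes_to_mbit is hypothetical.

```shell
#!/bin/sh
# Convert one "openssl speed" figure (1000s of bytes per second, given
# without the trailing "k") to Mbit/s:
#   k * 1000 bytes/s * 8 bits per byte / 1,000,000
kbytes_to_mbit() {
    awk -v k="$1" 'BEGIN { printf "%.2f\n", k * 1000 * 8 / 1000000 }'
}

kbytes_to_mbit 4526.87   # prints 36.21
```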

Didn't know that, but in practice I've noticed OpenVPN speeds to be around the values of the openssl speed result, but in Mbps.

fwiw, I asked one of the devs, mkresin, to take a quick look at this interesting thread. He asked me to post this response:

First of all, cryptodev is a 3rd party kernel module which wasn't accepted by the kernel devs. Instead the Crypto API was added with a 2.6-ish linux kernel [0]. My opinion is the corresponding kernel modules are already packaged for OpenWrt [1].

Support for the Crypto API was added to OpenSSL 1.1.0 [2].

I have no idea whether or not OpenSSL is compiled with Crypto API support by default. Perhaps further special Kernel options need to be selected to enable the base Crypto API support. I don’t know whether the Lantiq DEU (Data Encryption Unit) driver supports the Crypto API. In best case scenario, it might be a matter of loading the correct kernel modules to get hardware accelerated cryptography working.

In my opinion using the cryptodev approach + the 3rd party module is the wrong way. I can only suggest perhaps someone picks this up as a task to have a look at Crypto API based acceleration.

Mathias

[0] https://en.wikipedia.org/wiki/Crypto_API_(Linux)
[1] https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/kernel/linux/modules/crypto.mk

[2] https://github.com/openssl/openssl/commit/7f458a48ff3a231d5841466525d2aacbcd4f6b77

The hardware driver natively uses the Linux crypto API. Via the AF_ALG socket it will work from userspace. OpenSSL didn’t officially support it until 1.1.? OpenWRT is now getting the updated OpenSSL version as soon as all the patches are reviewed/added to Master.

Cryptodev is a third party module, but for now it is the only way to get user space apps to use the hardware driver together with OpenSSL. The (also third party) AF_ALG engine for the older OpenSSL versions was never officially integrated. That happened, as said before, with the 1.1 version.

I never got the separate OpenSSL engine to work with OpenSSL (didn’t try very hard). But according to the older benchmarks done by the cryptodev people, the BSD approach to the /dev/crypto was a lot faster compared to the AF_ALG socket.

As soon as OpenSSL 1.1 is officially in OpenWRT both approaches should work and switching between the two options is just a matter of changing engine. The AF_ALG engine in OpenSSL is a little more flexible in terms of selecting which encryption method or hash is using the engine.
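
Before switching engines it may help to check which ones OpenSSL actually reports as usable. A small sketch that parses the `openssl engine -c -t` output format shown earlier in the thread. The helper name available_engines is hypothetical, and it is an assumption that an AF_ALG engine would be listed in the same "(id) name" / "[ available ]" layout as cryptodev.

```shell
#!/bin/sh
# Read "openssl engine -c -t" output on stdin and print the id of every
# engine that is followed by an "[ available ]" status line.
available_engines() {
    awk '
        /^\(/ { id = $1; gsub(/[()]/, "", id) }   # "(cryptodev) ..." -> cryptodev
        /\[ available \]/ { print id }            # does not match "[ unavailable ]"
    '
}

# Usage on the router:
#   openssl engine -c -t | available_engines
```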

I did some searching on the Web. According to this, cryptodev has better performance than the AF_ALG API. However, according to more recent sources (here 2014, here 2017 and here), the software implementation outperforms both cryptodev and AF_ALG, especially for small data sizes (TCP/UDP/IP about 64 kBytes, Ethernet between 1.5 kBytes for a standard frame and 9 kBytes for a jumbo frame). Moreover, according to the latest benchmark here, the difference between cryptodev and AF_ALG is not as large as shown here.

In my opinion using the cryptodev approach + the 3rd party module is the wrong way. I can only suggest perhaps someone picks this up as a task to have a look at Crypto API based acceleration.

Since AF_ALG is in the kernel and OpenSSL 1.1 supports it, I agree that it is the right way to go.

As soon as OpenSSL 1.1 is officially in OpenWRT both approaches should work and switching between the two options is just a matter of changing engine. The AF_ALG engine in OpenSSL is a little more flexible in terms of selecting which encryption method or hash is using the engine.

I hope that the OpenSSL 1.1 package will come soon after the openWRT 18.06 release. I read your thread (Status of OpenSSL 1.1 Lede/OpenWrt? kmod-crypto-test).
Finally, the performance of various OpenSSL versions on my PC (i7-3537U) is reported below.

openssl speed -elapsed md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc
OpenSSL 1.0.2g  1 Mar 2016
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              53154.38k   153379.90k   340930.30k   496694.95k   586377.90k
des ede3         23009.45k    23092.95k    23529.30k    23612.07k    23811.41k
aes-192 cbc      93006.03k    98322.52k    98695.85k   102235.14k   100177.24k
aes-256 cbc      80941.49k    83786.88k    83017.22k    77857.45k    82927.62k
sha256           55436.81k   124654.14k   219929.43k   266457.43k   277113.51k
sha512           38305.91k   155056.85k   249469.61k   344811.52k   395487.91k
OpenSSL 1.0.2o  27 Mar 2018
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
md5              36815.35k   115230.29k   304914.77k   481625.43k   585973.76k
des ede3         22890.91k    23685.12k    23256.23k    23674.88k    23568.38k
aes-192 cbc      88807.08k    95228.42k    99297.11k    95685.97k    98899.29k
aes-256 cbc      79682.97k    85615.96k    85707.69k    86874.79k    86876.16k
sha256           59590.05k   132226.45k   227302.66k   277670.23k   293915.31k
sha512           41993.43k   165328.75k   238422.27k   356699.48k   410555.73k
OpenSSL 1.1.0h  27 Mar 2018
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5             106596.49k   246759.36k   441293.65k   554538.33k   596413.10k   592942.42k
des ede3         22817.63k    23257.19k    22951.51k    23186.09k    23358.12k    23358.12k
aes-192 cbc      89113.88k    91224.68k    99162.88k   100212.05k    93547.18k    90030.08k
aes-256 cbc      77074.08k    80776.85k    84904.96k    82820.44k    87135.57k    87610.71k
sha256           57051.51k   130352.60k   226432.43k   276851.71k   291703.47k   297451.52k
sha512           39645.48k   160842.07k   248906.67k   362397.35k   416626.01k   414302.21k

Excluding md5, there is no great difference between the latest OpenSSL 1.0.2o and OpenSSL 1.1.0h. So I do not expect any performance improvement on 18.06 and OpenSSL 1.1.0h. However, only a benchmark can tell the truth.

[Figures: single-thread and multi-thread benchmark results]

First, an Intel i7 comes with AES-NI. I wouldn’t call that a “software” solution, but a built-in hardware solution. Since it’s part of the instruction set of the processor it can be used from user space without any restrictions. No need for expensive context switching. OpenSSL has some optimization for this and OpenVPN will use it via the EVP API.

Second, comparing an i7 with the Lantiq SoC is not realistic. It will just show that for testing purposes between the router and the i7, the bottleneck should be on the SoC side, so that’s the side which should be improved to get better overall performance.

Third, to do a 128 thread benchmark is nice on paper, but OpenVPN is a single thread implementation. It would be nice if the OpenVPN people would update/upgrade their code, but until then, the benchmark is “meaningless”.

I do hope as well that we get OpenSSL 1.1 soon. This means we can do without the cryptodev module. Fewer steps should still help a little. But for the AF_ALG solution to do a better job it should be combined with splice to get a zero-copy implementation. From some benchmarks on the MT7628 I noticed a 40-50% performance decrease on bigger blocks just because of the copy action.

First, an Intel i7 comes with AES-NI. I wouldn’t call that a “software” solution, but a built-in hardware solution. Since it’s part of the instruction set of the processor it can be used from user space without any restrictions. No need for expensive context switching. OpenSSL has some optimization for this and OpenVPN will use it via the EVP API.

Of course, but for small data sizes (16-1024 bytes), software solutions are always better than hardware solutions. For the others there is not a great difference. The exception is the aes algorithm, which exploits the AES-NI instruction set.

lsmod | grep crypto
crypto_simd            16384  1 aesni_intel
cryptd                 24576  3 crypto_simd,ghash_clmulni_intel,aesni_intel
openssl speed -elapsed md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc
OpenSSL 1.1.0h  27 Mar 2018
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5             105415.78k   247264.75k   431623.94k   551934.98k   591801.00k   597240.49k
des ede3         22963.75k    23107.31k    23262.38k    22519.13k    23052.29k    22500.69k
aes-192 cbc      86319.39k    95674.18k    97212.42k   101870.25k   102765.91k   101908.48k
aes-256 cbc      75856.45k    86094.14k    87158.95k    87806.63k    87812.78k    87435.95k
sha256           59242.62k   132046.87k   223446.95k   263048.53k   289035.61k   298467.33k
sha512           39790.47k   161326.63k   233818.71k   348111.87k   415061.33k   419790.85k

for a in md5 sha256 sha512 des-ede3 aes-192-cbc aes-256-cbc; do openssl speed -elapsed -evp $a; done
OpenSSL 1.1.0h  27 Mar 2018
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
md5              54150.99k   159543.49k   357452.54k   513594.71k   590310.06k   596940.12k
des-ede3         22604.08k    22867.03k    22960.81k    22320.47k    22544.38k    21653.01k
aes-192-cbc     435918.46k   480810.82k   499365.55k   500524.71k   502666.58k   502475.43k
aes-256-cbc     378546.90k   419011.65k   428491.86k   429825.71k   430929.24k   428184.92k
sha256           37964.67k   100282.84k   200767.91k   256311.64k   292585.47k   296452.10k
sha512           24929.65k   101383.83k   205816.49k   332540.93k   409840.30k   416956.42k

Second, comparing an i7 with the Lantiq SoC is not realistic. It will just show that for testing purposes between the router and the i7, the bottleneck should be on the SoC side, so that’s the side which should be improved to get better overall performance.

Sure, for this reason I added the benchmark in the previous message and said that only a benchmark can tell the truth, even if I do not expect a great difference.

Third, to do a 128 thread benchmark is nice on paper, but OpenVPN is a single thread implementation. It would be nice if the OpenVPN people would update/upgrade their code, but until then, the benchmark is “meaningless”.

If you look at the benchmarks, even the single-thread version shows that the software implementation is better than the hardware one and that there is not a great difference between cryptodev and AF_ALG. As I said before, different hardware can achieve different results and we need a real benchmark.

I do hope as well that we get OpenSSL 1.1 soon. This means we can do without the cryptodev module. Fewer steps should still help a little. But for the AF_ALG solution to do a better job it should be combined with splice to get a zero-copy implementation. From some benchmarks on the MT7628 I noticed a 40-50% performance decrease on bigger blocks just because of the copy action.

This will be very interesting. Keep us updated if you have any news.

What about mbedtls? It should have had AES-NI support since it was still called PolarSSL. What would it take to make it AES-NI aware? Because clearly at the moment it isn't, at least not by default.