Which WiFi routers have hardware AES encryption support?

It is interesting to note that despite having a 1.35GHz clock frequency compared to the 1.6-1.8GHz clock frequency of the Marvell Armada benchmarks posted above by @anomeome and @cybrnook, you have the highest benchmarks for CHACHA20-POLY1305. This suggests that having ARMv8-A/AES-NI is not only very beneficial for OpenVPN, but could also benefit Wireguard.

Thank you @Dopam-IT_1987 for sharing these benchmarks!

The last one wins, so that implies there is some more to be had here. I assume you overrode in your own build.

@anomeome What do these options mean?

No he doesn't :smiley:

openssl speed -elapsed -evp CHACHA20-POLY1305
You have chosen to measure elapsed time instead of user CPU time.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20-poly1305   108986.44k   235554.69k   327910.40k   355238.57k   364726.95k   365379.58k

sorry, couldn't resist this game of one-upmanship lol

-Os is the default and means optimise for size. I override my build with -O2, which gives more speed at the cost of somewhat larger code. -O3 aims for even more speed, but enables some optimisations that yield even larger code.

openssl overrides the defaults in its makefile with -O3.

You know, given all this benchmarking, I was just curious to see what a 6 year old core i7-4790s would do by comparison.

Some of the AES numbers in this thread are pretty impressive for low power CPUs: even better than the core i7 on big buffers, particularly the rockpi... although with chacha the i7 kicks ass.

Quite impressive performance for low power CPUs...

openssl speed -elapsed -evp CHACHA20-POLY1305
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20-poly1305   280845.62k   523492.31k  1035778.13k  1948233.73k  2056129.19k  2092531.71k

openssl speed -elapsed -evp aes-128-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     787171.58k   854993.58k   874984.19k   883475.46k   883023.87k   884725.08k

I see. @Dopam-IT_1987 has both -O3 and -Os. How does the compiler decide which one has priority over the other?
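
On the -O3 -Os question: as far as I know, GCC documents that when several -O options are given, the last one is the effective one, which matches the "last one wins" comment above. Here is a minimal Python sketch of that rule. The helper name `effective_opt_level` is made up for illustration, and it only models the flag order, not what GCC does internally.

```python
from typing import Optional


def effective_opt_level(command_line: str) -> Optional[str]:
    """Return the -O option that takes effect under a 'last one wins' rule."""
    # Collect every token that is an optimisation-level flag (-O, -O2, -Os, ...).
    levels = [tok for tok in command_line.split() if tok.startswith("-O")]
    # The last -O option on the command line overrides the earlier ones.
    return levels[-1] if levels else None


# Shortened version of the compiler line printed by "openssl speed" in this thread.
flags = "aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wall -O3 -Os -pipe -mcpu=cortex-a53"
print(effective_opt_level(flags))  # -Os
```

So with "-O3 -Os" in that order, the build would effectively be optimised for size, which is why there may be some speed left on the table.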

No problem. Go for it :slight_smile:

I meant highest amongst ARM core benchmarks, but you are indeed right that your x86 setup exceeds the other benchmarks.

On the other hand, looking at it from a different perspective, it is quite creditable that a dual-core 1.35GHz ARMv8 can even be considered in the same league as an octa-core 2.2GHz x86 setup for Wireguard, even knowing that these are single-core benchmarks.

Exactly, to my earlier point.

BTW, great benchmark to share on the i7, especially aes-128-cbc. Just to confirm, the chacha20 benchmark on i7 above as well as the chacha20 benchmark earlier on your octa-core x86 setup were single core benchmarks, right?

yep, all single core benchmarks, so we're just comparing on a per-core basis - the actual number of cores is not really material

Good evening. I just noticed that "buffered" shows a question mark. Is that normal?

[Screenshot: 2021-03-29 at 02.59.30]

How is this question relevant to this thread? I suggest you start a new thread with your question

Wireguard gains nothing from AES-NI, since it uses ChaCha20-Poly1305 rather than AES. ARMv8-A also includes "Neon", the ARM SIMD implementation, as standard, and Wireguard makes extensive use of SIMD where available to accelerate its algorithm.

https://developer.arm.com/documentation/102474/latest

That explains why the ARMv8-A showed improved performance for Wireguard, which I was not expecting. Thanks for sharing.

Do you know if Wireguard natively takes advantage of SIMD when running on ARMv8-A without any additional compilation options, similar to how OpenSSL natively uses AES-NI when running on CPUs supporting that instruction set with no additional work required?

Some hand carved code

Woah. Assembly-optimized.

Hello everybody, here is a new test, this time with a UBI image.

root@OpenWrt:~# openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 7002290 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 2293045 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 632551 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 162890 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 20547 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 10281 aes-128-gcm's in 3.00s
OpenSSL 1.1.1k  25 Mar 2021
built on: Sun Apr  4 09:51:25 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm      37345.55k    48918.29k    53977.69k    55599.79k    56107.01k    56147.97k
root@OpenWrt:~# openssl speed -elapsed -evp AES-128-CBC
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 18884607 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 14584264 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 7414392 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2604928 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 367482 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 186079 aes-128-cbc's in 3.00s
OpenSSL 1.1.1k  25 Mar 2021
built on: Sun Apr  4 09:51:25 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     100717.90k   311130.97k   632694.78k   889148.76k  1003470.85k  1016239.45k
root@OpenWrt:~# openssl speed -elapsed -evp AES-256-CBC
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17599572 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 12210670 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5358006 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1694678 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 229414 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 115320 aes-256-cbc's in 3.00s
OpenSSL 1.1.1k  25 Mar 2021
built on: Sun Apr  4 09:51:25 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      93864.38k   260494.29k   457216.51k   578450.09k   626453.16k   629800.96k
root@OpenWrt:~# openssl speed -elapsed -evp CHACHA20-POLY1305
You have chosen to measure elapsed time instead of user CPU time.
Doing chacha20-poly1305 for 3s on 16 size blocks: 6956996 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 64 size blocks: 4007395 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 256 size blocks: 2045398 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 1024 size blocks: 586258 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 8192 size blocks: 79114 chacha20-poly1305's in 3.00s
Doing chacha20-poly1305 for 3s on 16384 size blocks: 39723 chacha20-poly1305's in 3.00s
OpenSSL 1.1.1k  25 Mar 2021
built on: Sun Apr  4 09:51:25 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20-poly1305    37103.98k    85491.09k   174540.63k   200109.40k   216033.96k   216940.54k
root@OpenWrt:~#
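
For anyone wondering how the summary rows relate to the "Doing ..." lines: each figure is just blocks processed × block size ÷ elapsed seconds, shown in units of 1000 bytes per second. A small Python sketch that recomputes it from the raw lines above (the regular expression and the function name `throughput_k` are my own, written against the output format shown here, so treat them as assumptions):

```python
import re

# Matches lines such as:
# Doing aes-128-gcm for 3s on 16 size blocks: 7002290 aes-128-gcm's in 3.00s
DOING = re.compile(r"Doing (\S+) for \d+s on (\d+) size blocks: (\d+) \S+ in ([\d.]+)s")


def throughput_k(line: str) -> float:
    """Return throughput in 1000s of bytes per second for one 'Doing ...' line."""
    match = DOING.search(line)
    if match is None:
        raise ValueError("not an 'openssl speed' progress line: " + line)
    block_size = int(match.group(2))   # bytes per block
    count = int(match.group(3))        # blocks processed
    seconds = float(match.group(4))    # elapsed time
    return count * block_size / seconds / 1000


gcm_16 = "Doing aes-128-gcm for 3s on 16 size blocks: 7002290 aes-128-gcm's in 3.00s"
cbc_16k = "Doing aes-128-cbc for 3s on 16384 size blocks: 186079 aes-128-cbc's in 3.00s"
print(f"{throughput_k(gcm_16):.2f}k")   # 37345.55k
print(f"{throughput_k(cbc_16k):.2f}k")  # 1016239.45k
```

Both values agree with the corresponding columns in the tables above.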

I will test with fiber and SQM this afternoon :slight_smile:

Good evening everybody.

Here is my new test with the Linksys E8450 (RT3200), and it is good news:

http://www.dslreports.com/speedtest/67969201 without SQM

and now

http://www.dslreports.com/speedtest/67969397 with SQM

Has anyone tested what OpenVPN speeds the Linksys E8450 (aka Belkin RT3200) can achieve? I'm wondering if it could replace my Asus RT-AC86U, which has hardware AES acceleration (but no OpenWrt support). I just did a speedtest and it reported 125Mbps while connected to my VPN provider with AES-256-GCM encryption. I'm not quite sure how to interpret the SSL benchmarks reported in this thread...
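
On interpreting the numbers: openssl speed reports "1000s of bytes per second", so a figure like 56147.97k converts to Mbit/s by multiplying by 1000 and by 8, then dividing by one million. A rough Python sketch (the function name `k_to_mbit` is just illustrative), using the E8450 aes-128-gcm result at 16384 bytes posted above:

```python
def k_to_mbit(k_value: float) -> float:
    """Convert an 'openssl speed' figure (1000s of bytes/s) to Mbit/s."""
    bytes_per_second = k_value * 1000
    return bytes_per_second * 8 / 1_000_000


# aes-128-gcm, 16384 byte blocks, from the E8450 run earlier in the thread.
print(f"{k_to_mbit(56147.97):.1f} Mbit/s")  # 449.2 Mbit/s
```

I would treat that as a single-core ceiling for the cipher alone. Real OpenVPN throughput will typically be lower, since the benchmark does not include packet handling or any of the other VPN overhead.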

Hey guys, can anyone explain why the GCM performance is so low, and why unaccelerated chacha20-poly1305 beats it on CPUs with AES support?

Am I right in thinking that CBC benchmarks won't actually show how fast OpenVPN will go, as it needs to combine them with SHA1 (or other auth) and that GCM benchmarks are the real number we will see with OpenVPN?

I have a NanoPi R2S on 21.02 with aes showing in cpuinfo

My OpenVPN seems to cap out around 120Mbps with 100% CPU load from OpenVPN on one core, so I'm trying to find out how to make it better.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm      34857.39k    45342.63k    49618.52k    51152.90k    51623.25k    51647.83k
aes-128-gcm      35404.13k    46094.57k    50245.89k    51945.81k    52860.25k    52996.78k
chacha20-poly1305    34020.34k    79896.98k   162838.19k   187422.38k   202787.50k   202544.47k
sha1              9161.62k    36114.97k   122499.50k   304377.17k   539312.13k   569573.38k
root@OpenWrt:~# cat /proc/cpuinfo
processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 1
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 2
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4

processor	: 3
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4
root@OpenWrt:~# cat /proc/crypto
name         : crct10dif
driver       : crct10dif-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crc32
driver       : crc32-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100
refcnt       : 3
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : ecb(cipher_null)
driver       : ecb-cipher_null
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 0
chunksize    : 1
walksize     : 1

name         : digest_null
driver       : digest_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 0

name         : compress_null
driver       : compress_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : compression

name         : cipher_null
driver       : cipher_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 0
max keysize  : 0
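
To make a listing like this easier to scan, here is a small Python sketch that splits /proc/crypto text into records and lists the drivers registered for a given algorithm name. The helper names are made up for illustration, and the sample only contains two of the entries pasted above. Note this only reflects what the kernel crypto API has registered; as far as I know, OpenSSL normally uses the CPU instructions directly in userspace, so this listing is not necessarily what the openssl speed results depend on.

```python
import re


def parse_proc_crypto(text: str) -> list:
    """Split /proc/crypto style text into one dict per algorithm entry."""
    entries = []
    # Entries are separated by blank lines; each line is "key : value".
    for block in re.split(r"\n\s*\n", text.strip()):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def drivers_for(entries: list, name: str) -> list:
    """Return the driver names registered for the given algorithm name."""
    return [e["driver"] for e in entries if e.get("name") == name]


# Two entries copied from the listing above.
SAMPLE = """name         : crc32
driver       : crc32-generic
module       : kernel
priority     : 100

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
"""

print(drivers_for(parse_proc_crypto(SAMPLE), "aes"))  # ['aes-generic']
```

On a live system the same functions could be fed the contents of /proc/crypto instead of the sample string.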

Let me ask another question. Given OpenVPN 2.5 now supports "data-ciphers CHACHA20-POLY1305", should we ever use AES if CHACHA20-POLY1305 benchmarks faster in all the ARM results people above have posted? Isn't AES acceleration broken if it's slower than software CHACHA?

With the Xiaomi AX3600 I got pretty good openssl performance as well:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM      66949.83k   202263.89k   399432.96k   563127.30k   658262.70k   666184.36k
AES-128-CBC     112155.69k   340084.50k   667803.82k   920050.35k  1032323.07k  1041956.86k
AES-256-CBC     103643.03k   280320.26k   477506.05k   596243.11k   642206.38k   645939.20k
ChaCha20-Poly1305    37526.14k    86704.64k   172491.69k   216496.13k   230266.20k   230877.87k

This should be Armv8 as well:

root@OpenWrt:~# cat /proc/cpuinfo 
processor	: 0
BogoMIPS	: 38.40
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0xd03
CPU revision	: 4
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 
1382400
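
For reading these two outputs: cpuinfo_max_freq is reported in kHz, and the aes, pmull, sha1 and sha2 flags in the Features line are the ones that indicate the ARMv8 crypto extensions. A short Python sketch that checks both (the function names and the `CRYPTO_FLAGS` set are my own choices for illustration; the sample text is trimmed from the output above):

```python
# Feature flags that indicate the ARMv8 crypto extensions on Linux.
CRYPTO_FLAGS = {"aes", "pmull", "sha1", "sha2"}


def cpu_features(cpuinfo_text: str) -> set:
    """Return the flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def khz_to_ghz(khz: int) -> float:
    """cpuinfo_max_freq is in kHz; convert it to GHz."""
    return khz / 1_000_000


SAMPLE = (
    "processor\t: 0\n"
    "BogoMIPS\t: 38.40\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid\n"
)

print(CRYPTO_FLAGS <= cpu_features(SAMPLE))  # True
print(khz_to_ghz(1382400))                   # 1.3824
```

So the 1382400 above corresponds to roughly 1.38GHz.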