There are some benchmarks in this thread
Those benchmarks are for the C2000 SoCs. I updated the code for the C3000 SoCs, which perform better than the benchmarks in that thread
Thanks a ton for sharing your detailed insights as well as the rough benchmarks above! You have certainly enlightened me today. Very kind of you. Appreciate it.
Of course, please ignore. I was just hoping to get whatever you had off the top of your mind, which you have already done above.
Thanks for confirming it.
Do you think you might have any insight on how to measure these benchmarks of on-core AES-NI performance impact on openssl? If you'll see in the thread above, I am struggling a bit to do so, since the benchmarks with AES-NI support seem to be poorer than those without AES-NI. It feels to me that either we have a measurement issue, or AES is somehow not being invoked. It is not clear how to figure it out.
openssl speed -elapsed -evp aes-128-cbc-hmac-sha1
Or with AES-NI enabled
openssl speed -elapsed -evp aes-128-cbc
With AES-NI disabled
OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
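For anyone wondering what that mask actually switches off, here is a minimal Python sketch that decodes it. It assumes the bit meanings given in the OpenSSL OPENSSL_ia32cap documentation (bit 33 for PCLMULQDQ, bit 57 for AES-NI); the helper names are made up for illustration. The variable only applies to x86 builds of openssl.

```python
# Mask from the command above; the leading "~" tells OpenSSL to clear these bits.
MASK = 0x200000200000000

# Bit meanings as described in the OPENSSL_ia32cap documentation (x86 only).
KNOWN_BITS = {33: "PCLMULQDQ", 57: "AES-NI"}


def set_bits(value):
    """Return the indices of all bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def describe_mask(mask):
    """Pair each set bit with its capability name, or 'unknown'."""
    return [(bit, KNOWN_BITS.get(bit, "unknown")) for bit in set_bits(mask)]


if __name__ == "__main__":
    for bit, name in describe_mask(MASK):
        print(f"bit {bit}: {name} disabled")
```

So the "disabled" run masks out both AES-NI and the carry-less multiply instruction, which is why it falls back to a slower AES code path.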
The priority of 10000 certainly seems to confirm that hardware AES is being invoked, but the module of qca_nss_cfi_cryptoapi seems to suggest that it is not the AES-NI but rather the on-silicon crypto engine which is being used. Based on what @dl12345 has shared above, for small buffers, performance of crypto engine is not good, so that could explain why your benchmark is lower than the one I posted for ipq8065.
Is there any way for you to disable nss (perhaps rename the nss driver?), check if that changes /proc/crypto priority and module, and then re-run the openssl benchmark?
You also have to keep in mind that ipq8065 ~= KRAIT300 ~= cortex a15 <-- out of order execution, while cortex a53 is in-order.
No, I cannot do a whole lot on the OEM firmware. Only /etc/ is writable, / is not, and I'm not that deep into NSS or openssl. This device is mostly useless in its current state without official OpenWrt support.
Understood.
Let's see if @jiegec has better luck on getting the above information for the E8450.
Thanks for sharing that. It can be useful to @jiegec for comparing benchmarks with and without AES-NI, assuming he finds that his aes priority in /proc/crypto is greater than 100. In his case, he does not have the complication @slh is facing of the nss engine potentially overriding AES-NI.
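To make the priority check less of an eyeball exercise, here is a rough Python sketch that parses /proc/crypto and reports the highest-priority driver registered for a given algorithm name. The function names are hypothetical, and it assumes the usual "key : value" layout shown in the listings in this thread.

```python
from pathlib import Path


def parse_proc_crypto(text):
    """Split /proc/crypto-style text into a list of dicts, one per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key:
            continue  # skip blank or malformed lines
        if key == "name" and current:
            entries.append(current)  # a new "name" line starts a new entry
            current = {}
        current[key] = value.strip()
    if current:
        entries.append(current)
    return entries


def best_driver(entries, name):
    """Return the highest-priority entry for an algorithm name, or None."""
    matches = [e for e in entries if e.get("name") == name]
    return max(matches, key=lambda e: int(e.get("priority", "0")), default=None)


if __name__ == "__main__":
    path = Path("/proc/crypto")
    if path.exists():
        entries = parse_proc_crypto(path.read_text())
        for algo in ("aes", "cbc(aes)"):
            best = best_driver(entries, algo)
            if best:
                print(algo, best["driver"], best.get("module"), best["priority"])
            else:
                print(algo, "not registered")
```

Keep in mind that /proc/crypto lists what is registered with the kernel crypto API. As far as I know, it does not by itself tell you which code path a user-space `openssl speed` run takes.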
On Linksys E8450:
cat /proc/crypto says:
name : aes
driver : aes-generic
module : kernel
priority : 100
refcnt : 4
selftest : passed
internal : no
type : cipher
blocksize : 16
min keysize : 16
max keysize : 32
root@OpenWrt:~# openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 18251273 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 64 size blocks: 14115514 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 7278493 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2547200 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 363367 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 183837 aes-128-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 97665.67k 301130.97k 621098.07k 869444.27k 992234.15k 1003995.14k
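In case it helps with reading these tables: the summary row can be reproduced from the "Doing ..." lines above it. A small Python sketch, using the counts and times from this E8450 run (the function name is just for illustration):

```python
def throughput_k(ops, block_bytes, seconds):
    """Throughput in 1000s of bytes per second, as openssl speed reports it."""
    return ops * block_bytes / seconds / 1000.0


# (block size in bytes, operations completed, elapsed seconds) from the run above
RUNS = [
    (16, 18251273, 2.99),
    (64, 14115514, 3.00),
    (256, 7278493, 3.00),
    (1024, 2547200, 3.00),
    (8192, 363367, 3.00),
    (16384, 183837, 3.00),
]

if __name__ == "__main__":
    for block, ops, secs in RUNS:
        print(f"{block:>6} bytes: {throughput_k(ops, block, secs):.2f}k")
```

The small-block figures are dominated by per-call overhead, so the 8192 and 16384 byte columns are the ones that approach the raw cipher speed.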
Thanks for sharing. I guess that might explain the low benchmark. The priority being 100 suggests that AES-NI is not being used. Just to confirm that there are no other aes entries in that list, can you post the full output of /proc/crypto, similar to what @slh posted for his AX3200 above?
That looks like a gigantic bump for aes-128! I am not even sure how to interpret this. Can you run it again and then immediately run the same command just replacing aes-128 with aes-256? Thanks.
I've done some benchmarks for you with RSA and ECDH. While it's not AES, it does illustrate what I am talking about, which is to say, the vast gulf between synchronous and asynchronous operation of the crypto hardware.
And the killer is that the application has to be specifically coded to take advantage of the asynchronous mode. It's not transparent and not all workloads map well to it.
Note also how on ECDH, the pure software implementation is faster than the synchronous one where smaller key sizes are concerned. Synchronous only reaches parity with software at 571 bits. The asynchronous version, on the other hand, is fully 18x faster than software at 571 bits and 11x faster at 160 bits.
With RSA, the sign operations are the most expensive and even synchronous mode beats software, although for verify operations, software beats synchronous mode.
The poorer performance of synchronous mode has everything to do with the comparative inefficiency of shunting data over a bus and using off-die contiguous main memory. And this is the only mode that can be used for applications that are not explicitly recoded to take advantage of asynchronous mode.
So if you have a web server that uses openssl and it has high traffic, you will benefit greatly from using the accelerator. For a single OpenVPN tunnel? No, your performance will be worse.
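To put numbers on the async vs sync vs software gap, here is a small Python sketch that computes the ratios from the ops/s figures in the benchmark output posted in this thread. Only a handful of rows are copied in, and the dictionary layout is my own choice for illustration.

```python
# ops/s taken from the C3758 benchmark output posted in this thread
OPS = {
    "ecdh secp160r1": {"async": 23322.2, "sync": 1439.7, "software": 1993.4},
    "ecdh nistk571": {"async": 3511.7, "sync": 231.2, "software": 194.0},
    "rsa2048 sign": {"async": 9058.7, "sync": 1206.0, "software": 371.9},
    "rsa2048 verify": {"async": 54292.2, "sync": 7009.2, "software": 12874.0},
}


def speedup(candidate, baseline):
    """How many times faster candidate is than baseline (below 1.0 means slower)."""
    return candidate / baseline


if __name__ == "__main__":
    for name, row in OPS.items():
        a = speedup(row["async"], row["software"])
        s = speedup(row["sync"], row["software"])
        print(f"{name:16} async/software = {a:5.1f}x   sync/software = {s:4.2f}x")
```

A sync/software ratio below 1.0 is the case where offloading to the engine makes things slower than not using it at all.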
For reference, this is the board that the benchmarks are run on. It's an Intel C3758 8-core x86_64.
# RSA 2K
# asynchronous
# openssl speed -engine qat -elapsed -async_jobs 72 rsa2048
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 2048 bits private rsa's for 10s: 90678 2048 bits private RSA's in 10.01s
Doing 2048 bits public rsa's for 10s: 542922 2048 bits public RSA's in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000110s 0.000018s 9058.7 54292.2
# synchronous
# openssl speed -engine qat -elapsed rsa2048
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 2048 bits private rsa's for 10s: 12060 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 70092 2048 bits public RSA's in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000829s 0.000143s 1206.0 7009.2
# software
# openssl speed -elapsed rsa2048
You have chosen to measure elapsed time instead of user CPU time.
Doing 2048 bits private rsa's for 10s: 3719 2048 bits private RSA's in 10.00s
Doing 2048 bits public rsa's for 10s: 128740 2048 bits public RSA's in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.002689s 0.000078s 371.9 12874.0
# ECDH Compute Key
# Asynchronous
# openssl speed -engine qat -elapsed -async_jobs 36 ecdh
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 160 bits ecdh's for 10s: 233455 160-bits ECDH ops in 10.01s
Doing 192 bits ecdh's for 10s: 206296 192-bits ECDH ops in 10.00s
Doing 224 bits ecdh's for 10s: 172498 224-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 163355 256-bits ECDH ops in 10.01s
Doing 384 bits ecdh's for 10s: 95264 384-bits ECDH ops in 10.00s
Doing 521 bits ecdh's for 10s: 70993 521-bits ECDH ops in 10.00s
Doing 163 bits ecdh's for 10s: 180585 163-bits ECDH ops in 10.00s
Doing 233 bits ecdh's for 10s: 134987 233-bits ECDH ops in 10.00s
Doing 283 bits ecdh's for 10s: 64415 283-bits ECDH ops in 10.00s
Doing 409 bits ecdh's for 10s: 42718 409-bits ECDH ops in 10.01s
Doing 571 bits ecdh's for 10s: 35187 571-bits ECDH ops in 10.02s
Doing 163 bits ecdh's for 10s: 180784 163-bits ECDH ops in 10.01s
Doing 233 bits ecdh's for 10s: 134922 233-bits ECDH ops in 10.00s
Doing 283 bits ecdh's for 10s: 60481 283-bits ECDH ops in 10.01s
Doing 409 bits ecdh's for 10s: 45157 409-bits ECDH ops in 10.01s
Doing 571 bits ecdh's for 10s: 35105 571-bits ECDH ops in 10.01s
Doing 256 bits ecdh's for 10s: 163370 256-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 163626 256-bits ECDH ops in 10.01s
Doing 384 bits ecdh's for 10s: 92456 384-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 91757 384-bits ECDH ops in 10.01s
Doing 512 bits ecdh's for 10s: 72111 512-bits ECDH ops in 10.00s
Doing 512 bits ecdh's for 10s: 72434 512-bits ECDH ops in 10.01s
Doing 253 bits ecdh's for 10s: 79920 253-bits ECDH ops in 10.00s
Doing 448 bits ecdh's for 10s: 6412 448-bits ECDH ops in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
op op/s
160 bits ecdh (secp160r1) 0.0000s 23322.2
192 bits ecdh (nistp192) 0.0000s 20629.6
224 bits ecdh (nistp224) 0.0001s 17249.8
256 bits ecdh (nistp256) 0.0001s 16319.2
384 bits ecdh (nistp384) 0.0001s 9526.4
521 bits ecdh (nistp521) 0.0001s 7099.3
163 bits ecdh (nistk163) 0.0001s 18058.5
233 bits ecdh (nistk233) 0.0001s 13498.7
283 bits ecdh (nistk283) 0.0002s 6441.5
409 bits ecdh (nistk409) 0.0002s 4267.5
571 bits ecdh (nistk571) 0.0003s 3511.7
163 bits ecdh (nistb163) 0.0001s 18060.3
233 bits ecdh (nistb233) 0.0001s 13492.2
283 bits ecdh (nistb283) 0.0002s 6042.1
409 bits ecdh (nistb409) 0.0002s 4511.2
571 bits ecdh (nistb571) 0.0003s 3507.0
256 bits ecdh (brainpoolP256r1) 0.0001s 16337.0
256 bits ecdh (brainpoolP256t1) 0.0001s 16346.3
384 bits ecdh (brainpoolP384r1) 0.0001s 9245.6
384 bits ecdh (brainpoolP384t1) 0.0001s 9166.5
512 bits ecdh (brainpoolP512r1) 0.0001s 7211.1
512 bits ecdh (brainpoolP512t1) 0.0001s 7236.2
253 bits ecdh (X25519) 0.0001s 7992.0
448 bits ecdh (X448) 0.0016s 641.2
# Synchronous
# openssl speed -engine qat -elapsed ecdh
engine "qat" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing 160 bits ecdh's for 10s: 14411 160-bits ECDH ops in 10.01s
Doing 192 bits ecdh's for 10s: 13037 192-bits ECDH ops in 10.00s
Doing 224 bits ecdh's for 10s: 11008 224-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 10276 256-bits ECDH ops in 10.01s
Doing 384 bits ecdh's for 10s: 5958 384-bits ECDH ops in 10.00s
Doing 521 bits ecdh's for 10s: 4639 521-bits ECDH ops in 10.00s
Doing 163 bits ecdh's for 10s: 11409 163-bits ECDH ops in 10.00s
Doing 233 bits ecdh's for 10s: 8442 233-bits ECDH ops in 10.00s
Doing 283 bits ecdh's for 10s: 4126 283-bits ECDH ops in 10.00s
Doing 409 bits ecdh's for 10s: 2749 409-bits ECDH ops in 10.00s
Doing 571 bits ecdh's for 10s: 2312 571-bits ECDH ops in 10.00s
Doing 163 bits ecdh's for 10s: 11020 163-bits ECDH ops in 10.00s
Doing 233 bits ecdh's for 10s: 8207 233-bits ECDH ops in 10.00s
Doing 283 bits ecdh's for 10s: 3906 283-bits ECDH ops in 10.00s
Doing 409 bits ecdh's for 10s: 2923 409-bits ECDH ops in 10.00s
Doing 571 bits ecdh's for 10s: 2302 571-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 10479 256-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 10427 256-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 5776 384-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 5759 384-bits ECDH ops in 10.00s
Doing 512 bits ecdh's for 10s: 4729 512-bits ECDH ops in 10.00s
Doing 512 bits ecdh's for 10s: 4570 512-bits ECDH ops in 10.00s
Doing 253 bits ecdh's for 10s: 79924 253-bits ECDH ops in 10.00s
Doing 448 bits ecdh's for 10s: 6417 448-bits ECDH ops in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
op op/s
160 bits ecdh (secp160r1) 0.0007s 1439.7
192 bits ecdh (nistp192) 0.0008s 1303.7
224 bits ecdh (nistp224) 0.0009s 1100.8
256 bits ecdh (nistp256) 0.0010s 1026.6
384 bits ecdh (nistp384) 0.0017s 595.8
521 bits ecdh (nistp521) 0.0022s 463.9
163 bits ecdh (nistk163) 0.0009s 1140.9
233 bits ecdh (nistk233) 0.0012s 844.2
283 bits ecdh (nistk283) 0.0024s 412.6
409 bits ecdh (nistk409) 0.0036s 274.9
571 bits ecdh (nistk571) 0.0043s 231.2
163 bits ecdh (nistb163) 0.0009s 1102.0
233 bits ecdh (nistb233) 0.0012s 820.7
283 bits ecdh (nistb283) 0.0026s 390.6
409 bits ecdh (nistb409) 0.0034s 292.3
571 bits ecdh (nistb571) 0.0043s 230.2
256 bits ecdh (brainpoolP256r1) 0.0010s 1047.9
256 bits ecdh (brainpoolP256t1) 0.0010s 1042.7
384 bits ecdh (brainpoolP384r1) 0.0017s 577.6
384 bits ecdh (brainpoolP384t1) 0.0017s 575.9
512 bits ecdh (brainpoolP512r1) 0.0021s 472.9
512 bits ecdh (brainpoolP512t1) 0.0022s 457.0
253 bits ecdh (X25519) 0.0001s 7992.4
448 bits ecdh (X448) 0.0016s 641.7
# Software
# openssl speed -elapsed ecdh
You have chosen to measure elapsed time instead of user CPU time.
Doing 160 bits ecdh's for 10s: 19934 160-bits ECDH ops in 10.00s
Doing 192 bits ecdh's for 10s: 16298 192-bits ECDH ops in 10.00s
Doing 224 bits ecdh's for 10s: 10878 224-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 54929 256-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 3968 384-bits ECDH ops in 10.00s
Doing 521 bits ecdh's for 10s: 1634 521-bits ECDH ops in 10.00s
Doing 163 bits ecdh's for 10s: 16957 163-bits ECDH ops in 10.00s
Doing 233 bits ecdh's for 10s: 12276 233-bits ECDH ops in 10.01s
Doing 283 bits ecdh's for 10s: 7125 283-bits ECDH ops in 10.00s
Doing 409 bits ecdh's for 10s: 4239 409-bits ECDH ops in 10.00s
Doing 571 bits ecdh's for 10s: 1940 571-bits ECDH ops in 10.00s
Doing 163 bits ecdh's for 10s: 16276 163-bits ECDH ops in 10.00s
Doing 233 bits ecdh's for 10s: 11936 233-bits ECDH ops in 10.00s
Doing 283 bits ecdh's for 10s: 6797 283-bits ECDH ops in 10.00s
Doing 409 bits ecdh's for 10s: 4020 409-bits ECDH ops in 10.00s
Doing 571 bits ecdh's for 10s: 1809 571-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 9784 256-bits ECDH ops in 10.00s
Doing 256 bits ecdh's for 10s: 9779 256-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 3976 384-bits ECDH ops in 10.00s
Doing 384 bits ecdh's for 10s: 4026 384-bits ECDH ops in 10.00s
Doing 512 bits ecdh's for 10s: 2290 512-bits ECDH ops in 10.00s
Doing 512 bits ecdh's for 10s: 2192 512-bits ECDH ops in 10.01s
Doing 253 bits ecdh's for 10s: 79917 253-bits ECDH ops in 10.00s
Doing 448 bits ecdh's for 10s: 6411 448-bits ECDH ops in 10.00s
OpenSSL 1.1.1i 8 Dec 2020
built on: Sat Jan 30 15:32:43 2021 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fpic -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -fpic -specs=/opt/openwrt/x86/master/openwrt/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
op op/s
160 bits ecdh (secp160r1) 0.0005s 1993.4
192 bits ecdh (nistp192) 0.0006s 1629.8
224 bits ecdh (nistp224) 0.0009s 1087.8
256 bits ecdh (nistp256) 0.0002s 5492.9
384 bits ecdh (nistp384) 0.0025s 396.8
521 bits ecdh (nistp521) 0.0061s 163.4
163 bits ecdh (nistk163) 0.0006s 1695.7
233 bits ecdh (nistk233) 0.0008s 1226.4
283 bits ecdh (nistk283) 0.0014s 712.5
409 bits ecdh (nistk409) 0.0024s 423.9
571 bits ecdh (nistk571) 0.0052s 194.0
163 bits ecdh (nistb163) 0.0006s 1627.6
233 bits ecdh (nistb233) 0.0008s 1193.6
283 bits ecdh (nistb283) 0.0015s 679.7
409 bits ecdh (nistb409) 0.0025s 402.0
571 bits ecdh (nistb571) 0.0055s 180.9
256 bits ecdh (brainpoolP256r1) 0.0010s 978.4
256 bits ecdh (brainpoolP256t1) 0.0010s 977.9
384 bits ecdh (brainpoolP384r1) 0.0025s 397.6
384 bits ecdh (brainpoolP384t1) 0.0025s 402.6
512 bits ecdh (brainpoolP512r1) 0.0044s 229.0
512 bits ecdh (brainpoolP512t1) 0.0046s 219.0
253 bits ecdh (X25519) 0.0001s 7991.7
448 bits ecdh (X448) 0.0016s 641.1
I've copy/pasted from the other thread. This benchmark is run on the less capable C2758.
It's a crypto-accelerator AES vs AES-NI benchmark.
See how anything less than an 8K buffer is faster using the AES-NI version than using the crypto hardware. Only the really large buffers benefit from the crypto hardware.
root@OpenWrt:~# openssl speed -elapsed -engine qat -async_jobs 32 -multi 2 -evp aes-128-cbc-hmac-sha1
90743.27k 199864.47k 298705.41k 353766.06k 524913.32k 613946.71k
root@OpenWrt:~# openssl speed -elapsed -async_jobs 32 -multi 2 -evp aes-128-cbc-hmac-sha1
153131.80k 257124.37k 329312.17k 364508.16k 376572.59k 377416.36k
Takeaway from all this: don't bother with crypto hardware for OpenWrt. The only thing that will make a real difference is AES-NI.
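The crossover point can be read straight off the two rows above. A quick Python sketch, assuming the six columns are the default openssl speed buffer sizes (16 to 16384 bytes), which the pasted rows do not state explicitly:

```python
# Assumed column order: the default openssl speed buffer sizes in bytes
SIZES = [16, 64, 256, 1024, 8192, 16384]

# Throughput in 1000s of bytes per second, copied from the two C2758 runs above
QAT = [90743.27, 199864.47, 298705.41, 353766.06, 524913.32, 613946.71]
AESNI = [153131.80, 257124.37, 329312.17, 364508.16, 376572.59, 377416.36]


def first_win(sizes, candidate, baseline):
    """Smallest buffer size at which candidate beats baseline, or None."""
    for size, cand, base in zip(sizes, candidate, baseline):
        if cand > base:
            return size
    return None


if __name__ == "__main__":
    print("engine first wins at", first_win(SIZES, QAT, AESNI), "bytes")
```

Typical VPN packets are well under that size, which is the point being made about where the engine actually pays off.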
AES 128 vs AES 256:
root@OpenWrt:~# openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 18805661 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 14574331 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 7468735 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2609682 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 368604 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 186014 aes-128-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 100296.86k 310919.06k 637332.05k 890771.46k 1006534.66k 1015884.46k
root@OpenWrt:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17664041 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 12219316 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5379637 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1696935 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 229447 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 115396 aes-256-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 94208.22k 260678.74k 459062.36k 579220.48k 626543.27k 630216.02k
root@OpenWrt:~#
w/ and w/o EVP:
root@OpenWrt:~# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 17608820 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 12195151 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 5379602 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1696386 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 229437 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 16384 size blocks: 115389 aes-256-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 94227.80k 261033.33k 459059.37k 579033.09k 628611.34k 630177.79k
root@OpenWrt:~# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4905394 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 1306013 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 333727 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 83839 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 10493 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 5243 aes-256 cbc's in 2.99s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256 cbc 26162.10k 27954.79k 28478.04k 28617.05k 28652.89k 28729.54k
root@OpenWrt:~#
So it seems that the EVP ones are optimised: https://security.stackexchange.com/questions/35036/different-performance-of-openssl-speed-on-the-same-hardware-with-aes-256-evp-an
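To see how large the EVP vs non-EVP gap is per buffer size, here is a short Python sketch that pulls the throughput figures out of the two summary rows above and divides them. The parsing helper is a hypothetical convenience, not part of openssl.

```python
import re

# Summary rows copied from the two aes-256-cbc runs above
EVP_ROW = "aes-256-cbc 94227.80k 261033.33k 459059.37k 579033.09k 628611.34k 630177.79k"
PLAIN_ROW = "aes-256 cbc 26162.10k 27954.79k 28478.04k 28617.05k 28652.89k 28729.54k"


def parse_summary(line):
    """Extract the throughput figures (numbers followed by 'k') from a summary row."""
    return [float(x) for x in re.findall(r"(\d+(?:\.\d+)?)k\b", line)]


def evp_ratios(evp_line, plain_line):
    """Per-column ratio of EVP throughput to non-EVP throughput."""
    return [e / p for e, p in zip(parse_summary(evp_line), parse_summary(plain_line))]


if __name__ == "__main__":
    for ratio in evp_ratios(EVP_ROW, PLAIN_ROW):
        print(f"{ratio:.1f}x")
```

The ratio grows with buffer size because the non-EVP path stays almost flat while the EVP path keeps scaling.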
manually built openssl vs opkg openssl-util
root@OpenWrt:~# ./openssl-static speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 21175430 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 64 size blocks: 13804150 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 5672902 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1724868 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 230127 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 115459 aes-256-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Thu Mar 18 11:09:33 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 113313.34k 294488.53k 484087.64k 590724.02k 628400.13k 630560.09k
root@OpenWrt:~# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 17430854 aes-256-cbc's in 2.94s
Doing aes-256-cbc for 3s on 64 size blocks: 12043668 aes-256-cbc's in 2.95s
Doing aes-256-cbc for 3s on 256 size blocks: 5256099 aes-256-cbc's in 2.91s
Doing aes-256-cbc for 3s on 1024 size blocks: 1670277 aes-256-cbc's in 2.96s
Doing aes-256-cbc for 3s on 8192 size blocks: 224852 aes-256-cbc's in 2.92s
Doing aes-256-cbc for 3s on 16384 size blocks: 112725 aes-256-cbc's in 2.94s
OpenSSL 1.1.1j 16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 94861.79k 261286.36k 462392.21k 577825.56k 630817.67k 628192.65k
root@OpenWrt:~#
Sure - non evp does not use AES-NI. It's a pure software implementation
Aha, that makes more sense now. I scrolled up and just noticed that your very first benchmarks were without the evp parameter. My mistake, I should have noticed earlier and asked you to correct it, when I said that the latter benchmark did not make sense to me.
These look like AES-NI optimized numbers!
Thank you so much for all your efforts in running and sharing these benchmarks! Very helpful in understanding what is going on.
Can you also share the full output of 'cat /proc/crypto'? I am still a bit mystified on why it showed a priority of 100 on aes, for that single list you posted? As per OpenWrt documentation, we should have seen at least one entry for aes with > 100 priority due to AES-NI support.
I think your data is indeed quite compelling. Thank you again, for all your hard work in digging up the benchmarks and sharing them here.
In addition, the data posted by @jiegec for a router with no crypto engine but with AES support, fit very nicely with your point. I think perhaps the only piece missing would be for somebody to post similar data for a Linksys WRT family router (WRT1200, WRT1900, WRT3200, WRT32X) which are Marvell Armada based routers with crypto engine but with no AES support. It would be very interesting to see their numbers.
If anybody reading this post on these forums has any of these Linksys routers with Marvell Armada CPU, and can run these two benchmarks of "openssl speed -elapsed -evp aes-256-cbc" and "openssl speed -elapsed -evp aes-128-cbc" as well as the output of their "cat /proc/crypto", I would appreciate it. Thanks.
Here it is:
name : jitterentropy_rng
driver : jitterentropy_rng
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : ecdh
driver : ecdh-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : kpp
name : stdrng
driver : drbg_nopr_hmac_sha256
module : kernel
priority : 207
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_nopr_hmac_sha512
module : kernel
priority : 206
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_nopr_hmac_sha384
module : kernel
priority : 205
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_nopr_hmac_sha1
module : kernel
priority : 204
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_pr_hmac_sha256
module : kernel
priority : 203
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_pr_hmac_sha512
module : kernel
priority : 202
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_pr_hmac_sha384
module : kernel
priority : 201
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : stdrng
driver : drbg_pr_hmac_sha1
module : kernel
priority : 200
refcnt : 1
selftest : passed
internal : no
type : rng
seedsize : 0
name : lzo-rle
driver : lzo-rle-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : lzo-rle
driver : lzo-rle-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : compression
name : lzo
driver : lzo-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : lzo
driver : lzo-generic
module : kernel
priority : 0
refcnt : 2
selftest : passed
internal : no
type : compression
name : crc32
driver : crc32-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 4
name : crc32c
driver : crc32c-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 4
name : zlib-deflate
driver : zlib-deflate-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : deflate
driver : deflate-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : deflate
driver : deflate-generic
module : kernel
priority : 0
refcnt : 2
selftest : passed
internal : no
type : compression
name : aes
driver : aes-generic
module : kernel
priority : 100
refcnt : 4
selftest : passed
internal : no
type : cipher
blocksize : 16
min keysize : 16
max keysize : 32
name : sha384
driver : sha384-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 48
name : sha512
driver : sha512-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 64
name : sha224
driver : sha224-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 28
name : sha256
driver : sha256-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 32
name : sha1
driver : sha1-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 20
name : ecb(cipher_null)
driver : ecb-cipher_null
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 0
max keysize : 0
ivsize : 0
chunksize : 1
walksize : 1
name : digest_null
driver : digest_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 0
name : compress_null
driver : compress_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : compression
name : cipher_null
driver : cipher_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 1
min keysize : 0
max keysize : 0
I guess the conclusion is that, in /proc/crypto, AES-NI does not increase the priority (based on your data), while a crypto engine does increase it (based on @slh's data).
Thanks to you and @slh for helping figure that out.
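For anyone who wants to compare these listings without reading them by eye, here is a minimal Python sketch that picks the highest-priority driver per algorithm name, which is how the kernel chooses between several implementations registered under the same name. The function names and the SAMPLE text are my own illustration (SAMPLE is a trimmed excerpt in the same format as the listings in this thread), not something from OpenWrt. Also keep in mind that, as far as I know, /proc/crypto only lists implementations registered with the kernel crypto API; OpenSSL's AES-NI path executes the instructions directly in userspace, so it would not necessarily show up there.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered algorithm."""
    entries = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank separator lines between entries
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            # every entry starts with a "name" field
            current = {}
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


def best_driver(entries, name):
    """Return (driver, priority) of the highest-priority entry for `name`."""
    candidates = [e for e in entries if e.get("name") == name]
    if not candidates:
        return None
    top = max(candidates, key=lambda e: int(e.get("priority", 0)))
    return top["driver"], int(top["priority"])


# Illustrative excerpt only; on a real device read /proc/crypto instead.
SAMPLE = """\
name : aes
driver : aes-arm
priority : 200
name : aes
driver : aes-generic
priority : 100
name : cbc(aes)
driver : mv-cbc-aes
priority : 300
"""

entries = parse_proc_crypto(SAMPLE)
print(best_driver(entries, "aes"))
print(best_driver(entries, "cbc(aes)"))
```

On a device you would pass open("/proc/crypto").read() instead of SAMPLE. Note that "aes" and "cbc(aes)" are separate names, so a hardware cbc(aes) driver can exist while plain aes is still served by a software driver.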
If the goal is VPN, have you considered using WireGuard as an alternative to OpenVPN? I am in a similar position, in that I have a handful of mvebu devices that all support /dev/crypto. All my builds have it enabled, so it's there.... but the reality is that since I switched over to WG, I am not looking back. Just something to think about before you limit your choices based on getting HW accel for OVPN.
For example, I pull on average 600+ Mbps over WireGuard using only the native implementation, with no HW accel available. That demolishes anything my system can do EVEN when using /dev/crypto.
Great point. As attractive as WireGuard is, I believe that it is not as prevalent as OpenVPN. So for the near future, I think OpenVPN will have to be a use case to consider. Also, the hope is that if a WiFi router satisfies the OpenVPN use case, then it is reasonable to assume that the WireGuard use case will also be more than covered. The reverse may not be true.
Since you own mvebu devices with hardware acceleration enabled, you would be a great resource to help gather data for the missing piece that I referred to in one of my earlier posts. Would it be possible for you to run "openssl speed -elapsed -evp aes-256-cbc" and "openssl speed -elapsed -evp aes-128-cbc" on your mvebu devices and share the results here, along with the output of "cat /proc/crypto"? It would be very helpful to complete the picture that has been drawn so far with the kind contributions of @slh, @dl12345 and @jiegec above. Thanks.
root@OpenWrt:~# openssl engine -t -c -pre DUMP_INFO devcrypto
(devcrypto) /dev/crypto engine
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, driver=mv-cbc-des (hw accelerated)
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, driver=mv-cbc-des3-ede (hw accelerated)
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated)
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated)
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, CIOCGSESSION (session open call) failed
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, CIOCGSESSION (session open call) failed
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, CIOCGSESSION (session open call) failed
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated)
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=mv-ecb-aes (hw accelerated)
Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=mv-md5 (hw accelerated), CIOCCPHASH capable
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=mv-sha1 (hw accelerated), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=sha224-neon (software), CIOCCPHASH capable
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=mv-sha256 (hw accelerated), CIOCCPHASH capable
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=sha384-neon (software), CIOCCPHASH capable
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=sha512-neon (software), CIOCCPHASH capable
[Success]: DUMP_INFO
[DES-CBC, DES-EDE3-CBC, AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-ECB, AES-192-ECB, AES-256-ECB]
[ available ]
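In case it helps others compare their own DUMP_INFO output, here is a rough Python sketch that sorts the lines above into hardware-backed and software-backed algorithms. The regular expression is only my reading of the line format shown in this post, and the function name is made up for illustration; other OpenSSL versions may print the lines differently.

```python
import re

# Matches lines such as:
#   Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated)
# Lines where the session open call failed carry no "(hw accelerated)" or
# "(software)" tag and are skipped.
LINE_RE = re.compile(
    r"^(?:Cipher|Digest) (?P<alg>[^,]+), NID=\d+, /dev/crypto info: id=\d+, "
    r"driver=(?P<driver>\S+) \((?P<kind>hw accelerated|software)\)"
)


def split_by_backend(dump):
    """Return (hw, sw) lists of (algorithm, driver) pairs from a DUMP_INFO dump."""
    hw, sw = [], []
    for line in dump.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        pair = (m.group("alg"), m.group("driver"))
        (hw if m.group("kind") == "hw accelerated" else sw).append(pair)
    return hw, sw


# Four lines copied from the output above.
DUMP = """\
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=mv-cbc-aes (hw accelerated)
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=sha224-neon (software), CIOCCPHASH capable
"""

hw, sw = split_by_backend(DUMP)
print("hw:", hw)
print("sw:", sw)
```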
root@OpenWrt:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 5910428 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1711964 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 454575 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 115368 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 14497 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 7242 aes-256-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Wed Mar 17 09:01:21 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: arm-openwrt-linux-muslgnueabi-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 31522.28k 36521.90k 38790.40k 39378.94k 39586.47k 39550.98k
root@OpenWrt:~# openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 7215941 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 2194148 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 591696 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 150850 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 18294 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 9496 aes-128-cbc's in 3.00s
OpenSSL 1.1.1j 16 Feb 2021
built on: Wed Mar 17 09:01:21 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: arm-openwrt-linux-muslgnueabi-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 38485.02k 46808.49k 50491.39k 51490.13k 49954.82k 51860.82k
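As a sanity check on how the summary rows relate to the "Doing ..." lines, the figure in each column is just count × block size ÷ elapsed seconds, expressed in thousands of bytes per second. A small Python sketch of that arithmetic follows; the regular expression and function name are my own and assume the line format printed above.

```python
import re

# Assumed format, taken from the output above:
#   Doing aes-256-cbc for 3s on 16 size blocks: 5910428 aes-256-cbc's in 3.00s
DOING_RE = re.compile(
    r"Doing (?P<alg>\S+) for \d+s on (?P<size>\d+) size blocks: "
    r"(?P<count>\d+) \S+ in (?P<secs>[\d.]+)s"
)


def throughput_k(line):
    """Return throughput in 1000s of bytes per second for one 'Doing' line."""
    m = DOING_RE.search(line)
    if m is None:
        raise ValueError("not an openssl speed 'Doing' line: %r" % line)
    total_bytes = int(m.group("count")) * int(m.group("size"))
    return total_bytes / float(m.group("secs")) / 1000.0


aes256_16 = "Doing aes-256-cbc for 3s on 16 size blocks: 5910428 aes-256-cbc's in 3.00s"
aes128_16 = "Doing aes-128-cbc for 3s on 16 size blocks: 7215941 aes-128-cbc's in 3.00s"

print("%.2fk" % throughput_k(aes256_16))  # 31522.28k, matches the 16 bytes column
print("%.2fk" % throughput_k(aes128_16))  # 38485.02k, matches the 16 bytes column
```

This makes it easier to compare runs from different devices when only the raw "Doing ..." lines were posted.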
root@OpenWrt:~# cat /proc/crypto
name : poly1305
driver : poly1305-neon
module : poly1305_arm
priority : 200
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 16
digestsize : 16
name : poly1305
driver : poly1305-arm
module : poly1305_arm
priority : 150
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 16
digestsize : 16
name : xchacha12
driver : xchacha12-neon
module : chacha_neon
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 32
chunksize : 64
walksize : 256
name : xchacha20
driver : xchacha20-neon
module : chacha_neon
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 32
chunksize : 64
walksize : 256
name : chacha20
driver : chacha20-neon
module : chacha_neon
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 16
chunksize : 64
walksize : 256
name : xchacha12
driver : xchacha12-arm
module : chacha_neon
priority : 200
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 32
chunksize : 64
walksize : 64
name : xchacha20
driver : xchacha20-arm
module : chacha_neon
priority : 200
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 32
chunksize : 64
walksize : 64
name : chacha20
driver : chacha20-arm
module : chacha_neon
priority : 200
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 32
max keysize : 32
ivsize : 16
chunksize : 64
walksize : 64
name : hmac(sha256)
driver : mv-hmac-sha256
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 32
name : hmac(sha1)
driver : mv-hmac-sha1
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 20
name : hmac(md5)
driver : mv-hmac-md5
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 16
name : sha256
driver : mv-sha256
module : kernel
priority : 300
refcnt : 689
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 32
name : sha1
driver : mv-sha1
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 20
name : md5
driver : mv-md5
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : ahash
async : yes
blocksize : 64
digestsize : 16
name : cbc(aes)
driver : mv-cbc-aes
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 16
min keysize : 16
max keysize : 32
ivsize : 16
chunksize : 16
walksize : 16
name : ecb(aes)
driver : mv-ecb-aes
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 16
min keysize : 16
max keysize : 32
ivsize : 0
chunksize : 16
walksize : 16
name : cbc(des3_ede)
driver : mv-cbc-des3-ede
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 8
min keysize : 24
max keysize : 24
ivsize : 8
chunksize : 8
walksize : 8
name : ecb(des3_ede)
driver : mv-ecb-des3-ede
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 8
min keysize : 24
max keysize : 24
ivsize : 8
chunksize : 8
walksize : 8
name : cbc(des)
driver : mv-cbc-des
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 8
min keysize : 8
max keysize : 8
ivsize : 8
chunksize : 8
walksize : 8
name : ecb(des)
driver : mv-ecb-des
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 8
min keysize : 8
max keysize : 8
ivsize : 0
chunksize : 8
walksize : 8
name : sha512
driver : sha512-neon
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 64
name : sha384
driver : sha384-neon
module : kernel
priority : 300
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 48
name : sha512
driver : sha512-arm
module : kernel
priority : 250
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 64
name : sha384
driver : sha384-arm
module : kernel
priority : 250
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 128
digestsize : 48
name : sha224
driver : sha224-neon
module : kernel
priority : 250
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 28
name : sha256
driver : sha256-neon
module : kernel
priority : 250
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 32
name : sha224
driver : sha224-asm
module : kernel
priority : 150
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 28
name : sha256
driver : sha256-asm
module : kernel
priority : 150
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 32
name : sha1
driver : sha1-neon
module : kernel
priority : 250
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 20
name : sha1
driver : sha1-asm
module : kernel
priority : 150
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 20
name : aes
driver : aes-arm
module : kernel
priority : 200
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 16
min keysize : 16
max keysize : 32
name : lzo-rle
driver : lzo-rle-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : lzo-rle
driver : lzo-rle-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : compression
name : lzo
driver : lzo-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : lzo
driver : lzo-generic
module : kernel
priority : 0
refcnt : 2
selftest : passed
internal : no
type : compression
name : crc32
driver : crc32-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 4
name : crc32c
driver : crc32c-generic
module : kernel
priority : 100
refcnt : 2
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 4
name : zlib-deflate
driver : zlib-deflate-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : deflate
driver : deflate-scomp
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : scomp
name : deflate
driver : deflate-generic
module : kernel
priority : 0
refcnt : 2
selftest : passed
internal : no
type : compression
name : aes
driver : aes-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 16
min keysize : 16
max keysize : 32
name : des3_ede
driver : des3_ede-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 8
min keysize : 24
max keysize : 24
name : des
driver : des-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 8
min keysize : 8
max keysize : 8
name : sha1
driver : sha1-generic
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 64
digestsize : 20
name : ecb(cipher_null)
driver : ecb-cipher_null
module : kernel
priority : 100
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : no
blocksize : 1
min keysize : 0
max keysize : 0
ivsize : 0
chunksize : 1
walksize : 1
name : digest_null
driver : digest_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : shash
blocksize : 1
digestsize : 0
name : compress_null
driver : compress_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : compression
name : cipher_null
driver : cipher_null-generic
module : kernel
priority : 0
refcnt : 1
selftest : passed
internal : no
type : cipher
blocksize : 1
min keysize : 0
max keysize : 0