Hi guys,
I recently purchased an SBC with an Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz, hoping that OpenWRT would benefit from its AES-NI implementation for VPN performance (OpenVPN). Unfortunately, I am not able to make OpenWRT use crypto hardware acceleration. I have already tried installing the libopenssl-devcrypto module, but that did not help either. I am currently using 19.07.3.
What is necessary to activate AES-NI support in OpenWRT?
Here are some outputs:
root@routegateway:~# grep -o aes /proc/cpuinfo
aes
aes
aes
aes
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
stepping : 3
microcode : 0xc2
cpu MHz : 500.062
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips : 4608.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
.......
root@routegateway:~# openssl engine -t -c -pre DUMP_INFO
(rdrand) Intel RDRAND engine
[Failure]: DUMP_INFO
140148315393384:error:260AB089:engine routines:ENGINE_ctrl_cmd_string:invalid cmd name:crypto/engine/eng_ctrl.c:255:
[RAND]
[ available ]
(dynamic) Dynamic engine loading support
[Failure]: DUMP_INFO
140148315393384:error:260AC089:engine routines:int_ctrl_helper:invalid cmd name:crypto/engine/eng_ctrl.c:87:
140148315393384:error:260AB089:engine routines:ENGINE_ctrl_cmd_string:invalid cmd name:crypto/engine/eng_ctrl.c:255:
[ unavailable ]
root@routegateway:~# time openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 101430319 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 28768441 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 7312470 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1835629 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 231995 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 115916 aes-256-cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 540961.70k 613726.74k 623997.44k 626561.37k 633501.01k 633055.91k
real 0m 18.00s
user 0m 18.00s
sys 0m 0.00s
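The figures in the result table follow directly from the "Doing ..." lines: operations × block size ÷ seconds, expressed in units of 1000 bytes per second (the "k" suffix). A minimal sketch of that arithmetic, assuming a POSIX shell with awk (BusyBox on OpenWrt provides both); `kbps` is a hypothetical helper name, not an OpenSSL tool:

```shell
#!/bin/sh
# kbps COUNT BLOCKSIZE SECONDS
# Recompute one cell of an `openssl speed` table: COUNT operations on
# BLOCKSIZE-byte blocks in SECONDS, reported in 1000s of bytes per second.
kbps() {
    awk -v n="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.2fk\n", n * b / s / 1000 }'
}

# Counts taken from the first aes-256-cbc -evp run above
kbps 101430319 16 3.00     # -> 540961.70k (16-byte column)
kbps 115916 16384 3.00     # -> 633055.91k (16384-byte column)
```

Both values match the table, which confirms the table is bytes processed per second and not operations per second.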
root@routegateway:~# openssl engine -t -c
(rdrand) Intel RDRAND engine
[RAND]
[ available ]
(dynamic) Dynamic engine loading support
[ unavailable ]
name : cbc(aes)
driver : cbc-aes-aesni
module : kernel
priority : 400
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 16
min keysize : 16
max keysize : 32
ivsize : 16
chunksize : 16
walksize : 16
name : ecb(aes)
driver : ecb-aes-aesni
module : kernel
priority : 400
refcnt : 1
selftest : passed
internal : no
type : skcipher
async : yes
blocksize : 16
min keysize : 16
max keysize : 32
ivsize : 0
chunksize : 16
walksize : 16
name : gcm(aes)
driver : generic-gcm-aesni
module : kernel
priority : 400
refcnt : 1
selftest : passed
internal : no
type : aead
async : yes
blocksize : 1
ivsize : 12
maxauthsize : 16
.......
Thanks for any help!
Bye
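The name/driver/priority entries at the end of the output above are in the kernel's /proc/crypto format. Drivers such as cbc-aes-aesni and generic-gcm-aesni show that the kernel crypto API has AES-NI implementations registered, but those serve in-kernel users; OpenVPN in a 19.07-era setup encrypts its data channel in userspace through its TLS library (OpenSSL here), so this listing does not tell whether OpenSSL uses AES-NI. A sketch for pulling the AES-NI driver names out of that format, assuming awk; `aesni_drivers` is a hypothetical helper:

```shell
#!/bin/sh
# aesni_drivers: read text in /proc/crypto format on stdin and print every
# driver name that contains "aesni". Works whether the separator is
# "driver : x" or the kernel's padded "driver       : x".
aesni_drivers() {
    awk '$1 == "driver" && $NF ~ /aesni/ { print $NF }'
}

# Example on an excerpt of the listing above
printf 'name : cbc(aes)\ndriver : cbc-aes-aesni\nmodule : kernel\nname : gcm(aes)\ndriver : generic-gcm-aesni\n' | aesni_drivers
```

On the router itself the same helper can be run as `aesni_drivers < /proc/crypto`.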
I have gone through all these steps without success, but thanks for the hint!
I didn't think it would help you ...
One more thing:
I have now tested my system with and without the "-evp" openssl argument:
root@routegateway:~# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 94642867 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 28761073 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 7317344 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1836397 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 229774 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 114861 aes-256-cbc's in 2.99s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 504761.96k 615621.63k 624413.35k 628919.91k 627436.20k 629392.18k
root@routegateway:~# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 21879012 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 5601877 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1409064 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 354639 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 44359 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 22178 aes-256 cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256 cbc 117078.33k 119506.71k 120642.27k 121050.11k 121129.64k 121121.45k
root@routegateway:~# openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: 49888540 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 38115307 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 17084401 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 6302964 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 929136 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 16384 size blocks: 471805 aes-256-gcm's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-gcm 266072.21k 813126.55k 1457868.89k 2151411.71k 2537160.70k 2576684.37k
root@routegateway:~# openssl speed aes-256-gcm
speed: Unknown algorithm aes-256-gcm
Obviously there is a difference for CBC, and for GCM the cipher is not even available without "-evp". Does that mean that openssl uses the AES-NI extension here?
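My understanding, not checked against this exact build: without -evp, `openssl speed` in OpenSSL 1.1.1 times the low-level AES_cbc_encrypt() routine, which does not use AES-NI, while -evp goes through the EVP interface, which selects the AES-NI code at runtime when the CPU supports it. The low-level test list has no GCM entry, which would explain "Unknown algorithm". To quantify the difference, the two result rows can be divided column by column. A minimal sketch assuming awk; `speedup` is a hypothetical helper, and it reads the six throughput values from the end of each row because the non-EVP label (aes-256 cbc) contains a space:

```shell
#!/bin/sh
# speedup "FAST_ROW" "SLOW_ROW"
# Each row is a result line from `openssl speed`; the six throughput values
# (1000s of bytes/s with a trailing "k") are the last six fields.
# Prints FAST/SLOW for each block size, 16 ... 16384 bytes.
speedup() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            for (i = 0; i < 6; i++) {
                x = $(NF - 5 + i)
                sub(/k$/, "", x)              # drop the "k" unit suffix
                v[NR, i] = x + 0
            }
        }
        END {
            out = ""
            for (i = 0; i < 6; i++) {
                sep = (i ? " " : "")
                out = out sep sprintf("%.2f", v[1, i] / v[2, i])
            }
            print out
        }'
}

# aes-256-cbc rows from the -evp and non-evp runs above
speedup "aes-256-cbc 504761.96k 615621.63k 624413.35k 628919.91k 627436.20k 629392.18k" \
        "aes-256 cbc 117078.33k 119506.71k 120642.27k 121050.11k 121129.64k 121121.45k"
# -> 4.31 5.15 5.18 5.20 5.18 5.20
```

A factor of about 5 on large blocks is in line with the EVP path using a hardware-assisted implementation and the low-level path not.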
Thanks!
AFAIK, OpenSSL in OpenWrt is compiled with AES-NI support built in (note -DAESNI_ASM in the compiler line of your openssl speed output), so it should use it when available without the need to install other packages.
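Both preconditions mentioned here (the CPU advertising the aes flag, and libcrypto being built with the AES-NI assembly, visible as -DAESNI_ASM in the compiler flags) can be checked from a shell. A minimal sketch; the helper names are hypothetical, and it assumes `openssl version -f` prints the same "compiler:" line that appears in the openssl speed output above (other OpenSSL versions may print a shorter flag list):

```shell
#!/bin/sh
# cpu_has_aes: read /proc/cpuinfo-style text on stdin; succeed if a
# "flags" line lists the aes CPU feature as a whole word.
cpu_has_aes() {
    grep '^flags' | grep -qw aes
}

# build_has_aesni: read `openssl version -f` output on stdin; succeed if the
# compiler flags include -DAESNI_ASM.
build_has_aesni() {
    grep -q -e '-DAESNI_ASM'
}

if [ -r /proc/cpuinfo ] && cpu_has_aes < /proc/cpuinfo; then
    echo "CPU: aes flag present"
else
    echo "CPU: aes flag not found"
fi

if command -v openssl >/dev/null 2>&1 && openssl version -f | build_has_aesni; then
    echo "OpenSSL: built with -DAESNI_ASM"
else
    echo "OpenSSL: -DAESNI_ASM not listed (or openssl not installed)"
fi
```

Passing both checks only shows the code path exists; the runtime comparison below shows whether it is actually taken.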
You can run two different commands and see whether the performance differs.
This one should use AES-NI and should give higher performance:
openssl speed -elapsed -evp aes-128-cbc
This one sets a runtime switch that disables the use of AES-NI in OpenSSL, and should therefore give lower performance:
OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
See https://www.highgo.ca/2019/08/22/the-performance-test-on-the-aes-modes/ for more info.
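Why this particular mask works: according to OpenSSL's OPENSSL_ia32cap(3) manual page, bits 0-31 of the first 64-bit word hold CPUID leaf 1 EDX and bits 32-63 hold ECX, and a leading "~" clears the listed bits instead of setting them. AES-NI is ECX bit 25 (overall bit 57) and PCLMULQDQ, used by the fast GCM code, is ECX bit 1 (overall bit 33), so the mask turns off both. A sketch that decodes a mask, assuming a shell with 64-bit arithmetic (bash, dash and BusyBox ash typically provide it on 64-bit systems, although POSIX does not require it); `decode_ia32cap` is a hypothetical helper and only names the two bits relevant here:

```shell
#!/bin/sh
# decode_ia32cap MASK
# Print each bit set in MASK, mapped to the CPUID(1) register bit it stands
# for in the first OPENSSL_ia32cap word (bits 0-31 = EDX, 32-63 = ECX).
decode_ia32cap() {
    mask=$(( $1 ))
    bit=0
    while [ "$bit" -lt 64 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                33) name="PCLMULQDQ" ;;
                57) name="AES-NI" ;;
                *)  name="not named here" ;;
            esac
            if [ "$bit" -lt 32 ]; then
                echo "bit $bit: CPUID(1).EDX bit $bit ($name)"
            else
                echo "bit $bit: CPUID(1).ECX bit $(( bit - 32 )) ($name)"
            fi
        fi
        bit=$(( bit + 1 ))
    done
}

decode_ia32cap 0x200000200000000
# bit 33: CPUID(1).ECX bit 1 (PCLMULQDQ)
# bit 57: CPUID(1).ECX bit 25 (AES-NI)
```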
Thanks @bobafetthotmail, I guess this proves it:
root@routegateway:~# OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 37905593 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 10779104 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 2769347 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 702288 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 88129 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 44055 aes-128-cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 202163.16k 229954.22k 236317.61k 239714.30k 240650.92k 240599.04k
root@routegateway:~# openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 117879925 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 39584711 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 10062149 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2530718 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 318704 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 158373 aes-128-cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 628692.93k 844473.83k 858636.71k 863818.41k 870274.39k 864927.74k
It clearly shows that with OPENSSL_ia32cap set, the throughput is much lower (about 240 MB/s instead of about 865 MB/s at 16384-byte blocks).
Bye
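The two runs above can be wrapped in one small script so the comparison is repeatable for any cipher. A minimal sketch; `compare_aesni` is a hypothetical name, the mask is the one used in this thread, and it assumes the result row is the last line that `openssl speed` writes to stdout, as with the 1.1.1 build here. Each call takes about 36 seconds, since every `openssl speed` run tests six block sizes for 3 seconds each:

```shell
#!/bin/sh
# compare_aesni [CIPHER]
# Run the same EVP benchmark twice, normally and with AES-NI/PCLMULQDQ
# masked out via OPENSSL_ia32cap, and print only the two result rows.
compare_aesni() {
    cipher=${1:-aes-128-cbc}
    printf 'with AES-NI:    '
    openssl speed -elapsed -evp "$cipher" 2>/dev/null | tail -n 1
    printf 'without AES-NI: '
    OPENSSL_ia32cap="~0x200000200000000" \
        openssl speed -elapsed -evp "$cipher" 2>/dev/null | tail -n 1
}

# Usage on the router:
#   compare_aesni aes-128-gcm
```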
It is even more obvious with the GCM cipher:
root@routegateway:~# openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 64818679 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 40327930 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 20244667 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 8163187 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 1284854 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 654175 aes-128-gcm's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-gcm 345699.62k 860329.17k 1727544.92k 2786367.83k 3508507.99k 3572667.73k
root@routegateway:~# OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 19441905 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 5798100 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 1517519 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 385739 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 48473 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 16384 size blocks: 24269 aes-128-gcm's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-gcm 103690.16k 123692.80k 129494.95k 131665.58k 132363.61k 132541.10k
Thanks, I updated the wiki with this information.