I recently purchased an SBC with an Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz, hoping that OpenWRT would benefit from its AES-NI support for VPN performance (OpenVPN). Unfortunately, I have not been able to get OpenWRT to use hardware crypto acceleration. I have already tried installing the libopenssl-devcrypto module, but that did not help. I am currently running 19.07.3.
What is necessary to activate AES-NI support in OpenWRT?
root@routegateway:~# time openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 101430319 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 28768441 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 7312470 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1835629 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 231995 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 115916 aes-256-cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 540961.70k 613726.74k 623997.44k 626561.37k 633501.01k 633055.91k
real 0m 18.00s
user 0m 18.00s
sys 0m 0.00s
I have now tested my system with and without the "-evp" argument to openssl:
root@routegateway:~# openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 94642867 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 28761073 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 7317344 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 1836397 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 229774 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 114861 aes-256-cbc's in 2.99s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-cbc 504761.96k 615621.63k 624413.35k 628919.91k 627436.20k 629392.18k
root@routegateway:~# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 21879012 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 5601877 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1409064 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 354639 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 44359 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 22178 aes-256 cbc's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256 cbc 117078.33k 119506.71k 120642.27k 121050.11k 121129.64k 121121.45k
root@routegateway:~# openssl speed -evp aes-256-gcm
Doing aes-256-gcm for 3s on 16 size blocks: 49888540 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 38115307 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 17084401 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 6302964 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 929136 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 16384 size blocks: 471805 aes-256-gcm's in 3.00s
OpenSSL 1.1.1g 21 Apr 2020
built on: Sun Aug 2 16:16:00 2020 UTC
options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-256-gcm 266072.21k 813126.55k 1457868.89k 2151411.71k 2537160.70k 2576684.37k
root@routegateway:~# openssl speed aes-256-gcm
speed: Unknown algorithm aes-256-gcm
There is clearly a difference for CBC, and for GCM the cipher is not even available without "-evp". Does that mean that openssl is making use of the AES-NI extension here?
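To put a number on the CBC difference, the two 16384-byte figures from the runs above can be divided. This is only a small sketch: the helper name `speedup` is made up for illustration, and it assumes an `awk` is available (BusyBox ships one on most OpenWRT images).

```shell
#!/bin/sh
# speedup FAST SLOW
# Print FAST / SLOW with two decimals. A trailing "k", as printed by
# "openssl speed", is stripped from each argument before dividing.
speedup() {
    awk -v fast="${1%k}" -v slow="${2%k}" 'BEGIN { printf "%.2f\n", fast / slow }'
}

# 16384-byte columns from the runs above:
# "-evp aes-256-cbc" versus plain "aes-256-cbc".
speedup 629392.18k 121121.45k
```

With these figures the "-evp" run comes out roughly five times faster than the plain one.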
The short answer is yes, unless OpenSSL was configured with "no-asm".
What OpenSSL does is not obvious. The INSTALL document describes the no-asm configuration option, but the details of what the assembler code optimizes are mostly documented only in the source code comments of the various Perl files that generate the assembler.
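One way to check whether a given build kept the assembler code is to look for the -DAESNI_ASM define in the compiler line, which is present in the output posted in the question. A minimal sketch, assuming `openssl version -a` prints the same "compiler:" line; the function name `built_with_aesni_asm` is invented for this example.

```shell
#!/bin/sh
# Read build information on stdin and succeed if the AES-NI assembler
# define is present. This only shows that the code was compiled in;
# whether it is used at run time still depends on the CPU.
built_with_aesni_asm() {
    grep -q -e '-DAESNI_ASM'
}

# Typical use (left as a comment so the sketch has no side effects):
#   openssl version -a | built_with_aesni_asm && echo "AES-NI asm compiled in"
```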
On x86, the assembly code uses the CPUID instruction (see the OPENSSL_ia32cap.pod manpage) to determine whether various instructions (AES, SSE, MMX, etc.) are available, and uses them if they are. For other processors, similar tests are performed if at all possible.
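The run-time detection can be checked from the shell in two steps: confirm that the kernel reports the "aes" CPU flag, then rerun the benchmark with the AES-NI capability bits masked out through the OPENSSL_ia32cap environment variable. The sketch below rests on some assumptions: the function names are made up, and the mask "~0x200000200000000" is intended to clear bit 57 (AES-NI) and bit 33 (PCLMULQDQ) of the capability vector described in the manpage. If the second figure drops sharply, the first run was using AES-NI.

```shell
#!/bin/sh
# Succeed if the first "flags" line of a cpuinfo-style file lists "aes".
# The path defaults to /proc/cpuinfo; another file can be passed for testing.
cpu_has_aes() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw aes
}

# Print the result line of the EVP benchmark twice: first with the CPU
# capabilities as detected, then with AES-NI and PCLMULQDQ masked out.
compare_aesni() {
    openssl speed -evp aes-256-cbc 2>/dev/null | tail -n 1
    OPENSSL_ia32cap="~0x200000200000000" \
        openssl speed -evp aes-256-cbc 2>/dev/null | tail -n 1
}

# Typical use (commented out because each benchmark takes several seconds):
#   cpu_has_aes && compare_aesni
```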