Crypto (HW acceleration) not working on APU2? (with tests)

Hi there,

First: maybe I'm reading or understanding the speed results completely wrong and the setup with crypto actually is faster than without it (it wouldn't be the first time), but:

Again I need your help, I'm all out of ideas. I followed this guide, which I came across on the OpenWrt device page. The device page says:

The AMD GX-412TC supports the [AES-NI instruction set](https://en.wikipedia.org/wiki/AES_instruction_set), which works without any kernel module or specific configuration.
The SoC also contains a cryptographic co-processor (AMD CCP), which requires kmod-crypto-hw-ccp to be installed. The CCP can be utilized to speed up various cryptographic algorithms in kernel space, like IPSec hashing for example. See Cryptographic Hardware Accelerators on how to enable /dev/crypto and configure userspace libraries like OpenSSL to take advantage of it. AES-GCM is currently the best security vs performance trade off.
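Whether AES-NI is actually exposed can be checked on the box itself: on x86 Linux the CPU advertises AES-NI through the `aes` flag in `/proc/cpuinfo`. A minimal Python sketch, assuming the usual `flags : ...` line format; the function name `has_aesni` is hypothetical and the test strings are made up, not captured from this APU2.

```python
from pathlib import Path


def has_aesni(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo lists the 'aes' flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Split on whitespace so 'aes' is matched as a whole flag, not as part of 'vaes'.
            if "aes" in value.split():
                return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("AES-NI available:", has_aesni(cpuinfo.read_text()))
```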

The device pages on OpenWrt and PC Engines also say it supports AES-NI hardware acceleration.

But the crypto tutorial on how to enable it says, after step 4:

beware that AES-NI and similar CPU instructions will have a high priority as well, and do not need /dev/crypto or AF_ALG to be used!

So do we still need the kmod-crypto-hw-ccp package and the crypto setup? Without the crypto setup everything seems to be supported as well and I get the same speeds. Or am I doing something wrong?

I also ran the test with AF_ALG; it is a little slower than devcrypto but gives almost the same results. I only enabled CIPHERS=AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR (I found these ciphers through the tutorial, output is below) and left the ECB ciphers disabled because:

It is recommended to disable the ECB ciphers; in most cases, it will only be used for PRNG, in small blocks, where performance is poor, and there may be problems with apps forking with open crypto.

The speeds with devcrypto enabled and all drivers allowed (USE_SOFTDRIVERS = 1):

time openssl speed -evp AES-256-cbc -engine devcrypto -elapsed
engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 721307 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 688297 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 573673 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 341695 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 70673 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 37240 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc       3846.97k    14683.67k    48953.43k   116631.89k   192984.41k   203380.05k
real    0m 18.28s
user    0m 1.97s
sys     0m 16.04s
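For reading these tables: as the output says, the numbers are in 1000s of bytes per second, i.e. operation count times block size divided by the elapsed time, divided by 1000. A minimal Python sketch (the helper name `kbytes_per_sec` is hypothetical) that redoes this arithmetic for the USE_SOFTDRIVERS = 1 run, using the counts from the "Doing aes-256-cbc ..." lines:

```python
def kbytes_per_sec(ops: int, block_size: int, seconds: float) -> float:
    """Throughput as printed by 'openssl speed': ops * block size / time, in 1000s of bytes/s."""
    return ops * block_size / seconds / 1000


# Operation counts copied from the USE_SOFTDRIVERS = 1 run above (3.00 s per block size).
run_softdrivers = {16: 721307, 64: 688297, 256: 573673, 1024: 341695, 8192: 70673, 16384: 37240}

for size, ops in run_softdrivers.items():
    print(f"{size:>6} bytes: {kbytes_per_sec(ops, size, 3.0):.2f}k")
```

This reproduces the aes-256-cbc row above, from 3846.97k at 16 bytes to 203380.05k at 16384 bytes. Separately, the `time` output of this run is mostly `sys` (16.04 s) rather than `user` (1.97 s), while the two runs below are almost entirely `user` time, which suggests that only this run actually hands the AES work to the kernel through /dev/crypto.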

The same test with devcrypto enabled and only accelerated drivers allowed (USE_SOFTDRIVERS = 2 or 0):

root@OpenWrt:~# time openssl speed -evp AES-256-cbc -engine devcrypto -elapsed
engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17365778 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 7273655 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2300251 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 621176 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 78882 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 39452 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      92617.48k   155171.31k   196288.09k   212028.07k   215400.45k   215460.52k
real    0m 18.27s
user    0m 18.01s
sys     0m 0.00s

Without any crypto packages, after checking that ccp_crypto wasn't loaded:

time openssl speed -evp AES-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17579600 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 7377116 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2326029 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 621836 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 79136 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 39610 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      93757.87k   157378.47k   198487.81k   212253.35k   216094.04k   216323.41k
real    0m 18.03s
user    0m 18.01s
sys     0m 0.00s

It gives the same speeds with or without the crypto packages. So does the APU2 need the crypto packages at all, or am I doing something wrong?

  • Allowing the software drivers (USE_SOFTDRIVERS = 1) is much slower on the smaller block sizes (3846.97k vs 92617.48k at 16 bytes) and slightly slower on the largest ones; everything else is the same.
  • The other encryption methods behaved the same: different numbers, of course, but no difference between with and without the crypto packages.
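To put a number on "the same speeds", here is a minimal Python sketch that compares the three aes-256-cbc rows copied from the runs above, using the run without any engine as the baseline. The variable and function names are illustrative only.

```python
# aes-256-cbc rows (in 1000s of bytes/s) copied from the three runs above,
# for block sizes 16, 64, 256, 1024, 8192 and 16384 bytes.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
devcrypto_all_drivers = [3846.97, 14683.67, 48953.43, 116631.89, 192984.41, 203380.05]
devcrypto_accelerated_only = [92617.48, 155171.31, 196288.09, 212028.07, 215400.45, 215460.52]
no_engine = [93757.87, 157378.47, 198487.81, 212253.35, 216094.04, 216323.41]


def relative_to_baseline(run, baseline):
    """Return run / baseline per block size (1.0 means identical throughput)."""
    return [r / b for r, b in zip(run, baseline)]


if __name__ == "__main__":
    for name, run in [("devcrypto, USE_SOFTDRIVERS=1", devcrypto_all_drivers),
                      ("devcrypto, USE_SOFTDRIVERS=2/0", devcrypto_accelerated_only)]:
        ratios = relative_to_baseline(run, no_engine)
        print(name, " ".join(f"{size}:{ratio:.3f}" for size, ratio in zip(BLOCK_SIZES, ratios)))
```

With these numbers the accelerated-only devcrypto run stays within about 1.5% of plain OpenSSL at every block size, while the all-drivers run reaches only about 4% of it at 16-byte blocks and about 94% at 16384 bytes.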

I still continued with the tutorial and collected the supported algorithms from cat /proc/crypto: everything with type skcipher or shash and priority >= 300:

name         : __ecb(aes)
driver       : cryptd(__ecb-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : __ctr(aes)
driver       : cryptd(__ctr-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __cbc(aes)
driver       : cryptd(__cbc-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : xts(aes)
driver       : xts-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16


name         : rfc3686(ctr(aes))
driver       : rfc3686-ctr-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 20
max keysize  : 36
ivsize       : 8
chunksize    : 1
walksize     : 1

name         : ctr(aes)
driver       : ctr-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : ofb(aes)
driver       : ofb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : cfb(aes)
driver       : cfb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : cbc(aes)
driver       : cbc-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ecb(aes)
driver       : ecb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : xts(aes)
driver       : xts-aes-aesni
module       : kernel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ctr(aes)
driver       : ctr-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : cbc(aes)
driver       : cbc-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ecb(aes)
driver       : ecb-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : __xts(aes)
driver       : __xts-aes-aesni
module       : kernel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __ctr(aes)
driver       : __ctr-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __cbc(aes)
driver       : __cbc-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __ecb(aes)
driver       : __ecb-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16
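The tutorial's remark that AES-NI "will have a high priority" refers to the `priority` field above: when several drivers implement the same algorithm name, the kernel crypto API generally picks the one with the highest priority. A minimal Python sketch of that selection over `/proc/crypto`-style text, assuming blank-line-separated entries with `key : value` fields. It skips `internal : yes` entries, which are helpers that normal lookups do not hand out, and it is an illustration, not the kernel's actual lookup code. The sample is an excerpt of the entries listed above; the function names are hypothetical.

```python
from pathlib import Path

# Excerpt of the /proc/crypto entries shown above.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-ccp
module       : ccp_crypto
priority     : 300
internal     : no

name         : cbc(aes)
driver       : cbc-aes-aesni
module       : kernel
priority     : 400
internal     : no

name         : __cbc(aes)
driver       : cryptd(__cbc-aes-aesni)
module       : kernel
priority     : 450
internal     : yes
"""


def parse_proc_crypto(text: str) -> list[dict[str, str]]:
    """Split /proc/crypto text into one dict per blank-line-separated entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_drivers(entries: list[dict[str, str]]) -> dict[str, str]:
    """Map each algorithm name to the non-internal driver with the highest priority."""
    best: dict[str, tuple[int, str]] = {}
    for e in entries:
        if e.get("internal") == "yes":
            continue  # internal helpers are skipped, as for normal lookups
        prio = int(e["priority"])
        if e["name"] not in best or prio > best[e["name"]][0]:
            best[e["name"]] = (prio, e["driver"])
    return {name: driver for name, (_, driver) in best.items()}


if __name__ == "__main__":
    path = Path("/proc/crypto")
    text = path.read_text() if path.exists() else SAMPLE
    for name, driver in sorted(preferred_drivers(parse_proc_crypto(text)).items()):
        print(f"{name:24} -> {driver}")
```

Applied to all the entries listed above, this picks the aesni driver for cbc(aes), ctr(aes), ecb(aes) and xts(aes) (priority 400/401 vs 300 for ccp), while among the listed entries the ccp driver is the only provider for rfc3686(ctr(aes)), ofb(aes) and cfb(aes).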

The algorithms supported according to the devcrypto engine:

root@OpenWrt:~# openssl engine -pre DUMP_INFO devcrypto
(devcrypto) /dev/crypto engine
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, CIOCGSESSION (session open call) failed
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, CIOCGSESSION (session open call) failed
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)

Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=sha1-ccp (hw accelerated), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=sha224-ccp (hw accelerated), CIOCCPHASH capable
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=sha256-ccp (hw accelerated), CIOCCPHASH capable
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=unknown. CIOCGSESSION (session open) failed

[Success]: DUMP_INFO
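The DUMP_INFO lines can be grouped mechanically by how the engine classified each driver. A minimal Python sketch, assuming the line format shown above (`Cipher|Digest NAME, NID=N, /dev/crypto info: ..., driver=DRIVER (software|hw accelerated)`); the regex and function name are hypothetical, not part of OpenSSL.

```python
import re

# A few lines copied from the DUMP_INFO output above.
SAMPLE = """\
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=sha256-ccp (hw accelerated), CIOCCPHASH capable
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=unknown. CIOCGSESSION (session open) failed
"""

# Matches only lines where the engine reports a driver and whether it is accelerated.
LINE_RE = re.compile(
    r"^(Cipher|Digest) ([^,]+), NID=\d+, .*driver=(\S+) \((software|hw accelerated)\)"
)


def classify(dump: str) -> dict[str, list[tuple[str, str]]]:
    """Group (algorithm, driver) pairs by 'software' / 'hw accelerated'."""
    groups: dict[str, list[tuple[str, str]]] = {"software": [], "hw accelerated": []}
    for line in dump.splitlines():
        m = LINE_RE.match(line)
        if m:
            _, algo, driver, kind = m.groups()
            groups[kind].append((algo, driver))
    return groups


if __name__ == "__main__":
    for kind, pairs in classify(SAMPLE).items():
        print(kind, pairs)
```

Run over the full output above, only SHA1, SHA224 and SHA256 land in the hw accelerated group (ccp drivers); every AES cipher maps to an aesni driver that the engine marks as (software).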

And the hardware acceleration check output:

openssl engine -t -c
(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
     [ available ]