Crypto (HW acceleration) not working? (with tests) APU2

Hi there,

First: maybe I'm reading/understanding the speed results completely wrong and it actually is faster with crypto than without (this wouldn't be the first time), but:

Once again I need your help, I'm all out of ideas. I followed this guide, which I stumbled on from the OpenWrt device page, which says:

The AMD GX-412TC supports the [AES-NI instruction set](https://en.wikipedia.org/wiki/AES_instruction_set), which works without any kernel module or specific configuration.
The SoC also contains a cryptographic co-processor (AMD CCP), which requires kmod-crypto-hw-ccp to be installed. The CCP can be utilized to speed up various cryptographic algorithms in kernel space, like IPSec hashing for example. See Cryptographic Hardware Accelerators on how to enable /dev/crypto and configure userspace libraries like OpenSSL to take advantage of it. AES-GCM is currently the best security vs performance trade off.

The device pages on OpenWrt and PC Engines also say it supports AES-NI hardware acceleration.

But the crypto tutorial on how to enable it says, after step 4:

beware that AES-NI and similar CPU instructions will have a high priority as well, and do not need /dev/crypto or AF_ALG to be used!

So do we still need the kmod-crypto-hw-ccp package and the crypto setup? It looks like everything is supported without the crypto setup as well, with the same speeds. Or am I doing something wrong?

I also ran the test with AF_ALG; it is a little slower than devcrypto but gives almost the same results. I only enabled CIPHERS=AES-128-CBC, AES-192-CBC, AES-256-CBC, AES-128-CTR, AES-192-CTR, AES-256-CTR (I found these ciphers through the tutorial; output is below) and left the ECB ciphers disabled because:

It is recommended to disable the ECB ciphers; in most cases, it will only be used for PRNG, in small blocks, where performance is poor, and there may be problems with apps forking with open crypto.

The speeds with devcrypto enabled and all drivers allowed (USE_SOFTDRIVERS = 1):

time openssl speed -evp AES-256-cbc -engine devcrypto -elapsed
engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 721307 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 688297 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 573673 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 341695 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 70673 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 37240 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc       3846.97k    14683.67k    48953.43k   116631.89k   192984.41k   203380.05k
real    0m 18.28s
user    0m 1.97s
sys     0m 16.04s

The same test with devcrypto enabled and only accelerated drivers used (USE_SOFTDRIVERS = 2 or 0):

root@OpenWrt:~# time openssl speed -evp AES-256-cbc -engine devcrypto -elapsed
engine "devcrypto" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17365778 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 7273655 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2300251 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 621176 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 78882 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 39452 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      92617.48k   155171.31k   196288.09k   212028.07k   215400.45k   215460.52k
real    0m 18.27s
user    0m 18.01s
sys     0m 0.00s

Without any crypto packages, and after checking that ccp_crypto wasn't loaded:

time openssl speed -evp AES-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 17579600 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 7377116 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 2326029 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 621836 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 79136 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 39610 aes-256-cbc's in 3.00s
OpenSSL 1.1.1o  3 May 2022
built on: Tue May 17 22:16:11 2022 UTC
options:bn(64,64) rc4(8x,int) des(int) aes(partial) blowfish(ptr)
compiler: x86_64-openwrt-linux-musl-gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -O3 -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -O3 -DPIC -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      93757.87k   157378.47k   198487.81k   212253.35k   216094.04k   216323.41k
real    0m 18.03s
user    0m 18.01s
sys     0m 0.00s

It gives the same speeds with or without the crypto packages, so does the APU2 need the crypto packages, or am I doing something wrong?

  • Forcing only the software drivers seems slower on the smaller block sizes, but everything else is the same.
  • The other encryption methods gave the same outcome: different numbers of course, but no difference between with and without the crypto packages.
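As a sanity check on how to read these tables, the "k" figures follow directly from the operation counts: operations times block size, divided by elapsed seconds, divided by 1000. A minimal sketch in shell (throughput_k is a made-up helper name for illustration, and it assumes awk is available, as it normally is with BusyBox):

```shell
# Convert "N operations of SIZE bytes in SECS seconds" into the
# "1000s of bytes per second" figure that openssl speed prints.
throughput_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

throughput_k 721307 16 3.00     # USE_SOFTDRIVERS = 1 run, 16-byte blocks -> 3846.97k
throughput_k 17365778 16 3.00   # USE_SOFTDRIVERS = 2 or 0 run, 16-byte blocks -> 92617.48k
```

So the gap at 16-byte blocks is real and not a reading error: roughly 3.8 MB/s against 92.6 MB/s. The 16.04s of sys time in the first run suggests that run spent most of its time in the kernel, while the other two runs show sys 0.00s and near-identical numbers.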

I continued the tutorial anyway and listed the supported algorithms from cat /proc/crypto: everything of driver type skcipher or shash with priority >= 300:

name         : __ecb(aes)
driver       : cryptd(__ecb-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : __ctr(aes)
driver       : cryptd(__ctr-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __cbc(aes)
driver       : cryptd(__cbc-aes-aesni)
module       : kernel
priority     : 450
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : xts(aes)
driver       : xts-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16


name         : rfc3686(ctr(aes))
driver       : rfc3686-ctr-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 20
max keysize  : 36
ivsize       : 8
chunksize    : 1
walksize     : 1

name         : ctr(aes)
driver       : ctr-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : ofb(aes)
driver       : ofb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : cfb(aes)
driver       : cfb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 1
walksize     : 1

name         : cbc(aes)
driver       : cbc-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ecb(aes)
driver       : ecb-aes-ccp
module       : ccp_crypto
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : xts(aes)
driver       : xts-aes-aesni
module       : kernel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ctr(aes)
driver       : ctr-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : cbc(aes)
driver       : cbc-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : ecb(aes)
driver       : ecb-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16

name         : __xts(aes)
driver       : __xts-aes-aesni
module       : kernel
priority     : 401
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 32
max keysize  : 64
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __ctr(aes)
driver       : __ctr-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __cbc(aes)
driver       : __cbc-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
chunksize    : 16
walksize     : 16

name         : __ecb(aes)
driver       : __ecb-aes-aesni
module       : kernel
priority     : 400
refcnt       : 1
selftest     : passed
internal     : yes
type         : skcipher
async        : no
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
chunksize    : 16
walksize     : 16
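When several drivers register the same algorithm name, the kernel crypto API normally hands out the one with the highest priority, which is why the priority field matters here. A rough sketch that condenses a /proc/crypto listing to the winning driver per name (best_drivers is a made-up helper name, and skipping the entries marked internal : yes is an assumption about what ordinary callers can request):

```shell
# Print "name driver priority" for the highest-priority driver of each
# algorithm name. Reads /proc/crypto unless a file is given as $1.
best_drivers() {
    awk -F' *: *' '
        function finish() {
            # Skip empty records and helper entries marked "internal : yes".
            if (name != "" && internal != "yes" && (!(name in prio) || p > prio[name])) {
                prio[name] = p
                drv[name] = driver
            }
            name = ""; driver = ""; internal = ""; p = 0
        }
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" { p = $2 + 0 }
        $1 == "internal" { internal = $2 }
        NF == 0          { finish() }    # a blank line ends a record
        END {
            finish()
            for (n in prio) printf "%s %s %d\n", n, drv[n], prio[n]
        }
    ' "${1:-/proc/crypto}" | sort
}
```

Run it as `best_drivers` on the router, or as `best_drivers saved.txt` on a saved copy. On the listing above it would report cbc-aes-aesni (priority 400) for cbc(aes) rather than cbc-aes-ccp (priority 300), which is consistent with the drivers that DUMP_INFO reports further down.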

The supported ciphers and digests according to the engine:

root@OpenWrt:~# openssl engine -pre DUMP_INFO devcrypto
(devcrypto) /dev/crypto engine
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, CIOCGSESSION (session open call) failed
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, CIOCGSESSION (session open call) failed
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, driver=cbc-aes-aesni (software)
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=ctr-aes-aesni (software)
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, driver=ecb-aes-aesni (software)

Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=sha1-ccp (hw accelerated), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=sha224-ccp (hw accelerated), CIOCCPHASH capable
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=sha256-ccp (hw accelerated), CIOCCPHASH capable
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=unknown. CIOCGSESSION (session open) failed

[Success]: DUMP_INFO
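To pull out just the entries that the devcrypto engine considers hardware backed, the dump can be filtered on the "(hw accelerated)" marker. A small sketch (hw_accelerated is a hypothetical helper name, and the field layout is assumed to match the output shown above):

```shell
# Read DUMP_INFO output on stdin and print "Cipher|Digest NAME DRIVER"
# for every line tagged "(hw accelerated)".
hw_accelerated() {
    awk '/\(hw accelerated\)/ {
        name = $2
        sub(/,$/, "", name)                 # "SHA1," -> "SHA1"
        for (i = 3; i <= NF; i++)
            if ($i ~ /^driver=/) {
                drv = $i
                sub(/^driver=/, "", drv)    # "driver=sha1-ccp" -> "sha1-ccp"
                print $1, name, drv
            }
    }'
}
```

Usage would be something like `openssl engine -pre DUMP_INFO devcrypto 2>&1 | hw_accelerated` (the `2>&1` is there in case the engine writes the dump to stderr). On the output above only SHA1, SHA224 and SHA256 would be listed, all with ccp drivers; every AES cipher is served by an aesni driver that the engine labels as software.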

And the hardware acceleration output:

openssl engine -t -c
(dynamic) Dynamic engine loading support
     [ unavailable ]
(devcrypto) /dev/crypto engine
     [ available ]

Same issue here, with the same openssl engine -t -c output.
Somehow it just does not work.

If you find a way to make it happen, I'm very interested :slight_smile:

Cheers Blinton

Did you ever make any headway with this? Despite my CPU having AES-NI, I am seeing driver=*** -aes-aesni (software) for all algorithms.

still not working?

Still not working for me :-1:

I'm going to try to fix it soon; there are a couple of changes that I need to try out.

Last time (half a year ago) I still had no luck, but over the last year I have learned much more about Linux in general, and some things actually make sense now.

So I will not rest until there are no options left for me to try to find out why, or if, it isn't working.

It depends on which hardware we try it on. I have the R7800 and I'm not sure it has the accelerator.

It does not.

--
at least not without NSS offloading.

Just tested on my Celeron J4125 system, also not working....

Checked on my NanoPi R2S, same result: the hardware crypto acceleration is not working, so this issue isn't limited to x86.

I see your NanoPi R2S uses an RK3328, right?

Support AES 128/192/256
Supports the DES (ECB and CBC modes) and TDES (EDE and DED) algorithms
Supports MD5, SHA-1 and SHA-256 HASH algorithms
Support PKA(RSA) 512/1024/2048 bit Exp Modulator
Support 160-bit Pseudo Random Number Generator (PRNG)
Support 256-bit True Random Number Generator (TRNG)

Because if it is, then that helps a lot :slight_smile:
The problem is probably in the dynamic module loader: the kernel is kept as small as possible and OpenWrt loads modules into it (if I understand correctly).
According to the "hacking unix" security book there are a couple of security issues with full module loading. Maybe it worked for a while until a CVE was found and the fix for it disabled crypto.

But I have no idea when this became a problem, and researching it without a starting date is probably a lot of work. I do see in the GitHub commits that changes have been made.

So I guess I'm going to rebuild a Linux kernel based on the OpenWrt kernel with full crypto support in the upcoming month, hoping to enable crypto.

Can you show me where you see this? I would like to dig into it and see what I can do.....

All OpenWrt changes are on GitHub,
and I am just reading the commits, like

This is not the only crypto-related package, but it is the fastest one I could find right now,
and the commits related to this package (what changed and when) can be found

September 2021, but other related crypto dependencies were updated more recently.

AESNI is not implemented as an engine. It's a CPU instruction. You can't invoke it via an engine.

If it's compiled in, use of the -evp switch will automatically use it.

See the following post - it has an example of how to disable AESNI, so you can do a "normal" run and one with it disabled to make sure it's working.
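The linked post is not reproduced here, but one commonly used way to make that comparison is the OPENSSL_ia32cap environment variable, which masks the CPU capability bits that OpenSSL sees. A sketch, assuming an x86 build of OpenSSL: the mask is the commonly cited one that clears bit 57 (AES-NI) and bit 33 (PCLMULQDQ), and compare_aesni and last_column are hypothetical helper names:

```shell
# Clears the AES-NI (bit 57) and PCLMULQDQ (bit 33) capability bits.
NO_AESNI_MASK="~0x200000200000000"

# Print the last (largest block size) throughput figure from
# `openssl speed` output read on stdin.
last_column() {
    awk 'tolower($1) == "aes-256-cbc" { print $NF }'
}

# Run the benchmark twice: once normally, once with AES-NI masked off.
# Each run takes a while, so this is only defined here, not called.
compare_aesni() {
    printf 'with AES-NI:    '
    openssl speed -evp aes-256-cbc -elapsed 2>/dev/null | last_column
    printf 'without AES-NI: '
    OPENSSL_ia32cap="$NO_AESNI_MASK" openssl speed -evp aes-256-cbc -elapsed 2>/dev/null | last_column
}
```

Calling `compare_aesni` on the router prints the two figures side by side. If the masked run is clearly slower, the plain -evp path was already using AES-NI without any engine; if both come out about the same, AES-NI was not in use to begin with.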

So if I understand correctly, my first post was correct and no additional modules need to be loaded for the APU2, because AES-NI is an instruction set.

So, going by this page: https://openwrt.org/toh/pcengines/apu2#cryptographic_hardware

After installing the ccp driver, is there nothing more to set up?