ARMv8 hardware crypto acceleration

Several targets received support for armv8-CE crypto algorithms on Saturday, including bcm4908, layerscape armv8_64b, mvebu a53 and a72, octeontx, rockchip and sunxi a53.

I'm trying to get this to work on an Orange Pi Zero Plus (sunxi a53 target), but I can't get OpenSSL to work with it.

First of all, cat /proc/crypto does show a difference. On a 22.03-rc1 build:

name         : ghash
driver       : ghash-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 16
digestsize   : 16

name         : crct10dif
driver       : crct10dif-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crc32
driver       : crc32-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : des3_ede
driver       : des3_ede-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 24
max keysize  : 24

name         : des
driver       : des-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 8
max keysize  : 8

name         : sha1
driver       : sha1-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : md5
driver       : md5-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 16

name         : ecb(cipher_null)
driver       : ecb-cipher_null
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 0
chunksize    : 1
walksize     : 1

name         : digest_null
driver       : digest_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 0

name         : compress_null
driver       : compress_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : compression

name         : cipher_null
driver       : cipher_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 0
max keysize  : 0

On today's snapshot:

name         : xchacha12
driver       : xchacha12-neon
module       : chacha_neon
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 32
max keysize  : 32
ivsize       : 32
chunksize    : 64
walksize     : 320

name         : xchacha20
driver       : xchacha20-neon
module       : chacha_neon
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 32
max keysize  : 32
ivsize       : 32
chunksize    : 64
walksize     : 320

name         : chacha20
driver       : chacha20-neon
module       : chacha_neon
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 32
max keysize  : 32
ivsize       : 16
chunksize    : 64
walksize     : 320

name         : poly1305
driver       : poly1305-neon
module       : poly1305_neon
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 16
digestsize   : 16

name         : aes
driver       : aes-arm64
module       : kernel
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : ccm(aes)
driver       : ccm-aes-ce
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : no
blocksize    : 1
ivsize       : 16
maxauthsize  : 16
geniv        : <none>

name         : aes
driver       : aes-ce
module       : kernel
priority     : 250
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : crct10dif
driver       : crct10dif-arm64-ce
module       : kernel
priority     : 200
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crct10dif
driver       : crct10dif-arm64-neon
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : gcm(aes)
driver       : gcm-aes-ce
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : no
blocksize    : 1
ivsize       : 12
maxauthsize  : 16
geniv        : <none>

name         : sha1
driver       : sha1-ce
module       : kernel
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : ghash
driver       : ghash-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 16
digestsize   : 16

name         : crct10dif
driver       : crct10dif-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crc32
driver       : crc32-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100
refcnt       : 4
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : des3_ede
driver       : des3_ede-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 24
max keysize  : 24

name         : des
driver       : des-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 8
max keysize  : 8

name         : sha1
driver       : sha1-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : md5
driver       : md5-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 16

name         : ecb(cipher_null)
driver       : ecb-cipher_null
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 0
chunksize    : 1
walksize     : 1

name         : digest_null
driver       : digest_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 0

name         : compress_null
driver       : compress_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : compression

name         : cipher_null
driver       : cipher_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 0
max keysize  : 0

Still, support is fairly limited but at least AES-CTR and AES-GCM should work...
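
To compare two listings like these without reading every stanza, it can help to reduce /proc/crypto to the driver the kernel would prefer for each algorithm, i.e. the one with the highest priority. A minimal sketch, assuming a POSIX shell with awk and sort (the function name best_drivers is made up for illustration):

```shell
# Print "name driver priority" for the highest-priority driver of each
# algorithm, reading /proc/crypto-formatted text from stdin.
best_drivers() {
    awk -F' *: *' '
        $1 == "name"     { name = $2 }
        $1 == "driver"   { driver = $2 }
        $1 == "priority" {
            prio = $2 + 0
            # Keep this driver if it is the first or the best seen so far.
            if (!(name in best) || prio > best[name]) {
                best[name] = prio
                drv[name] = driver
            }
        }
        END { for (n in best) print n, drv[n], best[n] }
    ' | sort
}
```

On the device this would be run as best_drivers < /proc/crypto. Going by the snapshot listing above, aes should resolve to aes-ce (priority 250) and sha1 to sha1-ce (priority 200), while ghash stays on ghash-generic.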

I've created an image with the kmod-cryptodev and libopenssl-devcrypto packages, and edited the /etc/ssl/openssl.cnf file according to the Wiki. (I should add that /etc/ssl/openssl.cnf refers to /etc/ssl/engines.cnf.d/devcrypto.cnf, I've tried both the wiki's guidance and this new file.)

But whatever I enter, openssl engine -pre DUMP_INFO devcrypto gives the following output:

(devcrypto) /dev/crypto engine
Information about ciphers supported by the /dev/crypto engine:
Cipher DES-CBC, NID=31, /dev/crypto info: id=1, CIOCGSESSION (session open call) failed
Cipher DES-EDE3-CBC, NID=44, /dev/crypto info: id=2, CIOCGSESSION (session open call) failed
Cipher BF-CBC, NID=91, /dev/crypto info: id=3, CIOCGSESSION (session open call) failed
Cipher CAST5-CBC, NID=108, /dev/crypto info: id=4, CIOCGSESSION (session open call) failed
Cipher AES-128-CBC, NID=419, /dev/crypto info: id=11, CIOCGSESSION (session open call) failed
Cipher AES-192-CBC, NID=423, /dev/crypto info: id=11, CIOCGSESSION (session open call) failed
Cipher AES-256-CBC, NID=427, /dev/crypto info: id=11, CIOCGSESSION (session open call) failed
Cipher RC4, NID=5, /dev/crypto info: id=12, CIOCGSESSION (session open call) failed
Cipher AES-128-CTR, NID=904, /dev/crypto info: id=21, driver=ctr(aes-ce) (software)
Cipher AES-192-CTR, NID=905, /dev/crypto info: id=21, driver=ctr(aes-ce) (software)
Cipher AES-256-CTR, NID=906, /dev/crypto info: id=21, driver=ctr(aes-ce) (software)
Cipher AES-128-ECB, NID=418, /dev/crypto info: id=23, CIOCGSESSION (session open call) failed
Cipher AES-192-ECB, NID=422, /dev/crypto info: id=23, CIOCGSESSION (session open call) failed
Cipher AES-256-ECB, NID=426, /dev/crypto info: id=23, CIOCGSESSION (session open call) failed

Information about digests supported by the /dev/crypto engine:
Digest MD5, NID=4, /dev/crypto info: id=13, driver=md5-generic (software), CIOCCPHASH capable
Digest SHA1, NID=64, /dev/crypto info: id=14, driver=sha1-ce (software), CIOCCPHASH capable
Digest RIPEMD160, NID=117, /dev/crypto info: id=102, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA224, NID=675, /dev/crypto info: id=103, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA256, NID=672, /dev/crypto info: id=104, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA384, NID=673, /dev/crypto info: id=105, driver=unknown. CIOCGSESSION (session open) failed
Digest SHA512, NID=674, /dev/crypto info: id=106, driver=unknown. CIOCGSESSION (session open) failed

[Success]: DUMP_INFO

So OpenSSL can't use any hardware-accelerated cipher... Also, every cipher I tried with openssl speed gives the same results whether I enable acceleration or not.
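
For reference, a comparison like the one described can be scripted roughly as below. This is only a sketch: the function name speed_compare and the DRY_RUN switch are invented here, and it assumes an OpenSSL 1.1.1-style openssl speed with the -engine, -evp and -elapsed options. -elapsed measures wall-clock time, which matters because work done inside the kernel is not counted as the benchmark's user CPU time.

```shell
# Benchmark one EVP cipher twice: once with OpenSSL's built-in code and once
# through the devcrypto engine. Set DRY_RUN=1 to only print the commands.
speed_compare() {
    cipher=${1:-aes-128-ctr}
    for extra in "" "-engine devcrypto"; do
        # $extra is left unquoted on purpose so it splits into two arguments.
        set -- openssl speed -elapsed $extra -evp "$cipher"
        echo "+ $*"
        [ -n "$DRY_RUN" ] || "$@"
    done
}
```

Calling speed_compare aes-128-gcm runs both benchmarks back to back, so the two throughput tables can be compared directly.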

I'm not sure if I'm misconfiguring something, or if the armv8-CE crypto algorithms just don't match what OpenSSL is able to use?

I don't think CONFIG_PACKAGE_libopenssl-devcrypto is enabled by default, if you're running official builds. I enabled it manually on x86/64 but e.g. my mvebu/cortexa72 builds don't have it enabled yet.

@cotequeiroz tagging you since you are the expert ahahah

@Goossens there is a pending PR for the wolfssl package that enables support for this, but there are some blocking changes that we need to handle first.

But isn't that achieved by installing the kmod-cryptodev and libopenssl-devcrypto packages and editing the /etc/ssl/openssl.cnf file?

ARMv8 CE is a set of instructions, which can also be used in userspace. By default OpenSSL already uses it.

OpenSSL already uses the CE extensions if ARMASM is enabled, which it is by default, and it detects them at runtime, so there will be no speed improvement.
WolfSSL will benefit drastically once this gets merged, though.

Sorry, I'm late to the party, but @LGA1150 and @robimarko are correct. The CE crypto support that was added to the targets is for the kernel (used primarily by mac80211, but also by ipsec and storage encryption), and should not be used directly by openssl or wolfssl.

Armv8 CE is akin to Intel's AES-NI instructions; you just need to use it in assembler.

The cryptodev engine output posted by @Goossens shows it is working as expected. Notice the presence of aes-ctr and sha1 CE drivers. They are not used because both are implemented in software (CPU instructions), which are more efficient if used directly by openssl.
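
To see at a glance which entries in that output actually opened a session, and how the engine classifies their drivers, the dump can be filtered. A rough sketch, assuming a sed that supports -E and the line layout shown in the output above (devcrypto_drivers is a hypothetical helper name):

```shell
# From `openssl engine -pre DUMP_INFO devcrypto` output on stdin, print
# "algorithm driver status" for every entry that reports a usable driver.
# Lines where the session open call failed do not match and are skipped.
devcrypto_drivers() {
    sed -n -E 's/^(Cipher|Digest) ([^,]+),.*driver=([^ ]+) \(([^)]*)\).*$/\2 \3 \4/p'
}
```

It would be used as openssl engine -pre DUMP_INFO devcrypto | devcrypto_drivers. For the output posted above, every remaining entry is marked software, which is the point being made here.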

OpenSSL will detect it by issuing an AES instruction and watching for an illegal instruction exception. If the CPU is not capable (bcm27xx, for example), it will raise the illegal instruction exception, and openssl will fall back to using the regular armv8 assembler routine.
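
Independently of OpenSSL's own probing, on Linux the kernel lists the CPU's capabilities in the Features line of /proc/cpuinfo, where the Crypto Extensions show up as aes, pmull, sha1 and sha2. A small sketch to pull them out (ce_flags is a hypothetical helper name, and it only looks at the first Features line):

```shell
# List which ARMv8 Crypto Extensions flags appear in the first "Features"
# line of /proc/cpuinfo-formatted text on stdin. Prints an empty line if
# none of them are present.
ce_flags() {
    features=$(grep -m1 '^Features' | cut -d: -f2)
    found=""
    for flag in aes pmull sha1 sha2; do
        # Pad with spaces so that only whole words match.
        case " $features " in
            *" $flag "*) found="$found $flag" ;;
        esac
    done
    echo "${found# }"
}
```

Run as ce_flags < /proc/cpuinfo. On a CPU without the extensions, such as the bcm27xx mentioned above, it should print nothing.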

WolfSSL, otoh, will not do this; you will have to build either a CE-asm version or the regular C version. Because not all targets support CE, wolfssl will have to be moved from the shared openwrt_base feed to the target-specific openwrt_core feed. This move has to be carefully considered. openwrt_base packages are constantly being rebuilt, so changes are propagated faster. openwrt_core is constantly rebuilt for master, but for a stable branch it is built only when a new minor version of OpenWrt is released. In other words, opkg upgrade libwolfssl will not pick up an intermediate wolfssl update.

TLDR: kernel drivers may be backported to stable versions; wolfssl support may be more complicated.

Ok, so it should be "safe" to merge wolfssl in master, but we need to understand the correct approach for the release branch.

Well, it either gets merged into 22.03 now, before RC4, or it waits for the next release.

  • master: We can merge it to master now. It may be worth checking if anything needs to be done/triggered/rebuilt to minimize the delay between the package being deleted in base and being added to core. I'm not familiar enough with the build infrastructure to say what exactly is needed, if anything.

  • release: The commit should be cherry-picked right at release time. If anything goes wrong, it will be a very short-lived release, and we need to revert it and tag another one. I would not recommend adding this to 21.02, but it could make it into 22.03.0-rc4, as adding a -rc5 right away would not be as troublesome.

I will post this to the PR, where the discussion is more visible.

For master there should be nothing to do; the buildbots will build it for all targets.
It will take a couple of days, as there are lots of subtargets.

@cotequeiroz @robimarko FWIW, I've been running 22.03 HEAD with these patches on the RB5009UG for a while, so you can consider the mvebu/cortexa72 patches tested on 22.03.

Run-tested insofar as: no breakage and no weirdness popping up :innocent:.

FWIW, today's snapshot build gives a different output for cat /proc/crypto:

name         : aes
driver       : aes-arm64
module       : kernel
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : ccm(aes)
driver       : ccm-aes-ce
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : no
blocksize    : 1
ivsize       : 16
maxauthsize  : 16
geniv        : <none>

name         : aes
driver       : aes-ce
module       : kernel
priority     : 250
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : crct10dif
driver       : crct10dif-arm64-ce
module       : kernel
priority     : 200
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crct10dif
driver       : crct10dif-arm64-neon
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : gcm(aes)
driver       : gcm-aes-ce
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : no
blocksize    : 1
ivsize       : 12
maxauthsize  : 16
geniv        : <none>

name         : sha1
driver       : sha1-ce
module       : kernel
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : ghash
driver       : ghash-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 16
digestsize   : 16

name         : crct10dif
driver       : crct10dif-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 2

name         : crc32
driver       : crc32-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : des3_ede
driver       : des3_ede-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 24
max keysize  : 24

name         : des
driver       : des-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 8
max keysize  : 8

name         : sha1
driver       : sha1-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : md5
driver       : md5-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 16

name         : ecb(cipher_null)
driver       : ecb-cipher_null
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : skcipher
async        : no
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 0
chunksize    : 1
walksize     : 1

name         : digest_null
driver       : digest_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 0

name         : compress_null
driver       : compress_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : compression

name         : cipher_null
driver       : cipher_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 0
max keysize  : 0

But I can't seem to run the WolfSSL benchmark, which should have been merged yesterday?

The non-shared packages have all been built, but the shared packages for aarch64_cortex-a53 have not. It should be finished in an hour or so. Then, you'll be able to run the benchmark.

You should be able to run it now.

Unfortunately, the build now fails:

Collected errors:
 * pkg_hash_check_unresolved: cannot find dependency libwolfssl5.3.0.ee39414e for libustream-wolfssl20201210
 * pkg_hash_fetch_best_installation_candidate: Packages for libustream-wolfssl found, but incompatible with the architectures configured
 * opkg_install_cmd: Cannot install package libustream-wolfssl.
 * pkg_hash_check_unresolved: cannot find dependency libwolfssl5.3.0.ee39414e for px5g-wolfssl
 * pkg_hash_fetch_best_installation_candidate: Packages for px5g-wolfssl found, but incompatible with the architectures configured
 * satisfy_dependencies_for: Cannot satisfy the following dependencies for luci-ssl:
 * 	libwolfssl5.3.0.ee39414e

Good evening. I have a MikroTik RB5009. I'm not sure I understand: could this already be integrated in OpenWrt 22.03? Thank you.

You need to cherry-pick both patches from master into your tree:

$ git cherry-pick -x 39b6af114747fbee06cf6fab3a76d7037b53a4cc
$ git cherry-pick -x 06bb5ac1f2b62c3e10f24d7096e86f6368aaf41d

First update your local master tree, then switch back to your own working tree and run the commands above.
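
Put together, the sequence might look like the sketch below. It rests on assumptions: master tracks the upstream OpenWrt repository so that git pull updates it, and the branch name passed in is your own working branch. pick_ce_patches, run and DRY_RUN are names made up for this example.

```shell
# Update master, switch back to the given working branch and cherry-pick
# both commits. Set DRY_RUN=1 to print the git commands without running them.
pick_ce_patches() {
    branch=$1   # your own working branch (hypothetical name in the example)
    run() { echo "+ $*"; [ -n "$DRY_RUN" ] || "$@"; }

    # Stop at the first step that fails, e.g. a cherry-pick conflict.
    run git checkout master &&
    run git pull &&
    run git checkout "$branch" &&
    run git cherry-pick -x 39b6af114747fbee06cf6fab3a76d7037b53a4cc &&
    run git cherry-pick -x 06bb5ac1f2b62c3e10f24d7096e86f6368aaf41d
}
```

For example, DRY_RUN=1 pick_ce_patches my-22.03 only shows what would be executed; my-22.03 is a placeholder for whatever your branch is called.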

It appears that the phase2 bots are picking up a different (previous?) version of libwolfssl.so, not the one used to generate the image. Are they going to fix themselves when they run another build, or is some intervention necessary? There's a second round of arc_archs being built that may give me a clue.

Where do you track the progress? I can only see buildbots for snapshot builds?