Which WiFi routers have hardware AES encryption support?

The biggest problem I see with those (e.g. CESA, or NSS on QCA platforms) is not necessarily the performance (although it would be for the NSS case), but the compatibility: these hardware implementations remain exotic and need special support in the form of openssl crypto engines. This is fine if available, but it breaks easily, especially once your chosen platform becomes less common. x86's AES-NI has the advantage of being too big to ignore, making support for it ubiquitous. The (non-mandatory, hello RPi…) AES instructions for ARMv8 might become common enough in the future as well; CESA/NSS probably not.

That is a good point, which I had not researched in detail. To the best of your knowledge, what is the current state of support in openssl in OpenWrt for a) Marvell CESA, b) Qualcomm NSS and c) AES for ARMv8?

I have not looked into NSS a lot, beyond the fact that it is used by IPQ806x to offload packet inspection/switching to it. Is QCA NSS a general purpose crypto engine, similar to Marvell's CESA, that can be used by user-space applications like openssl for hardware acceleration of AES? If so, can you elaborate a bit on your comment above on why you see NSS having a performance issue compared to CESA?

I seem to remember that it's technically available, but non-default and not quite trivial to set up. It doesn't particularly help that mvebu is effectively dead as a consumer routing platform due to the state of mwlwifi (ARMADA 3070/7040 and 8040 are mostly wired-only, high-priced devices).

AFAIK that's the current state: the NSS cores can do this, but rewriting it in a way that OpenSSL can really make use of it at full speed (and faster than in software) will be difficult, particularly for ipq806x (apparently easier for ipq807x), and non-proprietary cryptoengine drivers for OpenSSL (by this I mean cooperating with upstream OpenSSL, without needing a ton of out-of-tree patches) don't exist yet.

I don't know, but this is probably the most likely to see 'usable' support.

Ergo, the only targets where this will (for the most part) "just work™" are x86_64 and potentially ARMv8 (except for the RPi4, because Broadcom (and maybe some other chip makers too) didn't license AES acceleration).

Yes, I recently found that out about the RPi4. A real pity.

So based on this analysis, do you know of any ARMv8 WiFi routers, which are currently supported, or in the pipeline of being supported, by OpenWrt?

Can anybody on this forum share, or point me in the right direction on, what is needed to set up openssl in OpenWrt to utilize the CESA crypto accelerators in Marvell-based routers?

Alternatively, if anybody with a Linksys WRT series router has figured out how to accelerate OpenVPN using the CESA crypto accelerators, could you guide me through it, please? Thanks.

Do not(!) ask me about AES acceleration for mt7622bv though; while I'm interested in the device, I don't know (and AES acceleration is not a deciding factor for me). Support for this device is still very new, so there are still some pending issues, but those seem to be rather straightforward.

--
In terms of CPU performance ipq8074 should be faster, but no ipq807x devices are supported yet (and it doesn't seem to be within range either) - and we'll have to see how far it will go without NSS support.

The CPU of the Linksys E8450 seems to have the AES feature:

root@OpenWrt:~# cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 25.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 1
BogoMIPS        : 25.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
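
For anyone who wants to check this on their own device without reading the whole dump, here is a minimal sketch. The helper name `has_cpu_flag` is my own invention; it only looks at the `Features` line (ARM) or the `flags` line (x86) of `/proc/cpuinfo`-style text. Note that it only tells you what the CPU advertises, not whether OpenSSL actually uses it.

```shell
#!/bin/sh
# has_cpu_flag FLAG (hypothetical helper): reads /proc/cpuinfo-style text on
# stdin and succeeds if any "Features" (ARM) or "flags" (x86) line lists FLAG
# as a whole word.
has_cpu_flag() {
    awk -v flag="$1" '
        $1 == "Features" || $1 == "flags" {
            # fields 1 and 2 are the label and the colon; the flags start at 3
            for (i = 3; i <= NF; i++) if ($i == flag) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage on the router would be something like `has_cpu_flag aes < /proc/cpuinfo && echo "CPU advertises AES instructions"`.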

Pinky promise, I won't :)

Jokes aside, thanks for bringing this device and the mt7622bv CPU to my attention. Certainly an interesting device.

Another interesting observation is that there are a large number of ARMv8-based Broadcom BCM4906/8 routers on the market which have both AES support and BCM4365/66 WiFi chipsets, which are supported by the brcmfmac driver. In theory, they should all be supportable by OpenWrt. In practice, I do not see a single one currently supported by OpenWrt! Am I just missing them in the TOH?

If not, do you know what the potential issue behind their lack of support is? The usual reasons for devices not being supported are that developers are not buying them, or they are expensive, niche, or unavailable, or they have no WiFi driver support, etc.

But in this case, these types of routers are available from almost all the major vendors, e.g. Linksys (EA9300, EA9350, EA9400, EA9500, etc.), TP-Link (A20, C2300, C4000, C5400, etc.), Asus (AC86U, AC2900, AC5300, etc.) and Netgear (R7900P, R7960P, R8000P, etc.), across multiple price points and in multiple markets. They all appear to have upstream kernel support for the ARMv8 CPU as well as upstream WiFi driver support through the brcmfmac driver.

But there are still zero such devices in the OpenWrt TOH. Are there any technical issues constraining support for Broadcom-based ARMv8 WiFi routers released from 2016 onwards which may explain this pattern, while a Mediatek-based ARMv8 WiFi router like the E8450/RT3200 that you linked above, released in 2020/21, is chugging along quite well in terms of OpenWrt support?

What is it that I am perhaps missing, which may explain this dichotomy?

Nice! Thanks for sharing, @jiegec.

Since you are one of the few out here who has this router with OpenWrt loaded on it, would you perhaps also be kind enough to run some benchmarks for OpenSSL, if possible with AES and without AES support, and share them? Would appreciate it.

Broadcom is… 'difficult'…

…but there is quite some work in progress (but not finished yet), https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=target/linux/bcm4908/image;hb=HEAD

openssl from opkg install openssl-util:

root@OpenWrt:~# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4852594 aes-256 cbc's in 2.97s
Doing aes-256 cbc for 3s on 64 size blocks: 1306710 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 333596 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 83857 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 10495 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 5249 aes-256 cbc's in 3.00s
OpenSSL 1.1.1j  16 Feb 2021
built on: Tue Mar 16 11:27:55 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr)
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc      26141.92k    27969.71k    28466.86k    28718.92k    28658.35k    28666.54k

Manually compiled from another arm64 machine (w/ VPAES_ASM):

root@OpenWrt:~# ./openssl-static speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 6532145 aes-256 cbc's in 2.98s
Doing aes-256 cbc for 3s on 64 size blocks: 1740426 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 445092 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 111835 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 8192 size blocks: 14007 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16384 size blocks: 7000 aes-256 cbc's in 2.99s
OpenSSL 1.1.1j  16 Feb 2021
built on: Thu Mar 18 08:16:31 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256 cbc      35071.92k    37253.27k    37981.18k    38173.01k    38248.45k    38357.19k

That's less than 2% throughput of my Apple M1 MacBookAir, by the way.
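
As a sanity check on how to read these tables: each figure in the summary row is simply blocks processed × block size ÷ elapsed seconds, expressed in 1000s of bytes per second. A small sketch of that arithmetic (the function name `throughput_k` is made up for illustration):

```shell
#!/bin/sh
# throughput_k COUNT BLOCKSIZE SECONDS (hypothetical helper)
# Prints COUNT * BLOCKSIZE / SECONDS in units of 1000 bytes per second,
# the same unit `openssl speed` uses in its summary table.
throughput_k() {
    awk -v n="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", n * b / s / 1000 }'
}
```

For example, `throughput_k 5249 16384 3.00` reproduces the 16384-byte column of the first run (28666.54), and `throughput_k 7000 16384 2.99` reproduces that of the second (38357.19).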

Thanks for the prompt feedback.

I am not an expert, however the results look positively anemic. :( Not sure if the AES instruction set is being used.

How does one verify that the AES instruction set is indeed being used by OpenSSL? Would 'openssl engine -t -c' reveal it, as shown in the OpenWrt documentation for crypto engines?
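
One thing that may be worth checking here: as far as I know, on OpenSSL 1.0.x/1.1.x `openssl speed aes-256-cbc` (without `-evp`) benchmarks the low-level AES routines, whereas `openssl speed -evp aes-256-cbc` goes through the EVP layer, which is where CPU-accelerated implementations get selected. A rough sketch for comparing the two on the same box follows; the helper names are mine, and the parsing assumes the result row starts with `aes-` as in the outputs posted in this thread.

```shell
#!/bin/sh
# last_throughput (hypothetical helper): reads `openssl speed` output on stdin
# and prints the rightmost figure (largest block size) of the aes result row,
# with the trailing "k" removed.
last_throughput() {
    awk '/^aes-/ { v = $NF } END { sub(/k$/, "", v); print v }'
}

# compare_aes_speed (hypothetical helper): runs both benchmark variants and
# prints the two figures side by side. Each run takes a few seconds per block
# size, so expect this to take the better part of a minute.
compare_aes_speed() {
    low=$(openssl speed aes-256-cbc 2>/dev/null | last_throughput)
    evp=$(openssl speed -evp aes-256-cbc 2>/dev/null | last_throughput)
    echo "low-level: ${low}k  evp: ${evp}k"
}
```

If the `evp` figure comes out several times higher than the `low-level` one, that would suggest the AES instructions are being used via EVP; if both are about the same, they probably are not.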

Just for comparison (Xiaomi Mi AIoT Router AX3600, ipq8071a, 4*1.38GHz, cortex a53/ ARMv8, with the rather unusable and unoptimized OEM firmware):

# cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 1
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 2
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 3
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq 
1017600

# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 
1382400
# openssl engine -t -c
(dynamic) Dynamic engine loading support
     [ unavailable ]
# openssl speed aes-256-cbc
Doing aes-256 cbc for 3s on 16 size blocks: 4630455 aes-256 cbc's in 2.86s
Doing aes-256 cbc for 3s on 64 size blocks: 1313578 aes-256 cbc's in 2.73s
Doing aes-256 cbc for 3s on 256 size blocks: 332944 aes-256 cbc's in 2.59s
Doing aes-256 cbc for 3s on 1024 size blocks: 83464 aes-256 cbc's in 2.70s
Doing aes-256 cbc for 3s on 8192 size blocks: 10427 aes-256 cbc's in 2.75s
OpenSSL 1.0.2q  20 Nov 2018
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr) 
compiler: aarch64-openwrt-linux-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/usr/include -I/home/jenkins/romdaily_new_openwrt/system/staging_dir/target-aarch64-openwrt-linux_musl/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//usr/include -I/home/jenkins/Xiaoqiangtoolchain/toolchain/external_toolchain/toolchain-aarch64_cortex-a53_gcc-5.5.0_musl//include -specs=/home/jenkins/romdaily_new_openwrt/system/include/hardened-ld-pie.specs -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DMIWIFI_FEATURE -DHAVE_CRYPTODEV -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -march=armv8-a -mcpu=cortex-a53+crypto -fno-caller-saves  -Wformat -fpic -fstack-protector -D_FORTIFY_SOURCE=2 -Wl,-z,now -Wl,-z,relro -fpic -I/home/jenkins/romdaily_new_openwrt/system/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256 cbc      25904.64k    30794.50k    32908.75k    31654.49k    31061.09k

ipq8074a would be clocked at 4*2.2 GHz, cortex a53/ ARMv8

Here are the results from an IPQ8065 (1.7 GHz, 2 cores), with no NSS support and no AES instructions, since it is 32-bit ARMv7-A:

openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 6794589 aes-256-cbc's in 2.95s
Doing aes-256-cbc for 3s on 64 size blocks: 2381327 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 256 size blocks: 634833 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 161890 aes-256-cbc's in 2.99s
Doing aes-256-cbc for 3s on 8192 size blocks: 20269 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 16384 size blocks: 10131 aes-256-cbc's in 3.00s
OpenSSL 1.1.1i  8 Dec 2020
built on: Fri Jan 22 23:53:44 2021 UTC
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: ccache_cc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_PREFER_CHACHA_OVER_GCM -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      36852.01k    50971.55k    54719.61k    55443.26k    55719.34k    55328.77k

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
384000

cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
1725000

cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 12.50
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x04d
CPU revision    : 0

processor       : 1
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 26.04
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32
CPU implementer : 0x51
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x04d
CPU revision    : 0



Based on the above, I am not confident that the previous benchmarks from the E8450 and the Xiaomi AX3600 are using AES.

Yes, I am recommending that you go for something with on-core AES instructions. That would be either x86_64 (AES-NI) or ARMv8 (with the crypto extensions), because the penalty for shunting the data over the bus to the crypto silicon makes it perform worse than AES-NI for small, synchronous operations.

Correct. It's also a pita to integrate - it required a lot of work to port the code, including a large number of patches, most of it to kernel code, along with crypto drivers, contiguous memory drivers and others, as well as a port of a patched openssl version designed to work with the hardware.

As @slh pointed out, actually using it is also non-trivial and the configuration of the hardware itself, while complicated, is the least of the issues.

Using Intel QuickAssist requires a patched asynchronous version of openssl, which was also a gigantic pita to compile and get working (for some reason Intel likes to write software designed for embedded systems that simply cannot be cross-compiled; pretty bizarre, and yes, I pointed this out to the Intel folk responsible for maintaining the software). Using QuickAssist in nginx requires significant patches to nginx as well. It's not an "out of the box" experience by any means.

If you're curious to look at what it takes to get hardware like this working, the code is here

For typical OpenWrt synchronous workloads on smallish buffers (something like OpenVPN), performance using the crypto hardware on AES-CBC was about 70% - 80% of the performance of AES-NI. For larger buffers, the performance started to approach parity. For multiple (36+ threads) asynchronous operations, the speed was about 10x as fast as you'd get using AES-NI.

It would be a real hassle to give you benchmarks, as I compiled the AES acceleration out of the Intel QuickAssist drivers. I'd need to recompile a half dozen kernel modules to be able to get you a benchmark.

The performance on RSA is very good, particularly signing operations, which perform much better than software regardless of whether it's synchronous or not.

On-core AES-NI, definitely; no doubt in my mind.

@slh and @jiegec, can you run "cat /proc/crypto" and share what priority you get under the aes section?

For the benchmark I posted above, the priority is 100, which indicates no AES and no crypto engines, as below.

cat /proc/crypto

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 7
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

If the AES instruction set were being used on your router, one would expect the priority to be > 100.
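
To avoid scrolling through the full dump, something like the following sketch can pull out just the drivers and priorities registered for one algorithm name. The function name `crypto_drivers` is hypothetical. Also keep in mind that `/proc/crypto` lists what is registered with the kernel crypto API; as far as I understand it, OpenSSL in user space only goes through that when an engine such as afalg or devcrypto is in use, so this shows the kernel side rather than what `openssl speed` is exercising.

```shell
#!/bin/sh
# crypto_drivers ALGO (hypothetical helper): reads /proc/crypto-style text on
# stdin and prints "driver priority" for every entry whose name equals ALGO.
crypto_drivers() {
    awk -v algo="$1" '
        $1 == "name"     { name = $3 }      # start of a new entry
        $1 == "driver"   { driver = $3 }
        $1 == "priority" && name == algo { print driver, $3 }
    '
}
```

For example, `crypto_drivers 'cbc(aes)' < /proc/crypto` should print one line per registered implementation; the kernel prefers the one with the highest priority.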

ax3600/ ipq8071a:

# cat /proc/crypto 
name         : hmac(sha512)
driver       : nss-hmac-sha512
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 128
digestsize   : 64

name         : hmac(sha384)
driver       : nss-hmac-sha384
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 128
digestsize   : 48

name         : hmac(sha256)
driver       : nss-hmac-sha256
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 32

name         : hmac(sha1)
driver       : nss-hmac-sha1
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 20

name         : hmac(md5)
driver       : nss-hmac-md5
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 16

name         : sha512
driver       : nss-sha512
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 128
digestsize   : 64

name         : sha384
driver       : nss-sha384
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 128
digestsize   : 48

name         : sha256
driver       : nss-sha256
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 32

name         : sha224
driver       : nss-sha224
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 28

name         : sha1
driver       : nss-sha1
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 20

name         : md5
driver       : nss-md5
module       : qca_nss_cfi_cryptoapi
priority     : 1000
refcnt       : 1
selftest     : passed
internal     : no
type         : ahash
async        : yes
blocksize    : 64
digestsize   : 16

name         : gcm(aes)
driver       : nss-gcm
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 12
maxauthsize  : 16
geniv        : <none>

name         : seqiv(rfc4106(gcm(aes)))
driver       : nss-rfc4106-gcm
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 8
maxauthsize  : 16
geniv        : <none>

name         : rfc4106(gcm(aes))
driver       : nss-rfc4106-gcm
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 8
maxauthsize  : 16
geniv        : <none>

name         : authenc(hmac(sha256),cbc(des3_ede))
driver       : nss-hmac-sha256-cbc-3des
module       : qca_nss_cfi_cryptoapi
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 8
ivsize       : 8
maxauthsize  : 32
geniv        : <none>

name         : authenc(hmac(sha1),cbc(des3_ede))
driver       : nss-hmac-sha1-cbc-3des
module       : qca_nss_cfi_cryptoapi
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 8
ivsize       : 8
maxauthsize  : 20
geniv        : <none>

name         : authenc(hmac(sha256),cbc(aes))
driver       : nss-hmac-sha256-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 32
geniv        : <none>

name         : authenc(hmac(sha1),cbc(aes))
driver       : nss-hmac-sha1-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 20
geniv        : <none>

name         : echainiv(authenc(hmac(sha256),cbc(des3_ede)))
driver       : nss-hmac-sha256-cbc-3des
module       : qca_nss_cfi_cryptoapi
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 8
ivsize       : 8
maxauthsize  : 32
geniv        : <none>

name         : echainiv(authenc(hmac(sha1),cbc(des3_ede)))
driver       : nss-hmac-sha1-cbc-3des
module       : qca_nss_cfi_cryptoapi
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 8
ivsize       : 8
maxauthsize  : 20
geniv        : <none>

name         : echainiv(authenc(hmac(md5),cbc(des3_ede)))
driver       : nss-hmac-md5-cbc-3des
module       : qca_nss_cfi_cryptoapi
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 8
ivsize       : 8
maxauthsize  : 16
geniv        : <none>

name         : seqiv(authenc(hmac(sha256),rfc3686(ctr(aes))))
driver       : nss-hmac-sha256-rfc3686-ctr-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 8
maxauthsize  : 32
geniv        : <none>

name         : echainiv(authenc(hmac(sha256),cbc(aes)))
driver       : nss-hmac-sha256-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 32
geniv        : <none>

name         : seqiv(authenc(hmac(sha1),rfc3686(ctr(aes))))
driver       : nss-hmac-sha1-rfc3686-ctr-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 8
maxauthsize  : 20
geniv        : <none>

name         : seqiv(authenc(hmac(md5),rfc3686(ctr(aes))))
driver       : nss-hmac-md5-rfc3686-ctr-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 8
maxauthsize  : 16
geniv        : <none>

name         : echainiv(authenc(hmac(sha1),cbc(aes)))
driver       : nss-hmac-sha1-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 20
geniv        : <none>

name         : echainiv(authenc(hmac(md5),cbc(aes)))
driver       : nss-hmac-md5-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : aead
async        : yes
blocksize    : 16
ivsize       : 16
maxauthsize  : 16
geniv        : <none>

name         : cbc(des3_ede)
driver       : nss-cbc-des-ede
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : ablkcipher
async        : yes
blocksize    : 8
min keysize  : 24
max keysize  : 24
ivsize       : 8
geniv        : <default>

name         : ecb(aes)
driver       : nss-ecb-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : ablkcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 0
geniv        : <default>

name         : rfc3686(ctr(aes))
driver       : nss-rfc3686-ctr-aes
module       : qca_nss_cfi_cryptoapi
priority     : 30000
refcnt       : 1
selftest     : passed
internal     : no
type         : ablkcipher
async        : yes
blocksize    : 16
min keysize  : 20
max keysize  : 36
ivsize       : 8
geniv        : seqiv

name         : cbc(aes)
driver       : nss-cbc-aes
module       : qca_nss_cfi_cryptoapi
priority     : 10000
refcnt       : 1
selftest     : passed
internal     : no
type         : ablkcipher
async        : yes
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
geniv        : <default>

name         : hmac(sha512)
driver       : hmac(sha512-generic)
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 128
digestsize   : 64

name         : hmac(sha384)
driver       : hmac(sha384-generic)
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 128
digestsize   : 48

name         : hmac(sha256)
driver       : hmac(sha256-generic)
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 32

name         : cbc(cipher_null)
driver       : cbc(cipher_null-generic)
module       : cbc
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 1
geniv        : <default>

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : cbc
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 16
min keysize  : 16
max keysize  : 32
ivsize       : 16
geniv        : <default>

name         : hmac(sha1)
driver       : hmac(sha1-generic)
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : sha1
driver       : sha1-generic
module       : sha1_generic
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 20

name         : hmac(md5)
driver       : hmac(md5-generic)
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 16

name         : cbc(des3_ede)
driver       : cbc(des3_ede-generic)
module       : cbc
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 8
min keysize  : 24
max keysize  : 24
ivsize       : 8
geniv        : <default>

name         : cbc(des)
driver       : cbc(des-generic)
module       : cbc
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 8
min keysize  : 8
max keysize  : 8
ivsize       : 8
geniv        : <default>

name         : md5
driver       : md5-generic
module       : md5
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 16

name         : des3_ede
driver       : des3_ede-generic
module       : des_generic
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 24
max keysize  : 24

name         : des
driver       : des-generic
module       : des_generic
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 8
min keysize  : 8
max keysize  : 8

name         : ghash
driver       : ghash-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 16
digestsize   : 16

name         : jitterentropy_rng
driver       : jitterentropy_rng
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_nopr_hmac_sha256
module       : kernel
priority     : 207
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_nopr_hmac_sha512
module       : kernel
priority     : 206
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_nopr_hmac_sha384
module       : kernel
priority     : 205
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_nopr_hmac_sha1
module       : kernel
priority     : 204
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_pr_hmac_sha256
module       : kernel
priority     : 203
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_pr_hmac_sha512
module       : kernel
priority     : 202
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_pr_hmac_sha384
module       : kernel
priority     : 201
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : stdrng
driver       : drbg_pr_hmac_sha1
module       : kernel
priority     : 200
refcnt       : 1
selftest     : passed
internal     : no
type         : rng
seedsize     : 0

name         : xz
driver       : xz-generic
module       : kernel
priority     : 0
refcnt       : 2
selftest     : passed
internal     : no
type         : compression

name         : lzo
driver       : lzo-generic
module       : kernel
priority     : 0
refcnt       : 2
selftest     : passed
internal     : no
type         : compression

name         : crc32c
driver       : crc32c-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 4

name         : deflate
driver       : deflate-generic
module       : kernel
priority     : 0
refcnt       : 2
selftest     : passed
internal     : no
type         : compression

name         : ecb(arc4)
driver       : ecb(arc4)-generic
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 1
min keysize  : 1
max keysize  : 256
ivsize       : 0
geniv        : <default>

name         : arc4
driver       : arc4-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 1
max keysize  : 256

name         : aes
driver       : aes-generic
module       : kernel
priority     : 100
refcnt       : 2
selftest     : passed
internal     : no
type         : cipher
blocksize    : 16
min keysize  : 16
max keysize  : 32

name         : sha384
driver       : sha384-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 128
digestsize   : 48

name         : sha512
driver       : sha512-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 128
digestsize   : 64

name         : sha224
driver       : sha224-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 28

name         : sha256
driver       : sha256-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 64
digestsize   : 32

name         : digest_null
driver       : digest_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : shash
blocksize    : 1
digestsize   : 0

name         : compress_null
driver       : compress_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : compression

name         : ecb(cipher_null)
driver       : ecb-cipher_null
module       : kernel
priority     : 100
refcnt       : 1
selftest     : passed
internal     : no
type         : blkcipher
blocksize    : 1
min keysize  : 0
max keysize  : 0
ivsize       : 0
geniv        : <default>

name         : cipher_null
driver       : cipher_null-generic
module       : kernel
priority     : 0
refcnt       : 1
selftest     : passed
internal     : no
type         : cipher
blocksize    : 1
min keysize  : 0
max keysize  : 0

There are some benchmarks in this thread

Those benchmarks are for the C2000 SoCs. I updated the code for the C3000 SoCs, which perform better than the benchmarks in that thread

Thanks a ton for sharing your detailed insights as well as the rough benchmarks above! You have certainly enlightened me today. Very kind of you. Appreciate it.

Of course, please ignore. I was just hoping to get whatever you had off the top of your mind, which you have already done above.

Thanks for confirming it.

Do you think you might have any insight on how to benchmark the impact of on-core AES-NI on openssl performance? As you can see in the thread above, I am struggling a bit to do so, since the benchmarks with AES-NI support seem to be poorer than those without it. It feels to me that either we have a measurement issue, or AES is somehow not being invoked. It is not clear to me how to figure out which.

openssl speed -elapsed -evp aes-128-cbc-hmac-sha1
Or, with AES-NI enabled:
openssl speed -elapsed -evp aes-128-cbc

With AES-NI disabled:
OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-cbc
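
Note that `OPENSSL_ia32cap` only applies to x86. On ARM builds OpenSSL reads an analogous `OPENSSL_armcap` variable, as far as I know; treating `OPENSSL_armcap=0` as "all optional CPU features masked off" is my assumption and worth verifying against the OpenSSL version on the router. The sketch below shows the assumed commands as comments, plus a small made-up helper (`speedup`) for turning two summary figures into a ratio.

```shell
#!/bin/sh
# speedup FAST SLOW (hypothetical helper): prints FAST / SLOW with two
# decimals, e.g. to compare the 16384-byte figures of two benchmark runs.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Assumed ARM counterpart of the x86 commands above (not run here):
#   openssl speed -elapsed -evp aes-128-cbc                    # features auto-detected
#   OPENSSL_armcap=0 openssl speed -elapsed -evp aes-128-cbc   # optional features masked off
```

As an illustration with numbers already in this thread, `speedup 38357.19 28666.54` gives 1.34 for the two E8450 builds, which is nowhere near the gap one would expect between accelerated and non-accelerated AES.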