Hardware crypto for Mediatek missing?

I am using OpenVPN for a site-to-site solution. I noticed that there was a MediaTek AES driver that uses the hardware engine to offload the CPU. I would like to use this on my MT7628 SoC, but it's not available.

Is there any reason why this was left out? Or why it was never ported forward from the 3.x kernel to 4.x?


if you think you can do it - here is the code:



I patched a 15.01 build, but the performance improvement was very small: 10-15% for OpenVPN.

Thanks. I will have a look at how far I can get with this. I'm a little surprised that you only got a 10-15% performance increase; I would expect more. Either way, it should offload the CPU so it can do other tasks in the meantime.

So I managed to get the driver to compile and load :slight_smile:

A preliminary test with "insmod tcrypt mode=200" suggests it's working properly.

Now I can't find where to flip the switch for cryptodev. I can build and load it, but when using "OpenSSL engine cryptodev" it's missing the libopenssl.so.
It seems it's not linked, because I need to define the crypto hardware somewhere. Of course, this MediaTek engine is nowhere to be found in any crypto/engine config I could find.

Since you built it for OpenWRT in the past, maybe you can point me in the right direction on which file(s) to patch.

And... I found other references saying that OpenVPN uses very small blocks, in which case your 10-15% might be right. Still, I've come this far, so now let's see how OpenSSL really performs.

Could this be used on mt7620a too?

mt7620a doesn't have a crypto accelerator

here are my test patches for CC 15.01
https://www.dropbox.com/s/92xygjuz4i306ry/0999-add-AES-Engine-driver-for-MT7628-fully-code-refactor.patch?dl=1
https://www.dropbox.com/s/6lj7dcshw3c4gww/0999-add-support-hardware-crypto-engine.patch?dl=1

Thanks. Once I have it working, I will look into the crypto engine for the MT7621. That is advertised as an engine for IPsec, so I think it's AES only too.

Actually, the second one is my attempt to get the mt7621 IPsec crypto working, so for mt7628 only the first one is relevant.

I noticed. Thanks. Later today I should be able to try. But maybe it's like you said: there is little point in using it for the application I have in mind (VPN), because for small blocks the speed difference seems low and the additional (system) calls will probably make it even less useful.

At least I learned how to port/patch to LEDE. I'm now looking into a bigger project: the "official" RA-ETH driver to enable hardware NAT. Again, the idea is to offload the CPU, not because software NAT can't keep up with my 200Mbps internet line. It should also increase speeds between, let's say, a NAS and a desktop.

It looks like my way of adding this driver is not working, because OpenSSL gets built before my driver. So my driver compiles perfectly, but OpenSSL is missing the flags to include the hardware cryptodev. So I guess putting all files into one big patch file is the way to do it.

Patches are applied in the order of their sorted file names. That's why the prefixes contain numbers to force certain patches to be applied before others. Maybe you can play around with that to get them to apply in the correct order? @drbrains
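
To illustrate the ordering rule, here is a minimal sketch in Python. The file names are made up for the example (they are not the actual patches from this thread); the only point is that a plain sort on the name decides which patch goes first.

```python
# Hypothetical patch file names; only the numeric prefixes matter here.
patches = [
    "0999-add-AES-Engine-driver.patch",
    "0100-enable-cryptodev-engine.patch",
    "0150-fix-build-flags.patch",
]

# Patches are applied in sorted-name order, so a lower prefix is applied first.
apply_order = sorted(patches)
for name in apply_order:
    print(name)
```

Renaming a patch from a 0999- prefix to something lower would move it earlier in that list.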

Initially I tried making it a new kernel module, putting all my files in a new folder under Packages/Kernel/my kmod
The OpenSSL libs never see my additional config settings to build with hardware support.
For some reason there are extra config options for the GnuTLS library, but libopenssl doesn't have any in the standard "make menuconfig".

AARGH!! I found it: there is a patch in the OpenSSL lib that disables the HW engine. I found references to this patch from a long time ago. Are the reasons for this patch still valid? Because if so, it makes no sense to have a hardware engine at all.

OR

should I change to a different library, use OCF instead of cryptodev, or something??

Does the commit that added the patch mention anything about why the HW-Engine was disabled?

Doing a simple benchmark on the router itself:
root@LEDE:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 954011 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 267999 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 69591 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 17557 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 1980 aes-256-cbc's in 3.01s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 5088.06k 5717.31k 5938.43k 5992.79k 5388.76k
root@LEDE:~# insmod cryptodev
root@LEDE:~# openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 314592 aes-256-cbc's in 2.98s
Doing aes-256-cbc for 3s on 64 size blocks: 326500 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 290500 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 153734 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 38931 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 1689.08k 7035.69k 25039.73k 53004.58k 107381.40k
root@LEDE:~#

Does anyone know a good "real-life" benchmark / test?

Next problem with this driver: it's very old :slight_smile: which means it still uses IRQF_DISABLED. That flag has been deprecated since kernel 3.x and was removed in 4.1. We are on 4.4 now (looking into 4.9).

This part I can replace, but I run into problems when I want to use interrupts. Using interrupts should help to get CPU usage down, but when I enable them the driver generates a lot of errors.
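
As a sanity check on the numbers above, here is a small Python sketch showing how the "1000s of bytes per second" figures follow from the operation counts and elapsed times that openssl speed prints. The function name is my own; the inputs are copied from the 8192-byte lines of the two runs.

```python
def throughput_k(ops: int, block_bytes: int, seconds: float) -> float:
    """Thousands of bytes per second, as in the openssl speed summary line."""
    return ops * block_bytes / seconds / 1000.0

# 8192-byte results from the run before and the run after "insmod cryptodev".
before_cryptodev = throughput_k(1980, 8192, 3.01)
after_cryptodev = throughput_k(38931, 8192, 2.97)

print(f"before: {before_cryptodev:.2f}k")
print(f"after:  {after_cryptodev:.2f}k")
print(f"ratio:  {after_cryptodev / before_cryptodev:.1f}x")
```

Note that this is only the large-block case. At 16-byte blocks the same comparison goes the other way (5088.06k before, 1689.08k after), which fits the earlier remark about small blocks.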

@maurer, I noticed you were doing this for the MT7621 on the mqmaker forum. It seemed like there the interrupt problem was solved, but I couldn't find how. It just said the "board" was now fully supported. Does that mean I need to look into the DTS(I) files for full interrupt support?

Unfortunately the guy on the mqmaker forum, stas2z, didn't release his source code; he only releases binaries and builder files. But there is hope :slight_smile:
The guy who made the first backports releases his code:



that's about the best chance to have mt7621_hw_ipsec enabled in lede

I'm looking at the Padavan code a lot, but even he didn't activate interrupts. The IRQF reference is still "allowed" in his kernel 3.x versions. I'm comparing with the Wive-NG project as well.

His way (he has to) is to modify some kernel code to intercept the IPsec packets. This was never "allowed" by the OpenWRT community, as it is considered a security issue. The same most likely applies to the engine-disable patch: rumor had it that the NSA had some backdoor in the hardware engines.

Me, I'm not that concerned about this part... I don't think I qualify as someone to spend resources on :wink: so I'm just trying to get the most out of the hardware, using IPsec and/or OpenVPN-OpenSSL to bypass geolocation problems or get past some other government firewall so I can access my favorite website.

I haven't found a good real-life benchmark yet. The OpenSSL speed test is not a realistic indication (looks good, though).

Success :smiley: !!

root@LEDE:~# rmmod mtk_aes
root@LEDE:~# time -v openssl speed -elapsed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 276675 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 64 size blocks: 153719 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 54902 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 13558 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 2002 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 1490.51k 3312.46k 4732.29k 4674.54k 5522.01k
Command being timed: "openssl speed -elapsed -evp aes-256-cbc -engine cryptodev"
User time (seconds): 0.49
System time (seconds): 14.02
Percent of CPU this job got: 90%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.96s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10400
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 111
Voluntary context switches: 83
Involuntary context switches: 454
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
root@LEDE:~# modprobe mtk_aes b=16
root@LEDE:~# time -v openssl speed -elapsed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 116585 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 64 size blocks: 116865 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 256 size blocks: 104425 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 1024 size blocks: 95388 aes-256-cbc's in 2.97s
Doing aes-256-cbc for 3s on 8192 size blocks: 34593 aes-256-cbc's in 2.97s
OpenSSL 1.0.2k 26 Jan 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
compiler: mipsel-openwrt-linux-musl-gcc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/usr/include -I/home/richard/lede_new/source/staging_dir/target-mipsel_24kc_musl/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/usr/include -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include/fortify -I/home/richard/lede_new/source/staging_dir/toolchain-mipsel_24kc_gcc-6.3.0_musl/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mno-branch-likely -mips32r2 -mtune=24kc -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -msoft-float -iremap/home/richard/lede_new/source/build_dir/target-mipsel_24kc_musl/openssl-1.0.2k:openssl-1.0.2k -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/richard/lede_new/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DAES_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 628.07k 2518.30k 9000.94k 32887.98k 95416.11k
Command being timed: "openssl speed -elapsed -evp aes-256-cbc -engine cryptodev"
User time (seconds): 1.26
System time (seconds): 8.41
Percent of CPU this job got: 60%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.92s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 10400
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 112
Voluntary context switches: 466871
Involuntary context switches: 284
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
root@LEDE:~# ^C
root@LEDE:~# cat /proc/interrupts
CPU0
4: 1232369 MIPS 4 mt76x2e
5: 627685 MIPS 5 10100000.ethernet
6: 21 MIPS 6 mt7603e
7: 144167 MIPS 7 timer
21: 3936726 INTC 13 aes_engine
25: 9 INTC 17 esw
28: 12 INTC 20 serial
40: 0 GPIO 38 gpio-keys
ERR: 151694
root@LEDE:~#
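
For comparing the two "time -v" reports, here is a rough Python sketch. It assumes that "Percent of CPU" is simply (user + system) / elapsed, which is consistent with the 90% and 60% shown; the helper name is mine and the figures are copied from the output above.

```python
def cpu_share(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of wall-clock time the process spent on the CPU."""
    return (user_s + system_s) / elapsed_s

# Figures copied from the two "time -v" reports.
without_mtk_aes = cpu_share(0.49, 14.02, 15.96)
with_mtk_aes = cpu_share(1.26, 8.41, 15.92)

# 8192-byte throughput from the two summary lines, in thousands of bytes/s.
speedup = 95416.11 / 5522.01

print(f"CPU share without mtk_aes: {without_mtk_aes:.1%}")
print(f"CPU share with mtk_aes:    {with_mtk_aes:.1%}")
print(f"8192-byte speedup:         {speedup:.1f}x")
```

So with mtk_aes loaded the large-block throughput is much higher while the process uses less CPU, although the voluntary context switches jump from 83 to 466871 in the same comparison.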


So you got mtk_aes working on mt7628. Good job!
How will you profit from it?