Hardware crypto for Mediatek missing?

Profit :wink: ... haha. For now I'm happy with the result of porting / patching a driver from an older kernel / different project to LEDE. Now I'm trying to apply what I learned and port the proprietary drivers for WiFi and Ethernet (hopefully with HW-NAT). WiFi has priority, since the open source MT76 driver still causes a lot of problems. Hopefully that will be sorted out quickly, but until then we can have the option to use "less" open drivers.

As for AES, I'm hoping to benefit from it when using OpenVPN.

Just wanted to say that that's amazing :slight_smile:
Getting hwnat and the proprietary WiFi drivers up and running would be even more amazing!
Can't wait to test some of your stuff.

Very good job, @drbrains! You're doing amazing work :slight_smile: . Getting HW-NAT and the proprietary WiFi drivers working would be amazing. The open source mt76 drivers in their current state are a bit of a letdown, unfortunately :frowning:

Is it possible to enable this by default on stable LEDE builds? I have some Marvell 88F6192 devices (from the PogoPlug Mobile) and I am interested in using them as OpenVPN endpoints too. Do you know if there are similar steps to get it working on this SoC? It would be great to put step-by-step instructions somewhere and keep them for reference. Thanks.

I did a quick Google search on that chip. It should have some kind of "security engine", but I couldn't find any driver / source code. Granted, since I don't have any Marvell-based devices, I didn't search for long. There should be a Linux driver for this engine, but porting it to the LEDE SDK is not a process that is simple to describe. And you need some source code first.

Besides: there is a patch that disables all HW engines, and I haven't figured out yet why someone decided to do that. For my part, I'm not using it in a high-value environment, so this is for private use only until I understand why the HW was disabled. I don't want to open security holes, even though I found someone else doing the same thing (removing the patch) as far back as 2013!

@braian87b Here http://forum.doozan.com/read.php?2,26394,26504#msg-26504 are some hints about Marvell's cryptodev engine.
@drbrains Can you share your code? There are also Broadcom SoCs with crypto HW I want to play with.

Sure. Keep in mind that I still have to clean it up. For now I just made dirty edits to e.g. the included header files. Not a big problem, since this engine ONLY works with the MT7628 crypto engine in the SoC.

Don't forget to enable hardware support in the OpenSSL lib AND remove the 150-no-engine patch from package/libs/OpenSSL/patches

I still haven't figured out why this is disabled via a patch, other than that most systems don't have such an engine, so why enable it.

Strange numbers.

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      1490.51k     3312.46k     4732.29k     4674.54k     5522.01k

without HWE

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       628.07k     2518.30k     9000.94k    32887.98k    95416.11k

with HWE.

On small blocks the HW engine is roughly 2x slower than the CPU.
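To put a number on that per block size, here is a small Python sketch that divides the "with HWE" throughput by the "without HWE" throughput. The figures are copied from the two aes-256-cbc rows above; the variable and function names are only illustrative.

```python
# Throughput in 1000s of bytes per second, copied from the two
# "openssl speed" aes-256-cbc rows quoted above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
WITHOUT_HWE = [1490.51, 3312.46, 4732.29, 4674.54, 5522.01]
WITH_HWE = [628.07, 2518.30, 9000.94, 32887.98, 95416.11]


def speedups(software, hardware):
    """Return hardware/software throughput ratios, one per block size."""
    return [hw / sw for sw, hw in zip(software, hardware)]


ratios = speedups(WITHOUT_HWE, WITH_HWE)

if __name__ == "__main__":
    for size, ratio in zip(BLOCK_SIZES, ratios):
        verdict = "engine faster" if ratio > 1 else "CPU faster"
        print(f"{size:>5} bytes: {ratio:5.2f}x ({verdict})")
```

A ratio below 1 means the CPU alone is faster at that block size. With these particular numbers the engine loses at 16 and 64 bytes and wins from 256 bytes upward.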

This is normal. For small blocks it takes more time to move the data into and out of the crypto module than is saved by the hardware. The performance gain comes with block sizes of approximately 1024 bytes and up. The reason I ran the OpenSSL speed test with "time -v" in front was to show the amount of CPU time freed up: 90% usage without vs. 60% with the HW engine. Even if the speed were the same, the "free" CPU cycles would be the benefit. With us doing more and more tasks on a router and getting faster and faster internet speeds, every cycle starts to count on limited devices.

As for real life, I am not sure how much performance is really gained using OpenVPN. I haven't gotten around to doing proper testing. But even if it's just the 10-15% improvement mentioned above and we win CPU cycles in the process, then why not? This resource is available on the SoC, so at least I wanted to have the option to use it.
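The "moving data in and out" cost can be estimated from the openssl numbers posted earlier in the thread. The sketch below is only a rough model, not a measurement: it assumes the time per block is a fixed setup cost plus a per-byte cost, and fits that line through the smallest and largest block size only. All names are made up for the example.

```python
# Rough model (an assumption, not something measured on the device):
#   time_per_block = fixed_overhead + per_byte_cost * block_size
# Throughput values are in 1000s of bytes per second, taken from the
# aes-256-cbc rows of the two "openssl speed" runs in this thread.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
CPU_ONLY = [1490.51, 3312.46, 4732.29, 4674.54, 5522.01]
ENGINE = [628.07, 2518.30, 9000.94, 32887.98, 95416.11]


def time_per_block_us(size, kbytes_per_s):
    """Microseconds spent on one block of `size` bytes."""
    return size / (kbytes_per_s * 1000.0) * 1e6


def two_point_fit(throughputs):
    """Estimate (fixed_us, per_byte_us) from the smallest and largest block."""
    t_small = time_per_block_us(BLOCK_SIZES[0], throughputs[0])
    t_large = time_per_block_us(BLOCK_SIZES[-1], throughputs[-1])
    per_byte = (t_large - t_small) / (BLOCK_SIZES[-1] - BLOCK_SIZES[0])
    fixed = t_small - per_byte * BLOCK_SIZES[0]
    return fixed, per_byte


cpu_fixed, cpu_per_byte = two_point_fit(CPU_ONLY)
hw_fixed, hw_per_byte = two_point_fit(ENGINE)

# Block size at which both lines predict the same time per block.
break_even = (hw_fixed - cpu_fixed) / (cpu_per_byte - hw_per_byte)

if __name__ == "__main__":
    print(f"CPU only: {cpu_fixed:.1f} us fixed + {cpu_per_byte:.4f} us/byte")
    print(f"Engine:   {hw_fixed:.1f} us fixed + {hw_per_byte:.4f} us/byte")
    print(f"Estimated break-even block size: {break_even:.0f} bytes")
```

Under this model the engine has the larger fixed cost per request but a much smaller cost per byte, which is why it only pays off once blocks get big enough. Since only two data points are used, the break-even size is a ballpark figure at best.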

Any update on the MT7621 crypto engine? Without it, a LUKS-encrypted USB3 device is limited to around 12-13 MB/s.

The MT7621 crypto engine works with IPsec only!

Well... I think I will be able to have an “Alpha” release soon (depending on how busy my normal work will be). The MT7621 should have a full implementation of the EIP-93. The chip reports its capabilities, so some versions might be more crippled than others.

The IPsec acceleration makes the most sense because it’s all kernel side. I'm not so sure how much improvement it will give. For LUKS the blocks might be big enough to make a significant difference. I don’t have experience with LUKS.

Any improvement is needed. I run mt7621 with a NAS device (6 SATA ports) and performance is terrible with anything crypto related.

@drbrains - Thanks for your efforts! I don't know if the drivers from https://github.com/Nossiac/mtk-openwrt-feeds have anything to do with the crypto engine.

@neheb - yeah pretty slow :

root@DARK:~# cryptsetup benchmark
PBKDF2-sha1        35617 iterations per second for 256-bit key
PBKDF2-sha256      52851 iterations per second for 256-bit key
PBKDF2-sha512      19980 iterations per second for 256-bit key
PBKDF2-ripemd160   35234 iterations per second for 256-bit key
PBKDF2-whirlpool     N/A
#     Algorithm | Key |  Encryption |  Decryption
        aes-cbc   128b    11.0 MiB/s    11.9 MiB/s
    serpent-cbc   128b     8.7 MiB/s     9.6 MiB/s
    twofish-cbc   128b    11.6 MiB/s    13.0 MiB/s
        aes-cbc   256b     9.0 MiB/s     9.3 MiB/s
    serpent-cbc   256b     9.6 MiB/s     9.6 MiB/s
    twofish-cbc   256b    12.7 MiB/s    13.0 MiB/s
        aes-xts   256b    11.3 MiB/s    11.8 MiB/s
    serpent-xts   256b     9.3 MiB/s     9.6 MiB/s
    twofish-xts   256b    12.2 MiB/s    12.8 MiB/s
        aes-xts   512b     9.0 MiB/s     9.0 MiB/s
    serpent-xts   512b     9.6 MiB/s     9.5 MiB/s
    twofish-xts   512b    12.8 MiB/s    12.7 MiB/s

The Nossiac drivers are proprietary wifi drivers and are not using the hardware crypto. Most MT7621 systems I've seen are using the MT7602 or MT7603 in combination with MT7612 for wifi. MT7628 has wifi in the SoC and is usually combined with the MT7612 as well.

The MT7628 only has an AES-128 and AES-256 engine, in CBC or ECB mode. The MT7621 should have a fully functional EIP-93 engine, which should include SHA1 and SHA256 plus AES-128 and AES-256.

@pivanov84, I never used LUKS; it seems I’m missing something. When running “cryptsetup benchmark” it complains about no ciphers available. Of course a “cat /proc/crypto” will show a list of installed ciphers.

Any suggestions?
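For checking which implementation the kernel would pick for a given cipher, the /proc/crypto listing can be parsed with a few lines of Python. This is a minimal sketch: the sample text and the driver name "cbc-aes-hw-example" are invented for illustration, and a real listing has more fields per entry. It relies on the kernel preferring the registered implementation with the highest priority for an algorithm name.

```python
# Made-up excerpt in the /proc/crypto layout ("key : value" lines,
# entries separated by blank lines). Driver names are hypothetical.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-hw-example
module       : kernel
priority     : 300

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : kernel
priority     : 100
"""


def parse_proc_crypto(text):
    """Split a /proc/crypto style listing into one dict per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries, algorithm):
    """Return the driver with the highest priority for `algorithm`, or None."""
    candidates = [e for e in entries if e.get("name") == algorithm]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", "0")))
    return best["driver"]


entries = parse_proc_crypto(SAMPLE)

if __name__ == "__main__":
    print(preferred_driver(entries, "cbc(aes)"))
```

On a real device the same functions can be fed the contents of /proc/crypto (copied to any machine with Python) instead of SAMPLE.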

@drbrains - I haven't seen such a thing before; I will try to dig something up if I get a free minute. Is there any improvement when running openssl speed?

@pivanov84, so I got the cryptsetup benchmark to run on an MT7628 target (the only one I have access to at the moment). Without the HW encryption, I get around 7.5 MiB/s for aes-cbc-128. However, using the engine, which was working fine with bigger blocks using OpenSSL, the benchmark shows only around 0.2 MiB/s !!!

My best guess is that LUKS is user space and only uses 512-byte sectors, so I assume this is the block size. On smaller blocks it doesn't make too much sense to use hardware. This is why nobody really bothered in the past, I think. Those who need the performance will eventually switch to better hardware (maybe even x86) and get performance increases across the board.

I will still continue on the MT7621, even if it is just as a sort of proof of concept. Mainly because I am learning a lot from this project.

EDIT: MY BAD!! I didn't disable the DEBUG feature. I was trying to add it again to check my assumption on block size. FYI, cryptsetup benchmark uses a 4096-byte block size.

Bear in mind this is on the MT7628, and I haven't seen many of those devices with USB, so it will not help a lot of people, but it does show I need to keep working on the engine for the MT7621 :smile:

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1         9990 iterations per second for 256-bit key
PBKDF2-sha256       9553 iterations per second for 256-bit key
PBKDF2-sha512       4595 iterations per second for 256-bit key
PBKDF2-ripemd160    8511 iterations per second for 256-bit key
PBKDF2-whirlpool    2851 iterations per second for 256-bit key
#     Algorithm | Key |  Encryption |  Decryption
        aes-cbc   128b     7.9 MiB/s     8.1 MiB/s
    serpent-cbc   128b           N/A           N/A
    twofish-cbc   128b           N/A           N/A
        aes-cbc   256b     5.5 MiB/s     5.5 MiB/s
    serpent-cbc   256b           N/A           N/A
    twofish-cbc   256b           N/A           N/A
        aes-xts   256b     6.8 MiB/s     6.8 MiB/s
    serpent-xts   256b           N/A           N/A
    twofish-xts   256b           N/A           N/A
        aes-xts   512b     5.6 MiB/s     5.5 MiB/s
    serpent-xts   512b           N/A           N/A
    twofish-xts   512b           N/A           N/A

After installing the AES Engine:

# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1        10020 iterations per second for 256-bit key
PBKDF2-sha256       9581 iterations per second for 256-bit key
PBKDF2-sha512       4708 iterations per second for 256-bit key
PBKDF2-ripemd160    9026 iterations per second for 256-bit key
PBKDF2-whirlpool    3022 iterations per second for 256-bit key
#     Algorithm | Key |  Encryption |  Decryption
        aes-cbc   128b    58.5 MiB/s    59.1 MiB/s
    serpent-cbc   128b           N/A           N/A
    twofish-cbc   128b           N/A           N/A
        aes-cbc   256b    55.4 MiB/s    55.5 MiB/s
    serpent-cbc   256b           N/A           N/A
    twofish-cbc   256b           N/A           N/A
        aes-xts   256b     6.8 MiB/s     6.7 MiB/s
    serpent-xts   256b           N/A           N/A
    twofish-xts   256b           N/A           N/A
        aes-xts   512b     5.6 MiB/s     5.5 MiB/s
    serpent-xts   512b           N/A           N/A
    twofish-xts   512b           N/A           N/A

If the encryption level is good enough using AES-CBC this should make a nice difference even if writing/reading from a real USB device might slow things down a bit again.
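To compare the two benchmark runs above row by row, a short Python sketch like the following can parse the cryptsetup table and compute the encryption speedup. Only a few rows are copied in to keep it short; the names are illustrative, and rows reporting N/A are simply skipped.

```python
import re

# A few rows copied from the two "cryptsetup benchmark" runs above.
BEFORE = """\
        aes-cbc   128b     7.9 MiB/s     8.1 MiB/s
    serpent-cbc   128b           N/A           N/A
        aes-cbc   256b     5.5 MiB/s     5.5 MiB/s
        aes-xts   256b     6.8 MiB/s     6.8 MiB/s
        aes-xts   512b     5.6 MiB/s     5.5 MiB/s
"""
AFTER = """\
        aes-cbc   128b    58.5 MiB/s    59.1 MiB/s
    serpent-cbc   128b           N/A           N/A
        aes-cbc   256b    55.4 MiB/s    55.5 MiB/s
        aes-xts   256b     6.8 MiB/s     6.7 MiB/s
        aes-xts   512b     5.6 MiB/s     5.5 MiB/s
"""

# Matches "<algorithm> <bits>b <enc> MiB/s <dec> MiB/s"; N/A rows do not match.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)b\s+([\d.]+) MiB/s\s+([\d.]+) MiB/s\s*$")


def parse(text):
    """Map (algorithm, key bits) to (encrypt MiB/s, decrypt MiB/s)."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            key = (match.group(1), int(match.group(2)))
            results[key] = (float(match.group(3)), float(match.group(4)))
    return results


before, after = parse(BEFORE), parse(AFTER)
encrypt_speedup = {
    key: after[key][0] / before[key][0] for key in before if key in after
}

if __name__ == "__main__":
    for (algorithm, bits), ratio in encrypt_speedup.items():
        print(f"{algorithm:>8} {bits:>4}b: {ratio:4.1f}x")
```

The aes-cbc rows improve several times over while the aes-xts rows stay flat, which fits the MT7628 engine only handling CBC and ECB.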

EDIT: I never used LUKS, so I'm not sure how the first part (iterations per second) impacts real performance.

@drbrains - I believe default LUKS uses aes-xts / 256b, but I will double-check when I get back to the device. In real usage with a USB3 HDD, copying files to/from it over Samba ran at about 12-13 MB/s, which is close to the benchmarks I got.

Anyway, thanks again for your work. It should be usable for openssl/openssh, so it should be beneficial for lots of people (even for scp or a SOCKS5 proxy over ssh). So maybe focus on "openssl speed", and disk encryption may come as an added bonus.