Profit ... haha. For now I'm happy with the result of porting / patching a driver from an older kernel / different project to LEDE. Now I'm trying to apply what I learned and port the proprietary drivers for Wifi and Ethernet (hopefully with HW-NAT). Priority is on the Wifi, since the open source MT76 still gives a lot of problems. Hopefully that will be sorted quickly, but until then we can have the option to use "less" open drivers.
As for the AES, hoping to benefit from it using OpenVPN.
Just wanted to say that that's amazing
Getting hwnat and the proprietary WiFi drivers up and running would be even more amazing!
Can't wait to test some of your stuff.
Very good job, @drbrains! You're doing amazing work. Getting HWnat and the proprietary wifi drivers would be amazing. The open source mt76 drivers in their current state are a bit of a letdown, unfortunately.
Is it possible to enable this by default on stable LEDE builds? I have some Marvell 88F6192 devices (from the Pogoplug Mobile) and I am interested in using them as OpenVPN endpoints too. Do you know if there are similar steps to get it working on this SoC? It would be great to put step-by-step instructions somewhere and keep them for reference. Thanks.
I did a quick Google search on that chip. It should have some "security engine", but I couldn't find any driver / source code. Granted, since I don't have any Marvell-based devices, I didn't look for long. There should be some Linux driver for this engine, but porting it to the LEDE SDK is not a process that is simple to describe. And you need some source code first.
Besides, there is a patch that disables all HW engines, and I haven't figured out yet why someone decided to do that. For myself, I'm not using it in a high-value environment, so it's for private use only at the moment, until I understand why the HW was disabled. I don't want to open security holes, even though I found someone else doing it (removing the patch) dating back to 2013!
Sure. Keep in mind that I still have to clean it up. For now I just made dirty edits to e.g. the included header files. Not a big problem, since this engine ONLY works with the MT7628 Crypto engine in the SOC.
Don't forget to enable hardware support in the OpenSSL lib AND remove the 150-no-engine patch from package/libs/openssl/patches
I still haven't figured out why this is disabled via a patch, other than that most systems don't have it, so why enable it.
Without HWE:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      1490.51k     3312.46k     4732.29k     4674.54k     5522.01k
With HWE:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc       628.07k     2518.30k     9000.94k    32887.98k    95416.11k
This is normal. For small blocks, moving the data into and out of the crypto module takes more time than the encryption itself saves. The performance gain comes with block sizes of approximately 1024 bytes and up. The reason I ran the OpenSSL speed test with "time -v" in front was to show the amount of CPU bandwidth gained: 90% usage without vs 60% with the HW engine. Even if the speed were the same, the "free" CPU cycles would be the benefit. With us doing more and more tasks on a router and getting faster and faster internet speeds, every cycle starts to count on limited devices.
As for real life, I am not sure how much performance is really gained using OpenVPN. I didn't get around to doing proper testing. But even if it's just the 10-15% improvement mentioned above and we win CPU cycles in the process, then why not. This resource is available on the SOC, so at least I wanted to have the option to use it.
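To make the small-block penalty concrete, here is a minimal sketch that divides the two aes-256-cbc result rows by each other. It assumes the row topping out at 5522.01k is the software-only run and the row topping out at 95416.11k is the run with the hardware engine, which is what the explanation above implies. The function names are made up for this example.

```python
# Throughput in 1000s of bytes per second, copied from the two
# "openssl speed" result rows for aes-256-cbc quoted in this thread.
BLOCK_SIZES = [16, 64, 256, 1024, 8192]
SOFTWARE = [1490.51, 3312.46, 4732.29, 4674.54, 5522.01]   # assumed: no HW engine
HARDWARE = [628.07, 2518.30, 9000.94, 32887.98, 95416.11]  # assumed: HW engine


def speedups(software, hardware):
    """Return the hardware/software throughput ratio for each block size."""
    return [hw / sw for sw, hw in zip(software, hardware)]


def break_even(block_sizes, ratios):
    """Return the smallest measured block size where the engine is faster."""
    for size, ratio in zip(block_sizes, ratios):
        if ratio > 1.0:
            return size
    return None


if __name__ == "__main__":
    ratios = speedups(SOFTWARE, HARDWARE)
    for size, ratio in zip(BLOCK_SIZES, ratios):
        print(f"{size:>5} bytes: {ratio:5.2f}x")
    print("engine wins from", break_even(BLOCK_SIZES, ratios), "bytes")
```

With these figures the engine is slower at 16 and 64 bytes, roughly 1.9x faster at 256 bytes, and about 17x faster at 8192 bytes. Note that these ratios only compare the reported throughput; they say nothing about the CPU load, which is the other half of the argument.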
Well...I think I will be able to have an “Alpha” release soon (depends on how busy my normal work will be). The MT7621 should have a full implementation of the EIP-93. The chip reports its capabilities, so some versions might be more crippled than others.
The IPSec acceleration makes the most sense because it’s all kernel side. Not so sure how much improvement it will give. For LUKS the blocks might be big enough to make a significant difference. I don’t have experience with LUKS.
The Nossiac drivers are proprietary wifi drivers and are not using the hardware crypto. Most MT7621 systems I've seen are using the MT7602 or MT7603 in combination with MT7612 for wifi. MT7628 has wifi in the SoC and is usually combined with the MT7612 as well.
The MT7628 only has an AES-128 and AES-256 engine, in CBC or ECB mode. The MT7621 should have a fully functional EIP-93 engine, which should include SHA1 and SHA256 plus AES-128 and AES-256.
@pivanov84, I never used LUKS; it seems I’m missing something. When running “cryptsetup benchmark” it complains about no ciphers available. Of course a “cat /proc/crypto” will show a list of installed ciphers.
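For anyone wanting to check which implementation the kernel would pick for a given cipher, here is a small sketch that parses text in the /proc/crypto layout (blank-line separated blocks of "key : value" lines) and returns the driver with the highest priority. The sample entries, including the driver name `cbc-aes-hypothetical-hw`, are invented for illustration and are not the names this engine registers. This only inspects the list; it does not explain why cryptsetup reports no ciphers.

```python
# Invented sample in the /proc/crypto layout; driver and module names
# are hypothetical, not taken from the MT7628 engine.
SAMPLE = """\
name         : cbc(aes)
driver       : cbc-aes-hypothetical-hw
module       : hypothetical_hw
priority     : 300
type         : ablkcipher

name         : cbc(aes)
driver       : cbc(aes-generic)
module       : kernel
priority     : 100
type         : blkcipher
"""


def parse_proc_crypto(text):
    """Split /proc/crypto style text into a list of dicts, one per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries, algorithm):
    """Return the driver with the highest priority for an algorithm name."""
    candidates = [e for e in entries if e.get("name") == algorithm]
    if not candidates:
        return None
    return max(candidates, key=lambda e: int(e.get("priority", 0)))["driver"]


if __name__ == "__main__":
    # On a real device, read the file instead of using SAMPLE:
    #   text = open("/proc/crypto").read()
    entries = parse_proc_crypto(SAMPLE)
    print(preferred_driver(entries, "cbc(aes)"))
```

The kernel crypto API prefers the registered implementation with the highest priority for a given algorithm name, so a hardware driver only gets used if its priority is above the generic one.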
@drbrains - I haven't seen such a thing before; I will try to dig something up if I get a free minute. Is there any improvement when doing openssl speed?
@pivanov84, So I got the cryptsetup benchmark to run on an MT7628 target (the only one I have access to at the moment). Without the HW encryption, I get around 7.5 MiB/s for aes-cbc-128. However, using the engine, which was working fine with bigger blocks using OpenSSL, the benchmark shows only around 0.2 MiB/s !!!
My best guess is that LUKS is user space and only uses 512-byte sectors, so I assume this is the block size. On smaller blocks it doesn't make too much sense to use hardware. This is why nobody really bothered in the past, I think. Those who need the performance will eventually switch to better hardware (maybe even x86) and get performance increases across the board.
I will still continue on the MT7621, even if it is just as a sort of proof of concept. Mainly because I am learning a lot from this project.
EDIT: MY BAD!! I didn't disable the DEBUG feature. I was trying to add it again to check my assumption on block size. FYI, cryptsetup benchmark is using a 4096-byte block size.
Bear in mind this is on the MT7628 and I haven't seen so many devices with USB so it will not help a lot of people, but it does show I need to keep working on the Engine for the MT7621
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1        9990 iterations per second for 256-bit key
PBKDF2-sha256      9553 iterations per second for 256-bit key
PBKDF2-sha512      4595 iterations per second for 256-bit key
PBKDF2-ripemd160   8511 iterations per second for 256-bit key
PBKDF2-whirlpool   2851 iterations per second for 256-bit key
#  Algorithm | Key  |  Encryption |  Decryption
     aes-cbc   128b      7.9 MiB/s     8.1 MiB/s
 serpent-cbc   128b            N/A           N/A
 twofish-cbc   128b            N/A           N/A
     aes-cbc   256b      5.5 MiB/s     5.5 MiB/s
 serpent-cbc   256b            N/A           N/A
 twofish-cbc   256b            N/A           N/A
     aes-xts   256b      6.8 MiB/s     6.8 MiB/s
 serpent-xts   256b            N/A           N/A
 twofish-xts   256b            N/A           N/A
     aes-xts   512b      5.6 MiB/s     5.5 MiB/s
 serpent-xts   512b            N/A           N/A
 twofish-xts   512b            N/A           N/A
After installing the AES Engine:
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1       10020 iterations per second for 256-bit key
PBKDF2-sha256      9581 iterations per second for 256-bit key
PBKDF2-sha512      4708 iterations per second for 256-bit key
PBKDF2-ripemd160   9026 iterations per second for 256-bit key
PBKDF2-whirlpool   3022 iterations per second for 256-bit key
#  Algorithm | Key  |  Encryption |  Decryption
     aes-cbc   128b     58.5 MiB/s    59.1 MiB/s
 serpent-cbc   128b            N/A           N/A
 twofish-cbc   128b            N/A           N/A
     aes-cbc   256b     55.4 MiB/s    55.5 MiB/s
 serpent-cbc   256b            N/A           N/A
 twofish-cbc   256b            N/A           N/A
     aes-xts   256b      6.8 MiB/s     6.7 MiB/s
 serpent-xts   256b            N/A           N/A
 twofish-xts   256b            N/A           N/A
     aes-xts   512b      5.6 MiB/s     5.5 MiB/s
 serpent-xts   512b            N/A           N/A
 twofish-xts   512b            N/A           N/A
If the encryption level is good enough using AES-CBC this should make a nice difference even if writing/reading from a real USB device might slow things down a bit again.
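As a quick way to read the two cryptsetup benchmark runs side by side, the sketch below computes the encryption speedup per AES mode from the MiB/s figures quoted above. The dictionary layout and function name are just choices for this example.

```python
# (encryption, decryption) throughput in MiB/s, copied from the two
# "cryptsetup benchmark" runs: before and after installing the AES engine.
BEFORE = {
    ("aes-cbc", 128): (7.9, 8.1),
    ("aes-cbc", 256): (5.5, 5.5),
    ("aes-xts", 256): (6.8, 6.8),
    ("aes-xts", 512): (5.6, 5.5),
}
AFTER = {
    ("aes-cbc", 128): (58.5, 59.1),
    ("aes-cbc", 256): (55.4, 55.5),
    ("aes-xts", 256): (6.8, 6.7),
    ("aes-xts", 512): (5.6, 5.5),
}


def encryption_speedup(before, after):
    """Return after/before encryption throughput for each (mode, key size)."""
    return {key: after[key][0] / before[key][0] for key in before}


if __name__ == "__main__":
    for (mode, bits), ratio in encryption_speedup(BEFORE, AFTER).items():
        print(f"{mode} {bits}b: {ratio:.1f}x")
```

The CBC rows come out at roughly 7x (128b) and 10x (256b), while both XTS rows stay at 1.0x. That matches the engine only offering CBC and ECB modes, so a LUKS volume using aes-xts would not benefit from it as it stands.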
EDIT: I never used LUKS, so I'm not sure how the first part (iterations per second) affects real performance.
@drbrains - I believe default LUKS uses aes-xts / 256b, but I will double check when I get back to the device. In real usage with a USB3 HDD, copying files to/from it over Samba was close to 12-13 MB/s, which is close to the benchmarks I got.
Anyway, thanks again for your work. It should be usable for openssl/openssh, so it should be beneficial for lots of people (even for scp or a SOCKS5 proxy over ssh). So maybe focus on "openssl speed", and disk encryption may come as an added bonus.