Hardware crypto for Mediatek missing?

Ah, since you mentioned OpenSSH: the default SCP in busybox doesn’t support encryption. How do I add this to busybox or replace it with the OpenSSH version? I would like to run some additional benchmarks and find some more use cases for the engine.

I am still looking for a way to have OpenVPN use the engine correctly; so far nobody on the OpenVPN forum has been able to point me in the right direction.

@drbrains On the default 17.01.4 I use:

openssh-keygen - 7.4p1-1
openssh-server - 7.4p1-1
openssh-sftp-server - 7.4p1-1

instead of dropbear, because dropbear supports only a few ciphers. Then in PuTTY I select ChaCha20 and I get a bit more bandwidth when doing scp/port forwarding (on a tp-link 1043nd v1). Obviously you'll use AES.

So I “sort of” figured out why the openvpn usage with cryptodev and the current MT7628 AES engine is failing.

Running “only” AES-CBC encryption works fine: as a test I write an encrypted file, then run “only” decryption, and I get the proper results. Using both directions “together” creates the problem. It seems to me that the “IV” is not updated correctly. Looking at engine code for other hardware, I can’t find where those drivers update the IV via some sort of write-back.
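
For reference, a minimal Python sketch of the IV write-back idea. It uses a toy, insecure block transform (XOR plus byte reversal) in place of AES, and every function name is made up for illustration; nothing here is taken from the MT7628 driver. It only shows that when one CBC stream is split over several requests, each request has to hand back the last ciphertext block as the IV for the next one, which is, as far as I understand, what the kernel crypto API expects a CBC driver to leave in the request's IV buffer.

```python
BLOCK = 16  # block size in bytes, same as AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES (NOT secure): XOR with the key, then reverse the bytes.
    return _xor(block, key)[::-1]


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # Inverse of toy_encrypt_block.
    return _xor(block[::-1], key)


def cbc_encrypt(key: bytes, iv: bytes, data: bytes):
    """Encrypt whole blocks; return (ciphertext, iv_for_next_request)."""
    out = bytearray()
    prev = iv
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out += prev
    return bytes(out), prev  # prev is the last ciphertext block


def cbc_decrypt(key: bytes, iv: bytes, data: bytes):
    """Decrypt whole blocks; return (plaintext, iv_for_next_request)."""
    out = bytearray()
    prev = iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]  # keep the ciphertext before it is overwritten
        out += _xor(toy_decrypt_block(key, block), prev)
        prev = block
    return bytes(out), prev  # again the last *ciphertext* block


key = b"0123456789abcdef"
iv = bytes(BLOCK)
msg = bytes(range(64))

whole, _ = cbc_encrypt(key, iv, msg)          # one request
c1, iv1 = cbc_encrypt(key, iv, msg[:32])      # same data as two requests
c2, _ = cbc_encrypt(key, iv1, msg[32:])       # second request uses the written-back IV
stale, _ = cbc_encrypt(key, iv, msg[32:])     # second request with a stale IV

print("chained requests match:", c1 + c2 == whole)
print("stale IV matches:", c1 + stale == whole)
```

Note that on the decrypt side the next IV is the last ciphertext block of the input, so a driver that decrypts in place would have to save that block before the hardware overwrites it.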

Most updated drivers use the skcipher API instead of the older blkcipher structure. So now the question: will updating the code to use the newer API eliminate my problem, or should I start using extra buffers/structures/tokens or something to allow “concurrent” encryption/decryption? The hardware of course can only do one at a time, so any new operation has to wait for the hardware to complete processing the block.

I am using interrupts to wake up the driver once a block is processed. Alternatively I could do polling, but I don’t think (please confirm this assumption) that it makes a difference, because the user space application(s) will issue encrypt or decrypt requests in any random order.

Hey @drbrains, and thanks for your work on this!

I'm playing with the gnubee.org GB-PC1 device (open source hardware for SOHO network attached storage), based on the MT7621a, and currently get only around 5..9 MB/s in the cryptsetup speed benchmark.
LUKS and cryptsetup are the widely used full-disk encryption tools for Linux these days, so your results of 50 MB/s look really nice! A typical HDD connected to the GnuBee does around 100 MB/s unencrypted.
I've seen that your results were good on CBC only, and that XTS (generally preferred) was bad. Anyway, with proper configuration you can still have quite a secure CBC FDE.

I've noticed your new repos like this: https://github.com/vschagen/mtk-eip93

  1. Is this the newest version I can/should use with openwrt on my MT7621a?
  2. I'm currently building my own openwrt builds based on 17.01. Do you require 4.6+ kernels? 17.01 still uses the 4.4 line.
  3. Do you have any build scripts for integrating your driver into the openwrt build environment?

Thank you.

The EIP93 driver that you mentioned is not finished yet. I should have a few hours to work on it today and tomorrow to get a proof of concept working. As for AES-XTS, that is not in the hardware, so it will never be faster.

The results shown are for the AES-only crypto engine in the MT76x8 (128/192/256-bit ECB and CBC only in the hardware).

Thanks, looking forward to your commits.

@drbrains, will this work on the RB750Gr3 (MT7621AT)? I am really looking forward to being able to use LUKS encryption on this device.

The driver for the MT7621 is a so-called “stop project” for me. So far it looks like I will be able to finish it, but unfortunately it is taking longer than I was hoping.

As for LUKS, the default encryption is AES-XTS, which is not supported by the hardware. The next best option would be AES-256-CBC. As far as performance goes: unfortunately, by design dm-crypt uses 512-byte sectors, which means only 512 bytes are encrypted at a time. Hardware will still give some performance increase, but so far none of the efforts to batch requests or use bigger blocks/sectors have been integrated. This is true regardless of the hardware we use. So hopefully we will see about double the performance, with at least the benefit of offloading the main CPU so it can do other work.

The EIP93 inside the SoC is designed as an IPsec packet engine, and by this design it doesn’t even do full scatter/gather DMA, which also reduces performance. For this reason the proprietary IPsec driver (with ESP offload) does not allow fragmented SKBs.

What is a stop-project? (No native English speaker here :slight_smile: )
Well, I can use aes-cbc just fine, and if I can get 10MB/s writing to an external USB2 disk from my RB750Gr3 while offloading the CPU, I will be very happy. When you finish, drop a message here and I can help you with testing.

Starting with cryptsetup 2.0, released in December 2017, and with kernel 4.12+, you can specify a sector size of up to 4096 bytes for the encrypted volume, instead of the default 512 in cryptsetup 1.x.

Using 4096 gives a significant performance gain on most embedded devices equipped with hw crypto, compared to using 512 on the same device.
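
To put a rough number on why the sector size matters, here is a back-of-the-envelope Python sketch. The request counts follow directly from the sector sizes; the per-request overhead and raw engine speed are purely assumed placeholder values, not measurements from any MediaTek device, so only the trend is meaningful.

```python
MIB = 1024 * 1024


def requests_per_mib(sector_size: int) -> int:
    """Number of separate crypto requests dm-crypt issues per MiB of data."""
    return MIB // sector_size


def estimated_seconds_per_mib(sector_size: int,
                              per_request_overhead_s: float,
                              engine_bytes_per_s: float) -> float:
    """Simple model: fixed setup cost per request plus raw engine time."""
    setup = requests_per_mib(sector_size) * per_request_overhead_s
    return setup + MIB / engine_bytes_per_s


# ASSUMED numbers for illustration only: 20 us of driver/interrupt overhead
# per request and a raw engine speed of 50 MB/s.
OVERHEAD_S = 20e-6
ENGINE_BPS = 50e6

for size in (512, 4096):
    t = estimated_seconds_per_mib(size, OVERHEAD_S, ENGINE_BPS)
    print(f"{size:>4}-byte sectors: {requests_per_mib(size):>4} requests per MiB, "
          f"about {1 / t:.1f} MiB/s in this model")
```

With 512-byte sectors the engine is set up 2048 times per MiB, with 4096-byte sectors only 256 times, so the fixed per-request cost is paid eight times less often.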

MTK devices with a working eip93 should gain a significant advantage.

@rdslw, I see. Since I have actually never used cryptsetup/luks(2), I did some reading up on this topic. It's great to see that sector_size (up to 4096) is now available. This should help performance on our "little" devices a lot. While on the subject, I noticed that "random" can be specified for IV generation. The EIP93 has a PRNG inside (which was mainly designed for IPsec). Would a hardware PRNG be better (faster) than a software variant? In other words, should I try to get that to work as well? Until now I skipped that topic in the data sheet that I have (which is not specific to the MT7621, but seems close enough when compared against the old ESP/IPsec Mediatek code).

Of course it is. Should help with some of the early boot issues.

@drbrains, thanks for your reply!

TL;DR: no need to work on the RNG in the first phase.

Longer explanation:
For IV generation (and it is worth noting that IV usage depends on the encryption mode used) cryptsetup offers multiple approaches. The two IV modes most commonly used with the full disk encryption modes (XTS and CBC) are "plain64" and "essiv". Neither of them will gain anything (or much) from a hw RNG, and the standard kernel-provided /dev/urandom is enough (it is used for essiv's salt, which is used with CBC), especially since these days the cryptsetup default is XTS.

There is another, in my opinion even more important, point: I trust the kernel's /dev/urandom and /dev/random much more as a randomness source than an MTK PRNG of unknown quality. Especially since high entropy is needed by cryptsetup in our scenarios only (1) to generate the master key. This currently uses /dev/random, which blocks until the kernel has provided enough entropy.

Of course we are in the embedded world, where true entropy is sometimes scarce(2), but as I said: it is needed once, on volume initialization, and the kernel does a good job of keeping enough entropy in its pool: cat /proc/sys/kernel/random/entropy_avail
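
To illustrate why "plain64" needs no random numbers at all, here is a small Python sketch based on the dm-crypt documentation, which describes plain64 as the 64-bit little-endian sector number padded with zeros. The function name is made up for this example. ESSIV additionally encrypts the sector number with a key derived from the volume key, which needs a real cipher and is left out here.

```python
import struct


def plain64_iv(sector: int, iv_size: int = 16) -> bytes:
    """IV for a sector in dm-crypt's "plain64" mode (illustrative helper).

    The IV is the sector number as a 64-bit little-endian integer,
    zero-padded to the cipher's IV size (16 bytes for AES).
    """
    if not 0 <= sector < 2 ** 64:
        raise ValueError("sector number must fit in 64 bits")
    return struct.pack("<Q", sector).ljust(iv_size, b"\x00")


for sector in (0, 1, 258):
    print(sector, plain64_iv(sector).hex())
```

The IV is fully determined by the sector number, so no RNG request (hardware or software) is involved per sector.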

(1) I lied :slight_smile: - entropy is also needed for the password, but that is up to the user providing a strong enough password. If somebody is using keys, I suggest: head -c 64 /dev/random > /tmp/keyfile

(2) My recommendation for openwrt devices using cryptsetup is
"uci set system.@system[0].urandom_seed='/etc/urandom.seed.perreboot'"
This will provide an additional entropy seed on every reboot, instead of openwrt's default of ONCE per firmware installation. The price is some additional flash write wear, but once per reboot is fine for me.

I will check if we have cryptsetup 2.0 in the openwrt packages, and will try to provide an update if needed.

P.S. Important thing: in the above discussion I'm skipping AEAD modes; they are currently VERY new and experimental in cryptsetup.

Let’s say I can add that feature. According to the datasheet, the PRNG conforms to the ANSI standard. Would that mean it would substitute (or complement) the DRBG.ko module? And since it’s HW, it should have more/enough entropy as soon as it’s started, which would speed up some applications that use the (blocking) /dev/(u)random? I read somewhere (can’t find it anymore) that even OpenVPN would benefit because it can’t produce enough “randomness”?

This topic is not very clearly documented for a simple soul like me.

Well, RNG is a pretty huge problem for embedded devices. This impacts stuff like Dropbear SSH keys. It's a known problem: util-linux was hit by it in the past when getrandom was blocking.

DRBG is deterministic so maybe not? No idea.

From a driver point of view, a request for a result from the HW PRNG is the same as a request to do an AES or SHA256 block. Most (all?) hw drivers I looked at while studying how to get this to work use a simple FIFO queue.

In theory there could be 10 AES requests queued before the PRNG request is processed. I have no idea how often a PRNG request will be made. Or to ask the question another way: should the PRNG request get priority over the other requests, because AES requests need that random data to generate an IV? Or is /dev/(u)random used, which in turn will make PRNG requests when the pool runs dry?
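
The two queueing options can be compared with a small Python model. This is only a sketch of the scheduling question; it is not how the kernel's crypto queue or any existing driver is implemented, and the request tuples and priority values are invented for the example.

```python
import heapq
from collections import deque
from itertools import count


def fifo_order(requests):
    """Serve requests strictly in arrival order (simple FIFO queue)."""
    queue = deque(requests)
    served = []
    while queue:
        served.append(queue.popleft())
    return served


# Hypothetical priorities: lower number is served first.
PRIORITY = {"prng": 0, "aes": 1}


def priority_order(requests):
    """Serve PRNG requests first, keeping arrival order within each kind."""
    arrival = count()  # tie-breaker so equal priorities stay in FIFO order
    heap = [(PRIORITY[kind], next(arrival), (kind, ident))
            for kind, ident in requests]
    heapq.heapify(heap)
    served = []
    while heap:
        served.append(heapq.heappop(heap)[2])
    return served


# Ten AES requests arrive first, then one PRNG request.
pending = [("aes", i) for i in range(10)] + [("prng", 0)]

print("FIFO position of PRNG request:    ", fifo_order(pending).index(("prng", 0)) + 1)
print("Priority position of PRNG request:", priority_order(pending).index(("prng", 0)) + 1)
```

In the FIFO model the PRNG request is served last, after all ten AES requests; with the priority queue it jumps to the front. Whether that is worth the extra complexity depends on how often PRNG requests actually occur, which is the open question in this post.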

(Edit): most other HW solutions I looked at seem to have the PRNG as separate implementation which is not part of the crypto hw engine, which means they don’t have this problem.

For IPsec processing, the proprietary driver used the PRNG directly to provide the IV to the engine. (This can all be done in hardware.)

The same could be done for “simple” AES-CBC requests. But I’m not clear on the “.geniv” part of the registration of the Algorithm to the kernel. Will e.g. OpenVPN let the driver handle the IV itself or will it still generate its own IV to be passed to the driver?

Do you need this document?
MT7621 programming guide

or this?
MT7621 datasheet

Thanks, but unfortunately both the programming guide and the data sheet only mention that a crypto engine is "available"; they don't specify how to use it.

great work ...