Marvell Kirkwood Cryptographic Hardware Acceleration

Hi, I have a few PogoPlug Mobile units around, and I am trying to get their cryptographic hardware acceleration working.

I am trying to use dm-crypt (LUKS) and OpenSSL. The device could then serve as a NAS with an encrypted USB hard drive, or as an OpenVPN server, both with hardware acceleration.

So far I have run some tests and installed the relevant packages. My findings are below, and I would be glad to hear from you.

OK, here is what I have done so far:

opkg install usbutils kmod-usb-storage block-mount

This enables USB mass storage.

dmesg and lsusb then show:

scsi host1: sata_mv
ata1: SATA max UDMA/133 irq 32
ata1: SATA link down (SStatus 0 SControl F300)
scsi 0:0:0:0: Direct-Access     WDC WD60 PURX-64T0ZY1          PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
sd 0:0:0:0: [sda] 11721045168 512-byte logical blocks: (6.00 TB/5.45 TiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 28 00 00 00
sd 0:0:0:0: [sda] No Caching mode page found
sd 0:0:0:0: [sda] Assuming drive cache: write through
sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
 sda: unknown partition table
sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
sd 0:0:0:0: [sda] Attached SCSI disk
Bus 001 Device 002: ID 152d:2337 JMicron Technology Corp. / JMicron USA Technology Corp. ATA/ATAPI Brid
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

To be able to use block encryption:

opkg install cryptsetup # installs: kmod-crypto-aead, kmod-crypto-pcompress, kmod-crypto-manager, kmod-dm, libdevmapper, libreadline, lvm2, libgpg-error, libgcrypt, libpopt
opkg install kmod-crypto-aes kmod-crypto-xts
opkg install kmod-crypto-user kmod-crypto-iv kmod-crypto-misc # installs kmod-crypto-rng kmod-crypto-wq

To run a baseline speed test as is, before installing the hardware engine:

cryptsetup benchmark

To test Cryptographic Hardware Acceleration:

# OCF - OpenBSD Cryptographic Framework, Crypto API
# CESA - Cryptographic Engine and Security Acceleration
opkg install kmod-crypto-mv-cesa # 3.18.23-1 - Marvell crypto engine
opkg install kmod-cryptodev # 3.18.21+1.7-kirkwood-2 - driver that lets user-space applications use the hardware ciphers supported by the Linux kernel
opkg install kmod-crypto-core # Already installed ?

Check for hw:

cat /proc/crypto
# some entries say: "module : mv_cesa"
ps | grep mv_crypto
# one line shows: "[mv_crypto]"

To repeat the speed test with the engine installed:

cryptsetup benchmark

To test using openssl "speed" benchmark feature:

opkg install openssl-util
openssl version
#OpenSSL 1.0.2g  1 Mar 2016
openssl engine
#(dynamic) Dynamic engine loading support

Do some tests (not done yet):

openssl speed -evp aes-128-cbc
openssl speed -evp bf-cbc aes-128-cbc
openssl speed -elapsed -evp aes-128-cbc

Results from cryptsetup benchmark:

#Tests are approximate using memory only (no storage IO).
#PBKDF2-sha1        38102 iterations per second
#PBKDF2-sha256      24637 iterations per second
#PBKDF2-sha512       5407 iterations per second
#PBKDF2-ripemd160   36008 iterations per second
#PBKDF2-whirlpool    3648 iterations per second
##  Algorithm | Key |  Encryption |  Decryption
#     aes-cbc   128b    17.1 MiB/s    13.9 MiB/s
# serpent-cbc   128b           N/A           N/A
# twofish-cbc   128b           N/A           N/A
#     aes-cbc   256b    16.3 MiB/s    16.5 MiB/s
# serpent-cbc   256b           N/A           N/A
# twofish-cbc   256b           N/A           N/A
#     aes-xts   256b     9.0 MiB/s     9.1 MiB/s
# serpent-xts   256b           N/A           N/A
# twofish-xts   256b           N/A           N/A
#     aes-xts   512b     7.1 MiB/s     7.3 MiB/s
# serpent-xts   512b           N/A           N/A
# twofish-xts   512b           N/A           N/A

I have not tested this yet (I just saw the cryptsetup-openssl package, and I read online that OpenSSL can use hardware crypto acceleration):

opkg remove cryptsetup; opkg install cryptsetup-openssl

Now I am stuck on this (I read that the cryptodev kernel module is needed):

root@PogoPlugMobile-3:~# opkg install kmod-cryptodev
Installing kmod-cryptodev (3.18.21+1.7-kirkwood-2) to root...
Downloading http://downloads.openwrt.org/chaos_calmer/15.05.1/kirkwood/generic/packages/packages/kmod-cryptodev_3.18.21+1.7-kirkwood-2_kirkwood.ipk.
Collected errors:
 * satisfy_dependencies_for: Cannot satisfy the following dependencies for kmod-cryptodev:
 * 	kernel (= 3.18.21-1-f964fa2931ce5e11f494d72f9014ffa0) * 
 * opkg_install_cmd: Cannot install package kmod-cryptodev.


root@PogoPlugMobile-3:~# opkg install kmod-cryptodev --force-depends
Installing kmod-cryptodev (3.18.21+1.7-kirkwood-2) to root...
Downloading http://downloads.openwrt.org/chaos_calmer/15.05.1/kirkwood/generic/packages/packages/kmod-cryptodev_3.18.21+1.7-kirkwood-2_kirkwood.ipk.
Installing kmod-crypto-authenc (3.18.23-1) to root...
Downloading http://downloads.openwrt.org/chaos_calmer/15.05.1/kirkwood/generic/packages/base/kmod-crypto-authenc_3.18.23-1_kirkwood.ipk.
Configuring kmod-crypto-authenc.
Configuring kmod-cryptodev.
Collected errors:
 * satisfy_dependencies_for: Cannot satisfy the following dependencies for kmod-cryptodev:
 * 	kernel (= 3.18.21-1-f964fa2931ce5e11f494d72f9014ffa0) * 
root@PogoPlugMobile-3:~#
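
The dependency error above names kernel 3.18.21-1, while the other kmod packages pulled in during the same run are versioned 3.18.23-1, so the cryptodev package appears to have been built for a different kernel than the one on the device. A minimal sketch for checking this before forcing an install, assuming the OpenWrt convention that a kmod version string starts with the kernel release followed by `+` or `-` (both function names are made up for illustration):

```shell
#!/bin/sh
# kmod_kernel_version PKG_VERSION
# Prints the kernel release a kmod package was built for.
# Assumption: the version starts with the kernel release, e.g.
#   3.18.21+1.7-kirkwood-2 -> 3.18.21
#   3.18.23-1              -> 3.18.23
kmod_kernel_version() {
    printf '%s\n' "$1" | sed 's/[+-].*$//'
}

# kmod_matches_kernel PKG_VERSION [KERNEL_RELEASE]
# Succeeds if the package targets the given kernel release
# (defaults to the running kernel as reported by uname -r).
kmod_matches_kernel() {
    [ "$(kmod_kernel_version "$1")" = "${2:-$(uname -r)}" ]
}
```

On the device, `kmod_matches_kernel 3.18.21+1.7-kirkwood-2` compares against `uname -r`. A mismatch suggests that --force-depends would only install a module the running kernel may refuse to load.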

Hi @braian87b

Do you still have this issue?
Let me see if it builds with the Pogo4 build listed here.

root@OpenWrt:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
PBKDF2-sha1        30340 iterations per second for 256-bit key
PBKDF2-sha256      54613 iterations per second for 256-bit key
PBKDF2-sha512      22914 iterations per second for 256-bit key
PBKDF2-ripemd160   28493 iterations per second for 256-bit key
PBKDF2-whirlpool     N/A
#     Algorithm | Key |  Encryption |  Decryption
        aes-cbc   128b    29.6 MiB/s    30.3 MiB/s
    serpent-cbc   128b           N/A           N/A
    twofish-cbc   128b           N/A           N/A
        aes-cbc   256b    27.0 MiB/s    27.8 MiB/s
    serpent-cbc   256b           N/A           N/A
    twofish-cbc   256b           N/A           N/A
        aes-xts   256b     9.5 MiB/s     9.5 MiB/s
    serpent-xts   256b     8.1 MiB/s     8.1 MiB/s
    twofish-xts   256b    10.1 MiB/s    10.0 MiB/s
        aes-xts   512b     7.7 MiB/s     7.6 MiB/s
    serpent-xts   512b     8.1 MiB/s     8.1 MiB/s
    twofish-xts   512b    10.4 MiB/s    10.0 MiB/s


root@OpenWrt:~# cat /proc/cpuinfo 
processor	: 0
model name	: Feroceon 88FR131 rev 1 (v5l)
BogoMIPS	: 795.44
Features	: swp half fastmult edsp 
CPU implementer	: 0x56
CPU architecture: 5TE
CPU variant	: 0x2
CPU part	: 0x131
CPU revision	: 1

Hardware	: Marvell Kirkwood (Flattened Device Tree)
Revision	: 0000
Serial		: 0000000000000000
root@OpenWrt:~# time openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 48065 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 47294 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 45956 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 38070 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 10792 aes-128-cbc's in 3.00s
OpenSSL 1.0.2n  7 Dec 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr) 
compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/user/build/cjdns/openwrt/staging_dir/target-arm_xscale_musl_eabi/usr/include -I/home/user/build/cjdns/openwrt/staging_dir/target-arm_xscale_musl_eabi/include -I/home/user/build/cjdns/openwrt/staging_dir/toolchain-arm_xscale_gcc-5.5.0_musl_eabi/usr/include -I/home/user/build/cjdns/openwrt/staging_dir/toolchain-arm_xscale_gcc-5.5.0_musl_eabi/include/fortify -I/home/user/build/cjdns/openwrt/staging_dir/toolchain-arm_xscale_gcc-5.5.0_musl_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mcpu=xscale -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=soft -iremap/home/user/build/cjdns/openwrt/build_dir/target-arm_xscale_musl_eabi/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/user/build/cjdns/openwrt/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc        256.35k     1008.94k     3921.58k    12994.56k    29469.35k
real	0m 15.16s
user	0m 0.51s
sys	0m 7.38s
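
openssl speed reports in 1000s of bytes per second, while cryptsetup benchmark reports MiB/s, so the two sets of figures are not directly comparable. A small sketch for converting between them and for re-deriving a figure from the raw block counts (the function names are made up for illustration):

```shell
#!/bin/sh
# speed_k_to_mib VALUE
# Converts an openssl speed figure (1000s of bytes per second,
# with or without the trailing "k") to MiB/s.
speed_k_to_mib() {
    awk -v k="${1%k}" 'BEGIN { printf "%.1f\n", k * 1000 / 1048576 }'
}

# speed_from_counts COUNT BLOCKSIZE SECONDS
# Recomputes the openssl speed figure from a "Doing ..." line:
# COUNT blocks of BLOCKSIZE bytes processed in SECONDS seconds.
speed_from_counts() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2fk\n", n * size / secs / 1000 }'
}

# speed_from_counts 10792 8192 3.00   # prints 29469.35k
# speed_k_to_mib 29469.35k            # prints 28.1
```

The 8192-byte result above works out to roughly 28.1 MiB/s, which is in the same range as the 29.6 MiB/s that cryptsetup benchmark shows for aes-cbc 128b on the same device.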

I have not tried any further. My Pogos are currently in a drawer (I painted them with white spray paint, and they look neat). They will stay there until I have spare time to flash a newer OpenWrt/LEDE release (I hope someone succeeds with that in the meantime). Only one is connected right now, as a temporary NAS to move some files to another NAS. I get a very stable speed of around 2 MB/s using rsync with a USB-to-SATA cable on a 2 TB ext4-formatted drive. I still need to test whether the limit is the Pogo or the other NAS, but it seems to be the Pogo.

It is just too damn slow... :roll_eyes:

At that speed I think I will use these only as a NAS for Windows File History / Mac Time Machine. With that slowness I could even use FAT32/NTFS; I think it would make no difference. I will let you know when the rsync finishes, which will probably take 4 or 5 more days :man_facepalming: I configured the LEDs to flash red/green on network traffic so I can see whether files are still moving to my other NAS, and they are flashing very fast.

Which version and packages did you use in your tests?
Your results are definitely faster than the ones I got some time ago on OpenWrt 15.05.1.

Please post the steps you followed, the packages you installed, and so on, to get it working.
If hardware-accelerated encryption works well with OpenSSL, I also want to use the Pogo as an OpenVPN server, since OpenVPN is very CPU-intensive. It will be interesting to try, and it could replace the TP-LINK WDR4300 that I use now.

@braian87b

It is the latest snapshot build from https://github.com/openwrt/openwrt.git, with OpenSSL 1.0.2n 7 Dec 2017.

I will also look at how the Orange Pi H3 performs with OpenSSL.

This is an Orange Pi inside a TP-Link TL-WDR4300 router, to solve the VPN performance issue you are referring to (using OTG NDIS mode for networking).

My build steps:

OK, I built the git repo locally with all packages enabled (the build takes a good couple of hours).

make menuconfig
Global build settings --->
   [*] Select all packages by default

My steps are basically the same as yours at the top of the thread.

Does the kmod-crypto-mv-cesa package not exist anymore? (Maybe the support is included elsewhere.)

Something like this:

opkg install openssl-util cryptsetup-openssl kmod-crypto-aes kmod-crypto-xts kmod-crypto-user kmod-crypto-iv kmod-crypto-misc

Now when I run

time openssl speed -elapsed -evp aes-128-cbc

the benchmark results change with and without kmod-cryptodev installed, and so do the real, user, and sys times. Does this indicate that the crypto hardware is doing the work rather than the CPU?

I'm going to test the crypto hardware support with CJDNS.


Interesting. No, my Pogo is still in its factory case, so I will just connect it over Ethernet, move the OpenSSL server config file to the Pogo, and disable the server on the router.

If this device is built from the mvebu platform, then all HW acceleration should be built in by default. You can verify this by looking for something like this in the boot log:

[    1.235953] marvell-cesa f1090000.crypto: CESA device successfully registered

If you cat /proc/crypto, you can identify what will be accelerated by the crypto engine by looking for entries with priority 300. Example:

name         : ecb(des)
driver       : mv-ecb-des
module       : kernel
priority     : 300
refcnt       : 1
selftest     : passed
internal     : no
type         : ablkcipher
async        : yes
blocksize    : 8
min keysize  : 8
max keysize  : 8
ivsize       : 0
geniv        : <default>
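
Scanning /proc/crypto by eye gets tedious, so here is a sketch that lists only the entries at a given priority. It assumes, as in the example above, that priority 300 marks the entries backed by the crypto engine; other drivers may register at different priorities, so the value is a parameter. The function name is made up for illustration:

```shell
#!/bin/sh
# hw_crypto_entries [FILE] [PRIORITY]
# Prints "name driver" for every /proc/crypto entry whose priority
# equals PRIORITY (default 300, taken from the example above).
# FILE defaults to /proc/crypto; pass a saved copy to inspect offline.
hw_crypto_entries() {
    file="${1:-/proc/crypto}"
    prio="${2:-300}"
    awk -v want="$prio" '
        # Each entry lists name and driver before its priority line.
        $1 == "name"     { name = $3 }
        $1 == "driver"   { driver = $3 }
        $1 == "priority" && $3 == want { print name, driver }
    ' "$file"
}
```

For the entry shown above, `hw_crypto_entries` would print `ecb(des) mv-ecb-des`.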

Regarding the question above, namely:

Now when I run "time openssl speed -elapsed -evp aes-128-cbc" with and without installing kmod-cryptodev the benchmark results change and also the real, user, and sys time used. This indicates the crypto hardware is doing the task not the cpu?

Monitor cpu usage while the test is running. High cpu% "user" = software encryption; high cpu% "system" = servicing interrupts for HW acceleration.

On these platforms software encryption is often faster but drives the CPU to nearly 100%. Hardware acceleration offloads the CPU, so the load is lower, but it is faster only in limited cases. The "-elapsed" flag is the correct way to compare the two, because it measures wall-clock time and so reports the actual throughput.
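
The user/system split can be read straight from the output of `time`. A rough sketch that turns the three figures into a CPU-busy percentage and the share of that busy time spent in the kernel (the function name is made up, and treating a high system share as evidence of offload is the heuristic described above, not a guarantee):

```shell
#!/bin/sh
# cpu_share REAL USER SYS
# Prints two rounded percentages:
#   1. how busy the CPU was over the whole run: (user + sys) / real
#   2. how much of that busy time was system time: sys / (user + sys)
cpu_share() {
    awk -v real="$1" -v user="$2" -v sys="$3" 'BEGIN {
        if (real <= 0) exit 1
        busy = user + sys
        printf "%.0f %.0f\n", 100 * busy / real, (busy > 0 ? 100 * sys / busy : 0)
    }'
}

# Figures from the openssl speed run earlier in the thread
# (real 15.16s, user 0.51s, sys 7.38s):
# cpu_share 15.16 0.51 7.38   # prints 52 94
```

For that run the CPU was busy for about half of the wall-clock time, and nearly all of the busy time was system time.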

So that tells me whether it was enabled in the kernel?
So there is no simple way to tell whether it is enabled and working, right?

Boot the device, then
dmesg | grep CESA
It doesn't get much easier than that. If you:
cat /proc/interrupts
you will be able to see whether the engine is being used or not. Check interrupt count, exercise some crypto, then check again and compare. Mine looks like this:
39: 912058 0 GIC-0 51 Level f1090000.crypto
That's 912K interrupts on that engine. Alternatively, as mentioned earlier you can monitor whether loading is due to raw cpu use or interrupt servicing.
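
The before/after comparison can be scripted. A sketch, assuming the engine's line in /proc/interrupts ends in a name containing "crypto" as in the example above (other kernels or drivers may use a different name, so the pattern is a parameter); `crypto_irq_count` is a made-up helper name:

```shell
#!/bin/sh
# crypto_irq_count [FILE] [PATTERN]
# Sums the per-CPU interrupt counters of every line whose last field
# matches PATTERN (default "crypto"). FILE defaults to /proc/interrupts.
crypto_irq_count() {
    file="${1:-/proc/interrupts}"
    pat="${2:-crypto}"
    awk -v pat="$pat" '
        $NF ~ pat {
            # Field 1 is the IRQ number; the following all-digit fields
            # are per-CPU counters. Stop at the controller name.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "$file"
}

# Usage on the device:
#   before=$(crypto_irq_count)
#   openssl speed -elapsed -evp aes-128-cbc >/dev/null 2>&1
#   after=$(crypto_irq_count)
#   echo "crypto interrupts during test: $((after - before))"
```

A difference of zero after a benchmark would suggest the work was done in software.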


On another platform I seem to have problems getting hardware crypto to work with OpenVPN through cryptodev. So far my conclusion is that it is an OpenVPN "bug". Could you run an OpenVPN test with cryptodev and the hardware engine installed, to see whether it passes the built-in crypto test?

cd /tmp
openvpn --genkey --secret key
openvpn --test-crypto --secret key --cipher AES-256-CBC --engine cryptodev

And one without the "--engine cryptodev" part to compare. This assumes that your OpenSSL library was built with cryptodev support.

Thanks


Both tests succeed. Interestingly, only --engine cryptodev exercises the CESA module.

I'm pretty sure running an openvpn server will automatically use the HW engine if appropriate but apparently this isn't the case for test mode -- interrupts are clocking only when specifying --engine cryptodev. (I use GCM for my server, so the HW engine isn't utilized in this case.)

Thanks, then it must be something with my crypto engine. (Or combination).

OpenVPN doesn't use cryptodev by default, but you can specify it in your config file. I was considering "downgrading" to AES-256-CBC if that would offload my CPU.

Thanks again for running the tests :smile:

@InkblotAdmirer Indeed, the main issue for me is also the high CPU usage.

I have also not been able to use more than one CPU core for encryption.

On my Kirkwood I can clearly see low CPU usage, but not yet on my x86 build.

Another example, from:
http://processors.wiki.ti.com/index.php/Build_OpenSSL_for_Sitara

root@arago:/usr/local/ssl/bin# time -v openssl speed -evp aes-256-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-256-cbc for 3s on 16 size blocks: 112338 aes-256-cbc's in 2.60s
Doing aes-256-cbc for 3s on 64 size blocks: 100421 aes-256-cbc's in 2.39s
Doing aes-256-cbc for 3s on 256 size blocks: 14423 aes-256-cbc's in 0.42s
Doing aes-256-cbc for 3s on 1024 size blocks: 16441 aes-256-cbc's in 0.50s
Doing aes-256-cbc for 3s on 8192 size blocks: 3982 aes-256-cbc's in 0.23s
OpenSSL 1.0.0a 1 Jun 2010
built on: Tue Sep 14 14:22:46 CDT 2010
options:bn(32,32) rc4(ptr,int) des(idx,cisc,2,long) aes(partial) idea(int) blowfish(idx)
compiler: arm-none-linux-gnueabi-gcc -O -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc        599.14k     2142.31k     1230.76k     5611.86k    10873.51k
Command being timed: "openssl speed -evp aes-256-cbc -engine cryptodev"
User time (seconds): 0.32
System time (seconds): 5.85
Percent of CPU this job got: 41%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 15.05s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 5440
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 356
Voluntary context switches: 46666
Involuntary context switches: 201214
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
root@arago:/usr/local/ssl/bin#