Is it possible to add block ciphers for Cavium Octeon?

Hi all,

OpenWRT already supports some Octeon devices, such as the Ubiquiti EdgeRouter. Octeon SoCs have a powerful crypto engine, and I see the kernel developers have added some message digest modules that use the Octeon hardware accelerator.

I wonder if anyone has the experience and is willing to add support for block ciphers such as AES/DES. We can get the ASM primitives from

// AES

#define CVMX_MT_AES_ENC_CBC0(val) asm volatile ("dmtc2 %[rt],0x0108" : : [rt] "d" (val))
#define CVMX_MT_AES_ENC_CBC1(val) asm volatile ("dmtc2 %[rt],0x3109" : : [rt] "d" (val))
#define CVMX_MT_AES_ENC0(val) asm volatile ("dmtc2 %[rt],0x010a" : : [rt] "d" (val))
#define CVMX_MT_AES_ENC1(val) asm volatile ("dmtc2 %[rt],0x310b" : : [rt] "d" (val))
#define CVMX_MT_AES_DEC_CBC0(val) asm volatile ("dmtc2 %[rt],0x010c" : : [rt] "d" (val))
#define CVMX_MT_AES_DEC_CBC1(val) asm volatile ("dmtc2 %[rt],0x310d" : : [rt] "d" (val))
#define CVMX_MT_AES_DEC0(val) asm volatile ("dmtc2 %[rt],0x010e" : : [rt] "d" (val))
#define CVMX_MT_AES_DEC1(val) asm volatile ("dmtc2 %[rt],0x310f" : : [rt] "d" (val))

// pos can be 0-3
#define CVMX_MT_AES_KEY(val,pos) asm volatile ("dmtc2 %[rt],0x0104+" CVMX_TMP_STR(pos) : : [rt] "d" (val))

// pos can be 0-1
#define CVMX_MT_AES_IV(val,pos) asm volatile ("dmtc2 %[rt],0x0102+" CVMX_TMP_STR(pos) : : [rt] "d" (val))

#define CVMX_MT_AES_KEYLENGTH(val) asm volatile ("dmtc2 %[rt],0x0110" : : [rt] "d" (val)) // write the keylen

// pos can be 0-1
#define CVMX_MT_AES_RESULT(val,pos) asm volatile ("dmtc2 %[rt],0x0100+" CVMX_TMP_STR(pos) : : [rt] "d" (val))

// pos can be 0-1
#define CVMX_MF_AES_RESULT(val,pos) asm volatile ("dmfc2 %[rt],0x0100+" CVMX_TMP_STR(pos) : [rt] "=d" (val) : )

// pos can be 0-1
#define CVMX_MF_AES_IV(val,pos) asm volatile ("dmfc2 %[rt],0x0102+" CVMX_TMP_STR(pos) : [rt] "=d" (val) : )

// pos can be 0-3
#define CVMX_MF_AES_KEY(val,pos) asm volatile ("dmfc2 %[rt],0x0104+" CVMX_TMP_STR(pos) : [rt] "=d" (val) : )

#define CVMX_MF_AES_KEYLENGTH(val) asm volatile ("dmfc2 %[rt],0x0110" : [rt] "=d" (val) : ) // read the keylen

#define CVMX_MF_AES_DAT0(val) asm volatile ("dmfc2 %[rt],0x0111" : [rt] "=d" (val) : ) // first piece of input data

// 3DES

// pos can be 0-2
#define CVMX_MT_3DES_KEY(val,pos) asm volatile ("dmtc2 %[rt],0x0080+" CVMX_TMP_STR(pos) : : [rt] "d" (val))

#define CVMX_MT_3DES_IV(val) asm volatile ("dmtc2 %[rt],0x0084" : : [rt] "d" (val))
#define CVMX_MT_3DES_ENC_CBC(val) asm volatile ("dmtc2 %[rt],0x4088" : : [rt] "d" (val))
#define CVMX_MT_3DES_ENC(val) asm volatile ("dmtc2 %[rt],0x408a" : : [rt] "d" (val))
#define CVMX_MT_3DES_DEC_CBC(val) asm volatile ("dmtc2 %[rt],0x408c" : : [rt] "d" (val))
#define CVMX_MT_3DES_DEC(val) asm volatile ("dmtc2 %[rt],0x408e" : : [rt] "d" (val))
#define CVMX_MT_3DES_RESULT(val) asm volatile ("dmtc2 %[rt],0x0098" : : [rt] "d" (val))

// pos can be 0-2
#define CVMX_MF_3DES_KEY(val,pos) asm volatile ("dmfc2 %[rt],0x0080+" CVMX_TMP_STR(pos) : [rt] "=d" (val) : )

#define CVMX_MF_3DES_IV(val) asm volatile ("dmfc2 %[rt],0x0084" : [rt] "=d" (val) : )
#define CVMX_MF_3DES_RESULT(val) asm volatile ("dmfc2 %[rt],0x0088" : [rt] "=d" (val) : )
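
To illustrate how the AES primitives above might fit together, here is a minimal sketch of the call order for AES-CBC encryption. It rests on assumptions about the engine rather than on a tested driver: that the key length register takes the number of 64-bit key words minus one, and that writing the second half of a block (ENC_CBC1) starts the operation. The function name and the host-side stand-in macros are hypothetical; the stand-ins only count coprocessor accesses so the sequence compiles off-target, and they encrypt nothing. Inside the kernel, this would presumably also need to run between octeon_crypto_enable() and octeon_crypto_disable(), as the existing Octeon digest modules do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __OCTEON__  /* assumed to be predefined by Octeon toolchains */
/* Hypothetical host stand-ins: they only count COP2 accesses so the call
 * order can be compiled and traced off-target. They encrypt nothing. On an
 * Octeon build, the CVMX_* macros quoted above must be in scope instead. */
static unsigned cop2_writes, cop2_reads;
#define CVMX_MT_AES_KEY(val, pos)    do { (void)(val); cop2_writes++; } while (0)
#define CVMX_MT_AES_KEYLENGTH(val)   do { (void)(val); cop2_writes++; } while (0)
#define CVMX_MT_AES_IV(val, pos)     do { (void)(val); cop2_writes++; } while (0)
#define CVMX_MT_AES_ENC_CBC0(val)    do { (void)(val); cop2_writes++; } while (0)
#define CVMX_MT_AES_ENC_CBC1(val)    do { (void)(val); cop2_writes++; } while (0)
#define CVMX_MF_AES_RESULT(val, pos) do { (val) = 0; cop2_reads++; } while (0)
#endif

/* key:       four 64-bit words, zero-padded for AES-128/192
 * key_words: 2, 3 or 4 (AES-128/192/256)
 * len:       byte count, must be a multiple of the 16-byte block size */
void octeon_aes_cbc_encrypt_sketch(const uint64_t key[4], unsigned key_words,
                                   const uint64_t iv[2], const uint64_t *in,
                                   uint64_t *out, size_t len)
{
    assert(key_words >= 2 && key_words <= 4);
    assert(len % 16 == 0);

    /* pos is pasted into the asm string, so it must be a literal constant. */
    CVMX_MT_AES_KEY(key[0], 0);
    CVMX_MT_AES_KEY(key[1], 1);
    CVMX_MT_AES_KEY(key[2], 2);
    CVMX_MT_AES_KEY(key[3], 3);
    CVMX_MT_AES_KEYLENGTH((uint64_t)(key_words - 1)); /* assumed encoding */

    CVMX_MT_AES_IV(iv[0], 0);
    CVMX_MT_AES_IV(iv[1], 1);

    /* One 16-byte block per iteration: two 64-bit writes, two 64-bit reads. */
    for (size_t i = 0; i < len / 8; i += 2) {
        CVMX_MT_AES_ENC_CBC0(in[i]);
        CVMX_MT_AES_ENC_CBC1(in[i + 1]); /* assumed to start the operation */
        CVMX_MF_AES_RESULT(out[i], 0);
        CVMX_MF_AES_RESULT(out[i + 1], 1);
    }
}
```

Because the engine keeps the chaining value in its IV registers, the caller would only need to load the IV once per request; decryption would presumably follow the same pattern with the DEC_CBC0/DEC_CBC1 pair.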


Have you tried the crypto functions on an Octeon device?
I now run 21.02-rc1 on my EdgeRouter 4 with the same OpenVPN tunnel (AES) server that I had on my old WRT3200ACM (also with 21.02), and my first and second impression is that crypto runs very fast on the ER4, something like 10-20 Mbit/s faster.

I don't have very scientific measurements to back this up, but I never saw these numbers on the VPN tunnel with the WRT3200ACM.

I don't think OpenWRT supports the Octeon HW crypto engine. You can try the commands below to test crypto speed in user space.

openssl speed -evp aes-128-cbc -elapsed
openssl speed -evp aes-128-gcm -elapsed

I can try in the near future.

I have tried

openvpn --show-engines

And some kind of engine turned up in the output, I think.

I just have a hard time believing in a sudden speed increase of 10-20 Mbit/s with aes-256-gcm, on hardware with similar CPU performance, if no hardware accelerator is involved.

I tried the aes-128-gcm test on both wrt3200acm and er4.

And to be honest I am surprised, because the WRT3200ACM is about 2x faster on the blocks than the ER4.
But the real-life performance is the other way around!?

Could OpenVPN have some code that recognizes the ER4 and activates acceleration, since OpenVPN is supported by Ubiquiti's factory firmware?

I don't think so. You probably need more controlled measurements. You can test like this:

LAN PC1 --- router running OpenVPN server --- WAN PC2 running the VPN client, connected directly to the router with no ISP network in between

Run iperf on PC1 and PC2.


I will have to look into it in the future. But I like the idea of hardware acceleration!

The result may be confusing because several changes happened at the same time as this “increased speed” tunnel emerged.

  1. Change of router from wrt3200acm to er4.
  2. Change from 19.07 to 21.02.
  3. Change from OpenVPN 2.4 to 2.5, which is just about as big a technological change as 19.07 to 21.02, since it came together with the iOS app's full TLS1.3 support and the new crypto suites that aren’t suites any more.

And maybe it is number 3 that is the key to the speed I am seeing. I had the old standard cipher suite (the one in our user guide), which isn’t supported by TLS1.3. I am still doing my homework on TLS1.3; it is a completely new ballgame compared to 1.2, and it isn’t so easy to convert the config files. So I made an emergency fix and put a block in the VPN configs to use at most TLS1.2 and a block cipher, instead of the EC and stream cipher that the auto config wanted to use.
The speed arrived at about the same time, which is also when I started using the ER4.
So it is possible that it actually runs at “normal” speed now and ran slowly earlier because of an inefficient config.

Some info by:

openssl engine -t -c -vv

if devcrypto

openssl engine -pre DUMP_INFO devcrypto

or if afalg

openssl engine -pre DUMP_INFO afalg

These indicate what is being found and whether it is HW accelerated.

In fact, we can use the hardware acceleration in both user space and kernel space for OpenVPN.
However, we can get a much higher rate in kernel space. The OpenVPN kernel-space implementation
is nearly done. The link is

So are you using the same OpenWRT commit ID to build the firmware for your WRT3200ACM and ER4?
Are the server and client config files the same?

The Octeon Crypto functions are available in the kernel, under the Cryptographic API section, but I believe it wouldn't apply to the Octeon+/OcteonII, only the OcteonIII CN7xxx and higher. The entire Octeon target is locked to the Octeon+ (Octeon 1.5) via -march in the target defines.

We've had discussions and issues regarding the increasing spread of the Octeon targets across the various revisions. Octeon+ is inhibiting the OcteonIII targets, but in order to maintain compatibility with the ERlite, it has to be this way. There aren't enough OcteonIII targets to justify a new target in terms of build resources.

It should be, but as you can see, only some digest algorithms are supported; there is no AES cipher.
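
One way to check which drivers the running kernel has actually registered is to scan /proc/crypto. The following is a minimal sketch, assuming the Octeon modules use driver names containing "octeon" (for example octeon-md5); the function name is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count entries in a /proc/crypto-style stream whose "driver" field
 * contains `needle` (e.g. "octeon"). Returns the number of matches. */
int count_crypto_drivers(FILE *fp, const char *needle)
{
    char line[256];
    int matches = 0;

    assert(fp != NULL && needle != NULL);

    while (fgets(line, sizeof line, fp)) {
        char key[32], value[192];

        /* Lines look like "driver       : octeon-md5". */
        if (sscanf(line, "%31s : %191s", key, value) == 2 &&
            strcmp(key, "driver") == 0 && strstr(value, needle) != NULL)
            matches++;
    }
    return matches;
}
```

Called on a stream opened with fopen("/proc/crypto", "r") and the needle "octeon", a non-zero count would show that Octeon drivers are loaded; the name field of each matching entry then tells which algorithms they cover.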

The ER4 never had 19.07 support, so I have no speed history for that.

The OpenVPN configs are for the same TLS1.2 standard. The thing is that the iOS OpenVPN3 app with TLS1.3 support came last year, but we had OpenVPN 2.4 in OpenWRT 19.07, so it was TLS1.2 anyway because the server didn’t have anything else, until now when we also have OpenVPN 2.5 in OpenWRT 21.02.
Then TLS1.3 came alive, but TLS1.3 doesn’t allow static-key DH key exchange. It demands at least 128-bit security, so 2048-bit keys are also history. RSA is allowed for now, although under security scrutiny, but RSA at 3072 bits is a heavy workload. So for now TLS1.2 is what I use until I get new keys set up.

But now both routers run 21.02-rc1. So that is the same.

So if we are going to measure this scientifically, it is a really bad setup: every single component involved, except the crypto keys and the basic VPN config, had to change in order to work at all, and after that the OpenVPN tunnel speed has been a stable 2-3 times higher than before.

  Octeon Cavium 7130 (Cavium 3)

Above from ER4 Git commit.

I guess the ER4 is what you call OcteonIII CN7…?

But my experience right now is that the ER4 has no speed issue with crypto work on 21.02, unless the 21.02 imagebuilder image with my add-ons is very different from the usual builds with my add-ons installed via opkg. Not for home use at least.
It seems to top out somewhere between 26 and 30 Mbit/s on 4G and on “almost public” WiFi, where the WiFi is about 10-20 Mbit/s faster without the tunnel.
Before, on the WRT3200ACM, it was never more than 16 Mbit/s (and that was on a sunny day).

So from my viewpoint this is a small problem that doesn't deserve much energy right now: the VPN logs show no faults, the DNS leak test can’t find any fault, I am connected to the LAN behind my router, and I don’t see that we will find any real performance improvement when living on the go, especially given what you say about the number of devices.

It will actually probably go even faster when I implement TLS1.3…

Again, I don't think your speed increase is caused by the hardware accelerator, because using it requires an assembly language implementation, and AFAIK no one supports it in user space, while only some digest algorithms are supported in kernel space.

BTW, can you check whether OpenVPN on your ER4 uses the OpenSSL library?

root@openwrt:/# openvpn --version
OpenVPN 2.4.5 aarch64-openwrt-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [MH/PKTINFO] [AEAD] built on Mar 11 2021
library versions: OpenSSL 1.0.2q  20 Nov 2018, LZO 2.10
Originally developed by James Yonan
Copyright (C) 2002-2018 OpenVPN Inc <>

I can check, but I really hope it uses OpenSSL, because the package I have always used is named openvpn-openssl.

I have seen this kind of speed before, when I tried WireGuard. Everything in the logs said OK, the VPN icon lit up in iOS, and the speed was great.
But the DNS leak test said I was surfing on the 4G network, and I couldn’t connect to the devices on my LAN.

So one thing on my mind is whether the tunnel is actually encrypting the data at all, even though the server and client have agreed on the AES-256-GCM cipher.
An unencrypted tunnel would probably give this speed.
But I don't really have the resources, time or know-how for a deep dive into Wireshark analysis.

Could it be some conflict between wolfssl and openssl?
Up to 19.07.7 I always used only the openssl-based packages, also for luci-ssl, because I want just one crypto package to avoid crypto problems.

But from 21.02 OpenWRT ships with wolfssl preinstalled, and only luci-ssl can be installed.
I tried installing luci-ssl-openssl but got dependency errors.
So now, with openvpn-openssl, the routers have both openssl and wolfssl.

Maybe. That's why I wanted you to run the command

openvpn --version