Hardware crypto support for PacketEngine-IP-93 (EIP-93) on MTK7621

You can use a mask that allows it to work across multiple cores. Generally, though, you'd want one interrupt to be processed on one core and not different cores. That's how irqbalance assigns it: one interrupt to a single core.
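
To make the mask idea concrete, here is a minimal Python sketch of how a CPU list maps to the hex bitmask that Linux accepts in `/proc/irq/<n>/smp_affinity`. The helper names and the IRQ number 23 are made up for illustration; only the procfs path layout is the kernel's.

```python
def cpus_to_affinity_mask(cpus):
    """Return the hex bitmask string for an iterable of CPU indices."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # bit N set means CPU N may service the interrupt
    return format(mask, "x")


def affinity_path(irq):
    """Path of the affinity file for a given IRQ number."""
    return f"/proc/irq/{irq}/smp_affinity"


# One interrupt pinned to one core (CPU 2) versus spread over four cores.
# IRQ 23 is a hypothetical number, not taken from any real /proc/interrupts.
print(affinity_path(23), cpus_to_affinity_mask([2]))           # mask "4"
print(affinity_path(23), cpus_to_affinity_mask([0, 1, 2, 3]))  # mask "f"
```

Writing the resulting string to that file as root sets the affinity. A single-bit mask is the "one interrupt, one core" case described above.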

That was just a random example I wrote for the post. They're not actually distributed like that in practice.

I've been fiddling with irqbalance for the last few hours, so the /proc/interrupts output above is what irqbalance has been doing. In practice, my manual affinity assignment is just as you suggest.

For performance, I'm not disputing that we should all use some small NUC-style Intel-based device with at least the AES-NI instruction set.

This is about me developing a crypto driver for the EIP93, which is incorporated in the MediaTek MT7621 (ramips target). I know that the ubnt ER-X and the MikroTik hEX RB750Gr3 use their own proprietary OS with IPsec-only hardware offload. Those devices, with the same MT7621 SoC, are doing 300+ Mbps with IPsec. According to someone who did some benchmarking, even the ER-X maxed out at around 125 Mbps, and according to him that was because it was only using one core for "upload".

My comment above about being better off with AES-NI was a reply to @markbirss's question.

I would imagine that they've made some proprietary modifications to the IPsec implementation so that it is aware of the crypto accelerator and uses it asynchronously; otherwise they'd be unlikely to get such improved performance.

That is next on my "wish" or "todo" list: see whether (and how) I can integrate it with the Ethernet driver and register it as an XFRM offload for ESP. That way, a lot of the overhead of getting the data from an SKB to a crypto scatterlist, including the additional system call for the encryption/decryption, and back to an SKB will be bypassed.

If I can make some improvements on the Cipher code in the process that would be great: maybe it would be possible to see at least 10-15% improvement using it for OpenVPN.

@drbrains, with regard to the MT7621 crypto engine, is this the code, or do you have other code?

Here also for the older MT76x8

The other one is the aes-ecb/cbc engine in the MT7628.

I have some MT7621 devices that I might look at.

The wifi is not supported by OpenWRT only by

OpenVPN is a whole other can of worms. OpenVPN 2.x is single-process and single-threaded and uses only synchronous calls. It's definitely not performance-oriented.

OpenVPN 3.x may in time be more capable of using hardware crypto. They have a number of things on the roadmap which look promising: https://community.openvpn.net/openvpn/wiki/RoadMap

For OpenVPN I am envisioning some kind of OpenSSL engine running the EIP93 in Direct Host Mode, versus the Autonomous Ring Mode that I'm using now. I'm not sure whether I'm allowed to do some kind of concurrency locking within that engine, but it needs something like that. And since OpenVPN is (still) single-threaded, it should not be a problem as long as it's the "only" user.

It will be a separate project, and it's long-term thinking for now. Let's see how far performance can be pushed with smarter (better) code.

Thank you! :v:

I do wonder where WireGuard fits in all of this...

"Comparative Throughput Testing Including NAT, SQM, WireGuard, and OpenVPN" shows the MT7628N at a little less than 100 Mbps. The MT7621 is faster.

There's no multi-core support for WireGuard currently. I assume that's planned.

One of the issues with WireGuard is that the crypto algorithms it uses are mostly not supported by hardware accelerators anyway. Certainly for the Intel adapters, of the algorithms used by WireGuard, only HKDF and Curve25519 are supported. The symmetric ciphers are not supported.

As far as I know, WireGuard uses "padata", which spreads the encryption and decryption across as many cores as are available. All the data is in a ring buffer, and packets are processed in parallel, with each packet given a "node" number to maintain sequence once it's time to send.
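
As a rough user-space illustration of that pattern (parallel work, then release in the original sequence), here is a small Python sketch. It is not WireGuard's or padata's actual code: the XOR "cipher", the thread pool and all function names are stand-ins I made up, and the sequence number plays the role of the per-packet "node" number.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def toy_cipher(payload):
    """Placeholder for the real cipher: XOR every byte with a constant."""
    return bytes(b ^ 0x5A for b in payload)


def work(seq, payload):
    # Each job carries its sequence number so order can be restored later.
    return seq, toy_cipher(payload)


def parallel_in_order(packets, workers=4):
    """Process packets in parallel but emit results in submission order."""
    out = []
    pending = {}   # finished jobs that are waiting for earlier ones
    next_seq = 0   # sequence number that must be released next
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, seq, p) for seq, p in enumerate(packets)]
        for fut in as_completed(futures):  # completion order is arbitrary
            seq, data = fut.result()
            pending[seq] = data
            while next_seq in pending:     # release any in-order run
                out.append(pending.pop(next_seq))
                next_seq += 1
    return out


packets = [bytes([i]) * 4 for i in range(8)]
assert parallel_in_order(packets) == [toy_cipher(p) for p in packets]
```

The reorder step is the interesting part: jobs may finish in any order, but nothing is handed on until every earlier sequence number has been released.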

See the WireGuard whitepaper.

I did not know this. I wonder how it compares to IPsec then... It has an ASM implementation of ChaCha20 for MIPS, so it should be faster on mipsel as well. But the core used in the MT7621 is quite old...

That padata framework would lend itself quite nicely to using a hardware accelerator asynchronously. Here's a useful blog article on it: https://blogs.oracle.com/linux/unbinding-parallel-jobs-in-padata

I was already looking into this, but I don't think the problem is on the "transmit" to hardware side, but rather on the "receive" from hardware side. This is why RPS is more functional than XPS for a NIC. The transmitting side consists of the various applications we are running on different CPUs. The receiving side is all handled by one CPU: the one assigned to the interrupt.

For now I will see if two queues would make a difference: an encryption queue and a decryption queue.
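
A toy user-space sketch of that split, just to show the shape of the idea: two independent FIFO queues, each drained by its own worker. This is not driver code. The XOR "cipher", the threads and every name below are assumptions for illustration, and it says nothing about whether the EIP93 would actually benefit.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit


def toy_encrypt(data):
    """Placeholder cipher: XOR with a constant (its own inverse)."""
    return bytes(b ^ 0x5A for b in data)


def toy_decrypt(data):
    return bytes(b ^ 0x5A for b in data)


def drain(q, handler, done):
    # A single worker per queue keeps each direction in FIFO order.
    while True:
        item = q.get()
        if item is STOP:
            break
        done.append(handler(item))


def run_two_queues(to_encrypt, to_decrypt):
    enc_q, dec_q = queue.Queue(), queue.Queue()
    enc_done, dec_done = [], []
    threads = [
        threading.Thread(target=drain, args=(enc_q, toy_encrypt, enc_done)),
        threading.Thread(target=drain, args=(dec_q, toy_decrypt, dec_done)),
    ]
    for t in threads:
        t.start()
    for p in to_encrypt:
        enc_q.put(p)
    for p in to_decrypt:
        dec_q.put(p)
    enc_q.put(STOP)
    dec_q.put(STOP)
    for t in threads:
        t.join()
    return enc_done, dec_done


plain = [b"abc", b"def"]
enc, dec = run_two_queues(plain, [toy_encrypt(p) for p in plain])
assert dec == plain
```

With separate queues, a burst of traffic in one direction does not sit in front of work for the other direction, which is the effect worth measuring.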

Hi all! Will this crypto engine help me increase WireGuard throughput? What do you think? This patch has already been merged in LEDE (coolsnowwolf repo).

It will not.
