Mediatek EIP-93 crypto driver (MT7621) - finally pushed upstream... [in progress]

Hi all,

Sorry for this long post but:

I am happy to announce that I finally pushed my EIP93 crypto driver upstream. That means I have sent the patches and they are now in the process of being reviewed.

I am sending a scaled-down version as the initial patch in the hope that it will speed up the acceptance process. I removed some experimental stuff like polling mode (which speeds up OpenSSL) and full ESP HW offload (which needs some patching to esp{4,6}_offload, so that might take some time and discussion).

What is enabled: DES ECB/CBC, AES ECB/CBC/CTR, and authenc(hmac(MD5/SHA1/SHA256), DES/AES(ECB/CBC/CTR/RFC3686))

AES-CTR can be used for GCM; AES-ECB for AES-XTS and AES-CBC for AES-CTS.

Mikrotik HEX-S (RB760igs)

Software OpenSSL:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes

aes-128-ecb      11510.47k    13501.51k    14126.95k    14294.16k    14340.08k    14337.36k
aes-256-ecb       8869.99k    10002.43k    10339.42k    10424.05k    10448.20k    10450.92k


aes-128-cbc       8979.48k    11176.46k    11904.00k    12101.57k    12160.09k    12160.09k
aes-256-cbc       7298.06k     8667.13k     9099.23k     9213.62k     9247.98k     9242.54k

aes-128-ctr       8287.57k     9700.27k    10123.82k    10234.90k    10268.58k    10265.86k
aes-256-ctr       6825.28k     7755.59k     8022.75k     8094.02k     8115.80k     8110.35k

cryptsetup benchmarks:

# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-ecb        128b        12.4 MiB/s        12.4 MiB/s
    aes-ecb        256b         9.6 MiB/s         9.6 MiB/s
    aes-cbc        128b        11.3 MiB/s        11.6 MiB/s
    aes-cbc        256b         9.0 MiB/s         9.0 MiB/s
    aes-ctr        128b        11.4 MiB/s        11.4 MiB/s
    aes-ctr        256b         8.9 MiB/s         8.9 MiB/s
    aes-xts        256b        11.5 MiB/s        11.5 MiB/s
    aes-xts        512b         9.0 MiB/s         8.9 MiB/s

OpenSSL via /dev/crypto to EIP-93:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-ecb        389.27k     1570.91k     6101.73k    19186.90k    49032.25k    55351.79k
aes-256-ecb        390.35k     1572.49k     5906.71k    17298.80k    38083.27k    41923.44k

aes-128-cbc        384.57k     1546.01k     5968.97k    18756.55k    48049.75k    54257.71k
aes-256-cbc        382.44k     1534.85k     5728.70k    16889.88k    37514.46k    41308.36k

aes-128-ctr        756.15k     1529.39k     5929.08k    18727.29k    48588.63k    55041.53k
aes-256-ctr        756.17k     1523.67k     5732.53k    16923.22k    37827.44k    41760.15k
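
To make the comparison between the two OpenSSL tables easier to read, here is a minimal sketch that divides the /dev/crypto numbers by the software numbers for aes-128-cbc. The figures are copied from the tables above; the variable and function names are only illustrative.

```python
# Throughput in 1000s of bytes per second, copied from the aes-128-cbc rows above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
SOFTWARE_K = [8979.48, 11176.46, 11904.00, 12101.57, 12160.09, 12160.09]
EIP93_K = [384.57, 1546.01, 5968.97, 18756.55, 48049.75, 54257.71]


def speedups(hardware, software):
    """Return hardware/software throughput ratio for each block size."""
    return [h / s for h, s in zip(hardware, software)]


def first_faster_block(sizes, ratios):
    """Return the smallest block size where the hardware path wins, or None."""
    for size, ratio in zip(sizes, ratios):
        if ratio > 1.0:
            return size
    return None


RATIOS = speedups(EIP93_K, SOFTWARE_K)

for size, ratio in zip(BLOCK_SIZES, RATIOS):
    print(f"{size:>6} bytes: {ratio:5.2f}x")
print("hardware first wins at", first_faster_block(BLOCK_SIZES, RATIOS), "bytes")
```

For this row the /dev/crypto path is slower than software up to 256-byte blocks and faster from 1024 bytes upward.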


cryptsetup benchmarks (EIP-93):

# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-ecb        128b        40.6 MiB/s        40.5 MiB/s
    aes-ecb        256b        33.0 MiB/s        33.0 MiB/s
    aes-cbc        128b        40.1 MiB/s        40.4 MiB/s
    aes-cbc        256b        32.6 MiB/s        32.9 MiB/s
    aes-ctr        128b        40.3 MiB/s        40.3 MiB/s
    aes-ctr        256b        32.7 MiB/s        32.6 MiB/s
    aes-xts        256b        27.7 MiB/s        27.7 MiB/s
    aes-xts        512b        23.9 MiB/s        23.9 MiB/s
    
root@debian:~# iperf3 -c 192.168.88.188 -R -T up
up:  Connecting to host 192.168.88.188, port 5201
up:  Reverse mode, remote host 192.168.88.188 is sending
up:  [  5] local 192.168.77.2 port 35292 connected to 192.168.88.188 port 5201
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-1.00   sec  15.4 MBytes   129 Mbits/sec                  
up:  [  5]   1.00-2.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   2.00-3.00   sec  15.7 MBytes   132 Mbits/sec                  
up:  [  5]   3.00-4.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   4.00-5.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   5.00-6.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   6.00-7.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   7.00-8.00   sec  15.6 MBytes   131 Mbits/sec                  
up:  [  5]   8.00-9.00   sec  15.5 MBytes   130 Mbits/sec                  
up:  [  5]   9.00-10.00  sec  15.5 MBytes   130 Mbits/sec                  
up:  - - - - - - - - - - - - - - - - - - - - - - - - -
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-10.03  sec   157 MBytes   131 Mbits/sec                  sender
up:  [  5]   0.00-10.00  sec   155 MBytes   130 Mbits/sec                  receiver
up:  
up:  iperf Done.
root@debian:~# iperf3 -c 192.168.88.188 -T dn
dn:  Connecting to host 192.168.88.188, port 5201
dn:  [  5] local 192.168.77.2 port 35296 connected to 192.168.88.188 port 5201
dn:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
dn:  [  5]   0.00-1.00   sec  25.9 MBytes   217 Mbits/sec    0   1.05 MBytes       
dn:  [  5]   1.00-2.00   sec  22.5 MBytes   189 Mbits/sec   86    662 KBytes       
dn:  [  5]   2.00-3.00   sec  23.8 MBytes   199 Mbits/sec    0    704 KBytes       
dn:  [  5]   3.00-4.00   sec  23.8 MBytes   199 Mbits/sec    0    731 KBytes       
dn:  [  5]   4.00-5.00   sec  23.8 MBytes   199 Mbits/sec    0    746 KBytes       
dn:  [  5]   5.00-6.00   sec  23.8 MBytes   199 Mbits/sec    0    753 KBytes       
dn:  [  5]   6.00-7.00   sec  23.8 MBytes   199 Mbits/sec    0    754 KBytes       
dn:  [  5]   7.00-8.00   sec  23.8 MBytes   199 Mbits/sec    0    776 KBytes       
dn:  [  5]   8.00-9.00   sec  22.5 MBytes   189 Mbits/sec    0    797 KBytes       
dn:  [  5]   9.00-10.00  sec  23.8 MBytes   199 Mbits/sec    0    818 KBytes       
dn:  - - - - - - - - - - - - - - - - - - - - - - - - -
dn:  [ ID] Interval           Transfer     Bitrate         Retr
dn:  [  5]   0.00-10.00  sec   237 MBytes   199 Mbits/sec   86             sender
dn:  [  5]   0.00-10.02  sec   235 MBytes   197 Mbits/sec                  receiver
dn:  
dn:  iperf Done.
    
root@debian:~# iperf3 -c 192.168.88.188 -p5201 -R -T up -b18M & iperf3 -c 192.168.88.188 -p5202 -T dn&
[1] 2923964
[2] 2923965
root@debian:~# up:  Connecting to host 192.168.88.188, port 5201
up:  Reverse mode, remote host 192.168.88.188 is sending
dn:  Connecting to host 192.168.88.188, port 5202
dn:  [  5] local 192.168.77.2 port 48902 connected to 192.168.88.188 port 5202
up:  [  5] local 192.168.77.2 port 35282 connected to 192.168.88.188 port 5201
dn:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-1.00   sec  2.14 MBytes  17.9 Mbits/sec                  
dn:  [  5]   0.00-1.00   sec  23.2 MBytes   195 Mbits/sec   62    682 KBytes       
up:  [  5]   1.00-2.00   sec  2.24 MBytes  18.8 Mbits/sec                  
dn:  [  5]   1.00-2.00   sec  21.2 MBytes   178 Mbits/sec    0    776 KBytes       
up:  [  5]   2.00-3.00   sec  2.12 MBytes  17.8 Mbits/sec                  
dn:  [  5]   2.00-3.00   sec  20.0 MBytes   168 Mbits/sec    2    594 KBytes       
dn:  [  5]   3.00-4.00   sec  21.2 MBytes   178 Mbits/sec    0    640 KBytes       
up:  [  5]   3.00-4.00   sec  2.12 MBytes  17.8 Mbits/sec                  
dn:  [  5]   4.00-5.00   sec  21.2 MBytes   178 Mbits/sec    0    669 KBytes       
up:  [  5]   4.00-5.00   sec  2.12 MBytes  17.8 Mbits/sec                  
dn:  [  5]   5.00-6.00   sec  21.2 MBytes   178 Mbits/sec    0    686 KBytes       
up:  [  5]   5.00-6.00   sec  2.12 MBytes  17.8 Mbits/sec                  
up:  [  5]   6.00-7.00   sec  2.18 MBytes  18.3 Mbits/sec                  
dn:  [  5]   6.00-7.00   sec  20.0 MBytes   168 Mbits/sec    0    694 KBytes       
up:  [  5]   7.00-8.00   sec  2.19 MBytes  18.4 Mbits/sec                  
dn:  [  5]   7.00-8.00   sec  21.2 MBytes   178 Mbits/sec    0    696 KBytes       
dn:  [  5]   8.00-9.00   sec  21.2 MBytes   178 Mbits/sec    0    715 KBytes       
up:  [  5]   8.00-9.00   sec  2.12 MBytes  17.8 Mbits/sec                  
dn:  [  5]   9.00-10.00  sec  21.2 MBytes   178 Mbits/sec    0    736 KBytes       
dn:  - - - - - - - - - - - - - - - - - - - - - - - - -
dn:  [ ID] Interval           Transfer     Bitrate         Retr
dn:  [  5]   0.00-10.00  sec   212 MBytes   178 Mbits/sec   64             sender
dn:  [  5]   0.00-10.02  sec   209 MBytes   175 Mbits/sec                  receiver
up:  [  5]   9.00-10.00  sec  2.12 MBytes  17.8 Mbits/sec                  
dn:  
up:  - - - - - - - - - - - - - - - - - - - - - - - - -
dn:  iperf Done.
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-10.02  sec  21.6 MBytes  18.1 Mbits/sec                  sender
up:  [  5]   0.00-10.00  sec  21.5 MBytes  18.0 Mbits/sec                  receiver
up:  
up:  iperf Done.

[1]-  Done                    iperf3 -c 192.168.88.188 -p5201 -R -T up -b18M
[2]+  Done                    iperf3 -c 192.168.88.188 -p5202 -T dn

Experimental Polling mode (not included upstream):

Hardware EIP93 (Poll mode) OpenSSL via /dev/crypto

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-ecb       1758.37k     6546.43k    20121.34k    42089.46k    59099.43k    61235.88k
aes-256-ecb       1838.36k     6568.38k    18594.79k    34385.99k    44225.91k    45450.63k

aes-128-cbc       1689.06k     6238.87k    19305.89k    40923.60k    57844.77k    60027.49k
aes-256-cbc       1615.24k     5844.43k    17093.91k    32883.33k    43414.88k    44715.80k

aes-128-ctr       2810.66k     6051.51k    18909.98k    40825.28k    58546.94k    60969.16k
aes-256-ctr       2750.02k     5832.23k    17012.26k    32856.45k    43972.81k    45265.56k

cryptsetup benchmarks:

# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-ecb        128b        42.1 MiB/s        42.0 MiB/s
    aes-ecb        256b        33.9 MiB/s        33.8 MiB/s
    aes-cbc        128b        41.3 MiB/s        41.6 MiB/s
    aes-cbc        256b        33.3 MiB/s        33.5 MiB/s
    aes-ctr        128b        41.7 MiB/s        41.7 MiB/s
    aes-ctr        256b        33.8 MiB/s        33.7 MiB/s
    aes-xts        256b        28.3 MiB/s        28.5 MiB/s
    aes-xts        512b        24.5 MiB/s        24.5 MiB/s

And last but not least (also still experimental and ipv4 only): FULL ESP-HW-offload:

root@debian:~# swanctl -i -c ike2
[IKE] establishing CHILD_SA ike2{5}
[ENC] generating CREATE_CHILD_SA request 1 [ SA No TSi TSr ]
[NET] sending packet: from 2.2.2.1[4500] to 2.2.2.2[4500] (208 bytes)
[NET] received packet: from 2.2.2.2[4500] to 2.2.2.1[4500] (208 bytes)
[ENC] parsed CREATE_CHILD_SA response 1 [ SA No TSi TSr ]
[CFG] selected proposal: ESP:AES_CTR_128/HMAC_SHA2_256_128/NO_EXT_SEQ
[IKE] CHILD_SA ike2{5} established with SPIs cd3c17e7_i cf5ff6e3_o and TS 0.0.0.0/0 === 0.0.0.0/0
initiate completed successfully
root@debian:~# ip x s
src 2.2.2.1 dst 2.2.2.2
	proto esp spi 0xcf5ff6e3 reqid 1 mode tunnel
	replay-window 0 flag af-unspec
	auth-trunc hmac(sha256) 0x0ee09f604efd7be9c57faba4f19ebfca93b1e845e8029d60d68f474d8ae6ece0 128
	enc rfc3686(ctr(aes)) 0xf2b555e58623ffbb7c0dbf25342a36e6a062535e
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	if_id 0x1
src 2.2.2.2 dst 2.2.2.1
	proto esp spi 0xcd3c17e7 reqid 1 mode tunnel
	replay-window 32 flag af-unspec
	auth-trunc hmac(sha256) 0x3c491ce63a3f6a22cf8992addfbc32c1b0f597981a395504d0b2c81f645e1694 128
	enc rfc3686(ctr(aes)) 0x2e1ed5efbc2147dee711aca17a9b2f1682dfee24
	anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
	if_id 0x1
root@debian:~# iperf3 -c 192.168.88.188
Connecting to host 192.168.88.188, port 5201
[  5] local 192.168.77.2 port 57030 connected to 192.168.88.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.05   sec  48.6 MBytes   388 Mbits/sec   21   1.31 MBytes       
[  5]   1.05-2.00   sec  50.0 MBytes   443 Mbits/sec    0   1.45 MBytes       
[  5]   2.00-3.00   sec  50.0 MBytes   419 Mbits/sec    0   1.57 MBytes       
[  5]   3.00-4.00   sec  51.2 MBytes   430 Mbits/sec    0   1.67 MBytes       
[  5]   4.00-5.03   sec  51.2 MBytes   417 Mbits/sec    0   1.74 MBytes       
[  5]   5.03-6.00   sec  50.0 MBytes   433 Mbits/sec   25   1.29 MBytes       
[  5]   6.00-7.00   sec  50.0 MBytes   419 Mbits/sec    0   1.36 MBytes       
[  5]   7.00-8.00   sec  50.0 MBytes   419 Mbits/sec    0   1.42 MBytes       
[  5]   8.00-9.00   sec  51.2 MBytes   430 Mbits/sec    0   1.45 MBytes       
[  5]   9.00-10.05  sec  48.8 MBytes   390 Mbits/sec    0   1.48 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.05  sec   501 MBytes   418 Mbits/sec   46             sender
[  5]   0.00-10.05  sec   501 MBytes   418 Mbits/sec                  receiver

iperf Done.
root@debian:~# ip x s
src 2.2.2.1 dst 2.2.2.2
	proto esp spi 0xcf5ff6e3 reqid 1 mode tunnel
	replay-window 0 flag af-unspec
	auth-trunc hmac(sha256) 0x0ee09f604efd7be9c57faba4f19ebfca93b1e845e8029d60d68f474d8ae6ece0 128
	enc rfc3686(ctr(aes)) 0xf2b555e58623ffbb7c0dbf25342a36e6a062535e
	anti-replay context: seq 0x0, oseq 0x5c97d, bitmap 0x00000000
	if_id 0x1
src 2.2.2.2 dst 2.2.2.1
	proto esp spi 0xcd3c17e7 reqid 1 mode tunnel
	replay-window 32 flag af-unspec
	auth-trunc hmac(sha256) 0x3c491ce63a3f6a22cf8992addfbc32c1b0f597981a395504d0b2c81f645e1694 128
	enc rfc3686(ctr(aes)) 0x2e1ed5efbc2147dee711aca17a9b2f1682dfee24
	anti-replay context: seq 0xa357, oseq 0x0, bitmap 0xffffffff
	if_id 0x1
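
The sequence counters in the `ip x s` output can be cross-checked against the iperf3 run before it. A rough sketch, assuming the outbound counter (oseq) is dominated by the iperf3 data stream and that iperf3 reports MBytes in units of 1024*1024 bytes (which matches its own bitrate figure); the counters also include handshake and control traffic, so the result is only approximate.

```python
# Counters copied from the xfrm state after the first iperf3 run.
OSEQ = 0x5c97d          # ESP packets sent on the outbound SA
ISEQ = 0xa357           # ESP packets received on the inbound SA (mostly TCP ACKs)

# iperf3 reported 501 MBytes transferred by the sender.
TRANSFER_BYTES = 501 * 1024 * 1024

# Average TCP payload carried per outbound ESP packet.
AVG_PAYLOAD = TRANSFER_BYTES / OSEQ

print(f"outbound packets: {OSEQ}")
print(f"inbound packets:  {ISEQ}")
print(f"average payload per outbound packet: {AVG_PAYLOAD:.0f} bytes")
```

An average in this range is what you would expect for full-size TCP segments inside the tunnel, which suggests the counters and the transfer size agree.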


root@debian:~# iperf3 -c 192.168.88.188 -R
Connecting to host 192.168.88.188, port 5201
Reverse mode, remote host 192.168.88.188 is sending
[  5] local 192.168.77.2 port 57034 connected to 192.168.88.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  14.1 MBytes   118 Mbits/sec                  
[  5]   1.00-2.00   sec  12.9 MBytes   108 Mbits/sec                  
[  5]   2.00-3.00   sec  13.6 MBytes   114 Mbits/sec                  
[  5]   3.00-4.00   sec  13.0 MBytes   109 Mbits/sec                  
[  5]   4.00-5.00   sec  13.4 MBytes   113 Mbits/sec                  
[  5]   5.00-6.00   sec  13.6 MBytes   114 Mbits/sec                  
[  5]   6.00-7.00   sec  13.8 MBytes   116 Mbits/sec                  
[  5]   7.00-8.00   sec  13.8 MBytes   116 Mbits/sec                  
[  5]   8.00-9.00   sec  13.0 MBytes   109 Mbits/sec                  
[  5]   9.00-10.00  sec  13.6 MBytes   114 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.01  sec   139 MBytes   116 Mbits/sec                  sender
[  5]   0.00-10.00  sec   135 MBytes   113 Mbits/sec                  receiver

iperf Done.

root@debian:~# swanctl -i -c ike2
[IKE] establishing CHILD_SA ike2{6}
[ENC] generating CREATE_CHILD_SA request 3 [ SA No TSi TSr ]
[NET] sending packet: from 2.2.2.1[4500] to 2.2.2.2[4500] (208 bytes)
[NET] received packet: from 2.2.2.2[4500] to 2.2.2.1[4500] (208 bytes)
[ENC] parsed CREATE_CHILD_SA response 3 [ SA No TSi TSr ]
[CFG] selected proposal: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ
[IKE] CHILD_SA ike2{6} established with SPIs cf3377bf_i c7006356_o and TS 0.0.0.0/0 === 0.0.0.0/0
initiate completed successfully
root@debian:~# iperf3 -c 192.168.88.188
Connecting to host 192.168.88.188, port 5201
[  5] local 192.168.77.2 port 57038 connected to 192.168.88.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  39.8 MBytes   333 Mbits/sec    0   1.65 MBytes       
[  5]   1.00-2.00   sec  40.0 MBytes   336 Mbits/sec  346   1.98 MBytes       
[  5]   2.00-3.00   sec  41.2 MBytes   346 Mbits/sec    0   2.17 MBytes       
[  5]   3.00-4.00   sec  38.8 MBytes   325 Mbits/sec    0   2.31 MBytes       
[  5]   4.00-5.00   sec  41.2 MBytes   346 Mbits/sec  103   1.71 MBytes       
[  5]   5.00-6.00   sec  40.0 MBytes   336 Mbits/sec    0   1.81 MBytes       
[  5]   6.00-7.00   sec  40.0 MBytes   336 Mbits/sec    0   1.89 MBytes       
[  5]   7.00-8.00   sec  41.2 MBytes   346 Mbits/sec    0   1.95 MBytes       
[  5]   8.00-9.00   sec  41.2 MBytes   346 Mbits/sec    0   1.99 MBytes       
[  5]   9.00-10.00  sec  40.0 MBytes   336 Mbits/sec    0   2.02 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   404 MBytes   339 Mbits/sec  449             sender
[  5]   0.00-10.03  sec   403 MBytes   337 Mbits/sec                  receiver

iperf Done.
root@debian:~# iperf3 -c 192.168.88.188 -R
Connecting to host 192.168.88.188, port 5201
Reverse mode, remote host 192.168.88.188 is sending
[  5] local 192.168.77.2 port 57042 connected to 192.168.88.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  14.4 MBytes   121 Mbits/sec                  
[  5]   1.00-2.00   sec  13.2 MBytes   111 Mbits/sec                  
[  5]   2.00-3.00   sec  13.6 MBytes   114 Mbits/sec                  
[  5]   3.00-4.00   sec  13.7 MBytes   115 Mbits/sec                  
[  5]   4.00-5.00   sec  13.9 MBytes   117 Mbits/sec                  
[  5]   5.00-6.00   sec  13.2 MBytes   111 Mbits/sec                  
[  5]   6.00-7.00   sec  13.6 MBytes   114 Mbits/sec                  
[  5]   7.00-8.00   sec  13.7 MBytes   115 Mbits/sec                  
[  5]   8.00-9.00   sec  13.8 MBytes   116 Mbits/sec                  
[  5]   9.00-10.00  sec  13.9 MBytes   116 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   141 MBytes   118 Mbits/sec                  sender
[  5]   0.00-10.00  sec   137 MBytes   115 Mbits/sec                  receiver

iperf Done.

AES128-SHA1, one upstream and one downstream at the same time. Notice the -b38M limit on the upload side:

root@debian:~# iperf3 -c 192.168.88.188 -R -T up -b38M& iperf3 -c 192.168.88.188 -p5202 -T dn&
[1] 2847184
[2] 2847185
root@debian:~# up:  Connecting to host 192.168.88.188, port 5201
dn:  Connecting to host 192.168.88.188, port 5202
up:  Reverse mode, remote host 192.168.88.188 is sending
dn:  [  5] local 192.168.77.2 port 42454 connected to 192.168.88.188 port 5202
up:  [  5] local 192.168.77.2 port 57070 connected to 192.168.88.188 port 5201
dn:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
dn:  [  5]   0.00-1.00   sec  43.1 MBytes   362 Mbits/sec    0   1.57 MBytes       
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-1.00   sec  4.56 MBytes  38.3 Mbits/sec                  
up:  [  5]   1.00-2.00   sec  4.56 MBytes  38.3 Mbits/sec                  
dn:  [  5]   1.00-2.02   sec  46.2 MBytes   381 Mbits/sec  284   1.35 MBytes       
up:  [  5]   2.00-3.00   sec  4.50 MBytes  37.7 Mbits/sec                  
dn:  [  5]   2.02-3.00   sec  47.5 MBytes   406 Mbits/sec    0   1.44 MBytes       
up:  [  5]   3.00-4.00   sec  4.54 MBytes  38.0 Mbits/sec                  
dn:  [  5]   3.00-4.02   sec  47.5 MBytes   390 Mbits/sec    0   1.50 MBytes       
dn:  [  5]   4.02-5.00   sec  47.5 MBytes   407 Mbits/sec    0   1.55 MBytes       
up:  [  5]   4.00-5.00   sec  4.55 MBytes  38.1 Mbits/sec                  
dn:  [  5]   5.00-6.00   sec  46.2 MBytes   388 Mbits/sec    0   1.58 MBytes       
up:  [  5]   5.00-6.00   sec  4.54 MBytes  38.1 Mbits/sec                  
dn:  [  5]   6.00-7.00   sec  48.8 MBytes   409 Mbits/sec    0   1.60 MBytes       
up:  [  5]   6.00-7.00   sec  4.50 MBytes  37.7 Mbits/sec                  
up:  [  5]   7.00-8.00   sec  4.50 MBytes  37.7 Mbits/sec                  
dn:  [  5]   7.00-8.00   sec  47.5 MBytes   398 Mbits/sec    0   1.61 MBytes       
dn:  [  5]   8.00-9.00   sec  46.2 MBytes   388 Mbits/sec    0   1.61 MBytes       
up:  [  5]   8.00-9.00   sec  4.54 MBytes  38.1 Mbits/sec                  
dn:  [  5]   9.00-10.00  sec  47.5 MBytes   398 Mbits/sec    0   1.61 MBytes       
dn:  - - - - - - - - - - - - - - - - - - - - - - - - -
dn:  [ ID] Interval           Transfer     Bitrate         Retr
dn:  [  5]   0.00-10.00  sec   468 MBytes   393 Mbits/sec  284             sender
dn:  [  5]   0.00-10.03  sec   466 MBytes   390 Mbits/sec                  receiver
dn:  
dn:  iperf Done.
up:  [  5]   9.00-10.00  sec  4.56 MBytes  38.3 Mbits/sec                  
up:  - - - - - - - - - - - - - - - - - - - - - - - - -
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-10.03  sec  45.5 MBytes  38.1 Mbits/sec                  sender
up:  [  5]   0.00-10.00  sec  45.4 MBytes  38.0 Mbits/sec                  receiver
up:  
up:  iperf Done.

Same test without the limit on the upload side, and performance drops dramatically...

iperf3 -c 192.168.88.188 -R -T up & iperf3 -c 192.168.88.188 -p5202 -T dn&
[3] 2847188
[4] 2847189
[1]   Done                    iperf3 -c 192.168.88.188 -R -T up -b38M
[2]   Done                    iperf3 -c 192.168.88.188 -p5202 -T dn
root@debian:~# dn:  Connecting to host 192.168.88.188, port 5202
up:  Connecting to host 192.168.88.188, port 5201
up:  Reverse mode, remote host 192.168.88.188 is sending
dn:  [  5] local 192.168.77.2 port 42464 connected to 192.168.88.188 port 5202
up:  [  5] local 192.168.77.2 port 57076 connected to 192.168.88.188 port 5201
dn:  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
dn:  [  5]   0.00-1.00   sec   834 KBytes  6.83 Mbits/sec    0   97.5 KBytes       
up:  [ ID] Interval           Transfer     Bitrate
up:  [  5]   0.00-1.00   sec  12.4 MBytes   104 Mbits/sec                  
up:  [  5]   1.00-2.00   sec  13.4 MBytes   112 Mbits/sec                  
dn:  [  5]   1.00-2.00   sec   382 KBytes  3.13 Mbits/sec    0    119 KBytes       
up:  [  5]   2.00-3.00   sec  12.7 MBytes   106 Mbits/sec                  
dn:  [  5]   2.00-3.00   sec   573 KBytes  4.69 Mbits/sec    0    143 KBytes       
up:  [  5]   3.00-4.00   sec  12.8 MBytes   107 Mbits/sec                  
dn:  [  5]   3.00-4.00   sec  1018 KBytes  8.34 Mbits/sec    0    183 KBytes       
up:  [  5]   4.00-5.00   sec  12.9 MBytes   108 Mbits/sec                  
dn:  [  5]   4.00-5.00   sec   763 KBytes  6.25 Mbits/sec    0    222 KBytes       
up:  [  5]   5.00-6.00   sec  12.9 MBytes   109 Mbits/sec                  
dn:  [  5]   5.00-6.00   sec   954 KBytes  7.81 Mbits/sec    0    271 KBytes       
up:  [  5]   6.00-7.00   sec  12.2 MBytes   102 Mbits/sec                  
dn:  [  5]   6.00-7.00   sec  1.37 MBytes  11.5 Mbits/sec    0    329 KBytes       
up:  [  5]   7.00-8.00   sec  12.3 MBytes   104 Mbits/sec                  
dn:  [  5]   7.00-8.00   sec  1.74 MBytes  14.6 Mbits/sec    0    417 KBytes       
dn:  [  5]   8.00-9.00   sec  2.05 MBytes  17.2 Mbits/sec    0    506 KBytes       
up:  [  5]   8.00-9.00   sec  12.3 MBytes   103 Mbits/sec                  
dn:  [  5]   9.00-10.00  sec  2.48 MBytes  20.8 Mbits/sec    0    616 KBytes       
up:  [  5]   9.00-10.00  sec  12.4 MBytes   104 Mbits/sec                  
dn:  - - - - - - - - - - - - - - - - - - - - - - - - -
up:  - - - - - - - - - - - - - - - - - - - - - - - - -
dn:  [ ID] Interval           Transfer     Bitrate         Retr
up:  [ ID] Interval           Transfer     Bitrate
dn:  [  5]   0.00-10.00  sec  12.1 MBytes  10.1 Mbits/sec    0             sender
up:  [  5]   0.00-10.01  sec   130 MBytes   109 Mbits/sec                  sender
dn:  [  5]   0.00-10.01  sec  11.3 MBytes  9.47 Mbits/sec                  receiver
up:  [  5]   0.00-10.00  sec   126 MBytes   106 Mbits/sec                  receiver
dn:  
up:  
dn:  iperf Done.
up:  iperf Done.

Still a work in progress. I can't seem to get it to send (encrypt) and it can receive. Mikrotik claims that it can do 450 Mbps combined, like 225/225, whereas I am only getting 420-ish when I limit the upload speed. If you have an asymmetric internet connection this should be less of a problem. My own connection here is 300/30, so I should be able to get "line-speed" over IPSec (if only I could test that with something in my area which can send 300 Mbps+).

Check my GitHub: mtk-eip93

Great work.
Will it improve OpenVPN performance as well?

OpenVPN could be improved at a later stage. For now using OpenVPN with OpenSSL and the cryptodev will actually give worse performance with a standard tunnel MTU of 1500 or less.

If you can set up a tunnel with a very large MTU (like 64K) then it will improve the performance. This is true for any hardware crypto at the moment. I consider the AES-NI instruction set or some ARM NEON extensions a (good) software solution.
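
To illustrate why request size matters so much, here is a rough sketch that fits a "fixed cost per request plus cost per byte" model to two of the aes-128-cbc numbers from the /dev/crypto table in the first post. The linear model itself is an assumption, and it only describes the openssl speed microbenchmark: it ignores the HMAC and everything else OpenVPN does per packet, so it is not a prediction of tunnel throughput.

```python
def fit_fixed_cost_model(size_small, rate_small, size_large, rate_large):
    """Fit time(n) = overhead + per_byte * n through two (size, bytes/s) points."""
    t_small = size_small / rate_small
    t_large = size_large / rate_large
    per_byte = (t_large - t_small) / (size_large - size_small)
    overhead = t_small - per_byte * size_small
    return overhead, per_byte


# aes-128-cbc via /dev/crypto: 384.57k at 16 bytes, 54257.71k at 16384 bytes.
OVERHEAD_S, PER_BYTE_S = fit_fixed_cost_model(16, 384.57e3, 16384, 54257.71e3)


def modelled_rate(size):
    """Bytes per second the model gives for requests of `size` bytes."""
    return size / (OVERHEAD_S + PER_BYTE_S * size)


print(f"fixed cost per request: {OVERHEAD_S * 1e6:.1f} us")
print(f"asymptotic throughput:  {1 / PER_BYTE_S / 1e6:.1f} MB/s")
for size in (1500, 65536):
    print(f"{size:>6}-byte requests:  {modelled_rate(size) / 1e6:.1f} MB/s")
```

With a fixed cost in the tens of microseconds per request, small requests spend most of their time on the round trip to the engine rather than on the crypto itself, which is why a larger MTU helps.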

I have plans to create a cryptodev-like engine which talks to my driver specifically. I feel that approach is better than trying to hack/patch OpenVPN itself (even though I have seen proof that doing it like that improves crypto as well).

The user space to kernel space data transfers will kill the gains provided by the hardware crypto engine. The only way that I know of how to improve OpenVPN performance is to move it into the kernel, like WireGuard.

Yeah, I'm still waiting for the openvpn-dco driver to be usable before I try my hand at hacking aes-cbc-sha-hmac onto it. It looks like they are only doing aes-gcm and chacha-poly.

I tried doing something like the openvpn-dco previously, taking inspiration from how the QCA folks did to accelerate OpenVPN with the IPQ807x SoC drivers, but didn't get very far unfortunately. In the end I decided to wait for the openvpn-dco to be ready.

Another approach could be to move the whole driver into user space. It would be more of a choice: either you have the kernel driver OR you have an OpenSSL / WolfSSL hardware engine. Most people using OpenVPN will not use IPSec, and additional in-kernel users like dm-crypt are not often used on a router.

Since the engine was designed with IPSec in mind, I will continue to focus on getting the most performance out of the little SoC. So far I can "beat" Mikrotik by 20 Mbps when using aes256-sha256 (340 Mbps), but I am still falling short when using aes128-sha1 (I'm at around 415-420 while they claimed 450 Mbps).

Let's first see if the core driver will be accepted upstream and build from there.

Don't you have to send the data to kernel space in order for the crypto-engine to process it?

https://git.openwrt.org/?p=feed/packages.git;a=commit;h=17cd1793bbecb01a802b413c30b15d433af3ebe1

Nice, but as @quarky said: right now it only supports aes-gcm, so on the MT7621 it might bring only very limited improvements. aes-ctr-eip93 could be used for the CTR part of GCM, but ghash still has to be done in software. I will try to do a quick performance test tomorrow using IPSec and aes-gcm to get some idea of the performance using only aes-ctr. I think it might be along the lines of a 10-15% improvement.

The OpenVPN people intentionally removed any AES with HMAC support.

@quarky: for a user space driver I could only map the io registers and there would be no need to use DMA and interrupts. I am still not sure how to do DMA in userspace with a pre-allocated buffer without the need to do something in the kernel (problem is actually how to flush the cache in user space). It would involve having to copy the data from the buffer into the engine and the result back so I'm not sure how performance will improve. I'm looking into a WolfSSL hardware engine because it seems easier to do.

I did a short iperf3 test with both software and partial hardware offload for aes-gcm on IPSec. This hardware can NOT do GCM, so it is only used for the AES-CTR part of GCM; the ghash is still done in software. I expect similar performance improvements when using OpenVPN with in-kernel DCO.
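
For anyone who wants to check which implementation the kernel would pick for a given algorithm: the crypto API prefers the registered driver with the highest priority, and /proc/crypto lists them all. Below is a minimal parsing sketch. The sample text is made up for illustration (the module name and priority values are assumptions, only the ctr(aes-eip93) driver name is taken from this thread); on a real system you would read /proc/crypto instead.

```python
# Hypothetical excerpt in /proc/crypto format; values are illustrative only.
SAMPLE = """\
name         : ctr(aes)
driver       : ctr(aes-eip93)
module       : crypto_hw_eip93
priority     : 300

name         : ctr(aes)
driver       : ctr(aes-generic)
module       : kernel
priority     : 100

name         : ghash
driver       : ghash-generic
module       : kernel
priority     : 100
"""


def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered algorithm."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                entry[key.strip()] = value.strip()
        if entry:
            entries.append(entry)
    return entries


def preferred_driver(entries, name):
    """Return the driver with the highest priority for `name`, or None."""
    candidates = [e for e in entries if e.get("name") == name]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", 0)))
    return best["driver"]


ENTRIES = parse_proc_crypto(SAMPLE)
for algo in ("ctr(aes)", "ghash"):
    print(algo, "->", preferred_driver(ENTRIES, algo))
```

To run it against a live system, replace SAMPLE with the contents of /proc/crypto, for example `open("/proc/crypto").read()`.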

I removed intervals to shorten the output.

Software IPSec: aes128gcm96


root@debian:~# iperf3 -c 192.168.77.188 
Connecting to host 192.168.77.188, port 5201
[  5] local 192.168.88.2 port 55318 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.06 MBytes  34.0 Mbits/sec    0    202 KBytes       
[  5]   4.00-5.00   sec  4.29 MBytes  36.0 Mbits/sec    0   1.03 MBytes       
[  5]   5.00-6.00   sec  2.42 MBytes  20.3 Mbits/sec  452    585 KBytes      
[  5]   9.00-10.00  sec  4.98 MBytes  41.8 Mbits/sec    0    670 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  43.5 MBytes  36.5 Mbits/sec  452             sender
[  5]   0.00-10.07  sec  41.0 MBytes  34.1 Mbits/sec                  receiver

iperf Done.
root@debian:~# iperf3 -c 192.168.77.188 -R
Connecting to host 192.168.77.188, port 5201
Reverse mode, remote host 192.168.77.188 is sending
[  5] local 192.168.88.2 port 55324 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  3.74 MBytes  31.3 Mbits/sec                  
[  5]   9.00-10.00  sec  3.50 MBytes  29.3 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.06  sec  37.3 MBytes  31.1 Mbits/sec                  sender
[  5]   0.00-10.00  sec  36.2 MBytes  30.4 Mbits/sec                  receiver

iperf Done.

Partial Hardware EIP93: aes128gcm96 gcm_base(ctr(aes-eip93))

root@debian:~# iperf3 -c 192.168.77.188
Connecting to host 192.168.77.188, port 5201
[  5] local 192.168.88.2 port 55330 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  8.47 MBytes  71.1 Mbits/sec    0    441 KBytes       
[  5]   2.00-3.00   sec  6.66 MBytes  55.9 Mbits/sec  270    769 KBytes       
[  5]   3.00-4.00   sec  7.44 MBytes  62.4 Mbits/sec   28    575 KBytes       
[  5]   9.00-10.00  sec  7.39 MBytes  62.0 Mbits/sec    0    655 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  77.3 MBytes  64.8 Mbits/sec  298             sender
[  5]   0.00-10.06  sec  75.6 MBytes  63.1 Mbits/sec                  receiver

iperf Done.
root@debian:~# iperf3 -c 192.168.77.188 -R
Connecting to host 192.168.77.188, port 5201
Reverse mode, remote host 192.168.77.188 is sending
[  5] local 192.168.88.2 port 55334 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.19 MBytes  43.6 Mbits/sec                  
[  5]   9.00-10.00  sec  5.24 MBytes  43.9 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.08  sec  54.1 MBytes  45.0 Mbits/sec                  sender
[  5]   0.00-10.00  sec  52.4 MBytes  43.9 Mbits/sec                  receiver

iperf Done.
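
Putting the aes128gcm96 sender totals above side by side, a small sketch of the relative gain from offloading only the AES-CTR part. The numbers are the sender bitrates from the four runs above; the function name is just illustrative.

```python
def gain_percent(software_mbps, hardware_mbps):
    """Relative improvement of the hardware-assisted run over software, in percent."""
    return (hardware_mbps / software_mbps - 1.0) * 100.0


# Sender bitrates in Mbits/sec for aes128gcm96, copied from the runs above.
FORWARD_GAIN = gain_percent(36.5, 64.8)   # iperf3 -c
REVERSE_GAIN = gain_percent(31.1, 45.0)   # iperf3 -c -R

print(f"forward: +{FORWARD_GAIN:.1f}%")
print(f"reverse: +{REVERSE_GAIN:.1f}%")
```

These are single 10-second runs, so treat the percentages as indicative only.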

Software IPSec: aes256gcm128

root@debian:~# iperf3 -c 192.168.77.188
Connecting to host 192.168.77.188, port 5201
[  5] local 192.168.88.2 port 55394 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  3.63 MBytes  30.5 Mbits/sec    0    183 KBytes       
[  5]   6.00-7.00   sec  2.50 MBytes  21.0 Mbits/sec   80    923 KBytes       
[  5]   7.00-8.00   sec  3.75 MBytes  31.5 Mbits/sec  643    635 KBytes       
[  5]   9.00-10.00  sec  3.75 MBytes  31.5 Mbits/sec    0    720 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  38.4 MBytes  32.2 Mbits/sec  723             sender
[  5]   0.00-10.08  sec  35.9 MBytes  29.9 Mbits/sec                  receiver

iperf Done.
root@debian:~# iperf3 -c 192.168.77.188 -R
Connecting to host 192.168.77.188, port 5201
Reverse mode, remote host 192.168.77.188 is sending
[  5] local 192.168.88.2 port 55398 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  2.72 MBytes  22.8 Mbits/sec                  
[  5]   9.00-10.00  sec  3.10 MBytes  26.0 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.08  sec  32.0 MBytes  26.6 Mbits/sec                  sender
[  5]   0.00-10.00  sec  30.9 MBytes  25.9 Mbits/sec                  receiver

iperf Done.

Partial Hardware EIP93: aes256gcm128 gcm_base(ctr(aes-eip93))

root@debian:~# iperf3 -c 192.168.77.188
Connecting to host 192.168.77.188, port 5201
[  5] local 192.168.88.2 port 55406 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  7.39 MBytes  62.0 Mbits/sec    0    371 KBytes       
[  5]   1.00-2.00   sec  8.20 MBytes  68.8 Mbits/sec    0    753 KBytes       
[  5]   3.00-4.00   sec  6.24 MBytes  52.4 Mbits/sec  363    608 KBytes       
[  5]   9.00-10.00  sec  7.50 MBytes  62.9 Mbits/sec    0    727 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  76.7 MBytes  64.4 Mbits/sec  363             sender
[  5]   0.00-10.05  sec  74.4 MBytes  62.0 Mbits/sec                  receiver

iperf Done.
root@debian:~# iperf3 -c 192.168.77.188 -R
Connecting to host 192.168.77.188, port 5201
Reverse mode, remote host 192.168.77.188 is sending
[  5] local 192.168.88.2 port 55410 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  5.13 MBytes  43.0 Mbits/sec                  
[  5]   1.00-2.00   sec  5.19 MBytes  43.6 Mbits/sec                  
[  5]   9.00-10.00  sec  5.23 MBytes  43.8 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.07  sec  53.8 MBytes  44.8 Mbits/sec                  sender
[  5]   0.00-10.00  sec  52.1 MBytes  43.7 Mbits/sec                  receiver

iperf Done.

Moving the IRQ to CPU2 has a great effect on sending out of the WAN:

drbrains@debian:~$ iperf3 -c 192.168.77.188 -R
Connecting to host 192.168.77.188, port 5201
Reverse mode, remote host 192.168.77.188 is sending
[  5] local 192.168.88.2 port 55450 connected to 192.168.77.188 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  10.0 MBytes  83.9 Mbits/sec                  
[  5]   1.00-2.00   sec  9.94 MBytes  83.4 Mbits/sec                  
[  5]   9.00-10.00  sec  9.96 MBytes  83.6 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.29  sec   104 MBytes  84.4 Mbits/sec                  sender
[  5]   0.00-10.00  sec  99.8 MBytes  83.7 Mbits/sec                  receiver

iperf Done.


How do I do this?

By changing the smp_affinity of the corresponding IRQ.

cat /proc/interrupts

This will show you a list of all the interrupts on your system. The first column is the IRQ number assigned by the system. The following columns show how many interrupts each CPU has handled, and the last column shows which subsystem is using the interrupt.
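As a minimal sketch of reading that table from a script: the helper below prints the IRQ number of the line whose last column matches a given name. The function name irq_for is hypothetical, and the name in the last column (for example "crypto") is an assumption; check the output of cat /proc/interrupts for what your system actually shows.

```shell
# irq_for NAME [FILE]: print the IRQ number(s) whose last column equals NAME.
# Hypothetical helper; FILE defaults to /proc/interrupts.
irq_for() {
    awk -v name="$1" '
        # Skip the header line; match on the last whitespace-separated field.
        $NF == name {
            sub(":", "", $1)   # first field looks like "22:", drop the colon
            print $1
        }
    ' "${2:-/proc/interrupts}"
}
```

On a system where the last column reads "crypto", irq_for crypto would print the number to use in the /proc/irq/ paths.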

On my system this is 22 for crypto. To show which CPU the interrupt is assigned to: (change 22 to whatever is on your own system)

cat /proc/irq/22/smp_affinity

This shows a hexadecimal value that represents a bitmap of which CPUs are allowed to handle the interrupt. Usually this is "f" on our 4-CPU MT7621A.

CPU 0: 1
CPU 1: 2
CPU 2: 4
CPU 3: 8

All CPUs together is 15, or f in hexadecimal.
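The table above is just one bit per CPU, so the mask for any combination can be computed. A small sketch, with cpu_mask being a hypothetical helper name:

```shell
# cpu_mask CPU...: print the hexadecimal smp_affinity mask for the given CPU numbers.
# Hypothetical helper, not part of any package.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        # Each CPU n corresponds to bit n, i.e. the value 1 << n.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 2        # prints 4 (CPU 2 only)
cpu_mask 0 1 2 3  # prints f (all four CPUs)
```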

To change it, e.g. to CPU-2, use:

echo 4 > /proc/irq/22/smp_affinity

Most systems have some script that runs at boot to change, e.g., the WiFi affinity to spread the interrupts more evenly. So your MT76x2 and MT7603 might already be running with an smp_affinity of 4 and 8, meaning they will use CPU-2 and CPU-3.
In that case it might be better to move the crypto IRQ to CPU-1, like:

echo 2 > /proc/irq/22/smp_affinity

@drbrains may I ask where you got the interrupt numbers from?
I can't find a reference for them for the life of me. I'd like to just add the binding for the Inside Secure Safexcel crypto engine to a MT7621 device tree, mangle anything else which may get in my way, and see what happens. Assuming that it has matching interrupts.

As a side note, I saw in your repo that registers EIP93_REG_INT_MASK_STAT, and EIP93_REG_INT_CLR are defined with the same address ((INT_BASE)+(0x01 * EIP93_REG_WIDTH)): https://github.com/vschagen/mtk-eip93/blob/ca08387bf8352652129019bb19e2423ab313d5cb/crypto/mtk-eip93/eip93-regs.h#L60
I don't know whether that is intentional or not, as my knowledge of all of this is somewhat lacking.

The interrupts are from the MT7621 programming guide; see the documents on GitHub.

The registers are described as two different registers (one is to write, the other to read). I kept both to keep the naming convention from the documentation.

Not sure why you mentioned the inside secure engine: the mainline driver is for the EIP-(1)97 and is not the same. The EIP-97 can be found in the MT7623. Another driver for the EIP-94 is also in mainline (as AMCC).

The programming guide which mentions the MT7621 seems to only cover switching. The other guides don't seem to cover the MT7621 at all.

That makes complete sense.

I mentioned it as I wanted to see what would happen if I tried to use the driver, even though they are not the same. This is partly because I am really unable to find much information on the MT7621 and its EIP-93 crypto engine, only broken links, or documents which mention that it exists without providing any further details. From your code, I do not think that anything matches up register-wise, though I wanted to confirm that with documentation.

I did not know about the EIP-94 driver, thanks.

I’m sure there is room for improvement and optimization in my code. That’s one advantage of actually trying to get upstream: lots of critical eyes going over the code trying to make it better.

I am trying to get a generic way to implement side-offloading of ESP (AH). I have a working version, but I’m not sure if my way will be acceptable upstream. I will hold off on pushing and discussing that until the basic crypto driver is actually merged. For that it would actually be good to have some documentation for the EIP97, which I didn’t find. I asked Inside Secure, but they didn’t want the legal hassle of an NDA.

By patching the esp_offload (which normally isn’t used unless there is actually hardware taking advantage of it) I can get 350+ Mbps with IPSec, which is less than advertised for the Mikrotik Hex (R or S). I actually have a Hex-S, and I didn’t manage to get close to the Mikrotik numbers running their RouterOS.

I've been trying to understand the DMA side of things, as there are 2 DMA drivers for the SoC, both disabled in the device tree. I did enable them, but only one driver can be loaded at a time. One is General DMA, the other is MT High Speed DMA. I've only found a sound module which depends on one of them, and I'm not sure if they may be part of the reason for MikroTik's performance differences.

Going through the hardware specs for the hEX (RB750Gr3) vs hEX S (RB760iGS), the difference appears to be entirely around the SFP cage and better PoE. If the SFP cage is populated, it will take one of the gigabit interfaces from the switch. So if that is populated, and throughput testing is running only over switch ports, that might cut performance a bit.

For a moment I'd initially thought that it was perhaps the EIP97 engine, because of a single, and I think out-of-date, piece of documentation which says: "Compared to the Protocol-IP-93 it offers higher performance, more algorithms, protocol flexibility through token instructions and supports multi-core CPUs."

I do have a hEX RB750Gr3, and I can test stuff on weekends with it, if there is anything in particular you would like me to run. I did have to modify the code in the repo to get it to compile on master; I think the issue is the same as the one behind the changes needed to make it work with 21.02.