Ipsec differences between devices: is kmod-crypto-ctr the problem?

This may be an indication that the patch did not work, or that something else is wrong here. I still have my doubts that openssl works 100% with AES-CTR. Have you tried the openssl line without the devcrypto engine (if you don't want to uninstall it, just comment out the openssl_conf=openssl_conf line, and check that DUMP_INFO does not return anything), and with the engine, but without my patch applied?

It encrypts the first (full) block correctly, and gets the second (partial) wrong.

openwrt # echo 'EJjLoYB5bd88Jp2+D8r8DEfB8w==' | base64 -d | openssl aes-128-ctr -d -nopad -iv 0f0e0d0c0b0a09080706050403020100 -K 000102030405060708090a0b0c0d0e0f
0123456789abcdefgWopenwrt #

It may be that the IV is not getting updated correctly (my biggest concern about it not working in openssl). If that's the case, then even if the second block is complete, it will get it wrong. Try this:

# echo '0123456789abcdefghijklmnopqrstuvwxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020100 -a -K 000102030405060708090a0b0c0d0e0f
EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdCnMp4HG5A==

I'm just guessing, but perhaps the rfc3686 algorithms are only alloc/created if they are requested. My x86_64 machine does not show them in /proc/crypto either, and it uses the standard aesni drivers in linux-4.19.

I'll take another look at this when I get some time.

Thanks.

With patch and with openssl_conf=openssl_conf:

# echo '0123456789abcdefghijklmnopqrstuvwxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020100 -a -K 000102030405060708090a0b0c0d0e0f
EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdClX0YDovg==
# echo 'EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdClX0YDovg==' | base64 -d | openssl aes-128-ctr -d -nopad -iv 0f0e0d0c0b0a09080706050403020100 -K 000102030405060708090a0b0c0d0e0f
0123456789abcdefghijklmnopqrstuvwxyz

and with patch but without openssl_conf=openssl_conf:

# echo '0123456789abcdefghijklmnopqrstuvwxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020100 -a -K 000102030405060708090a0b0c0d0e0f
EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdCnMp4HG5A==
# echo 'EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdCnMp4HG5A==' | base64 -d | openssl aes-128-ctr -d -nopad -iv 0f0e0d0c0b0a09080706050403020100 -K 000102030405060708090a0b0c0d0e0f
0123456789abcdefghijklmnopqrstuvwxyz

So both produce reversible, but different, results. Will try without the patch later.

Got it. It's the handling of the IV of the last block. It doesn't seem to be the IV increment, as the second block is OK. Here's a comparison of the hexdump of the ciphertexts:

# echo 'EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdClX0YDovg==' | base64 -d | hexdump -C
00000000  10 98 cb a1 80 79 6d df  3c 26 9d be 0f ca fc 0c  |.....ym.<&......|
00000010  20 ce cd cf 1e 32 0d 2a  01 c2 e0 9e 3a 4d 74 29  | ....2.*....:Mt)|
00000020  57 d1 80 e8 be                                    |W....|
00000025
# echo 'EJjLoYB5bd88Jp2+D8r8DCDOzc8eMg0qAcLgnjpNdCnMp4HG5A==' | base64 -d | hexdump -C
00000000  10 98 cb a1 80 79 6d df  3c 26 9d be 0f ca fc 0c  |.....ym.<&......|
00000010  20 ce cd cf 1e 32 0d 2a  01 c2 e0 9e 3a 4d 74 29  | ....2.*....:Mt)|
00000020  cc a7 81 c6 e4                                    |.....|
00000025
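
The same comparison can be done programmatically. The following is only a sketch: first_diff_block is a name made up for this illustration (it is not part of openssl or any tool used in this thread), and the two arrays are simply the bytes from the hexdumps above.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK 16

/* Hypothetical helper: index of the first 16-byte block in which a and b
 * differ, or -1 if their first len bytes are identical. */
static long first_diff_block(const unsigned char *a, const unsigned char *b,
                             size_t len)
{
  assert(a != NULL && b != NULL);
  for (size_t i = 0; i < len; i++)
    if (a[i] != b[i])
      return (long)(i / AES_BLOCK);
  return -1;
}

/* Ciphertext produced with the devcrypto engine (first hexdump). */
static const unsigned char ct_engine[37] = {
  0x10, 0x98, 0xcb, 0xa1, 0x80, 0x79, 0x6d, 0xdf,
  0x3c, 0x26, 0x9d, 0xbe, 0x0f, 0xca, 0xfc, 0x0c,
  0x20, 0xce, 0xcd, 0xcf, 0x1e, 0x32, 0x0d, 0x2a,
  0x01, 0xc2, 0xe0, 0x9e, 0x3a, 0x4d, 0x74, 0x29,
  0x57, 0xd1, 0x80, 0xe8, 0xbe
};

/* Ciphertext produced without the engine (second hexdump). */
static const unsigned char ct_soft[37] = {
  0x10, 0x98, 0xcb, 0xa1, 0x80, 0x79, 0x6d, 0xdf,
  0x3c, 0x26, 0x9d, 0xbe, 0x0f, 0xca, 0xfc, 0x0c,
  0x20, 0xce, 0xcd, 0xcf, 0x1e, 0x32, 0x0d, 0x2a,
  0x01, 0xc2, 0xe0, 0x9e, 0x3a, 0x4d, 0x74, 0x29,
  0xcc, 0xa7, 0x81, 0xc6, 0xe4
};
```

With this data, first_diff_block(ct_engine, ct_soft, 37) returns 2 (the third, partial block), and -1 when only the first 32 bytes are compared.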

Now if I feed just the last block using the same IV, without incrementing it by 2, I get the same sequence as you have. I just need to figure out who's at fault here.

# echo 'wxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020100 -K 000102030405060708090a0b0c0d0e0f | hexdump -C
00000000  57 d1 80 e8 be                                    |W....|
00000005

This is what should have been done. Notice the 02 at the end of the IV:

echo 'wxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020102 -K 000102030405060708090a0b0c0d0e0f | hexdump -C
00000000  cc a7 81 c6 e4                                    |.....|
00000005

The reverse works because it uses the same (wrong) procedure.
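
The arithmetic behind the 00 versus 02 at the end of the IV can be sketched in a few lines. In CTR mode the counter block for block index n is the initial IV plus n, with the IV treated as one big-endian integer. The helper name ctr_block_for is invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK 16

/* Hypothetical helper: out = iv + n, with iv treated as a 128-bit
 * big-endian integer, i.e. the counter block for block index n. */
static void ctr_block_for(const uint8_t iv[AES_BLOCK], uint32_t n,
                          uint8_t out[AES_BLOCK])
{
  uint64_t acc = n;   /* value still to be added, including carries */

  assert(iv != NULL && out != NULL);
  memcpy(out, iv, AES_BLOCK);
  for (int i = AES_BLOCK - 1; i >= 0 && acc != 0; i--) {
    acc += out[i];
    out[i] = (uint8_t)acc;   /* keep the low byte */
    acc >>= 8;               /* carry into the next byte up */
  }
}
```

For the IV used above (0f0e...0100), block index 2 gives a counter block ending in 01 02, which is the IV that had to be passed by hand to reproduce the correct last block.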


I can confirm that the rfc3686() algorithms are only instantiated, and thus appear in /proc/crypto upon request. Sorry, I'm too lazy to write a standalone AF_ALG app to request it, but I do have a convoluted test for you to perform :wink:.
First, apply this: https://github.com/openwrt/openwrt/pull/1547.patch, which will change the afalg engine to support more algorithms; then patch it to use rfc3686(ctr(aes)) for AES-CTR:

--- a/package/libs/openssl/patches/600-e_afalg-rewrite-of-AF_ALG-engine.patch
+++ b/package/libs/openssl/patches/600-e_afalg-rewrite-of-AF_ALG-engine.patch
@@ -438,9 +438,9 @@ index 7f62d77e5b..79c8b2406c 100644
 +#ifndef OPENSSL_NO_RC4
 +    { NID_rc4, 1, 16, 0, EVP_CIPH_STREAM_CIPHER, "arc4" },
 +#endif
-+    { NID_aes_128_ctr, 16, 128 / 8, 16, EVP_CIPH_CTR_MODE, "ctr(aes)" },
-+    { NID_aes_192_ctr, 16, 192 / 8, 16, EVP_CIPH_CTR_MODE, "ctr(aes)" },
-+    { NID_aes_256_ctr, 16, 256 / 8, 16, EVP_CIPH_CTR_MODE, "ctr(aes)" },
++    { NID_aes_128_ctr, 16, 128 / 8, 16, EVP_CIPH_CTR_MODE, "rfc3686(ctr(aes))" },
++    { NID_aes_192_ctr, 16, 192 / 8, 16, EVP_CIPH_CTR_MODE, "rfc3686(ctr(aes))" },
++    { NID_aes_256_ctr, 16, 256 / 8, 16, EVP_CIPH_CTR_MODE, "rfc3686(ctr(aes))" },
 +#if 0                            /* Not yet supported */
 +    { NID_aes_128_xts, 16, 128 / 8 * 2, 16, EVP_CIPH_XTS_MODE, "xts(aes)" },
 +    { NID_aes_256_xts, 16, 256 / 8 * 2, 16, EVP_CIPH_XTS_MODE, "xts(aes)" },

Now compile openssl, opkg install libopenssl-afalg, and disable (or uninstall) libopenssl-devcrypto, so that it does not interfere. To disable it, comment out its line under [engines]; you might as well add afalg=afalg, and configure it just like devcrypto. You can do all that at once with sed -i -e s/devcrypto/afalg/g /etc/ssl/openssl.cnf. The patchset also changes the kmod-crypto-user package (to expose the drivers being used), so make sure to install your freshly built kmod-crypto-user package first. lsmod | grep crypto_user should show the module installed. Note that crypto_user, ironically, was not part of kmod-crypto-user in openwrt.


Then run the openssl dump_info on the afalg engine:

openssl engine -t -c -pre DUMP_INFO afalg

You may get a harmless error the first time you run it (probably the algorithm was not available to the kernel when the info was collected):

Cipher AES-128-CTR, NID=904, AF_ALG info: name=rfc3686(ctr(aes)),  driver=**unreliable info** (acceleration status unknown)

Now, you should have the rfc3686 algorithms showing in /proc/crypto:

# cat /proc/crypto  | egrep rfc
name         : rfc3686(ctr(aes))
driver       : rfc3686(ctr-aes-neonbs)

You may run the tests you ran above, but if you do not configure the afalg engine to be used instead of devcrypto (just need to globally replace 'devcrypto' with 'afalg' in openssl.cnf), make sure to add -engine afalg to the openssl commands:

echo '0123456789abcdefghijklmnopqrstuvwxyz' | openssl aes-128-ctr -e -nopad -iv 0f0e0d0c0b0a09080706050403020100 -a -K 000102030405060708090a0b0c0d0e0f -engine afalg

I'll wait for your feedback.

Edited to add the paragraph about installing libopenssl-afalg.

Hopefully it answers the same question in an easier way. I've gone back to my original ipsec configuration (which failed) and run it on your patched build, and it now works. I can also see my debug statements hitting from ctr.c, so crypto_rfc3686 is being created on demand.

I'll go back and regression test with an unpatched version of the latest build to prove it is the patch that fixes it.

It does! :smiley:
All I needed was for you to instantiate the algorithm, and ipsec will do that. Now we need to actually prove it works for all cases.

See if you can document your tests, please. I will try to upstream the changes, but will not have the hardware to run tests myself. I don't foresee needing to show so much why it does not work, as it is apparent from the code itself, but we need to show that there are no side-effects due to the hardware not being capable of handling partial blocks, for example. Whatever code the engine runs inside is unknown.

OpenSSL always sends a full AES block to the kernel, even if len < 16, so it does not count. Perhaps it's best that I do write an AF_ALG test-case proving this, :thinking: but this is not happening this week for sure... See if you can document encryption/decryption of partial blocks (len < 16 bytes).

As for the openssl engine failure, it may be due to cryptodev relying on the kernel driver to update the IV, but I can't find out whether this is actually required, or expected behavior. Documentation is sparse, and each driver seems to do things differently, so it may be a coincidence that CBC works like that, perhaps because everybody is using the same logic, but not for CTR. I'll have to look further into it. I'll probably just have openssl handle the IV for all CTR cases, instead of relying on cryptodev to do it. The code for it is already there anyway, disabled by #if defined(COP_FLAG_WRITE_IV).

It seems like I did write that test before the weekend:

#include <stdint.h>
#include <sys/uio.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <linux/if_alg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct afalg_ctx_st {
  struct sockaddr_alg sa;
  int sfd, bfd;
};
typedef struct afalg_ctx_st afalg_ctx;

#ifndef USE_RFC3686
# define ALG_NAME "ctr(aes)"
# define IV_LEN 16
# define KEY_LEN 16
static char IV[] = { 0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08, 0x07,
                     0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00 };
static char KEY[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
                      0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };
#else
# define ALG_NAME "rfc3686(ctr(aes))"
# define IV_LEN 8
# define KEY_LEN 20
static char IV[] = { 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00 };
static char KEY[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
                      0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11,
                      0x12, 0x13 };
#endif

static int CipherInit(afalg_ctx *ctx, char *alg_name, char *key,
                      size_t keylen, char *iv, size_t ivlen, int enc)
{
  struct sockaddr_alg sa;
  struct msghdr msg = { 0 };
  struct cmsghdr *cmsg;
  struct af_alg_iv *aiv;
  struct iovec iov;
  int op = enc ? ALG_OP_ENCRYPT : ALG_OP_DECRYPT;
  size_t set_op_len = sizeof op;
  size_t set_iv_len = offsetof(struct af_alg_iv, iv) + ivlen;
  char buf[CMSG_SPACE(set_op_len) + CMSG_SPACE(set_iv_len)];

  ctx->bfd = ctx->sfd = -1;   /* so the err path never closes garbage fds */
  memset(&sa, 0, sizeof sa);
  sa.salg_family = AF_ALG;
  strcpy(sa.salg_type, "skcipher");
  strncpy(sa.salg_name, alg_name, sizeof sa.salg_name - 1);
  if (( ctx->bfd = socket(AF_ALG, SOCK_SEQPACKET, 0)) < 0) {
    perror("Failed to open socket");
    goto err;
  }

  if (bind(ctx->bfd, (struct sockaddr *)&sa, sizeof sa) < 0) {
    perror("Failed to bind socket");
    goto err;
  }
  /* use the caller's key rather than the file-scope KEY/KEY_LEN */
  if (setsockopt(ctx->bfd, SOL_ALG, ALG_SET_KEY, key, keylen) < 0) {
    perror("Failed to set key");
    goto err;
  }
  if ((ctx->sfd = accept(ctx->bfd, NULL, 0)) < 0) {
    perror("Socket accept failed");
    goto err;
  }
  memset(&buf, 0, sizeof buf);
  msg.msg_control = buf;
  /* set op */
  msg.msg_controllen = CMSG_SPACE(set_op_len);
  cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_ALG;
  cmsg->cmsg_type = ALG_SET_OP;
  cmsg->cmsg_len = CMSG_LEN(set_op_len);
  memcpy(CMSG_DATA(cmsg), &op, sizeof op);
  /* set IV */
  msg.msg_controllen += CMSG_SPACE(set_iv_len);
  cmsg = CMSG_NXTHDR(&msg, cmsg);
  cmsg->cmsg_level = SOL_ALG;
  cmsg->cmsg_type = ALG_SET_IV;
  cmsg->cmsg_len = CMSG_LEN(set_iv_len);
  aiv = (void *)CMSG_DATA(cmsg);
  aiv->ivlen = ivlen;   /* buf was sized from ivlen, so use it here too */
  memcpy(aiv->iv, iv, ivlen);

  iov.iov_base = NULL;
  iov.iov_len = 0;
  if (sendmsg(ctx->sfd, &msg, 0) < 0) {
    perror("sendmsg: Failed to set op, iv");
    goto err;
  }
  return 1;
err:
  if (ctx->bfd >= 0)
    close(ctx->bfd);
  if (ctx->sfd >= 0)
    close(ctx->sfd);
  ctx->bfd = ctx->sfd = -1;
  return 0;
}

static int CipherUpdate(afalg_ctx *ctx, char *out, size_t *outl,
                        const char* in, size_t inl)
{
  struct msghdr msg = { 0 };
  struct cmsghdr *cmsg;
  struct iovec iov;
  ssize_t nbytes;
  int ret = 1;

  iov.iov_base = (void *)in;
  iov.iov_len = inl;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  if ((nbytes = send(ctx->sfd, in, inl, MSG_MORE)) != (ssize_t) inl) {
    fprintf(stderr, "CipherUpdate: sent %zd bytes != inl %zu\n", nbytes, inl);
    if (nbytes <= 0)
      return 0;
    ret = 0;
  }
  if ((nbytes = read(ctx->sfd, out, (size_t) nbytes)) != (ssize_t) inl) {
    fprintf(stderr, "CipherUpdate: read %zd bytes != inl %zu\n", nbytes, inl);
    if (nbytes < 0)
      return 0;
    ret = 0;
  }
  if (outl != NULL)
    *outl = (size_t) nbytes;
  return ret;
}

static int CipherFinal(afalg_ctx *ctx)
{
   close(ctx->sfd);
   close(ctx->bfd);
   ctx->bfd = ctx->sfd = -1;
   return 1;
}

static char *CipherHex(unsigned char *text, unsigned int text_len)
{
  char *res = malloc(text_len * 3 + 1);
  char *res_ptr = res;
  unsigned char *text_ptr = text;

  if (res == NULL)
    return NULL;
  for(int i=0; i < text_len; i++) {
    snprintf(res_ptr, 4, "%02hhx ", *(text_ptr++));
    res_ptr += 3;
  }
  return res;
}

static int do_enc(char *cipher, char *key, size_t keylen, char* iv,
                  size_t ivlen, const char *text, size_t len, int enc, int n)
{
  afalg_ctx ctx;
  char *cipher_hex;
  char cipher_out[1024];
  int roundlen = len / n;

  if (!CipherInit(&ctx, cipher, key, keylen, iv, ivlen, enc)) {
    fprintf(stderr, "Error in CipherInit\n");
    return -1;
  }

  for (int i = 0; i < len; i += roundlen) {
    if (i + roundlen > len)
       roundlen = len - i;
    if(!CipherUpdate(&ctx, cipher_out, NULL, text + i, roundlen)) {
      fprintf(stderr, "Error in CipherUpdate\n");
      CipherFinal(&ctx);   /* don't leak the sockets on failure */
      return -1;
    }
    cipher_hex = CipherHex((unsigned char *)cipher_out, roundlen);
    printf("%36.*s - %s\n", roundlen, text + i, cipher_hex ? cipher_hex : "");
    free(cipher_hex);
  }
  if(!CipherFinal(&ctx)) {
    fprintf(stderr, "Error in CipherFinal_ex\n");
    return -1;
  }
  return 0;
}

int main(int argc, char **argv)
{
  afalg_ctx ctx;
  char *cipher_hex;
  char cipher_out[1024];
  char text[] = "0123456789abcdefghijklmnopqrstuvwxyz";

  for (int i=2; i < sizeof text; i += 2) {
    do_enc(ALG_NAME, KEY, KEY_LEN, IV, IV_LEN, text, i, 1, 2);
    printf("\n");
  }
  do_enc(ALG_NAME, KEY, KEY_LEN, IV, IV_LEN, text, sizeof text - 1, 1, 1);
  return 0;
}
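
For reference, the USE_RFC3686 branch of the test program above passes a 20-byte key and an 8-byte IV. As far as RFC 3686 and the kernel's rfc3686 template are understood here, the last 4 bytes of the key material are the nonce, and the 16-byte counter block is assembled as nonce, then IV, then a 32-bit big-endian block counter that starts at 1. A minimal sketch of that layout follows; rfc3686_counter_block is a made-up name, not a kernel or OpenSSL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RFC3686_NONCE_LEN 4
#define RFC3686_IV_LEN    8

/* Hypothetical helper: build the initial 16-byte counter block from the
 * key material (AES key followed by a 4-byte nonce) and the 8-byte IV. */
static void rfc3686_counter_block(const uint8_t *keymat, size_t keymat_len,
                                  const uint8_t iv[RFC3686_IV_LEN],
                                  uint8_t out[16])
{
  assert(keymat_len > RFC3686_NONCE_LEN);
  /* bytes 0..3: nonce, taken from the tail of the key material */
  memcpy(out, keymat + keymat_len - RFC3686_NONCE_LEN, RFC3686_NONCE_LEN);
  /* bytes 4..11: per-message IV */
  memcpy(out + RFC3686_NONCE_LEN, iv, RFC3686_IV_LEN);
  /* bytes 12..15: big-endian block counter, starting at 1 */
  out[12] = 0;
  out[13] = 0;
  out[14] = 0;
  out[15] = 1;
}
```

With the KEY and IV arrays from the USE_RFC3686 branch, the resulting block would be 10 11 12 13 07 06 05 04 03 02 01 00 00 00 00 01, so the ciphertexts are not expected to match the plain ctr(aes) run.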

Here's my run -- results are the same on my x86_64 desktop and on my WRT3200ACM:

$ ./staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-7.4.0_musl_eabi/bin/arm-openwrt-linux-gcc test_afalg_cipher.c -o arm-test_afalg_cipher && scp arm-test_afalg_cipher root@wrt:/root && ssh root@wrt /root/arm-test_afalg_cipher
arm-openwrt-linux-gcc: warning: environment variable 'STAGING_DIR' not defined
arm-openwrt-linux-gcc: warning: environment variable 'STAGING_DIR' not defined
arm-openwrt-linux-gcc: warning: environment variable 'STAGING_DIR' not defined
./staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-7.4.0_musl_eabi/lib/gcc/arm-openwrt-linux-muslgnueabi/7.4.0/../../../../arm-openwrt-linux-muslgnueabi/bin/ld: skipping incompatible /usr/lib/libc.so when searching for -lc
./staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-7.4.0_musl_eabi/lib/gcc/arm-openwrt-linux-muslgnueabi/7.4.0/../../../../arm-openwrt-linux-muslgnueabi/bin/ld: skipping incompatible /usr/lib/libc.a when searching for -lc
arm-test_afalg_cipher                                                                               100%   34KB   3.7MB/s   00:00
                                   0 - 10
                                   1 - 76

                                  01 - 10 98
                                  23 - 75 95

                                 012 - 10 98 cb
                                 345 - 74 92 91

                                0123 - 10 98 cb a1
                                4567 - 73 93 92 92

                               01234 - 10 98 cb a1 80
                               56789 - 72 90 93 9d 4c

                              012345 - 10 98 cb a1 80 79
                              6789ab - 71 91 9c 9c 14 3c

                             0123456 - 10 98 cb a1 80 79 6d
                             789abcd - 70 9e 9d c4 17 3d 04

                            01234567 - 10 98 cb a1 80 79 6d df
                            89abcdef - 7f 9f c5 c7 16 3a 05 22

                           012345678 - 10 98 cb a1 80 79 6d df 3c
                           9abcdefgh - 7e c7 c6 c6 11 3b 06 23 06

                          0123456789 - 10 98 cb a1 80 79 6d df 3c 26
                          abcdefghij - 26 c4 c7 c1 10 38 07 2c 07 d8

                         0123456789a - 10 98 cb a1 80 79 6d df 3c 26 9d
                         bcdefghijkl - 25 c5 c0 c0 13 39 08 2d 04 d9 fd

                        0123456789ab - 10 98 cb a1 80 79 6d df 3c 26 9d be
                        cdefghijklmn - 24 c2 c1 c3 12 36 09 2e 05 de fc 82

                       0123456789abc - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f
                       defghijklmnop - 23 c3 c2 c2 1d 37 0a 2f 02 df ff 83 39

                      0123456789abcd - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca
                      efghijklmnopqr - 22 c0 c3 cd 1c 34 0b 28 03 dc fe 9c 38 4b

                     0123456789abcde - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc
                     fghijklmnopqrst - 21 c1 cc cc 1f 35 0c 29 00 dd e1 9d 3b 4a 75

                    0123456789abcdef - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c
                    ghijklmnopqrstuv - 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29

                   0123456789abcdefg - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 20
                   hijklmnopqrstuvwx - d3 b6 92 d7 82 ae 07 d5 a1 b2 c7 2c 04 f9 02 e2 3e

                  0123456789abcdefgh - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 20 ce
                  ijklmnopqrstuvwxyz - d2 b5 93 d0 83 ad 06 ca a0 b1 c6 2b 05 fa 03 ed 3f b4

0123456789abcdefghijklmnopqrstuvwxyz - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29 cc a7 81 c6

The results mean that the IV is increased by one for each call, even when there are fewer bytes than one AES block, and even if the next call would still be under the same block. Notice that when the first block is full (3rd partial block from the bottom), the ciphertexts match the one-shot case.
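
One way to sanity-check this reading is to recover the keystream from the one-shot line (keystream = plaintext XOR ciphertext) and predict what a split run should print if every call starts on a fresh counter block. The sketch below does that for chunk sizes up to 16 bytes; predict_split is a hypothetical name, and the model (one counter block consumed per call) is the assumption being tested, not something taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK 16

/* First 32 bytes of the one-shot plaintext/ciphertext pair posted above. */
static const char pt[] = "0123456789abcdefghijklmnopqrstuv";
static const uint8_t ct[32] = {
  0x10, 0x98, 0xcb, 0xa1, 0x80, 0x79, 0x6d, 0xdf,
  0x3c, 0x26, 0x9d, 0xbe, 0x0f, 0xca, 0xfc, 0x0c,
  0x20, 0xce, 0xcd, 0xcf, 0x1e, 0x32, 0x0d, 0x2a,
  0x01, 0xc2, 0xe0, 0x9e, 0x3a, 0x4d, 0x74, 0x29
};

/* Hypothetical model: encrypt the first len bytes of pt in calls of
 * chunk bytes (chunk <= 16), assuming each call consumes one whole
 * counter block. Returns 0 if the recovered keystream is too short. */
static int predict_split(size_t len, size_t chunk, uint8_t *out)
{
  size_t block = 0;

  assert(chunk > 0 && chunk <= AES_BLOCK);
  for (size_t off = 0; off < len; off += chunk, block++) {
    size_t n = (len - off < chunk) ? len - off : chunk;

    if (block * AES_BLOCK + n > sizeof ct)
      return 0;
    for (size_t j = 0; j < n; j++) {
      size_t k = block * AES_BLOCK + j;              /* keystream position */
      uint8_t ks = (uint8_t)((uint8_t)pt[k] ^ ct[k]); /* pt XOR ct */
      out[off + j] = (uint8_t)((uint8_t)pt[off + j] ^ ks);
    }
  }
  return 1;
}
```

For example, predict_split(4, 2, out) yields 10 98 75 95, which matches the "01" and "23" lines of the run above, so the second call really did start at the second counter block.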

Your run does not necessarily need to match mine--and it probably won't because of the IV handling by the qce hardware--but the results should be the same on your openwrt before and after the patch is applied.

qce hw-crypto breaks ipsec, disabling CRYPTO_HW is needed.

Test works for patched kernel, but not for un-patched:

CipherUpdate: read -1 bytes != inl 1
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 2
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 3
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 4
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 5
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 6
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 7
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 8
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 9
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 10
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 11
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 12
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 13
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 14
Error in CipherUpdate

CipherUpdate: read -1 bytes != inl 15
Error in CipherUpdate

                    0123456789abcdef - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c
                    ghijklmnopqrstuv - 47 c1 90 f8 df 20 36 86 6b 6f 8d ae 1f da ec 1c

CipherUpdate: read 16 bytes != inl 17
Error in CipherUpdate

CipherUpdate: read 16 bytes != inl 18
Error in CipherUpdate

CipherUpdate: read 32 bytes != inl 36
Error in CipherUpdate

Also, I will confirm later, but it appears that although ipsec connects with the patch, there are error messages and packets are not flowing.

I should have realized it would not work with partial blocks :disappointed:; with the patch, it accepts operations with any length. As for the ipsec failure, maybe the bad IV handling is causing it. I have come up with a patch to compute and update the IV, software-only. I'd recommend leaving ipsec disabled during the boot, in case this causes a kernel crash upon use. You may bring it up later after it boots. If I got this right, then you should be able to reproduce my run exactly:

diff --git a/target/linux/ipq40xx/patches-4.19/183-crypto-qce-update-ctr-mode-iv.patch b/target/linux/ipq40xx/patches-4.19/183-crypto-qce-update-ctr-mode-iv.patch
new file mode 100644
index 0000000000..198e1ac268
--- /dev/null
+++ b/target/linux/ipq40xx/patches-4.19/183-crypto-qce-update-ctr-mode-iv.patch
@@ -0,0 +1,36 @@
+diff --git a/drivers/crypto/qce/ablkcipher.c b/drivers/crypto/qce/ablkcipher.c
+index 7a98bf5cc967..decbfaf3feeb 100644
+--- a/drivers/crypto/qce/ablkcipher.c
++++ b/drivers/crypto/qce/ablkcipher.c
+@@ -14,6 +14,20 @@
+ 
+ static LIST_HEAD(ablkcipher_algs);
+ 
++static void qce_update_ctr_iv(u8 *iv, unsigned int ivsize, int blocksize,
++			      unsigned int cryptlen)
++{
++	unsigned int nblocks;
++
++	nblocks = DIV_ROUND_UP(cryptlen, blocksize);
++	do {
++		ivsize--;
++		nblocks += iv[ivsize];
++		iv[ivsize] = (u8) nblocks;
++		nblocks >>= 8;
++	} while (ivsize);
++}
++
+ static void qce_ablkcipher_done(void *data)
+ {
+ 	struct crypto_async_request *async_req = data;
+@@ -45,6 +59,10 @@ static void qce_ablkcipher_done(void *data)
+ 	if (error < 0)
+ 		dev_dbg(qce->dev, "ablkcipher operation error (%x)\n", status);
+ 
++	if (IS_CTR(rctx->flags) && IS_AES(rctx->flags))
++		qce_update_ctr_iv(rctx->iv, rctx->ivsize, AES_BLOCK_SIZE,
++				  rctx->cryptlen);
++
+ 	qce->async_req_done(tmpl->qce, error);
+ }
+ 

Please try the previous test-program again after applying the above patch. It should have the same output as I posted above.
After that, I have another program to test CTR along with CBC-mode, to see if CBC needs IV handling as well--I have a thin hope that it may be at least partly responsible for the problem reported by @cwbsw:

#include <stdint.h>
#include <sys/uio.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <linux/if_alg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct afalg_ctx_st {
  struct sockaddr_alg sa;
  int sfd, bfd;
};
typedef struct afalg_ctx_st afalg_ctx;

#define IV_LEN 16
#define KEY_LEN 16
static char IV[] = { 0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08, 0x07,
                     0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00 };
static char KEY[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
                      0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };

static int CipherInit(afalg_ctx *ctx, char *alg_name, char *key,
                      size_t keylen, char *iv, size_t ivlen, int enc)
{
  struct sockaddr_alg sa;
  struct msghdr msg = { 0 };
  struct cmsghdr *cmsg;
  struct af_alg_iv *aiv;
  struct iovec iov;
  int op = enc ? ALG_OP_ENCRYPT : ALG_OP_DECRYPT;
  size_t set_op_len = sizeof op;
  size_t set_iv_len = offsetof(struct af_alg_iv, iv) + ivlen;
  char buf[CMSG_SPACE(set_op_len) + CMSG_SPACE(set_iv_len)];

  ctx->bfd = ctx->sfd = -1;   /* so the err path never closes garbage fds */
  memset(&sa, 0, sizeof sa);
  sa.salg_family = AF_ALG;
  strcpy(sa.salg_type, "skcipher");
  strncpy(sa.salg_name, alg_name, sizeof sa.salg_name - 1);
  if (( ctx->bfd = socket(AF_ALG, SOCK_SEQPACKET, 0)) < 0) {
    perror("Failed to open socket");
    goto err;
  }

  if (bind(ctx->bfd, (struct sockaddr *)&sa, sizeof sa) < 0) {
    perror("Failed to bind socket");
    goto err;
  }
  /* use the caller's key rather than the file-scope KEY/KEY_LEN */
  if (setsockopt(ctx->bfd, SOL_ALG, ALG_SET_KEY, key, keylen) < 0) {
    perror("Failed to set key");
    goto err;
  }
  if ((ctx->sfd = accept(ctx->bfd, NULL, 0)) < 0) {
    perror("Socket accept failed");
    goto err;
  }
  memset(&buf, 0, sizeof buf);
  msg.msg_control = buf;
  /* set op */
  msg.msg_controllen = CMSG_SPACE(set_op_len);
  cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_ALG;
  cmsg->cmsg_type = ALG_SET_OP;
  cmsg->cmsg_len = CMSG_LEN(set_op_len);
  memcpy(CMSG_DATA(cmsg), &op, sizeof op);
  /* set IV */
  msg.msg_controllen += CMSG_SPACE(set_iv_len);
  cmsg = CMSG_NXTHDR(&msg, cmsg);
  cmsg->cmsg_level = SOL_ALG;
  cmsg->cmsg_type = ALG_SET_IV;
  cmsg->cmsg_len = CMSG_LEN(set_iv_len);
  aiv = (void *)CMSG_DATA(cmsg);
  aiv->ivlen = ivlen;   /* buf was sized from ivlen, so use it here too */
  memcpy(aiv->iv, iv, ivlen);

  iov.iov_base = NULL;
  iov.iov_len = 0;
  if (sendmsg(ctx->sfd, &msg, 0) < 0) {
    perror("sendmsg: Failed to set op, iv");
    goto err;
  }
  return 1;
err:
  if (ctx->bfd >= 0)
    close(ctx->bfd);
  if (ctx->sfd >= 0)
    close(ctx->sfd);
  ctx->bfd = ctx->sfd = -1;
  return 0;
}

static int CipherUpdate(afalg_ctx *ctx, char *out, size_t *outl,
                        const char* in, size_t inl)
{
  struct msghdr msg = { 0 };
  struct cmsghdr *cmsg;
  struct iovec iov;
  ssize_t nbytes;
  int ret = 1;

  iov.iov_base = (void *)in;
  iov.iov_len = inl;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  if ((nbytes = send(ctx->sfd, in, inl, MSG_MORE)) != (ssize_t) inl) {
    fprintf(stderr, "CipherUpdate: sent %zd bytes != inl %zu\n", nbytes, inl);
    if (nbytes <= 0)
      return 0;
    ret = 0;
  }
  if ((nbytes = read(ctx->sfd, out, (size_t) nbytes)) != (ssize_t) inl) {
    fprintf(stderr, "CipherUpdate: read %zd bytes != inl %zu\n", nbytes, inl);
    if (nbytes < 0)
      return 0;
    ret = 0;
  }
  if (outl != NULL)
    *outl = (size_t) nbytes;
  return ret;
}

static int CipherFinal(afalg_ctx *ctx)
{
   close(ctx->sfd);
   close(ctx->bfd);
   ctx->bfd = ctx->sfd = -1;
   return 1;
}

static char *CipherHex(unsigned char *text, unsigned int text_len)
{
  char *res = malloc(text_len * 3 + 1);
  char *res_ptr = res;
  unsigned char *text_ptr = text;

  if (res == NULL)
    return NULL;
  for(int i=0; i < text_len; i++) {
    snprintf(res_ptr, 4, "%02hhx ", *(text_ptr++));
    res_ptr += 3;
  }
  return res;
}

static int do_enc(char *cipher, char *key, size_t keylen, char* iv,
                  size_t ivlen, const char *text, size_t len, int enc, int n)
{
  afalg_ctx ctx;
  char *cipher_hex;
  char cipher_out[1024];
  int roundlen = len / n;
  size_t outl = 0;   /* CipherUpdate may fail before it sets this */

  if (!CipherInit(&ctx, cipher, key, keylen, iv, ivlen, enc)) {
    fprintf(stderr, "Error in CipherInit\n");
    return -1;
  }

  for (int i = 0; i < len; i += roundlen) {
    if (i + roundlen > len)
       roundlen = len - i;
    if(!CipherUpdate(&ctx, cipher_out, &outl, text + i, roundlen)) {
      fprintf(stderr, "Error in CipherUpdate\n");
      if (outl < 1)
        return -1;
    }
    cipher_hex = CipherHex((unsigned char *)cipher_out, outl);
    printf("%s: %.*s - %s\n", cipher, roundlen, text + i,
           cipher_hex ? cipher_hex : "");
    free(cipher_hex);
  }
  if(!CipherFinal(&ctx)) {
    fprintf(stderr, "Error in CipherFinal_ex\n");
    return -1;
  }
  return 0;
}

int main(int argc, char **argv)
{
  afalg_ctx ctx;
  char *cipher_hex;
  char cipher_out[1024];
  char text[] = "0123456789abcdefghijklmnopqrstuv"
                "0123456789abcdefghijklmnopqrstuv";

  do_enc("cbc(aes)", KEY, KEY_LEN, IV, IV_LEN, text, sizeof text - 1, 1, 2);
  printf("\n");
  do_enc("ctr(aes)", KEY, KEY_LEN, IV, IV_LEN, text, sizeof text - 1, 1, 2);
  printf("\n");
  do_enc("cbc(aes)", KEY, KEY_LEN, IV, IV_LEN, text, sizeof text - 1, 1, 4);
  printf("\n");
  do_enc("ctr(aes)", KEY, KEY_LEN, IV, IV_LEN, text, sizeof text - 1, 1, 4);
  printf("\n");
  return 0;
}

Here's my output:

cbc(aes): 0123456789abcdefghijklmnopqrstuv - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98 89 c5 61 23 4e d8 19 f1 32 57 3e fe 59 de 7a 21
cbc(aes): 0123456789abcdefghijklmnopqrstuv - e1 81 26 cd 21 30 f9 a8 5f e0 7b bc 0f bc e8 01 ca 04 03 03 14 e1 03 02 a0 32 af a5 48 e7 37 5d

ctr(aes): 0123456789abcdefghijklmnopqrstuv - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29
ctr(aes): 0123456789abcdefghijklmnopqrstuv - 8b ee ca 8f da f6 5f 8d e9 fa d4 3d 13 e8 11 f3 21 a6 f6 3f 97 ba a5 c8 3b 28 5d 3f fd 13 7a 3d

cbc(aes): 0123456789abcdef - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98
cbc(aes): ghijklmnopqrstuv - 89 c5 61 23 4e d8 19 f1 32 57 3e fe 59 de 7a 21
cbc(aes): 0123456789abcdef - e1 81 26 cd 21 30 f9 a8 5f e0 7b bc 0f bc e8 01
cbc(aes): ghijklmnopqrstuv - ca 04 03 03 14 e1 03 02 a0 32 af a5 48 e7 37 5d

ctr(aes): 0123456789abcdef - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c
ctr(aes): ghijklmnopqrstuv - 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29
ctr(aes): 0123456789abcdef - 8b ee ca 8f da f6 5f 8d e9 fa d4 3d 13 e8 11 f3
ctr(aes): ghijklmnopqrstuv - 21 a6 f6 3f 97 ba a5 c8 3b 28 5d 3f fd 13 7a 3d

PS: The patch is to be applied along with the previous one. As before, expect some fuzziness when applying it. Adjust line numbers as appropriate. I have compile-tested it here.

Edited again to use tabs instead of spaces, so that hopefully, copy & paste will work; also, notice that unlike last time, this patch applies to openwrt, not directly to kernel.

Thank you for the patch. Keen to test, but could you clarify how to apply it, please? You mention it's an openwrt patch rather than a kernel one. Where should I put it? Do I need to use quilt?

Am I mistaken in thinking it's a patch to create a patch?

Sorry for all the questions. TIA.

Yes, it’s a patch to create a patch :slight_smile: Just feed it to git apply, or patch -p1.

I'm now getting same results as you from arm-test_afalg_cipher. Will test ipsec and your other test shortly and post results on those.


The results of the second test aren't the same I'm afraid:

cbc(aes): 0123456789abcdefghijklmnopqrstuv - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98 89 c5 61 23 4e d8 19 f1 32 57 3e fe 59 de 7a 21 
cbc(aes): 0123456789abcdefghijklmnopqrstuv - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98 89 c5 61 23 4e d8 19 f1 32 57 3e fe 59 de 7a 21 

ctr(aes): 0123456789abcdefghijklmnopqrstuv - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29 
ctr(aes): 0123456789abcdefghijklmnopqrstuv - 8b ee ca 8f da f6 5f 8d e9 fa d4 3d 13 e8 11 f3 21 a6 f6 3f 97 ba a5 c8 3b 28 5d 3f fd 13 7a 3d 

cbc(aes): 0123456789abcdef - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98 
cbc(aes): ghijklmnopqrstuv - 2c 0e 57 9e 86 ad 80 d7 54 61 84 bc a7 6e bf dd 
cbc(aes): 0123456789abcdef - ff 14 db e4 05 cc 0e e2 4d 0d e4 12 89 f0 fc 98 
cbc(aes): ghijklmnopqrstuv - 2c 0e 57 9e 86 ad 80 d7 54 61 84 bc a7 6e bf dd 

ctr(aes): 0123456789abcdef - 10 98 cb a1 80 79 6d df 3c 26 9d be 0f ca fc 0c 
ctr(aes): ghijklmnopqrstuv - 20 ce cd cf 1e 32 0d 2a 01 c2 e0 9e 3a 4d 74 29 
ctr(aes): 0123456789abcdef - 8b ee ca 8f da f6 5f 8d e9 fa d4 3d 13 e8 11 f3 
ctr(aes): ghijklmnopqrstuv - 21 a6 f6 3f 97 ba a5 c8 3b 28 5d 3f fd 13 7a 3d 

Thanks for testing this. The result is OK: it confirms the need to update the IV for cbc-mode as well, which was the purpose of the test. It may well be the cause of the failure @cwbsw mentioned earlier.
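
For CBC the chaining rule itself is simple: the IV for the next call is the last ciphertext block of the current one, in both directions. On decryption that block comes from the input, so it has to be copied before an in-place operation overwrites it. A minimal userspace sketch of the rule follows; cbc_next_iv is a made-up name, and cbc_ct holds the first cbc(aes) output line shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK 16

/* Ciphertext of the first 32-byte cbc(aes) call from the output above. */
static const uint8_t cbc_ct[32] = {
  0xff, 0x14, 0xdb, 0xe4, 0x05, 0xcc, 0x0e, 0xe2,
  0x4d, 0x0d, 0xe4, 0x12, 0x89, 0xf0, 0xfc, 0x98,
  0x89, 0xc5, 0x61, 0x23, 0x4e, 0xd8, 0x19, 0xf1,
  0x32, 0x57, 0x3e, 0xfe, 0x59, 0xde, 0x7a, 0x21
};

/* Hypothetical helper: copy the last ciphertext block into iv so that the
 * next call continues the chain. Returns 0 if len is not a positive
 * multiple of the block size. */
static int cbc_next_iv(const uint8_t *ciphertext, size_t len,
                       uint8_t iv[AES_BLOCK])
{
  assert(ciphertext != NULL && iv != NULL);
  if (len < AES_BLOCK || len % AES_BLOCK != 0)
    return 0;
  memcpy(iv, ciphertext + len - AES_BLOCK, AES_BLOCK);
  return 1;
}
```

After the first call, the next IV would be 89 c5 61 23 ... 7a 21. A driver that leaves the IV untouched instead starts the second call from the original IV, which would explain the second cbc(aes) line being identical to the first.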

Here's the new openwrt kernel patch, to be applied like the last time. Make sure to remove the previous patch (rm -f target/linux/ipq40xx/patches-4.19/183-crypto-qce-update-ctr-mode-iv.patch). Notice the name change; there should only be one patch starting with 183:

--- /dev/null
+++ b/target/linux/ipq40xx/patches-4.19/183-crypto-qce-update-iv.patch
@@ -0,0 +1,68 @@
+diff --git a/drivers/crypto/qce/ablkcipher.c b/drivers/crypto/qce/ablkcipher.c
+index 7a98bf5cc967..b935ce0acc1c 100644
+--- a/drivers/crypto/qce/ablkcipher.c
++++ b/drivers/crypto/qce/ablkcipher.c
+@@ -14,6 +14,20 @@
+ 
+ static LIST_HEAD(ablkcipher_algs);
+ 
++static void qce_update_ctr_iv(u8 *iv, unsigned int ivsize, u32 add)
++{
++	__be32 *a = (__be32 *)(iv + ivsize);
++	u32 b;
++
++	for (; ivsize >= 4; ivsize -= 4) {
++		b = be32_to_cpu(*--a) + add;
++		*a = cpu_to_be32(b);
++		if (b >= add)
++			return;
++		add = 1;
++	}
++}
++
+ static void qce_ablkcipher_done(void *data)
+ {
+ 	struct crypto_async_request *async_req = data;
+@@ -39,6 +53,18 @@ static void qce_ablkcipher_done(void *data)
+ 		dma_unmap_sg(qce->dev, rctx->src_sg, rctx->src_nents, dir_src);
+ 	dma_unmap_sg(qce->dev, rctx->dst_sg, rctx->dst_nents, dir_dst);
+ 
++	if (IS_CBC(rctx->flags)) {
++		if (IS_ENCRYPT(rctx->flags))
++			sg_pcopy_to_buffer(rctx->dst_sg, rctx->dst_nents,
++					   rctx->iv, rctx->ivsize,
++					   rctx->cryptlen - rctx->ivsize);
++		else
++			memcpy(rctx->iv, rctx->saved_iv, rctx->ivsize);
++	} else if (IS_CTR(rctx->flags) && IS_AES(rctx->flags)) {
++		qce_update_ctr_iv(rctx->iv, rctx->ivsize,
++				  DIV_ROUND_UP(rctx->cryptlen, AES_BLOCK_SIZE));
++	}
++
+ 	sg_free_table(&rctx->dst_tbl);
+ 
+ 	error = qce_check_status(qce, &status);
+@@ -131,6 +157,11 @@ qce_ablkcipher_async_req_handle(struct crypto_async_request *async_req)
+ 
+ 	qce_dma_issue_pending(&qce->dma);
+ 
++	if (IS_CBC(rctx->flags) && IS_DECRYPT(rctx->flags))
++		sg_pcopy_to_buffer(rctx->src_sg, rctx->src_nents,
++				   rctx->saved_iv, rctx->ivsize,
++				   rctx->cryptlen - rctx->ivsize);
++
+ 	ret = qce_start(async_req, tmpl->crypto_alg_type, req->nbytes, 0);
+ 	if (ret)
+ 		goto error_terminate;
+diff --git a/drivers/crypto/qce/cipher.h b/drivers/crypto/qce/cipher.h
+index 5cab8f0706a8..a919022e28df 100644
+--- a/drivers/crypto/qce/cipher.h
++++ b/drivers/crypto/qce/cipher.h
+@@ -43,6 +43,7 @@ struct qce_cipher_reqctx {
+ 	struct sg_table src_tbl;
+ 	struct scatterlist *src_sg;
+ 	unsigned int cryptlen;
++	u8 saved_iv[QCE_MAX_IV_SIZE];
+ };
+ 
+ static inline struct qce_alg_template *to_cipher_tmpl(struct crypto_tfm *tfm)

The tests are getting serious. I've added more standard test vectors, and it got quite big. So instead of posting over a thousand lines of code here, I saved it in a github repo:

I gave this patch a try on my ipq4029 router, but the test program outputs many errors like:

Running using 8-byte updates
Encryption:
Failed to open socket: Address family not supported by protocol
Error in CipherInit
Decryption:
Failed to open socket: Address family not supported by protocol
Error in CipherInit

While on my x86 machine, it outputs 'all tests passed'

Ensure you have the kmod-crypto-user package installed. It brings support for the AF_ALG address family, the lack of which is the source of your error message.
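
A quick way to check for AF_ALG support before running the full test suite is to try opening such a socket. This is only a sketch: have_af_alg is a hypothetical helper, and the fallback value 38 for AF_ALG is the one from the Linux headers, supplied in case the libc headers do not define it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_ALG
# define AF_ALG 38   /* assumed value, taken from the Linux headers */
#endif

/* Hypothetical probe: returns 1 if an AF_ALG socket can be created,
 * 0 otherwise (the reason is printed through perror). */
static int have_af_alg(void)
{
  int fd = socket(AF_ALG, SOCK_SEQPACKET, 0);

  if (fd < 0) {
    perror("AF_ALG socket");
    return 0;
  }
  close(fd);
  return 1;
}
```

If the address family is missing, the message printed should be the same kind of "Address family not supported by protocol" error as in the output quoted above.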

Notice that the patch to change AES-CTR block size also needs to be applied to pass the CTR tests.

Thanks for the explanation.

I applied 182-crypto-qce-fix-ctr-blocksize.patch and 183-crypto-qce-update-ctr-mode-iv.patch, and compiled another firmware. The test program passed; unfortunately, the ipsec performance issue still exists.

So does this mean that more patches are needed, or that the bug you found is not related to my issue?

Thanks for testing this. Have you tried to use AES-CTR in IPSEC, to finish up testing my current patches, and to see if performance improves--I'm assuming you've been using AES-CBC?
It seems that the (at least CBC-mode) performance problem lies somewhere else. My skills are stretched thin already. I can take another look at the module, to try to catch anything else. The problem may be in the hardware itself, in which case there's little I can do. What ciphersuite is your ipsec esp using? I'll take that as a starting point.