@Ansuel, I got the multi-segment patch done and ran some benchmarks. As expected, performance is generally worse than software, and small payloads fare far worse than larger ones. The only consolation is that Krait CPU utilisation stays low while ciphering is performed.
Results are shown below:
With NSS crypto engine:
cmd: openssl speed -multi <#> -elapsed -evp aes-128-cbc
type             16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
multi 1 - evp    57.71k       228.33k      901.21k      3721.22k     25534.46k    42172.42k
multi 2 - evp    115.31k      474.11k      1788.84k     7097.69k     48073.39k    82209.45k
multi 4 - evp    213.83k      804.63k      3343.10k     13139.97k    87851.01k    154785.11k
multi 8 - evp    372.37k      1520.49k     5923.13k     24416.26k    163594.24k   279426.14k
With software based crypto engine:
cmd: openssl speed -multi <#> -elapsed aes-128-cbc
type                    16 bytes     64 bytes     256 bytes    1024 bytes   8192 bytes   16384 bytes
multi 1 - aes-128 cbc   67744.05k    76908.97k    80787.80k    81865.39k    81619.63k    81324.71k
multi 2 - aes-128 cbc   126358.25k   146981.03k   153913.83k   156381.53k   155462.31k   153796.61k
multi 4 - aes-128 cbc   115177.91k   146950.73k   154273.49k   155684.82k   155689.95k   156975.10k
multi 8 - aes-128 cbc   114552.63k   146711.88k   153911.55k   155999.50k   156242.05k   156512.62k
As expected, with 2 or more threads the software-based crypto hits its throughput ceiling, with both CPU cores at 100% throughout the benchmark.
I couldn't go to 16 threads because the NSS crypto driver complains that it cannot allocate crypto sessions.
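For anyone who wants to compare the two tables side by side, here is a rough Python sketch that divides the NSS throughput by the software throughput for each block size and thread count. The numbers are copied straight from the tables above (openssl reports them in thousands of bytes per second). The names `ratios` and `crossover` are my own and purely illustrative.

```python
# Throughput in 1000s of bytes per second, copied from the two tables above.
SIZES = [16, 64, 256, 1024, 8192, 16384]

NSS = {  # openssl speed -multi <#> -elapsed -evp aes-128-cbc
    1: [57.71, 228.33, 901.21, 3721.22, 25534.46, 42172.42],
    2: [115.31, 474.11, 1788.84, 7097.69, 48073.39, 82209.45],
    4: [213.83, 804.63, 3343.10, 13139.97, 87851.01, 154785.11],
    8: [372.37, 1520.49, 5923.13, 24416.26, 163594.24, 279426.14],
}

SOFTWARE = {  # openssl speed -multi <#> -elapsed aes-128-cbc
    1: [67744.05, 76908.97, 80787.80, 81865.39, 81619.63, 81324.71],
    2: [126358.25, 146981.03, 153913.83, 156381.53, 155462.31, 153796.61],
    4: [115177.91, 146950.73, 154273.49, 155684.82, 155689.95, 156975.10],
    8: [114552.63, 146711.88, 153911.55, 155999.50, 156242.05, 156512.62],
}


def ratios(threads):
    """NSS throughput divided by software throughput, one value per block size."""
    return [hw / sw for hw, sw in zip(NSS[threads], SOFTWARE[threads])]


def crossover(threads):
    """Smallest tested block size where NSS is at least as fast as software.

    Returns None if NSS never catches up at this thread count.
    """
    for size, ratio in zip(SIZES, ratios(threads)):
        if ratio >= 1.0:
            return size
    return None


if __name__ == "__main__":
    for threads in sorted(NSS):
        line = "  ".join(f"{size}B: {r:.3f}" for size, r in zip(SIZES, ratios(threads)))
        print(f"multi {threads}: {line}  crossover: {crossover(threads)}")
```

With the figures above, the only thread count at which NSS reaches software throughput is 8, starting at the 8192-byte block size. Everywhere else the ratio stays below 1.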
@drbrains the results look comparable to the eip-93 engine you developed for the mt7621a SoC at 1 thread, but the beefier ipq806x SoC, with its 4 crypto engines, pulls ahead when benchmarked with multiple threads. I read the performance benchmark doc you pushed to your GitHub repo. Have you made any progress on improving the performance of the eip-93 driver? I could learn from your approach if you've made a breakthrough.