Well folks, I did a quick test of OpenVPN using the NSS crypto AEAD cipher (i.e. aes-128-cbc-hmac-sha1). As I suspected, the performance is not good. Below are the iperf3 results for the following setup:
iPad <-- WiFi --> R7800 <-- OpenVPN tunnel/LAN --> iMac
Results:
Without OVPN : 500 Mbps
With OVPN-OpenSSL : 50 Mbps - CPU 60-70% loaded
With OVPN-NSS-AEAD : 20 Mbps - CPU 30-40% loaded
With OVPN-NSS-CBC : 15 Mbps - CPU 30-40% loaded
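To put the throughput and CPU figures on one scale, here is a rough calculation of Mbps delivered per percent of CPU load. It is only a sketch: taking the midpoint of each CPU range is my simplification, and the names `results` and `mbps_per_cpu_percent` are made up for illustration.

```python
# Throughput (Mbps) and CPU load range (%) copied from the results above.
results = {
    "OVPN-OpenSSL": (50, (60, 70)),
    "OVPN-NSS-AEAD": (20, (30, 40)),
    "OVPN-NSS-CBC": (15, (30, 40)),
}


def mbps_per_cpu_percent(mbps, cpu_range):
    """Throughput divided by the midpoint of the CPU range (an assumption)."""
    low, high = cpu_range
    return mbps / ((low + high) / 2)


efficiency = {
    name: mbps_per_cpu_percent(mbps, cpu_range)
    for name, (mbps, cpu_range) in results.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} Mbps per % CPU")
```

Read this way, and assuming the CPU figures are comparable across runs, the NSS runs move less data per unit of CPU than the OpenSSL run, not just less data overall.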
It appears that transferring buffers between user space and kernel space is the limiting factor. With NSS crypto this penalty is doubled: the buffer crosses over once to reach the NSS crypto engine, and a second time when the encrypted/decrypted buffer is sent to the network socket for routing.
The next step would probably be to write a virtual interface driver that performs encryption/decryption in the kernel before handing packets over to the OpenVPN application in user space. This would probably keep throughput comparable to OpenSSL software crypto, but should bring down the CPU load.
The ideal solution is to bring everything into kernel space, with the OpenVPN application managing the control plane.
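As a rough way to compare the designs discussed here, the sketch below counts how many times per packet a buffer is handed from user space into the kernel. The step lists are my reading of the post, not measured behaviour, and it deliberately ignores the reads coming back the other way. `DATA_PATHS` and `handoffs_per_packet` are hypothetical names.

```python
# Per-packet buffer hand-offs from user space into the kernel, per design.
# Each entry lists where the user-space OpenVPN process sends a buffer.
# This is a simplified model, not a trace of real OpenVPN behaviour.
DATA_PATHS = {
    "OpenSSL (software crypto)": ["network socket"],
    "NSS crypto from user space": ["NSS crypto engine", "network socket"],
    "in-kernel crypto via virtual interface": ["network socket"],
    "everything in kernel": [],
}


def handoffs_per_packet(path_name):
    """Number of user-to-kernel buffer hand-offs for one packet."""
    return len(DATA_PATHS[path_name])


baseline = handoffs_per_packet("OpenSSL (software crypto)")

for name in DATA_PATHS:
    count = handoffs_per_packet(name)
    print(f"{name}: {count} hand-off(s), {count / baseline:.0f}x the OpenSSL path")
```

Under this model the user-space NSS path costs twice the hand-offs of the OpenSSL path, the virtual interface driver gets back to the same count as OpenSSL, and the all-kernel design removes them from the data path entirely.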