Raspberry Pi4 OpenVPN performance tuning

Hello OpenWRT forum!

I recently adopted OpenWRT for a secure travel router project, using a Raspberry Pi CM4 on a DFRobot dual-GbE carrier board. I have SMB set up for media sharing off the SD card, and OpenVPN (PIA) set as the only routable connection allowed through the firewall for the LAN network.

I have seen some older posts indicating close to 100 Mbps OpenVPN performance on the Pi 4. Currently, even with tuning, I am only seeing 38 Mbps via speedtest-cli (running directly on OpenWRT to eliminate any routing overhead). The tuning parameters I have added are:

sndbuf 512000
rcvbuf 512000
txqueuelen 2000
compress (no compression)

I also tried adding "fast-io" and toying with compression (both lzo and lz4) options to no effect.

Is there something I'm missing (a driver, a setting, etc.)? Or is the 38 Mbps I'm seeing more or less expected from the Pi 4's SoC?

Could irqbalance help? Is it possible that OpenVPN/OpenSSL is sticking to the same CPU core that handles the NIC traffic?
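One way to check the second question is to see which CPU cores are actually servicing the NIC interrupts. Below is a minimal, hypothetical helper, assuming Python 3 is available (it is an optional package on OpenWrt) and that the Ethernet IRQs appear with names like "eth0" in /proc/interrupts; the names on the CM4 carrier board may differ. It totals the interrupt counts per CPU column for the matching rows.

```python
import re
from typing import Dict


def irq_counts_per_cpu(interrupts_text: str, name_pattern: str) -> Dict[str, int]:
    """Sum interrupt counts per CPU for rows whose description matches name_pattern.

    interrupts_text is expected in the /proc/interrupts layout: a header row of
    CPU names, then one row per IRQ with a label, one count per CPU, and a
    free-form description (controller, hwirq, trigger type, device name).
    """
    lines = interrupts_text.strip("\n").splitlines()
    cpus = lines[0].split()  # e.g. ["CPU0", "CPU1", "CPU2", "CPU3"]
    totals = {cpu: 0 for cpu in cpus}
    pattern = re.compile(name_pattern)
    for line in lines[1:]:
        fields = line.split()
        counts = fields[1:1 + len(cpus)]
        # Skip rows without a full set of numeric per-CPU counts (e.g. "Err:").
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        description = " ".join(fields[1 + len(cpus):])
        if pattern.search(description):
            for cpu, count in zip(cpus, counts):
                totals[cpu] += int(count)
    return totals
```

On the router, calling irq_counts_per_cpu(open("/proc/interrupts").read(), r"\beth\d+") would show whether nearly all NIC interrupts land on a single core, which is the kind of imbalance irqbalance is meant to spread out.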

The main workload for OpenVPN is encryption and compression, both of which depend heavily on CPU and memory speed. The command "openssl speed aes" will give you insight into the CPU's processing power.

An example from my SoC:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      20903.87k    22645.43k    23465.61k    23576.58k    23661.23k
aes-192 cbc      18097.18k    19552.68k    19998.89k    20114.77k    20149.59k
aes-256 cbc      16147.54k    17266.71k    17638.49k    17737.39k    17760.26k

For comparison, a Core i7 laptop:

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc     183127.75k   194145.97k   189470.38k   192785.41k   198868.99k   200447.32k
aes-192 cbc     165710.87k   171087.18k   170660.61k   171523.07k   172512.60k   172321.45k
aes-256 cbc     145358.28k   148898.03k   148209.92k   146003.63k   144547.84k   144703.49k

This means roughly 16 MB per second for an Arm A20 chip and 145 MB per second for a Core i7 computer.

16 MB/s is about 128 Mbps, and 145 MB/s is about 1160 Mbps.
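The same conversion can be applied to any row of the benchmark output. A minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s) and the "1000s of bytes per second" scale stated in the openssl speed header; parse_speed_row and kbytes_to_mbps are illustrative names, not part of OpenSSL.

```python
import re
from typing import List, Tuple

# One value column of `openssl speed` output, e.g. "16147.54k" (1000s of bytes/s).
_VALUE = re.compile(r"(\d+(?:\.\d+)?)k")


def kbytes_to_mbps(kbytes_per_s: float) -> float:
    """Convert 1000s of bytes per second to megabits per second (decimal units)."""
    return kbytes_per_s * 1000 * 8 / 1_000_000


def parse_speed_row(row: str) -> Tuple[str, List[float]]:
    """Split a result row into (algorithm name, per-block-size throughput in Mbps)."""
    name_parts: List[str] = []
    mbps: List[float] = []
    for token in row.split():
        match = _VALUE.fullmatch(token)
        if match:
            mbps.append(kbytes_to_mbps(float(match.group(1))))
        else:
            name_parts.append(token)
    return " ".join(name_parts), mbps


if __name__ == "__main__":
    row = "aes-256 cbc      16147.54k    17266.71k    17638.49k    17737.39k    17760.26k"
    name, values = parse_speed_row(row)
    print(name, [round(v) for v in values])
```

For example, the A20's aes-256 cbc figure of 16147.54k works out to about 129 Mbps at 16-byte blocks, and the Core i7's 145358.28k to about 1163 Mbps.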

And this is just the processing power for encryption; you also have to add the overhead of traffic handling and compression.

Here are my AES numbers. I am using AES-GCM (it seemed ~4-5% faster than CBC in my initial tests, but it was the first thing I adjusted, so I'm happy to try CBC again if you think it would help):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      71986.42k    75791.15k    78249.30k    78879.06k    79050.07k    78959.96k
aes-192 cbc      63435.66k    66721.00k    68621.82k    69067.43k    69273.09k    69156.86k
aes-256 cbc      57625.96k    59588.86k    61088.26k    61480.07k    61221.55k    61407.23k

It seems the Pi 4 is able to push about 72 MB/s (~575 Mbps) of AES, so is there some other bottleneck or configuration issue I may be running into?

From my RPi4 running 21.02.1

OpenSSL 1.1.1l  24 Aug 2021
built on: Wed Oct 27 18:49:44 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr) 
compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -DOPENSSL_SMALL_FOOTPRINT
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128 cbc      95419.15k   100673.37k   104400.32k   104724.38k   105473.37k   105903.74k
aes-192 cbc      84373.11k    88429.55k    91483.53k    92181.99k    92390.92k    92292.88k
aes-256 cbc      76293.41k    79045.75k    81544.91k    81726.54k    82054.70k    81996.71k

To add: I am able to route full gigabit (940 Mbps) between the Pi's two interfaces without the VPN connected, and to max out my internet connection (~450 Mbps) with speedtest-cli. I can certainly accept an answer that the combined overhead of the two services results in a large reduction in performance. However, seeing less than 1/10th of the OpenSSL benchmark speed (and about 1/20th of the non-tunneled routing speed) makes me think I may be missing something.
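The ratios in the paragraph above can be checked quickly. A throwaway sketch using only the numbers quoted in this thread (38 Mbps measured, ~575 Mbps from the aes-128 cbc 16-byte benchmark, 940 Mbps routed without the VPN); throughput_share is an illustrative name.

```python
from typing import Tuple


def throughput_share(measured_mbps: float, reference_mbps: float) -> Tuple[float, float]:
    """Return (measured as a fraction of reference, N such that measured is about 1/N of it)."""
    fraction = measured_mbps / reference_mbps
    return fraction, 1 / fraction


if __name__ == "__main__":
    measured = 38.0  # OpenVPN over UDP, via speedtest-cli
    references = {
        "OpenSSL aes-128 cbc, 16-byte blocks": 575.0,
        "routed between the two ports, no VPN": 940.0,
    }
    for label, ref in references.items():
        fraction, n = throughput_share(measured, ref)
        print(f"{label}: {fraction:.1%} (about 1/{n:.0f})")
```

This works out to roughly 1/15 of the benchmark figure and 1/25 of the routed figure.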

Are you using active cooling? I am on the DFRobot customized build (21.02.0), which is one version behind, and my numbers are considerably worse than yours, though I still feel like OpenVPN could perform better.

Consider changing the VPN/tunnel server, protocol, or port.
Try a less loaded or geographically closer server.
Try port 443 vs. other ports, TCP vs. UDP, OpenVPN vs. WireGuard, etc.

^^^ This. If you can use WireGuard, it is likely to perform much better.

Unfortunately, PIA doesn't offer WireGuard outside of their official client (without some fairly expert, officially unsupported configuration), but TCP is something I can do.

Switching over to TCP has massively improved throughput, by 2.5-3x! I am now seeing 75-90 Mbps over the tunnel.

It seems that either this build or the Pi itself struggles with UDP. I even made sure flood protection was off, to be sure that wasn't the issue. I would be interested to hear if there is anything I can look at to improve UDP performance, but I am more or less satisfied, since my throughput is much higher than I could expect from any public/hotel Wi-Fi.

Interestingly, speedtest-cli seems to be having peering issues over the TCP VPN: it sends me to hosts 300-3000 km away from my VPN host, with 90-100+ ms ping. I'd be curious whether this is due to slower TCP session negotiation combined with only a single ping being sent, or some other configuration issue with the CLI version. When run from the browser, however, I see ping times consistent with UDP tunnels (~10 ms more than with the VPN off) and around 90 Mbps.

For anyone else setting this up on an RPi: I dropped my tuning parameters as well. Commenting out sndbuf/rcvbuf and txqueuelen gave a modest improvement over TCP.

Back to tuning! Thanks for pointing me in the right direction.

WireGuard is actually remarkably simple to set up; you don't need an "official client" application from PIA. Basically, you just need a few bits of basic information: the endpoint domain name or IP and port, the peer public key, the preshared key (if any), and your tunnel IP. You'll create a key pair on your side and upload your public key to the PIA system. I don't know how easy or hard PIA makes this process (getting that info and providing your key), but you don't need to be an expert to set this up.
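As a rough illustration of how little information is involved, the sketch below renders those values into OpenWrt's /etc/config/network (UCI) format for the wireguard protocol. The option names follow the OpenWrt WireGuard documentation and should be double-checked against your release; the interface name wg0, the 25-second keepalive, routing all traffic (0.0.0.0/0) through the tunnel, and every example value are assumptions. The private key would be generated locally, for example with "wg genkey".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WgPeerInfo:
    """The handful of values the provider supplies, plus your own private key."""
    private_key: str      # generated locally; its public half is uploaded to the provider
    tunnel_address: str   # your tunnel IP with prefix, e.g. "10.0.0.2/32" (hypothetical)
    peer_public_key: str
    endpoint_host: str
    endpoint_port: int
    preshared_key: Optional[str] = None
    allowed_ips: List[str] = field(default_factory=lambda: ["0.0.0.0/0"])


def render_uci(info: WgPeerInfo, ifname: str = "wg0") -> str:
    """Return an interface section plus one peer section in UCI syntax."""
    lines = [
        f"config interface '{ifname}'",
        "\toption proto 'wireguard'",
        f"\toption private_key '{info.private_key}'",
        f"\tlist addresses '{info.tunnel_address}'",
        "",
        f"config wireguard_{ifname}",
        f"\toption public_key '{info.peer_public_key}'",
    ]
    if info.preshared_key:
        lines.append(f"\toption preshared_key '{info.preshared_key}'")
    lines += [
        f"\toption endpoint_host '{info.endpoint_host}'",
        f"\toption endpoint_port '{info.endpoint_port}'",
        "\toption route_allowed_ips '1'",      # install routes for allowed_ips
        "\toption persistent_keepalive '25'",  # assumed value, in seconds
    ]
    lines += [f"\tlist allowed_ips '{ip}'" for ip in info.allowed_ips]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders, not real keys or servers.
    print(render_uci(WgPeerInfo("PRIVATE_KEY", "10.0.0.2/32", "PEER_PUBLIC_KEY",
                                "vpn.example.net", 1337)))
```

Obtaining the peer key, endpoint, and tunnel IP from the provider is the part this sketch does not cover.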

However, if you are happy with OpenVPN now that you are getting better performance there, you can stick with that option.

From what I have read, there are issues with both tunnel setup and keepalive, each of which requires scripting.

The keepalive portion is easy enough; I just need to send pings to PIA's gateway every so often. The bigger issue is that they destroy your public keys after seeing no sessions for more than X minutes.
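The keepalive part could look roughly like this: a loop that pings the tunnel gateway at a fixed interval. This is a sketch under assumptions: the gateway address and the interval are hypothetical (PIA's actual timeout is not stated here), and it shells out to the system ping with -c and -W, which both iputils and BusyBox ping accept.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo via the system `ping`; True if a reply arrived."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        return False  # no ping binary on this system
    return result.returncode == 0


def keepalive(host: str, interval_s: float, rounds: Optional[int],
              ping: Callable[[str], bool] = ping_once,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Ping `host` every `interval_s` seconds; rounds=None loops forever.

    Returns the number of pings that got no reply.
    """
    sent = 0
    failures = 0
    while rounds is None or sent < rounds:
        if not ping(host):
            failures += 1
        sent += 1
        if rounds is None or sent < rounds:
            sleep(interval_s)
    return failures
```

Started from a boot-time task as keepalive("10.0.0.1", 60, rounds=None), with a hypothetical gateway address, it would loop indefinitely; the ping and sleep parameters exist so the loop can be exercised without a network.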

There is a public GitHub repository (https://github.com/hsand/pia-wg) for generating WireGuard config files. However, given PIA's policy on key destruction, I'd either have to invoke it manually every time I set up the travel router, or hardcode my credentials into the Python file and create a startup task or something similar.

For a home setup I could see this working, since downtime would be limited to router reboots and wouldn't expire my keys. But with the ephemeral nature of the travel router, I'd need to go the extra step of adapting those WireGuard scripts (which rely on PIA's proprietary API) to run automatically, which seems like a bridge too far.

Thank you all for your help on this. I would love to hear any additional recommendations on performance tuning, and I will definitely be looking to switch to a provider with better WireGuard support (I've heard Nord is good), but for now 70-90 Mbps should be good!

A big passive heatsink, but as others mentioned, CPU temperature is not the issue.

I have the VPN performing adequately now, but it is interesting that your AES results are considerably higher than mine (by about 24 MB/s). I'm using a moderately sized passive heatsink as well.

You can try 21.02.1 or @anon50098793's build. If I am not mistaken, it has some OpenSSL optimizations as well as irqbalance preinstalled.

Switch to ChaCha20-Poly1305 algo

I tried 21.02.1 last night. I neglected to run the OpenSSL benchmark, but saw VPN performance similar to 21.02.0. Do you have a link to @anon50098793's build/git?

My SMB write speeds suddenly got a lot worse over the course of my updates, so I'm working on a fresh config now.
