[HELP] VPN Throughput Variance

Here's the config I use.

client

dev tun-vpn

proto udp
remote at.windscribe.com 1194

route-metric 20
route-nopull
route 0.0.0.0 0.0.0.0 vpn_gateway 20
#redirect-gateway def1

nobind
auth-user-pass userdata.txt

resolv-retry infinite

auth SHA512
cipher AES-256-CBC
comp-lzo
verb 3
mute-replay-warnings
remote-cert-tls server
persist-key
persist-tun

key-direction 1

Router's openssl speed rsa

                  sign    verify    sign/s verify/s
rsa  512 bits 0.002448s 0.000216s    408.5   4620.3
rsa 1024 bits 0.013944s 0.000764s     71.7   1308.8
rsa 2048 bits 0.102990s 0.002688s      9.7    372.0
rsa 4096 bits 0.692667s 0.010153s      1.4     98.5

I found it weird that during the speed test, only 25% CPU was used, as reported by top.

This resulted in almost double upload speeds.

No, my ISP does not provide IPv6 access.

Just for comparison, here's what I got on an i7-7700HQ for openssl speed rsa

                 sign    verify    sign/s verify/s
rsa  512 bits 0.000061s 0.000004s  16448.6 260642.5
rsa 1024 bits 0.000143s 0.000009s   7004.1 116702.7
rsa 2048 bits 0.000836s 0.000024s   1195.5  41759.6
rsa 4096 bits 0.005418s 0.000082s    184.6  12210.3

Quad-core router?

Specs say that it's dual-core.

@VoidChronos Contact your VPN provider and find out if they support EC TLS ciphers, as SSL ciphers should be avoided at all costs due to their inefficiency. OpenVPN 2.4 added support for EC ciphers.

  • You also don't need your encryption to be AES-256-CBC, unless you're a potential target of a nation state, as AES-128 is currently uncrackable and will remain so until ~2030.
    • To demonstrate the speed difference:
      • openssl speed aes-256-cbc
      • openssl speed aes-128-cbc

    • WRT1900AC v1 Dual Core 1.3GHz (CPU supports hardware encryption processing)
      [root@LEDE] ~ # openssl speed aes-256-cbc
      Doing aes-256 cbc for 3s on 16 size blocks: 3991206 aes-256 cbc's in 3.00s
      Doing aes-256 cbc for 3s on 64 size blocks: 1153834 aes-256 cbc's in 3.00s
      Doing aes-256 cbc for 3s on 256 size blocks: 302044 aes-256 cbc's in 3.00s
      Doing aes-256 cbc for 3s on 1024 size blocks: 76463 aes-256 cbc's in 2.99s
      Doing aes-256 cbc for 3s on 8192 size blocks: 9626 aes-256 cbc's in 3.00s
      
      OpenSSL 1.0.2n  7 Dec 2017
      built on: reproducible build, date unspecified
      options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
      compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jw/lede/source/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/usr/include -I/home/jw/lede/source/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/include -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/usr/include -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/include/fortify -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap/home/jw/lede/source/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/jw/lede/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
      
      The 'numbers' are in 1000s of bytes per second processed.
      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
      aes-256 cbc      21286.43k    24615.13k    25774.42k    26186.66k    26285.40k
      
      [root@LEDE] ~ # openssl speed aes-128-cbc
      Doing aes-128 cbc for 3s on 16 size blocks: 4695027 aes-128 cbc's in 3.00s
      Doing aes-128 cbc for 3s on 64 size blocks: 1377299 aes-128 cbc's in 3.00s
      Doing aes-128 cbc for 3s on 256 size blocks: 356566 aes-128 cbc's in 2.96s
      Doing aes-128 cbc for 3s on 1024 size blocks: 90063 aes-128 cbc's in 2.96s
      Doing aes-128 cbc for 3s on 8192 size blocks: 11446 aes-128 cbc's in 2.96s
      
      OpenSSL 1.0.2n  7 Dec 2017
      built on: reproducible build, date unspecified
      options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,long) aes(partial) blowfish(ptr)
      compiler: arm-openwrt-linux-muslgnueabi-gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -I/home/jw/lede/source/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/usr/include -I/home/jw/lede/source/staging_dir/target-arm_cortex-a9+vfpv3_musl_eabi/include -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/usr/include -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/include/fortify -I/home/jw/lede/source/staging_dir/toolchain-arm_cortex-a9+vfpv3_gcc-5.5.0_musl_eabi/include -znow -zrelro -DOPENSSL_SMALL_FOOTPRINT -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DOPENSSL_NO_ERR -DTERMIOS -Os -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -mfloat-abi=hard -iremap/home/jw/lede/source/build_dir/target-arm_cortex-a9+vfpv3_musl_eabi/openssl-1.0.2n:openssl-1.0.2n -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -fpic -I/home/jw/lede/source/package/libs/openssl/include -ffunction-sections -fdata-sections -fomit-frame-pointer -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
      
      The 'numbers' are in 1000s of bytes per second processed.
      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
      aes-128 cbc      25040.14k    29382.38k    30838.14k    31156.93k    31677.58k
      

As I've mentioned several times thus far, your speeds are due to the hardware on the router. Unless you're building your own router box, OpenVPN should always be run on each individual client, not with the router as a client (unless throughput is not a concern).

To follow up on the weird 25%-only load during the speed test, here's the top output while downloading over the VPN from several clients. As we can see, we are stuck at 75% idle once again.

This config is provided by the VPN service (besides the routing changes). I'll contact them about this and EC TLS ciphers.

I use this method to pass devices that do not support OpenVPN through, like gaming consoles.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      10589.71k    11662.39k    11907.15k    12005.72k    12056.66k
aes-256 cbc       8336.87k     8964.51k     9161.56k     9191.51k     9207.15k
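To put those two rows in perspective, here is a minimal sketch that converts the `openssl speed` figures into Mbit/s and compares the two key sizes. The numbers are copied from the 8192-byte column above; the function and variable names are just for illustration. Treat the result as a cipher-only ceiling on one thread: a real OpenVPN tunnel also has to compute the HMAC (auth SHA512 in the posted config) and shuffle packets through userspace, so actual throughput will be well below this.

```python
# Figures copied from the `openssl speed` table above (8192-byte column),
# in thousands of bytes per second.
SPEED_KBYTES = {
    "aes-128-cbc": 12056.66,
    "aes-256-cbc": 9207.15,
}


def to_mbit_per_s(kbytes_per_s: float) -> float:
    """Convert openssl's '1000s of bytes per second' into Mbit/s."""
    return kbytes_per_s * 1000 * 8 / 1_000_000


for name, kbytes in SPEED_KBYTES.items():
    print(f"{name}: {to_mbit_per_s(kbytes):.1f} Mbit/s (cipher only, one thread)")

# Relative speed of the two key sizes on this router.
ratio = SPEED_KBYTES["aes-128-cbc"] / SPEED_KBYTES["aes-256-cbc"]
print(f"aes-128-cbc is about {ratio:.2f}x the speed of aes-256-cbc here")
```

On these figures AES-128 comes out roughly 30% faster than AES-256, which is a modest gain compared to the gap between the raw cipher rate and the tunnel speeds reported in this thread.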

If they don't offer both EC TLS ciphers and AES 128, ditch them for a VPN provider that does. EC ciphers should speed up throughput significantly on lower end hardware.

  • Specifically, the following EC TLS ciphers:
    • TLS-ECDHE-ECDSA-WITH-AES-128-GCM-SHA256
    • TLS-ECDHE-ECDSA-WITH-AES-128-CBC-SHA256
      • Note: ECDSA ciphers require a new cert, as normally certs are always RSA

    • TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256
    • TLS-ECDHE-RSA-WITH-AES-128-CBC-SHA256

Unless you're using the gaming consoles for browsing, there's no benefit to this that I'm aware of. All other devices support OpenVPN.

The main benefit is circumventing censorship and blocked access by the ISP

They block games and app content?

  • If so, and you want higher throughput for the consoles, you're going to have to upgrade your router. You can pick up a WRT1200AC for ~$100, which has hardware encryption processing enabled.
    • Linksys WRT1200AC
      [root@WRT] ~ # openssl speed aes-128-cbc
      Doing aes-128 cbc for 3s on 16 size blocks: 3894224 aes-128 cbc's in 2.98s
      Doing aes-128 cbc for 3s on 64 size blocks: 1014379 aes-128 cbc's in 2.98s
      Doing aes-128 cbc for 3s on 256 size blocks: 260395 aes-128 cbc's in 2.97s
      Doing aes-128 cbc for 3s on 1024 size blocks: 65554 aes-128 cbc's in 2.98s
      Doing aes-128 cbc for 3s on 8192 size blocks: 8204 aes-128 cbc's in 3.00s
      
      The 'numbers' are in 1000s of bytes per second processed.
      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
      aes-128 cbc      20908.59k    21785.32k    22444.82k    22525.94k    22402.39k
      
      [root@WRT] ~ # openssl speed aes-256-cbc
      Doing aes-256 cbc for 3s on 16 size blocks: 3000958 aes-256 cbc's in 2.99s
      Doing aes-256 cbc for 3s on 64 size blocks: 779250 aes-256 cbc's in 2.97s
      Doing aes-256 cbc for 3s on 256 size blocks: 197398 aes-256 cbc's in 3.00s
      Doing aes-256 cbc for 3s on 1024 size blocks: 49589 aes-256 cbc's in 2.99s
      Doing aes-256 cbc for 3s on 8192 size blocks: 6133 aes-256 cbc's in 2.96s
      
      The 'numbers' are in 1000s of bytes per second processed.
      type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
      aes-256 cbc      16058.64k    16791.92k   16844.63k   16982.99k    16973.49k
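For anyone wondering where the summary row comes from: each figure is simply the block count times the block size, divided by the elapsed time, expressed in thousands of bytes per second. A small sketch using the aes-128-cbc run from the WRT1200AC output above (the function name is made up for illustration):

```python
def kbytes_per_second(blocks: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, as printed by `openssl speed`."""
    return blocks * block_size / seconds / 1000


# (blocks processed, block size in bytes, elapsed seconds),
# copied from the "Doing aes-128 cbc ..." lines above.
runs = [
    (3894224, 16, 2.98),
    (1014379, 64, 2.98),
    (260395, 256, 2.97),
    (65554, 1024, 2.98),
    (8204, 8192, 3.00),
]

for blocks, size, secs in runs:
    print(f"{size:>5} bytes: {kbytes_per_second(blocks, size, secs):.2f}k")
```

This is handy for sanity-checking a pasted table, since every value in the summary row can be recomputed from the lines above it.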
      

Unfortunately they do. You can read about the Telegram vs Roskomnadzor battle in Russia.
In short, they banned several of Amazon's /10 and /11 subnets. As you can guess, that impacted far more than a single messenger.

The CPU in your router is a dual core with hyperthreading, and hence 4 threads in total. That simply means one thread is completely maxed out. It seems like the test you are running isn't properly multithreaded.


after sndbuf/rcvbuf changes:

Now that you're getting ~12-15Mbps in both directions, it seems unlikely you'll get a large speed boost from here. During a download speed test with these settings, what is the idle percentage?

cat /proc/cpuinfo
cat /proc/stat

These might help answer why you see 75% idle during the RSA speed test. Perhaps you have a newer version of the hardware that is quad-core?

If you have 100Mbps and want to do fast VPN, I strongly recommend you get a low-cost x86 network appliance box and use that as your router. For future-proofing, I'd suggest one that supports AES-NI for encryption speedup. This kind of device will handle routing at up to a gigabit/s and VPN at up to, say, 300 Mbps.

https://www.amazon.com/Firewall-Appliance-Gigabit-AES-NI-Barebone/dp/B072ZTCNLK/

is an example

or if you want to share functions of routing and NAS, you might look at this:

https://www.amazon.com/Synology-bay-DiskStation-DS218-Diskless/dp/B075MZTQBT

If you go that way, you might want to install something like Debian and build a custom router + VPN + NAS solution. Look into FireHOL for setting up high quality firewalls without too much user-complexity if you go that route. You might also consider running OpenWRT in a VM to do the routing, and provide some additional isolation/security.

Or use Wireguard instead of OpenVPN. It is much much faster. Amazing piece of software.

cat /proc/stat

cpu  120372 0 104313 29869726 23 0 139162 0 0 0
cpu0 16303 0 17549 7435634 6 0 88927 0 0 0
cpu1 48577 0 23148 7470883 11 0 15778 0 0 0
cpu2 13850 0 13669 7521234 6 0 9635 0 0 0
cpu3 41642 0 49947 7441975 0 0 24822 0 0 0
intr 71581545 0 0 0 0 0 0 0 0 30233595 23334 2425627 1942471 5347144 115386 454159 194366 204972 0 12 0 10563071 1 17309569 2767838 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 24328312
btime 1525037629
processes 6772
procs_running 1
procs_blocked 0
softirq 114143773 0 29149710 75573 38031337 0 0 7262927 30156500 0 9467726

I greatly appreciate your advice, but both of those devices are WAY outside my budget, and I don't think those devices provide their own Wi-Fi (however, the current router should be able to just route Wi-Fi traffic through them and back to the internet).

Switching VPN provider is out of budget as well, at least until the expiration of my current plan.

Yep, so the output of your /proc/stat confirms that there are 4 "cores" (threads). As far as top is concerned, 75% idle is equal to completely using up one of them, and 50% idle would be completely using up both cores (one thread per core). Anything over 50% busy means the CPU is bottlenecked on multiple threads. So after adjusting your sndbuf and rcvbuf, I doubt you're going to get more speed than that: about 15Mbps is the most this device can encrypt.
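A quick way to check that reasoning is to count the per-CPU lines in /proc/stat and work out what aggregate idle figure top would show when one or two threads are pegged. This is only a sketch: the sample is the cpu lines pasted earlier in the thread, and the helper names are invented for the example.

```python
def count_cpu_threads(proc_stat_text: str) -> int:
    """Count the per-CPU lines (cpu0, cpu1, ...) in /proc/stat output."""
    return sum(
        1
        for line in proc_stat_text.splitlines()
        # The aggregate line is "cpu " (no digit), so it is skipped here.
        if line.startswith("cpu") and line[3:4].isdigit()
    )


def idle_when_saturated(total_threads: int, busy_threads: int) -> float:
    """Aggregate idle percentage when `busy_threads` are fully loaded."""
    return 100.0 * (total_threads - busy_threads) / total_threads


# CPU lines copied from the /proc/stat output posted above.
SAMPLE = """\
cpu  120372 0 104313 29869726 23 0 139162 0 0 0
cpu0 16303 0 17549 7435634 6 0 88927 0 0 0
cpu1 48577 0 23148 7470883 11 0 15778 0 0 0
cpu2 13850 0 13669 7521234 6 0 9635 0 0 0
cpu3 41642 0 49947 7441975 0 0 24822 0 0 0
"""

threads = count_cpu_threads(SAMPLE)
print(f"hardware threads: {threads}")
print(f"idle with 1 thread maxed: {idle_when_saturated(threads, 1):.0f}%")
print(f"idle with 2 threads maxed: {idle_when_saturated(threads, 2):.0f}%")
```

With four threads, a single-threaded process that is completely CPU-bound leaves the aggregate at 75% idle, which matches what top reported.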

Yes, the budget for those devices is a lot, but as opposed to cheap router devices, the PC based devices will have typically longer life. You can probably route with one of those devices for the next 10 years without replacing it. Also, you may be currently paying for 100Mbps but only really able to use ~15, so perhaps overall cost efficiency changes if you take some of these other things into account. It's not all about the up front cost. Of course, if you just can't under any circumstances swing the up-front, then savings over lifetime aren't so relevant.

At this point, after adjusting sndbuf and rcvbuf, I think you can probably just call it a day for now.

The way I set up my router, there's one "VPN" hotspot, which routes through OpenVPN, and all the other ethernet and Wi-Fi hotspots are routed directly to the internet. So on my PC, I can get the full bandwidth of the VPN.

I think 15/15 should be enough for one console to actively play on.

Thank you all for your help.

I have an old-ish Atom Z3740 tablet. I wonder, is it possible to turn it into a DIY router?

http://projects.pyret.net/dump/lede/openvpn-benchmarks.txt

D-Link DIR-860L B1 (local)
Linksys WRT3200ACM (remote)

OpenVPN: -O3 -flto
musl: -O2
mbedtls: -O2
OpenVPN 2.4.4 (UDP, AES-192-CBC, no compression)

iperf3 -V -4 -t 60 -c 192.168.2.1

Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.01  sec   189 MBytes  26.5 Mbits/sec                  sender
[  5]   0.00-60.26  sec   189 MBytes  26.3 Mbits/sec                  receiver

iperf3 -V -4 -R -t 60 -c 192.168.2.1

Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.20  sec   164 MBytes  22.9 Mbits/sec    0             sender
[  5]   0.00-60.01  sec   164 MBytes  23.0 Mbits/sec                  receiver

That's over the net with low latency (~10ms) between the sites (100Mbit connections on each side).
Keep in mind that OpenVPN is single-threaded, meaning that it can only max out one core, so you'll never see anything near full load in top. Also, at least on MIPS, mbedtls is ~10% faster than OpenSSL.

Compression should be disabled as it slows down performance in ~98% of all cases.
Changing sndbuf and rcvbuf will most likely do more harm than good.
You might be able to get slightly better performance if you mess around with the MTU settings but I somewhat doubt it. It would be interesting however if someone could give softether a spin and compare OpenVPN performance (it's compatible). Also, the above is pretty much as good as it gets using the MT7621 platform and OpenVPN.

I wouldn't bother with this. Running old desktop hardware will cost you more in power than you'd really like. Buying one of the low-powered Celeron-type devices instead will reduce your power consumption a lot, and the savings will make up for the difference in up-front cost in a year or two.

The only thing you might consider is whether to run your PC with a virtual machine router, and convert the mi3G to a bare access point. If your PC is on all the time anyway, you might find this gives better performance. You'll need someone else to help figure this one out for you though, as I haven't set it up myself. Probably start a new thread for that topic if you want to go that way.