Wireguard caps in one direction but reaches line speed in the other

Hey there.

I have a weird performance issue with Wireguard on OpenWRT. I'm pretty sure it's not the "my hardware is too slow" kind but rather a configuration thing. Let's hope I can do something without compiling anything.

I just installed OpenWRT on a Raspberry Pi 5.
OpenWRT runs Wireguard.
The OpenWRT WAN port allows incoming connections on port 5001 for iperf.
The OpenWRT WAN port has 192.168.178.40
The OpenWRT Wireguard port has 10.0.0.1
I connected my computer via GBit cable to the network the WAN port of the Pi is in (so 192.168.178.0/24).

I'm running "iperf -s" on my Pi.

  • iperf -c 192.168.178.40 gives 1GBit
  • iperf -c 192.168.178.40 -R gives 1GBit
  • iperf -c 10.0.0.1 only gives approximately 400MBit
  • iperf -c 10.0.0.1 -R gives 1GBit
  • My computer is a MacBook Pro M2
  • The Mac "peaks" at 230% CPU, so plenty of CPU left
  • The Pi peaks at (on average) 4*20% in the 400MBit case
  • The Pi peaks at (on average) 4*35% in the 1GBit case
  • So it shouldn't be CPU bound, right?

If I'm not mistaken, "iperf -c" makes the client send data, while "iperf -c" with "-R" makes the iperf server send data.

Any ideas why there's asymmetric performance between sending and receiving, and potentially what I can do about that? It's not the cable because without Wireguard I get full GBit through.

Thanks in advance,
Stephan.

I wouldn't go quite this far…
230% means that more than one core is (fully) used, but networking loads are largely single-threaded in nature (WireGuard not quite as strictly, but still). Furthermore, keep in mind that there are two ways to implement WireGuard: natively as a kernel module (on contemporary Linux and OpenWrt) or in userspace, using Go. The latter always works, everywhere, but is a lot slower than native kernel support.
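To illustrate why an averaged CPU figure can hide a single busy core, here is a minimal Python sketch. The per-core values are made up for illustration and were not measured on the Pi; only the "4*20%" average comes from the first post, and `summarize` is a hypothetical helper name.

```python
def summarize(per_core_percent):
    """Return (average load, busiest core load) in percent."""
    average = sum(per_core_percent) / len(per_core_percent)
    busiest = max(per_core_percent)
    return average, busiest


# Two hypothetical load distributions that both average 20% over four cores.
evenly_spread = [20, 20, 20, 20]
one_busy_core = [74, 2, 2, 2]

for name, loads in (("even", evenly_spread), ("one busy core", one_busy_core)):
    average, busiest = summarize(loads)
    print(f"{name}: average {average:.0f}%, busiest core {busiest}%")
```

Both cases report the same average, so an averaged number alone cannot rule out one core doing most of the work. Looking at the per-core view in htop while the slow test runs would show which case applies.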

OK, didn't know that. But I would assume this should affect both TX and RX the same way, so I should get either 1GBit for both directions or 400MBit for both, but not 1GBit for one direction and 400MBit for the other, right?

On top of that, since my Apple CPU has 12 cores (8p, 4e) and I've seen the Activity Monitor panel show 4 digits of percentage, I'd assumed 230% means either about 2.3 cores at 100% or every core at roughly 20%.
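For reference, the two readings of the 230% figure work out as follows. This is only the arithmetic, assuming the percentage is summed over all 12 cores; `interpret_total` is a hypothetical helper name.

```python
def interpret_total(total_percent, core_count):
    """Return (number of fully busy cores, even share per core in percent)."""
    fully_busy_cores = total_percent / 100
    even_share = total_percent / core_count
    return fully_busy_cores, even_share


cores, share = interpret_total(230, 12)
print(f"{cores:.1f} cores at 100%, or {share:.1f}% on each of 12 cores")
```

The total by itself does not say which of the two readings (or which mix of them) is the real one; that needs a per-core view.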

Seems like I need to dig more into that. At first glance, htop on the Mac:

  • shows more than 20% per core
  • but still only roughly 60%.
  • shows the highest CPU usage to be a "WireGuardNetworkExtension" at :drumroll: 7%.

None of these numbers are what I expected based on what Activity Monitor told me.

Thank you for pointing me in that direction.