Comparative Throughput Testing Including NAT, SQM, WireGuard, and OpenVPN

Welcome! Please create a new thread for your specific needs :santa:

Edited my earlier post. Had a brainfart. I was talking about Mbit/s, not MB/s of course.
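
For anyone else who trips over the same units, here is a minimal sketch of the conversion. The function names are made up for illustration, and the 800 figure is just an example input.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8.0


def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mbyte_per_s * 8.0


if __name__ == "__main__":
    # An 800 Mbit/s link moves at most 100 MB/s of data.
    print(mbit_to_mbyte(800))
```

So a number quoted in MB/s is eight times larger than the same number quoted in Mbit/s.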

Snuck in some mvebu numbers and updated the IPQ4019.

Now testing flow offload, as well as explicitly setting AES-256-CBC for OpenVPN. Seems like it was using BF-CBC (Blowfish) as the default.
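
A quick way to see whether a config sets the cipher explicitly is to scan it for a `cipher` directive. This is only a rough sketch: `explicit_cipher` is a hypothetical helper, it assumes a plain OpenVPN config file rather than a UCI one, and it ignores included files and inline blocks. If it returns `None`, the build's default applies (BF-CBC in the case described above).

```python
from typing import Optional


def explicit_cipher(config_text: str) -> Optional[str]:
    """Return the value of the last `cipher` directive, or None if there is none."""
    found = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';' at the start).
        if not line or line[0] in "#;":
            continue
        tokens = line.split()
        if tokens[0] == "cipher" and len(tokens) >= 2:
            found = tokens[1]
    return found


if __name__ == "__main__":
    sample = "dev tun\nproto udp\ncipher AES-256-CBC\n"
    print(explicit_cipher(sample))
```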

Had to trim out several comments due to a 32,000-character limit on a post. I'll figure out how to manage it after the holidays.

Asus RT-AC57U is dual core as well. Same for the D-Link DIR-860L Rev B.

Just got a DIR-878 Rev A1 in as well, but that's MT7615E wireless.

That doesn't matter if you are only testing wired routing speeds :slight_smile: The Netgear R6260 is also a good option.

If anyone is looking for a cheap multi-core MT7621A router that works with 19.07 - check https://www.amazon.com/gp/product/B01MXXQXW9/ - $90 and shipped from China

It's a ZBT WG3526 with 512 MB RAM and 16 MB flash. It should have 2 cores but shows 4 in /proc/cpuinfo. It comes with a custom OS that can be re-flashed to OpenWrt 19.07 using the instructions for a similar router: https://openwrt.org/toh/zbt/wg2626

I still need to test it properly, especially given the [18.06.4] speed fix for BT HomeHub 5a, but it shows ~420 Mbit/s over close-range 5 GHz Wi-Fi in the default config (where the Wi-Fi IRQs are pinned to a single CPU and RPS is enabled for all CPUs but 0).
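
For reference, RPS is configured per receive queue with a hexadecimal CPU bitmask written to `/sys/class/net/<iface>/queues/rx-<n>/rps_cpus`. Here is a small sketch of how the "all CPUs but 0" mask works out on a 4-CPU system. `rps_mask` is a hypothetical helper; it only computes the value and does not touch sysfs.

```python
from typing import Iterable


def rps_mask(num_cpus: int, excluded: Iterable[int] = ()) -> str:
    """Build the hex bitmask for rps_cpus: one bit per CPU, excluded CPUs cleared."""
    skip = set(excluded)
    mask = 0
    for cpu in range(num_cpus):
        if cpu not in skip:
            mask |= 1 << cpu
    return format(mask, "x")


if __name__ == "__main__":
    # 4 logical CPUs, CPU 0 left out: bits 1, 2 and 3 set.
    print(rps_mask(4, excluded=[0]))  # e
```

Writing that value to the queue's `rps_cpus` file is what steers receive processing away from CPU 0.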

That should be right. The MT7621A is a dual-core MIPS 1004Kc with two hardware threads per core (SMT), so it shows up as four logical CPUs.
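
To double-check what the kernel reports, you can count the `processor` entries in /proc/cpuinfo. A minimal sketch follows; `count_logical_cpus` is a made-up name, and the sample text is a stripped-down stand-in, not real output from this router.

```python
def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count 'processor : N' entries, i.e. logical CPUs as the kernel sees them."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


if __name__ == "__main__":
    # Stand-in text: 2 cores x 2 hardware threads each = 4 entries.
    sample = "\n".join(f"processor\t: {n}" for n in range(4))
    print(count_logical_cpus(sample))  # 4

    # On a real Linux system:
    # with open("/proc/cpuinfo") as f:
    #     print(count_logical_cpus(f.read()))
```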

Isn't it weird that ath79 and ipq40xx perform almost the same with SQM, when one is a single-core MIPS and the other a quad-core ARM? Are the other 3 cores used at all?

Anyway, for what it's worth, on my MT7621A DIR-860L B1, NAT is about 800 Mbps and SQM is between 200 and 300 Mbps (cake, piece of cake).

OpenVPN is single-threaded. The IPQ40xx could probably run three OpenVPN sessions at the same speed, though.

Hmm yeah, I was talking about NAT+SQM though...

I think NAT+SQM is also single threaded?

There's a cool new cake patch someone needs to slam into openwrt, here:

https://lists.bufferbloat.net/pipermail/cake/2020-May/005257.html

There was also a patch to wireguard that went by earlier, which has hopefully also made it in....

To answer the SQM question: yes, dang it, SQM is essentially single-threaded. I'd love some help towards making a multicore shaper. Any fq_codel_fast helpers, tests, and testers?

I am curious, what patch are you talking about here? :slight_smile:

I don't know if this has made it into OpenWrt yet.

[PATCH 5.6 029/177] wireguard: queueing: preserve flow hash across packet scrubbing

[ Upstream commit c78a0b4a78839d572d8a80f6a62221c0d7843135 ]

It's important that we clear most header fields during encapsulation and
decapsulation, because the packet is substantially changed, and we don't
want any info leak or logic bug due to an accidental correlation. But,
for encapsulation, it's wrong to clear skb->hash, since it's used by
fq_codel and flow dissection in general. Without it, classification does
not proceed as usual. This change might make it easier to estimate the
number of innerflows by examining clustering of out of order packets,
but this shouldn't open up anything that can't already be inferred
otherwise (e.g. syn packet size inference), and fq_codel can be disabled
anyway.

Furthermore, it might be the case that the hash isn't used or queried at
all until after wireguard transmits the encrypted UDP packet, which
means skb->hash might still be zero at this point, and thus no hash
taken over the inner packet data. In order to address this situation, we
force a calculation of skb->hash before encrypting packet data.

Of course this means that fq_codel might transmit packets slightly more
out of order than usual. Toke did some testing on beefy machines with
high quantities of parallel flows and found that increasing the
replay-attack counter to 8192 takes care of the most pathological cases
pretty well.
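
To make the point about skb->hash concrete, here is a toy model of how a flow-queueing qdisc picks a queue from the flow hash. It is a sketch under assumptions: `flow_hash` uses CRC32 as a stand-in (the kernel's flow dissector uses a different hash), the addresses and ports are documentation placeholders, and the multiply-and-shift scaling is meant to resemble how a 32-bit hash gets mapped onto fq_codel's default 1024 queues.

```python
import zlib

FLOWS = 1024  # fq_codel's default number of flow queues


def flow_hash(src: str, sport: int, dst: str, dport: int, proto: str) -> int:
    """Stand-in 32-bit flow hash over the 5-tuple (NOT the kernel's hash)."""
    key = f"{src}:{sport}>{dst}:{dport}/{proto}".encode()
    return zlib.crc32(key)


def queue_index(h: int, flows: int = FLOWS) -> int:
    """Scale a 32-bit hash into the range [0, flows) by multiply-and-shift."""
    return (h * flows) >> 32


# Eight inner flows carried through one tunnel (placeholder addresses).
inner = [("10.0.0.2", 40000 + i, "192.0.2.10", 443, "tcp") for i in range(8)]
# After encapsulation, every packet carries the same outer UDP 5-tuple.
outer = ("198.51.100.1", 51820, "203.0.113.5", 51820, "udp")

# Hash preserved from the inner packet: flows spread over several queues.
preserved = {queue_index(flow_hash(*f)) for f in inner}
# Hash taken only from the outer packet: everything lands in one queue.
outer_only = {queue_index(flow_hash(*outer)) for _ in inner}

if __name__ == "__main__":
    print(len(preserved), "queues with the inner hash preserved")
    print(len(outer_only), "queue with the outer hash only")
```

With only the outer hash, all tunnelled traffic shares one queue, so a bulk transfer and a VoIP call inside the tunnel cannot be separated by the qdisc. That is the situation the patch is trying to avoid.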

Rather happy with this. Squint at the bottom for the "after".

Holy shit, that is a massive improvement for bufferbloat on WireGuard! Is this already included in mainline Linux? And when will it reach OpenWrt? I am assuming both ends of the link need this patch for the full benefit?

Thought it was there

By the way, is this improvement by default, or do you need to set up SQM on the wireguard interface to reap these benefits?

No, SQM on the main egress interface (or line rate on the Wi-Fi, or even line rate on Ethernet if that's your bottleneck) "just works".

Note there's a cake patch also; I don't know if that's in OpenWrt yet, either.

I really do want some real-world benchmarking of the differences here. I figure that MOST of the time in the OpenWrt world, WireGuard is rate-limited by the egress interface, not by crypto, but I'd like to know more... If anyone here can do a before/after with something like the rrul test, or, for example, VoIP over WireGuard while under other loads, it would be nice.
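
If you do run a before/after, one simple number to report is how much the round-trip time rises under load compared to idle. Here is a rough sketch of that comparison. `latency_increase_ms` is a hypothetical helper, and the sample values are placeholders, not measurements; feed it your own ping samples from the idle and loaded phases of each run.

```python
from statistics import median
from typing import Sequence


def latency_increase_ms(idle_rtts_ms: Sequence[float],
                        loaded_rtts_ms: Sequence[float]) -> float:
    """Median RTT under load minus median RTT when idle, in milliseconds."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


if __name__ == "__main__":
    # Placeholder values only, NOT real results.
    idle = [10, 11, 12]
    loaded = [30, 50, 40]
    print(latency_increase_ms(idle, loaded))  # 29
```

Running it once on the unpatched build and once on the patched one gives two numbers that are easy to compare in a post.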

If it's limited more by crypto, well, I'd long planned to add something called "crypto queue limits", but that fix was WAY more invasive and we've not gotten around to it.

Well, that would be best, yes. Better levels of FQ-anything on either side, though, tend to drive both sides towards better multiplexing in general.