Where to find the latest news on NAT acceleration?

I'm an avid reader of this forum and a frequent struggler with OpenWrt performance. Most of the threads about performance improvements mention speeding up the NAT implementation. From what I've read, there are four different methods to speed this up, but in some cases the information is more than three years old. I'd like to know where I can read the latest news on each method: Slack channel, mailing list, pull requests, etc.

These are the methods I was able to find:

  1. Flow offloading via software. Included in OpenWrt releases with kernel 4.14+ and can be enabled in LuCI. Are there improvements scheduled for future releases, or is this as good as it gets?
  2. Flow offloading via hardware. I read that this is available for MT7621 and may become available for the AR8337N in the future. This patch suggests that it is available now for the AR8337N; will it be included in an upcoming release of OpenWrt?
  3. Qualcomm Fast Path. I read that this is similar to #1 (software flow offloading); are there advantages to it? There is a popular build making use of it here; will it be merged into OpenWrt in the future?
  4. Hardware NAT. This is often mentioned as the reason stock firmware outperforms OpenWrt. Are there any recent developments here? Are any hardware manufacturers making compiled versions of their drivers available for download (similar to how Nvidia makes its drivers available for Linux distributions)?

I am not an expert on the topic, but I wouldn't expect any major improvements.

That patch adds a DSA driver, which is basically a specific class of ethernet drivers. It has nothing to do with hardware NAT.

It is software flow offloading as well, but a very hacky implementation that is not upstream. Don't expect it to ever be included in official builds.

Hardware NAT = hardware flow offloading. So this is already available on MT7621 devices.

Unfortunately that isn't possible. Those modules are compiled for specific (usually ancient) kernels. They will not work on the newer kernels used in OpenWrt.

This post from late 2017 suggests that hardware flow offload will be available for other hardware besides mt7621. Is it available today in late 2020? If not, where can I find the latest news about its development?

I did some more research and found proprietary drivers included in recent releases of other router firmwares; FreshTomato and Asuswrt-Merlin are two examples. Are all of those drivers incompatible with the newer kernels? If they are compatible, could the same drivers be included in OpenWrt in the future?

It's being worked on for some MediaTek devices AFAIK (note that the ramips target is different from the mediatek target, even though both cover chips made by MediaTek the company): https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=4fb58813f94ac6cc8167138e23a92189fe50b258

These Mediatek devices are really new and I haven't seen any talked about on these forums.

Yes. Even the description clearly states it's a modification of the original firmware in the case of Merlin:

"Developed by Eric Sauvageau, its primary goals are to enhance the existing firmware without bringing any radical changes"

I am not sure about FreshTomato, but if it has working hardware NAT support it's either a modified firmware as well, or it's using a kernel compatible with the compiled hardware NAT module.

The source for Asuswrt-Merlin refers to kernel-4.1, which isn't ancient compared with 4.14 in OpenWrt. Finding a way to re-use the hardware acceleration features of that project in OpenWrt continues to be an active pursuit of mine. I welcome comments or findings from others on this topic.

As for news, I found recent articles (within the last 12 months) on LWN about accelerating packet flows through the Linux kernel. Here is the latest article.

Are there any other places where one can track progress of improving hardware acceleration?

Here is how I achieved gigabit NAT on Broadcom: OpenWrt 19.07, the bcm53xx target, and flow offloading.

@a_guy, I can confirm your findings about bcm53xx. That thread is closed, so I'm replying here. My ASUS 68U cannot handle more than 500 Mbps without "ethtool -K eth0 gro off". That works around a regression, though; it is not HW NAT acceleration. I think the original ASUS firmware has HW NAT acceleration. Is it possible to get HW NAT on bcm53xx?

It also works for MT7622 and possibly MT7623.

HW NAT seems very unlikely to come to the majority of targets. However, with those tricks it isn't actually needed, since the CPU is no longer the bottleneck: on my router (dual-core 1.4 GHz ARMv7) it uses about 10% of one core when downloading at 900+ Mbit/s from the internet.
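
To put that 10% figure in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes every packet is a full-size 1500-byte packet (the usual Ethernet MTU) and that the core runs at 1.4 GHz; both are assumptions, and real traffic with small ACKs and mixed packet sizes means a higher packet rate for the same throughput.

```python
# Back-of-the-envelope: how many CPU cycles per packet does the quoted load imply?
# Assumptions (not measured): 1500-byte packets, 1.4 GHz core, 10% of one core busy.

def packets_per_second(throughput_bps: float, packet_bytes: int = 1500) -> float:
    """Packets per second needed to carry a given bit rate at a fixed packet size."""
    return throughput_bps / (packet_bytes * 8)

def cycles_per_packet(throughput_bps: float, core_hz: float, core_load: float,
                      packet_bytes: int = 1500) -> float:
    """CPU cycles spent per forwarded packet, given the fraction of one core in use."""
    busy_cycles_per_second = core_hz * core_load
    return busy_cycles_per_second / packets_per_second(throughput_bps, packet_bytes)

pps = packets_per_second(900e6)              # 900 Mbit/s of 1500-byte packets
cpp = cycles_per_packet(900e6, 1.4e9, 0.10)  # 10% of one 1.4 GHz core
print(f"{pps:,.0f} packets/s, ~{cpp:,.0f} cycles per packet")
```

The result is only an order-of-magnitude estimate, but it shows that the per-packet cost, not the bit rate itself, is what the router CPU has to keep up with.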

Does anyone know how this patch affects SQM performance?

SQM and HW NAT are incompatible. If you enable HW NAT, SQM no longer does anything.

What about software flow offloading? Does it support SQM or QoS?

I am not sure, but I think they can be used together. Just test it, though: enable software flow offloading and set the SQM rates well below your connection speeds. If your download/upload is now capped at those speeds, you know that SQM is working.
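
The pass/fail logic of that test can be written down as a small hypothetical Python helper. The function name and the 10% margin are assumptions for illustration; the margin only absorbs speed-test noise, since measured throughput normally lands at or slightly below the configured SQM rate.

```python
# Hypothetical helper (not part of OpenWrt or SQM): decide whether SQM is shaping,
# following the test above: set the SQM rate well below the line rate, run a
# speed test, and check that the measured throughput is capped near that rate.

def sqm_appears_active(line_rate_mbps: float, sqm_rate_mbps: float,
                       measured_mbps: float, margin: float = 0.10) -> bool:
    """Return True if the measured rate is capped at (or below) the SQM rate.

    margin is an assumed allowance for speed-test noise above the SQM setting.
    """
    if sqm_rate_mbps >= line_rate_mbps:
        # The test only proves something if the cap is below what the line can do.
        raise ValueError("set the SQM rate below the line rate for this test")
    cap = sqm_rate_mbps * (1 + margin)
    return measured_mbps <= cap

# Example: 500 Mbit/s line, SQM set to 100 Mbit/s.
print(sqm_appears_active(500, 100, 94))   # capped near 100 -> True
print(sqm_appears_active(500, 100, 480))  # close to line rate -> False
```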

It works. For reference, I have an Archer C60 v1.

Flow offloading (hardware and software) works by making the kernel not look at each packet individually, but instead collate packets into flows and apply the same policy to all of them in bulk. SQM, however, works by handling each packet individually, sending it at exactly the right time or dropping it (to throttle the other end). The two strategies are at odds: hardware flow offloading would break SQM, while software flow offloading 'works' but loses its performance advantage (all packets need to take the slow path and can't be offloaded).

So no, SQM and flow offloading don't work together (nothing breaks spectacularly, but you lose any advantage of flow offloading).
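
The following toy model in Python illustrates that explanation for the software case. It is not netfilter or OpenWrt code: the Router class, the 5-tuple flow key, and the rule that a per-packet shaper forces every packet onto the slow path are simplifying assumptions that mirror the post, not the kernel's actual implementation.

```python
from dataclasses import dataclass, field

# Toy model (not OpenWrt/netfilter code): the first packet of a connection takes
# the slow path (full firewall/NAT evaluation) and installs a flow-table entry;
# later packets of the same flow skip to the fast path. A per-packet shaper
# (standing in for SQM) must see every packet, so here it disables the shortcut.

FiveTuple = tuple[str, int, str, int, str]  # src ip, src port, dst ip, dst port, proto

@dataclass
class Router:
    offload: bool
    per_packet_shaper: bool
    flow_table: set = field(default_factory=set)
    slow_path: int = 0
    fast_path: int = 0

    def forward(self, flow: FiveTuple) -> None:
        can_offload = self.offload and not self.per_packet_shaper
        if can_offload and flow in self.flow_table:
            self.fast_path += 1   # known flow: bypass full rule evaluation
            return
        self.slow_path += 1       # full firewall/NAT (and shaper) processing
        if can_offload:
            self.flow_table.add(flow)

def run(offload: bool, shaper: bool, packets: int = 1000) -> Router:
    """Push `packets` packets of a single TCP flow through the model."""
    r = Router(offload=offload, per_packet_shaper=shaper)
    flow = ("192.168.1.10", 51000, "203.0.113.5", 443, "tcp")
    for _ in range(packets):
        r.forward(flow)
    return r

for offload, shaper in [(False, False), (True, False), (True, True)]:
    r = run(offload, shaper)
    print(f"offload={offload} sqm={shaper} slow={r.slow_path} fast={r.fast_path}")
```

With offloading alone, only the first packet of the flow pays the slow-path cost; once the shaper is enabled, the counts match the no-offload case, which is the lost advantage described above.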

So which one should you choose if you have a low power device?

From the posts here, it seems that you simply should not use flow offloading at all (hardware or software).

A low-powered device like the Ubiquiti ER-X can use SQM to shape about 150 Mbit/s up/down, perhaps a bit more. I saw similar results on an Asus AC68U (dual-core ARM @ 800 MHz).

So I choose flow offloading since I don't have that much bufferbloat either.

The choice between flow offloading and SQM depends on your WAN speed. If your WAN speed is above 800 Mbps, OpenWrt will likely struggle to reach full speed without some form of offloading. Once you achieve full WAN speed on such a fast connection, you probably won't care to prioritize traffic with SQM. On the other hand, if your WAN speed is below 300 Mbps, you'll likely have heavy competition for that bandwidth among local devices and will need SQM to prioritize traffic.
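
That rule of thumb can be expressed as a small Python function. The 800 and 300 Mbps thresholds come from the post; returning "benchmark both" for speeds in between is an assumption, since the post does not cover that range.

```python
# Rule of thumb from the post above. Thresholds (800 / 300 Mbps) are from the
# post; the middle band is an assumption: measure both options on your hardware.

def suggest_strategy(wan_mbps: float) -> str:
    if wan_mbps > 800:
        return "flow offloading"  # CPU likely can't reach line rate without it
    if wan_mbps < 300:
        return "SQM"              # scarce bandwidth: shaping/prioritization matters
    return "benchmark both"       # assumption: no clear winner in this range

for speed in (100, 500, 1000):
    print(speed, "Mbps ->", suggest_strategy(speed))
```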
