Netfilter "Flow offload" / HW NAT


I am using the latest trunk version, and I noticed that if either software offload or software + HW offload is on, my PPTP VPN connections cannot be established.

I am using the nf_nathelper_extra module with automatic module loading. Is this an expected limitation?

Otherwise the HW offload works very nicely (mt7621): 900/200Mbits with below 5% utilization via PPPoE (IPv4) in both directions.
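
For anyone who wants to reproduce the two modes mentioned above, here is a minimal sketch of toggling them from the shell. It assumes the standard OpenWrt firewall defaults options `flow_offloading` and `flow_offloading_hw`; the `run` and `enable_offload` helper names are made up for this example. By default it only prints the commands, so it can be previewed safely.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# enable_offload MODE (hypothetical name)
#   MODE = "sw" -> software flow offload only
#   MODE = "hw" -> software + hardware flow offload
enable_offload() {
    run uci set "firewall.@defaults[0].flow_offloading=1"
    if [ "$1" = "hw" ]; then
        run uci set "firewall.@defaults[0].flow_offloading_hw=1"
    else
        run uci set "firewall.@defaults[0].flow_offloading_hw=0"
    fi
    run uci commit firewall
    run /etc/init.d/firewall restart
}

enable_offload "${1:-sw}"
```

Running the script with `APPLY=1` and the argument `hw` on the router would apply the software + HW variant; setting `flow_offloading` back to 0 turns offloading off again, which is the quickest way to check whether PPTP comes back.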

Flow offload does not work with MWAN3 on the Octeon SoC.

Has anyone seen this? It looks like gwlim was able to get the qca-sdk module included, which allows you to enable hardware NAT.

Will be pointless once it's on kernel 4.14.

Yeah but it’s still on 4.9 isn’t it?

Take a look at my ath79 builds if you are interested in 4.14 builds for the ath79/ar71xx Qualcomm/Atheros devices.
Currently only a small subset of the old ar71xx devices is supported, but since ar71xx will stay on 4.9 and 4.14 supports flow offloading by default, the number of ported devices is increasing...

ath79 support recently got merged for the c7v2.

An image for the C7v2 has been included since yesterday...

FWIW, I'm using my own image because we were doing them at the same time.

I'm on a c7v2 and am noticing some strange interactions between flow offload and mwan3.
Anyone else multi-homed and playing with this yet?

Sadly, offloading without hardware support seems to make mwan3 slow or behave strangely. Is this your problem? I see this on ath79; with a MediaTek switch that can do hardware NAT, everything seems to work perfectly. Shortcut FE also worked.

Will it though? Even with flow offload you can't quite route 1Gbps lan to wan on ipq806x, whereas the stock firmware can.

The @gwlim work was on ar71xx, not ipq806x. ar71xx is not as crippled as ipq806x. I also have some patches in my tree that speed up the Ethernet driver there.

Flow offload gets the v2 to around 900 Mbps, I believe.

On the WDR3600 I reach ~930 Mbit/s, and this device has only one CPU port connected to the switch and not as much CPU power as the C7v2...

iperf NAT performance with flow offloading enabled on the WDR3600 (ath79):

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  3.24 GBytes   929 Mbits/sec

Stock firmware offloads to the switch, whereas OpenWrt does all connection handling on the CPU (except for HW flow offloading on MT7621).
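
As a quick sanity check on the iperf line above, the bandwidth can be recomputed from the transfer size and the interval. This is only a sketch: it assumes iperf counts GBytes as 1024^3 bytes and Mbits as 10^6 bits, and the `mbits_per_sec` function name is made up for this example. Because the displayed transfer is rounded to 3.24, the result lands slightly below the reported 929 Mbits/sec.

```shell
#!/bin/sh
# mbits_per_sec GBYTES SECONDS (hypothetical helper)
# Assumption: 1 GByte = 1024^3 bytes, 1 Mbit = 10^6 bits.
mbits_per_sec() {
    awk -v gb="$1" -v s="$2" \
        'BEGIN { printf "%.1f\n", gb * 1024 * 1024 * 1024 * 8 / s / 1000000 }'
}

# Values taken from the iperf output: 3.24 GBytes in 30.0 seconds.
mbits_per_sec 3.24 30   # prints 927.7
```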

I stand corrected :)

My patches may allow reaching 930.

That's my point :) HW flow offload, if it can be enabled cleanly for ipq806x, would be a good goal.

Now if Qualcomm actually did something useful and didn't stick to Kernel 3.x for their QSDK that might be a possibility without a lot of reverse engineering.

Is @gwlim planning on trying to bang out the C7v2 hardware flow offload now that there is a well running 4.14 kernel for it?

Just checking if you were able to find a solution to this.
I am facing the same problem.
Picked 18.06.0-rc1 for Octeon (ER-Lite-3) and mwan3 didn't work.
@lucize is mwan3 working with SW flow offload for you? I see you raised a case and then closed it.

@rakesh the thing is, there are unresolved problems with flow offload.
The first one was a kernel panic (solved, I think). Then the conntrack table grew so much that, in a mwan3 scenario, you could surf the net for only minutes before it filled up (solved, after which I closed the issue).
Then I used the hardware NAT option of the MT chipset, which seemed to work, but lately I was getting connection refused on many sites, and after disabling flow offload they would work, so for the moment I gave up, changed to a device with kernel 4.9, and use SFE.
Software flow offload takes too long for a connection to start (maybe a dnsmasq issue, but I added DNS servers to every interface).
I can't say much else. SFE is not patched for 4.14; maybe @dissent1 or @quarky would like to look into it (@quarky's version works better).
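
For anyone trying to tell whether the conntrack table is filling up as described above, a rough check is to compare the current entry count with the configured maximum. The sketch below reads the usual `nf_conntrack_count` and `nf_conntrack_max` sysctls under `/proc`; the `ct_usage` helper name is made up for this example, and the script just reports if the files are missing.

```shell
#!/bin/sh
# ct_usage COUNT MAX (hypothetical helper)
# Prints the integer percentage of the conntrack table in use.
ct_usage() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

COUNT_FILE=/proc/sys/net/netfilter/nf_conntrack_count
MAX_FILE=/proc/sys/net/netfilter/nf_conntrack_max

if [ -r "$COUNT_FILE" ] && [ -r "$MAX_FILE" ]; then
    count=$(cat "$COUNT_FILE")
    max=$(cat "$MAX_FILE")
    echo "conntrack: $count / $max entries ($(ct_usage "$count" "$max")%)"
else
    echo "conntrack sysctls not available on this system"
fi
```

Running it a few times while traffic flows over both uplinks should show whether the percentage keeps climbing toward 100.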

I'll try the shorewall way and see how it goes

Software flow offloading works fine with my "hand-written" iptables+iproute2 load balancing/policy based routing rules. I don't use mwan3 so I dunno how it works.

I used the following targets:

-m conntrack --ctstate NEW -m statistic --mode nth --every x --packet y -j MARK --set-mark 0xabc
-m conntrack --ctstate NEW -j CONNMARK --save-mark
-m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
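
To show how these targets might fit together, here is a sketch that generates a complete rule set for N uplinks. Several things in it are assumptions and not taken from the post: the rules go into mangle/PREROUTING, they are limited to traffic entering on `br-lan`, the marks are 0x1..0xN, and the routing tables are numbered from 101. The `gen_rules` name is made up for this example. The script only prints the commands, so nothing is changed until the output is reviewed and run by hand.

```shell
#!/bin/sh
# Assumed LAN interface; OpenWrt's default bridge is br-lan.
LAN_IF="${LAN_IF:-br-lan}"

# gen_rules N (hypothetical helper)
# Prints iptables and ip rule commands that spread NEW connections
# round-robin over N uplinks using the statistic match in nth mode.
gen_rules() {
    n="$1"
    base="iptables -t mangle -A PREROUTING -i $LAN_IF -m conntrack"

    # One MARK rule per uplink. MARK does not terminate the chain, so every
    # NEW packet passes all of these rules and their counters stay in step.
    i=0
    while [ "$i" -lt "$n" ]; do
        mark=$(printf '0x%x' "$((i + 1))")
        echo "$base --ctstate NEW -m statistic --mode nth --every $n --packet $i -j MARK --set-mark $mark"
        i=$((i + 1))
    done

    # Store the chosen mark on the connection, then restore it for later packets.
    echo "$base --ctstate NEW -j CONNMARK --save-mark"
    echo "$base --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark"

    # Select a routing table per mark. Each table still needs its own
    # default route via that uplink's gateway, which is not generated here.
    i=0
    while [ "$i" -lt "$n" ]; do
        mark=$(printf '0x%x' "$((i + 1))")
        echo "ip rule add fwmark $mark table $((101 + i))"
        i=$((i + 1))
    done
}

gen_rules "${1:-2}"
```

With two uplinks this yields two statistic rules (`--every 2 --packet 0` and `--every 2 --packet 1`), so new connections alternate between marks 0x1 and 0x2, and every later packet of a connection keeps the mark saved with it.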