Netfilter "Flow offload" / HW NAT


#162

Is @gwlim planning on trying to bang out the C7v2 hardware flow offload now that there is a well running 4.14 kernel for it?


#163

Just checking if you were able to find a solution to this.
I am facing the same problem.
Picked 18.06.0-rc1 for Octeon (ER-Lite-3) and mwan3 didn't work.
@lucize is mwan3 working with SW flow offload for you? I see you raised a case and then closed it.


#164

@rakesh the thing is that there are unresolved problems with flow offload.
The first one was a kernel panic (solved, I think); then the conntrack table was growing so much that, in an mwan3 scenario, you could only surf the net for minutes before it filled up (solved; after that I closed the issue).
Then I used the hardware NAT option of the MT chipset, which seemed to work, but lately I was getting "connection refused" on many sites, and they would work again after disabling flow offload, so for the moment I gave up and changed to a device with kernel 4.9 and use SFE.
With software flow offload, connections take too long to start (maybe a dnsmasq issue, but I added DNS servers to every interface).
Much else I can't say. SFE is not patched for 4.14; maybe @dissent1 or @quarky would like to look into it (@quarky's version works better).

I'll try the shorewall way and see how it goes
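The growing-conntrack-table symptom described above can at least be bounded while the underlying offload bug is chased. A stopgap sketch (the value 65536 is illustrative; pick one that fits your RAM):

```shell
# Stopgap only: raise the conntrack ceiling while the offload
# bug is investigated (the value here is illustrative).
sysctl -w net.netfilter.nf_conntrack_max=65536
# On OpenWrt, persist it across reboots:
echo 'net.netfilter.nf_conntrack_max=65536' >> /etc/sysctl.conf
```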


#165

Software flow offloading works fine with my "hand-written" iptables+iproute2 load balancing/policy based routing rules. I don't use mwan3 so I dunno how it works.

I used the following targets:

-m conntrack --ctstate NEW -m statistic --mode nth --every x --packet y -j MARK --set-mark 0xabc
-m conntrack --ctstate NEW -j CONNMARK --save-mark
-m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark
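For context, a minimal sketch of how those targets can be combined with the iproute2 side for two uplinks. All names and numbers below (br-lan, wan1/wan2, marks 0x1/0x2, tables 101/102, the gateway addresses) are illustrative, not taken from the post above:

```shell
# Mark every 2nd NEW connection for uplink 1, the rest for uplink 2,
# then persist the mark on the conntrack entry.
iptables -t mangle -A PREROUTING -i br-lan -m conntrack --ctstate NEW \
  -m statistic --mode nth --every 2 --packet 0 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -i br-lan -m conntrack --ctstate NEW \
  -m mark --mark 0 -j MARK --set-mark 0x2
iptables -t mangle -A PREROUTING -m conntrack --ctstate NEW \
  -j CONNMARK --save-mark
iptables -t mangle -A PREROUTING -m conntrack --ctstate RELATED,ESTABLISHED \
  -j CONNMARK --restore-mark

# iproute2 side: route each mark through its own table.
ip rule add fwmark 0x1 table 101
ip rule add fwmark 0x2 table 102
ip route add default via 192.0.2.1 dev wan1 table 101
ip route add default via 198.51.100.1 dev wan2 table 102
```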

#166

Hi, @nbd, I'm testing OpenWrt 18.06-rc1 (x86-64) on a PC Engines APU2C4 system and I also see a huge number of lingering TCP connections (over 17 000, after about 4 hours) with software flow offloading enabled, when we usually peak at less than 2000 (this is a small company with about 20 total wired/wireless clients). I'll probably do some tests at home with 18.06-rc1, on one of my Turris Omnias, next weekend. Is this issue still being looked into? I can provide more details about my setups, if required. Thanks in advance!
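One way to see what flow offload is doing to the table is to count offloaded entries; on 4.14 they carry an `[OFFLOAD]` flag in `/proc/net/nf_conntrack`. The two-entry sample below is fabricated to show the idea; on the router you would read the real file instead:

```shell
# Fabricated sample in roughly the /proc/net/nf_conntrack format;
# on the router, replace the variable with the real file's contents.
sample='ipv4     2 tcp      6 src=192.168.1.10 dst=93.184.216.34 [OFFLOAD] mark=0
ipv4     2 udp     17 src=192.168.1.11 dst=8.8.8.8 mark=0'
printf '%s\n' "$sample" | grep -c '\[OFFLOAD\]'   # → 1
```

On a real system: `grep -c '\[OFFLOAD\]' /proc/net/nf_conntrack`, compared against `/proc/sys/net/netfilter/nf_conntrack_count`.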


#167

See this trunk commit and check whether it is committed to 18.06-rc1: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=68ab89854fede80ab6a4279204462d6b898a653f


#168

Yes, it is, that's why I asked if the issue is still being looked into.


#169

Offloading works well with mwan3 for me, but it has a problem with WireGuard: some sites such as inoreader.com load very slowly, and RTSP over TCP is broken (UDP connections are not affected). I have to bypass the WireGuard device to fix it temporarily.

iptables -I FORWARD -o wireguard -j ACCEPT
iptables -I FORWARD -i wireguard -j ACCEPT

#170

It seems to be a WireGuard issue; other VPN clients such as openconnect work fine.


#171

After some more thorough testing on my Omnia, at home, I came to the conclusion that, as far as I can tell, this problem only occurs when both SQM (I only tested cake) and software flow offloading are enabled. I know flow offloading is an experimental feature, but I was expecting suboptimal performance rather than a showstopper. I guess I'll just file flow offloading under "don't do that", for the time being. :sweat_smile:


#172

Does anyone know what the impact of flow offloading is on bufferbloat? On my Turris Omnia (mvebu) and my Netgear R7800, I'm getting higher ping during transfers on DSLReports' benchmark.


#173

Strange behavior with the latest 18.06.1 OpenWrt; can't they use the LEDE code for that?

https://forum.openwrt.org/t/xiaomi-mi-wifi-router-3g-hw-nat/19703


#174

Any idea what's wrong? Random reboots seem to occur after enabling HW NAT.
Unfortunately I couldn't find exact steps to reproduce.
Router: Mi Router 3G (MT7621)
Firmware: OpenWrt SNAPSHOT r8021-9e58c20

[ 9226.062793] Unhandled kernel unaligned access[#1]:
[ 9226.067594] CPU: 3 PID: 72 Comm: kworker/3:1 Not tainted 4.14.68 #0
[ 9226.073883] Workqueue: events_long nf_ct_kill_acct [nf_conntrack]
[ 9226.079952] task: 8fd712c0 task.stack: 8fdea000
[ 9226.084457] $ 0   : 00000000 00000001 8e222d98 000d904e
[ 9226.089667] $ 4   : 8f0f348c 00000000 0f55bc3d 00000001
[ 9226.094880] $ 8   : 00000000 00007c00 811ca500 0001225e
[ 9226.100092] $12   : 00000000 00000000 ffffffff 00000764
[ 9226.105306] $16   : 8f0f348c 8e9b7000 8e9b7000 00000000
[ 9226.110519] $20   : 000007da 805a0000 00000019 8f0d0000
[ 9226.115729] $24   : 00000000 8f0c06bc
[ 9226.120938] $28   : 8fdea000 8fdebdc0 805c1760 8f0f0864
[ 9226.126149] Hi    : 00000b32
[ 9226.129011] Lo    : 76457000
[ 9226.131889] epc   : 8f0f086c nf_ct_nat_ext_add+0x218/0x928 [nf_nat]
[ 9226.138129] ra    : 8f0f0864 nf_ct_nat_ext_add+0x210/0x928 [nf_nat]
[ 9226.144363] Status: 11007c03 KERNEL EXL IE
[ 9226.148533] Cause : 40800014 (ExcCode 05)
[ 9226.152517] BadVA : 000d904e
[ 9226.155382] PrId  : 0001992f (MIPS 1004Kc)
[ 9226.159453] Modules linked in: pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time
[ 9226.227867] Process kworker/3:1 (pid: 72, threadinfo=8fdea000, task=8fd712c0, tls=00000000)
[ 9226.236177] Stack : 00000000 000007da 805a0000 00000019 8e9b7000 8f0d0208 8f0d021c 8f0caf28
[ 9226.244516]         8e3a82b0 00000010 00000000 8e9b7000 8e9b7000 8e9b7040 00000013 8f0c066c
[ 9226.252857]         805fedc0 805a01b8 8123fdc0 00000001 8e9b7000 8f0c1c30 8fd715d8 80490000
[ 9226.261193]         805a0000 80056a98 8f0d0000 8f0d0000 00000020 00000002 8f0d01a4 8f0cce54
[ 9226.269530]         81242a00 8fd712c0 8f0d01a4 8fd0f180 8123fa40 81242a00 00000000 00000000
[ 9226.277864]         ...
[ 9226.280314] Call Trace:
[ 9226.282764] [<8f0f086c>] nf_ct_nat_ext_add+0x218/0x928 [nf_nat]
[ 9226.288675] Code: 02002025  8e23007c  8e220078 <ac620000> 10400002  00000000  ac430004  24020200  ae22007c
[ 9226.298400]
[ 9226.300064] ---[ end trace bdd2cad862bd9103 ]---

#175

I am going to try again; maybe someone knows how to fix this:

With SW (or HW) flow offload enabled, the PPTP client behind the router stops working.

The proper NAT helper module is loaded. Maybe the NAT helper modules are not compatible with flow offload?

Any help would be appreciated.
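Not a fix, but two checks worth doing. The module names below are the upstream 4.14 helper modules, and the bypass rules simply reuse the pattern reported for WireGuard earlier in this thread; whether `forwarding_rule` is traversed before the offload rule on your build is an assumption to verify:

```shell
# 1) Confirm the PPTP/GRE helpers are really loaded.
lsmod | grep -E 'pptp|proto_gre'
# Expected: nf_nat_pptp, nf_conntrack_pptp,
#           nf_nat_proto_gre, nf_conntrack_proto_gre

# 2) Experiment: accept PPTP control (TCP/1723) and GRE early,
#    so those flows never reach the offload rule.
iptables -I forwarding_rule -p tcp --dport 1723 -j ACCEPT
iptables -I forwarding_rule -p gre -j ACCEPT
```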


#176

Are you still having issues with the combination of mwan3 + wireguard + flow offload? Everything is extremely slow for me on a Xiaomi R3G with offload enabled (latest snapshot).


#177

Yes, the problem still exists. I use the following commands to make the WireGuard interface bypass flow offload.

iptables -A forwarding_rule -i wg -j ACCEPT
iptables -A forwarding_rule -o wg -j ACCEPT

I'm not using mwan3, so I don't know whether it has compatibility issues with flow offload.


#178

In my case all traffic goes through wireguard, so there is no need to have flow offload enabled at all then?


#179

How do I check whether the router is doing hardware flow offloading?
I built master for a Buffalo WZR-HP-G300NH (ar71xx); iperf was doing 350 Mbit/s without offload and 500 Mbit/s with offload. I expected more, so I suspect it is doing software offloading.
Thank you
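One rough way to check, assuming the fw3/iptables setup that 18.06 and master use, where offloading is installed as a FLOWOFFLOAD rule in the filter FORWARD chain:

```shell
# "--hw" on the rule means hardware offload was requested; the
# pkts/bytes counters show whether the rule is actually matching.
iptables -t filter -vnL FORWARD | grep FLOWOFFLOAD
```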


#180

Currently, HW offloading is only supported in ramips/mt7621.


#181

If disabling flow offload solves your problem, just do it.