Netfilter "Flow offload" / HW NAT

I have a MT7621 device with hardware offloading support, which is working well (0% CPU usage) but SQM doesn't work then. I only really need SQM on egress, so I would like to enable software offloading (which does work with SQM) on that and hardware offloading on ingress. Is that possible? I tried disabling "Hardware flow offloading" by default and then adding a "forwarding_rule" rule to enable it on ingress but had no luck. This is what I tried:

iptables -A forwarding_rule -i eth0.2 -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw

I also tried this:

iptables -A forwarding_rule -o br-lan -m conntrack --ctstate RELATED,ESTABLISHED -j FLOWOFFLOAD --hw

Is there something else I should try? I'm using OpenWrt SNAPSHOT r11025-14054e2982.

Thanks.

No. A connection is bidirectional (i.e. duplex), so a flow is either offloaded for both directions or not at all; you can't offload only ingress and leave egress to SQM.
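For reference, offloading in OpenWrt is normally toggled globally rather than per-direction. A minimal sketch of the relevant /etc/config/firewall defaults section (option names as used by fw3; flow_offloading enables the software path, adding flow_offloading_hw requests the hardware path):

config defaults
	option flow_offloading '1'
	option flow_offloading_hw '1'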

Did anyone try the latest trunk? (flow offload fix)
It still doesn't work for me.

It seems that MT7621's HW offload breaks the network under the latest snapshot. I can ping from LAN to WAN, but cannot open websites.

It happens under the latest snapshot and the 19.07 snapshot; 18.06.4 works well.

Resurrecting this old thread because it's nearly impossible to figure out the current status (wiki page needed?).

My RBM33G (MT7621) takes advantage of flow offloading on OpenWrt 19.07.4, r11208-ce6496d796; I get 10-15% more performance (eyeballed, no exact methodology, no benchmarks). What about other platforms (if any)?

ar71xx - ???
ath79 - ???
mipsel - ???
...

mt7621: minor improvements, and using VLANs with HW offload enabled breaks things.

So it's not really usable; plus, the latest OpenWrt on kernel 5.4 currently has no HW offload.

Really?! I didn't test properly, but I noticed 10-15% fewer soft IRQs (using top) when flow offloading was enabled. I couldn't notice any difference between SW and HW offloading, but something was happening.

Ehm, is it a definitive feature drop, or a temporary lack of flow offloading given the kernel version bump?

ar71xx/ath79: with software flow offload, a TP-Link Archer C7 goes from 250 Mbps to 700 Mbps, but all instances of software-based flow offload currently break long-lived idle TCP connections. The timeout is for some reason always 120s (and ignores the net.netfilter.nf_conntrack_tcp_timeout_established sysctl value); see "Software flow offloading and conntrack timeouts" for a more detailed report.
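A quick way to observe this on an affected device (assuming the conntrack proc interface is available):

# the sysctl that offloaded flows appear to ignore
sysctl net.netfilter.nf_conntrack_tcp_timeout_established
# watch the timeout column of idle established flows count down
grep ESTABLISHED /proc/net/nf_conntrack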


It was added back to work with DSA. On MT7622 it can do 940 Mbps NAT over PPPoE at 0% CPU load.
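To confirm flows are actually being offloaded, rather than inferring it from CPU load alone, a rough check (assuming a kernel recent enough to flag offloaded conntrack entries):

# offloaded connections are marked with [OFFLOAD]
grep -c OFFLOAD /proc/net/nf_conntrack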

"On MT7622" - on which router exactly?

Tested on a Banana Pi R64.

I don't think MT7622 is a good measure of whether flow offload is working better or not;
the ARM Cortex-A53 is too fast a CPU to detect much change in performance.

I am testing flow offload on MT7621 and it seems to be stuck at 700-800 Mbps.

It's definitely visible in sirq.

Without flow offload:

CPU: 0% usr 0% sys 0% nic 56% idle 0% io 0% irq 43% sirq

With flow offload:

CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq

How did you get that? Zero sirq!?

EDIT: the following tests aren't useful, as I was running iperf on the router itself. They only show that SW offloading is working; they can't say anything about HW offloading.

My RBM33G sees some benefit but still spends most of its CPU time in sirq and sys when HW offloading is enabled:

Mem: 67660K used, 186060K free, 1544K shrd, 1732K buff, 16228K cached
CPU:   0% usr  25% sys   0% nic  37% idle   0% io   0% irq  36% sirq
Load average: 1.18 0.40 0.13 4/87 18703
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
18694 18685 root     R     1148   0%  25% iperf3 -s

[ 5] 0.00-94.00 sec 7.99 GBytes 730 Mbits/sec 4 sender

SW offload gives better throughput and higher CPU usage:

Mem: 54752K used, 198968K free, 236K shrd, 1588K buff, 10140K cached
CPU:   1% usr  25% sys   0% nic  29% idle   0% io   0% irq  43% sirq
Load average: 0.94 0.50 0.21 3/93 3205
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 3205  3196 root     R     1136   0%  25% iperf3 -s

[ 5] 0.00-205.90 sec 21.5 GBytes 898 Mbits/sec 70 sender

No offload gives the worst throughput and CPU usage:

Mem: 55576K used, 198144K free, 236K shrd, 1588K buff, 10104K cached
CPU:   0% usr  26% sys   0% nic  43% idle   0% io   0% irq  29% sirq
Load average: 0.72 0.40 0.19 4/93 3225
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 3215  3207 root     R     1132   0%  25% iperf3 -s

[ 5] 0.00-255.19 sec 20.7 GBytes 697 Mbits/sec 79 sender

Note: OpenWrt 19.07.4, r11208-ce6496d796. I rebooted between tests. The iperf client is connected via gigabit Ethernet directly to one of the RBM33G's Ethernet ports; the 3 ports are separated using VLANs.

Do not run iperf3 on your router. Flow offloading does not work that way.
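In other words, generate traffic through the router, not on it. A minimal sketch, with 192.168.2.10 as a hypothetical host on the WAN side:

# on the WAN-side host
iperf3 -s
# on a LAN client, so the stream is routed/NATed through the device under test
iperf3 -c 192.168.2.10 -t 60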


His router has an ARM Cortex-A53 core; our MT7621ATs are MIPS 1004Kc.
I am getting the same numbers as you, but I have seen better numbers on MT7621 before this, where 1 Gbps throughput worked, so I am not sure what caused the regression.
I run jperf as a standalone client/server.

BTW, MIPS1004Kc? I'm using MIPS24Kc binaries! Should I switch to MIPS1004Kc?

It doesn't matter. OpenWrt labels them MIPS24Kc, but the binaries are MIPS32R2, so they are compatible.
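If in doubt, you can check which package architecture your device is running, e.g.:

# architectures opkg will accept on this device
opkg print-architecture
# or read the release metadata
. /etc/openwrt_release && echo "$DISTRIB_ARCH"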