Netfilter "Flow offload" / HW NAT

@cwbsw

Can I bypass flow offload with a similar rule for PPTP? If yes, can you give me an example? I have an mt7621-based router and would like to use the offload capability, but it breaks PPTP for clients behind the router.

Thanks!

I'm afraid I can't help you; I have no experience with PPTP configuration.

I have a notebook running Manjaro behind my x86 OpenWrt router. If I enable flow offload, remote SSH and SCP access (logging in through the WAN, sending a file via SCP from the notebook to my office PC) becomes very slow (50 KB/s).
If I run a speedtest.net test on the notebook, the result is an upload speed of 25 Mbps; the HTTP server on the notebook performs fine, and SCP transfers from the office PC to the notebook run at normal speed too.
If I disable flow offload, the SSH and SCP speed goes back to 1.8 MB/s, matching the max upload speed of 25 Mbps.

Is anyone seeing the same? I've had to disable flow offload for now.

Yes, it's the same for me, but my link runs over WireGuard; if I exclude WireGuard from flow offload, everything is OK.
In /etc/firewall.user, something like:
iptables -A forwarding_rule -i wg0 -j ACCEPT
iptables -A forwarding_rule -o wg0 -j ACCEPT

Hey guys, joining the offload testing party with a similar report to the last few posts - slow upload, but under slightly different conditions.

I use the mt7621 platform with software + HW offloading enabled. My ISP uses PPPoE.

Right after the firewall restarts I get nice speedtest results, ~150 Mbit downlink, ~20 Mbit uplink. After a while, though, bad things start happening to the uplink - it basically stalls after ~400 KB. This is how it looks in speedtest:

Testing upload speed..........
Upload: 0.46 Mbit/s

And now another interesting observation - I also saw transfers stalling on scp sends (similar to what @Cye3s described above; of course I'm pretty sure this is protocol agnostic, it's just easy for me to scp). An interesting thing happened - I left a transfer "hanging" for a while and it resumed properly after 2-3 minutes, then kept working (even for new connections), while speedtest still stalls.

I'm definitely not exceeding nf_conntrack_max; during my tests the value varied from 200 to 400 out of the 16k max.

I remember seeing something similar on an mwan3 setup with the Qualcomm offload implementation that went viral about a year ago. The problem there was that it offloaded properly for only one of the broadband links (let's call it WAN1). When I sent something through WAN2 and exceeded the offloading threshold (i.e. offloading got enabled on a given connection), it hung in exactly the same way and sent subsequent packets out WAN1.

I'll keep digging into this and will take a look at the sources as well to understand the mechanism better. I have a feeling that in certain cases, once offloading kicks in, the transfer stalls for some reason.

PS. While I was writing this post the scp problems started happening again; leaving it for a while to confirm that it will un-stall:

0% 2112KB 736.3KB/s - stalled

PS2. Yes, it did un-stall after ~20 seconds.

Replying to myself.

Delaying offload a bit seems to help: forwarding_rule is traversed before the FLOWOFFLOAD rule in the FORWARD chain, so an ACCEPT there keeps the first 10 KB of every connection on the normal netfilter path. After I added the following line to /etc/firewall.user, it has been stable for over an hour:

iptables -A forwarding_rule -m conntrack --ctstate RELATED,ESTABLISHED -m connbytes --connbytes 0:10240 --connbytes-dir both --connbytes-mode bytes -j ACCEPT

@nbd

Before I create an issue, I will try one more time:

With SW (or HW) flow offload enabled, the PPTP client behind the router stops working.

The proper NAT helper module is loaded.

Any help would be appreciated. I am a bit surprised that I am the only one noticing this.
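
What I plan to try next, adapting the WireGuard workaround from earlier in the thread - an untested sketch that keeps the PPTP control channel (TCP port 1723) and the GRE tunnel traffic off the offload path, so the conntrack helper still sees every packet:

iptables -A forwarding_rule -p tcp --dport 1723 -j ACCEPT
iptables -A forwarding_rule -p gre -j ACCEPT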

Hi all,

I built a firmware image yesterday from snapshot for my Archer C7 v2 with my usual config (some extra packages for VPN, statistics, DDNS - nothing extraordinary).
Now I'm totally grateful that the offload kmod is selected by default. All I had to do was enable it in LuCI.
It is a huge improvement over the previous ~300 Mbps WAN throughput.
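
For reference, the same can be done from the command line - a sketch assuming the 18.06+ firewall defaults section:

uci set firewall.@defaults[0].flow_offloading='1'       # software flow offload
# uci set firewall.@defaults[0].flow_offloading_hw='1'  # HW offload, only where supported (e.g. mt7621)
uci commit firewall
/etc/init.d/firewall restart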

I've run some tests with iperf3:
iperf host: (H)
iperf client: (C)
data flow: ->
Reverse test (iperf3 -R): (R)

1	lan eth (H) -> lan eth (C)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  17.5 GBytes   837 Mbits/sec    0             sender
[  5]   0.00-180.00 sec  17.5 GBytes   837 Mbits/sec                  receiver
CPU Utilization: local/sender 1.0% (0.0%u/1.0%s), remote/receiver 14.7% (1.6%u/13.1%s)
snd_tcp_congestion cubic

1(R)	lan eth (C) -> lan eth (H)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  19.6 GBytes   935 Mbits/sec    0             sender
[  5]   0.00-180.00 sec  19.6 GBytes   934 Mbits/sec                  receiver
CPU Utilization: local/receiver 25.5% (2.3%u/23.2%s), remote/sender 2.7% (0.1%u/2.6%s)
rcv_tcp_congestion cubic

2	wan (H) -> lan eth (C)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  14.2 GBytes   676 Mbits/sec  4120             sender
[  5]   0.00-180.00 sec  14.2 GBytes   676 Mbits/sec                  receiver
CPU Utilization: local/sender 3.8% (0.1%u/3.6%s), remote/receiver 11.7% (1.1%u/10.6%s)
snd_tcp_congestion cubic

2(R)	lan eth (C) -> wan (H)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  17.5 GBytes   834 Mbits/sec  4114             sender
[  5]   0.00-180.00 sec  17.5 GBytes   834 Mbits/sec                  receiver
CPU Utilization: local/receiver 28.9% (2.4%u/26.4%s), remote/sender 3.3% (0.0%u/3.2%s)
rcv_tcp_congestion cubic

3	lan eth (H) -> lan wifi (C)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  8.03 GBytes   383 Mbits/sec    1             sender
[  5]   0.00-180.00 sec  8.03 GBytes   383 Mbits/sec                  receiver
CPU Utilization: local/sender 2.1% (0.1%u/2.0%s), remote/receiver 12.2% (1.2%u/11.0%s)
snd_tcp_congestion cubic

3(R)	lan wifi (C) -> lan eth (H)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  8.38 GBytes   400 Mbits/sec  2936             sender
[  5]   0.00-180.00 sec  8.37 GBytes   400 Mbits/sec                  receiver
CPU Utilization: local/receiver 14.5% (1.1%u/13.4%s), remote/sender 1.5% (0.0%u/1.4%s)
rcv_tcp_congestion cubic

4	wan (H) -> lan wifi (C)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  5.68 GBytes   271 Mbits/sec    0             sender
[  5]   0.00-180.00 sec  5.68 GBytes   271 Mbits/sec                  receiver
CPU Utilization: local/sender 1.7% (0.1%u/1.6%s), remote/receiver 5.8% (0.6%u/5.2%s)
snd_tcp_congestion cubic

4(R)	lan wifi (C) -> wan (H)
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-180.00 sec  6.86 GBytes   327 Mbits/sec  2370             sender
[  5]   0.00-180.00 sec  6.86 GBytes   327 Mbits/sec                  receiver
CPU Utilization: local/receiver 13.8% (1.0%u/12.8%s), remote/sender 1.1% (0.0%u/1.1%s)
rcv_tcp_congestion cubic

Really, thank you for this improvement!

Hi,

Is there any prerequisite for this on a TP-Link Archer C7 running the latest LEDE firmware (18.x)? Or do I simply put this in the user firewall config and offload will be enabled? I suppose the HW option is not available for this model.

Thanks

Seems like IPSec still doesn't work with flow offloading. Here's how I disable offloading for that while keeping it on for regular traffic:

iptables -A forwarding_rule -m policy --pol ipsec --dir out -j zone_vpn_forward
iptables -A forwarding_rule -m policy --pol ipsec --dir in -j zone_vpn_forward

I've defined my VPN subnets as zone 'vpn' in UCI. The trick here is to skip past the FLOWOFFLOAD rule, which comes directly after forwarding_rule:

OpenWrt:~# iptables -v -n -L FORWARD
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
18568 2985K forwarding_rule  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* !fw3: Custom forwarding rule chain */
15163 2360K FLOWOFFLOAD  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* !fw3: Traffic offloading */ ctstate RELATED,ESTABLISHED FLOWOFFLOAD
15163 2360K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED /* !fw3 */
    4   256 zone_vpn_forward  all  --  *      *       10.1.1.0/24          0.0.0.0/0            /* !fw3 */
    0     0 zone_vpn_forward  all  --  *      *       10.1.2.0/24          0.0.0.0/0            /* !fw3 */
    0     0 zone_vpn_forward  all  --  *      *       10.1.3.0/24          0.0.0.0/0            /* !fw3 */
    0     0 zone_tor_forward  all  --  *      *       192.168.2.0/24       0.0.0.0/0            /* !fw3 */
 3229  569K zone_lan_forward  all  --  br-lan *       0.0.0.0/0            0.0.0.0/0            /* !fw3 */
  172 55554 zone_wan_forward  all  --  eth0.2 *       0.0.0.0/0            0.0.0.0/0            /* !fw3 */
  100 51610 reject     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* !fw3 */

Some of my connections were stalling too with SW NAT offload (WDR-3600), but not anymore - the rule you mentioned made it work as expected. Thanks!

Happy to hear that! Out of curiosity - are you using any kind of PPPoE connection? I assume not, since you use SW offload, but I'm still confused as to why only a few of us complain about the stalls.

Another thing that comes to mind is the MTU and MSS clamping - do you use those, maybe?


Basically it is an FTTB connection, but it is encapsulated in PPPoE.
I was able to reach ~200M without offloading, and it's around 460M with it.

I didn't modify anything regarding the MTUs or MSS clamping, but I checked these and I can tell that the issue is caused by an MTU mismatch:
the PPPoE link has a 1492 MTU, while the rest runs on 1500.
By default (I don't know, maybe I set it unintentionally previously) I had only an ingress MSS clamp (WAN =>), but after enabling it in the reverse direction (LAN => WAN) as well, it works like a charm (no need for the connbytes rule).
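
For anyone who wants the equivalent in raw iptables (e.g. from /etc/firewall.user), a sketch of clamping in both directions - assuming the PPPoE WAN interface is named pppoe-wan:

# Clamp the TCP MSS to the path MTU on SYN packets leaving and entering the PPPoE link
iptables -t mangle -A FORWARD -o pppoe-wan -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
iptables -t mangle -A FORWARD -i pppoe-wan -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu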

Thanks for sharing your thoughts!

Hey, this is great! I think it lets us assume the problem is somewhere in the PPPoE- or MTU-related code (a lower MTU is normal on PPPoE), and moreover in its software-related part - I say so because I use hardware offload.

I'll start my PTO on the 19th and will try to find some time to dig into what's causing those stalls :).

PPPoE has an 8-byte encapsulation overhead, so all WAN links using PPPoE will have their MTU configured as 1492 if the Ethernet MTU is 1500. What you're encountering is the classic PMTU issue. Many ISPs block ICMP replies, which causes the problem. MSS clamping at the router side is typically the solution, but this only works for TCP connections, not UDP.

Don’t think there’s any issue with the networking code.


Right, it seems you're correct. After you posted, I recalled that I once disabled clamping and observed similar behavior.

There is one thing that puzzles me, though. FLOWOFFLOAD is applied only to RELATED,ESTABLISHED connections, which means it applies only to connections for which packets in both directions have been observed. MSS negotiation happens at connection setup (SYN/ACK), so clamping should already have taken place and offload should work as expected.

Unless it falls into some weird TCP/netfilter optimization that I'm not aware of.

From what I understand of all the flow acceleration/offload techniques, be they software or hardware, they kick in after netfilter/conntrack has successfully processed the initial traffic in both directions and deemed it safe. These are the ESTABLISHED, RELATED connections.

The offload engine keeps track of such connections. When it sees packets belonging to them, it sends them straight out the destination network interface, bypassing the netfilter stack, which is deemed unnecessary at that point. This is what 'accelerates' the routing.
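
A quick way to see this in action: offloaded connections are flagged in the conntrack table. Assuming /proc conntrack support is compiled in, something like:

grep OFFLOAD /proc/net/nf_conntrack
# or, if the conntrack-tools package is installed and recent enough:
conntrack -L | grep OFFLOAD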

As for the MTU: yes, the MSS is exchanged during connection establishment. When a client with a larger MTU behind a router with a smaller WAN MTU sends outgoing traffic, the router clamps the MSS in the connection-establishment packets, thereby reducing the effective MTU for that connection.

This issue is especially problematic for IPv6, as IPv6 routers don't fragment packets in transit.

All that you wrote makes sense and I understand it. What I don't understand is why the problem occurs for new connections, since the FLOWOFFLOAD rule is applied only to RELATED,ESTABLISHED connections, i.e. only after the MSS negotiation during the TCP three-way handshake.

I modified my firewall rules to pass only the first three incoming/outgoing packets of each connection before the offload mechanism kicks in. Let's see if this change affects the traffic.

In case you need help debugging this, I'm glad to help - just give me some instructions :slight_smile:

So @quarky is absolutely right that it looks like PMTU issues, which makes sense given that we're both using PPPoE. Nevertheless, once again, I don't understand why it breaks at all, since offloading is applied only after the connection parameters have been negotiated.

Can you try replacing the rule I suggested previously with these two? The difference is that they skip offloading only for the first three incoming and first three outgoing packets. It's not a big difference from a computational point of view - more out of curiosity.

iptables -A forwarding_rule -m conntrack --ctstate RELATED,ESTABLISHED -p tcp -m connbytes --connbytes 0:3 --connbytes-dir original --connbytes-mode packets -j ACCEPT
iptables -A forwarding_rule -m conntrack --ctstate RELATED,ESTABLISHED -p tcp -m connbytes --connbytes 0:3 --connbytes-dir reply --connbytes-mode packets -j ACCEPT