flow_offloading=1 is broken on the latest snapshot (4.19 issue)

I’ve had a similar issue on an EdgeRouter X.

It turned out that, for whatever reason, the WAN port eth0 was negotiating at 100 Mb/s instead of 1 Gb/s.

I thought it was related to flow offloading as well, until I finally decided to check. Making eth4 the WAN port resolved it for me.
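To see what a port actually negotiated, you can read the `Speed:` line from `ethtool`. A minimal sketch, assuming `ethtool` is installed (it is not in every image) and that eth0 is the WAN port as on the EdgeRouter X above; `link_speed` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the negotiated speed from ethtool output read on stdin.
link_speed() {
	# ethtool prints lines like "	Speed: 1000Mb/s"; keep only the value after "Speed: ".
	awk -F': ' '/^[[:space:]]*Speed:/ { print $2 }'
}

# Usage on the router (eth0 assumed to be the WAN port):
#   ethtool eth0 | link_speed
```

A result of 100Mb/s on a gigabit link points at cabling or negotiation rather than at flow offloading.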

I'll test this right now! I doubt it is the reason, though, because I get speeds over 300 Mbps with offloading off.
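For on/off comparisons like this, software flow offloading can be toggled through UCI. A minimal sketch, assuming the stock firewall config where the option sits in the first `defaults` section (`firewall.@defaults[0].flow_offloading`); `set_flow_offload` is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: enable (1) or disable (0) software flow offloading via UCI.
set_flow_offload() {
	case "$1" in
		0|1) ;;
		*) echo "usage: set_flow_offload 0|1" >&2; return 2 ;;
	esac
	# The option lives in the firewall's "defaults" section; quote it so the
	# shell does not try to glob the [0] index.
	uci set "firewall.@defaults[0].flow_offloading=$1" &&
		uci commit firewall
}
```

After `set_flow_offload 0` or `set_flow_offload 1`, restart the firewall (`/etc/init.d/firewall restart`) so the change takes effect, then rerun the same speed test.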

I noticed the same on my Archer C7 v2, with top speeds around 300 Mbps. I tried using the old ar71xx snapshots as a workaround, but rpcd appears to be broken on those, so LuCI does not work. (Offloading does work, though, so for anyone who doesn't need LuCI this might be a temporary solution.)

As I like LuCI for its ease of use, I also switched to an ath79 4.14 image. This also gave me an excuse to try building my own image :wink:

Should I create a Flyspray account and add to the bug report, so it is clear that this is not an isolated problem affecting just one user?

FS has a voting system, so I guess it would help if you voted for the bug!

I don't think interface speed is the cause:

```
root@router:~# for IFACE in $(ip -br link show  | awk '{ print $1 }' | grep -v lo); do ethtool $IFACE; done
Settings for eth0:
	Supported ports: [ ]
	Supported link modes:   1000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Current message level: 0x000000ff (255)
			       drv probe link timer ifdown ifup rx_err tx_err
	Link detected: yes
Settings for eth1:
	Supported ports: [ TP AUI BNC MII FIBRE ]
	Supported link modes:   1000baseT/Half 1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  1000baseT/Full 
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: No
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Current message level: 0x000000ff (255)
			       drv probe link timer ifdown ifup rx_err tx_err
	Link detected: yes
Settings for br-lan:
	Link detected: yes
Settings for eth1.1@eth1:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
Settings for eth0.2@eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
Settings for wlan0:
	Link detected: yes
```
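The `No such device` errors for `eth1.1@eth1` and `eth0.2@eth0` come from the loop itself: `ip -br link show` prints VLAN interfaces with an `@parent` suffix, which `ethtool` does not accept as a device name. Also, `grep -v lo` drops any name containing "lo", not just the loopback. A sketch of a variant that strips the suffix and skips only `lo` (`iface_names` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: read `ip -br link show` output on stdin and print
# interface names without the "@parent" suffix, skipping only the loopback.
iface_names() {
	awk '{
		name = $1
		sub(/@.*/, "", name)   # "eth1.1@eth1" -> "eth1.1"
		if (name != "lo") print name
	}'
}

# Usage on the router:
#   for IFACE in $(ip -br link show | iface_names); do ethtool "$IFACE"; done
```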

I guess it does not.


Identified cause: the commit "netfilter: nft_flow_offload: fix interaction with vrf slave device".

I've been looking at the commit here: https://lkml.org/lkml/2019/2/12/1545 and wonder whether ft->iifidx and ft->oifidx got accidentally swapped.

Before the change, ft->iifidx was route->tuple[dir].ifindex; after the change, it is other_dst->dev->ifindex, where other_dst corresponds to route->tuple[!dir].dst.

So where it used route->tuple[dir] before the change, it now uses route->tuple[!dir], the reverse direction. ft->oifidx changed the other way round.

Maybe someone can test a build with these two swapped (swap dst with other_dst) and see if it restores functionality. I can't test it myself because I don't have the hardware at hand.


I'm ready to test, but I'd appreciate it if somebody prepared a branch I could use.

I had the same thing in mind; however, swapping dst with other_dst did not work, or there were more places that needed the change than the ones where I swapped them.

However, simply swapping ft->iifidx and ft->oifidx restores flow offload to a working state.
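To confirm that flows are actually being offloaded after such a change, one approach is to count conntrack entries carrying the offload flag. This assumes offloaded entries are marked `[OFFLOAD]` in `/proc/net/nf_conntrack` (which needs conntrack procfs support in the build); check that against your own image. `count_offloaded` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: count conntrack lines (read on stdin) flagged as offloaded.
count_offloaded() {
	# grep -c prints 0 and exits 1 when nothing matches; don't treat that as an error.
	grep -c '\[OFFLOAD\]' || true
}

# Usage on the router, while a large download is running:
#   count_offloaded < /proc/net/nf_conntrack
```

A count that stays at 0 under sustained traffic suggests flows are still going through the slow path.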

Interesting. If this simple swap is indeed the fix, then an OpenWrt patch should be fairly easy :slight_smile:

By the way, trunk is now at 4.19.62, and I see a number of flow_offload patches in 4.19.58 (https://lkml.org/lkml/2019/7/8/729, https://lkml.org/lkml/2019/7/8/586 and https://lkml.org/lkml/2019/7/8/790). However, judging by the code, they do not seem to fix this issue.

We should probably invite someone from the kernel team to this thread, or resend the summary as an email to LKML.

Kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=204507


I've made a simple patch based on this and opened PR https://github.com/openwrt/openwrt/pull/2266, but, as expected, they claim their approach is correct and that we should adjust the OpenWrt offload patches...

I wonder which other devices use the flow offloading feature besides OpenWrt routers, and whether the regression can be reproduced on them as well.

I recall this from the kernel bump back then, which took a few weeks to handle properly, especially this exact part.

I'll try to take a look at this ASAP, after ath10k-ct.

@psyborg I took your patch and added it to my own build. Software flow offloading is working fine again with a recent 4.19 trunk build :slight_smile:


So is FO working again for all builds?

Nope, that would've been communicated in the various bug reports.
If you have Archer C7 v2, you can use my build from here: https://github.com/MOZGIII/archer-c7-v2-builds/tree/manual-4.19-2019-08-08-1
See instructions here: https://github.com/MOZGIII/archer-c7-v2-builds/blob/master/README.md

Do you have a patch or set of patches to enable s/w flow offload for the Archer C7 v2?

(It wasn't clear from your GitHub repos where the changes were.)

I assume it's a build with PR 2266 applied.
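For anyone who wants to reproduce such a build: GitHub serves a pull request as a mailbox-style patch at the PR URL with `.patch` appended, which `git am` can apply to an OpenWrt checkout. A minimal sketch, assuming `curl` and `git` are available and that the PR still applies cleanly to your tree; `pr_patch_url` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: GitHub patch URL for an openwrt/openwrt pull request.
pr_patch_url() {
	printf 'https://github.com/openwrt/openwrt/pull/%s.patch\n' "$1"
}

# Usage, from the root of an openwrt.git checkout:
#   ./scripts/feeds update -a && ./scripts/feeds install -a
#   curl -fsSL "$(pr_patch_url 2266)" | git am
#   make menuconfig && make -j"$(nproc)"
```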
