Possible Kernel 5.10 regression issue with MT7621 and SW/HW offload enabled

Not yet. :slight_smile:

BusyBox v1.35.0 (2022-02-14 13:40:34 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt SNAPSHOT, r18809-5a0975f7ef
 -----------------------------------------------------
 | Machine: D-Link DIR-860L B1                       |
 | Uptime: 1d, 02:27:51                              |
 | Load: 0.14 0.10 0.02                              |
 | Flash: total: 5.4MB, free: 5.1MB, used: 5%        |
 | Memory: total: 117.4MB, free: 81.8MB, used: 30%   |
 | WAN: xx.xx.xx.xx, proto: pppoe                    |
 | LAN: 192.168.xx.xx, leases: xx                    |
 -----------------------------------------------------

My use case is pretty average: I use it as an AP and only use a small part of the 1000/300 bandwidth.

With 19.07.8 I had a few random reboots per month, although the last run reached 50 days of uptime before I upgraded to test the fix from here.


All ok:

 08:37:10 up 6 days, 52 min,  load average: 0.21, 0.79, 0.55

@xabolcs, sorry about the newbie and OT question: how do you get this status from the CLI? Is it a custom script you wrote, or is there a command to check this status? Thanks!

It's: sysinfo

I tried @cezary's easyconfig once and borrowed it. :wink:
You can install (get) it from cezary's packages feed!

It's a nice addition to motd, I like it too!

Build SNAPSHOT r18792-337e942290 just completed 6 days up and running with firewall4 and HW flow offload enabled on an Archer C6 v3.2! I just saved this build and marked it as stable in my image library!

OpenWrt SNAPSHOT r18812-918d4ab41e

Working fine:

  • hw offload
  • wifi 2.4ghz
  • wifi 5ghz

My last problem was wifi mcu timeout

Snapshot r18878-a93dfff10e is good so far on my main ER-X gateway. I'm just using SW offload (with fq_codel/simple.qos SQM). Packet steering enabled, irqbalance enabled, adblock running, 4 VLANs (no PPPoE needed with my ISP) and two APs wired to it.

I can't claim 6 days of runtime LOL...more like 11 hours, but that includes a full work day of video and voice conferencing with the family doing whatever (streaming, games, etc.) in the background.

System and kernel logs look clean. No reboots. Everything working fine.

I can also report snapshot r18896-2fd049f5cd with:
net: ethernet: mtk_eth_soc: add ipv6 flow offload support
has been working fabulously for a day now.

Same use as above, though I did also briefly test HW offload on a high throughput download and CPU usage dropped to essentially zero, as expected.

I also did some careful testing with fq_codel/simple.qos on a hardwired device and am easily getting 250 Mbps SQM throughput (I dialed it back from the high 200's), which I think is pretty phenomenal for a MT7621AT ER-X. SW offload, packet steering and irqbalance are enabled.

CAKE/layer_cake maxes out around 100-110 Mbps. With the ER-X as the server, iperf3 network throughput is ~490 Mbps, and in reverse (-R) it's ~800 Mbps. Traffic through the ER-X hits line rate (~930 Mbps), no surprise there.

Thank You devs for the continued work on more "post DSA" finishing touches and restoring offloading on this target!

With offload enabled on today's snapshot, my gigabit FTTH line only reaches 1000 Mb down / 400 Mb up; with 19.07 it reaches 1000/1000. Going back to 19.07 or to a lean fork restores the normal values.

PS: DSA, nftables, IPv6... too many changes make it unstable for me, so I'll stick with the old releases for a few more years :slight_smile:

Please send me your setup so I can test it. I have built a server at home for testing with netperf.

Tested on a Redmi AC2100, with nftables and with iptables, with IPv6 disabled or enabled: iperf3 shows 400 Mb max upload.

OK. I have a Xiaomi R3G v1. I'm building firmware for the test now.

Does your ISP use

  • PPPoE (user and password), or
  • IPoE (DHCP)?
  • PPPoE User and password

Hmm, I need to create a PPPoE server to simulate your network.

My setup is IPoE.

I'll run several tests and send feedback.
When I finish, I'll post the results to the forum.
Thanks for the info.

HW flow offloading seems to work now with PPPoE (0% CPU load with it enabled during speed tests). However, connections are randomly timing out for me for some reason, and random pages fail to load. I'm running OpenWrt SNAPSHOT r18944-aae7af4219. It looks as if MSS clamping were not working correctly (same symptoms, but probably completely unrelated). Disabling HW flow offloading (going back to SW flow offloading) fixes the issue.
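
For readers unfamiliar with the MSS remark, here is a rough sketch of the arithmetic that MSS clamping is meant to enforce on a PPPoE link. It is not a diagnosis of this problem. It assumes a standard 1500-byte Ethernet MTU and TCP/IP headers without options; the function names are made up for illustration.

```c
#include <assert.h>

/* Fixed header sizes in bytes (no IP or TCP options). */
enum {
    ETH_PAYLOAD_MTU = 1500, /* assumed standard Ethernet MTU */
    PPPOE_OVERHEAD  = 8,    /* 6-byte PPPoE header + 2-byte PPP protocol ID */
    IPV4_HDR        = 20,
    IPV6_HDR        = 40,
    TCP_HDR         = 20
};

/* MTU left for IP packets on a PPPoE link running over plain Ethernet. */
int pppoe_mtu(void)
{
    return ETH_PAYLOAD_MTU - PPPOE_OVERHEAD;
}

/* Largest TCP segment payload that fits into one packet of the given MTU. */
int tcp_mss_for_mtu(int mtu, int ip_hdr_len)
{
    assert(mtu > ip_hdr_len + TCP_HDR);
    return mtu - ip_hdr_len - TCP_HDR;
}
```

Under these assumptions `tcp_mss_for_mtu(pppoe_mtu(), IPV4_HDR)` gives 1452 and the IPv6 case gives 1432. If a client behind the router keeps advertising the plain-Ethernet value of 1460 and nothing clamps it, full-size segments do not fit the PPPoE link, which typically shows up as some pages loading and others stalling.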

Confirmed. When copying files from an HDD connected to the router:

[ 4215.953773] ------------[ cut here ]------------
[ 4215.958415] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:467 0x806deba4
[ 4215.965448] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 4215.972373] Modules linked in: ksmbd(O) nf_nat_amanda nf_conntrack_amanda nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet wireguard snd_usb_audio rndis_host nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_queue nft_objref nft_numgen nft_nat nft_meta_bridge nft_masq nft_log nft_limit nft_hash nft_fwd_netdev nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_dup_netdev nft_ct nft_counter nft_compat nft_chain_nat nf_tables nf_conntrack_netlink mt7615e(O) mt7615_common(O) mt76_connac_lib(O) mt76(O) mac80211(O) lz4 libblake2s ipt_REJECT cfg80211(O) cdc_ether xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG vhci_hcd usbnet usbip_host usbip_core usbhid ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda ts_kmp ts_fsm ts_bm snd_usbmidi_lib sch_cake ntfs3(O) nfnetlink_queue nfnetlink nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic
[ 4215.972669]  nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_log_ipv4 nf_flow_table nf_dup_netdev nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_broadcast nf_conntrack_bridge macremapper(O) lz4_decompress lz4_compress libchacha20poly1305 libblake2s_generic iptable_mangle iptable_filter ip_tables hwmon hid_generic crc_ccitt compat(O) br_netfilter fuse cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sg hid evdev ledtrig_gpio cryptodev(O) nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 nfsd bonding ip6_gre ip_gre gre ifb nat46(O) sit ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel ipcomp esp4 ah4 ip6_tunnel tunnel6 tunnel4 rpcsec_gss_krb5 auth_rpcgss oid_registry tun snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer
[ 4216.060245]  snd soundcore ovpn_dco(O) xfrm_user xfrm_ipcomp af_key xfrm_algo lockd sunrpc grace isofs hfsplus hfs cdrom autofs4 dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax nls_utf8 nls_koi8_r nls_cp866 nls_cp1251 zram algif_skcipher algif_rng algif_hash algif_aead af_alg echainiv arc4 leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd ahci libahci libata gpio_button_hotplug(O) xfs f2fs ext4 mbcache jbd2 btrfs xor raid6_pq mii tpm rng_core crc32_generic
[ 4216.188511] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O      5.10.101 #0
[ 4216.195969] Stack : 00000000 00000000 80c40000 8237d040 00000000 00000000 00000000 00000000
[ 4216.204310]         00000000 00000000 00000000 00000000 00000000 00000001 8140fd18 80aa9114
[ 4216.212643]         8140fdb0 00000000 00000000 8140fbd8 00000038 8040ce84 ffffffea 00000000
[ 4216.220982]         8140fbc8 0000076b 80a45108 ffffffff 8140fcf8 808b823c 00000000 808c0000
[ 4216.229320]         00000009 00000002 ffffffff 8237d040 00000000 804a82ac 00000008 80c20008
[ 4216.237660]         ...
[ 4216.240099] Call Trace:
[ 4216.240204] [<8040ce84>] 0x8040ce84
[ 4216.246114] [<804a82ac>] 0x804a82ac
[ 4216.249585] [<8000a184>] 0x8000a184
[ 4216.253055] [<8000a18c>] 0x8000a18c
[ 4216.256532] [<803f0fc4>] 0x803f0fc4
[ 4216.260002] [<8008afd8>] 0x8008afd8
[ 4216.263471] [<8008afd8>] 0x8008afd8
[ 4216.266939] [<806deba4>] 0x806deba4
[ 4216.270407] [<800355cc>] 0x800355cc
[ 4216.273885] [<806deba4>] 0x806deba4
[ 4216.277356] [<800356c4>] 0x800356c4
[ 4216.280845] [<806deba4>] 0x806deba4
[ 4216.284351] [<843dc9cc>] 0x843dc9cc [mac80211@8f9e4cae+0x745b0]
[ 4216.290240] [<806de8a0>] 0x806de8a0
[ 4216.293712] [<806de8a0>] 0x806de8a0
[ 4216.297208] [<800ac528>] 0x800ac528
[ 4216.300682] [<800af7d8>] 0x800af7d8
[ 4216.304152] [<800ac7c4>] 0x800ac7c4
[ 4216.307623] [<806428ec>] 0x806428ec
[ 4216.311093] [<80093040>] 0x80093040
[ 4216.314571] [<800ac9e0>] 0x800ac9e0
[ 4216.318042] [<8008d98c>] 0x8008d98c
[ 4216.321512] [<8089f884>] 0x8089f884
[ 4216.324980] [<8040f090>] 0x8040f090
[ 4216.328453] [<8003a3c8>] 0x8003a3c8
[ 4216.331923] [<8040e8cc>] 0x8040e8cc
[ 4216.335401] [<80003c68>] 0x80003c68

[ 4216.340373] ---[ end trace 0f8d976d6b60321e ]---
[ 4216.344993] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 4216.351493] mtk_soc_eth 1e100000.ethernet eth0: Link is Down
[ 4216.384005] mtk_soc_eth 1e100000.ethernet eth0: configuring for fixed/trgmii link mode
[ 4216.392220] mtk_soc_eth 1e100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

Also, as I said before, something is wrong with FTP. Copying from the HDD is very slow and produces transmit timed-out errors. Uploading to a remote FTP server is buggy too.

P.S. Buggy VLAN patchset can be fixed with this patch.

I experienced this with Windows 10, but not with Linux (Ubuntu 20.04) with my r18809-5a0975f7ef build.
My build doesn't contain the IPv6 offloading support which hit master on Feb 19, 2022 as commit e316664.

:thinking:

Just checked: I didn't have IPv6 under Linux, but I did on Windows.

Try disabling IPv6 (or upgrade to a recent build, like me :sweat_smile: )!

I am running OpenWrt SNAPSHOT r18944-aae7af4219, so that should have that patch already included.

Archer C6 v3.2, IPv6 disabled here, main router. SW/HW offload enabled.

SNAPSHOT r18880-e9e61d76fd (from Feb 18th) running rock solid, uptime 15 days and counting (or at least until I have a power failure, since it is not connected to a UPS).

With the radios disabled it is stable.

I compared our offloading code with the BPI-R2 5.12 kernel code and ported the changes over to ours.

--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -142,6 +142,29 @@
 }
 
 static int
+mtk_flow_mangle_ipv6(const struct flow_action_entry *act,
+		     struct mtk_flow_data *data)
+{
+	__be32 *dest = NULL;
+	size_t offset_of_ip6_daddr = offsetof(struct ipv6hdr, daddr);
+	size_t offset_of_ip6_saddr = offsetof(struct ipv6hdr, saddr);
+	u32 idx;
+
+	if (act->mangle.offset >= offset_of_ip6_saddr && act->mangle.offset < offset_of_ip6_daddr) {
+		idx = (act->mangle.offset - offset_of_ip6_saddr) / 4;
+		dest = &data->v6.src_addr.s6_addr32[idx];
+	} else if (act->mangle.offset >= offset_of_ip6_daddr &&
+		   act->mangle.offset < offset_of_ip6_daddr + 16) {
+		idx = (act->mangle.offset - offset_of_ip6_daddr) / 4;
+		dest = &data->v6.dst_addr.s6_addr32[idx];
+	}
+	if (dest)
+		memcpy(dest, &act->mangle.val, sizeof(u32));
+
+	return 0;
+}
+
+static int
 mtk_flow_get_dsa_port(struct net_device **dev)
 {
 #if IS_ENABLED(CONFIG_NET_DSA)
@@ -306,6 +324,7 @@
 		mtk_flow_set_ipv4_addr(&foe, &data, false);
 	}
 
+
 	if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
 		struct flow_match_ipv6_addrs addrs;
 
@@ -329,6 +348,9 @@
 		case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
 			err = mtk_flow_mangle_ipv4(act, &data);
 			break;
+		case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
+			err = mtk_flow_mangle_ipv6(act, &data);
+			break;
 		case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 			/* handled earlier */
 			break;

Now the router is rock solid. I'm using static IPv4 and IPv6 over 6in4.
I'm not sure these changes are the correct fix, but there are no more kernel panics.
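
For anyone trying to follow the offset arithmetic in mtk_flow_mangle_ipv6() above, here is a small userspace sketch of the same mapping. It is only an illustration: the struct is a stand-in for the kernel's struct ipv6hdr, and the names ip6_hdr_sketch, addr_field and classify_mangle_offset are hypothetical, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the fixed 40-byte IPv6 header. */
struct ip6_hdr_sketch {
    uint8_t  ver_tc_flow[4]; /* version, traffic class, flow label */
    uint16_t payload_len;
    uint8_t  nexthdr;
    uint8_t  hop_limit;
    uint8_t  saddr[16];      /* starts at byte 8 */
    uint8_t  daddr[16];      /* starts at byte 24 */
};

enum addr_field { ADDR_NONE, ADDR_SRC, ADDR_DST };

/*
 * Map a byte offset inside the IPv6 header to the address it falls into
 * and to the index of the 32-bit word within that address (0..3).
 * Offsets outside both addresses return ADDR_NONE and leave *word_idx alone.
 */
enum addr_field classify_mangle_offset(size_t offset, unsigned *word_idx)
{
    const size_t src = offsetof(struct ip6_hdr_sketch, saddr);
    const size_t dst = offsetof(struct ip6_hdr_sketch, daddr);

    assert(word_idx != NULL);

    if (offset >= src && offset < dst) {
        *word_idx = (unsigned)((offset - src) / 4);
        return ADDR_SRC;
    }
    if (offset >= dst && offset < dst + 16) {
        *word_idx = (unsigned)((offset - dst) / 4);
        return ADDR_DST;
    }
    return ADDR_NONE;
}
```

A mangle action at offset 8 lands in word 0 of the source address and one at offset 36 lands in word 3 of the destination address. Anything outside bytes 8 to 39 is ignored, which matches the patch leaving dest unset in that case.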
