Possible Kernel 5.10 regression issue with MT7621 and SW/HW offload enabled

OK. I have a Xiaomi R3G v1. I am building firmware now to test.

Which does your ISP use?

  • PPPoE (user and password)
  • IPoE (DHCP)
  • PPPoE User and password

Hmm, I need to set up a PPPoE server to simulate your network.

My setup is IPoE.

I will run several tests and send feedback.
When I finish, I will post the results to the forum.
Thanks for the info.


HW Flow Offloading seems to work now with PPPoE (0% CPU load with it enabled during speedtests). However, connections are randomly timing out for me for some reason. I am running OpenWrt SNAPSHOT r18944-aae7af4219. It looks as if MSS clamping were not working correctly (same symptoms, but probably completely unrelated): random pages fail to load. Disabling HW Flow Offloading (going back to SW Flow Offloading) fixes the issue.
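For anyone who wants to rule the MSS theory in or out, here is a rough sketch of the arithmetic that correct clamping should produce on a PPPoE link. It assumes a standard 1500-byte Ethernet MTU and no IP or TCP options; the function names are made up for illustration and are not taken from OpenWrt or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Header sizes in bytes; IP and TCP options are assumed to be absent. */
enum {
    PPPOE_OVERHEAD = 8,   /* 6-byte PPPoE header + 2-byte PPP protocol ID */
    IPV4_HDR       = 20,
    IPV6_HDR       = 40,
    TCP_HDR        = 20
};

/* MTU left for IP once PPPoE encapsulation sits on top of Ethernet.
 * Returns 0 if the Ethernet MTU is too small to carry PPPoE at all. */
uint16_t pppoe_mtu(uint16_t eth_mtu)
{
    return eth_mtu > PPPOE_OVERHEAD ? (uint16_t)(eth_mtu - PPPOE_OVERHEAD) : 0;
}

/* Largest TCP payload that fits in one unfragmented packet on a link
 * with the given MTU. Returns 0 if the MTU cannot hold the headers. */
uint16_t tcp_mss(uint16_t link_mtu, int is_ipv6)
{
    uint16_t hdrs = (uint16_t)((is_ipv6 ? IPV6_HDR : IPV4_HDR) + TCP_HDR);
    return link_mtu > hdrs ? (uint16_t)(link_mtu - hdrs) : 0;
}

/* Hypothetical self-check: the values to expect on a typical PPPoE line. */
void mss_sketch_selfcheck(void)
{
    assert(pppoe_mtu(1500) == 1492);
    assert(tcp_mss(pppoe_mtu(1500), 0) == 1452); /* IPv4 over PPPoE */
    assert(tcp_mss(pppoe_mtu(1500), 1) == 1432); /* IPv6 over PPPoE */
}
```

If a packet capture on the WAN side shows SYN packets leaving with an MSS larger than these values while HW offloading is on, that would point at clamping; if the MSS is already correct, the timeouts have another cause.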


Confirmed. This happens when copying files from an HDD connected to the router:

[ 4215.953773] ------------[ cut here ]------------
[ 4215.958415] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:467 0x806deba4
[ 4215.965448] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 4215.972373] Modules linked in: ksmbd(O) nf_nat_amanda nf_conntrack_amanda nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet wireguard snd_usb_audio rndis_host nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_queue nft_objref nft_numgen nft_nat nft_meta_bridge nft_masq nft_log nft_limit nft_hash nft_fwd_netdev nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_dup_netdev nft_ct nft_counter nft_compat nft_chain_nat nf_tables nf_conntrack_netlink mt7615e(O) mt7615_common(O) mt76_connac_lib(O) mt76(O) mac80211(O) lz4 libblake2s ipt_REJECT cfg80211(O) cdc_ether xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG vhci_hcd usbnet usbip_host usbip_core usbhid ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda ts_kmp ts_fsm ts_bm snd_usbmidi_lib sch_cake ntfs3(O) nfnetlink_queue nfnetlink nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic
[ 4215.972669]  nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_log_ipv4 nf_flow_table nf_dup_netdev nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_broadcast nf_conntrack_bridge macremapper(O) lz4_decompress lz4_compress libchacha20poly1305 libblake2s_generic iptable_mangle iptable_filter ip_tables hwmon hid_generic crc_ccitt compat(O) br_netfilter fuse cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sg hid evdev ledtrig_gpio cryptodev(O) nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 nfsd bonding ip6_gre ip_gre gre ifb nat46(O) sit ipcomp6 xfrm6_tunnel esp6 ah6 xfrm4_tunnel ipcomp esp4 ah4 ip6_tunnel tunnel6 tunnel4 rpcsec_gss_krb5 auth_rpcgss oid_registry tun snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer
[ 4216.060245]  snd soundcore ovpn_dco(O) xfrm_user xfrm_ipcomp af_key xfrm_algo lockd sunrpc grace isofs hfsplus hfs cdrom autofs4 dm_mirror dm_region_hash dm_log dm_crypt dm_mod dax nls_utf8 nls_koi8_r nls_cp866 nls_cp1251 zram algif_skcipher algif_rng algif_hash algif_aead af_alg echainiv arc4 leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd ahci libahci libata gpio_button_hotplug(O) xfs f2fs ext4 mbcache jbd2 btrfs xor raid6_pq mii tpm rng_core crc32_generic
[ 4216.188511] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O      5.10.101 #0
[ 4216.195969] Stack : 00000000 00000000 80c40000 8237d040 00000000 00000000 00000000 00000000
[ 4216.204310]         00000000 00000000 00000000 00000000 00000000 00000001 8140fd18 80aa9114
[ 4216.212643]         8140fdb0 00000000 00000000 8140fbd8 00000038 8040ce84 ffffffea 00000000
[ 4216.220982]         8140fbc8 0000076b 80a45108 ffffffff 8140fcf8 808b823c 00000000 808c0000
[ 4216.229320]         00000009 00000002 ffffffff 8237d040 00000000 804a82ac 00000008 80c20008
[ 4216.237660]         ...
[ 4216.240099] Call Trace:
[ 4216.240204] [<8040ce84>] 0x8040ce84
[ 4216.246114] [<804a82ac>] 0x804a82ac
[ 4216.249585] [<8000a184>] 0x8000a184
[ 4216.253055] [<8000a18c>] 0x8000a18c
[ 4216.256532] [<803f0fc4>] 0x803f0fc4
[ 4216.260002] [<8008afd8>] 0x8008afd8
[ 4216.263471] [<8008afd8>] 0x8008afd8
[ 4216.266939] [<806deba4>] 0x806deba4
[ 4216.270407] [<800355cc>] 0x800355cc
[ 4216.273885] [<806deba4>] 0x806deba4
[ 4216.277356] [<800356c4>] 0x800356c4
[ 4216.280845] [<806deba4>] 0x806deba4
[ 4216.284351] [<843dc9cc>] 0x843dc9cc [mac80211@8f9e4cae+0x745b0]
[ 4216.290240] [<806de8a0>] 0x806de8a0
[ 4216.293712] [<806de8a0>] 0x806de8a0
[ 4216.297208] [<800ac528>] 0x800ac528
[ 4216.300682] [<800af7d8>] 0x800af7d8
[ 4216.304152] [<800ac7c4>] 0x800ac7c4
[ 4216.307623] [<806428ec>] 0x806428ec
[ 4216.311093] [<80093040>] 0x80093040
[ 4216.314571] [<800ac9e0>] 0x800ac9e0
[ 4216.318042] [<8008d98c>] 0x8008d98c
[ 4216.321512] [<8089f884>] 0x8089f884
[ 4216.324980] [<8040f090>] 0x8040f090
[ 4216.328453] [<8003a3c8>] 0x8003a3c8
[ 4216.331923] [<8040e8cc>] 0x8040e8cc
[ 4216.335401] [<80003c68>] 0x80003c68

[ 4216.340373] ---[ end trace 0f8d976d6b60321e ]---
[ 4216.344993] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 4216.351493] mtk_soc_eth 1e100000.ethernet eth0: Link is Down
[ 4216.384005] mtk_soc_eth 1e100000.ethernet eth0: configuring for fixed/trgmii link mode
[ 4216.392220] mtk_soc_eth 1e100000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

Also, as I said before, something is wrong with FTP. Copying from the HDD is very slow and produces transmit timed-out errors. Uploading to a remote FTP server is buggy too.

P.S. The buggy VLAN patchset can be fixed with this patch.

I experienced this with Windows 10, but not with Linux (Ubuntu 20.04) with my r18809-5a0975f7ef build.
My build doesn't contain the IPv6 offloading support which hit master on Feb 19, 2022 as commit e316664.

:thinking:

Just checked: I didn't have IPv6 under Linux, but I did on Windows.

Try disabling IPv6 (or upgrade to a recent build, like me :sweat_smile: )!

I am running OpenWrt SNAPSHOT r18944-aae7af4219, so that should have that patch already included.

Archer C6 v3.2, IPv6 disabled here, main router. SW/HW offload enabled.

SNAPSHOT r18880-e9e61d76fd (from Feb 18th) is running rock solid, uptime 15 days and counting (or at least until I have a power failure, as it is not connected to a UPS).


With the radios disabled, it is stable.

I compared our offloading code with the BPI-R2 5.12 kernel code and ported the differences to ours.

--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -142,6 +142,29 @@
 }
 
 static int
+mtk_flow_mangle_ipv6(const struct flow_action_entry *act,
+		     struct mtk_flow_data *data)
+{
+	__be32 *dest = 0;
+	size_t offset_of_ip6_daddr = offsetof(struct ipv6hdr, daddr);
+	size_t offset_of_ip6_saddr = offsetof(struct ipv6hdr, saddr);
+	u32 idx;
+
+	if (act->mangle.offset >= offset_of_ip6_saddr && act->mangle.offset < offset_of_ip6_daddr) {
+		idx = (act->mangle.offset - offset_of_ip6_saddr) / 4;
+		dest = &data->v6.src_addr.s6_addr32[idx];
+	} else if (act->mangle.offset >= offset_of_ip6_daddr &&
+		   act->mangle.offset < offset_of_ip6_daddr + 16) {
+		idx = (act->mangle.offset - offset_of_ip6_daddr) / 4;
+		dest = &data->v6.dst_addr.s6_addr32[idx];
+	}
+	if (dest)
+		memcpy(dest, &act->mangle.val, sizeof(u32));
+
+	return 0;
+}
+
+static int
 mtk_flow_get_dsa_port(struct net_device **dev)
 {
 #if IS_ENABLED(CONFIG_NET_DSA)
@@ -306,6 +324,7 @@
 		mtk_flow_set_ipv4_addr(&foe, &data, false);
 	}
 
+
 	if (addr_type == FLOW_DISSECTOR_KEY_IPV6_ADDRS) {
 		struct flow_match_ipv6_addrs addrs;
 
@@ -329,6 +348,9 @@
 		case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
 			err = mtk_flow_mangle_ipv4(act, &data);
 			break;
+		case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
+			err = mtk_flow_mangle_ipv6(act, &data);
+			break;
 		case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 			/* handled earlier */
 			break;

Now the router is rock solid. I'm using static IPv4 and IPv6-6in4.
I'm not sure these changes are the correct fix, but there are no more kernel panics.
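To make the offset logic in mtk_flow_mangle_ipv6 easier to follow, here is a user-space sketch of the same mapping from a mangle offset to a 32-bit address word. It is only an illustration: the two structs are simplified stand-ins (hypothetical names) for the kernel's struct ipv6hdr and the driver's flow data, and unlike the patch, which always returns 0, this version returns whether a word was written so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct ipv6hdr: same 40-byte layout, simplified fields. */
struct ip6_hdr_sketch {
    uint8_t  ver_tc_flow[4];  /* version, traffic class, flow label */
    uint16_t payload_len;
    uint8_t  nexthdr;
    uint8_t  hop_limit;
    uint32_t saddr[4];        /* header bytes 8..23  */
    uint32_t daddr[4];        /* header bytes 24..39 */
};

/* Stand-in for the v6 part of the driver's flow data. */
struct flow_addrs_sketch {
    uint32_t src[4];
    uint32_t dst[4];
};

/* A mangle action rewrites one 32-bit word at a byte offset into the
 * IPv6 header. Map that offset to the matching source or destination
 * address word and store the new value there.
 * Returns 1 if a word was written, 0 if the offset is outside both
 * addresses (the original patch silently ignores that case). */
int mangle_ipv6_sketch(struct flow_addrs_sketch *data,
                       uint32_t offset, uint32_t val)
{
    const size_t s = offsetof(struct ip6_hdr_sketch, saddr);
    const size_t d = offsetof(struct ip6_hdr_sketch, daddr);
    uint32_t *dest = NULL;

    assert(data != NULL);

    if (offset >= s && offset < d)
        dest = &data->src[(offset - s) / 4];
    else if (offset >= d && offset < d + 16)
        dest = &data->dst[(offset - d) / 4];

    if (!dest)
        return 0;

    memcpy(dest, &val, sizeof(val));
    return 1;
}
```

For example, offset 8 lands in src[0] and offset 36 in dst[3], while offsets 4 and 40 fall outside both addresses and leave the data untouched.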


Hi,

On 23.05.4 it crashes/reboots multiple times a day.

Speeds are a nice 900/900+ over PPPoE, but it crashes too often. Yesterday it crashed 7 times.

This is my hardware:

Model	Ubiquiti EdgeRouter X
Architecture	MediaTek MT7621 ver:1 eco:3
Target Platform	ramips/mt7621
Firmware Version	OpenWrt 23.05.4 r24012-d8dd03c46f / LuCI openwrt-23.05 branch git-24.086.45142-09d5a38
Kernel Version	5.15.162

I don't have any debugging tools. I can access it over SSH or the web interface, but that is all; I can't take it apart and extract stuff.

It does not matter whether SW/HW offloading or packet steering is on or off; it crashes like there's no tomorrow.

What am I missing here?

On 22.x it crashed several times a month, but now with 23.05.4 it crashes at least 5 times every day.

Thermal issues?
Set fast.com to run for 300 s and start nice nice yes >/dev/null & once for each CPU core, then run the long test and check with your palm whether the top of the box feels hot.


I ran it (4 threads) for 60 minutes without any problem.
It is warm, but I can hold it without any problem.

No errors in dmesg

That, plus lengthy network work...