Possible Kernel 5.10 regression issue with MT7621 and SW/HW offload enabled

The same seems to be happening on MT7622 devices. We initially thought only MT7622 was affected, and now I find this thread telling me that it's the same on MT7621; most users probably just don't notice the occasional reboot, or blame it on the electricity grid or whatever.
See Belkin RT3200/Linksys E8450 WiFi AX discussion - #1375 by Mushoz for a full crash log (poisoned pointer dereference). Maybe someone has an idea where that poisoned pointer comes from (the hardware, or is it a use-after-free?) and what we should do to prevent a crash in this case.


Hardware flow offloading with the 5.4 kernel is not fully supported, as it does not work with PPPoE and VLANs. Kernel 5.10 restored that support, but introduced instability. Maybe that's the culprit?

@daniel @dsouza I'm using a firmware that was supposed to fix this issue. The version I have gives mixed results, but it points the way to a fix.

MT7621 - 21.02/Master feedback firmware image test - IPV6 offload and disabled Flow Control - For Developers - OpenWrt Forum


Well, I can say that my experience with kernel 5.10 has not been good.

With mt7621 and HW Offload enabled, I got random reboots. I am using only IPv4, so I have not tried the IPv6 patch above. I have four Archer C6 v3.2: one router plus 3 access points. All APs are running custom "slim" builds which do not include firewall, iptables, or dnsmasq. With the AP build, kernel 5.10 is more stable, so I believe the issue is related to firewall/iptables with kernel 5.10.

However, I also have a TP-Link RE305 v3 (mt7628) which is also running the "slim" build without firewall/iptables/dnsmasq. On this device with kernel 5.10, JFFS2 gets corrupted every day and OpenWrt fails in different ways because of the corrupted overlay file system.

So at least in my setup (four mt7621 devices and one mt7628 device), kernel 5.10 is really unstable and unreliable at this point. For now I have reverted all devices to the latest snapshot build with kernel 5.4 and everything is rock solid.

It is still in my plans to test Kernel 5.10 with firewall4/nftables with HW offload acceleration in the device running as router, but due to the reasons above I will roll back to previous builds with kernel 5.4 until snapshot builds are stable again.


I can give the 5.10 kernel with firewall4/nftables a try.
Currently snapshots have the 5.10.92 kernel, and an update should come soon. I'll just wait for it, maybe tomorrow.
I'll use a Netgear R6220 with SW/HW offload. I don't have IPv6 on the WAN side.


Well, I just tried a snapshot I built today and it bricked my device. I connected a UART cable; as you can see below, it is using kernel 5.10.92 and just hangs at "Starting kernel ..." with no further error.

I will now try to recover this device, and I will definitely stay away from kernel 5.10 on the Archer C6 v3 devices until it is included in a stable build... :slightly_frowning_face:

U-Boot 1.1.3 (May 13 2020 - 19:39:06)

Board: Ralink APSoC DRAM:  128 MB
relocate_code Pointer at: 87f58000

Config XHCI 40M PLL
flash manufacture id: c8, device id 40 18
find flash: GD25Q128C
*** Warning - bad CRC, using default environment

============================================
Ralink UBoot Version: 5.0.0.0
--------------------------------------------
ASIC MT7621A DualCore (MAC to MT7530 Mode)
DRAM_CONF_FROM: Auto-Detection
DRAM_TYPE: DDR3
DRAM bus: 16 bit
Xtal Mode=3 OCP Ratio=1/3
Flash component: SPI Flash
Date:May 13 2020  Time:19:39:06
============================================
THIS IS uboot
icache: sets:256, ways:4, linesz:32 ,total:32768
dcache: sets:256, ways:4, linesz:32 ,total:32768

 ##### The CPU freq = 880 MHZ ####
 estimate memory size =128 Mbytes

Press '4' or 't' to break the booting process

Press 'x' to enter recovery web server                                        0
nm_init:791
nm_initFwupPtnStruct:276
nm_lib_readPtnTable:738
[NM_Debug](nm_lib_readPtnTable) 00743: NM_PTN_TABLE_BASE = 0xfe0000
[NM_Debug](nm_lib_readPtnFromNvram) 00569: partition_used_len = 1054, requried len = 8192
[NM_Debug](nm_lib_readPtnTable) 00751: Reading Partition Table from NVRAM ... OK

[NM_Debug](nm_lib_readPtnTable) 00759: Parsing Partition Table ... OK

[NM_Debug](nm_lib_readPtnFromNvram) 00569: partition_used_len = 2, requried len = 2
factory boot check integer ok.


3: System Boot system code via Flash.
## Booting image at bc040000 ...
   Image Name:   MIPS OpenWrt Linux-5.10.92
   Image Type:   MIPS Linux Kernel Image (lzma compressed)
   Data Size:    2731273 Bytes =  2.6 MB
   Load Address: 82000000
   Entry Point:  82000000
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
No initrd
## Transferring control to Linux (at address 82000000) ...
## Giving linux memsize in MB, 128

Starting kernel ...



OK, recovery was easier than I expected (opening the device and connecting the UART was the "hard" part, the rest was easy).

After I successfully recovered the device, I wanted to be sure the issue was not with my build, so I downloaded today's snapshot image from the OpenWrt website. The results are the same as above. So Archer C6 v3 users (as well as A6 v3), be aware: as of January 30th, 2022, the snapshot builds will brick your device (and a UART connection will be required to recover it).

Lol (?) ...
OK, maybe I'll wait for the next snapshot. I flash a snapshot 2 or 3 times a week on the mt7621 and it has always worked. I know that snapshots are not guaranteed to work, but so far I've had no real issues. Today's snapshot is still based on 5.10.92. I'll wait for the next one, based on 5.10.95.

Notice that this issue might affect only my device (Archer C6 v3 and A6 v3) and not all mt7621 devices.

So even with kernel 5.10.92, which is problematic in my experience, you may have better luck with a different device.

BTW, due to a complete lack of error messages I am assuming the kernel is the issue (since it hung while loading), but it might be something else in this build.

Please continue the debate here (and maybe help by trying the possible fix) as this problem is not related to SW/HW offload (which is the topic of this thread):


I bricked my A6v3 2 days ago. Can you tell me how to recover via UART connection?

@AashishAS, let's continue the discussion about unbricking Archer C6/A6 v3.x in the new topic below:

I just tested a new snapshot build today (r18710-dc2da6a233, kernel 5.10.96). Things got worse:

  1. With the now default firewall4, enabling HW flow offload breaks the firewall and all connected devices lose internet connectivity. Basically firewall4 is not able to recognize the flag option flow_offloading_hw '1' in the firewall config file. More details here.

  2. Redoing the build with firewall3 selected instead, enabling HW flow offload now has no effect. I monitored the CPU usage (medium/high) during a heavy download, and with HW flow offload enabled the CPU usage is the same as with it disabled. With firewall3 and kernel 5.4, HW flow offload works perfectly and the CPU usage is minimal.
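
Regarding item 1, here is a minimal sketch for checking which offload options are actually present in the firewall config before blaming firewall4. It assumes the standard OpenWrt location /etc/config/firewall and the option names flow_offloading (software) and flow_offloading_hw (hardware); the helper name show_offload_opts is made up for illustration.

```shell
# Print the flow offloading options found in an OpenWrt firewall config file.
# Usage: show_offload_opts [config-file]   (defaults to /etc/config/firewall)
show_offload_opts() {
    cfg="${1:-/etc/config/firewall}"
    # Match lines such as:  option flow_offloading_hw '1'
    # then strip the leading indentation for readability.
    grep -E "^[[:space:]]*option[[:space:]]+flow_offloading(_hw)?[[:space:]]" "$cfg" \
        | sed -E "s/^[[:space:]]+//"
}
```

Calling show_offload_opts with no argument on the router lists whichever of the two options are set. This only shows what is in the file; it says nothing about whether the firewall actually honours them.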

Once again rolled back to snapshot r18324-794e8123ce (kernel 5.4.162), the last snapshot build that has HW offload working and stable.


Just tried on a R6220 with Feb3 OpenWrt SNAPSHOT r18717-0e32c6baf3 (kernel 5.10.96)
Firewall is firewall4 (nftables).

basic settings : 610 Mbit/s
SW offloading : 630 Mbit/s
SW/HW Offloading : 600 Mbit/s
All results are very close and offloading doesn't seem to be active.
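
Since the three numbers are so close, CPU load during the test is probably a better indicator than throughput (as noted earlier in the thread, working HW offload keeps CPU usage minimal). A rough sketch that compares two samples of /proc/stat follows. The helper names cpu_sample and cpu_busy_pct are made up, and it assumes the usual Linux field order on the aggregate cpu line (user nice system idle iowait irq softirq steal).

```shell
# Emit "<total jiffies> <idle+iowait jiffies>" from the aggregate "cpu" line.
# Optional argument: a file to read instead of /proc/stat.
cpu_sample() {
    awk '$1 == "cpu" {
        t = 0
        for (i = 2; i <= NF && i <= 9; i++) t += $i   # user .. steal
        print t, $5 + $6                              # idle + iowait
    }' "${1:-/proc/stat}"
}

# Compute the busy percentage between two samples.
# Args: total1 idle1 total2 idle2
cpu_busy_pct() {
    awk -v t1="$1" -v i1="$2" -v t2="$3" -v i2="$4" 'BEGIN {
        dt = t2 - t1
        if (dt <= 0) { print 0; exit }
        printf "%.1f\n", 100 * (dt - (i2 - i1)) / dt
    }'
}
```

On the router, something like a=$(cpu_sample); sleep 5; b=$(cpu_sample); cpu_busy_pct $a $b, run while the speed test is in progress, gives one number per offload setting to compare.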

When I click on status/firewall I get no response, and there is apparently no firewall process in system/processes.

BTW I have a symmetric 1 Gbit/s fiber line which I normally use with an x86 router.


You need to reboot the router now.

Hardware offload has many incompatibilities:

VLAN, STP, stats...

For the future, I recommend a powerful CPU without offload.

I'm waiting for something with 2.5GbE for now.

Are you sure that you followed the thread? :laughing:
HW offloading is implemented in mt7621. It was working with previous kernels (even early 5.10), so there is a regression in the current code. I'm just testing from time to time to help @dsouza by comparing with a different device from his.

My main router is an x86/64. I have this precisely for its CPU power and no offloading :wink:


Those were just general recommendations for everyone :laughing:

Current HW Offload in snapshot is broken here too


If you run grep OFFLOAD /proc/net/nf_conntrack do you see established connections that are supposedly offloaded at least?

Software offloaded conntrack entries should carry an [OFFLOAD] flag, hardware offloaded ones a [HW_OFFLOAD] flag.
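
To get a quick summary instead of scrolling through the raw list, here is a small sketch that counts entries by flag. The function name count_offload is made up; it reads /proc/net/nf_conntrack by default and relies only on the two flags described above.

```shell
# Count conntrack entries by offload state.
# Usage: count_offload [file]   (defaults to /proc/net/nf_conntrack)
count_offload() {
    src="${1:-/proc/net/nf_conntrack}"
    awk '
        /\[HW_OFFLOAD\]/ { hw++; next }   # hardware-offloaded flows
        /\[OFFLOAD\]/    { sw++; next }   # software-offloaded flows
                         { other++ }      # everything else
        END { printf "hw=%d sw=%d other=%d\n", hw, sw, other }
    ' "$src"
}
```

If hw stays at 0 during a long download with HW offload enabled, the flows are presumably not reaching the hardware path.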

Besides moving kernel from 5.4 to 5.10, the current snapshot builds also moved from firewall3 (iptables) to firewall4 (nftables).

HW offload for mt7621 is implemented only in iptables. But iptables with hardware acceleration has compatibility issues with Kernel 5.10 (current kernel version in snapshot, original post of this thread).

Bottom line is that mt7621 HW offload is currently broken in the snapshot builds, with no solution in the foreseeable future. Even with a build using kernel 5.10 and iptables, HW offload does not seem to work anymore.

For this reason I am using the last snapshot build with kernel 5.4 and iptables. With this configuration HW offload is running rock solid on an Archer C6 v3.2.

root@MI-R3P:~# grep OFFLOAD /proc/net/nf_conntrack
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a00:1450:4001:080e:0000:0000:0000:200e sport=45253 dport=443 packets=9 bytes=3850 src=2a00:1450:4001:080e:0000:0000:0000:200e dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=45253 packets=9 bytes=2190 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a03:b0c0:0003:00d0:0000:0000:168b:9001 sport=43944 dport=443 packets=25 bytes=3405 src=2a03:b0c0:0003:00d0:0000:0000:168b:9001 dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=43944 packets=29 bytes=20045 [OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.42 dst=142.251.39.14 sport=43619 dport=443 packets=2 bytes=459 src=142.251.39.14 dst=EXTERNAL_IP sport=443 dport=43619 packets=0 bytes=0 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a00:1450:4001:0810:0000:0000:0000:2004 sport=42176 dport=443 packets=18 bytes=4470 src=2a00:1450:4001:0810:0000:0000:0000:2004 dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=42176 packets=23 bytes=8309 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a00:1450:4001:0810:0000:0000:0000:2004 sport=42112 dport=443 packets=31 bytes=2969 src=2a00:1450:4001:0810:0000:0000:0000:2004 dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=42112 packets=34 bytes=64203 [OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.42 dst=142.250.186.97 sport=53099 dport=443 packets=14 bytes=3076 src=142.250.186.97 dst=EXTERNAL_IP sport=443 dport=53099 packets=1 bytes=60 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.42 dst=108.177.127.113 sport=42794 dport=443 packets=21 bytes=9218 src=108.177.127.113 dst=EXTERNAL_IP sport=443 dport=42794 packets=6 bytes=449 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.42 dst=216.58.212.142 sport=47108 dport=443 packets=12 bytes=2091 [UNREPLIED] src=216.58.212.142 dst=EXTERNAL_IP sport=443 dport=47108 packets=0 bytes=0 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 udp      17 src=192.168.1.51 dst=52.29.246.211 sport=60304 dport=8765 packets=0 bytes=0 src=52.29.246.211 dst=EXTERNAL_IP sport=8765 dport=60304 packets=0 bytes=0 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a00:1450:4001:0829:0000:0000:0000:2004 sport=44781 dport=443 packets=4 bytes=252 src=2a00:1450:4001:0829:0000:0000:0000:2004 dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=44781 packets=2 bytes=132 [OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.42 dst=142.251.39.14 sport=43763 dport=443 packets=2 bytes=104 src=142.251.39.14 dst=EXTERNAL_IP sport=443 dport=43763 packets=1 bytes=52 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 udp      17 src=192.168.1.60 dst=52.8.35.72 sport=16385 dport=1812 packets=0 bytes=0 [UNREPLIED] src=52.8.35.72 dst=EXTERNAL_IP sport=1812 dport=16385 packets=0 bytes=0 [HW_OFFLOAD] mark=0 zone=0 use=3
ipv4     2 tcp      6 src=192.168.1.20 dst=50.7.248.218 sport=2325 dport=82 packets=2 bytes=104 src=50.7.248.218 dst=EXTERNAL_IP sport=82 dport=2325 packets=1 bytes=60 [OFFLOAD] mark=0 zone=0 use=3
ipv6     10 tcp      6 src=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 dst=2a00:1450:4001:080e:0000:0000:0000:200e sport=45249 dport=443 packets=4 bytes=252 src=2a00:1450:4001:080e:0000:0000:0000:200e dst=2a01:00d0:e6c6:0000:aceb:d180:6f87:3a33 sport=443 dport=45249 packets=2 bytes=132 [OFFLOAD] mark=0 zone=0 use=3

My own build with nftables/firewall4. It has been working without issues the whole day.