MT7621 offloading reboots

Since I enabled offloading, I've been getting frequent reboots, so I'm guessing there is a bug.

This is the trace:

Oct 27 07:17:44 jpmhome-router kernel: [103108.711880] CPU 3 Unable to handle kernel paging request at virtual address 009ced3a, epc == 8ed84f10, ra == 8ed85270

Oct 27 07:17:44 jpmhome-router kernel: [103108.722575] Oops[#1]:

Oct 27 07:17:44 jpmhome-router kernel: [103108.724929] CPU: 3 PID: 26454 Comm: kworker/3:1 Not tainted 4.14.63 #0

Oct 27 07:17:44 jpmhome-router kernel: [103108.731525] Workqueue: events_power_efficient 0x8ed85724 [nf_flow_table@8ed84000+0x32b0]

Oct 27 07:17:44 jpmhome-router kernel: [103108.739676] task: 8fc83e80 task.stack: 8fc58000

Oct 27 07:17:44 jpmhome-router kernel: [103108.744277] $ 0 : 00000000 00000001 fffffff5 00000000

Oct 27 07:17:44 jpmhome-router kernel: [103108.749576] $ 4 : 8fc59e10 0000000f 00000000 ffff00fe

Oct 27 07:17:44 jpmhome-router kernel: [103108.754873] $ 8 : 8fc59fe0 00007c00 00005dc6 0024224f

Oct 27 07:17:44 jpmhome-router kernel: [103108.760168] $12 : 00000000 00000898 00000924 80520000

Oct 27 07:17:44 jpmhome-router kernel: [103108.765465] $16 : 8edf6ac0 009ced0c 814bea40 8edf6a58

Oct 27 07:17:44 jpmhome-router kernel: [103108.770768] $20 : 00000000 00000000 80520000 8053f3c0

Oct 27 07:17:44 jpmhome-router kernel: [103108.776071] $24 : 00000010 80058a3c

Oct 27 07:17:44 jpmhome-router kernel: [103108.781371] $28 : 8fc58000 8fc59df8 8ed80000 8ed85270

Oct 27 07:17:44 jpmhome-router kernel: [103108.786670] Hi : 0000223d

Oct 27 07:17:44 jpmhome-router kernel: [103108.789619] Lo : 00000001

Oct 27 07:17:44 jpmhome-router kernel: [103108.792607] epc : 8ed84f10 0x8ed84f10 [nf_flow_table@8ed84000+0x32b0]

Oct 27 07:17:44 jpmhome-router kernel: [103108.799279] ra : 8ed85270 0x8ed85270 [nf_flow_table@8ed84000+0x32b0]

Oct 27 07:17:44 jpmhome-router kernel: [103108.805944] Status: 11007c03#011KERNEL EXL IE

Oct 27 07:17:44 jpmhome-router kernel: [103108.810203] Cause : 40800008 (ExcCode 02)

Oct 27 07:17:44 jpmhome-router kernel: [103108.814276] BadVA : 009ced3a

Oct 27 07:17:44 jpmhome-router kernel: [103108.817226] PrId : 0001992f (MIPS 1004Kc)

Oct 27 07:17:44 jpmhome-router kernel: [103108.821382] Modules linked in: pppoe ppp_async pppox ppp_generic nft_set_rbtree nft_set_hash nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject_bridge nft_reject nft_redir nft_quota nft_numgen nft_meta_bridge nft_meta nft_log nft_limit nft_exthdr nft_ct nft_counter nft_chain_route_ipv6 nft_chain_route_ipv4 nf_tables_ipv6 nf_tables_ipv4 nf_tables_inet nf_tables_bridge nf_tables nf_conntrack_ipv6 mt76x2e mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ums_usbat ums_sddr55 ums_sddr09 ums_karma

Oct 27 07:17:44 jpmhome-router kernel: [103108.891858] ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda ts_kmp ts_fsm ts_bm slhc nfnetlink nf_reject_ipv4 nf_nat_rtsp nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_ftp iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt compat sch_cake nf_conntrack sch_fq sch_teql em_nbyte sch_pie sch_gred act_police act_ipt sch_red sch_multiq sch_prio em_cmp em_meta em_text sch_codel sch_sfq cls_basic sch_dsmark act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ledtrig_usbport ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter

Oct 27 07:17:44 jpmhome-router kernel: [103108.962861] ip6_tables x_tables msdos ifb sit tunnel6 tunnel4 ip_tunnel tun vfat fat nls_utf8 nls_iso8859_1 nls_cp437 sha1_generic ecb usb_storage sdhci_pltfm sdhci uhci_hcd ohci_platform ohci_hcd f2fs ext4 jbd2 mbcache crc32c_generic crc32_generic mmc_block mtk_sd mmc_core leds_gpio xhci_mtk xhci_plat_hcd xhci_pci xhci_hcd ahci libahci libata sd_mod scsi_mod gpio_button_hotplug usbcore nls_base usb_common

Oct 27 07:17:44 jpmhome-router kernel: [103108.998971] Process kworker/3:1 (pid: 26454, threadinfo=8fc58000, task=8fc83e80, tls=00000000)

Oct 27 07:17:44 jpmhome-router kernel: [103109.007624] Stack : 8051e0a0 8f7eb800 8057cdc0 8051e1b8 814bedc0 80520000 8edf6a60 009ced0c

Oct 27 07:17:44 jpmhome-router kernel: [103109.016046] 0000001d 00000100 00000200 8f6afe00 0000000f 00000001 8edf6ac0 8f7eb800

Oct 27 07:17:44 jpmhome-router kernel: [103109.024467] 814bea40 814c1c00 00000000 00000000 80520000 fffffffe 80520000 8ed8573c

Oct 27 07:17:44 jpmhome-router kernel: [103109.032891] 00000000 00000000 80520000 8edf6ac0 8edf6ac0 80045578 814bebe0 814bea58

Oct 27 07:17:44 jpmhome-router kernel: [103109.041314] 80520000 814bebe0 80520000 fffffffe 8f7eb800 814bea40 8f7eb818 814bea58

Oct 27 07:17:44 jpmhome-router kernel: [103109.049732] ...

Oct 27 07:17:44 jpmhome-router kernel: [103109.052257] Call Trace:

Oct 27 07:17:44 jpmhome-router kernel: [103109.052378] [<8ed8573c>] 0x8ed8573c [nf_flow_table@8ed84000+0x32b0]

Oct 27 07:17:44 jpmhome-router kernel: [103109.061215] [<80045578>] 0x80045578

Oct 27 07:17:44 jpmhome-router kernel: [103109.064796] [<80045a70>] 0x80045a70

Oct 27 07:17:44 jpmhome-router kernel: [103109.068351] [<80065830>] 0x80065830

Oct 27 07:17:44 jpmhome-router kernel: [103109.071972] [<80045720>] 0x80045720

Oct 27 07:17:44 jpmhome-router kernel: [103109.075529] [<8004b5e8>] 0x8004b5e8

Oct 27 07:17:44 jpmhome-router kernel: [103109.079082] [<8004b4b8>] 0x8004b4b8

Oct 27 07:17:44 jpmhome-router kernel: [103109.082665] [<8004b4b8>] 0x8004b4b8

Oct 27 07:17:44 jpmhome-router kernel: [103109.086244] [<8004b4b8>] 0x8004b4b8

Oct 27 07:17:44 jpmhome-router kernel: [103109.089797] [<8000afd8>] 0x8000afd8

Oct 27 07:17:44 jpmhome-router kernel: [103109.093358]

Oct 27 07:17:44 jpmhome-router kernel: [103109.094924] Code: 00000000 100000d7 00000000 <9222002e> 144000d4 00000000 8e220078 3043000c 146000dd

Oct 27 07:17:44 jpmhome-router kernel: [103109.104735]

Oct 27 07:17:44 jpmhome-router kernel: [103109.106948] ---[ end trace fe4ce30bf081307d ]---
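
A rough way to read the trace above, as a minimal sketch: the numbers below are copied from the oops (epc, the nf_flow_table load address, BadVA, the word marked `<...>` on the `Code:` line, and the second value on the `$16 :` line, which is register $17). The helper name `decode_itype` is made up for illustration; the field layout is the standard MIPS32 I-type encoding.

```python
# Values copied from the oops above.
EPC = 0x8ED84F10          # faulting program counter ("epc")
MODULE_BASE = 0x8ED84000  # nf_flow_table load address from "[nf_flow_table@...]"
BAD_VA = 0x009CED3A       # "BadVA"
FAULT_WORD = 0x9222002E   # instruction word shown in <...> on the "Code:" line
REG_17 = 0x009CED0C       # second value on the "$16 :" line, i.e. $17 (s1)


def decode_itype(word):
    """Split a MIPS32 I-type instruction into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


offset = EPC - MODULE_BASE
opcode, rs, rt, imm = decode_itype(FAULT_WORD)

print(f"fault is at nf_flow_table+{offset:#x}")
print(f"opcode={opcode:#x} rs=${rs} rt=${rt} imm={imm:#x}")
print(f"$17 + imm = {REG_17 + imm:#010x}, BadVA = {BAD_VA:#010x}")
```

Opcode 0x24 is `lbu` (load byte unsigned), so the faulting instruction reads one byte at offset 0x2e from the pointer in $17. That pointer is 0x009ced0c, which is not a kernel address, and 0x009ced0c + 0x2e equals the reported BadVA. This looks consistent with nf_flow_table dereferencing a bad pointer from its workqueue handler, although the trace alone does not say where that pointer came from.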


On another device with the same SoC, I see the same problem:


Oct 27 08:20:47 paco-router pppd[1662]: No response to 5 echo-requests

Oct 27 08:20:47 paco-router pppd[1662]: Serial link appears to be disconnected.

Oct 27 08:20:47 paco-router pppd[1662]: Connect time 9947.7 minutes.

Oct 27 08:20:47 paco-router pppd[1662]: Sent 4097994024 bytes, received 3091345733 bytes.

Oct 27 08:21:30 paco-router kernel: [596927.724469] INFO: rcu_sched self-detected stall on CPU

Oct 27 08:21:30 paco-router kernel: [596927.735037] #0113-...: (1 GPs behind) idle=792/140000000000001/0 softirq=24913280/24913281 fqs=1283

Oct 27 08:21:30 paco-router kernel: [596927.744454] INFO: rcu_sched detected stalls on CPUs/tasks:

Oct 27 08:21:30 paco-router kernel: [596927.744476] #0113-...: (1 GPs behind) idle=792/140000000000001/0 softirq=24913280/24913281 fqs=1283

Oct 27 08:21:30 paco-router kernel: [596927.744478] #011(detected by 1, t=6002 jiffies, g=8745603, c=8745602, q=4107)

Oct 27 08:21:30 paco-router kernel: [596927.744498] Sending NMI from CPU 1 to CPUs 3:

Oct 27 08:21:30 paco-router kernel: [596927.795709] #011 (t=6007 jiffies g=8745603 c=8745602 q=4107)

Oct 27 08:21:30 paco-router kernel: [596927.806769] rcu_sched kthread starved for 3442 jiffies! g8745603 c8745602 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

Oct 27 08:21:30 paco-router kernel: [596927.828443] rcu_sched I 0 8 2 0x00100000

Oct 27 08:21:30 paco-router kernel: [596927.828511] Stack : 80520000 87c3abc0 87c5de30 87c5de30 81121380 87c5de30 80520000 81121380

Oct 27 08:21:30 paco-router kernel: [596927.845540] 00000003 80085ccc 038e4ed0 81121380 81121380 87c5de30 80520000 81121380

Oct 27 08:21:30 paco-router kernel: [596927.862556] 00000003 8051e1b8 805264b0 80478c30 00000001 87c5de30 80520000 038e4ed0

Oct 27 08:21:30 paco-router kernel: [596927.879570] 81121380 8047bb58 00000003 80081dc8 80520000 00000000 87c5debc 00000001

Oct 27 08:21:30 paco-router kernel: [596927.896583] 00000000 8112141c 038e4ed0 8008621c 87c3abc0 04400001 00000001 805263a0

Oct 27 08:21:30 paco-router kernel: [596927.913597] ...

Oct 27 08:21:30 paco-router kernel: [596927.918699] Call Trace:

Oct 27 08:21:30 paco-router kernel: [596927.919090] [<80085ccc>] 0x80085ccc

Oct 27 08:21:30 paco-router kernel: [596927.931382] [<80478c30>] 0x80478c30

Oct 27 08:21:30 paco-router kernel: [596927.938623] [<8047bb58>] 0x8047bb58

Oct 27 08:21:30 paco-router kernel: [596927.945791] [<80081dc8>] 0x80081dc8

Oct 27 08:21:30 paco-router kernel: [596927.953028] [<8008621c>] 0x8008621c

Oct 27 08:21:30 paco-router kernel: [596927.960272] [<800823e4>] 0x800823e4

Oct 27 08:21:30 paco-router kernel: [596927.967569] [<80080000>] 0x80080000

Oct 27 08:21:30 paco-router kernel: [596927.974739] [<80080470>] 0x80080470

Oct 27 08:21:30 paco-router kernel: [596927.982215] [<80081e84>] 0x80081e84

Oct 27 08:21:30 paco-router kernel: [596927.989393] [<8004b5e8>] 0x8004b5e8

Oct 27 08:21:30 paco-router kernel: [596927.996567] [<8004b4b8>] 0x8004b4b8

Oct 27 08:21:30 paco-router kernel: [596928.003853] [<8004b4b8>] 0x8004b4b8

Oct 27 08:21:30 paco-router kernel: [596928.011150] [<8004b4b8>] 0x8004b4b8

Oct 27 08:21:30 paco-router kernel: [596928.018316] [<8000afd8>] 0x8000afd8

Oct 27 08:21:30 paco-router kernel: [596928.025530]

Oct 27 08:21:30 paco-router kernel: [596931.181252] NMI backtrace for cpu 3

Oct 27 08:21:30 paco-router kernel: [596931.188446] CPU: 3 PID: 28066 Comm: kworker/3:0 Not tainted 4.14.63 #0

Oct 27 08:21:30 paco-router kernel: [596931.201671] Workqueue: events 0x8022f148

Oct 27 08:21:30 paco-router kernel: [596931.209704] Stack : 00000006 00000000 80520000 87179c00 00000000 00000000 00000000 00000000

Oct 27 08:21:30 paco-router kernel: [596931.226714] 00000000 00000000 00000000 00000000 00000000 00000001 87c15cd8 532616de

Oct 27 08:21:30 paco-router kernel: [596931.243723] 87c15d70 00000000 00000000 000069d0 00000038 80476918 00000005 00000000

Oct 27 08:21:30 paco-router kernel: [596931.260734] 00000000 80520000 000313c7 00000000 87c15cb8 00000000 80540000 00000000

Oct 27 08:21:30 paco-router kernel: [596931.277747] 00000003 8051e0ac 000000e0 80520000 00000000 80290678 0000000c 8058000c

Oct 27 08:21:30 paco-router kernel: [596931.294761] ...

Oct 27 08:21:30 paco-router kernel: [596931.299862] Call Trace:

Oct 27 08:21:30 paco-router kernel: [596931.300118] [<80476918>] 0x80476918

Oct 27 08:21:30 paco-router kernel: [596931.312416] [<80290678>] 0x80290678

Oct 27 08:21:30 paco-router kernel: [596931.319656] [<80010040>] 0x80010040

Oct 27 08:21:30 paco-router kernel: [596931.326828] [<80010048>] 0x80010048

Oct 27 08:21:30 paco-router kernel: [596931.333995] [<8045f8cc>] 0x8045f8cc

Oct 27 08:21:30 paco-router kernel: [596931.341161] [<800706e4>] 0x800706e4

Oct 27 08:21:30 paco-router kernel: [596931.348403] [<804667d4>] 0x804667d4

Oct 27 08:21:30 paco-router kernel: [596931.355633] [<8000cf90>] 0x8000cf90

Oct 27 08:21:30 paco-router kernel: [596931.362811] [<8000cf90>] 0x8000cf90

Oct 27 08:21:30 paco-router kernel: [596931.369976] [<804668c4>] 0x804668c4

Oct 27 08:21:30 paco-router kernel: [596931.377143] [<8047c824>] 0x8047c824

Oct 27 08:21:30 paco-router kernel: [596931.384442] [<80083cc8>] 0x80083cc8

Oct 27 08:21:30 paco-router kernel: [596931.391612] [<80083bdc>] 0x80083bdc

Oct 27 08:21:30 paco-router kernel: [596931.398797] [<80083124>] 0x80083124

Oct 27 08:21:30 paco-router kernel: [596931.406102] [<80086694>] 0x80086694

Oct 27 08:21:30 paco-router kernel: [596931.413281] [<8009710c>] 0x8009710c

Oct 27 08:21:30 paco-router kernel: [596931.420583] [<803135c0>] 0x803135c0

Oct 27 08:21:30 paco-router kernel: [596931.427818] [<80345d68>] 0x80345d68

Oct 27 08:21:30 paco-router kernel: [596931.434993] [<80076b90>] 0x80076b90

Oct 27 08:21:30 paco-router kernel: [596931.442242] [<80070f60>] 0x80070f60

Oct 27 08:21:30 paco-router kernel: [596931.449411] [<8024a190>] 0x8024a190

Oct 27 08:21:30 paco-router kernel: [596931.456576] [<8024a03c>] 0x8024a03c

Oct 27 08:21:30 paco-router kernel: [596931.463754] [<8024a1fc>] 0x8024a1fc

Oct 27 08:21:30 paco-router kernel: [596931.470926] [<80070f60>] 0x80070f60

Oct 27 08:21:30 paco-router kernel: [596931.478099] [<8047ccd4>] 0x8047ccd4

Oct 27 08:21:30 paco-router kernel: [596931.485272] [<802491cc>] 0x802491cc

Oct 27 08:21:30 paco-router kernel: [596931.492469] [<8000b4e8>] 0x8000b4e8

Oct 27 08:21:30 paco-router kernel: [596931.499623]

Just to confirm: after some testing, software offloading doesn't seem to have this problem.

I started seeing wifi issues after I enabled hardware offloading for the firewall on an MT7621 with MT7612E + MT7603E. It seemed that the 5 GHz wifi dropped out after about 36 hours. I'm going to revert the change and observe, just to be sure. Have you seen anything similar, or is it more a case of a watchdog rebooting the system?

In my case, I'm not using the WiFi in that box, and the traces don't show anything about the WiFi modules, so I don't think this is what is happening to me, but I'm not 100% sure.

You could try SW offloading only, instead of HW offloading.
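
A minimal sketch of what that switch could look like, assuming the `flow_offloading` and `flow_offloading_hw` options in the `defaults` section of `/etc/config/firewall` (check the option names against your release). The function `offload_commands` is a made-up helper that only builds the command strings; it does not run them.

```python
def offload_commands(hardware):
    """Return uci commands that keep software flow offloading enabled
    and turn hardware offloading on or off.

    Assumption: the firewall 'defaults' section uses the options
    flow_offloading and flow_offloading_hw.
    """
    hw = "1" if hardware else "0"
    return [
        "uci set firewall.@defaults[0].flow_offloading='1'",
        f"uci set firewall.@defaults[0].flow_offloading_hw='{hw}'",
        "uci commit firewall",
        "/etc/init.d/firewall restart",
    ]


# Software offloading only: print the commands to run on the router.
for cmd in offload_commands(hardware=False):
    print(cmd)
```

Run the printed commands in a shell on the router, then watch the uptime for a few days to see whether the crashes stop.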

I just disabled HW offloading and wifi is working, but it's going to take at least 24 hours for the onset of the supposed effects.

Also, could you post the .dts file of your device?

Not very scientific but: 2 days later and no wifi failures.

I have the same problem on an EdgeRouter ER-X.
https://bugs.openwrt.org/index.php?do=details&task_id=1956

In my case, however, these are hard lockups: I have to power-cycle the router to get it to reboot.

I just experienced what you are describing and I had hardware offloading off. Wireless went down, and no link light when plugging in a cable. It had been up for between 36 and 48 hours. Seems like the watchdog didn't reboot the device in this case.

Same problem here: https://bugs.openwrt.org/index.php?do=details&task_id=2157

Perhaps you’d like to post the solution publicly for the benefit of everyone and so that the solution becomes searchable?

It is a bug. It stopped happening after a recent firmware upgrade, but with the latest one it is happening again. I've already reported it.

This continues to be a problem for me on recent firmware, about once a day. I posted a crash log in this active bug, since the other one was closed: FS#3538: ramips-mt7621: CPU 3 Unable to handle kernel paging request at virtual address (openwrt.org)

You may try 21.02

I just tried the latest 21.02 snapshot build and HW offloading is still broken: as soon as I enable it, several websites load slowly.

There is an issue in the offload code; try applying this commit.

The file that needs to be patched is
610-v5.13-33-net-ethernet-mtk_eth_soc-add-flow-offloading-support.patch