Mtk_soc_eth watchdog timeout after r11573

When is this patch hitting Master? Was a pull request already created?

I am wondering whether the Archer C7 v2 has similar interrupt handling issues; there are also a lot of ERRs in its interrupt output.

What do you mean by 'without if condition'? Do you mean modifying the patch by removing the added 'if' line in the fe_poll_tx routine?

BTW, I am compiling the master branch with the patch you referred to, and will start testing tomorrow.

Just saw those errors on my Archer C7v3. Guess I need to swap it out...

No, I am talking about this patch: https://github.com/openwrt/mt76/issues/211#issuecomment-569944489 (it disables flow control). Remove all the conditionals and leave only:

/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

You don't need to apply this patch; just edit the file at ./target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c

I can't get down to zero interrupt ERRs, at least not with hw offloading on.

           CPU0       CPU1       CPU2       CPU3
  8:     225649     225616     225626     225614  MIPS GIC Local   1  timer
  9:      10551          0          0          0  MIPS GIC  63  IPI call
 10:          0       3335          0          0  MIPS GIC  64  IPI call
 11:          0          0      10829          0  MIPS GIC  65  IPI call
 12:          0          0          0       3295  MIPS GIC  66  IPI call
 13:    3403753          0          0          0  MIPS GIC  67  IPI resched
 14:          0      96741          0          0  MIPS GIC  68  IPI resched
 15:          0          0      35819          0  MIPS GIC  69  IPI resched
 16:          0          0          0      27239  MIPS GIC  70  IPI resched
 19:         14          0          0          0  MIPS GIC  33  ttyS0
 20:          0          0          0          0  MIPS GIC  29  xhci-hcd:usb1
 21:    1205402          0          0          0  MIPS GIC  10  1e100000.ethernet
 22:          2          0          0          0  MIPS GIC  30  gsw
 23:          2          0      58652          0  MIPS GIC  11  mt76x2e
 24:     206807          0          0          0  MIPS GIC  31  mt76x2e
 26:          0          0          0          0      GPIO   7  keys
 27:          0          0          0          0      GPIO  18  keys
ERR:          7

That was after running for less than 40 minutes.
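
For anyone who wants to compare numbers between runs, here is a minimal sketch that pulls the ERR counter and the total for one interrupt line out of /proc/interrupts-style text. The function name `irq_summary` is made up for this example, and it assumes the layout shown above (per-CPU counters right after the "NN:" label, device name in the last column).

```shell
#!/bin/sh
# Summarise /proc/interrupts-style text read from stdin:
# print the total for one named interrupt line and the ERR counter.
irq_summary() {
    # $1 = interrupt name as shown in the last column (e.g. 1e100000.ethernet)
    awk -v name="$1" '
        $1 == "ERR:" { print "ERR " $2 }
        $NF == name {
            total = 0
            # per-CPU counters follow the "NN:" label; stop at the first non-numeric field
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            print name " " total
        }'
}

# Usage on the router:
#   irq_summary 1e100000.ethernet < /proc/interrupts
```

With the output pasted above, this should print "1e100000.ethernet 1205402" followed by "ERR 7".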

Surely those interrupt errors are due to the WiFi interface.

Anyway, have you applied both patches (220-mt7621-disable-flow-control and OpenWrt-Devel-PATCHv2-2-2-ramips-ethernet-fix-to-interrupt-handling)?

Copy the patches to the root of the build directory and apply them with these commands:

patch -p1 < 220-mt7621-disable-flow-control.patch
patch -p1 < OpenWrt-Devel-PATCHv2-2-2-ramips-ethernet-fix-to-interrupt-handling.patch

If you are building master, you can skip the first one.
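
If you are not sure whether a patch is already in your tree, a dry run first is a cheap check. This is only a sketch: the helper name `apply_if_clean` is made up, and it assumes a `patch` that supports `-N` and `--dry-run` (GNU patch does).

```shell
#!/bin/sh
# Dry-run each patch first, and only apply it for real if the dry run succeeds.
# Run from the root of the build tree; pass the patch files as arguments.
apply_if_clean() {
    for p in "$@"; do
        # -N: do not prompt about patches that look reversed or already applied
        # --dry-run: report what would happen without changing any file
        if patch -p1 -N --dry-run < "$p" >/dev/null 2>&1; then
            patch -p1 -N < "$p" || return 1
        else
            echo "skipping $p: does not apply cleanly (perhaps already in the tree)" >&2
        fi
    done
}

# Usage:
#   apply_if_clean 220-mt7621-disable-flow-control.patch \
#       OpenWrt-Devel-PATCHv2-2-2-ramips-ethernet-fix-to-interrupt-handling.patch
```

On master the first patch should simply be reported as skipped, since its change is already included there.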

I have uploaded both patches: https://www.mediafire.com/file/kzcmkazpsntny0b/openwrt_patches.zip/file

I am using the master branch, in which the other patch you were referring to is already included.

BTW, what’s your use case? Is it without WiFi?

Ubiquiti Edgerouter X, no WiFi.

OK, then I will have to test it with my DIR-860L B1 for a longer time to see if the timeout issues are gone with this patch.

12 days without errors, and the router was restarted 1 hour ago. I have updated the ER-X bootloader (it had the factory version) to see if that fixes it. In addition, the old bootloader had a very serious security issue: during boot, the switch ports can communicate with each other until the system is up.

Hello, I am wondering if these patches will make it to master?

Hard to say in which way it will be adopted.

Right now the developers are busy transitioning to the next major release, which is based on the Linux 5.4 kernel; to my understanding, that is a much bigger job than just applying patches.

The best hope is that this particular patch, critical as it is, will eventually land in the 19.07.x branch.

Is there a PR open for this fix?

So far I've got positive results with the interrupt handling patch. First of all, no mtk_soc_eth timeout has been spotted. The interrupt ERRs are also far fewer than before:

           CPU0       CPU1       CPU2       CPU3
  8:   12476688   12476653   12476661   12476652  MIPS GIC Local   1  timer
  9:      59060          0          0          0  MIPS GIC  63  IPI call
 10:          0      18597          0          0  MIPS GIC  64  IPI call
 11:          0          0      39702          0  MIPS GIC  65  IPI call
 12:          0          0          0      11681  MIPS GIC  66  IPI call
 13:     152046          0          0          0  MIPS GIC  67  IPI resched
 14:          0    1925474          0          0  MIPS GIC  68  IPI resched
 15:          0          0     207644          0  MIPS GIC  69  IPI resched
 16:          0          0          0     422116  MIPS GIC  70  IPI resched
 19:         12          0          0          0  MIPS GIC  33  ttyS0
 20:          0          0          0          0  MIPS GIC  29  xhci-hcd:usb1
 21:   96750942          0          0          0  MIPS GIC  10  1e100000.ethernet
 22:          2          0          0          0  MIPS GIC  30  gsw
 23:          2          0    3003252          0  MIPS GIC  11  mt76x2e
 24:   11949047          0          0          0  MIPS GIC  31  mt76x2e
 26:          0          0          0          0      GPIO   7  keys
 27:          0          0          0          0      GPIO  18  keys
ERR:        567

That is for an uptime of 1 day, 10:42.
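
Since the raw ERR count depends on how long the box has been up, it is easier to compare runs as a rate. A small sketch of the arithmetic (the helper name `err_per_hour` is made up for this example):

```shell
#!/bin/sh
# Convert an ERR count and an uptime into ERRs per hour.
err_per_hour() {
    # $1 = ERR count, $2 = days, $3 = hours, $4 = minutes of uptime
    awk -v e="$1" -v d="$2" -v h="$3" -v m="$4" 'BEGIN {
        hours = d * 24 + h + m / 60
        printf "%.1f\n", e / hours
    }'
}

# 567 ERRs after 1 day, 10:42 of uptime
err_per_hour 567 1 10 42
```

For the numbers above this works out to about 16.3 ERRs per hour (567 over 34.7 hours). Rates are only comparable on the same device with the same traffic pattern.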

For me the patch is not working now. In just 24 hours it has already given the first error:

Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.027642] ------------[ cut here ]------------
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.036889] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
Tue Mar 17 19:22:53 2020 kern.info kernel: [84611.053381] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.067251] Modules linked in: pppoe ppp_async pppox ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.211023]  nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.285045] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.167 #0
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.297176] Stack : 00000000 8fd90f40 80580000 8007265c 805a0000 80546510 00000000 00000000
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.313819]         80512100 8fc0fdc4 8fc3cffc 805808e7 8050cef0 00000001 8fc0fd68 53261646
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.330461]         00000000 00000000 806e0000 00004530 00000000 000000ec 00000008 00000000
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.347103]         00000000 80580000 00045975 00000000 00000000 805a0000 00000000 80540718
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.363743]         80370050 00000140 00000003 8fd90f40 00000000 80299210 0000000c 806e000c
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.380389]         ...
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.385263] Call Trace:
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.390157] [<8000c7b0>] show_stack+0x58/0x100
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.399028] [<8044f8c4>] dump_stack+0xa4/0xe0
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.407715] [<8002f5f8>] __warn+0xe0/0x138
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.415873] [<8002f680>] warn_slowpath_fmt+0x30/0x3c
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.425767] [<80370050>] dev_watchdog+0x1ac/0x324
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.435159] [<8008932c>] call_timer_fn.isra.25+0x24/0x84
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.445756] [<800895e8>] run_timer_softirq+0x1bc/0x248
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.456014] [<8046d770>] __do_softirq+0x128/0x2ec
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.465399] [<80033f84>] irq_exit+0xac/0xc8
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.473755] [<8024c1c0>] plat_irq_dispatch+0xfc/0x138
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.483820] [<80007588>] except_vec_vi_end+0xb8/0xc4
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.493711] [<80008f50>] r4k_wait_irqoff+0x1c/0x24
Tue Mar 17 19:22:53 2020 kern.warn kernel: [84611.503392] ---[ end trace 81b0755d3220520a ]---
Tue Mar 17 19:22:53 2020 kern.err kernel: [84611.512630] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Tue Mar 17 19:22:53 2020 kern.info kernel: [84611.524982] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Tue Mar 17 19:22:53 2020 kern.info kernel: [84611.537041] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e990000, max=0, ctx=3103, dtx=3103, fdx=3091, next=3103
Tue Mar 17 19:22:53 2020 kern.info kernel: [84611.558726] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e030000, max=0, calc=3134, drx=3135
Tue Mar 17 19:22:53 2020 kern.info kernel: [84611.580051] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5b60000c, 0x10c = 0x80818
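
To check how long the router had been up when the watchdog fired, the kernel timestamp in square brackets can be converted to hours, minutes and seconds. A rough sketch, assuming the log line format shown above; the function name `watchdog_uptimes` is made up for this example.

```shell
#!/bin/sh
# Read log lines on stdin and print, for every NETDEV WATCHDOG line,
# the interface name and the uptime (h:mm:ss) taken from the kernel timestamp.
watchdog_uptimes() {
    # keep only "seconds interface" from lines like
    #   ... kernel: [84611.053381] NETDEV WATCHDOG: eth0 (mtk_soc_eth): ...
    sed -n 's/.*\[ *\([0-9][0-9]*\)\.[0-9]*] NETDEV WATCHDOG: \([^ ]*\).*/\1 \2/p' |
    while read -r secs dev; do
        printf '%s %d:%02d:%02d\n' "$dev" \
            $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
    done
}

# Usage on the router:
#   logread | watchdog_uptimes
```

For the log above this prints "eth0 23:30:11", i.e. the timeout hit after roughly 23.5 hours of uptime.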

It remains to try another power adapter. The router draws a maximum of 5W and the adapter is rated at 6W (12V 0.5A). Maybe that's not enough. I will try a 1A one.

You could try disabling the wifi and see if it still errors.

Here comes the bad news: the router and the whole LAN seemed to be disconnected for a while, just like the symptom of the mtk_soc_eth timeout. When I checked with logread, however, I didn't find anything suspicious in the log.

It is quite disappointing, since I was sure we were so close to getting a stable OpenWrt firmware for the DIR-860L B1.

There is a lot of discussion going on in this pull request:

I think we will have to wait and see if this fixes things once and for all.