Mtk_soc_eth watchdog timeout after r11573

How can I create the pull request? If it really works when I try it, I will.

I've sent it to the mailing list. Like most patches, it will take forever to land upstream.

I have no push access to the OpenWrt repo.

I'm building 18.06.6 but cannot apply the patch. "Hunk # 1 FAILED", "Hunk # 1 succeeded at 98", "Patch failed! Please fix ./patches-4.14/220-mt7621-disable-flow-control.patch".

I have manually edited the gsw_mt7621.c file with the changes. I didn't fix the patch itself because I don't have experience with patches. Compiling...

The response from IRC was to try this if condition instead:

 if (ralink_soc == MT762X_SOC_MT7621AT) {

Sounds bogus to me but ¯_(ツ)_/¯

Sounds bogus to me too.

I'm running my ER-X with the patch. I will report back in 15 days.

After only 15 hours of running with this patch, I got a kernel crash and one more transmit timeout. I'm starting to hate OpenWrt and I'm thinking of going back to EdgeOS.

Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.115113] ------------[ cut here ]------------
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.124343] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.140800] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.154680] Modules linked in: pppoe ppp_async pppox ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.298454]  nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.370342] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.162 #0
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.382457] Stack : 00000000 8ff49240 805a0000 80070278 805d0000 80567b20 00000000 00000000
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.399091]         8053360c 8fc0fdc4 8fc3cffc 805a2947 8052e638 00000001 8fc0fd68 53261643
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.415727]         00000000 00000000 80600000 00004690 00000000 000000ee 00000008 00000000
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.432354]         00000000 805a0000 0005a6a6 00000000 00000000 805d0000 00000000 8037b920
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.448982]         00000009 00000140 00000003 8ff49240 00000000 8029e618 0000000c 8060000c
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.465609]         ...
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.470465] Call Trace:
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.475348] [<800106a0>] show_stack+0x58/0x100
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.484193] [<8046f074>] dump_stack+0xa4/0xe0
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.492858] [<8002e958>] __warn+0xe0/0x114
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.501000] [<8002e9bc>] warn_slowpath_fmt+0x30/0x3c
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.510891] [<8037b920>] dev_watchdog+0x1ac/0x324
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.520266] [<80087394>] call_timer_fn.isra.3+0x24/0x84
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.530655] [<800875b0>] run_timer_softirq+0x1bc/0x248
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.540887] [<8048c758>] __do_softirq+0x128/0x2ec
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.550250] [<800330d4>] irq_exit+0xac/0xc8
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.558573] [<80254c4c>] plat_irq_dispatch+0xfc/0x138
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.568620] [<8000b5e8>] except_vec_vi_end+0xb8/0xc4
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.578493] [<8000cfb0>] r4k_wait_irqoff+0x1c/0x24
Fri Jan 31 23:32:37 2020 kern.warn kernel: [52231.588183] ---[ end trace c3836cca1bde30a8 ]---
Fri Jan 31 23:32:37 2020 kern.err kernel: [52231.597384] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.609704] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.621695] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e980000, max=0, ctx=123, dtx=123, fdx=122, next=123
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.642660] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e020000, max=0, calc=2057, drx=2058
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.663879] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
Fri Jan 31 23:32:37 2020 kern.info kernel: [52231.683626] mtk_soc_eth 1e100000.ethernet: PPE started

Sat Feb  1 01:09:04 2020 kern.err kernel: [58018.076394] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Sat Feb  1 01:09:04 2020 kern.info kernel: [58018.088733] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Sat Feb  1 01:09:04 2020 kern.info kernel: [58018.100737] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0e020000, max=0, ctx=1452, dtx=1452, fdx=1451, next=1452
Sat Feb  1 01:09:04 2020 kern.info kernel: [58018.122400] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0c580000, max=0, calc=1274, drx=1275
Sat Feb  1 01:09:04 2020 kern.info kernel: [58018.143655] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5a60000c, 0x10c = 0x80818
Sat Feb  1 01:09:04 2020 kern.info kernel: [58018.163488] mtk_soc_eth 1e100000.ethernet: PPE started

I know the feeling.

The initial patch is correct. I sent it upstream. Just wait for it to land.

I compiled with that patch, and that is the log. Still the same. The router also restarted this morning.

I can't apply this patch automatically:

diff -u a/drivers/net/ethernet/mediatek/gsw_mt7621.c b/drivers/net/ethernet/mediatek/gsw_mt7621.c
--- a/drivers/net/ethernet/mediatek/gsw_mt7621.c	2019-11-25 14:14:15.152253091 +0500
+++ b/drivers/net/ethernet/mediatek/gsw_mt7621.c	2019-12-30 15:31:52.119516791 +0500
@@ -98,8 +98,8 @@
  mt7530_mdio_w32(gsw, 0x7000, 0x3);
  usleep_range(10, 20);
 
- if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) {
- /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */
+ if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) >= 0x0101) {
+ /* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
  mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
  mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);
  } else {

I get this error when compiling:

"Hunk # 1 FAILED", "Hunk # 1 succeeded at 98", "Patch failed! Please fix ./patches-4.14/220-mt7621-disable-flow-control.patch".

That's why I manually edited gsw_mt7621.c, replacing the following according to the patch:

if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) {
to
if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) >= 0x0101) {

and

/* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */
to
/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */

Is this correct? It still doesn't fix the kernel crashes, timeouts, and reboots...

Your modifications seem okay. It should work with no more crashes.

But it keeps crashing...

Perhaps this is a workaround for this bug?:

5 days on 18.06.7 with the patch, and:

4 Feb 09:23: installed
5 Feb 04:38: first kernel crash
8 Feb: no more kernel crashes since 5 Feb
9 Feb 19:51: router restarted (I could not see the syslog before it restarted)

I will try editing the patch so that both branches of the condition configure the switch in the same way. Maybe the revision of my MTK chip is lower than the one in the condition (>= 0x0101) and has the same FC bug.

The patch would look like this:

if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) >= 0x0101) {
	/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
	mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
	mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);
} else {
	/* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */
	mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
	mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);
}

If you're going to do that, there's no point in the if condition.

I know, it's just to test with a minimal modification. If it works and there are no more kernel crashes, I will remove the if condition and leave only:

mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

This patch has never worked. It is a cosmetic change rather than a functional one. The first modification changes the comparison operator (from == to >=) so that the condition checks whether the chip ID is equal to or greater than 0x0101, but all MT7621 chips are 0x0101, so there is no effective change. The second changes the text from FC ON to FC OFF, but that is only a comment line.

In order to really deactivate the FC, I believe that one of the following hexadecimal values should be modified:

mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR);
mt7530_mdio_w32(gsw, 0x3600, 0x5e30b);

I do not know how.

I found the configuration bits in gsw_mt7620.h:

#define MAC_MCR_MAX_RX_2K BIT(29)
#define MAC_MCR_IPG_CFG (BIT(18) | BIT(16))
#define MAC_MCR_FORCE_MODE BIT(15)
#define MAC_MCR_TX_EN BIT(14)
#define MAC_MCR_RX_EN BIT(13)
#define MAC_MCR_BACKOFF_EN BIT(9)
#define MAC_MCR_BACKPR_EN BIT(8)
#define MAC_MCR_FORCE_RX_FC BIT(5)
#define MAC_MCR_FORCE_TX_FC BIT(4)
#define MAC_MCR_SPEED_1000 BIT(3)
#define MAC_MCR_FORCE_DPX BIT(1)
#define MAC_MCR_FORCE_LINK BIT(0)

If we convert the configuration of both conditions to binary we obtain the following:

if configuration:
0x2305e30b = 100011000001011110001100001011

else configuration:
0x2305e33b = 100011000001011110001100111011

If we look at the definition of each bit in the previous file, bits 4 and 5 are the ones that configure Flow Control for RX and TX.

Looking at the binary values for the if and else branches (bits 0 to 29, from right to left), we see that FC is disabled in the if branch but not in the else branch.

So the patch does disable flow control (at least for rev id 0x0101 and higher), assuming that the id detection is correct. The way to ensure that FC deactivation is applied is to eliminate the if condition.

Anyway, I think it only disables it between the switch and the CPU, not on the individual ports. FC may also not be the only thing that causes the kernel crashes.

It's a non-upstream NIH driver.

If you really want to fix this long term, switch to the upstream driver.

What is the upstream driver? I'm using the driver from the default git repo, ./target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c. That file includes gsw_mt7620.h, which is where I found what each bit of the switch configuration means, nothing else.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/net/ethernet/mediatek?h=next-20200210

Requires a recent kernel.