Eth0: transmit timed out on ER-X with LEDE 17.01.7

Hi. I bought the ER-X a month ago and flashed it with my own build of LEDE 17.01.7 with the Qualcomm Fast Path patch for kernel 4.4 (Qualcomm Fast Path For LEDE). Since then I have seen these errors in the System Log almost every day:

Mon Nov 4 20:36:32 2019 kern.err kernel: [253216.881056] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Nov 4 20:36:32 2019 kern.info kernel: [253216.893553] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Mon Nov 4 20:36:32 2019 kern.info kernel: [253216.905712] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dc20000, max=512, ctx=338, dtx=338, fdx=337, next=338
Mon Nov 4 20:36:33 2019 kern.info kernel: [253216.927213] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f356000, max=512, calc=391, drx=392

When this happens, the connection to the router is lost for a few seconds until it recovers. Other times it stays "fried": the router cannot be accessed and there is no communication between the Ethernet ports until I restart it. Sometimes, within a few hours of a restart, I also find this in the Kernel Log:

[39736.008429] ------------[ cut here ]------------
[39736.017665] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:306 dev_watchdog+0x258/0x2fc()
[39736.034476] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[39736.048328] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_pptp nf_nat_ipv4 nf_nat_amanda nf_conntrack_pptp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack_amanda ipt_REJECT ipt_MASQUERADE fast_classifier xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm slhc shortcut_fe_ipv6 shortcut_fe nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt act_connmark nf_conntrack act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_tbf sch_htb sch_hfsc sch_ingress ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables ifb tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
[39736.283575] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.182 #0
[39736.295519] Stack : 00000000 00000000 804f6882 00000034 00000000 00000000 804a0000 80510000
[39736.295519] 8fc4c02c 80499da3 80412674 00000003 00000000 804f367c ffffffff 00000200
[39736.295519] 00100000 80065e48 804a0000 80510000 8049e4b8 8049e4bc 80417038 8fc19df4
[39736.295519] 00000003 80063b94 ffffffff 00000200 00100000 00000000 00000006 00c19df4
[39736.295519] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[39736.295519] ...
[39736.366262] Call Trace:
[39736.371136] [<80016c74>] show_stack+0x54/0x88
[39736.379827] [<801c01f0>] dump_stack+0x84/0xbc
[39736.388502] [<8002d0c8>] warn_slowpath_common+0xa0/0xd0
[39736.398897] [<8002d124>] warn_slowpath_fmt+0x2c/0x38
[39736.408791] [<802d6dbc>] dev_watchdog+0x258/0x2fc
[39736.418159] [<80074f14>] call_timer_fn.isra.3+0x24/0x80
[39736.428552] [<8007516c>] run_timer_softirq+0x1fc/0x25c
[39736.438779] [<8002fd44>] __do_softirq+0x294/0x2e0
[39736.448129] [<8003002c>] irq_exit+0x78/0x94
[39736.456457] [<801ea760>] plat_irq_dispatch+0xb4/0xdc
[39736.466334] [<80005988>] except_vec_vi_end+0xb8/0xc4
[39736.476229] [<800137d0>] r4k_wait_irqoff+0x18/0x20
[39736.485778] [<8005fad0>] cpu_startup_entry+0x184/0x1ec
[39736.496006] [<8001b5f4>] start_secondary+0x404/0x434
[39736.505874]
[39736.508901] ---[ end trace bd1050f94fcc3b8f ]---

It looks like a problem with the switch. I don't know whether the patch has anything to do with it, but I have searched and found many people with the same problem, so I have ruled that out. There is no definitive solution, though some say that disabling flow control fixes it. I cannot disable it with ethtool because the driver does not implement it. I tried disabling it on the devices and the switch connected to the router, but I think flow control is still enabled on one port (eth0), because I cannot turn it off in the FTTH ONT.

Does anyone know how I could fix it once and for all? Any patch for the Ethernet driver?

Thanks so much.

Do you see the same behavior on 19.07 or master?

Do you see the same behavior without the local patch applied?

Not yet. Later I will install 17.01.7 without the patch and test it for a few days. If the error persists I will try the 19.07 snapshot, but I doubt it will help, since there are various reports of the same problem even with 18.06.x:

https://patchwork.ozlabs.org/patch/806124/


https://dev.archive.openwrt.org/ticket/22139.html
http://lists.infradead.org/pipermail/openwrt-devel/2018-April/011939.html
https://www.mail-archive.com/lede-dev@lists.infradead.org/msg08327.html
https://lists.openwrt.org/pipermail/openwrt-devel/2017-August/008691.html
http://lists.infradead.org/pipermail/lede-bugs/2017-August/005181.html
http://lists.infradead.org/pipermail/lede-bugs/2018-June/008173.html
https://bugs.openwrt.org/index.php?do=details&task_id=1618&string=mtk_soc_eth+error+with+kernel+stacktrace+since&type[0]=&sev[0]=&pri[0]=&due[0]=&reported[0]=&cat[0]=&status[0]=open&percent[0]=&opened=&dev=&closed=&duedatefrom=&duedateto=&changedfrom=&changedto=&openedfrom=&openedto=&closedfrom=&closedto=&pagenum=3
http://lists.infradead.org/pipermail/lede-bugs/2018-January/006812.html
http://lists.infradead.org/pipermail/linux-mediatek/2017-November/011174.html


http://www.intercity-vpn.de/files/openwrt/mqmaker_r49276_dmesg.txt

Not all of them are ER-X, but they use the same MT7621 SoC or MT7530 switch.

I found a patch for kernel 4.14 in one of the posts linked above. Is it possible that it is included in 19.07?
http://lists.infradead.org/pipermail/openwrt-devel/2019-October/019627.html

Thanks for the reply.

Checking OpenWrt Patchwork, if I found the right patch, it doesn't look like it has been reviewed at this time.

https://patchwork.ozlabs.org/project/openwrt/list/?series=139447

Thanks. I will try to patch 18.06.x, but first I will rule out whether the Qualcomm patch is the problem. I will report back.

EDIT: The patch applied successfully to 18.06.2. It is compiling while I test 17.01 without the patch.

Well, so far it has been running without errors for two days on the official LEDE 17.01.7 (no patches) and without the kmod-sched-core module, which was in my build and which I have read causes problems on the MT7621 SoC. I will wait 3 more days and then compile again with the Qualcomm patch but without kmod-sched-core.

Any specific reason you went for 17.01.7 instead of 18.06.4 / 19.07 or just building from master? The available packages must be pretty outdated, and there are fewer kernel options.

On my ER-X I switched to building from the master branch pretty quickly, after first using stable 18.06.x and then snapshots. In terms of stability, performance, etc., no negatives to report. There was a snapshot, in July I think, that caused a kernel panic when hardware flow offload was enabled, but it has since been fixed. I've recently switched the default congestion control to BBR and it's been working quite well.

I prefer a stable version to updating every so often. I also prefer 17.01.x to 18.06.x or later, mostly for aesthetic reasons in LuCI. The newer versions may also consume more resources (RAM). I don't like how the WAN interface is shown on the overview page, the action buttons (apply, cancel, add, delete...) that are not filled, or the spacing between lines and elements (for example in the routes, ARP, connections, firewall rules, etc.). If I can get 17.01.7 to run stably 24/7, I will stay with it for a long time.

17 is EOL and will receive no further updates (including no security updates, AFAIK). 18 is effectively in maintenance mode (nothing new here, nor many, if any bug fixes). 19 is already a thousand commits behind master and it hasn’t even been released yet.

LuCI can easily be installed on a snapshot, or added with the image builder.

After 4 days and 8 hours without any "transmit timed out" error in the system log, the router froze again. I had to disconnect and reconnect the power, so I could not read the log. I will have to compile 19.07 or the master branch.

On 18.06 I get the same kernel error. The router is now running the latest 19.07 snapshot (build OpenWrt 19.07-SNAPSHOT r10666-d3e11e8ad8 Nov 10 09:02:23 2019).

Fri Aug  7 18:38:30 2020 kern.err kernel: [102474.329315] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Fri Aug  7 18:38:30 2020 kern.info kernel: [102474.335592] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
Fri Aug  7 18:38:30 2020 kern.info kernel: [102474.341717] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0d6f0000, max=0, ctx=3892, dtx=3892, fdx=3891, next=3892
Fri Aug  7 18:38:30 2020 kern.info kernel: [102474.352764] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0b640000, max=0, calc=4050, drx=4053

I got this error in the log file.
OpenWrt 19.07.3 (latest). The error is irregular: the router can run for a few days or a few weeks before it appears.
Xiaomi R3G
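
For anyone who wants to compare the ring dumps across the reports in this thread, here is a minimal Python sketch that pulls the fields out of the tx_ring/rx_ring lines and counts the "transmit timed out" events in a saved log. The names parse_ring_line and count_timeouts are my own, and reading base as hex and every other field as decimal is only an assumption based on how the values look in the logs quoted above, not on the driver source.

```python
import re

# Matches key=value pairs such as "ctx=338" or "base=0dc20000".
PAIR_RE = re.compile(r"(\w+)=([0-9a-fA-F]+)")


def parse_ring_line(line):
    """Return the fields of a tx_ring/rx_ring dump line as a dict.

    Returns None for any line that is not a ring dump.
    """
    _, sep, tail = line.partition("eth0: ")
    if not sep or "_ring=" not in tail:
        return None
    fields = {}
    for key, value in PAIR_RE.findall(tail):
        # Assumption: 'base' is a hex address, everything else is decimal.
        fields[key] = int(value, 16) if key == "base" else int(value)
    return fields


def count_timeouts(lines):
    """Count the driver's 'eth0: transmit timed out' messages."""
    return sum(1 for line in lines if "eth0: transmit timed out" in line)


if __name__ == "__main__":
    import sys

    # Usage: logread > router.log, then: python3 ringdump.py router.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        lines = log.read().splitlines()
    print("timeouts:", count_timeouts(lines))
    for line in lines:
        fields = parse_ring_line(line)
        if fields is not None:
            print(fields)
```

It only restructures what the log already prints; it does not interpret what ctx, dtx or fdx mean inside the driver.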