Mtk_soc_eth watchdog timeout after r11573

There are definitely no devices or services on the network that use any type of VPN.
If a VPN were in use, the entries in the log would appear all the time.
This is a one-time error that occurs a few minutes before error 320.
It is most likely related to overloading of some of the SoC's blocks or buses, or to wrong timings / buffers.

PS. I have been playing around with the CPU/OCP/SYS divider via the bootstrap resistor.
880/293/220 MHz is more stable than 880/220/220 MHz.
The bandwidth of the OCP bus matters.

The hangs and reboots came back after four stable months, and I can't find a way to make it work properly again.

While it was working correctly, it was using 19.07.1 with GMAC Port 5 FC Off and Interrupt Handling Patch (https://patchwork.ozlabs.org/project/openwrt/patch/20190306040846.21746-1-rosenp@gmail.com/). Each Ethernet port in a different VLAN (or with more than one). In this way it was 100% stable, but I made the changes below and since then I have had problems with hangs and reboots when squeezing the connection:

  1. I updated to 19.07.4.

  2. I added another managed switch in cascade to port eth1 where I connected the machines that were on ports eth2, eth3 and eth4. Thus I eliminated the need to use software-bridge between the different VLANs of each port. I disabled the ports eth2, eth3 and eth4 (they do not belong to any VLAN). Ports eth0 (WAN) and eth1 (LAN) remain unchanged, VLAN2 and VLAN1 respectively.

  3. I reconnected the original power supply. While it was stable I had another one in use to rule out problems.

I have tried to revert these changes to the previous working setup, but keeping the switch and not connecting anything to ports eth2, eth3 and eth4 (although they have a VLAN assigned and belong to a software bridge). The same problem continues: if I use the WAN intensively, communication between the Ethernet ports is lost and the router cannot be accessed, or it reboots outright. The syslog is clean, with no transmit timed out or kernel crash.

I cannot find a logical explanation, having tried to roll it back (almost) to how it was before. The only change right now is that there is nothing connected to ports eth2, eth3 and eth4. I already had a switch connected to eth1; I just added another one in cascade.

I will compile 19.07.1 without the Interrupt Handling Patch, which was reverted in the 19.07 branch for alleged problems. But I remember that without that patch, the logs were flooded with transmit timed out errors on a daily basis.

I have tried the same build and exactly the same configuration (from a backup). I have removed the new switch, leaving EVERYTHING exactly as it was 3 months ago when it worked. It keeps hanging. I give up. I'm not going to spend another second of my time on this. I'm going to sell the router and will never buy something with a "ShitTek" chip again.

Please try 19.07.5

I've tried it, clean install. Same behavior.

The only thing left is to try EdgeOS from Ubiquiti to rule out a hardware problem.

I have temporarily configured syslog to write to flash, to see what happens when the switch hangs. This is what I found after a manual reboot.

Mon Dec 14 10:18:19 2020 kern.warn kernel: [  270.132232] br-lan: received packet on eth0.1 with own address as source address (addr:80:2a:a8:xx:xx:xx, vlan:0)
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.141007] ------------[ cut here ]------------
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.150244] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
Mon Dec 14 10:18:26 2020 kern.info kernel: [  277.166739] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.180658] Modules linked in: pppoe ppp_async pppox ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_rtsp nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtsp nf_conntrack_rtcache
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.324460]  nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 tun nls_utf8 nls_iso8859_15 nls_cp852 nls_cp850 nls_cp437 nls_base leds_gpio gpio_button_hotplug
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.455322] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.167 #0
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.467450] Stack : 00000000 8fead540 80580000 8007265c 805a0000 80546510 00000000 00000000
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.484085]         80512100 8fc0bdc4 8fc3c33c 805808e7 8050cef0 00000001 8fc0bd68 53261646
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.500718]         00000000 00000000 806e0000 00004490 00000000 000000e7 00000008 00000000
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.517350]         00000000 80580000 0006f29a 00000000 00000000 805a0000 00000000 80540718
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.533982]         80370050 00000140 00000001 8fead540 00000000 80299210 00000004 806e0004
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.550613]         ...
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.555475] Call Trace:
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.560366] [<8000c7b0>] show_stack+0x58/0x100
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.569229] [<8044f8c4>] dump_stack+0xa4/0xe0
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.577905] [<8002f5f8>] __warn+0xe0/0x138
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.586052] [<8002f680>] warn_slowpath_fmt+0x30/0x3c
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.595938] [<80370050>] dev_watchdog+0x1ac/0x324
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.605316] [<8008932c>] call_timer_fn.isra.25+0x24/0x84
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.615883] [<800895e8>] run_timer_softirq+0x1bc/0x248
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.626126] [<8046d770>] __do_softirq+0x128/0x2ec
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.635491] [<80033f84>] irq_exit+0xac/0xc8
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.643829] [<8024c1c0>] plat_irq_dispatch+0xfc/0x138
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.653880] [<80007588>] except_vec_vi_end+0xb8/0xc4
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.663755] [<80008f50>] r4k_wait_irqoff+0x1c/0x24
Mon Dec 14 10:18:26 2020 kern.warn kernel: [  277.673449] ---[ end trace 77cac57743d47a00 ]---
Mon Dec 14 10:18:26 2020 kern.err kernel: [  277.682707] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:18:26 2020 kern.info kernel: [  277.695106] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:18:26 2020 kern.info kernel: [  277.707162] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ed50000, max=0, ctx=3950, dtx=3761, fdx=3761, next=3950
Mon Dec 14 10:18:26 2020 kern.info kernel: [  277.728879] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0dcf0000, max=0, calc=391, drx=392
Mon Dec 14 10:18:27 2020 kern.info kernel: [  278.164010] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:18:27 2020 kern.info kernel: [  278.178416] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:18:27 2020 kern.info kernel: [  278.203701] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:18:38 2020 kern.err kernel: [  289.218283] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.230629] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.242623] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dcf0000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.263289] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f0b0000, max=0, calc=3079, drx=3080
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.687138] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.701548] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:18:38 2020 kern.info kernel: [  289.726294] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:18:48 2020 kern.err kernel: [  299.136375] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.148711] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.160707] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f140000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.181370] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f3d0000, max=0, calc=1581, drx=1582
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.605233] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.619649] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:18:48 2020 kern.info kernel: [  299.644538] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:18:58 2020 kern.err kernel: [  309.134745] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.147088] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.159066] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eb20000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.179711] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0d330000, max=0, calc=2066, drx=2067
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.603558] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.617975] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:18:58 2020 kern.info kernel: [  309.642755] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:08 2020 kern.err kernel: [  319.133366] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.145733] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.157718] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ba80000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.178390] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0b840000, max=0, calc=2076, drx=2077
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.602157] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.616612] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:08 2020 kern.info kernel: [  319.641355] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:18 2020 kern.err kernel: [  329.132170] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.144511] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.156508] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eac0000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.177172] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0d040000, max=0, calc=2073, drx=2074
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.600996] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.615412] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:18 2020 kern.info kernel: [  329.640025] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:28 2020 kern.err kernel: [  339.131152] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.143492] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.155492] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f250000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.176157] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f1a0000, max=0, calc=2071, drx=2072
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.599992] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.614404] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:28 2020 kern.info kernel: [  339.638988] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:38 2020 kern.err kernel: [  349.130285] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.142628] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.154628] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f180000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.175289] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f240000, max=0, calc=2092, drx=2093
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.599117] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.613538] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:38 2020 kern.info kernel: [  349.638386] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:48 2020 kern.err kernel: [  359.129526] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.141867] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.153848] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0d390000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.174460] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f1b0000, max=0, calc=2068, drx=2069
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.598366] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.612779] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:48 2020 kern.info kernel: [  359.637611] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:19:58 2020 kern.err kernel: [  369.128902] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.141239] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.153240] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0f380000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.173905] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0f240000, max=0, calc=2063, drx=2066
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.597778] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.612214] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:19:58 2020 kern.info kernel: [  369.636904] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:20:08 2020 kern.err kernel: [  379.128371] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.140715] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.152714] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0c3d0000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.173389] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0b9d0000, max=0, calc=2093, drx=2094
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.597206] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.611623] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:20:08 2020 kern.info kernel: [  379.636491] mtk_soc_eth 1e100000.ethernet: PPE started
Mon Dec 14 10:20:18 2020 kern.err kernel: [  389.127904] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.140243] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.152242] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0b9f0000, max=0, ctx=3072, dtx=0, fdx=0, next=3072
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.172908] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0d040000, max=0, calc=2072, drx=2073
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.596757] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x3c60180c, 0x10c = 0x80818
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.611171] mtk_soc_eth 1e100000.ethernet: reset pse
Mon Dec 14 10:20:18 2020 kern.info kernel: [  389.636110] mtk_soc_eth 1e100000.ethernet: PPE started

I thought it was not giving any transmit timed out errors, but it does not give them while working. It only gives them when it freezes, and I could not see them because the syslog in RAM is lost after a reboot. I had never seen the error on the first line, where a packet is received with the router's own address as source (?).

It seems that transmit timed out repeats every 10 seconds until I reboot the router.
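
To check the 10-second cadence against a saved log, a minimal sketch along these lines may help. It assumes the syslog format shown above, with the kernel uptime in square brackets. `timeout_gaps` is a hypothetical helper name, not an OpenWrt tool.

```shell
# Hypothetical helper: reads a log on stdin and prints the gap, in seconds,
# between consecutive "transmit timed out" lines, based on the kernel
# uptime stamp in square brackets (e.g. "[  277.682707]").
timeout_gaps() {
    awk '/transmit timed out/ {
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            # Strip the brackets; adding 0 turns the padded text into a number.
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen) printf "%.1f\n", ts - prev
            prev = ts
            seen = 1
        }
    }'
}
```

On the router, `logread | timeout_gaps` would feed it the log kept in RAM, and `timeout_gaps < some-saved-log` would do the same for a log written to flash.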

I have left everything exactly as it was a few months ago, when I had no reboots or crashes, but it makes no difference. I also tried the patch from Mushoz (Mt7621 / mt7530 programming: Disabling Flow Control on all ports) to disable pause frame advertisement, but without a positive result.

I can reproduce this quickly: running iperf between LAN and WAN (or between any two routed interfaces) with no more than 20 threads is enough ...
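
For anyone who wants to repeat the test, a rough sketch of such a run is shown here. It assumes an iperf server (`iperf -s`) is already listening on the far side of the router. `stress_wan`, the default duration and the example address are placeholders, not values taken from this thread.

```shell
# Hypothetical wrapper around the iperf client.
#   $1 = address of the machine running "iperf -s" (other side of the router)
#   $2 = number of parallel streams (defaults to the 20 mentioned above)
#   $3 = test duration in seconds (600 is an arbitrary default)
stress_wan() {
    server="$1"
    streams="${2:-20}"
    seconds="${3:-600}"
    # -c: client mode, -P: parallel streams, -t: duration in seconds
    iperf -c "$server" -P "$streams" -t "$seconds"
}
```

For example, `stress_wan 192.0.2.10` would push 20 parallel streams at that host for ten minutes.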

The truth is that I do not know whether the problem has always been there. With "normal" use of the connection it does not happen (even with P2P or large downloads). I can only crash it with iperf.

Hi apocalypse,

There have been cases of dodgy LAN cables or a duplex mismatch upstream causing this issue. What is this connected to? What is the upstream connection?

You can do a 'cat /proc/interrupts' -- share the output
Also do a 'swconfig dev switch0 show' - share output

Look for RX and TX drops and pauses; these indicate connectivity issues.

mine looks like this

  8:    9161096    9161063    9161076    9161070  MIPS GIC Local   1  timer
  9:    1748981          0          0          0  MIPS GIC  63  IPI call
 10:          0     324527          0          0  MIPS GIC  64  IPI call
 11:          0          0    1172221          0  MIPS GIC  65  IPI call
 12:          0          0          0    3753511  MIPS GIC  66  IPI call
 13:     894782          0          0          0  MIPS GIC  67  IPI resched
 14:          0     541507          0          0  MIPS GIC  68  IPI resched
 15:          0          0    4282221          0  MIPS GIC  69  IPI resched
 16:          0          0          0    3149794  MIPS GIC  70  IPI resched
 19:         14          0          0          0  MIPS GIC  33  ttyS0
 20:          0          0          0          0  MIPS GIC  29  xhci-hcd:usb1
 21:       5558   10587240          0          0  MIPS GIC  10  1e100000.ethernet
 22:          4          0          0          0  MIPS GIC  30  gsw
 24:        155          0    9884089      94597  MIPS GIC  31  mt7603e
 25:        259      75997          0   21094596  MIPS GIC  32  mt76x2e
ERR:        493

You can see the ERR stats are low for me, because I'm using a new NAPI polling function for the wifi chips and spreading the interrupts across the cores, as the manufacturer script does.
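
For readers who want to try moving an interrupt to another core by hand, here is a minimal sketch. It is not the manufacturer script mentioned above, which is not shown in this thread. `irq_for` is a hypothetical helper and the CPU mask in the comment is only an example.

```shell
# Hypothetical helper: reads /proc/interrupts on stdin and prints the IRQ
# number whose last column matches the given device name.
irq_for() {
    awk -v name="$1" '$NF == name { sub(/:/, "", $1); print $1 }'
}

# Example use on the router (as root). smp_affinity takes a hexadecimal
# CPU bitmask, so 2 means CPU1 only:
#   irq=$(irq_for 1e100000.ethernet < /proc/interrupts)
#   echo 2 > "/proc/irq/$irq/smp_affinity"
```

Which core each interrupt should go to is a tuning decision; the mask above is an illustration, not a recommendation.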

What router are you using?

Iperf will stress the switch stack, and show possible issues.

The router is an ER-X. The connection is FTTH on port eth0 (WAN), with a manageable switch on eth1 (LAN1).

VLAN config very simple, to rule out problems.

WAN port: VLAN 2, untagged
LAN port: VLAN 1, tagged
CPU: VLAN 1 & 2, tagged

All other ports are not members of any VLAN.

I don't think any of the cables are bad. In any case, I have also tried with only the ONT and the iperf server connected, running the test towards the WAN from a laptop on a 4G network, and the same thing happens. The cables I have tested are pre-fabricated. I have already ruled all that out; the problem is the router hardware or the damn driver for this SoC.

/proc/interrupts shows 0 ERR (a few posts above there is a capture where it had 33 days of uptime and no errors).

swconfig shows a count of 0 for TxDrop, RxDrop, TxCol, TxPause and RxPause on all ports. I can't tell whether those values have increased at the time of the kernel crash.
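
Since the counters cannot be read once the switch has hung, one option is to log any non-zero values periodically while the test runs. This is a rough sketch, assuming `swconfig dev switch0 show` prints `Port N:` headers followed by `Name : value` MIB lines (worth checking against your own output first). `nonzero_counters` is a made-up name.

```shell
# Hypothetical filter: reads "swconfig dev switch0 show" output on stdin and
# prints only the drop/collision/pause counters that are above zero.
# Assumed input layout: "Port N:" header lines, then lines like "TxDrop     : 0".
nonzero_counters() {
    awk '
        /^Port [0-9]+:/ { port = $2; sub(/:/, "", port) }
        /^(TxDrop|RxDrop|TxCol|TxPause|RxPause)[ \t]*:/ {
            split($0, kv, ":")
            name = kv[1]
            gsub(/[ \t]/, "", name)
            value = kv[2] + 0
            if (value > 0) print "port " port " " name "=" value
        }'
}
```

Run from cron or a loop, something like `swconfig dev switch0 show | nonzero_counters | logger -t swmib` would leave the last values seen before a hang in the syslog.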

There are many posts and many people with the same problem, and nobody has come up with a definitive solution. We only speculate: maybe it is due to flow control, maybe it is because of having more than one port in the same VLAN ... patches here and patches there ... but in the end nothing solves it.

I have switched to a snapshot from the master branch. It seems the problem is gone. I've been running iperf tests for an hour or so and it has not hung or rebooted.

well, this is why I urged upstream to switch to the upstream DSA driver instead of the buggy swconfig driver.

Is my problem the same?
Xiaomi Mi Router 3G v1
OpenWrt 19.07.5 r11257-5090152ae3 / LuCI openwrt-19.07 branch git-20.341.57626-51f55b5

[126334.567493] ------------[ cut here ]------------
[126334.572215] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 0x8038d150
[126334.579345] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[126334.586357] Modules linked in: xt_tcpmss xt_statistic xt_recent xt_length xt_hl xt_helper xt_ecn xt_dscp xt_connmark xt_connlimit xt_connbytes xt_HL xt_DSCP xt_CLASSIFY iptable_raw ipt_ECN xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink wireguard ip6_udp_tunnel udp_tunnel pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG
[126334.657158]  xt_FLOWOFFLOAD xt_CT slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd gpio_button_hotplug usbcore nls_base usb_common
[126334.702236] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.209 #0
[126334.708396] Stack : 00000000 00000000 00000000 8ffb6f40 00000000 00000000 00000000 00000000
[126334.716820]         00000000 00000000 00000000 00000000 00000000 00000001 8fc0fd60 1cc28231
[126334.725238]         8fc0fdf8 00000000 00000000 00007550 00000038 8049da98 00000008 00000000
[126334.733676]         00000000 80550000 000ab71c 70617773 8fc0fd40 00000000 00000000 8050c4d4
[126334.742135]         8038d150 00000140 00000003 8ffb6f40 00000008 802ae190 0000000c 806b000c
[126334.750586]         ...
[126334.753134] Call Trace:
[126334.753155] [<8049da98>] 0x8049da98
[126334.759239] [<8038d150>] 0x8038d150
[126334.762804] [<802ae190>] 0x802ae190
[126334.766359] [<8000c1a0>] 0x8000c1a0
[126334.769930] [<8000c1a8>] 0x8000c1a8
[126334.773494] [<804868d4>] 0x804868d4
[126334.777077] [<80071c80>] 0x80071c80
[126334.780647] [<8002e798>] 0x8002e798
[126334.784241] [<8038d150>] 0x8038d150
[126334.787825] [<8002e820>] 0x8002e820
[126334.791400] [<800552b8>] 0x800552b8
[126334.794987] [<8038d150>] 0x8038d150
[126334.798570] [<80099b90>] 0x80099b90
[126334.802143] [<8038cfa4>] 0x8038cfa4
[126334.805722] [<80088738>] 0x80088738
[126334.809319] [<8005f3e4>] 0x8005f3e4
[126334.812897] [<800889f4>] 0x800889f4
[126334.816464] [<80079328>] 0x80079328
[126334.820041] [<804a4898>] 0x804a4898
[126334.823616] [<80033164>] 0x80033164
[126334.827182] [<8025b4c0>] 0x8025b4c0
[126334.830810] [<80007488>] 0x80007488
[126334.834430] 
[126334.836155] ---[ end trace ab978da42a027320 ]---
[126334.840893] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[126334.847153] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[126334.853336] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0eff0000, max=0, ctx=3101, dtx=3101, fdx=3100, next=3101
[126334.864291] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ed40000, max=0, calc=3661, drx=3662
[126334.876845] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[126334.889645] mtk_soc_eth 1e100000.ethernet: PPE started

It looks like it is. I recommend that you try the latest snapshot. You will need to do a clean install, because the network settings are not compatible.

or just delete /etc/config/network.

It is so simple that it cannot be incompatible. Or am I wrong?

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fd06:fec6:fbad::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0.1'
        option proto 'static'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option ipaddr '192.168.0.1'

config device 'lan_eth0_1_dev'
        option name 'eth0.1'
        option macaddr '34:ce:00:6c:3c:ae'

config interface 'wan'
        option ifname 'eth0.2'
        option proto 'dhcp'
        option delegate '0'
        option metric '1'
        option macaddr '18:D6:C7:53:B8:CB'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '2 3 6t'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option ports '1 6t'

config interface 'wg2'
        option proto 'wireguard'
        option delegate '0'
        list addresses '________'
        option private_key '____'
        option metric '11'

config wireguard_wg2
        option public_key '_____________'
        option endpoint_port '_____'
        list allowed_ips '0.0.0.0/0'
        list allowed_ips '________'
        option route_allowed_ips '1'
        option endpoint_host '___________'
        option persistent_keepalive '25'

Yes. The conversion from swconfig to DSA requires it. Back up your WireGuard config first.

Thank you.
My plan is this: delete the /etc/config/network file, reboot (will the router then generate the default configuration?), connect with a cable and do the settings again?

Yes it will regenerate

Delete /etc/config/network and, without rebooting, upgrade to the latest snapshot (keeping settings). Once it has been upgraded and restarted, a new config will be generated.

If you reboot after deleting the network file and before upgrading, the regenerated config will still be incompatible with the snapshot. The network/switch driver is completely different, and the Ethernet devices have different names.
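
The order of operations matters here, so it may help to see it as a small shell function. This is only a sketch: `dsa_upgrade`, the image path and the backup path are placeholders, and the backup has to live somewhere that survives the flash (a USB stick, or copy it to a PC first).

```shell
# Hypothetical helper, meant to be run on the router over SSH.
#   $1 = sysupgrade image already copied to the router
#   $2 = where to save the old swconfig-era network config
#   $3 = network config to remove (defaults to the real one)
dsa_upgrade() {
    image="$1"
    backup="$2"
    config="${3:-/etc/config/network}"
    [ -f "$image" ] || { echo "image not found: $image" >&2; return 1; }
    # Keep a copy of the old file so the WireGuard sections can be re-added later.
    cp "$config" "$backup" || return 1
    rm "$config" || return 1
    # No reboot in between: sysupgrade keeps the remaining settings by default,
    # and the new firmware generates a fresh network config on first boot.
    sysupgrade "$image"
}
```

A call might look like `dsa_upgrade /tmp/snapshot-sysupgrade.bin /mnt/usb/network.bak`, with both paths adapted to your setup.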

I already have 19.07.5. Do I need to download a sysupgrade image again? Or am I missing something? :sweat_smile:

19.07.x is the latest stable branch, and it still has the outdated driver. Snapshots are builds from the master branch, which includes the new DSA driver. You can download one from here: https://downloads.openwrt.org/snapshots/targets/ramips/mt7621/

CAUTION: These builds do not include LuCI. You must install it over SSH.

opkg update
opkg install luci
