Eth1 (mtk_soc_eth): transmit queue 1 timed out #13122

While using a Zyxel EX5601-T0 router, I intermittently encountered the following problem:
the SoC Ethernet driver (mtk_soc_eth) stops working for no apparent reason (the kernel log is attached in the Actual behaviour section).
This appears to be the same problem as issue #12143.

When it occurs, the router can no longer be used normally and none of the Ethernet ports work.

OpenWrt version

r23551-e21b4c9636

OpenWrt target/subtarget

mediatek/filogic

Device

Zyxel EX5601-T0

Image kind

Official downloaded image

Steps to reproduce

The issue occurs randomly. I have encountered it twice, about a month apart, both times during a sustained upload (for example, a live video stream).

Actual behaviour

KERNEL LOG:

Sun Jul 16 20:31:12 2023 kern.warn kernel: [328426.306756] ------------[ cut here ]------------
Sun Jul 16 20:31:12 2023 kern.info kernel: [328426.311461] NETDEV WATCHDOG: eth1 (mtk_soc_eth): transmit queue 1 timed out
Sun Jul 16 20:31:12 2023 kern.warn kernel: [328426.318517] WARNING: CPU: 2 PID: 0 at dev_watchdog+0x330/0x33c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.324427] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e mt76_connac_lib mt76 mac80211 cfg80211 slhc nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc_ccitt compat crypto_safexcel sha1_generic seqiv md5 des_generic libdes authencesn authenc leds_gpio gpio_button_hotplug
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.383717] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.15.120 #0
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.389879] Hardware name: Zyxel EX5601-T0 (DT)
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.394478] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.401505] pc : dev_watchdog+0x330/0x33c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.405586] lr : dev_watchdog+0x330/0x33c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.409665] sp : ffffffc008c3bdb0
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.413049] x29: ffffffc008c3bdb0 x28: 0000000000000140 x27: 00000000ffffffff
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.420252] x26: 0000000000000000 x25: 0000000000000002 x24: ffffff800085a4c0
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.427454] x23: 0000000000000000 x22: 0000000000000001 x21: ffffffc008af6000
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.434656] x20: ffffff800085a000 x19: 0000000000000001 x18: ffffffc008b0a338
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.441858] x17: ffffffc0372cf000 x16: ffffffc008c38000 x15: 00000000000005b8
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.449060] x14: 00000000000001e8 x13: ffffffc008c3bad8 x12: ffffffc008b62338
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.456262] x11: 712074696d736e61 x10: ffffffc008b62338 x9 : 0000000000000000
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.463464] x8 : ffffffc008b0a2e8 x7 : ffffffc008b0a338 x6 : 0000000000000001
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.470666] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.477867] x2 : ffffff803fdad080 x1 : ffffffc0372cf000 x0 : 000000000000003f
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.485071] Call trace:
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.487591]  dev_watchdog+0x330/0x33c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.491326]  call_timer_fn.constprop.0+0x20/0x80
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.496014]  __run_timers.part.0+0x208/0x284
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.500354]  run_timer_softirq+0x38/0x70
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.504347]  _stext+0x10c/0x28c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.507559]  __irq_exit_rcu+0xdc/0xfc
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.511295]  irq_exit+0xc/0x1c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.514423]  handle_domain_irq+0x60/0x8c
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.518420]  gic_handle_irq+0x50/0x120
Sun Jul 16 20:31:12 2023 kern.debug kernel: [328426.522244]  call_on_irq_stack+0x20/0x34
...
...
...

Using the following configuration:

  • A 2.5 Gbps fiber ONT is connected to the ETH1 (WAN) port (ISP: TIM, Italy).
  • The router establishes a PPPoE connection through the ONT.
  • All my devices are connected through the Ethernet ports.

When the problem occurs:

  • All wired client devices lose their connection to the DHCP server.
  • SSH, ping and telnet from wired clients get no response from the router.

When this happens, all Ethernet ports stop responding, so I cannot reach the OpenWrt GUI by IP address or local DNS name from any wired device.
The OpenWrt GUI is still reachable over Wi-Fi.
(The bug affects the mtk_soc_eth Ethernet driver.)

To restore full system functionality, I had to completely reboot the router.

Expected behaviour

Network connectivity should remain stable. The Ethernet interfaces should not time out, and wired LAN access should not be lost.

Additional info

The issue occurs randomly and cannot be reproduced on demand.
The log shows that the kernel's netdev watchdog found transmit queue 1 of eth1 stopped for longer than its timeout. This interface is handled by the mtk_soc_eth driver.
Searching the web for "(mtk_soc_eth): transmit queue" turns up several similar reports.
Apart from this problem, the router has been completely stable.
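It may help to check whether the same event appears in other saved logs. For that, the interface, driver and queue number can be extracted from the watchdog line.

The following is a minimal Python sketch. It assumes the exact message format from the 5.15 kernel log above; other kernel versions may word the message differently. The names `parse_watchdog`, `find_watchdog_events` and `WatchdogEvent` are hypothetical helpers, not part of OpenWrt.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Matches the watchdog message as it appears in the 5.15 log above, e.g.
# "NETDEV WATCHDOG: eth1 (mtk_soc_eth): transmit queue 1 timed out"
# (assumed format; other kernel versions may phrase it differently)
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


class WatchdogEvent(NamedTuple):
    iface: str
    driver: str
    queue: int


def parse_watchdog(line: str) -> Optional[WatchdogEvent]:
    """Return interface, driver and queue from a NETDEV WATCHDOG line, else None."""
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    return WatchdogEvent(m["iface"], m["driver"], int(m["queue"]))


def find_watchdog_events(lines: Iterable[str]) -> Iterator[WatchdogEvent]:
    """Yield a WatchdogEvent for every matching line in an iterable of log lines."""
    for line in lines:
        event = parse_watchdog(line)
        if event is not None:
            yield event


if __name__ == "__main__":
    sample = (
        "Sun Jul 16 20:31:12 2023 kern.info kernel: [328426.311461] "
        "NETDEV WATCHDOG: eth1 (mtk_soc_eth): transmit queue 1 timed out"
    )
    print(parse_watchdog(sample))
    # prints: WatchdogEvent(iface='eth1', driver='mtk_soc_eth', queue=1)
```

Running `find_watchdog_events` over a saved copy of the system log (for example, the output of `logread` written to a file) would show whether the timeout always hits queue 1 of eth1 or also other queues or interfaces.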

I should add that I use Stubby for DNS over TLS (DoT), although I don't think it is the cause of the problem.

I have also opened an issue on GitHub, but I wanted to describe the problem in more detail here, since its cause is still not understood.