What I think is happening is that the CPU pointer overtakes the DMA pointer and the ring gets stuck.
This can happen when the DMA is slowed down by PAUSE frames.
I tried to reproduce it, but that does not seem easy to do.
I was hoping that reducing the DMA size to 32 packets would trigger this timeout more quickly.
But it doesn't.
That is my theory.
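To make the theory concrete, here is a minimal standalone model of a TX ring. It is not the mtk_soc_eth code: the struct, the helper names, and the "keep one slot free" rule are assumptions for illustration only. ctx and dtx are borrowed from the debug output (CPU index and DMA index), and RING_SIZE of 16 matches max=16 in the log. It shows that if the CPU advances without a full check and laps the DMA, ctx ends up equal to dtx and the ring looks empty even though every slot is still pending.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16u /* matches max=16 in the log, otherwise arbitrary */

/* Hypothetical model, not the driver's real data structure. */
struct tx_ring_model {
	uint32_t ctx; /* next slot the CPU will fill */
	uint32_t dtx; /* next slot the DMA engine will send */
};

static uint32_t ring_next(uint32_t idx)
{
	assert(idx < RING_SIZE);
	return (idx + 1u) % RING_SIZE;
}

/* Descriptors queued by the CPU that the DMA has not sent yet. */
static uint32_t ring_pending(const struct tx_ring_model *r)
{
	return (r->ctx + RING_SIZE - r->dtx) % RING_SIZE;
}

/* Enqueue with a full check: one slot stays free, so ctx never reaches dtx. */
static int ring_enqueue_checked(struct tx_ring_model *r)
{
	if (ring_next(r->ctx) == r->dtx)
		return -1; /* ring full, caller must stop the queue */
	r->ctx = ring_next(r->ctx);
	return 0;
}

/* Enqueue without a full check: models the suspected overtake. */
static void ring_enqueue_unchecked(struct tx_ring_model *r)
{
	r->ctx = ring_next(r->ctx);
}
```

With dtx stalled at 0 (for example by PAUSE frames), 15 checked enqueues succeed and the 16th is refused. One unchecked enqueue at that point wraps ctx back to 0, so ctx == dtx and ring_pending() reports 0 although nothing was sent.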
I added a bit more debug output to timeout().
Also, poll_tx() checks whether the DMA done bit is set.
I also reduced dma_size from 4096 to 256 entries to cut down the debug output when a timeout hits.
BTW: my code also enables the SFP port on the Ubiquiti ER-X-SFP.
A crash now looks like this.
I added these two lines. The first shows the DMA done bit status of every entry.
fe_tx_timeout: 00: 0x00000000 0x00000000
Set cpu pointer behind dma. idx = 15
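The idx values in the log are consistent with putting the CPU index one slot behind the DMA index, modulo the ring size: with max=16, dtx=1 gives idx = 0 and dtx=0 gives idx = 15. A rough sketch of that calculation follows. The function name idx_behind_dma is hypothetical and this is not the actual patch, only the arithmetic the messages suggest.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the ring slot directly behind the DMA index.
 * Adding ring_size before subtracting avoids unsigned underflow at dtx == 0.
 */
static uint32_t idx_behind_dma(uint32_t dtx, uint32_t ring_size)
{
	assert(ring_size > 0 && dtx < ring_size);
	return (dtx + ring_size - 1u) % ring_size;
}
```

For example, idx_behind_dma(0, 16) wraps around to the last slot of the ring instead of going negative.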
Full log.
[ 34.529713] ------------[ cut here ]------------
[ 34.538936] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324
[ 34.555397] NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out
[ 34.569244] Modules linked in: pppoe ppp_async nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_set_rbtree nft_set_hash nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir_ipv4 nft_redir nft_quota nft_objref nft_numgen nft_nat nft_meta nft_masq_ipv4 nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_exthdr nft_ct nft_counter nft_chain_route_ipv6 nft_chain_route_ipv4 nft_chain_nat_ipv4 nf_tables_ipv6 nf_tables_ipv4 nf_tables_inet nf_tables nf_conntrack_ipv6 mt76x2e mt76x2_common mt76x02_lib mt76 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD wireguard ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot
[ 34.710711] ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda slhc nfnetlink nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat i2c_gpio i2c_algo_pca i2c_algo_bit gpio_pca953x i2c_dev ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ip6_udp_tunnel udp_tunnel uas mmc_block usb_storage mtk_sd mmc_core leds_gpio xhci_plat_hcd xhci_pci xhci_mtk xhci_hcd dwc3 sd_mod scsi_mod gpio_button_hotplug usbcore nls_base usb_common
[ 34.833658] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.169 #0
[ 34.845770] Stack : 00000000 8fea7040 ffffffff 80079724 80610000 805aaf94 00000000 00000000
[ 34.862399] 80575178 8fc0bdcc 8fc3c70c 805e4a47 8056fb80 00000001 8fc0bd70 53261648
[ 34.879028] 00000000 00000000 80b90000 00000000 80b88500 000000d7 00000007 00000000
[ 34.895656] 00000000 00000000 000cb87a ffffffff 00000000 80610000 00000000 803c2b50
[ 34.912284] 805a511c 00000140 00000001 8fea7040 00000000 802e7618 00000004 80b80004
[ 34.928913] ...
[ 34.933771] Call Trace:
[ 34.938648] [<8000c4d4>] show_stack+0x58/0x100
[ 34.947491] [<804a8274>] dump_stack+0xa4/0xe0
[ 34.956167] [<80031db0>] __warn+0xe0/0x140
[ 34.964310] [<800319f4>] warn_slowpath_fmt+0x30/0x3c
[ 34.974182] [<803c2b50>] dev_watchdog+0x1ac/0x324
[ 34.983543] [<800913ac>] call_timer_fn.isra.28+0x24/0x84
[ 34.994105] [<8009170c>] run_timer_softirq+0x1bc/0x248
[ 35.004326] [<804c5898>] __do_softirq+0x128/0x2e8
[ 35.013686] [<800367b0>] irq_exit+0xa8/0xc4
[ 35.022007] [<8029a4c4>] plat_irq_dispatch+0xf0/0x13c
[ 35.032051] [<800074c8>] except_vec_vi_end+0xb8/0xc4
[ 35.041921] [<80008e48>] r4k_wait_irqoff+0x1c/0x24
[ 35.051594] ---[ end trace 8c2028659346c630 ]---
[ 35.060790] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 35.073091] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000065
[ 35.085049] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dfca000, max=16, ctx=1, dtx=1, fdx=1, next=1
[ 35.104784] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0da80000, max=0, calc=57, drx=58
[ 35.122277] fe_tx_timeout: 00: 0x00000000 0x00000000
[ 35.132149] Set cpu pointer behind dma. idx = 0
[ 35.532124] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x6060000c, 0x10c = 0x80818
[ 35.560414] mtk_soc_eth 1e100000.ethernet: PPE started
[ 45.169725] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 45.182051] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
[ 45.194035] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dfca000, max=16, ctx=1, dtx=0, fdx=0, next=1
[ 45.213796] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e320000, max=0, calc=39, drx=40
[ 45.231312] fe_tx_timeout: 00: 0x00000000 0x00000000
[ 45.241208] Set cpu pointer behind dma. idx = 15
[ 45.642123] mtk_soc_eth 1e100000.ethernet: 0x100 = 0x5d60000c, 0x10c = 0x80818
[ 45.670574] mtk_soc_eth 1e100000.ethernet: PPE started
[ 55.169725] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 55.182047] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
[ 55.194029] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0dfca000, max=16, ctx=1, dtx=0, fdx=0, next=1
[ 55.213790] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0e320000, max=0, calc=33, drx=34
[