Mtk_soc_eth watchdog timeout after r11573

I commented. On my end, besides the wifi issue, PPPoE is also dropping (not ISP fault).

Maybe commit 34a96529041d4e9502c490c66f8af0154187c6d2 would help.

Revert "ramips: ethernet: fix to interrupt handling"

This reverts commit 7ac454014a11347887323a131415ac7032d53546.

The change reportedly causes regressions in ethernet performance.

Fixes: FS#3332
Signed-off-by: Jo-Philipp Wich <jo@mein.io>

This commit is only for 19.07.x with the old driver, not for master with the DSA driver. On the other hand, they shouldn't have reverted it. I have been using this patch for months, and not only does it not cause any performance problems, it is absolutely necessary to avoid transmit timed out errors or unexpected reboots.

I reverted 6 days' worth of commits; that did not help. Then 13 days' worth; that did not help either, which is weird, as the last time I built an image was 10 days ago and that one worked without a flaw...

MOD: tried the latest master with the latest mt76 push and it is the same issue: 2.4G wifi is HW resetting constantly.

@Mushoz is it really only the two of us who see this instability?

MOD: I went back a month's worth of commits, and this crap with the 2.4GHz wifi still persists. At least the PPPoE drops are gone.

I have the same issue with the Netgear R6220:

[311768.306053] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[326311.227446] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[337893.023836] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[337896.867296] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[341279.270143] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[341279.281642] mt76x2e 0000:01:00.0: Build: 1
[341279.290039] mt76x2e 0000:01:00.0: Build Time: 201507311614____
[341279.318453] mt76x2e 0000:01:00.0: Firmware running!
[341279.338533] ieee80211 phy1: Hardware restart was requested
[342852.173685] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[342854.857630] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[349963.279376] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[349965.967635] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[351498.704203] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[351502.654127] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[360266.776839] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[360269.332901] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[364880.383706] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[364882.984503] mtk_soc_eth 1e100000.ethernet
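To put numbers on how often the port flaps, the log above can be fed through a small script. This is only a rough sketch in Python, meant to be run on a PC against saved `dmesg` output. The function names are made up for illustration, and the regex assumes the exact "port N link up/down" wording shown in this log.

```python
import re

# Matches lines such as:
# [326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mtk_soc_eth\s+\S+\s+\S+:\s+port\s+(\d+)\s+link\s+(up|down)"
)


def parse_link_events(lines):
    """Return a list of (timestamp_seconds, port, state) tuples."""
    events = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:  # truncated or unrelated lines are skipped
            events.append((float(match.group(1)), int(match.group(2)), match.group(3)))
    return events


def flap_stats(events, port):
    """Return (down_durations, gaps_between_downs) in seconds for one port."""
    down_durations = []
    gaps = []
    previous_down = None  # timestamp of the previous "link down"
    pending_down = None   # "link down" that has not been followed by "up" yet
    for timestamp, event_port, state in events:
        if event_port != port:
            continue
        if state == "down":
            if previous_down is not None:
                gaps.append(timestamp - previous_down)
            previous_down = timestamp
            pending_down = timestamp
        elif pending_down is not None:
            down_durations.append(timestamp - pending_down)
            pending_down = None
    return down_durations, gaps


# First few lines of the log above, plus the truncated last line.
SAMPLE = """\
[311768.306053] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[326311.227446] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[337893.023836] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[337896.867296] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[364882.984503] mtk_soc_eth 1e100000.ethernet
""".splitlines()

if __name__ == "__main__":
    durations, gaps = flap_stats(parse_link_events(SAMPLE), port=3)
    for seconds in durations:
        print(f"link was down for {seconds:.1f} s")
    for seconds in gaps:
        print(f"{seconds / 3600:.1f} h between link-down events")
```

In this log the link stays down for only a few seconds each time, and the drops are hours apart, which at least makes it easy to compare against other devices.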

Running 19.07.4 did solve my issue where unplugging a USB 3 gigabit controller disconnected some ports.

I did a complete power cycle on a 1-month-old version, and that seems to have solved the Wifi and PPPoE issues as well. I am going to upgrade to the latest snapshot, and if the Wifi issue is still present, I will do a full power cycle after the upgrade.


Flashed the latest 19.07.4 on my 3 R6220s: no timeouts now, and WiFi is super stable for the most part. I'll be trying it on my R6350s at the end of the month. But I did notice that I couldn't reuse my setup files on the R6220s, even though they were in dumb AP mode with only 3 VLANs, a pair of WiFi networks on 2.4G and a single one on 5G. They would straight up lock up/crash.

Edit - Firewall/Odhcpd/one other service are all offline. There are no routing rules here and everything else is handled upstream on an EA4500 (Viper)

So, I am back on latest master, the Wifi issue is gone, but the PPPoE link keeps dropping:

[  975.364411] mt7530 mdio-bus:1f wan: Link is Down
[  975.379349] mt7530 mdio-bus:1f wan: configuring for phy/gmii link mode
[  975.392953] 8021q: adding VLAN 0 to HW filter on device wan
[  975.498268] mt7530 mdio-bus:1f wan: configuring for phy/gmii link mode
[  975.511931] 8021q: adding VLAN 0 to HW filter on device wan
[  979.602660] mt7530 mdio-bus:1f wan: Link is Up - 1Gbps/Full - flow control off
[  984.958268] pppoe-digi: renamed from ppp0

It repeats every 5-8 minutes. Seems to be something with the switch driver. Hopefully @nbd can take a look.
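For anyone who wants to measure how long each drop actually lasts, here is a rough Python sketch to run on a PC against saved log output. It pairs each "wan: Link is Down" line with the next "renamed from ppp0" line. Treating the rename line as the moment the PPPoE interface is back is my assumption, and the function name and the `wan` / `pppoe-` defaults are just taken from the log above for illustration.

```python
import re

# Splits "[  975.364411] message" into a timestamp and the message text.
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def pppoe_outages(lines, wan="wan", ppp_prefix="pppoe-"):
    """Return a list of (down_ts, restored_ts, seconds) tuples.

    Assumption: the "<ppp_prefix>...: renamed from pppN" line marks the
    point where the PPPoE interface exists again after a drop.
    """
    outages = []
    down_at = None
    for line in lines:
        match = TS_RE.match(line.strip())
        if not match:
            continue
        timestamp = float(match.group(1))
        message = match.group(2)
        if f" {wan}: Link is Down" in message:
            if down_at is None:  # keep the first "down" of a burst
                down_at = timestamp
        elif (
            down_at is not None
            and message.startswith(ppp_prefix)
            and "renamed from ppp" in message
        ):
            outages.append((down_at, timestamp, timestamp - down_at))
            down_at = None
    return outages


# The log excerpt from this post.
SAMPLE = """\
[  975.364411] mt7530 mdio-bus:1f wan: Link is Down
[  975.379349] mt7530 mdio-bus:1f wan: configuring for phy/gmii link mode
[  975.392953] 8021q: adding VLAN 0 to HW filter on device wan
[  975.498268] mt7530 mdio-bus:1f wan: configuring for phy/gmii link mode
[  975.511931] 8021q: adding VLAN 0 to HW filter on device wan
[  979.602660] mt7530 mdio-bus:1f wan: Link is Up - 1Gbps/Full - flow control off
[  984.958268] pppoe-digi: renamed from ppp0
""".splitlines()

if __name__ == "__main__":
    for down_ts, restored_ts, seconds in pppoe_outages(SAMPLE):
        print(f"down at {down_ts:.0f} s, PPPoE back after {seconds:.1f} s")
```

Run over a longer capture, the same function would also show whether the gaps between drops really sit in the 5-8 minute range.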

I already deleted and recreated the PPPoE connection from scratch.

Very weird. I am also on PPPoE but with zero issues. I am only seeing the log spam regarding the resetting WiFi. Your WiFi issues went away by power cycling? How did you power cycle the device exactly?

"Your WiFi issues went away by power cycling? "

Yes. Just pulled the plug for 10 seconds.

As for PPPoE, I had to revert to a 1-month-old version, as the latest master drops every 5 minutes.

Ah, I am on 19.07.4. Is that version fine for you as well for your PPPoE connection?

I have been on DSA for months now, and I cannot revert to 4.19 without reconfiguring the whole router, so I would rather not test that, if you don't mind :slight_smile:

But again: reverting to SNAPSHOT r14295-05b8e84362 fixed the PPPoE issue as well.

Hah, there is a mysterious link change bug on ath79 too! :thinking:

http://lists.openwrt.org/pipermail/openwrt-devel/2020-September/031466.html

Tried today's snapshot: PPPoE still fails.
Went back to this one: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=2c2fcbd2e0f856f460040b8c67530ca27fa323e7 and it works fine.

The latest 19.07.4 basically makes my Xiaomi Router 4A 100M hit "NETDEV WATCHDOG: eth0 (mtk_soc_eth)" several times in a single day. At least there is still something like 20MB of RAM left, compared to 19.07.3, which eats RAM like crazy. Still, once that NETDEV WATCHDOG pops up, the 2.4GHz radio is killed while 5GHz stays alive. If you give it 5-10 minutes, the 2.4GHz wifi recovers on its own, with no logs related to the issue. If you try to force-restart the 2.4GHz radio, it just throws a "device not ready" error, with several "mt76_wmac 10300000.wmac: MCU message 8 (seq 8) timed out" messages in the logs.

On 19.07.3 it works fine for at least a few days (about 12) without any issue, before it crashes with LuCI showing an Out of Memory error and ssh not working either (the process was probably killed because the router ran out of RAM). I remember that out of 58MB of RAM, only about 2MB was left at 10 days of uptime. I don't know why it got so low, given that I'm only using this as an AP set up for VLAN-per-SSID with 4 SSIDs on different isolated networks, with only about 7 devices connected most of the time. For some unknown reason, the 5GHz part of the wifi keeps working and 2.4GHz stays dead until I force a power cycle of the Xiaomi 4A.

Reverted back to 19.07.3 and just put a cron job on it that reboots every 5 days. That works fine for me, at least.

Your problem with the RAM is strange. I have several HG556a units, also with 64MB, running 19.07.4, and 31MB remains free after 18 days of uptime, the same as right after a reboot. Before that they ran a 19.07 snapshot and I had no problems with RAM. And they are working as routers with several packages installed (openvpn etc.).

As of now, Total Available is only about 5.15MB; yesterday it was around 20MB, with Free at around 30+MB.

To be honest, I don't know what's going on: no extra packages installed, no routing, no NAT, no firewall, just a plain AP.

You have 12.96MB Free.
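Since "free" and "available" keep getting mixed up here, it may help to read both straight from the kernel instead of a screenshot. Below is a rough Python sketch that parses a saved copy of `/proc/meminfo` (for example captured over ssh and analysed on a PC). `MemTotal`, `MemFree` and `MemAvailable` are real kernel fields, but the function names and the sample numbers are made up for illustration and are not taken from any router in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: value} dict.

    Values are the raw integers from the file (kB for the memory fields).
    """
    values = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        if not separator:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize(values):
    """Return total/free/available in MB; a missing field becomes None."""
    def to_mb(field):
        kilobytes = values.get(field)
        return None if kilobytes is None else kilobytes / 1024

    return {
        "total_mb": to_mb("MemTotal"),
        "free_mb": to_mb("MemFree"),
        "available_mb": to_mb("MemAvailable"),
    }


# HYPOTHETICAL sample, numbers invented for illustration only.
SAMPLE = """\
MemTotal:          59392 kB
MemFree:           30720 kB
MemAvailable:       5120 kB
Buffers:            2048 kB
"""

if __name__ == "__main__":
    summary = summarize(parse_meminfo(SAMPLE))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Logging these three values once a day would show whether memory is really leaking over the uptime, or whether only one of the counters is moving.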


I'll just post a later one, when it drops to about 3MB available with <5MB free once it reaches 5+ days of uptime.

Edit: it happened sooner than I thought... here's what happened to LuCI now:


Can't access ssh, since it was probably killed when the router ran out of RAM.