Mtk_soc_eth watchdog timeout after r11573

Is this on 19.07? Those patches are not for the upstream driver.

Yes, that is on 19.07. He said so himself one post earlier :)

Tried it. Slightly less performance, only two cores are loaded.

These patches are included in release 19.07.4.

The patch that disables flow control was added to the 19.07 branch on May 26, and the interrupt handling patch on September 6.
19.07.4 was released on September 7.

Hardware offloading also seems to have been fixed (I haven't tested it).

Yes, this looks like a really solid release for mt7621 devices. I am about to do the upgrade. Can't wait to test stability!

I think... YEP!!!!
In my usage scenario, 10 minutes and then a reboot.
19.04 - more than 1 hour of heavy load, and no reboot.
It's cool

That is promising! I wonder what usage scenario causes it to crash that quickly. I only ran into issues about ~once a week.

What version were you running before this one by the way?

  • release 19.07.3 - no reboot, but sometimes "sch_generic.c:320 error" in log

  • snapshot, including "Update kernel 4.14 to version 4.14.195", "generic: fix flow table hw offload " and "ramips: gsw_mt7621: disable PORT 5 MAC RX/TX flow control by default" - reboot with various kernel panic (logging by serial console)

  • release 19.07.4 (previous patches + "ramips: ethernet: fix to interrupt handling") - no reboot, and no "sch_generic.c:320 error" for now

In release 19.07.4, hardware NAT is only partially fixed: it works now, but after running for a while the network becomes inaccessible and the router must be rebooted. After switching to software flow offloading, it seems to work stably.

Just installed today's snapshot, and now there is no way to spread the load across more than 1 core, which creates a bottleneck of 350 Mbit/s on a gigabit line. That is with SW offload. I also tried HW offload, thinking it might have started working after this commit: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=4fb58813f94ac6cc8167138e23a92189fe50b258, but no.

I also tried to enable/disable packet steering, tried my previous tricks with RPS, nothing works, limited to single core and terrible speeds...
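For anyone wanting to repeat the RPS experiment, here is a minimal sketch of how the CPU bitmask for receive packet steering can be computed. The sysfs file `rps_cpus` is the standard Linux interface for RPS; the interface name `eth0`, the single rx queue and the four logical CPUs are assumptions for illustration, not values confirmed in this thread.

```python
def rps_mask(cpus):
    """Return the hex bitmask string that the kernel expects in rps_cpus."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu  # one bit per logical CPU
    return format(mask, "x")


def rps_path(iface, queue=0):
    """Build the sysfs path of the RPS mask for one receive queue."""
    return f"/sys/class/net/{iface}/queues/rx-{queue}/rps_cpus"


if __name__ == "__main__":
    # Assumption: eth0 with one rx queue and four logical CPUs (0-3).
    # This only prints the command; it does not touch the system.
    print(f"echo {rps_mask([0, 1, 2, 3])} > {rps_path('eth0')}")
```

Whether the resulting mask changes anything depends on the driver and on what the packet steering option already configures, so treat it as a starting point for testing rather than a fix.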

MOD:

And my kernel log is full with this:

[  105.362573] mt76x2e 0000:02:00.0: MCU message 2 (seq 9) timed out
[  105.731965] mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
[  105.743041] mt76x2e 0000:02:00.0: Build: 1
[  105.751312] mt76x2e 0000:02:00.0: Build Time: 201507311614____
[  105.778561] mt76x2e 0000:02:00.0: Firmware running!
[  105.790708] ieee80211 phy1: Hardware restart was requested
[  106.834540] mt76x2e 0000:02:00.0: MCU message 2 (seq 12) timed out
[  107.203695] mt76x2e 0000:02:00.0: Firmware Version: 0.0.00
[  107.214697] mt76x2e 0000:02:00.0: Build: 1
[  107.222905] mt76x2e 0000:02:00.0: Build Time: 201507311614____
[  107.250568] mt76x2e 0000:02:00.0: Firmware running!
[  107.262695] ieee80211 phy1: Hardware restart was requested

If I disable all wifi adapters, the kernel log stops shooting this message.

MOD2: only the 2.4GHz wifi is affected.
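To put a number on how often the radio resets, a small sketch like the following can pull the "Hardware restart was requested" lines out of a saved kernel log and compute the time between them per phy. The function name and the sample text are illustrative; the sample lines are copied from the log above.

```python
import re

# Matches e.g. "[  105.790708] ieee80211 phy1: Hardware restart was requested"
RESTART = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+ieee80211 (phy\d+): Hardware restart was requested"
)


def restart_intervals(log_text):
    """Return {phy: [seconds between consecutive hardware restarts]}."""
    times = {}
    for line in log_text.splitlines():
        match = RESTART.search(line)
        if match:
            times.setdefault(match.group(2), []).append(float(match.group(1)))
    return {
        phy: [round(later - earlier, 3) for earlier, later in zip(ts, ts[1:])]
        for phy, ts in times.items()
    }


SAMPLE = """\
[  105.778561] mt76x2e 0000:02:00.0: Firmware running!
[  105.790708] ieee80211 phy1: Hardware restart was requested
[  107.250568] mt76x2e 0000:02:00.0: Firmware running!
[  107.262695] ieee80211 phy1: Hardware restart was requested
"""

if __name__ == "__main__":
    print(restart_intervals(SAMPLE))
```

On the two restarts shown above, the gap is roughly a second and a half, which matches the impression that the 2.4GHz radio is resetting constantly.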

I see none of these log messages here, running OpenWrt SNAPSHOT, r14465-04d3b517dc, with an uptime of 3 days, 18:41.

BTW, as I mentioned before, my D-Link DIR-860L B1 is used only as an AP, not as the main router.

@dchard Please let the mt76 developers know you are also experiencing the same problem. Hopefully they will look into it if they know it's a widespread issue. Out of curiosity, what mt76 device are you using? Curious if yours is using the same WiFi chips as my router is.

I commented. On my end, besides the wifi issue, PPPoE is also dropping (not ISP fault).

Maybe commit 34a96529041d4e9502c490c66f8af0154187c6d2 would help.

Revert "ramips: ethernet: fix to interrupt handling"

This reverts commit 7ac454014a11347887323a131415ac7032d53546.

The change reportedly causes regressions in ethernet performance.

Fixes: FS#3332
Signed-off-by: Jo-Philipp Wich <jo@mein.io>

This commit is only for 19.07.x with the old driver, not for master with the DSA driver. On the other hand, they shouldn't have reverted it. I have been using this patch for months, and not only does it not cause any performance problems, it is absolutely necessary to avoid transmit timed out errors or unexpected reboots.

I reverted 6 days' worth of commits, which did not help. Then 13 days' worth, which did not help either. That is weird, as the last time I built an image was 10 days ago, and that worked without a flaw...

MOD: tried the latest master with the latest mt76 push and it is the same issue: 2.4G wifi is HW resetting constantly.

@Mushoz is it really only the two of us who see this instability?

MOD: I went back a month's worth of commits, and this crap with the 2.4GHz wifi still persists. At least the PPPoE drops are gone.

I have the same issue with the Netgear R6220:

[311768.306053] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[326311.227446] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[337893.023836] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[337896.867296] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[341279.270143] mt76x2e 0000:01:00.0: Firmware Version: 0.0.00
[341279.281642] mt76x2e 0000:01:00.0: Build: 1
[341279.290039] mt76x2e 0000:01:00.0: Build Time: 201507311614____
[341279.318453] mt76x2e 0000:01:00.0: Firmware running!
[341279.338533] ieee80211 phy1: Hardware restart was requested
[342852.173685] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[342854.857630] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[349963.279376] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[349965.967635] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[351498.704203] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[351502.654127] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[360266.776839] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[360269.332901] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[364880.383706] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[364882.984503] mtk_soc_eth 1e100000.ethernet
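A log like this is easier to compare between firmware versions once the flaps are counted. The sketch below pairs each "link down" with the next "link up" on the same port and reports how long each outage lasted. The regular expression is written against the line format shown above; the function name is made up for illustration, and truncated or unrelated lines are simply skipped.

```python
import re

# Matches e.g. "[326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down"
LINK = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mtk_soc_eth \S+ (\w+): port (\d+) link (up|down)"
)


def link_outages(log_text):
    """Return {port: [seconds each down/up outage lasted]}."""
    down_since = {}
    outages = {}
    for line in log_text.splitlines():
        match = LINK.search(line)
        if not match:
            continue  # other drivers, truncated lines, etc.
        stamp = float(match.group(1))
        port = int(match.group(3))
        if match.group(4) == "down":
            down_since[port] = stamp
        elif port in down_since:
            duration = round(stamp - down_since.pop(port), 3)
            outages.setdefault(port, []).append(duration)
    return outages


SAMPLE = """\
[311768.306053] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[326308.563468] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[326311.227446] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[337893.023836] mtk_soc_eth 1e100000.ethernet eth0: port 3 link down
[337896.867296] mtk_soc_eth 1e100000.ethernet eth0: port 3 link up
[364882.984503] mtk_soc_eth 1e100000.ethernet
"""

if __name__ == "__main__":
    for port, durations in link_outages(SAMPLE).items():
        print(f"port {port}: {len(durations)} outages, {durations} s")
```

In the excerpt above, each outage on port 3 lasts only a few seconds, which may help when deciding whether to suspect the driver or the cable and link partner.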

Running 19.07.4 did solve my issue where unplugging a USB 3 gigabit controller disconnected some ports.

I did a complete power cycle on a 1-month-old version, and that seems to have solved the Wifi and PPPoE issues as well. I am going to upgrade to the latest snapshot, and if the Wifi issue is still present, I will do a full power cycle after the upgrade.

Flashed the latest 19.07.4 on my 3 R6220s: no timeouts now, and WiFi is super stable for the most part. I'll be trying it on my R6350s at the end of the month. But I did notice that I couldn't reuse my setup files on the R6220s, even though they were in dumb AP mode with only 3 VLANs, a pair of WiFi on 2.4G and a single on 5G. They would straight up lock up/crash.

Edit - Firewall/Odhcpd/one other service are all offline. There are no routing rules here and everything else is handled upstream on an EA4500 (Viper)