21.02 rc4: mt7530 mdio-bus:1f lan1: Link is Down, Link is Up

I captured traffic with tcpdump on lan1 while the problem was happening and did not notice anything unusual before the LAN port went down.
When the problem occurs there is a 5-second gap in the capture, which confirms that no packets get through during that period.

Note that I am not connected to lan1 myself; other computers are connected to that port. I am connected to another port, which does not show up in the logs.

I will repeat the test, capturing traffic on the whole LAN bridge and on the port I am connected to, to see if I can find anything useful.
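To pinpoint such gaps without scrolling through the capture by eye, here is a minimal sketch. It assumes the capture is printed with `tcpdump -tt` (one packet per line, epoch timestamp as the first field); the function name `find_capture_gaps` and the 3-second default threshold are my own hypothetical choices:

```shell
# Report silent periods in tcpdump output.
# Assumes each packet line starts with an epoch timestamp, as printed by `tcpdump -tt`.
# $1: minimum gap in seconds to report (hypothetical default: 3)
find_capture_gaps() {
    awk -v min="${1:-3}" '
        # only consider lines whose first field looks like a timestamp
        $1 ~ /^[0-9]+\.[0-9]+$/ {
            t = $1 + 0
            if (seen && t - prev >= min)
                printf "gap of %.3f s after %.6f\n", t - prev, prev
            prev = t
            seen = 1
        }'
}
```

For example: `tcpdump -l -tt -n -i lan1 | find_capture_gaps 3` (`-l` line-buffers the output so gaps are reported while the capture is running).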

Have you tried disabling EEE by adding ethtool --set-eee eth0 eee off to /etc/rc.local? My only clue is that EEE is causing this issue, so I guess it's worth a try. I rolled back to the working build and can't test right now, unfortunately.

I tried running that command on all the Ethernet interfaces while the problem was showing up, and it didn't stop it. I had the impression it made things worse, because it started happening continuously, but that's just an impression. I rebooted the router and the problem disappeared.

I have the same problem; I had to go back to rc3.

I can't think of anything else right now (I'm not an advanced user, unfortunately, so I can't help you much). I will flash rc4 again to test, and if I find anything useful I'll reply back.

Some good news: I installed ethtool as suggested and added commands to /etc/rc.local that disable EEE on the ports that were giving issues on the different devices:

ethtool --set-eee lan1 eee off
ethtool --set-eee lan2 eee off
ethtool --set-eee lan3 eee off
ethtool --set-eee wan eee off
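The same commands can be wrapped in a loop. A minimal sketch, assuming the port names above (lan1, lan2, lan3, wan) and that ethtool is installed; `disable_eee` is a hypothetical helper name, and failures are only logged with `logger` so that one port that rejects the setting does not stop the rest of the boot script:

```shell
# Disable Energy-Efficient Ethernet (EEE) on each port given as an argument.
# Meant to be defined and called from /etc/rc.local (before its final "exit 0").
# A port that rejects the setting is only logged, so the loop continues.
disable_eee() {
    for port in "$@"; do
        if ethtool --set-eee "$port" eee off; then
            logger -t disable_eee "EEE disabled on $port"
        else
            logger -t disable_eee "could not disable EEE on $port"
        fi
    done
}
```

In /etc/rc.local, define the function and call `disable_eee lan1 lan2 lan3 wan` before the final `exit 0` line.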

The routers reboot every night at 3 AM. It seems that the combination of a reboot plus running these commands at boot may be having a positive effect: today I did not experience the issue at all, while in previous days I was experiencing it several times a day.

Beware that, as I noted in my previous post, running this command while the problem was occurring did not help; it actually seemed to make things worse. After adding these lines to run at boot and rebooting, however, the issue disappeared and hasn't reappeared since.

I will keep testing it and if I find any more info I'll post it.

In any case, it looks like there's a bug somewhere in this driver's code that needs to be fixed.

@nemesis please keep us posted, because I have this problem with rc3 too.
Turning EEE off made the issue worse on rc3, even after a reboot.

I don't know exactly when this error started.
All along I thought I had a bad connection; after seeing this thread I realized it was my router.

[258182.613013] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258182.620654] br-lan: port 1(lan1) entered blocking state
[258182.626025] br-lan: port 1(lan1) entered forwarding state
[258271.699929] mt7530 mdio-bus:1f lan1: Link is Down
[258271.705058] br-lan: port 1(lan1) entered disabled state
[258273.748028] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258273.755670] br-lan: port 1(lan1) entered blocking state
[258273.761021] br-lan: port 1(lan1) entered forwarding state
[258308.563492] mt7530 mdio-bus:1f lan1: Link is Down
[258308.568441] br-lan: port 1(lan1) entered disabled state
[258310.611622] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258310.619265] br-lan: port 1(lan1) entered blocking state
[258310.624611] br-lan: port 1(lan1) entered forwarding state
[258312.659619] mt7530 mdio-bus:1f lan1: Link is Down
[258312.664598] br-lan: port 1(lan1) entered disabled state
[258314.707579] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258314.715233] br-lan: port 1(lan1) entered blocking state
[258314.720585] br-lan: port 1(lan1) entered forwarding state
[258315.731414] mt7530 mdio-bus:1f lan1: Link is Down
[258315.736363] br-lan: port 1(lan1) entered disabled state
[258318.803534] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258318.811173] br-lan: port 1(lan1) entered blocking state
[258318.816510] br-lan: port 1(lan1) entered forwarding state
[258320.851534] mt7530 mdio-bus:1f lan1: Link is Down
[258320.856509] br-lan: port 1(lan1) entered disabled state
[258322.899495] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258322.907135] br-lan: port 1(lan1) entered blocking state
[258322.912467] br-lan: port 1(lan1) entered forwarding state
[258323.923322] mt7530 mdio-bus:1f lan1: Link is Down
[258323.928267] br-lan: port 1(lan1) entered disabled state
[258326.995451] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258327.003114] br-lan: port 1(lan1) entered blocking state
[258327.008450] br-lan: port 1(lan1) entered forwarding state
[258328.019284] mt7530 mdio-bus:1f lan1: Link is Down
[258328.024238] br-lan: port 1(lan1) entered disabled state
[258331.091406] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258331.099046] br-lan: port 1(lan1) entered blocking state
[258331.104387] br-lan: port 1(lan1) entered forwarding state
[258332.115232] mt7530 mdio-bus:1f lan1: Link is Down
[258332.120179] br-lan: port 1(lan1) entered disabled state
[258335.187374] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258335.195045] br-lan: port 1(lan1) entered blocking state
[258335.200389] br-lan: port 1(lan1) entered forwarding state
[258336.211195] mt7530 mdio-bus:1f lan1: Link is Down
[258336.216158] br-lan: port 1(lan1) entered disabled state
[258338.259349] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258338.267021] br-lan: port 1(lan1) entered blocking state
[258338.272354] br-lan: port 1(lan1) entered forwarding state
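In this log every drop lasts roughly 2 to 3 seconds, and the link sometimes stays up for only about 1 second in between. To get those numbers from a long log instead of reading timestamps by hand, here is a minimal awk sketch; `link_down_durations` is a hypothetical name, and it assumes the kernel timestamp appears in square brackets, as it does in both the dmesg and logread output posted in this thread:

```shell
# Print how long each switch port stayed down, pairing every
# "<port>: Link is Down" line with the next "<port>: Link is Up" line.
# Reads dmesg or logread output on stdin.
link_down_durations() {
    awk '
    # kernel timestamp from "[258271.699929]" or "[ 5887.300776]"
    function ts(   s) {
        if (!match($0, /\[ *[0-9]+\.[0-9]+]/)) return -1
        s = substr($0, RSTART + 1, RLENGTH - 2)
        sub(/^ +/, "", s)
        return s + 0
    }
    # the port name is the field ending in ":" just before "Link is"
    function port(   i) {
        for (i = 1; i < NF - 1; i++)
            if ($(i + 1) == "Link" && $(i + 2) == "is" && $i ~ /:$/)
                return substr($i, 1, length($i) - 1)
        return ""
    }
    / Link is Down/ { p = port(); if (p != "") down[p] = ts() }
    / Link is Up/ {
        p = port()
        if (p in down) {
            printf "%s: down for %.3f s (from %.6f)\n", p, ts() - down[p], down[p]
            delete down[p]
        }
    }'
}
```

Usage: `dmesg | link_down_durations` or `logread | link_down_durations`.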

I deleted my previous post because, right after posting it, I took the time to triple-check.

I narrowed down which workstation triggers the issue when it is turned on by asking the people who were having problems which workstation they were using. I turned it on, and as soon as it came up I saw this in the logread output of the OpenWrt router:

Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.168417] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.183417] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.193904] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:22:54 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:22:59 2021 kern.info kernel: [30158.327955] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.424015] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.434797] br-lan: topology change detected, propagating
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.504362] mt7530 mdio-bus:1f lan1: Link is Down
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.514147] br-lan: port 1(lan1) entered disabled state
Mon Aug 16 12:23:09 2021 daemon.notice netifd: Network device 'lan1' link is down
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.552573] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.568040] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.578533] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:23:11 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:23:15 2021 kern.info kernel: [30174.712372] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.808251] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.819063] br-lan: topology change detected, propagating

I guess "br-lan: topology change detected, propagating" is normal, isn't it? The LAN topology changes because a new node comes online. But "mt7530 mdio-bus:1f lan1: Link is Down" followed by "Link is Up" again within a couple of seconds doesn't look OK.
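One difference from the dmesg log posted earlier: here br-lan runs STP, so after each "Link is Up" the port spends about 4 seconds in the listening state and another 4 in learning before it forwards again. The 2-second flap at 30168.5 therefore keeps lan1 from forwarding until 30178.8, about 10 seconds in total. A minimal sketch that measures this full outage per port (from "Link is Down" until "entered forwarding state"); `port_outages` is a hypothetical name, and it assumes the log format shown above:

```shell
# For each port, print the time from "<port>: Link is Down" until the
# bridge reports "port N(<port>) entered forwarding state" again.
# With STP enabled this includes the listening and learning phases.
port_outages() {
    awk '
    # kernel timestamp from "[30168.504362]" or "[ 5887.300776]"
    function ts(   s) {
        if (!match($0, /\[ *[0-9]+\.[0-9]+]/)) return -1
        s = substr($0, RSTART + 1, RLENGTH - 2)
        sub(/^ +/, "", s)
        return s + 0
    }
    / Link is Down/ {
        for (i = 1; i < NF - 1; i++)
            if ($(i + 1) == "Link" && $(i + 2) == "is" && $i ~ /:$/)
                down[substr($i, 1, length($i) - 1)] = ts()
    }
    /entered forwarding state/ && match($0, /\([^)]+\) entered/) {
        # "(lan1) entered" -> "lan1"
        p = substr($0, RSTART + 1, RLENGTH - 10)
        if (p in down) {
            printf "%s: not forwarding for %.3f s\n", p, ts() - down[p]
            delete down[p]
        }
    }'
}
```

Usage: `logread | port_outages`.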

So I have some bad news: I just tested rc4 on my router and the disconnections were so severe (one roughly every 10 seconds) that it was impossible even to establish a PPPoE connection. What's stranger is that with rc3 (and, I think, with rc4 the last time I tried, though I may remember wrong) I could use the router with just occasional disconnections every couple of hours. I connected over Wi-Fi to install ethtool and apply the tweaks, but it didn't help.

For some reason, though, since I added those ethtool commands that disable EEE at boot, I am no longer suffering from this problem: the log lines occasionally show up in logread, but the frequent disconnections no longer happen.

Is this the driver we're talking about?

The history of changes to that driver includes a commit from April 13, 2021 that adds support for EEE, so I wonder whether the EEE connection is just a coincidence.

Does anybody know what information we could collect about this issue in order to send a bug report upstream?

I recompiled my image from OpenWrt master, but with the EEE patch for the mt7530 driver removed: target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch.

Nonetheless, I still see the following in the log:

Mon Sep 27 16:00:32 2021 kern.info kernel: [26125.537587] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:00:35 2021 kern.info kernel: [26128.657842] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:03:17 2021 kern.info kernel: [26290.899954] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:03:21 2021 kern.info kernel: [26294.020221] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:04:09 2021 kern.info kernel: [26342.900625] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:04:14 2021 kern.info kernel: [26347.060843] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:32 2021 kern.info kernel: [26905.548205] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:13:35 2021 kern.info kernel: [26908.669070] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:58 2021 kern.info kernel: [26931.548524] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:14:00 2021 kern.info kernel: [26933.628830] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:17:21 2021 kern.info kernel: [27134.351156] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:17:25 2021 kern.info kernel: [27138.511534] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:26:07 2021 kern.info kernel: [27660.598009] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:26:10 2021 kern.info kernel: [27663.718139] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off

Therefore I think we can rule out EEE as the cause.

It is EEE that is causing the problem.

It was mentioned on the mailing list: the wake-up timer needs to be tuned.
[RFC,v4,net-next,1/4] net: phy: add MediaTek PHY driver - Patchwork (kernel.org)

Is there a way to disable this entirely, or is it really needed?

Doesn't turning it off through ethtool work for you?

I did that, and I also removed the patch target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch, but I still see the port going down and up. However, it is no longer causing internet connection issues. Honestly, I am not really sure what is going on.

You need that patch to turn it off through ethtool.
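To check whether the setting actually took effect, `ethtool --show-eee <port>` prints an `EEE status:` line. A minimal sketch that collects it for several ports; `eee_status` is a hypothetical name, the status wording (for example `disabled` or `enabled - active`) comes from ethtool's own output and may differ between ethtool versions, and `unavailable` is my own label for the case where ethtool prints no status, for instance when the driver does not expose EEE settings:

```shell
# Print the "EEE status:" value that ethtool reports for each port.
# Prints "unavailable" when ethtool produces no status line for a port.
eee_status() {
    for port in "$@"; do
        status=$(ethtool --show-eee "$port" 2>/dev/null |
            sed -n 's/^[[:space:]]*EEE status:[[:space:]]*//p')
        echo "$port: ${status:-unavailable}"
    done
}
```

Usage: `eee_status lan1 lan2 lan3 wan`.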

How do I know whether this problem affects me if I have the EdgeRouter X?

Check your log for any unusual "Link is Up" / "Link is Down" messages.
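A quick way to do that check is to count the drops per port. A minimal sketch (`count_link_drops` is a hypothetical name) that works on dmesg or logread output:

```shell
# Count "Link is Down" events per port in dmesg or logread output.
count_link_drops() {
    awk '
    / Link is Down/ {
        # the port name is the field ending in ":" just before "Link is"
        for (i = 1; i < NF - 1; i++)
            if ($(i + 1) == "Link" && $(i + 2) == "is" && $i ~ /:$/)
                n[substr($i, 1, length($i) - 1)]++
    }
    END { for (p in n) print p, n[p] }' | sort
}
```

Usage: `logread | count_link_drops`. A single drop after hours of uptime may just be a cable being unplugged or a device rebooting; many drops on the same port look more like the flapping described in this thread.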

[   21.055373] IPv6: ADDRCONF(NETDEV_CHANGE): lanveth: link becomes ready
[   21.069074] IPv6: ADDRCONF(NETDEV_CHANGE): lanbrport: link becomes ready
[   21.121121] br-lan: port 5(lanbrport) entered blocking state
[   21.132487] br-lan: port 5(lanbrport) entered disabled state
[   21.144540] device lanbrport entered promiscuous mode
[   21.155305] br-lan: port 5(lanbrport) entered blocking state
[   21.166736] br-lan: port 5(lanbrport) entered forwarding state
[   21.178843] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   22.840366] mt7530 mdio-bus:1f eth1: Link is Up - 100Mbps/Full - flow control off
[   22.855670] br-lan: port 1(eth1) entered blocking state
[   22.866223] br-lan: port 1(eth1) entered forwarding state
[   24.216238] mt7530 mdio-bus:1f eth0: Link is Up - 1Gbps/Full - flow control off
[   24.230868] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   58.808144] mt7530 mdio-bus:1f eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[   58.823120] br-lan: port 2(eth2) entered blocking state
[   58.833587] br-lan: port 2(eth2) entered forwarding state
[ 5887.300776] mt7530 mdio-bus:1f eth1: Link is Down
[ 5887.310673] br-lan: port 1(eth1) entered disabled state
[ 5890.373009] mt7530 mdio-bus:1f eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 5890.388065] br-lan: port 1(eth1) entered blocking state
[ 5890.398576] br-lan: port 1(eth1) entered forwarding state

This looks consistent with what I am seeing too.
Are you having internet connection issues because of this? I mean temporary hiccups.

@castiel652 regarding the indicated patch, how can this be applied to OpenWrt?
BTW, do you know the subject of the mailing-list thread, so I can take a look at it?
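On the first question, one way to try a kernel patch in an OpenWrt build is to drop it into the kernel patch directory, the reverse of what was done earlier in this thread with target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch. A minimal sketch, assuming a 5.4-based tree; `add_pending_patch` and the example file names are hypothetical, and since the linked series was posted against net-next it may well not apply cleanly to 5.4 without backporting:

```shell
# Copy a kernel patch into an OpenWrt source tree so that it is picked up
# the next time the kernel sources are prepared. Assumes a 5.4-based tree.
# Usage: add_pending_patch <openwrt-tree> <patch-file> <new-file-name>
add_pending_patch() {
    tree=$1 patch=$2 name=$3
    dest="$tree/target/linux/generic/pending-5.4"
    if [ ! -d "$dest" ]; then
        echo "no $dest: is $tree an OpenWrt source tree?" >&2
        return 1
    fi
    if [ ! -f "$patch" ]; then
        echo "patch file $patch not found" >&2
        return 1
    fi
    cp "$patch" "$dest/$name"
}
```

For example `add_pending_patch ~/openwrt ./mtk-phy.patch 799-net-phy-add-MediaTek-PHY-driver.patch`, then rebuild the kernel (e.g. `make target/linux/clean` followed by `make`). Patches in that directory are applied in file-name order, so the numeric prefix decides where the new one lands relative to the existing ones.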