21.02 rc4: mt7530 mdio-bus:1f lan1: Link is Down, Link is Up

I am testing OpenWrt 21.02 RC4 (a custom build with wpad-mesh-wolfssl and non-CT drivers, based on this commit) on a mesh router connected to:

  • an ADSL router via ethernet
  • another mesh router in 802.11s mesh mode

The internet connection is sometimes quite intermittent, and while going through the logs I found this on the router:

Tue Aug  3 15:10:37 2021 kern.info kernel: [40216.869351] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Tue Aug  3 15:10:54 2021 kern.info kernel: [40234.277374] mt7530 mdio-bus:1f lan1: Link is Down
Tue Aug  3 15:10:57 2021 kern.info kernel: [40237.349616] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Tue Aug  3 15:16:01 2021 kern.info kernel: [40541.482137] mt7530 mdio-bus:1f lan1: Link is Down
Tue Aug  3 15:16:04 2021 kern.info kernel: [40544.554338] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Tue Aug  3 15:19:01 2021 kern.info kernel: [40720.685006] mt7530 mdio-bus:1f lan1: Link is Down
Tue Aug  3 15:19:04 2021 kern.info kernel: [40723.757134] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Tue Aug  3 15:26:21 2021 kern.info kernel: [41161.011397] mt7530 mdio-bus:1f lan1: Link is Down
Tue Aug  3 15:26:24 2021 kern.info kernel: [41164.083598] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Tue Aug  3 15:27:45 2021 kern.info kernel: [41244.980612] mt7530 mdio-bus:1f lan1: Link is Down
Tue Aug  3 15:27:47 2021 kern.info kernel: [41247.028811] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off

This can go on for hours; once it starts, it recurs roughly every 4 to 6 minutes.

Can anyone hint at what may be causing it?

The fact that it happens intermittently, and then recurs every 4 to 6 minutes, suggests to me that this is some kind of bug that is triggered by something.
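To check how regular the flapping really is, the gaps between consecutive "Link is Down" events can be computed from the kernel uptime stamps in the log. This is only a sketch: the function name link_down_gaps is made up for illustration, and it assumes the log lines carry a bracketed uptime stamp like the ones above.

```sh
# Print the seconds between consecutive "Link is Down" events for one port.
# Reads kernel log lines on stdin; the port name is the first argument.
# (link_down_gaps is a hypothetical helper name, not an OpenWrt tool.)
link_down_gaps() {
    awk -v port="$1" '
        index($0, port ": Link is Down") {
            # Pull out the kernel uptime stamp, e.g. "[40234.277374]".
            if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
                t = substr($0, RSTART + 1, RLENGTH - 2) + 0
                if (seen) printf "%.0f\n", t - prev
                prev = t
                seen = 1
            }
        }
    '
}
```

On the router this could be run as: logread | link_down_gaps lan1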

Network config of the mesh router:

config interface 'lan'
        option type 'bridge'
        option ifname 'lan1 lan2'
        option ip6assign '60'
        option proto 'dhcp'
        option stp '1'
        option ageing_time '600'
        option forward_delay '4'
        option hello_time '4'
        option priority '4001'

Mesh config of the router:

config wifi-iface 'wifi_mesh0'
        option device 'radio0'
        option ifname 'mesh0'
        option mode 'mesh'
        option encryption 'psk2+ccmp'
        option key '**********'
        option mesh_id '*********'
        option network 'lan'
        option mesh_gate_announcements '1'
        option mesh_fwding '1'
        option mesh_rssi_threshold '-80'
        option mesh_hwmp_rootmode '4'

config wifi-iface 'wifi_mesh1'
        option device 'radio1'
        option ifname 'mesh1'
        option mode 'mesh'
        option encryption 'psk2+ccmp'
        option key '**********'
        option mesh_id '*********'
        option network 'lan'
        option mesh_gate_announcements '1'
        option mesh_fwding '1'
        option mesh_rssi_threshold '-80'
        option mesh_hwmp_rootmode '4'

The other mesh nodes are configured similarly, except that the bridge priority is 5000 and they have neither mesh_gate_announcements nor mesh_hwmp_rootmode.

Thanks in advance to anyone who will help out.

Edit:

I found recent comments in two different posts from users who seem to be running master/OpenWrt 21.02 rc and have a similar issue:

Hello,
Do you have any news about this issue? I just found another user reporting a very similar bug. It seems I should try connecting the router to a modem with gigabit ports and see if the disconnections still appear.

So far only this news:

But I haven't tried it yet (nor am I sure it makes sense for my case, since I'm not using Xiaomi).

I am observing this problem on multiple devices of the same model.

Any suggestion of tests I can do? Should I check the ethernet cable type and whether the ADSL router has gigabit ports or not?

It seems this issue can happen on all ports and not just wan, according to the bug report. I'm not an expert, but this patch seems to just disable EEE on the switch, so I guess it can work on most mt7621 devices. For me, connecting with a gigabit-capable cable to a 100M port still triggered disconnections after a while. I still haven't tried connecting to a gigabit port, but I think it's worth trying if you can.

Are you getting a similar output to the one I posted in logread?

Yes, the log output was exactly the same as yours on 21.02 rc3/4 (I have it posted on the Mi Router 4AG forum). The only difference in my case is that the disconnections occurred every few hours.

The disconnection doesn't happen constantly. It happens only sometimes, but when it does, it happens not just once but several times, every 2 to 4 minutes, and then it disappears.

For me it happens once every few hours; the device I have connected to the router probably triggers the bug at that frequency.

I have observed network traffic with tcpdump on lan1 while the problem happens, and I did not notice anything weird before the lan port goes down.
When the problem occurs, there is a gap of about 5 seconds in the capture, which confirms that packets do not get through during that period.

However, I am not connected to lan1, other computers are connected to that port. I am connected to another port which is not showing up in the logs.

I will repeat this test, sniffing traffic on the whole lan bridge and also on the port I am connected to, to see if I can find anything useful.

Have you tried disabling EEE by adding ethtool --set-eee eth0 eee off to /etc/rc.local? The only clue I have is that EEE is causing this issue, so I guess it's worth a try. I rolled back to the working build and can't test right now, unfortunately.

I tried running that command for all the Ethernet interfaces while the problem was occurring and it didn't stop it. I had the impression it made things worse, because the flapping started happening continuously, but it's just an impression. I rebooted the router and the problem disappeared.

I have the same problem; I had to go back to rc3.

I can't think of anything else right now (I'm not an advanced user, unfortunately, and can't help you much). I will flash rc4 again to test, and if I find anything useful I'll reply back.

Some good news, I installed ethtool as suggested and added to /etc/rc.local the command to disable EEE on the ports which were giving issues on the different devices:

ethtool --set-eee lan1 eee off
ethtool --set-eee lan2 eee off
ethtool --set-eee lan3 eee off
ethtool --set-eee wan eee off

The routers reboot every night at 3 AM. It seems that the combination of rebooting and running these commands at boot may be having a positive effect, because today I did not experience the issue at all, while in the previous days I was experiencing it several times each day.

Beware that, as I noted in my previous post, running this command while the problem was occurring did not help and actually seemed to make it worse. After adding these lines to be executed at boot and rebooting, however, the issue disappeared and hasn't reappeared since.
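Since port names differ between devices, the boot-time workaround above could be wrapped in a small helper that skips ports which do not exist on a given router. This is only a sketch under assumptions: disable_eee, ETHTOOL and NETDIR are names made up for illustration, and only the ethtool --set-eee call itself comes from this thread.

```sh
# Sketch for /etc/rc.local: turn EEE off on every listed port that exists.
# ETHTOOL and NETDIR are hypothetical overrides, handy for a dry run
# (for example ETHTOOL=echo prints the commands instead of running them).
ETHTOOL="${ETHTOOL:-ethtool}"
NETDIR="${NETDIR:-/sys/class/net}"

disable_eee() {
    for port in "$@"; do
        # Skip port names that are not present on this device.
        [ -e "$NETDIR/$port" ] || continue
        $ETHTOOL --set-eee "$port" eee off ||
            echo "could not disable EEE on $port" >&2
    done
}
```

In rc.local the function would be called as disable_eee lan1 lan2 lan3 wan, before the final exit 0. Running ethtool --show-eee lan1 afterwards should show whether the setting was actually applied.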

I will keep testing it and if I find any more info I'll post it.

In any case, though, it looks like there's a bug somewhere in the code of this driver that needs fixing.

@nemesis please keep us posted, because I have this problem with rc3 too.
Turning EEE off made the issue worse on rc3, even after a reboot.

I don't know exactly when this error started.
All this time I thought I had a bad connection; after seeing this thread I realized that it was my router.

[258182.613013] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258182.620654] br-lan: port 1(lan1) entered blocking state
[258182.626025] br-lan: port 1(lan1) entered forwarding state
[258271.699929] mt7530 mdio-bus:1f lan1: Link is Down
[258271.705058] br-lan: port 1(lan1) entered disabled state
[258273.748028] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258273.755670] br-lan: port 1(lan1) entered blocking state
[258273.761021] br-lan: port 1(lan1) entered forwarding state
[258308.563492] mt7530 mdio-bus:1f lan1: Link is Down
[258308.568441] br-lan: port 1(lan1) entered disabled state
[258310.611622] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258310.619265] br-lan: port 1(lan1) entered blocking state
[258310.624611] br-lan: port 1(lan1) entered forwarding state
[258312.659619] mt7530 mdio-bus:1f lan1: Link is Down
[258312.664598] br-lan: port 1(lan1) entered disabled state
[258314.707579] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258314.715233] br-lan: port 1(lan1) entered blocking state
[258314.720585] br-lan: port 1(lan1) entered forwarding state
[258315.731414] mt7530 mdio-bus:1f lan1: Link is Down
[258315.736363] br-lan: port 1(lan1) entered disabled state
[258318.803534] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258318.811173] br-lan: port 1(lan1) entered blocking state
[258318.816510] br-lan: port 1(lan1) entered forwarding state
[258320.851534] mt7530 mdio-bus:1f lan1: Link is Down
[258320.856509] br-lan: port 1(lan1) entered disabled state
[258322.899495] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258322.907135] br-lan: port 1(lan1) entered blocking state
[258322.912467] br-lan: port 1(lan1) entered forwarding state
[258323.923322] mt7530 mdio-bus:1f lan1: Link is Down
[258323.928267] br-lan: port 1(lan1) entered disabled state
[258326.995451] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258327.003114] br-lan: port 1(lan1) entered blocking state
[258327.008450] br-lan: port 1(lan1) entered forwarding state
[258328.019284] mt7530 mdio-bus:1f lan1: Link is Down
[258328.024238] br-lan: port 1(lan1) entered disabled state
[258331.091406] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258331.099046] br-lan: port 1(lan1) entered blocking state
[258331.104387] br-lan: port 1(lan1) entered forwarding state
[258332.115232] mt7530 mdio-bus:1f lan1: Link is Down
[258332.120179] br-lan: port 1(lan1) entered disabled state
[258335.187374] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258335.195045] br-lan: port 1(lan1) entered blocking state
[258335.200389] br-lan: port 1(lan1) entered forwarding state
[258336.211195] mt7530 mdio-bus:1f lan1: Link is Down
[258336.216158] br-lan: port 1(lan1) entered disabled state
[258338.259349] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258338.267021] br-lan: port 1(lan1) entered blocking state
[258338.272354] br-lan: port 1(lan1) entered forwarding state

I deleted my previous post because right after posting I took the time to triple check.

I narrowed down which workstation triggers the issue when turned on by asking people who was having problems and on which workstation. I turned it on, and as soon as it came up I saw this in the logread output of the OpenWrt router:

Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.168417] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.183417] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.193904] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:22:54 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:22:59 2021 kern.info kernel: [30158.327955] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.424015] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.434797] br-lan: topology change detected, propagating
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.504362] mt7530 mdio-bus:1f lan1: Link is Down
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.514147] br-lan: port 1(lan1) entered disabled state
Mon Aug 16 12:23:09 2021 daemon.notice netifd: Network device 'lan1' link is down
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.552573] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.568040] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.578533] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:23:11 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:23:15 2021 kern.info kernel: [30174.712372] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.808251] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.819063] br-lan: topology change detected, propagating

I guess br-lan: topology change detected, propagating is normal, isn't it? The LAN topology is changing because a new node is online, but mt7530 mdio-bus:1f lan1: Link is Down and then up in a matter of a second doesn't look ok.
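One side effect visible in the log above: because the bridge has stp enabled with forward_delay 4, the port goes through the listening and learning states after every link-up, so each flap costs roughly 8 more seconds before traffic is forwarded again. A rough sketch to measure that from the log follows; the function name time_to_forwarding is made up for illustration, and it assumes the same bracketed uptime stamps as in the logs posted here.

```sh
# Print the seconds from each "Link is Up" to the next
# "entered forwarding state" for one bridge port.
# Reads kernel log lines on stdin; the port name is the first argument.
# (time_to_forwarding is a hypothetical helper name.)
time_to_forwarding() {
    awk -v port="$1" '
        # Return the kernel uptime stamp of a log line, or -1 if absent.
        function stamp(line) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/)) return substr(line, RSTART + 1, RLENGTH - 2) + 0
            return -1
        }
        index($0, port ": Link is Up")   { up = stamp($0); waiting = 1 }
        index($0, port ": Link is Down") { waiting = 0 }
        index($0, "(" port ") entered forwarding state") && waiting {
            printf "%.1f\n", stamp($0) - up
            waiting = 0
        }
    '
}
```

It could be run as: logread | time_to_forwarding lan1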

So I have some bad news: I just tested rc4 on my router and the disconnects were so severe, happening every ~10 seconds, that it was impossible even to establish a PPPoE connection. What's stranger is that with rc3 (and I think with rc4 the last time I tried, though I may remember wrong) I could use the router with just some occasional disconnections every couple of hours. I connected over Wi-Fi in order to install ethtool and apply the tweaks, but it didn't help.

For some reason, though, once I added those ethtool commands at boot to disable EEE, I stopped suffering from this problem. The log lines still show up occasionally in logread, but the frequent disconnections no longer take place.

Is this the driver we're talking about?

If we look at the history of the changes on that driver we can find a commit from April 13, 2021 which adds support for EEE, so I wonder if this EEE thing is just a coincidence or not.

Does anybody know if there's some good information we can collect about this issue and send a bug report upstream?

I recompiled my image from OpenWrt master with the mt7530 EEE patch removed: target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch.

Nonetheless, I still see the following in the log:

Mon Sep 27 16:00:32 2021 kern.info kernel: [26125.537587] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:00:35 2021 kern.info kernel: [26128.657842] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:03:17 2021 kern.info kernel: [26290.899954] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:03:21 2021 kern.info kernel: [26294.020221] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:04:09 2021 kern.info kernel: [26342.900625] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:04:14 2021 kern.info kernel: [26347.060843] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:32 2021 kern.info kernel: [26905.548205] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:13:35 2021 kern.info kernel: [26908.669070] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:58 2021 kern.info kernel: [26931.548524] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:14:00 2021 kern.info kernel: [26933.628830] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:17:21 2021 kern.info kernel: [27134.351156] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:17:25 2021 kern.info kernel: [27138.511534] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:26:07 2021 kern.info kernel: [27660.598009] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:26:10 2021 kern.info kernel: [27663.718139] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off

Therefore I think we can rule out EEE having anything to do with this.

It is EEE that is causing the problem.

It was mentioned on the mailing list.
The wake up timer needs to be tuned.
[RFC,v4,net-next,1/4] net: phy: add MediaTek PHY driver - Patchwork (kernel.org)
