I have observed network traffic with tcpdump on lan1 while the problem happens and I did not notice anything weird before the lan port goes down.
When the problem occurs, there is a 5-second gap in the captured traffic, which confirms that packets do not get through during that period.
However, I am not connected to lan1; other computers are connected to that port. I am connected to another port, which is not showing up in the logs.
I will repeat this test, this time sniffing traffic on the whole LAN bridge and also on the port I am connected to, to see if I can find anything useful.
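For anyone who wants to look for gaps like that 5-second one in their own capture, here is a rough sketch. It assumes the capture is printed with tcpdump's -tt option, so that every line starts with an epoch timestamp. The find_gaps name and the 2-second default threshold are illustrative choices, not something taken from the capture described above.

```shell
#!/bin/sh
# find_gaps [THRESHOLD]: read `tcpdump -tt` text output on stdin and report
# every pause between two consecutive packets longer than THRESHOLD seconds.
# The function name and the 2-second default are illustrative assumptions.
find_gaps() {
    awk -v threshold="${1:-2}" '
        # With -tt the first field is an epoch timestamp, e.g. 1629109374.123456
        $1 ~ /^[0-9]+\.[0-9]+$/ {
            if (seen && $1 - prev > threshold)
                printf "gap of %.1f s after %s\n", $1 - prev, prev
            prev = $1
            seen = 1
        }
    '
}
```

Something like `tcpdump -l -tt -n -i br-lan | find_gaps 2` would watch the bridge live, and `tcpdump -tt -n -r capture.pcap | find_gaps 2` would check a saved capture (capture.pcap is a placeholder file name). A gap only means no packets were seen, so on an idle port it is not proof of a link drop by itself.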
Have you tried disabling EEE by adding ethtool --set-eee eth0 eee off to /etc/rc.local? The only clue I have is that EEE is causing this issue, so I guess it's worth a try. I rolled back to the working build and can't test right now, unfortunately.
I tried running that command for all the Ethernet interfaces while the problem was occurring and it didn't stop it. I had the impression it made things worse, because the drops started happening continuously, but it's just an impression. I rebooted the router and the problem disappeared.
I can't think of anything else right now (I'm not an advanced user, unfortunately, and can't help you much). I will flash rc4 again to test, and if I find anything useful I'll reply back.
Some good news: I installed ethtool as suggested and added the commands to disable EEE to /etc/rc.local, for the ports which were giving issues on the different devices:
ethtool --set-eee lan1 eee off
ethtool --set-eee lan2 eee off
ethtool --set-eee lan3 eee off
ethtool --set-eee wan eee off
The routers reboot every night at 3 AM. It seems that the combination of rebooting and running these commands at boot may be having a positive effect, because today I did not experience the issue at all, whereas in the previous days I was experiencing it several times a day.
Beware that, as I noted in my previous post, running these commands while the problem was occurring did not help; if anything, it looked to me like it got worse. After adding these lines to be executed at boot and rebooting, the issue disappeared and hasn't reappeared since.
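In case it helps others try the same thing, the boot-time commands can be wrapped in a small helper. This is only a sketch of the workaround described in this post, not a confirmed fix. The disable_eee function and the ETHTOOL override variable are hypothetical names introduced for illustration, and the port names differ between devices.

```shell
#!/bin/sh
# Sketch of an /etc/rc.local fragment (hypothetical helper, not a confirmed fix).
# ETHTOOL can be overridden (for example ETHTOOL=true) for a dry run that
# does not touch any port.
ETHTOOL="${ETHTOOL:-ethtool}"

# disable_eee PORT...: turn EEE off on each named port and report the result.
disable_eee() {
    for port in "$@"; do
        if "$ETHTOOL" --set-eee "$port" eee off >/dev/null 2>&1; then
            echo "EEE disabled on $port"
        else
            echo "could not disable EEE on $port" >&2
        fi
    done
}
```

In /etc/rc.local the definition would be followed by a call such as `disable_eee lan1 lan2 lan3 wan`, placed before the final `exit 0`. Afterwards `ethtool --show-eee lan1` can be used to check what state the port reports.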
I will keep testing it and if I find any more info I'll post it.
In any case, though, it looks like there's a bug somewhere in the code of this driver that needs to be fixed.
@nemesis please keep us posted because I have this problem with rc3 too.
Turning EEE off made the issue worse on rc3, even after a reboot.
I don't know exactly when this error started.
All this time I thought I had a bad connection; after seeing this thread I realized that it was my router.
[258182.613013] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258182.620654] br-lan: port 1(lan1) entered blocking state
[258182.626025] br-lan: port 1(lan1) entered forwarding state
[258271.699929] mt7530 mdio-bus:1f lan1: Link is Down
[258271.705058] br-lan: port 1(lan1) entered disabled state
[258273.748028] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258273.755670] br-lan: port 1(lan1) entered blocking state
[258273.761021] br-lan: port 1(lan1) entered forwarding state
[258308.563492] mt7530 mdio-bus:1f lan1: Link is Down
[258308.568441] br-lan: port 1(lan1) entered disabled state
[258310.611622] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258310.619265] br-lan: port 1(lan1) entered blocking state
[258310.624611] br-lan: port 1(lan1) entered forwarding state
[258312.659619] mt7530 mdio-bus:1f lan1: Link is Down
[258312.664598] br-lan: port 1(lan1) entered disabled state
[258314.707579] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258314.715233] br-lan: port 1(lan1) entered blocking state
[258314.720585] br-lan: port 1(lan1) entered forwarding state
[258315.731414] mt7530 mdio-bus:1f lan1: Link is Down
[258315.736363] br-lan: port 1(lan1) entered disabled state
[258318.803534] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258318.811173] br-lan: port 1(lan1) entered blocking state
[258318.816510] br-lan: port 1(lan1) entered forwarding state
[258320.851534] mt7530 mdio-bus:1f lan1: Link is Down
[258320.856509] br-lan: port 1(lan1) entered disabled state
[258322.899495] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258322.907135] br-lan: port 1(lan1) entered blocking state
[258322.912467] br-lan: port 1(lan1) entered forwarding state
[258323.923322] mt7530 mdio-bus:1f lan1: Link is Down
[258323.928267] br-lan: port 1(lan1) entered disabled state
[258326.995451] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258327.003114] br-lan: port 1(lan1) entered blocking state
[258327.008450] br-lan: port 1(lan1) entered forwarding state
[258328.019284] mt7530 mdio-bus:1f lan1: Link is Down
[258328.024238] br-lan: port 1(lan1) entered disabled state
[258331.091406] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258331.099046] br-lan: port 1(lan1) entered blocking state
[258331.104387] br-lan: port 1(lan1) entered forwarding state
[258332.115232] mt7530 mdio-bus:1f lan1: Link is Down
[258332.120179] br-lan: port 1(lan1) entered disabled state
[258335.187374] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258335.195045] br-lan: port 1(lan1) entered blocking state
[258335.200389] br-lan: port 1(lan1) entered forwarding state
[258336.211195] mt7530 mdio-bus:1f lan1: Link is Down
[258336.216158] br-lan: port 1(lan1) entered disabled state
[258338.259349] mt7530 mdio-bus:1f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[258338.267021] br-lan: port 1(lan1) entered blocking state
[258338.272354] br-lan: port 1(lan1) entered forwarding state
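To put numbers on a log like this, a small awk filter can count the drops per port and average how long the link stayed down. This is a sketch that assumes the "Link is Up" / "Link is Down" wording and the bracketed kernel timestamps seen in this thread; the flap_summary name is made up for illustration.

```shell
#!/bin/sh
# flap_summary: read kernel log lines (dmesg or logread output) on stdin and
# print, per port, how many times the link dropped and the average time it
# stayed down. Assumes the mt7530 message wording shown in this thread.
flap_summary() {
    awk '
        /Link is (Up|Down)/ {
            # Kernel timestamp, e.g. "[258271.699929]" or "[   21.055373]"
            if (!match($0, /\[ *[0-9]+\.[0-9]+\]/)) next
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0

            # The port name is the word right before "Link", minus its colon
            port = ""
            for (i = 2; i <= NF; i++)
                if ($i == "Link") { port = $(i - 1); sub(/:$/, "", port) }
            if (port == "") next

            if ($0 ~ /Link is Down/) {
                down_at[port] = ts
                is_down[port] = 1
                drops[port]++
            } else if (is_down[port]) {
                # Link came back: add the time spent down to the running total
                total[port] += ts - down_at[port]
                recovered[port]++
                is_down[port] = 0
            }
        }
        END {
            for (p in drops) {
                avg = recovered[p] ? total[p] / recovered[p] : 0
                printf "%s: %d drops, average %.2f s down\n", p, drops[p], avg
            }
        }
    '
}
```

It can be fed with `dmesg | flap_summary` or `logread | flap_summary`. For the excerpt above it should report 9 drops on lan1 with an average of about 2.50 s down per drop.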
I deleted my previous post because right after posting I took the time to triple-check.
I narrowed down which workstation triggers the issue when it is turned on by asking people who was having problems and on which workstation. As soon as I turned it on, I saw this in the logread output of the OpenWrt router:
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.168417] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.183417] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:22:54 2021 kern.info kernel: [30154.193904] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:22:54 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:22:59 2021 kern.info kernel: [30158.327955] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.424015] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:03 2021 kern.info kernel: [30162.434797] br-lan: topology change detected, propagating
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.504362] mt7530 mdio-bus:1f lan1: Link is Down
Mon Aug 16 12:23:09 2021 kern.info kernel: [30168.514147] br-lan: port 1(lan1) entered disabled state
Mon Aug 16 12:23:09 2021 daemon.notice netifd: Network device 'lan1' link is down
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.552573] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.568040] br-lan: port 1(lan1) entered blocking state
Mon Aug 16 12:23:11 2021 kern.info kernel: [30170.578533] br-lan: port 1(lan1) entered listening state
Mon Aug 16 12:23:11 2021 daemon.notice netifd: Network device 'lan1' link is up
Mon Aug 16 12:23:15 2021 kern.info kernel: [30174.712372] br-lan: port 1(lan1) entered learning state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.808251] br-lan: port 1(lan1) entered forwarding state
Mon Aug 16 12:23:19 2021 kern.info kernel: [30178.819063] br-lan: topology change detected, propagating
I guess br-lan: topology change detected, propagating is normal, isn't it? The LAN topology is changing because a new node is online. But mt7530 mdio-bus:1f lan1: Link is Down, followed by the link coming back up a couple of seconds later, doesn't look OK.
So I have some bad news: I just tested rc4 on my router and the disconnects were so severe, happening every ~10 seconds, that it was impossible even to establish a PPPoE connection. What's stranger is that with rc3 (and I think with rc4 the last time I tried, though I may remember wrong) I could use the router with only occasional disconnections every couple of hours. I connected over Wi-Fi in order to install ethtool and make the tweaks, but it didn't help.
For some reason, though, once I added those ethtool commands that disable EEE at boot, I stopped suffering from this problem. The log lines still show up occasionally in logread, but the frequent disconnections no longer take place.
I recompiled my image from OpenWrt master, but with the EEE patch for the mt7530 driver removed: target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch.
Nonetheless, I still see the following in the log:
Mon Sep 27 16:00:32 2021 kern.info kernel: [26125.537587] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:00:35 2021 kern.info kernel: [26128.657842] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:03:17 2021 kern.info kernel: [26290.899954] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:03:21 2021 kern.info kernel: [26294.020221] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:04:09 2021 kern.info kernel: [26342.900625] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:04:14 2021 kern.info kernel: [26347.060843] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:32 2021 kern.info kernel: [26905.548205] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:13:35 2021 kern.info kernel: [26908.669070] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:13:58 2021 kern.info kernel: [26931.548524] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:14:00 2021 kern.info kernel: [26933.628830] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:17:21 2021 kern.info kernel: [27134.351156] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:17:25 2021 kern.info kernel: [27138.511534] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Mon Sep 27 16:26:07 2021 kern.info kernel: [27660.598009] mt7530 mdio-bus:1f lan1: Link is Down
Mon Sep 27 16:26:10 2021 kern.info kernel: [27663.718139] mt7530 mdio-bus:1f lan1: Link is Up - 100Mbps/Full - flow control off
Therefore I think we can rule out EEE having anything to do with this.
I did it; I also removed the patch target/linux/generic/pending-5.4/761-net-dsa-mt7530-Support-EEE-features.patch, but I still see the port going down and up. However, it is not causing internet connection issues anymore. I am not really sure what is going on, honestly.
[ 21.055373] IPv6: ADDRCONF(NETDEV_CHANGE): lanveth: link becomes ready
[ 21.069074] IPv6: ADDRCONF(NETDEV_CHANGE): lanbrport: link becomes ready
[ 21.121121] br-lan: port 5(lanbrport) entered blocking state
[ 21.132487] br-lan: port 5(lanbrport) entered disabled state
[ 21.144540] device lanbrport entered promiscuous mode
[ 21.155305] br-lan: port 5(lanbrport) entered blocking state
[ 21.166736] br-lan: port 5(lanbrport) entered forwarding state
[ 21.178843] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[ 22.840366] mt7530 mdio-bus:1f eth1: Link is Up - 100Mbps/Full - flow control off
[ 22.855670] br-lan: port 1(eth1) entered blocking state
[ 22.866223] br-lan: port 1(eth1) entered forwarding state
[ 24.216238] mt7530 mdio-bus:1f eth0: Link is Up - 1Gbps/Full - flow control off
[ 24.230868] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 58.808144] mt7530 mdio-bus:1f eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 58.823120] br-lan: port 2(eth2) entered blocking state
[ 58.833587] br-lan: port 2(eth2) entered forwarding state
[ 5887.300776] mt7530 mdio-bus:1f eth1: Link is Down
[ 5887.310673] br-lan: port 1(eth1) entered disabled state
[ 5890.373009] mt7530 mdio-bus:1f eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 5890.388065] br-lan: port 1(eth1) entered blocking state
[ 5890.398576] br-lan: port 1(eth1) entered forwarding state
This looks consistent with what I am having too.
Are you having internet connection issues because of this? I mean temporary hiccups.
@castiel652 regarding the indicated patch, how can this be applied to OpenWrt?
BTW do you know the subject of the thread on the mailing list so I can take a look at it please?