OpenWrt Forum Archive

Topic: [SOLVED]lan interface(eth1) goes down and up when a lan-port gets up

The content of this topic has been archived on 20 Apr 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

I am running DESIGNATED DRIVER (Bleeding Edge, r48016) on a WR941ND_v5 and am having this problem:

[ 24.857657] eth0: link up (100Mbps/Full duplex)
[ 24.862385] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 25.095483] br-lan: port 2(wlan0) entered forwarding state
[ 44.555479] random: nonblocking pool is initialized
[ 6056.204855] eth1: link down
[ 6056.207808] br-lan: port 1(eth1) entered disabled state
[ 6056.705953] eth1: link up (1000Mbps/Full duplex)
[ 6056.710786] br-lan: port 1(eth1) entered forwarding state
[ 6056.716480] br-lan: port 1(eth1) entered forwarding state
[ 6058.714209] br-lan: port 1(eth1) entered forwarding state
[18610.676741] eth1: link down
[18610.679693] br-lan: port 1(eth1) entered disabled state
[18611.177835] eth1: link up (1000Mbps/Full duplex)
[18611.182688] br-lan: port 1(eth1) entered forwarding state
[18611.188367] br-lan: port 1(eth1) entered forwarding state
[18613.186107] br-lan: port 1(eth1) entered forwarding state
[68589.590802] eth1: link down
[68589.593772] br-lan: port 1(eth1) entered disabled state
[68590.091962] eth1: link up (1000Mbps/Full duplex)
[68590.096798] br-lan: port 1(eth1) entered forwarding state
[68590.102476] br-lan: port 1(eth1) entered forwarding state
[68592.099886] br-lan: port 1(eth1) entered forwarding state
[89785.554919] eth1: link down
[89785.557880] br-lan: port 1(eth1) entered disabled state
[89786.055996] eth1: link up (1000Mbps/Full duplex)
[89786.060863] br-lan: port 1(eth1) entered forwarding state
[89786.066556] br-lan: port 1(eth1) entered forwarding state
[89788.064237] br-lan: port 1(eth1) entered forwarding state
[98091.536793] eth1: link down
[98091.539788] br-lan: port 1(eth1) entered disabled state
[98092.037904] eth1: link up (1000Mbps/Full duplex)
[98092.042759] br-lan: port 1(eth1) entered forwarding state
[98092.048449] br-lan: port 1(eth1) entered forwarding state
[98094.046142] br-lan: port 1(eth1) entered forwarding state

My eth1 is bound to the LAN ports, as specified in /etc/config/network:

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fda3:7e09:64d1::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth1'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'

config interface 'wan'
        option ifname 'eth0'
        option proto 'dhcp'

config interface 'wan6'
        option ifname 'eth0'
        option proto 'dhcpv6'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '0 1 2 3 4'

After some trials I have found that eth1 goes down and up when a LAN port link comes up (becomes active), that is, when one of the local hosts behind the router boots or resumes from sleep. It does not happen every time, but it is the only situation in which I see it.

I don't know whether this is normal behavior, but I am worried that it might affect the connections of the other hosts behind the router.

Thanks.

PS
The logread output at the moment a local host boots looks like this:

Thu Jan 14 11:30:44 2016 kern.info kernel: [488789.171236] eth1: link down
Thu Jan 14 11:30:44 2016 kern.info kernel: [488789.174310] br-lan: port 1(eth1) entered disabled state
Thu Jan 14 11:30:44 2016 daemon.notice netifd: Network device 'eth1' link is down
Thu Jan 14 11:30:45 2016 kern.info kernel: [488789.672318] eth1: link up (1000Mbps/Full duplex)
Thu Jan 14 11:30:45 2016 kern.info kernel: [488789.677296] br-lan: port 1(eth1) entered forwarding state
Thu Jan 14 11:30:45 2016 kern.info kernel: [488789.683075] br-lan: port 1(eth1) entered forwarding state
Thu Jan 14 11:30:45 2016 daemon.notice netifd: Network device 'eth1' link is up
Thu Jan 14 11:30:47 2016 kern.info kernel: [488791.680579] br-lan: port 1(eth1) entered forwarding state
Thu Jan 14 11:30:49 2016 daemon.info dnsmasq-dhcp[1128]: DHCPREQUEST(br-lan) 192.168.1.2 xx:xx:xx:xx:xx:xx
Thu Jan 14 11:30:49 2016 daemon.info dnsmasq-dhcp[1128]: DHCPACK(br-lan) 192.168.1.2 xx:xx:xx:xx:xx:xx home
Thu Jan 14 11:30:50 2016 daemon.warn odhcpd[788]: DHCPV6 CONFIRM IA_NA from 0001000118f13f3cxxxxxxxxxxxx on br-lan: not on-link fda3:7e09:64d1::2/128
Thu Jan 14 11:30:50 2016 daemon.warn odhcpd[788]: DHCPV6 SOLICIT IA_NA from 0001000118f13f3cxxxxxxxxxxxx on br-lan: ok fda3:7e09:64d1::2/128
Thu Jan 14 11:30:50 2016 daemon.info dnsmasq[1128]: read /etc/hosts - 4 addresses
Thu Jan 14 11:30:50 2016 daemon.info dnsmasq[1128]: read /tmp/hosts/odhcpd - 1 addresses
Thu Jan 14 11:30:50 2016 daemon.info dnsmasq[1128]: read /tmp/hosts/dhcp - 4 addresses
Thu Jan 14 11:30:50 2016 daemon.info dnsmasq-dhcp[1128]: read /etc/ethers - 0 addresses
Thu Jan 14 11:30:51 2016 daemon.warn odhcpd[788]: DHCPV6 REQUEST IA_NA from 0001000118f13f3cxxxxxxxxxxxx on br-lan: ok fda3:7e09:64d1::2/128
Thu Jan 14 11:30:51 2016 daemon.info dnsmasq[1128]: read /etc/hosts - 4 addresses
Thu Jan 14 11:30:51 2016 daemon.info dnsmasq[1128]: read /tmp/hosts/odhcpd - 2 addresses
Thu Jan 14 11:30:51 2016 daemon.info dnsmasq[1128]: read /tmp/hosts/dhcp - 4 addresses
Thu Jan 14 11:30:51 2016 daemon.info dnsmasq-dhcp[1128]: read /etc/ethers - 0 addresses

(Last edited by dzchoi on 25 Jan 2016, 02:29)

Try

        option force_link '1'

on the 'lan' interface. It's there by default in BB and CC. However, I'm not running trunk, so I'm not sure whether it should be there in DD.

Yes, the document https://wiki.openwrt.org/doc/uci/network says force_link is set to 1 by default if option proto is 'static', which is my case.
Anyway, I will give it a try.

Thanks.

(Last edited by dzchoi on 14 Jan 2016, 04:56)

Unfortunately it does not help. The problem persists with /etc/config/network containing:

config interface 'lan'
        option type 'bridge'
        option ifname 'eth1'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '60'
        option force_link '1'

Further experiments have revealed that this problem happens when a LAN port goes down as well as when one comes up, not always but most of the time.

Specifically, I have two PCs, A and B, behind the WR941ND_V5. A is plugged into port 1 and B into port 2. Turn on A first and then B (this order is important). Now turn off A while leaving B on. dmesg (and logread) will show "eth1: link down" followed by "eth1: link up" 0.5 s later, as in the log above.

Digging into the source code, I have found that the problem is related to the function link_function() in ag71xx_ar7240.c.

static void link_function(struct work_struct *work) {
        struct ag71xx *ag = container_of(work, struct ag71xx, link_work.work);
        struct ar7240sw *as = ag->phy_priv;
        unsigned long flags;
        u8 mask;
        int i;
        int status = 0;

        mask = ~as->swdata->phy_poll_mask;
        for (i = 0; i < AR7240_NUM_PHYS; i++) {
                int link;

                if (!(mask & BIT(i)))
                        continue;

                link = ar7240sw_phy_read(ag->mii_bus, i, MII_BMSR);
                if (link & BMSR_LSTATUS) {
                        status = 1;
                        break;  // to be commented out for work-around
                }
        }

        spin_lock_irqsave(&ag->lock, flags);
        if (status != ag->link) {
                ag->link = status;
                ag71xx_link_adjust(ag);
        }
        spin_unlock_irqrestore(&ag->lock, flags);

        schedule_delayed_work(&ag->link_work, HZ / 2);
}

My understanding is that:
- this function is invoked every 0.5 s (= HZ / 2) to check the LAN ports for link status changes.
- it declares eth1 up (status = 1) if any of the LAN ports is up (active), or
- declares eth1 down (status = 0) if all the LAN ports are down (inactive).
- it stops at the first active port it finds among the four ports and skips the remaining ones, presumably for efficiency.
- thus, in my scenario above, once PC A on port 1 is on and PC B on port 2 is turned on later, it never reads the status of port 2.
- when PC A is then turned off, it detects this by checking port 1, then checks port 2 for the first time and gets a weird result (link == 0x7969) from the call ar7240sw_phy_read(ag->mii_bus, i, MII_BMSR), which denotes an inactive port (because BMSR_LSTATUS == 0x4 and (link & BMSR_LSTATUS) == 0). Since all the ports now appear inactive, it declares eth1 down.
- on the next invocation 0.5 s later, it checks port 1, which is down, and checks port 2 again. This second call of ar7240sw_phy_read() for port 2, however, returns link == 0x796d, which denotes an active port. Since port 2 is active this time, it declares eth1 up.
- once it reads 0x796d for port 2, it keeps reading the same value afterwards.

I worked around the problem by commenting out (that is, deleting) the line "break;" in the source code, so that link_function() checks all the ports on every invocation, not only up to the first active port. And this worked!
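The effect of that "break;" can be reproduced in a small user-space model. The sketch below is only an illustration under a stated assumption: it assumes the link bit of MII_BMSR is latched low, as the standard MII status register is specified to behave (a link-down event stays visible until the register is read once). Whether this is what actually happens inside the AR7240 switch is not confirmed in this thread. The names mock_phy, mock_set_link, mock_read_bmsr, poll_links and count_spurious_downs are hypothetical and are not part of the ag71xx driver; the values 0x7969 and 0x796d are the ones reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PHYS     4       /* four LAN ports, as in the thread */
#define BMSR_LSTATUS 0x0004  /* link-status bit of MII_BMSR */

/* Hypothetical software model of one PHY, not the real AR7240 hardware. */
struct mock_phy {
        bool link_up;      /* physical link state right now */
        bool latched_low;  /* a link-down event has not been read out yet */
};

/* Change the physical link state; going down sets the latch. */
static void mock_set_link(struct mock_phy *phy, bool up)
{
        assert(phy != NULL);
        phy->link_up = up;
        if (!up)
                phy->latched_low = true;
}

/* Model of reading MII_BMSR: 0x7969 (link bit clear) or 0x796d (set). */
static uint16_t mock_read_bmsr(struct mock_phy *phy)
{
        bool bit;

        assert(phy != NULL);
        bit = phy->link_up && !phy->latched_low;
        phy->latched_low = !phy->link_up;  /* reading releases the latch */
        return bit ? 0x796d : 0x7969;
}

/* One polling pass in the style of link_function(); returns 0 or 1. */
static int poll_links(struct mock_phy *phys, bool early_break)
{
        int status = 0;

        for (int i = 0; i < NUM_PHYS; i++) {
                if (mock_read_bmsr(&phys[i]) & BMSR_LSTATUS) {
                        status = 1;
                        if (early_break)
                                break;  /* remaining ports are never read */
                }
        }
        return status;
}

/* Replay the A/B scenario; count passes reporting "down" after A is off. */
static int count_spurious_downs(bool early_break)
{
        struct mock_phy phys[NUM_PHYS];
        int downs = 0;

        for (int i = 0; i < NUM_PHYS; i++) {
                phys[i].link_up = false;
                phys[i].latched_low = true;
        }
        mock_set_link(&phys[0], true);   /* PC A powers on */
        poll_links(phys, early_break);
        poll_links(phys, early_break);
        mock_set_link(&phys[1], true);   /* PC B powers on */
        poll_links(phys, early_break);
        poll_links(phys, early_break);
        mock_set_link(&phys[0], false);  /* PC A powers off, B stays on */
        for (int pass = 0; pass < 3; pass++)
                if (!poll_links(phys, early_break))
                        downs++;
        return downs;
}
```

In this model, count_spurious_downs(true) reports one "down" pass although PC B never lost its link, while count_spurious_downs(false) reports none, which matches the behavior seen with and without the "break;" line.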

I am not familiar with the AR7240 hardware. I don't know why the first and second calls of ar7240sw_phy_read() return different results, and I don't know a precise fix that would keep the efficiency of stopping at the first active port.
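One possible direction, offered only as an untested sketch, is to keep the early exit but read the status register twice for each port that is checked and use the second value. This assumes the link bit is latched low, so that the first read may return a stale "down" and the second read returns the current state. Nothing below has been tried on AR7240 hardware, and all names (mock_phy, mock_read_bmsr, phy_link_is_up, poll_links_double_read, scenario_after_a_off) are hypothetical stand-ins, not functions of the ag71xx driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BMSR_LSTATUS 0x0004  /* link-status bit of MII_BMSR */

/* Hypothetical model of a PHY whose link bit is latched low. */
struct mock_phy {
        bool link_up;      /* physical link state right now */
        bool latched_low;  /* a link-down event has not been read out yet */
};

/* Model of one MII_BMSR read; the read itself releases the latch. */
static uint16_t mock_read_bmsr(struct mock_phy *phy)
{
        bool bit;

        assert(phy != NULL);
        bit = phy->link_up && !phy->latched_low;
        phy->latched_low = !phy->link_up;
        return bit ? 0x796d : 0x7969;
}

/* Read twice and trust only the second value. */
static bool phy_link_is_up(struct mock_phy *phy)
{
        (void)mock_read_bmsr(phy);  /* discard a possibly stale latched value */
        return (mock_read_bmsr(phy) & BMSR_LSTATUS) != 0;
}

/* Polling pass that still stops at the first active port. */
static int poll_links_double_read(struct mock_phy *phys, size_t n)
{
        for (size_t i = 0; i < n; i++)
                if (phy_link_is_up(&phys[i]))
                        return 1;
        return 0;
}

/* PC A has just gone down; PC B is up but its register was never read. */
static int scenario_after_a_off(void)
{
        struct mock_phy phys[2] = {
                { .link_up = false, .latched_low = true },
                { .link_up = true,  .latched_low = true },
        };

        return poll_links_double_read(phys, 2);
}
```

The trade-off is one extra register read per checked port, and a link drop shorter than the polling interval would no longer be noticed. Dropping the "break;" as described above avoids both concerns at the cost of reading every port on every pass.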

Note:
- The patch "[OpenWrt-Devel] [PATCH 2/3][RFC] ar7240: report port link state changes to kernel log" from https://lists.openwrt.org/pipermail/ope … 29765.html also fixes this problem, since it has the same effect of checking all the LAN ports on each invocation.
- The swconfig utility has a minor bug(?) that maps the LAN ports incorrectly: displayed port 1 is actually physical port 4, displayed port 2 is physical port 1, displayed port 3 is physical port 2, and displayed port 4 is physical port 3.
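The port mapping in the last note can be written down as a small lookup table. This is only a restatement of the observation above for the WR941ND_V5; the function name displayed_to_physical is hypothetical and other boards may be wired differently.

```c
#include <assert.h>

/* Map a LAN port number as displayed by swconfig (1..4) to the
 * physical port label (1..4), per the observation in this thread. */
static int displayed_to_physical(int displayed)
{
        static const int physical[] = { 4, 1, 2, 3 };

        assert(displayed >= 1 && displayed <= 4);
        return physical[displayed - 1];
}
```

So the "always-on host on physical port 1" corresponds to the port that swconfig displays as port 2.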

Thanks.
-dizzy

(Last edited by dzchoi on 25 Jan 2016, 06:17)

Some notes for anyone who does not want to apply the workaround or the patch "[OpenWrt-Devel] [PATCH 2/3][RFC] ar7240: report port link state changes to kernel log":

- if you connect only one PC (or none) to the LAN ports, you're OK.
- if you connect two or more PCs to the LAN ports, connect the always-on one to (physical) port 1. By always-on I mean a device that never sleeps, such as a NAS. A sleeping PC with WOL (Wake-on-LAN) enabled does keep its NIC running at 10 Mbps, but that does not help, because going into and out of sleep leaves the port down (inactive) for a short interval while the link changes, which was exactly my case.
- if you connect two or more PCs, all of which may change link status, and you do not want to apply the workaround or the patch, connect the one whose link status changes least often to (physical) port 1. That will reduce how often eth1 goes down and up.

Also note that wireless devices behind the router have nothing to do with eth1, so they neither cause nor are affected by this problem.

Thanks.
-dizzy

(Last edited by dzchoi on 25 Jan 2016, 07:46)

Is there a way to fix this problem through LuCI? Or maybe a firmware version I can flash that has this bug patched up?

No. There is no way to fix it through LuCI, because the kernel needs to be patched and recompiled.

No. There is no downloadable firmware with this fix (neither daily snapshot nor named release) on https://downloads.openwrt.org. I would have to build the patched one myself.

dzchoi wrote:

No. There is no way to fix it through LuCI, because the kernel needs to be patched and recompiled.

No. There is no downloadable firmware with this fix (neither daily snapshot nor named release) on https://downloads.openwrt.org. I would have to build the patched one myself.

That's unfortunate. This bug has been extremely annoying. I guess it's time to just go buy the edgerouter or look into DD-WRT.

Thank you for your time.
