eth1 doesn’t go to “link down” when unplugged; it drops to 10 Mb/s half duplex instead, so it won’t fail over (mii-tool and ethtool don’t agree)

Hi there!

I’ve got a very strange, and exceedingly frustrating, problem. When I unplug the WAN (eth1) port, the kernel reports my eth1 adapter switching from its usual 100 Mb/s full-duplex “link up” state to a 10 Mb/s half-duplex “link up” state. I expect it to go to link down. As a result, the kernel doesn’t do all the things it’s supposed to do, like failing over to the other network connections, because it never detects that the link is down.

Reproduced with:
  • Gateworks GW-2355
  • ixp4xx processor
  • 18.06.4 binaries from the repo (ixp4xx-avila-zImage + generic-squashfs.img)
  • a custom production build based on 18.06.4

I’ve spent many hours on this, and here’s what I’ve found so far:

mii-tool correctly reports the link going down, as you can see (the bracketed lines are kernel messages printed to the console):

root@OpenWrt:/# mii-tool -w eth1
02:25:29 eth1: negotiated 100baseTx-FD, link ok
02:25:35 eth1: no link
[  355.944209] eth1: link up, speed 10 Mb/s, half duplex
[  364.258169] eth1: link up, speed 100 Mb/s, full duplex
02:25:44 eth1: negotiated 100baseTx-FD, link ok
02:25:47 eth1: no link
[  369.454528] eth1: link up, speed 10 Mb/s, half duplex
[  371.533212] eth1: link up, speed 100 Mb/s, full duplex
02:25:51 eth1: negotiated 100baseTx-FD, link ok

However, the newer ethtool doesn’t behave the same way; it indicates that the link is still up:

root@OpenWrt:/# ethtool  eth1
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
        Link partner advertised pause frame use: Symmetric Receive-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 16
        Transceiver: internal
        Auto-negotiation: on
        Link detected: yes
root@OpenWrt:/# [  953.606209] eth1: link up, speed 10 Mb/s, half duplex <---- unplugged it here
root@OpenWrt:/# ethtool  eth1
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 10Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 16
        Transceiver: internal
        Auto-negotiation: on
        Link detected: yes

And of course the last line, “Link detected: yes”, is the biggest pain at this point; I couldn’t care less about the network speed.

  • I’ve checked kernel_menuconfig to make sure the correct PHY drivers are selected; they are.
  • I’ve tried disabling IPv6 on a hunch that the kernel was doing something odd with the link-down message.
  • Stripped out all the extra Ethernet drivers in case one of them was causing the issue.
  • Emailed the vendor (Gateworks); they will look into it.

I can confirm that the Ethernet driver I’m using is ixp4xx_eth.

I’m just not sure what else to try. I’m looking for pointers on how mii-tool and ethtool differ, in the hope that it gives me a clue as to why this isn’t working. I have this same hardware working on a much older OpenWrt build from about 8 years ago; it’s one of the most reliable systems I’ve seen. Verizon is forcing me to move from 3G to LTE: they no longer allow new devices to be activated on the 3G network and will shut it down at the end of the year. I have roughly 750 of these devices, and I’m happy to share or ship one if someone has an idea.
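On the mii-tool vs. ethtool question: as far as I know, the two read different things. mii-tool queries the PHY directly through the MII ioctls (reading its status register), whereas ethtool’s “Link detected” comes from the ETHTOOL_GLINK request, which most drivers answer with the kernel’s carrier flag, the same value exposed in /sys/class/net/<iface>/carrier. If sysfs and ethtool say the link is up while mii-tool says “no link”, that would point at the driver/PHY layer not dropping carrier, rather than at ethtool itself. A rough sketch for comparing the views side by side, assuming ethtool and mii-tool are installed (the helper names are made up for illustration); pass the interface name as the first argument:

```shell
#!/bin/sh
# Side-by-side view of the link state as seen by the kernel and by each tool.
# Assumes ethtool and mii-tool are installed; pass the interface as $1.

# Extract "yes"/"no" from ethtool's "Link detected:" line (reads stdin).
ethtool_link() {
	sed -n 's/^[[:space:]]*Link detected:[[:space:]]*//p'
}

show_link_views() {
	ifname=$1
	# Kernel carrier flag (1/0) and operational state, read from sysfs.
	echo "sysfs carrier  : $(cat "/sys/class/net/$ifname/carrier" 2>/dev/null)"
	echo "sysfs operstate: $(cat "/sys/class/net/$ifname/operstate" 2>/dev/null)"
	# ethtool's "Link detected", usually derived from the same carrier flag.
	echo "ethtool        : $(ethtool "$ifname" 2>/dev/null | ethtool_link)"
	# mii-tool's one-line summary, read from the PHY itself.
	echo "mii-tool       : $(mii-tool "$ifname" 2>&1)"
}

# Only touch a real interface when a name is given.
if [ -n "$1" ]; then
	show_link_views "$1"
fi
```

Running it once with the cable plugged in and once unplugged (with eth1 as the argument) should show which layer disagrees.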

Thanks for your time!

I found this workaround on the internet a while back:

Change the port number (5 in the script) to the switch port your eth1 is physically connected to.

Adjust the sleep delay (5 seconds) to your preference.

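#!/bin/sh
# Poll the switch port's physical link state and bring the "wan"
# interface down or up to match it.

PORT=5        # switch port that eth1 (WAN) is physically wired to
INTERVAL=5    # seconds between checks

while true
do
	# netifd's view of wan: prints "true," or "false," (trailing comma included)
	wan_up=$(ifstatus wan | awk '/"up":/ { print $2 }')
	# the switch's view: extracts the value after "link:" ("up" or "down")
	wan_status=$(swconfig dev switch0 port "$PORT" get link | sed -r 's/.*link:([[:alnum:]]*).*/\1/')

	if [ "$wan_status" = "down" ] && [ "$wan_up" = "true," ]
	then
		ifdown wan
	elif [ "$wan_status" = "up" ] && [ "$wan_up" = "false," ]
	then
		ifup wan
	fi

	sleep "$INTERVAL"
done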

Thanks @sunnymonday, but unfortunately that won’t work. This is one of those boards that actually has 3 NICs instead of a VLAN’ed switch. In fact, when I run swconfig list, it doesn’t show any switches at all. There is a switch on the board, but it doesn’t do any of the newer VLAN features, and it’s not even logically wired into anything.

It’s just odd that the unplug event is detected fine, but whatever it gets passed to does the wrong thing with it. I’m not sure where that happens.

I’ll keep this in mind as a brute-force workaround for some of the newer routers, though.

Maybe you could try using the output of mii-tool instead of swconfig in the script. You may need awk and/or sed to extract the correct $wan_status, though.
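A rough, untested sketch of that suggestion, assuming the PHY is on eth1 and the logical interface is named wan; the function names and the `run` argument are made up for illustration:

```shell
#!/bin/sh
# Sketch: the swconfig workaround reworked to use mii-tool, which reads the
# PHY directly, so it can work on boards with separate NICs and no switch.
# Assumptions: PHY on eth1, logical interface "wan"; start with "./script run".

IFACE=${IFACE:-eth1}     # physical NIC to watch
INTERVAL=${INTERVAL:-5}  # seconds between checks

# Map mii-tool's summary on stdin ("eth1: negotiated 100baseTx-FD, link ok"
# or "eth1: no link") to "up" or "down"; anything else (e.g. an error) is
# reported as "unknown" so it never triggers an ifdown.
mii_state() {
	summary=$(cat)
	case $summary in
		*"link ok"*) echo up ;;
		*"no link"*) echo down ;;
		*) echo unknown ;;
	esac
}

# Print true/false from the '"up": true,' line of ifstatus JSON on stdin,
# stripping the trailing comma.
json_up() {
	awk '/"up":/ { sub(/,$/, "", $2); print $2; exit }'
}

watch_wan() {
	while true
	do
		wan_status=$(mii-tool "$IFACE" 2>&1 | mii_state)
		wan_up=$(ifstatus wan | json_up)

		if [ "$wan_status" = "down" ] && [ "$wan_up" = "true" ]; then
			ifdown wan
		elif [ "$wan_status" = "up" ] && [ "$wan_up" = "false" ]; then
			ifup wan
		fi

		sleep "$INTERVAL"
	done
}

# Only start the loop when explicitly asked to.
if [ "$1" = "run" ]; then
	watch_wan
fi
```

One caveat: if the driver refuses MII ioctls while the device is down, mii-tool may stop answering once ifdown has taken eth1 down, and the script would then never bring wan back up. I’m not sure whether ixp4xx_eth behaves that way, so that would need checking on the actual hardware.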

I would be interested to know the outcome, in case I run into a scenario like yours.