Archer C7v5: OpenWrt does not detect loss of WAN connection

I've installed OpenWRT 22.03.3 on an Archer C7v5.

If I unplug the WAN cable, I get the following log message:

Mar 6 19:36:38 kern.info kernel: [17361.932870] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down

If I plug in the WAN cable, I get the following log message:

Mar 6 19:36:54 kern.info kernel: [17377.532335] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up

But there is no ifdown action on unplugging and no ifup action on plug-in. So OpenWRT does not detect the loss of a WAN connection and therefore does not react accordingly.

Is this a bug or a feature?

With my RPI4B + TP UE300 as WAN, the link going down triggers an ifdown and the link coming up triggers an ifup.

The WAN port is not a single Ethernet device but rather one port of the integrated switch. However, netifd should detect the link loss if you have a WAN interface set up.

Please post

  • a few more lines from your log - what follows the Port 1 is down message?
  • a bit more about your configuration, make sure to redact any personal details. Specifically, the content of /etc/config/network and /tmp/board.json.
Mon Mar  6 19:36:38 2023 kern.info kernel: [17361.932870] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Mar  6 19:36:54 2023 kern.info kernel: [17377.532335] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up

There are just these two lines: one for unplugging, one for plugging in.

Excerpt from /etc/config/network for WAN interface:

config switch_vlan
        option device 'switch0'
        option description 'WAN'
        option vlan '2'
        option vid '2'
        option ports '0t 1'

config device
        option name 'eth0.2'
        option type '8021q'
        option ifname 'eth0'
        option vid '2'

config interface 'wan'
        option device 'eth0.2'
        option proto 'dhcp'
        option peerdns '0'
        list dns '127.0.0.1'

cat /tmp/board.json

{
        "model": {
                "id": "tplink,archer-c7-v5",
                "name": "TP-Link Archer C7 v5"
        },
        "led": {
                "wan": {
                        "name": "WAN",
                        "sysfs": "green:wan",
                        "trigger": "switch0",
                        "type": "switch",
                        "mode": "",
                        "port_mask": "0x02",
                        "speed_mask": ""
                },
                "lan1": {
                        "name": "LAN1",
                        "sysfs": "green:lan1",
                        "trigger": "switch0",
                        "type": "switch",
                        "mode": "",
                        "port_mask": "0x04",
                        "speed_mask": ""
                },
                "lan2": {
                        "name": "LAN2",
                        "sysfs": "green:lan2",
                        "trigger": "switch0",
                        "type": "switch",
                        "mode": "",
                        "port_mask": "0x08",
                        "speed_mask": ""
                },
                "lan3": {
                        "name": "LAN3",
                        "sysfs": "green:lan3",
                        "trigger": "switch0",
                        "type": "switch",
                        "mode": "",
                        "port_mask": "0x10",
                        "speed_mask": ""
                },
                "lan4": {
                        "name": "LAN4",
                        "sysfs": "green:lan4",
                        "trigger": "switch0",
                        "type": "switch",
                        "mode": "",
                        "port_mask": "0x20",
                        "speed_mask": ""
                }
        },
        "switch": {
                "switch0": {
                        "enable": true,
                        "reset": true,
                        "ports": [
                                {
                                        "num": 0,
                                        "device": "eth0",
                                        "need_tag": false,
                                        "want_untag": false
                                },
                                {
                                        "num": 2,
                                        "role": "lan",
                                        "index": 1
                                },
                                {
                                        "num": 3,
                                        "role": "lan",
                                        "index": 2
                                },
                                {
                                        "num": 4,
                                        "role": "lan",
                                        "index": 3
                                },
                                {
                                        "num": 5,
                                        "role": "lan",
                                        "index": 4
                                },
                                {
                                        "num": 1,
                                        "role": "wan"
                                }
                        ],
                        "roles": [
                                {
                                        "role": "lan",
                                        "ports": "2 3 4 5 0t",
                                        "device": "eth0.1"
                                },
                                {
                                        "role": "wan",
                                        "ports": "1 0t",
                                        "device": "eth0.2"
                                }
                        ]
                }
        },
        "network": {
                "lan": {
                        "device": "eth0.1",
                        "protocol": "static"
                },
                "wan": {
                        "device": "eth0.2",
                        "protocol": "dhcp",
                        "macaddr": "b0:be:76:77:aa:f2"
                }
        }
}

Do you need more information?

This looks about right.
I would reset the router to defaults and see what happens with the default configuration.

At the moment the C7v5 is my main/production router and I can't take it offline for more than 5 minutes. So the reset to defaults must wait.

It's a real pity that OpenWrt on the C7v5 is not able to detect the loss of the WAN connection. The combo RPI4B + UE300 detects the loss.

This ability is an important feature for me. I access the internet via LTE. The LTE router is a Zyxel LTE4506. It operates in bridge (or IP pass-through) mode and works as a simple LTE modem.

My ISP assigns a public IPv4 address via DHCP to the LTE4506 (it's the same address I get on the WAN side of the OpenWrt router). The ISP renews the IP at least every 24 hours.

The LTE4506 signals the loss of the ISP lease with a "link down" on its LAN ports. The OpenWRT router (at least the combo RPI4B+UE300) detects the "link down" and triggers an "ifdown wan".

When the LTE4506 gets a new IP address from the ISP it signals a "link up" on its LAN ports. The OpenWRT router (at least the combo RPI4B+UE300) detects the "link up" and triggers an "ifup wan", which discovers the new IP address.

This mechanism works with the combo RPI4B+UE300, but not with the C7v5. :frowning_face:

The following excerpt from the C7v5 logs shows the entries, when the ISP issues a new IP address:

Wed Mar  8 03:06:35 2023 kern.info kernel: [99440.129640] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Wed Mar  8 03:11:41 2023 kern.info kernel: [99745.884945] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Wed Mar  8 03:11:44 2023 kern.info kernel: [99749.005204] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Wed Mar  8 03:12:07 2023 kern.info kernel: [99771.884572] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Wed Mar  8 03:12:10 2023 kern.info kernel: [99775.004839] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Wed Mar  8 03:12:17 2023 kern.info kernel: [99782.284421] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up

The "link down" and "link up" signals arrive at the C7v5, but they do not trigger an "ifdown wan" or "ifup wan" (the hotplug event is missing).

My current work-around is to run "ifup wan" from a cron job. I consider this a quick & dirty solution. It works only if the renewal always happens at the same time.

Not detecting the loss of a WAN connection (where the IP address is assigned via DHCP) is IMO a severe bug.

I've made some further investigations and tests regarding the above issue.

There are differences between devices with a switch and those without a switch.

Devices without a switch (tested with RPI4B+UE300, ZeroPi+UE300, GL-AR300M) behave the following way, if the WAN connection is interrupted:

  1. the ethernet device signals "link down"
  2. a hotplug event triggers an "ifdown wan"

Devices without a switch (tested with RPI4B+UE300, ZeroPi+UE300, GL-AR300M) behave the following way, if the WAN connection is reestablished:

  1. the ethernet device signals "link up"
  2. a hotplug event triggers an "ifup wan"

Devices with a switch (tested with Archer C7v5, Archer C2600) behave the following way, if the WAN connection is interrupted:

  1. the ethernet device signals "port down"
  2. no hotplug event appears

Devices with a switch (tested with Archer C7v5, Archer C2600) behave the following way, if the WAN connection is reestablished:

  1. the ethernet device signals "port up"
  2. no hotplug event appears

Devices with a switch keep using the stale IP address until the lease time expires, so there is no internet access in the meantime. Then the DHCP client sends several DISCOVER requests and succeeds if the WAN connection is up again.

To keep the interval without internet as short as possible, I think a port change needs to trigger a hotplug event, i.e. an "ifdown wan" or "ifup wan".

Perhaps a dev can jump in.

Hi Barney,

It may not be a bug but a missing feature. I was not able to find an OpenWrt version where this works.

Basically, the switch port status must be monitored for each VLAN, or the port-down event must trigger a configuration check and a link status check of each port belonging to the VLAN. It can get more complex: you may have an IPTV box with an external IP, so at least two switch ports on the VLAN carry WAN traffic, or you may want to detect LAN ports going down in order to adjust the routing.
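
A rough sketch of that per-VLAN check, assuming swconfig is installed, the switch is reachable as switch0, and port 0 is the CPU port as in the board.json posted earlier. The function names and the SWITCH_DEV variable are made up for illustration; a VLAN is treated as having link if at least one non-CPU member port is up.

```sh
#!/bin/sh
# Sketch only: helper names and SWITCH_DEV are illustrative assumptions.
SWITCH_DEV="${SWITCH_DEV:-switch0}"

# Print "up" or "down" for one switch port, parsed from swconfig's link attribute.
port_link_state() {
    swconfig dev "$SWITCH_DEV" port "$1" get link 2>/dev/null |
        sed -n 's/.*link:\([a-z]*\).*/\1/p'
}

# vlan_has_link "PORTS" CPU_PORT
# PORTS is the member list as written in /etc/config/network, e.g. "0t 1".
# Returns 0 if any member port other than the CPU port reports link up.
vlan_has_link() {
    for member in $1; do
        member="${member%t}"              # drop the "tagged" suffix
        [ "$member" = "$2" ] && continue  # the CPU port says nothing about the cable
        [ "$(port_link_state "$member")" = "up" ] && return 0
    done
    return 1
}

# Example: vlan_has_link "0t 1" 0 || ifdown wan
```

With an IPTV box on the same VLAN, the member list would simply contain one more port, and the WAN would only be considered down once all of them have lost link.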

Here are my findings:

From my syslog it appears that the switchport being up generates some hotplug event, however the VLAN interface was already up at startup:

With no cable:

Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' is enabled
Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' is setting up now
Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' is enabled
Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' is setting up now
Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' is now up
Tue Apr 11 11:47:04 2023 daemon.notice netifd: VLAN 'eth0.2' link is up
Tue Apr 11 11:47:04 2023 daemon.notice netifd: Interface 'wan' has link connectivity
Tue Apr 11 11:47:17 2023 user.warn mwan3-hotplug[3271]: hotplug called on wan before mwan3 has been set up
Tue Apr 11 11:47:18 2023 user.notice firewall: Reloading firewall due to ifup of wan (eth0.2)

Taken down after 10s via custom ifup script:

Tue Apr 11 11:47:29 2023 daemon.notice netifd: VLAN 'eth0.2' link is down
Tue Apr 11 11:47:29 2023 daemon.notice netifd: Interface 'wan' has link connectivity loss

Cable plugged-in after 5 min:

Tue Apr 11 11:52:31 2023 kern.info kernel: [ 362.710369] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 5 is up
Tue Apr 11 11:52:39 2023 daemon.notice netifd: VLAN 'eth0.2' link is up
Tue Apr 11 11:52:39 2023 daemon.notice netifd: Interface 'wan' has link connectivity
Tue Apr 11 11:52:39 2023 kern.info kernel: [ 371.629305] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.2: link becomes ready

If eth0.2 is disabled and re-enabled from the ifup script with ip link, it disappears when static addressing is used and remains down until the cable is plugged in, so the physical link is definitely detectable.

Using option force_link '0' does not appear to change the WAN link connectivity behavior on startup or hotplug (at least for me).

As an interim solution for your case, you could use an (ifup) script that monitors the port via swconfig/ethtool or syslog, which is better than relying on a cron job. However, I do not think any of these can serve as a WAN tracking solution, since they will not detect a forwarding failure and will do nothing to re-route if you have a backup link.
Another, theoretical, solution is to modify the switch registers to enable the second RGMII interface (if wired) and convert the port to a routed WAN. This may impact performance if hardware NAT offload runs over the switch chip.

Thanks a lot for your detailed reply.

The only solution I see now is to periodically ping an internet IP (e.g. 8.8.8.8) via a cron job. But IMHO I consider this an ugly, quick & dirty hack.

Perhaps someone else has a better solution.

I think ping is the better workaround, since the port flapping of the provider modem itself looks like an ugly workaround for a poor implementation. My script appears to be working; possibly there is an ifup hotplug event which somehow re-enables the interface. However, when the protocol is DHCP it sometimes starts flapping while the link is permanently down, so it needs some refinement. After a certain number of pings at short intervals, 8.8.8.8 may start filtering the requests, so the gateway, the DNS server, or some device on the uplink path is a better target. In my case I have a backup connection, so I am going to track forwarding failures over the primary uplink. This is my current approach to monitoring the cable presence:

# cat /etc/hotplug.d/iface/99-wan
#!/bin/sh
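# On "ifup wan", poll the switch port every 5 s and mirror its link state onto eth0.2.
[ "$ACTION" = "ifup" ] && [ "$INTERFACE" = "wan" ] && {
        while :
        do
                sleep 5s
                # match "link:up" explicitly; a bare "up" also matches "full-duplex"
                if swconfig dev mdio.0 port 1 show | grep -q 'link:up'
                then
                        ip link set eth0.2 up
                else
                        ip link set eth0.2 down
                fi
        done
}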
exit 0

It may need some adjustment if swconfig is not working anymore.
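
For the ping-based alternative mentioned above, a minimal sketch might look like this. It pings the current default gateway instead of 8.8.8.8 and only acts after several consecutive failures. The interval, the threshold, the logger tag, and all function names are assumptions, not something tested on the C7v5.

```sh
#!/bin/sh
# Sketch only: interval, threshold and function names are assumptions.
MAX_FAILS=3      # consecutive failed pings before acting
INTERVAL=30      # seconds between checks

# Read `ip route show default` output on stdin and print the first gateway.
gateway_from_routes() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# next_fail_count PREVIOUS PING_STATUS -> prints the new consecutive-failure count.
next_fail_count() {
    if [ "$2" -eq 0 ]; then echo 0; else echo $(( $1 + 1 )); fi
}

watch_gateway() {
    fails=0
    while sleep "$INTERVAL"; do
        gw="$(ip route show default | gateway_from_routes)"
        if [ -n "$gw" ] && ping -c 1 -W 2 "$gw" >/dev/null 2>&1; then
            status=0
        else
            status=1
        fi
        fails="$(next_fail_count "$fails" "$status")"
        if [ "$fails" -ge "$MAX_FAILS" ]; then
            logger -t wan-watch "gateway unreachable, restarting wan"
            ifup wan
            fails=0
        fi
    done
}

# Start the loop only when asked to, e.g. `wan-watch.sh run &` from /etc/rc.local.
if [ "${1:-}" = "run" ]; then
    watch_gateway
fi
```

Requiring several failures in a row keeps a single lost ping from restarting the interface.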

Thanks for your proposal.

I can't use it as a hotplug script, because the port up/down events don't trigger a hotplug event.

Querying the port status via swconfig may be a base for a cron job. I'll give it a try and report back.

watchcat will do a ping test and then an action.

Thanks for your reply. I'll investigate your proposal. Sounds better than a cron job.

There should be an initial hotplug event on startup unless the interface is started manually. This triggers the script, which then drives the further interface states. Hotplug activity should be visible in the syslog, for example the firewall being reloaded on ifup. If hotplug is broken, the firewall won't function properly and will probably be missing "eth0.2" from its rules.

Wed Apr 12 19:55:17 2023 user.notice firewall: Reloading firewall due to ifup of lan (br-lan)
Wed Apr 12 19:55:25 2023 user.notice firewall: Reloading firewall due to ifup of usb (br-usb)
Wed Apr 12 19:55:27 2023 user.notice firewall: Reloading firewall due to ifup of wan (eth0.2)
Wed Apr 12 19:55:39 2023 user.notice firewall: Reloading firewall due to ifup of wwan (wlan2gs)

You may log or redirect $ACTION and $INTERFACE from a hotplug script to a log file or /dev/kmsg:

echo $ACTION $INTERFACE >> /tmp/hotplug.log

You may try both "iface" and "net" folders.
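
A slightly more verbose logger, as a sketch: it adds a timestamp and the DEVICE variable, which netifd passes to iface events. The file name and the HOTPLUG_LOG variable are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical /etc/hotplug.d/iface/00-log-events (the file name is an assumption).
HOTPLUG_LOG="${HOTPLUG_LOG:-/tmp/hotplug.log}"

# Append one timestamped line per event; unset variables are shown as "-".
log_hotplug_event() {
    printf '%s action=%s interface=%s device=%s\n' \
        "$(date '+%F %T')" "${ACTION:--}" "${INTERFACE:--}" "${DEVICE:--}" \
        >> "$HOTPLUG_LOG"
}

log_hotplug_event
```

Unplugging and replugging the WAN cable and then reading /tmp/hotplug.log shows whether any event arrived at all.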

Here is the refined version of the workaround script, which relies less on swconfig and uses no timers:

#!/bin/sh
port=1
interface=wan

[ "$ACTION" = "ifup" -a "$INTERFACE" = "$interface" ] && {
if [ -e /sbin/swconfig ]
then
        if $( swconfig dev mdio.0 port $port show | grep -q down )
        then
                ifdown $interface
                exit 0
        fi
else
        logger -p 4 Package: swconfig missing! Please install. The $( awk 'BEGIN{print toupper(ARGV[1])}' $interface ) interface may show up with no cable and the static IP if configured!
fi
action1=ifdown
state1=down
}

[ "$ACTION" = "ifdown" -a "$INTERFACE" = "$interface" ] && {
action1=ifup
state1=up
}

[ "$INTERFACE" = "$interface" ] && {
# $port is passed as $1, because the single-quoted command cannot see the outer variable
sh -c 'echo "$$"; exec logread -f -z 0 -e "mdio.0:00: Port $1"' sh "$port" | (
        read pid
        grep -m1 $state1 || exit
        $action1 $interface
        kill "$pid" 2> /dev/null
        true)
}

exit 0

If hotplug is really not working it could be approached via init.d, but this won't work directly, since the loop is handled via hotplug.

Tried both, no hotplug event triggered.

You may try "/etc/init.d/network restart". If that does not produce events, something is broken, or the automatic startup of the interface is disabled, which may affect hotplug. In that case the interface could also be missing from the firewall. Basically, if you see an interface as "UP", some event should have been generated.

I noticed that, if the protocol is DHCP, "ip link set eth0.2 down" sets it down and up again, generating hotplug events, hence the flapping at my first attempt. However, I am not running exactly the same version.

The question is not whether an interface is up or down, but whether the port goes down while the interface is still up. In that case no hotplug event is triggered.
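
That mismatch can be made visible with a small check, sketched here under the assumption that the kernel exposes the carrier flag of eth0.2 under /sys/class/net and that swconfig reports the port state. The function names and the SYSFS_NET variable are illustrative only.

```sh
#!/bin/sh
# Sketch only: function names and SYSFS_NET are illustrative assumptions.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print "up", "down" or "unknown" from the kernel's carrier flag of a netdev.
netdev_carrier() {
    case "$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null)" in
        1) echo up ;;
        0) echo down ;;
        *) echo unknown ;;
    esac
}

# link_mismatch PORT_STATE NETDEV
# Returns 0 when the switch port is down while the netdev still reports carrier.
link_mismatch() {
    [ "$1" = "down" ] && [ "$(netdev_carrier "$2")" = "up" ]
}

# Example on the router (switch name and port number taken from this thread):
#   state="$(swconfig dev switch0 port 1 get link | sed -n 's/.*link:\([a-z]*\).*/\1/p')"
#   link_mismatch "$state" eth0.2 && logger "WAN port down, eth0.2 still up"
```

On a switch-based device the check would presumably report a mismatch for as long as the cable is unplugged, which is exactly the state no hotplug event covers.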

At the moment I'm developing and testing a script, which watches the port status of the WAN interface by querying via swconfig (see your proposal in a previous post). I'll report back, when I've successfully finished the tests.

If you are adventurous, you can try a snapshot build with this PR applied: https://github.com/openwrt/openwrt/pull/4622 - if I'm not mistaken, the C7v5 is covered by it.

This converts the switch driver to DSA and you can use any port as WAN port without VLAN setup. Port up/down are properly handled as network up/down in DSA config (just tested this behavior on my Fritz!7520 with the DSA driver).

There is a difference between hotplug being broken and the interface being always up. The latter is likely the expected behaviour and won't generate any further events. Think of the WAN port of a regular router connected to an external switch, with the router trying to monitor the other ports of that switch. The same happens if the code required to monitor the driver state is missing. My concern was that the downtime in your log is not consistent, so a swconfig polling interval would have to be very short and could still miss an event. With logread or tail the output can be piped, so nothing is missed.

This should work for releasing/renewing the DHCP:

#!/bin/sh
while :
do
sh -c 'echo "$$"; exec logread -f -z 0 -e  "mdio.0:00: Port 1"' | (
        read pid
        grep -m1 down || exit
        PID=`pidof udhcpc` && kill -SIGUSR2 $PID && kill -SIGUSR1 $PID
        kill "$pid" 2> /dev/null
        true)
done &

It can be placed in /etc/rc.local
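
A variant of the same logread idea that keeps a single pipe open and reacts to both "down" and "up" lines might look like the sketch below. WAN_PORT, WAN_IF and the function names are assumptions; the log pattern is the one from the kernel messages quoted in this thread.

```sh
#!/bin/sh
# Sketch only: WAN_PORT, WAN_IF and the function names are assumptions.
WAN_PORT="${WAN_PORT:-1}"
WAN_IF="${WAN_IF:-wan}"

# Turn kernel log lines such as "... mdio.0:00: Port 1 is down" into "1 down".
# fflush() keeps awk from buffering events while it writes into a pipe.
parse_port_events() {
    awk '/: Port [0-9]+ is (up|down)$/ { print $(NF-2), $NF; fflush() }'
}

# Read "<port> <state>" lines and bring the WAN interface down or up accordingly.
handle_port_events() {
    while read -r port state; do
        [ "$port" = "$WAN_PORT" ] || continue
        case "$state" in
            down) ifdown "$WAN_IF" ;;
            up)   ifup "$WAN_IF" ;;
        esac
    done
}

# Start following the log only when asked to, e.g. `port-events.sh run &`.
if [ "${1:-}" = "run" ]; then
    logread -f -e 'mdio.0:00: Port' | parse_port_events | handle_port_events
fi
```

Because every log line is processed in order, short flaps like the ones in the log above would each produce an ifdown followed by an ifup rather than being skipped.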

I've developed and tested the following script:

#!/bin/sh

# customize these variables, SwitchDevice and WanPortNumber are for Archer C2600
LogFile="/tmp/${0##*/}.log"
CheckIntervall='15s'
SwitchDevice='37000000.mdio-mii'
WanPortNumber='5'

WriteLog() {
echo "$(date '+%F.%H-%M-%S'): $*" >> "$LogFile"
}

GetPortStatus() {
echo "$($SWCONFIG dev $SwitchDevice port $WanPortNumber show | grep link | cut -f3 -d' ' | cut -f2 -d:)"
}

SWCONFIG="$(type -p swconfig)"

if [ -z "$SWCONFIG" ]
   then WriteLog "missing swconfig ... exiting"
	exit 1
fi

LastStatus="$(GetPortStatus)"
WriteLog "Starting $0"
WriteLog "Current Port Status = $LastStatus"

while true
do sleep "$CheckIntervall"
   CurrentStatus="$(GetPortStatus)"
   if [ "$CurrentStatus" != "$LastStatus" ]
      then case "$CurrentStatus" in
	     down)
		ifdown wan
		LastStatus="$CurrentStatus"
		WriteLog "Current Port Status = $CurrentStatus"
		WriteLog "ifdown wan"
		;;
	     up)
		ifup wan
		LastStatus="$CurrentStatus"
		WriteLog "Current Port Status = $CurrentStatus"
		WriteLog "ifup wan"
		;;
	     *)
		WriteLog "Unknown status: $CurrentStatus"
		;;
	   esac
   fi
done

It works for me on my Archer C2600 and C7v5. Feel free to use it.

Thanks for your hint. I prefer stable releases and (regarding this matter) can wait until snapshot becomes stable. Until then I use my script posted a few minutes ago.