TL-WR1043NDv2 WAN connection and reboots

I'm running 18.06.4 on a TP-Link TL-WR1043NDv2. Nothing fancy about my setup: just an Arris TM1602 cable modem plugged into the WAN port, plus several LAN clients.

I've had OpenWrt installed for about a year now, and I've always had one nagging issue. If the router is rebooted or power cycled, the WAN connection is lost until the network cable is physically unplugged and plugged back in. If I disconnect the WAN cable, reboot the router, and then reconnect the cable, that works fine. If the cable modem is rebooted independently of the router, that's no problem either. The only situation that consistently triggers the issue is rebooting the router while the WAN cable is physically connected.

I've been trying to determine whether this is an issue with the router or the modem. Over ssh I've tried restarting the network service, bringing the WAN interface down and back up, and even delaying the start of the WAN interface on boot, all to no avail. In the broken state, I cannot ping the modem (192.168.100.1) until I physically cycle the network cable.
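The soft resets described above map roughly onto the commands below. This is a minimal sketch assuming the stock OpenWrt 18.06 tools (`/etc/init.d/network`, `ifdown`, `ifup`). The `run`/`DRY_RUN` wrapper and the 30-second default delay are hypothetical additions, and "delaying the start" is assumed here to mean setting `option auto '0'` on `wan` and bringing it up later, for example from `/etc/rc.local`.

```sh
#!/bin/sh
# Sketch of the "soft" WAN resets tried over ssh (assumes stock OpenWrt 18.06).
# Hypothetical helper: with DRY_RUN set, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Restart the whole network service.
restart_network() {
    run /etc/init.d/network restart
}

# 2. Bounce only the logical "wan" interface (the eth0 link itself stays up).
bounce_wan() {
    run ifdown wan
    run ifup wan
}

# 3. Delayed start (assumed approach): with "option auto '0'" on wan,
#    call this late in boot, e.g. from /etc/rc.local. $1 = delay in seconds.
delayed_wan_start() {
    run sleep "${1:-30}"
    run ifup wan
}
```

For example, `DRY_RUN=1 bounce_wan` only prints the two commands, while `bounce_wan` runs them. None of these commands take the physical Ethernet link down, so they do not generate the link-down/link-up events that pulling the cable does.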

Here are the WAN entries from /etc/config/network:

config interface 'wan'
        option ifname 'eth0'
        option proto 'dhcp'
        option peerdns '0'
        option dns '1.1.1.1 1.0.0.1'

config interface 'wan6'
        option ifname 'eth0'
        option proto 'dhcpv6'
        option reqaddress 'try'
        option reqprefix 'auto'
        option peerdns '0'
        option dns '2606:4700:4700::1111 2606:4700:4700::1001'

If I reboot the router, these are the DHCP-related messages I see in logread:

root@LEDE:~# logread | grep dhcp
Fri Nov 15 13:30:30 2019 daemon.info dnsmasq[770]: read /tmp/hosts/dhcp.cfg01411c - 0 addresses
Fri Nov 15 13:30:35 2019 user.notice ucitrack: Setting up /etc/config/network reload dependency on /etc/config/dhcp
Fri Nov 15 13:30:36 2019 user.notice ucitrack: Setting up /etc/config/dhcp reload dependency on /etc/config/odhcpd
Fri Nov 15 13:30:38 2019 daemon.notice netifd: wan (1126): udhcpc: started, v1.28.4
Fri Nov 15 13:30:38 2019 daemon.err odhcp6c[1127]: Failed to send RS (Address not available)
Fri Nov 15 13:30:38 2019 daemon.notice netifd: wan (1126): udhcpc: sending discover
Fri Nov 15 13:30:39 2019 daemon.info odhcpd[930]: Using a RA lifetime of 0 seconds on br-lan
Fri Nov 15 13:30:39 2019 daemon.notice odhcpd[930]: Failed to send to ff02::1%br-lan (Address not available)
Fri Nov 15 13:30:39 2019 daemon.err odhcp6c[1127]: Failed to send DHCPV6 message to ff02::1:2 (Address not available)
Fri Nov 15 13:30:41 2019 daemon.notice netifd: wan (1126): udhcpc: sending discover
Fri Nov 15 13:30:43 2019 daemon.info dnsmasq-dhcp[1385]: DHCP, IP range 192.168.1.100 -- 192.168.1.249, lease time 12h
Fri Nov 15 13:30:43 2019 daemon.info dnsmasq[1385]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Fri Nov 15 13:30:43 2019 daemon.info dnsmasq-dhcp[1385]: read /etc/ethers - 0 addresses
Fri Nov 15 13:30:43 2019 daemon.info dnsmasq[1385]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Fri Nov 15 13:30:43 2019 daemon.info dnsmasq-dhcp[1385]: read /etc/ethers - 0 addresses
Fri Nov 15 13:30:44 2019 daemon.notice netifd: wan (1126): udhcpc: sending discover

So udhcpc keeps sending discovers without getting an answer, while odhcp6c (the DHCPv6 client) logs "Failed to send RS (Address not available)" and a similar DHCPv6 error. As soon as I cycle the network cable, udhcpc immediately retrieves an IP from the modem.
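To spot this state without reading the whole log, the IPv4 client's progress can be classified from `logread` output. A minimal sketch, assuming the BusyBox 1.28 udhcpc message wording shown in this thread (`sending discover`, `lease of ... obtained`); the function name `udhcpc_state` is hypothetical.

```sh
#!/bin/sh
# Classify udhcpc progress from logread output on stdin:
#   bound - a lease was obtained
#   stuck - DISCOVERs were sent but no lease followed
#   idle  - udhcpc logged neither message
udhcpc_state() {
    awk '
        /udhcpc: sending discover/     { discover++ }
        /udhcpc: lease of .* obtained/ { lease++ }
        END {
            if (lease > 0)         print "bound"
            else if (discover > 0) print "stuck"
            else                   print "idle"
        }
    '
}
```

On the router this would be `logread | udhcpc_state`. For the boot log above it prints `stuck`; once a lease line appears, it prints `bound`.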

Here's the ifstatus output for the interface (when in the broken state of course):

root@LEDE:~# ifstatus wan
{
        "up": false,
        "pending": true,
        "available": true,
        "autostart": true,
        "dynamic": false,
        "proto": "dhcp",
        "device": "eth0",
        "data": {

        }
}
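The fields worth reading in that output are `up` (false) and `pending` (true). As far as I understand netifd, `pending` is set while an interface is still in its setup phase, which here means the DHCP protocol handler has not finished. On the router, `ifstatus wan | jsonfilter -e '@.up'` reads a single field properly; the sketch below is a portable stand-in with hypothetical helper names, and its sed parser only handles flat top-level boolean keys.

```sh
#!/bin/sh
# Print the value (true/false) of a top-level boolean key from JSON on stdin.
# $1 = key name. Only handles one "key": value pair per line, as ifstatus prints.
json_bool() {
    sed -n "s/^[[:space:]]*\"$1\":[[:space:]]*\([a-z]*\).*/\1/p" | head -n 1
}

# Summarise "ifstatus <iface>" output (stdin) as up / pending / down.
wan_summary() {
    status=$(cat)
    up=$(printf '%s\n' "$status" | json_bool up)
    pending=$(printf '%s\n' "$status" | json_bool pending)
    if [ "$up" = "true" ]; then
        echo "up"
    elif [ "$pending" = "true" ]; then
        # Interface still being set up (e.g. udhcpc has no lease yet).
        echo "pending"
    else
        echo "down"
    fi
}
```

Usage on the router: `ifstatus wan | wan_summary`. For the broken state above it prints `pending`.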

And finally, the ip addr show output for eth0:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether c0:4a:00:f3:93:f3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::c24a:ff:fef3:93f3/64 scope link
       valid_lft forever preferred_lft forever

I'm not knowledgeable enough to interpret all of the above, but my mostly uneducated guess is that physically reconnecting the network cable generates some sort of event that suddenly makes each device aware of the other.

I'm a little lost on where to look next. Any suggestions or ideas of what could be going on?

Could you run logread again, grepping for the interface name instead? It would be interesting to see whether eth0 is detected going up/down and whether that triggers the hotplug scripts.
Also what is the output of ifconfig eth0 ?
Have you tried a different cable?
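One way to follow that suggestion is to pull both the kernel link messages for eth0 and the netifd state changes for `wan`/`wan6` out of the log in a single pass. The pattern below is an assumption based on the message formats quoted in this thread, and `wan_link_events` is a hypothetical name.

```sh
#!/bin/sh
# Filter logread output (stdin) down to WAN link/interface events:
#   "eth0: link ..."             kernel link up/down/not-ready messages
#   "device 'eth0'"              netifd's view of the eth0 carrier
#   "Interface 'wan'/'wan6'"     netifd logical interface state changes
wan_link_events() {
    grep -E "eth0: link|device 'eth0'|Interface 'wan6?'"
}
```

On the router: `logread | wan_link_events`. A healthy cable cycle should show a `link is down` / `link is up` pair for eth0, such as the one quoted later in this thread.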

Here's the logread output after rebooting the router. On a side note, the timestamps are wrong, so please ignore them 🙂

root@LEDE:~# logread | grep eth0
Fri Nov 15 15:45:11 2019 kern.info kernel: [    2.221114] eth0: Atheros AG71xx at 0xb9000000, irq 4, mode:RGMII
Fri Nov 15 15:45:19 2019 kern.info kernel: [   20.846158] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Fri Nov 15 15:45:20 2019 kern.info kernel: [   21.899713] eth0: link up (1000Mbps/Full duplex)
Fri Nov 15 15:45:20 2019 kern.info kernel: [   21.904567] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Fri Nov 15 15:45:20 2019 daemon.notice netifd: Network device 'eth0' link is up

And after I cycled the network cable:

Fri Nov 15 15:47:22 2019 user.notice firewall: Reloading firewall due to ifup of wan (eth0)
Fri Nov 15 15:47:36 2019 user.notice firewall: Reloading firewall due to ifup of wan6 (eth0)

Here's the ifconfig output for eth0 in the broken state:

root@LEDE:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr C0:4A:00:F3:93:F3
          inet6 addr: fe80::c24a:ff:fef3:93f3/64 Scope:Link
          inet6 addr: fe80::c24a:ff:fef3:93f3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3595 errors:0 dropped:0 overruns:0 frame:0
          TX packets:44 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:217748 (212.6 KiB)  TX bytes:10857 (10.6 KiB)
          Interrupt:4

I haven't tried a different cable yet, but I'll give it a shot. That didn't immediately come to mind since it works fine after re-connecting it, but it won't hurt to try.

I find it strange that netifd didn't log the cable being unplugged and plugged back in.
I tried the same on my router and it immediately detected it:

Sun Nov 17 00:16:06 2019 daemon.notice netifd: Network device 'eth0' link is down
Sun Nov 17 00:16:06 2019 daemon.notice netifd: Interface 'wan' has link connectivity loss
Sun Nov 17 00:16:06 2019 kern.info kernel: [182994.186853] Atheros AR8216/AR8236/AR8316 ag71xx-mdio.0:00: Port 2 is up
Sun Nov 17 00:16:06 2019 kern.info kernel: [182994.186929] Atheros AR8216/AR8236/AR8316 ag71xx-mdio.0:00: Port 3 is up
Sun Nov 17 00:16:06 2019 kern.info kernel: [182994.186999] Atheros AR8216/AR8236/AR8316 ag71xx-mdio.0:00: Port 4 is up
Sun Nov 17 00:16:06 2019 kern.info kernel: [182994.187136] eth0: link down
Sun Nov 17 00:16:06 2019 daemon.notice netifd: Interface 'wan6' is now down
Sun Nov 17 00:16:06 2019 daemon.notice netifd: Interface 'wan6' is disabled
Sun Nov 17 00:16:06 2019 daemon.notice netifd: Network alias '' link is down
Sun Nov 17 00:16:06 2019 daemon.notice netifd: Interface 'wan6' has link connectivity loss
....
Sun Nov 17 00:16:12 2019 daemon.notice netifd: Network device 'eth0' link is up
Sun Nov 17 00:16:12 2019 daemon.notice netifd: Interface 'wan' has link connectivity
Sun Nov 17 00:16:12 2019 daemon.notice netifd: Interface 'wan' is setting up now

I suppose you have not messed with any "force link" options or hotplug, right?

My previous grep was for "eth0", but here it is for "wan" (with my public IP obfuscated):

Fri Nov 15 15:45:19 2019 daemon.notice netifd: Interface 'wan' is enabled
Fri Nov 15 15:45:19 2019 daemon.notice netifd: Interface 'wan6' is enabled
Fri Nov 15 15:45:20 2019 daemon.notice netifd: Interface 'wan' has link connectivity
Fri Nov 15 15:45:20 2019 daemon.notice netifd: Interface 'wan' is setting up now
Fri Nov 15 15:45:20 2019 daemon.notice netifd: Interface 'wan6' has link connectivity
Fri Nov 15 15:45:20 2019 daemon.notice netifd: Interface 'wan6' is setting up now
Fri Nov 15 15:45:21 2019 daemon.notice netifd: wan (1230): udhcpc: started, v1.28.4
Fri Nov 15 15:45:22 2019 daemon.notice netifd: wan (1230): udhcpc: sending discover
Fri Nov 15 15:45:25 2019 daemon.notice netifd: wan (1230): udhcpc: sending discover
Fri Nov 15 15:45:28 2019 daemon.notice netifd: wan (1230): udhcpc: sending discover
Fri Nov 15 15:47:46 2019 daemon.notice netifd: wan (1230): udhcpc: sending select for x.x.x.x
Fri Nov 15 15:47:49 2019 daemon.notice netifd: wan (1230): udhcpc: sending select for x.x.x.x
Fri Nov 15 15:47:49 2019 daemon.notice netifd: wan (1230): udhcpc: lease of x.x.x.x obtained, lease time 51161
Fri Nov 15 15:47:49 2019 daemon.notice netifd: Interface 'wan' is now up
Fri Nov 15 15:47:50 2019 user.notice firewall: Reloading firewall due to ifup of wan (eth0)
Sat Nov 16 18:19:21 2019 daemon.notice netifd: Interface 'wan6' is now up
Sat Nov 16 18:19:21 2019 user.notice firewall: Reloading firewall due to ifup of wan6 (eth0)

The cable was cycled at the 15:47 timestamp. Is the above what you'd expect to see? The force link options are off, though I did experiment with enabling them yesterday just to see if there was any difference, which there wasn't. Haven't messed with hotplug at all.

I installed tcpdump to see if I could capture anything interesting on the eth0 interface. The only relevant traffic was the DHCP requests from the router. When I cycled the cable, there was a 1-2 second pause in the terminal output, followed by the reply messages. Not sure this tells me much:

15:47:31.387584 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:34.390985 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:37.394312 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:40.397770 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:43.401125 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:46.322711 IP6 fe80::217:10ff:fe88:7406 > ff02::1: ICMP6, router advertisement, length 192
15:47:46.404471 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:46.501592 IP x.x.x.x.67 > x.x.x.x.68: BOOTP/DHCP, Reply, length 301
15:47:46.502215 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:49.282901 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from c0:4a:00:f3:93:f3, length 300
15:47:49.366273 IP x.x.x.x.67 > x.x.x.x.68: BOOTP/DHCP, Reply, length 301
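For a capture like this it can help to restrict tcpdump to DHCPv4 and tally requests against replies. A minimal sketch: the capture filter in the comment is standard pcap filter syntax, while `dhcp_tally` is a hypothetical helper that only matches the one-line tcpdump output format shown above.

```sh
#!/bin/sh
# Capture only DHCPv4 on the WAN port (run on the router, stop with Ctrl-C):
#   tcpdump -n -i eth0 'udp and (port 67 or port 68)' > /tmp/dhcp.txt
# Then count client requests vs. server replies in that output (stdin).
dhcp_tally() {
    awk '
        /BOOTP\/DHCP, Request/ { req++ }
        /BOOTP\/DHCP, Reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

Usage: `dhcp_tally < /tmp/dhcp.txt`. For the capture above it reports `requests=8 replies=2`; the ICMPv6 router advertisement line is ignored.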

Tcpdump wouldn't say much there anyway. I am puzzled that tcpdump was not killed when you took out the cable. Did you run something like tcpdump -i eth0 ... ?

Maybe a hardware issue with the port itself? One or more contact points being stuck so there's barely any pressure difference when you insert or remove the cable? Did you try with another cable as well?

If the internal switch allows it, you could repurpose a LAN port for WAN use (and vice versa if you need them all) to see if that works.


@Trendy-- yes, I ran tcpdump with "-n -i eth0".

@borromini -- I tried another cable with the same result, so I devised some tests to help narrow it down even further.

I connected a laptop running udhcpd to the WAN port of the router, and saw the router successfully lease an IP from the laptop. I power cycled the router, and on boot it successfully leased another IP as expected.
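For anyone wanting to repeat the laptop test, here is a minimal BusyBox udhcpd setup. Assumptions, all hypothetical: the laptop's Ethernet interface name is passed in as an argument, the 192.168.50.0/24 range is unused on the laptop, and `busybox udhcpd` is available. The config keywords follow BusyBox's sample `udhcpd.conf`.

```sh
#!/bin/sh
# Write a minimal udhcpd config. $1 = laptop NIC name, $2 = output file.
# Address range and lease file location are illustrative assumptions.
write_udhcpd_conf() {
    iface=$1
    conf=$2
    cat > "$conf" <<EOF
interface $iface
start 192.168.50.100
end 192.168.50.150
option subnet 255.255.255.0
option router 192.168.50.1
lease_file /tmp/udhcpd.leases
EOF
}

# On the laptop, as root (not executed by this sketch):
#   ip addr add 192.168.50.1/24 dev IFACE
#   write_udhcpd_conf IFACE /tmp/udhcpd.conf
#   touch /tmp/udhcpd.leases
#   busybox udhcpd -f /tmp/udhcpd.conf    # -f: stay in the foreground
```

Running the server in the foreground makes it easy to stop with Ctrl-C before reconnecting the modem, as in the test sequence described here.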

From there I killed udhcpd and connected the laptop directly to the cable modem. It of course successfully leased an IP, and did so again on reboot. The ISP has no problem doling out an IP address to my laptop on every reboot, so it makes me think there's no DHCP throttling on their side.

Next, I tried disconnecting the coax from the cable modem. The cable modem issued a private IP to the router, and the router was able to successfully request another on reboot without cycling the network cable. Not sure if this test was really relevant since it dealt with the cable modem's internal DHCP.

I updated the VLAN and mapped a LAN port to the WAN interface. No change, same behavior as with the WAN port.

Finally, I placed a simple unmanaged switch between the cable modem and router. With this setup, the problem disappears. I can reboot the router and it always retrieves a WAN IP on boot. I can disconnect the cable modem from the switch, reconnect it, and the router grabs an IP without issue. Rebooting the switch also worked fine.

So the problem occurs in exactly one scenario: the router is rebooted while the cable modem is directly connected to it and online. What I can't wrap my head around is the difference between cycling the network cable and rebooting the router. It seems like, to the cable modem and ISP anyway, the two actions should be identical. I ran tcpdump again in a last-ditch effort with full verbosity on, and there is ZERO difference between the DHCP requests before and after cycling the network cable. In other words, how in the world could the cable modem or the ISP's DHCP server tell the two apart?! I can unplug and replug the network cable again and again, and the DHCP requests are always answered. After a reboot, though? No response.

I think I might be throwing in the towel on this one. Given all of the above, I'm thinking there must be something going on with the router on reboots that is outside the control of OpenWrt (e.g. something hardware level maybe?). I can't even ping the cable modem after a router restart, but magically can after cycling the network cable.

Any other ideas before I use the router as target practice? 😃

It's fixed. The solution was to swap eth0 and eth1. By default, WAN is assigned to eth0 for this router, and others have seen the same issue I encountered.
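A sketch of the swap as a text transformation of `/etc/config/network`, assuming the `lan`, `wan` and `wan6` sections use plain `option ifname 'ethX'` (or `'ethX.N'`) values. It deliberately leaves the `switch_vlan` sections alone: on this router the switch port lists most likely also need to change so that each VLAN uses the matching CPU port, and those port numbers are hardware-specific and not given in this thread, so check them against the device documentation. The helper name is hypothetical; `uci set network.wan.ifname=...` followed by `uci commit network` is the more usual way to make the same change on OpenWrt.

```sh
#!/bin/sh
# Swap eth0 <-> eth1 in "option ifname" lines of a network config (stdin -> stdout).
# Uses a placeholder so the two substitutions don't undo each other.
# Does NOT touch switch_vlan port lists (hardware-specific, see above).
swap_eth_ifnames() {
    sed -e "s/\(option ifname '\)eth0\([.']\)/\1__SWAP__\2/" \
        -e "s/\(option ifname '\)eth1\([.']\)/\1eth0\2/" \
        -e "s/\(option ifname '\)__SWAP__\([.']\)/\1eth1\2/"
}
```

Usage: `swap_eth_ifnames < /etc/config/network > /tmp/network.new`, review the result (and the switch sections) by hand, then copy it into place and run `/etc/init.d/network restart`.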

Whether this is a software or hardware bug, I don't know. It's also baffling to me why putting a switch between the cable modem and the router fixed the issue, even while WAN was mapped to eth0.

I rebooted the router several times and it successfully grabbed an IP on every boot. I'd still like to know the exact root cause, but at least I no longer have to fumble around with cables every time the router is rebooted or loses power 😃

