Intermittently losing WAN

randy · August 26, 2021, 1:27am

Since moving to a new house a few months ago, I've started experiencing occasional losses of WAN connectivity. It will drop unexpectedly (sometimes multiple times a day, sometimes not for several days), and will only be restored after I run /etc/init.d/network restart. Even stranger, when WAN is lost, the local domain names I've configured don't resolve (e.g. "router.lan" stops connecting to the router, but "192.168.1.1" connects just fine).

This is on a TL-WDR4300 v1, currently running OpenWrt 19.07.8 (the same behaviour occurred on 19.07.7). The WAN connection uses DHCP through a DOCSIS modem.

I've checked the system log a few times and can't see anything that gives me a clue. Everything looks normal until ddns-scripts starts throwing errors because it can't connect out to update the IP.

I've heard that brief WAN drops are common in this area, but of course I would like OpenWrt to recover from those automatically. Are there any settings to help with automatic reconnect? Is there anything I should look for in the logs that would help solve this? Or should I just create a cron job to ping 1.1.1.1 and restart the network when it fails?
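
For reference, the cron job idea could look roughly like the sketch below. It is only an illustration of the approach, not a tested fix: the variable names (PROBE_HOST, PING_CMD, RESTART_CMD) and the function names are made up for this example, and the script only acts when called with the argument "run".

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart networking when a probe host stops answering.
# PROBE_HOST, PING_CMD and RESTART_CMD are illustrative names, not OpenWrt settings.
PROBE_HOST="${PROBE_HOST:-1.1.1.1}"
PING_CMD="${PING_CMD:-ping}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/network restart}"

# Succeeds if at least one of three pings gets a reply within 2 seconds each.
wan_is_up() {
    $PING_CMD -c 3 -W 2 "$PROBE_HOST" >/dev/null 2>&1
}

# Run the restart command only when the probe fails.
check_and_restart() {
    if wan_is_up; then
        echo "wan up"
    else
        echo "wan down, restarting network"
        # Left unquoted on purpose so the command and its arguments are split.
        $RESTART_CMD
    fi
}

# Only act when invoked with "run", so the file can be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    check_and_restart
fi
```

Assuming the script were saved at a path such as /root/wan-watchdog.sh (a hypothetical location), a crontab line like `*/5 * * * * /root/wan-watchdog.sh run` would check every five minutes.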

psherman · August 26, 2021, 2:36am

Let's see if you're getting DHCP errors.

logread | grep dhcpc

Please redact the IP address since it may be publicly routable.

Also, it would be good to try running some concurrent ping sessions to help identify where the issue is presenting. I recommend the following:

ping another system on your network
ping your router
ping your cable modem (my Arris modem, for example, has an address of 192.168.100.1)
ping 8.8.8.8

When you see the problem present, you'll hopefully have some idea of where the issue is occurring -- likely on the WAN with DHCP or something upstream of the cable modem.
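
One rough way to script the ping checks above, as a sketch only: it probes each target in turn once per round rather than running truly concurrent sessions, and prints a timestamped result so the failure point can be read from a log afterwards. The function names and PING_CMD are made up for this example.

```shell
#!/bin/sh
# Hypothetical probe sketch: log a timestamped ok/FAIL line per target.
# PING_CMD is an illustrative override; it defaults to the system ping.
PING_CMD="${PING_CMD:-ping}"

# Send a single ping with a 2 second timeout and report the outcome.
probe_once() {
    if $PING_CMD -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo "$(date '+%F %T') $1 ok"
    else
        echo "$(date '+%F %T') $1 FAIL"
    fi
}

# Probe every target given on the command line, one after another.
probe_all() {
    for target in "$@"; do
        probe_once "$target"
    done
}
```

After loading these functions, a loop such as `while true; do probe_all 192.168.1.50 192.168.1.1 192.168.100.1 8.8.8.8; sleep 5; done >> /tmp/pingtest.log` would build the log. Here 192.168.1.50 is a placeholder for another system on the LAN, and the modem address will differ by model.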

randy · August 29, 2021, 2:08pm

Thanks for the suggestions, Peter. I finally had a drop sometime overnight, and then another drop shortly after fixing the first one. After the first drop, here's the output of logread | grep dhcpc:

Sun Aug 29 09:32:10 2021 daemon.notice netifd: wan (1325): udhcpc: sending renew to [ip]

Then I ran /etc/init.d/network restart and the second drop happened about an hour later, adding these lines to logread | grep dhcpc:

Sun Aug 29 10:32:36 2021 daemon.notice netifd: wan (1325): udhcpc: received SIGTERM
Sun Aug 29 10:32:43 2021 daemon.notice netifd: wan (17278): udhcpc: started, v1.30.1
Sun Aug 29 10:32:43 2021 daemon.notice netifd: wan (17278): udhcpc: sending discover
Sun Aug 29 10:32:43 2021 daemon.notice netifd: wan (17278): udhcpc: sending select for [ip]
Sun Aug 29 10:32:43 2021 daemon.notice netifd: wan (17278): udhcpc: lease of [ip] obtained, lease time 107968

To me, these look like dhcpc going about its normal business, but let me know if you think otherwise.

As to your other questions, when the WAN is down:

I can ping the IPs of other systems on the local network.
I can ping the IP of the router.
I can connect to the IP of the modem (it does not respond to pings at any time, but I can reach its admin login page at 192.168.100.1 even when the WAN is down).
I cannot ping 8.8.8.8 (or any other external address or domain).

So I've learned that the issue is occurring upstream of the modem, but I'm still not sure what to do about it on the router, besides a cron job. Any further thoughts?

randy · September 16, 2021, 1:43am

Final update for future reference. In chronological order:

I learned that a WAN drop can be fixed by running ip link set eth0.2 down; ip link set eth0.2 up. This completes faster than running /etc/init.d/network restart. The eth0.2 interface is used for WAN on my device; on other devices, the WAN interface name can be confirmed with uci show network.wan.device.
I updated to OpenWrt 21.02.0. This had no effect on my WAN drops.
I installed Watchcat and configured it for "Restart Interface" mode. This is basically an improved version of the cron job I was planning to create myself.
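
For anyone who wants the manual fix as a script, here is a minimal sketch of the link bounce from the first item. It assumes the WAN device name is stored under network.wan.device, as it is on my 21.02 install. The function names, UCI_CMD and IP_CMD are made up for this example, and nothing happens unless the script is called with the argument "run".

```shell
#!/bin/sh
# Hypothetical sketch: look up the WAN device and bounce its link.
# UCI_CMD and IP_CMD are illustrative overrides for the real uci and ip tools.
UCI_CMD="${UCI_CMD:-uci}"
IP_CMD="${IP_CMD:-ip}"

# Print the device configured for the wan interface (eth0.2 on my router).
wan_device() {
    $UCI_CMD -q get network.wan.device
}

# Take the given link down, then bring it back up.
bounce_link() {
    $IP_CMD link set "$1" down
    $IP_CMD link set "$1" up
}

# Only act when invoked with "run", so the file can be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    dev=$(wan_device) || { echo "no WAN device found" >&2; exit 1; }
    bounce_link "$dev"
fi
```

This does by hand what Watchcat's "Restart Interface" mode automates, so it is mostly useful for testing whether a link bounce is enough on a given setup.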

With that, my problem is pretty well solved. It would be nice if the interface didn't need to be restarted, but automatic restarting is close enough.
