Wrt1900acs v2 - Loosing internet connection randomly

What do you see when you run this command:

logread | grep udhcpc

I only have the following:

Tue Jun  9 05:09:14 2020 daemon.notice netifd: wan (1953): udhcpc: sending renew to <GATEWAYIP>
Tue Jun  9 05:09:14 2020 daemon.notice netifd: wan (1953): udhcpc: lease of <WANIP> obtained, lease time 604800

However that's 2 days ago.

That is a 7 day lease. Obtained two days ago. So you didn’t have a lease expiration. But it could still be an issue on the isp side. You could ask them if they see any status changes on your ipv4 connectivity.

1 Like

I seem to have exactly same issue as @Morgoth but my lease time is 3600 (1 hour), which gave me a lot of tries to diagnose the problem (although I am not as experienced as you are guys).

Network topology is the (roughly) the same: I have a fiber optic incoming, an ONU terminal that works as a bridge, then my OpenWrt router (it's an X-wrt fork) and then all of my devices. What's even more suspicious, is that the problems started exactly after 17th of May. Judging from the logs of my ISP there was a 125 day session and after that it's all around the place from 1 minute to 7 days. Somewhere around 28th of May they seem to update something once again and the lease was set to 3600 which marks the start of my headaches.

Every session since is 39-40 minutes exactly. Typically logs look like this:
logread | grep udhcpc

Sun Jun 14 09:29:06 2020 daemon.notice netifd: wan (17894): udhcpc: lease of %myIP% obtained, lease time 3600
Sun Jun 14 09:59:06 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to %DHCPS%

Sometime around 10:09 (basically 40 minutes after the initial lease) the internet goes down. There's nothing in my logs at this particular time, but I can still use my ISP's website and my personal profile area (with billing, logs, etc). And I can even see that my connection has been dropped: 09:28-10:08 | 39 min | %myIP%

But my router has no idea and continues to try and renew normally:

Sun Jun 14 10:14:06 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to %DHCPS%
Sun Jun 14 10:21:36 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to %DHCPS%
Sun Jun 14 10:25:21 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to %DHCPS%
Sun Jun 14 10:27:13 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to %DHCPS%
Sun Jun 14 10:28:09 2020 daemon.notice netifd: wan (17894): udhcpc: sending renew to 0.0.0.0
Sun Jun 14 10:28:09 2020 daemon.notice netifd: wan (17894): udhcpc: lease of %myIP% obtained, lease time 3600

The second to last line is just what it is: the IP shows exactly 0.0.0.0 and I have no idea what it is. Is it a fallback of some sort? Is that a broadcast? Anyway, my ISP technicians only know "reset your router" and "try to connect directly to the PON terminal" solutions so there's not much hope on that side.

My thoughts at this point: on the internet when I searched this issue I saw other peoples' logs and they show that other routers renew their IPs way more aggressively (like there are 10 tries per second) because ISPs DHCP servers can get flooded on heavy traffic days/hours. And I know someone said that it is against regulations and you some servers can even give you a timeout for that. But apparently if you don't do that, you can sometimes get outrun by those who do. I don't know, it's just a thought.

Another strange thing is that my ISP's gateway is not the same as their DHCP server. My subnet mask is /24. The gateway and DHCP server seem to be in different subnets (need at least /23). Can this be a problem? Is there a way to finetune my DHCP broadcasts to receive replies from a wider range of IPs? Obviously I have no idea what I'm talking about.

TLDR; Problems seem to began around May 26th, it has to do with DHCP renewal process because something got updated on the ISP's side. My money right now is on specifics of udhcpc code and particularities of net topology.

At half of lease time, the dhcp client tries to renew the dhcp lease with unicast messages to the dhcp server.
If that fails, it will start the dhcp transaction from the beginning, sending broadcast to discover the dhcp server.

It's not strange and not a problem.

Nope.

You could raise this issue with your ISP and hope for the best. You might as well try connecting your PC directly on the PON.

I understand that. What I'm weirded by is 0.0.0.0 address. Why does it succeed at renewal, but the previous attempts that use the actual DHCP server IP do not? Also, are you sure that this is discover broadcast? Because discover looks like this in my logs:

Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (6422): udhcpc: received SIGTERM
Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (6422): udhcpc: unicasting a release of %myIP% to %DHCPS%
Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (6422): udhcpc: sending release
Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (6422): udhcpc: entering released state
Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (12894): udhcpc: started, v1.31.1
Sun Jun 14 14:24:51 2020 daemon.notice netifd: wan (12894): udhcpc: sending discover
Sun Jun 14 14:24:52 2020 daemon.notice netifd: wan (12894): udhcpc: sending select for %myIP%
Sun Jun 14 14:24:52 2020 daemon.notice netifd: wan (12894): udhcpc: lease of %myIP% obtained, lease time 3600

I can clearly see discover event and select event. But in "normal" uninterrupted logs, it sends renew to 0.0.0.0 and succeeds with renewing the lease. Perhaps I'm wrong or the logs omit some data, I don't know.

I tried, but it's very difficult to get ahold of any technician that actually knows anything beyond "did you try to turn it off and back on again".

I have an older macbook I can hook up to the PON terminal directly, but I have no idea what logs to look for.

Well, I did. The connection held for 1.5h uninterrupted. It appears there's no client-side specific DHCP logs on the Mac, and system log (console) didn't show anything particular, so who knows how it handled the renewal process, but it did.

Hooked everything back and everything got back to being screwed. Good grief this is frustrating.

My ISP does the same, (ignoring/blocking unicast renew requests)
I think it is a mis-configuration of their dhcp relay setup...
I made a patch for udhcpd that always sends the renew via broadcast:

--- a/networking/udhcp/dhcpc.c
+++ b/networking/udhcp/dhcpc.c
@@ -701,10 +701,6 @@ static int raw_bcast_from_client_data_if
 
 static int bcast_or_ucast(struct dhcp_packet *packet, uint32_t ciaddr, uint32_t server)
 {
-	if (server)
-		return udhcp_send_kernel_packet(packet,
-			ciaddr, CLIENT_PORT,
-			server, SERVER_PORT);
 	return raw_bcast_from_client_data_ifindex(packet, ciaddr);
 }
 
@@ -817,7 +813,7 @@ static NOINLINE int send_renew(uint32_t
 	add_client_options(&packet);
 
 	temp_addr.s_addr = server;
-	bb_info_msg("sending renew to %s", inet_ntoa(temp_addr));
+	bb_info_msg("sending renew via %s", "broadcast");
 	return bcast_or_ucast(&packet, ciaddr, server);
 }

It works but I'm not 100% certain if this is the best way to do this.
Would be nice to have an option to toggle this via command line.

udhcpdc now gets the lease at the first attempt every time:

logread -e udhcpc
Sat Jun 13 18:01:01 2020 daemon.notice netifd: wan (4633): udhcpc: started, v1.31.1
Sat Jun 13 18:01:01 2020 daemon.notice netifd: wan (4633): udhcpc: sending discover
Sat Jun 13 18:01:02 2020 daemon.notice netifd: wan (4633): udhcpc: sending select for 5.147.x.x
Sat Jun 13 18:01:02 2020 daemon.notice netifd: wan (4633): udhcpc: lease of 5.147.x.x obtained, lease time 7200
Sat Jun 13 19:01:40 2020 daemon.notice netifd: wan (4633): udhcpc: sending renew via broadcast
Sat Jun 13 19:01:40 2020 daemon.notice netifd: wan (4633): udhcpc: lease of 5.147.x.x obtained, lease time 7200
Sat Jun 13 20:01:40 2020 daemon.notice netifd: wan (4633): udhcpc: sending renew via broadcast
Sat Jun 13 20:01:41 2020 daemon.notice netifd: wan (4633): udhcpc: lease of 5.147.x.x obtained, lease time 7200
.........

I ended up adding a line to the crontab (System/Scheduled Tasks):

*/35 * * * * /bin/kill -SIGUSR1 $(pidof udhcpc)

Supposed to run every 35 minutes, which is slightly longer than 1/2 of my lease time, but a bit shorter than predicted connection loss. I might tweak it in the future, so far it has worked.

What it does is sends SIGUSR1 signal to udhcpc daemon which forces the latter to do a full renew procedure (both multicast/unicast) and it receives the IP successfully this time. It goes something like this:

Sun Jun 14 18:30:40 2020 daemon.notice netifd: wan (16969): udhcpc: performing DHCP renew
Sun Jun 14 18:30:40 2020 daemon.notice netifd: wan (16969): udhcpc: sending renew to %DHCPS%
Sun Jun 14 18:31:10 2020 daemon.notice netifd: wan (16969): udhcpc: sending renew to 0.0.0.0
Sun Jun 14 18:31:10 2020 daemon.notice netifd: wan (16969): udhcpc: lease of %myIP% obtained, lease time 3600

ISP server logs show that connection isn't terminated and continues as normal.

signals accepted by udhcpc 🡥
--
udhcpc also responds to SIGUSR1 and SIGUSR2. SIGUSR1 will force a renew state,
and SIGUSR2 will force a release of the current lease, and cause udhcpc to
go into an inactive state (until it is killed, or receives a SIGUSR1). You do
not need to sleep between sending signals, as signals received are processed
sequencially in the order they are received.

1 Like

I still have the problem. I contacted my ISP and I got the expected answer: there is no problem on their side. Expected in a way that it's almost impossible to get anything technical from them.

Few minutes ago the problem happened again. It still the same: I cannot ping IPv4 addresses but I can ping IPv6 addresses.

logread:

Tue Jun 30 07:08:25 2020 daemon.info dnsmasq-dhcp[11164]: DHCPACK(br-lan) 192.168.0.8 a4:45:19:19:2d:af redmi
Tue Jun 30 08:49:43 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: authenticated
Tue Jun 30 08:49:43 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: associated (aid 2)
Tue Jun 30 08:49:43 2020 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:43 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f WPA: pairwise key handshake completed (RSN)
Tue Jun 30 08:49:48 2020 daemon.info dnsmasq-dhcp[11164]: DHCPDISCOVER(br-lan) 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:48 2020 daemon.info dnsmasq-dhcp[11164]: DHCPOFFER(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:48 2020 daemon.info dnsmasq-dhcp[11164]: DHCPDISCOVER(br-lan) 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:48 2020 daemon.info dnsmasq-dhcp[11164]: DHCPOFFER(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:49 2020 daemon.info dnsmasq-dhcp[11164]: DHCPREQUEST(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f
Tue Jun 30 08:49:49 2020 daemon.info dnsmasq-dhcp[11164]: DHCPACK(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f fr082059
Tue Jun 30 08:49:52 2020 kern.debug kernel: [1025383.368008] ieee80211 phy0: Mac80211 start BA 3c:22:fb:31:dc:3f
Tue Jun 30 08:56:37 2020 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED 3c:22:fb:31:dc:3f
Tue Jun 30 08:56:37 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: disassociated
Tue Jun 30 08:56:38 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Tue Jun 30 08:57:11 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: authenticated
Tue Jun 30 08:57:11 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f IEEE 802.11: associated (aid 2)
Tue Jun 30 08:57:11 2020 daemon.notice hostapd: wlan0: AP-STA-CONNECTED 3c:22:fb:31:dc:3f
Tue Jun 30 08:57:11 2020 daemon.info hostapd: wlan0: STA 3c:22:fb:31:dc:3f WPA: pairwise key handshake completed (RSN)
Tue Jun 30 08:57:12 2020 daemon.info dnsmasq-dhcp[11164]: DHCPREQUEST(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f
Tue Jun 30 08:57:12 2020 daemon.info dnsmasq-dhcp[11164]: DHCPACK(br-lan) 192.168.0.159 3c:22:fb:31:dc:3f fr082059
Tue Jun 30 08:57:14 2020 kern.debug kernel: [1025825.863997] ieee80211 phy0: Mac80211 start BA 3c:22:fb:31:dc:3f
Tue Jun 30 09:49:13 2020 daemon.notice hostapd: wlan0: AP-STA-DISCONNECTED a4:45:19:19:2d:af
Tue Jun 30 09:49:16 2020 daemon.info hostapd: wlan0: STA a4:45:19:19:2d:af IEEE 802.11: authenticated
Tue Jun 30 09:49:16 2020 daemon.info hostapd: wlan0: STA a4:45:19:19:2d:af IEEE 802.11: associated (aid 1)
Tue Jun 30 09:49:16 2020 daemon.notice hostapd: wlan0: AP-STA-CONNECTED a4:45:19:19:2d:af
Tue Jun 30 09:49:16 2020 daemon.info hostapd: wlan0: STA a4:45:19:19:2d:af WPA: pairwise key handshake completed (RSN)
Tue Jun 30 09:49:16 2020 daemon.info dnsmasq-dhcp[11164]: DHCPDISCOVER(br-lan) a4:45:19:19:2d:af
Tue Jun 30 09:49:16 2020 daemon.info dnsmasq-dhcp[11164]: DHCPOFFER(br-lan) 192.168.0.8 a4:45:19:19:2d:af
Tue Jun 30 09:49:16 2020 daemon.info dnsmasq-dhcp[11164]: DHCPREQUEST(br-lan) 192.168.0.8 a4:45:19:19:2d:af
Tue Jun 30 09:49:16 2020 daemon.info dnsmasq-dhcp[11164]: DHCPACK(br-lan) 192.168.0.8 a4:45:19:19:2d:af redmi
Tue Jun 30 09:57:13 2020 authpriv.info dropbear[10030]: Child connection from 192.168.0.159:52123
Tue Jun 30 09:57:16 2020 authpriv.notice dropbear[10030]: Password auth succeeded for 'myuser' from 192.168.0.159:52123
Tue Jun 30 09:57:24 2020 authpriv.notice sudo: myuser : TTY=pts/0 ; PWD=/home/myuser ; USER=root ; COMMAND=/bin/ash
Tue Jun 30 09:59:51 2020 daemon.err uhttpd[1771]: luci: accepted login on / for root from 192.168.0.159
Tue Jun 30 10:00:01 2020 daemon.notice netifd: wan (4112): udhcpc: received SIGTERM
Tue Jun 30 10:00:01 2020 daemon.notice netifd: Interface 'wan' is now down
Tue Jun 30 10:00:01 2020 daemon.notice netifd: Interface 'wan' is setting up now
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: reading /tmp/resolv.conf.auto
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain test
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain onion
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain localhost
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain local
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain invalid
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain bind
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using local addresses only for domain lan
Tue Jun 30 10:00:01 2020 daemon.info dnsmasq[11164]: using nameserver fd0f:ee:b0::1#53

Did anyone ever solve this?