Fixes for udhcpc DHCP renewal bugs in Busybox 1.30 used in 19.07

This post is to contribute a solution I came up with that addresses two bugs in the DHCP renewal process of the udhcpc code used to provide DHCP protocol client functionality found in the default configuration of 19.07.x (and likely in prior 18.x versions) of OpenWRT.
Since 19.07 is EoL, I thought I’d post the fix here, if there is ever a 19.07.11, I’d ask the devs to please consider including this patch in that build.

The root problem is that there are ISPs out there that intentionally ignore unicast DHCP renew request packets. They only respond to renew if two items are true: the packet is addressed to 255.255.255.255 AND if the bootp flag is set to broadcast. Just broadcasting the packet, but with bootp flags set to zero (unicast), does not work.

This is not per the RFC, but hey, they exist out there.

The 1.30.1 (and likely prior versions) of the udhcpc in the Busybox package attempts to send a broadcast packet as a last-ditch effort but does not set the broadcast flag. So the renewals fail.

The second bug relates to how the code is handling renewal intervals and retries. If a long lease is received, it waits for the usual lease_time /2 before attempting a renew, but if the response packet is lost (or never sent), it then waits until a quarter of the original lease_time, before trying again. All that is fine, but on really long leases (multi-day), the lease times out before this code goes into last-ditch mode, and any effort at broadcasting a renew is too late (or again, the ISP is ignoring even those).

My goals in addressing these two issues were to make the fewest changes to the code, as it seems rather fragile (for instance, ‘timeout’ is a massively overloaded variable), and to keep focused on only solving these two issues. Once one studies this code, it’s obvious there are many ways to improve it, and compared to the latest versions of the code, seems the author agrees. So, my patch is minimal on purpose, but also easy to desk-check.

My strategy to deal with the renewal interval was to make a more strenuous attempt at renewal at T2 by issuing up to discovery_retries attempts at renewing (using unicast), and if not, then fall through to attempt a broadcast renew. And if that did not work, then fall through to the rebinding state, so we at least get a fresh lease before the old one expires, as clearly renew is not working.

The changes for dealing with the bootp flags were again, simple, so in send_renew, all we check for is a broadcast address (a zero is used for server if it is a broadcast) and if so, set the bootp flag to broadcast.
One might envision making that dependent on the Broadcast option setting, but that’s an overload I think, as the RFC describes that flag as telling the server we need a broadcast reply during discovery. Yet in our case, the ISP requires that flag before responding to renew.

I had some fun during testing, as the default script used to launch the udhcpc instance is actually forcing the discovery_retries value to zero (-t 0) in the proto_run_command. This is in the /lib/netifd/proto/dhcp.sh and the source file is in /package/network/config/netifd/files/lib/netifd/proto/dhcp.sh

So you need to patch or edit that to -t 3 so we actually send the multiple renews.
I’d also recommend (Timeout) -T 5 be added, as some ISP DHCP servers are slow (default is 3 seconds).

A build based on 19.07.9 that includes the patch has been deployed on multiple ISPs, including the one that spurred the development, and it is working fine to date.

Look forward to any feedback on the patch.

Save this code as 204-udhcpc_renew_fixes.patch
The udhcpc patch file goes into /source/package/utils/busybox/patches/ directory.

--- a/networking/udhcp/dhcpc.c
+++ b/networking/udhcp/dhcpc.c
@@ -810,6 +810,9 @@
 	packet.xid = xid;
 	packet.ciaddr = ciaddr;
 
+	/* Ensure we set broadcast flags if server is ANY_ADDR */
+	if (!server) packet.flags |= htons(BROADCAST_FLAG);
+
 	/* Add options: maxsize,
 	 * optionally: hostname, fqdn, vendorclass,
 	 * "param req" option according to -O, and options specified with -x
@@ -1259,6 +1262,7 @@
 	uint32_t xid = xid; /* for compiler */
 	int packet_num;
 	int timeout; /* must be signed */
+	int unicastRenewCount = 0;
 	unsigned already_waited_sec;
 	unsigned opt;
 	IF_FEATURE_UDHCPC_ARPING(unsigned arpping_ms;)
@@ -1521,14 +1525,17 @@
 			case BOUND:
 				/* 1/2 lease passed, enter renewing state */
 				state = RENEWING;
+				unicastRenewCount = 0;
 				client_config.first_secs = 0; /* make secs field count from 0 */
 				change_listen_mode(LISTEN_KERNEL);
 				log1("entering renew state");
 				/* fall right through */
 			case RENEW_REQUESTED: /* manual (SIGUSR1) renew */
 			case_RENEW_REQUESTED:
+				unicastRenewCount = 0;
+				state = RENEWING;  /* explicitly change state so we do not keep resetting renew counts */
 			case RENEWING:
-				if (timeout >= 60) {
+				if (unicastRenewCount < discover_retries) {
 					/* send an unicast renew request */
 			/* Sometimes observed to fail (EADDRNOTAVAIL) to bind
 			 * a new UDP socket for sending inside send_renew.
@@ -1539,7 +1546,8 @@
 			 * into INIT_SELECTING state.
 			 */
 					if (send_renew(xid, server_addr, requested_ip) >= 0) {
-						timeout >>= 1;
+						timeout = discover_timeout;
+						unicastRenewCount++;
 //TODO: the timeout to receive an answer for our renew should not be selected
 //with "timeout = lease_seconds / 2; ...; timeout = timeout / 2": it is often huge.
 //Waiting e.g. 4*3600 seconds for a reply does not make sense
@@ -1562,12 +1570,13 @@
 			case REBINDING:
 				/* Switch to bcast receive */
 				change_listen_mode(LISTEN_RAW);
-				/* Lease is *really* about to run out,
+				/* Renew unicasts failed,
 				 * try to find DHCP server using broadcast */
-				if (timeout > 0) {
+				if (unicastRenewCount > 0) {
 					/* send a broadcast renew request */
 					send_renew(xid, 0 /*INADDR_ANY*/, requested_ip);
-					timeout >>= 1;
+					timeout = discover_timeout;
+					unicastRenewCount = 0; /* reset so we continue on with rebinding */
 					continue;
 				}
 				/* Timed out, enter init state */

end

1 Like

Thanks JonFo for looking into this. I am on FreshTomato (not openwrt) with busybox 1.34.1 which I believe has the same udhcpc "issue" you're describing on some ISPs. Yeah I just switched ISPs... sigh...
I'm currently capturing packets to confirm renewals and rebinds are simply ignored if missing BROADCAST_FLAG, which I think may be the root of most evil.
I also agree that default -T 3 is ridiculously small considering ISP level deployments, especially for REQUEST which may (should?) run an ARP scan before offering. But this at least can be worked around with params.
To top that off, rebinds are only tried at 30s prior to lease expiry, which is arbitrary and may even be subject to clock skew, on multi-day leases. Just not robust practice - udhcpc in busybox 0.x used to renew at 7/8 lease time which was considerably more flexible.

I believe, if behavior is confirmed, the best path forward would be to

  • add a separate option for BROADCAST_FLAG on unicasts / renewals to work around non-RFC ISPs - default disabled
  • add a separate option to switch from RENEW to REBIND, based on lease length or fixed time length instead of the hardcoded 30s. This can also be defaulted to 30s as today, for backwards compact

These 2, put against busybox repo as a change request which if accepted would trickle down in all our products and to many it devices, eventually

Thanks JonFo and shadowncs for addressing these issues, but may I ask if these issues affect devices trying to obtain leases from routers, not ISPs.
If a bootp flag is not set to broadcast, are there routers that will reject the renew request ?

@kamel I would not think a router would suffer from this issue.
This tends to crop up due to the complex nature of some ISP deployments. A router's LAN DHCP service is pretty simple, in contrast.

Also worth noting that this thread is referring to an older (EOL) version of OpenWrt with an old version of Busybox (thread title 1.30, later described in 1.34.1, current OpenWrt uses 1.36.1).

Any bugs present in versions earlier than 22.03 will not be addressed as they are EOL. 22.03 itself will go EOL in April 2024, so any bugs reported against that version would need to be addressed soon or never.