This post is to contribute a solution I came up with that addresses two bugs in the DHCP renewal process of the udhcpc code used to provide DHCP protocol client functionality found in the default configuration of 19.07.x (and likely in prior 18.x versions) of OpenWRT.
Since 19.07 is EoL, I thought I’d post the fix here, if there is ever a 19.07.11, I’d ask the devs to please consider including this patch in that build.
The root problem is that there are ISPs out there that intentionally ignore unicast DHCP renew request packets. They only respond to renew if two items are true: the packet is addressed to 255.255.255.255 AND if the bootp flag is set to broadcast. Just broadcasting the packet, but with bootp flags set to zero (unicast), does not work.
This is not per the RFC, but hey, they exist out there.
The 1.30.1 (and likely prior versions) of the udhcpc in the Busybox package attempts to send a broadcast packet as a last-ditch effort but does not set the broadcast flag. So the renewals fail.
The second bug relates to how the code is handling renewal intervals and retries. If a long lease is received, it waits for the usual lease_time /2 before attempting a renew, but if the response packet is lost (or never sent), it then waits until a quarter of the original lease_time, before trying again. All that is fine, but on really long leases (multi-day), the lease times out before this code goes into last-ditch mode, and any effort at broadcasting a renew is too late (or again, the ISP is ignoring even those).
My goals in addressing these two issues were to make the fewest changes to the code, as it seems rather fragile (for instance, ‘timeout’ is a massively overloaded variable), and to keep focused on only solving these two issues. Once one studies this code, it’s obvious there are many ways to improve it, and compared to the latest versions of the code, seems the author agrees. So, my patch is minimal on purpose, but also easy to desk-check.
My strategy to deal with the renewal interval was to make a more strenuous attempt at renewal at T2 by issuing up to discovery_retries attempts at renewing (using unicast), and if not, then fall through to attempt a broadcast renew. And if that did not work, then fall through to the rebinding state, so we at least get a fresh lease before the old one expires, as clearly renew is not working.
The changes for dealing with the bootp flags were again, simple, so in send_renew, all we check for is a broadcast address (a zero is used for server if it is a broadcast) and if so, set the bootp flag to broadcast.
One might envision making that dependent on the Broadcast option setting, but that’s an overload I think, as the RFC describes that flag as telling the server we need a broadcast reply during discovery. Yet in our case, the ISP requires that flag before responding to renew.
I had some fun during testing, as the default script used to launch the udhcpc instance is actually forcing the discovery_retries value to zero (-t 0) in the proto_run_command. This is in the /lib/netifd/proto/dhcp.sh and the source file is in /package/network/config/netifd/files/lib/netifd/proto/dhcp.sh
So you need to patch or edit that to -t 3 so we actually send the multiple renews.
I’d also recommend (Timeout) -T 5 be added, as some ISP DHCP servers are slow (default is 3 seconds).
A build based on 19.07.9 that includes the patch has been deployed on multiple ISPs, including the one that spurred the development, and it is working fine to date.
Look forward to any feedback on the patch.
Save this code as 204-udhcpc_renew_fixes.patch
The udhcpc patch file goes into /source/package/utils/busybox/patches/ directory.
--- a/networking/udhcp/dhcpc.c
+++ b/networking/udhcp/dhcpc.c
@@ -810,6 +810,9 @@
packet.xid = xid;
packet.ciaddr = ciaddr;
+ /* Ensure we set broadcast flags if server is ANY_ADDR */
+ if (!server) packet.flags |= htons(BROADCAST_FLAG);
+
/* Add options: maxsize,
* optionally: hostname, fqdn, vendorclass,
* "param req" option according to -O, and options specified with -x
@@ -1259,6 +1262,7 @@
uint32_t xid = xid; /* for compiler */
int packet_num;
int timeout; /* must be signed */
+ int unicastRenewCount = 0;
unsigned already_waited_sec;
unsigned opt;
IF_FEATURE_UDHCPC_ARPING(unsigned arpping_ms;)
@@ -1521,14 +1525,17 @@
case BOUND:
/* 1/2 lease passed, enter renewing state */
state = RENEWING;
+ unicastRenewCount = 0;
client_config.first_secs = 0; /* make secs field count from 0 */
change_listen_mode(LISTEN_KERNEL);
log1("entering renew state");
/* fall right through */
case RENEW_REQUESTED: /* manual (SIGUSR1) renew */
case_RENEW_REQUESTED:
+ unicastRenewCount = 0;
+ state = RENEWING; /* explicitly change state so we do not keep resetting renew counts */
case RENEWING:
- if (timeout >= 60) {
+ if (unicastRenewCount < discover_retries) {
/* send an unicast renew request */
/* Sometimes observed to fail (EADDRNOTAVAIL) to bind
* a new UDP socket for sending inside send_renew.
@@ -1539,7 +1546,8 @@
* into INIT_SELECTING state.
*/
if (send_renew(xid, server_addr, requested_ip) >= 0) {
- timeout >>= 1;
+ timeout = discover_timeout;
+ unicastRenewCount++;
//TODO: the timeout to receive an answer for our renew should not be selected
//with "timeout = lease_seconds / 2; ...; timeout = timeout / 2": it is often huge.
//Waiting e.g. 4*3600 seconds for a reply does not make sense
@@ -1562,12 +1570,13 @@
case REBINDING:
/* Switch to bcast receive */
change_listen_mode(LISTEN_RAW);
- /* Lease is *really* about to run out,
+ /* Renew unicasts failed,
* try to find DHCP server using broadcast */
- if (timeout > 0) {
+ if (unicastRenewCount > 0) {
/* send a broadcast renew request */
send_renew(xid, 0 /*INADDR_ANY*/, requested_ip);
- timeout >>= 1;
+ timeout = discover_timeout;
+ unicastRenewCount = 0; /* reset so we continue on with rebinding */
continue;
}
/* Timed out, enter init state */
end