Dnsmasq hands out new IP on every boot due to 'ping'

I'm running dnsmasq on an RPI4, with my laptop (Ubuntu 20.04) connected to it via ethernet. Every time I reboot the RPI4, the laptop ends up with a different IP address.

The following sequence of events occurs:

  • the laptop gets 192.168.1.179
  • I reboot the rpi
  • the laptop detects a short connectivity loss and on reconnection keeps the old IP but issues a DHCPREQUEST (for 192.168.1.179)
  • receiving no response, it then starts issuing DHCPDISCOVER (for 192.168.1.179)
  • when dnsmasq comes up on the RPI, it sees the DHCPDISCOVER for 192.168.1.179, and attempts to ping 192.168.1.179. Finding that 192.168.1.179 is already responsive, it decides that it should give a new IP to my laptop (192.168.1.180)
  • this leads to any existing network connection being dropped (e.g. an ssh session)

I can work around this by adding 'noping' to /etc/config/dhcp, but this seems like a bad idea.
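
A minimal sketch of that workaround, assuming the option goes in the main dnsmasq section and is passed through to dnsmasq as --no-ping:

```
config dnsmasq
	# ...existing options unchanged...
	option noping '1'
```

This turns the probe off for every client, not just for a client re-requesting its own address, which is why it feels heavy-handed.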

Is there any other way to stop this happening, aside from persisting the dhcp lease table? Or do I need to patch dnsmasq so that, when the ping response comes from the same MAC that's asking for the address, it agrees to the request?

EDIT: I should add that I'm using a non-randomised MAC on my laptop, and dnsmasq is configured to assign IP addresses based on hashing MACs. But this is not that relevant, since the problem is caused by my laptop presenting an IP in the DHCPDISCOVER packet which it already has assigned to itself and dnsmasq not detecting that it's pinging the same device that issued the DHCPDISCOVER when detecting a clash.

If you have option sequential_ip set to 1 in the config, this will happen.
Otherwise Dnsmasq issues an ip based on the hash of the mac address of the requesting device, so if your mac address remains unchanged, you should get the same ip address allocated every time.

Perhaps you can show the output of:
uci show dhcp

You can instruct dnsmasq to assign DHCP addresses based on e.g. the ethernet MAC address. That way the address should be pretty constant, but you need to do this manually for every device, and for modern devices you might need to disable WiFi MAC randomisation for your SSID for this to work...

Maybe this is what you consider as "persisting the dhcp lease table"...
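
A static lease of the kind described above could be sketched like this in /etc/config/dhcp. The name and MAC are placeholders; the IP is simply the address mentioned in the first post:

```
config host
	option name 'laptop'
	option mac 'aa:bb:cc:dd:ee:ff'
	option ip '192.168.1.179'
```

One such section is needed per device, which is the manual effort mentioned above.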

I do not have sequential_ip set; I have the default OpenWrt 23 dhcp config. It correctly hashes the MAC address to 192.168.1.179, but then detects a clash because it can already ping 192.168.1.179 and adds 1 to avoid the clash, without realising that the device responding to the ping is the same device that is doing the DHCPDISCOVER.

I believe the relevant code in dnsmasq is:

          else if (opt && address_available(context, addr, tagif_netid) && !lease_find_by_addr(addr) &&
                   !config_find_by_address(daemon->dhcp_conf, addr) && do_icmp_ping(now, addr, 0, loopback))
            mess->yiaddr = addr;
          else if (emac_len == 0)
            message = _("no unique-id");
          else if (!address_allocate(context, &mess->yiaddr, emac, emac_len, tagif_netid, now, loopback))
            message = _("no address available");

(do_icmp_ping returns NULL if the ping succeeds)
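
The difference between the current behaviour and the behaviour being asked for can be sketched as a toy model in shell. This is not dnsmasq code: choose_offer, its arguments and the "current"/"proposed" modes are all made up for illustration, the responder MAC is assumed to be known somehow (e.g. from the ARP table), and clash handling is simplified to "add 1" as described above.

```sh
#!/bin/sh
# Toy model of the DHCPDISCOVER decision; hypothetical, not taken from dnsmasq.
#
# choose_offer REQUESTED_IP REQUESTER_MAC RESPONDER_MAC MODE
#   RESPONDER_MAC  MAC that answered the ping probe, or "" if nothing answered
#   MODE           "current":  any ping reply counts as a clash
#                  "proposed": a reply from the requester itself is ignored
choose_offer() {
    requested=$1
    requester=$2
    responder=$3
    mode=$4

    # Nobody answered the probe: the requested address is free.
    if [ -z "$responder" ]; then
        echo "$requested"
        return
    fi

    # Proposed change: the device answering is the one asking for the address.
    if [ "$mode" = "proposed" ] && [ "$responder" = "$requester" ]; then
        echo "$requested"
        return
    fi

    # Treated as a clash: offer the next address (simplified to "last octet + 1").
    prefix=${requested%.*}
    last=${requested##*.}
    echo "$prefix.$((last + 1))"
}

# The laptop still holds .179 and answers the probe itself.
choose_offer 192.168.1.179 00:e0:4c:68:00:6b 00:e0:4c:68:00:6b current   # prints 192.168.1.180
choose_offer 192.168.1.179 00:e0:4c:68:00:6b 00:e0:4c:68:00:6b proposed  # prints 192.168.1.179
```

The only difference between the two modes is the MAC comparison, which is the check the original post suggests dnsmasq is missing.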

Please read the original post carefully :slight_smile:

Let's see the relevant bits of your configuration:

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/wireless
cat /etc/config/dhcp
cat /etc/config/firewall

I have a custom build (based on OpenWrt 23) whose config I'm reluctant to share in full at the moment, but I promise you can reproduce this on an out-of-the-box rpi4 config :slight_smile:

At any rate, my dhcp config is the default:

config dnsmasq
	option domainneeded	1
	option boguspriv	1
	option filterwin2k	0  # enable for dial on demand
	option localise_queries	1
	option rebind_protection 1  # disable if upstream must serve RFC1918 addresses
	option rebind_localhost 1  # enable for RBL checking and similar services
	#list rebind_domain example.lan  # whitelist RFC1918 responses for domains
	option local	'/lan/'
	option domain	'lan'
	option expandhosts	1
	option nonegcache	0
	option cachesize	1000
	option authoritative	1
	option readethers	1
	option leasefile	'/tmp/dhcp.leases'
	option resolvfile	'/tmp/resolv.conf.d/resolv.conf.auto'
	#list server		'/mycompany.local/1.2.3.4'
	option nonwildcard	1 # bind to & keep track of interfaces
	#list interface		br-lan
	#list notinterface	lo
	#list bogusnxdomain     '64.94.110.11'
	option localservice	1  # disable to allow DNS requests from non-local subnets
	option ednspacket_max	1232
	option filter_aaaa	0
	option filter_a		0
	#list addnmount		/some/path # read-only mount path to expose it to dnsmasq

config dhcp lan
	option interface	lan
	option start 	100
	option limit	150
	option leasetime	12h

config dhcp wan
	option interface	wan
	option ignore	1

Do you believe my diagnosis of the situation is incorrect? It seems reasonably clear to me that this is standard DHCP behaviour, and I've identified the bit of dnsmasq that causes it. I'm not sure what the other config is going to add here.

Where did this custom build come from?

Please share a config that will repro the issue that comes from an official OpenWrt 23.05 release so that others can evaluate and replicate.

No, that's most certainly not default. And there are a lot of things that look strange there.

At this point, I think there is something seriously wrong with your custom build and/or the config files.

No, this is not a standard issue. AFAICT, you're the first one to report this issue. If you can share with us a config that replicates the problem, we can evaluate further.

If I understand your thread, it should literally just be the DHCP config file, so it should be reproducible on any OpenWrt 23.05 system, right?

If the lease table were persistent, dnsmasq would treat the request as a renewal and issue the same IP address. So you could move the lease table to flash keeping in mind that that does mean flash wear.
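
As a sketch, moving the lease table is a one-line change to the existing leasefile option. The path /etc/dhcp.leases is only an assumed example of a location that survives a reboot; the default /tmp/dhcp.leases is lost on reboot because /tmp is a RAM filesystem:

```
config dnsmasq
	# ...existing options unchanged...
	option leasefile '/etc/dhcp.leases'
```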

The overall philosophy here is that rebooting a main router is going to disrupt everything, and it should be avoided in the first place.

Another potential workaround is to use very short leasetimes, so that during the reboot the laptop expires its address and stops responding to pings. This of course means frequent renewals, but in a home network with only a few devices it is tolerable.
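
A sketch of the short-lease variant, changing only leasetime in the lan section. The value 2m is an assumed example; as far as I know, two minutes is the shortest lease dnsmasq accepts:

```
config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '2m'
```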

The dhcp config is the default. What's confusing is that odhcpd usually modifies the dhcp config, but I don't have it installed:

I flashed: https://downloads.openwrt.org/releases/23.05.3/targets/bcm27xx/bcm2711/openwrt-23.05.3-bcm27xx-bcm2711-rpi-4-squashfs-factory.img.gz and reproduced the issue:

Fri Mar 22 22:09:58 2024 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(br-lan) 192.168.1.179 00:e0:4c:68:00:6b
Fri Mar 22 22:09:58 2024 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(br-lan) 192.168.1.180 00:e0:4c:68:00:6b
Fri Mar 22 22:09:58 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(br-lan) 192.168.1.180 00:e0:4c:68:00:6b
Fri Mar 22 22:09:58 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(br-lan) 192.168.1.180 00:e0:4c:68:00:6b mma-1375

For reference, the config is:

root@OpenWrt:~# cat /etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd85:2d6e:4517::/48'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

root@OpenWrt:~# cat /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/soc/fe300000.mmcnr/mmc_host/mmc1/mmc1:0001/mmc1:0001:1'
	option channel '36'
	option band '5g'
	option htmode 'VHT80'
	option disabled '1'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'OpenWrt'
	option encryption 'none'

root@OpenWrt:~# cat /etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option boguspriv '1'
	option filterwin2k '0'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option nonegcache '0'
	option cachesize '1000'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option nonwildcard '1'
	option localservice '1'
	option ednspacket_max '1232'
	option filter_aaaa '0'
	option filter_a '0'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option dhcpv6 'server'
	option ra 'server'
	option ra_slaac '1'
	list ra_flags 'managed-config'
	list ra_flags 'other-config'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

It's not necessarily easily reproducible, because it requires a few things:

  • the device to not bring down the network interface for very long on reboot so that Ubuntu doesn't consider the connection completely dropped. I suspect that rpi4 is relatively unique in this way
  • connected device to respond to pings
  • connected device to react to a transient connectivity loss by keeping the IP but issuing a DHCPREQUEST (possibly followed by DHCPDISCOVER)

My guess is that whatever manages DHCP in Ubuntu 20.04 (NetworkManager?) has two timeouts: one for keeping the IP but sending a DHCPREQUEST again, and one for dropping the IP entirely. 23.05.3 boots fast enough to slot into this gap, where Ubuntu would like to get a new lease but doesn't give up the old IP straight away. Interestingly, it's not easily reproducible on the snapshot, because the snapshot appears to keep the network interface down for slightly longer, causing Ubuntu to treat the connection as entirely dropped and release its IP.

But it does happen on system upgrade, and the frontend tries to poll the IP... admittedly, I haven't confirmed that I can replicate this situation, but I assume it would be the same.

It's also frustrating if the router loses power for some reason, at which point theoretically connected devices will have their IPs reshuffled if they issue a DHCPDISCOVER while still holding their old IP (though of course something has to cause them to issue a DHCPDISCOVER...).

I would also assume, though I haven't validated it, that devices which keep DHCP-allocated IPs indefinitely (e.g. OpenWrt devices) would unnecessarily flip IPs at renewal time due to this issue (basically 'router drops lease table'). EDIT: I tried to check this by reducing the leasetime to 1m and dropping the dnsmasq lease table, but apparently dnsmasq is smarter when responding to a DHCPREQUEST and hands out the same IP even though it doesn't have it in the lease table.

No, this is not the correct default.

You never answered where your custom build came from?

FWIW, this is what the default DHCP file looks like:

root@OpenWrt:~# cat /etc/config/dhcp

config dnsmasq
        option domainneeded '1'
        option boguspriv '1'
        option filterwin2k '0'
        option localise_queries '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option nonegcache '0'
        option cachesize '1000'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
        option nonwildcard '1'
        option localservice '1'
        option ednspacket_max '1232'
        option filter_aaaa '0'
        option filter_a '0'

config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option leasetime '12h'
        option dhcpv4 'server'
        option dhcpv6 'server'
        option ra 'server'
        option ra_slaac '1'
        list ra_flags 'managed-config'
        list ra_flags 'other-config'

config dhcp 'wan'
        option interface 'wan'
        option ignore '1'

config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

I've now duplicated the issue on a standard RPI4 image, so happily it doesn't matter :slight_smile:

What you've given is the DHCP config after the odhcpd setup has modified it. This is not the default DHCP config (see link above).

Can you share the files from the Pi4 image that you're running?

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/wireless
cat /etc/config/dhcp
cat /etc/config/firewall

EDIT: Please also provide the exact steps for reproducing the issue.

I have already shared them. See above. It's just the default RPI4 image.

Ok, I didn't cat firewall or system board, so:

root@OpenWrt:~# ubus call system board
{
	"kernel": "5.15.150",
	"hostname": "OpenWrt",
	"system": "ARMv8 Processor rev 3",
	"model": "Raspberry Pi 4 Model B Rev 1.5",
	"board_name": "raspberrypi,4-model-b",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "23.05.3",
		"revision": "r23809-234f1a2efa",
		"target": "bcm27xx/bcm2711",
		"description": "OpenWrt 23.05.3 r23809-234f1a2efa"
	}
}
root@OpenWrt:~# cat /etc/config/firewall
config defaults
	option syn_flood	1
	option input		REJECT
	option output		ACCEPT
	option forward		REJECT
# Uncomment this line to disable ipv6 rules
#	option disable_ipv6	1

config zone
	option name		lan
	list   network		'lan'
	option input		ACCEPT
	option output		ACCEPT
	option forward		ACCEPT

config zone
	option name		wan
	list   network		'wan'
	list   network		'wan6'
	option input		REJECT
	option output		ACCEPT
	option forward		REJECT
	option masq		1
	option mtu_fix		1

config forwarding
	option src		lan
	option dest		wan

# We need to accept udp packets on port 68,
# see https://dev.openwrt.org/ticket/4108
config rule
	option name		Allow-DHCP-Renew
	option src		wan
	option proto		udp
	option dest_port	68
	option target		ACCEPT
	option family		ipv4

# Allow IPv4 ping
config rule
	option name		Allow-Ping
	option src		wan
	option proto		icmp
	option icmp_type	echo-request
	option family		ipv4
	option target		ACCEPT

config rule
	option name		Allow-IGMP
	option src		wan
	option proto		igmp
	option family		ipv4
	option target		ACCEPT

# Allow DHCPv6 replies
# see https://github.com/openwrt/openwrt/issues/5066
config rule
	option name		Allow-DHCPv6
	option src		wan
	option proto		udp
	option dest_port	546
	option family		ipv6
	option target		ACCEPT

config rule
	option name		Allow-MLD
	option src		wan
	option proto		icmp
	option src_ip		fe80::/10
	list icmp_type		'130/0'
	list icmp_type		'131/0'
	list icmp_type		'132/0'
	list icmp_type		'143/0'
	option family		ipv6
	option target		ACCEPT

# Allow essential incoming IPv6 ICMP traffic
config rule
	option name		Allow-ICMPv6-Input
	option src		wan
	option proto	icmp
	list icmp_type		echo-request
	list icmp_type		echo-reply
	list icmp_type		destination-unreachable
	list icmp_type		packet-too-big
	list icmp_type		time-exceeded
	list icmp_type		bad-header
	list icmp_type		unknown-header-type
	list icmp_type		router-solicitation
	list icmp_type		neighbour-solicitation
	list icmp_type		router-advertisement
	list icmp_type		neighbour-advertisement
	option limit		1000/sec
	option family		ipv6
	option target		ACCEPT

# Allow essential forwarded IPv6 ICMP traffic
config rule
	option name		Allow-ICMPv6-Forward
	option src		wan
	option dest		*
	option proto		icmp
	list icmp_type		echo-request
	list icmp_type		echo-reply
	list icmp_type		destination-unreachable
	list icmp_type		packet-too-big
	list icmp_type		time-exceeded
	list icmp_type		bad-header
	list icmp_type		unknown-header-type
	option limit		1000/sec
	option family		ipv6
	option target		ACCEPT

config rule
	option name		Allow-IPSec-ESP
	option src		wan
	option dest		lan
	option proto		esp
	option target		ACCEPT

config rule
	option name		Allow-ISAKMP
	option src		wan
	option dest		lan
	option dest_port	500
	option proto		udp
	option target		ACCEPT


### EXAMPLE CONFIG SECTIONS
# do not allow a specific ip to access wan
#config rule
#	option src		lan
#	option src_ip	192.168.45.2
#	option dest		wan
#	option proto	tcp
#	option target	REJECT

# block a specific mac on wan
#config rule
#	option dest		wan
#	option src_mac	00:11:22:33:44:66
#	option target	REJECT

# block incoming ICMP traffic on a zone
#config rule
#	option src		lan
#	option proto	ICMP
#	option target	DROP

# port redirect port coming in on wan to lan
#config redirect
#	option src			wan
#	option src_dport	80
#	option dest			lan
#	option dest_ip		192.168.16.235
#	option dest_port	80
#	option proto		tcp

# port redirect of remapped ssh port (22001) on wan
#config redirect
#	option src		wan
#	option src_dport	22001
#	option dest		lan
#	option dest_port	22
#	option proto		tcp

### FULL CONFIG SECTIONS
#config rule
#	option src		lan
#	option src_ip	192.168.45.2
#	option src_mac	00:11:22:33:44:55
#	option src_port	80
#	option dest		wan
#	option dest_ip	194.25.2.129
#	option dest_port	120
#	option proto	tcp
#	option target	REJECT

#config redirect
#	option src		lan
#	option src_ip	192.168.45.2
#	option src_mac	00:11:22:33:44:55
#	option src_port		1024
#	option src_dport	80
#	option dest_ip	194.25.2.129
#	option dest_port	120
#	option proto	tcp

And what is the exact process for reproducing the issue? Sequence, timing, etc.

Reproduction: Connect Ubuntu 20.04 laptop to RPI4 running default OpenWrt 23.05.3 image (squashfs). Get an IP. Reboot RPI4. Get a different IP.

But as explained earlier, this may be hardware specific in terms of how long my ethernet adapter has connectivity loss.

If you unplug the cable and plug it back in (instead of rebooting the Pi), does the problem manifest?

No, it still has the lease table, so even if I do it very quickly it's fine. It's only the short-term drop caused by a reboot, with the related loss of the lease table on an RPI4, that reveals the issue (and only when dnsmasq is not immediately up and responsive, because if it is, it will receive the DHCPREQUEST and it will work... I believe it's the DHCPDISCOVER that finds the tricky code path).

If you're interested, the spec (RFC 2131) says this about generating the DHCPOFFER in response to a DHCPDISCOVER:

When allocating a new address, servers SHOULD check that the offered network address is not already in use; e.g., the server may probe the offered address with an ICMP Echo Request.

I don't think either side of this interaction is strictly doing anything wrong, it's just that I would like dnsmasq to be smarter.

What about if you unplug the cable, reboot the Pi, and then plug the cable back in?