DHCP broken with snapshot

I compiled my own kernel for the mediatek/filogic snapshot just yesterday, Nov 16th. After updating my device I ran into an issue: DHCP doesn't hand out addresses to the devices in my LAN or in my guest VLAN. Only devices with a fixed IP address could access the WAN.
In the syslog I found this line:

 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 0.0.0.100 -- 0.0.0.199, lease time 12h

and later entries complaining that no addresses were available. I found this entry, but had no luck. Manually editing /var/etc/dnsmasq.conf.* (which contains the wrong values) did not help either: /etc/init.d/dnsmasq restart regenerates the file with the same wrong content.
Solution so far: revert to 23.05.0.
Because of a vacation trip I can't provide further details until Sunday evening, sorry.
Anyone here with the same issue?
Go

Before implicating snapshot in general, it would be good if you tried the snapshots available directly from the firmware-selector. This will help identify whether a bug was recently introduced or whether something was incorrectly configured in your build environment.

The other thing we need to do is to look at your config files to make sure that your network and DHCP settings are fully valid. A misconfiguration here could also cause this problem.

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:
Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/dhcp

Hello,

Thanks for the fast reply, I wasn't expecting that :wink:
As requested, here is the information from the running system:

root@bananapi:/tmp/etc# ubus call system board
{
        "kernel": "5.15.134",
        "hostname": "bananapi.lan",
        "system": "ARMv8 Processor rev 4",
        "model": "Bananapi BPI-R3",
        "board_name": "bananapi,bpi-r3",
        "rootfs_type": "squashfs",
        "release": {
                "distribution": "OpenWrt",
                "version": "23.05.0",
                "revision": "r23497-6637af95aa",
                "target": "mediatek/filogic",
                "description": "OpenWrt 23.05.0 r23497-6637af95aa"
        }
}

the relevant section of /etc/config/dhcp:

config dhcp 'lan'
	option interface 'lan'
	option leasetime '12h'
	option start '137'
	option force '1'
	option netmask 'x.y.z.192'
	list dhcp_option '6, x.y.z.129'
	list dhcp_option '42, x.y.z.129'
	option limit '52'
	option master '1'
	option ra 'hybrid'
	option dhcpv6 'hybrid'

config dhcp 'guest'
	option interface 'guest'
	option start '100'
	option leasetime '12h'
	option limit '100'
	option force '1'
	option netmask '255.255.255.0'
	list dhcp_option '6,8.8.8.8'

and excerpt from /etc/config/network

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'sfp2'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr 'x.y.z.129'
	option ip6ifaceid '::ff'
	option delegate '0'
	option force_link '0'
	option stp '1'
	option netmask '255.255.255.192'
	option bridge_empty '1'
	option broadcast 'x.y.z.191'
	option ip6assign '60'

config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'wan'

config device
	option name 'eth1'
	option macaddr '42:c2:cc:2d:e1:d5'

config interface 'guest'
	option proto 'static'
	option device 'br-lan.80'
	option ipaddr '192.168.88.1'
	option netmask '255.255.255.0'
	option force_link '0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1:t'
	list ports 'lan2:t'
	list ports 'lan3:u*'
	list ports 'lan4:t'
	list ports 'sfp2:u*'

config bridge-vlan
	option device 'br-lan'
	option vlan '80'
	list ports 'lan1:t'
	list ports 'lan2:t'
	list ports 'lan3:t'
	list ports 'lan4:t'
	list ports 'sfp2:t'

I have not altered the config files between updating to the recent snapshot and reverting to 23.05.
HTH
Go

You have over-redacted your configs...

Where you have x.y.z is (or should be) RFC1918 addresses, which do not need to be hidden (they're not sensitive or personally identifying information). Because you've redacted too much, it's impossible to see if there is an error here. Can you post again without redacting those sections?


I can't use the snapshots from the firmware selector, because I have enlarged my boot file system. Sysupgrade complains that the images do not match.

Go

Excuse me, nearly the whole world hides RFC1918 addresses, even in this forum, so I did too. But you are right.
/etc/config/dhcp

config dhcp 'lan'
	option interface 'lan'
	option leasetime '12h'
	option start '137'
	option force '1'
	option netmask '255.255.255.192'
	list dhcp_option '6, 10.200.137.129'
	list dhcp_option '42, 10.200.137.129'
	option limit '52'
	option master '1'
	option ra 'hybrid'
	option dhcpv6 'hybrid'

config dhcp 'guest'
	option interface 'guest'
	option start '100'
	option leasetime '12h'
	option limit '100'
	option force '1'
	option netmask '255.255.255.0'
	list dhcp_option '6,8.8.8.8'
	list dhcp_option '42, 10.200.137.129'

/etc/config/network

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	list ports 'sfp2'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '10.200.137.129'
	option ip6ifaceid '::ff'
	option delegate '0'
	option force_link '0'
	option stp '1'
	option netmask '255.255.255.192'
	option bridge_empty '1'
	option broadcast '10.200.137.191'
	option ip6assign '60'

config device
	option name 'br-wan'
	option type 'bridge'
	list ports 'eth1'
	list ports 'wan'

config device
	option name 'eth1'
	option macaddr '42:c2:cc:2d:e1:d5'

config interface 'guest'
	option proto 'static'
	option device 'br-lan.80'
	option ipaddr '192.168.88.1'
	option netmask '255.255.255.0'
	option force_link '0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1'
	list ports 'lan1:t'
	list ports 'lan2:t'
	list ports 'lan3:u*'
	list ports 'lan4:t'
	list ports 'sfp2:u*'

config bridge-vlan
	option device 'br-lan'
	option vlan '80'
	list ports 'lan1:t'
	list ports 'lan2:t'
	list ports 'lan3:t'
	list ports 'lan4:t'
	list ports 'sfp2:t'

So starting with your netmask... it's not necessary in the DHCP server section, but you have a /26 defined here. That has a max of 62 hosts in the subnet.

Your start is the offset from the network address, so that means the start address is already beyond the limit of the network scope and even of the octet in general (network is 10.200.137.128/26, +137 = 10.200.137.265).
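To make the offset arithmetic concrete, here is a rough shell sketch. It is not OpenWrt's ipcalc.sh; the helper names (ip2int, int2ip, dhcp_range) are made up for illustration, and it assumes the pool runs from network + start for limit addresses.

```sh
#!/bin/sh
# Illustrative sketch only; helper names are hypothetical, not part of OpenWrt.

# Dotted quad -> 32-bit integer. The subshell body keeps the IFS change local.
ip2int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# 32-bit integer -> dotted quad.
int2ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# dhcp_range IP NETMASK START LIMIT
# Prints the pool (network + START, LIMIT addresses long) and returns 1
# if any part of it falls outside the usable host range of the subnet.
dhcp_range() {
    mask=$(ip2int "$2")
    net=$(( $(ip2int "$1") & mask ))
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))
    first=$(( net + $3 ))
    last=$(( first + $4 - 1 ))
    if [ "$first" -le "$net" ] || [ "$last" -ge "$bcast" ]; then
        echo "outside subnet: $(int2ip "$first") -- $(int2ip "$last") (usable: $(int2ip $((net + 1))) -- $(int2ip $((bcast - 1))))"
        return 1
    fi
    echo "$(int2ip "$first") -- $(int2ip "$last")"
}

dhcp_range 10.200.137.129 255.255.255.192 9 52
dhcp_range 10.200.137.129 255.255.255.192 137 52 || true
```

With start 9 the pool stays inside the /26; with start 137 the first address lands past the broadcast address 10.200.137.191. How OpenWrt itself reacts to an out-of-range start is not modelled here.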

There are a few other things here:

You don't need to specify DHCP option 6 (DNS server) unless it differs from the router's address on the network. In this case it is the same, so this line can be omitted.

STP (option stp) shouldn't be here... this is a layer 3 interface definition; STP happens at layer 2.

Bridge statements (option bridge_empty) do not belong in the network interface stanzas at all; they should only be in bridge device statements.

While the broadcast address is correct, you can omit it because it is automatically calculated.

In general, I recommend using a /24 since it's just an easier range to work with (more intuitive). But, given that you probably had a reason to use a /26, it's certainly fine if you want to use this size network.

Here's what it should look like:

config dhcp 'lan'
	option interface 'lan'
	option leasetime '12h'
	option start '9'
	option force '1'
	list dhcp_option '42, 10.200.137.129'
	option limit '52'
	option ra 'hybrid'
	option dhcpv6 'hybrid'

config interface 'lan'
	option device 'br-lan.1'
	option proto 'static'
	option ipaddr '10.200.137.129'
	option ip6ifaceid '::ff'
	option delegate '0'
	option force_link '0'
	option netmask '255.255.255.192'
	option ip6assign '60'

Try that -- I expect it will work properly with these adjustments.


Thx for your recommendations. Just for clarification:

option start '9'

is wrong in this context, because the correct DHCP range goes from 10.200.137.129 to 10.200.137.191. Start has to be a value within this range.
The other values you mentioned are historical: I have been using OpenWrt for more than 10 years, and these settings date from then.
And you are right, this unusual DHCP range is necessary because it has to fit into my network plan at work (using VPN).
I will give it a try on Sunday. Luckily I have both images available and can boot the snapshot kernel again.
Go

No. It is the offset -- see the documentation:

start Specifies the offset from the network address of the underlying interface to calculate the minimum address that may be leased to clients. It may be greater than 255 to span subnets.

And over the past 10 years, syntax has changed several times. For example, bridges, as I described above, must be defined outside the network stanza. IIRC, in 18.06 and earlier, a bridge was actually defined within the network stanza. If you do that now, it will not work. It is a common mistake, though.

If you attempted to import a configuration from an old version (for example 18.06), you would find that the syntax is not compatible and it would not work properly (might even soft-brick). Technically, only one version back (n-1) is tested/supported for restoring an old config backup to the current version, and even that depends on some other context (i.e. major architectural changes will require a reset to defaults and configure from scratch).

Please make the changes I suggested.


I will do it and retest against the snapshot build I have.
I wonder how this faulty DHCP config ran without issues for so long.

Thx for all
Go

I think this is related to the incomplete guest network specifications in the dhcp config since the faulty range is 100 IPs.

Helpful to see the results of:

grep dhcp-range /var/etc/dnsmasq.conf.*

I'm not seeing any incomplete dhcp specifications for the guest network -- the only required options are the interface, start, limit, and leasetime. Those are all included and valid for a /24 (start = 100, limit = 100 --> DHCP range for the guest network is 192.168.88.100 - 192.168.88.199).

True, I was only on my first cup of coffee when I wrote that. But I still suspect the guest interface is being badly processed by all the recent ipcalc changes in main. Can't put my finger on it yet.

Hello,

As promised:
I just booted build 24414 (build number JFTR) again with the alterations suggested by psherman and my "production" lan works as expected. Thanks to all for looking.
But my guest lan keeps this weird error.

Fri Nov 17 11:42:34 2023 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 0.0.0.100 -- 0.0.0.199, lease time 12h
Fri Nov 17 11:42:34 2023 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 10.200.137.137 -- 10.200.137.188, lease time 12h

Even if I change the offset just for a try:

Sun Nov 19 17:10:58 2023 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 0.0.0.50 -- 0.0.0.149, lease time 12h

Reverting back to 23.05.0, both ranges are working

Sun Nov 19 17:11:02 2023 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 192.168.88.50 -- 192.168.88.149, lease time 12h
Sun Nov 19 17:11:02 2023 daemon.info dnsmasq-dhcp[1]: DHCP, IP range 10.200.137.137 -- 10.200.137.188, lease time 12h

And indeed, there are alterations in /etc/init.d/dnsmasq between the builds I tested, but I can't dive deep into the changes:

gotthard@Deskmini:~$ diff dnsmasq.2305 dnsmasq.24414
512d511
<
542a542,544
>       ipaddr="${subnet%%/*}"
>       prefix_or_netmask="${subnet##*/}"
>
544c546,548
<       config_get netmask "$cfg" netmask "${subnet##*/}"
---
>       config_get netmask "$cfg" netmask
>
>       [ -n "$netmask" ] && prefix_or_netmask="$netmask"
569a574
>       config_get dns_sl "$cfg" domain
585,588d589
<       if [ "$limit" -gt 0 ] ; then
<               limit=$((limit-1))
<       fi
<
590c591
<       if [ "$dhcpv4" != "disabled" ] && eval "$(ipcalc.sh "${subnet%%/*}" "$netmask" "$start" "$limit")" ; then
---
>       if [ "$dhcpv4" != "disabled" ] && ipcalc "$ipaddr/$prefix_or_netmask" "$start" "$limit" ; then
652a654,660
>
>               if [ -n "$dns_sl" ]; then
>                       ddssl=""
>                       for dd in $dns_sl; do append ddssl "$dd" ","; done
>               fi
>
>               dhcp_option_append "option6:domain-search,$ddssl" "$networkid"

I'm just compiling the current build and will then try again.

Go

Try removing the netmask line from the guest dhcp server, too.

Ah, I didn't see this line in /etc/config/dhcp, and you are right: in LuCI there is no field to enter the netmask under Network / Interfaces / DHCP. Old habits die hard...

So is something out of sync right now between ipcalc.sh and dnsmasq.init? @pprindeville @yogo1212 @vgaetera ?

Overriding netmask in the dhcp pool config is valid.

./ipcalc.sh 192.168.88.1/255.255.255.0 100 100
IP=192.168.88.1
NETMASK=0.0.0.0
NETWORK=0.0.0.0
BROADCAST=0.0.0.255
PREFIX=16
START=0.0.0.100
END=0.0.0.199

You've got the syntax slightly wrong. The / can be used for CIDR notation (i.e. 192.168.88.1/24). But if you are using subnet mask notation, you should use a space instead of a slash (192.168.88.1 255.255.255.0)
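As a stopgap illustration, a dotted netmask after the slash can be rewritten to a prefix length before it is handed to a tool that only accepts CIDR after a slash. This is only a sketch in plain sh; mask2prefix and normalize_cidr are hypothetical helper names, not functions that exist in OpenWrt.

```sh
#!/bin/sh
# Illustrative sketch only; helper names are hypothetical, not part of OpenWrt.

# mask2prefix 255.255.255.192 -> 26
# Fails (status 1) for anything that is not a contiguous netmask.
mask2prefix() (
    IFS=.
    set -- $1
    [ $# -eq 4 ] || { echo "invalid netmask" >&2; exit 1; }
    prefix=0
    ones_ended=0
    for octet in "$@"; do
        case $octet in
            255) bits=8 ;;
            254) bits=7 ;;
            252) bits=6 ;;
            248) bits=5 ;;
            240) bits=4 ;;
            224) bits=3 ;;
            192) bits=2 ;;
            128) bits=1 ;;
            0)   bits=0 ;;
            *) echo "invalid netmask octet: $octet" >&2; exit 1 ;;
        esac
        # After the first octet that is not 255, only 0 may follow.
        if [ "$ones_ended" -eq 1 ] && [ "$bits" -ne 0 ]; then
            echo "non-contiguous netmask" >&2
            exit 1
        fi
        if [ "$bits" -lt 8 ]; then
            ones_ended=1
        fi
        prefix=$((prefix + bits))
    done
    echo "$prefix"
)

# normalize_cidr 192.168.88.1/255.255.255.0 -> 192.168.88.1/24
# Input that already uses a prefix length, or has no slash, passes through.
normalize_cidr() {
    case $1 in
        */*.*) ;;
        *) echo "$1"; return 0 ;;
    esac
    suffix=$(mask2prefix "${1##*/}") || return 1
    echo "${1%%/*}/$suffix"
}

normalize_cidr 192.168.88.1/255.255.255.0
```

The normalized form is the CIDR notation that the ipcalc.sh runs in this thread handle correctly.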

root@openwrt:~# ipcalc.sh 192.168.88.1/24 100 100
IP=192.168.88.1
NETMASK=255.255.255.0
BROADCAST=192.168.88.255
NETWORK=192.168.88.0
PREFIX=24
START=192.168.88.100
END=192.168.88.200

root@openwrt:~# ipcalc.sh 192.168.88.1 255.255.255.0 100 100
IP=192.168.88.1
NETMASK=255.255.255.0
BROADCAST=192.168.88.255
NETWORK=192.168.88.0
PREFIX=24
START=192.168.88.100
END=192.168.88.200

Understood, but dnsmasq.init is currently invoking ipcalc with the slash format, so something is inconsistent at the moment in main. A lot of changes made recently…
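For reference, the lines from the diff above can be replayed in isolation to see which argument ends up being passed to ipcalc. This is a sketch: build_ipcalc_arg is a made-up wrapper, and it assumes the init script receives the subnet in address/prefix form (for example 192.168.88.1/24).

```sh
#!/bin/sh
# Illustrative sketch only; build_ipcalc_arg is a hypothetical wrapper around
# the three assignments shown in the diff of /etc/init.d/dnsmasq.

# build_ipcalc_arg SUBNET NETMASK
#   SUBNET  - assumed to look like "192.168.88.1/24"
#   NETMASK - value of 'option netmask' in the dhcp section, may be empty
build_ipcalc_arg() {
    subnet=$1
    netmask=$2

    ipaddr="${subnet%%/*}"             # strip everything from the first "/"
    prefix_or_netmask="${subnet##*/}"  # keep everything after the last "/"

    # An explicit netmask in the dhcp section replaces the prefix length.
    [ -n "$netmask" ] && prefix_or_netmask="$netmask"

    echo "$ipaddr/$prefix_or_netmask"
}

# guest pool with 'option netmask' still set
build_ipcalc_arg 192.168.88.1/24 255.255.255.0
# lan pool after the netmask option was removed
build_ipcalc_arg 10.200.137.129/26 ""
```

The first call yields the address/dotted-netmask form that produced the 0.0.0.x range in the ipcalc.sh run above, while the second yields plain CIDR. That would be consistent with the lan pool working once its netmask option was removed while the guest pool, which still had one, did not.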

This commit claims ipcalc understands the syntax, but doesn’t seem to.

Edit: I guess the above commit was premature since this one is still pending.

Interesting find... I'm not the one who can comment on the details of the code implementation, but it certainly does look like the ipaddr/dotted-netmask ... notation is indeed not working.