OpenWrt seemingly blocking DHCP after a while

jxp · February 14, 2025, 8:46pm

I have a recurring issue and I'd like some input on which logs to collect and which troubleshooting steps to take the next time it happens.

My OpenWrt 23.05 (same issue on 23.03 previously) is running on a Ubiquiti EdgeRouter X (ER-X) as a managed switch. I have one VLAN for guest Wi-Fi (1010) and one for everything else (1001). VLAN 1001 is only tagged between the switch and the router (trunk interface), and the guest Wi-Fi VLAN is tagged starting at the APs.

Most of the time, the issue is that DHCP responses from the router are not properly forwarded to the clients, so leases expire. Once, I also noticed that ping was failing between specific IPs (not all IPs).

Rebooting the managed switch has solved the issue every single time so far.

The DHCP symptom mostly hits me after rebooting the router (not the OpenWrt device; it's an OPNsense machine): as long as the DHCP leases are valid, there is no issue, but when the router restarts, the leases have to be renewed, and that is when the problem starts to have an impact.

My problem is, when it happens, it's mostly urgent to restore internet access and I don't have time to troubleshoot, so I just reboot the switch and everything is fine again.

Now, for the next router reboot, I intend to pick a time out of "business" hours (it's my home network) and have time to troubleshoot it properly.

Hence, my ask: what should I first check when looking into an issue that goes away when rebooting?

So far I intended to:

tcpdump -i any port 67 or port 68 to check where DHCP packets come in and which ones are not forwarded
check cpu/ram with htop
ping left and right

Any further recommendations?

psherman · February 14, 2025, 9:20pm

Let's start by reviewing your topology...

Can you draw a network diagram to show how things are connected? Please be sure to indicate each major piece of infrastructure equipment (router, switches, APs) complete with their addresses, brand+model, and what firmware they are using.
Let's see your config, starting with the main router:

Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button:

Remember to redact passwords, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
cat /etc/config/firewall

Marcelo.Barros · February 14, 2025, 9:31pm

Hi.... Maybe I can be wrong...

Can you check time and date synchronization on all devices?

I had a problem with dhcp in a device (not openwrt) and to fix it I did:

force ntp
force dhcp renew to 3hours... yes... broken.

psherman · February 14, 2025, 9:35pm

NTP has nothing to do with DHCP... not sure why you had to force it.

DHCP lease renewal times are based on the actual lease time. The standard specifies that the client should attempt to renew the lease at 50% of the lease time, and then again at 87.5% and again at 100% (assuming the previous ones fail). So for a 12h lease, that would be at 6h, 10.5h, and 12h respectively.
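The renewal points described above work out as follows for the 12h lease in the example. This is a minimal sketch in plain shell arithmetic; `LEASE_SECONDS` is an illustrative variable, not something read from dnsmasq or from a client.

```shell
#!/bin/sh
# Renewal schedule for a DHCP lease, using the 50% / 87.5% / 100% points.
# LEASE_SECONDS is an illustrative value (12h), not read from any device.
LEASE_SECONDS=$((12 * 3600))

T1=$((LEASE_SECONDS / 2))        # first renewal attempt (50%)
T2=$((LEASE_SECONDS * 7 / 8))    # second attempt (87.5%)
EXPIRY=$LEASE_SECONDS            # lease expires (100%)

# Print each point in seconds and in minutes after the lease was granted.
for point in "$T1" "$T2" "$EXPIRY"; do
	echo "$point s = $((point / 60)) min"
done
```

The three lines printed correspond to 6h, 10.5h and 12h after the lease was granted.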

Your force renewal at 3h isn't really relevant, nor does it explain where the problem is.

Marcelo.Barros · February 14, 2025, 9:36pm

I wrote... I can be wrong... but it solved my issue with another network vendor.

Who knows what they are doing with Linux in their "router" devices?

You, as a developer, know what is correct and know how OpenWrt should work to do the job 20000%.

But... the problem is the others!!
Another day... we discovered the hard way... that the developer used tinydns inside his device, and not bind.

So... again... I can be wrong... but... who knows what the other unknown developers are doing with Linux?

psherman · February 14, 2025, 9:41pm

Let's wait until we see the configs and can understand the OP's network sufficiently as to be able to resolve the primary issue. After all, a non-OpenWrt process is unlikely to work, and there aren't any known issues that would cause DHCP to fail in the way described.... this means it's likely a configuration issue somewhere in the OP's network.

jxp · February 15, 2025, 10:39am

Many thanks for all the replies, I will try to answer all questions. I am really a noob in openwrt and I am "misusing" it as a managed switch, and although I do have some background on networking, I definitely might have configured something wrong.

Diagram

I refreshed one diagram which I mostly use to remember the physical setup, as I have different physical mediums. I hope it's not too confusing, as it probably has a lot of superfluous information. The orange line can be considered a broadcast medium: it is a unicable coaxial setup, the G.hn adapters are in a full mesh (and they provide PoE to the APs).

The problematic openwrt switch is the one running on Ubiquiti EdgeRouter X hardware, in the red box, next to the router.

Software versions:

Synology DS918+: 7.2.2-72806 Update 3
OPNsense: 25.1.1
OpenWrt (Ubiquiti EdgeRouter X): 23.05.5
OpenWrt (TL-WDR23600): 23.05.3 (a bit apprehensive about upgrading this one, it uses extroot)
UniFi Network/Controller: 8.6.9
UniFi APs: 6.6.77
Fritz!Box: Fritz!OS 8.00
Home Assistant OS: 14.2
Home Assistant Core: 2025.2.4
GigaCopper G4202TCP: dcp962c_v1_x-HN SPIRIT.v7_12_r877+10_cvs R75
GigaCopper G4202T (phone cable): dcp962p_v1_x-HN SPIRIT.v7_12_r877+10_cvs R73

Wi-Fi Clients: iPhones, Macs, Windows, Android, Ubuntu, I have a mix of stuff. I can try to get the software versions but since they are all impacted by the issue, I don't think their software version is relevant.

Config of problematic openwrt switch:

`ubus call system board`

{
	"kernel": "5.15.167",
	"hostname": "erx",
	"system": "MediaTek MT7621 ver:1 eco:3",
	"model": "Ubiquiti EdgeRouter X",
	"board_name": "ubnt,edgerouter-x",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "23.05.5",
		"revision": "r24106-10cc5fcd00",
		"target": "ramips/mt7621",
		"description": "OpenWrt 23.05.5 r24106-10cc5fcd00"
	}
}

/etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd68:c0fd:f8dc::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'

config interface 'lan'
	option device 'br-lan.1001'
	option proto 'dhcp'
	option delegate '0'

config interface 'mgmt'
	option proto 'static'
	option device 'mgmt'
	option ipaddr '192.168.200.1'
	option broadcast '192.168.200.255'
	list ip6addr 'fd00:200::'
	option ip6gw 'fd00:200::'
	option netmask '255.255.255.0'

config device
	option name 'eth4'
	option acceptlocal '1'

config device
	option type 'bridge'
	option name 'mgmt'
	list ports 'eth4'

config bridge-vlan
	option device 'br-lan'
	option vlan '1001'
	list ports 'eth0:u*'
	list ports 'eth1:u*'
	list ports 'eth2:u*'
	list ports 'eth3:t'

config interface 'guestwifi'
	option proto 'none'
	option device 'br-lan.1010'

config interface 'lan6'
	option proto 'dhcpv6'
	option device 'br-lan.1001'
	option reqaddress 'try'
	option reqprefix 'auto'

config bridge-vlan
	option device 'br-lan'
	option vlan '1010'
	list ports 'eth0:t'
	list ports 'eth1:t'
	list ports 'eth2:t'
	list ports 'eth3:t'

config device
	option name 'br-lan.1010'
	option type '8021q'
	option ifname 'br-lan'
	option vid '1010'
	option ipv6 '1'

config device
	option name 'br-lan.1001'
	option type '8021q'
	option ifname 'br-lan'
	option vid '1001'
	option ipv6 '1'

/etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'
	list interface 'mgmt'
	list notinterface 'guestwifi'
	list notinterface 'lan'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option dhcpv6 'relay'
	option ra 'relay'
	option master '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

config dhcp 'mgmt'
	option interface 'mgmt'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option force '1'
	option dhcpv6 'server'

/etc/config/firewall

config defaults
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'
	option drop_invalid '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'
	list network 'lan6'

config zone
	option name 'mgmt'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	list network 'mgmt'

config forwarding
	option src 'mgmt'
	option dest 'lan'

Time

I checked the time on the openwrt box using date, it is correct (but to be fair, I do not have the issue right now).

psherman · February 15, 2025, 4:24pm

A few things you can change:

Delete these -- there's no need for a bridge on a single port, and typically only one bridge is allowed on a single switch chip, so this could be causing problems:

Then, edit your mgmt interface to use eth4 directly:

config interface 'mgmt'
	option proto 'static'
	option device 'eth4'
	option ipaddr '192.168.200.1'
	option broadcast '192.168.200.255'
	list ip6addr 'fd00:200::'
	option ip6gw 'fd00:200::'
	option netmask '255.255.255.0'

Delete lan6:

And delete the 802.1q stanzas. They're not necessary because the bridge-vlan constructs create the .1q devices automatically:

jxp:

config device
	option name 'br-lan.1010'
	option type '8021q'
	option ifname 'br-lan'
	option vid '1010'
	option ipv6 '1'

config device
	option name 'br-lan.1001'
	option type '8021q'
	option ifname 'br-lan'
	option vid '1001'
	option ipv6 '1'

Delete these lines from the DHCP file:

Edit the lan DHCP server to specifically disable it... it should look like this:

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option ignore '1'

Don't force the mgmt DHCP server (it's not needed, and in the few cases where it seems necessary, it's usually a sign of bigger issues).... remove the force line:

Remove lan6 and add mgmt to the lan firewall zone like this:

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'
	list network 'mgmt'

Delete the rest:

Reboot and test again.

mk24 · February 15, 2025, 8:10pm

It looks like your intention is to have everything inside the house on the same lan vlan but using a mix of Ethernet and coax bridges.

Two dhcp servers on the network will cause problems. The main router here should be the only DHCP server. Check other devices that may contain a DHCP service as well.

jxp · February 15, 2025, 8:54pm

Many thanks for this, I am working through the changes, but not all at the same time. I removed the bridge on eth4 and now I have a question about lan6:

If I delete it, I lose the DHCPv6 assignment on the switch, or did I miss something? I know it might sound slightly obsessive, but I prefer my devices to have an IPv6 address.

You mean that in the few cases where I will have to plug directly into the mgmt port, I can configure an IP manually on my laptop, rather than relying on DHCP?

psherman · February 15, 2025, 8:56pm

In that case, feel free to leave it. It should use device eth4 directly.

No, the dhcp server will be active unless another one is detected. Forcing can cause problems if another dhcp server is on the same network. So it is best to avoid using the force option.

jxp · February 15, 2025, 9:43pm

I’m starting to think I really completely misunderstood how to configure DHCP on openwrt and it’s a wonder my setup works… I didn’t intend for lan6 to use eth4, I simply wanted the switch to acquire an IPv6 from the router, to act as a DHCPv6 client on the main lan bridge (vlan 1001)

I also do not want at all a second DHCP server on my network, except on the mgmt interface (eth4) and solely there. It should never distribute IPs anywhere outside of eth4.

And thanks for bearing with me here.

psherman · February 15, 2025, 10:38pm

Sorry... my mistake. You can set it up as you had previously:

That is why I recommended explicitly disabling the lan DHCP server -- this ensures that you will not have a second DHCP server (at least from OpenWrt) active on the network.

jxp · February 16, 2025, 4:17am

thanks, I had not noticed the DHCP server was enabled on lan, I thought I had disabled it.

configs now look like below. I kept lan6 because I want openwrt to be a DHCPv6 client and acquire an IPv6 from the main router. So far, looking at packet captures on a laptop using (port 67 or port 68) or (port 546 or port 547) capture filter, I see traffic going back and forth when renewing, so it looks good. I'll keep monitoring to see if the issue happens again.

Many thanks for the help, highly appreciate you taking the time to explain openwrt to me.

My main lesson here, I think, is that the GUI is not always the most straightforward option; I should look at the config files.

/etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd68:c0fd:f8dc::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'

config interface 'lan'
	option device 'br-lan.1001'
	option proto 'dhcp'
	option delegate '0'

config interface 'mgmt'
	option proto 'static'
	option device 'eth4'
	option ipaddr '192.168.200.1'
	option broadcast '192.168.200.255'
	list ip6addr 'fd00:200::'
	option ip6gw 'fd00:200::'
	option netmask '255.255.255.0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1001'
	list ports 'eth0:u*'
	list ports 'eth1:u*'
	list ports 'eth2:u*'
	list ports 'eth3:t'

config interface 'guestwifi'
	option proto 'none'
	option device 'br-lan.1010'

config interface 'lan6'
	option proto 'dhcpv6'
	option device 'br-lan.1001'
	option reqaddress 'try'
	option reqprefix 'auto'

config bridge-vlan
	option device 'br-lan'
	option vlan '1010'
	list ports 'eth0:t'
	list ports 'eth1:t'
	list ports 'eth2:t'
	list ports 'eth3:t'

/etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'
	list interface 'mgmt'
	list notinterface 'guestwifi'
	list notinterface 'lan'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option ra 'relay'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

config dhcp 'mgmt'
	option interface 'mgmt'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv6 'server'
	list dns 'fd00:1::1'
	list dns '2a02:8106:65:8400::1'

root@erx:~# cat /etc/config/firewall

config defaults
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'
	option drop_invalid '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	list network 'lan'
	list network 'lan6'
	list network 'mgmt'

psherman · February 17, 2025, 12:06am

These can be deleted:

Otherwise, everything looks okay.

jxp · February 20, 2025, 7:53am

Unfortunately, the issue happened again, and I again did not have time to troubleshoot it.

Same symptoms: a specific device is not getting an IP via DHCP anymore, rebooting this openwrt machine immediately fixes the issue.

I also just removed (after rebooting) the three lines mentioned above from /etc/config/dhcp and after /etc/init.d/dnsmasq restart I got udhcpc: no lease, failing.

Config is now, just for reference:

/etc/config/network

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd68:c0fd:f8dc::/48'
	option packet_steering '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'

config interface 'lan'
	option device 'br-lan.1001'
	option proto 'dhcp'
	option delegate '0'

config interface 'mgmt'
	option proto 'static'
	option device 'eth4'
	option ipaddr '192.168.200.1'
	option broadcast '192.168.200.255'
	list ip6addr 'fd00:200::'
	option ip6gw 'fd00:200::'
	option netmask '255.255.255.0'

config bridge-vlan
	option device 'br-lan'
	option vlan '1001'
	list ports 'eth0:u*'
	list ports 'eth1:u*'
	list ports 'eth2:u*'
	list ports 'eth3:t'

config interface 'guestwifi'
	option proto 'none'
	option device 'br-lan.1010'

config interface 'lan6'
	option proto 'dhcpv6'
	option device 'br-lan.1001'
	option reqaddress 'try'
	option reqprefix 'auto'

config bridge-vlan
	option device 'br-lan'
	option vlan '1010'
	list ports 'eth0:t'
	list ports 'eth1:t'
	list ports 'eth2:t'
	list ports 'eth3:t'

/etc/init.d/firewall

#!/bin/sh /etc/rc.common

START=19
USE_PROCD=1
QUIET=""

service_triggers() {
	procd_add_reload_trigger firewall
}

restart() {
	fw4 restart
}

start_service() {
	fw4 ${QUIET} start
}

stop_service() {
	fw4 flush
}

reload_service() {
	fw4 reload
}

boot() {
	# Be silent on boot, firewall might be started by hotplug already,
	# so don't complain in syslog.
	QUIET=-q
	start
}

/etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option cachesize '1000'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.d/resolv.conf.auto'
	option localservice '1'
	option ednspacket_max '1232'

config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv4 'server'
	option ra 'relay'
	option ignore '1'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

config dhcp 'mgmt'
	option interface 'mgmt'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dhcpv6 'server'
	list dns 'fd00:1::1'
	list dns '2a02:8106:65:8400::1'

The switch still has an IP, but does that mean it's unable to get an IP from the main router (which acts as the DHCP server)?

What seems more likely to be the root issue though is that it appears there is some time sync issue:

Sun Feb 16 04:59:29 2025 user.notice firewall: Reloading firewall due to ifup of lan (br-lan.1001)
Thu Feb 20 08:08:03 2025 cron.err crond[1506]: time disparity of 5948 minutes detected

(the first line can be ignored, it's just to show that there are no other logs the 4 days before that and also, interestingly, the difference between the two timestamps is 5949 minutes.)
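The gap between the two log lines can be checked with plain shell arithmetic. This is a sketch that assumes both timestamps are in the same time zone with no DST change in between; the variable names are illustrative.

```shell
#!/bin/sh
# Elapsed time between the two log lines quoted above.
# Assumes both timestamps share one time zone (no DST change in between).

# Seconds since midnight for each timestamp.
first=$((4 * 3600 + 59 * 60 + 29))    # Sun Feb 16 04:59:29
second=$((8 * 3600 + 8 * 60 + 3))     # Thu Feb 20 08:08:03
days=4                                # Feb 16 -> Feb 20

elapsed=$((days * 86400 + second - first))
minutes=$((elapsed / 60))
seconds=$((elapsed % 60))
echo "elapsed: ${minutes} min ${seconds} s"   # prints "elapsed: 5948 min 34 s"
```

So the 5948 minutes reported by crond and the 5949 computed by hand agree to within rounding.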

but the time appears to be now correct on openwrt:

root@erx:~# date
Thu Feb 20 08:22:14 CET 2025

I found multiple issues related to the clock sync of the mt7621 chipset, but so far I can't find one that really matches my issue well enough.

I could upgrade to 24.10 and see if it helps, but that's really a shot in the dark then.

[edit] couple more commands that might help understanding what is happening.

root@erx:~# hwclock -r
hwclock: can't open '/dev/misc/rtc': No such file or directory
root@erx:~# ps | grep ntp
 2078 root      2912 S    {ntpd} /sbin/ujail -t 5 -n ntpd -U ntp -G ntp -C /etc/capabilities/ntpd.json -c -u -r /bin/ubus -r /usr/bin/
 2093 ntp       1376 S    /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.open
 3609 root      1376 R    grep ntp
root@erx:~# 
root@erx:~# /etc/init.d/sysntpd status
running

psherman · February 20, 2025, 7:33pm

To clarify, is this just a single device?
What about other devices?
Which network is this relating to (lan, guest)?
Does the device in question have a DHCP reservation set for it (on the main router)?
Does the problem resolve when you reboot the OpenWrt device?
What happens if you restart the main router? (instead of the OpenWrt device)

This is expected. This is from the mgmt interface verifying that there is no other DHCP server on that network before it starts its own.

Time sync doesn't matter for DHCP, especially on a switch/bridged-AP which is supposed to be effectively transparent.

jxp · February 21, 2025, 8:19am

It is a single device each time, but always a different one. It seems to me it is whichever has its DHCP lease expiring first.

Other devices will have an IP (for now?) at the time the one device is impacted.

This is lan network.

Yes, DHCP reservation is set on the main router.

yes, immediately and every time so far (had the issue 4 times so far). It appears to me, this started shortly after I introduced vlans on the openwrt device, but I am not sure.

I have noticed at least once where rebooting the main router triggered the issue, then rebooting openwrt fixed it. I believe (it was two or three months ago) that at the time I was recabling some things and hence the main router stayed down or disconnected for half an hour something like that.

But in general, I would say that rebooting the main router (without any down time – immediate reboot) does not trigger the issue. I regularly upgrade the main router, it requires a reboot about half the time I would say, hence I am quite certain of this.

I hope next time the issue happens I have some time to troubleshoot, at least do a quick tcpdump -i any port 67 or port 68 on openwrt.

jxp · March 11, 2025, 7:47am

So, the issue finally happened at a time when I could troubleshoot. For context, this is happening on 23.03, 23.05 and now on 24.10.

I ran:
tcpdump -i eth0 port 67 or port 68 -w /tmp/eth0_capture.pcap &
tcpdump -i eth1 port 67 or port 68 -w /tmp/eth1_capture.pcap &
tcpdump -i eth2 port 67 or port 68 -w /tmp/eth2_capture.pcap &
tcpdump -i eth3 port 67 or port 68 -w /tmp/eth3_capture.pcap &
tcpdump -i br-lan.1001 port 67 or port 68 -w /tmp/br-lan1001_capture.pcap &
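The same set of captures could be generated from a short loop rather than typed by hand. This is only a sketch, assuming the interface names and /tmp paths from the commands above; `capture_cmd` is a hypothetical helper name, and it prints the commands for review instead of running them.

```shell
#!/bin/sh
# Build one DHCP capture command per interface.
# capture_cmd is a hypothetical helper; it prints the command, it does not run it.
capture_cmd() {
	iface=$1
	# Drop the dot from VLAN device names so br-lan.1001 becomes br-lan1001.
	name=$(printf '%s' "$iface" | tr -d '.')
	printf 'tcpdump -i %s port 67 or port 68 -w /tmp/%s_capture.pcap\n' "$iface" "$name"
}

# Interface list taken from the commands above.
for iface in eth0 eth1 eth2 eth3 br-lan.1001; do
	capture_cmd "$iface"
done
```

Each printed line can be pasted into the shell with a trailing `&`, as above. When reading a capture back, `tcpdump -e -n -r /tmp/eth3_capture.pcap` prints the link-level header, which should make the 802.1Q tag visible on tagged frames.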

Client request is coming in via eth1, untagged, openwrt broadcasts it out all interfaces. Router is connected over eth3, vlan 1001.

DHCP requests are successfully forwarded to the router over eth3/br-lan.1001, and I see the Offer and ACK sent back by the router, but OpenWrt forwards neither the Offer nor the ACK back to eth1 (or to any other interface).

Rebooting openwrt fixed the issue, as usual.

One trigger for this issue, I noticed, is rebooting the home router. Now, when I upgrade the home router, I have simply made a habit of rebooting the OpenWrt managed switch afterward as well. It's annoying, though, because it often means I need to go to the basement and power-cycle it manually, as local network connectivity is lost.

Any pointers what I could check next time the issue occurs?

jxp · March 11, 2025, 8:03am

Forgot to mention, the syslog looks interesting: Tue Mar 11 00:21:22 2025 daemon.warn dnsmasq-dhcp[1]: no address range available for DHCP request via br-lan.1001

Seems like something happened at 00:21 last night on the switch somehow, there's a huge amount of lines – but the syslog does not go back far enough to see the beginning. I will do some searching using the above error.

[edit] more interesting lines in syslog from last night, seems like a crash and device was in an unstable state after coming back up. The very first line is the "linux version" line, there's nothing before it, but that looks like a crash to me:

Tue Mar 11 00:21:05 2025 kern.notice kernel: [    0.000000] Linux version 5.15.167 (builder@buildhost) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 12.3.0 r24106-10cc5fcd00) 12.3.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Mon Sep 23 12:34:46 2024
[...]
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: child pid 2485 exited with code 127.
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: TC: tc-qos-helper.sh exited with code 127. Disabling it.
[...]
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: '/usr/lib/netdata/plugins.d/perf.plugin' (pid 2491) disconnected after 0 successful data collections (ENDs).
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: child pid 2491 exited with code 1.
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: '/usr/lib/netdata/plugins.d/perf.plugin' (pid 2491) exited with error code 1 and haven't collected any data. Disabling it.
Tue Mar 11 00:21:15 2025 daemon.info netdata[2193]: thread with task id 2489 finished
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: '/usr/lib/netdata/plugins.d/ioping.plugin' (pid 2490) disconnected after 0 successful data collections (ENDs).
Tue Mar 11 00:21:15 2025 daemon.info netdata[2193]: Initializing file /var/cache/netdata/netdata.statsd_packets/main.db.
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: child pid 2490 exited with code 127.
Tue Mar 11 00:21:15 2025 daemon.err netdata[2193]: '/usr/lib/netdata/plugins.d/ioping.plugin' (pid 2490) exited with error code 127 and haven't collected any data. Disabling it.