DHCP not working after installing bind/named

Hello all!

I recently installed the bind/named DNS server on my Mikrotik hEX S to use with the adblock service. After some trouble with the adblock startup because of insufficient RAM, some thrashing, autoreboots and trouble with setting up swap with a swapfile on an SD card, I got it to run.

The DNS server and adblock are now working correctly, without rebooting the router due to low memory, but now my DHCP server no longer works :frowning: .

The rest of the router's networking is running fine: if I set up static IPs on the devices, everything works perfectly, but I cannot get the DHCP server to work anymore :confused: .

As per the wiki instructions, I disabled the DNS of dnsmasq by setting it to port 0, but the DHCP server now stopped working.
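For reference, the port change described above could be applied from the shell roughly as follows. This is only a minimal sketch: the function name `disable_dnsmasq_dns` is made up for illustration, and the section name `dhcp.@dnsmasq[0]` is taken from the uci output further down. With port 0, dnsmasq stops answering DNS queries but should still serve DHCP and TFTP.

```shell
#!/bin/sh
# Hypothetical helper: turn off the DNS part of dnsmasq by setting its
# listening port to 0, leaving its DHCP/TFTP functions enabled.
disable_dnsmasq_dns() {
    # Stage the change in the first dnsmasq section of /etc/config/dhcp ...
    uci set 'dhcp.@dnsmasq[0].port=0' &&
    # ... and write it to flash.
    uci commit dhcp
}

# Example call on the router, followed by a service restart:
# disable_dnsmasq_dns && /etc/init.d/dnsmasq restart
```

The change only takes effect once dnsmasq has been restarted.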

Below is the uci output for DHCP (without my static leases) and for networking (without the MAC addresses and the IP and keys for the WireGuard config):

Networking:

network.loopback=interface
network.loopback.device='lo'
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.globals=globals
network.globals.ula_prefix='feed::/48'
network.globals.packet_steering='1'
network.@device[0]=device
network.@device[0].name='br-lan'
network.@device[0].type='bridge'
network.@device[0].ports='lan2' 'lan3' 'lan4' 'lan5'
network.@device[1]=device
network.@device[1].name='lan2'
network.@device[1].macaddr='XX:XX:XX:XX:XX:XX'
network.@device[2]=device
network.@device[2].name='lan3'
network.@device[2].macaddr='XX:XX:XX:XX:XX:XX'
network.@device[2].ipv6='1'
network.@device[2].multicast_router='1'
network.@device[3]=device
network.@device[3].name='lan4'
network.@device[3].macaddr='XX:XX:XX:XX:XX:XX'
network.@device[3].ipv6='1'
network.@device[3].multicast_router='1'
network.@device[4]=device
network.@device[4].name='lan5'
network.@device[4].macaddr='XX:XX:XX:XX:XX:XX'
network.@device[4].ipv6='1'
network.@device[4].multicast_router='1'
network.lan=interface
network.lan.device='br-lan'
network.lan.proto='static'
network.lan.ipv6='1'
network.lan.ip6assign='65'
network.lan.ipaddr='192.168.3.1/25'
network.lan.ip6addr='feed::cafe/96'
network.lan.broadcast='192.168.3.127'
network.lan.ip6ifaceid='::cafe'
network.lan.gateway='192.168.0.1'
network.lan.dns_search='home'
network.lan.ip6class='local'
network.@device[5]=device
network.@device[5].name='br-wan'
network.@device[5].type='bridge'
network.@device[5].ports='wan' 'sfp'
network.@device[6]=device
network.@device[6].name='wan'
network.@device[6].macaddr='XX:XX:XX:XX:XX:XX'
network.@device[6].ipv6='1'
network.@device[6].multicast_router='1'
network.@device[7]=device
network.@device[7].name='sfp'
network.@device[7].macaddr='XX:XX:XX:XX:XX:XX'
network.wan=interface
network.wan.device='br-wan'
network.wan.proto='dhcp'
network.wan.delegate='0'
network.wan6=interface
network.wan6.device='br-wan'
network.wan6.proto='dhcpv6'
network.wan6.reqaddress='try'
network.wan6.reqprefix='64'
network.wan6.ip6assign='64'
network.wan6.ip6ifaceid='::cafe:0'
network.wg0=interface
network.wg0.proto='wireguard'
network.wg0.private_key='<redacted>'
network.wg0.delegate='0'
network.wg0.addresses='192.168.3.190/25'
network.@wireguard_wg0[0]=wireguard_wg0
network.@wireguard_wg0[0].description='wire_guard'
network.@wireguard_wg0[0].public_key='<redacted>'
network.@wireguard_wg0[0].route_allowed_ips='1'
network.@wireguard_wg0[0].endpoint_host='<redacted>'
network.@wireguard_wg0[0].endpoint_port='1337'
network.@wireguard_wg0[0].allowed_ips='192.168.3.129/25'
network.@wireguard_wg0[0].persistent_keepalive='16'

DHCP:

dhcp.@dnsmasq[0]=dnsmasq
dhcp.@dnsmasq[0].domainneeded='1'
dhcp.@dnsmasq[0].localise_queries='1'
dhcp.@dnsmasq[0].rebind_protection='1'
dhcp.@dnsmasq[0].rebind_localhost='1'
dhcp.@dnsmasq[0].local='/home/'
dhcp.@dnsmasq[0].domain='home'
dhcp.@dnsmasq[0].expandhosts='1'
dhcp.@dnsmasq[0].cachesize='10000'
dhcp.@dnsmasq[0].readethers='1'
dhcp.@dnsmasq[0].leasefile='/tmp/dhcp.leases'
dhcp.@dnsmasq[0].resolvfile='/tmp/resolv.conf.d/resolv.conf.auto'
dhcp.@dnsmasq[0].localservice='1'
dhcp.@dnsmasq[0].ednspacket_max='1232'
dhcp.@dnsmasq[0].sequential_ip='1'
dhcp.@dnsmasq[0].confdir='/tmp/dnsmasq.d'
dhcp.@dnsmasq[0].enable_tftp='1'
dhcp.@dnsmasq[0].tftp_root='/TFTP'
dhcp.@dnsmasq[0].dhcp_boot='openwrt-23.05-mikrotik_cap-ac-initramfs.bin'
dhcp.@dnsmasq[0].interface='lan'
dhcp.@dnsmasq[0].port='0'
dhcp.lan=dhcp
dhcp.lan.interface='lan'
dhcp.lan.start='6'
dhcp.lan.limit='150'
dhcp.lan.leasetime='12h'
dhcp.lan.dhcpv4='server'
dhcp.lan.dhcpv6='hybrid'
dhcp.lan.ra='hybrid'
dhcp.lan.master='0'
dhcp.lan.ra_useleasetime='1'
dhcp.lan.dhcp_option='3,192.168.3.1 6,192.168.3.1'
dhcp.wan=dhcp
dhcp.wan.interface='wan'
dhcp.wan.ignore='1'
dhcp.odhcpd=odhcpd
dhcp.odhcpd.maindhcp='0'
dhcp.odhcpd.leasefile='/tmp/hosts/odhcpd'
dhcp.odhcpd.leasetrigger='/usr/sbin/odhcpd-update'
dhcp.odhcpd.loglevel='4'
dhcp.@host[0]=host
dhcp.@host[0].name='My AP Name'
[...]
dhcp.@host[8].leasetime='7d'

What am I doing wrong? Should I revert the bind installation?

PS: If anyone knows of a good way to mount a loop device at boot and swapon to that, I would very much welcome suggestions!
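One possible approach to the loop-device question, as an untested sketch rather than a verified recipe: attach the swap file to the first free loop device and enable swap on it, for example from /etc/rc.local. The function name `enable_loop_swap` and the example path are assumptions, and the sketch presumes that `losetup` and `swapon` are installed and that the swap file was already prepared with `mkswap`.

```shell
#!/bin/sh
# Hypothetical helper: attach a swap file to a free loop device and enable it.
# The swap file is assumed to have been initialised with mkswap beforehand.
enable_loop_swap() {
    swapfile="$1"

    # Refuse to continue if the file is missing (e.g. SD card not mounted yet).
    if [ ! -f "$swapfile" ]; then
        echo "swap file not found: $swapfile" >&2
        return 1
    fi

    # Ask losetup for the first unused loop device, e.g. /dev/loop0.
    loopdev="$(losetup -f)" || return 1

    # Bind the file to that device, then activate it as swap.
    losetup "$loopdev" "$swapfile" || return 1
    swapon "$loopdev"
}

# Example call (the path is an assumption), e.g. in /etc/rc.local before "exit 0":
# enable_loop_swap /mnt/sdcard/swapfile
```

The check for the file matters at boot time, because the SD card may not be mounted yet when the script runs.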

ubus call system board

It's a Mikrotik hEX S, as stated at the start of the post. Sorry, I forgot to say it's running OpenWrt 23.05.0-rc2, but here it goes:

{
	"kernel": "5.15.118",
	"hostname": "Hex-S",
	"system": "MediaTek MT7621 ver:1 eco:4",
	"model": "MikroTik RouterBOARD 760iGS",
	"board_name": "mikrotik,routerboard-760igs",
	"rootfs_type": "squashfs",
	"release": {
		"distribution": "OpenWrt",
		"version": "23.05.0-rc2",
		"revision": "r23228-cd17d8df2a",
		"target": "ramips/mt7621",
		"description": "OpenWrt 23.05.0-rc2 r23228-cd17d8df2a"
	}
}

Everything was working as intended until I installed bind. Then it started to crash when loading the adblock lists. I don't know for sure whether grep, awk or named was causing the Out of Memory errors, but setting up swap solved the issue.

Everything is working, except DHCP. For now, the most critical devices in the network are using static IPs, but I "kinda" miss using DHCP. Have you noticed any mistake in the configs that I might have overlooked?

First, sysupgrade to the stable release.
You have zero for the DNS port, so bind now has to provide recursive service to the LAN?

Yes, bind is the new recursive DNS

Something weird came up during the update, something about the board being not defined.

It updated to 23.05.3 and seems to be working as a router, but all my installed packages are gone (including bind), I can't access it via SSH and, even after re-enabling the DNS in dnsmasq, I don't have any DNS.

Still, no DHCP in sight...

I'll try to restore my packages and see if I can at least get the named server working again.

I started writing the reply while the update image was uploading, but everything broke in the meantime. Now bind is no longer the DNS, as it seems to have been uninstalled, haha (:cry:)

I had a screenshot of the error, but it failed to upload because there is no DNS. Let's see if I can get anything to work again

Also, the ext-root didn't mount when it rebooted. I have just reread the wiki and saw that it advises against updating via opkg upgrade. Does the luci interface do upgrades this way?

Ok, going through syslog via luci shows me why I don't have SSH:

authpriv.warn dropbear[1917]: Failed listening on '22': Error listening: Address not available
authpriv.info dropbear[1917]: Not backgrounding

I also seem to have a bad IPv4 address on line 48 of the dnsmasq config, but without SSH is there any way for me to fix it?

daemon.crit dnsmasq[1]: bad IPv4 address at line 48 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
[...]
daemon.info procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash

Hmm, about the SSH, I changed it to 2222 and "could connect", but the connection instantly dies. From the syslog:

Mon Jun 24 01:10:18 2024 authpriv.notice dropbear[6562]: Pubkey auth succeeded for 'root' with ssh-rsa key SHA256:<redacted> from 192.168.3.4:52148
Mon Jun 24 01:10:18 2024 authpriv.info dropbear[6563]: Exit (root) from <192.168.3.4:52148>: Child failed
Mon Jun 24 01:10:18 2024 authpriv.info dropbear[6562]: Exit (root) from <192.168.3.4:52148>: Disconnect received

Maybe bash is the culprit, because the log also contains:

Mon Jun 24 01:04:00 2024 cron.err crond[5426]: can't execute '/bin/bash' for user root
Mon Jun 24 01:04:00 2024 cron.err crond[5427]: can't execute '/bin/bash' for user root

Also, looking at my first post, line 48 of the dhcp config seems to fall right in my custom static leases :confused:
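A note on the line numbers: the log message names the generated file /var/etc/dnsmasq.conf.cfg01411c, so "line 48" counts lines in that file rather than in the uci output from the first post. Once a shell is available, the offending line could be printed with something like the sketch below. The helper name `show_conf_line` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one numbered line of a file, so it can be matched
# against an error such as "bad IPv4 address at line 48 of <file>".
show_conf_line() {
    file="$1"
    lineno="$2"
    # sed -n suppresses normal output; "<N>p" prints only line N.
    sed -n "${lineno}p" "$file"
}

# Example, using the file name and line number from the log above:
# show_conf_line /var/etc/dnsmasq.conf.cfg01411c 48
```

Because the file is regenerated from /etc/config/dhcp whenever dnsmasq starts, the line number can shift when the uci config changes.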

Ok, deleted all the static leases, now the error is on line 39 or 41...

daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.crit dnsmasq[1]: bad IPv4 address at line 39 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.info procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash
daemon.crit dnsmasq[1]: bad IPv4 address at line 41 of /var/etc/dnsmasq.conf.cfg01411c
daemon.crit dnsmasq[1]: FAILED to start up
daemon.info procd: Instance dnsmasq::cfg01411c s in a crash loop 7 crashes, 0 seconds since last crash
daemon.warn odhcpd[1502]: No default route present, overriding ra_lifetime!

Ok, the kernel was updated between 23.05.0-rc2 and 23.05.3, so that's why some things broke, as I had kmod packages installed. Maybe updating to a stable release wasn't such a good idea...

I'm going to sleep now, this has consumed too much of today. Tomorrow I'll try some more to fix this mess...

You have to reset config.
option ipv6 1 is from a super ancient version.

Ok, I'll reset the router and try to apply the configs and packages again.

I bought and flashed the router last year, installing some release candidate of OpenWrt 23.05. I can't remember if it was already rc2 or something older, maybe rc1, so I don't think the configs are that ancient to begin with.

I'll post how it goes

Welp, lost the installed package list, bind dns zones, ext-root and some configs, but at least DHCP and DNS are working again. I'll try to get the ext-root config going again first, just a moment

EDIT: Got the SD card to be recognized, but while following the ext-root procedure from the wiki (except the part about formatting the disk) I seem to have lost everything in the original partition, maybe something to do with the tar command?

Anyway, ext-root is not auto-mounting on boot, I will try some more

You cannot reset or upgrade after extroot

Oh, that would have been nice to know if it was in the wiki. The only info on sysupgrade was about the kernel incompatibilities, which I thought maybe could have been solved with some careful edits

Oh well, back to configuring everything again... Extroot still not mounting, but let me see if I can get it to work somehow

You could open a wishlist topic asking for a forced sysupgrade option with extroot.

Ok, I'm away from home now, will be back in a few hours and I'll try to get everything to work. Besides kernel modules, do you know of any other possible incompatibilities that may be caused by sysupgrade?

For sysupgrade: do you really need extroot? Or would it be better to keep sysupgrade (or auc) capability, mount e.g. USB storage, and divert all data to subdirectories of the "external storage" mount?

Solved my problem with the extroot: it was a leftover /etc/.extroot-uuid file on the extroot device.

I first erased the /lib/modules of the extroot before mounting it on /overlay, and then it mounted correctly. After that I was able to use the overlayfs on the extroot fine.
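For anyone hitting the same thing, removing the stale marker could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name `clear_extroot_uuid`, the example device and the mount point are all made up, and the marker path `etc/.extroot-uuid` is simply the one mentioned in this post, so it may sit elsewhere on other layouts.

```shell
#!/bin/sh
# Hypothetical helper: remove a stale extroot UUID marker from an extroot
# partition. $1 is the directory where that partition is temporarily mounted.
clear_extroot_uuid() {
    mountpoint="$1"
    marker="$mountpoint/etc/.extroot-uuid"

    if [ -f "$marker" ]; then
        # Delete the marker and report what was removed.
        rm -f "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}

# Example (device name and mount point are assumptions):
# mount /dev/mmcblk0p1 /mnt && clear_extroot_uuid /mnt && umount /mnt
```

The partition is mounted somewhere neutral such as /mnt here, not on /overlay, so the running system is not affected while the file is removed.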

The reset seems to have worked, both bind and dnsmasq are working fine now. I diffed some config files to get most of the stuff back. Lost some configs, like my static DHCP leases and the wireguard keys, and most of my custom packages, although mostly working, are not reported by opkg as installed.

Nothing too bad. For the next few days I'll see if I find something missing and watch out for what screams. I survived another day without my family trying to kill me because I "broke the internet" :sweat_smile: .

Thanks for the help!

In the end it seems that all I needed was to reset the router to (OpenWrt-)factory and reinstall everything. I should have backed up the package list before, but well, lesson learned...
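As a sketch of the package-list backup mentioned above: `opkg list-installed` prints one "name - version" entry per line, so the names can be cut out and saved before an upgrade or reset. The helper name `package_names` and the output path are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reduce "opkg list-installed" output
# ("name - version" per line) to bare package names, one per line.
package_names() {
    # Fields are separated by spaces; the package name is the first field.
    cut -d ' ' -f 1
}

# Example on the router (the output path is an assumption):
# opkg list-installed | package_names > /etc/backup/installed_packages.txt
```

Since a factory reset wipes the overlay, the resulting file is best copied off the router as well.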

Yeah, that's a good question, this could have saved me from a lot of trouble. :zipper_mouth_face:

I like the overlayfs thing because symlinks get chaotic quickly. Do you know what the rules are on editing the wiki? Maybe I should report some of my troubles and how to fix them there for future adventurers :smile: .