So I have had a rather annoying and persistent dnsmasq problem for over a year now, one that has spanned a number of different types of hardware (x86 and MediaTek) as well as releases (23.01-23.05).
At seemingly random points (sometimes separated by weeks, sometimes by days, sometimes as soon as the router reboots) the local network will suddenly lose DNS resolution. When I log into the router and check the status of dnsmasq by running ps w, this is what I get:
...
11020 root 2912 S {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -
11021 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
11022 root 4532 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12495 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12504 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12505 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12507 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12510 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12511 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12514 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12515 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12520 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12521 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12522 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12523 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12524 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12525 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12528 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12529 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12530 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12531 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12532 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12533 dnsmasq 4572 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
For reference, this is what I normally see when dnsmasq is running correctly:
11020 root 2912 S {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -
11021 dnsmasq 4588 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
11022 root 4532 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
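To compare the two listings more mechanically, it may help to count the dnsmasq processes and to check whether the extra ones are children of the original dnsmasq process (PID 11021 above) or separately started instances. The following is only a sketch: the helper names count_dnsmasq and ppid_from_status are made up for illustration, and the parsing assumes the BusyBox ps w column order shown above (PID, USER, VSZ, STAT, COMMAND).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of OpenWrt or dnsmasq.

# Count dnsmasq processes in `ps w` output read from stdin.
# Assumes the command is the 5th whitespace-separated column, as in the
# listings above. The ujail wrapper line is skipped because its command
# column starts with "{dnsmasq}" rather than the dnsmasq binary path.
count_dnsmasq() {
    awk '$5 == "/usr/sbin/dnsmasq" { n++ } END { print n + 0 }'
}

# Print the parent PID found in the contents of a /proc/<pid>/status
# file supplied on stdin (the line of the form "PPid:<tab><number>").
ppid_from_status() {
    awk '$1 == "PPid:" { print $2 }'
}

# Example on the router (12495 is one of the extra PIDs listed above):
#   ps w | count_dnsmasq
#   ppid_from_status < /proc/12495/status
```

If every extra PID reports 11021 as its parent, the processes were forked by the running dnsmasq itself; any other parent would suggest that something else is starting additional instances.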
Running /etc/init.d/dnsmasq restart doesn't tend to fix the problem, unfortunately. It just spawns another dozen processes when it restarts, so I have to do a hardware reboot of the router.
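Because the onset is unpredictable, a periodic check could at least record when the extra processes first appear, so that the moment can be matched against the system log. This is a minimal sketch under stated assumptions: the threshold of 2 is taken from the healthy listing above, and the function name too_many_dnsmasq and the log tag dnsmasq-watch are hypothetical.

```shell
#!/bin/sh
# Sketch only: EXPECTED and the function name are assumptions for illustration.
EXPECTED=2   # number of dnsmasq processes in the healthy listing above

# Return success (0) when the process count passed as $1 exceeds EXPECTED.
too_many_dnsmasq() {
    [ "$1" -gt "$EXPECTED" ]
}

# Possible use on the router, for example from a cron job:
#   n=$(ps w | awk '$5 == "/usr/sbin/dnsmasq"' | wc -l)
#   too_many_dnsmasq "$n" && logger -t dnsmasq-watch "dnsmasq process count: $n"
```

The logged line then shows up in logread alongside whatever dnsmasq itself logged around the same time.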
I used to think that this problem was due to my running OpenWrt in a virtualised LXC container, but having switched to dedicated hardware (TP-Link ER605), I am seeing the exact same issue.
These are the salient parts of my dnsmasq config in /etc/config/dhcp:
config dnsmasq
    option noresolv '1'
    list server '127.0.0.1#5453'
    list server '/redacted2/10.0.2.1'
    option proxydnssec '1'
    option domainneeded '1'
    option rebind_protection '1'
    option rebind_localhost '1'
    option expandhosts '1'
    option authoritative '1'
    option readethers '1'
    option leasefile '/tmp/dhcp.leases'
    option local '/redacted/'
    option domain 'redacted'
    option localservice '0'
    list interface 'lan'
    list interface 'ZT10'
    option localise_queries '1'
    option logqueries '1'
    list rebind_domain 'redacted'
    list rebind_domain 'plex.tv'
    list rebind_domain 'plex.direct'
    option confdir '/tmp/dnsmasq.d'
(Stubby handles the actual name resolution, with dnsmasq forwarding client queries to it at 127.0.0.1#5453 and providing local domain resolution.)
Having searched on the Internet multiple times over the last year or so, I haven't been able to find any similar instances of this problem, which is really quite depressing.
The closest thing I found was this post mentioning multiple (only three) dnsmasq processes spawning due to some strange behaviour by an LG TV. However, I don't have any LG hardware in my home, and to be honest I don't exactly understand how it caused multiple processes to be spawned.
I would really appreciate any pointers on how to get to the bottom of this (such as how the dnsmasq service handles spawning new processes).