Dnsmasq spawning over a dozen processes

So I have had a rather annoying and persistent problem relating to dnsmasq for over a year now. It has spanned a number of different types of hardware (x86 and MediaTek) as well as releases (23.01-23.05).

At seemingly random points (sometimes separated by weeks, sometimes by days, sometimes as soon as the router reboots) the local network will suddenly lose DNS resolution. Upon logging into the router and checking the status of dnsmasq with ps w, this is what I get:

...
11020 root      2912 S    {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -
11021 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
11022 root      4532 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12495 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12504 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12505 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12507 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12510 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12511 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12514 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12515 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12520 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12521 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12522 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12523 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12524 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12525 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12528 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12529 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12530 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12531 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12532 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
12533 dnsmasq   4572 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid

For reference, this is what I normally see when dnsmasq is running correctly:

11020 root      2912 S    {dnsmasq} /sbin/ujail -t 5 -n dnsmasq -u -l -r /bin/ubus -r /etc/TZ -r /etc/dnsmasq.conf -r /etc/ethers -
11021 dnsmasq   4588 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
11022 root      4532 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid

Attempting /etc/init.d/dnsmasq restart doesn't tend to fix the problem unfortunately - it just spawns another dozen processes when it restarts - so I have to do a hardware reboot of the router.

I used to think that this problem was due to my running OpenWrt in a virtualised LXC container, but having switched to dedicated hardware (TP-Link ER605), I am seeing the exact same issue.

These are the salient parts of my dnsmasq config in /etc/config/dhcp:

config dnsmasq
        option noresolv '1'
        list server '127.0.0.1#5453'
        list server '/redacted2/10.0.2.1'
        option proxydnssec '1'
        option domainneeded '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option expandhosts '1'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option local '/redacted/'
        option domain 'redacted'
        option localservice '0'
        list interface 'lan'
        list interface 'ZT10'
        option localise_queries '1'
        option logqueries '1'
        list rebind_domain 'redacted'
        list rebind_domain 'plex.tv'
        list rebind_domain 'plex.direct'
        option confdir '/tmp/dnsmasq.d'

(Stubby handles the name resolution itself, with dnsmasq just acting as the downstream handler, as well as providing local domain resolution).

Having searched on the Internet multiple times over the last year or so, I haven't been able to find any similar instances of this problem, which is really quite depressing.

The closest thing I found was this post mentioning multiple (only three) dnsmasq processes spawning due to some strange behaviour by an LG TV, however I don't have any LG hardware in my home and to be honest I don't exactly understand how it caused multiple processes to be spawned.

Would really appreciate any pointers on how to try to get to the bottom of this (such as how the dnsmasq service handles spawning new processes).

Are these new processes children of the main dnsmasq process or ujail? dnsmasq will spawn a separate process when it has to handle a DNS request over TCP (versus UDP). Check your log to see if that is happening. It looks like you have logging enabled.
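One way to answer the parent/child question without relying on ps options that a given BusyBox build may lack is to read the PPid field from /proc directly. This is only a rough sketch: the function name list_dnsmasq_parents is made up for illustration, and it assumes the usual Linux /proc/<pid>/status layout with Name:, Pid: and PPid: lines. Note that the ujail wrapper shows up under the name dnsmasq too (hence the {dnsmasq} prefix in ps), so it will be listed as well.

```shell
#!/bin/sh
# Print "PID PPID" for every process whose name is dnsmasq.
# $1 is an optional /proc-style root (hypothetical parameter, defaults to /proc).
list_dnsmasq_parents() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        # Skip entries that vanished or that we cannot read.
        [ -r "$status" ] || continue
        awk '
            $1 == "Name:" { name = $2 }
            $1 == "Pid:"  { pid = $2 }
            $1 == "PPid:" { ppid = $2 }
            END { if (name == "dnsmasq") print pid, ppid }
        ' "$status"
    done
}
```

Running list_dnsmasq_parents on the router while the problem is occurring should show whether the extra processes all share the PID of the main dnsmasq process as their parent, or the PID of ujail.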

Does it occur if you remove stubby as the upstream server?

I would guess they are children of ujail, considering the PID numbers, although I'm not entirely sure to be honest, as I don't have a particularly good understanding of how OpenWrt actually handles services like dnsmasq (looking at /etc/init.d/dnsmasq and the associated /lib/functions.sh files just made my head spin).

Check your log to see if that is happening. It looks like you have logging enabled.

Fri May  2 19:37:00 2025 daemon.info dnsmasq[1]: 6 fd78:6dc1:848d:0:4d3f:f95b:502:9d85/28788 forwarded connectivitycheck.gstatic.com to 127.0.0.1#5453
Fri May  2 19:37:00 2025 daemon.info dnsmasq[3]: 7 10.0.1.178/62532 query[AAAA] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[4]: 107 10.0.1.178/62533 query[A] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[5]: 207 10.0.1.178/62534 query[HTTPS] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[6]: 307 10.0.1.178/62535 query[AAAA] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[8]: 507 10.0.1.178/62536 query[A] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[9]: 607 10.0.1.178/62538 query[AAAA] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[10]: 707 10.0.1.178/62539 query[A] beacons.gcp.gvt2.com from 10.0.1.178
Fri May  2 19:37:00 2025 daemon.info dnsmasq[7]: 407 10.0.1.178/62537 query[HTTPS] beacons.gcp.gvt2.com from 10.0.1.178

I take it that the [n] after dnsmasq denotes a separate child process?
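To see which clients are behind the different [n] tags, the query lines can be tallied per client. This is a rough sketch that assumes the log format shown above (a dnsmasq[n]: field, and the client address as the last field of each query[...] line) and that the bracketed number identifies the logging process; the function name tally_query_tags is made up for illustration.

```shell
#!/bin/sh
# Read dnsmasq log lines on stdin and print, per client, how many
# distinct dnsmasq[n] tags logged a query from it (highest count first).
tally_query_tags() {
    awk '
        / query\[/ {
            tag = ""
            # Find the "dnsmasq[n]:" field; its position is not hard-coded.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^dnsmasq\[[0-9]+\]:$/) tag = $i
            client = $NF
            key = client " " tag
            if (tag != "" && !(key in seen)) {
                seen[key] = 1
                count[client]++
            }
        }
        END { for (c in count) print count[c], c }
    ' | sort -rn
}
```

On the router this would be fed with something like logread | tally_query_tags. A client with a high count is one whose queries were logged under many different tags.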

I have just noticed the following error:

Fri May  2 18:59:43 2025 daemon.err stubby[4073]: cts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that for a name that is not in the dict.Could not get qname from query: A helper function for dicts had a name argument that

This seems quite conspicuous - I'm starting to think the problem might be Stubby, rather than dnsmasq. I will try installing https-dns-proxy and see if that resolves the issue.

Thanks very much for the pointers.

The increasing PIDs in the query logs suggest the queries are arriving at dnsmasq via TCP, at least from this particular client (.178). It is unusual (to me) to see so many incoming TCP queries instead of UDP. Anything unique about this client?
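If you want to confirm the TCP theory independently of the dnsmasq log, the kernel's socket table can be checked for established connections to port 53. A minimal sketch, assuming the standard /proc/net/tcp column layout (local address in the second column as HEXIP:HEXPORT, state in the fourth, where 01 means ESTABLISHED) and that dnsmasq listens on port 53 (hex 0035); the function name count_dns_tcp is made up for illustration.

```shell
#!/bin/sh
# Count established TCP connections whose local port is 53.
# $1 is an optional path to a /proc/net/tcp-style table.
count_dns_tcp() {
    awk 'NR > 1 {
             # Local address looks like "0101000A:0035"; the port is the last part.
             n = split($2, local, ":")
             if (local[n] == "0035" && $4 == "01") c++
         }
         END { print c + 0 }' "${1:-/proc/net/tcp}"
}
```

Since some of the clients in the log are using IPv6, it is worth running it a second time as count_dns_tcp /proc/net/tcp6, which uses the same column layout. A count that climbs while the extra dnsmasq processes appear would point towards clients holding TCP connections open.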