Dnsmasq repeatedly crashes on OpenWrt — need help identifying root cause

Hi everyone,

I’m experiencing repeated dnsmasq crashes on my OpenWrt router, and I’m trying to understand the root cause. I’ve already collected detailed logs, crash snapshots, and strace attempts, but the failure reason is still unclear.

Problem summary

  • dnsmasq runs normally for some time

  • suddenly disappears (no process, port 53 not listening)

  • my monitoring script detects the crash and restarts dnsmasq

  • this may repeat several times in a short period

  • the crash reports (“crash passports”) from my script show “Unknown reason” or “Process missing”

  • strace sometimes attaches, but often produces only a single line like:

    epoll_pwait(3,

    and nothing else

Environment

  • OpenWrt (stable release)

  • dnsmasq-full

  • multiple upstream DNS services (DoH/DoT forwarders)

  • nftables firewall with large dynamic sets

  • no OOM events, RAM usage is low

  • two dnsmasq processes normally run (main + helper)

What I already checked

  • RAM, CPU, load → normal

  • no kernel OOM

  • no segfaults in dmesg

  • no relevant messages in system log

  • netlink sockets look normal

  • dnsmasq logs do not show any fatal errors

  • strace often fails to capture anything meaningful

  • crash snapshots show dnsmasq simply disappears without trace

Example crash moment

Port 53 suddenly stops listening, both dnsmasq PIDs vanish, and my script reports:

dnsmasq: not running
WARNING: port 53 not listening!
dnsmasq CRASH detected — reason: Unknown

What I need help with

I’m trying to understand why dnsmasq is crashing silently and what additional diagnostics I should collect.

Could you please advise:

  1. Which logs or debug flags should I enable in dnsmasq to capture internal errors?

  2. Is there a recommended way to run dnsmasq with full debugging (e.g., --log-queries, --log-dhcp, --log-debug)?

  3. Could nftables dynamic sets or large rulesets cause dnsmasq to crash?

  4. Is it normal to have two dnsmasq processes (main + helper)?

  5. How can I reliably capture the exact moment of failure?

    • strace

    • gdb

    • core dumps

    • procd logging

  6. Are there known issues in recent OpenWrt builds that could cause dnsmasq to exit without logs?

Any guidance on what to check next or how to instrument dnsmasq properly would be greatly appreciated.

Thanks!

The very first thing we'll need is some of the config info:

Please connect to your OpenWrt device using ssh, copy the output of the following commands, and post it here using the "Preformatted text </>" button (this works best in the 'Markdown' composer view):

Remember to redact passwords, VPN keys, MAC addresses and any public IP addresses you may have:

ubus call system board
cat /etc/config/network
cat /etc/config/dhcp

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd00:192:1682::/48'
	option source_filter '0'
	option packet_steering '2'
	option steering_flows '256'
	option udp_l3mdev '1'
	option tcp_l3mdev '1'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth1'
	option multicast '1'
	option multicast_querier '1'

config device
	option name 'eth1'
	option macaddr '1a:79:2e:4d:9e:72'
	option multicast '1'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.2.1'
	option netmask '255.255.255.0'
	option ip6assign '64'

config device
	option name 'eth0'
	option macaddr '1a:79:2e:4d:9e:71'

config interface 'wan'
	option device 'eth0'
	option proto 'dhcp'
	option peerdns '0'
	option defaultroute '0'
	option delegate '0'

config interface 'wan6'
	option device 'eth0'
	option proto 'dhcpv6'
	option disabled '1'
	option auto '0'
	option reqaddress 'try'
	option reqprefix 'auto'
	option norelease '1'
	option defaultroute '0'
	option peerdns '0'
	option sourcefilter '0'
	option delegate '0'

config interface 'rt'
	option proto 'pppoe'
	option device 'eth0'
	option username '465793-1@BLG'
	option password 'bMi15JnS'
	option ipv6 'auto'
	option keepalive '10 3'
	option service 'BLGD-BRAS1'
	option force_link '1'
	option peerdns '0'
	option metric '2000'
	option ip6assign '64'
	option ip6prefix 'no'
	option delegate '0'
	option sourcefilter '0'
	option norelease '1'


config interface 'oc0'
	option proto 'openconnect'
	option vpn_protocol 'anyconnect'
	option uri ''
	option username ''
	option password ''
	option delegate '0'
	option defaultroute '0'
	option peerdns '0'
	option serverhash ''

config interface 'oc1'
	option proto 'openconnect'
	option vpn_protocol 'anyconnect'
	option uri ''
	option username ''
	option password ''
	option delegate '0'
	option defaultroute '0'
	option peerdns '0'
	option serverhash ''
	option disabled '1'

config interface 'oc2'
	option proto 'openconnect'
	option vpn_protocol 'anyconnect'
	option uri '
	option serverhash ''
	option username ''
	option password ''
	option defaultroute '0'
	option delegate '0'

config interface 'wg0'
	option proto 'wireguard'
	option private_key '+LW1DG8qXYBkp24ldfDDXeBViCZDMtbNSbJekjSz/lA='
	list addresses '172.16.0.2/32'
	list addresses '2606:4700:110:8087:26f1:8184:cde3:c393/128'
	option auto '0'

config wireguard_wg0
	option public_key 'bmXOC+F1FxEMF9dyiK2H5/1SUtzH0JuVo51h2wPfgyo='
	option endpoint_host '162.159.192.1'
	option endpoint_port '2408'
	option persistent_keepalive '25'
	list allowed_ips '0.0.0.0/0'
	list allowed_ips '::/0'

config dnsmasq
	option domainneeded '1'
	option rebind_protection '1'
	option local '/lan/'
	option domain 'openwrt.lan'
	option expandhosts '1'
	option cachesize '9500'
	option prefetch '1'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option localservice '1'
	option ednspacket_max '1232'
	option dnsforwardmax '150'
	list confdir '/etc/dnsmasq.d'
	option dhcp_script '/usr/lib/dnsmasq/dhcp-script.sh'
	option logqueries '1'
	option logdhcp '1'
	option max_cache_ttl '86400'
	option stripsubnet '1'
	option logfacility '/tmp/dnsmasq.log'
	list interface 'lan'
	list server '/mask.icloud.com/'
	list server '/mask-h2.icloud.com/'
	list server '/use-application-dns.net/'
	list server '127.0.0.1#5053'
	list server '127.0.0.1#5054'
	list server '127.0.0.1#5055'
	list server '127.0.0.1#5056'
	option rebind_localhost '1'
	option localise_queries '1'
	option stripmac '1'
	list addnhosts '/etc/dns_doh'
	option max_file_limit '2048'
	option bind_dynamic '1'
	option noresolv '1'
	list notinterface 'oc0'
	list notinterface 'oc1'
	list notinterface 'oc2'
	list notinterface 'rt'
	list notinterface 'rt_6'
	list notinterface 'wan'
	list notinterface 'wan6'
	list notinterface 'wg0'
	option dnssec '1'
	option dnsseccheckunsigned '1'
	option min_cache_ttl '60'
	option doh_backup_noresolv '1'
	list doh_backup_server '/mask.icloud.com/'
	list doh_backup_server '/mask-h2.icloud.com/'
	list doh_backup_server '/use-application-dns.net/'
	list doh_backup_server '127.0.0.1#5053'
	list doh_backup_server '127.0.0.1#5054'
	list doh_backup_server '127.0.0.1#5055'
	list doh_backup_server '127.0.0.1#5056'
	list doh_server '127.0.0.1#5053'
	list doh_server '127.0.0.1#5054'
	list doh_server '127.0.0.1#5055'
	list doh_server '127.0.0.1#5056'


config dhcp 'lan'
	option interface 'lan'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option ra 'server'
	option dhcpv4 'server'
	option dhcpv6 'server'
	option ra_management '1'
	option ra_advrouter '1'
	option ra_interval '54000'
	option ra_lifetime '3600'
	option preferred_lifetime '24h'
	list dhcp_option '44,192.168.2.1'
	option ra_default '2'

config dhcp 'wan'
	option interface 'wan'
	option ignore '1'
	option start '100'
	option limit '150'
	option leasetime '12h'
	option dynamicdhcp '0'
	option peerdns '0'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'
	option piofolder '/tmp/odhcpd-piofolder'

root@WROUTER:~# ubus call system board
{
        "kernel": "6.6.119",
        "hostname": "WROUTER",
        "system": "ARMv8 Processor rev 0",
        "model": "FriendlyElec NanoPi R3S",
        "board_name": "friendlyarm,nanopi-r3s",
        "rootfs_type": "ext4",
        "release": {
                "distribution": "OpenWrt",
                "version": "24.10.5",
                "revision": "r29087-d9c5716d1d",
                "target": "rockchip/armv8",
                "description": "OpenWrt 24.10.5 r29087-d9c5716d1d",
                "builddate": "1766005702"
        }
}
root@WROUTER:~#

This configuration is extremely unusual. Where did all of this come from? Did you import it from the friendlywrt firmware, or is this all configured from scratch on official openwrt? And how did you arrive at all the config options here? Did you follow any guides? AI? Something else?

This setup is built on official OpenWrt, not FriendlyWRT or any other fork.
All configuration was done manually, step‑by‑step, over time — nothing was imported, auto‑generated, or created by AI.

If something in the configuration looks unusual to you, please specify exactly which part concerns you or seems incorrect.
I’m open to clarifying any section, but I need to understand what specifically you think is problematic.

There is just a huge amount of uncommon stuff going on. I can’t say whether any of it is suspect, but the dnsmasq config seems like it is potentially harboring some errors. I can’t say what, though, as I don’t use DoH. Maybe someone else will spot an issue that could be causing the crash.

Thanks for taking a look.

Yes, the setup is more complex than a typical home OpenWrt installation — but it’s all built manually on official OpenWrt, not imported from FriendlyWRT or generated by AI. Every component was added step‑by‑step over time while experimenting with DNS architecture, DoH/DoT chains, nftables sets, and monitoring.

If something in the dnsmasq configuration looks suspicious to you, I’d appreciate it if you could point out which part specifically raises concerns. I’m fully open to reviewing any section, but I need to know what exactly seems unusual from your perspective.

Right now the main issue is that dnsmasq crashes silently without logs, even with DoH/DoT disabled during testing. So if anyone spots a potential misconfiguration or knows what additional diagnostics I should collect, I’m definitely interested.

config main 'config'
	option canary_domains_icloud '1'
	option canary_domains_mozilla '1'
	option dnsmasq_config_update '*'
	option force_dns '1'
	list force_dns_port '53'
	list force_dns_src_interface 'lan'
	option procd_trigger_wan6 '0'
	option heartbeat_domain 'heartbeat.melmac.ca'
	option heartbeat_sleep_timeout '10'
	option heartbeat_wait_timeout '10'
	option user 'nobody'
	option group 'nogroup'
	option listen_addr '127.0.0.1'

config https-dns-proxy
	option bootstrap_dns '1.1.1.1,1.0.0.1,2606:4700:4700::1111,2606:4700:4700::1001'
	option resolver_url 'https://cloudflare-dns.com/dns-query'
	option listen_addr '127.0.0.1'
	option listen_port '5053'
	option http2 '1'
	option keepalive '1'
	option user 'nobody'
	option group 'nogroup'

config https-dns-proxy
	option bootstrap_dns '8.8.8.8,8.8.4.4,2001:4860:4860::8888,2001:4860:4860::8844'
	option resolver_url 'https://dns.google/dns-query'
	option listen_addr '127.0.0.1'
	option listen_port '5054'
	option http2 '1'
	option keepalive '1'
	option user 'nobody'
	option group 'nogroup'

config https-dns-proxy
	option resolver_url 'https://dns.quad9.net/dns-query'
	option bootstrap_dns '9.9.9.11,149.112.112.11,2620:fe::11,2620:fe::fe:11'
	option listen_addr '127.0.0.1'
	option listen_port '5055'
	option http2 '1'
	option keepalive '1'
	option user 'nobody'
	option group 'nogroup'

config https-dns-proxy
	option resolver_url 'https://doh.opendns.com/dns-query'
	option bootstrap_dns '208.67.222.222,208.67.220.220,2620:119:35::35,2620:119:53::53'
	option listen_addr '127.0.0.1'
	option listen_port '5056'
	option http2 '1'
	option keepalive '1'
	option user 'nobody'
	option group 'nogroup'

The configuration may look unusual, but it is fully understood and maintained manually.
A real specialist can immediately point out what is correct, what is unnecessary, and what should or should not be enabled — and I’ve already received such feedback from people who actually understand dnsmasq and OpenWrt internals.

Your message, however, doesn’t provide any concrete observations or technical points.
If you see a specific misconfiguration, please point it out directly.
General comments like “this looks strange” don’t help with diagnosing the crash.

You should dump the dnsmasq statistics:

killall -s USR1 dnsmasq
sleep 1
logread -e dnsmasq | tail

You say huge sets. How huge, exactly?
You have to show your monitoring script. It is totally impossible for dnsmasq to stop listening on a port and still churn in epoll(); most likely your (totally not AI slop) monitoring script just kills it.

Then in other news...

As for the second point: no. procd would log the crash and restart it.

Some random questions: are you using the vendor drivers for the Realtek adapters? If so, try the standard r8169 driver instead.

Are you using containers in some manner? One of the reasons I switched away from containers is that DNS would randomly pop, so that’s why I ask.

I would probably consult those people who actually understand dnsmasq. Why ask on this forum, or do those real specialists not know the cause?

You can set coredumps to be created, and then you can analyse the coredumps with gdb.
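A minimal sketch of that coredump setup, in case it helps. The directory `/tmp/cores` and the function names are placeholders of mine, not anything from this thread, and nothing here has been tested on this router. Since dnsmasq drops root privileges, the kernel may refuse to dump it unless `fs.suid_dumpable` is raised as well, so the sketch sets that too; treat it as an assumption to verify.

```shell
#!/bin/sh
# Sketch only: enable core dumps so a crashing daemon leaves a file behind.
# CORE_DIR is an assumed location; pick one with enough free space.
CORE_DIR="${CORE_DIR:-/tmp/cores}"

# Build a core_pattern string.
# %e = executable name, %p = PID, %s = signal number, %t = UNIX timestamp.
core_pattern() {
    printf '%s/%%e.%%p.%%s.%%t.core\n' "$1"
}

# Apply the settings. Needs root, so it is only defined here, not called.
enable_coredumps() {
    mkdir -p "$CORE_DIR" || return 1
    chmod 1777 "$CORE_DIR"
    core_pattern "$CORE_DIR" > /proc/sys/kernel/core_pattern || return 1
    # Assumption: needed because dnsmasq changes its user ID after startup.
    echo 2 > /proc/sys/fs/suid_dumpable || return 1
    # Affects this shell and its children only, not services started by procd.
    ulimit -c unlimited
}
```

Run `enable_coredumps` as root and then start dnsmasq from that same shell. For the procd-managed service, the limit would instead have to come from the init script (procd has a `limits` parameter, e.g. `procd_set_param limits core="unlimited"`).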

Ideally you use a locally compiled dnsmasq on the router, so that the non-stripped binary stays in your build system and the symbol table matches. (On the router, stripped binaries are normally installed to save space.)

You can analyse coredumps, but you could also use remote gdb to launch dnsmasq, so that you have gdb control in the build system and can look at the variables etc. at or after the crash.
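A rough sketch of the remote gdb variant, assuming the `gdbserver` package is installed on the router. The port 2345 and the config path are guesses on my part (OpenWrt generates the dnsmasq config under `/var/etc/`, but the exact file name depends on the instance). It only prints the command so you can check it before running anything.

```shell
#!/bin/sh
# Sketch only: build the command that runs dnsmasq in the foreground under
# gdbserver. GDB_PORT and DNSMASQ_CONF are assumed values; adjust both.
GDB_PORT="${GDB_PORT:-2345}"
DNSMASQ_CONF="${DNSMASQ_CONF:-/var/etc/dnsmasq.conf.cfg01411c}"

# Print the command line instead of executing it.
# -k keeps dnsmasq in the foreground, -C selects the config file.
gdbserver_cmd() {
    printf 'gdbserver :%s /usr/sbin/dnsmasq -k -C %s\n' "$GDB_PORT" "$DNSMASQ_CONF"
}

gdbserver_cmd
```

Stop the regular service first, run the printed command on the router, then on the build host start the matching cross gdb against the unstripped binary and connect with `target remote <router-ip>:2345`.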

Something I have noticed is that if a filter (which becomes a server line) is not accessible at the time the instance is launching, then that instance will bail. It is not an actual crash, but it will be an unexpected exit for any supervisor process. So if you have a dependency going on (one dnsmasq instance is needed in order to resolve another dnsmasq's filter or server), then the dependent instance will bail until the server is available.

In your case it may be that the listeners on ports 505X are not yet available when the dnsmasq instance(s) are trying to start.
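One way to check this, sketched below: feed the output of `netstat -lnu` into a small filter that prints every expected port with no listener. The function name `missing_listeners` is made up for this example, and the port list comes from the `list server` lines in the posted config.

```shell
#!/bin/sh
# Sketch only: report which expected upstream ports have no listener.
# Reads netstat-style text on stdin; $1 is a space-separated list of ports.
missing_listeners() {
    awk -v ports="$1" '
        BEGIN { n = split(ports, want, " ") }
        # Column 4 is the local address, e.g. 127.0.0.1:5053 or :::53.
        $1 ~ /^(tcp|udp)/ { port = $4; sub(/.*:/, "", port); seen[port] = 1 }
        END { for (i = 1; i <= n; i++) if (!(want[i] in seen)) print want[i] }
    '
}
```

On the router this could be run right before dnsmasq starts, for example `netstat -lnu | missing_listeners "5053 5054 5055 5056"`. Empty output means all four forwarders were bound at that moment.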

I’m using the standard OpenWrt kmod-r8169 driver:

  • kmod-r8169 6.6.119-r1
  • r8169-firmware 20241110-r2

The network is stable on both interfaces — no link drops, no flapping, no carrier loss. So Realtek vendor drivers are not involved here.

I’m also not using any containers (no Docker, LXC, Podman, etc.), so there are no virtual interfaces being created by container runtimes.

The system does use multiple netlink tables (around 10) because of nftables sets and routing policies, but the interfaces themselves remain stable.

There are always specialists present on any technical forum — OpenWrt is no exception.
I’m not addressing casual users, but those who actually understand the deeper parts of OpenWrt configuration and dnsmasq behavior.

That’s why I’m asking here: this community includes people with real hands‑on experience, and I’m hoping to hear from those who are familiar with these specific mechanisms.

Thanks — yes, enabling coredumps and analysing them with gdb is definitely the next step.

I’m aware that the dnsmasq binary shipped in OpenWrt is stripped, so for proper symbol resolution I’ll need to build dnsmasq locally and deploy the unstripped version (or at least keep the matching build tree for gdb). That part is clear.

Remote gdb is also an option — launching dnsmasq under gdbserver and attaching from the build system would allow inspecting variables and state right at the moment of the crash.

I’ll prepare the environment for that and try to capture a reproducible crash with a matching unstripped binary.

That’s a good point — dnsmasq will indeed exit immediately if a server= target is unreachable at startup, and that can look like a crash from the outside. I’ve tested that scenario.

In my case the upstream listeners on ports 5053–5056 (DoH/DoT forwarders) are already up and accepting connections before dnsmasq starts. They are supervised separately and start earlier in the boot sequence, so dnsmasq doesn’t depend on them to resolve its own filters.

I also reproduced the crash during normal runtime, not only at startup — dnsmasq can run for hours and then suddenly disappear even though all upstream ports remain available and healthy.

So the “unavailable server= dependency” scenario doesn’t seem to be the trigger here, but I agree it’s something worth checking.

You have no crash recorded in your logs, so this seems a pretty pointless chore as of now.

The problem is exactly that dnsmasq does crash, but leaves absolutely no trace in syslog or dmesg. That’s why I’m trying to understand what additional mechanisms I can enable to capture the actual reason — coredumps, gdb, unstripped binaries, or anything else that can record the moment of failure.

Without logs it’s impossible to diagnose, so the goal is to figure out how to make dnsmasq produce meaningful crash evidence.

If you know a reliable way to force dnsmasq to leave a trace when it exits unexpectedly, I’d appreciate the guidance.
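One low-tech option, sketched here under assumptions: stop the service and run dnsmasq in the foreground from a small wrapper that records how the process ended. A status above 128 means the shell saw it killed by a signal (status minus 128), anything else is dnsmasq's own exit code. The names `describe_exit` and `run_and_report` are invented for this example.

```shell
#!/bin/sh
# Sketch only: turn a shell exit status into a readable cause.
describe_exit() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "clean exit"
    elif [ "$status" -gt 128 ]; then
        # The shell reports death by signal N as status 128 + N.
        sig=$((status - 128))
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
    else
        echo "exited with code $status"
    fi
}

# Hypothetical wrapper: run a command in the foreground and log how it ended.
run_and_report() {
    "$@"
    rc=$?
    echo "$(date '+%F %T') '$1' ended: $(describe_exit "$rc")"
    return "$rc"
}
```

Usage would be something like `run_and_report /usr/sbin/dnsmasq -k -C <generated config> >> /tmp/dnsmasq-exit.log`, where the config path is whatever OpenWrt generated for the instance. A "killed by signal 15" or "signal 9" entry would point at something external terminating the process, while a small non-zero code can be looked up in the EXIT CODES section of the dnsmasq man page.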

Show your AI slop monitoring script. procd services don't disappear silently.