I’m experiencing repeated dnsmasq crashes on my OpenWrt router, and I’m trying to understand the root cause. I’ve already collected detailed logs, crash snapshots, and strace attempts, but the failure reason is still unclear.
Problem summary
dnsmasq runs normally for some time
suddenly disappears (no process, port 53 not listening)
my monitoring script detects the crash and restarts dnsmasq
this may repeat several times in a short period
crash passports show “Unknown reason” or “Process missing”
strace sometimes attaches, but often produces only a single line like:
epoll_pwait(3,
and nothing else
Environment
OpenWrt (stable release)
dnsmasq-full
multiple upstream DNS services (DoH/DoT forwarders)
nftables firewall with large dynamic sets
no OOM events, RAM usage is low
two dnsmasq processes normally run (main + helper)
What I already checked
RAM, CPU, load → normal
no kernel OOM
no segfaults in dmesg
no relevant messages in system log
netlink sockets look normal
dnsmasq logs do not show any fatal errors
strace often fails to capture anything meaningful
crash snapshots show dnsmasq simply disappears without trace
Example crash moment
Port 53 suddenly stops listening, both dnsmasq PIDs vanish, and my script reports:
dnsmasq: not running
WARNING: port 53 not listening!
dnsmasq CRASH detected — reason: Unknown
What I need help with
I’m trying to understand why dnsmasq is crashing silently and what additional diagnostics I should collect.
Could you please advise:
Which logs or debug flags should I enable in dnsmasq to capture internal errors?
Is there a recommended way to run dnsmasq with full debugging (e.g., --log-queries, --log-dhcp, --log-debug)?
Could nftables dynamic sets or large rulesets cause dnsmasq to crash?
Is it normal to have two dnsmasq processes (main + helper)?
How can I reliably capture the exact moment of failure?
strace
gdb
core dumps
procd logging
Are there known issues in recent OpenWrt builds that could cause dnsmasq to exit without logs?
Any guidance on what to check next or how to instrument dnsmasq properly would be greatly appreciated.
The very first thing we'll need is some of the config info:
Please connect to your OpenWrt device using ssh and copy the output of the following commands and post it here using the "Preformatted text </> " button (red circle; this works best in the 'Markdown' composer view in the blue oval):
Remember to redact passwords, VPN keys, MAC addresses and any public IP addresses you may have:
ubus call system board
cat /etc/config/network
cat /etc/config/dhcp
This configuration is extremely unusual. Where did all of this come from? Did you import it from the FriendlyWrt firmware, or is this all configured from scratch on official OpenWrt? And how did you arrive at all the config options here? Did you follow any guides? AI? Something else?
This setup is built on official OpenWrt, not FriendlyWRT or any other fork.
All configuration was done manually, step‑by‑step, over time — nothing was imported, auto‑generated, or created by AI.
If something in the configuration looks unusual to you, please specify exactly which part concerns you or seems incorrect.
I’m open to clarifying any section, but I need to understand what specifically you think is problematic.
There is just a huge amount of uncommon stuff going on. I can’t say whether any of it is suspect, but the dnsmasq config seems like it is potentially harboring some errors. I can’t say what, though, as I don’t use DoH. Maybe someone else will spot an issue that could be causing the crash.
Yes, the setup is more complex than a typical home OpenWrt installation — but it’s all built manually on official OpenWrt, not imported from FriendlyWRT or generated by AI. Every component was added step‑by‑step over time while experimenting with DNS architecture, DoH/DoT chains, nftables sets, and monitoring.
If something in the dnsmasq configuration looks suspicious to you, I’d appreciate it if you could point out which part specifically raises concerns. I’m fully open to reviewing any section, but I need to know what exactly seems unusual from your perspective.
Right now the main issue is that dnsmasq crashes silently without logs, even with DoH/DoT disabled during testing. So if anyone spots a potential misconfiguration or knows what additional diagnostics I should collect, I’m definitely interested.
The configuration may look unusual, but it is fully understood and maintained manually.
A real specialist can immediately point out what is correct, what is unnecessary, and what should or should not be enabled — and I’ve already received such feedback from people who actually understand dnsmasq and OpenWrt internals.
Your message, however, doesn’t provide any concrete observations or technical points.
If you see a specific misconfiguration, please point it out directly.
General comments like “this looks strange” don’t help with diagnosing the crash.
You say huge sets. How huge exactly?
You have to show your monitoring script. It is totally impossible for dnsmasq to stop listening on a port and then still churn through epoll(), so most likely your (totally not AI slop) monitoring script just kills it.
Then in other news...
As for the 2nd: no, procd would log the crash and restart it.
You can set coredumps to be created, and then you can analyse the coredumps with gdb.
Ideally you use a locally compiled dnsmasq on the router, so that you have the non-stripped binary in your build system and the symbol table matches. (On the router, stripped binaries are normally installed to save space.)
You can analyse coredumps, but you could also use remote gdb to launch dnsmasq, so that you have gdb control in the build system and can look at the variables etc. at or after the crash.
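For reference, a minimal sketch of the coredump part, assuming a standard Linux kernel with the `kernel.core_pattern` and `fs.suid_dumpable` sysctls and enough free space in /tmp. The directory and the function names are just placeholders, not anything OpenWrt ships:

```shell
#!/bin/sh
# Sketch only: make the kernel write a core file when a process dies on a signal.
# CORE_DIR and both function names are illustrative assumptions.
CORE_DIR="${CORE_DIR:-/tmp}"

# Build the core_pattern value for a directory:
# %e = executable name, %p = PID, %s = signal number, %t = UNIX timestamp.
core_pattern_for() {
    printf '%s/%%e.%%p.%%s.%%t.core' "$1"
}

# Apply the settings (needs root). Nothing is changed until this is called.
enable_core_dumps() {
    sysctl -w "kernel.core_pattern=$(core_pattern_for "$CORE_DIR")"
    # dnsmasq drops privileges after start, so dumps for such processes
    # have to be allowed explicitly.
    sysctl -w fs.suid_dumpable=2
    # Only affects this shell and the processes started from it.
    ulimit -c unlimited
}
```

Call `enable_core_dumps` as root before starting dnsmasq from the same shell. Since `ulimit -c` is not inherited by services that procd starts, a procd-managed instance most likely needs the limit set in its init script instead (I believe `procd_set_param limits core="unlimited"` is the usual way, but check that against your release).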
Something I have noticed is that if a filter (which becomes a server line) is not accessible at the time the instance is launching, then that instance will bail (not an actual crash, but an unexpected exit from the point of view of any supervisor process). So if you have a dependency going on (one dnsmasq instance is needed in order to resolve another dnsmasq's filter, i.e. its server), the dependent instance will bail until the server is available.
In your case it may be that the listeners on ports 505X are not yet available when the dnsmasq instance(s) are trying to start.
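If that turns out to be the cause, one way to rule it out is to wait for the upstream listeners before starting dnsmasq. A rough sketch, assuming BusyBox `netstat` and `awk` are available on the router; port 5053 and the function names are only examples:

```shell
#!/bin/sh
# Sketch only: check for a listener on a local port before starting dnsmasq.

# Read `netstat -ln` style output on stdin and succeed if any local
# address (4th column) ends in the given port.
port_is_listening() {
    awk -v port="$1" '
        { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
        END { exit !found }
    '
}

# Poll once per second until the port is listening or the tries run out.
# Usage: wait_for_port PORT [TRIES]
wait_for_port() {
    port="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        if netstat -lntu 2>/dev/null | port_is_listening "$port"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_port 5053 30 || logger -t dnsmasq-wait "nothing listening on 5053"` would leave a syslog line whenever the forwarder was not up within 30 seconds.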
There are always specialists present on any technical forum — OpenWrt is no exception.
I’m not addressing casual users, but those who actually understand the deeper parts of OpenWrt configuration and dnsmasq behavior.
That’s why I’m asking here: this community includes people with real hands‑on experience, and I’m hoping to hear from those who are familiar with these specific mechanisms.
Thanks — yes, enabling coredumps and analysing them with gdb is definitely the next step.
I’m aware that the dnsmasq binary shipped in OpenWrt is stripped, so for proper symbol resolution I’ll need to build dnsmasq locally and deploy the unstripped version (or at least keep the matching build tree for gdb). That part is clear.
Remote gdb is also an option — launching dnsmasq under gdbserver and attaching from the build system would allow inspecting variables and state right at the moment of the crash.
I’ll prepare the environment for that and try to capture a reproducible crash with a matching unstripped binary.
That’s a good point — dnsmasq will indeed exit immediately if a server= target is unreachable at startup, and that can look like a crash from the outside. I’ve tested that scenario.
In my case the upstream listeners on ports 5053–5056 (DoH/DoT forwarders) are already up and accepting connections before dnsmasq starts. They are supervised separately and start earlier in the boot sequence, so dnsmasq doesn’t depend on them to resolve its own filters.
I also reproduced the crash during normal runtime, not only at startup — dnsmasq can run for hours and then suddenly disappear even though all upstream ports remain available and healthy.
So the “unavailable server= dependency” scenario doesn’t seem to be the trigger here, but I agree it’s something worth checking.
The problem is exactly that dnsmasq does crash, but leaves absolutely no trace in syslog or dmesg. That’s why I’m trying to understand what additional mechanisms I can enable to capture the actual reason — coredumps, gdb, unstripped binaries, or anything else that can record the moment of failure.
Without logs it’s impossible to diagnose, so the goal is to figure out how to make dnsmasq produce meaningful crash evidence.
If you know a reliable way to force dnsmasq to leave a trace when it exits unexpectedly, I’d appreciate the guidance.
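One low-tech option, sketched under assumptions: run dnsmasq in the foreground under a small wrapper that records how the process ended. `--keep-in-foreground` is an existing dnsmasq option; the wrapper itself, its function names and the `dnsmasq-wrap` log tag are made up for illustration.

```shell
#!/bin/sh
# Sketch only: run a command in the foreground and report how it ended.

# Translate a shell exit status into a readable reason.
# A status above 128 means the child was terminated by signal (status - 128).
describe_exit() {
    rc="$1"
    if [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
    elif [ "$rc" -eq 0 ]; then
        echo "clean exit"
    else
        echo "exited with code $rc"
    fi
}

# Run the given command, then write the reason to syslog and stdout.
run_and_report() {
    rc=0
    "$@" || rc=$?
    msg="$1 ended: $(describe_exit "$rc")"
    logger -t dnsmasq-wrap "$msg" 2>/dev/null || true
    echo "$msg"
    return "$rc"
}
```

It could be started as `run_and_report /usr/sbin/dnsmasq --keep-in-foreground -C <generated config file>`, with the config path taken from the running instance's command line. If the report says signal 11 (SEGV), that points to a real crash and a core dump is worth chasing; signal 9 (KILL) always comes from outside the process and cannot be logged by dnsmasq itself, which would match an exit with no trace.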