tl;dr: How can I make OpenWrt more conservative/cautious when applying changes to services, and/or set up monitoring/notifications when configs go awry?
So last night I made a quick, innocent change in the LuCI web UI. I clicked a single button to convert a dynamic lease into a static one, clicked save, and went to bed. This morning it became apparent that dnsmasq had silently crashed in the background, without any warning or notification, the moment I applied the static assignment. All my servers went offline while I slept, and I was confused for a solid 20 minutes trying to get my desktop online when I woke.
The whole problem was simply that the hostname I added already existed. I can't blame dnsmasq for refusing to start when it detects a bad configuration. But the LuCI UI gave me no indication that anything went wrong when I applied the changes. No indication that dnsmasq didn't like the new config, no indication that dnsmasq wasn't even running.
So my real question is this: what can I do to set up an early warning system? Perhaps send an email to some address when the logs show that dnsmasq or another service has crashed? Perhaps a visual indicator somewhere in the LuCI UI? Even better, some method of having LuCI test out a configuration for a moment before actually saving it (I imagine it could create a backup of the original config somewhere, then restore that config if services fail to start).
If nothing else, I'll probably write a script on my main server to periodically poke OpenWrt and ask whether its services are running, then email me on failure. Hopefully there's already a premade solution somewhere. I think I remember an old program named "monit" that used to do something similar.
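The "poke it from another machine" idea could look roughly like the Python sketch below. It rests on assumptions: the router address `192.168.1.1`, the mail server address `192.168.1.10`, the `example.com` mail addresses, the probe name `openwrt.lan`, and all function names are made up for illustration. It sends one raw DNS query to the router and mails an alert if nothing answers. Both machines are addressed by IP, since name resolution is exactly the thing that may be down.

```python
import smtplib
import socket
import struct
from email.message import EmailMessage


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS A-record query packet for hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_is_answering(server_ip, hostname="openwrt.lan", timeout=3.0, port=53):
    """Return True if any DNS reply to our query arrives before the timeout."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and "connection refused"
            return False
    # Accept any reply carrying our transaction id with the response bit set;
    # even an NXDOMAIN answer proves the DNS service is alive.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


def send_alert(mail_server_ip, sender, recipient, subject, body):
    """Send a plain-text mail through an SMTP server addressed by IP."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(mail_server_ip, 25, timeout=10) as smtp:
        smtp.send_message(msg)


def check_and_alert(router_ip="192.168.1.1", mail_server_ip="192.168.1.10"):
    """Probe the router's DNS once; mail an alert if it does not answer."""
    if dns_is_answering(router_ip):
        return True
    send_alert(
        mail_server_ip,
        "monitor@example.com",  # hypothetical sender
        "admin@example.com",    # hypothetical recipient
        "DNS on router is not answering",
        f"No DNS reply from {router_ip}; dnsmasq may be down.",
    )
    return False
```

Calling `check_and_alert()` from a cron job every few minutes would give a crude external watchdog. It only tests whether DNS answers at all, so it says nothing about DHCP or any other service.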
That "bump" thing does nothing here...what does it mean anyway?
Since dnsmasq is the DNS server, you'll have to be sure to set up your notification so that it reaches the mail server by IP rather than by hostname.
You ask whether you can set up such a system: sure. It's easier not to make a bad config, though. As I described...just have the machine send you an email if there's no running dnsmasq process - the mail server will have to be addressed by IP.
(Unless you're asking someone to write such a thing for you...then you may need to "bump" more...I've been staring for days wondering "what is the question - since everything described is possible"?)
"Bump" is typically an empty post intended to help keep an unanswered post nearer the top of the list (via having a fresh reply), so it doesn't get lost. The original reply was unhelpful/vacuous and snide, so I considered this post unanswered.
Thank you. I'll keep that in mind.
Yeah I've gone ahead and tasked monit to watch the processes, send notifications, and attempt to restart things. I should add a hosts entry for my email server though, good point.
I hoped the point would have been obvious: not whether something is possible, but how people here are choosing to handle the situation. Now that I've implemented monit, I might reply to future posts with the general idea of how I've set it up and what it does. I was hoping someone knew of something "better".
Also the question remains partially unanswered:
How can I get LuCI to visually indicate a downed service?
Is there a way to "try" a config first, before breaking a service? Example: Apache has "httpd -S" or "apachectl -S" to help indicate whether your config is OK. I'm wondering what the analogs are for dnsmasq and other services, and whether there is a way to get LuCI to try them before applying changes.
Why do you think it has such a thing in the first place? Not every service has anything like that. In fact, most of them don't. If you want dnsmasq to have something similar, you should take it up with the dnsmasq devs.
Yeah, thanks. I've been looking into the log, have monit running, all that now. It would just be so so so much better if LuCI gave any sort of visual indication of a crash. As it stands, I'm in LuCI making changes, then have to log in via SSH to check things ... which kinda defeats the convenience of having LuCI in the first place.
I know it's not what you are after, but I always keep backups of dhcp, network, and other working configuration files, so if I make a mistake, either by editing files or by using LuCI, I can quickly revert to a previous working setup.
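A rough sketch of that backup habit in Python, for anyone who wants to automate it. It assumes the config files are reachable as ordinary files from wherever the script runs (for example after copying `/etc/config` off the router); the function name `backup_configs` and the directory layout are invented for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_configs(config_dir, backup_root, names=("dhcp", "network")):
    """Copy the named config files into a new timestamped backup directory.

    Returns the backup directory and the list of file names actually copied.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / stamp
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(config_dir) / name
        if src.is_file():  # silently skip files that do not exist
            shutil.copy2(src, dest / name)  # copy2 keeps timestamps
            copied.append(name)
    return dest, copied
```

Reverting is then just a matter of copying the files from the chosen timestamped directory back into place and restarting the affected services.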
There aren't, and that's why LuCI does not try to. Unfortunately there's no validation at all on the OpenWrt backend side, so LuCI has to do all syntactic and semantic validation in the frontend. There's also no "config test mode" for services, so no way to test things upfront.
Even if there were such a mechanism, it wouldn't be clear how to handle it in the UI, short of forcibly rolling back any changes the user has made and refusing to save them.
Your particular combination of values unfortunately "slipped through" and validation in the dnsmasq config related views must be extended to catch this semantic problem in the future.
I just tried to replicate your issue by creating a dummy static lease with a different MAC and IP, using the same hostname as a current dynamic one. Then I turned the dynamic one static using the "Set static" button, and dnsmasq happily restarted, printing this log line:
Wed Jun 22 00:17:42 2022 daemon.warn dnsmasq-dhcp: not giving name example.lan to the DHCP lease of 10.11.12.176 because the name exists in /tmp/hosts/dhcp.cfg01411c with address 10.11.12.254
So I assume the behavior has been changed/fixed upstream in dnsmasq to turn this name collision into a warning.
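For anyone on an older dnsmasq who wants to catch this kind of name collision before applying, here is a rough Python sketch of the semantic check discussed above. It is not LuCI's validation code. It assumes static leases are stored as `config host` sections with `option name` and `option ip` lines, and it uses a deliberately simplified line parser that ignores `list` entries and multi-line values. The function name is made up.

```python
import re
from collections import defaultdict

SECTION_RE = re.compile(r"^\s*config\s+(\S+)")
OPTION_RE = re.compile(r"^\s*option\s+(\S+)\s+(.*?)\s*$")


def _unquote(value):
    """Strip one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def find_duplicate_host_names(uci_text):
    """Return {hostname: [ip, ...]} for names used by more than one host."""
    hosts = []      # one dict of options per `config host` section
    current = None  # options of the host section being read, if any
    for line in uci_text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            current = {} if _unquote(section.group(1)) == "host" else None
            if current is not None:
                hosts.append(current)
            continue
        option = OPTION_RE.match(line)
        if option and current is not None:
            current[option.group(1)] = _unquote(option.group(2))

    by_name = defaultdict(list)
    for host in hosts:
        name = host.get("name")
        if name:
            # Hostnames are compared case-insensitively.
            by_name[name.lower()].append(host.get("ip", "?"))
    return {name: ips for name, ips in by_name.items() if len(ips) > 1}
```

Feeding it the contents of the dhcp config file yields an empty dict when every static lease name is unique, and otherwise a mapping from each clashing name to the addresses that claim it.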