Hey I'm running the following setup.
Model: ELECOM WRC-2533GST2
Architecture: MediaTek MT7621 ver:1 eco:3
Target Platform: ramips/mt7621
Firmware Version: OpenWrt 22.03.2 r19803-9a599fee93
I've made quite a few customizations, including openvpn, adblock, etc. They don't directly relate to what I am attempting to do, and I'm uncertain whether they are the cause of the problem.
The problem occurs intermittently when I connect to the OpenWrt box via ssh and execute "wifi down".
Every so often dnsmasq then dies for good: procd reports a crash loop and does not restart it. I've included the exact log output below.
Sun Feb 5 19:41:08 2023 kern.info kernel: [290008.019012] device wlan0 left promiscuous mode
Sun Feb 5 19:41:08 2023 kern.info kernel: [290008.028743] br-lan: port 5(wlan0) entered disabled state
Sun Feb 5 19:41:08 2023 daemon.notice netifd: Wireless device 'radio1' is now down
Sun Feb 5 19:41:09 2023 daemon.notice netifd: Network device 'wlan0' link is down
Sun Feb 5 19:41:09 2023 daemon.notice netifd: Wireless device 'radio0' is now down
Sun Feb 5 19:41:11 2023 daemon.info dnsmasq[1]: exiting on receipt of SIGTERM
Sun Feb 5 19:41:11 2023 daemon.info procd: Instance dnsmasq::cfg01411c s in a crash loop 6 crashes, 0 seconds since last crash
The error does not always happen; sometimes dnsmasq exits and restarts fine:
Sun Feb 5 21:06:27 2023 kern.info kernel: [295126.929874] device wlan0 left promiscuous mode
Sun Feb 5 21:06:27 2023 kern.info kernel: [295126.939353] br-lan: port 4(wlan0) entered disabled state
Sun Feb 5 21:06:27 2023 daemon.notice netifd: Network device 'wlan0' link is down
Sun Feb 5 21:06:27 2023 daemon.notice netifd: Wireless device 'radio1' is now down
Sun Feb 5 21:06:28 2023 daemon.notice netifd: Wireless device 'radio0' is now down
Sun Feb 5 21:06:29 2023 daemon.info dnsmasq[1]: exiting on receipt of SIGTERM
Sun Feb 5 21:06:29 2023 daemon.info dnsmasq[1]: started, version 2.86 cachesize 150
Sun Feb 5 21:06:29 2023 daemon.info dnsmasq[1]: DNS service limited to local subnets
EDIT:
Just to be absolutely clear, there were no other instances of dnsmasq crashing in the hour prior to the error, so this is not a matter of 5 errors occurring within the default 3600-second period specified for respawn_threshold. The only prior instance was the following, roughly 90 minutes earlier:
Sun Feb 5 18:11:52 2023 daemon.info dnsmasq[1]: exiting on receipt of SIGTERM
Sun Feb 5 18:11:57 2023 daemon.info dnsmasq[1]: started, version 2.86 cachesize 150
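A quick way to double-check how often dnsmasq received a SIGTERM is to count the matching lines in the system log. The following is only a sketch: count_dnsmasq_sigterms is a helper name I made up, and it assumes the log format shown in the excerpts in this post.

```shell
#!/bin/sh
# Hypothetical helper: counts dnsmasq SIGTERM exits in log text read from stdin.
count_dnsmasq_sigterms() {
	grep -c 'dnsmasq\[[0-9]*]: exiting on receipt of SIGTERM'
}

# On the router this would be fed from the live log:
#   logread | count_dnsmasq_sigterms
# Here it runs on two lines copied from the log excerpt in this post.
sample_log='Sun Feb 5 18:11:52 2023 daemon.info dnsmasq[1]: exiting on receipt of SIGTERM
Sun Feb 5 18:11:57 2023 daemon.info dnsmasq[1]: started, version 2.86 cachesize 150'

printf '%s\n' "$sample_log" | count_dnsmasq_sigterms
```

Piping the output of logread through grep for a specific hour first would narrow the count to the window that matters for respawn_threshold.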
In the 18:11 case above, where it restarted successfully, the respawn took exactly 5 seconds.
In the successful 21:06 respawn further up (the one after calling "wifi down"), however, the restart after receiving the SIGTERM seems to have taken 0 seconds, not 5.
It seems that after receiving a SIGTERM because an interface went down, it does not use the usual respawn_timeout of 5 seconds, but 0 seconds. respawn_retry still looks like 5, since the log says it failed 6 times.
It's also unclear to me how it failed 6 times with 0 seconds since the last failure if it were using the procd.sh parameters. Based on what I read online, procd.sh should default respawn_timeout to 5 seconds, so the last respawn attempt should have been 5 seconds earlier, not 0 seconds earlier as the error message says. But perhaps this is just a quirk of which timestamp that error message checks against.
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=package/system/procd/files/procd.sh
local respawn_vals
_json_no_warning=1
if json_select respawn ; then
	json_get_values respawn_vals
	if [ -z "$respawn_vals" ]; then
		local respawn_threshold=$(uci_get system.@service[0].respawn_threshold)
		local respawn_timeout=$(uci_get system.@service[0].respawn_timeout)
		local respawn_retry=$(uci_get system.@service[0].respawn_retry)
		_procd_add_array_data ${respawn_threshold:-3600} ${respawn_timeout:-5} ${respawn_retry:-5}
	fi
	json_select ..
fi

json_close_object
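As I read that snippet, when an init script sets respawn without any values, each of the three parameters is first looked up in the first service section of the system uci config and then falls back to 3600, 5 and 5. The stand-alone sketch below only mimics that fallback with plain shell parameter expansion. resolve_respawn is a made-up name and is not part of procd.sh.

```shell
#!/bin/sh
# Hypothetical stand-in for the fallback in procd.sh: an empty argument takes
# the default 3600 (threshold), 5 (timeout) or 5 (retry).
resolve_respawn() {
	echo "${1:-3600} ${2:-5} ${3:-5}"
}

resolve_respawn "" "" ""     # nothing configured
resolve_respawn "" "" 30     # only respawn_retry overridden
```

If that reading is right, a timeout of 0 would have to come from somewhere other than these defaults.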
Anyway, what I would like to do is change the respawn_retry parameter for dnsmasq so that it gets 30 retries instead of just 5, and also make sure respawn_timeout is at least 1 second (possibly the default 5 seconds) and not 0 seconds. I'm hopeful this will stop dnsmasq from dying and leaving all devices on my network stranded.
I'm unsure where to make this change though. I looked at the dnsmasq init.d script and it doesn't appear to supply any specific respawn parameters. I'm unsure if this means it should be using the defaults or not.
/etc/init.d/dnsmasq
procd_open_instance $cfg
procd_set_param command $PROG -C $CONFIGFILE -k -x /var/run/dnsmasq/dnsmasq."${cfg}".pid
procd_set_param file $CONFIGFILE
[ -n "$user_dhcpscript" ] && procd_set_param env USER_DHCPSCRIPT="$user_dhcpscript"
procd_set_param respawn
Is the most appropriate place to set my desired respawn_retry and respawn_timeout inside /etc/init.d/dnsmasq, as additional "procd_set_param" statements after the ones pasted above, or is there a better place that doesn't require modifying the init.d script for dnsmasq?
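For reference, the init.d variant I have in mind would look roughly like this. It assumes that "procd_set_param respawn" accepts the three values in the order threshold, timeout, retry, matching the order of the defaults in the procd.sh snippet above. I have not confirmed that it fixes anything. The stub function is only there so the fragment can be run outside OpenWrt.

```shell
#!/bin/sh
# Stub so this runs off-router; on OpenWrt the real function comes from procd.sh.
type procd_set_param >/dev/null 2>&1 || procd_set_param() {
	echo "procd_set_param $*"
}

respawn_threshold=3600   # seconds; same as the default
respawn_timeout=5        # seconds; same as the default, explicitly not 0
respawn_retry=30         # retries instead of the default 5

# Would replace the bare "procd_set_param respawn" line in /etc/init.d/dnsmasq.
procd_set_param respawn "$respawn_threshold" "$respawn_timeout" "$respawn_retry"
```

Alternatively, going by the uci_get calls in the procd.sh snippet, setting respawn_threshold, respawn_timeout and respawn_retry as options in a service section of /etc/config/system might change the defaults without touching the init script, though presumably for every service that uses a bare respawn, not just dnsmasq. I have not tried that.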
Or perhaps changing the procd respawn settings is not related to my error at all, and there is some other respawn mechanism baked into dnsmasq for when an interface goes down. Perhaps I need to add some manual delay to this or increase the retry count....
Thanks in advance. Also, if anyone has any tips to prevent it from crashing intermittently, I'd be more than happy to see if they help.