Thanks for trying, but the script isn't running: I haven't set up the cron job yet. BTW, it's the vanilla script from the high-availability page of the OpenWrt wiki.
Also, there are currently no DHCP clients on the network, so no client can trigger this. I have only one wired host in the r1-r2-r3 collision domain, i.e. another router with a static IP, NATing a bunch of servers and a couple of my terminals; but from the r1-r2-r3 perspective that's a single host. No AP is configured on r1-r2-r3 either. No clients.
In the meantime I noticed that rebooting r1 and r2 gets dnsmasq SIGTERM'ed on r3. I just triple-checked: it happened 3 times out of 3 reboots. When this happens, r3 becomes VRRP MASTER for both virtual IPs, since both higher-priority routers are gone. But dnsmasq dies. Logread:
Mon Nov 13 03:36:24 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:24 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:29 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:29 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:34 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:34 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:36 2023 daemon.info Keepalived_vrrp[2778]: (primary) Entering MASTER STATE
Mon Nov 13 03:36:36 2023 daemon.info Keepalived_vrrp[2778]: (secondary) Entering MASTER STATE
Mon Nov 13 03:36:39 2023 kern.info kernel: [ 1046.420306] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:36:48 2023 kern.info kernel: [ 1055.780008] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:14 2023 kern.info kernel: [ 1081.780181] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:37:18 2023 kern.info kernel: [ 1085.939925] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:19 2023 kern.info kernel: [ 1086.980171] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:37:38 2023 kern.info kernel: [ 1105.699887] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:40 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:40 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:45 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:45 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:46 2023 daemon.info Keepalived_vrrp[2778]: (primary) Master received advert from 192.168.168.250 with higher priority 255, ours 253
Mon Nov 13 03:37:46 2023 daemon.info Keepalived_vrrp[2778]: (primary) Entering BACKUP STATE
Mon Nov 13 03:37:47 2023 authpriv.info dropbear[6369]: Early exit: Terminated by signal
Mon Nov 13 03:37:47 2023 authpriv.info dropbear[8522]: Not backgrounding
Mon Nov 13 03:37:50 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:50 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:55 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:37:55 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:00 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:00 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:05 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:05 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:10 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:10 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:15 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:15 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:20 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:20 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:25 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:25 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:26 2023 daemon.info Keepalived_vrrp[2778]: (secondary) Master received advert from 192.168.168.249 with higher priority 255, ours 253
Mon Nov 13 03:38:26 2023 daemon.info Keepalived_vrrp[2778]: (secondary) Entering BACKUP STATE
Mon Nov 13 03:38:26 2023 authpriv.info dropbear[8522]: Early exit: Terminated by signal
Mon Nov 13 03:38:27 2023 authpriv.info dropbear[8805]: Not backgrounding
Mon Nov 13 03:38:30 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:30 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:35 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:35 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:40 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:40 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:45 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:45 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:50 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:50 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:55 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:38:55 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:00 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:00 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:05 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:05 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:10 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:10 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:16 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:16 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:20 2023 daemon.warn odhcpd[1800]: No default route present, overriding ra_lifetime!
Mon Nov 13 03:39:21 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:39:21 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Apart from the flood of "Printing VRRP as json..." messages, the log shows:
- r3 becoming MASTER for both virtual IPs (named "primary" and "secondary").
- the Ethernet link going down/up 3 times. That's the cable connecting r1 to r3: while r1 reboots, r3 loses the Ethernet link. That port is in r3's br-lan bridge, i.e. it belongs to the "lan" interface on r3, the one dnsmasq serves DHCP on.
- r3 going back to BACKUP state for both virtual IPs as soon as it receives VRRP advertisements from r1 and r2 again.
- my SSH session to dropbear being reset (my terminal is also wired to r1, so it loses link too when I reboot r1).
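The MASTER/BACKUP flips in the log follow VRRP's standard rule (RFC 5798, section 6.4.3): a Master that receives an advertisement with a higher priority than its own, or with an equal priority from a sender with a higher primary address, drops to Backup. A minimal sketch of that decision, fed with the priorities from the log; the function name and the address used for r3 are hypothetical, and keepalived-specific options are not modelled.

```python
from ipaddress import IPv4Address


def master_should_yield(local_prio, local_ip, advert_prio, advert_ip):
    """Return True if a VRRP Master should drop to Backup on this advert.

    Sketch of RFC 5798 sec. 6.4.3: a higher advertised priority wins; on a
    tie, the sender with the greater primary IP address wins. Priority 0
    (a Master resigning) never makes the receiver step down.
    """
    if advert_prio == 0:
        return False
    if advert_prio != local_prio:
        return advert_prio > local_prio
    return IPv4Address(advert_ip) > IPv4Address(local_ip)


R3_IP = "192.168.168.251"  # hypothetical address for r3; not taken from the log

# "(primary) Master received advert from 192.168.168.250 with higher priority 255, ours 253"
print(master_should_yield(253, R3_IP, 255, "192.168.168.250"))  # True
# Equal priorities: the tie-break goes to the higher address (r3 here)
print(master_should_yield(253, R3_IP, 253, "192.168.168.250"))  # False
```

This is why r3 only holds both virtual IPs while r1 and r2 are silent: the first priority-255 advert from each of them sends r3 back to BACKUP.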
So yes, the wireless interfaces on the WAN side look innocent. It's the LAN Ethernet link resetting on reboot: r1 booting makes r3's Ethernet link go down/up 3 times, the same number as the dnsmasq restarts. procd then treats dnsmasq as crash-looping and doesn't start it anymore.
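To cut through the "Printing VRRP as json" flood and line up the link drops with the VRRP transitions, the relevant events can be pulled out of `logread` output with a small filter. A sketch, assuming the exact message formats in the log above (24-character timestamp prefix, the switch driver's `Port N is up/down` lines, keepalived's `Entering ... STATE` lines); `summarize` and the regex names are made up for illustration, and dnsmasq's own messages are not matched.

```python
import re
from collections import Counter

# Switch driver messages such as "mdio.0:00: Port 1 is down"
PORT_RE = re.compile(r"mdio\.\d+:\d+: Port (\d+) is (up|down)")
# keepalived state changes such as "(primary) Entering MASTER STATE"
VRRP_RE = re.compile(r"Keepalived_vrrp\[\d+\]: \((\w+)\) Entering (\w+) STATE")


def summarize(log_text):
    """Count link-down events per switch port and list VRRP state changes.

    Every other line, including the "Printing VRRP as json" flood, is skipped.
    """
    link_downs = Counter()
    transitions = []
    for line in log_text.splitlines():
        timestamp = line[:24]  # logread prefix, e.g. "Mon Nov 13 03:36:39 2023"
        port = PORT_RE.search(line)
        if port:
            if port.group(2) == "down":
                link_downs[int(port.group(1))] += 1
            continue
        vrrp = VRRP_RE.search(line)
        if vrrp:
            transitions.append((timestamp, vrrp.group(1), vrrp.group(2)))
    return link_downs, transitions


SAMPLE = """\
Mon Nov 13 03:36:34 2023 daemon.info Keepalived_vrrp[2778]: Printing VRRP as json for process(2778) on signal
Mon Nov 13 03:36:36 2023 daemon.info Keepalived_vrrp[2778]: (primary) Entering MASTER STATE
Mon Nov 13 03:36:36 2023 daemon.info Keepalived_vrrp[2778]: (secondary) Entering MASTER STATE
Mon Nov 13 03:36:39 2023 kern.info kernel: [ 1046.420306] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:36:48 2023 kern.info kernel: [ 1055.780008] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:14 2023 kern.info kernel: [ 1081.780181] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:37:18 2023 kern.info kernel: [ 1085.939925] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:19 2023 kern.info kernel: [ 1086.980171] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is down
Mon Nov 13 03:37:38 2023 kern.info kernel: [ 1105.699887] Atheros AR8216/AR8236/AR8316 mdio.0:00: Port 1 is up
Mon Nov 13 03:37:46 2023 daemon.info Keepalived_vrrp[2778]: (primary) Entering BACKUP STATE
Mon Nov 13 03:38:26 2023 daemon.info Keepalived_vrrp[2778]: (secondary) Entering BACKUP STATE
"""

downs, changes = summarize(SAMPLE)
print(dict(downs))  # {1: 3}
for ts, instance, state in changes:
    print(ts, instance, state)
```

Saving the full log on the router with `logread > /tmp/r3.log` and running the filter over a copy of that file on a PC would give the per-port drop count to compare against the number of dnsmasq restarts.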
The proper fix would be a switch between the routers, as it should be. But I don't have one handy: I had to remove the one I had a few days ago... a storm cooked it, and now there's AC on the Ethernet cables connected to it... it's a good Tesla coil rather than a switch...
My peculiar situation (no switch between the 2 routers) aside, my humble advice to the OpenWrt devs is to raise procd's respawn limit to 5, as @hnyman suggested above, and make that the OpenWrt default.
I haven't checked why the Ethernet link goes down/up three times, but I imagine it's fairly normal for a rebooting device that has its Ethernet interface in a bridge. And that's a pretty common setup for consumer-grade equipment (i.e. many users don't have a separate L2 switch and L3 router; they have one device that does it all). A respawn limit is good to have, but 3 seems a bit too strict.
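For context, OpenWrt init scripts configure respawning with `procd_set_param respawn <threshold> <timeout> <retry>`: an instance that exits sooner than `threshold` seconds after starting counts as a crash, procd waits `timeout` seconds before restarting it, and after too many consecutive crashes it gives up on the service. The sketch below is a simplified model of that accounting, not procd's code: the strict `>` comparison, the counter reset on a long run, and `retry = 0` meaning "unlimited" are assumptions, the function name is hypothetical, and the runtimes are invented to compare a retry budget of 3 with one of 5.

```python
from typing import Optional, Sequence


def crash_loop_at(runtimes: Sequence[float], threshold: float, retry: int) -> Optional[int]:
    """Replay instance runtimes (seconds from start to exit), in order.

    Returns the 1-based index of the exit after which the supervisor would
    give up, or None if it keeps respawning. Simplified model: a run shorter
    than `threshold` increments a crash counter, a longer run resets it, and
    the supervisor stops once the counter exceeds `retry` (0 = never stop).
    """
    crashes = 0
    for i, runtime in enumerate(runtimes, start=1):
        crashes = crashes + 1 if runtime < threshold else 0
        if retry > 0 and crashes > retry:
            return i
    return None


# Invented burst of short-lived runs, e.g. restarts caused by link flaps
quick_exits = [20, 30, 10, 25]
print(crash_loop_at(quick_exits, threshold=3600, retry=3))  # 4
print(crash_loop_at(quick_exits, threshold=3600, retry=5))  # None
```

Under this model, any burst of short runs larger than `retry` takes the service down for good, so a few extra retries buy headroom for link flaps without removing the crash-loop protection.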
I'll report back once the issue is solved, to confirm that what I've stated in this message is correct.