Keepalived-sync docs and config examples

Hi!

After a few weeks, it is time for a summary of what I have seen with Keepalived. First, luci-app-keepalived, keepalived-sync and keepalived work, and that is pretty cool. Thank you @jempatel for your great work :slight_smile:

Nevertheless, they do not fully cover my needs, and a few things could be improved:

  • Using sysupgrade -l in the rsync.sh script is too aggressive. I suggest synchronizing only the files from the user-defined sync_list. In my case, the routers have different roles in MASTER and BACKUP modes, and copying all configuration files makes such setups impossible. Using sysupgrade -l could be kept as an option. The workaround is to modify line 46 of the rsync.sh script locally:
	for sync_file in $sync_list $(sysupgrade -l); do
  • Route specification needs to be extended with more options. In my case, metric is useful for masking the BACKUP default route, while preserving the interaction with other subsystems like mwan3.
  • The keepalived configuration syntax has evolved in its latest version and is not completely supported: global_track, for example, has been replaced by other mechanisms, max_auto_priority is not supported, etc.
  • The luci-app-keepalived+keepalived-sync Wiki page does not describe the current usage. Without the links provided by @jempatel, I would probably have given up.
  • NOTIFY actions of type GROUP are not handled by the hotplug scripts, which raises errors in the logs.
  • The sync through ssh and rsync is very noisy and fills the logs with low-value information. It would be nice to have a way to reduce the verbosity:
Sun Feb 11 09:26:45 2024 authpriv.info dropbear[16112]: Child connection from 192.168.1.1:46634
Sun Feb 11 09:26:45 2024 authpriv.notice dropbear[16112]: Pubkey auth succeeded for 'keepalived' with ssh-ed25519 key SHA256:5N2aUWNlcosUvHwRdobELPRuf4V0HizXCGuC27X/sLg from 192.168.1.1:46634
Sun Feb 11 09:26:45 2024 authpriv.info dropbear[16112]: Exit (keepalived) from <192.168.1.1:46634>: Exited normally
  • dnsmasq is stopped unexpectedly from time to time. In asymmetrical setups like mine, the BACKUP router still needs dnsmasq to remain operational (DNS resolution and concurrent failover DHCP service). The problem has probably been pointed out by @Blackfeather in "Dnsmasq dies on boot after receiving SIGTERM".
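
To illustrate the first point, here is a minimal sketch of how the file list could be built from sync_list alone, with sysupgrade -l kept as an opt-in. This is not the actual rsync.sh code: the function name build_sync_files and the include_sysupgrade flag are hypothetical, and only sync_list and sysupgrade -l come from the script line quoted above.

```shell
#!/bin/sh
# Sketch only: build_sync_files and include_sysupgrade are hypothetical
# names, not part of keepalived-sync.

# $1: user-defined sync_list (space-separated paths)
# $2: "1" to also include the files reported by `sysupgrade -l`
# The body runs in a subshell so that `set -f` does not leak to the caller.
build_sync_files() (
	set -f	# do not expand wildcards while splitting the list
	files="$1"
	[ "$2" = "1" ] && files="$files $(sysupgrade -l)"
	[ -n "$files" ] || exit 0
	# One path per line, duplicates removed
	printf '%s\n' $files | sort -u
)
```

The loop in rsync.sh could then become something like `for sync_file in $(build_sync_files "$sync_list" "$include_sysupgrade"); do`, so that by default only the user-defined files are copied. Paths containing spaces are not handled by this sketch.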

I would be pleased to contribute to any of these topics if it can help. Comments and questions welcome :slight_smile:

In addition, here is the alternate keepalived config file I finally use, for reference:

global_defs {
	script_user root
	enable_script_security
	process_names
	router_id router_1
}

vrrp_script rsync {
	script /etc/keepalived/scripts/rsync.sh
	interval 60
	weight 100
}

vrrp_instance VI_1 {
	authentication {
		auth_type PASS
		auth_pass XXXXXXXX
	}
	state BACKUP
	interface lan
	unicast_src_ip 192.168.1.1
	virtual_router_id 1
	priority 100
	advert_int 1
	debug 2
	garp_master_delay 1
	garp_master_refresh 1
	garp_master_repeat 1
	garp_master_refresh_repeat 1
	nopreempt
	notify_backup "/bin/busybox env -i ACTION=NOTIFY_BACKUP TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
	notify_master "/bin/busybox env -i ACTION=NOTIFY_MASTER TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
	notify_fault "/bin/busybox env -i ACTION=NOTIFY_FAULT TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
	notify_stop "/bin/busybox env -i ACTION=NOTIFY_STOP TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
	virtual_ipaddress {
		192.168.1.2/24 dev lan label lan:vip scope global
		192.168.100.2/24 dev dmz label dmz:vip scope global
		192.168.110.2/24 dev iot label iot:vip scope global
		192.168.120.2/24 dev wan label wan:vip scope global
	}
	virtual_routes {
		src 192.168.120.2 0.0.0.0/0 via 192.168.120.1 dev wan metric 5
	}
	track_script {
		rsync weight 100 
	}
}
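
The notify_* lines in this config pass ACTION, TYPE and NAME to the hotplug scripts through hotplug-call. Regarding the GROUP errors mentioned earlier, here is a rough sketch of how a hotplug script could dispatch on these variables and skip GROUP events quietly. The function name classify_event and the messages are hypothetical, and this is not the code shipped with the package; I assume the scripts live under /etc/hotplug.d/keepalived/.

```shell
#!/bin/sh
# Sketch only: classify_event is a hypothetical helper, for example in a
# script placed under /etc/hotplug.d/keepalived/ (assumed location).

# Prints what the script would do with the event described by the
# ACTION, TYPE and NAME environment variables.
classify_event() {
	case "$TYPE" in
		INSTANCE)
			case "$ACTION" in
				NOTIFY_MASTER|NOTIFY_BACKUP|NOTIFY_FAULT|NOTIFY_STOP)
					echo "handle $ACTION for $NAME" ;;
				*)
					echo "ignore unknown action" ;;
			esac ;;
		GROUP)
			# Group events are skipped instead of raising an error
			echo "ignore group event" ;;
		*)
			echo "ignore unknown type" ;;
	esac
}
```

A real script would replace the echo lines with the actual handling, but the point is that an unhandled TYPE falls through to a harmless branch.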