After a few years of usage, I am now refreshing the keepalived configuration of my routers, especially the synchronisation of config and DHCP lease files. It looks like Jaymin Patel committed a nice synchronization feature about 18 months ago (https://github.com/openwrt/packages/commit/33398a38aacc02c89c277f5106c079b9c61b97a2), but there is so little documentation that I don't see how to configure it.
After a few weeks, it is time for a summary of what I have seen with Keepalived. First, luci-app-keepalived, keepalived-sync and keepalived all work, and it is pretty cool. Thank you @jempatel for your great work.
Nevertheless, they do not fully cover my needs, and a few things could be improved:
Using sysupgrade -l in the rsync.sh script is too aggressive. I suggest synchronizing only the files from the user-defined sync_list. In my case, the routers have different roles in MASTER and BACKUP modes, and copying all configuration files prevents such setups. Using sysupgrade -l could be kept as an option. The workaround is to modify the rsync.sh script locally at line 46:
for sync_file in $sync_list $(sysupgrade -l); do
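To illustrate the suggestion of keeping sysupgrade -l as an option, here is a minimal sketch. It assumes a hypothetical opt-in switch (called sync_all_config here, not an existing keepalived-sync option) and a hypothetical helper build_sync_files that is not part of rsync.sh; only sync_list and sysupgrade -l come from the script discussed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync.sh): print the files to synchronize,
# one per line.
#   $1 = user-defined sync_list (space-separated paths)
#   $2 = 1 to also include the files reported by "sysupgrade -l"
#        (assumed opt-in switch, defaults to 0)
build_sync_files() {
	sync_list="$1"
	sync_all_config="${2:-0}"

	extra=""
	if [ "$sync_all_config" = "1" ]; then
		# Full configuration list, only when explicitly requested
		extra="$(sysupgrade -l)"
	fi

	# Word splitting on purpose: both lists are whitespace-separated paths
	for sync_file in $sync_list $extra; do
		echo "$sync_file"
	done
}
```

With the switch left at its default, build_sync_files "/etc/config/dhcp /tmp/dhcp.leases" only prints the two user-defined paths, so routers with different roles keep their own configuration files.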
Route specification needs to be extended with more options. In my case, metric is useful for masking the BACKUP default route, while preserving the interaction with other subsystems like mwan3.
The configuration of keepalived has evolved in its latest version and is not completely supported: global_track for example has been replaced by other mechanisms, max_auto_priority is not supported, etc.
The luci-app-keepalived+keepalived-sync Wiki page does not describe the current usage. Without the links provided by @jempatel, I would probably have given up.
NOTIFY actions of type GROUP are not handled by the hotplug scripts, which raises errors in the logs.
The sync through ssh and rsync is very noisy and fills the logs with low-value information. It would be nice to provide a way to reduce the verbosity:
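As a rough sketch of what handling both types could look like, the function below dispatches on the ACTION, TYPE and NAME values that keepalived passes to hotplug-call in my config. The function name handle_keepalived_event and the group name VG_1 are made up for illustration; this is not the code of the packaged hotplug scripts.

```shell
#!/bin/sh
# Hypothetical dispatcher, not the packaged hotplug code.
#   $1 = ACTION (e.g. NOTIFY_MASTER), $2 = TYPE (INSTANCE or GROUP), $3 = NAME
handle_keepalived_event() {
	action="$1"
	type="$2"
	name="$3"

	# Accept GROUP in addition to INSTANCE instead of failing on it
	case "$type" in
		INSTANCE|GROUP) ;;
		*) echo "ignored: unknown TYPE '$type'"; return 0 ;;
	esac

	case "$action" in
		NOTIFY_MASTER|NOTIFY_BACKUP|NOTIFY_FAULT|NOTIFY_STOP)
			# Strip the NOTIFY_ prefix to get the new state
			echo "$type $name -> ${action#NOTIFY_}" ;;
		*)
			echo "ignored: unknown ACTION '$action'" ;;
	esac
}
```

For example, handle_keepalived_event NOTIFY_MASTER GROUP VG_1 prints "GROUP VG_1 -> MASTER" instead of producing an error.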
Sun Feb 11 09:26:45 2024 authpriv.info dropbear[16112]: Child connection from 192.168.1.1:46634
Sun Feb 11 09:26:45 2024 authpriv.notice dropbear[16112]: Pubkey auth succeeded for 'keepalived' with ssh-ed25519 key SHA256:5N2aUWNlcosUvHwRdobELPRuf4V0HizXCGuC27X/sLg from 192.168.1.1:46634
Sun Feb 11 09:26:45 2024 authpriv.info dropbear[16112]: Exit (keepalived) from <192.168.1.1:46634>: Exited normally
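Until the verbosity can be reduced at the source, one way to hide these lines when reading the log is a filter like the sketch below. It does not change what dropbear writes; it only drops, from the text it is given, every dropbear line whose PID belongs to a session of the sync user. The function name filter_sync_noise is made up, and the default user name keepalived is taken from the log lines above.

```shell
#!/bin/sh
# Hypothetical read-side filter: reads log text on stdin and prints it
# without the dropbear sessions of the sync user.
#   $1 = sync user name (defaults to "keepalived")
filter_sync_noise() {
	awk -v user="${1:-keepalived}" '
	{
		lines[NR] = $0
		# Extract PID from "dropbear[PID]" if the line comes from dropbear
		pid = ""
		if (match($0, /dropbear\[[0-9]+\]/)) {
			pid = substr($0, RSTART + 9, RLENGTH - 10)
		}
		pids[NR] = pid
		# A session is noise once any of its lines names the sync user
		# (\047 is a single quote)
		if (pid != "" && (index($0, "for \047" user "\047") || index($0, "Exit (" user ")"))) {
			noisy[pid] = 1
		}
	}
	END {
		for (i = 1; i <= NR; i++) {
			if (pids[i] == "" || !(pids[i] in noisy)) print lines[i]
		}
	}'
}
```

Typical use would be logread | filter_sync_noise. Since the whole input is buffered before printing, it is meant for one-shot reads rather than for following the log, and a reused PID could hide an unrelated dropbear session.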
dnsmasq is stopped from time to time when this is not expected. In asymmetrical cases like mine, the BACKUP routers still need dnsmasq to remain operational (DNS resolution and concurrent fail-over DHCP service). The problem has probably been pointed out by @Blackfeather in "Dnsmasq dies on boot after receiving SIGTERM".
I would be pleased to contribute to any of these topics if it can help. Comments and questions are welcome.
In addition, here is the alternate keepalived config file I finally use, for reference:
global_defs {
    script_user root
    enable_script_security
    process_names
    router_id router_1
}

vrrp_script rsync {
    script /etc/keepalived/scripts/rsync.sh
    interval 60
    weight 100
}

vrrp_instance VI_1 {
    authentication {
        auth_type PASS
        auth_pass XXXXXXXX
    }
    state BACKUP
    interface lan
    unicast_src_ip 192.168.1.1
    virtual_router_id 1
    priority 100
    advert_int 1
    debug 2
    garp_master_delay 1
    garp_master_refresh 1
    garp_master_repeat 1
    garp_master_refresh_repeat 1
    nopreempt
    notify_backup "/bin/busybox env -i ACTION=NOTIFY_BACKUP TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
    notify_master "/bin/busybox env -i ACTION=NOTIFY_MASTER TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
    notify_fault "/bin/busybox env -i ACTION=NOTIFY_FAULT TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
    notify_stop "/bin/busybox env -i ACTION=NOTIFY_STOP TYPE=INSTANCE NAME=VI_1 /sbin/hotplug-call keepalived"
    virtual_ipaddress {
        192.168.1.2/24 dev lan label lan:vip scope global
        192.168.100.2/24 dev dmz label dmz:vip scope global
        192.168.110.2/24 dev iot label iot:vip scope global
        192.168.120.2/24 dev wan label wan:vip scope global
    }
    virtual_routes {
        src 192.168.120.2 0.0.0.0/0 via 192.168.120.1 dev wan metric 5
    }
    track_script {
        rsync weight 100
    }
}
Thanks Pedro, I'm on that track too, but I had other things to do and couldn't take the job to the end. As you already have your hands on the topic, allow me to advise you to turn this text into a code patch, if you have some more time to commit to this topic.
Because what is written on the forum might end here, and it would be a pity to lose your work. If you produce a patch and attach it to an issue tracker instead (e.g. the one on the GitHub repo, since we are not permanent developers and can't access the main repo), there are more chances that some of the actual developers will discuss the patch with you inside the issue you opened, and finally include your findings upstream.
I don't know the current practices of OpenWrt's developers, but a patch that is ready to be merged might be just a few clicks away from upstream for some of them. If you don't deliver your findings as a patch, they would have to spend more time implementing and testing whether your findings are good or not...
Basically you need to git clone the openwrt repo, modify and test, make the patch and attach it to an issue. You can also use GitHub's features to fork, modify and open a pull request.
You are right @anichang. Unfortunately I also have very little free time for it. I'll do my best to propose something in the coming weeks nevertheless.
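The "make the patch" step could look roughly like the sketch below. It assumes you already have a local clone with your tested changes in it; the helper name make_patch, the branch name and the commit message are only examples, and the sign-off (-s) is added on the assumption that the project expects a Signed-off-by line.

```shell
#!/bin/sh
# Hypothetical helper: commit the local changes of a clone on a new branch
# and export them as patch files.
#   $1 = path to the clone      $2 = new branch name
#   $3 = commit message         $4 = output directory (absolute path)
make_patch() {
	repo="$1"
	branch="$2"
	message="$3"
	outdir="$4"

	# Remember where the branch starts so only the new commit is exported
	base="$(git -C "$repo" rev-parse HEAD)" || return 1

	git -C "$repo" checkout -q -b "$branch" || return 1
	git -C "$repo" add -A || return 1
	git -C "$repo" commit -q -s -m "$message" || return 1

	# One .patch file per commit made since $base
	git -C "$repo" format-patch -q -o "$outdir" "$base"
}
```

After something like git clone https://github.com/openwrt/packages.git and your local edits, a call such as make_patch packages keepalived-sync-list "keepalived: sync only user-defined files" "$PWD/patches" would leave a patch file in ./patches that can be attached to an issue.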