Syncing configuration between two routers

I'm setting up two routers so that if one fails the other will transparently take over. I've had this working with OpenWRT using keepalived, but one of the things I didn't like was having to keep the configuration of the two routers in sync manually. Other than the host name and some IP addresses, the configuration of the two needs to be identical. Does anyone have any suggestions on a good way to keep everything in sync? While not as critical, it would be great to keep the DHCP lease information in sync as well. Thanks in advance.

I suppose you could set up a cron job that transfers the relevant files between the two routers, via FTP or some other file transfer method.

Use a cron job to scp or rsync the /etc/config/* files (don't forget to use sed or uci -c to change the host name in /etc/config/system), the additional files listed in /etc/sysupgrade.conf, and /etc/sysupgrade.conf itself.
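
To illustrate the host name step, here is a minimal Python sketch of the rewrite that sed or uci would otherwise do. It assumes the usual UCI layout with a single "option hostname '...'" line in /etc/config/system; the function name rewrite_hostname is made up for this example, and the copying itself (scp/rsync) is left out.

```python
import re

# Matches a UCI line such as:   option hostname 'router-a'
# (quotes are optional; only spaces and tabs are allowed around the value)
_HOSTNAME_LINE = re.compile(
    r"""^([ \t]*option[ \t]+hostname[ \t]+)(['"]?)[^'"\s]+\2[ \t]*$""",
    re.MULTILINE,
)


def rewrite_hostname(system_config: str, new_hostname: str) -> str:
    """Return the text of a 'system' config with the host name replaced.

    Every other line is passed through unchanged, so the rest of the
    configuration stays identical on both routers.
    """
    return _HOSTNAME_LINE.sub(
        lambda m: f"{m.group(1)}'{new_hostname}'", system_config
    )


if __name__ == "__main__":
    primary = (
        "config system\n"
        "\toption hostname 'router-a'\n"
        "\toption timezone 'UTC'\n"
    )
    # Text to write to /etc/config/system on the second router.
    print(rewrite_hostname(primary, "router-b"), end="")
```

The per-router IP addresses would need the same kind of treatment in their own config files.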

I'll have to give this a try.

For disseminating the DHCP information, I've been thinking about MQTT: a server on the DHCP "master" (or any other convenient host all can reach) and clients on the other units that update a "fake" dhcp.leases file. I'm guessing the point is for LuCI to pick up the information, not that you've got multiple DHCP servers on the same network.
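
A rough Python sketch of the "fake" leases file part, leaving the MQTT transport out entirely. It assumes the dnsmasq IPv4 lease format of five whitespace-separated columns (expiry as epoch seconds, MAC, IP, host name, client ID); the helper names are made up for this example.

```python
from typing import Dict, Iterable, List, NamedTuple


class Lease(NamedTuple):
    expiry: int      # epoch seconds
    mac: str
    ip: str
    hostname: str
    client_id: str


def parse_leases(text: str) -> List[Lease]:
    """Parse lease lines, skipping anything that is not a 5-column IPv4 entry."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # blank line or an entry in another format
        leases.append(
            Lease(int(fields[0]), fields[1].lower(), fields[2], fields[3], fields[4])
        )
    return leases


def merge_leases(*lease_lists: Iterable[Lease]) -> List[Lease]:
    """Keep one lease per IP address: the one that expires last."""
    newest: Dict[str, Lease] = {}
    for leases in lease_lists:
        for lease in leases:
            current = newest.get(lease.ip)
            if current is None or lease.expiry > current.expiry:
                newest[lease.ip] = lease
    return sorted(newest.values(), key=lambda lease: lease.expiry)


def format_leases(leases: Iterable[Lease]) -> str:
    """Render leases back into the same five-column text format."""
    return "".join(
        f"{l.expiry} {l.mac} {l.ip} {l.hostname} {l.client_id}\n" for l in leases
    )
```

A client on the standby unit could call merge_leases() on its local entries plus whatever it received from the master and write the result of format_leases() to its own copy of the file.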

Hey there.

How are you going to handle the handover?

All your clients have a default gateway as well as a name server setting in place. As soon as your primary router dies, your second router needs to take over its IP address for all your clients to keep going.
If your second router decides to take over the IP address while your first router is still active, chances are that routing will flip from one device to the other and back every couple of seconds, depending on how long your clients keep ARP information.

You might think about some STONITH mechanism, i.e. not only taking over the IP but also cutting the power supply to the primary node.

For the file syncing part, you could try something like Resilio Sync (formerly BitTorrent Sync) or Syncthing. I'd try setting up a single folder for synced files, moving all relevant files there and symlinking them back to their original places.
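
The move-and-symlink idea could look roughly like this in Python. This is only a sketch: relocate_and_link is a made-up helper, the paths in the usage comment are examples, and files that share a base name would collide in the flat sync folder.

```python
import shutil
from pathlib import Path


def relocate_and_link(original: Path, sync_dir: Path) -> Path:
    """Move a file into the synced folder and leave a symlink in its place.

    Returns the file's new location. Calling it again on a path that is
    already a symlink does nothing and returns the link's target.
    """
    if original.is_symlink():
        return original.resolve()

    sync_dir.mkdir(parents=True, exist_ok=True)
    target = sync_dir / original.name
    shutil.move(str(original), str(target))
    original.symlink_to(target)
    return target


# Example (paths are illustrative only):
#   relocate_and_link(Path("/etc/config/network"), Path("/root/sync"))
```

It would be worth checking that whatever rewrites these files later keeps the symlink in place rather than replacing it with a regular file.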

I try not to change any settings through the LuCI web UI, only via the command line. I keep track of my config files with Git and push them to external storage. Another benefit is the change log: I can tell exactly which change happened months or years back.

As for DHCP: Are you sure you need to sync lease files? IMHO that's a standard situation for DHCP and is handled properly.
You might want to give it a try, set up a small lab for that.

  • Set up a fresh LEDE, set lease to 10m, boot up a client.
  • Turn on your client.
  • If the client is up and running, unplug the LEDE device and plug it back in again.
  • The client should keep functioning since gateway and routing are still valid, but the lease is not known to the router.

If I'm correct, after 8 to 9 minutes (87.5% of the lease time) the lease should appear in LEDE's lease file.

That's called "rebinding" (in contrast to "renewal", which is the default in normal operation). See steps 7 to 10 here:
http://www.tcpipguide.com/free/t_DHCPLeaseRenewalandRebindingProcesses-2.htm

Every DHCP server should accept rebind requests if they don't conflict with existing leases or other network rules such as static lease information or IP range limitations.
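
For reference, the timing works out as follows. This small Python sketch uses the protocol's default timer fractions (renewal at 50% of the lease time, rebinding at 87.5%); a server can override these defaults, so treat the numbers as the default case only.

```python
from typing import Tuple


def dhcp_timers(lease_seconds: int) -> Tuple[float, float]:
    """Return (T1, T2) in seconds for a given lease time.

    T1: the client starts renewing with the server that gave it the lease.
    T2: the client starts rebinding, i.e. asking any server on the network.
    """
    t1 = lease_seconds * 0.5
    t2 = lease_seconds * 0.875
    return t1, t2


if __name__ == "__main__":
    t1, t2 = dhcp_timers(10 * 60)  # the 10-minute lease from the lab setup
    print(t1, t2)  # 300.0 525.0, so rebinding starts at 8 min 45 s
```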

[edit]

I actually explained that already here:

Regards,
Stephan.

The issue is this: suppose a new node comes online. The DHCP server may then hand out an IP that is already leased to someone else, because it has no record of that lease.

This is the point at which I start to scratch my head and wonder why people are trying to build high-availability systems solely on commodity-grade WiFi APs, especially with dnsmasq trying to handle everything and the limitations of busybox for scripting and process monitoring.

Want HA DHCP? Run Kea on multiple servers and use an HA database.

Good point. But to be honest, I'd consider it not worth the fuss if the lease time is set as low as a couple of minutes.

I'd rather spend my energy on not putting the router on a cheap SoHo box but putting it, e.g., on a virtual machine in a VMware HA cluster with shared storage and multiple hosts. If the availability of your network is that important, you might have something like that anyway.

As I tried to explain in my previous post: sharing the temporary lease file is by far the least of all problems.

Regards,
Stephan.

Or just switch to IPv6 only and NAT64

:grinning: