Is it possible to have two dhcp servers, coordinating leases?

Hi,

I am trying to figure out how to set up my network.

One thing I would like is to have multiple DHCP servers working together.

Forgetting the "why" for a moment:

Could there be two OpenWrt hosts running DHCP servers,

where either one could stop working at any time,

where the active leases would be synchronized between the two automatically,

and where the pre-defined static leases would also be shared by both? Meaning that if I create a static lease on one, it would also be replicated to the other DHCP servers.

Is there a concept of DHCP server failover, or DHCP server high-availability redundancy, that is also convenient to use?

I've heard of a "dhcp proxy" mode, where the proxy DHCP server stores the leases in a "main" server. But could the main server go offline and the proxy DHCP server become, temporarily, authoritative?

If so, how is the main server coming back online handled? Are all leases renewed, or are the active leases in the proxy DHCP server replicated back to the main server when it comes back online?

After formulating my above question, I tried to answer it myself.

I don't know if the following is true; if people in the know could verify these statements, it would be appreciated.

dnsmasq
does not have "high availability support"
does not support the "dhcp failover protocol"
does not have failover modes known as "active-active" or "primary-secondary"
cannot synchronize leases with another dnsmasq instance
supports dhcp proxy mode, but there is still only one main dhcp server
--if the main dhcp server fails, the proxy cannot work

There could be a second, disabled DHCP server and a watchdog script that turns it on or off depending on whether the main server responds, but this does not allow active leases to be copied from main to backup, or from backup to main, as each goes online and offline.

The ISC DHCP server
has lease synchronization between servers
supports the dhcp failover protocol with "active-active" and "primary-secondary" modes
is available in openwrt, but is deprecated in favor of the "Kea DHCP server"
also, it appears that using the ISC DHCP server breaks all DHCP configuration and monitoring options in openwrt, so all configuration and monitoring has to be done by hand in the console

The Kea DHCP server
has everything the ISC dhcp server does
plus automated lease replication
plus it has API based configuration
but does not work on openwrt

Alternatives that still use dnsmasq are

Split Scope DHCP
Two servers serve different ranges within the same subnet.
This is not true failover, it doesn’t synchronize leases.
Clients may get a different lease if one server goes down and comes back later.
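
For what it's worth, a split scope on two OpenWrt routers could be sketched roughly like this. The helper name `split_scope_batch` is made up for illustration. It only prints commands in `uci batch` syntax, using the `start`, `limit` and `force` options of a dhcp pool section, so nothing changes until the output is piped into `uci batch` on the router. The pool numbers are assumptions, not values from this thread.

```shell
#!/bin/sh
# Sketch: print `uci batch` commands for one half of a split-scope setup.
# Usage: split_scope_batch <iface> <start> <limit> <first|second>
# start = offset of the first address in the pool, limit = number of addresses.
split_scope_batch() (
    iface="$1"; start="$2"; limit="$3"; half="$4"
    size=$((limit / 2))
    if [ "$half" = "second" ]; then
        start=$((start + size))
        size=$((limit - size))   # second half absorbs an odd remainder
    fi
    echo "set dhcp.$iface.start=$start"
    echo "set dhcp.$iface.limit=$size"
    # force=1 makes dnsmasq serve even if it sees another DHCP server
    echo "set dhcp.$iface.force=1"
    echo "commit dhcp"
)
```

On the first router this would be `split_scope_batch lan 100 150 first | uci batch`, on the second the same with `second`, followed by a dnsmasq restart on each. The two halves never overlap, but as noted above no leases are shared.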

or a manually activated backup, which will require more scripting (that might be the best solution?)

One of the reasons for my setup is that I have a high-speed connection but am running in a power-constrained environment.

So I have a low-power router that uses less than 10 watts, and another server that needs about 100 watts but can do full multi-VPN at 3 Gbps. I want to be able to switch seamlessly between them, and also account for the possibility of both being turned on at the same time with minimal issues.

I found a few threads about this,

https://old.reddit.com/r/PFSENSE/comments/11papaf/is_it_possible_to_have_a_secondary_dhcp_server_on/
https://old.reddit.com/r/openwrt/comments/13fj4e5/need_advice_configuring_dns_server_failover_on_an/

https://old.reddit.com/r/openwrt/comments/1edpm01/how_to_have_two_dhcp_servers_that_dont_interfere/

And right now I think the most openwrt-friendly way of doing this is

  1. synchronize leases via scripts
    Synchronize the dhcp.leases file with a script, maybe using netcat,
    so that the servers stay in sync. This requires figuring out how to make the DHCP server reload its dhcp.leases file when it is changed externally.

  2. synchronize configuration via cron script
    Have another cron-activated script check every 20 seconds whether /etc/config/dhcp has changed.
    If it has changed, copy this file to all other DHCP servers and make them reload their configuration. Maybe LuCI will also have to reload?

  3. create server watchdogs
    Another cron script will have one server query the other server at a fast interval, say once every 20 seconds. Each server knows its priority level. If no other server at a higher priority level responds, then that server is turned on.
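
For step 1, here is a minimal sketch of the merge part only. As far as I know, dnsmasq reads its lease file only at start-up, so the service has to be stopped before the file is replaced and started again afterwards. The function name `merge_leases` and the peer file path are assumptions. The line format assumed is `<expiry> <mac> <ip> <hostname> <client-id>`, and an expiry of 0 (infinite lease) is not special-cased.

```shell
#!/bin/sh
# merge_leases FILE... : merge dnsmasq lease files, one line per MAC address.
# Assumed line format: <expiry-epoch> <mac> <ip> <hostname> <client-id>
merge_leases() {
    cat "$@" 2>/dev/null | awk '
        # Keep the line with the newest expiry for each MAC (field 2)
        NF >= 3 && (!($2 in expiry) || ($1 + 0) > expiry[$2]) {
            expiry[$2] = $1 + 0
            line[$2] = $0
        }
        END { for (mac in line) print line[mac] }
    ' | sort -k3,3
}

# Hypothetical usage on the router taking over (peer.leases fetched beforehand):
#   /etc/init.d/dnsmasq stop
#   merge_leases /tmp/dhcp.leases /tmp/peer.leases > /tmp/merged.leases
#   mv /tmp/merged.leases /tmp/dhcp.leases
#   /etc/init.d/dnsmasq start
```

How the peer's file gets copied over (netcat, scp, something else) is left open here.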

This should preserve all the luci dhcp server configuration abilities
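
For step 2, the "has the config changed" check could be sketched as below, assuming `md5sum` is available. The function name `config_changed` and the state file path are hypothetical. It only detects the change; copying the file to the peers and reloading dnsmasq there is left out.

```shell
#!/bin/sh
# config_changed FILE STATE : succeeds (exit 0) if FILE differs from the
# checksum remembered in STATE, and updates STATE. Exits 1 if unchanged,
# 2 if FILE cannot be read.
config_changed() (
    file="$1"; state="$2"
    new=$(md5sum < "$file") || exit 2
    old=$(cat "$state" 2>/dev/null)
    [ "$new" = "$old" ] && exit 1
    printf '%s\n' "$new" > "$state"
    exit 0
)

# Hypothetical usage from a cron job:
#   config_changed /etc/config/dhcp /tmp/dhcp.config.md5 && echo "dhcp config changed"
```

The first run always reports a change, because there is no remembered checksum yet.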

Has anyone done this before/already ?

I didn't read all your posts, but just going from the title alone:

Short answer: No.

Longer answer: This is possible for large scale networks such as enterprise and the like, but it requires very special and complex configurations for high availability. This is simply not practical for the vast majority of home and small business networks. (I will admit, I have no idea how it is actually done, but I know that it doesn't scale down to small networks easily).

Why do you want to have more than one DHCP server? How large is your network (how many infrastructure devices like routers/switches/APs? how many client devices?)

I made a little script that allows one OpenWrt host
to keep a list of other DHCP servers, query them, and turn its own DHCP server on or off depending on whether the "higher priority" DHCP servers are verified to be working.

Here is the script itself

/usr/bin/dhcp-interface-watchdog.sh

#!/bin/bash

# Max allowed log file size in KB
LOG_MAX_SIZE=100

# Whether to ignore loopback interface
IGNORE_LOOPBACK=true

# Host identity
SELF_HOSTNAME=$(hostname -f 2>/dev/null || hostname)
DNSMASQ_SERVICE="/etc/init.d/dnsmasq"

# Read global log configuration
WATCHDOG_LOG=$(uci -q get dhcp.watchdoglog)

# Declare maps for storing state
declare -A INTERFACE_IPS

# Write a log message to syslog or file depending on WATCHDOG_LOG
# Usage: log <interface> <message>
log() {
    local iface="$1"
    shift
    local msg="[$iface] $*"

    # Logging is disabled
    if [ "$WATCHDOG_LOG" = "0" ]; then
        return
    fi

    # Log to syslog if no value set
    if [ -z "$WATCHDOG_LOG" ]; then
        logger -t "dhcp-watchdog" "$msg"
    else
        if [ -f "$WATCHDOG_LOG" ]; then
            local filesize
            filesize=$(du -k "$WATCHDOG_LOG" | cut -f1)
            if [ "$filesize" -ge "$LOG_MAX_SIZE" ]; then
                : > "$WATCHDOG_LOG"
            fi
        fi
        echo "$(date +'%Y-%m-%d %H:%M:%S') $msg" >> "$WATCHDOG_LOG"
    fi
}

# Returns a list of UCI interface names (e.g., lan, wan)
get_all_interfaces() {
    uci show network | grep '=interface' | cut -d. -f2 | cut -d= -f1
}

# Populates INTERFACE_IPS[] with the first IPv4 address of each physical interface
get_interface_ip_map() {
    local path iface ip
    for path in /sys/class/net/*; do
        iface=${path##*/}
        [ "$iface" = "lo" ] && continue
        # Take only the first address so the map value stays a single IP
        ip=$(ip -4 addr show dev "$iface" | awk '/inet / {print $2; exit}' | cut -d/ -f1)
        [ -n "$ip" ] && INTERFACE_IPS["$iface"]=$ip
    done
}

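# Returns IP address for UCI interface name
get_ip_for_interface() {
    local net="$1"
    local ifname
    # OpenWrt 21.02 and later store the Linux device in "device"; older releases use "ifname"
    ifname=$(uci -q get network."$net".device || uci -q get network."$net".ifname)
    # An empty array subscript is an error in bash, so bail out early
    [ -n "$ifname" ] || return 1
    echo "${INTERFACE_IPS[$ifname]}"
}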

# Returns array of DHCP priority peers for given interface
get_priority_list_for_interface() {
    local net="$1"
    uci -q get dhcp."$net".serverpriority | sed "s/'//g" | xargs -n1 2>/dev/null
}

# Returns true if serverwatchdog is enabled for given interface
is_watchdog_enabled_for_interface() {
    local net="$1"
    [ "$(uci -q get dhcp."$net".serverwatchdog)" = "1" ]
}

# Resolves host or IP string to IP
resolve_to_ip() {
    local name="$1"
    getent hosts "$name" | awk '{print $1}'
}

# Returns true if value is our hostname or one of our IPs
is_self_ip_or_hostname() {
    local value="$1"
    [[ "$value" == "$SELF_HOSTNAME" ]] && return 0
    for ip in "${INTERFACE_IPS[@]}"; do
        [[ "$value" == "$ip" ]] && return 0
    done
    return 1
}

# Sends dummy DHCP request to a peer to test if it's active
send_dhcp_probe() {
    local target_ip="$1"
    local source_ip="$2"
    timeout 2 dhcping -s "$target_ip" -r -c "$source_ip" -h "de:ad:be:ef:00:01" >/dev/null 2>&1
}

# Enables or disables DHCP server for a given interface
# Usage: dhcp_service <enable|disable> <interface>
dhcp_service() {
    local action="$1"
    local net="$2"
    local current_ignore
    current_ignore=$(uci -q get dhcp."$net".ignore)

    if [ "$action" = "enable" ]; then
        if [ "$current_ignore" = "1" ]; then
            log "$net" "Enabling DHCP server on $net"
            uci del dhcp."$net".ignore
            uci commit dhcp
            $DNSMASQ_SERVICE restart
        else
            log "$net" "DHCP already enabled on $net"
        fi
    elif [ "$action" = "disable" ]; then
        if [ "$current_ignore" != "1" ]; then
            log "$net" "Disabling DHCP server on $net"
            uci set dhcp."$net".ignore='1'
            uci commit dhcp
            $DNSMASQ_SERVICE restart
        else
            log "$net" "DHCP already disabled on $net"
        fi
    fi
}

# Main per-interface watchdog logic
# Usage: run_watchdog_for_interface <interface>
run_watchdog_for_interface() {
    local net="$1"

    # STEP 1: Get the local IP address
    local local_ip
    local_ip=$(get_ip_for_interface "$net")
    if [ -z "$local_ip" ]; then
        log "$net" "Could not determine IP address"
        return
    fi

    # STEP 2: Fetch the priority list from UCI
    local peer_found=false
    local priority_list=()
    mapfile -t priority_list < <(get_priority_list_for_interface "$net")

    # STEP 3: Check each server in the priority list
    for entry in "${priority_list[@]}"; do
        local resolved
        resolved=$(resolve_to_ip "$entry")
        if [ -z "$resolved" ]; then
            log "$net" "Could not resolve $entry"
            continue
        fi

        # STEP 3a: If this is us, we take responsibility
        if is_self_ip_or_hostname "$entry" || is_self_ip_or_hostname "$resolved"; then
            log "$net" "Reached self in priority list ($entry); assuming DHCP responsibility"
            break
        fi

        # STEP 3b: Try probing for a DHCP response
        log "$net" "Probing $resolved for DHCP..."
        if send_dhcp_probe "$resolved" "$local_ip"; then
            log "$net" "DHCP server is alive at $resolved"
            dhcp_service disable "$net"
            peer_found=true
            break
        else
            log "$net" "No DHCP response from $resolved"
        fi
    done

    # STEP 4: If no peer found, we enable our own DHCP server
    if ! $peer_found; then
        dhcp_service enable "$net"
    fi
}

# Script entry point
main() {
    get_interface_ip_map
    for net in $(get_all_interfaces); do
        [ "$IGNORE_LOOPBACK" = true ] && [ "$net" = "loopback" ] && continue
        is_watchdog_enabled_for_interface "$net" || continue
        run_watchdog_for_interface "$net"
    done
}

main
exit 0

For it to do anything, you have to set the following uci keys in the dhcp config (per interface, as in dhcp.lan below)

serverwatchdog, controls if the watchdog is enabled for this interface
serverpriority, a list of dhcp servers by priority
dhcp.watchdoglog, controls watchdog log method (system, off or file)

By default, it logs to system (logger) log

# Log to file (will truncate if > 100 KB)
uci set dhcp.watchdoglog='/tmp/dhcp.watchdog.log'
# Disable all logging (completely silent)
uci set dhcp.watchdoglog=0
# Use system log (default behavior if unset)
uci delete dhcp.watchdoglog

Next, enable the dhcp watchdog script per interface, for example

# Enable watchdog on LAN
uci set dhcp.lan.serverwatchdog=1
uci add_list dhcp.lan.serverpriority=192.168.1.1
uci add_list dhcp.lan.serverpriority=192.168.1.2
uci commit dhcp

# Enable watchdog on WAN (only if you need DHCP there, usually you don’t)
uci set dhcp.wan.serverwatchdog=1
uci add_list dhcp.wan.serverpriority=10.0.0.1
uci add_list dhcp.wan.serverpriority=10.0.0.2
uci commit dhcp

# Enable watchdog on VPN (assuming vpn is a valid UCI interface)
uci set dhcp.vpn.serverwatchdog=1
uci add_list dhcp.vpn.serverpriority=172.16.0.1
uci add_list dhcp.vpn.serverpriority=172.16.0.2
uci commit dhcp

Finally, the watchdog script is added to cron to be executed at a regular interval (every minute)

(crontab -l 2>/dev/null; echo "*/1 * * * * /usr/bin/dhcp-interface-watchdog.sh") | crontab -
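
Cron's smallest interval is one minute, so the 20-second checks mentioned earlier need a small wrapper. A minimal sketch follows; the function name `run_repeated` and the wrapper script path in the comment are hypothetical.

```shell
#!/bin/sh
# run_repeated COUNT INTERVAL CMD [ARGS...]
# Runs CMD COUNT times, sleeping INTERVAL seconds between runs.
run_repeated() (
    count="$1"; interval="$2"; shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
        # No sleep after the last run, so the job ends before the next minute
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    exit 0
)

# Hypothetical crontab entry, three runs per minute, 20 seconds apart:
#   * * * * * /usr/bin/run-repeated.sh 3 20 /usr/bin/dhcp-interface-watchdog.sh
```

If a single run of the watchdog takes longer than expected, consecutive cron jobs can overlap, so some kind of lock file would probably be wise.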

Next up: synchronizing both active leases and lease configuration with the other hosts

NOTE : this code is untested

Honestly, it's 2025, consider just turning off ipv4, and running NAT64 and DNS64 on your router. Then, let all the devices on your network get themselves SLAAC addresses and drop DHCP entirely.

Did you write the code yourself? Why is it posted if not yet tested?

I will come back and edit it as I finish it.
I will remove this notice once it is tested.

It is not tested because it also needs another script to synchronize the leases and the config

However, I am fairly confident it would work as is.
If there is something wrong with it, with how well commented it is, it should be easy to figure out whatever might be wrong with it.

My gpon isp does not support ipv6, I asked, and they are a natural monopoly.

Where did it come from?

Openwrt does not use bash (by default).

while you wait for them to get their act together, instead of using private ipv4 space like 10.1.23.0/24 or 192.168.0.0/24 or whatever you can actually just use your ULA prefix to give all your machines internal IPv6 addresses. Then use NAT64 and DNS64 on the router.

Depending on your hardware and software this can be a minor issue, like for example if you have a managed switch from TP-Link that only supports ipv4 or whatever, but a surprisingly large amount of stuff will just work these days on ULA + NAT64/DNS64

This is a very, very bad idea, as nothing directs clients to request or renew a lease from one dhcp server rather than the other. This can cause major problems when a client attempts to renew a lease from the “other” server: that server will send a NAK, since it doesn’t have the address in its pool. This ends up causing issues for the clients.

That is not really how it works. The expectation is that there is only one dhcp server on the network.

I’d actually suggest that it is dumb luck that you haven’t encountered any issues.

By default, dnsmasq is configured to check if there is another dhcp server on the network. If it finds one, it does not start its own. If it does not detect another server, it is considered safe for it to start one. This check happens only when the server attempts to start, so it will not “detect” that the other server has failed unless the system or service is restarted.

You can override this with the force option, but the reason this requires an override is that you generally shouldn’t be architecting a network with 2 or more dhcp servers, and it is trying to prevent accidental (or semi-intentional) problems on the network.

That said, consider that many home and small business networks have the dhcp server on the network gateway. If the dhcp server fails, it is likely that it is because of other issues with that main router. And, if that is the case, chances are the network has bigger issues (like no route to the internet). Thus, a “backup” dhcp server really doesn’t provide that much value.

In large networks (enterprise), the reason for multiple dhcp servers has to do with load balancing a huge number of client machines (well into the thousands), and it is part of a HA strategy that includes alternate routes to the internet or other critical resources.

OMG, I just reread my post, LOL. I do all of this on 2 Pi-holes, not on openwrt; I have to disable DHCP in openwrt to make it work

I will now delete my posts I don't know what I was thinking, Too much coffee maybe

Just the same, though, only one of them should have the dhcp server enabled (if used). Redundant dns is a good thing, though.
