Keepalived not working

I really hope someone can help, or at least tell me I'm out of my mind.

Each OpenWrt router has two WAN ports, WANa (eth1) and WANb (eth2), and one LAN port (eth5).
I'm using two vrrp_instances to monitor the two WAN interfaces respectively. A ping to 8.8.8.8 is used to check each WAN port; if either WAN port has a problem, "both" instances should switch to the backup and the LAN port (eth5) should be shut down.

MASTER:

cat /etc/keepalived/keepalived.conf

global_defs {
    router_id HUAWEI1
}

vrrp_script check_wan {
    script "/usr/bin/test $(ping -c 3 -I $INTERFACE 9.9.9.1 | grep 'received' | awk -F ',' '{print $2}' | awk '{print $1}') -eq 0 && ip link set eth5 down"    
    interval 10
    weight 3
}

vrrp_sync_group BOX1 {
    group {
        WANa
        WANb
    }
}

vrrp_instance WANa {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        8.8.8.2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth1
        eth2
    }
    track_script {
        check_wan {
            interface eth1
        }
    }

}

vrrp_instance WANb {
    state MASTER
    interface eth2
    virtual_router_id 51          
    priority 101
    advert_int 1
    virtual_ipaddress {
        9.9.9.2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth1
        eth2
    }
    track_script {
        check_wan {
            interface eth2
        }
    }

}

BACKUP:

cat /etc/keepalived/keepalived.conf

global_defs {
    router_id HUAWEI2
}

vrrp_script check_wan {
    script "/usr/bin/test $(ping -c 3 -I $INTERFACE 9.9.9.1 | grep 'received' | awk -F ',' '{print $2}' | awk '{print $1}') -eq 0 && ip link set eth5 down"   
    interval 10
    weight 3
}

vrrp_sync_group BOX2 {      
    group {
        WANa
        WANb
    }
}

vrrp_instance WANa {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 99
    advert_int 1
    virtual_ipaddress {
        8.8.8.2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth1
        eth2
    }
    track_script {
        check_wan {
            interface eth1
        }
    }

}

vrrp_instance WANb {
    state BACKUP
    interface eth2
    virtual_router_id 51
    priority 99
    advert_int 1
    virtual_ipaddress {
        9.9.9.2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth1
        eth2
    }
    track_script {
        check_wan {
            interface eth2
        }
    }

}

Problem 1:
The command script "/usr/bin/test $(ping -c 3 -I $INTERFACE 8.8.8.8 | grep 'received' | awk -F ',' '{print $2}' | awk '{print $1}') -eq 0 && ip link set eth5 down" works when I test it on the command line: when 9.9.9.1 is unreachable, it shuts down eth5. For example:

/usr/bin/test $(ping -c 3 -I eth1 9.9.9.1 | grep 'received' | awk -F ',' '{print $2}' | awk '{print $1}') -eq 0 && ip link set eth5 down

However, when this command is put into keepalived.conf inside script "", it does not work and eth5 is not shut down.

Problem 2:
When 9.9.9.1 is unreachable, the two WAN instances do not switch to the backup. When I run "ip a" on the MASTER, I can still see the virtual IP addresses 8.8.8.2 and 9.9.9.2, as shown below:

root@OpenWrt:~# ip  a
...

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:90:27:e7:17:02 brd ff:ff:ff:ff:ff:ff
    inet 8.8.8.8/24 brd 8.8.8.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 8.8.8.2/32  scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::290:27ff:fee7:1702/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:90:27:e7:17:03 brd ff:ff:ff:ff:ff:ff
    inet 9.9.9.8/24 brd 9.9.9.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet 9.9.9.2/32 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::290:27ff:fee7:1703/64 scope link 
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond-lag1 state UP  group default qlen 1000
    link/ether 00:90:27:e7:17:04 brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond-lag1 state UP group default qlen 1000
    link/ether 00:90:27:e7:17:04 brd ff:ff:ff:ff:ff:ff permaddr 00:90:27:e7:17:05
7: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-lan state UP group default qlen 1000
    link/ether 00:90:27:e7:17:06 brd ff:ff:ff:ff:ff:ff
...

Can you try putting the command into a sh file and referring to that in your config?
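
A minimal sketch of such a wrapper follows. It assumes keepalived may execute the script command directly rather than through a shell, in which case $(...), pipes, and && in an inline script string would never be interpreted. The file name check_wan.sh, the function name wan_reachable, and the PING override are hypothetical. The check relies only on ping's exit status, which is non-zero when no reply is received.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /etc/keepalived/check_wan.sh
# (the name is an assumption, not taken from the thread).
# Usage: check_wan.sh <interface> <target-ip>

# PING can be overridden for dry runs; by default the real ping is used.
PING="${PING:-ping}"

# Return 0 if the target answers at least one of three pings sent via the
# given interface, non-zero otherwise (this is ping's own exit status).
wan_reachable() {
    "$PING" -c 3 -I "$1" "$2" >/dev/null 2>&1
}

# Run the check only when both arguments are given, so the file can also
# be sourced to reuse the function.
if [ "$#" -ge 2 ]; then
    wan_reachable "$1" "$2"
    exit $?
fi
```

It could then be referenced from the config as script "/etc/keepalived/check_wan.sh eth1 9.9.9.1". The sketch only reports reachability through its exit status; shutting down eth5 is deliberately kept out of the check.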

Sure. I also simplified the configuration, mainly for testing: when the ping to 9.9.9.1 fails, shut down eth5. I haven't seen an error in the log yet, but eth5 is still not shut down.

First I created the file ping_sh.sh, a sh script, and made it executable with chmod +x /etc/keepalived/ping_sh.sh

root@OpenWrt:~# ll /etc/keepalived/ping_sh.sh
-rwxrwxrwx    1 root     root           104 Mar 28 00:08 /etc/keepalived/ping_sh.sh*
root@OpenWrt:~# cat /etc/keepalived/ping_sh.sh
#!/bin/sh
if [ $(ping -c 3 -I eth1 9.9.9.1 | grep "received" | awk -F "," '{print $2}' | awk '{print $1}') -eq 0 ]; then
  ifconfig eth5 down
fi
sleep 3

cat /etc/keepalived/keepalived.conf

global_defs {
    script_user nobody
    router_id WEI1
}

vrrp_script check_wana {
    script '/etc/keepalived/ping_sh.sh'
    interval 10
    weight -10
}



vrrp_instance WANa {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        9.9.9.2
    }
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_interface {
        eth1
        eth2
    }

    track_script {
        check_wana
    }

    preempt_delay 20
    preempt
}

There seems to be no error in the following log:

root@OpenWrt:~# logread -e keepalived
Tue Mar 28 01:33:16 2023 daemon.info Keepalived[4689]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:16 2023 daemon.info Keepalived[4689]: Command line: '/usr/sbin/keepalived'
Tue Mar 28 01:33:16 2023 daemon.info Keepalived[4689]: Configuration file /etc/keepalived/keepalived.conf
Tue Mar 28 01:33:16 2023 daemon.info Keepalived[4690]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Tue Mar 28 01:33:24 2023 daemon.info Keepalived[4749]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:24 2023 daemon.info Keepalived[4749]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:24 2023 daemon.info Keepalived[4749]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:29 2023 daemon.info Keepalived[4812]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:29 2023 daemon.info Keepalived[4812]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:29 2023 daemon.info Keepalived[4812]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:34 2023 daemon.info Keepalived[4814]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:34 2023 daemon.info Keepalived[4814]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:34 2023 daemon.info Keepalived[4814]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:39 2023 daemon.info Keepalived[4822]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:39 2023 daemon.info Keepalived[4822]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:39 2023 daemon.info Keepalived[4822]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:44 2023 daemon.info Keepalived[4824]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:44 2023 daemon.info Keepalived[4824]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:44 2023 daemon.info Keepalived[4824]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:49 2023 daemon.info Keepalived[4836]: WARNING - keepalived was built for newer Linux 5.10.168, running on Linux 5.10.161 #0 SMP Tue Jan 3 00:24:21 2023
Tue Mar 28 01:33:49 2023 daemon.info Keepalived[4836]: Command line: '/usr/sbin/keepalived' '-n' '-f' '/tmp/keepalived.conf'
Tue Mar 28 01:33:49 2023 daemon.info Keepalived[4836]: Configuration file /tmp/keepalived.conf
Tue Mar 28 01:33:49 2023 daemon.info procd: Instance keepalived::instance1 s in a crash loop 6 crashes, 0 seconds since last crash

The ip a command shows that eth5 is still up when the ping to 9.9.9.1 fails:

root@OpenWrt:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-lan state UP group default qlen 1000
    link/ether 00:90:27:e7:17:01 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:90:27:e7:17:02 brd ff:ff:ff:ff:ff:ff
    inet 8.8.8.8/24 brd 8.8.8.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet 8.8.8.2/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::290:27ff:fee7:1702/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:90:27:e7:17:03 brd ff:ff:ff:ff:ff:ff
    inet 9.9.9.8/24 brd 9.9.9.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::290:27ff:fee7:1703/64 scope link 
       valid_lft forever preferred_lft forever
5: eth3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond-lag1 state UP group default qlen 1000
    link/ether 00:90:27:e7:17:04 brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond-lag1 state UP group default qlen 1000
    link/ether 00:90:27:e7:17:04 brd ff:ff:ff:ff:ff:ff permaddr 00:90:27:e7:17:05
7: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br-lan state UP group default qlen 1000
    link/ether 00:90:27:e7:17:06 brd ff:ff:ff:ff:ff:ff

script_user should be root. The nobody user wouldn't be able to shut down the interface. Does the script work when you run it from the shell?

Also, can you add echo $? at the end?

The script must return 0 if everything is ok and non zero if there's a problem.
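
To illustrate why the exit status matters, here is a small sketch of how a vrrp_script weight is applied to the priority, using the values from this thread (priority 101 on the MASTER, 99 on the BACKUP, weight -10). The function name effective_priority is hypothetical and only models the arithmetic; it is not keepalived code. Note that a script whose last command is sleep 3 exits with the status of sleep, which is 0, so the failure branch would never be reached.

```shell
#!/bin/sh
# Sketch of how a vrrp_script weight feeds into the VRRP priority.

# effective_priority <base-priority> <weight> <script-exit-status>
effective_priority() {
    base=$1
    weight=$2
    status=$3
    if [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        # Negative weight: applied while the script is failing.
        echo $((base + weight))
    elif [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        # Positive weight: applied while the script is succeeding.
        echo $((base + weight))
    else
        echo "$base"
    fi
}

effective_priority 101 -10 0   # check passes: prints 101
effective_priority 101 -10 1   # check fails: prints 91
```

With a failing check the MASTER drops to 91, which is below the BACKUP's 99, so the BACKUP can take over. With a check that always returns 0, the priority stays at 101 and nothing changes.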

Thanks again. I just realized my mistake: I should use root and add echo $?. Actually, I have another problem: I've noticed that keepalived on OpenWrt doesn't support notify and notify_master. How can I detect the switch to BACKUP and shut down the eth5 interface accordingly? And when it switches back to MASTER, how can I bring eth5 up again? Do you have any guidance or suggestions? Thank you!
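
The mapping being asked for can be sketched independently of how the state change is delivered. The handler below is hypothetical: the function name lan_command_for_state and the LAN_IF variable are assumptions, and the mechanism that would call it on OpenWrt is left open. It only prints the ip command for each state, so it is safe to dry-run.

```shell
#!/bin/sh
# Hypothetical state handler; how it gets invoked is not assumed here.

# LAN port from the thread; can be overridden through the environment.
LAN_IF="${LAN_IF:-eth5}"

# Print the ip(8) command that matches a VRRP state name.
# Returns 1 for an unknown state.
lan_command_for_state() {
    case "$1" in
        MASTER)       echo "ip link set $LAN_IF up" ;;
        BACKUP|FAULT) echo "ip link set $LAN_IF down" ;;
        *)            return 1 ;;
    esac
}

# Dry run: show what would be executed for each state.
for state in MASTER BACKUP FAULT; do
    lan_command_for_state "$state"
done
```

Once a suitable hook is found, the printed command could be executed instead of echoed. Treating FAULT like BACKUP is a design choice of this sketch.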