WireGuard on a router: never-ending reboots after adding a "watchdog" to crontab

Fridolin · August 13, 2023, 4:27pm

Hello,
I followed this guide:

on a TL-WDR3500 with a freshly installed

openwrt-22.03.5-ath79-generic-tplink_tl-wdr3500-v1-squashfs-factory.bin

Everything works fine across reboots. Then I added this "watchdog", which is supposed to reboot the router when something goes wrong. But this script/crontab entry makes the device reboot about once a minute. I checked this with: "tcpdump -Ani eth1 port 4919 and udp"

To get SSH access in failsafe mode and fix this, I pressed the WPS/Reset button as soon as “Please press button now to enter failsafe” appeared on the terminal.

Add a watchdog (optional)

Adding a watchdog will ensure that the router restarts if anything stops working.

Important: Complete this step only after you have confirmed that the router is working properly.

Use SSH to log in to the router and add the file wg-watchdog.sh (provided below) in /root using nano.

First install nano: opkg update && opkg install nano

Then run the command nano /root/wg-watchdog.sh.

The wg-watchdog.sh file:

#!/bin/sh
# ping mullvad dns that can only be reached via the VPN tunnel
# if no contact, reboot!

tries=0
while [[ $tries -lt 5 ]]
do
        if /bin/ping -c 1 10.64.0.1
        then
                echo "wg works"
                exit 0
        fi
        echo "wg fail"
        tries=$((tries+1))
done
echo "wg failed 5 times - rebooting"
reboot

Make the file executable using the command chmod +x /root/wg-watchdog.sh.

Afterward, add the following entry in System → Scheduled Tasks in LuCI:

*/10 * * * * /root/wg-watchdog.sh

How can I repair this wg-watchdog.sh script so that it only reboots when a real problem occurs? Thank you.

Solution:

In a root terminal (e.g. ssh root@192.168.1.1), run:

rm /etc/crontabs/root   # removes the failing Mullvad-suggested watchdog entry (and any other cron jobs in that file)
echo '* * * * * /usr/bin/wireguard_watchdog' >> /etc/crontabs/root
reboot
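Deleting /etc/crontabs/root also removes any other scheduled tasks, and running the echo line a second time appends a duplicate entry. A minimal sketch for adding the entry only once, assuming the OpenWrt paths used in this thread; the script name add-wg-cron.sh and the helper ensure_line are hypothetical:

```shell
#!/bin/sh
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
	touch "$1" || return 1
	# -x: match the whole line; -F: fixed string, so "*" is not treated as a regex
	grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

main() {
	ensure_line /etc/crontabs/root '* * * * * /usr/bin/wireguard_watchdog'
	# Restart cron so it picks up the edited crontab without a full reboot.
	/etc/init.d/cron restart
}

# Act only when run as add-wg-cron.sh (hypothetical name), so the
# function can also be sourced for testing without touching /etc.
if [ "${0##*/}" = "add-wg-cron.sh" ]; then
	main
fi
```

Running it twice leaves exactly one wireguard_watchdog line in the crontab.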

ncompact · August 13, 2023, 4:38pm

Please post the output of:

cat /etc/crontabs/root

Fridolin · August 13, 2023, 4:40pm

root@(none):/rom/root# cat /etc/crontabs/root
*/10 * * * * /root/wg-watchdog.sh

ncompact · August 13, 2023, 4:41pm

Please post the output of:

/bin/ping -c 5 10.64.0.1

and of:

route

Fridolin · August 13, 2023, 4:43pm

From within failsafe mode:
root@(none):/rom/root# /bin/ping -c 5 10.64.0.1

PING 10.64.0.1 (10.64.0.1): 56 data bytes
ping: sendto: Network unreachable

and

root@(none):/rom/root# route

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0.1

lleachii · August 13, 2023, 4:46pm

I'm not sure why you wrote your watchdog script to do that, but you know WireGuard comes with a watchdog, correct?

/usr/bin/wireguard_watchdog
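As far as I understand it, the built-in wireguard_watchdog is aimed at peers whose endpoint is a hostname: it re-resolves the endpoint when handshakes go stale. With a hard-coded endpoint IP, a similar handshake-age test can still serve as a health check that does not depend on pinging a fixed address. A minimal sketch (not the built-in script's code), assuming the interface is called wg0 and that traffic or a persistent keepalive keeps the tunnel handshaking, since an idle tunnel also stops handshaking. MAX_AGE, the helper names and wg-handshake-check.sh are hypothetical:

```shell
#!/bin/sh
# Health check based on the age of the newest WireGuard handshake.
IFACE=wg0      # assumption: interface name used in this thread
MAX_AGE=300    # hypothetical threshold in seconds

# newest_handshake: read "pubkey<TAB>timestamp" lines (the format printed by
# `wg show IFACE latest-handshakes`) and print the largest timestamp, 0 if none.
newest_handshake() {
	awk 'BEGIN { max = 0 } $2 > max { max = $2 } END { print max }'
}

# tunnel_stale NOW NEWEST: succeed if the newest handshake is older than MAX_AGE.
tunnel_stale() {
	[ $(( $1 - $2 )) -gt "$MAX_AGE" ]
}

main() {
	now=$(date +%s)
	newest=$(wg show "$IFACE" latest-handshakes | newest_handshake)
	if tunnel_stale "$now" "$newest"; then
		logger -t wg-watchdog "no handshake on $IFACE for $((now - newest))s, restarting it"
		ifdown "$IFACE"
		sleep 5
		ifup "$IFACE"
	fi
}

# Act only when run as wg-handshake-check.sh (hypothetical name).
if [ "${0##*/}" = "wg-handshake-check.sh" ]; then
	main
fi
```

A peer that has never completed a handshake reports timestamp 0, so it is treated as stale as well.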

Fridolin · August 13, 2023, 4:52pm

Because I am doing what the description says, and the description I followed uses hard-coded IP addresses, so nothing has to be resolved from names to IPs. So I think there is no need for it. I have read this:

ncompact · August 13, 2023, 4:52pm

I assume that you ran firstboot (the OpenWrt reset-to-defaults command) by mistake, so you'll have to reconfigure everything from scratch and then test one step at a time:

install OpenWrt
configure WAN
configure WireGuard / mullvad.net
check that everything works properly
double-check that everything works properly, and only then think about how to restart the router in case of failure

ncompact · August 13, 2023, 4:55pm

Restarting the WireGuard interface

ifdown wg0; ifup wg0   (on systemd-based distributions: systemctl restart wg-quick@wg0; OpenWrt does not use systemd)

and restarting the router

reboot

are two different things.

lleachii · August 13, 2023, 4:56pm

OK, correct, but in this case you shouldn't need any watchdog.

Fridolin · August 13, 2023, 5:00pm

Still in failsafe mode, I ran:
root@(none):/rom/root# rm /etc/crontabs/root
and rebooted.

I was also wondering why the Linux VM I am working on shows this:

ssh root@192.168.1.1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:snip
Please contact your system administrator.
Add correct host key in /home/kali/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/kali/.ssh/known_hosts:1
  remove with:
  ssh-keygen -f "/home/kali/.ssh/known_hosts" -R "192.168.1.1"
Host key for 192.168.1.1 has changed and you have requested strict checking.
Host key verification failed.

So I did this to get an SSH shell in failsafe mode (yesterday this was not necessary, even after multiple reboots):

ssh-keygen -f "/home/kali/.ssh/known_hosts" -R "192.168.1.1"
# Host 192.168.1.1 found: line 1
/home/kali/.ssh/known_hosts updated.
Original contents retained as /home/kali/.ssh/known_hosts.old

lleachii · August 13, 2023, 5:04pm

Because you reset the router, a new host key was generated.

Remember, you're the system administrator.

ncompact · August 13, 2023, 5:04pm

Correct, the SSH host key changed. Try again.

Fridolin · August 13, 2023, 5:08pm

ncompact wrote: "I assume that you have activated the firstboot function by mistake,"

No, I did not activate the "firstboot function". I could not figure out what that means, but I have now deleted the crontab file named "root" and rebooted, and the device works well and no longer reboots in a loop.
In normal operation mode:

traceroute to openwrt.org (139.59.209.225), 20 hops max, 46 byte packets
 1  10.64.0.1  24.870 ms
 2  185.213.155.65  24.798 ms
 3  80.81.195.151  20.927 ms
 4  *
 5  *
 6  *
 7  *
 8  139.59.209.225  26.592 ms

Over the last months I followed Mullvad's instructions for several OpenWrt-capable routers. After a few weeks of uptime the connection is sometimes lost (connected clients report no working Internet connection), so I thought it would be a good idea to use this "watchdog" instead of rebooting these devices manually, but this ran into a loop (the device reboots every minute).

Can you help me figure out the problem and make a watchdog with a longer delay? Thank you.
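One way to get a "bigger delay" is to keep the guide's approach (ping 10.64.0.1, reboot on failure) but skip the check while the router has only just booted and pause between ping attempts, so a tunnel that is still coming up is not mistaken for a failure. A minimal sketch under those assumptions; MIN_UPTIME, TRIES, PAUSE and the helper names are hypothetical values to tune, and BusyBox ping's -W (reply timeout) is used:

```shell
#!/bin/sh
# Guide-style reboot watchdog with a boot grace period and pauses between pings.
TARGET=10.64.0.1   # Mullvad DNS, reachable only through the tunnel (from the guide)
MIN_UPTIME=900     # hypothetical: do nothing during the first 15 minutes after boot
TRIES=5
PAUSE=30           # hypothetical: seconds between attempts

# uptime_seconds [FILE]: whole seconds since boot, read from /proc/uptime,
# whose first field looks like "12345.67".
uptime_seconds() {
	cut -d. -f1 "${1:-/proc/uptime}"
}

# tunnel_up: succeed as soon as one ping gets a reply within 5 s;
# fail after $TRIES unanswered attempts.
tunnel_up() {
	i=0
	while :; do
		ping -c 1 -W 5 "$TARGET" >/dev/null 2>&1 && return 0
		i=$((i + 1))
		[ "$i" -ge "$TRIES" ] && return 1
		sleep "$PAUSE"
	done
}

main() {
	# Right after boot the tunnel may not be up yet; rebooting then can loop.
	[ "$(uptime_seconds)" -lt "$MIN_UPTIME" ] && exit 0
	tunnel_up && exit 0
	logger -t wg-watchdog "$TARGET unreachable $TRIES times, rebooting"
	reboot
}

# Run only when invoked as .../wg-watchdog.sh (e.g. from cron), so the
# functions can be sourced for testing without triggering a reboot.
if [ "${0##*/}" = "wg-watchdog.sh" ]; then
	main
fi
```

With the existing */10 cron entry, the first check can then happen no earlier than 15 minutes after boot, and one run takes at most about 2.5 minutes.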

Fridolin · August 13, 2023, 5:10pm

I did not reset the router. I just entered failsafe mode to get an SSH connection, because in normal mode the crontab script triggers a reboot too quickly.

lleachii · August 13, 2023, 5:11pm

Just increase the cron interval.

Your configuration is not used in failsafe mode, hence the new key.

Fridolin · August 13, 2023, 5:11pm

Remember, I asked for help.

lleachii · August 13, 2023, 5:12pm

Not sure what that means. But I assume the information about the longer cron period wasn't helpful?

Since I'm not sure why your script pings a private IP, I'm not sure how to assist further with this watchdog script.

Why do you think a watchdog is needed?

Does this remote [Mullvad endpoint] IP fail often?

Does the 10.x.x.x IP fail?

Do you lose your Internet connection often?

ncompact · August 13, 2023, 5:13pm

Do not use the script you found:

Instead, use a similar script that only restarts WireGuard:

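#!/bin/sh
# First check general Internet access, then ping Mullvad's DNS, which can
# only be reached via the VPN tunnel. Note: if all traffic is routed through
# the tunnel, the 8.8.8.8 check also fails while the tunnel is down.

if ping -c 1 -W 5 8.8.8.8 >/dev/null 2>&1; then
        tries=0
        while [ "$tries" -lt 5 ]
        do
                if /bin/ping -c 1 -W 5 10.64.0.1
                then
                        echo "wg works"
                        exit 0
                fi
                echo "wg fail"
                tries=$((tries+1))
                sleep 10    # give the tunnel a moment before the next try
        done
        echo "wg failed 5 times - restarting wg0"
        ifdown wg0
        sleep 20
        ifup wg0
fi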

At least this avoids rebooting the whole router.

Anyway, I would wait for the opinion of

who is much more competent than me.

egc · August 13, 2023, 5:15pm

Why not restart WireGuard instead of rebooting the whole router?
EDIT: like the script of @ncompact

Or first restart WireGuard and if that does not help then reboot?

It looks like a very crude solution; it is not even pinging through the WG interface.
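The "restart WireGuard first, reboot only if that does not help" idea can be sketched as below. It pings bound to the tunnel interface with BusyBox ping's -I option, so a reply actually proves the tunnel works. The interface name wg0, SETTLE, and the helper names tunnel_ok, recover, restart_iface and reboot_router are assumptions for illustration, not part of any OpenWrt package:

```shell
#!/bin/sh
# Escalating watchdog: check, restart the WireGuard interface, check again,
# and reboot only if the tunnel is still failing.
IFACE=wg0         # assumption: interface name used in this thread
TARGET=10.64.0.1  # Mullvad in-tunnel DNS from the guide
SETTLE=60         # hypothetical: seconds to wait after ifup before re-testing

# tunnel_ok: up to 3 pings bound to $IFACE (-I), 5 s reply timeout each.
tunnel_ok() {
	n=0
	while [ "$n" -lt 3 ]; do
		ping -c 1 -W 5 -I "$IFACE" "$TARGET" >/dev/null 2>&1 && return 0
		n=$((n + 1))
		sleep 10
	done
	return 1
}

# recover CHECK RESTART REBOOT: run CHECK; if it fails, run RESTART and CHECK
# again; only if that also fails, run REBOOT. The steps are passed as command
# names so the escalation logic can be tested without touching the network.
recover() {
	"$1" && return 0
	"$2"
	"$1" && return 0
	"$3"
}

restart_iface() {
	logger -t wg-watchdog "$IFACE check failed, restarting interface"
	ifdown "$IFACE"
	sleep 5
	ifup "$IFACE"
	sleep "$SETTLE"
}

reboot_router() {
	logger -t wg-watchdog "$IFACE still failing after restart, rebooting"
	reboot
}

# Run only when invoked as .../wg-watchdog.sh (e.g. from cron).
if [ "${0##*/}" = "wg-watchdog.sh" ]; then
	recover tunnel_ok restart_iface reboot_router
fi
```

A full reboot then only happens when an interface restart, followed by a settle period, did not bring the tunnel back.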