WireGuard on a router - never-ending reboots after adding "watchdog" in crontab

Hello,
I followed this guide:

on a TL-WDR3500 with a freshly installed

openwrt-22.03.5-ath79-generic-tplink_tl-wdr3500-v1-squashfs-factory.bin

Everything works fine across reboots. Then I added this "watchdog", which is supposed to reboot the router when something goes wrong. But with this script and crontab entry the device reboots about once a minute. I checked this with: "tcpdump -Ani eth1 port 4919 and udp"

To get SSH access in failsafe mode and correct this error, I pressed the WPS/reset button as soon as “Please press button now to enter failsafe” appeared on the terminal.

Add a watchdog (optional)

Adding a watchdog will ensure that the router restarts if anything stops working.

Important: Complete this step only after you have confirmed that the router is working properly.

Use SSH to log in to the router and add the file wg-watchdog.sh (provided below) in /root using nano.

First install nano: opkg update && opkg install nano

Then run the command nano /root/wg-watchdog.sh.

The wg-watchdog.sh file:

#!/bin/sh
# ping mullvad dns that can only be reached via the VPN tunnel
# if no contact, reboot!

tries=0
while [[ $tries -lt 5 ]]
do
        if /bin/ping -c 1 10.64.0.1
        then
                echo "wg works"
                exit 0
        fi
        echo "wg fail"
        tries=$((tries+1))
done
echo "wg failed 5 times - rebooting"
reboot

Make the file executable using the command chmod +x /root/wg-watchdog.sh.

Afterward, add the following entry in System → Scheduled Tasks in LuCI:

*/10 * * * * /root/wg-watchdog.sh

How can I repair this wg-watchdog.sh script so that it only reboots when a real problem occurs? Thank you.

Solution is there:

in root terminal (e.g. ssh root@192.168.1.1) do:

rm /etc/crontabs/root (empty this to make sure the failing mullvad suggested watchdog is removed)
echo '* * * * * /usr/bin/wireguard_watchdog' >> /etc/crontabs/root
reboot

please post command:

cat /etc/crontabs/root

root@(none):/rom/root# cat /etc/crontabs/root
*/10 * * * * /root/wg-watchdog.sh

please post command:

/bin/ping -c 5 10.64.0.1

and the command:

route

within failsafe mode:
root@(none):/rom/root# /bin/ping -c 5 10.64.0.1

PING 10.64.0.1 (10.64.0.1): 56 data bytes
ping: sendto: Network unreachable

and

root@(none):/rom/root# route

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0.1

I'm not sure why you wrote your watchdog script to do that, but you know Wireguard comes with a watchdog, correct?

/usr/bin/wireguard_watchdog

Because I am doing what the guide says, and the guide I followed uses hard-coded IP addresses and does not need to resolve any names to IPs. So there is no need for this, I think. I have read this:

I assume that you have activated the firstboot function by mistake,

so you'll have to reconfigure everything from scratch and then test one step at a time

  1. install
  2. configure wan
  3. configure wg / mullvad.net
  4. check that everything works properly
  5. double check that everything works properly and then you can think about what to do to restart the router in case of failure

restart the wireguard interface

systemctl restart wg-quick@wg0

and restarting the router are two different things

reboot

OK, correct - but in this case, you shouldn't need any watchdog.

Still in failsafe mode, I ran:
root@(none):/rom/root# rm /etc/crontabs/root
and rebooted.

I was also wondering why the Linux VM I am working on says this:

ssh root@192.168.1.1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:snip
Please contact your system administrator.
Add correct host key in /home/kali/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/kali/.ssh/known_hosts:1
  remove with:
  ssh-keygen -f "/home/kali/.ssh/known_hosts" -R "192.168.1.1"
Host key for 192.168.1.1 has changed and you have requested strict checking.
Host key verification failed.

So I did this to get an SSH shell in failsafe mode (yesterday this was not necessary, even after multiple reboots):

ssh-keygen -f "/home/kali/.ssh/known_hosts" -R "192.168.1.1"
# Host 192.168.1.1 found: line 1
/home/kali/.ssh/known_hosts updated.
Original contents retained as /home/kali/.ssh/known_hosts.old

Because you reset the router, a new key was generated.

Remember, you're the system administrator.

correct ...

ssh key change ...

try again ...

I assume that you have activated the firstboot function by mistake,

No, I did not activate the "firstboot function". I could not figure out what this means, but I have now deleted the crontab file named "root" and rebooted, and the device works well and no longer reboots in a loop.
In normal operation mode:

traceroute to openwrt.org (139.59.209.225), 20 hops max, 46 byte packets
 1  10.64.0.1  24.870 ms
 2  185.213.155.65  24.798 ms
 3  80.81.195.151  20.927 ms
 4  *
 5  *
 6  *
 7  *
 8  139.59.209.225  26.592 ms

Over the last few months I followed Mullvad's instructions on several OpenWrt-capable routers. After a few weeks of uptime the connection is sometimes lost (connected clients report no working internet connection), so I thought it would be a good idea to use this "watchdog" instead of rebooting the devices manually. But it ran into a loop (the device rebooted every minute).

Can you help me figure out the problem and make a watchdog with a bigger delay? Thank you.

I did not reset the router. I just entered failsafe mode to get an SSH connection, because outside failsafe mode the crontab script triggers a reboot too fast.

Just increase the time.
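
Besides a longer cron period, one possible guard is to have the watchdog do nothing until the router has been up for a while, so the tunnel gets time to come up first. This is only a sketch: the helper names and the 600-second threshold (`MIN_UPTIME`) are hypothetical and not part of the Mullvad guide, and it has not been confirmed in this thread that an early cron run is what triggers the reboot loop.

```shell
#!/bin/sh
# Sketch: skip the watchdog while the router has only just booted.
# MIN_UPTIME is a hypothetical threshold, not taken from the guide.
MIN_UPTIME="${MIN_UPTIME:-600}"

# Print whole seconds since boot, read from /proc/uptime (Linux only).
# An alternative file can be passed as $1, which is handy for testing.
uptime_seconds() {
    cut -d. -f1 "${1:-/proc/uptime}"
}

# Succeed (exit status 0) once the router has been up for MIN_UPTIME seconds.
booted_long_enough() {
    [ "$(uptime_seconds "$1")" -ge "$MIN_UPTIME" ]
}
```

With these helpers at the top of /root/wg-watchdog.sh, a line such as `booted_long_enough || exit 0` placed before the ping loop would turn the script into a no-op during the first ten minutes after boot.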

Your configs are not used in failsafe, hence a new key.

Remember, I have asked for help :slightly_smiling_face:

Not sure what that means. But I assume the information about the longer cron period wasn't helpful?

Since I'm not sure why your script pings a private IP, I'm not sure how to assist further with this watchdog script.

Why do you think a watchdog is needed?

Does this remote [Mullvad endpoint] IP fail often?

Does the 10.x.x.x IP fail?

Do you lose your Internet connection often?

do not use the script found:

but use a similar script that can only restart wireguard:

#!/bin/sh
# Restart WireGuard (instead of rebooting) when the tunnel stops responding.

# Only test the tunnel if 8.8.8.8 answers at all.
if ping -c 1 8.8.8.8
then
        tries=0
        while [ "$tries" -lt 5 ]
        do
                # ping mullvad dns that can only be reached via the VPN tunnel
                if /bin/ping -c 1 10.64.0.1
                then
                        echo "wg works"
                        exit 0
                fi
                echo "wg fail"
                tries=$((tries+1))
        done
        echo "wg failed 5 times - restart"
        ifdown wg0
        sleep 20
        ifup wg0
fi

at least you avoid rebooting a whole router

anyway I would wait for the opinion of

who is much more competent than me.

Why not restart WireGuard instead of rebooting the whole router?
EDIT: like the script of @ncompact :+1:

Or first restart WireGuard and if that does not help then reboot?
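
A rough sketch of that two-stage idea, for illustration only: restart the interface first and reboot only if the tunnel is still down afterwards. The interface name wg0 and the tunnel-only address 10.64.0.1 are taken from the scripts in this thread; the function names, the `run` argument and the 20-second pauses are assumptions of this sketch, not something the guide prescribes.

```shell
#!/bin/sh
# Sketch: restart WireGuard first, reboot only as a last resort.
# Assumptions: interface wg0 and tunnel-only address 10.64.0.1 (from this
# thread); the 20 s pauses are arbitrary.
WG_IF="${WG_IF:-wg0}"
TARGET="${TARGET:-10.64.0.1}"

# Succeed if any of 5 single pings to the tunnel-only address is answered.
tunnel_up() {
    tries=0
    while [ "$tries" -lt 5 ]; do
        ping -c 1 -W 2 "$TARGET" >/dev/null 2>&1 && return 0
        tries=$((tries + 1))
    done
    return 1
}

watchdog() {
    tunnel_up && { echo "wg works"; return 0; }
    echo "wg failed 5 times - restarting $WG_IF"
    ifdown "$WG_IF"
    sleep 20
    ifup "$WG_IF"
    sleep 20
    tunnel_up && { echo "wg recovered"; return 0; }
    echo "wg still down - rebooting"
    reboot
}

# Act only when called as: /root/wg-watchdog.sh run
if [ "${1:-}" = "run" ]; then
    watchdog
fi
```

The matching crontab entry would then be `*/10 * * * * /root/wg-watchdog.sh run`. Requiring the explicit `run` argument is a design choice of this sketch so that the file can be sourced and inspected without triggering a restart or reboot.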

Looks like a very crude solution; it is not even pinging through the WG interface :frowning:
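
For what it's worth, a ping can be bound to the WireGuard interface with the `-I` option, which both BusyBox and iputils `ping` accept. A minimal sketch, assuming the interface is called wg0 and reusing 10.64.0.1 from the scripts in this thread; the function name `ping_via_wg` is made up for illustration:

```shell
#!/bin/sh
# Sketch: test the tunnel with a ping that is forced out through wg0.
# WG_IF and TARGET defaults are taken from this thread; adjust as needed.
WG_IF="${WG_IF:-wg0}"
TARGET="${TARGET:-10.64.0.1}"

# Succeed only if TARGET answers one ping sent via WG_IF within 2 seconds.
ping_via_wg() {
    ping -c 1 -W 2 -I "$WG_IF" "$TARGET" >/dev/null 2>&1
}
```

A watchdog could call `ping_via_wg` in place of the plain `/bin/ping -c 1 10.64.0.1` test, so that a reply arriving over some other route is not mistaken for a working tunnel.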
