Wireguard on a Router - never ending reboots after adding "watchdog" in Crontab

We are trying to help you.

I'm not paid to help you.
We are a forum; if someone wants to give you a hand, they will.

Or first restart WireGuard

good idea ...

and if that does not help then reboot?

bad idea ....

example (I too have and will have problems):


@lleachii you said:

Remember, you're the system administrator.

I answered with this:

Remember, I have asked for help :slightly_smiling_face:

That means: try to answer the questions or offer a solution, or else don't say anything like "you're the admin", because that is not helpful.

But I assume the information about the longer cron period wasn't helpful?
I can't see a suggestion for a longer cron period.


Afterward, add the following entry in System → Scheduled Tasks in LuCI:
*/10 * * * * /root/wg-watchdog.sh

This will call wg-watchdog.sh every 10 minutes. So I would expect that, if there are connection issues, the router would reboot after 10 minutes, not after 0 or 1 minutes. Making */10 bigger would therefore not solve the problem. The problem is that, with this watchdog, the router reboots within 60 seconds of powering up. Can you help solve it?

The script pings the private IP 10.64.0.1 to check for a bad connection. It's a Mullvad-specific thing.

Why do you think a watchdog is needed?

To get rid of connectivity issues which occur after some weeks of operation (uptime).

Does this remote [Mullvad endpoint] IP fail often?
Not so often; most Mullvad WireGuard servers are very stable. It would be great to be able to switch automatically to another WireGuard / Mullvad endpoint in the rare case that e.g.

au-per-wg-301.relays.mullvad(.)net
IPv4 103.108.231.50
IPv6 2404:f780:8:deb::a01f

fails. But we are not talking about failing remote endpoints. When one of my WireGuard routers has been running for a few weeks, the WLAN clients connected to it report "no internet connection", and I still have not been able to determine what the problem is (what would you do to debug this?). In such a case I have to reboot the router manually, or disconnect power for a second and reconnect it, to get a working connection back. Thank you.

*/10 in the crontab does not mean "after 10 minutes", it means "every time the clock shows minutes that are evenly divisible by 10".
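To make the point above concrete, here is a small sketch that computes how long it takes until a `*/STEP` minute field next matches. The function name `secs_until_next_step` is made up for this illustration, and it assumes STEP divides 60 evenly (as 10 does); pass minute and second as plain numbers without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: seconds until cron would next fire a "*/STEP"
# minute field, given the current minute and second of the clock.
# Assumes STEP divides 60 evenly (5, 10, 15, ...).
secs_until_next_step() {
    step=$1; minute=$2; second=$3
    # Next minute value that is a multiple of STEP, strictly after now.
    next=$(( (minute / step + 1) * step ))
    echo $(( (next - minute) * 60 - second ))
}

# Clock reads hh:09:00 right after boot: the first */10 run is 60 seconds away.
secs_until_next_step 10 9 0     # prints 60
# Clock reads hh:00:30: almost the full 10-minute window remains.
secs_until_next_step 10 0 30    # prints 570
```

So the delay before the first run depends only on what the clock says at boot, not on how long the router has been up.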

Another hint is that the router, lacking a realtime clock, will at bootup assume as the current time the modification time of the most recently changed file in /etc (before getting a better idea about the current time later on, usually from a time server).

Which would suggest that the minute timestamp of your most recently changed file in /etc ends in a "9", and that those 60 seconds aren't enough for your router to exit the fail state of your homebrew watchdog.

This is a common problem, usually solved by touching /etc/banner before rebooting, so the current time will be picked up at the next reboot. But that may, again, be quite close and maybe too close to the next divisible-by-10 minute marker.

I must say, though, that rebooting the router within minutes of a failstate is a rather blunt instrument. Especially since more refined ones, like the aforementioned wireguard watchdog, exist.
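If a reboot is kept as a last resort anyway, the `touch /etc/banner` hint could be combined with an uptime guard, roughly like this. This is only a sketch: the function names and the 600-second threshold are assumptions for illustration, not part of the Mullvad script or of OpenWrt.

```shell
#!/bin/sh
# Hypothetical guard: succeeds only if the uptime (second argument, in the
# format of the first field of /proc/uptime) is at least MIN seconds.
uptime_at_least() {
    min=$1; up=$2
    # Strip the fractional part, e.g. "37.52" becomes "37"
    [ "${up%%.*}" -ge "$min" ]
}

# Hypothetical last-resort reboot: refuses to act during the first
# 10 minutes after boot, so a slow tunnel start cannot cause a boot loop.
safe_reboot() {
    up=$(cut -d' ' -f1 /proc/uptime)
    if uptime_at_least 600 "$up"; then
        touch /etc/banner   # next boot then starts from a recent clock value
        reboot
    fi
}
```

A watchdog script would call `safe_reboot` where it currently calls `reboot`.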


view this post from "takimata"

a good resolution


systemctl restart wg-quick@wg0

How do I find out, in the LuCI GUI and over SSH, the correct name of my wg-quick@something? Thank you.

uci show | grep wireguard

ip link | grep "POINTOPOINT" | awk -F":" '{print $2}'


I made a mistake the command:

/etc/init.d/wireguard

does not exist ...

sorry ...

if wireguard interface = "wg0" then

ifdown wg0
sleep 20
ifup wg0
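Put together with the ping check against 10.64.0.1 mentioned earlier in the thread, a gentler watchdog might look roughly like the sketch below. The function names and the `wg-watchdog` log tag are made up for illustration; `wg_mullvad` is the interface name that appears in the `wg show` output in this thread, so adjust both variables for your own setup.

```shell
#!/bin/sh
# Sketch: restart the WireGuard interface instead of rebooting the router.
# Both defaults come from this thread and can be overridden via environment.
WG_IF="${WG_IF:-wg_mullvad}"
CHECK_IP="${CHECK_IP:-10.64.0.1}"

# Succeeds if the address inside the tunnel answers at least one of 3 pings.
tunnel_ok() {
    ping -c 3 -W 5 "$CHECK_IP" >/dev/null 2>&1
}

# Bring the interface down and up again, and leave a trace in syslog.
restart_tunnel() {
    logger -t wg-watchdog "no reply from $CHECK_IP, restarting $WG_IF"
    ifdown "$WG_IF"
    sleep 20
    ifup "$WG_IF"
}

# One watchdog pass: only touch the interface if the check fails.
check_once() {
    tunnel_ok || restart_tunnel
}
```

To use it from cron, add a final line calling `check_once` to the script. Because nothing here reboots the router, a failing check right after boot cannot turn into a boot loop.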


I was referring to the output that said contact the administrator. I apologize for the confusion. Was j/k.

I'll think about troubleshooting steps. I've never personally experienced an issue like you described unless the remote IP had an issue.

I have WG interfaces that have been running for months (maybe a year or more) with the remote endpoint specified by IP (instead of hostname).

The important question:

Does restarting WG actually fix anything?

Thank you @lleachii and @ncompact. I will have a look at it.

Remember: I have never had any problem with DNS / name resolution, because Mullvad recommends using hard-coded IP addresses. I don't need to re-resolve anything. So please stop pointing me to solutions for such DNS issues.

No, because at the moment there is no problem. But I still want to know how to change the Mullvad watchdog, or get good suggestions from people who know what they are talking about, so I can do this without getting stuck in a boot loop.

I have temporarily removed this cron job until the issue is solved.

Give me a second to figure out whether this would be helpful (I don't have any resolving issues, because I'm using hard-coded IP addresses instead of hostnames).

We are talking about a freshly installed wdr-3500. Maybe you have better devices, which can run for a long period without memory leaks or randomly "not working anymore". In the past I have solved problems on the other OpenWrt WireGuard routers by simply cutting the power and switching them back on. How do I debug such problems so I can solve them if they occur in the future? I have experienced connection issues, and syslog doesn't tell me enough about the problem; maybe I'm not enough of an administrator and am still learning. So I ask for help. It helps me a lot when you point me to solutions. Thank you for that.

As of today I don't want to reboot it manually anymore. I am asking for help with what is wrong with the script, why it triggers so early, and what a better script could look like.

Maybe the problem with the other routers, which have been in use for a long time, is that I forgot the persistent keepalive. The new router we are talking about today has a 25-second keepalive. Maybe we don't need any solution because there will never be a problem; I don't know. Because I do this a few times every year (setting up a router with Mullvad WireGuard), I want it to work more reliably (every few weeks the WLAN clients say there is no internet connection, and I have to reboot the WG router).

root@wdr3500:~# wg show
interface: wg_mullvad
  public key: snip
  private key: (hidden)
  listening port: snip

peer: snip
  endpoint: 185.209.196.78:51820
  allowed ips: 0.0.0.0/0
  latest handshake: 1 minute, 37 seconds ago
  transfer: 21.02 KiB received, 35.55 KiB sent
  persistent keepalive: every 25 seconds

You shouldn't need a listening port on the local side; but I don't think this would cause an issue.

I haven't once mentioned DNS issues - I have no clue what you're talking about. Please re-review my post. If you're asking me to stop assisting you, no worries - no need to be rude.

Yea, we already discussed this. To be clear - given your setup (i.e. with IPs only) I wouldn't think any watchdog is needed.

I believe takimata explained cron and how you can make the watchdog run at longer intervals - but I don't think it's necessary.

I would try restarting Wireguard when the issue occurs and see if that fixes it.

Yes - I understand you'll have to wait until the issue occurs.


/usr/bin/wireguard_watchdog

I did not know that. It looks like the perfect solution for the cases I have experienced in the past. So I would not even have to reboot the routers manually.

This watchdog script tries to reresolve and reconnect to inactive wireguard peers.

Use it for peers with a frequently changing dynamic IP.

persistent_keepalive must be set, recommended value is 25 seconds.

Run this script from cron every minute:

echo '* * * * * /usr/bin/wireguard_watchdog' >> /etc/crontabs/root

This looks great:

https://github.com/openwrt/openwrt/blob/master/package/network/utils/wireguard-tools/files/wireguard_watchdog

#This watchdog script tries to re-resolve hostnames for inactive WireGuard peers.
#Use it for peers with a frequently changing dynamic IP.
#persistent_keepalive must be set, recommended value is 25 seconds.
snip
#skip IP addresses
#check taken from packages/net/ddns-scripts/files/dynamic_dns_functions.sh
local IPV4_REGEX="[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"
local IPV4=$(echo ${endpoint_host} | grep -m 1 -o "$IPV4_REGEX$")    #do not detect ip in 0.0.0.0.example.com
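The effect of the trailing `$` in that check can be tried on its own. The sketch below reuses the regex from the excerpt; the wrapper function `endpoint_ipv4` and the two sample hosts are only for illustration and are not part of the watchdog.

```shell
#!/bin/sh
# Same pattern as in the excerpt above.
IPV4_REGEX="[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

# Hypothetical wrapper: prints the IPv4 address if the endpoint string
# ENDS in one (the extra "$" anchors the match), otherwise prints nothing.
endpoint_ipv4() {
    echo "$1" | grep -m 1 -o "$IPV4_REGEX$"
}

for host in 185.209.196.78 0.0.0.0.example.com; do
    if [ -n "$(endpoint_ipv4 "$host")" ]; then
        echo "$host: plain IP address, the watchdog skips it"
    else
        echo "$host: hostname, a candidate for re-resolving"
    fi
done
```

With an endpoint given as a bare IP, as in the `wg show` output in this thread, the first branch applies.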

Because it checks whether the WireGuard peers are still connected, tries to reconnect if not, and logs every reconnect it triggers to syslog. Sounds like a good solution. I have no idea why mullvad.net gives us instructions that boot-loop the router... (also, having to touch some file under /etc to get a proper time sounds buggy). I'm sorry for the earlier post where I said this would not be helpful because I don't use hostnames. Your suggestion looks very helpful. Good job @lleachii :+1:


The listening port is randomly generated. At http://192.168.1.1/cgi-bin/luci/admin/network/network, leave the port option field blank and a random port will be generated. So this is by design.

I thought you may have specified one in the config, if not no worries. Apologies if that wasn't already obvious and clear. Since you used wg show and didn't show the config, we wouldn't know.

I wish you the best with your WG config.


This should explain what happened. Mullvad should update the blog post at https://mullvad.net/de/help/running-wireguard-router/ (last updated: 6 July 2023).

My TP-Link routers with OpenWrt + WireGuard from Mullvad are sometimes powered off for minutes or hours, but after being plugged in again they work as expected for days and weeks.

We also have to check whether the required 25-second "keep alive" is set on the WG interface. The routers are connected as DHCP clients to another router, by Wi-Fi or Ethernet cable (blue port); this sometimes changes, and I want them to work without any leaks or dropped internet/VPN connections.

In Wireshark we should expect that, on all the routers, there is an NTP connection to get the exact time from a time server before the WireGuard tunnel is created and all data is sent through it, so that the crypto can work. OpenWrt has a built-in, preconfigured and working NTP client. Normally we have to do nothing to keep the correct time. Maybe the VPN tunnel prevents the built-in NTP client from reaching the server, and after a few weeks problems occur... I need to know how often the NTP client is expected to ask for the time. I could check this with Wireshark, or look in syslog for "ntp", to figure out whether the WG interface is blocking the NTP client. But for now, this is a good solution for WireGuard users:

You just need to do this one thing:

Thank you @takimata @ncompact @lleachii for the great help, and sorry for my rude tone. All of you did a good job helping me and have earned respect for the time you invest here and the experience you share with us.

