WireGuard site-to-site: watchdog not working on 22.03.5?

Hi. I set up site-to-site WireGuard some time ago, following this page:
https://openwrt.org/docs/guide-user/services/vpn/wireguard/site-to-site. It had been working well between a Fritzbox 4020 (OpenWrt 21.02) and an EdgeRouter X (OpenWrt 22.03.3), together with the WireGuard watchdog, following the advice given in
https://openwrt.org/docs/guide-user/services/vpn/wireguard/extras#dynamic_address

Without the watchdog, the link used to break every once in a while.

A few days ago, I replaced the Fritzbox 4020 with an EdgeRouter X running OpenWrt 22.03.5. I set it up the same way as the Fritzbox 4020, including the watchdog. However, the connection breaks intermittently, as if there were no watchdog. The connection to the other router itself seems mostly fine (though it still fails sometimes), but ping to another device on the other network is more unstable. If I let ping run for, say, 20 seconds, it goes through in the end. So the configuration itself should be fine (I copied it from the Fritzbox); I think it's something with the watchdog on the newer version of OpenWrt.

I would appreciate it if someone could please give me advice.

The watchdog script should log a message whenever its condition is met.

Check the log when the issue happens:

logread -e wireguard_monitor
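To see how stale the tunnel actually is when the issue happens, the handshake age can be read straight from WireGuard. The following is only a minimal sketch: `handshake_ages` is a hypothetical helper name, and `wg0` in the usage comment is a placeholder for your interface name. It assumes the input has the `<public-key> <epoch>` layout printed by `wg show <interface> latest-handshakes`.

```shell
#!/bin/sh
# handshake_ages NOW
# Reads "<public-key><TAB><epoch>" lines on stdin and prints, for each peer,
# the number of seconds elapsed since its last handshake.
# A timestamp of 0 means the peer has never completed a handshake.
handshake_ages() {
    now="$1"
    while read -r key stamp; do
        # Skip empty or incomplete lines.
        [ -n "$key" ] || continue
        [ -n "$stamp" ] || continue
        if [ "$stamp" -eq 0 ]; then
            echo "$key never"
        else
            echo "$key $((now - stamp))"
        fi
    done
}

# Usage on the router (wg0 is a placeholder interface name):
#   wg show wg0 latest-handshakes | handshake_ages "$(date +%s)"
```

If the age keeps growing while the watchdog stays silent in the log, that would point at the watchdog; if the age stays small while pings fail, the problem is more likely elsewhere.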

Thank you for your reply! I just ran logread -e wireguard_monitor. There were messages like this:

Sun Aug 13 18:51:01 2023 user.notice wireguard_monitor: wg_s2s_b endpoint xx.ssacam.net:5nnn is not responding for 179 seconds, trying to re-resolve hostname

Oddly, the time stamps differ from the ones on the emails I get when the connection fails (a ping check from the other router). For example, the email around that time is stamped 18:02, which is not close to 18:51. On the other router (OpenWrt 22.03.3), logread -e wireguard_monitor returns nothing (the watchdog is set up properly there), so I assume everything is OK on that side, though I don't really understand how one side can be "OK" while the other is not, when it's the same connection. The link seems to go down about 2-3 times a day, but it recovers within 1-2 minutes.

I just noticed something weird:

The port for the Endpoint changes every few minutes: it shows 61205 now, but I haven't configured that port anywhere. It should always be 51822. Why is it showing 61205? I don't understand where that's coming from. I would appreciate it if someone could tell me how to make sense of it.

I think that is the port the client is listening on.

On the client you can set a fixed port, but if you do not set one, it will choose a random port.
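For reference, a fixed listen port could be set roughly like this on OpenWrt. This is a sketch under assumptions: it takes `wg_s2s_b` (the interface name from the log message above) to be the WireGuard section in /etc/config/network, and 51822 to be the port that is forwarded to this router. Adjust both to your setup.

```shell
# Assumption: wg_s2s_b is the WireGuard interface section in /etc/config/network
# and 51822 is the UDP port forwarded to this router.
uci set network.wg_s2s_b.listen_port='51822'
uci commit network
/etc/init.d/network restart

# Confirm which port the interface is actually listening on.
wg show wg_s2s_b listen-port
```

Note that the Endpoint shown on one router is the address and port from which the other peer was last seen, so the listen port to check is the one on the remote side.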

I thought I had set a fixed port. I just copied the config from the Fritzbox, on which WireGuard was working just fine, and the port is part of that config. I wonder why it's choosing a random port. Moreover, it's not using a random port all the time; it mostly stays on the right port and only wanders away from it every several seconds.

Maybe the port is in use somewhere else; you cannot have duplicates.

I just shut down the WireGuard interface on the Fritzbox. What I had been doing is the following. My old setup was:
Fritzbox 4020 (OpenWrt) behind Fritzbox 7530 -- ISP. The 7530 forwarded the WireGuard port to the 4020.

I replaced the Fritzbox 4020 with the EdgeRouter X, but I left the FB4020 connected to the 7530 with its WAN side open, so that I could look at its config later. I disabled port forwarding to the 4020, because I need it for the ERX. I thought that would cut the FB4020 off from WireGuard traffic. But when I looked at the FB4020 just now, I saw non-zero TX and RX counters.

Could it be that the FB4020 had been interfering with WireGuard? At least I see that the "Endpoint address:port" doesn't flip around anymore. But how could that happen, if the FB7530 is forwarding the port only to the ERX and not to the FB4020?

It seems like this really was the cause: I had left the FB4020 behind the FB7530, along with the ERX that is currently in use. I had left the WAN side of both open (I do that while I'm still unsure about the config, so that if I mess up the LAN side I can still configure from the WAN side), so maybe something weird was going on between them. Anyway, I stopped the WireGuard interface on the FB4020, closed the WAN side and opened only the WireGuard port, and WireGuard site-to-site with the ERX now seems stable.

Thanks a lot anyways!!!

Since you see the log messages, the watchdog script must be working, but you may also want to confirm that name resolution works correctly when the script is triggered.

Keep in mind that WireGuard is time sensitive, and time synchronization depends on connectivity, so unstable system time can be both cause and effect of WireGuard connection problems.
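A quick way to compare the clocks is to look at the UTC epoch value on both routers. A time zone setting only changes how the time is displayed, not the value of `date -u +%s`, so a wrong zone alone should not show up here. The sketch below uses a hypothetical helper, `skew_seconds`, and made-up timestamps in the usage comment.

```shell
#!/bin/sh
# skew_seconds A B
# Prints the absolute difference, in seconds, between two epoch timestamps.
skew_seconds() {
    diff=$(( $1 - $2 ))
    # Flip the sign if the second timestamp is the later one.
    [ "$diff" -lt 0 ] && diff=$(( 0 - diff ))
    echo "$diff"
}

# Usage: run `date -u +%s` on each router at roughly the same moment,
# then compare the two values (the numbers here are placeholders):
#   skew_seconds 1700000000 1700000002
```

A large difference, or a value that jumps after a reboot, would suggest that time synchronization is worth a closer look.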

Also note that a peer is likely behind NAT when its configured port does not match the port detected by the other peer.
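That comparison can be scripted. The sketch below is an illustration only: `check_endpoint_ports` is a hypothetical helper, and it assumes input in the `<public-key> <host:port>` layout printed by `wg show <interface> endpoints`. The interface name `wg_s2s_b` and port 51822 in the usage comment are taken from this thread.

```shell
#!/bin/sh
# check_endpoint_ports EXPECTED_PORT
# Reads "<public-key><TAB><host:port>" lines on stdin and reports whether the
# port each peer was last seen on matches the port configured for it.
check_endpoint_ports() {
    expected="$1"
    while read -r key endpoint; do
        # Skip incomplete lines and peers without a known endpoint.
        [ -n "$endpoint" ] || continue
        [ "$endpoint" = "(none)" ] && continue
        # Everything after the last colon is the port (also works for [IPv6]:port).
        port="${endpoint##*:}"
        if [ "$port" = "$expected" ]; then
            echo "$key ok $port"
        else
            echo "$key mismatch $port"
        fi
    done
}

# Usage on the router:
#   wg show wg_s2s_b endpoints | check_endpoint_ports 51822
```

A mismatch does not by itself say which device rewrote the port; it only shows that the packets arrived from a different source port than the one configured on the peer.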

@vgaetera Thank you for your hints! You are right: in the course of checking things, I noticed that the time zone was set wrong on the new ERX, so I fixed it. Maybe that was the cause, and not the old Fritzbox 4020 that sat behind the modem router along with the ERX.
The config itself and the necessary port forwarding must have been OK, because it's the identical config that had been running on the FB4020. If time is that important, I guess the wrong time zone was the cause of the instability? On the other hand, the time servers in use are the same: just the OpenWrt defaults.
Also, I thought WireGuard should work even if the two routers are in different time zones.

Name resolution (are you talking about dynamic IP?) wouldn't work for a short time when the public IP changes, but that happens only once every few months, not 5 times a day, so I guess that wasn't the problem.
The port number was changing all the time like crazy, so I think something about the port was definitely the problem. Now it has stopped changing entirely, after I stopped the WireGuard interface on the FB4020, set the right time zone, and closed WAN-side access to the ERX except for the ports needed for WireGuard. Which one of those did the job, I don't know.
