WireGuard debug information

Livy · December 25, 2022, 1:02pm

I don't understand the idea behind WireGuard's kernel module being completely silent about connection errors. It is the most terrible user experience I have ever had on Linux.

I have a WireGuard connection that had worked for weeks. Today I changed some network config in /etc/config/network on one of the two routers and rebooted it. After that the WireGuard connection stopped working. I tried rebooting both routers, with the same result. Both routers could ping each other. I gave up and went to do other things. After a few hours it magically worked again. The configuration changes were not WireGuard related, and both routers have static public IP addresses.

Imagine if this were a production environment: it would have been a disaster. I cannot deploy a system that works whenever it feels like it, without any information about why it works and why it does not.

I know this is not the right place to complain, but for ƒuck's sake, I think the Linux kernel developers should rethink how useful and usable this piece of crap is to end users. When it works it is fast, but it is a horrible user experience. Last time, I scratched my head for a whole day just because one of the firewalls between the two routers was configured to block WireGuard. And yes, there was no error, just no handshake between the peers.

Almost all system administrators just go with Fortinet or Cisco and their proprietary technologies, which are easy to set up and maintain, and leave the annual license cost for their company to pay. I love open source software, and I spend a lot of my time studying how to use it and, most importantly, how it works so I can troubleshoot later. But if things are like WireGuard, then it is all just a waste of time.

The WireGuard problem is not just on OpenWrt, it is everywhere. You can find countless threads on Reddit about it.

Crect · December 25, 2022, 1:23pm

I've been using WireGuard for years, both for S2S and P2S, and as long as my internet is up it's rock solid. You must be doing something wrong.

lleachii · December 25, 2022, 1:55pm

Point-to-point (or key-based) configs are usually set up and left untouched. WireGuard either connects/handshakes or it doesn't. It's encrypted; an error simply shows up as no connection.

What config change did you actually make?

Livy · December 25, 2022, 3:38pm

I mostly cleaned up my switch VLAN and some interface configurations, without touching the WireGuard section. If you Google around, you'll see that various people have issues with WireGuard when the Internet connection drops and then comes back. Sometimes it reconnects to the peer successfully; other times it just hangs until the WireGuard interface is restarted.

And yeah it doesn't have an option to restart a "peer" connection, you have to do it on an interface level. It is nothing but annoying.

Nihilokrat · December 25, 2022, 4:41pm

You can easily work around this with a cronjob that executes a script for deleting/adding the wg interface every X minutes based on connectivity status.
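
A minimal sketch of such a watchdog, assuming Python is available on the router and the interface is called wg0. It relies on `wg show <interface> latest-handshakes`, which prints one line per peer with the public key and the Unix time of the last handshake (0 if there has never been one). The 180-second threshold and the function names are assumptions made for illustration, and the restart uses the same ifdown/ifup pair mentioned elsewhere in this thread.

```python
import subprocess
import time

# Assumed threshold: a little above the roughly two-minute interval at which
# an active WireGuard session renews its handshake.
STALE_AFTER = 180


def parse_latest_handshakes(output):
    """Parse `wg show <iface> latest-handshakes` output into {public_key: unix_time}."""
    peers = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            peers[fields[0]] = int(fields[1])
    return peers


def stale_peers(peers, now, max_age=STALE_AFTER):
    """Return peers with no handshake at all (0) or with one older than max_age seconds."""
    return [key for key, ts in peers.items() if ts == 0 or now - ts > max_age]


def check_and_restart(iface="wg0"):
    """Restart the interface if any peer looks stale; return the list of stale peers."""
    out = subprocess.run(
        ["wg", "show", iface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    ).stdout
    stale = stale_peers(parse_latest_handshakes(out), int(time.time()))
    if stale:
        subprocess.run(["ifdown", iface], check=False)
        subprocess.run(["ifup", iface], check=False)
    return stale
```

Calling `check_and_restart("wg0")` from a cron job every few minutes would implement the idea. Note that handshakes only happen when there is traffic, so on an idle tunnel without a persistent keepalive an old handshake does not necessarily mean the tunnel is broken.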

psherman · December 25, 2022, 5:00pm

The great thing about VPNs is that there are many different protocols to choose from. If WG doesn't work for you for whatever reason, use something else.

Since you're frustrated about the way that WG works in general (not specific to OpenWrt), your best option would be to either inform the WG developers themselves (wireguard.com), or just use something else.

ulmwind · December 25, 2022, 5:21pm

It is my idea!

mk24 · December 25, 2022, 6:04pm

If both ends have static public IPs you can point both peers specifically at each other by IP and port. This is also how IPsec works.

Most corporate and national firewalls are configured to drop rather than reject, which does make it difficult to troubleshoot why a packet did not go through.

jedboy · December 25, 2022, 6:32pm

I agree with @Livy: having no logging for WireGuard sucks.

I have a Wireguard tunnel to a VPN provider that is not currently working. I'm not looking for help troubleshooting, just adding an example where some kind of logging would help.

I didn't change anything, just power cycled. The other end is a commercial provider, and I don't think they changed anything.

The wg0 interface comes up, the default route gets set to wg0.
'wg show' shows most of the normal lines, but no handshake.

From the router, I can't ping 8.8.8.8 or connect to anything. So the tunnel seems to be halfway up (it took over the route, and wg0 shows in ifconfig). But there is no handshake line in 'wg show' and no internet connectivity.

Would be nice if there was some logging to give a hint why. In this case I suspect an expired key, but why can't a log tell me that?

psherman · December 25, 2022, 6:36pm

Just guessing here, but I suspect your problem is related to time. If the WG interface is started before the time has been synced (NTP), the time may be off (in the past) by a significant amount since most routers do not have an RTC. Therefore, delaying your WG interface start until after NTP success will usually fix the issue.

jedboy · December 25, 2022, 6:47pm

Thanks for the idea. Not the issue in my case. Did 'ifdown wg0', then ntpdate and ifup. Ended up back in the same place. But if that was the problem, why couldn't there be a log telling you that '... time out of tolerance ...'?

I know OpenWrt can't fix Wireguard, just piling on.

I likely will choose to keep using Wireguard.

For troubleshooting, even though there isn't much to go off of, there also isn't a long list of things to check.

psherman · December 25, 2022, 6:54pm

Wireguard is not a "chatty" protocol... this will be important in a moment.

On your local side, the device doesn't know that the clock is off.
On the far side, the peer sees cryptographic information that is not quite right because the time is off*.
Since WG will only respond if the cryptographic info is correct, it issues no response at all. This is to prevent 'replay attacks' where a third party has intercepted a data stream and replays it sometime in the future to try to establish a connection and break the encryption keys.

*I'm not exactly sure if the time is actually used as part of the crypto recipe (thus making the entire cryptographic keyset invalid) or if it is simply sent alongside the key information, but either way, the far side peer sees the data as invalid and as evidence of a potential replay attack.

So, since WG does not reply unless everything is correct, it issues no reply at all and therefore there is no way for your side to be notified that the problem is related to the time.
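
To make the mechanism concrete, here is a toy model of the responder side. It is a simplified sketch, not WireGuard's actual code. As far as the protocol description goes, the initiator's timestamp travels encrypted inside the handshake initiation message, and the responder remembers the greatest timestamp it has accepted from each peer and silently discards any initiation whose timestamp is not newer. The class and method names are made up for illustration, and plain integers stand in for the real timestamp format.

```python
class Responder:
    """Toy model of the responder's per-peer replay check (not WireGuard's actual code)."""

    def __init__(self):
        # peer identifier -> greatest timestamp accepted so far
        self.latest_timestamp = {}

    def handle_initiation(self, peer, timestamp):
        """Return True if a response would be sent, False if the packet is silently dropped."""
        if timestamp <= self.latest_timestamp.get(peer, -1):
            return False  # looks like a replay: no reply and no error goes back
        self.latest_timestamp[peer] = timestamp
        return True


responder = Responder()
print(responder.handle_initiation("router-a", 1_671_900_000))  # True: clock correct
print(responder.handle_initiation("router-a", 1_600_000_000))  # False: rebooted, clock in the past
print(responder.handle_initiation("router-a", 1_671_900_300))  # True: after NTP sync
```

The second call is the case of a router without an RTC that reboots and starts WireGuard before NTP has synced: as long as the far side still remembers the newer timestamp, it stays silent until the initiator's clock has caught up.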

slh · December 25, 2022, 9:18pm

And time certainly is a recurrent issue on plastic routers and SBCs without a battery-backed RTC, but it can also be problematic (in the sense of significant clock drift and jumps) for badly set up virtual machines.

Livy · December 27, 2022, 11:00am

I know about the time sync issue with WireGuard. That's why I always ensure that NTP servers are set up correctly on both sides. On my QEMU virtual machines, I always use -rtc base=utc,clock=host to avoid clock drift issues.

My WireGuard interface has been working well since I created this thread. I still need to test on more devices, over a longer period and under various network instability scenarios, before I am confident enough to deploy this in a production environment.