Wireguard interfaces sometimes do not come up automatically in 21.02

root@mangoT:~# date
Sun Sep 26 08:23:51 EDT 2021
root@mangoT:~# ifup wg0
root@mangoT:~# sleep 30
root@mangoT:~# wg show
root@mangoT:~# ip route get 1
1.0.0.0 via 192.168.8.1 dev eth0.2 src 192.168.8.83 uid 0
    cache
root@mangoT:~# nslookup openwrt.org
Server:         127.0.0.1
Address:        127.0.0.1#53

Name:      openwrt.org
Address 1: 139.59.209.225
Address 2: 2a03:b0c0:3:d0::1af1:1
root@mangoT:~# head -v -n -0 /etc/resolv.* /tmp/resolv.* /tmp/resolv.*/*
==> /etc/resolv.conf <==
search lan
nameserver 127.0.0.1
nameserver ::1

==> /tmp/resolv.conf <==
search lan
nameserver 127.0.0.1
nameserver ::1

==> /tmp/resolv.conf.d <==
head: /tmp/resolv.conf.d: I/O error

==> /tmp/resolv.conf.d/resolv.conf.auto <==
# Interface wan
nameserver 192.168.8.1
search lan
# Interface wan6
nameserver fdf3:9c1b:d2b2::1
root@mangoT:~#

The file /lib/netifd/proto/wireguard.sh is not even getting called when I do ifup wg0 or /etc/init.d/network reload. I have added logging and nothing comes out.

Is there a more direct way to force a reload of wg0 from the command line?

Try a full network restart:

/etc/init.d/network restart

Well the restart did in fact bring up wg0, but since it did not fail I do not get interesting logging.

I am curious why ifup wg0 was not actually calling wireguard.sh when wg0 had failed to start in the first place. Now that wg0 is up, running ifup wg0 does indeed run wireguard.sh and restart the interface.
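
For anyone who wants to automate what was tried above (ifup, wait, check wg show, then fall back to the full restart), here is a rough sketch. The function names and the WAIT and RESTART_CMD variables are made up for illustration; they are not OpenWrt settings, and I have only assumed that wg show failing for the interface is a good enough sign that it is down.

```shell
#!/bin/sh
# Sketch only: WAIT and RESTART_CMD are illustrative names, not OpenWrt settings.
WAIT="${WAIT:-30}"
RESTART_CMD="${RESTART_CMD:-/etc/init.d/network restart}"

# Succeeds when `wg show` can access the interface, i.e. it has been created.
wg_is_up() {
    wg show "$1" >/dev/null 2>&1
}

# Try the cheap ifup first; fall back to the full restart that worked in this thread.
wg_recover() {
    wg_is_up "$1" && return 0
    ifup "$1"
    sleep "$WAIT"
    wg_is_up "$1" && return 0
    logger -t wg-recover "$1 still down after ifup, restarting network"
    $RESTART_CMD
}
```

Calling wg_recover wg0 at the end of the script and running it from cron or after boot would be one way to use it. It does nothing when the interface is already up.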

Since the above works, there must be some kind of race condition resulting in a deadlock inside netifd, so it is best to file a bug against netifd in the OpenWrt core issue tracker.

@twinkleLED, are you specifying the VPN endpoint by IP or domain name?

I am specifying the endpoint by domain name, though I am ensuring it is cached before doing the ifup.

A couple of observations about the hypothetical race condition:

  • Normally it reboots fine
  • It is only sometimes that wg does not come up after reboot, but in those situations the rest of the network always comes up fine
  • When wg has not come up, and after the system is very quiet (i.e. no more network initialization activity), trying to bring up wg manually does not work. This is quite a different system activity profile from the initial reboot, but it still does not work.

I will file a bug report...

Most likely netifd sometimes fails to resolve the domain name at startup due to a race condition.
Also it doesn't seem to re-resolve the peer on reload/ifup, so only restarting the service is effective.
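
If the DNS race is the cause, one possible workaround is to wait until the endpoint name actually resolves and only then restart the network. This is just a sketch under that assumption: wait_for_dns is a made-up helper, and vpn.example.com below stands in for your real endpoint.

```shell
#!/bin/sh
# Sketch only: wait_for_dns is a hypothetical helper, not part of netifd.

# Poll until the name resolves or the attempts run out.
# $1 = host name, $2 = number of attempts (default 10), one second apart.
wait_for_dns() {
    host="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        nslookup "$host" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Usage could look like wait_for_dns vpn.example.com 30 && /etc/init.d/network restart. A successful lookup only shows that the resolver is answering at that moment; it does not prove netifd will resolve the peer on the next attempt.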

I have the same issue with 21.02 stable: after some reboots, the WireGuard interface does not come up at all. A manual restart does not help; only rebooting helps. The server is specified by IP address, not by domain name.

I can make the failure happen far more frequently if I create a chain of DNS CNAME records that point to an eventual real DNS endpoint record. If I do this, I get a 20% failure rate on my nightly reboots.

I have created an OpenWrt bug tracker report at https://bugs.openwrt.org/index.php?do=details&task_id=4102

Failure to connect by IP is likely a different issue caused by NTP.
Anyway, I'm afraid these problems are unlikely to be solved in the short term.
So, if you are affected, consider using workarounds.
I recommend mitigating DNS/NTP race conditions with PBR (policy-based routing).
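
Not PBR, but for the NTP side a simpler stopgap might be to hold off until the clock looks plausible before touching the tunnel. This is a sketch that assumes the problem is the clock not being set yet when WireGuard starts; MIN_EPOCH and both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. MIN_EPOCH is an assumed lower bound: 2021-01-01 00:00:00 UTC.
MIN_EPOCH="${MIN_EPOCH:-1609459200}"

# Succeeds once the system clock is past MIN_EPOCH, a coarse sign that it has been set.
clock_is_sane() {
    [ "$(date +%s)" -ge "$MIN_EPOCH" ]
}

# Poll the clock up to $1 times (default 60), one second apart.
wait_for_clock() {
    tries="${1:-60}"
    while [ "$tries" -gt 0 ]; do
        clock_is_sane && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Something like wait_for_clock 120 && /etc/init.d/network restart would then run the restart only after the check passes. The check is coarse: a clock that is wrong but later than MIN_EPOCH still passes.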