Yet another "WireGuard shuts down on WAN connection loss" problem

I have a PPPoE connection (ISP modem) and use WireGuard. Sometimes the ISP disconnects, and at the same time WireGuard goes down and never comes back. The only solution is to reboot the router; ifup WireGuard results in an error message (I think it was "unknown interface"; I will re-trigger it on the weekend).
This has happened to other people as well, but none of the workarounds seem to work for me:

https://forum.openwrt.org/t/wireguard-disappears-after-each-episode-of-wan-disconnect

https://forum.openwrt.org/t/wireguard-shuts-down-itself/

My config:

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'eth0'
	list ports 'eth1'
	list ports 'eth2'
	list ports 'eth3'
	list ports 'eth4'

config interface 'lan'
	option proto 'static'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option device 'br-lan.300'
	option ipaddr '192.168.1.1'

config bridge-vlan
	option device 'br-lan'
	option vlan '300'
	list ports 'eth0:t'

config bridge-vlan
	option device 'br-lan'
	option vlan '100'
	list ports 'eth0:t'

config interface 'modem'
	option proto 'static'
	option device 'br-lan.100'
	option ipaddr '10.0.0.1'
	option netmask '255.255.255.0'

config interface 'wan'
	option proto 'pppoe'
	option device 'br-lan.100'
	option username 'xxxx'
	option password 'xxxx'
	option ipv6 'auto'
	option keepalive '3 5'
	option peerdns '0'
	list dns '208.67.222.222'
	list dns '208.67.220.220'

config interface 'WireGuard'
	option proto 'wireguard'
	option private_key 'xxxx'
	option listen_port '51820'
	list addresses '172.27.66.1/24'

My log:

Fri Feb 17 03:04:33 2023 daemon.info pppd[6838]: No response to 3 echo-requests
Fri Feb 17 03:04:33 2023 daemon.notice pppd[6838]: Serial link appears to be disconnected.
Fri Feb 17 03:04:33 2023 daemon.info pppd[6838]: Connect time 474.4 minutes.
Fri Feb 17 03:04:33 2023 daemon.info pppd[6838]: Sent 336250391 bytes, received 1013283699 bytes.
Fri Feb 17 03:04:33 2023 daemon.notice netifd: Network device 'pppoe-wan' link is down
Fri Feb 17 03:04:33 2023 daemon.notice netifd: Interface 'wan' has lost the connection
Fri Feb 17 03:04:33 2023 daemon.notice netifd: Interface 'WireGuard' has lost the connection
Fri Feb 17 03:04:33 2023 daemon.notice netifd: Network device 'WireGuard' link is down
Fri Feb 17 03:04:33 2023 daemon.warn dnsmasq[1]: no servers found in /tmp/resolv.conf.d/resolv.conf.auto, will retry
Fri Feb 17 03:04:34 2023 daemon.notice netifd: Interface 'WireGuard' is now down
Fri Feb 17 03:04:34 2023 daemon.notice netifd: Interface 'WireGuard' is setting up now
Fri Feb 17 03:04:35 2023 daemon.notice netifd: Interface 'WireGuard' is now down
Fri Feb 17 03:04:39 2023 daemon.notice pppd[6838]: Connection terminated.
Fri Feb 17 03:04:39 2023 daemon.notice pppd[6838]: Modem hangup
Fri Feb 17 03:04:39 2023 daemon.info pppd[6838]: Exit.

[...]

Fri Feb 17 03:05:26 2023 daemon.notice netifd: Interface 'wan' is now down
Fri Feb 17 03:05:26 2023 daemon.notice netifd: Interface 'wan' is setting up now
Fri Feb 17 03:05:26 2023 daemon.err insmod: module is already loaded - slhc
Fri Feb 17 03:05:26 2023 daemon.err insmod: module is already loaded - ppp_generic
Fri Feb 17 03:05:26 2023 daemon.err insmod: module is already loaded - pppox
Fri Feb 17 03:05:26 2023 daemon.err insmod: module is already loaded - pppoe
Fri Feb 17 03:05:26 2023 daemon.info pppd[7429]: Plugin pppoe.so loaded.
Fri Feb 17 03:05:26 2023 daemon.info pppd[7429]: PPPoE plugin from pppd 2.4.9
Fri Feb 17 03:05:26 2023 daemon.notice pppd[7429]: pppd 2.4.9 started by root, uid 0
Fri Feb 17 03:05:26 2023 daemon.info pppd[7429]: PPP session is 1
Fri Feb 17 03:05:26 2023 daemon.warn pppd[7429]: Connected to xx:xx:xx:xx:xx:xx via interface br-lan.100
Fri Feb 17 03:05:26 2023 kern.info kernel: [28594.720746] pppoe-wan: renamed from ppp0
Fri Feb 17 03:05:26 2023 daemon.info pppd[7429]: Renamed interface ppp0 to pppoe-wan
Fri Feb 17 03:05:26 2023 daemon.info pppd[7429]: Using interface pppoe-wan
Fri Feb 17 03:05:26 2023 daemon.notice pppd[7429]: Connect: pppoe-wan <--> br-lan.100
Fri Feb 17 03:05:30 2023 daemon.info pppd[7429]: CHAP authentication succeeded: CHAP authentication success
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: CHAP authentication succeeded
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: peer from calling number xxxx authorized
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: local  IP address xxxx
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: remote IP address xxxx
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: primary   DNS address xxxx
Fri Feb 17 03:05:30 2023 daemon.notice pppd[7429]: secondary DNS address xxxx
Fri Feb 17 03:05:30 2023 daemon.notice netifd: Network device 'pppoe-wan' link is up
Fri Feb 17 03:05:30 2023 daemon.notice netifd: Interface 'wan' is now up
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: reading /tmp/resolv.conf.d/resolv.conf.auto
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using nameserver 208.67.222.222#53
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using nameserver 208.67.220.220#53
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for test
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for onion
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for localhost
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for local
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for invalid
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for bind
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for sym.zone
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using only locally-known addresses for piwik.qmmedia.zone
Fri Feb 17 03:05:30 2023 daemon.info dnsmasq[1]: using 52174 more local addresses
Fri Feb 17 03:05:30 2023 user.notice SQM: Stopping SQM on pppoe-wan
Fri Feb 17 03:05:30 2023 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev pppoe-wan ingress
Fri Feb 17 03:05:30 2023 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot find specified qdisc on specified device.
Fri Feb 17 03:05:30 2023 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev pppoe-wan root
Fri Feb 17 03:05:30 2023 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Fri Feb 17 03:05:31 2023 user.notice SQM: Starting SQM script: piece_of_cake.qos on pppoe-wan, in: 37000 Kbps, out: 8500 Kbps
Fri Feb 17 03:05:31 2023 user.notice SQM: piece_of_cake.qos was started on pppoe-wan successfully
Fri Feb 17 03:05:32 2023 user.notice firewall: Reloading firewall due to ifup of wan (pppoe-wan)

I tried setting force_link=1 on the wg interface; it didn't work.
I also tried the following hotplug script, which didn't work either:

[ "$INTERFACE" = wan ] || exit 0
[ "${ACTION}" = ifdown ] && ubus call network.interface.WireGuard down
[ "${ACTION}" = ifup ] && ubus call network.interface.WireGuard up
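To check whether a hotplug hook like this fires at all, a variant that writes each event to the system log can help. This is a minimal diagnostic sketch, not a fix: the file name /etc/hotplug.d/iface/99-wg-restart, the WG_IFACE variable and the wg_action_for helper are hypothetical, and it still relies on the same ubus call as the script above.

```shell
#!/bin/sh
# Hypothetical variant of the hotplug script above, e.g. saved as
# /etc/hotplug.d/iface/99-wg-restart (file name is an assumption).
# It logs every wan event it acts on, so logread shows whether the hook fires.

WG_IFACE="${WG_IFACE:-WireGuard}"   # logical interface name from /etc/config/network

# Map a hotplug event (INTERFACE, ACTION) to "up" or "down" for WireGuard.
# Prints nothing and returns 1 for events that should be ignored.
wg_action_for() {
	[ "$1" = wan ] || return 1
	case "$2" in
		ifup)   echo up ;;
		ifdown) echo down ;;
		*)      return 1 ;;
	esac
}

# Only act when invoked as a hotplug handler (INTERFACE and ACTION are set).
if [ -n "$INTERFACE" ] && [ -n "$ACTION" ]; then
	if action=$(wg_action_for "$INTERFACE" "$ACTION"); then
		logger -t wg-hotplug "$INTERFACE $ACTION -> $WG_IFACE $action"
		ubus call "network.interface.$WG_IFACE" "$action"
	fi
fi
```

After pulling the cable, logread | grep wg-hotplug should show whether the ifdown/ifup of wan reached the hook. If the ifup line appears but the WireGuard device is still missing, the hook itself is probably not the problem.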

Any other ideas to try?
Why is the wg interface brought down by netifd anyway?

EDIT:
NB: This was on 22.03.0. I upgraded to 22.03.3 yesterday and will check on the weekend whether that fixes the problem.

It shouldn't be; no such thing happens here.

Perhaps you should provide the peer config for review as well?

  • When your WAN goes down, does your public IP change?
    • If so, does your remote peer run the WireGuard watchdog script to re-establish the connection?
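The core idea of such a watchdog is to notice peers whose last handshake is too old and then re-resolve their endpoint. A minimal sketch of the detection step, assuming the tab-separated output of wg show <iface> latest-handshakes; the 150-second threshold and the stale_peers name are assumptions, and this is not a copy of OpenWrt's wireguard_watchdog.

```shell
#!/bin/sh
# Sketch: print the public keys of peers whose last handshake is stale.
# Input: output of `wg show <iface> latest-handshakes`, i.e. lines of
# "<public-key><TAB><unix-epoch>", where 0 means "no handshake yet".

STALE_AFTER="${STALE_AFTER:-150}"   # seconds; assumed value, tune as needed

# Usage: stale_peers NOW < handshakes
stale_peers() {
	now="$1"
	while IFS="$(printf '\t')" read -r key ts; do
		[ -n "$key" ] && [ -n "$ts" ] || continue
		# Never handshaked, or last handshake at least STALE_AFTER seconds ago.
		if [ "$ts" -eq 0 ] || [ $((now - ts)) -ge "$STALE_AFTER" ]; then
			echo "$key"
		fi
	done
}
```

On a client this would be fed as wg show WireGuard latest-handshakes | stale_peers "$(date +%s)"; each key printed is a candidate for re-resolving its endpoint_host.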

Yes, if your server's public IP changes, there is nothing the server side can do. The clients will stay disconnected until they take the initiative to resolve the new address and reconnect.

I think the DDNS scripts will be triggered to push an update as soon as wan comes back up, but I don't know for sure. Even then, it will take a few minutes before the clients' DNS returns the new address. The bottom line is that you need a better ISP.
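Whether a client has picked up the new address can be checked by comparing the endpoint IP WireGuard is currently using with what DNS now returns. A minimal sketch of the first half, assuming the tab-separated output of wg show <iface> endpoints; endpoint_ip_for is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the endpoint IP currently used for one peer.
# Input: output of `wg show <iface> endpoints`, i.e. lines of
# "<public-key><TAB><ip>:<port>" (IPv6 as "[addr]:port", "(none)" if unset).

# Usage: endpoint_ip_for PUBKEY < endpoints
endpoint_ip_for() {
	want="$1"
	while IFS="$(printf '\t')" read -r key ep; do
		[ "$key" = "$want" ] || continue
		[ "$ep" = "(none)" ] && return 1
		ep="${ep%:*}"    # drop ":port"
		ep="${ep#\[}"    # drop leading IPv6 bracket, if any
		echo "${ep%\]}"  # drop trailing IPv6 bracket, if any
		return 0
	done
	return 1             # peer not found
}
```

On a client, wg show WireGuard endpoints | endpoint_ip_for <server-public-key> gives the address in use; if it differs from what nslookup returns for the endpoint_host, the client has not re-resolved yet.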


Yes, but all clients run the watchdog script.

Note that this behavior does not occur if I manually reconnect the PPPoE connection, only when it breaks.

To be fair, I was running 22.03.0 and just upgraded to 22.03.3. Let's see if that fixes things (I suppose I can just shut off the modem to trigger it).

The peer config will follow later today; I'm on the phone right now.

So this is the config of the other box; the problem does not occur on this end. However, there is no PPPoE involved here; it's just DHCP over DSL.

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'xxxxxxx::/48'

config atm-bridge 'atm'
	option vpi '1'
	option vci '32'
	option encaps 'llc'
	option payload 'bridged'
	option nameprefix 'dsl'

config dsl 'dsl'
	option annex 'b'
	option tone 'b'
	option ds_snr_offset '0'
	option firmware '/lib/firmware/xcpe_8.D.1.C.1.7_8.D.0.E.1.2.bin'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.177.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'dsl0'
	option macaddr 'xxxxxxxxxxx'

config interface 'WireGuard'
	option proto 'wireguard'
	option private_key 'xxxxxxxxxxx'
	option listen_port '51820'
	list addresses '172.27.66.254/24'

And the individual peers. One end:

config wireguard_WireGuard
	option description 'Home'
	option public_key 'xxxxxxxxxx'
	option endpoint_host 'xxxxxxxxxxxxx'
	option endpoint_port '51820'
	option persistent_keepalive '25'
	option route_allowed_ips '1'
	list allowed_ips '192.168.1.0/24'
	list allowed_ips '172.27.66.1/32'

The other end:

config wireguard_WireGuard
	option description 'Office'
	option public_key 'xxxxxxxxxxxxxx'
	option endpoint_host 'xxxxxxxxxxx'
	option endpoint_port '51820'
	option persistent_keepalive '25'
	option route_allowed_ips '1'
	list allowed_ips '172.27.66.254/32'
	list allowed_ips '192.168.177.0/24'

This is all true, but unrelated to the problem. As mentioned in the first post, the whole interface disappears from the box: ifup WireGuard results in an error message, and the only way to restore functionality is a reboot. I edited the first post to make this clearer.

@lleachii @mk24
A bit more information on the problem, but still no solution:

The problem is easily triggered by pulling the cable between modem and router. I just tried it with 22.03.3 and WireGuard went down (same log as in OP). I cannot bring it back up unless I reboot the router:

  • There is no WireGuard interface in ifconfig -a
  • ifdown WireGuard reports no error
  • ifup WireGuard reports no error
  • LuCI says:

So I take this as a hint to finally move to a FRITZ!Box 7520 and replace the separate modem.

I moved to a FRITZ!Box 7520 and set up the latest snapshot. Unfortunately, the issue persists.
Since I couldn't explain why the dropping PPPoE connection brings down WireGuard, I had a look at /lib/netifd/proto/wireguard.sh and found a call to proto_add_host_dependency. My guess is that this call brings down the WireGuard interface when PPPoE breaks. I'm now testing a configuration that skips the call to proto_add_host_dependency by setting option nohostroute '1' in the WireGuard interface configuration.
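For reference, the configuration being tested would look roughly like this in /etc/config/network, assuming (as the post implies) that wireguard.sh skips proto_add_host_dependency when nohostroute is set. With it, netifd should no longer tie the WireGuard interface to wan; the trade-off is that no host route to the peer endpoints is added, which should only matter if a default route ever points into the tunnel (not the case with the allowed_ips shown above).

```
config interface 'WireGuard'
	option proto 'wireguard'
	option private_key 'xxxx'
	option listen_port '51820'
	list addresses '172.27.66.1/24'
	# skip the endpoint host route / host dependency on the upstream interface
	option nohostroute '1'
```

The same can be set with uci set network.WireGuard.nohostroute='1' followed by uci commit network and /etc/init.d/network reload, then repeating the cable-pull test.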