Wireguard peer becomes unreachable after a few minutes

Hi,
my setup is:

  • globally reachable server S, wireguard ip 10.0.1.1
  • random router R from my ISP
  • openwrt router O, wireguard ip 10.0.1.9. O also acts as an AP, bridging its own wifi network with the one from R

O goes through R to connect to S. I have configured Wireguard on O to connect to S.

The output of wg on O looks like:

peer: SOME_PUBLIC_KEY
  endpoint: IP_FROM_S:PORT_S
  allowed ips: 10.0.1.0/24
  latest handshake: 36 seconds ago
  transfer: 26.80 KiB received, 840 B sent
  persistent keepalive: every 25 seconds

The latest handshake never goes beyond a few minutes (2 I think).

allowed ips is /24 so that I can access other peers through S. I've tried with a /32 but it didn't improve the situation.

Problem is: while the tunnel works fine at the beginning (ping works from both sides, I can start an ssh connection from one side to the other), it no longer works after a few minutes. It varies, sometimes 2min, sometimes 10min. At this point, wg still has about the same content: recent handshake, and the KiB of data sent / received keeps increasing slightly, which I find surprising given that ping no longer works.

If I restart the interface on O, nothing changes, it still doesn't work.

But if I restart the whole device O, then it works again, for a few minutes. So I suspect that my bridge config interacts with wireguard in a bad way somehow.

The output of ip r seems to stay the same when it's working and when it's not:

root@OpenWrt:~# ip r
default via 192.168.1.1 dev wlan1 proto static src 192.168.1.228
10.0.1.0/24 dev all_ping proto static scope link
192.168.1.0/24 dev wlan1 proto kernel scope link src 192.168.1.228
192.168.10.0/24 dev br-lan proto kernel scope link src 192.168.10.1

Output from cat /etc/config/network; cat /etc/config/firewall ; cat /etc/config/dhcp ; ip -4 addr ; ip -4 ro ; ip -4 ru

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fdcb:f984:b6c3::/48'

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option netmask '255.255.255.0'
	option ip6assign '60'
	option ipaddr '192.168.10.1'
	option gateway '192.168.1.1'

config device 'lan_eth0_1_dev'
	option name 'eth0.1'

config interface 'wan'
	option ifname 'eth0.2'
	option proto 'dhcp'

config device 'wan_eth0_2_dev'
	option name 'eth0.2'

config interface 'wan6'
	option ifname 'eth0.2'
	option proto 'dhcpv6'

config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'

config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '0 1 2 3 6t'

config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '4 6t'

config interface 'wwan'
	option proto 'dhcp'

config interface 'repeater_bridge'
	option proto 'relay'
	list network 'lan'
	list network 'wwan'
	option ipaddr '192.168.1.228'

config interface 'wwan6'
	option proto 'dhcpv6'
	option reqprefix 'auto'
	option reqaddress 'none'
	option ifname '@wwan'

config interface 'all_ping'
	option proto 'wireguard'
	option private_key 'PRIVKEY'
	option delegate '0'
	list addresses '10.0.1.9/32'
	option listen_port '51992'

config wireguard_all_ping
	option public_key 'pubkey'
	option description 'descrip'
	option route_allowed_ips '1'
	option persistent_keepalive '25'
	option endpoint_host 'IP'
	option endpoint_port 'PORT'
	list allowed_ips '10.0.1.0/24'


config defaults
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option synflood_protect '1'

config zone 'lan'
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	option network 'lan wwan repeater_bridge wwan6 vpn wwan'

config zone 'wan'
	option name 'wan'
	option input 'REJECT'
	option output 'ACCEPT'
	option forward 'REJECT'
	option masq '1'
	option mtu_fix '1'
	option network 'wan wan6'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option name 'Allow-DHCP-Renew'
	option src 'wan'
	option proto 'udp'
	option dest_port '68'
	option target 'ACCEPT'
	option family 'ipv4'

config rule
	option name 'Allow-Ping'
	option src 'wan'
	option proto 'icmp'
	option icmp_type 'echo-request'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-IGMP'
	option src 'wan'
	option proto 'igmp'
	option family 'ipv4'
	option target 'ACCEPT'

config rule
	option name 'Allow-DHCPv6'
	option src 'wan'
	option proto 'udp'
	option src_ip 'fc00::/6'
	option dest_ip 'fc00::/6'
	option dest_port '546'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-MLD'
	option src 'wan'
	option proto 'icmp'
	option src_ip 'fe80::/10'
	list icmp_type '130/0'
	list icmp_type '131/0'
	list icmp_type '132/0'
	list icmp_type '143/0'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Input'
	option src 'wan'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	list icmp_type 'router-solicitation'
	list icmp_type 'neighbour-solicitation'
	list icmp_type 'router-advertisement'
	list icmp_type 'neighbour-advertisement'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-ICMPv6-Forward'
	option src 'wan'
	option dest '*'
	option proto 'icmp'
	list icmp_type 'echo-request'
	list icmp_type 'echo-reply'
	list icmp_type 'destination-unreachable'
	list icmp_type 'packet-too-big'
	list icmp_type 'time-exceeded'
	list icmp_type 'bad-header'
	list icmp_type 'unknown-header-type'
	option limit '1000/sec'
	option family 'ipv6'
	option target 'ACCEPT'

config rule
	option name 'Allow-IPSec-ESP'
	option src 'wan'
	option dest 'lan'
	option proto 'esp'
	option target 'ACCEPT'

config rule
	option name 'Allow-ISAKMP'
	option src 'wan'
	option dest 'lan'
	option dest_port '500'
	option proto 'udp'
	option target 'ACCEPT'

config include
	option path '/etc/firewall.user'

config zone
	option network 'all_ping'
	option input 'ACCEPT'
	option forward 'REJECT'
	option name 'wg'
	option output 'ACCEPT'


config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.auto'
	option localservice '1'

config dhcp 'lan'
	option interface 'lan'
	option ignore '1'
	option ra 'relay'
	option ndp 'relay'
	list dns '2606:4700:4700::1111'
	list dns '2606:4700:4700::1001'

config dhcp 'wan6'
	option ignore '1'
	option interface 'wwan'
	option ra 'relay'
	option ndp 'relay'

config odhcpd 'odhcpd'
	option maindhcp '0'
	option leasefile '/tmp/hosts/odhcpd'
	option leasetrigger '/usr/sbin/odhcpd-update'
	option loglevel '4'

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
5: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.10.1/24 brd 192.168.10.255 scope global br-lan
       valid_lft forever preferred_lft forever
8: all_ping: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.0.1.9/32 brd 255.255.255.255 scope global all_ping
       valid_lft forever preferred_lft forever
9: wlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.1.228/24 brd 192.168.1.255 scope global wlan1
       valid_lft forever preferred_lft forever
default via 192.168.1.1 dev wlan1 proto static src 192.168.1.228 
10.0.1.0/24 dev all_ping proto static scope link 
192.168.1.0/24 dev wlan1 proto kernel scope link src 192.168.1.228 
192.168.10.0/24 dev br-lan proto kernel scope link src 192.168.10.1 
0:	from all lookup local 
2:	from all iif lo lookup 16800 
2:	from all iif wlan1 lookup 16801 
2:	from all iif br-lan lookup 16802 
32766:	from all lookup main 
32767:	from all lookup default 

Any idea?


I have very similar symptoms and this helps:
Openwrt 19.07.1 sync with browser - #9 by vgaetera

Also remove that if the router is running as a VPN client:


Thanks! It does not seem to help though, I get the same result with the workaround you mentioned.

I'm not sure it would apply to my situation: the router always has (almost) direct Internet access, even with Wireguard on, since I don't route everything through it, so NTP should work correctly.

To be clear, you're only talking about the sysntpd restart in the crontab, right? I'll have a look at the other sections, maybe they give some hints.

I was using listen_port to help with debugging, but yes, it's the same without it, when a random port is used.


Does it work properly if you set up the same VPN connection directly on your PC?

Yes, the same configuration works perfectly fine from my computer to the server.

And again, the connection works when openwrt boots. It just stops working eventually. So I think the Wireguard config itself is kind of ok, it just interacts badly with other parts of my openwrt config...


Did you set "Persistent Keep Alive" in the peers config? Without it the tunnel will go down from time to time, unless you constantly transfer data.

Yes, keepalive is set to 25s. And there are new handshakes every 2min (apparently):

interface: all_ping
  public key: ...
  private key: (hidden)
  listening port: 55027

peer: ...
  endpoint: ...:...
  allowed ips: 10.0.1.0/24
  latest handshake: 43 seconds ago
  transfer: 48.61 KiB received, 48.95 KiB sent
  persistent keepalive: every 25 seconds

(Output of wg while it's working, as it is currently. I'm waiting for it to fail so that I can send new output.)

You are not using DDNS, or DNS that may resolve to IPv6, are you?

In my current config I don't use the domain (which I want to use eventually), I use the public IPv4 directly to make sure it's not because of the DNS.

But maybe your workaround actually did work, the Wireguard connection has been working for 13min now. It was not working when I just copy-pasted the fix, but after a reboot it seems a bit more stable. I'll update this if it does fail like before. I'm still not sure why that would apply to my situation though, but if the results are here...

Edit: no it actually broke after 18min, and that's the output of wg:

interface: all_ping
  public key: ...
  private key: (hidden)
  listening port: 55027

peer: ...
  endpoint: ip:port
  allowed ips: 10.0.1.0/24
  latest handshake: 46 seconds ago
  transfer: 136.61 KiB received, 129.68 KiB sent
  persistent keepalive: every 25 seconds

As you can see, recent handshake. But I can't ping O from S or vice-versa.


From the logs, the interruption seems to correlate with:

Fri May 14 17:19:35 2021 daemon.info hostapd: wlan0: STA MAC IEEE 802.11: authenticated
Fri May 14 17:19:35 2021 daemon.info hostapd: wlan0: STA MAC IEEE 802.11: associated (aid 2)
Fri May 14 17:19:35 2021 daemon.notice hostapd: wlan0: AP-STA-CONNECTED MAC
Fri May 14 17:19:35 2021 daemon.info hostapd: wlan0: STA MAC WPA: pairwise key handshake completed (RSN)

The problem seems to be that the traffic to 10.0.1.0/24 ends up being routed through another interface:

The first traceroute fails just like ping, using the default interface. If I specify the interface (second command), it seems to still work fine.

root@OpenWrt:~# traceroute 10.0.1.1
traceroute to 10.0.1.1 (10.0.1.1), 30 hops max, 38 byte packets
 1  192.168.1.1 (192.168.1.1)  2.355 ms  2.411 ms  2.381 ms
...

root@OpenWrt:~# traceroute -i all_ping 10.0.1.1
traceroute to 10.0.1.1 (10.0.1.1), 30 hops max, 38 byte packets
 1  10.0.1.1 (10.0.1.1)  16.366 ms  16.769 ms  17.080 ms

Now I just need to figure out how to make sure that any traffic to 10.0.1.0/24 always goes through this all_ping interface instead of switching to another interface after a while.

To be clear, the output of ip route is the same both when it works and when it doesn't.

Sounds like some sort of PBR:

ip route get 10.0.1.1; ip rule show

When it's not working:

root@OpenWrt:~# ip route get 10.0.1.1; ip rule show
10.0.1.1 via 192.168.1.1 dev wlan1 table 16800 src 192.168.1.228 uid 0
    cache
0:      from all lookup local
2:      from all iif lo lookup 16800
2:      from all iif wlan1 lookup 16801
2:      from all iif br-lan lookup 16802
32766:  from all lookup main
32767:  from all lookup default

So that's indeed not what I want. How do I force it to go directly to 10.0.1.1?
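
For anyone debugging the same symptom: the ip route get output above shows the lookup being answered from table 16800 rather than main, which would explain why ip route looks unchanged. Below is a minimal sketch for checking this from a script. The function names route_dev_table and route_uses_dev are hypothetical, and the sketch assumes that ip route get prints a "table" keyword only when the answer comes from a table other than main.

```shell
# Read `ip route get <addr>` output on stdin and print "<dev> <table>".
# Assumption: when no "table" keyword is present, the route came from main.
route_dev_table() {
    awk '
        NR == 1 {
            dev = "-"; table = "main"
            for (i = 1; i < NF; i++) {
                if ($i == "dev")   dev = $(i + 1)
                if ($i == "table") table = $(i + 1)
            }
            print dev, table
        }
    '
}

# Hypothetical helper: succeed only if the route leaves via the expected device.
# Usage: ip route get 10.0.1.1 | route_uses_dev all_ping
route_uses_dev() {
    route_dev_table | {
        read -r dev _table
        [ "$dev" = "$1" ]
    }
}
```

With the failing output shown above, ip route get 10.0.1.1 | route_uses_dev all_ping would return a non-zero status, because the reported device is wlan1.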

Yep, looks like mwan3, or vpn-policy-routing, or vpnbypass, etc.

ls -1 /etc/config /etc/rc.d
/etc/config:
dhcp
dropbear
etherwake
firewall
firewall-opkg
luci
luci-opkg
network
p910nd
rpcd
system
ucitrack
ucitrack-opkg
uhttpd
wireless

/etc/rc.d:
K10gpio_switch
K50dropbear
K85odhcpd
K89log
K90boot
K90network
K90sysfixtime
K90umount
S00sysfixtime
S00urngd
S10boot
S10system
S11sysctl
S12log
S12rpcd
S19dnsmasq
S19dropbear
S19firewall
S20network
S35odhcpd
S50cron
S50uhttpd
S60etherwake
S80relayd
S80ucitrack
S94gpio_switch
S95done
S96led
S98sysntpd
S99bootcount
S99set-irq-affinity
S99urandom_seed

As I mentioned in Wireguard peer becomes unreachable after a few minutes - #11 by hiq, I think it has something to do with hostapd.


Nothing out of the ordinary besides relayd.
Yep, it looks like the root cause of the issue.

You should be able to override it with something like this:

uci -q delete network.lan_vpn
uci set network.lan_vpn="rule"
uci set network.lan_vpn.dest="10.0.1.0/24"
uci set network.lan_vpn.lookup="main"
uci set network.lan_vpn.priority="1"
uci commit network
/etc/init.d/network restart
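
Rules are evaluated in ascending priority order, so a rule at priority 1 is consulted before the priority 2 rules that point at tables 16800 to 16802. Here is a small sketch to confirm the override is in place after the restart. The function names are hypothetical, and it assumes ip rule show lists the new rule in the form "1: from all to 10.0.1.0/24 lookup main".

```shell
# Read `ip rule show` output on stdin and print the priority of the first
# rule that sends traffic for the given destination prefix to table main.
# Prints nothing if no such rule exists.
override_priority() {
    awk -v prefix="$1" '
        {
            has_to = 0; to_main = 0
            for (i = 1; i < NF; i++) {
                if ($i == "to" && $(i + 1) == prefix)     has_to = 1
                if ($i == "lookup" && $(i + 1) == "main") to_main = 1
            }
            if (has_to && to_main) {
                sub(/:$/, "", $1)   # strip the trailing colon from "1:"
                print $1
                exit
            }
        }
    '
}

# Succeed if the override for prefix $1 is evaluated before priority $2.
# Usage: ip rule show | override_precedes 10.0.1.0/24 2
override_precedes() {
    prio=$(override_priority "$1")
    [ -n "$prio" ] && [ "$prio" -lt "$2" ]
}
```

If the check fails, the rule was probably not applied, and ip route get 10.0.1.1 is worth re-running to see which table answers.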

Thanks! After a few hours wireguard seems to still work fine.

Somehow these new settings broke my connection to the openwrt router: I was using 192.168.10.1, and this doesn't work anymore. I haven't looked into it much yet, but it's a minor problem if wireguard now works.


Such behavior is absolutely normal for the WireGuard protocol.
According to the protocol specification, a handshake happens every 120 seconds.
For further details please refer to the following links:

.. handshake occurs every few minutes, in order to provide rotating keys for perfect forward secrecy.

In practice, the handshake happens some time between 120 and 180 seconds.
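
Since regular handshakes are expected, a recent handshake alone does not show that traffic is being routed correctly, as the wg output earlier in this thread illustrates. To detect a peer that has really stopped handshaking, here is a rough sketch that flags peers whose last handshake is older than a threshold. The function name stale_peers and the 180 second default (the upper bound quoted above) are assumptions. It reads the output of wg show <interface> latest-handshakes, which lists one public key and one Unix timestamp per line.

```shell
# Read `wg show <interface> latest-handshakes` output on stdin and print the
# public keys of peers whose last handshake is older than max_age seconds.
# A timestamp of 0 (no handshake yet) is treated as stale as well.
# $1 = current Unix time, $2 = max age in seconds (assumed default: 180)
stale_peers() {
    now="$1"
    max_age="${2:-180}"
    awk -v now="$now" -v max="$max_age" '
        NF == 2 && ($2 == 0 || now - $2 > max) { print $1 }
    '
}
```

A possible invocation would be wg show all_ping latest-handshakes | stale_peers "$(date +%s)", which prints nothing as long as every peer has completed a handshake within the last 180 seconds.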
