Wireguard vpn-policy-routing startup

Hi, I'm setting up Wireguard tunnels in OpenWrt for the first time and have a question.
I have the split tunnel set up and functioning though I have one thing I can't seem to resolve.
So far I have been doing everything through LuCI.

On device reboot, vpn-policy-routing shows an error (below), but upon being started has no errors and my setup [mostly works].


Whether it's started via UI or console, the result is a success.
0x8e4wu
As the syslog has

user.notice vpn-policy-routing [1405]: ERROR: Failed to set up 'wan/eth0/wan_ip [✓]' ERROR: Failed to set up 'wg_wan/wg_wan_ip' ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000 -s 192.168.10.145/32 -m comment --comment wg_client ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000 -s 192.168.20.1/24 -m comment --comment test ERROR: failed to set up any gateway!

I assume this is due to the wireguard (wg_wan) interface not being connected quite yet, but am not certain. I am not sure on a solution to this, as a crontab (crontab -e) of

@reboot sleep 60;/etc/init.d/vpn-policy-routing start

does not start it, with nothing in logs. Cron restarted / the crontab applied with no errors before reboot.

This happens maybe 1/3 - 1/4 reboots, but enough that I'd want to solve this automatically.

Isn't there the Boot Time-out option under Advanced Settings tab?
The @reboot is not supported in the crontab of OpenWrt. You can use /etc/rc.local instead.

1 Like

Interesting, I got the @reboot from the OpenWrt wiki on Scheduling

Found the boot time out. Not sure how I overlooked that at first. Much obliged!

1 Like

@gandalf since you did this change in the wiki, maybe it is not so applicable?

1 Like

It seems that the waiting was not the issue of vpn-policy-routing start up, or the waiting is not working.
Everything related to wg_wan, as well as any output by vpn-policy-routing in the logs:

Tue Aug 10 23:27:04 2021 daemon.notice netifd: Interface 'wg_wan' is setting up now
Tue Aug 10 23:27:05 2021 user.notice sysctl: net.ipv4.conf.wg_wan.rp_filter = 2
Tue Aug 10 23:27:06 2021 daemon.notice netifd: Interface 'wg_wan' is now down
Tue Aug 10 23:27:08 2021 daemon.notice netifd: Interface 'wg_wan' is setting up now
Tue Aug 10 23:27:09 2021 user.notice sysctl: net.ipv4.conf.wg_wan.rp_filter = 2
Tue Aug 10 23:27:09 2021 user.notice vpn-policy-routing [1412]: service waiting for wan gateway...
Tue Aug 10 23:27:10 2021 daemon.notice netifd: Interface 'wg_wan' is now up
Tue Aug 10 23:27:10 2021 daemon.notice netifd: Network device 'wg_wan' link is up
Tue Aug 10 23:27:10 2021 user.notice firewall: Reloading firewall due to ifup of wg_wan (wg_wan)
Tue Aug 10 23:27:14 2021 user.notice vpn-policy-routing [1412]: Creating table 'wan/eth0/wan_ip' [✗]
Tue Aug 10 23:27:14 2021 user.notice vpn-policy-routing [1412]: Creating table 'wg_wan/wg_wan_ip' [✗]
Tue Aug 10 23:27:15 2021 user.notice vpn-policy-routing [1412]: Routing 'wg_client' via wg_wan [✗]
Tue Aug 10 23:27:15 2021 user.notice vpn-policy-routing [1412]: Routing 'remote_wg' via wg_wan [✗]
Tue Aug 10 23:27:16 2021 user.notice vpn-policy-routing [1412]: service monitoring interfaces: wan wg_wan [✓]
Tue Aug 10 23:27:17 2021 user.notice vpn-policy-routing [1412]: ERROR: Failed to set up 'wan/eth0/wan_ip [✓]' ERROR: Failed to set up 'wg_wan/wg_wan_ip' ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 192.168.10.80/28 -m comment --comment wg_client  ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 10.10.10.3/32 -m comment --comment remote_wg  ERROR: failed to set up any gateway!

I am also not entirely sure why it looks my wg_wan interface goes down and restarts after starting.
I've tried various settings of timeouts, ranging from 5 to 60. The log above is with a Boot Time-out setting of 120, which should be massively excessive.
If it helps, I do have Firewall -> Custom Rules set up for forcing dns to a local pihole. Only mentioning it cause saving on the custom rules page turns off vpn-policy-routing even if its running fine, and then needs to be manually started. The specific firewall restart after wg_wan is up is, however, the last one, so it shouldn't be a case of the custom rules applying and turning off the vpn-policy-routing (seems to never turn on).

sed -i 's|/etc/init.d/vpn-policy-routing reload|sleep 11; /etc/init.d/vpn-policy-routing reload|g' /etc/hotplug.d/firewall/70-vpn-policy-routing

(technically this is not correct... you need to be calling this reload from iface/wg/wwan on upstream addressing acquired... which may be better off in the pbr reload logic to add a wait on wwan or similar... brute reload on boot has too many inter-dependancies)

this would probably be more logical at the end of rc.local if the issues are only at boot;

(sleep 120; /etc/init.d/vpn-policy-routing reload) &
1 Like

Seems the rc.local solution (also mentioned by trendy initially) at least works reliably, after going through several reboots, though it seems that a sleep of 10 seems to restart the service almost a minute later after init actually finishes

Wed Aug 11 14:11:15 2021 user.notice vpn-policy-routing [1411]: Creating table 'wan/eth0/wan_ip' [✗]
Wed Aug 11 14:11:16 2021 daemon.err uhttpd[1316]: luci: accepted login on / for root from local_ip
Wed Aug 11 14:11:16 2021 user.notice vpn-policy-routing [1411]: Creating table 'wg_wan/wg_wan_ip' [✗]
Wed Aug 11 14:11:18 2021 user.notice vpn-policy-routing [1411]: Routing 'wg_client' via wg_wan [✗]
Wed Aug 11 14:11:18 2021 user.notice vpn-policy-routing [1411]: Routing 'remote_wg' via wg_wan [✗]
Wed Aug 11 14:11:19 2021 user.notice vpn-policy-routing [1411]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:11:20 2021 user.notice vpn-policy-routing [1411]: ERROR: Failed to set up 'wan/eth0/wan_ip [✓]' ERROR: Failed to set up 'wg_wan/wg_wan_ip' ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 192.168.10.80/28 -m comment --comment wg_client  ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 10.10.10.3/32 -m comment --comment remote_wg  ERROR: failed to set up any gateway!
Wed Aug 11 14:11:21 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB1
Wed Aug 11 14:11:21 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB2
Wed Aug 11 14:11:21 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN2G
Wed Aug 11 14:11:21 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN5G
Wed Aug 11 14:11:22 2021 daemon.info procd: - init complete -
Wed Aug 11 14:12:17 2021 daemon.err modprobe: xt_set is already loaded
Wed Aug 11 14:12:17 2021 daemon.err modprobe: ip_set is already loaded
Wed Aug 11 14:12:17 2021 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Aug 11 14:12:18 2021 user.notice vpn-policy-routing [2741]: Creating table 'wan/eth0/wan_ip' [✓]
Wed Aug 11 14:12:19 2021 user.notice vpn-policy-routing [2741]: Creating table 'wg_wan/wg_wan_ip' [✓]
Wed Aug 11 14:12:19 2021 user.notice vpn-policy-routing [2741]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:12:19 2021 user.notice vpn-policy-routing [2741]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:12:20 2021 user.notice vpn-policy-routing [2741]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:12:20 2021 user.notice vpn-policy-routing [2741]: service started with gateways: wan/eth0/wan_ip [✓] wg_wan/wg_wan_ip

Seems I can get away with no sleep in rc.local based on how late in the process it runs.

Wed Aug 11 14:23:43 2021 daemon.info procd: - init complete -
Wed Aug 11 14:23:44 2021 daemon.err modprobe: xt_set is already loaded
Wed Aug 11 14:23:44 2021 daemon.err modprobe: ip_set is already loaded
Wed Aug 11 14:23:44 2021 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Aug 11 14:23:45 2021 user.notice vpn-policy-routing [2667]: Creating table 'wan/eth0/wan_ip' [✓]
...

On further investigation, I have no idea, as no sleep also produced what looks to be a 5min jump in logs

Wed Aug 11 14:24:17 2021 user.notice vpn-policy-routing [1412]: Creating table 'wg_wan/wg_wan_ip' [✓]
Wed Aug 11 14:24:18 2021 user.notice vpn-policy-routing [1412]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:24:18 2021 user.notice vpn-policy-routing [1412]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:24:19 2021 user.notice vpn-policy-routing [1412]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:24:19 2021 user.notice vpn-policy-routing [1412]: service started with gateways: wan/eth0/wan_ip [✓] wg_wan/wg_wan_ip
Wed Aug 11 14:24:20 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB1
Wed Aug 11 14:24:20 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB2
Wed Aug 11 14:24:20 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN2G
Wed Aug 11 14:24:20 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN5G
Wed Aug 11 14:24:21 2021 daemon.info procd: - init complete -
Wed Aug 11 14:24:21 2021 daemon.err modprobe: xt_set is already loaded
Wed Aug 11 14:24:22 2021 daemon.err modprobe: ip_set is already loaded
Wed Aug 11 14:24:22 2021 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Aug 11 14:29:20 2021 user.notice vpn-policy-routing [2747]: Creating table 'wan/eth0/wan_ip' [✓]
Wed Aug 11 14:29:20 2021 user.notice vpn-policy-routing [2747]: Creating table 'wg_wan/wg_wan_ip' [✓]
Wed Aug 11 14:29:21 2021 user.notice vpn-policy-routing [2747]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:29:22 2021 user.notice vpn-policy-routing [2747]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:29:22 2021 user.notice vpn-policy-routing [2747]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:29:23 2021 user.notice vpn-policy-routing [2747]: service started with gateways: wan/eth0/wan_ip [✓] wg_wan/wg_wan_ip

Perhaps the system time updating upon wan / ntp update being stablished?
Looking at it some more, there are times when restarting a working config will fail on either one or both gateways.
Below both are with a sleep of 10, which does not seem to be getting respected based on log times.

Wed Aug 11 14:29:53 2021 user.notice vpn-policy-routing [1412]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:29:53 2021 user.notice vpn-policy-routing [1412]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:29:54 2021 user.notice vpn-policy-routing [1412]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:29:54 2021 user.notice vpn-policy-routing [1412]: service started with gateways: wan/eth0/wan_ip [✓] wg_wan/wg_wan_ip
Wed Aug 11 14:29:54 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB1
Wed Aug 11 14:29:55 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB2
Wed Aug 11 14:29:55 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN2G
Wed Aug 11 14:29:55 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN5G
Wed Aug 11 14:29:56 2021 daemon.info procd: - init complete -
Wed Aug 11 14:29:56 2021 daemon.err modprobe: xt_set is already loaded
Wed Aug 11 14:29:56 2021 daemon.err modprobe: ip_set is already loaded
Wed Aug 11 14:29:57 2021 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Aug 11 14:29:58 2021 user.notice vpn-policy-routing [2747]: Creating table 'wan/eth0/wan_ip' [✗]
Wed Aug 11 14:29:59 2021 user.notice vpn-policy-routing [2747]: Creating table 'wg_wan/wg_wan_ip' [✗]
Wed Aug 11 14:30:00 2021 user.notice vpn-policy-routing [2747]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:30:00 2021 user.notice vpn-policy-routing [2747]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:30:00 2021 user.notice vpn-policy-routing [2747]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:32:39 2021 user.notice vpn-policy-routing [2747]: ERROR: Failed to set up 'wan/eth0/wan_ip [✓]' ERROR: Failed to set up 'wg_wan/wg_wan_ip' ERROR: failed to set up any gateway!

and

Wed Aug 11 14:36:20 2021 user.notice vpn-policy-routing [1412]: Creating table 'wan/eth0/wan_ip' [✗]
Wed Aug 11 14:36:21 2021 user.notice vpn-policy-routing [1412]: Creating table 'wg_wan/wg_wan_ip' [✗]
Wed Aug 11 14:36:22 2021 user.notice vpn-policy-routing [1412]: Routing 'wg_client' via wg_wan [✗]
Wed Aug 11 14:36:22 2021 user.notice vpn-policy-routing [1412]: Routing 'remote_wg' via wg_wan [✗]
Wed Aug 11 14:36:23 2021 user.notice vpn-policy-routing [1412]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:36:23 2021 user.notice vpn-policy-routing [1412]: ERROR: Failed to set up 'wan/eth0/wan_ip [✓]' ERROR: Failed to set up 'wg_wan/wg_wan_ip' ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 192.168.10.80/28 -m comment --comment wg_client  ERROR: iptables -t mangle -A VPR_PREROUTING -g VPR_MARK0x020000  -s 10.10.10.3/32 -m comment --comment remote_wg  ERROR: failed to set up any gateway!
Wed Aug 11 14:36:24 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB1
Wed Aug 11 14:36:24 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led USB2
Wed Aug 11 14:36:24 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN2G
Wed Aug 11 14:36:24 2021 daemon.notice procd: /etc/rc.d/S96led: setting up led WLAN5G
Wed Aug 11 14:36:24 2021 daemon.info procd: - init complete -
Wed Aug 11 14:36:26 2021 daemon.err uhttpd[1315]: luci: accepted login on / for root from 192.168.10.20
Wed Aug 11 14:37:48 2021 daemon.err modprobe: xt_set is already loaded
Wed Aug 11 14:37:48 2021 daemon.err modprobe: ip_set is already loaded
Wed Aug 11 14:37:48 2021 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Aug 11 14:37:49 2021 user.notice vpn-policy-routing [2704]: Creating table 'wan/eth0/wan_ip' [✓]
Wed Aug 11 14:37:50 2021 user.notice vpn-policy-routing [2704]: Creating table 'wg_wan/wg_wan_ip' [✗]
Wed Aug 11 14:37:51 2021 user.notice vpn-policy-routing [2704]: Routing 'wg_client' via wg_wan [✓]
Wed Aug 11 14:37:52 2021 user.notice vpn-policy-routing [2704]: Routing 'remote_wg' via wg_wan [✓]
Wed Aug 11 14:37:52 2021 user.notice vpn-policy-routing [2704]: service monitoring interfaces: wan wg_wan [✓]
Wed Aug 11 14:37:53 2021 user.notice vpn-policy-routing [2704]: service started with gateways: wan/eth0/wan_ip [✓]
Wed Aug 11 14:37:53 2021 user.notice vpn-policy-routing [2704]: ERROR: Failed to set up 'wg_wan/wg_wan_ip'

But there are other times when a success reloads into a success. Seems it's still not 100% success rate on reload.

This would be the most likely reason. Furthermore wireguard needs accurate system time to work. Maybe this is the reason for the delays.

1 Like

I am also seeing major inconsistencies at boot, or with network restarts with vpn-policy-routing, more than 50% of the boots result in no connectivity. But some do work.
Same for a service network restart, some work, most do not.
An ifdown wg / ifup wg has a >70% chance of bringing it back.

This on an x86 (i3) platform with 19.07.8 and the wireguard / vpn-policy-routing modules.
Since the x86 has an RTC, it's not the time at boot. And as indicated, will happen well after boot and NTP sync if one issues a network restart.

From the logs, it seems that there might be a race condition, as hotplugs are restarting stuff constantly during a network restart, and some sequences work, others do not.

I'm positive it is NOT related to the wireguard interface and service, as I've tested that through all permutations of boot, network restart, loss of wan connectivity I could think off and it is reliable once wan connectivity is available. So this problem was introduced with policy routing.
My rules for PR are simple, send requests to gstatic.com and a (guest) subnet to the WAN.

So the question is, how to dubug this?
Better yet, how to guarantee this is reliable?

If you need a more reliable solution, there's PBR with netifd:
https://openwrt.org/docs/guide-user/network/routing/pbr
This method entirely eliminates race conditions related to netifd.

1 Like

I'd recommend you play with increasing the procd_reload_delay option.

4 Likes

Thanks @stangri , but as noted, I'm on 19.07.8, and looking at the /etc/init.d/vpn-policy-routing file, there is no support for that option in that release.
I can't upgrade to a later OS, so is there a version of the init script you recommend I pull from the repo?

You can try installing the latest from my repo, but after multiple reports of later VPR versions not working on 19.07 and being unable to debug the problems I've had to revert to an older version in the 19.07 packages. So that is to say it may not work at all.

PS. Or you can just patch the init file and insert the following in the service_triggers() function just below is_enabled || return 1 and above procd_open_validate:

PROCD_RELOAD_DELAY=$(( 1 * 1000 ))

1 above means 1 second, you can play with that number.

PPS. TBH I'm not sure if 19.07 PROCD supports that tho.

1 Like

WRONG! - my apologies, further testing showed that there were some DNS race conditions that were the likely culprits for lack of connectivity. These have been resolved by performing DNS lookups well away from interface up/downs and only using fixed IPs at critical points in time.

Keeping the wireguard tunnel interface in the correct state did require a /etc/hotplug.d/iface script to catch corner cases when it failed to come up on its own. I now have a fairly reliable tunnel.

Once that was in place, the vpn-policy routing loaded OK. Still saw a few too many loads during the ifup, so see next.

So I added this in the location indicated, and it does prevent excess invocations. So this works on 19.07

@stangri Thanks, for the assistance and for the cool package.

I'm back, as I now struggle with repeatability. Meaning that after a reboot, or a service network restart, the rules for domain-based policies are inconsistently applied.

on the system in question, all traffic defaults to a wireguard tunnel unless there is a specific policy re-routing it the wan (and bypass the tunnel).

There are two domain-based policies and one subnet policy (for the guest network).

The subnet policy works consistently, all the traffic from the subnet is routed via wan.
the domain-based policies are a hit and miss after a boot or network restart, sometimes they go through the tunnel, other times they go via the wan.
One can restart vpn-policy-routing and no change.

But occasionally, when traffic that should go to wan is going through the tunnel, all of the sudden it will start going out the wan like its supposed to. Correlating with the logs, it seems a dnsmasq restart happened (not clear why, either).
But a manual restart of dnsmasq is not a cure. one can do those over and over after a network restart and the domain traffic will go into the tunnel.

Stumped as to why this is going on, and more importantly, how to make this a reliable thing.

the output of iptables-save | grep -i -e mark is consistent, but I'm guessing the problem is the domain to IP mapping is getting nuked somehow, maybe that's why when dnsmasq bounces it (sometimes) works.

what's the best method for confirming the ipset being used for a domain?

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.