VPN Policy-Based Routing + Web UI -- Discussion

yazdan · November 10, 2020, 3:24pm

First of all thanks for all the effort on making this package happen.

I have a corner-case problem with the package, but I could not resolve it with my knowledge of OpenWrt and Linux and send you a PR on GitHub, so I'm seeking help here.

This is my situation. I have 3 WireGuard VPNs (2 to work and 1 to a datacenter) and 3 internet connections (TD-LTE, PPPoE over ADSL, and PPPoE over a P2P wifi provider). Sometimes the phone line is noisy and PPPoE over ADSL disconnects and reconnects 2 or 3 times. Let's assume it is 3 times. This triggers the reload operation 3 times. The last two PPPoE reconnections happen while the 1st reload is running. After the 1st reload completes, the 2nd and 3rd reloads run. I think one of the 2nd and 3rd reloads can be eliminated, because the 2nd reload starts after the 3rd reconnection has already happened. I think the following diagram can clarify what I'm trying to describe.

How can I eliminate such redundant reloads?
I was thinking about logging the event time and the reload start time somewhere and then comparing them to decide. My idea is to implement reload_service and save those values using procd_set_param, but the documentation around procd is somewhat sparse and I'm not sure whether it can be done this way.
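A minimal sketch of the timestamp idea described above, in plain shell. The state directory and the function names (record_event, record_reload_start, reload_needed) are hypothetical and not part of VPR or procd; it only shows the comparison, not how to hook it into reload_service.

```shell
#!/bin/sh
# Hypothetical state directory; this path is an assumption, not a VPR setting.
STATE_DIR="${STATE_DIR:-/tmp/vpr-reload-state}"

# Record the time of an interface event (seconds since the epoch).
record_event() {
	mkdir -p "$STATE_DIR"
	echo "${1:-$(date +%s)}" > "$STATE_DIR/last_event"
}

# Record the time at which a reload started.
record_reload_start() {
	mkdir -p "$STATE_DIR"
	echo "${1:-$(date +%s)}" > "$STATE_DIR/last_reload_start"
}

# Return 0 if a reload is still needed, i.e. the most recent reload
# started before the most recent event. Missing files count as "needed".
reload_needed() {
	rn_event=$(cat "$STATE_DIR/last_event" 2>/dev/null || echo 0)
	rn_start=$(cat "$STATE_DIR/last_reload_start" 2>/dev/null || echo -1)
	[ "$rn_start" -lt "$rn_event" ]
}
```

With events at seconds 1, 2 and 3 and a first reload that started at second 1, the queued 2nd reload sees start 1 < event 3 and runs; the 3rd then sees a start time later than the last event and can be skipped. Timestamps have one-second resolution, so an event in the same second as a reload start would be treated as already covered.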

If you can give me some clues about how it should be done, I would be happy to help.

Again Thanks for all your efforts

stangri · November 10, 2020, 4:56pm

Please try your rules as they were before I introduced the ignore option and then put ignore policies at the very bottom. Let me know if that works.

stangri · November 10, 2020, 4:57pm

If I'm remembering things right, if your internet works whether OpenVPN is up or down and you have a separate firewall zone for the OpenVPN connection (the latter may actually not be necessary), VPR should work. If you have OpenVPN in a killswitch mode, so that your internet doesn't work when OpenVPN is down, VPR won't work.

stangri · November 10, 2020, 5:04pm

I would try to fight it by reducing the reload time. There are some steps towards it already; for example, dnsmasq (whose restart is the most expensive reload operation) is only restarted when its ipset files change. If you can figure out a way to improve the speed of reload even further, please let me know.

I'd like to continue relying on PROCD for signals that VPR needs to be restarted and if PROCD sends down 3 reloads within seconds I'm going to assume there's a good reason for it. For example when the router is booting up, multiple interfaces can come online within second(s) of each other.

If there's a way to kill the previous reloads when the new one is received tho, that would be ideal, as VPR cleans everything up on start, so there shouldn't be any dangling settings. If you discover the way to use PROCD this way, please let me know, I'd be much obliged.
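One generic way to have a newer reload supersede an older one is a PID file. This is only a sketch under assumptions: the PID file path and the function name are hypothetical, it is not a procd feature, and it does not address whether procd's own locking would let a second reload get far enough to run it. PID reuse and races between the check and the kill are also ignored here.

```shell
#!/bin/sh
# Hypothetical PID file; this path is an assumption, not a VPR setting.
PIDFILE="${PIDFILE:-/tmp/vpr-reload.pid}"

# Terminate a previous reload that is still running, then register
# the current shell as the active reload.
supersede_previous_reload() {
	if [ -f "$PIDFILE" ]; then
		sp_old=$(cat "$PIDFILE")
		# kill -0 sends no signal; it only tests that the process exists.
		if [ -n "$sp_old" ] && kill -0 "$sp_old" 2>/dev/null; then
			kill "$sp_old" 2>/dev/null || true
		fi
	fi
	echo "$$" > "$PIDFILE"
}
```

Because VPR cleans everything up on start, as noted above, an interrupted reload followed by a full one should not leave dangling settings, but that would need testing on a real router.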

dydx · November 10, 2020, 6:02pm

Yes, that works.

It seems rules are inserted from bottom to top.
However, for MARK targets, rules higher on the list overwrite the ones lower on the list.

stangri · November 10, 2020, 6:20pm

Thank you for the quick test. Here's how things work -- policies that result in an iptables entry are processed in the order of the config file, and each policy/rule is added at the top of the chain (pushing the previously inserted entries down). When iptables rules are traversed, the whole chain is traversed, so the lowest entry in the chain gets processed last and overrides the previous result within the chain (the priority is reversed compared to the order of insertion, so the first policy in the config file wins).
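The ordering described above can be simulated in plain shell without touching iptables. The "match:mark" rule notation and both function names are made up for this sketch; it only models insert-at-top plus last-match-wins traversal.

```shell
#!/bin/sh
# Simulation only: no iptables calls are made.

# Build a chain by inserting each config entry at the top, in config order.
# Output is the chain from top to bottom.
build_chain() {
	bc_chain=""
	for bc_rule in "$@"; do
		bc_chain="$bc_rule${bc_chain:+ $bc_chain}"
	done
	echo "$bc_chain"
}

# Traverse the whole chain for one packet; every matching MARK rule
# overwrites the previous mark, so the lowest matching rule wins.
final_mark() {
	fm_packet="$1"; shift
	fm_mark="none"
	for fm_rule in "$@"; do
		[ "${fm_rule%%:*}" = "$fm_packet" ] && fm_mark="${fm_rule#*:}"
	done
	echo "$fm_mark"
}
```

For a config listing host_a:wan first and host_a:vpn second, build_chain yields "host_a:vpn host_a:wan", and final_mark for host_a ends on wan: the policy that comes first in the config file has the last word.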

Now IGNORE rules work differently in that they stop the rest of the iptables chain from being processed, so naturally they need to be at the top of the chain -- i.e. inserted last.
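A small shell simulation of why an IGNORE rule only helps when it sits at the top of the chain. The "match:target" notation and the function name are hypothetical; the chain is passed top to bottom and no iptables calls are made.

```shell
#!/bin/sh
# Simulation only. Arguments after the packet name are the chain entries,
# listed from the top of the chain to the bottom.
final_mark_with_ignore() {
	fi_packet="$1"; shift
	fi_mark="none"
	for fi_rule in "$@"; do
		# Skip rules that do not match this packet.
		[ "${fi_rule%%:*}" = "$fi_packet" ] || continue
		# A matching IGNORE rule stops processing of the rest of the chain.
		[ "${fi_rule#*:}" = "IGNORE" ] && break
		fi_mark="${fi_rule#*:}"
	done
	echo "$fi_mark"
}
```

With IGNORE at the top (final_mark_with_ignore a a:IGNORE a:vpn) the packet leaves the chain unmarked; with IGNORE at the bottom (final_mark_with_ignore a a:vpn a:IGNORE) the mark has already been set by the time IGNORE is reached.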

In the next version I'll make sure that the IGNORE rules are added to the top of the chain wherever they are in the config file.

Wolfie · November 10, 2020, 7:27pm

The issue I'm having is that when VPR is enabled, I lose internet access regardless of the rules. When I have a VPN server enabled (started with a PID), everywhere uses that (as is expected), but when stopped, it uses the standard connection. I take it that means there is no 'kill switch' enabled.

Separate firewall? As in, don't have it associated with the WAN firewall?

dydx · November 10, 2020, 8:25pm

That makes perfect sense. Thanks for clarifying; I'll wait for an updated version.

yazdan · November 10, 2020, 8:49pm

That's a good idea. In my case the most expensive step is creating the tables (see the logs below); manually restarting dnsmasq is not that time-consuming. I think if something can be done about the table creation phase, like reusing the currently available tables, it would be a huge improvement.

My router is Archer C5

Table creation times: 10s, 10s, 10s, 10s, 10s, 10s total of 60 seconds
Adding all routings: 5s
Restarting dnsmasq: 1s

Wed Nov 11 00:04:42 2020 user.notice root: VPR manual reload
Wed Nov 11 00:04:45 2020 daemon.err modprobe: xt_set is already loaded
Wed Nov 11 00:04:45 2020 daemon.err modprobe: ip_set is already loaded
Wed Nov 11 00:04:45 2020 daemon.err modprobe: ip_set_hash_ip is already loaded
Wed Nov 11 00:04:56 2020 user.notice vpn-policy-routing [22752]: Creating table 'wan/eth0.2/0.0.0.0' [✓]
Wed Nov 11 00:05:06 2020 user.notice vpn-policy-routing [22752]: Creating table 'tunel/10.19.x.x' [✓]
Wed Nov 11 00:05:16 2020 user.notice vpn-policy-routing [22752]: Creating table 'wan_adsl/pppoe-wan_adsl/46.249.x.x' [✓]
Wed Nov 11 00:05:26 2020 user.notice vpn-policy-routing [22752]: Creating table 'work_mn/10.9.x.x' [✓]
Wed Nov 11 00:05:36 2020 user.notice vpn-policy-routing [22752]: Creating table 'work_as/10.9.x.x' [✓]
Wed Nov 11 00:05:46 2020 user.notice vpn-policy-routing [22752]: Creating table 'wan_p2p/pppoe-wan_p2p/5.202.x.x' [✓]
Wed Nov 11 00:05:47 2020 user.notice vpn-policy-routing [22752]: Routing 'youtube' via tunel [✓]
Wed Nov 11 00:05:47 2020 user.notice vpn-policy-routing [22752]: Routing 'twitter' via tunel [✓]
Wed Nov 11 00:05:47 2020 user.notice vpn-policy-routing [22752]: Routing 'slack' via tunel [✓]
Wed Nov 11 00:05:47 2020 user.notice vpn-policy-routing [22752]: Routing 'torrent tracker' via tunel [✓]
Wed Nov 11 00:05:50 2020 user.notice vpn-policy-routing [22752]: Routing 'work internal' via arvan_mn [✓]
Wed Nov 11 00:05:50 2020 user.notice vpn-policy-routing [22752]: Routing 'mobinnet' via tunel [✓]
Wed Nov 11 00:05:50 2020 user.notice vpn-policy-routing [22752]: Routing 'recrutee' via tunel [✓]
Wed Nov 11 00:05:50 2020 user.notice vpn-policy-routing [22752]: Routing 'slite' via tunel [✓]
Wed Nov 11 00:05:51 2020 user.notice vpn-policy-routing [22752]: Routing 'yts' via tunel [✓]
Wed Nov 11 00:05:51 2020 user.notice vpn-policy-routing [22752]: Routing 'google' via tunel [✓]
Wed Nov 11 00:05:51 2020 user.notice vpn-policy-routing [22752]: Routing 'telegram' via tunel [✓]
Wed Nov 11 00:05:51 2020 user.notice vpn-policy-routing [22752]: Routing 'mibox' via wan_adsl [✓]
Wed Nov 11 00:05:52 2020 user.notice vpn-policy-routing [22752]: Routing 'downloaders' via wan_adsl [✓]
Wed Nov 11 00:05:52 2020 user.notice vpn-policy-routing [22752]: Routing 'work' via wan_adsl [✓]
Wed Nov 11 00:05:52 2020 user.notice vpn-policy-routing [22752]: Routing 'adslers' via wan_adsl [✓]
Wed Nov 11 00:05:52 2020 user.notice vpn-policy-routing [22752]: service started with gateways: wan/eth0.2/0.0.0.0 tunel/10.19.x.x wan_adsl/pppoe-wan_adsl/46.249.x.x work_mn/10.9.x.x work_as/10.9.x.x wan_p2p/pppoe-wan_p2p/5.202.x.x [✓]
Wed Nov 11 00:05:53 2020 user.notice vpn-policy-routing [22752]: service monitoring interfaces: wan tunel wan_adsl arvan_mn arvan_as wan_p2p .
Wed Nov 11 00:05:53 2020 user.notice root: dnsmasq manual reload
Wed Nov 11 00:05:55 2020 daemon.info dnsmasq[31026]: read /etc/hosts - 4 addresses
Wed Nov 11 00:05:55 2020 daemon.info dnsmasq[31026]: read /tmp/hosts/dhcp.cfg01411c - 13 addresses
Wed Nov 11 00:05:55 2020 daemon.info dnsmasq-dhcp[31026]: read /etc/ethers - 0 addresses

I agree with you on this; it is a good choice and handles all the situations.

I think PROCD somehow knows that the script is running and delays the startup of the second script until the first one ends. I will investigate whether it is possible to override some hooks in PROCD to kill the previous instance if it is running.

stangri · November 10, 2020, 9:22pm

Thanks for bringing it up. Unless you have a lot of WG server (as in WG running on the router and accepting remote connections from WAN) IPs, this is abnormally long.

See if things speed up if you comment this whole section: https://github.com/openwrt/packages/blob/master/net/vpn-policy-routing/files/vpn-policy-routing.init#L576-L581

I believe the README has an example of /etc/config/firewall. I haven't used OpenVPN for a while, but before I stopped using it, I put my tested/working config into the README.

Thanks for bringing this idea up and testing the updates!

stangri · November 10, 2020, 9:31pm

Hey, move your IGNORE policies back to the top and try this: https://dev.melmac.net/repo/vpn-policy-routing_0.3.0-0_all.ipk

Hopefully it will also speed up processing on systems with a lot of policies resulting in iptables rules.

yazdan · November 10, 2020, 10:18pm

All my VPN connections are WireGuard clients, not servers; that is 3 connections.

After commenting that section out, it takes around 1s for each interface.

Regarding this, PROCD calls procd_lock (it is defined in /lib/functions/procd.sh) before calling nearly everything. It uses flock to create a lock and waits on it.
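For comparison with the blocking behaviour described above, here is a sketch of a non-blocking flock check in plain shell. It is an illustration only, not procd's code: the lock file path and the function name are hypothetical, and it assumes a flock utility that accepts a file descriptor and the -n flag.

```shell
#!/bin/sh
# Hypothetical lock file; this is not the lock that procd itself uses.
LOCKFILE="${LOCKFILE:-/tmp/vpr-demo.lock}"

# Try to take the lock without waiting. If another process holds it,
# report "busy" instead of queueing behind it.
try_reload() {
	(
		flock -n 9 || { echo "busy"; exit 0; }
		echo "reloading"
		# The actual reload work would go here, while the lock is held.
	) 9>"$LOCKFILE"
}
```

The lock is released when the subshell exits and file descriptor 9 is closed, so a later call can acquire it again.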

I found out that there is a service_running hook in PROCD, but it returns true if I add something like this to VPR:

reload_service()
{
	if service_running; then
		output "sorry running \\n"
	else
		start
	fi
}

Wolfie · November 10, 2020, 10:18pm

You do, but you also mentioned about there being incompatible setups, thus why I was asking about a recommended guide, so I could follow those instructions and know that any issues wouldn't be due to the OpenVPN configuration.

Any way I can try to figure out why VPR is killing connections when the VPN is active?

cantenna · November 10, 2020, 11:11pm

I recently moved to a different VPN provider that doesn't offer a static DNS, so how do I prevent a DNS leak?

stangri · November 11, 2020, 12:18am

Is your firewall config identical to one in README?

dydx · November 11, 2020, 4:55am

Works nicely!
Thanks!

stangri · November 11, 2020, 5:34am

Outstanding job testing two different packages in rapid succession! I've merged the testing branch back to main branch and will move forward with this version which appends rather than inserts new policies on top and uses goto vs jumps in iptables.

Let me think about how best to update the Web UI, and in the meantime I'll try to update the README to reflect the changes.

Wolfie · November 11, 2020, 9:30am

Removed four packages (vpr/luci-vpr, openvpn/luci-ovpn), backed up the config files, removed from router, removed vpn device(s) from router, then reinstalled.

What I had wanted to do is to have VPN enabled but only for specific clients or client ranges. So did the "pull-filter" ignore option mentioned in your guide, turned on the VPN, added an entry to only affect one client and BOOM, it worked.

Now I just need to figure out how to have multiple VPN connections that only use a selected interface so that I can do routing based on which VPN I want them to have. So, I'm researching how to properly have multiple VPNs running at one time.

Thanks for the app, it's wonderful.

Nove11 · November 11, 2020, 8:42pm

Hello,

I have a multi-VPN client setup working, but the issue is that the policies are not being respected.
Here is the example: I have three VPN clients with three tunnel interfaces over three wireless APs (and so three subnets). This setup is working fine except for the routing with respect to subnets.

In the screenshot below, yellow and red indicate two reboots. For example, subnet 10.0.2.1/24 sometimes uses tunnel interface TUN_UK, sometimes TUN_IN or even TUN_US. This only happens across reboots; otherwise VPN and routing are stable.

But I want the policies to be respected so that the subnet 10.0.2.1/24 only goes through tunnel TUN_US (in the example above).

Is this expected behavior or have I misconfigured something? I have read this but I am not sure if I understood it correctly. I have been using OpenWrt for only two days, so a bit of guidance would be very helpful. Thanks a lot for the PBR package; it's been quite useful.

stangri · November 12, 2020, 5:29am

Did you read this?