Implementing policy routing in (or compatible with) mwan3

I hate to beat the PBR vs mwan3 dead horse, but I'm at my wit's end. I've run mwan3 since roughly Chaos Calmer, primarily for automatic failover to a backup ISP. I was also using it to implement policy routing, but that stopped working around 18.06 or 19.07 (in hindsight I probably just needed iptables -t mangle -I PREROUTING 1 -m comment --comment "Do not inherit the mark of encrypted packets" -j MARK --set-xmark 0x0/0x3f00, but I didn't know that trick at the time). So I installed vpn-policy-routing, which was fine until 22.03.3, when that package went away.

My most concise question, before we dive into my config, is: given a LAN br-ch and an OpenVPN WAN tun-ch, what is the shortest config to:

  • route all traffic from br-ch to tun-ch
  • but still be able to SSH to a host on br-ch from my primary LAN
  • while having mwan3 still manage failover of my primary WAN interface

That's really all I want to do: I have a secondary LAN, br-ch, tied to a VLAN, an alternate SSID, and an LXC instance. I want to send all of its traffic to a VPN, isolated from everything else, except that I need to be able to log in to a host on it from my primary LAN. I don't care whether I use pbr, do it all in mwan3, or something else.

Fwiw I've been through:

  • PBR not working with mwan3: fine, I accept that pbr and mwan3 may be incompatible. I'm okay with that, and years ago I was able to get mwan3 itself to implement my policy routing, but it just doesn't work in this layout for reasons I cannot determine (and yes, my WAN interfaces all have different metrics).
  • Configure OpenVPN only on some LAN ports: intriguing, but it uses vpn-policy-routing, which is gone and had worked for me previously. Since my problem is with pbr, I'm not sure this helps.
  • Source-IP routing rule to vpn tunnel: seems the easiest and most concise solution for my goal, but it also doesn't work for me. The config is so small and intuitive I'm somewhat dumbfounded it doesn't work, but it doesn't.

At this point I've tried everything, including things that contradict other things (I've literally put 60-80 hours, on and off over the last 7-8 years, into trying to route one network to a VPN). The most progress I've had recently is a pbr config that works while mwan3 is off and stops working as soon as mwan3 is on (and yes, I have pbr set to "Insert", which the README says makes it compatible w/ mwan3).

Again, though, I'd be happy with any config. For example, the Source-IP routing rule to vpn tunnel thread (now closed) is the most promising, but it doesn't end with a concise, final solution. Could $someone (maybe @trendy or @phoedos) give me the final outcome there?

To that solution, I have (on 22.03.4):

config route
	option target '0.0.0.0'
	option netmask '0.0.0.0'
	option metric '100'
	option table 'all_ch'
	option interface 'VPN_CH'

config rule
	option src '192.168.5.32/28'
	option lookup 'all_ch'

That worked for me briefly, but then I couldn't connect from my primary LAN to a host on the secondary LAN br-ch (presumably no route). After a round of testing and debugging I reverted and rebooted, and the above stopped working (perhaps a race with DHCP, since the router needs to hand out an address and the host on br-ch didn't get one the second time).
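One likely reason the primary LAN could no longer reach br-ch: replies from a br-ch host also match the src rule, so they are looked up in all_ch, which here holds only the default route via VPN_CH and no route back to the primary LAN. A common iproute2 workaround is to consult main first while ignoring its default route (suppress_prefixlength 0), so connected subnets still win and only Internet-bound traffic falls through to all_ch. This is a sketch that assumes your netifd build supports the suppress_prefixlength rule option; the priority values are arbitrary examples, and mwan3's own fwmark rules (which use lower priority numbers) are still consulted before either of these.

```
# /etc/config/network (sketch; priority values are arbitrary examples)
# Rules are evaluated in ascending priority order.

# Consult main first but ignore routes with prefix length 0 (the default
# route), so connected subnets such as the primary LAN still match and
# Internet-bound traffic falls through to the next rule.
config rule
	option lookup 'main'
	option suppress_prefixlength '0'
	option priority '29999'

# Original source rule, now with an explicit priority.
config rule
	option src '192.168.5.32/28'
	option lookup 'all_ch'
	option priority '30000'
```

After restarting the network, ip -4 rule should list both rules and ip -4 route show table all_ch should show the VPN default route.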

And back to a possible mwan3-only solution, my last attempt on that config was:

config interface 'vpn_ch'
	option initial_state 'online'
	option family 'ipv4'
	option track_method 'ping'
	option reliability '1'
	option count '1'
	option size '56'
	option max_ttl '60'
	option timeout '4'
	option interval '10'
	option failure_interval '5'
	option recovery_interval '5'
	option down '5'
	option up '5'
	option enabled '1'

config member 'ch_m1_w1'
	option metric '1'
	option weight '1'
	option interface 'vpn_ch'

config policy 'ch_policy'
	list use_member 'ch_m1_w1'
	option last_resort 'unreachable'

config rule 'ch_rule'
	option family 'ipv4'
	option proto 'all'
	option src_ip '192.168.5.32/28'
	option sticky '0'
	option use_policy 'ch_policy'
	option dest_ip '0.0.0.0/0'

Which also failed to route 192.168.5.32/28 to WAN vpn_ch.
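Two things in this mwan3 config may be worth checking. mwan3 matches each config interface section by name against /etc/config/network, and UCI names are case-sensitive, so if the logical interface really is VPN_CH (as in the static route above), a section named vpn_ch will not match it. Also, without any track_ip entries mwan3 does not track the interface, and mwan3 interfaces reports "tracking is not enabled". A sketch with both adjusted; the probe addresses are placeholders, not recommendations:

```
# /etc/config/mwan3 (sketch; track_ip targets are placeholders)
# The section name must equal the interface name in /etc/config/network.
config interface 'VPN_CH'
	option enabled '1'
	option family 'ipv4'
	option track_method 'ping'
	list track_ip '1.1.1.1'
	list track_ip '9.9.9.9'
	option reliability '1'

config member 'ch_m1_w1'
	option interface 'VPN_CH'
	option metric '1'
	option weight '1'
```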

Aside from 22.03.4, my related packages are:

# opkg list-installed | egrep 'iptables|pbr|mwan3|policy'
iptables-mod-conntrack-extra - 1.8.7-7
iptables-mod-ipopt - 1.8.7-7
iptables-nft - 1.8.7-7
luci-app-mwan3 - git-23.093.40772-fa4dc75
luci-app-pbr - 1.1.0-1
mwan3 - 2.11.4-1
pbr - 1.0.1-16

Thanks!

Make sure all_ch is defined in /etc/iproute2/rt_tables; otherwise use a table number instead.
As for mwan3, it may not work if IPv6 is configured but not working.
Other than that, the mwan3 diagnostics and the output of ip -4 addr; ip -4 ro list table all; ip -4 ru can help pinpoint where the problem is.


Yes, it is:

# fgrep -w all_ch /etc/iproute2/rt_tables
101 all_ch

:man_facepalming: That could do it:

# mwan3 restart
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them
Error: argument "1:" is wrong: preference value is invalid

I assume the last line is a v6 localhost error? Because my primary ISP doesn't support v6, while my backup ISP has v6 (plus other weird problems with v6 over the years), I've systematically ripped it out of every config possible, down to:

# cat /etc/rc.local 
sysctl -w net.ipv6.conf.all.disable_ipv6=1
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
sysctl -w net.ipv6.conf.default.disable_ipv6=1

Though I still find the occasional interface that somehow gets a link-local address. So now the question is: what's the minimal v6 config to keep mwan3 happy without my LAN devices getting v6 addresses and routing all their traffic out the backup ISP?
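If the goal is only to stop LAN clients from picking up IPv6 addresses, one option is to disable odhcpd's IPv6 services on the LAN while leaving DHCPv4 alone. This fragment is a sketch that assumes the default lan section name in /etc/config/dhcp; only the options shown would change, and the same options can be applied to other LAN sections. Clients will still have link-local addresses.

```
# /etc/config/dhcp (sketch): keep DHCPv4, disable IPv6 services on the LAN
config dhcp 'lan'
	option interface 'lan'
	option ra 'disabled'
	option dhcpv6 'disabled'
	option ndp 'disabled'
```

Apply it with /etc/init.d/odhcpd restart.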

I'll put that on the TODO list. Right now the static route experiment is still running and I'm afraid my wife will divorce me if I reboot the router one more time. :neutral_face:

What I meant was to remove all IPv6-related configuration from mwan3, like the wan6 interface, ::/0 policy, etc.

Running these commands won't affect the router.

OIC. Yeah, I never had any v6 configuration in mwan3. Also I'm thinking maybe the error about "1:" is about lo (below) as it's prefixed with "1:"?

Understood. I realize almost nothing in OpenWrt requires a reboot (most changes apply shockingly well), but with the mix of mwan3, pbr, static routes, etc., I've found it best to reboot between major changes: I've seen that not everything gets turned off (in particular, disabling pbr still leaves nft/iptables entries behind), and I'm trying to avoid posting red herrings.

I think this is everything relevant (and BTW, I moved br-ch from 192.168.5.32/28 to 192.168.5.16/28 since the OP; I've actually got 2 VPNs and was trying different configs in parallel to speed up debugging):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc htb state UP group default qlen 1024
    inet 216.139.wx.yz/30 brd 216.139.wx.yz scope global eth2
       valid_lft forever preferred_lft forever
34: br-ch: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.5.17/28 brd 192.168.5.31 scope global br-ch
       valid_lft forever preferred_lft forever
37: br-pvlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.10.1/24 brd 192.168.10.255 scope global br-pvlan
       valid_lft forever preferred_lft forever
65: tun-ch: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc cake state UNKNOWN group default qlen 500
    inet 10.8.2.2/24 scope global tun-ch
       valid_lft forever preferred_lft forever
default via 216.139.wx.yz dev eth2 table 1 proto static src 216.139.wx.yz 
10.8.2.0/24 dev tun-ch table 1 proto kernel scope link src 10.8.2.2 
192.168.5.16/28 dev br-ch table 1 proto kernel scope link src 192.168.5.17 
192.168.10.0/24 dev br-pvlan table 1 proto kernel scope link src 192.168.10.1 
216.139.wx.yz/30 dev eth2 table 1 proto kernel scope link src 216.139.wx.yz 
default via 216.139.wx.yz dev eth2 proto static src 216.139.wx.yz 
10.8.2.0/24 dev tun-ch proto kernel scope link src 10.8.2.2 
192.168.5.16/28 dev br-ch proto kernel scope link src 192.168.5.17 
192.168.10.0/24 dev br-pvlan proto kernel scope link src 192.168.10.1 
216.139.wx.yz/30 dev eth2 proto kernel scope link src 216.139.wx.yz 
broadcast 10.8.2.0 dev tun-ch table local proto kernel scope link src 10.8.2.2 
local 10.8.2.2 dev tun-ch table local proto kernel scope host src 10.8.2.2 
broadcast 10.8.2.255 dev tun-ch table local proto kernel scope link src 10.8.2.2 
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
broadcast 192.168.5.16 dev br-ch table local proto kernel scope link src 192.168.5.17 
local 192.168.5.17 dev br-ch table local proto kernel scope host src 192.168.5.17 
broadcast 192.168.5.31 dev br-ch table local proto kernel scope link src 192.168.5.17 
broadcast 192.168.10.0 dev br-pvlan table local proto kernel scope link src 192.168.10.1 
local 192.168.10.1 dev br-pvlan table local proto kernel scope host src 192.168.10.1 
broadcast 192.168.10.255 dev br-pvlan table local proto kernel scope link src 192.168.10.1 
broadcast 216.139.wx.yz dev eth2 table local proto kernel scope link src 216.139.wx.yz 
local 216.139.wx.yz dev eth2 table local proto kernel scope host src 216.139.wx.yz 
broadcast 216.139.wx.yz dev eth2 table local proto kernel scope link src 216.139.wx.yz 
0:	from all lookup local
1001:	from all iif eth2 lookup 1
2001:	from all fwmark 0x100/0x3f00 lookup 1
2061:	from all fwmark 0x3d00/0x3f00 blackhole
2062:	from all fwmark 0x3e00/0x3f00 unreachable
3001:	from all fwmark 0x100/0x3f00 unreachable
32766:	from all lookup main
32767:	from all lookup default

I see neither table 101/all_ch in the routing tables nor any rule classifying 192.168.5.16/28. Did you restart the network service after you added the rule and route?


@trendy sorry, I misunderstood: in the above output I'm trying to implement the policy routing with mwan3 alone. E.g.,

# mwan3 interfaces
Interface status:
 interface wan is online 14h:41m:54s, uptime 21h:20m:48s and tracking is active
 interface wan_teather is offline and tracking is not enabled
 interface VPN_CH is online 00h:00m:00s, uptime 14h:49m:12s and tracking is not enabled
# mwan3 rules
Active ipv4 user rules:
16059 1085K - wan_only  udp  --  *      *       0.0.0.0/0            216.139.32.0/24      multiport dports 53 
 101K   19M - default_policy  all  --  *      *       0.0.0.0/0            0.0.0.0/0            
    0     0 - ch_policy  all  --  *      *       192.168.5.16/28      0.0.0.0/0            

If you want me to go back to the previous attempt using static routes, I'll try that later today.

Also, BTW, though no one asked: my main debugging method is to attach a computer to the VLAN tied to this network and have it ping out to the Internet, while also running tcpdump on the VPN tun interface, just to catch the case where packets are forwarded out but replies don't come back. In every failure case across my myriad attempts, the packets never go out; it fails at the source interface (IIRC with "Port unreachable" or similar).

The README (https://docs.openwrt.melmac.net/pbr/#rule_create_option) clearly states that this works on 21.02 and earlier, which it does, or at least has been reported to.

So if you want pbr, install 21.02 with mwan3 and pbr-iptables, change the abovementioned option, and it should work. Until mwan3 is rewritten to support nft, there's no way to make (nft-based) pbr work with it. pbr in iptables mode, or pbr-iptables, should theoretically work with mwan3 on 22.03 (especially if you follow the advice from @bluewavenet here), but I haven't heard of anyone using it this way.

If you have just a few simple rules, you may be better off manually inserting them with iptables commands into the chains mwan3 uses or other chains which mwan3 respects on 22.03.


I was on 21.02 with this configuration and at least vpn-policy-routing worked fine, as it did on 22.03.0-2 until vpn-policy-routing went away. It's just that, in my experience, once 22.03 has been out a while it's best not to put off the inevitable upgrade. I could stay on 22.03.2 w/ vpn-policy-routing, since I have a custom image with it built in, but as soon as there's a security vulnerability I'm stuck, and then it's an emergency to sort all this out.

I just tried pbr-iptables, and it's the same as before: it works until mwan3 is started. Also, luci-app-pbr won't install because it depends on pbr rather than pbr-iptables. And I do have iptables-nft installed, not iptables-zz-legacy.

So I'm open to this as well: any suggestions as to where to put the rule? Remember, I just want to route one subnet (192.168.5.16/28) to one interface (VPN_CH in UCI, or tun-ch by device), with no bells and whistles, except that it needs to be reachable from my primary LAN (which is where giving it its own routing table went south).

Or how about something to replace mwan3 as I'm only using a small subset of its features. Any ideas for WAN failover?

Since your use case is relatively simple, you can use MWAN with netifd:
https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan_netifd

This method relies on the built-in features of netifd and should work no matter the OpenWrt version, so you don't need to install extra packages or deal with legacy dependencies such as iptables.


Thanks. Fwiw I am trying the various suggestions in this thread; it's just that most days I only get 15-30 minutes to mess around with the router. Also, I may have misled you about the simplicity of my config: I'm just trying to leave out superfluous details, like the fact that I have multiple VPNs, because I assume that once I get one working I can replicate it for the others, as the config is almost identical. Likewise, I don't have just one LAN: I have a Guest LAN, an IoT LAN, etc. In most cases this shouldn't matter (and I don't want my IoT LAN to be able to connect to my VPN LAN/DMZ anyway), but if I have to lay out a myriad of static routes myself and fail them over when the primary WAN goes down, it's going to get tedious and error-prone.

Anyhow, this morning I tried a subset of these suggestions (just the routing part, not the mwan failover substitute), and I must have messed up translating the example names to my actual network/interface names, because I locked myself out of my own LAN; the quickest way to revert the change was a serial cable.

At this point I'm starting to question the life choices that led me down this path, but seriously, in 2023 I can't be the only person who wants automatic failover to a backup ISP and wants to put a VPN on an SSID or VLAN?

That should be handled by your firewall configuration, which is separate from routing.
There's usually no need to duplicate firewall restrictions with prohibitive routing.

Do you even need static routes?
Perhaps you are confusing static routes with routing rules?

The above implementation just adds 3 routing rules and requires no static routes.
Scaling needs only minor modification unless you want to add more custom rules.

The MWAN script should switch you to a working WAN.
You can also manually switch WANs from the web interface.

Make the configuration as intuitive and plain as possible.
Try using short simple parameter values and avoid capitals.
If the issue occurs, collect and analyze the relevant diagnostics.

ACK. I believe my firewall rules are good (I've tested them a few times and they're largely unchanged over the last 4-5 years).

Yeah, I guess I don't understand the term "routing rules". I'll put that on my OpenWrt reading list. Meanwhile, back to your example config, you proposed:

uci set network.lan_vpn="rule"
uci set network.lan_vpn.in="lan"
uci set network.lan_vpn.src="192.168.5.16/28"
uci set network.lan_vpn.lookup="vpn"
uci set network.lan_vpn.priority="40000"

Shouldn't that be network.lan_vpn.in="dmz"? Because I'm assuming you are calling dmz the LAN (SSID/VLAN) that I tie to the VPN, and the goal is to route all its traffic out the VPN.

Speaking of which: annoyingly, at some point (22.03?) LuCI started forcing interface names into ALL CAPS, so a VPN I configured a couple of years ago is vpn_foo and one added more recently is VPN_BAR. Once my configs are stable again, I'm going to go back and make them uniform. So you recommend lowercase interface names? I was going to go with uppercase, as that seems to be the new custom.

The rule #1 is for a specific subnet as part of your main LAN.
The rule #2 is for all clients using a separate DMZ interface.
This is an example that you can customize to your needs.

LuCI tries to look pretty but makes it more confusing.
The init scripts generate default configs in lowercase.

Yes, it is best to keep the configs lowercase.


So thanks to everyone (especially @stangri and @vgaetera) who provided suggestions. Fwiw I retried tweaks on a couple of the later suggestions, to no avail. I'd really like to understand modern routing with multiple routing tables etc. better, but it's a PitA to do it on your primary router: aside from the family complaining, it's really hard to reach the OpenWrt forum when the Internet is unreachable.

At some point I'll have to set up a testbed within my network with a spare router so I can test at my leisure, but since 21.02 is still supported and gets security updates, I just built a custom 21.02.7 image with my favorite packages. I was reluctant to do this for multiple reasons, including config compatibility and drift (my last backup from before 22.03 was too old to be valid, and I was afraid to use 22.03 configs on 21.02). Nevertheless, the configs worked just fine; swapping pbr back to vpn-policy-routing was the only change that required a minor update.

So you ended up switching to vpn-policy-routing from pbr on 21.02?

Yes. I never had any issues with vpn-policy-routing even on 22.03 until they deleted it from the repository between 22.03.2 and 22.03.3.

Quick update: I bought a test / backup router (Linksys EA6350 v4), flashed it with 23.05.2, and set about trying various configs in a non-time-pressure situation. My conclusions:

  • mwan3 still doesn't support the PBR-like routing I used years ago (circa OpenWrt 18.x)
  • pbr+mwan3 is still incompatible for me even with the pbr-iptables hacks documented elsewhere

However, installing pbr_extras plus mwan_netifd appears to work with only minor additional configuration. The above pages aren't very well commented, but if you follow them step by step, they do work. The only additional config necessary in /etc/config/network is, for each of my three VPN/LAN pairs named xx:

config rule
	option priority '30000'
	option in 'lan_xx'
	option lookup 'vpn_xx'

And that's it!
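Since the same rule repeats for every VPN/LAN pair, a tiny POSIX sh helper can print the stanzas and keep the names consistent as more pairs are added. This is a sketch: the lan_xx/vpn_xx naming follows the config above, emit_vpn_rules is a hypothetical helper, and only ch comes from this thread (foo and bar are placeholders).

```sh
#!/bin/sh
# Hypothetical helper: print one netifd "config rule" stanza per pair
# suffix, using the lan_<xx> / vpn_<xx> naming from the config above.
emit_vpn_rules() {
	prio=30000  # same priority for every pair, as in the working config
	for xx in "$@"; do
		printf 'config rule\n'
		printf "\toption priority '%s'\n" "$prio"
		printf "\toption in 'lan_%s'\n" "$xx"
		printf "\toption lookup 'vpn_%s'\n" "$xx"
		printf '\n'  # blank line between stanzas
	done
}

# 'ch' is from the thread; 'foo' and 'bar' are placeholder pair names.
emit_vpn_rules ch foo bar
```

Review the output before pasting it into /etc/config/network, then apply it with /etc/init.d/network restart.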


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.