Policy-Based-Routing (pbr) package discussion

image
I have used the line above in the PBR policies to force the Cloudflare DNS servers over the OpenVPN connection. But for some weeks now (I don't remember when I last checked) it doesn't seem to do anything. Did something change in the latest build, 1.1.8-r20?

To check, ssh into the router and do:

traceroute 1.0.0.1

This assumes nordvpntun is up and working

Edit: can you show the exact pbr rule:
cat /etc/config/pbr

The 1.0.0.1 should be the destination, and it is possible that this does not work in the latest build.

OK, I will just use another https-dns-proxy server for specific websites, since I already use that package anyway.

I think the latest updates to PBR have broken the validation of IP addresses / domains.
The IP address 1.0.0.1 is validated as a domain, so an nftset is created for it, which does not work.
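To illustrate the distinction the validation has to make, here is a minimal sketch of telling a dotted-quad IPv4 address apart from a domain name. This is not pbr's actual validation code; the function names `is_ipv4` and `classify` are made up for this example.

```sh
# Illustrative only: NOT pbr's real validation logic.
# is_ipv4 succeeds only for four dot-separated decimal octets in 0..255.
is_ipv4() {
    addr=$1
    case $addr in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;  # digits and dots only, no empty octets
    esac
    old_ifs=$IFS
    IFS=.
    set -- $addr                              # split the address on dots
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# classify prints "ip" for an IPv4 address and "domain" for anything else.
classify() {
    if is_ipv4 "$1"; then echo ip; else echo domain; fi
}
```

With a check like this, `classify 1.0.0.1` would take the address path, while `classify edu.it` would take the domain (nftset) path.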

I have already notified stangri.

You might try my test build which should mitigate this, see:


Build 1.1.8-r26 is out for testing


Hi, the last pbr version that works with my config is 1.1.8-r4.
Newer versions work only after a restart.

So, at the first start I get:

Failed to resolve 'edu.it'!
Failed to resolve 'aiv-cdn.net'!
Failed to resolve 'pv-cdn.net'!
Failed to resolve 'aiv-delivery.net'!
Failed to resolve 'amazon.dev'!
Failed to resolve 'ssl-images-amazon.com'!
Failed to resolve 'media-amazon.com'!
Failed to resolve 'a2z.com'!
Failed to resolve 'imrworldwide.com'!
Failed to resolve 'akamaized.net'!
Failed to resolve 'akamaihd.net'!
Failed to resolve 'akamaiedge.net'!
Failed to resolve 'akamai.net'!
Failed to resolve 'disco-api.com'!
Failed to resolve 'cloudfront.net'!
Failed to resolve 'ttvnw.net'!
Failed to resolve 'nflxso.net'!

.....

Please try 1.1.8-r28 from my repo and set the dnsmasq_debug option to 1 in the config. If this happens again upon restart, please post the contents of /var/run/pbr.debug.

Thanks @pesa1234 for reporting

Just so I understand: I assume the problem arises after a reboot and is resolved by running service pbr restart?

If so, besides the output of /var/run/pbr.debug, can you post the output of nft list ruleset:

cat /var/run/pbr.debug
nft list ruleset

Can you also show the same output after a service pbr restart:

cat /var/run/pbr.debug
nft list ruleset

Do you have any packages installed which influence dnsmasq or the firewall, e.g. HTTPS-DNS-proxy, AdGuard, SmartDNS, or an Adblock package?

The problem seems to be that at boot some packages interfere with PBR. In earlier builds that could be mitigated by using a boot delay (procd_boot_delay), but that option has been removed and replaced by a check for available resources. It looks like this check is not foolproof.

I can trigger this same behaviour while using Adblock

Stangri is aware of this and has added some debug options.

I might have a test version later to test some ideas :)

Hi, here:

Thanks a lot for checking

I have adblock

I think that a startup delay is necessary in some cases

Probably

The problem does indeed look like a race condition at boot.

The easy workaround for now is probably to add the following to /etc/rc.local (the local startup script):

sleep 60 && service pbr restart
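A fixed sleep has to be tuned per router. As a rough alternative, the restart can wait until some readiness probe succeeds. The sketch below is only an illustration: `wait_for` is a made-up helper, and the probe in the usage comment (resolving openwrt.org through the local dnsmasq) is an assumption, not something pbr requires.

```sh
# Hypothetical helper: run a probe command up to <tries> times,
# sleeping <delay> seconds between attempts; succeed as soon as the probe does.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible use in /etc/rc.local (the probe is an assumption; pick one that fits your setup):
# wait_for 30 2 nslookup openwrt.org 127.0.0.1 && service pbr restart
```

The upper bound (30 tries, 2 seconds apart) keeps the boot from hanging forever if the probe never succeeds.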

In my case the problem seems to be solved. Can you or @stangri please check this:

Just the rework of the triggers solves the issue in my case. Thanks, guys.

Thanks @pesa1234, we are still investigating the root cause.

Unfortunately your solution did not work for me, but the fix might indeed be in the area you are looking at, as the problem seems to occur very early in the boot process.

I am pursuing a different angle, but I am also not sure whether that will work.

I have uploaded my test version: pbr-1.1.8-r28-egc-5.bash, see: https://github.com/egc112/OpenWRT-egc-add-on/tree/main/pbr

If you decide to test it, make sure you make a backup first.

To be continued


Hi, 1.1.8-r29 seems OK on my side...

Thanks for testing. Unfortunately it did not work for me :(

If you have the time can you test my version:

It is easy to switch back

Hi, I've tested your version. Sorry, the problem is still present on my side.
I confirm that r29 is ok!


Hello,

I'm having an issue with routing a specific subnet through my WireGuard VPN tunnel. My host is an OpenWrt VM in Proxmox without a WAN interface, connected to my LAN and two other VLANs. The default gateway is my physical OpenWrt device on the lan interface.

I have tried setting procd_lan_device and procd_wan_interface to every combination I can think of, but none of them make the PBR service start correctly. In all cases, I get the following errors:

...
Setting up routing for 'wg0/10.2.0.2' [✗]
...
ERROR:
ip -4 route add default via 10.50.10.1 dev eth0 proto static table 256
ERROR: Failed to set up 'wg0/10.2.0.2'!
ERROR: Failed to set up any gateway!

My goal is to simply have traffic from my VMs and LXCs connected to my pvt VLAN routed through my VPN tunnel, except for traffic bound to private networks (which I will firewall accordingly).

This issue occurs even if no policies are enabled. See full config and logs below.


ubus call system board
{
        "kernel": "6.6.86",
        "hostname": "pvt-openwrt",
        "system": "AMD Ryzen 5 5600 6-Core Processor",
        "model": "QEMU Standard PC (Q35 + ICH9, 2009)",
        "board_name": "qemu-standard-pc-q35-ich9-2009",
        "rootfs_type": "ext4",
        "release": {
                "distribution": "OpenWrt",
                "version": "24.10.1",
                "revision": "r28597-0425664679",
                "target": "x86/64",
                "description": "OpenWrt 24.10.1 r28597-0425664679",
                "builddate": "1744562312"
        }
}
uci export dhcp
package dhcp

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '0'
        option local '/lan/'
        option domain 'home.arpa'
        option expandhosts '1'
        option cachesize '1000'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option localservice '1'
        option ednspacket_max '1232'
        list server '10.50.10.1'
        option dhcpleasemax '500'
        option dnsforwardmax '1000'
        option logqueries '1'

config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option leasetime '12h'
        option dhcpv4 'server'
        option ignore '1'

config dhcp 'wan'
        option interface 'wan'
        option ignore '1'

config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

config dhcp 'pvt'
        option interface 'pvt'
        option start '100'
        option limit '150'
        option leasetime '1h'
        option force '1'

config dhcp 'services'
        option interface 'services'
        option start '100'
        option limit '150'
        option leasetime '1h'
        option force '1'

config host
        list mac 'BC:24:11:DF:14:DF'
        option ip '10.0.50.3'
        option leasetime '12h'
        option dns '1'
uci export network
package network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fd9e:c330:2035::/48'
        option packet_steering '1'

config interface 'lan'
        option device 'eth0'
        option proto 'static'
        option ipaddr '10.50.60.1'
        option netmask '255.255.0.0'
        option gateway '10.50.10.1'
        list dns '10.50.10.1'
        option delegate '0'

config device
        option name 'eth0'
        option acceptlocal '1'
        option ipv6 '0'

config interface 'wg0'
        option proto 'wireguard'
        option private_key '<redacted>'
        list addresses '10.2.0.2/32'
        list dns '10.2.0.1'
        option delegate '0'

config wireguard_wg0
        option public_key '<redacted>'
        option endpoint_host '144.48.38.178'
        option endpoint_port '51820'
        option persistent_keepalive '25'
        option description 'Proton AU#297'
        list allowed_ips '0.0.0.0/0'

config interface 'pvt'
        option proto 'static'
        option device 'eth0.60'
        list ipaddr '10.0.60.2/24'
        list dns '10.2.0.1'

config interface 'services'
        option proto 'static'
        option device 'eth0.50'
        list ipaddr '10.0.50.2/24'
uci export pbr
package pbr

config pbr 'config'
        option enabled '1'
        option verbosity '2'
        option strict_enforcement '1'
        option resolver_set 'none'
        list resolver_instance '*'
        option ipv6_enabled '0'
        option boot_timeout '20'
        option rule_create_option 'add'
        option procd_reload_delay '5'
        option webui_show_ignore_target '1'
        option nft_rule_counter '1'
        option nft_set_auto_merge '1'
        option nft_set_counter '1'
        option nft_set_flags_interval '1'
        option nft_set_flags_timeout '0'
        option nft_set_policy 'performance'
        list webui_supported_protocol 'all'
        list webui_supported_protocol 'tcp'
        list webui_supported_protocol 'udp'
        list webui_supported_protocol 'tcp udp'
        list webui_supported_protocol 'icmp'
        list procd_lan_device 'eth0'
        list procd_lan_device 'eth0.50'
        list procd_lan_device 'eth0.60'
        option procd_wan_interface 'lan'

config policy
        option name 'Ignore to Local'
        option interface 'ignore'
        option enabled '0'
        option dest_addr '10.0.0.0/8 192.168.0.0/16 172.16.0.0/12'

config policy
        option name 'Private Services'
        option src_addr '10.0.60.0/24'
        option interface 'wg0'
        option enabled '0'

config dns_policy
        option name 'Private Services DNS WireGuard'
        option src_addr '10.0.60.0/24'
        option dest_dns 'wg0'
        option enabled '0'
/etc/init.d/pbr status [service stopped]
pbr - environment
pbr 1.1.8-r16 running on OpenWrt 24.10.1.

Dnsmasq version 2.90  Copyright (c) 2000-2024 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP conntrack no-ipset nftset auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile

pbr chains - policies
        chain pbr_forward { # handle 45
        }
        chain pbr_input { # handle 46
        }
        chain pbr_output { # handle 47
        }
        chain pbr_postrouting { # handle 49
        }
        chain pbr_prerouting { # handle 48
        }
        chain pbr_dstnat { # handle 44
        }

pbr chains - marking

pbr nft sets

pbr tables & routing
/etc/init.d/pbr reload [service starts]
Using uplink interface (on_start): lan [✓]
Found uplink gateway (on_start): 10.50.10.1 [✓]
Setting up routing for 'wg0/10.2.0.2' [✗]
Installing fw4 nft file [✓]
Setting interface trigger for wg0 [✓]

pbr 1.1.8-r16 monitoring interfaces: wg0
ERROR:
ip -4 route add default via 10.50.10.1 dev eth0 proto static table 256
ERROR: Failed to set up 'wg0/10.2.0.2'!
ERROR: Failed to set up any gateway!
WARNING: Please set 'dhcp.lan.force=1' to speed up service start-up.
/etc/init.d/pbr status [service running]
pbr - environment
pbr 1.1.8-r16 running on OpenWrt 24.10.1.

Dnsmasq version 2.90  Copyright (c) 2000-2024 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP conntrack no-ipset nftset auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile

pbr fw4 nft file: /usr/share/nftables.d/ruleset-post/30-pbr.nft
add chain inet fw4 pbr_mark_0x010000
add rule inet fw4 pbr_mark_0x010000 counter mark set mark and 0xff00ffff xor 0x010000
add rule inet fw4 pbr_mark_0x010000 return

pbr chains - policies
        chain pbr_forward { # handle 45
        }
        chain pbr_input { # handle 46
        }
        chain pbr_output { # handle 47
        }
        chain pbr_postrouting { # handle 49
        }
        chain pbr_prerouting { # handle 48
        }
        chain pbr_dstnat { # handle 44
        }

pbr chains - marking
        chain pbr_mark_0x010000 { # handle 3291
                counter packets 0 bytes 0 meta mark set meta mark & 0xff01ffff | 0x00010000 # handle 3292
                return # handle 3293
        }

pbr nft sets

pbr tables & routing
IPv4 table 256 pbr_wg0 route:
default via 10.2.0.2 dev wg0
IPv4 table 256 pbr_wg0 rule(s):
30000:  from all fwmark 0x10000/0xff0000 lookup pbr_wg0

Manually entering the command shown as the cause of the error (ip -4 route add...) returns RTNETLINK answers: File exists. If I ignore this and enable the policies anyway,

uci export pbr
root@pvt-openwrt:~# uci export pbr
package pbr

config pbr 'config'
        option enabled '1'
        option verbosity '2'
        option strict_enforcement '1'
        option resolver_set 'none'
        list resolver_instance '*'
        option ipv6_enabled '0'
        option boot_timeout '20'
        option rule_create_option 'add'
        option procd_reload_delay '5'
        option webui_show_ignore_target '1'
        option nft_rule_counter '1'
        option nft_set_auto_merge '1'
        option nft_set_counter '1'
        option nft_set_flags_interval '1'
        option nft_set_flags_timeout '0'
        option nft_set_policy 'performance'
        list webui_supported_protocol 'all'
        list webui_supported_protocol 'tcp'
        list webui_supported_protocol 'udp'
        list webui_supported_protocol 'tcp udp'
        list webui_supported_protocol 'icmp'
        list procd_lan_device 'eth0'
        list procd_lan_device 'eth0.50'
        list procd_lan_device 'eth0.60'
        option procd_wan_interface 'lan'

config policy
        option name 'Ignore to Local'
        option interface 'ignore'
-       option enabled '0'
        option dest_addr '10.0.0.0/8 192.168.0.0/16 172.16.0.0/12'

config policy
        option name 'Private Services'
        option src_addr '10.0.60.0/24'
        option interface 'wg0'
-       option enabled '0'

config dns_policy
        option name 'Private Services DNS WireGuard'
        option src_addr '10.0.60.0/24'
        option dest_dns 'wg0'
-       option enabled '0'
/etc/init.d/pbr reload
  Using uplink interface (on_start): lan [✓]
  Found uplink gateway (on_start): 10.50.10.1 [✓]
  Setting up routing for 'wg0/10.2.0.2' [✗]
+ Routing 'Ignore to Local' via ignore [✓]
+ Routing 'Private Services' via wg0 [✓]
+ Routing 'Private Services DNS WireGuard' DNS to wg0 [✓]
  Installing fw4 nft file [✓]
  Setting interface trigger for wg0 [✓]

  pbr 1.1.8-r16 monitoring interfaces: wg0
  ERROR:
  ip -4 route add default via 10.50.10.1 dev eth0 proto static table 256
  ERROR: Failed to set up 'wg0/10.2.0.2'!
  ERROR: Failed to set up any gateway!
  WARNING: Please set 'dhcp.lan.force=1' to speed up service start-up.
/etc/init.d/pbr status
  pbr - environment
  pbr 1.1.8-r16 running on OpenWrt 24.10.1.

  Dnsmasq version 2.90  Copyright (c) 2000-2024 Simon Kelley
  Compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP conntrack no-ipset nftset auth cryptohash DNSSEC no-ID loop-detect inotify dumpfile

  pbr fw4 nft file: /usr/share/nftables.d/ruleset-post/30-pbr.nft
  add chain inet fw4 pbr_mark_0x010000
  add rule inet fw4 pbr_mark_0x010000 counter mark set mark and 0xff00ffff xor 0x010000
  add rule inet fw4 pbr_mark_0x010000 return
+ add rule inet fw4 pbr_prerouting ip daddr { 10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12 } counter return comment "Ignore to Local"
+ add rule inet fw4 pbr_prerouting ip saddr { 10.0.60.0/24 } counter goto pbr_mark_0x010000 comment "Private Services"
+ add rule inet fw4 pbr_dstnat ip saddr { 10.0.60.0/24 } counter meta nfproto ipv4 tcp dport 53 dnat ip to 10.2.0.1:53 comment "Private Services DNS WireGuard"
+ add rule inet fw4 pbr_dstnat ip saddr { 10.0.60.0/24 } counter meta nfproto ipv4 udp dport 53 dnat ip to 10.2.0.1:53 comment "Private Services DNS WireGuard"

  pbr chains - policies
          chain pbr_forward { # handle 45
          }
          chain pbr_input { # handle 46
          }
          chain pbr_output { # handle 47
          }
          chain pbr_postrouting { # handle 49
          }
          chain pbr_prerouting { # handle 48
+                ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 } counter packets 20 bytes 1188 return comment "Ignore to Local" # handle 3377
+                ip saddr 10.0.60.0/24 counter packets 0 bytes 0 goto pbr_mark_0x010000 comment "Private Services" # handle 3378
          }
          chain pbr_dstnat { # handle 44
+                ip saddr 10.0.60.0/24 counter packets 0 bytes 0 meta nfproto ipv4 tcp dport 53 dnat ip to 10.2.0.1:53 comment "Private Services DNS WireGuard" # handle 3379
+                ip saddr 10.0.60.0/24 counter packets 0 bytes 0 meta nfproto ipv4 udp dport 53 dnat ip to 10.2.0.1:53 comment "Private Services DNS WireGuard" # handle 3380
          }

  pbr chains - marking
-          chain pbr_mark_0x010000 { # handle 3291
+          chain pbr_mark_0x010000 { # handle 3373
-                  counter packets 0 bytes 0 meta mark set meta mark & 0xff01ffff | 0x00010000 # handle 3292
+                  counter packets 0 bytes 0 meta mark set meta mark & 0xff01ffff | 0x00010000 # handle 3374
-                  return # handle 3293
+                  return # handle 3375
          }

  pbr nft sets

  pbr tables & routing
  IPv4 table 256 pbr_wg0 route:
  default via 10.2.0.2 dev wg0
  IPv4 table 256 pbr_wg0 rule(s):
  30000:  from all fwmark 0x10000/0xff0000 lookup pbr_wg0

my test Windows VM still does not get internet access:

ipconfig
Windows IP Configuration


Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : home.arpa
   Link-local IPv6 Address . . . . . : fe80::5bbb:e270:9f1f:cdb2%5
   IPv4 Address. . . . . . . . . . . : 10.0.60.158
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.0.60.2
ping 1.1.1.1
Pinging 1.1.1.1 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 1.1.1.1:
    Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),

Changing procd_wan_interface to wg0 does not change anything.

I'm not sure if this is exactly the issue, but the status command says that PBR is setting the default gateway for the table to 10.2.0.2:

IPv4 table 256 pbr_wg0 route:
default via 10.2.0.2 dev wg0

Not only does that conflict with the output of the service start command, which I believe tries to set the default gateway for the same table to my WAN one:

ERROR:
ip -4 route add default via 10.50.10.1 dev eth0 proto static table 256
ERROR: Failed to set up 'wg0/10.2.0.2'!
ERROR: Failed to set up any gateway!

But 10.2.0.2 is the local WireGuard interface's IP address; should it not be 10.2.0.1 (which is provided as the DNS server on Proton VPN)?

Then again, I don't believe the gateway address of a WireGuard interface is ever known (is it even a concept in WireGuard?). All I've ever seen is the WireGuard device itself being set as the gateway: if I were to set option route_allowed_ips '1' on my WireGuard interface, ip route would show something like: default dev wg0 table default...
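For context on the File exists message: ip route add fails with that error when a matching route is already present in the target table, whereas ip route replace overwrites it. Since the status output shows table 256 already holding a default route via wg0, that may be what the manual command ran into. The sketch below is one way to check which default route a table currently holds; `default_route` is a made-up helper name and it only parses text.

```sh
# Hypothetical helper: read `ip route show table N` output on stdin and
# print "<gateway> <device>" for the default route ("-" where a field is absent).
default_route() {
    awk '$1 == "default" {
        gw = "-"; dev = "-"
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print gw, dev
        exit
    }'
}

# On the router:
# ip -4 route show table 256 | default_route
```

A WireGuard route without a gateway address, such as default dev wg0, would be reported with "-" in the gateway position.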

I recently updated pbr from 1.1.8-r6 to r29, and it just doesn't start up on boot anymore.

It used to work. Nothing else is changed.

Thanks for reporting, the boot up is currently under investigation.

A new build is already up, 1.1.8-r30, can you try that build and add to /etc/config/pbr:

option procd_boot_trigger_delay '9000'

Note that if you have many services which start at boot, e.g. adblock, it can take a couple of minutes before PBR starts.

If you cannot update, try the following workaround: add this under System > Startup > Local Startup (/etc/rc.local):

sleep 30 && service pbr restart

You might need more or less than 30 seconds, depending on the services you have running.
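Instead of editing /etc/config/pbr by hand, the procd_boot_trigger_delay option mentioned above can presumably also be set with the uci command line tool. This is a sketch that assumes the main section is named 'config', as in the uci export pbr output earlier in the thread; `set_pbr_boot_trigger_delay` is a made-up wrapper name.

```sh
# Hypothetical wrapper around the standard uci CLI.
# Assumes the pbr main section is named 'config' (as in `uci export pbr` above).
set_pbr_boot_trigger_delay() {
    uci set pbr.config.procd_boot_trigger_delay="$1" &&
        uci commit pbr
}

# On the router (value taken from the post above):
# set_pbr_boot_trigger_delay 9000 && service pbr restart
```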

Hello!

I am getting a lot of daemon.err dnsmasq[1]: nftset inet fw4 pbr_wgtun_4_dst_ip_cfg086ff5 Error: No such file or directory errors in my logs after pbr restarts. How might I investigate those?

Using OpenWrt 24.10.1 and pbr 1.1.8-r16.

You might have a policy with an invalid domain.

The latest build, 1.1.8-r30, has better error detection; consider upgrading to that build.
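As a starting point for investigating on the older build, the set names dnsmasq complains about can be pulled out of the log and then compared with what actually exists in the fw4 table. This is only a sketch: `missing_nftsets` is a made-up helper, and it assumes the log lines look like the one quoted above.

```sh
# Hypothetical helper: read log lines on stdin and print the unique nft set
# names that dnsmasq reported as missing (log format assumed from the post above).
missing_nftsets() {
    sed -n 's/.*nftset inet fw4 \([^ ]*\) Error: No such file or directory.*/\1/p' | sort -u
}

# On the router:
# logread | missing_nftsets
# nft list set inet fw4 <set-name>    # check whether a reported set exists
```

If a reported set is missing from the ruleset, the policy that references it is the one to look at.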
