Policy Based routing corrupting rt_tables

Hello,

For the last couple of days I've been looking into issues with PBR. I have a remote router that's used for VPN traffic routing; it acts as both server and client over 2 VPN bridges (OpenVPN). One bridge is partitioned into 2 VLANs as well. The setup is this complicated because I don't have a public IP where I live but my parents do, so I connect through their network when on the go.

I've got one rule in PBR, which says that traffic coming from the OpenVPN server connection needs to be routed through the client VPN bridge.

I've noticed that /etc/iproute2/rt_tables on the router gets corrupted with tables without names (I can see table IDs without names in it). Over time there seem to be more and more of those entries, and I'm not sure what's causing the issue. Sometimes it happens when doing service pbr restart, but not if I do stop/start separately. If I just let it run, corrupted entries pile up even without me restarting the service (perhaps it restarts on its own and keeps generating entries, but I'm not sure how to figure that out ...). Needless to say, the PBR service will no longer restart gracefully when corrupted entries are there, and I can't say whether the routing is still effective / working at all when those entries are present.
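
To make "table IDs without names" easier to spot, here is a minimal sketch that lists them. The function name list_nameless_tables is made up for illustration, and it assumes a nameless entry is a non-comment line holding a numeric ID and nothing else.

```shell
#!/bin/sh
# list_nameless_tables [FILE]   (hypothetical helper, not part of pbr)
# Print "line number: id" for every non-comment line whose first field is a
# number but which has no table name after it.
# FILE defaults to /etc/iproute2/rt_tables.
list_nameless_tables() {
    awk '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        NF == 0          { next }                       # skip blank lines
        $1 ~ /^[0-9]+$/ && NF < 2 { print NR ": " $1 }  # ID with no name
    ' "${1:-/etc/iproute2/rt_tables}"
}
```

Calling list_nameless_tables with no argument checks the live file; empty output means every numeric entry has a name.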

Happy to provide more logs if someone can tell me what to look at.

Thanks!

The TIDs that pbr inserts would have the pbr_ prefix in them. It's essentially hardcoded: https://github.com/openwrt/packages/blob/4ecd9d67e90651a8e93760bf0b5771f7057c74a8/net/pbr/files/etc/init.d/pbr.init#L49

https://github.com/openwrt/packages/blob/4ecd9d67e90651a8e93760bf0b5771f7057c74a8/net/pbr/files/etc/init.d/pbr.init#L1666

I don't see how use of pbr would create tables without names. I highly suspect there are other factors at play here.

Which device is that? Can you test your flash for corruption?

Device is TP-LINK Archer C7. Checked again this morning and I've got lots of empty entries.
I also have a flow to reproduce it:

  • remove corrupted entries from rt_tables
  • restart PBR, at this point no corruption
  • restart firewall which will also re-initialize PBR routes which will complain about corrupted entries (and rt_tables would show them)
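
The flow above can be checked step by step with a small sketch that snapshots the file, runs one command, and prints only the lines that appeared afterwards. The helper name new_lines_after is hypothetical; the commands passed to it are the ones from the steps above.

```shell
#!/bin/sh
# new_lines_after FILE CMD [ARGS...]   (hypothetical helper)
# Copy FILE to a temporary snapshot, run CMD, then print every line that is
# in FILE now but was not in the snapshot. A new line that is identical to
# an existing one is not reported.
new_lines_after() {
    file=$1; shift
    snap=$(mktemp) || return 1
    cp "$file" "$snap"
    "$@" >/dev/null 2>&1                      # run the step under test
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next }
         !($0 in seen)' "$snap" "$file"
    rm -f "$snap"
}
```

For example, new_lines_after /etc/iproute2/rt_tables service pbr restart followed by new_lines_after /etc/iproute2/rt_tables service firewall restart would show which of the two steps adds entries.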

I should mention that I've got a couple of ignored interfaces in the PBR config, and some of the devices (for both ignored and monitored interfaces) have names containing _ and . characters.

When I have a bit of downtime for my network setup I'll disable PBR and let the router run without it to see if that fixes the issue. I'm not saying it's PBR, but I haven't had this issue in the past when not running it (it's a new setup that I'm trying out now).

I'll check for flash corruption. I don't know how yet, but there must be tutorials; I'll only be able to do it if it doesn't require taking the router offline.

Thanks!

Noticed something else while watching the logs on the router: PBR re-initializes itself every 10 minutes and I can pinpoint the table creation to that ...

EDIT: Not every 10 mins, interval varies, but I can't see anything around it that would correlate.
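
One way to look for a correlation is to record exactly when the file changes and compare those times with the system log (logread on OpenWrt). A minimal sketch, assuming md5sum is available on the router; the helper name note_change and the state file path are made up for illustration.

```shell
#!/bin/sh
# note_change FILE STATE   (hypothetical helper)
# Compare FILE's checksum with the one saved in STATE. If they differ, save
# the new checksum, print a timestamped line, and return 0; otherwise
# return 1. The first call always reports a change because nothing has been
# saved yet.
note_change() {
    new=$(md5sum "$1" 2>/dev/null | cut -d' ' -f1)
    old=$(cat "$2" 2>/dev/null)
    if [ "$new" = "$old" ]; then
        return 1
    fi
    echo "$new" > "$2"
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 changed"
    return 0
}
```

Run in a loop such as while sleep 30; do note_change /etc/iproute2/rt_tables /tmp/rt_tables.sum; done, it prints one line per detected change, accurate to within the polling interval.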

Would be nice to see the uncorrupted /etc/iproute2/rt_tables from your router after pbr has successfully started.

Here you go

#
# reserved values
#
128     prelocal
255     local
254     main
253     default
0       unspec
#
# local
#
#1      inr.ruhep
256 pbr_wan
257 pbr_vpn
258 pbr_GPS
259 pbr_UNSEC_LON
260 pbr_UNSEC_LON6

In 1.0.1-3 I've added code to limit overwrites of this file unless a new interface is added. It should be available from the official repo as soon as the buildbots get to packages.

Most kind of you doing this, many thanks!

Loaded the new version and provisionally done a service firewall restart.
Got this error on pbr init:

Setting up routing for 'UNSEC_LON/tap0_unsec/192.168.10.1' Error: argument "259
260" is wrong: invalid table ID

There are no corrupted entries in rt_tables though.

Try 1.0.1-5 from my repo. The pbr wasn't handling UNSEC_LON and UNSEC_LON6 well prior to that version.
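
The error above shows two IDs, 259 and 260, joined by a newline and passed as a single table argument. One way such a value can arise, offered here as an assumption and not as a reading of the pbr source, is a lookup by name in rt_tables: pbr_UNSEC_LON is a prefix of pbr_UNSEC_LON6, so an unanchored match returns both lines. A sketch with hypothetical helper names:

```shell
#!/bin/sh
# lookup_loose NAME FILE   (hypothetical; illustrates the suspected problem)
# Unanchored match: returns the ID of every line that contains NAME.
lookup_loose() {
    grep "$1" "$2" | awk '{ print $1 }'
}

# lookup_exact NAME FILE   (hypothetical; one possible remedy)
# Returns the ID only where the name field equals NAME exactly.
lookup_exact() {
    awk -v name="$1" '$2 == name { print $1 }' "$2"
}
```

Against the rt_tables posted earlier in the thread, lookup_loose pbr_UNSEC_LON prints both 259 and 260, while lookup_exact pbr_UNSEC_LON prints only 259.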

I can't see the new version in the feed, do I need to load it manually somehow?
Thanks!

What feed are you referring to?

I can't upgrade to version 1.0.1-5 as it hasn't been published yet.