For the last couple of days I've been looking into issues with PBR. I have a remote router that's used for VPN traffic routing; it acts as both server and client over 2 VPN bridges (OpenVPN). One bridge is partitioned into 2 VLANs as well. The setup is this complicated because I don't have a public IP where I live, but my parents do, so I connect through their network when on the go.
I've got one rule in PBR which says that traffic coming from the OpenVPN server connection needs to be routed through the client VPN bridge.
I've noticed that /etc/iproute2/rt_tables on the router gets corrupted with tables without names (I can see table ids without names in it). Over time there seem to be more and more of those entries, and I'm not sure what's causing the issue. Sometimes it happens when doing service pbr restart, but not if I do stop/start separately. If I just let it run, corrupted entries pile up even without me restarting the service (perhaps it restarts on its own and keeps generating entries, but I'm not sure how to figure that out ...). Needless to say, the PBR service will no longer restart gracefully when corrupted entries are there, and I also can't say whether the routing is still effective / working at all when those entries are present.
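A quick way to spot these entries could look like the sketch below. It assumes a corrupted entry is a line holding only a numeric table id with no name after it, which is one reading of the description above, not something confirmed. The function name find_unnamed_tables is made up for illustration.

```shell
#!/bin/sh
# Print "line number: table id" for every rt_tables line that has a
# numeric id but no table name. Comment lines and normal "id name"
# lines are skipped.
# Assumption: a corrupted entry is a bare numeric id on its own line.
find_unnamed_tables() {
    rt_file="${1:-/etc/iproute2/rt_tables}"
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ { print NR ": " $1 }' "$rt_file"
}
```

Calling find_unnamed_tables with no argument checks /etc/iproute2/rt_tables; a different path can be passed to check a copy of the file instead.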
Happy to provide more logs if someone can tell me what to look at.
Device is TP-LINK Archer C7. Checked again this morning and I've got lots of empty entries.
I also have a flow to reproduce it:
1. Remove the corrupted entries from rt_tables.
2. Restart PBR; at this point there is no corruption.
3. Restart the firewall, which also re-initializes the PBR routes. PBR then complains about corrupted entries (and rt_tables shows them).
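The flow above could be scripted roughly as follows. This is only a sketch: it assumes a corrupted entry is a line with a bare numeric id, and it calls the init scripts as /etc/init.d/pbr and /etc/init.d/firewall. The function names are invented for illustration, and a backup of rt_tables is kept as rt_tables.bak.

```shell
#!/bin/sh
# Count lines that hold a numeric table id but no table name.
count_unnamed() {
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}

# Step 1: drop the id-only lines, keeping the original as <file>.bak.
clean_rt_tables() {
    cp "$1" "$1.bak" || return 1
    awk '!(NF == 1 && $1 ~ /^[0-9]+$/)' "$1.bak" > "$1"
}

# Steps 1-3, printing the number of unnamed entries after each restart.
reproduce_flow() {
    target="${1:-/etc/iproute2/rt_tables}"
    clean_rt_tables "$target" || return 1
    /etc/init.d/pbr restart
    echo "after pbr restart: $(count_unnamed "$target") unnamed entries"
    /etc/init.d/firewall restart
    echo "after firewall restart: $(count_unnamed "$target") unnamed entries"
}
```

Running reproduce_flow on the router should print 0 after the first restart and a non-zero count after the second if the behaviour described above holds.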
Worth mentioning: I've got a couple of ignored interfaces in the PBR config, and some of the devices (for both ignored and monitored interfaces) have names containing "_" and ".".
When I have a bit of downtime for my network setup I'll disable PBR and let the router run without it to see if that fixes the issue. I'm not saying it's PBR, but I haven't had this issue when not running it in the past (it's a new setup that I'm trying out now).
I'll check for flash corruption. I don't know how yet, but there must be tutorials. I'll only be able to do it if it doesn't require the router to be taken offline.
Noticed something else while watching the logs on the router, PBR re-initializes itself every 10 minutes and I can pinpoint the table creation to that ...
EDIT: Not every 10 mins, interval varies, but I can't see anything around it that would correlate.
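To pin down when the entries appear, one option is to poll rt_tables and print a timestamp whenever the number of unnamed entries changes. The sketch below assumes a corrupted entry is a line with a bare numeric id; the function name watch_rt_tables and its arguments are made up for illustration.

```shell
#!/bin/sh
# Poll an rt_tables file and print a timestamped line each time the
# number of id-only (unnamed) entries changes.
# Arguments: file (default /etc/iproute2/rt_tables),
#            interval in seconds (default 30),
#            rounds (default 0 = run until interrupted).
watch_rt_tables() {
    watch_file="${1:-/etc/iproute2/rt_tables}"
    interval="${2:-30}"
    rounds="${3:-0}"
    prev=""
    i=0
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        cur=$(awk 'NF == 1 && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }' "$watch_file")
        if [ "$cur" != "$prev" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') unnamed entries: $cur"
            prev="$cur"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

The printed timestamps can then be compared against the system log (for example the output of logread) to see which event lines up with each new entry.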
In 1.0.1-3 I've added code to limit overwrites of this file unless a new interface is added. It should be available from the official repo as soon as the buildbots get to the packages.