Mwan3 port to nftables

mwan3 version 3.5.3 and luci-app-mwan3 version 3.5.3 up for testing

This is a LuCI release: it adds substantial new functionality relating to ipsets and policies.

I'd also like to draw your attention to a significant addition to the documentation which will probably be quite helpful. This clearly documents how conntrack flushes occur at different lifecycle stages and the mechanism for custom extension. An understanding of this will help to ensure expectations match actual behaviour. Previously this has been a very poorly understood and documented area of mwan3.

The Policy tab is rewritten with a tier-based policy builder that lets users specify failover tiers and per-interface load-balancing weights without manually creating member definitions. The builder handles member creation and garbage-collects orphaned members automatically, and the policy list now displays IPv4 and IPv6 policies in symbolic form.

The new Policy tab metaphor and layout should greatly improve how both new and existing users get to grips with mwan3, whose flexible but less than intuitive interface-member-policy definitions sometimes pose a substantial barrier to immediate use.

The traffic simulator gains hostname resolution so you can test dnsmasq-based ipset rules: Source and Destination fields now accept hostnames as well as IP addresses, resolved via the new resolve_host rpcd method against the local DNS server.

The Status app's IP Set page adds buttons to allow Flush, Reload, and Resolve operations on each ipset, alongside the existing Expand button. These do exactly what their names imply: Flush flushes the ipset; Reload reloads all static entries from the mwan3 config and loadfile; Resolve HUPs dnsmasq to flush its cache, then iterates through all list domain entries for that ipset and queries dnsmasq to populate the set. This way you can be sure your ipset population is working as intended.

Some bug fixes: mwan3_create_iface_route is changed to use ip route replace instead of ip route add, eliminating spurious errors when mwan3rtmon or a racing event had already inserted the route (these mostly showed up as RTNETLINK errors during upgrade).

A missing default argument in the config_get_bool call for the enabled option is added, suppressing cosmetic sh: out of range noise on startup where there is no explicit option enabled '1' for an interface definition in /etc/config/mwan3.

Fixes matching in the Traffic Simulator on ipsets that have counters or timeouts defined.

If it's possible, I would love a checkmark to enable/disable rules, because currently I have to drag rules to the bottom to sideline them without deleting them.

It's possible, but it's far more than just a "checkmark". It would require a new config option enabled '0' in a rule definition in /etc/config/mwan3 and specific logic in the code to avoid processing those rules during the setup of table inet mwan3.

The checkmark would just be the visible part of that implementation in luci.

EDIT: just a few lines of code and then the checkbox implementation in luci-app-mwan3. It is a bit of an anomaly that interfaces, members and now ipsets have this option, but rules don't.

Yes, errors are gone. :+1:

All works for me! Great work! The IPset tab in the manager is so cool.

You're referring to the tab in the Network menu or in the Status menu? Assuming the latter, since that's what got the change in the latest update

Yes. I find it very useful.

How do you find the Policy tab update with the new Policy builder?

Only some wording comments from my side. I think if the mode is pure failover, then percentages are probably not needed. Maybe “Primary” could also be renamed to “Balance.”

Not sure how it behaves in practice though, since I don’t use two physical WANs in load-balancing mode..

Had some time for early morning tests on latest version:

Scenario
main (internet): wan and wan_lte
on the router there are also supporting: vpns: 3 wireguard tunnels [separate] and tailscale
[all of them should go out to network/endpoints via main network failover connectivity, either wan or wan_lte]

mwan3 alone

With no mwan3.user interaction, 'flush' is configured on wan for connected (mwan3) and disconnected (mwan3) scenario.

I ran separate ping instances on a PC (LAN) to 1.1.1.1, to the internal IP of a wireguard endpoint, and to an IP I have routed via tailscale (to catch issues on the main internet connection, one of the tunnels, and tailscale network routing).

Pinging from the PC to the wireguard endpoint and to the main internet, I can clearly see the connection moved over almost without packet loss (with higher latency).

DNS on the router (nextdns) did not fail over, so there was no DNS in the LAN. Errors in logread [example:
Wed May 6 05:08:19 2026 daemon.notice nextdns[5348]: Query 127.0.0.1:37577 UDP/5553 192.168.x.x A connectivitycheck.gstatic.com. 9112eb (qry=80/res=47) cache fallback HTTP/2.0: doh resolve: context deadline exceeded]

Restart of DNS did not help, no new connections managed to get out to the internet - only already present (i.e. ping to 1.1.1.1) still were active (or spotify client ;-))

mwan3 + mwan3.user

After enabling mwan3.user (everything else left as it was, including flush on connected/disconnected wan). /etc/mwan3.user file content:


# When WAN down - set WAN_LTE as primary
if [ "${ACTION}" = "disconnected" ] && [ "${INTERFACE}" = "wan" ] ; then
    logger -t mwan3.user "Failover, running: $INTERFACE[$DEVICE]: $ACTION"
    ip route del default via 195.x.x.x 2>/dev/null
    ip route add default via 195.x.x.x metric 100 2>/dev/null
fi

# When WAN up - remove WAN_LTE priority route
if [ "${ACTION}" = "connected" ] && [ "${INTERFACE}" = "wan" ] ; then
    logger -t mwan3.user "Failover, running: $INTERFACE[$DEVICE]: $ACTION"
    ip route del default via 195.x.x.x 2>/dev/null
    ip route add default via 195.x.x.x metric 10 2>/dev/null
fi

(the same I've mentioned earlier) and issuing iptables -I FORWARD -o eth1 -j DROP; iptables -I OUTPUT -o eth1 -j DROP again:

  • no issues with ping to 1.1.1.1
  • no issues with ping to wireguard endpoint (inside tunnel)
  • no issues with ping to routed ip via tailscale exit node
  • no DNS issues in log on router or with clients (everything was working without interruption)
  • I didn't need to restart any connection or app like dns

Seeing that, I ran iptables -D FORWARD -o eth1 -j DROP; iptables -D OUTPUT -o eth1 -j DROP and all connections soon went back to the main internet, pings dropped and full proper failover happened as desired.

Just for clarity, changing the metric from 10 to 100 is only to rank wan below wan_lte, which is configured with metric 20.
It could just as well be changed to 25 and it would work the same:

# ip route # those 4 are configured in mwan3 so far
default via 195.x.x.x dev eth1 metric 10 
default via 10.0.10.1 dev VLANs.1 metric 20 
default dev vpn_unl proto static scope link metric 201 
default dev wg_vpn proto static scope link metric 202 
[...]

Just to make sure it was not a fluke, I added exit 0 again and, while failover happened this time, DNS was not working again (so no new resolves were available on the LAN).

I'm fully happy with how it works right now, just wondering: why do I need to mangle metrics with mwan3.user and why it seems not to work without it? And as it seems it works quite well, why doesn't mwan3 do it by design?

I'm just curious and would like to understand - it's either something messed up on my side or not sure what else. :wink:
What I still need to dig into is why the wg tunnel to my VPS, which is not configured in mwan3, fails over properly while the other wg tunnel (through which I forward traffic for one IP for tests) does not fail over by itself.

The first tier is always the primary tier, the lowest metric. If there's only one interface, it's not balanced by definition. Removing the percentage would require specific treatment and make the dialog code unnecessarily complicated. 100% still makes it pretty clear what's going on.

In the example above, you would not want nordlynx in the 2nd tier. If it's failed already, there's no point having it in the failover tier.

This dialog is just implementing what the metrics and weights do anyway. Tier 1 is the lowest metric, the next lowest metric would be the 2nd tier and so on.

If the mode is pure failover, you have one interface in the first tier and another in the second tier.

It will take some time to reply to this and I'll need a lot more detail on your configuration. Please run the mwan3-diag script and PM me the output. It's sanitised to remove your public IPs. See the documentation for details....

1. What I think is your problem

Based on your description and the article you linked earlier in the thread, it seems that you want full transparent failover for all traffic, both LAN clients and router-originated traffic (nextdns DoH, wireguard, tailscale) when the primary wan goes down and this isn't happening as you expect.

You're reporting that without the mwan3.user routing metric manipulations, new router-originated connections don't failover, while LAN-originated traffic fails over correctly.

Manipulating the main routing table metrics in mwan3.user fixes everything for you and you want to know why this should be necessary.

2. My observations

You should not need to manipulate routing table metrics. Failover should happen correctly as mwan3 handles both forwarded and router originated traffic.

You appear to be treating a symptom and not the root cause.

You're getting behaviour that approximates your expectations, but in reality isn't what you've explicitly configured in mwan3.

You have ip rules at higher priority than mwan3 that are very likely overriding mwan3's processing and could be the cause.

Fix the root cause first.

2.1 IPv4 Policy Rules apparently interfering with mwan3

Your diagnostic output shows the following:

  0:           from all lookup 255
  1001:        from all iif eth1 lookup 1
  1002:        from all iif VLANs.1 lookup 2
  1003:        from all iif vpn_unl lookup 3
  1005:        from all iif wg_vpn lookup 5
  1006:        from all iif wg_awh lookup 6
  1310:        from all fwmark 0x80000/0xff0000 lookup 254
  1330:        from all fwmark 0x80000/0xff0000 lookup 253
  1350:        from all fwmark 0x80000/0xff0000 unreachable
  1370:        from all lookup 52
  2001:        from all fwmark 0x100/0x3f00 lookup 1
  2002:        from all fwmark 0x200/0x3f00 lookup 2
  2003:        from all fwmark 0x300/0x3f00 lookup 3
  2005:        from all fwmark 0x500/0x3f00 lookup 5
  2006:        from all fwmark 0x600/0x3f00 lookup 6
  2061:        from all fwmark 0x3d00/0x3f00 blackhole
  2062:        from all fwmark 0x3e00/0x3f00 unreachable
  3001:        from all fwmark 0x100/0x3f00 unreachable
  3002:        from all fwmark 0x200/0x3f00 unreachable
  3003:        from all fwmark 0x300/0x3f00 unreachable
  3005:        from all fwmark 0x500/0x3f00 unreachable
  3006:        from all fwmark 0x600/0x3f00 unreachable
  32766:       from all lookup 254
  32767:       from all lookup 253
  90050:       from all iif lo lookup 200

Note the rules at priority 1310 to 1370. These rules will override mwan3's processing.

These rules do not come from mwan3.

mwan3 only installs:

  • iif-based rules at priorities 1001-1006
  • fwmark 0xN00/0x3f00 rules at priorities 2001-2006 and 2061-2062
  • fwmark 0xN00/0x3f00 unreachable fallbacks at priorities 3001-3006

The 0xff0000 mask and 0x80000 value are not from mwan3's 0x3f00 bit field, so these rules were installed by another application.

A vanilla openwrt installation has two unconditional ip rules: 32766: from all lookup main (table 254) and 32767: from all lookup default (table 253). These are normal and expected. Any packet not claimed by a higher-priority rule falls through to lookup main at 32766, which is the normal forwarding path.

The rules at 1310-1350 are an additional fwmark-conditional set of rules that have been inserted at a much higher priority and that intercept packets carrying 0x80000 in the 0xff0000 bits and route them directly to the main table at priority 1310.

Without those rules, a packet carrying 0x80000 would continue down the rule list, pass through mwan3's fwmark rules at 2001-2006, and be routed correctly by mwan3. The rules at 1310-1350 are the problem - not because they reference tables 254 and 253 (which is normal), but because they do so at a priority that pre-empts mwan3 for any packet carrying the 0x80000 mark.

I don't know the source but speculate that it might come from pbr as the fwmark mask 0xff0000 matches the fw_mask default configuration value of pbr. pbr's default installation behavior, however, does not match this rule pattern:

  • pbr places rules at priority ~30000 by default (option uplink_ip_rules_priority '30000'), not 1310-1350, however this can be overridden in pbr's configuration quite easily, so the difference in priority order here doesn't rule out pbr
  • pbr installs one rule per configured uplink pointing to a named table (pbr_IFACE). It does not install explicit rules pointing to tables 254 or 253.
  • pbr does not install three rules per fwmark. The (lookup 254, lookup 253, unreachable) pattern for the same fwmark is likely not pbr's.
  • pbr's global uplink rule uses lookup main suppress_prefixlength N, not a bare lookup 254.

So there is some indication that they could come from pbr, but there's also some contradictory data that says they're not.

The critical unknown is what is applying that mark and to what that mark is being applied.

If pbr was previously configured with non-default priorities and then removed or reconfigured without a clean stop, ip rules installed at those priorities could persist. However, even under that scenario, the three-rule pattern does not match any pbr rule-installation path in the code, so the question as to what installed them remains, in my mind, unresolved.

You need to investigate where these rules come from.

UPDATE: on second review, the pattern in the priority numbering, starting at 1310 and incrementing by 20 for each rule until the final rule at priority 1370, which we know is a tailscale rule, now leads me to believe that tailscale itself could be the source of the rules. The sequence of the numbering is too predictable to be a coincidence and these rules are probably installed by tailscale in a single block.

2.2 Reason for failover path for router-originated traffic not working

  1. Something marks router-originated packets, including nextdns DoH, with fwmark 0x80000 in the 0xff0000 bit range.
  2. mwan3's mwan3_output hook runs and applies the failover policy mark 0x200 (wan_lte's mark, in the 0x3f00 bits). Since 0xff0000 and 0x3f00 are non-overlapping bit fields, both marks coexist: the final fwmark on the packet is 0x80200.
  3. The kernel walks the ip rule list. Rule 1310 (fwmark 0x80000/0xff0000 lookup 254) fires before mwan3's rule 2002 (fwmark 0x200/0x3f00 lookup 2), because 1310 < 2002.
  4. Rule 1310 sends the packet to table 254 (main). The main table still has default via wan-GW dev eth1 metric 10. Traffic hits eth1, which has no connectivity and thus your dns / router-originated traffic fails.
  5. mwan3's mark of 0x200 is completely ignored because of these higher priority rules.
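
The steps above can be reproduced with plain shell arithmetic. This is only a sketch: the helper names rule_matches and first_match are made up for illustration, the rule list is a three-line subset of the diagnostic output, and it assumes the selected table always contains a usable route (true here, since table 254 has a default route).

```shell
#!/bin/sh
# Sketch only: models "first matching ip rule wins" for a given fwmark.
# Helper names are hypothetical; the rules are a subset of the diagnostics.

# rule_matches MARK VALUE MASK -> exit 0 when (MARK & MASK) == VALUE
rule_matches() {
    [ $(( $1 & $3 )) -eq $(( $2 )) ]
}

# first_match MARK -> prints "priority table" of the first matching rule.
# Assumes the chosen table has a matching route, so the walk stops there.
first_match() {
    mark=$1
    # Rules are listed in ascending priority, the order the kernel uses.
    while read -r prio value mask table; do
        if rule_matches "$mark" "$value" "$mask"; then
            echo "$prio $table"
            return 0
        fi
    done <<EOF
1310 0x80000 0xff0000 254
2001 0x100 0x3f00 1
2002 0x200 0x3f00 2
EOF
    return 1
}

# The 0x80000 mark and mwan3's wan_lte mark occupy different bits,
# so both survive on the same packet.
combined=$(printf '0x%x' $(( 0x80000 | 0x200 )))
echo "combined mark: $combined"   # combined mark: 0x80200
first_match "$combined"           # 1310 254 (main table, mwan3 bypassed)
first_match 0x200                 # 2002 2   (mwan3's wan_lte table)
```

A packet carrying only mwan3's mark reaches rule 2002; the same packet with 0x80000 added never gets that far.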

2.3 Reason for failover path for LAN forwarded traffic working correctly

The ip rules at 1310-1350 are fwmark-conditional. They fire for any packet carrying 0x80000 in the 0xff0000 bits, regardless of whether it is forwarded or locally generated. There is nothing in those rules that inherently restricts them to router-originated traffic.

LAN traffic probably works because LAN forwarded packets are likely not carrying the 0x80000 mark when the routing decision is made.

So a plausible explanation is that the marking is applied in your nftables OUTPUT chain, which processes only router-generated packets.

Forwarded LAN traffic passes through PREROUTING and FORWARD but never through OUTPUT, so it arrives at the routing decision without 0x80000 set. Rule 1310 does not fire, mwan3's rule 2002 fires, and LAN traffic is routed to table 2 (wan_lte) correctly.

This is my speculation only and nft list ruleset would help to confirm it.

2.4 Probable reason why your mwan3.user metric workaround fixes it

By raising wan's main-table route metric from 10 to 100, wan_lte (metric 20) becomes the lowest-metric default in table 254.

When rule 1310 sends traffic to table 254, it now exits via wan_lte instead of eth1.

Rule 1310 still fires and overrides mwan3's mark, sending it to the main table, but it no longer causes harm because the main table's preference has changed to wan_lte.
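
As a rough illustration of why the metric change alters the outcome, the sketch below picks the lowest-metric default route from ip route style text. pick_default is a hypothetical helper and the input lines are copied from the route listing earlier in the thread, with wan's metric changed by hand for the second case.

```shell
#!/bin/sh
# Sketch only: choose the default route with the lowest metric from
# "ip route"-style lines on stdin and print its device.
pick_default() {
    awk '$1 == "default" {
        dev = ""; metric = 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1) + 0
        }
        if (!found || metric < best_metric) {
            found = 1; best_dev = dev; best_metric = metric
        }
    }
    END { if (found) print best_dev }'
}

# Main table as configured: wan (eth1) has the lowest metric.
main_before='default via 195.x.x.x dev eth1 metric 10
default via 10.0.10.1 dev VLANs.1 metric 20'

# Main table after the mwan3.user workaround raised wan to metric 100.
main_after='default via 195.x.x.x dev eth1 metric 100
default via 10.0.10.1 dev VLANs.1 metric 20'

printf '%s\n' "$main_before" | pick_default   # eth1
printf '%s\n' "$main_after"  | pick_default   # VLANs.1
```

Rule 1310 hands the packet to the same table in both cases; only the table's preferred exit changes.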

Your mwan3.user workaround is fixing the symptom, not the cause.

2.5 The 1370: from all lookup 52 rule

Table 52 is shown in your diagnostics to contain host routes and a /22 subnet, all via tailscale0.

The rule at priority 1370 is unconditional: every packet is looked up in table 52. If the destination is a tailscale virtual IP present in table 52, the packet is routed via tailscale0 regardless of mwan3's marks. For non-tailscale destinations, table 52 has no matching route and the kernel falls through to the next rule.

My attribution of the from all lookup 52 rule to tailscale is based on the fact that all routes in table 52 are via tailscale0.

This rule is not the cause of your failover problem for non-tailscale destinations.

Whether it fully explains why tailscale virtual IP traffic "works fine" during failover depends on whether the 0x80000 marking (from the unknown source of rules 1310-1350) also affects tailscale-destined traffic or not.

If tailscale packets carry 0x80000, rule 1310 fires before rule 1370, routing them to table 254 rather than table 52. If they do not carry 0x80000, rule 1370 intercepts them and routes via tailscale0 correctly. This cannot be determined without knowing the source and scope of any 0x80000 marking that might be occurring.

2.6 10.0.0.0/8 in mwan3_connected_v4

Your diagnostics show mwan3_connected_v4 contains a /8 entry. This is 10.0.0.0/8, added because wg_vpn has interface address 10.1.1.59/8 (a /8 prefix length), which generates a 10.0.0.0/8 connected route in the main routing table.

mwan3_set_connected_ipv4 reads all CIDR routes from the main table and adds them to mwan3_connected_v4.

Any traffic destined for 10.0.0.0/8, the wg_vps peers at 10.0.50.x, wg_awh at 10.0.31.x, vpn_unl at 10.103.209.125, or any internal DoH endpoint using a 10.x.x.x address will match the mwan3_connected chain and receive the MMX_DEFAULT mark, bypassing mwan3_rules entirely.
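
To make the prefix arithmetic concrete, here is a small sketch that derives the connected network from an interface address and checks which destinations fall inside it. The helper names are invented for this example, and it assumes the shell does 64-bit arithmetic (common, but not guaranteed on every build).

```shell
#!/bin/sh
# Sketch only: IPv4 prefix arithmetic in POSIX sh (assumes 64-bit $(( ))).

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N -> A.B.C.D
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_of ADDR/LEN -> NETWORK/LEN (the connected route it generates)
network_of() {
    addr=${1%/*}; len=${1#*/}
    mask=$(( (0xffffffff << (32 - $len)) & 0xffffffff ))
    echo "$(int_to_ip $(( $(ip_to_int "$addr") & $mask )))/$len"
}

# in_network ADDR NETWORK/LEN -> exit 0 when ADDR is inside the network
in_network() {
    [ "$(network_of "$1/${2#*/}")" = "$2" ]
}

network_of 10.1.1.59/8                                          # 10.0.0.0/8
in_network 10.0.50.1 10.0.0.0/8      && echo "bypasses rules"   # prints
in_network 10.103.209.125 10.0.0.0/8 && echo "bypasses rules"   # prints
in_network 1.1.1.1 10.0.0.0/8        || echo "classified"       # prints

# With a /24 on the same address, 10.0.50.1 is no longer "connected".
network_of 10.1.1.59/24                                         # 10.1.1.0/24
in_network 10.0.50.1 10.1.1.0/24     || echo "classified"       # prints
```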

For destinations with specific host routes in the main table (e.g., 10.0.50.1/32 dev wg_vps), traffic still arrives correctly even with MMX_DEFAULT, because the main table's specific route is more specific than the default gateway. This is why LAN pings to wireguard tunnel IPs work during failover even though they receive MMX_DEFAULT.

If nextdns is configured to use a DoH endpoint at a 10.x.x.x address via a VPN tunnel, this is a contributing cause of its failure. If nextdns uses standard public IPs for DoH, the 10.0.0.0/8 issue does not affect it and the higher-priority fwmark rules at 1310-1350 are the sole primary cause.

2.7 What I recommend you do to fix it

2.7.1 Identify and neutralise the 1310-1350 rules

Identify the application or script that installed the 0x80000/0xff0000 ip rules and is marking packets with 0x80000.

Once identified, either stop and cleanly uninstall it, then verify ip rules are removed, or reconfigure it to use priorities above mwan3's range (above 3006).

If the rules are orphans from a package that is no longer active, flush them manually with ip rule del priority 1310; ip rule del priority 1330; ip rule del priority 1350 and confirm nothing reinstalls them on reboot.

Frankly, I'd just go ahead and manually remove them, reboot and then see if they reappear. If they don't, then see if things work. If they do, then find what's installing those rules and stop it from installing them.

2.7.2 Fix the wg_vpn prefix

Change wg_vpn's wireguard interface configuration to use a more appropriately specific prefix (e.g., /24 or /32) instead of /8. This removes the very big 10.0.0.0/8 block from mwan3_connected_v4 and ensures traffic to 10.x.x.x addresses is classified by mwan3_rules rather than having the entire /8 block simply bypassed by mwan3 because it's classified as a directly connected network.

2.8 Additional information you need to be sure of the diagnosis

  • What installed the ip rules at 1310-1350?
  • What is marking router-originated packets with 0x80000/0xff0000? Start by searching init scripts, config files and hotplug scripts for 80000 and ff0000
  • What nftables chains are marking packets with the 0xff0000 fwmark range? Run: nft list ruleset | grep -B4 -A4 "ff0000". This identifies what is setting 0x80000 on packets, which is required for the ip rules at 1310-1350 to have any effect.
  • What is in routing table 200? Run: ip -4 route list table 200 (there is a rule 90050: from all iif lo lookup 200 with no corresponding table section in the diagnostics I got from you since it doesn't dump that table as part of the diagnostics run).
  • What is nextdns configured to use as its DoH upstream? Is it a public IP or one routed to a private IP over a VPN tunnel / proxy?
  • Is the /8 prefix in the wireguard config intentional, or is it a misconfiguration (I suspect an unintentional misconfiguration because you're unaware that the /8 essentially bypasses mwan3 for anything at all with a 10.x.x.x destination address)?
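
The file search suggested in the list above could look something like this. find_mark_refs is a hypothetical helper, and the directories in the usage comment are the usual OpenWrt locations for init, config and hotplug scripts, which is an assumption about this particular router. It only finds marks set from files; a mark set by a program at runtime will not show up.

```shell
#!/bin/sh
# Sketch only: list files that mention the 0x80000 value or 0xff0000 mask.
# -r recurse, -l print file names only, -i ignore case, -E extended regex
find_mark_refs() {
    grep -rliE '80000|ff0000' "$@" 2>/dev/null
}

# Possible usage on the router (paths are assumptions):
#   find_mark_refs /etc/init.d /etc/config /etc/hotplug.d
```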

Thanks for clarification. I’ll cross that bridge when I come to it. For now, everything is working the way I want.

Ok, so I downloaded the tailscale source and checked for you.

Yes, those rules are all added by tailscale. Each rule has a relative offset that is added to a runtime field called ipPolicyPrefBase. The offsets and the resulting absolute priorities (when ipPolicyPrefBase = 1300) are:

  Relative offset   Absolute priority   Rule
  +10               1310                fwmark 0x80000/0xff0000 lookup main (254)
  +30               1330                fwmark 0x80000/0xff0000 lookup default (253)
  +50               1350                fwmark 0x80000/0xff0000 unreachable
  +70               1370                (unconditional) lookup tailscale (52)

The default value of ipPolicyPrefBase is 5200, which would normally place these rules at 5210, 5230, 5250, 5270, so after mwan3.

However, the reason they appear at 1310 - 1370 is a specific mwan3 detection path in tailscale. When this detection fires, ipPolicyPrefBase is shifted from 5200 to 1300 and tailscale logs "mwan3 on openWRT detected, switching policy base priority to 1300".
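
The offset arithmetic is simple enough to check in a couple of lines. rule_priorities is a made-up helper; the offsets and both base values are the ones quoted above.

```shell
#!/bin/sh
# Sketch only: absolute ip rule priorities for a given base value.
rule_priorities() {
    for offset in 10 30 50 70; do
        echo $(( $1 + offset ))
    done
}

rule_priorities 1300 | tr '\n' ' '; echo   # 1310 1330 1350 1370
rule_priorities 5200 | tr '\n' ' '; echo   # 5210 5230 5250 5270
```

With base 1300 the block sits between mwan3's iif rules (1001-1006) and its fwmark rules (2001 onwards); with the default base 5200 it would sit after mwan3's unreachable rules (3001-3006).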

So tailscale is actively looking for mwan3 and bypassing it.

tailscale only shifts to 1300 if mwan3 is installed and has at least one interface configured. The comment in the source at lines 157-167 explains the design intent:

"we shift the priority of our policies to 13xx. This effectively puts us between mwan3's permit-by-src-ip rules and mwan3 lookup of its own routing table which would drop the packet."

Tailscale marks its own outbound control packets programmatically, not via nftables.

The first rule matches any packet carrying the tailscale bypass mark (0x80000) in the 0xff0000 bit field and sends it to the main routing table (table 254). The main table contains the system's default gateway routes. This completely bypasses mwan3.

The purpose is to give tailscaled's own outbound control connections a "normal" route to the internet, to DERP relay servers, the coordination server (login.tailscale.com), and any other tailscale infrastructure.

If the primary WAN is the lowest-metric default in the main table and that WAN is down, rule 1310 will send tailscale's own traffic to a dead route.

This is an inherent tailscale design limitation when mwan3 is running: its control traffic bypasses mwan3's failover entirely and relies on the main table's route selection since mwan3 isn't exposing any specific "active wan" information that could be read and acted upon by a separate application [note: this could be fixed, but it would require mwan3 to expose this information and tailscale to act on it, so bidirectional cooperation].

The second rule is just a backup for the first rule, falling back to the default routing table.

The third rule says if neither of the first two rules matched a route (and this packet has tailscale's bypass mark on it), then generate a RTNETLINK answers: Network is unreachable error for the packet.

Without these three rules, any bypass-marked tailscale packets destined for IP addresses that are in table 52 would be routed through the fourth rule 1370, into tailscale0, which would cause a routing loop.

The fourth rule 1370 matches every packet that wasn't caught by the preceding 3 rules and if it matches any of the tailscale peers, then it's handled by table 52, routed before mwan3 can touch it. Non-tailnet destinations fall through and pass to mwan3's rules.

So, tailscale will always go down if your main wan link goes into a disconnected state as mwan3 sees it, because well, it's bypassed mwan3's failover deliberately.

That's probably bad news for you, since I think that's one of the problems you've been trying to solve. Your metric manipulation in /etc/mwan3.user actually does solve it: by altering the main routing table metrics to make the backup wan the preferred route, tailscale's bypass rule 1310, which routes its control traffic, continues to function if wan goes into a disconnected state. But this isn't an mwan3 bug. Rather, it's a specific interaction in which tailscale avoids mwan3 to prevent a routing loop in its own software; solving it properly would require bidirectional cooperation between tailscale and mwan3.

I'm not going to make an already long and technically complex post even more complex by explaining that the main routing table is managed by netifd and netifd and mwan3's idea of "down" is not necessarily the same: netifd will change the routes on carrier loss or protocol-level signals, whereas mwan3 will change the routes when they go dead, that is to say, an icmp probe fails but there's no carrier loss (which is the far more common real-world case); so mwan3 can switch its route while netifd still considers the technically non-functional interface to be up.

Again, this isn't an mwan3 failure: it's a deliberate tailscale action, a hard necessity to prevent tailscale bypass marked packets from looping into tailscale0. The three-rule sequence (1310 main, 1330 default, 1350 unreachable) exists specifically to prevent that loop. It cannot be removed while keeping tailscale functional.

That said, I no longer think it's tailscale or these rules causing your problem. This is because the mark that matches these rules is applied programmatically at socket creation time, not via nftables. Every socket that tailscale opens for its own outbound control connections gets this mark and nothing else.

So only tailscale control traffic is being snarfed by these rules, but it also means that tailscale's control plane only functions if your main wan is still functionally up, and otherwise not.

As to why your dns, wireguard and some other stuff isn't functioning, the hypothesis falls back to that overly broad netmask /8 on your wireguard interface that will force anything with a 10.x.x.x address to bypass mwan3 too (assuming your forwarder is in this block).

If your dns is not going via one of these routes, then I still have not identified what's causing that particular failure. It's not immediately apparent. Perhaps you can give me some more details on your dns setup? What IP are you using for the forwarder?

Same here. After rebooting, running /etc/init.d/mwan3 restart makes things go back to how they should be.

OpenWRT 25.12.3 Linux 6.12.85
mwan3-3.5.3-1 luci-app-mwan3_26.999.3.5.3

mwan3 version 3.6 and luci-app-mwan3 version 3.6 up for testing

I keep saying no more functionality upgrades and I keep going back on that!

Version 3.6 adds three new features to mwan3 rules.

  • Rules now support an fwmark/fwmask option to match packets by meta mark using a masked comparison, working alongside or instead of address and ipset matching; mwan3 logs a warning if the fwmask overlaps its internal MMX_MASK since such a mask would match packets already carrying an mwan3 classification mark. Warning: you can break mwan3's own routing if you get this rule match wrong, so be sure the rule does what you're intending. Try the simulator...

  • The ip rule priority tiers for per-interface rules are now configurable via three new globals UCI options (iif_rule_base, fwmark_rule_base, unreachable_rule_base), shifting from the fixed 1000/2000/3000 defaults; two ordering constraints are enforced at startup and rule deletion is rewritten to use content-based matching so it remains correct across base or mmx_mask changes. Warning: don't change this unless you need to. You'll know if you need to.

  • Rules gain option enabled 0/1, consistent with interfaces, ipsets, and members.

The LuCI interface is updated throughout to reflect all three additions.

  • The rule modal gains an Fwmark field with MMX_MASK overlap validation

  • The Globals tab gains the three base priority fields with live cross-field ordering validation and an automatic mwan3 restart when any base changes.

  • The Routing Health tab gains the unreachable rule row per interface and marks an interface as degraded if the unreachable rule is absent; base priorities are now displayed dynamically from the rpcd endpoint rather than assumed from fixed offsets.

  • The Traffic Simulator, rule shadowing analysis, Status Overview, and diagnostics helper are all updated to handle fwmark matching, disabled rules, and configurable bases. Mutual enable protection is added to prevent the silent misconfiguration of an enabled rule referencing a disabled IP set. mwan3track now validates that libwrap_mwan3_sockopt.so is present at startup, exiting with a clear error rather than silently producing incorrect tracking results if the library is missing.

@4rtz1z this one is for you. It contains the rule-disabling functionality. Adding the disabled flag was easy, but this had a knock-on effect: other parts of mwan3 and luci-app-mwan3 now had to be aware of and skip the disabled rule as if it were not defined (mwan3 rule processing, traffic simulator, config checker), and new validity checking was needed to prevent inconsistencies.

This was a genuine gap, the addition of which does make it easier to manage rules.

@drdut the rest of this one is for you, even if that's not immediately apparent just looking at the functionality description.

It's to make your tailscale play nicely with mwan3 and bypass tailscale's bypass.

What I've done is to add a generally useful rule matching type (fwmark/fwmask) and configurable ip rule priority base to mwan3 and then fully implement a legacy mwan3 feature by using a previously unimplemented hook to make the rt_table_lookup option in the Globals section of the config properly functional by using mwan3rtmon to dynamically update routing changes from this table into the associated custom sets.

Together, these three things will allow you to bypass tailscale's bypass (!), make it play nicely with mwan3 and fail over properly, avoid tailscale's routing loop, correctly route tailnet traffic and eliminate the routing table metric hocus-pocusery that you're doing in mwan3.user.

Left in default configuration, mwan3 will behave exactly the same as it did before and tailscale will work the same with mwan3 as it did before.

However, applying these new features in conjunction with one another, you should be able to achieve what you've been looking to achieve without success for some time now. All using features of mwan3 that are generally useful to other users and not specific to tailscale (self-satisfied smile here :smile: )

You can find the detailed configuration instructions here as well as a patched mwan3rtmon that you will need to install.

I didn't add the mwan3rtmon patch to the 3.6 release as I'd like you to test that this setup works correctly first. I've confirmed the route propagation works fine, but it would be helpful to see that it functions properly in practice: just compare the routes in ip route list table 52 with the output of nft list set inet mwan3 mwan3_custom_v4 periodically. They should match and the routes should be more or less instantly propagated as they appear or disappear from table 52.
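
One way to do that comparison without eyeballing it is sketched below. The three helper names are invented, the /tmp file names are arbitrary, and the parsing assumes each route line starts with its prefix and that the set lists elements as plain addresses or CIDR prefixes. If nft prints merged ranges for an interval set, a literal comparison like this will report false differences.

```shell
#!/bin/sh
# Sketch only: compare table 52 routes with the elements of an nft set.

# routes_from_table: first field of each route line that starts with a digit
routes_from_table() {
    awk '$1 ~ /^[0-9]/ { print $1 }' | sort -u
}

# elements_from_set: every IPv4 address or prefix found in the set listing
elements_from_set() {
    grep -oE '[0-9]+(\.[0-9]+){3}(/[0-9]+)?' | sort -u
}

# missing_from_set ROUTES_FILE SET_FILE -> routes with no identical element
missing_from_set() {
    grep -Fxv -f "$2" "$1"
}

# Possible usage on the router:
#   ip -4 route list table 52 | routes_from_table > /tmp/t52.txt
#   nft list set inet mwan3 mwan3_custom_v4 | elements_from_set > /tmp/set.txt
#   missing_from_set /tmp/t52.txt /tmp/set.txt   # no output means in sync
```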

thanks, will test it right away