A.) mwan3 repo and releases
B.) luci-app-mwan3 repo and releases
I believe this port is now in a stable enough state to make some release apks available for testing by those who are interested. It has been running stably for me for some time now and is in daily use, although I've not been able to fully test the IPv6 functionality since I only have one IPv6-capable wan interface. I'm interested in any and all feedback on how it works (or doesn't work) for you.
The initial port took care to exactly replicate the logic and flow of the original mwan3, simply moving everything over from iptables to nftables.
This is a complex piece of software because of its interactions with the networking stack, and precisely mirroring the logic of mwan3 helps avoid regressions. The aim was a straight replacement that replicates the original but integrates with fw4 and nftables in a way that the old mwan3 doesn't.
Subsequent to the initial port and testing, I have completed the following enhancements:
- mwan3rtmon rewritten in ucode: the ucode-mod-rtnl module provides direct, structured netlink access, so we no longer have to fork out to ip commands and parse text output, and ucode-mod-uloop implements the event loop instead of a shell pipe+read. The first implementation faithfully mirrored the logic of mwan3rtmon.
- flush_conntrack mechanism: forces flows on the failed interface to immediately re-establish via the updated policy rather than waiting for a retransmit timeout, so it improves failover response time once mwan3track has detected the failure.
- Option track_gateway 1: causes mwan3track to automatically discover the next hop peer on a point-to-point link (such as pppoe) and use it as a tracking ip. The gateway IP is stored in $MWAN3TRACK_STATUS_DIR/GATEWAY rather than committed to uci, which prevents stale tracking ips across reboots or next hop peer changes. The option is silently ignored if no nexthop peer exists. This option is practically useful only for IPv4 (a configuration sketch follows this list).
- luci-app-mwan3 has also been updated so that it is aware of the changes above.
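As an illustration only (the interface name wan and the pppoe uplink are assumptions; flush_conntrack is the existing mwan3 list option), enabling the two features from the command line could look like this:

# sketch: let mwan3track discover and ping the pppoe next hop peer
uci set mwan3.wan.track_gateway='1'
# flush conntrack entries on failover-related events so existing flows
# re-establish via the updated policy straight away
uci add_list mwan3.wan.flush_conntrack='ifdown'
uci add_list mwan3.wan.flush_conntrack='disconnected'
uci commit mwan3
mwan3 restart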
This port is a close match to the original mwan3, with only mwan3rtmon having undergone a substantial rewrite and logic-flow changes. The precedence of some of the hotplug handlers also needed to be altered. The port is 100% backward compatible with the current mwan3.
The repos with releases in them can be found below. You'll need both mwan3 and luci-app-mwan3.
Great work, thank you, I've been wishing for a mwan3 update to nft for a while.
Unfortunately I won't be able to test, as I'm travelling long term and am 10,000 km from where my mwan3 routers are, so I need tried and tested failover and can't pull cables or risk disabling interfaces. But I'm very glad that you've taken this initiative!
VERY good reason not to test indeed! Anyone else who does install and test this, please reply here even if it’s simply to say “it works”.
It's working in my setup, and mwan3 doesn't spit out any errors in the log:
Sat Apr 4 17:05:10 2026 user.notice mwan3track[23756]: Stopping mwan3track for interface "wan". Status was "online"
Sat Apr 4 17:05:10 2026 user.notice mwan3track[23757]: Stopping mwan3track for interface "wan2". Status was "online"
Sat Apr 4 17:05:10 2026 user.notice mwan3-hotplug[25329]: Execute ifup event on interface wan (wan)
Sat Apr 4 17:05:10 2026 user.notice mwan3-hotplug[25335]: Execute ifup event on interface wan2 (lan2)
Sat Apr 4 17:05:11 2026 user.notice mwan3track[25582]: Interface wan2 (lan2) is online
Sat Apr 4 17:05:11 2026 user.notice mwan3track[25581]: Interface wan (wan) is online
Thanks, good to know. Let me know if all the functionality works for you too: failover, nftset integration with dnsmasq, rule-based outbound route selection, etc.
What target are you using?
Compiled for x86-64 and tested with VRRP on two nodes. So far, so good.
I’ll continue testing later under more demanding conditions, such as failovers and link flaps. I’ll let you know if anything shows up.
It doesn't spit out errors, but the policies don't really seem to be applied, and load balancing isn't happening in my setup, yet I don't have this problem with the OG mwan3.
Interface status:
interface wan is online and tracking is active (online 00h:00m:09s, uptime 05h:47m:51s)
interface wan2 is online and tracking is active (online 00h:00m:09s, uptime 05h:47m:47s)
Current policies:
wan_only:
wan
balanced:
wan (66%)
Directly connected ipv4 networks:
10.0.0.0/24
10.5.0.0/16
192.168.1.0/24
192.168.10.0/24
224.0.0.0/3
Directly connected ipv6 networks:
fe80::
Active user rules:
tcp dport 443 meta mark & 0x00003f00 == 0x00000000 S https
ip daddr 0.0.0.0/0 meta mark & 0x00003f00 == 0x00000000 - balanced
Sun Apr 5 17:35:21 2026 user.notice mwan3-fw-rebuild[2374]: Rebuilding mwan3 rules after fw4 reload
Sun Apr 5 17:36:30 2026 user.notice mwan3track[12325]: Stopping mwan3track for interface "wan". Status was "online"
Sun Apr 5 17:36:30 2026 user.notice mwan3track[12326]: Stopping mwan3track for interface "wan2". Status was "disabled"
Sun Apr 5 17:39:07 2026 user.notice mwan3-hotplug[29032]: Execute ifup event on interface wan (wan)
Sun Apr 5 17:39:07 2026 user.notice mwan3-hotplug[29038]: Execute ifup event on interface wan2 (lan2)
Sun Apr 5 17:39:08 2026 user.notice mwan3track[29268]: Interface wan2 (lan2) is online
Sun Apr 5 17:39:08 2026 user.notice mwan3track[29267]: Interface wan (wan) is online
Sun Apr 5 17:39:08 2026 user.warn mwan3rtmon[29269]: failed to add 10.5.0.0/16 dev wgsg proto kernel scope link src 10.5.0.3 to table 1 - error: RTNETLINK answers: File exists
Sun Apr 5 17:39:08 2026 user.warn mwan3rtmon[29269]: failed to add 10.5.0.0/16 dev wgsg proto kernel scope link src 10.5.0.3 to table 2 - error: RTNETLINK answers: File exists
Sun Apr 5 21:14:31 2026 user.info mwan3track[29268]: Check (ping) failed for target "8.8.8.8" on interface wan2 (lan2). Current score: 10
Sun Apr 5 23:24:57 2026 user.notice mwan3track[29267]: Stopping mwan3track for interface "wan". Status was "online"
Sun Apr 5 23:24:57 2026 user.notice mwan3track[29268]: Stopping mwan3track for interface "wan2". Status was "online"
Sun Apr 5 23:25:11 2026 user.notice mwan3-hotplug[22953]: Execute ifup event on interface wan (wan)
Sun Apr 5 23:25:11 2026 user.notice mwan3-hotplug[22959]: Execute ifup event on interface wan2 (lan2)
Sun Apr 5 23:25:11 2026 user.notice mwan3track[23194]: Interface wan2 (lan2) is online
Sun Apr 5 23:25:11 2026 user.notice mwan3track[23193]: Interface wan (wan) is online
root@ax6000:~# cat /etc/config/mwan3
config globals 'globals'
option mmx_mask '0x3F00'
config interface 'wan'
option enabled '1'
option family 'ipv4'
option reliability '1'
option initial_state 'online'
option track_method 'ping'
option count '1'
option size '56'
option max_ttl '60'
option timeout '4'
option interval '10'
option failure_interval '5'
option recovery_interval '5'
option down '5'
option up '5'
list track_ip '1.1.1.1'
list track_ip '1.0.0.1'
config member 'wan_m1_w10'
option interface 'wan'
option weight '2'
option metric '1'
config policy 'wan_only'
list use_member 'wan_m1_w10'
option last_resort 'unreachable'
config policy 'balanced'
option last_resort 'unreachable'
list use_member 'wan2_m1'
list use_member 'wan_m1_w10'
config rule 'https'
option sticky '1'
option dest_port '443'
option proto 'tcp'
option use_policy 'balanced'
option family 'ipv4'
config rule 'default_rule_v4'
option dest_ip '0.0.0.0/0'
option use_policy 'balanced'
option family 'ipv4'
option proto 'all'
option sticky '0'
config interface 'wan2'
option initial_state 'online'
option family 'ipv4'
list track_ip '8.8.8.8'
list track_ip '9.9.9.9'
option track_method 'ping'
option reliability '1'
option count '1'
option size '56'
option max_ttl '60'
option timeout '4'
option interval '10'
option failure_interval '5'
option recovery_interval '5'
option down '5'
option up '5'
option enabled '1'
config member 'wan2_m1'
option interface 'wan2'
option weight '1'
option metric '1'
Can you provide the steps to reproduce this? How exactly are you testing load balancing?
Your config looks ok. The policies and rules are also all there, shown in the mwan3 status output.
wan2 missing from the balanced policy display
Your mwan3 status output shows balanced: wan (66%) with wan2 absent despite being online.
This is a display bug in mwan3 status. wan2 has weight 1, so its numgen map entry is 2-2. When nft list chain outputs this it normalises 2-2 to just 2, and the stats parser requires the start-end dash format, so wan2 is simply dropped from the mwan3 status output. The nft rule itself is correct, and load balancing is working.
So there's a bug in the display code which I'll fix, but it doesn’t actually affect the functionality.
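To illustrate the normalisation (the rule below is a simplified sketch, not the exact chain the port generates, and the policy chain names are made up):

# a weight 2 (wan) / weight 1 (wan2) split over three numgen buckets is written as
#   numgen random mod 3 vmap { 0-1 : jump policy_wan, 2-2 : jump policy_wan2 }
# but nft list chain prints the single-element range back without the dash
#   numgen random mod 3 vmap { 0-1 : jump policy_wan, 2 : jump policy_wan2 }
# and that dash-less form is what the current stats parser fails to match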
Load balancing not happening — likely because of a sticky routing rule
Your https rule has sticky '1'. All https traffic from a given source is pinned to whichever wan it first went out on. If you're testing by browsing then you'll always see the same wan. Test with non-https traffic to observe the load balancing (browsers also don’t always close the connection after loading a web page). Connections need to be seen as completely separate in order to be load balanced.
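For example, something like this from a LAN client generates a stream of separate non-https connections (assuming ifconfig.me also answers on plain http; any "what is my IP" service will do):

# each curl run is a brand new connection on port 80, so it misses the sticky
# https rule and is free to land on either wan
for i in $(seq 1 10); do curl -4 -s http://ifconfig.me; echo; done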
*mwan3rtmon "File exists" warnings — benign, but it looks like you have a corrupted install?
The RTNETLINK answers: File exists warnings mean the wireguard route 10.5.0.0/16
was already in the mwan3 routing tables when mwan3rtmon tried to add it — most likely
because both interfaces came up simultaneously and triggered two population passes.
The route is there and correct; the warnings are harmless noise.
More importantly, that error message format is from the original shell-script
mwan3rtmon, not the new ucode version. The ucode version uses replace
semantics and would never log this. Please check:
head -1 /usr/sbin/mwan3rtmon
It should say #!/usr/bin/env ucode. If it says #!/bin/sh you have the wrong version
installed — try reinstalling the package.
You can test with speedtest-cli; if load balancing works it will combine your WANs' speed.
root@ax6000:~# head -1 /usr/sbin/mwan3rtmon
#!/usr/bin/env ucode
root@ax6000:~#
old mwan3 (wan 1 Gbit/s + 5G WWAN): [speedtest screenshot]
new mwan3 (looks like only 1 Gbit/s active): [speedtest screenshot]
Looks like something is really broken with LB.
Also, not sure if this is related:
[ 556.862521] ucode[23889]: segfault at 10 ip 00007fdc80896095 sp 00007ffd8c3ecdc0 error 4 in rtnl.so[2095,7fdc80896000+7000] likely on CPU 0 (core 0, socket 0)
[ 556.862580] ucode[23890]: segfault at 10 ip 00007f021cba5095 sp 00007ffc25c4a420 error 4
[ 556.868651] Code: 0f 01 00 c3 31 c0 48 85 ff 74 0e 0f b7 07 83 e8 04 48 98 48 39 f0 0f 9d c0 c3 c7 06 02 00 00 00 b8 02 00 00 00 c3 53 48 89 fb <48> 8b 77 10 31 d2 48 8b 3d 0e 10 01 00 ff 15 68 0e 01 00 48 8b 43
[ 556.873617] in rtnl.so[2095,7f021cba5000+7000] likely on CPU 3 (core 3, socket 0)
[ 556.901201] Code: 0f 01 00 c3 31 c0 48 85 ff 74 0e 0f b7 07 83 e8 04 48 98 48 39 f0 0f 9d c0 c3 c7 06 02 00 00 00 b8 02 00 00 00 c3 53 48 89 fb <48> 8b 77 10 31 d2 48 8b 3d 0e 10 01 00 ff 15 68 0e 01 00 48 8b 43
Looking at the ucode log_msg function:
function log_msg(level, msg) {
	warn(`mwan3rtmon[${family_name}] ${level}: ${msg}\n`);
}
The only error call is:
log_msg("warn", "failed to add route to table " + tid + ": " + err);
So the ucode version would produce a log message like this:
mwan3rtmon[29269]: mwan3rtmon[ipv4] warn: failed to add route to table 1: RTNETLINK answers: File exists
Your log shows:
mwan3rtmon[29269]: failed to add 10.5.0.0/16 dev wgsg proto kernel scope link src 10.5.0.3 to table 1 - error: RTNETLINK answers: File exists
Your message is missing the mwan3rtmon[ipv4] warn: prefix in the message body and uses a "- error:" separator instead of ":". There is no code path in the ucode mwan3rtmon that could produce the message in your logs, so it comes from the original mwan3rtmon code.
If you installed the package while mwan3 was already running, the new mwan3rtmon binary would be on disk but the old shell process would still be running until mwan3 was restarted.
However, looking at the log timestamps, the errors appear at 17:39:08, following fresh ifup events. So mwan3rtmon was restarted (procd would have started a new instance on interface up), and if the old shell version was still on disk at that point, that is what procd would have launched when it respawned mwan3rtmon.
Some possible explanations: you installed the package but the install didn't overwrite /usr/sbin/mwan3rtmon, you manually replaced some files but not mwan3rtmon, or part of the logs come from an earlier session when the shell version was still installed.
Looking at the timestamps, there is a gap:
17:39:08 — mwan3rtmon errors (possibly old install)
...
23:24:57 — next restart
23:25:11 — fresh interfaces up (no mwan3rtmon errors this time)
The absence of the same errors at 23:25:11 after the second restart supports this — if the ucode version was already installed by then it would behave correctly.
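If you want to check which version is actually running right now (as opposed to what is on disk), the process list will show it, since the old mwan3rtmon runs as a shell script and the new one runs under the ucode interpreter:

# the bracketed pattern stops grep from matching itself
ps w | grep '[m]wan3rtmon'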
Can you please do a clean reinstall from a fresh download of the apk?
Do you have the same dmesg error as @woffko ?
Two ucode processes are segfaulting inside rtnl.so at the same offset within the library. This may be two mwan3rtmon instances.
The segfault is in rtnl.so which, if those processes are mwan3rtmon, means mwan3rtmon is crashing during route operations, so the policies are probably not being applied.
If mwan3rtmon crashes on startup, it never completes populating the per-interface routing tables, so even though nftables marks packets, there are no routing table entries to route those marks and traffic will fall back to the default route through a single wan.
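A quick way to check whether the per-interface tables actually got populated (table numbers 1 and 2 are taken from the mwan3rtmon log messages above; adjust if your setup uses different ones):

# the mwan3 ip rules should direct marked traffic at per-interface tables
ip rule show
# each table should hold that interface's default route and the connected routes;
# an empty table means mwan3rtmon never finished populating it
ip route show table 1
ip route show table 2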
From the raw dump you posted, we can see that the crash is in rtnl.so in two separate processes with different load addresses, and the Code: dump is identical, so this is reproducible, which is good. It's either a bug in the rtnl ucode module itself or mwan3rtmon is passing something to rtnl.request() that triggers a null dereference in the library.
What version of OpenWrt are you running? And what version of ucode-mod-rtnl?
Try this first
mwan3 stop
/usr/sbin/mwan3rtmon ipv4 &
echo $!
If it is going to segfault, mwan3rtmon will crash immediately (during populate_iface_routes or populate_connected_set at startup) and you'll see the segfault in dmesg with the same PID printed by the echo $! command.
Running it like this also means any output goes directly to the terminal rather than syslog, which may give more context before the crash.
Run dmesg | tail -20 immediately after to capture kernel log details too.
If you determine that it is mwan3rtmon crashing, then this test script will help to determine why.
Save it to rtnl-test.uc and run it on the router with ucode ./rtnl-test.uc and send me the output (by private message if you wish as it prints IPs).
The script mirrors exactly what mwan3rtmon does at startup using a harmless test table and it cleans up after itself (well, unless it segfaults that is!). So if it segfaults, we should be able to determine where and what the inputs were and hopefully reproduce it.
#!/usr/bin/env ucode
'use strict';
import * as rtnl from "rtnl";
const RTM_GETROUTE = rtnl.const.RTM_GETROUTE;
const RTM_NEWROUTE = rtnl.const.RTM_NEWROUTE;
const RTM_DELROUTE = rtnl.const.RTM_DELROUTE;
const NLM_F_DUMP = rtnl.const.NLM_F_DUMP;
const NLM_F_CREATE = rtnl.const.NLM_F_CREATE;
const NLM_F_REPLACE = rtnl.const.NLM_F_REPLACE;
const RT_TABLE_MAIN = rtnl.const.RT_TABLE_MAIN;
const AF_INET = rtnl.const.AF_INET;
// Test table - high number unlikely to conflict with anything real.
// If script segfaults before cleanup, run: ip route flush table 9999
const TEST_TABLE = 9999;
// ---------------------------------------------------------------------------
warn("rtnl-test: step 1 - import rtnl OK\n");
warn("rtnl-test: RTM_GETROUTE=" + RTM_GETROUTE +
" NLM_F_DUMP=" + NLM_F_DUMP +
" AF_INET=" + AF_INET + "\n");
// ---------------------------------------------------------------------------
// Step 2: route dump
warn("rtnl-test: step 2 - calling rtnl.request(RTM_GETROUTE, NLM_F_DUMP)...\n");
let routes = rtnl.request(RTM_GETROUTE, NLM_F_DUMP, { family: AF_INET });
warn("rtnl-test: step 2 - rtnl.request returned\n");
let err = rtnl.error();
if (err) {
warn("rtnl-test: step 2 FAILED: " + err + "\n");
exit(1);
}
warn("rtnl-test: step 2 OK - got " + length(routes) + " routes\n");
// ---------------------------------------------------------------------------
// Step 3: iterate returned route objects and access each field individually
// so we can pinpoint any crash during field access
warn("rtnl-test: step 3 - iterating routes and accessing fields...\n");
let main_routes = [];
let i = 0;
for (let r in routes) {
i++;
warn(" [" + i + "] accessing .table...\n"); let tbl = r.table;
warn(" [" + i + "] accessing .dst...\n"); let dst = r.dst;
warn(" [" + i + "] accessing .oif...\n"); let oif = r.oif;
warn(" [" + i + "] accessing .gateway...\n"); let gw = r.gateway;
warn(" [" + i + "] accessing .prefsrc...\n"); let src = r.prefsrc;
warn(" [" + i + "] accessing .protocol...\n"); let proto = r.protocol;
warn(" [" + i + "] accessing .scope...\n"); let scope = r.scope;
warn(" [" + i + "] accessing .priority...\n"); let prio = r.priority;
warn(" [" + i + "] accessing .type...\n"); let type = r.type;
warn(" [" + i + "] accessing .tos...\n"); let tos = r.tos;
warn(" [" + i + "] accessing .metrics...\n"); let metr = r.metrics;
warn(" [" + i + "] all fields OK: table=" + tbl +
" dst=" + (dst ?? "default") + " dev=" + (oif ?? "none") +
" gw=" + (gw ?? "none") + " src=" + (src ?? "none") +
" proto=" + (proto ?? "none") + " scope=" + (scope ?? "none") + "\n");
if (tbl == RT_TABLE_MAIN)
push(main_routes, r);
}
warn("rtnl-test: step 3 OK - " + length(main_routes) + " main table routes\n");
// ---------------------------------------------------------------------------
// Step 4: copy each main table route to test table with NLM_F_CREATE|NLM_F_REPLACE
// (same flags mwan3rtmon uses). Pre-flush first so old crash leftovers are gone.
warn("rtnl-test: step 4 - flushing test table " + TEST_TABLE + "...\n");
system("ip route flush table " + TEST_TABLE);
warn("rtnl-test: step 4 - copying routes to test table " + TEST_TABLE + "...\n");
let n = 0;
for (let r in main_routes) {
n++;
let tr = { family: AF_INET, table: TEST_TABLE };
for (let f in ["dst", "gateway", "oif", "prefsrc", "priority",
"scope", "type", "tos", "metrics"]) {
if (r[f] != null)
tr[f] = r[f];
}
warn(" [" + n + "] calling rtnl.request(RTM_NEWROUTE): " + sprintf("%J", tr) + "\n");
rtnl.request(RTM_NEWROUTE, NLM_F_CREATE | NLM_F_REPLACE, tr);
warn(" [" + n + "] rtnl.request returned\n");
err = rtnl.error();
if (err)
warn(" [" + n + "] error (may be harmless): " + err + "\n");
else
warn(" [" + n + "] OK\n");
}
warn("rtnl-test: step 4 done\n");
// ---------------------------------------------------------------------------
// Step 5: clean up test table routes
warn("rtnl-test: step 5 - removing test table routes...\n");
n = 0;
for (let r in main_routes) {
n++;
let tr = { family: AF_INET, table: TEST_TABLE };
for (let f in ["dst", "gateway", "oif", "prefsrc", "priority",
"scope", "type", "tos", "metrics"]) {
if (r[f] != null)
tr[f] = r[f];
}
warn(" [" + n + "] calling rtnl.request(RTM_DELROUTE): " + sprintf("%J", tr) + "\n");
rtnl.request(RTM_DELROUTE, 0, tr);
warn(" [" + n + "] rtnl.request returned\n");
}
warn("rtnl-test: step 5 done\n");
// ---------------------------------------------------------------------------
warn("rtnl-test: all steps completed without segfault\n");
Ok, I reproduced the same segfault as you.
It comes from a reproducible bug in ucode-mod-rtnl. The listener.close() code in ucode-mod-rtnl nulls a pointer, and then when the listener variable itself goes out of scope, the destructor attempts to access members of the structure that close() has already nulled, which causes the segfault.
Fortunately there's an easy workaround: just don't call listener.close() and let the destructor cleanup.
This segfault would not have had any impact on the functionality of mwan3rtmon as it only triggered once an mwan3rtmon process was ending and it had nothing useful to do anymore. procd then automatically respawned a new mwan3rtmon.
During normal operation between restarts, the crash path was never reached. Route monitoring, nft set updates, and routing table management all work correctly. The only real consequence was a segfault kernel log entry on every clean shutdown.
As to the load balancing not working, try the following (the IP address is for ifconfig.me, which returns the IP of the caller; just make sure ifconfig.me resolves to the same IP for you).
Make sure you have a load balancing policy
config member 'wan_m1_w1'
option interface 'wan'
option metric '1'
option weight '1'
config member 'wan2_m1_w1'
option interface 'wan2'
option metric '1'
option weight '1'
config policy 'loadbalance'
list use_member 'wan_m1_w1'
list use_member 'wan2_m1_w1'
option last_resort 'unreachable'
Then add this as the top rule
config rule 'lb_test'
option proto 'all'
option dest_ip '34.160.111.145'
option sticky '0'
option use_policy 'loadbalance'
With this in place on my config, the following executed from a lan client gives the result below (IP addresses for my two wan interfaces are sanitised to "A.A.A.A" and "B.B.B.B"). Perfect round robin load balancing.
user@host:~$ while true; do echo $(curl https://ifconfig.me 2> /dev/null); done
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
A.A.A.A
B.B.B.B
Could you please do a simple test like this first to establish definitively whether the load balancing is working for you or not? I'm not sure what speedtest-cli is doing, so just eliminate it from the testing for the moment.
I will patch the mwan3rtmon code to remove the call to listener.close() that causes the segfault.
I need to find some time to work on this. Hopefully, I can look into it this evening.
This fixes a number of pre-existing issues in mwan3track that were leading to excessive and spurious log messages indicating connectivity failure where there is no failure. These fixes will likely result in a vast reduction in the amount of log message spam and ensure that only real connectivity issues are logged.
Spurious "disconnecting" state on interface reconnection: when a WAN interface came back up after being down, mwan3track would fire one round of health checks using stale interface state (wrong or empty device name) before refreshing its configuration. This caused a false "disconnecting" event to be logged and briefly propagated, even though the interface was healthy. The fix ensures interface state is refreshed before any probing begins.
Raise the disconnecting threshold and log the recovery: the disconnecting state fired immediately on the first score drop (any single ping failure), which meant that historically even a transient ping failure to a public DNS would cause log entries saying the interface is disconnecting. The threshold is now ceil(down/3) failures below the maximum score, so single or double transient losses no longer trigger the warning; for example, with option down '5' it takes ceil(5/3) = 2 failed checks before the warning fires. It also now logs explicitly when the score recovers out of the disconnecting state.
Misleading ping failure log entries when multiple tracking IPs are configured: if you configure multiple track_ip addresses with reliability 1, a failure on the first IP was always logged as "Check failed" even when a subsequent IP responded successfully and the interface was perfectly healthy. On a 3-tracking-ip setup, this meant that ~95% of all logged ping failures were noise, making it very difficult to distinguish real connectivity problems from normal operation. The fix suppresses per-host failure messages when the reliability threshold is ultimately met; a failure is only logged when the round as a whole fails (i.e. fewer track IPs responded than the configured reliability requires).
Ping process killed without explanation when source IP becomes invalid: the socket wrapper library (libwrap_mwan3_sockopt) that forces mwan3track's health-check pings to use the correct source address and interface was calling exit() if it couldn't bind to the configured source IP or interface — for example after a DHCP address change that doesn't trigger a full interface reconnect cycle. This silently terminated the ping process with no log message, making it appear as an unexplained connectivity failure with no diagnostic trail. The fix replaces the exit() calls with proper error returns, so the ping fails cleanly and mwan3track logs it in the normal way.
Fixed ghost mwan3track processes after procd respawn: if procd respawned mwan3track without cleanly terminating the previous instance, both processes ran concurrently for the same interface. The ghost instance didn't receive signals from procd_send_signal, so its internal health score diverged from reality and eventually fired a spurious ACTION=disconnecting hotplug event even though the interface was fine.
Optimised netlink lookups: mwan3rtmon was issuing a full kernel route table dump (RTM_GETROUTE) on every route-delete event to verify whether an ECMP path still existed before updating per-interface routing tables. Replaced with an in-memory main_route_cache that is built at startup and maintained incrementally, making the check an O(1) lookup with no kernel round-trip.
Fix the segfault issue identified by @woffko, which was due to a bug in ucode-mod-rtnl
Do get the new version first though. Segfault error fixed. See above description.
using latest version mwan3-3.1.2-1
on my device it's not A B A B A B in order,
it's more A A B B B B B A B B B A B B B A B
A is ip wan
B is ip wan2
But speedtest, either via CLI or web, is what users always do to test load balancing in the iptables mwan3; that's how we know it combined the WANs.
Also, using it together with pbr broke pbr. The old mwan3 can work together with pbr (with ip rule priority 900, smaller than the mwan3 default). By broke I mean pbr starts, but nothing is routed to what we assign.
I don’t know… I’ve been messing with the settings a bit, and I got load balancing to work. Usually, I don’t use load balancing, only a backup configuration.
config globals 'globals'
option mmx_mask '0x3F00'
option rtmon_interval '5'
option logging '1'
option loglevel 'debug'
config interface 'wan'
option enabled '1'
option family 'ipv4'
option initial_state 'online'
option track_method 'ping'
option reliability '2'
option count '1'
option size '56'
option max_ttl '60'
option timeout '2'
option interval '5'
option failure_interval '5'
option recovery_interval '5'
option down '3'
option up '3'
list track_ip '1.1.1.1'
list track_ip '9.9.9.9'
list track_ip '208.67.222.222'
option check_quality '1'
option failure_latency '1000'
option failure_loss '40'
option recovery_latency '500'
option recovery_loss '10'
config interface 'wanb'
option enabled '1'
option family 'ipv4'
option initial_state 'online'
option track_method 'ping'
option reliability '2'
option count '1'
option size '56'
option max_ttl '60'
option timeout '2'
option interval '5'
option failure_interval '5'
option recovery_interval '5'
option down '3'
option up '3'
list track_ip '1.1.1.1'
list track_ip '9.9.9.9'
list track_ip '208.67.222.222'
option check_quality '1'
option failure_latency '1000'
option failure_loss '40'
option recovery_latency '500'
option recovery_loss '10'
config interface 'wan_6'
option enabled '1'
option family 'ipv6'
option initial_state 'online'
option track_method 'ping'
option reliability '2'
option count '1'
option size '56'
option timeout '2'
option interval '5'
option failure_interval '5'
option recovery_interval '5'
option down '3'
option up '3'
list track_ip '2606:4700:4700::1111'
list track_ip '2620:fe::fe'
list track_ip '2620:fe::9'
option max_ttl '60'
option check_quality '1'
option failure_latency '1000'
option failure_loss '40'
option recovery_latency '500'
option recovery_loss '10'
list flush_conntrack 'ifdown'
list flush_conntrack 'ifup'
list flush_conntrack 'connected'
list flush_conntrack 'disconnected'
config interface 'wanb_6'
option enabled '1'
option family 'ipv6'
option initial_state 'online'
option track_method 'ping'
option reliability '2'
option count '1'
option size '56'
option timeout '2'
option interval '5'
option failure_interval '5'
option recovery_interval '5'
option down '3'
option up '3'
list track_ip '2606:4700:4700::1111'
list track_ip '2620:fe::fe'
list track_ip '2620:fe::9'
option max_ttl '60'
option check_quality '1'
option failure_latency '1000'
option failure_loss '40'
option recovery_latency '500'
option recovery_loss '10'
list flush_conntrack 'ifup'
list flush_conntrack 'ifdown'
list flush_conntrack 'connected'
list flush_conntrack 'disconnected'
config member 'wan_m1_w1'
option interface 'wan'
option metric '1'
option weight '1'
config member 'wan_m2_w1'
option interface 'wan'
option metric '2'
option weight '1'
config member 'wanb_m1_w1'
option interface 'wanb'
option metric '1'
option weight '1'
config member 'wanb_m2_w1'
option interface 'wanb'
option metric '1'
option weight '1'
config member 'wan6_m1_w1'
option interface 'wan_6'
option metric '1'
option weight '1'
config member 'wan6_m2_w1'
option interface 'wan_6'
option metric '2'
option weight '1'
config member 'wanb6_m1_w1'
option interface 'wanb_6'
option metric '1'
option weight '1'
config member 'wanb6_m2_w1'
option interface 'wanb_6'
option metric '2'
option weight '1'
config policy 'wan_only'
list use_member 'wan_m1_w1'
list use_member 'wan6_m1_w1'
config policy 'wanb_only'
list use_member 'wanb_m1_w1'
list use_member 'wanb6_m1_w1'
config policy 'wan_wanb'
list use_member 'wan_m1_w1'
list use_member 'wanb_m2_w1'
option last_resort 'unreachable'
config policy 'wanb_wan'
list use_member 'wanb_m1_w1'
list use_member 'wan_m2_w1'
option last_resort 'unreachable'
config rule 'default_rule_v4'
option dest_ip '0.0.0.0/0'
option family 'ipv4'
option use_policy 'wan_wanb'
config rule 'default_rule_v6'
option dest_ip '::/0'
option family 'ipv6'
option use_policy 'wan_6_wanb_6'
option proto 'all'
option sticky '0'
config policy 'wan_6_wanb_6'
option last_resort 'unreachable'
list use_member 'wan6_m1_w1'
list use_member 'wanb6_m2_w1'
Note that I temporarily changed the metric option to 1 on wanb_m2_w1, just to avoid touching the default policy. The other change I made is that I split the policy for IPv6 and IPv4; previously, they were combined.
At the same time, I’m seeing something strange on the status page, but the speeds are clearly being combined, and the interface counters are increasing accordingly.
wan is PPPoE with public IPv4 and IPv6 addresses, while wanb is double NAT for IPv4 and public IPv6.