Multiwan with mwan3 in 22.03.5 - not working properly

Hi,

I'm trying to set up multiwan, using mwan3 in 22.03.5 running under the ARM architecture.

My interfaces are as follows:
eth0 - 1Gbps (WAN_Failover)
eth1 - 2.5Gbps (WAN)
eth2 - 2.5Gbps (LAN)

I've installed luci-app-mwan3, which automatically installed its dependencies, including mwan3. The versions are:

opkg list-installed *mwan3*
luci-app-mwan3 - git-23.093.40772-fa4dc75
mwan3 - 2.11.7-1

I've configured mwan3, using LuCI per instructions:

  1. Assigned metric 10 to interface WAN and metric 20 to interface WAN_FAILOVER in LuCI > Network > Interfaces

  2. In Network > MultiWAN Manager > Interface, added two interfaces (WAN, WAN_FAILOVER). The names match the interface names under the Interfaces tab in LuCI and in /etc/config/network. Metric 10 was automatically assigned to WAN and metric 20 to WAN_FAILOVER. The tracking on interface WAN is set as follows:

*Tracking hostname or IP address: 8.8.8.8 8.8.4.4
*Tracking method: ping
*Tracking reliability: 1
*Ping count: 1
*Ping size: 56
*Max TTL: 60
*Check link quality (unselected for now)
*Ping timeout: 1 second
*Ping interval: 5 seconds
*Failure interval: 5 seconds
*Keep failure interval (unselected)
*Interface down: 2
*Interface up: 3 (so short for testing purposes only)
*Flush conntrack table: ifup (netifd); ifdown (netifd); connected (mwan3); disconnected (mwan3)

  3. In Network > MultiWAN Manager > Member, I created two members: WAN_member (interface WAN; metric 1; weight 1) and WANFAILOVER_member (interface WAN_FAILOVER; metric 2; weight 2).

  4. In Network > MultiWAN Manager > Policy, I created a policy named WAN_to_WANFAILOVER, added members WAN and WAN_FAILOVER to the policy, and set Last resort to unreachable (reject).

  5. In Network > MultiWAN Manager > Rule, I modified the default_rule_v4 by assigning the policy WAN_to_WANFAILOVER to it.

I have two modems connected to the OpenWRT router:
WAN_Failover: T-Mobile 5G Small Business Internet
WAN: Xfinity Cable modem

  1. Things that do work:
    When I unplug the Xfinity modem from interface eth1 (WAN), the OpenWRT router starts routing internet-bound traffic out of interface eth0 (WAN_FAILOVER) with minimal packet loss (usually less than a second's worth of pings).

  2. Things that don't work:
    When I unplug the coax cable from the Xfinity cable modem (without downing interface eth1 (WAN) in the router running OpenWRT), mwan3 notices that interface WAN goes down and shows so in Status > MultiWAN Manager. However, the internet-bound traffic doesn't get re-routed out of interface eth0 (WAN_FAILOVER). Instead, the traffic continues to be routed out of interface eth1 (WAN) to the cable modem and gets black-holed there.

The real-world failure scenario for my primary Internet connection (Xfinity Cable Internet) is that the cable modem loses its signal to the CMTS and the ping test to 8.8.8.8 fails, while the physical interface eth1 (WAN) on the OpenWRT router stays up. With this setup, that scenario is not handled: the Internet-bound traffic is not re-routed out of the secondary Internet connection (eth0; WAN_FAILOVER), even though the mwan3 script detects that connectivity to 8.8.8.8 via the primary WAN interface eth1 has failed.

I'm not actually sure how the mwan3 script is supposed to make the default route with the lower metric in the routing table not be used. I don't know if the mwan3 script is supposed to delete the default route with the lower metric from the routing table or mark it inactive in some way to prevent it from being used for the Internet-bound traffic. Whatever the mwan3 script is supposed to do to prevent the default route with metric 10 from being used when the network reachability is lost via the primary WAN interface isn't working. Only the downing of interface eth1 (WAN) by unplugging the Ethernet cable from it results in the failover of the Internet-bound traffic via interface eth0 (WAN_FAILOVER).

When I physically unplug the Ethernet cable from interface eth1 (WAN), the default route via this interface installed in the routing table is automatically removed. I can see that the default route via interface eth1 (WAN) is removed if I issue the ip route command before I unplug the network cable and again after I unplug the network cable from interface eth1 (WAN). However, this behavior of the routing table is not a function of the mwan3 package but rather the standard Linux behavior in that when an interface physically goes down, all the routes via this interface are removed from the routing table.

So, what I'm after is to get mwan3 to fail over the Internet-bound traffic from interface eth1 (WAN) to interface eth0 (WAN_FAILOVER) when the cable modem loses its signal to the CMTS but interface eth1 (WAN) remains up.

I need help please. Thank you.

Is that the real name of your policy?

Check the logs for errors. When you run mwan3 status you should see the created policies and rules listed.

The name WAN_to_WANFAILOVER is the real name of the policy that I configured. Where did you see the limitation of 15 characters for the policy name?

https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3#policy_configuration

Should I rename the policy to reduce its name to fewer than 15 characters?

Definitely. I'm pretty sure that if you run mwan3 policies; mwan3 rules, the policy and rule will not exist.
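As a quick way to check a name before typing it into LuCI, here is a minimal sketch. It assumes the limit is 15 characters inclusive, based on the limit mentioned in this thread and the linked wiki page; `policy_name_ok` and `MAX_LEN` are made-up names for illustration, not part of mwan3.

```shell
#!/bin/sh
# Sketch only: check candidate mwan3 policy names against the
# 15-character limit discussed in this thread (assumed inclusive).
MAX_LEN=15

# Succeeds (exit status 0) when the name fits within MAX_LEN characters.
policy_name_ok() {
    [ "${#1}" -le "$MAX_LEN" ]
}

for name in WAN_to_WANFAILOVER WanFailover; do
    if policy_name_ok "$name"; then
        echo "$name: ok (${#name} chars)"
    else
        echo "$name: too long (${#name} chars)"
    fi
done
```

If a name is too long, the system log (for example `logread | grep mwan3`) is the place to look for a warning about it.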

OK, so before I rename the mwan3 policy to fit the 15-character limit, this is what the mwan3 status command output shows under Current ipv4 policies:

Current ipv4 policies:
# Warning: iptables-legacy tables present, use iptables-legacy to see them

Now I've removed the mwan3 policy with the name WAN_to_WANFAILOVER and created an identical policy with the name WanFailover, which conforms to the policy-name length of fewer than 15 characters. I've also re-connected the primary WAN interface (eth1), which is my Xfinity Internet cable modem.

Current ipv4 policies:
# Warning: iptables-legacy tables present, use iptables-legacy to see them
WanFailover:
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them
 WAN (100%)

Current ipv6 policies:
WanFailover:
 unreachable
balanced:

These are the mwan3 rules that are now displayed at the end of the mwan3 status command output:

Active ipv4 user rules:
 2345  308K S default_rule_v4  all  --  *      *       0.0.0.0/0            0.0.0.0/0            
    0     0 S https  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 443 

Active ipv6 user rules:
    0     0 S https  tcp      *      *       ::/0                 ::/0                 multiport dports 443 

After unplugging the primary WAN (eth1) interface, the output of the mwan3 status command shows this:


# mwan3 status

Interface status:
interface WAN is offline and tracking is paused
interface WAN_FAILOVER is online 00h:02m:01s, uptime 00h:02m:02s and tracking is active

Current ipv4 policies:
# Warning: iptables-legacy tables present, use iptables-legacy to see them

WanFailover:
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them
# Warning: iptables-legacy tables present, use iptables-legacy to see them

WAN_FAILOVER (100%)

Current ipv6 policies:
WanFailover:
unreachable
balanced:

So, the failover from WAN (eth1) to WAN_FAILOVER (eth0) seems to be working now.
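The two status outputs above are consistent with a simple selection rule: among the members of a policy, the online one with the lowest metric carries the traffic, and if none is online the last-resort action applies. Below is a simplified model of that rule, not mwan3's actual code; `pick_member` and the `interface:metric:state` argument format are invented for this illustration.

```shell
#!/bin/sh
# Simplified model of a failover policy (illustration only).
# Each argument is "interface:metric:state", state being online or offline.
pick_member() {
    best_if=""
    best_metric=""
    for m in "$@"; do
        iface=${m%%:*}          # text before the first colon
        rest=${m#*:}            # "metric:state"
        metric=${rest%%:*}
        state=${rest#*:}
        [ "$state" = "online" ] || continue
        if [ -z "$best_metric" ] || [ "$metric" -lt "$best_metric" ]; then
            best_if=$iface
            best_metric=$metric
        fi
    done
    # With no online member, fall back to the policy's last resort
    echo "${best_if:-unreachable}"
}

pick_member WAN:1:online WAN_FAILOVER:2:online
pick_member WAN:1:offline WAN_FAILOVER:2:online
```

The model ignores weights, which only matter when several online members share the same metric.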

I do have another question about established connections. For example, if I am pinging 8.8.8.8 (Google DNS server) out of interface WAN (eth1) and I unplug the coax cable from the Xfinity Cable modem connected to interface WAN (eth1) on the OpenWRT router, the ping fails. However, if I stop the ping and restart it to the same IP 8.8.8.8, the ping works via my secondary WAN_FAILOVER (eth0) connection.

It appears to me that there is some sort of route caching happening in OpenWRT: once the route lookup occurs, all subsequent packets that conform to the same IP 5-tuple (source/destination IP, source/destination port, protocol) are routed down the same route without subsequent routing-table lookups (probably to reduce the CPU load). Therefore, I have to break the existing connection (probably break the IP 5-tuple) for a new routing-table lookup to occur so that the traffic gets forwarded out of the backup WAN_FAILOVER (eth0) interface after the WAN failover and vice versa after the WAN failback.

When my primary WAN (eth1) interface comes back up (based on the mwan3 policy) and the WAN failback occurs, the established connection (including ICMP) continues to be routed out the backup WAN_FAILOVER (eth0) interface until I stop the ping and restart it again.

Below is an example of what happens during a failover from interface WAN (eth1) to interface WAN_FAILOVER (eth0):

  1. From my MacBook Pro, I initiate the command ping 8.8.8.8 while the primary WAN (eth1) interface on the OpenWRT router is connected to the Xfinity Cable modem.
% ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=56 time=11.242 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=56 time=11.574 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=11.839 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=56 time=11.495 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=56 time=11.535 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=56 time=11.604 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=56 time=10.582 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=56 time=11.046 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=56 time=10.376 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=56 time=11.627 ms
64 bytes from 8.8.8.8: icmp_seq=10 ttl=56 time=9.972 ms
64 bytes from 8.8.8.8: icmp_seq=11 ttl=56 time=10.407 ms
64 bytes from 8.8.8.8: icmp_seq=12 ttl=56 time=11.124 ms
64 bytes from 8.8.8.8: icmp_seq=13 ttl=56 time=12.868 ms
64 bytes from 8.8.8.8: icmp_seq=14 ttl=56 time=13.521 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=56 time=10.155 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=56 time=10.161 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=56 time=11.218 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=56 time=10.821 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=56 time=10.317 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=56 time=9.865 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=56 time=9.912 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=56 time=10.581 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=56 time=10.467 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=56 time=9.833 ms

Now I unscrew the coax cable from the back of the Xfinity Cable modem:

Request timeout for icmp_seq 25
Request timeout for icmp_seq 26
Request timeout for icmp_seq 27
Request timeout for icmp_seq 28
Request timeout for icmp_seq 29
Request timeout for icmp_seq 30
Request timeout for icmp_seq 31
Request timeout for icmp_seq 32
Request timeout for icmp_seq 33
Request timeout for icmp_seq 34
Request timeout for icmp_seq 35
Request timeout for icmp_seq 36
Request timeout for icmp_seq 37
92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 e59d   0 0000  3e  01 fdbc 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 2e6e   0 0000  3e  01 b4ec 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 5e41   0 0000  3e  01 8519 192.168.200.150  8.8.8.8 

Request timeout for icmp_seq 38
Request timeout for icmp_seq 39
Request timeout for icmp_seq 40
92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 f4bc   0 0000  3e  01 ee9d 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 1db0   0 0000  3e  01 c5aa 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 59fd   0 0000  3e  01 895d 192.168.200.150  8.8.8.8 

Request timeout for icmp_seq 41
Request timeout for icmp_seq 42
Request timeout for icmp_seq 43
92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 faf8   0 0000  3e  01 e861 192.168.200.150  8.8.8.8 

Request timeout for icmp_seq 44
Request timeout for icmp_seq 45
Request timeout for icmp_seq 46
92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 a9f0   0 0000  3e  01 396a 192.168.200.150  8.8.8.8 

Request timeout for icmp_seq 47
Request timeout for icmp_seq 48
Request timeout for icmp_seq 49
92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 85d5   0 0000  3e  01 5d85 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 0981   0 0000  3e  01 d9d9 192.168.200.150  8.8.8.8 

92 bytes from c-73-184-XX-XXX.hsd1.ga.comcast.net (73.184.XX.XXX): Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 5400 163a   0 0000  3e  01 cd20 192.168.200.150  8.8.8.8 

Now I kill the ping session running in the MacBook Pro:

^C
--- 8.8.8.8 ping statistics ---
51 packets transmitted, 25 packets received, 51.0% packet loss
round-trip min/avg/max/stddev = 9.833/10.966/13.521/0.904 ms

Now I restart the ping session from the MacBook Pro:

% ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=113 time=36.862 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=113 time=33.560 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=113 time=33.756 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=113 time=37.276 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=113 time=34.276 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=113 time=35.230 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=113 time=36.237 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=113 time=34.229 ms
^C
--- 8.8.8.8 ping statistics ---
8 packets transmitted, 8 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 33.560/35.178/37.276/1.356 ms

As you can see, the restarted ping session is now going via the backup WAN_FAILOVER (eth0) connection, which is connected to the T-Mobile 5G modem. You can see that the ICMP traffic is now routed out of the backup WAN_FAILOVER connection by looking at the increased latency (from 11 ms to 35 ms). The increased latency is due to the 5G cellular connection inherently having a much higher latency to Internet servers than the Xfinity Cable modem connection.

If the route caching actually occurs (as I have assumed), shouldn't the mwan3 script clear the routing cache after the WAN failover and WAN failback occur in order to force a new routing-table lookup so as to prevent the established connections from breaking down?

Thank you in advance for any insight into this.

By default you will get iptables-zz-legacy installed as a dependency of mwan3.

Unfortunately mwan3 with iptables-zz-legacy does not work correctly.

Try removing all packages with -legacy in the name then install the package iptables-nft.

mwan3 doesn't work that way.

The routing tables will not be changed unless you run ifdown or unplug the cable.

If the interface state changes to offline because the ping failed, the only thing that will change is the assigned firewall mark (by an iptables rule) that forwards the packets to the corresponding routing table.
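To make the mark-to-table step a little more concrete, here is a small sketch. It assumes mwan3's default mark mask of 0x3F00 and that the interface with index N is given mark N * 0x100 and routing table N; both are my understanding of the defaults rather than something stated in this thread, so compare with the output of `ip rule show` on your own router. `mark_to_table` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: map a firewall mark to the routing table it would select,
# assuming the default mwan3 mask 0x3F00 (assumption, check `ip rule show`).
MMX_MASK=0x3f00

mark_to_table() {
    # Keep only the bits covered by the mask, then shift them down
    echo $(( ($1 & MMX_MASK) >> 8 ))
}

# Under the assumptions above, the first configured interface would use
# mark 0x100 and the second 0x200.
echo "mark 0x100 -> table $(mark_to_table 0x100)"
echo "mark 0x200 -> table $(mark_to_table 0x200)"
```

When tracking marks an interface offline, new connections simply stop receiving that interface's mark; the routing tables themselves stay as they were.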

There is nothing you can do about already established connections.

Thank you Pavel. So, is it safe to assume that mwan3 doesn't provide a perfectly seamless failover? Many applications will lose their connectivity once the primary WAN connection is marked as down by mwan3, and so the Internet-bound traffic on established connections will be black-holed up the primary WAN connection, which is down during that time.

Is it not possible to shut the primary WAN interface briefly and then bring it back up with a script to force the established connections up the failover WAN connection (by following the secondary default route)?
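One less disruptive idea than bouncing the interface, offered only as a sketch: flush the connection-tracking table when mwan3 reports a state change, so that stateless flows such as a running ping are re-evaluated. It assumes that the user hook /etc/mwan3.user is called with ACTION set to values like connected or disconnected (the same event names listed in the flush conntrack setting earlier), and that writing `f` to /proc/net/nf_conntrack flushes the table on OpenWrt. `handle_event`, `flush_conntrack` and `CONNTRACK_FILE` are hypothetical names. TCP sessions would still drop, because the public source address changes with the WAN.

```shell
#!/bin/sh
# Sketch of a possible /etc/mwan3.user body (untested assumption, see above).

flush_conntrack() {
    # CONNTRACK_FILE is only a hook for testing; it defaults to the real file
    file=${CONNTRACK_FILE:-/proc/net/nf_conntrack}
    if [ -w "$file" ]; then
        echo f > "$file"
    fi
}

handle_event() {
    case "$1" in
        connected|disconnected)
            # Tracking changed the interface state: drop tracked flows so
            # that new packets get marked for the currently active WAN
            flush_conntrack
            ;;
        *)
            # Other events (ifup, ifdown, empty) are left alone here
            ;;
    esac
}

handle_event "${ACTION:-}"
```

This overlaps with the flush conntrack option already enabled in the interface settings, so it may not change anything beyond what mwan3 does on its own.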

Thank you.