MWAN3 failover not working properly

I've been using MWAN3 for a while. It does most of what I need, but I have some problems that stop features like failover from working properly.

Right now I have a landline connection coming into my house and a second router with a 4G connection. I'm using a VLAN in OpenWrt so that the 4G router is connected via an Ethernet port, and I've created an interface for it called WANB. At the moment I only use MWAN3 to send some devices on my network through the 4G router and the others through the landline, for bandwidth reasons. The routers are each on their own subnet: 192.168.1.1 for the main OpenWrt router and 192.168.2.1 for the 4G router.

However, I want to set up failover so that if the 4G connection drops, everything switches to the generally more reliable landline. The devices I want on wanb are assigned to the wanb_wan policy in MWAN3, and when I stop the interface in OpenWrt, my wanb devices switch over to wan perfectly fine. But if I go into the 4G router and disable its connection, MWAN3 reports that the interface has gone down but does not actually switch any wanb devices to wan. In other words, failover doesn't work when I actually lose the internet connection, which is its main purpose.
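For context, mwan3 chooses among a policy's members roughly as follows: only members whose interface is online are considered, the lowest metric among them wins, and members sharing that metric split traffic by weight. Below is a minimal sketch of that selection using the metric and weight values from the config in this post. The helper name and its input format are made up for illustration and are not part of mwan3.

```sh
#!/bin/sh
# Hypothetical helper, not part of mwan3.
# Input on stdin: one policy member per line as
#   "<interface> <metric> <weight> <state>"
# Output: the interfaces the policy would use and their traffic share,
# or "unreachable" when no member is online.
policy_result() {
    awk '$4 == "online" {
        m = $2 + 0                      # force numeric comparison
        if (!seen || m < best) { best = m; seen = 1 }
        n++; ifc[n] = $1; met[n] = m; wt[n] = $3 + 0
    }
    END {
        if (n == 0) { print "unreachable"; exit }
        for (i = 1; i <= n; i++) {
            if (met[i] == best) { total += wt[i] }
        }
        for (i = 1; i <= n; i++) {
            if (met[i] == best) {
                printf "%s (%d%%)\n", ifc[i], wt[i] * 100 / total
            }
        }
    }'
}

# wanb_wan: wan has metric 2 / weight 3, wanb has metric 1 / weight 2.
printf 'wan 2 3 online\nwanb 1 2 online\n'  | policy_result
printf 'wan 2 3 online\nwanb 1 2 offline\n' | policy_result
```

With both interfaces online this prints only wanb, and once wanb is marked offline it prints only wan, which is the behaviour expected from the wanb_wan policy. If the policy still shows wanb at 100% while the 4G link is down, the tracking state is the first thing to check.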

My setup has mostly just followed the official wiki page for MWAN3, but I must admit I haven't fully understood parts of the requirements, and my setup may not be entirely right. For example, in the section that says to try pinging through each interface, I don't actually get a reply on my wanb interface. Since it seemed to be otherwise functional, I haven't looked into why. I also don't entirely understand what the wiki means by "Every WAN interface should have a default gateway configured."
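For reference, those two wiki checks roughly amount to confirming that each WAN device has its own default route with a distinct metric, and that a ping sent out through that device gets a reply. A rough sketch of the first check follows. The function names are made up, eth0.3 comes from network.wanb.device in the config in this post, and pppoe-wan is only an assumed name for the PPPoE device.

```sh
#!/bin/sh
# Sketch only: the helper names are hypothetical.

# Read `ip -4 route show default` output on stdin and print
# "<device> <metric>" for each default route. A route printed without
# an explicit metric is reported as metric 0.
default_routes() {
    awk '$1 == "default" {
        dev = "?"; metric = 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print dev, metric
    }'
}

# Succeed if device $1 has a default route in the list read from stdin.
has_default_route() {
    default_routes | awk -v d="$1" '$1 == d { found = 1 } END { exit !found }'
}

# On the router (not run here):
#   ip -4 route show default | default_routes
#   ip -4 route show default | has_default_route eth0.3 || echo "wanb has no default route"
#   ping -c 1 -W 2 -I eth0.3 8.8.8.8       # wanb
#   ping -c 1 -W 2 -I pppoe-wan 1.1.1.1    # wan (assumed device name)
```

If a WAN device is missing from the list, or two devices share a metric, that is worth fixing before looking at the mwan3 config itself.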

My configs are as follows:

uci show network:

network.loopback=interface
network.loopback.device='lo'
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.globals=globals
network.globals.ula_prefix='fd50:5426:f85a::/48'
network.@device[0]=device
network.@device[0].name='br-lan'
network.@device[0].type='bridge'
network.@device[0].ports='eth0.1'
network.lan=interface
network.lan.device='br-lan'
network.lan.proto='static'
network.lan.ipaddr='192.168.1.1'
network.lan.netmask='255.255.255.0'
network.lan.ip6assign='60'
network.lan.dns='8.8.8.8' '8.8.4.4'
network.@device[1]=device
network.@device[1].name='eth0.2'
network.@device[1].macaddr='0c:80:63:91:b3:fd'
network.wan=interface
network.wan.device='eth0.2'
network.wan.proto='pppoe'
network.wan.metric='10'
network.wan.ipv6='0'
network.wan.peerdns='0'
network.wan.dns='8.8.8.8' '8.8.4.4'
network.wan6=interface
network.wan6.device='eth0.2'
network.wan6.proto='dhcpv6'
network.wan6.reqaddress='try'
network.wan6.reqprefix='auto'
network.wan6.metric='20'
network.@switch[0]=switch
network.@switch[0].name='switch0'
network.@switch[0].reset='1'
network.@switch[0].enable_vlan='1'
network.@switch_vlan[0]=switch_vlan
network.@switch_vlan[0].device='switch0'
network.@switch_vlan[0].vlan='1'
network.@switch_vlan[0].vid='1'
network.@switch_vlan[0].ports='0t 3 4 5'
network.@switch_vlan[1]=switch_vlan
network.@switch_vlan[1].device='switch0'
network.@switch_vlan[1].vlan='2'
network.@switch_vlan[1].ports='0t 1'
network.@switch_vlan[1].vid='2'
network.@switch_vlan[2]=switch_vlan
network.@switch_vlan[2].device='switch0'
network.@switch_vlan[2].vlan='3'
network.@switch_vlan[2].ports='0t 2'
network.@switch_vlan[2].vid='3'
network.wanb=interface
network.wanb.device='eth0.3'
network.wanb.proto='static'
network.wanb.ipaddr='192.168.2.2'
network.wanb.netmask='255.255.255.0'
network.wanb.gateway='192.168.2.1'
network.wanb.dns='8.8.8.8' '8.8.4.4'
network.wanb.metric='30'
network.wanb6=interface
network.wanb6.proto='dhcpv6'
network.wanb6.reqaddress='try'
network.wanb6.reqprefix='auto'
network.wanb6.device='@wanb'
network.wanb6.auto='0'
network.wanb6.metric='40'
network.@route6[0]=route6
network.@route6[0].interface='wanb6'
network.@route6[0].target='::/0'

uci show mwan3:

mwan3.JonPC=rule
mwan3.JonPC.src_ip='192.168.1.100'
mwan3.JonPC.proto='all'
mwan3.JonPC.sticky='1'
mwan3.JonPC.family='ipv4'
mwan3.JonPC.dest_ip='0.0.0.0/0'
mwan3.JonPC.timeout='20'
mwan3.JonPC.use_policy='wanb_wan'
mwan3.JonPCv6=rule
mwan3.JonPCv6.family='ipv6'
mwan3.JonPCv6.src_ip='192.168.1.100'
mwan3.JonPCv6.dest_ip='::/0'
mwan3.JonPCv6.proto='all'
mwan3.JonPCv6.sticky='1'
mwan3.JonPCv6.timeout='20'
mwan3.JonPCv6.use_policy='wanb_only'
mwan3.JonPhone1=rule
mwan3.JonPhone1.family='ipv4'
mwan3.JonPhone1.src_ip='192.168.1.101'
mwan3.JonPhone1.dest_ip='0.0.0.0/0'
mwan3.JonPhone1.proto='all'
mwan3.JonPhone1.sticky='1'
mwan3.JonPhone1.timeout='20'
mwan3.JonPhone1.use_policy='wanb_wan'
mwan3.PSP=rule
mwan3.PSP.family='ipv4'
mwan3.PSP.src_ip='192.168.1.102'
mwan3.PSP.dest_ip='0.0.0.0/0'
mwan3.PSP.proto='all'
mwan3.PSP.sticky='1'
mwan3.PSP.timeout='20'
mwan3.PSP.use_policy='wanb_only'
mwan3.Laptop=rule
mwan3.Laptop.family='ipv4'
mwan3.Laptop.src_ip='192.168.1.103'
mwan3.Laptop.dest_ip='0.0.0.0/0'
mwan3.Laptop.proto='all'
mwan3.Laptop.sticky='1'
mwan3.Laptop.timeout='20'
mwan3.Laptop.use_policy='wanb_wan'
mwan3.Unraid=rule
mwan3.Unraid.family='ipv4'
mwan3.Unraid.src_ip='192.168.1.110'
mwan3.Unraid.dest_ip='0.0.0.0/0'
mwan3.Unraid.proto='all'
mwan3.Unraid.sticky='1'
mwan3.Unraid.timeout='20'
mwan3.Unraid.use_policy='wanb_wan'
mwan3.LapSupport=rule
mwan3.LapSupport.family='ipv4'
mwan3.LapSupport.dest_ip='0.0.0.0/0'
mwan3.LapSupport.proto='all'
mwan3.LapSupport.sticky='1'
mwan3.LapSupport.timeout='20'
mwan3.LapSupport.src_ip='192.168.1.250'
mwan3.LapSupport.use_policy='wanb_only'
mwan3.Raspberrypi=rule
mwan3.Raspberrypi.family='ipv4'
mwan3.Raspberrypi.src_ip='192.168.1.230'
mwan3.Raspberrypi.dest_ip='0.0.0.0/0'
mwan3.Raspberrypi.proto='all'
mwan3.Raspberrypi.sticky='1'
mwan3.Raspberrypi.timeout='20'
mwan3.Raspberrypi.use_policy='wanb_wan'
mwan3.Oculus=rule
mwan3.Oculus.family='ipv4'
mwan3.Oculus.src_ip='192.168.1.115'
mwan3.Oculus.dest_ip='0.0.0.0/0'
mwan3.Oculus.proto='all'
mwan3.Oculus.sticky='1'
mwan3.Oculus.timeout='20'
mwan3.Oculus.use_policy='wanb_only'
mwan3.Everything=rule
mwan3.Everything.family='ipv4'
mwan3.Everything.dest_ip='0.0.0.0/0'
mwan3.Everything.proto='all'
mwan3.Everything.sticky='0'
mwan3.Everything.use_policy='wan_only'
mwan3.Everythingv6=rule
mwan3.Everythingv6.family='ipv6'
mwan3.Everythingv6.dest_ip='::/0'
mwan3.Everythingv6.proto='all'
mwan3.Everythingv6.sticky='0'
mwan3.Everythingv6.use_policy='wan_only'
mwan3.globals=globals
mwan3.globals.mmx_mask='0x3F00'
mwan3.wan=interface
mwan3.wan.enabled='1'
mwan3.wan.family='ipv4'
mwan3.wan.initial_state='online'
mwan3.wan.track_method='ping'
mwan3.wan.count='1'
mwan3.wan.size='56'
mwan3.wan.max_ttl='60'
mwan3.wan.check_quality='0'
mwan3.wan.timeout='4'
mwan3.wan.interval='10'
mwan3.wan.failure_interval='5'
mwan3.wan.recovery_interval='5'
mwan3.wan.down='5'
mwan3.wan.up='5'
mwan3.wan.track_ip='1.1.1.1' '1.0.0.1' '9.9.9.9' '149.112.112.112'
mwan3.wan.reliability='1'
mwan3.wan6=interface
mwan3.wan6.enabled='0'
mwan3.wan6.track_ip='2001:4860:4860::8844' '2001:4860:4860::8888' '2620:0:ccd::2' '2620:0:ccc::2'
mwan3.wan6.family='ipv6'
mwan3.wan6.reliability='2'
mwan3.wanb=interface
mwan3.wanb.family='ipv4'
mwan3.wanb.reliability='1'
mwan3.wanb.initial_state='online'
mwan3.wanb.track_method='ping'
mwan3.wanb.count='1'
mwan3.wanb.size='56'
mwan3.wanb.max_ttl='60'
mwan3.wanb.check_quality='0'
mwan3.wanb.recovery_interval='5'
mwan3.wanb.enabled='1'
mwan3.wanb.timeout='3'
mwan3.wanb.interval='5'
mwan3.wanb.failure_interval='5'
mwan3.wanb.down='5'
mwan3.wanb.up='10'
mwan3.wanb.track_ip='8.8.8.8' '208.67.222.222' '208.67.220.220' '8.8.4.4'
mwan3.wanb6=interface
mwan3.wanb6.track_ip='2001:4860:4860::8844' '2001:4860:4860::8888' '2620:0:ccd::2' '2620:0:ccc::2'
mwan3.wanb6.family='ipv6'
mwan3.wanb6.reliability='1'
mwan3.wanb6.initial_state='online'
mwan3.wanb6.track_method='ping'
mwan3.wanb6.count='1'
mwan3.wanb6.size='56'
mwan3.wanb6.max_ttl='60'
mwan3.wanb6.check_quality='0'
mwan3.wanb6.timeout='4'
mwan3.wanb6.interval='10'
mwan3.wanb6.failure_interval='5'
mwan3.wanb6.recovery_interval='5'
mwan3.wanb6.down='5'
mwan3.wanb6.up='5'
mwan3.wanb6.enabled='0'
mwan3.wan_m1_w3=member
mwan3.wan_m1_w3.interface='wan'
mwan3.wan_m1_w3.metric='1'
mwan3.wan_m1_w3.weight='3'
mwan3.wan_m2_w3=member
mwan3.wan_m2_w3.interface='wan'
mwan3.wan_m2_w3.metric='2'
mwan3.wan_m2_w3.weight='3'
mwan3.wanb_m1_w2=member
mwan3.wanb_m1_w2.interface='wanb'
mwan3.wanb_m1_w2.metric='1'
mwan3.wanb_m1_w2.weight='2'
mwan3.wanb_m2_w2=member
mwan3.wanb_m2_w2.interface='wanb'
mwan3.wanb_m2_w2.metric='2'
mwan3.wanb_m2_w2.weight='2'
mwan3.wan6_m1_w3=member
mwan3.wan6_m1_w3.interface='wan6'
mwan3.wan6_m1_w3.metric='1'
mwan3.wan6_m1_w3.weight='3'
mwan3.wan6_m2_w3=member
mwan3.wan6_m2_w3.interface='wan6'
mwan3.wan6_m2_w3.metric='2'
mwan3.wan6_m2_w3.weight='3'
mwan3.wanb6_m1_w2=member
mwan3.wanb6_m1_w2.interface='wanb6'
mwan3.wanb6_m1_w2.metric='1'
mwan3.wanb6_m1_w2.weight='2'
mwan3.wanb6_m2_w2=member
mwan3.wanb6_m2_w2.interface='wanb6'
mwan3.wanb6_m2_w2.metric='2'
mwan3.wanb6_m2_w2.weight='2'
mwan3.wan_only=policy
mwan3.wan_only.use_member='wan_m1_w3' 'wan6_m1_w3'
mwan3.wanb_only=policy
mwan3.wanb_only.use_member='wanb_m1_w2' 'wanb6_m1_w2'
mwan3.wanb_only.last_resort='unreachable'
mwan3.balanced=policy
mwan3.balanced.use_member='wan_m1_w3' 'wanb_m1_w2' 'wan6_m1_w3' 'wanb6_m1_w2'
mwan3.wan_wanb=policy
mwan3.wan_wanb.use_member='wan_m1_w3' 'wanb_m2_w2' 'wan6_m1_w3' 'wanb6_m2_w2'
mwan3.wanb_wan=policy
mwan3.wanb_wan.use_member='wan_m2_w3' 'wanb_m1_w2' 'wan6_m2_w3' 'wanb6_m1_w2'

mwan3 status:

Interface status:
 interface wan is online 00h:54m:25s, uptime 17h:18m:20s and tracking is active
 interface wan6 is offline and tracking is down
 interface wanb is online 00h:49m:33s, uptime 00h:50m:23s and tracking is active
 interface wanb6 is offline and tracking is down

Current ipv4 policies:
balanced:
 wanb (40%)
 wan (60%)
wan_only:
 wan (100%)
wan_wanb:
 wan (100%)
wanb_only:
 wanb (100%)
wanb_wan:
 wanb (100%)

Current ipv6 policies:
balanced:
 unreachable
wan_only:
 unreachable
wan_wanb:
 unreachable
wanb_only:
 unreachable
wanb_wan:
 unreachable

Directly connected ipv4 networks:
192.168.2.0/24
127.0.0.0/8
224.0.0.0/3
84.66.212.81
90.247.192.1
192.168.1.0/24

Directly connected ipv6 networks:
fd50:5426:f85a::/64
fe80::/64

Active ipv4 user rules:
10311  649K S JonPC  all  --  *      *       192.168.1.100        0.0.0.0/0     
    0     0 S JonPhone1  all  --  *      *       192.168.1.101        0.0.0.0/0 
    0     0 S PSP  all  --  *      *       192.168.1.102        0.0.0.0/0       
    0     0 S Laptop  all  --  *      *       192.168.1.103        0.0.0.0/0    
  237 17093 S Unraid  all  --  *      *       192.168.1.110        0.0.0.0/0    
    0     0 S LapSupport  all  --  *      *       192.168.1.250        0.0.0.0/0
  911 71314 S Raspberrypi  all  --  *      *       192.168.1.230        0.0.0.0/0
    0     0 S Oculus  all  --  *      *       192.168.1.115        0.0.0.0/0    
 4047  404K - wan_only  all  --  *      *       0.0.0.0/0            0.0.0.0/0  
    0     0 - wanb_only  all  --  *      *       192.168.1.135        0.0.0.0/0 

Active ipv6 user rules:
  340 71508 - wan_only  all      *      *       ::/0                 ::/0       

Please let me know if you need any more information.

I'll note another problem: if the main wan goes down, wanb also stops working on all devices unless I change my "Everything" rule to wanb, even though the devices' own rules should take priority.

The first question is: what versions of OpenWrt and mwan3 do you have? E.g.,

# grep VERSION_ID /etc/os-release 
VERSION_ID="22.03.2"
# opkg list | grep mwan
luci-app-mwan3 - git-21.340.50573-2af8158
mwan3 - 2.11.1-1

Fwiw I'm on the above and have used mwan3 for years going back to at least 18.06 and am having issues on 22.03. I tentatively blame the firewall changes in 22.03 but haven't figured it out yet. If you are also using 22.03, you could consider trying 21.02.5 which is still supported and has the latest security patches.

-- Mike

My OpenWrt version is still 21.02.2. I could try updating to 21.02.5, but I'm not sure where to find the old versions for my router (Archer C7).

mwan3 is 2.10.13-1, but I don't see an update for it in opkg. Could it be looking for a later version of OpenWrt?

I have more of a feeling though that I've just configured something wrong rather than it being a problem with the software itself.

21.02.2 and 21.02.5 are probably effectively the same as far as mwan3 is concerned (at least I don't recall problems in any 21.02.x). Also, hey, I have an Archer C7, too!

mwan3 configs are hard to parse visually. Since you have a case where it works and one where it doesn't, I suggest running mwan3 interfaces and paying attention to differences between the working and non-working cases. You can also run mwan3 status to get all the information. I recall that in the past, when I had invalid configs (like a gateway missing a metric), mwan3 would complain here or in LuCI, but I don't remember exactly what those messages looked like.
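A minimal sketch of that comparison, assuming the output format shown in the mwan3 status dump earlier in this thread. The two helper names are made up. They pull out just the interface states and the IPv4 policy section so the two cases can be compared without the uptime counters getting in the way.

```sh
#!/bin/sh
# Hypothetical helpers for comparing two `mwan3 status` captures.

# Print one "<interface> <state>" pair per line, e.g. "wanb online".
iface_states() {
    awk '$1 == "interface" && $3 == "is" { print $2, $4 }'
}

# Print the body of the "Current ipv4 policies:" section.
ipv4_policies() {
    awk '/^Current ipv4 policies:/ { on = 1; next }
         on && (/^[[:space:]]*$/ || /^Current /) { exit }
         on { print }'
}

# On the router (not run here), capture each case and compare by eye:
#   mwan3 status > /tmp/mwan3-ifdown.txt    # after stopping wanb in OpenWrt
#   mwan3 status > /tmp/mwan3-4g-off.txt    # after disabling 4G on the modem
#   iface_states  < /tmp/mwan3-ifdown.txt;  iface_states  < /tmp/mwan3-4g-off.txt
#   ipv4_policies < /tmp/mwan3-ifdown.txt;  ipv4_policies < /tmp/mwan3-4g-off.txt
```

In the working case wanb_wan should list wan; if the 4G-off capture still lists wanb there, the policy was never updated even though the interface was reported as down.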

mwan3 doesn't seem to report any configuration errors. The mwan3 status output is in the OP and includes the mwan3 interfaces output, but there is nothing obviously out of the ordinary.

I've had this issue happen again today. It seemed like DNS had stopped working: some websites I was already on continued to work and could be refreshed, but I generally couldn't connect to anything else.
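One way to narrow this down the next time it happens is to test routing and name resolution separately: ping a public IP address, then try to resolve a hostname. A small sketch follows. The diagnose helper is made up for illustration, and the address and hostname in the comments are only examples.

```sh
#!/bin/sh
# Hypothetical helper: turn two exit codes (0 = success) into a diagnosis.
#   $1: exit code from pinging a public IP address
#   $2: exit code from resolving a hostname
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "no route to the internet"
    elif [ "$2" -ne 0 ]; then
        echo "routing works, DNS lookups fail"
    else
        echo "routing and DNS both work"
    fi
}

# On an affected device or the router (not run here):
#   ping -c 1 -W 2 1.1.1.1 >/dev/null 2>&1; ip_rc=$?
#   nslookup openwrt.org >/dev/null 2>&1;   dns_rc=$?
#   diagnose "$ip_rc" "$dns_rc"
```

If the ping succeeds while the lookup fails, the problem is more likely with how DNS queries are being routed than with the WAN links themselves.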