Mwan3 takes a long time to switch policies

Can I have some help tuning my mwan3 configuration?

When the status page shows that an interface is down or disabled, the policy stays on the failed connection for several minutes before finally switching to the backup interface. I do want it to stay on the backup until the primary/higher-priority connection is back online, but I also want it to fail over to a backup connection quickly.

cat /etc/config/mwan3

config globals 'globals'
	option mmx_mask '0x3F00'
	option logging '1'
	option loglevel 'notice'

config interface 'wan'
	option enabled '1'
	option family 'ipv4'
	option initial_state 'offline'
	option track_method 'ping'
	option size '56'
	option max_ttl '60'
	option keep_failure_interval '1'
	option reliability '2'
	option interface 'wan'
	option metric '1'
	option weight '3'
	list track_ip '1.1.1.1'
	list track_ip '1.0.0.1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option failure_interval '3'
	option down '3'
	option up '10'
	option recovery_interval '1'
	option timeout '10'
	option interval '1'
	option count '2'

config rule 'StarlinkModem'
	option family 'ipv4'
	option proto 'all'
	option sticky '0'
	option dest_ip '192.168.100.0/24'
	option use_policy 'default'
	option logging '1'

config rule 'default_rule_v4'
	option dest_ip '0.0.0.0/0'
	option family 'ipv4'
	option proto 'all'
	option sticky '0'
	option use_policy 'failover'

config interface 'wwan'
	option enabled '1'
	option family 'ipv4'
	option track_method 'ping'
	option reliability '2'
	option size '56'
	option keep_failure_interval '1'
	option metric '2'
	option interface 'wwan'
	option weight '2'
	list track_ip '1.1.1.1'
	list track_ip '1.0.0.1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option initial_state 'offline'
	option failure_interval '3'
	option down '3'
	option recovery_interval '1'
	option timeout '10'
	option max_ttl '120'
	option interval '1'
	option count '2'
	option up '10'

config interface 'easytether'
	option enabled '1'
	option family 'ipv4'
	option keep_failure_interval '1'
	option interface 'easytether'
	option metric '3'
	option weight '1'
	option timeout '10'
	option initial_state 'online'
	option up '1'
	option failure_interval '3'
	option recovery_interval '1'
	list track_ip '1.1.1.1'
	list track_ip '1.0.0.1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option track_method 'httping'
	option httping_ssl '1'
	option reliability '2'
	option count '1'
	option interval '60'
	option down '5'

config policy 'failover'
	option last_resort 'blackhole'
	list use_member 'wan_only'
	list use_member 'wwan_only'
	list use_member 'easytether_only'
	list use_member 'usb_only'

config member 'wan_only'
	option interface 'wan'
	option metric '1'
	option weight '4'

config member 'wwan_only'
	option interface 'wwan'
	option metric '2'
	option weight '3'

config member 'easytether_only'
	option interface 'easytether'
	option metric '3'
	option weight '2'

config interface 'wan6'
	option enabled '1'
	option family 'ipv6'
	list track_ip '2606:4700:4700::1111'
	list track_ip '2606:4700:4700::1001'
	list track_ip '2001:4860:4860::8888'
	list track_ip '2001:4860:4860::8844'
	option track_method 'ping'
	option reliability '2'
	option size '56'
	option max_ttl '60'
	option keep_failure_interval '1'
	option failure_interval '3'
	option timeout '2'
	option down '3'
	option recovery_interval '1'
	option initial_state 'offline'
	option interval '1'
	option count '2'
	option up '10'

config interface 'wwan6'
	option enabled '1'
	option family 'ipv6'
	list track_ip '2606:4700:4700::1111'
	list track_ip '2606:4700:4700::1001'
	list track_ip '2001:4860:4860::8888'
	list track_ip '2001:4860:4860::8844'
	option track_method 'ping'
	option reliability '2'
	option size '56'
	option max_ttl '60'
	option keep_failure_interval '1'
	option failure_interval '3'
	option timeout '2'
	option down '3'
	option recovery_interval '1'
	option initial_state 'offline'
	option interval '1'
	option count '2'
	option up '10'

config member 'wan6_only'
	option interface 'wan6'
	option metric '1'
	option weight '3'

config member 'wwan6_only'
	option interface 'wwan6'
	option metric '2'
	option weight '2'

config policy 'failover6'
	option last_resort 'blackhole'
	list use_member 'wan6_only'
	list use_member 'wwan6_only'
	list use_member 'usb6_only'

config rule 'default_rule_v6'
	option family 'ipv6'
	option proto 'all'
	option sticky '0'
	option use_policy 'failover6'
	option dest_ip '::/0'

config interface 'usb'
	option enabled '1'
	option initial_state 'online'
	option family 'ipv4'
	list track_ip '1.1.1.1'
	list track_ip '1.0.0.1'
	list track_ip '8.8.8.8'
	list track_ip '8.8.4.4'
	option track_method 'ping'
	option reliability '2'
	option size '56'
	option timeout '10'
	option down '5'
	option up '1'
	option interval '60'
	option failure_interval '3'
	option recovery_interval '1'
	option max_ttl '65'
	option count '2'

config interface 'usb6'
	option enabled '1'
	option initial_state 'online'
	option family 'ipv4'
	list track_ip '2606:4700:4700::1111'
	list track_ip '2606:4700:4700::1001'
	list track_ip '2001:4860:4860::8888'
	list track_ip '2001:4860:4860::8844'
	option track_method 'ping'
	option reliability '2'
	option size '56'
	option timeout '10'
	option interval '60'
	option failure_interval '3'
	option recovery_interval '1'
	option down '5'
	option up '1'
	option max_ttl '65'
	option count '2'

config member 'usb_only'
	option interface 'usb'
	option metric '4'
	option weight '1'

config member 'usb6_only'
	option interface 'usb6'
	option metric '3'
	option weight '1'

Several minutes... I have a functionally similar setup that I have been testing for a while, with 2 interfaces (wan, wwan). In the normal state both interfaces are up; wan (eth1) is the default and wwan (qmi) is the backup. Switchover happens in about a minute or so, although I do not really know when exactly it happens, because of the various states and transitions logged by mwan3, or what switchover exactly means for mwan3. In my understanding, it should be the instant at which another default gateway becomes effective. I tried to enable 'debug' logging for mwan3, but found that it does not actually work; the code may need patching. A full log of a switchover might be helpful here, to compare the activities and the timing.
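One way to collect such a log is to filter the system log for mwan3 entries. This is only a minimal sketch: `mwan3_log_lines` is a hypothetical helper name, and it assumes that every relevant message carries the string "mwan3" in its tag (as `mwan3track` does).

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that mention mwan3.
# Reads log text on stdin and writes the matching lines to stdout.
mwan3_log_lines() {
    grep -i 'mwan3'
}

# Intended use on the router (not executed here):
#   logread | mwan3_log_lines        # entries already in the log buffer
#   logread -f | mwan3_log_lines     # follow new entries while pulling the WAN cable
```

Comparing the timestamps of the first failed check and the line where the interface is reported offline should show how long detection really takes.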

I have to admit, I do not understand the purpose of
config rule 'StarlinkModem'. It seems to be only defined, never used.

As another approach to diagnosis, I would reduce the configuration to just 2 interfaces, i.e. wan and wwan, check that it works, and then add the other interfaces back one at a time.

The Starlink modem rule just forces all traffic for the Starlink uplink's status page to use that uplink, regardless of which default route policy is active. This is so I can see whether the modem is having a problem when the connection has failed over to cellular.

I think the issue is that mwan3 doesn't ping all test IPs in one ping interval; it pings each one X seconds apart. So if there are 4 IPs and failover is set to trigger after 5 failed tests at a 10-second interval, with a reliability of 2, it won't fail over until 400 seconds have passed and at least 3 IPs have failed to respond in each 80-second test of 4.

I improved this by setting the fail count to 1, with 1 second between tests, which means it will take at most 8 seconds to fail. Then I set the recovery score to 10, so it won't fail back until there have been 36 seconds of successful tests.
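The two failover estimates above follow from multiplying hosts by time per host by failed rounds. Here is a rough sketch of that arithmetic, assuming the hosts are probed strictly one after another. `estimate_failover_seconds` is a hypothetical helper, not part of mwan3. The 20 seconds per host comes from the 80-second round of 4 hosts quoted above, and the 2 seconds per host in the tuned case is an assumption chosen to match the 8-second figure.

```shell
#!/bin/sh
# Hypothetical helper: worst-case seconds before an interface is marked down
# when tracking hosts are probed one after another.
#   $1 = number of track_ip hosts
#   $2 = seconds spent on each host that does not answer (assumed input)
#   $3 = failed rounds required before the interface is marked down
estimate_failover_seconds() {
    hosts=$1
    per_host_seconds=$2
    failed_rounds=$3
    echo $(( hosts * per_host_seconds * failed_rounds ))
}

# Original settings: 4 hosts, 20 s per host (80 s per round), 5 failed rounds.
estimate_failover_seconds 4 20 5   # prints 400

# Tuned settings: 1 failed round, assuming about 2 s per unanswered host.
estimate_failover_seconds 4 2 1    # prints 8
```

Because the three factors multiply, lowering any one of them (fewer tracking hosts, a shorter timeout, or a lower fail count) shortens detection proportionally under this model.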

Mwan3 is pretty damn good for free software, but it definitely isn't built for fast failover detection. I'm still happy with the results now that I understand how it really works.
I would prefer it if the tests were run in parallel; to know whether the link is failing at a particular moment, it makes more sense to test 4 IPs at the same time than to test 4 IPs X seconds apart.
