Error installing mwan3 and later no active user rules

Yes, I did. Same behavior.

Here is the mwan3 config:

config globals 'globals'
        option mmx_mask '0x3F00'

config interface 'wan'
        option enabled '1'
        option family 'ipv4'
        option reliability '2'

config interface 'vwan'
        option enabled '1'
        option family 'ipv4'
        option reliability '1'

config member 'wan_m1_w3'
        option interface 'wan'
        option metric '1'
        option weight '3'

config member 'wan_m2_w3'
        option interface 'wan'
        option metric '2'
        option weight '3'

config member 'vwan_m1_w2'
        option interface 'vwan'
        option metric '1'
        option weight '2'

config member 'vwan_m2_w2'
        option interface 'vwan'
        option metric '2'
        option weight '2'

config policy 'wan_only'
        list use_member 'wan_m1_w3'

config policy 'vwan_only'
        list use_member 'vwan_m1_w2'

config policy 'balanced'
        list use_member 'wan_m1_w3'
        list use_member 'vwan_m1_w2'

config policy 'wan_vwan'
        list use_member 'wan_m1_w3'
        list use_member 'vwan_m2_w2'

config policy 'vwan_wan'
        list use_member 'wan_m2_w3'
        list use_member 'vwan_m1_w2'

config rule 'MachineX'
        option family 'ipv4'
        option use_policy 'vwan_only'
        option src_ip 'MachineX'
        option dest_ip '0.0.0.0/0'
        option proto 'all'
        option sticky '0'
        option logging '0'

config rule 'default_rule_v4'
        option family 'ipv4'
        option use_policy 'wan_only'
        option dest_ip '0.0.0.0/0'

mwan3 status

Interface status:
 interface wan is online 00h:00m:00s, uptime 08h:39m:02s and tracking is not enabled
 interface vwan is online 00h:00m:00s, uptime 08h:39m:02s and tracking is not enabled

Current ipv4 policies:
balanced:
 vwan (40%)
 wan (60%)
vwan_only:
 vwan (100%)
vwan_wan:
 vwan (100%)
wan_only:
 wan (100%)
wan_vwan:
 wan (100%)

Current ipv6 policies:
balanced:
 unreachable
vwan_only:
 unreachable
vwan_wan:
 unreachable
wan_only:
 unreachable
wan_vwan:
 unreachable

Directly connected ipv4 networks:
192.168.6.0/24
224.0.0.0/3
192.168.4.0/24
127.0.0.0/8
192.168.1.0/24

Directly connected ipv6 networks:
fe80::/64

Active ipv4 user rules:
    0     0 - vwan_only  all  --  *      *       192.168.4.xyz        0.0.0.0/0           
 5102  553K - wan_only  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Active ipv6 user rules:

ip -4 addr

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.1.4/24 brd 192.168.1.255 scope global eth1
       valid_lft forever preferred_lft forever
5: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    inet 192.168.6.11/24 brd 192.168.6.255 scope global eth2
       valid_lft forever preferred_lft forever
6: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.4.1/24 brd 192.168.4.255 scope global br-lan
       valid_lft forever preferred_lft forever

ip -4 ro list table all

default via 192.168.1.1 dev eth1 table 1 proto static metric 10
192.168.1.0/24 dev eth1 table 1 proto static scope link metric 10
192.168.4.0/24 dev br-lan table 1 proto kernel scope link src 192.168.4.1
default via 192.168.6.2 dev eth2 table 2 proto static metric 20
192.168.4.0/24 dev br-lan table 2 proto kernel scope link src 192.168.4.1
192.168.6.0/24 dev eth2 table 2 proto static scope link metric 20
default via 192.168.1.1 dev eth1 proto static metric 10
default via 192.168.6.2 dev eth2 proto static metric 20
192.168.1.0/24 dev eth1 proto static scope link metric 10
192.168.4.0/24 dev br-lan proto kernel scope link src 192.168.4.1
192.168.6.0/24 dev eth2 proto static scope link metric 20
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1
broadcast 192.168.1.0 dev eth1 table local proto kernel scope link src 192.168.1.4
local 192.168.1.4 dev eth1 table local proto kernel scope host src 192.168.1.4
broadcast 192.168.1.255 dev eth1 table local proto kernel scope link src 192.168.1.4
broadcast 192.168.4.0 dev br-lan table local proto kernel scope link src 192.168.4.1
local 192.168.4.1 dev br-lan table local proto kernel scope host src 192.168.4.1
broadcast 192.168.4.255 dev br-lan table local proto kernel scope link src 192.168.4.1
broadcast 192.168.6.0 dev eth2 table local proto kernel scope link src 192.168.6.11
local 192.168.6.11 dev eth2 table local proto kernel scope host src 192.168.6.11
broadcast 192.168.6.255 dev eth2 table local proto kernel scope link src 192.168.6.11

ip -4 ru

0:      from all lookup local
1001:   from all iif eth1 lookup 1
1002:   from all iif eth2 lookup 2
2001:   from all fwmark 0x100/0x3f00 lookup 1
2002:   from all fwmark 0x200/0x3f00 lookup 2
2061:   from all fwmark 0x3d00/0x3f00 blackhole
2062:   from all fwmark 0x3e00/0x3f00 unreachable
3001:   from all fwmark 0x100/0x3f00 unreachable
3002:   from all fwmark 0x200/0x3f00 unreachable
32766:  from all lookup main
32767:  from all lookup default

I don't see anything out of the ordinary here. The user rules are in place. What you could do to narrow down the culprit of the lost pings is to run a tcpdump on the interface:
tcpdump -i eth1 -c 10 icmp and host 8.8.4.4
then, in another session, run a ping:
ping -I eth1 -c 5 8.8.4.4
Post the output of both commands here.

This is all I got for the lagging interface:

07:04:04.827904 IP 192.168.6.11 > dns.google: ICMP echo request, id 30251, seq 4, length 64
07:04:04.969792 IP dns.google > 192.168.6.11: ICMP echo reply, id 30251, seq 4, length 64

2 packets captured
2 packets received by filter
0 packets dropped by kernel

and this for the normal one

07:05:38.426058 IP 192.168.1.4 > dns.google: ICMP echo request, id 1733, seq 0, length 64
07:05:38.472726 IP dns.google > 192.168.1.4: ICMP echo reply, id 1733, seq 0, length 64
07:05:39.426307 IP 192.168.1.4 > dns.google: ICMP echo request, id 1733, seq 1, length 64
07:05:39.472324 IP dns.google > 192.168.1.4: ICMP echo reply, id 1733, seq 1, length 64
07:05:40.426538 IP 192.168.1.4 > dns.google: ICMP echo request, id 1733, seq 2, length 64
07:05:40.472831 IP dns.google > 192.168.1.4: ICMP echo reply, id 1733, seq 2, length 64
07:05:41.426758 IP 192.168.1.4 > dns.google: ICMP echo request, id 1733, seq 3, length 64
07:05:41.472819 IP dns.google > 192.168.1.4: ICMP echo reply, id 1733, seq 3, length 64
07:05:42.426998 IP 192.168.1.4 > dns.google: ICMP echo request, id 1733, seq 4, length 64
07:05:42.473066 IP dns.google > 192.168.1.4: ICMP echo reply, id 1733, seq 4, length 64
10 packets captured
10 packets received by filter
0 packets dropped by kernel

Try this one:
tcpdump -i eth1 -c 10 icmp and host 8.8.4.4
then run a ping out of the other interface:
ping -I eth2 -c 5 8.8.4.4
I want to make sure there is no leak.

This is what I get on the ping side

PING 8.8.4.4 (8.8.4.4): 56 data bytes
64 bytes from 8.8.4.4: seq=4 ttl=115 time=281.922 ms

--- 8.8.4.4 ping statistics ---
5 packets transmitted, 1 packets received, 80% packet loss
round-trip min/avg/max = 281.922/281.922/281.922 ms

and I get nothing on the tcpdump side.

I also tried the reverse, pinging on eth1: I get all 5 responses, and there is also nothing on the tcpdump -i eth2 side.

Does this happen only when you ping from the OpenWrt device, or does it also happen when you ping from a LAN host?

I tried it out on one of the LAN servers and it does not seem to have the same problem. It seems to happen on the OpenWrt machine only, based on this one experiment.

Add track_ip entries under the interface definitions of mwan3 and try it again.
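For example, something along these lines (a minimal sketch; the track_ip targets here are just illustrative public addresses, and without any track_ip entries mwan3 reports that tracking is not enabled):

config interface 'wan'
        option enabled '1'
        option family 'ipv4'
        option reliability '2'
        list track_ip '8.8.8.8'
        list track_ip '1.1.1.1'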

I am sorry, but I think I am back to square one. I had to power off the OpenWrt RPi4 and power it on again, and all of a sudden it is back to assigning IP addresses in the unused 192.168.0.0/24 subnet, and my network is down again. I had to disable mwan3 for it to assign proper IPs and then enable mwan3 again. I can't find what went wrong this time.

mwan3 is definitely not responsible for advertising a different dhcp pool.
Verify on the lan hosts which host is the dhcp server handing out these addresses: ipconfig /all on Windows, or check the dhcp lease information on Linux.
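On a Linux client the exact command depends on the dhcp client in use; one of the following should reveal the server that handed out the lease (the interface name eth0 and the lease file path are common defaults, adjust them to your system):

# dhclient keeps the server identifier in its lease file
grep dhcp-server-identifier /var/lib/dhcp/dhclient*.leases

# with NetworkManager
nmcli -f DHCP4 device show eth0

# or actively probe the segment for every answering dhcp server (needs nmap)
nmap --script broadcast-dhcp-discover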

Thank you, Trendy. You are probably correct. I dug more into the OpenWrt logs, and indeed it complained that there is another dhcp server on the lan and recommended using option force '1' in the lan section of the dhcp configuration file.
I did so and it resolved the problem (at least it seems to have, based on the couple of shutdowns and power-ons I have done so far).
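For reference, this is roughly what that looks like in /etc/config/dhcp (the only change is option force '1'; the other values are the usual defaults and may differ on your system):

config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option leasetime '12h'
        option force '1'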

However, I went through my lan network again and could not find any device running a dhcp server. I currently suspect the access points I have, but I can't pinpoint which one so far. Anyway, the problem is resolved in that sense. Thank you very much again for your kind support; I appreciate all your time and your consistent follow-up with me.

By the way, and just for completeness, I tried enabling tracking on both wan interfaces and tested the ping on OpenWrt: the VPN wan interface still misses the first 4 ping responses.

The force option is more of a workaround than a solution. You must find the rogue dhcp server in the network, otherwise you cannot be certain that the lan hosts will use OpenWrt as their dhcp server.
I'd suggest turning off dhcp on OpenWrt and letting the hosts acquire a lease from the rogue server. Get the dhcp server address (as described earlier), pinpoint it, and disable it.
Get this fixed first and then we can look at the issue with the lost pings.
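To turn dhcp off on the lan temporarily, a sketch using the standard uci/dnsmasq workflow (set ignore back to '0' or delete the option afterwards to re-enable it):

uci set dhcp.lan.ignore='1'
uci commit dhcp
/etc/init.d/dnsmasq restart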

Hello,

Sorry to hijack this thread, but as I was reading it for informational purposes, I decided to give the ping commands a try and found that the same thing happens in my setup. Here, eth1 corresponds to the wan connection, while eth5.210 corresponds to the wanb connection.

When I run ping -I eth1 google.com, the responses start at seq=0. If I run ping -I eth5.210 google.com, they only start at seq=4. The same thing happens if I use 8.8.4.4 instead of google.com in the commands above.

Results of uci export mwan3; mwan3 status; ip -4 addr; ip -4 ro list table all; ip -4 ru:

package mwan3

config policy 'wan_wanb'
	list use_member 'wan_m1_w3'
	list use_member 'wanb_m2_w2'
	option last_resort 'unreachable'

config policy 'wan_only'
	list use_member 'wan_m1_w3'
	list use_member 'wan6_m1_w3'

config policy 'balanced'
	list use_member 'wan_m1_w3'
	list use_member 'wanb_m1_w2'
	option last_resort 'unreachable'

config policy 'wanb_wan'
	list use_member 'wan_m2_w3'
	list use_member 'wanb_m1_w2'
	option last_resort 'unreachable'

config policy 'wanb_only'
	list use_member 'wanb_m1_w2'
	list use_member 'wanb6_m1_w2'

config globals 'globals'
	option mmx_mask '0x3F00'

config interface 'wan'
	option enabled '1'
	option family 'ipv4'
	option reliability '2'
	option initial_state 'online'
	list track_ip '4.2.2.1'
	list track_ip '4.2.2.2'
	list track_ip '4.2.2.3'
	list track_ip '4.2.2.4'
	option track_method 'ping'
	option count '1'
	option size '56'
	option max_ttl '60'
	option check_quality '0'
	option timeout '5'
	option interval '10'
	option failure_interval '5'
	option recovery_interval '5'
	option down '5'
	option up '5'
	list flush_conntrack 'ifup'

config interface 'wanb'
	option family 'ipv4'
	option reliability '1'
	option enabled '1'
	option initial_state 'online'
	option track_method 'ping'
	option count '1'
	option size '56'
	option max_ttl '60'
	option check_quality '0'
	option timeout '5'
	option interval '10'
	option failure_interval '5'
	option recovery_interval '5'
	option down '5'
	option up '5'
	list track_ip '4.2.2.1'
	list track_ip '4.2.2.2'
	list track_ip '4.2.2.3'
	list track_ip '4.2.2.4'
	list flush_conntrack 'ifup'

config member 'wan_m1_w3'
	option interface 'wan'
	option metric '1'
	option weight '3'

config member 'wan_m2_w3'
	option interface 'wan'
	option metric '2'
	option weight '3'

config member 'wanb_m1_w2'
	option interface 'wanb'
	option metric '1'
	option weight '2'

config member 'wanb_m2_w2'
	option interface 'wanb'
	option metric '2'
	option weight '2'

config rule 'https'
	option sticky '1'
	option dest_port '443'
	option proto 'tcp'
	option use_policy 'wan_wanb'

config rule 'default_rule_v4'
	option dest_ip '0.0.0.0/0'
	option family 'ipv4'
	option proto 'all'
	option sticky '0'
	option use_policy 'wan_wanb'

Interface status:
 interface wan is online 00h:08m:08s, uptime 00h:08m:08s and tracking is active
 interface wanb is online 00h:08m:07s, uptime 00h:08m:08s and tracking is active

Current ipv4 policies:
balanced:
 wanb (40%)
 wan (60%)
wan_only:
 wan (100%)
wan_wanb:
 wan (100%)
wanb_only:
 wanb (100%)
wanb_wan:
 wanb (100%)

Current ipv6 policies:
balanced:
 unreachable
wan_only:
 unreachable
wan_wanb:
 unreachable
wanb_only:
 unreachable
wanb_wan:
 unreachable

Directly connected ipv4 networks:
224.0.0.0/3
189.31.90.0/23
127.0.0.0/8
192.168.1.0/24
71.78.219.130
192.168.15.0/24
10.13.128.0/24

Directly connected ipv6 networks:

Active ipv4 user rules:
  217 17279 S https  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            multiport dports 443 
  583 65173 - wan_wanb  all  --  *      *       0.0.0.0/0            0.0.0.0/0            

Active ipv6 user rules:
    0     0 S https  tcp      *      *       ::/0                 ::/0                 multiport dports 443 

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP group default qlen 1000
    inet 192.168.15.8/24 brd 192.168.15.255 scope global eth1
       valid_lft forever preferred_lft forever
8: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP group default qlen 1000
    inet 192.168.1.1/24 brd 192.168.1.255 scope global br-lan
       valid_lft forever preferred_lft forever
10: eth5.210@eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc cake state UP group default qlen 1000
    inet 189.31.91.106/23 brd 189.31.91.255 scope global eth5.210
       valid_lft forever preferred_lft forever
28: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.13.128.97/24 brd 10.13.128.255 scope global wg0
       valid_lft forever preferred_lft forever
default via 192.168.15.1 dev eth1 table 1 proto static src 192.168.15.8 metric 10 
10.13.128.0/24 dev wg0 table 1 proto kernel scope link src 10.13.128.97 
71.78.219.130 via 192.168.15.1 dev eth1 table 1 proto static metric 10 
192.168.1.0/24 dev br-lan table 1 proto kernel scope link src 192.168.1.1 
192.168.15.0/24 dev eth1 table 1 proto static scope link metric 10 
default via 189.31.90.1 dev eth5.210 table 2 proto static src 189.31.91.106 metric 20 
10.13.128.0/24 dev wg0 table 2 proto kernel scope link src 10.13.128.97 
189.31.90.0/23 dev eth5.210 table 2 proto static scope link metric 20 
192.168.1.0/24 dev br-lan table 2 proto kernel scope link src 192.168.1.1 
default via 192.168.15.1 dev eth1 table wan 
192.168.1.0/24 dev br-lan table wan proto kernel scope link src 192.168.1.1 
default via 189.31.90.1 dev eth5.210 table wanb 
192.168.1.0/24 dev br-lan table wanb proto kernel scope link src 192.168.1.1 
default via 10.13.128.97 dev wg0 table wg0 
192.168.1.0/24 dev br-lan table wg0 proto kernel scope link src 192.168.1.1 
default via 192.168.15.1 dev eth1 proto static src 192.168.15.8 metric 10 
default via 189.31.90.1 dev eth5.210 proto static src 189.31.91.106 metric 20 
10.13.128.0/24 dev wg0 proto kernel scope link src 10.13.128.97 
71.78.219.130 via 192.168.15.1 dev eth1 proto static metric 10 
189.31.90.0/23 dev eth5.210 proto static scope link metric 20 
192.168.1.0/24 dev br-lan proto kernel scope link src 192.168.1.1 
192.168.15.0/24 dev eth1 proto static scope link metric 10 
broadcast 10.13.128.0 dev wg0 table local proto kernel scope link src 10.13.128.97 
local 10.13.128.97 dev wg0 table local proto kernel scope host src 10.13.128.97 
broadcast 10.13.128.255 dev wg0 table local proto kernel scope link src 10.13.128.97 
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
broadcast 189.31.90.0 dev eth5.210 table local proto kernel scope link src 189.31.91.106 
local 189.31.91.106 dev eth5.210 table local proto kernel scope host src 189.31.91.106 
broadcast 189.31.91.255 dev eth5.210 table local proto kernel scope link src 189.31.91.106 
broadcast 192.168.1.0 dev br-lan table local proto kernel scope link src 192.168.1.1 
local 192.168.1.1 dev br-lan table local proto kernel scope host src 192.168.1.1 
broadcast 192.168.1.255 dev br-lan table local proto kernel scope link src 192.168.1.1 
broadcast 192.168.15.0 dev eth1 table local proto kernel scope link src 192.168.15.8 
local 192.168.15.8 dev eth1 table local proto kernel scope host src 192.168.15.8 
broadcast 192.168.15.255 dev eth1 table local proto kernel scope link src 192.168.15.8 
0:	from all lookup local
998:	from all fwmark 0x30000/0xff0000 lookup wg0
999:	from all fwmark 0x20000/0xff0000 lookup wanb
1000:	from all fwmark 0x10000/0xff0000 lookup wan
1001:	from all iif eth1 lookup 1
1002:	from all iif eth5.210 lookup 2
2001:	from all fwmark 0x100/0x3f00 lookup 1
2002:	from all fwmark 0x200/0x3f00 lookup 2
2061:	from all fwmark 0x3d00/0x3f00 blackhole
2062:	from all fwmark 0x3e00/0x3f00 unreachable
3001:	from all fwmark 0x100/0x3f00 unreachable
3002:	from all fwmark 0x200/0x3f00 unreachable
32766:	from all lookup main
32767:	from all lookup default

That's not hijacking. That's another data point. Thank you.

Found the rogue DHCP server and fixed the problem. Now OpenWrt seems to be booting properly without the force option. I tested it several times and it consistently boots and assigns proper addresses. Thank you.

@feckert @aaronjg
There seems to be an issue where some pings are lost at first, after which everything works, when mwan3 is running. Is this something you are aware of? Should a ticket be opened?

Should we ping them again?

Better open a ticket and link it here.

You mean open a ticket on GitHub?