Mwan3 dropping packets on lowest-metric interface

Here's the situation:

  • I've been following the guide: https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3
  • I've configured 2x WAN interfaces:
      • eth0.2, which is connected via Ethernet to an ADSL router.
      • wlan1, which is connected via Wi-Fi to a 4G router.
  • I've verified both WAN interfaces are working.
  • I then installed mwan3 and configured a single rule:
      • 0.0.0.0/0, which routes to [one of either eth0.2 or wlan1]
  • When I set the rule to use eth0.2 everything works perfectly.
  • When I set the rule to use wlan1 things mostly work, however I seem to be getting partial responses...

I've verified that both WAN interfaces are working from the OpenWRT router:

root@GL-AR750S:~# ping -c 1 -I eth0.2 www.google.com
1 packets transmitted, 1 packets received, 0% packet loss

root@GL-AR750S:~# ping -c 1 -I wlan1 www.google.com
1 packets transmitted, 1 packets received, 0% packet loss

root@GL-AR750S:~# curl ifconfig.co --interface eth0.2
86.xx.xx.xx

root@GL-AR750S:~# curl ifconfig.co --interface wlan1
188.xx.xx.xx

I've verified that my test scenario (explained further down) works as expected from the OpenWRT router (note the byte response size is consistent):

root@GL-AR750S:~# curl --interface wlan1 'https://openwrt.org/lib/exe/css.php' 2>/dev/null | wc
        0      6026    232760
root@GL-AR750S:~# curl --interface wlan1 'https://openwrt.org/lib/exe/css.php' 2>/dev/null | wc
        0      6026    232760
root@GL-AR750S:~# curl --interface eth0.2 'https://openwrt.org/lib/exe/css.php' 2>/dev/null | wc
        0      6026    232760
root@GL-AR750S:~# curl --interface eth0.2 'https://openwrt.org/lib/exe/css.php' 2>/dev/null | wc
        0      6026    232760

When I configure mwan3 to forward everything to eth0.2, from a client device everything consistently works great:

➜  ~ curl ifconfig.co
86.xx.xx.xx
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' | wc
      0    6026  232760
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' | wc
      0    6026  232760
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' | wc
      0    6026  232760

Now, when I configure mwan3 to forward everything to wlan1, from a client device I seem to be losing data. Specifically, requests for anything over ~100KB seem to break. The curl commands from above demonstrate the behaviour quite nicely:

➜  ~ curl ifconfig.co
188.xx.xx.xx
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' -m 3 2>/dev/null | wc
      0    2598  147082
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' -m 3 2>/dev/null | wc
      0    2598  147081
➜  ~ curl 'https://openwrt.org/lib/exe/css.php' -m 3 2>/dev/null | wc
      0    2185  130715

Bear in mind the payload should be 232k bytes. All of the curls receive roughly 130k-147k bytes and then hang. I added -m 3 to tell curl to exit after 3 seconds, to show how many bytes arrived before things went wrong. If I leave curl running, the remaining bytes never arrive. It's worth noting that I get a different number of bytes each time.

I've recorded the network activity specifically for one of these curl commands, but I can't seem to upload it here. Here's a screenshot:

I don't suppose anyone has any idea of what I'm doing wrong? Or any tips for debugging further?

Seems to me like an MTU/MSS issue.
Can you try pinging from the router or some host with fragmentation disabled, increasing the payload size step by step, and check at what size the replies stop?
Check also this guide:
https://kb.netgear.com/19863/Ping-Test-to-determine-Optimal-MTU-Size-on-Router
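
A rough sketch of that test as a script, in case it saves some typing. It assumes a ping that supports -M do (iputils; the BusyBox ping shipped with OpenWrt may not), and the function names and the 0-1472 search range are just illustrative choices. It binary-searches for the largest payload that still gets a reply with the don't-fragment bit set:

```shell
#!/bin/sh
# Sketch only: assumes iputils ping (-M do sets the don't-fragment bit).

# probe SIZE HOST: succeeds if one DF ping with SIZE payload bytes is answered.
probe() {
    ping -c 1 -W 2 -M do -s "$1" "$2" >/dev/null 2>&1
}

# max_payload HOST [LOW] [HIGH]: binary search for the largest payload size
# in LOW..HIGH that probe accepts. Assumes LOW itself passes.
max_payload() {
    host=$1
    lo=${2:-0}
    hi=${3:-1472}
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if probe "$mid" "$host"; then
            lo=$mid              # mid passed: answer is mid or larger
        else
            hi=$(( mid - 1 ))    # mid failed: answer is below mid
        fi
    done
    echo "$lo"
}

# Example (run once per WAN, e.g. with mwan3 pointed at each in turn):
#   max_payload www.google.com
```

Adding 28 (20-byte IPv4 header plus 8-byte ICMP header) to the result gives the usable path MTU.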

All of these tests were taken on a client device, routing as follows...

eth0.2:

  • [client] >>ethernet>> [openWrt router] >>ethernet>> [adsl-modem] >>pppoe>> ... [openwrt.org]

wlan1:

  • [client] >>ethernet>> [openWrt router] >>802.11n>> [4g-modem] >>4g>> ... [openwrt.org]

In Wireshark on the client:

  • Routing via wlan1, packets max out at 1414 bytes on the wire. (This is the route that's not working from the client, but works from the openWrt router)
  • Routing via eth0.2, packets max out at 1508 bytes on the wire.

According to ping on the client, the max payload size is 1472 bytes (1500 bytes including the IP and ICMP headers), and is identical for both interfaces.
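
For anyone checking those numbers, here is the header arithmetic as a small sketch. It assumes IPv4 over untagged Ethernet II, a capture that excludes the frame check sequence, and TCP segments without options; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes IPv4, untagged Ethernet II, no TCP options.

# ICMP echo payload -> IP packet size (20-byte IPv4 + 8-byte ICMP header).
ping_payload_to_ip() { echo $(( $1 + 20 + 8 )); }

# Captured frame length -> IP packet size (strip 14-byte Ethernet header).
frame_to_ip() { echo $(( $1 - 14 )); }

# IP packet size -> TCP payload per segment (20-byte IPv4 + 20-byte TCP header).
ip_to_mss() { echo $(( $1 - 20 - 20 )); }

ping_payload_to_ip 1472   # prints 1500
frame_to_ip 1414          # prints 1400
ip_to_mss 1400            # prints 1360
```

Under those assumptions, the 1414-byte frames seen via wlan1 correspond to 1400-byte IP packets, even though ping reports a 1500-byte path.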

What's interesting is that if I disable eth0.2 things improve on the client to the point where everything feels like it's working, but Wireshark still reports duplicate acknowledgements after the connection has completed. I'll see if I can get a tcpdump running on the OpenWrt router to check everything's definitely working from there.


Ok, I've narrowed down the problem. It's definitely not specific to any WAN interface. As per the docs for mwan3:

Step 1: Configure a different metric for each WAN interface
IMPORTANT: This is an important step and is compulsory. Time and time again, people fail to configure this and end up with a non-working setup.
  • You must configure each WAN interface with a different routing metric. This metric will only have an effect on the default routing table, not on the mwan3 routing tables.
  • The default (primary) WAN interface should have the lowest metric (e.g. 10) and each additional WAN interface a higher metric (e.g. 20, 30, etc.). Values are not important, but should always be unique.
  • Every WAN interface should have a default gateway configured.

Each WAN interface does indeed have a routing metric...

Right now, eth0.2 is 10 and wlan1 is 20. Forwarding all traffic to eth0.2 is fine, doing the same for wlan1 causes packet issues. If I swap the metrics around, forwarding all traffic to eth0.2 causes packet issues and wlan1 is fine.

Could you post the configs plus some troubleshooting?

ip -4 addr; ip -4 ro ls table all; ip -4 ru; uci show network; uci show mwan3

Absolutely :)

root@GL-AR750S:~# ip -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
6: br-lan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.9.8.254/24 brd 10.9.8.255 scope global br-lan
       valid_lft forever preferred_lft forever
10: wlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.8.102/24 brd 192.168.8.255 scope global wlan1
       valid_lft forever preferred_lft forever
13: eth0.2@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 192.168.1.114/24 brd 192.168.1.255 scope global eth0.2
       valid_lft forever preferred_lft forever
root@GL-AR750S:~# ip -4 ro ls table all
default via 192.168.8.1 dev wlan1 table 1 
default via 192.168.1.254 dev eth0.2 table 2 
default via 192.168.8.1 dev wlan1 proto static src 192.168.8.102 metric 20 
default via 192.168.1.254 dev eth0.2 proto static src 192.168.1.114 metric 30 
10.9.8.0/24 dev br-lan proto static scope link metric 10 
192.168.1.0/24 dev eth0.2 proto static scope link metric 30 
192.168.8.0/24 dev wlan1 proto static scope link metric 20 
broadcast 10.9.8.0 dev br-lan table local proto kernel scope link src 10.9.8.254 
local 10.9.8.254 dev br-lan table local proto kernel scope host src 10.9.8.254 
broadcast 10.9.8.255 dev br-lan table local proto kernel scope link src 10.9.8.254 
broadcast 127.0.0.0 dev lo table local proto kernel scope link src 127.0.0.1 
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1 
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1 
broadcast 127.255.255.255 dev lo table local proto kernel scope link src 127.0.0.1 
broadcast 192.168.1.0 dev eth0.2 table local proto kernel scope link src 192.168.1.114 
local 192.168.1.114 dev eth0.2 table local proto kernel scope host src 192.168.1.114 
broadcast 192.168.1.255 dev eth0.2 table local proto kernel scope link src 192.168.1.114 
broadcast 192.168.8.0 dev wlan1 table local proto kernel scope link src 192.168.8.102 
local 192.168.8.102 dev wlan1 table local proto kernel scope host src 192.168.8.102 
broadcast 192.168.8.255 dev wlan1 table local proto kernel scope link src 192.168.8.102 
root@GL-AR750S:~# ip -4 ru
0:	from all lookup local 
1001:	from all iif wlan1 lookup main 
1002:	from all iif eth0.2 lookup main 
2001:	from all fwmark 0x100/0x3f00 lookup 1 
2002:	from all fwmark 0x200/0x3f00 lookup 2 
2061:	from all fwmark 0x3d00/0x3f00 blackhole
2062:	from all fwmark 0x3e00/0x3f00 unreachable
32766:	from all lookup main 
32767:	from all lookup default 
root@GL-AR750S:~# uci show network
network.loopback=interface
network.loopback.ifname='lo'
network.loopback.proto='static'
network.loopback.ipaddr='127.0.0.1'
network.loopback.netmask='255.0.0.0'
network.globals=globals
network.globals.ula_prefix='fd69:1cb9:8c6c::/48'
network.lan=interface
network.lan.proto='static'
network.lan.netmask='255.255.255.0'
network.lan.ip6assign='60'
network.lan.hostname='GL-AR750S-c8c'
network.lan.ipaddr='10.9.8.254'
network.lan.ifname='eth0.1'
network.lan.metric='10'
network.lan.type='bridge'
network.wan=interface
network.wan.ifname='eth0.2'
network.wan.proto='dhcp'
network.wan.hostname='GL-AR750S-c8c'
network.wan.metric='30'
network.@switch[0]=switch
network.@switch[0].name='switch0'
network.@switch[0].reset='1'
network.@switch[0].enable_vlan='1'
network.@switch_vlan[0]=switch_vlan
network.@switch_vlan[0].device='switch0'
network.@switch_vlan[0].vlan='1'
network.@switch_vlan[0].ports='2 3 0t'
network.@switch_vlan[1]=switch_vlan
network.@switch_vlan[1].device='switch0'
network.@switch_vlan[1].vlan='2'
network.@switch_vlan[1].ports='1 0t'
network.wan_4g=interface
network.wan_4g.proto='dhcp'
network.wan_4g.ifname='wlan1'
network.wan_4g.metric='20'
root@GL-AR750S:~# uci show mwan3
mwan3.globals=globals
mwan3.globals.enabled='1'
mwan3.globals.mmx_mask='0x3F00'
mwan3.adsl=policy
mwan3.adsl.last_resort='default'
mwan3.adsl.use_member='adsl_only'
mwan3.4g=policy
mwan3.4g.last_resort='default'
mwan3.4g.use_member='4g_only'
mwan3.catchall_adsl=rule
mwan3.catchall_adsl.use_policy='adsl'
mwan3.catchall_adsl.proto='all'
mwan3.catchall_adsl.sticky='0'
mwan3.wan_4g=interface
mwan3.wan_4g.enabled='1'
mwan3.wan_4g.initial_state='online'
mwan3.wan_4g.family='ipv4'
mwan3.wan_4g.track_method='ping'
mwan3.wan_4g.reliability='1'
mwan3.wan_4g.count='1'
mwan3.wan_4g.size='56'
mwan3.wan_4g.check_quality='0'
mwan3.wan_4g.timeout='2'
mwan3.wan_4g.interval='5'
mwan3.wan_4g.failure_interval='5'
mwan3.wan_4g.recovery_interval='5'
mwan3.wan_4g.down='3'
mwan3.wan_4g.up='3'
mwan3.wan_4g.flush_conntrack='never'
mwan3.wan=interface
mwan3.wan.enabled='1'
mwan3.wan.initial_state='online'
mwan3.wan.family='ipv4'
mwan3.wan.track_method='ping'
mwan3.wan.reliability='1'
mwan3.wan.count='1'
mwan3.wan.size='56'
mwan3.wan.check_quality='0'
mwan3.wan.timeout='2'
mwan3.wan.interval='5'
mwan3.wan.failure_interval='5'
mwan3.wan.recovery_interval='5'
mwan3.wan.down='3'
mwan3.wan.up='3'
mwan3.wan.flush_conntrack='never'
mwan3.adsl_only=member
mwan3.adsl_only.interface='wan'
mwan3.adsl_only.metric='1'
mwan3.adsl_only.weight='2'
mwan3.4g_only=member
mwan3.4g_only.interface='wan_4g'
mwan3.4g_only.metric='1'
mwan3.4g_only.weight='1'

You are missing the track_ip options in the interface sections of the mwan3 config. They are supposed to be required, and I am not sure how it let you start the service without them. Also raise the timeout value to something larger than 2, otherwise links might be falsely detected as down.
You have only one rule there, sending everything out via adsl only, so am I right to presume that when you test the 4G you change this rule to mwan3.catchall_adsl.use_policy='4g'?

Fix these things first. I suggest following the example config in the documentation page and sticking to the values it uses for each option. After you have it working fine, you can modify what you need.
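
As a rough sketch of what that could look like in /etc/config/mwan3 for the 'wan' interface: the two track_ip addresses are placeholder assumptions (public DNS resolvers, not taken from this thread), and the timeout of 4 is just an example of a value larger than 2. The wan_4g section would need the same treatment.

```
config interface 'wan'
	option enabled '1'
	option family 'ipv4'
	# Assumed tracking targets; use hosts that are reachable via this WAN.
	list track_ip '8.8.8.8'
	list track_ip '1.1.1.1'
	option track_method 'ping'
	option reliability '1'
	# Example value only, raised from the current 2 seconds.
	option timeout '4'
```

The service needs a restart after the change for the new tracking settings to take effect.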

I'd disabled the liveness checks in the web UI, which is why the config isn't there. Yep, I was swapping the catchall route between '4g' and 'adsl'.

I've tried all sorts to fix this, and I'm going to call it a day for now. I might try again in the coming weeks/months. For now I'll connect different devices to different networks.

Thanks for taking the time to help @trendy, really appreciated.
