Multi WAN packet loss on 21.02 (all good on 19.07)

I have always had my routers configured with multiple WANs: at a minimum, an xDSL line and an NCM modem used as backup. I also have configurations with WANs on Ethernet. In other words, regardless of the hardware used for the WAN interfaces, on 21.02 with two WANs, one of the two experiences packet loss when both are up. If I take down the main one (say, the pppoe interface), the other interface goes back to working 100%. I have tried different router models (mainly WRT32x and BT Home Hub 5) and I always get the same result.

Please see the issue I posted on the OpenWrt GitHub tracker here:

The same configuration works flawlessly in 19.07. I periodically try new 21.02 updates as they become available, but to date this issue remains unresolved.

I am hoping that someone could confirm this issue and help me resolve it. Is it a bug or a configuration issue? To me, it sounds like a bug introduced at the same time as the new switch configuration. Before that feature was introduced in 21.02, this release was working fairly well.

Are the packets sent and not coming back or not sent at all?

I test using ping and get between 50% and 80% packet loss; please follow the link to the OpenWrt issue I opened for an example. In some cases I also get 100% loss.

I already followed your link. That doesn't answer my question though. Did you verify with tcpdump that packets go out of the interface and don't return or are they spread over different wan interfaces?
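
For reference, a minimal sketch of such a check. The helper name `capture_icmp`, the `DRY_RUN` switch and the device names in the usage note are made up for illustration; only the standard tcpdump options `-n`, `-i` and `-c` and a plain `icmp and host` filter are used.

```shell
#!/bin/sh
# Capture ICMP traffic to/from one host on one device.
# Run one copy per WAN device (in separate shells) while a ping is running,
# then compare where the echo requests leave and where the replies arrive.
# Set DRY_RUN=1 to only print the command instead of running it.
capture_icmp() {
    dev="$1"
    host="$2"
    # -n: no name resolution, -i: device to listen on, -c: stop after 20 packets
    set -- tcpdump -n -i "$dev" -c 20 "icmp and host $host"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `capture_icmp pppoe-wan 8.8.8.8` in one shell and `capture_icmp eth1 8.8.8.8` in another, with `pppoe-wan` and `eth1` standing in for whatever your two WAN devices are called. If the requests show up on one device and nothing comes back on either, that points to a different problem than requests being spread across both.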

Nope, I just used ping and then I had to revert to 19.07. I cannot have downtime on my WANs or the router to investigate further. I would hope that some of the developers would have a test environment to reproduce the issue.

I think that the switch configuration in 21.02 is seriously messed up because when it works it is just by chance.

If it were that seriously messed up, there would be numerous complaints. However, we are already on the third release of 21.02 and I haven't seen any. Also, you can use the stable version instead of the snapshot.
However, without any troubleshooting on your part, and if the problem cannot be replicated and is not affecting anyone else, it will be difficult to get much attention.

Could you just try to set up a configuration with 2 WAN interfaces as in my example? Is it a configuration problem? Could it be that you do not see this reported because not many people use 2 WANs? I have 3 different sites and all have the same issue with 2 WANs. Whether the first is pppoe and the second an NCM modem, or both are Ethernet, or one is Ethernet and one pppoe, the result does not change: one of the 2 interfaces, typically the second one brought up, loses packets on ping.

I would be grateful if you could look into the configuration I posted and try to reproduce the issue; perhaps it is a configuration problem. In my example there are 2 Ethernet WANs, but any interface type gives the same result.

No need to try, I am already running a site with 5 wans.

root@whale:[~]#mwan3 status
Interface status:
 interface wwan is online 41h:49m:31s, uptime 53h:37m:41s and tracking is active
 interface wwan1 is online 16h:11m:09s, uptime 16h:11m:12s and tracking is active
 interface wwan2 is online 33h:42m:18s, uptime 33h:42m:21s and tracking is active
 interface wwan3 is online 41h:22m:22s, uptime 41h:22m:25s and tracking is active
 interface wwan4 is offline and tracking is paused

I took the last one out for testing.

Have you tried to ping a known IP (like 8.8.8.8) from each interface using
ping -c 10 -I wwan 8.8.8.8
for example to see if you lose packets?
Please could you post your configuration? What router hardware do you have?
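
To make that check repeatable, here is a rough sketch that runs the same ping through each interface and prints only the loss figure. It assumes the BusyBox ping summary line format ("N packets transmitted, M packets received, X% packet loss"); the function names `loss_pct` and `ping_each` are made up for this example.

```shell
#!/bin/sh
# Read ping output on stdin and print the integer loss percentage taken
# from the summary line, e.g. "... 7 packets received, 36% packet loss".
loss_pct() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Ping one target through each interface given on the command line
# and print one result line per interface.
ping_each() {
    target="$1"
    shift
    for ifname in "$@"; do
        pct=$(ping -c 10 -I "$ifname" "$target" 2>/dev/null | loss_pct)
        if [ -n "$pct" ]; then
            echo "$ifname: ${pct}% loss"
        else
            echo "$ifname: no ping summary (interface down or name wrong?)"
        fi
    done
}
```

Usage would be something like `ping_each 8.8.8.8 wwan0 wwan1`, replacing the names with your own WAN devices. Running it once with both WANs up and once with the main WAN down should show whether the loss really follows the second interface.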

root@whale:[~]#ubus call system board
{
        "kernel": "5.4.179",
        "hostname": "whale",
        "system": "ARMv8 Processor rev 3",
        "model": "Raspberry Pi 4 Model B Rev 1.5",
        "board_name": "raspberrypi,4-model-b",
        "release": {
                "distribution": "OpenWrt",
                "version": "21.02.2",
                "revision": "r16495-bf0c965af0",
                "target": "bcm27xx/bcm2711",
                "description": "OpenWrt 21.02.2 r16495-bf0c965af0"
        }
}
root@whale:[~]#uci export network
package network

config interface 'loopback'
        option device 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fd37:6613:cf79::/48'
        option packet_steering '1'

config device
        option name 'br-lan'
        option type 'bridge'
        list ports 'eth0'

config interface 'lan'
        option proto 'static'
        option ipaddr '192.168.199.1'
        option netmask '255.255.255.0'
        option device 'eth0'
        option defaultroute '0'

config interface 'wan'
        option proto 'dhcp'
        option dns_metric '10'
        option metric '10'
        option delegate '0'
        option device 'eth0.10'

config interface 'wwan'
        option proto 'dhcp'
        option dns_metric '20'
        option metric '20'
        option delegate '0'

config interface 'wwan0'
        option proto 'modemmanager'
        option device '/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.4'
        option apn 'internet'
        option pincode '0000'
        option dns_metric '110'
        option metric '110'
        option delegate '0'
        option auth 'both'
        option signalrate '5'
        option iptype 'ipv4'

config interface 'wwan1'
        option proto 'modemmanager'
        option apn 'internet'
        option pincode '0000'
        option auth 'none'
        option iptype 'ipv4'
        option signalrate '5'
        option dns_metric '130'
        option metric '130'
        option delegate '0'
        option device '/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.2'

config interface 'wwan2'
        option proto 'modemmanager'
        option apn 'internet'
        option pincode '0000'
        option dns_metric '140'
        option metric '140'
        option delegate '0'
        option device '/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.1'
        option auth 'none'
        option iptype 'ipv4'
        option signalrate '5'

config interface 'wwan3'
        option proto 'modemmanager'
        option pincode '0000'
        option iptype 'ipv4'
        option signalrate '5'
        option delegate '0'
        option auth 'none'
        option dns_metric '120'
        option metric '120'
        option device '/sys/devices/platform/scb/fd500000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0/usb1/1-1/1-1.3'

root@whale:[~]#ping -I wwan0 -c 10 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=115 time=89.252 ms
64 bytes from 8.8.8.8: seq=1 ttl=115 time=86.835 ms
64 bytes from 8.8.8.8: seq=2 ttl=115 time=74.571 ms
64 bytes from 8.8.8.8: seq=3 ttl=115 time=76.418 ms
64 bytes from 8.8.8.8: seq=4 ttl=115 time=76.166 ms
64 bytes from 8.8.8.8: seq=5 ttl=115 time=74.951 ms
64 bytes from 8.8.8.8: seq=6 ttl=115 time=74.778 ms
64 bytes from 8.8.8.8: seq=7 ttl=115 time=75.430 ms
64 bytes from 8.8.8.8: seq=8 ttl=115 time=76.268 ms
64 bytes from 8.8.8.8: seq=9 ttl=115 time=74.991 ms

--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 74.571/77.966/89.252 ms
root@whale:[~]#ping -I wwan1 -c 10 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=117 time=310.541 ms
64 bytes from 8.8.8.8: seq=1 ttl=117 time=62.089 ms
64 bytes from 8.8.8.8: seq=2 ttl=117 time=62.918 ms
64 bytes from 8.8.8.8: seq=3 ttl=117 time=61.611 ms
64 bytes from 8.8.8.8: seq=4 ttl=117 time=62.366 ms
64 bytes from 8.8.8.8: seq=5 ttl=117 time=71.036 ms
64 bytes from 8.8.8.8: seq=6 ttl=117 time=62.872 ms
64 bytes from 8.8.8.8: seq=7 ttl=117 time=70.591 ms
64 bytes from 8.8.8.8: seq=8 ttl=117 time=62.375 ms
64 bytes from 8.8.8.8: seq=9 ttl=117 time=61.205 ms

--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 61.205/88.760/310.541 ms
root@whale:[~]#ping -I wwan2 -c 10 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=57 time=149.262 ms
64 bytes from 8.8.8.8: seq=1 ttl=57 time=63.985 ms
64 bytes from 8.8.8.8: seq=2 ttl=57 time=62.829 ms
64 bytes from 8.8.8.8: seq=3 ttl=57 time=60.601 ms
64 bytes from 8.8.8.8: seq=4 ttl=57 time=60.420 ms
64 bytes from 8.8.8.8: seq=5 ttl=57 time=61.105 ms
64 bytes from 8.8.8.8: seq=6 ttl=57 time=60.925 ms
64 bytes from 8.8.8.8: seq=7 ttl=57 time=60.773 ms
64 bytes from 8.8.8.8: seq=8 ttl=57 time=61.491 ms
64 bytes from 8.8.8.8: seq=9 ttl=57 time=60.282 ms

--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 60.282/70.167/149.262 ms
root@whale:[~]#ping -I wlan0 -c 10 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=58 time=7.751 ms
64 bytes from 8.8.8.8: seq=1 ttl=58 time=13.038 ms
64 bytes from 8.8.8.8: seq=2 ttl=58 time=10.811 ms
64 bytes from 8.8.8.8: seq=3 ttl=58 time=12.026 ms
64 bytes from 8.8.8.8: seq=4 ttl=58 time=10.999 ms
64 bytes from 8.8.8.8: seq=5 ttl=58 time=10.897 ms
64 bytes from 8.8.8.8: seq=6 ttl=58 time=12.298 ms
64 bytes from 8.8.8.8: seq=7 ttl=58 time=13.101 ms
64 bytes from 8.8.8.8: seq=8 ttl=58 time=11.195 ms
64 bytes from 8.8.8.8: seq=9 ttl=58 time=10.965 ms

--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 7.751/11.308/13.101 ms

It must be a hardware-specific issue then.

We observed a similar scenario with mwan3 v2.10.13-1 on RB750Gr3 devices running OpenWrt 21.02 across various locations. The common pattern was the loss of at least the first 3 pings on the secondary WAN interface before replies started coming through.

Sample ping:

root@DEVICEID:~# ping -I lan3 9.9.9.9
PING 9.9.9.9 (9.9.9.9): 56 data bytes
64 bytes from 9.9.9.9: seq=4 ttl=56 time=51.615 ms
64 bytes from 9.9.9.9: seq=5 ttl=56 time=51.191 ms
64 bytes from 9.9.9.9: seq=6 ttl=56 time=50.650 ms
64 bytes from 9.9.9.9: seq=7 ttl=56 time=50.852 ms
64 bytes from 9.9.9.9: seq=8 ttl=56 time=51.223 ms
64 bytes from 9.9.9.9: seq=9 ttl=56 time=50.569 ms
64 bytes from 9.9.9.9: seq=10 ttl=56 time=50.973 ms
^C
--- 9.9.9.9 ping statistics ---
11 packets transmitted, 7 packets received, 36% packet loss
round-trip min/avg/max = 50.569/51.010/51.615 ms

tcpdump output observed at the same time:

10:29:15.621915 ARP, Request who-has 9.9.9.9 tell 192.168.1.33, length 28
10:29:16.624858 ARP, Request who-has 9.9.9.9 tell 192.168.1.33, length 28
10:29:17.648771 ARP, Request who-has 9.9.9.9 tell 192.168.1.33, length 28
10:29:19.636860 IP 192.168.1.33 > 9.9.9.9: ICMP echo request, id 30037, seq 4, length 64
10:29:19.688123 IP 9.9.9.9 > 192.168.1.33: ICMP echo reply, id 30037, seq 4, length 64
10:29:20.095243 IP 192.168.1.33 > 8.8.8.8: ICMP echo request, id 30067, seq 0, length 64
10:29:20.137324 IP 8.8.8.8 > 192.168.1.33: ICMP echo reply, id 30067, seq 0, length 64
10:29:20.640865 IP 192.168.1.33 > 9.9.9.9: ICMP echo request, id 30037, seq 5, length 64
10:29:20.691728 IP 9.9.9.9 > 192.168.1.33: ICMP echo reply, id 30037, seq 5, length 64
10:29:21.644869 IP 192.168.1.33 > 9.9.9.9: ICMP echo request, id 30037, seq 6, length 64
10:29:21.695165 IP 9.9.9.9 > 192.168.1.33: ICMP echo reply, id 30037, seq 6, length 64
10:29:22.636926 IP 192.168.1.33 > 101.53.132.190: ICMP echo request, id 30057, seq 5, length 64
10:29:22.648901 IP 192.168.1.33 > 9.9.9.9: ICMP echo request, id 30037, seq 7, length 64
10:29:22.699368 IP 9.9.9.9 > 192.168.1.33: ICMP echo reply, id 30037, seq 7, length 64
10:29:23.640935 IP 192.168.1.33 > 101.53.132.190: ICMP echo request, id 30057, seq 6, length 64
10:29:23.652895 IP 192.168.1.33 > 9.9.9.9: ICMP echo request, id 30037, seq 8, length 64
10:29:23.703731 IP 9.9.9.9 > 192.168.1.33: ICMP echo reply, id 30037, seq 8, length 64
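
When a capture like this gets longer, pairing requests with replies by eye becomes tedious. Below is a small sketch that lists echo requests with no matching reply in the same capture. It assumes the default one-line tcpdump text format shown above; the function name `unanswered` is made up for this example.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and print every ICMP echo request
# (peer address, id, seq) for which no echo reply appears in the input.
unanswered() {
    awk '
    $6 == "ICMP" && $7 == "echo" {
        id = $10;  sub(/,$/, "", id)     # "30037," -> "30037"
        seq = $12; sub(/,$/, "", seq)    # "4,"     -> "4"
        if ($8 == "request,") {
            peer = $5; sub(/:$/, "", peer)   # destination of the request
            key = peer " id " id " seq " seq
            pending[key] = 1
            order[++n] = key
        } else if ($8 == "reply,") {
            peer = $3                        # source of the reply
            delete pending[peer " id " id " seq " seq]
        }
    }
    END {
        for (i = 1; i <= n; i++)
            if (order[i] in pending) print order[i]
    }'
}
```

Fed the capture above (for example `unanswered < capture.txt`, with the file name being a placeholder), it reports the two requests to 101.53.132.190 (id 30057, seq 5 and 6). Keep in mind that "unanswered" only means the reply is not in this capture: it may have arrived on another interface or after the capture stopped.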

As a temporary fix, reverting mwan3 to v2.8.16 (the version from OpenWrt 19.x) resolved the issue.