Traffic on one interface leads to packet loss, link loss, bad rx status crc error on other interface

Device problem occurs on

  • Turris Omnia 2020

Software versions of OpenWrt/LEDE release, packages, etc.

  • OpenWrt 19.07.7 a5672f6b96f393145070ad17c8eb1d15ef49ad2e

Problem description

The network setup is multihomed: the onboard WAN port (eth2) and a Huawei E3372 USB modem (eth3) are both configured, each with its own gateway and route metric.

As soon as any traffic originating from eth3 goes beyond its gateway (!), I observe packet loss, link loss and "bad rx status (crc error)" messages on eth2.

Configuration

Onboard WAN device eth2:

# dmesg | grep eth2
[    4.473584] mvneta f1034000.ethernet eth2: Using hardware mac address d8:58:d7:01:14:fc
[   17.582029] mvneta f1034000.ethernet eth2: PHY [f1072004.mdio-mii:01] driver [Marvell 88E1510]
[   17.591455] mvneta f1034000.ethernet eth2: configuring for phy/sgmii link mode
[   21.831900] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx

Huawei E3372 USB-Modem with recent firmware (22.328.62.00.1217) in HiLink mode (device eth3)

# lsusb | grep Huawei
Bus 004 Device 002: ID 12d1:14dc Huawei Technologies Co., Ltd. E33372 LTE/UMTS/GSM HiLink Modem/Networkcard

With the kernel module kmod-usb-net-cdc-ether installed, the system successfully registers the new network device eth3:

# dmesg  | grep "CDC Ethernet Device"
[   12.553958] cdc_ether 4-1:1.0 eth3: register 'cdc_ether' at usb-f10f8000.usb3-1, CDC Ethernet Device, 0c:5b:8f:27:9a:64

Network configuration for eth2 and eth3

# ip addr show eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 532
    link/ether d8:58:d7:01:14:fc brd ff:ff:ff:ff:ff:ff
    inet xxx.xxx.xxx.xxx/29 brd xxx.xxx.xxx.xxx scope global eth2
       valid_lft forever preferred_lft forever

# ip addr show eth3
14: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 0c:5b:8f:27:9a:64 brd ff:ff:ff:ff:ff:ff
    inet 192.168.8.100/24 brd 192.168.8.255 scope global eth3
       valid_lft forever preferred_lft forever

Routing

# route -n | egrep "Destination|eth"
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         xxx.xxx.xxx.xxx 0.0.0.0         UG    10     0        0 eth2
0.0.0.0         192.168.8.1     0.0.0.0         UG    20     0        0 eth3
192.168.8.0     0.0.0.0         255.255.255.0   U     20     0        0 eth3
xxx.xxx.xxx.xxx 0.0.0.0         255.255.255.248 U     10     0        0 eth2

Firewall

NAT active on eth2 only

Steps to reproduce

1. Ping from eth3 to a destination beyond its gateway
(pinging the gateway itself from eth3 does not produce the error):

# ping -I eth3 -i 0.5 -q 1.1.1.1

2. From a different terminal window, start pinging from eth2 and observe the packet loss:

# ping -I eth2 -c 20 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from xxx.xxx.xxx.xxx eth2: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_req=1 ttl=119 time=2.13 ms
64 bytes from 8.8.8.8: icmp_req=2 ttl=119 time=2.15 ms
64 bytes from 8.8.8.8: icmp_req=3 ttl=119 time=2.12 ms
64 bytes from 8.8.8.8: icmp_req=4 ttl=119 time=2.12 ms
64 bytes from 8.8.8.8: icmp_req=6 ttl=119 time=2.11 ms
64 bytes from 8.8.8.8: icmp_req=13 ttl=119 time=2.13 ms
64 bytes from 8.8.8.8: icmp_req=14 ttl=119 time=2.17 ms
64 bytes from 8.8.8.8: icmp_req=15 ttl=119 time=2.13 ms
64 bytes from 8.8.8.8: icmp_req=16 ttl=119 time=2.10 ms
64 bytes from 8.8.8.8: icmp_req=17 ttl=119 time=2.11 ms
64 bytes from 8.8.8.8: icmp_req=18 ttl=119 time=2.12 ms
64 bytes from 8.8.8.8: icmp_req=19 ttl=119 time=2.11 ms
64 bytes from 8.8.8.8: icmp_req=20 ttl=119 time=2.10 ms

--- 8.8.8.8 ping statistics ---
20 packets transmitted, 13 received, 35% packet loss, time 19364ms
rtt min/avg/max/mdev = 2.104/2.129/2.177/0.040 ms
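
To see which echo requests were lost rather than only the loss percentage, the sequence numbers in the ping output can be compared against the requested count. This is a minimal sketch; `missing_seqs` is a hypothetical helper name, and it assumes reply lines carry `icmp_req=N` (as above) or `icmp_seq=N`.

```shell
# List the ICMP sequence numbers that never came back.
# Usage: ping -I eth2 -c 20 8.8.8.8 | missing_seqs 20
missing_seqs() {
    awk -v count="$1" '
        # Remember every sequence number seen in a reply line.
        match($0, /icmp_(req|seq)=[0-9]+/) {
            seq = substr($0, RSTART, RLENGTH)
            sub(/^[^=]*=/, "", seq)
            seen[seq + 0] = 1
        }
        # Print the numbers from 1..count that were never seen.
        END {
            out = ""
            for (i = 1; i <= count; i++)
                if (!(i in seen)) out = out (out == "" ? "" : " ") i
            print out
        }
    '
}
```

In the run above the replies are 1-4, 6 and 13-20, so the missing requests are 5 and 7-12: seven of twenty, which matches the reported 35% loss.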

3. Check kernel ring buffer for related entries (link loss, bad rx status crc error)

# dmesg
[  540.372989] mvneta f1034000.ethernet eth2: Link is Down
[  542.449648] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[  605.886917] mvneta f1034000.ethernet eth2: Link is Down
[  607.963595] mvneta f1034000.ethernet eth2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1164.426903] device eth2 entered promiscuous mode
[ 1166.118663] device eth2 left promiscuous mode
[ 1175.666308] device eth2 entered promiscuous mode
[ 1194.086756] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=391
[ 1195.740476] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=94
[ 1196.963670] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=82
[ 1197.536843] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=157
[ 1199.641764] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=81
[ 1204.552005] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=298
[ 1204.683304] mvneta f1034000.ethernet eth2: bad rx status 0c410000 (crc error), size=156
[ 1237.241446] device eth2 left promiscuous mode
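
A small filter makes it easier to compare runs by counting the link-down events and CRC errors for one interface. This is only a sketch: `link_summary` is a hypothetical helper name, and it assumes the mvneta message wording shown above.

```shell
# Summarise link flaps and CRC errors for one interface from kernel log text.
# Usage: dmesg | link_summary eth2
link_summary() {
    awk -v dev="$1" '
        index($0, " " dev ": Link is Down")                 { down++ }
        index($0, " " dev ": bad rx status") && /crc error/ { crc++ }
        END { printf "%s: link_down=%d crc_errors=%d\n", dev, down, crc }
    '
}
```

Fed with the excerpt above, this reports 2 link-down events and 7 CRC errors for eth2.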

Question

What the heck is going on here?
Any ideas or help in finding out what is going on here are highly welcome and appreciated.

Reference on bugs.openwrt.org

https://bugs.openwrt.org/index.php?do=details&task_id=3733

Possibly a bad patch cable? Try replacing it with a known good one...

Nope.
It's not the patch cable and it's not the gateway either.
I already tried replacing the patch cable and also putting a switch between the machine and the gateway.
Only later did I realize that the symptoms on eth2 (packet loss, link loss, bad rx status crc error) are exclusively related to traffic on eth3. (How strange is that 🤯)
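
One way to put a number on that correlation is to read the receive error counter of eth2 before and after generating traffic on eth3. This is a rough sketch: `rx_errors` and `rx_errors_delta` are hypothetical helper names, `SYSFS_NET` is an assumed override variable, and it is an assumption that the mvneta driver counts these CRC errors in the generic `rx_errors` statistic.

```shell
# Root of the per-interface statistics; overridable so the helper can be tried on a copy.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the current rx_errors counter of an interface.
rx_errors() {
    cat "$SYSFS_NET/$1/statistics/rx_errors"
}

# Print how much rx_errors of the watched interface grows while a command runs.
# Usage: rx_errors_delta eth2 ping -I eth3 -c 20 -q 1.1.1.1
rx_errors_delta() {
    watch_dev="$1"; shift
    before=$(rx_errors "$watch_dev")
    "$@" >/dev/null 2>&1
    after=$(rx_errors "$watch_dev")
    echo $((after - before))
}
```

Running it once with a ping to the eth3 gateway (192.168.8.1) and once with a ping beyond it should show whether the counter on eth2 only grows in the second case.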

Yeah, it is kinda weird. The symptoms seem pretty odd indeed...

I wonder if it might be related to some path MTU issue? Easily tested by setting your two ethernet interfaces to some low MTU value like 1300...
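
A minimal sketch of that MTU test, assuming the interface names eth2 and eth3 from the first post. `set_test_mtu` and `DRY_RUN` are hypothetical names; by default the helper only prints the `ip link` commands instead of running them.

```shell
# Print (DRY_RUN=1, the default) or run the commands that change the MTU on both uplinks.
# Usage: set_test_mtu 1300            # show what would be done
#        DRY_RUN=0 set_test_mtu 1300  # actually apply it (needs root)
set_test_mtu() {
    mtu="$1"
    for dev in eth2 eth3; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ip link set dev $dev mtu $mtu"
        else
            ip link set dev "$dev" mtu "$mtu"
        fi
    done
}
```

A change made this way is not persistent; calling the helper again with 1500 restores the value shown in the `ip addr` output above.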

I don't see the relevance of the MTU size here. The Ethernet frames carrying the ICMP packets are only about 100 bytes in size.
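
The frame size can be checked from the `56(84) bytes of data` line in the ping output above. The arithmetic below is a sketch that assumes a plain IPv4 header without options and an untagged Ethernet frame.

```shell
# Sizes in bytes for the default ping shown above: 56(84) bytes of data.
payload=56      # ICMP data
icmp_hdr=8      # ICMP header
ip_hdr=20       # IPv4 header without options (assumption)
eth_hdr=14      # Ethernet header, no VLAN tag (assumption)
fcs=4           # frame check sequence

ip_len=$((payload + icmp_hdr + ip_hdr))
frame_len=$((ip_len + eth_hdr + fcs))
echo "IP packet: $ip_len bytes, Ethernet frame incl. FCS: $frame_len bytes"
```

That gives 84 bytes at the IP layer and 102 bytes on the wire including the FCS, far below both 1500 and the suggested 1300.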

I am having the same problem on my Omnia, which is configured as a dumb AP: I get CRC errors on the linking interface, eth2. I had this earlier and it was fixed by replacing the cables.

This time I tried moving the cables around a bit, but no luck. The errors are still there and the real download speed has dropped to around 1Mb. Is the port/interface physically broken?