Dropped & overrun packets on LAN

My setup is fairly simple, which makes this even more puzzling.

OpenWRT (x86) --- Switch-1 --- Switch-2

There are two APs connected to each switch.
There are also a few CCTV cameras connected.

Every afternoon (presumably when there are more users), the APs start dropping pings from the router.
The drops sometimes last up to half a minute at a time.

Now I'm seeing dropped and overrun packets on the router.

What could be causing this? Any thoughts?
I have checked and replaced some cables, and even replaced Switch-1.
No change in behavior.

What would be the next debugging step?
Should I suspect a rogue device and try to isolate it on the switches?
User traffic isn't high enough to saturate a 1 Gbps LAN.

eth2 Link encap:Ethernet HWaddr 00:9F:27:E0:05:70
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:173928135 errors:0 dropped:1407941 overruns:216732 frame:0
TX packets:287083939 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:128761660352 (119.9 GiB) TX bytes:296397397000 (276.0 GiB)
Memory:d0700000-d071ffff
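To put the counters above in proportion, they can be turned into ratios of received packets. This is a minimal sketch, assuming the `RX packets:… errors:… dropped:… overruns:…` text format shown in the post (BusyBox/net-tools style `ifconfig`); the names `SAMPLE`, `RX_LINE` and `rx_loss_ratios` are illustrative. Keep in mind these counters are cumulative since the interface came up, so a long-run ratio averages the afternoon bursts away.

```python
import re

# The RX counter line exactly as posted above (eth2 on the router).
SAMPLE = ("RX packets:173928135 errors:0 dropped:1407941 "
          "overruns:216732 frame:0")

# Assumes the BusyBox/net-tools ifconfig layout; whitespace is allowed to vary.
RX_LINE = re.compile(
    r"RX packets:(\d+)\s+errors:(\d+)\s+dropped:(\d+)\s+overruns:(\d+)"
)


def rx_loss_ratios(ifconfig_text: str) -> dict:
    """Return errors/dropped/overruns as a fraction of received packets."""
    m = RX_LINE.search(ifconfig_text)
    if m is None:
        raise ValueError("no 'RX packets:' counter line found")
    packets, errors, dropped, overruns = (int(g) for g in m.groups())
    if packets == 0:
        raise ValueError("no packets received yet")
    return {
        "errors": errors / packets,
        "dropped": dropped / packets,
        "overruns": overruns / packets,
    }


# Print each counter as a percentage of all received packets.
for name, ratio in rx_loss_ratios(SAMPLE).items():
    print(f"{name:>8}: {ratio:.2%}")
```

With the posted numbers this works out to roughly 0.81% dropped and 0.12% overruns over the interface's lifetime.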

Yes... that is probably the next step.

How many ports on each of these switches? What devices (aside from the APs and cameras) are on each switch?

You should start strategically disconnecting things as part of your troubleshooting. Maybe begin by disconnecting the downstream switch as a "bisect" operation and see whether the rest of the network improves immediately (within a few seconds to a minute, max). If it doesn't, you can probably plug the downstream switch back in and look at what else is plugged into the first switch or the router itself.

I made some comments in another thread about scenarios where a rogue device is to blame. For example, USB-C docking hubs with Ethernet, Sonos devices, and even Peloton bikes can sometimes cause this type of situation. Isolating the device may be tricky and may take some time, but it is probably what is causing the problem.
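The "bisect" approach described above can be written down as a halving search, which bounds the number of unplug/replug rounds to about log2 of the number of suspects (16 ports take 4 rounds). This is a sketch under stated assumptions: exactly one culprit (a device or a looping cable) that causes the problem on its own, and a check you can answer reliably after waiting the few seconds to a minute mentioned above. `bisect_culprit`, `ask_operator` and the port labels are hypothetical names for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bisect_culprit(suspects: Sequence[T],
                   problem_with: Callable[[Sequence[T]], bool]) -> T:
    """Halve the suspect list until one remains.

    problem_with(subset) answers: "with ONLY `subset` connected (after
    waiting long enough for the symptom to show), is the problem still there?"
    Assumes exactly one culprit that misbehaves on its own.
    """
    candidates: List[T] = list(suspects)
    if not candidates:
        raise ValueError("need at least one suspect")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if problem_with(first_half):
            candidates = first_half        # culprit is in the connected half
        else:
            candidates = candidates[mid:]  # culprit must be in the other half
    return candidates[0]


def ask_operator(subset: Sequence[str]) -> bool:
    """Human-in-the-loop check: you do the (dis)connecting, then answer."""
    print("Connect ONLY these, wait ~1 minute, watch the pings:", ", ".join(subset))
    return input("Problem still there? [y/n] ").strip().lower().startswith("y")


# Example (hypothetical port labels):
# bisect_culprit(["sw1-p1", "sw1-p2", "sw1-p3", "sw2-p1", "sw2-p2"], ask_operator)
```

If the symptom only appears with two suspects connected together (which can happen with a loop spread over several cables), this single-culprit assumption breaks and you are back to testing combinations.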


In the past I've also encountered a broken switch that started reflecting packets (causing packet storms) when it was powered off…

(which made for really 'great' debugging: nothing was wrong when testing locally with the switch powered up).

🤯
A home/SME network was supposed to be "simple".

I wanted to be sure about one thing:
One of the WAN connections to my OpenWRT router is shared with another network through a switch.
I assume I don't have to worry about any mischief on that shared switch, because the OpenWRT router will ignore it on its WAN port.
I'm using the default OpenWRT firewall rules.

Please confirm or bust my understanding.

Sorry, the best answer I can give at the moment is... "it depends" -- probably not what you want to hear.

Yes, OpenWrt will ignore unsolicited packets on the wan, but there are some scenarios where a misbehaving wan could affect the rest of the network, depending on the nature of the hardware (for example, whether the wan port is part of a hardware switch inside the device, and whether the router is forced to process the packets).

So let's look at this another way... when the problem manifests:

  • likely you're having issues reaching the internet -- that's a given
  • what about reaching the router?
  • and what about other devices on the network?

Try starting multiple persistent ping tests from one machine with targets:

  • a host on the internet
  • the router itself
  • another host on your own network

By comparing the patterns of the ping results, you can begin to develop some hypotheses to test (possibly including disconnecting one or both of the wan connections as a test).
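One way to run the three ping tests at once and compare their patterns is to fire all three pings in the same round and then count which targets failed together. This is a minimal sketch, assuming Linux/BusyBox `ping` (where `-c 1` sends one request and `-W` is a timeout in seconds; on macOS `-W` means something else). The addresses in `TARGETS` and all function names are hypothetical placeholders.

```python
import subprocess
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical addresses: substitute a real internet host, your router's
# LAN IP and another host on your own network.
TARGETS = {
    "internet": "1.1.1.1",
    "router": "192.168.1.1",
    "lan_host": "192.168.1.50",
}


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request; assumes Linux/BusyBox ping (-W in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ping_round(targets: dict) -> dict:
    """Ping every target at the same moment so results line up in time."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {name: pool.submit(ping_once, host) for name, host in targets.items()}
        return {name: fut.result() for name, fut in futures.items()}


def monitor(targets: dict, rounds: int, interval_s: float = 1.0) -> list:
    """Run `rounds` time-aligned ping rounds, one every `interval_s` seconds."""
    results = []
    for _ in range(rounds):
        results.append(ping_round(targets))
        time.sleep(interval_s)
    return results


def summarize(results: list) -> Counter:
    """Count rounds by WHICH targets failed together (() means all replied)."""
    patterns = Counter()
    for row in results:
        failed = tuple(sorted(name for name, ok in row.items() if not ok))
        patterns[failed] += 1
    return patterns


# Example: ~5 minutes of samples during the afternoon problem window.
# for failed, count in summarize(monitor(TARGETS, rounds=300)).most_common():
#     print(failed or "all ok", count)
```

If only the internet target fails, look upstream; if the router and the other LAN host drop out in the same rounds, the problem is more likely on the LAN side (or on the test machine's own link).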

So it turns out it was a dumb case of a loop between switches.

Usually, that's catastrophic. Or at least that's what I thought.
Here the impact was very subtle.

I learned something new, though.
It seems the "overruns" count in the ifconfig output is a good indicator of this kind of issue.
Not all ping failures on a LAN are physical-layer issues.
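Because the `ifconfig` counters are cumulative, they don't tell you *when* the overruns happen. Watching how much they grow per interval lets you line the increases up with the afternoon ping drops. This sketch reads `/proc/net/dev` directly; as far as I know, both net-tools and BusyBox `ifconfig` take their numbers from there, with RX "dropped" coming from the `drop` column and RX "overruns" from the `fifo` column. It assumes Python 3 is available wherever you run it (it is not part of a default OpenWrt image), and `watch`, `rx_delta` and the default `eth2` are illustrative choices taken from the output posted above.

```python
import time

# Receive-side column names of /proc/net/dev, in kernel order.
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame",
             "compressed", "multicast")


def parse_proc_net_dev(text: str) -> dict:
    """Map interface name -> {rx field: counter} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        stats[name.strip()] = dict(zip(RX_FIELDS, values[:len(RX_FIELDS)]))
    return stats


def rx_delta(before: dict, after: dict, iface: str) -> dict:
    """Per-field increase of the RX counters between two snapshots."""
    return {k: after[iface][k] - before[iface][k] for k in RX_FIELDS}


def read_stats() -> dict:
    with open("/proc/net/dev") as f:
        return parse_proc_net_dev(f.read())


def watch(iface: str = "eth2", interval_s: float = 10.0) -> None:
    """Print per-interval RX drops and fifo overruns until interrupted."""
    prev = read_stats()
    while True:
        time.sleep(interval_s)
        cur = read_stats()
        d = rx_delta(prev, cur, iface)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {iface}: +{d['packets']} pkts "
              f"+{d['drop']} dropped +{d['fifo']} overruns")
        prev = cur
```

Running `watch("eth2")` through the afternoon and noting the timestamps where the overrun count jumps should show whether they coincide with the APs losing ping.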