Wired ports fail intermittently

I have a few wireless networks and my wired LAN. I typically plug my main laptop into the wired LAN. Lately, when I unplug it, all my wired ports fail within about a half hour. When I plug either of my laptops back into that wired port, all the ports recover. During the period of outage, all my wireless devices continue to talk to the router and to each other. However, none of the wired devices are connected, and my Internet service over the WAN port also dies. This is a problem that seemed like a hardware issue at first, so I replaced my TL-WR1042ND with a new AC1750, and upgraded to OpenWrt 19.07.2 r10947-65030d81f3, but the issue has persisted.

More details: My wired network and my primary wireless network were at 192.168.1.0/24. (For brevity, I will omit details about my IPv6 prefixes.) To see whether it would help, I removed bridging, and now my wired network is on 192.168.1.0/24 and my wireless networks are on 192.168.2.0/24, 192.168.3.0/24, and 192.168.4.0/24. That didn't resolve the problem. The kernel log shows nothing at all from the time the network goes down or from the time it comes back up. The system log contains nothing about the network going down, and only the DHCP assignment for my laptop at the time the network is restored. I have a full tcpdump for a 5-second period covering the moment I plugged in the laptop and my network was restored, but I don't see anything suspicious in it.
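For anyone wanting to go through a capture like this more systematically, here is a rough sketch that tallies frames per source MAC address and counts how many of them were sent to the broadcast address. It assumes the capture has been printed as text with link-level headers, for example with `tcpdump -e -n -r capture.pcap` (the file name is a placeholder), and that Python 3 is available on whatever machine does the analysis. The `summarize` helper is a made-up name for illustration, not part of any tool.

```python
import re
from collections import Counter

# Matches the link-level header that `tcpdump -e -n` prints, for example:
# "12:00:00.000001 aa:bb:cc:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: ..."
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
HEADER = re.compile(rf"({MAC}) > ({MAC}),", re.IGNORECASE)
BROADCAST = "ff:ff:ff:ff:ff:ff"


def summarize(lines):
    """Return (frames per source MAC, broadcast frames per source MAC)."""
    total = Counter()
    broadcast = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match is None:
            continue  # hex dump line, continuation line, or unrecognized format
        src = match.group(1).lower()
        dst = match.group(2).lower()
        total[src] += 1
        if dst == BROADCAST:
            broadcast[src] += 1
    return total, broadcast
```

Feeding it the lines of the text dump and printing `total.most_common(5)` would show whether a single station dominates the capture. It says nothing about frames that never reach the capturing interface.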

What features could be causing this, that I should be investigating? Thx.

I have three thoughts here...

  1. Are you using a USB-C docking station with an ethernet port and then unplugging the USB-C cable from your laptop? If so, an issue just like this has actually been reported in the Ubiquiti forums for certain docking stations/hubs -- disconnecting the ethernet cable from the docking station or from the router resolves the problem.

  2. Assuming #1 does not apply, have you tried replacing the cables -- maybe something odd is happening with those cables. If unplugging the cable(s) directly from the router solves the problem, the cables could well be to blame.

  3. If neither of the above resolves the issue, it would seem to me that there might be a hardware issue. But...
    You could try resetting your router to defaults (and only configure the absolute essentials like SSID and password) -- if that helps, clearly there is something wrong with the configuration or any additional packages you may have installed rather than the hardware. If you take a backup first, you can restore it after the reset... if the problem comes back, you can post your config files for review.

Your hunch was exactly right!

On March 18, I got a TOTU 13 USB-C hub, right around the time that my router started failing. This morning I ran a test, and I found that the router ports only fail if: 1) the hub is connected to the ethernet cable, and 2) no laptop is connected to the hub.

Is the next step for me to file an OpenWrt bug? Whatever the behavior of the hub, it shouldn't be able to break the router, and it certainly shouldn't be able to disrupt traffic across VLAN boundaries on my WAN.

Here is another reviewer having the exact same problem with the same model of hub: https://smile.amazon.com/gp/customer-reviews/R3EJIJ5BYSO979/ref=cm_cr_getr_d_rvw_ttl?ie=UTF8&ASIN=B07X8V3SLM

No, please don't. This is not platform specific and is likely an issue at L2 (switching) so it wouldn't even hit the CPU/SoC and thus not interact with the OS.

Thus far, I have not seen any solid explanation for exactly why this happens and how it might be resolved (aside from firmware and/or hardware fixes on the USB-C hubs). The best guess that I am aware of is that there is a huge flood of traffic, in the form of a broadcast storm or something similar and possibly consisting of invalid data, being sent across the network. I have been wondering if STP or other port isolation techniques could resolve the issue, but I think that this would only be possible on medium-to-high end managed switches. (IIRC, there is an STP feature in OpenWrt, but I don't know if the switch chips are advanced enough to combat whatever is actually happening on the port.)
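If you want to put the flood theory to a rough test, something like the sketch below could help. It samples the receive counter that Linux exposes under `/sys/class/net/<iface>/statistics/rx_packets` twice and reports the average packet rate in between. Treat it as an illustration only: it assumes a Linux machine with Python 3 (which is not installed on OpenWrt by default), the interface name is whatever your system uses, the function names are made up, and the 10,000 packets/s threshold is an arbitrary guess rather than a measured value. If the traffic never leaves the switch chip, this counter will not reflect it.

```python
import time
from pathlib import Path


def read_rx_packets(iface, sysfs_root="/sys/class/net"):
    """Return the cumulative received-packet counter for one interface."""
    counter = Path(sysfs_root, iface, "statistics", "rx_packets")
    return int(counter.read_text())


def packets_per_second(before, after, seconds):
    """Average receive rate between two counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


def looks_like_flood(rate, threshold_pps=10_000):
    """Compare a rate to a cutoff; the default is an illustrative guess."""
    return rate >= threshold_pps


def sample(iface, seconds=5.0):
    """Measure the average receive rate on iface over the given interval."""
    before = read_rx_packets(iface)
    time.sleep(seconds)
    after = read_rx_packets(iface)
    return packets_per_second(before, after, seconds)
```

Calling `sample("eth0")` once with the hub attached and idle, and once with it unplugged, would give two numbers to compare. A large difference would be consistent with a flood, though it would not explain what the hub is actually sending.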

Here are a few selected threads from the Ubiquiti forums: 1, 2, 3, 4

EDIT: I should add that in the UI forums, there have been some attempts to solve the issue by inserting a 'sacrificial' switch between the offending USB-C device and the rest of the upstream network -- that did not stop the broadcast storm. I don't know if anyone has tried using a high end switch with full management features to find solutions, but any such mitigations may only be available on enterprise grade switches.

When I had my LAN in bridging mode, I tried enabling STP, but it didn't help. Since the option was only available in bridging mode, I suspect it wasn't applied to all routes. I also have an unmanaged switch in between the router and the offending hub. I can verify that does not mitigate the problem. A managed switch would be much more expensive—more expensive than the hub itself.

Thanks for the diagnosis. I've contacted TOTU about replacing their USB-C hub.

Yeah... especially because we're not talking about a basic VLAN aware smart switch (which can be pretty inexpensive) -- I think it needs to have more sophisticated management capabilities. But I'm not really sure since I haven't experienced this myself.

Hopefully with a different model (or even just returning it and buying a different product entirely). This is not likely a bug with that specific unit (as in on a per-serial number basis) -- this is a hardware and/or firmware bug that will likely plague all the units of this design unless/until they have a revision that fixes it.
