Devices will sometimes repeatedly disconnect and reconnect every 10 seconds or so

This has been happening sporadically for a while. I thought that it was just a quirk of my desktop computer until recently, when I observed identical behavior on two phones on the network simultaneously.

Basically, what seems to happen is that, for some periods of time, a device that has a connection to the network (wired or wireless) will drop its connection to the router if it actually attempts to use the connection. The connection will re-establish itself fairly quickly, and then will drop as soon as the device attempts to send data over it again (which is usually immediately upon re-establishing the connection). The complete cycle lasts around 10 seconds, and will repeat itself for many minutes at a time.

Most of the time, everything is fine and all devices' connections are stable. This only happens occasionally, and I don't know what the trigger is.

I've looked at the router overview when things are misbehaving. I don't see any evidence that the router is overloaded. The IP addresses that are assigned to the misbehaving devices are correct - I've got static IPs set for pretty much everything.

I'm running OpenWRT 21.02.1 on a TP-Link Archer C7.

Any ideas what could be causing the issue? I don't even know where to look at this point.

Thanks,
Zistack

Does the same happen with 21.02.3 or 22.03.0-rc4?

Is this happening on wired? wireless? both?

If wireless, what band (2.4GHz or 5GHz)?

If 5GHz, what channel? Could it be a DFS channel?

I was hoping to not have to flash it, but I would if that would fix it. This issue had occurred on some older versions (I don't remember which ones, exactly).

Wired and wireless. Band doesn't seem to matter.

More info. Unrelated to this, I've been working on expanding the network by adding a couple of APs. I got them all configured and tested, and they worked. Then I went to install them in their final locations, and we're suffering rolling connectivity issues across the entire network.

Before we got these APs set up, we had recently added 4 more mobile devices to the network, which increased the frequency of the initially-described instability. I am starting to think that I am hitting some kind of device or connection limit, but as I mentioned initially, the resource usage stats on the overview continue to not show any real problems.

I'd say this is quite related...

Let's see a system topology drawing (can be as simple as a photo of a sketch on paper).

You might want to start by unplugging everything from the main router aside from the internet connection... then test to see if that is stable on its own. Then connect one thing at a time back into the system and keep testing. You may have a switching loop or a piece of equipment (including end devices) that is misbehaving.
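If it helps with the "keep testing" part, here is a minimal sketch of a watcher that could run on one client while things are plugged back in. It just tries a TCP connection to the router once a second and records when that starts and stops failing. The address 192.168.1.1 and port 80 (the router's web interface) are assumptions, as are the function names; adjust them to your setup.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and "network unreachable".
        return False


def find_outages(samples):
    """Turn [(timestamp, is_up), ...] into a list of (down_start, down_end)."""
    outages = []
    down_since = None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts
        elif is_up and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))  # still down when sampling ended
    return outages


def watch(host="192.168.1.1", port=80, interval=1.0, duration=60.0):
    """Probe the (assumed) router address repeatedly and return the outages."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return find_outages(samples)
```

Calling watch(duration=600) after reconnecting each device gives a list of outage start and end times that can be lined up against whatever was plugged in last.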

I already removed the APs and some noncritical peripheral devices (most phones/tablets, some desktops) just to regain the stability needed to use this website. I probably went overboard, but things are working now.

The network topology is tree-shaped, so I don't think we have switch loops.

We weren't having luck getting devices attached down at the bottom there, but presumably there will be some in the future. The bottom PowerBeam devices are also running OpenWRT. Their version is 21.02.3.

Also, by 'Unrelated to this', I meant that my actions installing additional APs were unrelated. The behavior subsequently observed is almost certainly related, which is why I brought it up.

This should be true as long as you don’t have a link you are not aware of. This sometimes happens accidentally, and can even be related to poorly designed devices/software. For example, there was a bug in at least one Peloton device where it bridged WiFi and wired connections (rather than treating them as separate interfaces) so if the user was connected to WiFi and then plugged in Ethernet, a loop would be created.

Other times it can be a malfunctioning device. A switch could be faulty. Or there are some usb-c docking hubs that will cause a broadcast storm when the host computer is sleeping or disconnected.

The best way to troubleshoot is to bring one thing back online at a time and watch for a recurrence of the issue. When it shows up, disconnect the last device and see if that fixes it.
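While reconnecting things one at a time, one rough way to spot a broadcast storm is to watch the received-packet rate on a wired Linux machine that is otherwise idle. The sketch below reads the kernel's rx_packets counter from sysfs twice and averages the difference. The interface name eth0 is an assumption, and what counts as "too high" depends on your network, so compare against a reading taken when everything is behaving.

```python
import time
from pathlib import Path


def read_rx_packets(iface):
    """Read the kernel's received-packet counter for a Linux interface."""
    path = Path("/sys/class/net") / iface / "statistics" / "rx_packets"
    return int(path.read_text())


def packets_per_second(count_before, count_after, seconds):
    """Average packet rate between two counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds


def sample_rate(iface="eth0", seconds=5.0):
    """Measure the average received-packet rate on iface over a short window."""
    before = read_rx_packets(iface)
    time.sleep(seconds)
    after = read_rx_packets(iface)
    return packets_per_second(before, after, seconds)
```

If sample_rate() jumps by orders of magnitude right after a particular device or switch is reconnected, that device is a good suspect.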

I figured out why the new APs were causing issues. I was using WDS to bridge them, and I forgot to enable STP on the remote end. That can cause a packet storm (and was doing so).

However, that does not explain the problem described in my initial post, which is, unfortunately, quite transient. I took @tmomas's suggestion and upgraded the firmware on the Archer C7 to 21.02.3. We'll see if it happens again. I'll report back here if it does.

Actually, STP issues could cause clients to disconnect, especially if they test for internet connectivity and then disconnect when those checks fail.

Yes, but the initial issue predates the new APs by months.

Oh no. My phone is doing the thing right now.

Might be completely unrelated to what you are seeing, but I have seen, on more than one occasion and on differing hardware, very similar symptoms.
In my case the problem was caused by a poor power supply on the router.
It would run fine for a while but then one more device connecting would take it over the top. The SoC would appear to freeze for a few seconds and connections would be lost, sometimes just wifi but often ethernet as well. Very rarely was anything reported in the syslog.

Replacing the power supply has fixed the problem on all occasions so far.

Like I said, maybe unrelated to your problem, but worth looking at....

Only a couple of devices experience the issue at a time. All other connected devices operate normally during this period. I can even watch the web interface on the router on another device when it is happening. If it was a power issue, I would expect to see weirdness all across the network, rather than isolated to just one or two devices at a time.

Nice try, though.

Not sure why you'd expect what you described; I'd (personally) still check the power supply.

'cause I know how hardware works.

If it was the power supply, then when the SoC freezes or browns out like he said, the other devices should see an interruption in traffic or even dropped packets, even if their connection isn't totally lost. I don't ever see anything like this on other devices when the issue is happening. They continue to function as if nothing was wrong whatsoever. Only 3 devices have ever reported having any issues, and there isn't any obvious relationship between them, nor any obvious features which would distinguish them from the rest (the two phones are identical to 3 of the other phones on the network in model and configuration).

Also, I would expect that the device would always do this when the device count or overall traffic load exceeds a certain threshold, but this doesn't appear to be the case either. I've yet to observe any correlating factors with the behavior. The only pattern I can see is that it is consistently only a subset of devices that are affected.

Wired and wifi intermittent disconnects, just expanded your network, older equipment?

Surprised no one has suggested checking cables and connectors yet.

You did use Cat 5e for all your connections, right? Over short distances Cat 5 will work at 1000 Mbps, but expect this kind of trouble on longer runs.

HTH

EDIT: sorry, just saw you did write that it is a specific subset of devices... you might want to share more about that, and I'm assuming you're looking at logs on those devices for clues about what might be happening. If these devices connect to the router via a cable, then look at that connection.

Yeah, we've checked the cables. They're all cat5e or better. That was my first thought when I noticed the pattern, as the desktop has been exhibiting the behavior the longest.

The devices are a desktop computer running Linux (mine, unfortunately), and two Teracube 2e phones running Android (one of which is mine, again unfortunately). I remember looking at the logs on my desktop and not finding anything helpful, but if I see it happen again, maybe I'll grab them and post the relevant bits here. I'm not sure how to view these logs on the android devices. The logs on the router didn't seem useful either, but again I could post some of those if I see it happen again.

Android is a bit of a chore...

Enable developer options (google for instructions), enable adb debugging in developer options on the android device, install adb (a lot of linux distros have a package and you shouldn't need drivers once you install adb). On linux:
adb shell logcat > logcat.txt
will get you the logs (assuming you connect your android device via usb - there is an option to do adb over wifi). Android logs can be very verbose and a pain to sift through but at least you can get them.
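To cut the sifting down, something like the sketch below can pull out only the lines that mention Wi-Fi or DHCP from the logcat.txt file produced above. The keyword list is a guess at what is relevant, not a complete set of tags, and the function names are just for illustration; add or remove keywords depending on what your phone actually logs.

```python
import re

# Substrings to look for, matched case-insensitively. This list is an
# assumption about what is useful, not an official set of logcat tags.
KEYWORDS = ("wifi", "wpa_supplicant", "dhcp", "connectivityservice")


def filter_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any of the keywords."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def filter_file(path="logcat.txt"):
    """Filter a saved logcat capture; undecodable bytes are replaced."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return filter_lines(f)
```

Printing the result of filter_file() and looking at the timestamps around one of the 10 second drop cycles should be a lot more manageable than reading the full log.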