Sudden speed dips and disconnects on TP-Link TL-WDR4300

LuCI reports that I have:

Model:            TP-Link TL-WDR4300 v1
Architecture:     Atheros AR9344 rev 2
Firmware Version: OpenWrt 19.07.7 r11306-c4a6851c72 / LuCI openwrt-19.07 branch git-21.044.30835-34e0d65
Kernel Version:   4.14.221

For years I've been running OpenWRT 14-something on this router, in front of an ISP modem. I've moved several times and switched ISPs and network types - everything was working perfectly.
That lasted until my current ISP had to replace their modem because of a technical fault. A technician brought a completely new model that immediately refused to work with my router - absolutely no packets were getting through for some reason, whereas plugging my PC directly into the ISP's modem worked just fine.

I know very little about networking, so in my attempts to make the router and the modem work together I almost bricked the router and had to do a hardware reset. It was then that I decided that now would be a good time to upgrade my OpenWRT installation. I followed the instructions on the relevant Wiki page - flashed the latest stock firmware first because of a new bootloader, then flashed the latest OpenWRT image. I changed some basic settings (wireless access, root password, ssh keys - that's it) and connected my PC through the freshly flashed router and into the modem.

And it was working! For a day. Then, after noticing lots of video buffering and disconnects in multiple apps, I tried using speedtest.net a few times. What I've found so far:

  • The download speed can suddenly drop from 70-80Mbps (my ISP speed) down to 10-20Mbps. Sometimes SpeedTest would just hang in there for a second and report "socket error"
  • top -d1 | grep ^CPU shows that immediately prior to the dip/disconnect the sirq CPU load goes up to 50-95% (normally it stays below 10%). Occasionally, though, sirq reaches ~60% without any speed/connectivity issues
  • When such dips/disconnects become frequent (e.g. when watching a video stream), simply un- and re-plugging the cable between the router and modem fixes the issue
  • The issue appears at seemingly random times. It can be days between symptoms, it can be minutes. Rebooting all devices can still lead to a dip within just a few minutes
  • No idea if it's related, but running SpeedTest on my phone, which is usually connected to the router via WiFi, makes sirq skyrocket to 80-95%, and the speed there goes only up to ~60Mbps
  • logread -f shows absolutely nothing
  • Having my PC plugged into the ISP's modem and my phone still connected to my router exhibits no symptoms (although it might be that I simply don't notice very low speed on my phone - I don't use it that much)
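
Since logread shows nothing and watching top by hand is easy to miss, one option is to log the sirq spikes with timestamps and compare them with the dips afterwards. This is only a rough sketch: the script name, the `sirqwatch` tag, and the THRESHOLD and INTERVAL values are my own placeholders, and it assumes the usual /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# Sketch: log softirq spikes so they can be lined up with speed dips.
# THRESHOLD and INTERVAL are illustrative values, not tuned for any device.
THRESHOLD=50
INTERVAL=1

# Print the softirq share (integer percent) between two "cpu ..." lines
# taken from /proc/stat. $1 = earlier sample, $2 = later sample.
sirq_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9; i++) total += $i   # user .. steal
          t[NR] = total; s[NR] = $8 }            # field 8 = softirq
        END { d = t[2] - t[1]
              if (d <= 0) print 0
              else printf "%d\n", (s[2] - s[1]) * 100 / d }'
}

# The loop only runs when the script is started as: sh sirqwatch.sh watch
if [ "${1:-}" = "watch" ]; then
    prev=$(head -n 1 /proc/stat)
    while sleep "$INTERVAL"; do
        cur=$(head -n 1 /proc/stat)
        pct=$(sirq_pct "$prev" "$cur")
        [ "$pct" -ge "$THRESHOLD" ] && logger -t sirqwatch "softirq at ${pct}%"
        prev=$cur
    done
fi
```

The entries go to the system log, so `logread | grep sirqwatch` should list them with timestamps after a bad speed test.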

Any tips are appreciated.

Have you tried a different Ethernet cable?

If both ends are now gigabit and you're using a 4-wire cable of the kind that commonly comes with a 10/100 DSL modem, it usually will not work well.
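
To see what the link actually negotiated, here is a minimal sketch that reads the standard Linux sysfs attributes for an interface. The `link_info` helper name is made up for illustration, and the interface name depends on the machine. On routers where the WAN port sits behind the built-in switch, these files may describe only the CPU port; in that case `swconfig dev switch0 show` is, as far as I know, the place to look for per-port link state (the `switch0` name is an assumption).

```shell
#!/bin/sh
# Print carrier, speed and duplex for one interface from sysfs.
# $1 = interface name
# $2 = sysfs base directory (defaults to /sys/class/net)
link_info() {
    dir="${2:-/sys/class/net}/$1"
    [ -d "$dir" ] || { echo "$1: no such interface"; return 1; }
    # Reading speed/duplex can fail while the link is down, hence the fallback.
    carrier=$(cat "$dir/carrier" 2>/dev/null || echo "?")
    speed=$(cat "$dir/speed" 2>/dev/null || echo "?")
    duplex=$(cat "$dir/duplex" 2>/dev/null || echo "?")
    echo "$1: carrier=$carrier speed=${speed}Mb/s duplex=$duplex"
}
```

Calling `link_info eth0` on the PC or the router prints one line. A gigabit link that shows up as 100Mb/s, or a speed that changes between runs, would point at the cable or the port.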

Just tried it. The old cable was without any markings but I found a cat6 cable and was able to reproduce the issue after a few tries at SpeedTest. For the first time, I've seen it happen without any spike in sirq.

Just in case, here's how the graph looked.


In that flat region right before the dip, the speed was reported as the same to the last digit - seems like it was an actual disconnect with a subsequent reconnect which was reported as the dip.
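
One way to check whether it really is a disconnect rather than just a slowdown would be to ping the modem once a second and log only the moments when reachability changes. A rough sketch, under assumptions: TARGET below is a documentation placeholder address that has to be replaced with the modem's or gateway's real IP, and the script name and the `linkwatch` tag are made up.

```shell
#!/bin/sh
# TARGET is a placeholder (assumption): set it to the modem's or gateway's IP.
TARGET="${TARGET:-192.0.2.1}"

# Read "timestamp state" lines (state 1 = reachable, 0 = unreachable)
# and print a line only when the state changes.
log_transitions() {
    prev=""
    while read -r ts state; do
        if [ -n "$prev" ] && [ "$state" != "$prev" ]; then
            if [ "$state" = 1 ]; then
                echo "$ts reachable again"
            else
                echo "$ts unreachable"
            fi
        fi
        prev=$state
    done
}

# The loop only runs when the script is started as: sh linkwatch.sh watch
if [ "${1:-}" = "watch" ]; then
    while sleep 1; do
        if ping -c 1 -W 1 "$TARGET" >/dev/null 2>&1; then s=1; else s=0; fi
        echo "$(date +%Y-%m-%dT%H:%M:%S) $s"
    done | log_transitions | logger -t linkwatch
fi
```

If the flat region in the graph lines up with an "unreachable" / "reachable again" pair in `logread | grep linkwatch`, that would support the disconnect theory. A single lost ping is not proof on its own, though.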

Huh, so it's not an OpenWRT issue after all. I think.
The page at https://openwrt.org/toh/tp-link/tl-wdr4300_v1 says:

Update to stock firmware: 3.13.33(130617) before installation (highly recommended)

Assuming that the page is just outdated, I installed stock firmware 3.14.3 and proceeded with the OpenWRT installation, as described in the OP.
A few days ago, I decided to flash the stock firmware instead of OpenWRT to see if the problem persists.
And it does! So one of these must be true:

  • It's a problem in the stock firmware that somehow affects an OpenWRT installation done on top of it
  • It's a problem in both the stock firmware and OpenWRT that wasn't there years ago
  • It's a problem with the new ISP modem that somehow does not manifest itself when my PC is plugged directly into it

At this point, I'm not sure what else to try to debug this issue. I guess flashing some old stock firmware would be the next step.

The recommendation to update the OEM firmware to >=3.13.33(130617) exists to retrofit push-button TFTP recovery into the bootloader. The device originally didn't support easy recovery; that was only added rather late in its life cycle, in summer 2013.

Flashing OpenWrt to the device replaces the complete firmware (aside from the bootloader), so there are no remains to consider once OpenWrt is running.

Thanks! So this eliminates the first item from the list. With that, trying old OpenWRT versions also seems to make sense; maybe I'll even be able to bisect the issue.

If your router flat out doesn't work/link with devices, it's most likely time to look for something new, as you're probably seeing some kind of hardware issue. Unfortunately, it's very hard to tell, as there is little to no logging enabled by default in OpenWrt or its utilities.

It does work. But there are occasional disconnects during heavy network load.

That's what I want to confirm or exclude by going back to an old version of OpenWRT that I used to use. Will probably do it this week.

What do you mean by "by default" - is there a way for me to enable it, without outright writing my own logging functionality?

And using old OpenWRT didn't help either.