Unexplained hangs, freezes and reboots with Archer C2600

I've been running a TP-Link C2600 for over two years now. I've been through all OpenWrt versions from 17.01 to 19.07.

The issue is the unit freezes and hangs quite often. Most of the time it reboots automatically. Sometimes the lights stop blinking, and I have to yank out the power cord to recover it. I can't find anything in the system logs or dmesg.

Has anyone else been experiencing similar symptoms? I suspect I should use the Widlarizer to fix it.

Have you installed or upgraded any packages since you flashed the firmware? What have you changed in the configuration (relative to the default configuration)? How often does the problem occur? What else have you done to troubleshoot?

For those unfamiliar with the term "Widlarize(er)":

Now, when I have finished my inspection, and I am still mad as hell because I have wasted a lot of time being fooled by a bad component – what do I do? I usually WIDLARIZE it, and it makes me feel a lot better. How do you WIDLARIZE something? You take it over to the anvil part of the vice, and you beat on it with a hammer, until it is all crunched down to tiny little pieces, so small that you don’t even have to sweep it off the floor. It makes you feel better. And you know that that component will never vex you again. That’s not a joke, because sometimes if you have a bad pot or a bad capacitor, and you just set it aside, a few months later you find it slipped back into your new circuit and is wasting your time again. When you WIDLARIZE something, that is not going to happen. And the late Bob Widlar is the guy who showed me how to do it.

Bob Pease – Troubleshooting Analog Circuits

Source: https://hackaday.com/2014/04/08/heroes-of-hardware-revolution-bob-widlar/

That device should work rather well, but without more detailed logs it's hard to guess what might be the problem. Potential issues would include:

  • a dying PSU: given the age of your device and the relatively cheap PSUs TP-Link uses, that might be an issue. If you have a compatible one, it wouldn't hurt to try it.
  • there is a known bug in stmmac (the ethernet driver for your device), which crashes/reboots the device if it encounters a certain type of jumbo frame on the network. This is fixed in kernel >=4.19, so testing a current master snapshot (usual caveats apply) might help.
  • for anything else, logs would be required.

@tmomas, good eye catching the reference!

The extent of my troubleshooting has been to fire up Wireshark and look for ARP floods, but nothing of the sort turned up. I have LuCI and apcupsd installed. I have a couple of other managed ethernet switches that can spit out jumbo frames. I will be running master for a while to see if that fixes it.

Regarding logs, which logs are relevant (besides dmesg) ?

I'd start with keeping logread -f running in an ssh session. Depending on how quickly things go south, that might not actually catch the issue, but chances aren't too bad that at least the beginnings are caught (sadly, it is rather difficult to get a serial connection on your device).
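If the ssh session dies together with the router, it helps to know when the last line arrived. A minimal sketch, assuming you run it on a PC rather than on the router: stamp_lines is a hypothetical helper name, and 192.168.1.1 is only the OpenWrt default LAN address, so adjust it to your setup.

```shell
#!/bin/sh
# stamp_lines (hypothetical helper): prefix every line read from stdin
# with the local time of the machine running this script, so the last
# entry before a hang shows when the router went quiet.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$line"
    done
}

# Example usage (assumed router address, run from the PC):
#   ssh root@192.168.1.1 logread -f | stamp_lines >> c2600.log
```

The log file lives on the PC, so it survives a hard power cycle of the router. It still only contains what the router managed to send before it froze.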

Try replacing the ath10k-*-ct driver with the non-ct version. This got rid of the weekly reboots I was seeing on my C2600.

What I used:

opkg update
opkg remove ath10k-firmware-qca99x0-ct kmod-ath10k-ct
opkg install ath10k-firmware-qca99x0 kmod-ath10k
reboot

This continues to fail in the same manner with OpenWrt master. I have the logread output from the ssh session, but it doesn't seem too useful.

Next step: I'm trying the driver swap suggested by @kitfi

Hi @kitfi. I tried using the non-ct wireless driver and firmware. The problem persists.

@slh, I also tried logread -f, but it didn't catch anything before the hang.

I'm experiencing a similar issue with my C2600 as well. I've been getting weird reboots since day 1. I tried replacing the -ct driver with stock ath10k and the problem persists (although it seems to happen less frequently; could be just me). I also tried logging to a USB drive but didn't recover anything useful after a crash.

Out of curiosity, what hardware revision do you have? Mine is 1.1. I suspect it could be hardware-related, given that some people seem to run the router just fine.

The last random reboot I experienced was back in April.

Ever since then not one. I'm using ath10k-firmware-qca99x0-ct

Do you have HW v1.1 as well?

Yep c2600 v1.1

Just browsed the Amazon 1-star reviews of the C2600. Aside from the daily "f you q******m" rants (they said TP-Link, but we all know what happened behind the scenes) about dropping support for early-adopter hardware with a ton of unfixed bugs, a lot of people complained about freezing and rebooting issues similar to what we experienced. Maybe it really is a hardware issue that's independent of which firmware is used?

BTW, does anyone know how to take the C2600 apart safely? I'd like to add back the UART circuitry and grab some more dmesg output, but I can't figure out how to take out the motherboard without cutting the bottom case in half.

If you find out, would be great if you could add this information to https://openwrt.org/toh/tp-link/tp-link_archer_c2600_v1#opening_the_case

I already read this. It only documents how to add a serial port and doesn't specify how to remove the motherboard from the bottom casing, which is what I'm having difficulty with.

EDIT: Nevermind. Figured it out. Taking out the silver frame made the motherboard come out a lot easier.

I regret the day I got my C2600. Any sort of high usage via 2.4GHz WiFi results in packet loss and connection loss, requiring a WiFi restart anywhere from every 10 minutes to once a day. The time between failures is completely random; more usage = faster time to failure. If you mainly stick to 5GHz and only use 2.4GHz very sparsely, you might get several days of uptime before a spontaneous crash and reset. If you stick with wired only, you're golden.

Restarting the device seems to work less effectively than power cycling, why or how I don't know. It's like whatever hardware is running the WiFi stays initialized in a bad state even after resetting/rebooting.

I've tried every release of LEDE to OpenWRT from 2017 to now, including religious testing of ath10k-ct firmware. Nothing worked 100%.

As soon as OpenWrt supports a good 802.11ax AP without any apparent issues, I'm selling a kidney and migrating ASAP. The amount of grief I've experienced due to random WiFi issues at weird hours of the day is frying my sanity.

Might be related and worth reading (and worth trying out the proposed solution disabling cpufreq):


Interesting, thanks for the link. I'll give it a shot.

Edit: When you think about it, it makes sense why the C2600 would tend to crash alongside wireless issues. It's not ath10k/-ct crashing the kernel, it's the associated downclocking that would happen due to the cpu usage lowering thanks to a sudden drop in traffic.

Edit 2: Yep, it's stable now. No more hard resets with the CPU governor set to performance. And I can't really see any noticeable increase in temperature over baseline settings, maybe +1C with my CPU usage.

This doesn't really explain why the router works for some people but not others though. If it's a driver bug then everyone should see the router randomly crashing. Will try anyway and see what happens.

EDIT: Done. Let's see how it goes...

# echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
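Note that values written to sysfs like this are lost on reboot. A minimal sketch of a helper that applies the same governor to every core in one go: set_governor is a made-up name, and the optional second argument is only there so the function can be tried against a scratch directory instead of the real sysfs tree.

```shell
#!/bin/sh
# set_governor (hypothetical helper): write the given cpufreq governor
# to every cpuN/cpufreq/scaling_governor file found under the CPU root.
#   $1: governor name, e.g. performance
#   $2: CPU sysfs root (optional, defaults to the real sysfs path)
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern if no core exposes cpufreq.
        [ -e "$f" ] || continue
        echo "$governor" > "$f"
    done
}

# Example usage on the router:
#   set_governor performance
```

To keep the setting across reboots, one option is to put the function and the call into /etc/rc.local (before the final exit 0), which OpenWrt runs at the end of boot. I'm assuming here that rc.local has not been disabled on your build.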

As for the serial capture: I'm still waiting for the parts to arrive (I ordered all the passives but forgot the level shifter chip, duh) and trying to find some time to do the modding. Also I'm not hacking an adafruit level shifter board into the router but instead I plan to populate the unpopulated serial port circuit. It seems to be the same as the VR2600 serial level translation circuit. I didn't know the values for all the passives so I made some guesstimates.

For anyone who is interested: