Unexplained hangs, freezes and reboots with Archer C2600

Mine crashed after less than 20 hours of uptime. I'm going to try tweaking the values in the script to be even more aggressive. One really wonders about the quality of the silicon in these devices.

For now I'll up the base frequency from 600->650 but leave sampling_down_factor and up_threshold at default values. And I'll continue bumping the frequency after crashes up to the 800MHz min frequency mentioned in the commit message.

I changed the CPU governor to performance as has been commented. So far I am at 9+ days uptime. We'll see how long it lasts...

There's almost certainly nothing wrong with the silicon. The issue is almost entirely related to the fact that there is no documentation available on how to correctly write software for this hardware. Sadly with most consumer devices, this is the way. The hardware is more than capable if you know how to use it properly, but that's a big if, and finding out by experimentation is often the only way, in the absence of any actual official documentation.

My last reboot was an unexpected power outage, been up for 33d 15h 30m 28s.

Not sure if this helps other users: I have the settings from "TP-Link Archer C2600 performance with 17.01.4 - #43 by otnert" in my rc.local file, and I'm running irqbalance.

Your post makes sense to me, binning and all. But here is what I don't understand: If the devs that worked with Qualcomm decided to stick with 800MHz minimum for stability sake, why do we go lower? It's not like these devices are overheating out of the box, nice aluminum heatsink and the PSU itself seems pretty damn good.

My uptime is higher now @ 650 versus 600, but based on what you said trying to change the min frequency higher is not worth it then. If all are binned equally, then I shouldn't be crashing @ 600.

Previously when running at max frequency 24/7, from the day I flashed the firmware to the day I flashed the next build it was 100% stable. I always assumed something went wrong during frequency changes, based on the above experience. However I also might have mistakenly assumed all processors would be more stable if you underclocked them, assuming the VID table or your undervolt wasn't too aggressive.

I'll skip 700-750 and go back to the default 600MHz minimum if it crashes the next time, and tweak the scheduler values little by little (+/-) and see if anything changes for the better. And if not, try 800MHz with stock scheduler values. And if that fails, back to aggressive.

edit: A bit over 2 days uptime with 650MHz. Now back to 600MHz and tweaking scheduler values.
edit2: I feel a bit stupid, but these are the valid steps for clock frequency, on the C2600 at least: 384000, 600000, 800000, 1000000, 1200000, 1400000. So changing the base from 600 to 650 had no effect in reality.
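A quick way to catch this kind of mistake is to check a value against the step list before writing it. This is only a sketch: `check_step` is a made-up helper name, and the list is copied from the post above. On a device it can usually be read from the policy's scaling_available_frequencies file, assuming the driver exposes it.

```shell
#!/bin/sh
# Frequency steps (kHz) as listed in the post above for the C2600.
STEPS="384000 600000 800000 1000000 1200000 1400000"

# check_step FREQ: prints "valid" if FREQ is one of the steps, else "invalid".
# check_step is a hypothetical helper, not an OpenWrt tool.
check_step() {
    for f in $STEPS; do
        if [ "$f" = "$1" ]; then
            echo valid
            return 0
        fi
    done
    echo invalid
    return 1
}

check_step 650000 || true   # 650000 is not in the list
check_step 800000
```

The helper only compares against the list; it makes no claim about how the kernel treats an in-between value.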

Right now I'm trying the following settings. On my usual load one core will stay at 1.4GHz, the other will move between idle and max freq. It's more sustained/less dynamic with the frequency selection. I'm getting the current frequency data from: /sys/devices/system/cpu/cpufreq/policy*/cpuinfo_cur_freq
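For anyone who wants to watch the same data, here is a small sketch that prints the current frequency per policy. `show_freqs` is a hypothetical name, and the optional directory argument exists only so the function can be tried off-device.

```shell
#!/bin/sh
# show_freqs [BASE]: print "<policy> <kHz>" for every cpufreq policy under BASE.
# BASE defaults to the sysfs path quoted in the post.
show_freqs() {
    base=${1:-/sys/devices/system/cpu/cpufreq}
    for p in "$base"/policy*; do
        # Skip policies whose file is missing or not readable (usually needs root)
        [ -r "$p/cpuinfo_cur_freq" ] || continue
        echo "$(basename "$p") $(cat "$p/cpuinfo_cur_freq")"
    done
    return 0
}

show_freqs
```

Wrapping the call in something like `while sleep 1; do show_freqs; done` gives a rough live view while generating load.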

    echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
    echo 600000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
    echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor  # def: 10
    echo 30 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold  # def: 50

edit3: Currently at almost 8 days uptime with the modified ondemand governor values at stock 600MHz frequency. Better than the previous two crashes (a couple of hours & less than 2 days at default values).

edit4: Crashed after roughly 10 days. I've decreased up_threshold to 20. This will probably be my final adjustment to the scheduler at 600MHz; after that I'll try the default OpenWrt values but with 800MHz set as the minimum frequency.

Just a heads up regarding irqbalance.
The following was posted in another topic:

After installing irqbalance,
users only need to enable it in

/etc/config/irqbalance
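The same thing can be done from the shell instead of editing the file by hand. This is a sketch under an assumption: it expects the packaged config to have a section named 'irqbalance' with an 'enabled' option, so check /etc/config/irqbalance on your build first. `enable_irqbalance` is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: /etc/config/irqbalance contains a section named 'irqbalance'
# with an 'enabled' option. Verify this on your build before relying on it.
enable_irqbalance() {
    initd=${1:-/etc/init.d/irqbalance}
    uci set irqbalance.irqbalance.enabled='1' || return 1
    uci commit irqbalance || return 1
    # Start at boot and apply the new setting now
    "$initd" enable && "$initd" restart
}

if command -v uci >/dev/null 2>&1; then
    enable_irqbalance
else
    echo "uci not found: run this on the router itself"
fi
```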


Mine crashed recently too. It was running continuously for about 2 weeks.

I've had irqbalance running for several months on the C2600 with no effect on stability that I could detect. Might be more effective now in combination with the tweaks to minimum frequency and scheduler values, but it still crashes regardless.

OK, it crashed again today with 800MHz min frequency, so a little more than 1 week of uptime. Without disabling the ondemand scheduler, I think it's back to behaving just like before.

I'm creeping up on two weeks of uptime with the following settings:

    echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
    echo 600000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
    echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
    echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold

Which pretty much means I never sit on min freq unless the device is idle. I suspect if I decided to loop a command to disable WAN, enable, disable to force load/idle/load it would crash a lot faster.
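When experimenting with these two tunables it helps to note the old values before overwriting them, so they can be put back later. A minimal sketch, assuming the ondemand directory from the posts above exists; `set_ondemand` is a made-up helper name, and the optional third argument is only there for trying it off-device.

```shell
#!/bin/sh
# set_ondemand SAMPLING_DOWN_FACTOR UP_THRESHOLD [DIR]
# Prints the previous values, then writes the new ones.
set_ondemand() {
    dir=${3:-/sys/devices/system/cpu/cpufreq/ondemand}
    for t in sampling_down_factor up_threshold; do
        [ -r "$dir/$t" ] && echo "previous $t=$(cat "$dir/$t")"
    done
    echo "$1" > "$dir/sampling_down_factor"
    echo "$2" > "$dir/up_threshold"
}

# Example for rc.local, matching the values above (left commented out
# so that nothing changes by accident):
# set_ondemand 20 20
```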

edit: Never mind, crashed. I give up, it's a waste of my time to try and tweak around with something that is obviously broken. Back to performance.

Since the last power outage...

 OpenWrt 19.07.7, r11306-c4a6851c72
 -----------------------------------------------------
root@OpenWrt:~# uptime
 14:08:32 up 69 days, 18:47,  load average: 0.00, 0.04, 0.06

I have been using this.....

since chasing OpenVPN throughput back then. I run WireGuard now, but have left the same settings in rc.local.

If you're after any other settings, let me know.

EDIT Ahh! my router also succumbed to a spontaneous reboot - all up 72 days!

I'd guess a drop in traffic caused the scheduler to decrease frequency, and triggered some firmware/hardware bug. We all have varying success with the various fixes, but I think hardware revision differences also play a role. My C2600 is one of the earlier revisions, and I'm assuming that's why mine crashes often with ondemand, when others are reporting their problems fixed outright on the new scheduler script. Either way, it's been a problem for years now, and I've lost all hope of ever fixing it.

Some update: Been running on 800MHz-max with the ondemand governor for 55 days now and not a single reboot during these days. So 800MHz seems to be the safe limit.

(I think my previous post was not accurate since I forgot to uncomment the lines that set the min freq to 800MHz, so it ran on the default low and therefore still crashed after a few weeks.)

Maybe you should just use performance then. Could actually be bad silicon if 800MHz min freq doesn't resolve it or at least make the reboots a lot less often.


Hello,
I have had good luck with these additional commands in /etc/rc.local:

    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state0/disable
    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state1/disable
    echo 1 > /sys/devices/system/cpu/cpu1/cpuidle/state0/disable
    echo 1 > /sys/devices/system/cpu/cpu1/cpuidle/state1/disable

I'm trying these settings; I've only adjusted scaling_min_freq to 800000. Let's see how stable it is.

Hello.
I also have a C2600 v1.1 with OpenWrt 19.07.
In my case the system is stable when scaling_governor is set to performance, but the router still rebooted three times after seven days.
The reboot always happened when I was downloading a big ISO file while copying files over the LAN at the same time.

My max uptime was 83 days; currently it is 39 days.

I don't think these settings make sense.

With performance, the CPU clock is always set to the maximum, which in this case is 1200MHz.
Setting scaling_min_freq only makes sense when scaling_governor is set to ondemand.

I think the settings below should also be applied to the policy1 directory (I have it on OpenWrt 19.07):

    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

I think the correct settings would be (possibly with other CPU frequencies):

    echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    echo ondemand > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
    echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq
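To avoid forgetting one of the policy directories, the same writes can be done in a loop. A minimal sketch, assuming each policy directory has the scaling_governor, scaling_max_freq and scaling_min_freq files; `apply_policy` is a hypothetical helper name, and the optional fourth argument is only for trying it off-device.

```shell
#!/bin/sh
# apply_policy GOVERNOR MIN_KHZ MAX_KHZ [BASE]
# Writes the same governor and frequency limits to every policy under BASE.
apply_policy() {
    gov=$1
    min=$2
    max=$3
    base=${4:-/sys/devices/system/cpu/cpufreq}
    for p in "$base"/policy*; do
        [ -d "$p" ] || continue
        echo "$gov" > "$p/scaling_governor"
        echo "$max" > "$p/scaling_max_freq"
        echo "$min" > "$p/scaling_min_freq"
    done
    return 0
}

# Example for rc.local, matching the values above (left commented out
# so that nothing changes by accident):
# apply_policy ondemand 800000 1200000
```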

The settings avoid any switching between frequencies, that is intentional. It is a bit redundant, however without any side effects.

As mentioned, at least for my routers it is stable, with no unscheduled reboots:

    echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state0/disable
    echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state1/disable
    echo 1 > /sys/devices/system/cpu/cpu1/cpuidle/state0/disable
    echo 1 > /sys/devices/system/cpu/cpu1/cpuidle/state1/disable
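After a reboot it is worth checking that the rc.local lines actually took effect. The sketch below reads the cpuidle disable flags back; `show_idle_states` is a made-up helper name, and the optional directory argument is only there so it can be tried off-device.

```shell
#!/bin/sh
# show_idle_states [BASE]: print "<cpu> <state> disable=<0|1>" for every
# cpuidle state found under BASE (default: the sysfs CPU directory).
show_idle_states() {
    base=${1:-/sys/devices/system/cpu}
    for s in "$base"/cpu[0-9]*/cpuidle/state[0-9]*; do
        [ -r "$s/disable" ] || continue
        # Extract the "cpuN" path component that follows BASE
        cpu=${s#"$base"/}
        cpu=${cpu%%/*}
        echo "$cpu $(basename "$s") disable=$(cat "$s/disable")"
    done
    return 0
}

show_idle_states
```

With the settings above, every line would be expected to end in disable=1.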

The routers consume more energy by not clocking down. However, I prefer to keep them running without reboots. The maximum possible speed is also not used, as that might trigger stability issues too. 1.2GHz seems stable from what I can observe.

The ondemand scheduler always caused issues sooner or later.

I only set scaling_governor; my devices are at 345, 166 and 87 days (the last one was powered down, not a reboot).
I use them as APs though, so there might be a difference. The one with the longest uptime runs 19.07.3, the other two 19.07.6.


Hmm, okay - I follow stable, so I am at 21.02.0 now. Perhaps a regression?

Update 03.12.2021: I followed to 21.01.1 and still the C2600 devices did not reboot unscheduled (so far?).
