PPPoE Router rebooting every 15-30 minutes?

Maybe the logs?

Thank you. After reviewing them, I don't see anything that screams problem :frowning:

Here is a copy of the System Logs:

Here is a copy of the Kernel Logs:

I seem to have a lot of dnsmasq events, but that may simply be because it's rebooting and having to redefine everything?

EDIT: After every reboot, the system log loses everything newer than Sun Mar 28 23:20:17 2021. So I always have the logs from that moment and older, but the new stuff is always deleted after a reboot, so I may simply be missing the logs from just before the reboot.

Using an SSH client (for example PuTTY), connect to the main router and run the following command: logread -f
When your router unexpectedly reboots, the logs will still be present in the SSH window.
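
A minimal sketch of that approach, which also keeps a copy of the output in a file on the local machine in case the terminal scrollback is lost. The function name `capture_router_log`, the address 192.168.1.1 and the file name are illustrative assumptions, not taken from this thread; it assumes an OpenSSH-style `ssh` client and root login on the router.

```shell
# Stream the router's log over SSH and append a copy to a local file.
# The copy lives on the local machine, so it survives the router rebooting.
capture_router_log() {
    router="$1"     # router address (assumption: root login over SSH is enabled)
    outfile="$2"    # local file that receives the log lines
    ssh "root@${router}" logread -f | tee -a "$outfile"
}

# Example (hypothetical address and file name):
#   capture_router_log 192.168.1.1 router.log
```

When the router reboots, the SSH session drops and the last lines logged before the crash remain at the end of the file.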


Yes, obviously you need the logs around the moment it crashes... For that you can save the logs to a file (there is an option for that) and inspect them after the following reboot, or send everything to an external log facility (even netcat listening on your machine on UDP 514). The time thing should just be the clock changing during boot...
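
As a rough sketch of both options with OpenWrt's default logd: the `log_file`, `log_ip`, `log_port` and `log_proto` options live in the `system` section of `/etc/config/system` and can be set with `uci`. The helper names, the example path and the example address below are assumptions for illustration, and none of this applies if logd has been replaced by another syslog daemon.

```shell
# Also write the log to a file.
# Assumption: the path is on storage that survives a reboot (/tmp does not).
enable_file_log() {
    uci set "system.@system[0].log_file=$1"
    uci commit system
}

# Forward log messages to a remote syslog listener on UDP port 514.
enable_remote_log() {
    uci set "system.@system[0].log_ip=$1"
    uci set "system.@system[0].log_port=514"
    uci set "system.@system[0].log_proto=udp"
    uci commit system
}

# Example (hypothetical values), followed by a restart of the log service:
#   enable_file_log /mnt/usb/messages
#   enable_remote_log 192.168.1.10
#   /etc/init.d/log restart
```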

Yes the logs are in RAM only. So all logs from before the crash are lost.

You can try sending the logs to a log server, or SSH into the router and run logread with the appropriate options to continuously print the logs to the terminal.

Thank you @leeandy & @xorbug & @dlakelan

I think the issue may have been wifi related. I just disabled my 5 GHz and 2.4 GHz SSIDs and I've now got over 50 minutes of uptime!

I was planning on re-IPing the router so I can put the ISP router back in place and get some work done today, while still being able to troubleshoot - but I seem to have accidentally found the issue?

I'll keep an eye on it in case this is just a coincidence and the reboot is on its way, but I wonder if this helps point me in the right direction?

I've essentially configured the 5 GHz and 2.4 GHz SSIDs identically across the 3 Archer C7 devices: WPA2 + 802.11r enabled, with the rest left at the default settings.

Maybe I'm misconfiguring the interfaces or something?

If it's a MIPS router try:
cat /sys/kernel/debug/crashlog > /tmp/last_crash_log ; cat /tmp/last_crash_log
after the crash, MIPSen try to keep some crash information around.... (I wish other architectures did the same as easily).

Update:

I've turned the 5 GHz radio back on and still haven't had any reboots.
So it seems like something is wonky with the 2.4 GHz radio.

I don't really need it, since my 2.4 GHz-only devices are connected to the other APs at the moment, so I can keep it off for now.

I'll have to start logging some syslog and reviewing it at a later date.

I've had good luck using a cheap USB stick formatted as f2fs as a logging volume. So far it has been running continuously for over a year on my router, which acts as a central log server, without problems, and that's on a USB stick I literally found on the ground.

Oh I didn't even think of that! How big's your drive, and is there a guide I can follow to ensure it's mounted / pointed to properly?

Great idea :slight_smile:

hope you are not a botnet zombie now :laughing:

No it was a USC university campus info drive dropped by a student near my wife's lab. Also reformatted. :wink: I think the risk was super low.

@mazza2590 my drive is only 1 GB I think. I don't know about a guide, but if you know Linux you just install the f2fs tools and gdisk, repartition to a single partition, format it with f2fs, and add the partition to /etc/fstab


Okay, one more for the infiltration handbook, source USB media that look like they belong to the target area "naturally" and fill them with innocent local material :wink:

Eh, people have gotten malware installed directly into big name usb flash drives directly from the factory :wink:

The real thing needed is to ensure that usb host hardware, firmware, and drivers are resistant to malicious usb devices. It's a risk to plug any usb device into your machine without those proper mitigations.

Hello again all.
Finally got rsyslog working and have a lot of events.
Don't really see anything that stands out other than the following:

ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/fwcfg-pci-0000:00:00.0.txt failed with error -2 
firmware ath10k!fwcfg-pci-0000:00:00.0.txt: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:00:00.0.bin failed with error -2 
firmware ath10k!pre-cal-pci-0000:00:00.0.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/ct-firmware-5.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!ct-firmware-5.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/ct-firmware-2.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!ct-firmware-2.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/firmware-6.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!firmware-6.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/firmware-5.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!firmware-5.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/firmware-4.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!firmware-4.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/firmware-3.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!firmware-3.bin: firmware_loading_store: map pages failed 
ath10k_pci 0000:00:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed with error -2 
firmware ath10k!QCA988X!hw2.0!board-2.bin: firmware_loading_store: map pages failed 

(There were tons of repeating events; I just trimmed them down and removed the duplicates as best I could)

Is it possible that these errors are what's causing my router to crash when wifi is enabled?

It's just weird that I have 3 of the same hardware, and only 1 of them crashes?
Sure, the other two are doing nothing BUT wifi, with odhcpd/dnsmasq/firewalld disabled.. but still!

EDIT - Nope, I see these errors on my APs as well! This is strange

These errors seem to come from the driver looping over a list of potential firmware files to load and failing to find them, see https://forum.openwrt.org/t/showing-ath10k-firmware-load-successes/35550
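
Error -2 is -ENOENT (no such file), which fits that explanation. A small sketch for hiding those probe failures so that any remaining ath10k messages are easier to spot; the function name `ath10k_without_probe_noise` is made up for illustration, and it matches on the exact message text quoted above.

```shell
# Read log lines on stdin and print only the ath10k lines that are not
# "file not found" probe failures or their accompanying firmware-loader lines.
ath10k_without_probe_noise() {
    grep 'ath10k' \
        | grep -v -e 'failed with error -2' \
                  -e 'firmware_loading_store: map pages failed'
}

# Example, run on the router:
#   logread | ath10k_without_probe_noise
```

If nothing is left after filtering, the ath10k lines in that log were all of the probe-failure kind.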

Have you tried the crash debug output:
cat /sys/kernel/debug/crashlog > /tmp/last_crash_log ; cat /tmp/last_crash_log

Use this immediately after a reboot following one of your crashes.
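
A small sketch that wraps this so the crash log is only copied when it actually exists, and so each crash gets its own file. The function name `save_crashlog` and the default destination `/root` are assumptions; the timestamp in the file name comes from the router's clock, which may not be set correctly yet right after boot.

```shell
# Copy the kernel crash log, if present, to a timestamped file.
save_crashlog() {
    src="${1:-/sys/kernel/debug/crashlog}"   # path used in the command above
    dest_dir="${2:-/root}"                   # assumption: a directory kept across reboots
    if [ -r "$src" ]; then
        dest="${dest_dir}/crashlog-$(date +%Y%m%d-%H%M%S).txt"
        cat "$src" > "$dest" && echo "$dest"  # print where the copy was written
    else
        echo "no crash log found at $src" >&2
        return 1
    fi
}

# Example: save_crashlog
```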

Thank you @moeller0 !

I disabled logd as part of my rsyslog & syslog-ng adventure yesterday. Will the crashlog still exist in this case? If so, I'll turn wifi back on to trigger the crash soon..

I do see some "Crashlog allocated RAM at address 0x3f00000" events, but there is no crashlog file at the moment. (Wifi's been off overnight, so things were stable.)

I think this is independent of logd, but I have not tested that independence. I did use the crashlog feature successfully in the past, though.

Appreciate it. Will report back soon!
