How to determine why the router rebooted

Last night I was doing some online training when suddenly pages stopped loading. I checked the luci web interface and saw that graphing had stopped. It didn't initially occur to me that the page was cached and that the router was down/rebooting.

Later when pages started reloading I checked the output of a log file I created that records the uptime of the router as well as the uptime of the WAN interface and saw that the uptime had reset (so it wasn't simply a networking problem).

But when I examined the System and the Kernel log, I don't see anything to indicate problems in the lead up to the reboot. Admittedly there is a good chance I don't know the significance of some of the entries and am overlooking something important.

Is there anything I should be looking at/for?

Some Kernel Log either side of the reboot event:

[    8.230709] br-lan: port 1(eth1) entered forwarding state
[    8.239763] bcmgenet: Skipping UMAC reset
[    8.245811] bcmgenet fd580000.ethernet: configuring instance for external RGMII
[    8.253528] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[    8.261844] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    8.369050] pppoe-wan: renamed from ppp0
[    9.120118] br-lan: port 1(eth1) entered disabled state
[   10.787821] r8152 2-2:1.0 eth1: Promiscuous mode enabled
[   10.793483] r8152 2-2:1.0 eth1: carrier on
[   10.798251] br-lan: port 1(eth1) entered blocking state
[   10.803484] br-lan: port 1(eth1) entered forwarding state
[ 8290.419662] br-lan: port 1(eth1) entered disabled state
[ 8290.432254] device eth1 left promiscuous mode
[ 8290.436676] br-lan: port 1(eth1) entered disabled state
[ 8290.621210] bcmgenet fd580000.ethernet eth0: Link is Down
[ 8291.828932] br-lan: port 1(eth1) entered blocking state
[ 8291.834195] br-lan: port 1(eth1) entered disabled state
[ 8291.839568] device eth1 entered promiscuous mode
[ 8291.844812] r8152 2-2:1.0 eth1: Promiscuous mode enabled
[ 8291.850611] br-lan: port 1(eth1) entered blocking state
[ 8291.855852] br-lan: port 1(eth1) entered forwarding state
[ 8291.863877] bcmgenet: Skipping UMAC reset
[ 8291.869032] bcmgenet fd580000.ethernet: configuring instance for external RGMII
[ 8291.876704] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
[ 8291.884798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Some system log from around the time of the reboot:

Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: using nameserver 8.8.8.8#53
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: using nameserver 150.203.1.10#53
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: using nameserver 8.8.8.8#53
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: using nameserver 2606:4700:4700::1111#53
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: using nameserver 2606:4700:4700::1001#53
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: read /etc/hosts - 4 addresses
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq[3666]: read /tmp/hosts/dhcp.cfg01411c - 2 addresses
Sat Aug  6 11:53:47 2022 daemon.info dnsmasq-dhcp[3666]: read /etc/ethers - 0 addresses
Sat Aug  6 11:53:51 2022 user.info : luci: accepted login on / for root from 10.0.0.62
Sun Aug  7 00:42:11 2022 daemon.notice ttyd[2816]: time discontiguity: at old time 1659750835s, new time 1659796931s: +46096s
Sun Aug  7 00:42:14 2022 cron.err crond[973]: time disparity of 768 minutes detected
Sun Aug  7 00:42:28 2022 daemon.err collectd[1773]: Sleeping only 2s because the next interval is 46064.819 seconds in the past!
Sun Aug  7 00:43:00 2022 cron.err crond[973]: USER root pid 3921 cmd /root/wanuptime.sh
Sun Aug  7 00:44:00 2022 cron.err crond[973]: USER root pid 3931 cmd /root/dns_comp_comper.sh
Sun Aug  7 00:44:00 2022 cron.err crond[973]: USER root pid 3932 cmd /root/wanuptime.sh
Sun Aug  7 00:44:05 2022 daemon.notice ttyd[2816]: HTTP / - 10.0.0.62
Sun Aug  7 00:44:05 2022 daemon.notice ttyd[2816]: HTTP /token - 10.0.0.62
Sun Aug  7 00:44:05 2022 daemon.notice ttyd[2816]: WS   /ws - 10.0.0.62, clients: 1
Sun Aug  7 00:44:05 2022 daemon.notice ttyd[2816]: started process, pid: 3943
Sun Aug  7 00:44:09 2022 auth.info login[3944]: root login on 'pts/0'
Sun Aug  7 00:45:00 2022 cron.err crond[973]: USER root pid 3963 cmd /root/tracer.sh
Sun Aug  7 00:45:00 2022 cron.err crond[973]: USER root pid 3964 cmd /root/wanuptime.sh

The logs are stored in RAM, so they will be wiped on reboot. Therefore, the only way to find out what is happening (via the logs) is to send them to a remote system (and hope that the last entries actually tell a story). To do this, look at rsyslog or similar techniques.

That said, it's probably not worth spending a lot of time to debug if this only happens once or twice. However, if you're seeing it happen with some regularity, also try to figure out the conditions under which this is happening. For example, if you have a failing power supply or some bad capacitors, you may find that the device reboots during periods of high loads (high bandwidth operation). Or if you have installed a bunch of packages, it is possible that you have an out of memory (OOM) issue.

Generally, OpenWrt is really stable, so it shouldn't crash or reboot in general... but to dig deeper, it will be a combination of remotely logging and observing the conditions that seem to precipitate the reboot.

3 Likes

My experience with OpenWrt is that it has been really stable, which is why the reboot puzzled me. As for the logs, I'm already writing them to somewhere other than RAM, but not a syslog server. Might try and set one up on the AWS free tier and send logs to it. Thanks, I'll keep what you said in mind about the power supply. Maybe this buildings power is dirty or the offical Raspberry Pi AC adapter has gone bad.

It looks like my OpenWRT is setup to reboot 3 seconds after a kernel panic:

root # sysctl kernel.panic
kernel.panic = 3

Apart from that: are there any situations it would auto-reboot? I presume nothing in userspace would trigger that?

I'd hazard a guess that it's more likely you're suffering from sudden power failures, not reboots, but it's still possible they could be reboots.

If it's a sudden power failure: there will be no logs sent out even with networked logging.

If it's a kernel panic: there will still be no logs sent out over the network AFAIK. You have to attach something to the physical serial port on the router to see what happens (?)

Hope it wasn't a timed course/quiz :frowning:

I had some issues previously with my small network switch locking up (all lights on permanently). It turns out it was a bad wall power supply, the problem was reproducible only with that supply. It's been fine in the few years since.

Hope you find the fix to your problem.

Another simple option is to use an ssh connection from another device (like a spare computer or a pi, etc.)... ssh to the router, set the scroll back to infinite and then issue logread -fon the ssh session. This will basically read anything that is happening in the logs into your ssh session, and even when that session stalls out, you'll still have a complete log of everything that came before it in the scroll back buffer.

2 Likes