Hi all,
I've been really enjoying this router with OpenWrt. I've been getting some crashes lately though - at least what I believe are crashes. This happens on both 22.03.6 and 23.05.2, which I recently upgraded to. The box itself does not appear to shut off, and the "Internet" light is still blinking. However, the ssh service becomes unavailable and none of my hard-wired devices can see the router. Based on some of the other forum posts for this device, this does not seem to be a common occurrence, as other users can still access their boxes over ssh when a "crash" occurs.
Unfortunately, I don't have a lot of time to debug when this happens as I'm often working and need to reboot it quickly.
I've set up remote logging to one of the servers on my LAN. The only weird thing I've managed to capture is this kernel log, which shows an out-of-order timestamp. The crash happened around 2024-03-14 17:29, at which point I promptly power cycled the router to reconnect to a meeting.
2024-03-14T16:09:02+00:00 cheese-router kernel: [464627.974246] br-lan: port 2(lan2) entered forwarding state
2024-03-10T22:11:21+00:00 cheese-router kernel: [ 21.534772] IPv6: ADDRCONF(NETDEV_CHANGE): wl0-ap0: link becomes ready
2024-03-10T22:11:21+00:00 cheese-router kernel: [ 21.541494] br-lan: port 6(wl0-ap0) entered blocking state
2024-03-10T22:11:21+00:00 cheese-router kernel: [ 21.547056] br-lan: port 6(wl0-ap0) entered forwarding state
2024-03-10T22:11:22+00:00 cheese-router kernel: [ 21.688710] IPv6: ADDRCONF(NETDEV_CHANGE): wl0-ap1: link becomes ready
2024-03-14T17:30:34+00:00 cheese-router kernel: [ 96.034885] IPv6: ADDRCONF(NETDEV_CHANGE): wl1-ap0: link becomes ready
2024-03-14T17:30:34+00:00 cheese-router kernel: [ 96.041733] br-lan: port 5(wl1-ap0) entered blocking state
2024-03-14T17:30:34+00:00 cheese-router kernel: [ 96.047246] br-lan: port 5(wl1-ap0) entered forwarding state
2024-03-14T17:33:24+00:00 cheese-router kernel: [ 265.464321] br-lan: port 5(wl1-ap0) entered disabled state
This out-of-order timestamp seems to line up with a collectd log where it thinks the system time has changed or something:
2024-03-14T17:29:43+00:00 cheese-router collectd[2972]: Sleeping only 2s because the next interval is 328648.290 seconds in the past!
2024-03-14T17:29:44+00:00 cheese-router collectd[2972]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/cheese-router/cpu-0/percent-system.rrd: opening '/tmp/rrd/cheese-router/cpu-0/percent-system.rrd': No such file or directory
2024-03-14T17:29:44+00:00 cheese-router collectd[2972]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/cheese-router/cpu-0/percent-softirq.rrd: expected 1 data source readings (got 0) from /tmp/rrd/cheese-router/cpu-0/percent-softirq.rrd:...
2024-03-14T17:29:44+00:00 cheese-router collectd[2972]: rrdtool plugin: rrd_update_r failed: /tmp/rrd/cheese-router/interface-br-lan/if_octets.rrd: expected 2 data source readings (got 0) from /tmp/rrd/cheese-router/interface-br-lan/if_octets.rrd:...
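As a sanity check on that offset, here is a minimal Python sketch of the arithmetic, using only values copied from the logs above (the variable names are my own). It subtracts the 328648.290 seconds collectd reports from the time of that message, and separately works out the boot time implied by the [ 96.034885] uptime stamp.

```python
from datetime import datetime, timedelta

# Values copied from the log lines above; nothing here is measured independently.
collectd_msg_time = datetime.fromisoformat("2024-03-14T17:29:43+00:00")
collectd_offset = timedelta(seconds=328648.290)

# Where collectd thought "the next interval" was, i.e. the clock before the jump.
implied_old_clock = collectd_msg_time - collectd_offset

# First of the stale-looking kernel lines (uptime ~21 s).
stale_kernel_time = datetime.fromisoformat("2024-03-10T22:11:21+00:00")
gap = abs(implied_old_clock - stale_kernel_time)

# Boot time implied by a later kernel line: wall clock minus uptime.
later_kernel_time = datetime.fromisoformat("2024-03-14T17:30:34+00:00")
implied_boot = later_kernel_time - timedelta(seconds=96.034885)

print("clock before the jump:", implied_old_clock.isoformat())
print("gap to stale kernel lines:", gap)
print("implied boot time:", implied_boot.isoformat())
```

The first result lands on 2024-03-10 at about 22:12, within a minute of the stale 22:11:21 timestamps, and the second puts the boot at roughly 17:28:58, right around when I power cycled. So the out-of-order lines may just be the router's clock before it re-synced after my reboot, rather than something logged during the crash itself, but I'm not sure.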
Of course, I do not know if this is even significant/related. If it is, then great! This is the only abnormality that I found in the logs when the crash occurred though. There are probably other messages pointing to the cause that I simply haven't captured with this setup.
I've read I can open up the RT3200 and use the serial port, but with the randomness of the crashes, I need to be able to have a sort of "set it and check it later" setup. Would this be at all possible? Are there any other tips and tricks out there for capturing these things?
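To make the question concrete, this is roughly the kind of thing I'm picturing: a small Python sketch that would run on a spare machine wired to the serial header and stamp every console line with that machine's own clock, so a clock reset on the router can't scramble the ordering. The function name and the `clock` parameter are hypothetical, and I'm assuming the serial adapter shows up as a readable device node that has already been set to the right baud rate.

```python
from datetime import datetime, timezone


def log_console(src, dst, clock=lambda: datetime.now(timezone.utc)):
    """Copy lines from a binary stream to a text log, one timestamp per line.

    src   -- any binary file-like object that yields lines (assumed: the serial device)
    dst   -- any text file-like object opened for writing or appending
    clock -- callable returning the logging host's current time (hypothetical hook)
    """
    for raw in src:
        # Console output may contain stray bytes, so never fail on decoding.
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        dst.write(f"{clock().isoformat()} {line}\n")
        # Flush every line so nothing is lost if the logger itself is interrupted.
        dst.flush()
```

I'd expect to call it with `src` opened in binary mode from the adapter's device node (something like /dev/ttyUSB0 on Linux, which is a guess on my part) and `dst` as a log file opened in append mode, then leave it running until the next crash.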
For what it's worth, the router is only plugged into a surge protector - no UPS or anything. Maybe this is what is causing the instability, but I would imagine there's quite a bit of resiliency in these devices.
Thank you for the help!