System reboots from time to time!

I extended the router's overlay onto a USB device, and since then the system restarts from time to time, usually after more than 10 days of uptime. The router does not have any scheduled tasks. Because the system log is cleared on every restart, I don't know how to locate the fault.

In addition, the following log entries sometimes appear before the device restarts:


...
usb 1-1: device descriptor read/64, error
...
daemon.err hostapd: ...
...

kern.err kernel: [   26.669221] blk_update_request: I/O error, dev mtdblock0, sector 16
kern.err kernel: [   26.721564] blk_update_request: I/O error, dev mtdblock0, sector 120
kern.err kernel: [   26.746073] blk_update_request: I/O error, dev mtdblock0, sector 120
kern.err kernel: [   26.752539] Buffer I/O error on dev mtdblock0, logical block 15, async page read
kern.err kernel: [   26.784012] blk_update_request: I/O error, dev mtdblock0, sector 120
kern.err kernel: [   26.790476] Buffer I/O error on dev mtdblock0, logical block 15, async page read
kern.err kernel: [   27.035361] blk_update_request: I/O error, dev mtdblock4, sector 32
kern.err kernel: [   27.053381] blk_update_request: I/O error, dev mtdblock4, sector 40
kern.err kernel: [   27.077240] blk_update_request: I/O error, dev mtdblock4, sector 128
kern.err kernel: [   27.092646] blk_update_request: I/O error, dev mtdblock4, sector 128
kern.err kernel: [   27.099109] Buffer I/O error on dev mtdblock4, logical block 16, async page read
kern.err kernel: [   27.171238] blk_update_request: I/O error, dev mtdblock4, sector 128
kern.err kernel: [   27.177702] Buffer I/O error on dev mtdblock4, logical block 16, async page read
kern.err kernel: [   27.365285] blk_update_request: I/O error, dev mtdblock7, sector 88
kern.err kernel: [   27.398742] Buffer I/O error on dev mtdblock7, logical block 462, async page read
kern.err kernel: [   27.542394] Buffer I/O error on dev mtdblock9, logical block 11, async page read
daemon.err block: /dev/ubiblock0_0 is already mounted on /rom
daemon.err block: /dev/sda1 is already mounted on /overlay
daemon.err block: /dev/sda2 is already mounted on ***

I think the fault lies in one of roughly four areas:

  1. The router's USB device controller or driver is faulty.
  2. There is a problem with the external USB device.
  3. The Wi-Fi device is failing.
  4. There is a problem with the pppd service, because the log shows the operator (ISP) forcing the PPPoE session to drop and redial.

Two symptoms accompany each restart:

  1. The system freezes for a while before restarting.
  2. After the restart, the USB device is not mounted automatically. The USB device must be unplugged and re-plugged, and the system only returns to normal after being restarted once more.

What are the device's specifications, and how much memory does it have?

You could log over the network with SNMP, or you could use a serial console.
That way the messages you need end up on a second computer.

Device memory: 128 MB

But this is not the crux of the matter. What I need to know now is how to locate the fault!

At present I record logs to local files, but the entries from the period when the machine fails and restarts are missing. Can this method capture a complete record at all?

A remote syslog server is a great way to see what is happening. By sending the logs to a different device, you will be able to see what happens immediately before it crashes/restarts.
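
On OpenWrt, the stock logd can forward to such a server through the 'log_ip', 'log_port' and 'log_proto' options in /etc/config/system. A minimal sketch, assuming a syslog server is already listening on UDP port 514 on the other device; the function name and the address 192.168.1.10 are hypothetical placeholders:

```shell
# Hypothetical helper: forward logd output to a remote syslog server.
# $1 is the server address (an assumption; substitute your own).
configure_remote_syslog() {
    server_ip="$1"
    uci set "system.@system[0].log_ip=$server_ip"
    uci set "system.@system[0].log_port=514"   # standard syslog port
    uci set "system.@system[0].log_proto=udp"  # udp or tcp
    uci commit system
}

# On the router, for example:
#   configure_remote_syslog 192.168.1.10 && /etc/init.d/log restart
```

The settings only take effect once the log service has been restarted.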

The local log files are stored in RAM, and will be lost on a reboot. Storing them on your internal flash memory is a bad idea because it will wear out that storage rapidly. That is why I'm recommending an external syslog server.

That said, since you are using a USB drive, you could log there -- while it has the same risk of wearing out the flash memory, USB sticks are inexpensive and easy to replace, and typical USB thumb drives are designed for way more write cycles than embedded memory.
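
If you go the USB route, logd can write to a file through the 'log_type' and 'log_file' options. A sketch, assuming the stick is mounted somewhere writable; the function name and the example path /mnt/usb/system.log are hypothetical:

```shell
# Hypothetical helper: have the system log written to a file.
# $1 is the log file path (an assumption; it should sit on the mounted USB drive).
configure_file_logging() {
    log_path="$1"
    uci set "system.@system[0].log_type=file"
    uci set "system.@system[0].log_file=$log_path"
    uci commit system
}

# On the router, for example:
#   configure_file_logging /mnt/usb/system.log && /etc/init.d/log restart
```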

I have changed the system configuration so that the log is written to a file that is not cleared on reboot, but this file will very likely still miss the entries from the moment of the failure. By then the system has probably already crashed, or the storage holding the file has itself failed.

So a syslog server is probably the best option.

If the problem lies in the system itself, the log server cannot guarantee a complete log, right?

The log server (on another device) can guarantee the integrity of the log file that it maintains because it would theoretically not experience unexpected reboots or other ungraceful shutdown events. Files are often corrupted as a function of the system holding the file open and/or writing to the file during an ungraceful shutdown.

That said, the log server can only record what is sent from the router. If the router isn't able to send the log entries that are critical for debugging as the system begins to crash, there isn't really anything else you can do. But this is the best method you'll have of file-based logging.

Alternatively, @hreuver suggested a serial connection -- if the syslog server method doesn't pan out, a serial connection would be more likely to catch the offending events.

Start with memory usage. Add a cron job that records the output of 'free -h' and also 'df -h'.
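
One way to do this is a small script run from cron. A sketch, assuming a POSIX shell; the function name, the script location and the default LOG_FILE are hypothetical. Note that /tmp is RAM-backed on OpenWrt, so set LOG_FILE to a path on storage that survives a reboot:

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped memory and disk snapshot to a log.
# LOG_FILE is an assumed path; override it to point at persistent storage.
LOG_FILE="${LOG_FILE:-/tmp/resource.log}"

log_resources() {
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        free -h    # memory usage
        df -h      # filesystem usage, including /overlay
    } >> "$LOG_FILE" 2>&1
}

log_resources
```

Saved as, say, /root/log_resources.sh and made executable, it could be scheduled with a crontab line such as '*/5 * * * * LOG_FILE=/mnt/usb/resource.log /root/log_resources.sh' (paths are examples only). If the last snapshots before a reboot show free memory steadily shrinking, that points to a leak.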

Are the occasional restarts caused by running out of memory?

I don't know, but that is where to start.
