I have a Ubiquiti EdgeRouter X with OpenWrt 18.06.4 installed, and it has been rebooting suddenly for no apparent reason. logread doesn't show anything. Installed packages are up to date.
How can I diagnose the cause? I'm clueless, and this is becoming increasingly worrisome.
The most definitive way would be with a serial cable: you'd see the last thing the kernel did before it restarted.
The next best option would be to set up a remote rsyslog service on your laptop/PC and configure OpenWrt to log to it. That might or might not catch the real reason.
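For the remote logging idea, here is a rough sketch of the router side. It only prints the commands so you can review them first. The log_ip, log_port and log_proto options belong to OpenWrt's system config; the function name remote_log_cmds and the address 192.168.1.10 are placeholders I made up, so substitute the address of your own PC.

```shell
#!/bin/sh
# Print the uci commands that point OpenWrt's logger at a remote syslog server.
# remote_log_cmds is a hypothetical helper name, not part of OpenWrt.
remote_log_cmds() {
    ip="$1"            # address of the PC running rsyslog (placeholder in examples)
    port="${2:-514}"   # standard syslog port unless overridden
    cat <<EOF
uci set 'system.@system[0].log_ip=$ip'
uci set 'system.@system[0].log_port=$port'
uci set 'system.@system[0].log_proto=udp'
uci commit system
/etc/init.d/log restart
EOF
}
```

On the router, once the output looks right, something like `remote_log_cmds 192.168.1.10 | sh` would apply it. The PC side also has to accept UDP syslog, which in rsyslog usually means enabling the imudp input on port 514.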
Beyond that, I'm not sure there's a way to tell conclusively.
You could also try an elimination approach:
Unplug everything from the device and see if it still reboots (check periodically with uptime).
If it stays up, add one device at a time, leaving each for a day or two, until the issue returns. When it does, you've found the cause. (I had an Apple TV that seemed to freak out my old Buffalo router when the Apple TV went to sleep. I ended up just replacing the Buffalo device as it was old, but I'm mentioning it here in case you've stumbled on something similar.)
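Checking uptime by hand gets tedious, so here is a minimal sketch of automating it from your PC. The idea is simply that uptime going backwards between two checks means the router rebooted in between. The state file path, the function names and the router address 192.168.1.1 in the usage note are all assumptions of mine.

```shell
#!/bin/sh
# Remember the router's last seen uptime and report when it goes backwards.
# STATE is a hypothetical path on the PC running the check, not on the router.
STATE="${STATE:-$HOME/.router_last_uptime}"

# Succeeds when the current uptime ($2) is lower than the previous one ($1).
rebooted_since() {
    [ "$2" -lt "$1" ]
}

# $1 = current uptime of the router in whole seconds.
check_once() {
    cur="$1"
    case "$cur" in
        ''|*[!0-9]*) echo "no usable uptime reading"; return 1 ;;
    esac
    if [ -f "$STATE" ]; then
        prev="$(cat "$STATE")"
        if rebooted_since "$prev" "$cur"; then
            echo "reboot detected: uptime went from ${prev}s to ${cur}s"
        fi
    fi
    echo "$cur" > "$STATE"
}
```

Run from cron every few minutes, it could be called as `check_once "$(ssh root@192.168.1.1 cut -d. -f1 /proc/uptime)"`. Keeping the state on the PC matters because /tmp on the router is cleared at every reboot.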
Yeah, two reboots in less than 4 hours. I ssh'ed into the router and now I'm seeing that the filesystem is mounted read-only. I tried to run fsck, but it seems that it doesn't come with OpenWrt.
I manually rebooted the router and I'm not seeing this log anymore. I think I'll start the "manual debugging" of unplugging devices from it and seeing if one of them is causing the issue.
If you're lucky, sysupgrading OpenWrt (to either the same or a newer version) while NOT keeping settings (and not restoring a potentially defective backup) might fix the situation, as that rewrites the kernel and rootfs and creates a new overlay. If that doesn't help, you may have to dive deeper into ubi specifics (I'm not really a specialist in that).
The cause can be either hardware-related (NAND is prone to individual cells going bad, which is why ubi and ubifs reserve a certain amount of spare sectors for ECC and wear-leveling purposes) or just bad luck (power being cut at the wrong time, or something else breaking filesystem consistency).
If the problem persists after a fresh flash, then nandtest, if it's installed,
jeff@office:~$ nandtest -h
usage: nandtest [OPTIONS] <device>
-h, --help Display this help output
-V, --version Display version information and exit
-m, --markbad Mark blocks bad if they appear so
-s, --seed Supply random seed
-p, --passes Number of passes
-r <n>, --reads=<n> Read & check <n> times per pass
-o, --offset Start offset on flash
-l, --length Length of flash to test
-k, --keep Restore existing contents after test
very carefully executed on the UBI partition with the -k option (which usually doesn't destroy data), would be a way to get a clue as to whether there is a problem with the NAND flash.
You should expect that your UBI file system gets broken by running nandtest (even if it isn't supposed to).
So, given the different answers here, it seems that I'll need to flash the router again and cross my fingers. I didn't want to pay full attention to this, since I thought a corrupt package was causing it. I updated all packages last weekend and now I'm getting more reboots than before.
I'll set aside some time this weekend to do that and report back here.
In general, bulk-updating packages is a bad idea for many reasons (if "incompatible ABI" is meaningful to you, then that should give you a hint).
If you want to update, flashing a complete, self-consistent image is recommended. Any packages that you need that aren't present in the image should be added at that time (same day for snapshot images).
OK, I understand. I'm pretty accustomed to the rolling-release model of Arch Linux, and one of the reasons I found OpenWrt attractive (instead of EdgeOS) was the possibility of running up-to-date software, packages included.
There is some work afoot to at least be able to identify ABI breakage, but something as sophisticated as apt or your favorite OS's package manager is likely beyond what can be supported even ruling out 16 and 32 MB flash devices. Installing a new kernel generally can't be done with opkg.
Personally, I handle periodic updates by building from source, generally from HEAD of the master branch of OpenWrt and the package feeds. Another option is the Image Builder which will assemble pre-compiled packages from the current repos. The requirements and pre-req software are pretty much the same (Current Linux-based OS, somewhere in the range of 20-32 GB of disk, apt install build-essential git gitk libncurses5-dev gawk unzip wget curl ccache rsync zlib1g-dev or thereabouts).
If nothing else helps, maybe booting a clean OpenWrt image from RAM, dumping all ubi volumes and running ubiformat would help. After that you would need to recreate the vendor ubi volumes, such as "factory", and restore their content from the dump (via the ubiupdatevol command), while the OpenWrt rootfs and kernel volumes can be recreated by doing a sysupgrade from the temporary RAM-only system.
If you go this way, make absolutely sure that you have backed up all the partitions to your computer and stored them in a safe place, and use md5 or sha256 checksums to verify their integrity. Some of the ubi partitions might contain data unique to your particular device, such as the MAC address, serial number, and/or calibration data necessary for the wireless chip to work, so don't lose it.
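As a sketch of the checksum step, this is roughly how the backups could be recorded and re-checked on the computer. It assumes GNU coreutils sha256sum, that the dumps sit in one directory with an .img suffix, and the function names and the SHA256SUMS file name are just my choices.

```shell
#!/bin/sh
# Record and later verify SHA-256 checksums for every *.img file in a backup directory.
# make_manifest / verify_manifest are hypothetical helper names.

# $1 = directory holding the dumped partition images.
make_manifest() {
    ( cd "$1" && sha256sum ./*.img > SHA256SUMS )
}

# Succeeds only if every file still matches its recorded checksum.
verify_manifest() {
    ( cd "$1" && sha256sum -c SHA256SUMS > /dev/null )
}
```

Run make_manifest right after copying the dumps over, and verify_manifest again before restoring anything. A non-zero exit status means at least one image changed or is missing.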
To boot OpenWrt into RAM you need to access the serial console first, then use some of the bootloader's commands, which depend on the particular bootloader you have, together with an initramfs-kernel type image, which you can upload into the device's RAM and start.
Do not touch the bootloader mtd partitions; however, make sure to back them up as well!
In the case of ubi, ubi-aware tools such as ubiupdatevol should be used for restoring, rather than nand tools. For saving the data, however, cat /dev/ubiX_Y > /tmp/ubiX_Y.dump.img should be sufficient.
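To avoid typing that cat command once per volume, a loop along these lines might do. It is only a sketch: it assumes the volumes show up as /dev/ubiX_Y as in the command above, and the function name and the overridable directory arguments are my own additions.

```shell
#!/bin/sh
# Copy every UBI volume device (ubiX_Y) into a <name>.dump.img file.
# dump_ubi_volumes is a hypothetical helper; both directories can be overridden.
dump_ubi_volumes() {
    dev_dir="${1:-/dev}"   # where the ubiX_Y device nodes live
    out_dir="${2:-/tmp}"   # where the dumps are written (RAM on OpenWrt)
    for vol in "$dev_dir"/ubi[0-9]*_[0-9]*; do
        [ -e "$vol" ] || continue          # the pattern matched nothing
        name="$(basename "$vol")"
        cat "$vol" > "$out_dir/$name.dump.img" || return 1
        echo "dumped $name"
    done
}
```

Since /tmp lives in RAM on OpenWrt, the dumps have to fit in free memory and should be copied to the computer (for example with scp) before rebooting.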
Are there any problems with using cat /dev/mtdX > /tmp/mtdX.img with NAND? For NOR flash it works just fine at least. However, cat mtdX.img > /dev/mtdX won't work; mtd write or /dev/mtdblockX has to be used instead. For ubi there is the ubiblock tool, but I think it's easier to just use ubiupdatevol than to create a virtual block device that would work with cat/dd.
I have to say that I was overwhelmed at first with all the responses... some of them contained things that I've never done.
I gave this a shot through the "reset button" method and I was unlucky. I tried again with a USB to TTL adapter, flashed the recovery image and I got EdgeOS back.
Now, I'll reinstall OpenWRT.
As some of you mentioned, when the router was flashing the recovery image it found some bad blocks on the storage that were fixed during the ubi formatting process.
Thanks everyone for your help with this. This thread taught me a lot about how I need to handle updates on these kinds of devices!