Buttonless Failsafe Mode

So it turns out that this assumption (that the only way to pass information to the next boot without relying on flash is with the cooperation of the bootloader) is false. I grabbed a spare travel router and hacked up a quick proof of concept to show how:

The idea is to reserve a single 4K page of RAM to stash some volatile information that survives reboots. The page should sit somewhere near the middle of the physical memory range, where neither the bootloader nor the early kernel is likely to use it. My travel router's system RAM region is 00000000-03ffffff, so I figured that 0x02BADxxx would be a fitting address for something meant for "failsafe" use. :)

First I added memmap=4K$0x2bad000 to my kernel command line, which makes the kernel reserve that page early in the boot process so nothing else allocates it. Then I defined a RAM-backed MTD device (compatible "mtd-ram") covering that page in the device tree:
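If you want to adapt this to another board, here is a rough POSIX shell sketch of the arithmetic behind picking the page: it checks that the address and size are 4K-aligned and that the whole region falls inside system RAM, then prints the matching memmap= argument. The helper name memmap_arg is made up for illustration, and the RAM range is just my router's; take yours from the "System RAM" line in /proc/iomem.

```sh
#!/bin/sh
# Hypothetical helper: validate a region to reserve and print its memmap= argument.
# Usage: memmap_arg RAM_START RAM_END ADDR SIZE   (values may be given in hex)
memmap_arg() {
    start=$(( $1 )); end=$(( $2 )); addr=$(( $3 )); size=$(( $4 ))
    # Keep the region page-aligned so it covers whole 4K pages only.
    if [ $(( addr % 4096 )) -ne 0 ] || [ "$size" -le 0 ] || [ $(( size % 4096 )) -ne 0 ]; then
        echo "region is not 4K-aligned" >&2
        return 1
    fi
    # Every byte of the region must lie inside system RAM.
    if [ "$addr" -lt "$start" ] || [ $(( addr + size - 1 )) -gt "$end" ]; then
        echo "region falls outside RAM" >&2
        return 1
    fi
    printf 'memmap=%dK$0x%x\n' $(( size / 1024 )) "$addr"
}

# My router: RAM at 00000000-03ffffff, one page at 0x2bad000.
memmap_arg 0x00000000 0x03ffffff 0x2bad000 4096
# prints: memmap=4K$0x2bad000
```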

root@OpenWrt:~# cat /sys/firmware/devicetree/base/bootcfg@2bad000/compatible 
mtd-ram

As you can see, it preserves its contents across warm reboots:

root@OpenWrt:~# sha256sum /dev/mtd0
1a6f70682c46ced47ddb08071cdd49ac8623082f0a8fe90cc164d2e9b6de33ef  /dev/mtd0
root@OpenWrt:~# dd if=/dev/urandom of=/dev/mtd0
dd: error writing '/dev/mtd0': No space left on device
9+0 records in
8+0 records out
root@OpenWrt:~# sha256sum /dev/mtd0
89c85779ab017a6f11e977fc6c9275493c820a4c32eca66da9b698018c79f04f  /dev/mtd0
root@OpenWrt:~# reboot

...
...
...

BusyBox v1.36.0 (2023-03-18 11:47:48 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt SNAPSHOT, r22307-4bfbecbd9a
 -----------------------------------------------------
root@OpenWrt:~# sha256sum /dev/mtd0
89c85779ab017a6f11e977fc6c9275493c820a4c32eca66da9b698018c79f04f  /dev/mtd0
root@OpenWrt:~#

At that point it became trivial to write a reboot_failsafe script, which simply fills that MTD with the ASCII string "FAILSAFE" repeated 512 times (8 bytes × 512 = 4096 bytes, the whole page) and reboots, plus a /lib/preinit/35_check_failsafe_flag script that looks for this pattern, overwrites it with zeros from /dev/zero, and then sets FAILSAFE=true. Et voilà: a robust, specific, flash-free, vendor-neutral reboot-to-failsafe mechanism. From there, a device admin can create whatever software trigger they deem suitable (such as being unable to ping a host for X minutes, or "SOS" tapped out in Morse code on link-up/link-down events).
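A minimal sketch of what the two halves could look like in POSIX shell. The function names (failsafe_pattern, arm_failsafe, check_failsafe_flag) and the FLAG_DEV variable are hypothetical; FLAG_DEV defaults to /dev/mtd0 as in the transcript above, but can point at an ordinary 4 KiB file for testing. Wiring check_failsafe_flag into preinit with OpenWrt's boot_hook_add is an assumption about how you would hook it up, not something shown here.

```sh
#!/bin/sh
# FLAG_DEV: the RAM-backed MTD (assumed to be /dev/mtd0, as in the post).
FLAG_DEV=${FLAG_DEV:-/dev/mtd0}

# Build ASCII "FAILSAFE" x 512 = 4096 bytes, exactly one page.
failsafe_pattern() {
    p= i=0
    while [ "$i" -lt 512 ]; do
        p="${p}FAILSAFE"
        i=$(( i + 1 ))
    done
    printf '%s' "$p"
}

# reboot_failsafe half: fill the page with the pattern.
# The real script would run 'reboot' right after this.
arm_failsafe() {
    failsafe_pattern > "$FLAG_DEV"
}

# preinit half: if the page holds the pattern, zero it so the following
# boot is normal, then request failsafe. NUL bytes are stripped by tr so
# the page can be held in a shell variable (a zeroed page reads as "").
# Assumed registration in a preinit script:
#   boot_hook_add preinit_main check_failsafe_flag
check_failsafe_flag() {
    data=$(dd if="$FLAG_DEV" bs=4096 count=1 2>/dev/null | tr -d '\000')
    [ "$data" = "$(failsafe_pattern)" ] || return 0
    dd if=/dev/zero of="$FLAG_DEV" bs=4096 count=1 conv=notrunc 2>/dev/null
    FAILSAFE=true
    export FAILSAFE
}
```

Clearing the flag before setting FAILSAFE is what makes the mechanism one-shot: the failsafe boot itself cannot loop back into failsafe.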

This type of mechanism has several other uses as well, such as tracking whether the last reboot was expected, stashing part of a kernel panic traceback, counting failed boots, carrying command-line arguments from reboot_failsafe that override /lib/preinit/00_preinit.conf, and so on. DRAM cells even hold their charge for a while without refresh, so the contents can survive the chip being powered down briefly (seconds or more, depending on the chip and temperature), which means the page can also be used to count brief power cuts. Remember: this is RAM, not flash, so it has no write/erase endurance limits.
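As an example of the failed-boot counter idea, here is a hedged sketch in the same shell style. The "BOOTCNT=<n>" record at the start of the page, the helper names and MAX_FAILED_BOOTS are all invented for illustration, and it assumes the page is dedicated to the counter rather than shared with the FAILSAFE pattern. A preinit script would call note_boot_attempt and enter failsafe when it succeeds; something that runs once the system is known to be healthy would call set_boot_count 0.

```sh
#!/bin/sh
# Hypothetical failed-boot counter kept in the reserved RAM page.
FLAG_DEV=${FLAG_DEV:-/dev/mtd0}

# Print the stored count, or 0 if the page holds no valid record
# (e.g. zeros or random bits after a long power cut).
boot_count() {
    rec=$(dd if="$FLAG_DEV" bs=16 count=1 2>/dev/null | tr -d '\000' | head -n 1)
    n=${rec#BOOTCNT=}
    case "$rec" in BOOTCNT=*) ;; *) n= ;; esac
    case "$n" in ''|*[!0-9]*) n=0 ;; esac
    n=${n#"${n%%[!0]*}"}   # drop leading zeros so "08" is not parsed as octal
    echo "${n:-0}"
}

# Overwrite the record in place; bs=1 avoids short reads from the pipe.
set_boot_count() {
    printf 'BOOTCNT=%d\n' "$1" | dd of="$FLAG_DEV" bs=1 count=16 conv=notrunc 2>/dev/null
}

# Count this boot attempt; succeed once the limit has been exceeded.
note_boot_attempt() {
    n=$(( $(boot_count) + 1 ))
    set_boot_count "$n"
    [ "$n" -gt "${MAX_FAILED_BOOTS:-3}" ]
}
```

Anything that does not parse as a record reads as 0, so a cold boot that wipes the page simply starts the count over.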
