Error in Preinit documentation regarding overlays

From the preinit_mount documentation under the Mount Root Filesystem section:

3. If mtd device rootfs_data has not already been formatted, mounts a tmpfs (ramdisk) as root filesystem, and indicates that further steps should be skipped.
4. Mounts previously formatted jffs2 partition on /overlay and indicates successful mount.
5. Makes successfully mounted /overlay (if it exists) the new root filesystem and moves previous root filesystem to /rom, and indicates to skip further steps.

This is only true if a config (sysupgrade.tgz) is NOT preserved across the update. If a config is preserved, step 3 is skipped entirely, and the jffs2 overlay is mounted immediately even if rootfs_data has not been formatted, instead of being deferred until later in the boot sequence. This is a bug: the preinit_main script hangs while the rootfs_data partition is cleaned and formatted for the jffs2 overlay (which is why the intended behavior is to mount the tmpfs overlay first). Machines with large rootfs_data partitions can lock up and crash because of this.
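To make the difference between the documented and the observed behavior concrete, here is a minimal Python sketch of the decision as I describe it above. It is a toy model only: the function name, the argument names, and the step strings are all hypothetical, and none of it is taken from the actual preinit scripts.

```python
def first_boot_plan(config_preserved, rootfs_data_formatted=False):
    """Toy model of the root-mount decision described in this thread.

    All names here are hypothetical; this is not the real preinit code.
    """
    tmpfs = ["mount tmpfs as root filesystem", "skip further steps"]
    jffs2 = [
        "mount jffs2 on /overlay",
        "make /overlay the new root, move old root to /rom",
    ]
    if rootfs_data_formatted:
        # Steps 4 and 5 of the documentation: nothing left to format.
        return jffs2
    if config_preserved:
        # Observed behavior: step 3 is skipped and the format happens
        # during preinit, before the jffs2 overlay can be mounted.
        return ["format rootfs_data while preinit_main waits"] + jffs2
    # Documented behavior (step 3): fall back to a RAM overlay.
    return tmpfs


if __name__ == "__main__":
    for preserved in (False, True):
        print("config preserved:", preserved)
        for step in first_boot_plan(preserved):
            print("  -", step)
```

Only the `config_preserved=True` branch contains the blocking format step, which is the case the documentation does not currently mention.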

This bug has been around since 2008, and it's a messy one to patch. For now, the official documentation should make note of this to help developers who may see inconsistent behavior across "first boots".


I have posted a more detailed description of this bug (and my custom patch for it) in the README for this GitHub repo. The official documentation for overlays and preinit sequences should explain this bug so that other developers do not go down the same rabbit hole I did. Any feedback would be appreciated.


I took the liberty of making additions to the wiki for the Sysupgrade and Preinit and Root Mount and Firstboot Scripts technical references.

Here's another more detailed but still abridged explanation of this bug for any non-believers:

Sysupgrades that preserve volatile files ('sysupgrade -c ...') replace the 0xDEADC0DE marker at the rootfs/rootfs_data boundary with the tar bundle of preserved files. The 0xDEADC0DE marker is moved to the start of the next erase block.

Upon the subsequent first boot, the mount_root utility reads a valid jffs2 file in the first block of rootfs_data, concludes that the partition has already been formatted, and summons the jffs2 driver. The jffs2 driver finds the 0xDEADC0DE marker after the tar file and assumes that now is a safe time to format the rootfs_data partition and launch the jffs2 overlay.

This is a bug, since preinit_main hangs while the jffs2 driver formats the partition, which can cause fatal soft lockups on systems with weak CPUs and large rootfs_data partitions. The intended behavior for a first boot is to have mount_root kick off an intermediate tmpfs overlay, deferring the jffs2 switch until the /etc/init.d/done call.
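For anyone who wants to see the layout problem rather than read about it, the following Python sketch simulates a rootfs_data partition as a byte string. Everything in it is an assumption made for illustration: the erase block size, the byte order of the marker, the helper names, and the simplified rule that only the first erase block decides which overlay is used. It is not the mount_root implementation.

```python
ERASE_BLOCK = 64 * 1024              # hypothetical erase block size
MARKER = bytes.fromhex("deadc0de")   # byte order on real flash is an assumption


def build_rootfs_data(preserved=b"", blocks=4, block_size=ERASE_BLOCK):
    """Build a fake partition: preserved payload first, marker in the next free block."""
    image = bytearray(b"\xff" * (blocks * block_size))   # erased flash reads 0xFF
    used = -(-len(preserved) // block_size) * block_size  # round up to whole blocks
    if used + len(MARKER) > len(image):
        raise ValueError("payload does not fit in the fake partition")
    image[:len(preserved)] = preserved
    image[used:used + len(MARKER)] = MARKER
    return bytes(image)


def find_marker_block(rootfs_data, block_size=ERASE_BLOCK):
    """Return the index of the first erase block that starts with the marker, or None."""
    for offset in range(0, len(rootfs_data), block_size):
        if rootfs_data[offset:offset + len(MARKER)] == MARKER:
            return offset // block_size
    return None


def first_boot_overlay(rootfs_data, block_size=ERASE_BLOCK):
    """Simplified rule: marker at the partition start means a tmpfs overlay, anything else jffs2."""
    if find_marker_block(rootfs_data, block_size) == 0:
        return "tmpfs"
    return "jffs2"


if __name__ == "__main__":
    clean = build_rootfs_data()                   # upgrade without preserved files
    kept = build_rootfs_data(b"tar bundle here")  # upgrade with preserved files
    print(find_marker_block(clean), first_boot_overlay(clean))
    print(find_marker_block(kept), first_boot_overlay(kept))
```

In this model the preserved-files image pushes the marker to block 1, so the first-block check no longer sees it and the jffs2 path is taken on first boot, which is the behavior described above.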

interesting...

keep the 0xdeadc0de marker at the rootfs/rootfs_data boundary, and writes the file to raw flash in the following erase blocks. Since the 0xdeadc0de is kept where it belongs, upon reboot after the upgrade, mount_root falls into the FS_DEADCODE case and launches the /tmp RAM overlay.

Always good to learn more about the boot process... especially the lower level procd-init/preinit mount_root side of things... and thanks for adding to the wiki + spending the time to dissect all of this!


Sorry to revive this old topic, but is this preinit problem with the overlay still present in the latest 21.02?


ref: http://lists.openwrt.org/pipermail/openwrt-devel/2020-May/029225.html