RaspberryPi sysupgrade loses overlay when /boot partition gets bigger

In the meantime, I found a way to analyze this, though I have no idea yet how to fix it. I modified /lib/upgrade/stage2 to get an interactive console shell after the switchover to ramfs, so I could enter the partx calls manually and observe their results, and also run ps, lsof etc.:

  • The problem is that in line 67 of /lib/upgrade/platform.sh (target/linux/brcm2708/base-files/lib/upgrade/platform.sh in buildroot), the call partx -d - /dev/$diskdev, which expands to

    partx -d - /dev/mmcblk0
    

    fails, because partition 2 is busy ("resource in use").
    As a consequence, the larger new /boot partition cannot be added and cannot be mounted later in platform_copy_config, so the config backup cannot be saved to /boot and is lost.

  • I assume this is because the f2fs overlay is still active somehow. ps shows three kernel threads related to f2fs, [f2fs_flush-7:0], [f2fs_discard-7:], [f2fs_gc-7:0].

  • lsof does not show any files open on /dev/mmcblk0p2, but I guess f2fs internals would not show in lsof anyway.
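
The 7:0 suffix in those thread names looks like a block device major:minor pair, and major 7 is normally the Linux loop driver. So one thing worth checking (an assumption, not verified on this system) is whether a loop device backed by /dev/mmcblk0p2 is still attached. A minimal sketch that reads the loop attributes from sysfs; loop_backing is a hypothetical helper name, and the optional argument only exists so the function can be tried against a copy of the tree:

```shell
#!/bin/sh
# List attached loop devices with their backing file and offset.
# $1: sysfs root (defaults to /sys)
loop_backing() {
    sysroot="${1:-/sys}"
    for f in "$sysroot"/block/loop*/loop/backing_file; do
        # An unattached loop device has no loop/backing_file attribute;
        # this also skips the unexpanded glob when nothing matches.
        [ -r "$f" ] || continue
        dir="${f%/loop/backing_file}"
        name="${dir##*/}"
        backing=""
        offset="?"
        IFS= read -r backing < "$f" || true
        if [ -r "$dir/loop/offset" ]; then
            IFS= read -r offset < "$dir/loop/offset" || true
        fi
        printf '%s %s offset=%s\n' "$name" "$backing" "$offset"
    done
}
```

Calling loop_backing without arguments in the ramfs shell prints one line per attached loop device. A line naming /dev/mmcblk0p2 as backing file would point to the loop device, rather than a userspace process, as the thing keeping the partition busy.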

My open questions:

  • are lines 64/65 in /lib/upgrade/stage2 really sufficient to completely unmount a squashfs/f2fs overlay such that the /dev/mmcblk0p2 partition is not in use any more?

    /bin/mount -o noatime,remount,ro /overlay
    /bin/umount -l /overlay
    

    I tried executing these lines manually and noticed that the first line (remounting the overlay read-only) apparently had no effect: /overlay was still rw afterwards.
    Judging from the -l (lazy) umount option in the second line, I assume the author expected the volume to still be in use by something. But by what? And at what later step would it become free?

  • anything else that could keep the /dev/mmcblk0p2 partition busy? By the time switch_to_ramfs runs, all other processes have been killed, so no userspace process should have any files open that could make the partition busy.

  • any way to force the partition to get released? I guess that would be acceptable at this point in the process, because the only thing left to do is copy the config backup .tgz from /tmp to /boot (and extract config.txt early so it is ready at reboot), then reboot.
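
To verify whether the read-only remount actually took effect, the options field of the matching /proc/mounts entry can be checked instead of relying on the exit status of mount. A minimal sketch; mount_is_ro is a hypothetical helper name, and the optional second argument only exists so it can be tried against a saved copy of the mount table:

```shell
#!/bin/sh
# Return 0 if the mount point is mounted read-only, 1 if read-write,
# 2 if it does not appear in the mount table at all.
# $1: mount point, $2: mount table (defaults to /proc/mounts)
mount_is_ro() {
    mnt="$1"
    table="${2:-/proc/mounts}"
    opts=""
    # Fields: source, mount point, fstype, options, dump, pass.
    # Keep the last match, since a later mount shadows an earlier one.
    while read -r src dir fstype options rest; do
        if [ "$dir" = "$mnt" ]; then
            opts="$options"
        fi
    done < "$table"
    [ -n "$opts" ] || return 2
    case ",$opts," in
        *,ro,*) return 0 ;;
    esac
    return 1
}
```

For example, running mount_is_ro /overlay right after the remount line shows whether it worked, and a return value of 2 after the lazy umount only says that the mount point is gone from the table, not that the filesystem has been released. If a loop device backed by /dev/mmcblk0p2 turns out to be attached, detaching it with losetup -d after the unmount might free the partition; that is an untested assumption here.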

Any ideas and hints are welcome!

Also, a confirmed report of a successful upgrade of an RPi with squashfs+f2fs and an expanding /boot partition would be helpful, because it would indicate that something in my setup must be special, apart from not using ext4 for rootfs.
