RaspberryPi sysupgrade loses overlay when /boot partition gets bigger

I'm trying (and failing so far) to figure out why my 19.07-based RaspberryPi installations lose all config data when sysupgrading to 22.03. The 19.07 setups already use squashfs+f2fs, not ext4. It is a custom build, but basically just with fewer routing-related packages and more IoT ones enabled in menuconfig.

I narrowed the problem down to the fact that the /boot partition increases from 20M to 64M. If I build the 22.03 image with a 20M boot partition (which is still just barely enough for bcm2708, but too small for bcm2709), the upgrade works.

I also found that it is not a 19.07 vs 22.03 issue: the same problem happens when I upgrade a 22.03 build with a 20M /boot to one with a 64M /boot.

The relevant code seems to be in target/linux/bcm27xx/base-files/lib/upgrade/platform.sh, in particular the platform_do_upgrade() function. It distinguishes between an unchanged and a changed partition table. In the changed case, the entire image with all partitions is written as a whole, followed by an interesting comment and two partx statements:

  # Separate removal and addtion is necessary; otherwise, partition 1
  # will be missing if it overlaps with the old partition 2
  partx -d - "/dev/$diskdev"
  partx -a - "/dev/$diskdev"

The comment exactly describes the case of an expanding /boot partition.
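To make the overlap condition concrete, here is a minimal sketch. The helper name ranges_overlap and all sector numbers are made up for illustration; they are not taken from platform.sh or from a real image.

```shell
# Return 0 if the sector ranges [s1, s1+n1) and [s2, s2+n2) overlap.
ranges_overlap() {
    s1=$1 n1=$2 s2=$3 n2=$4
    [ "$s1" -lt $((s2 + n2)) ] && [ "$s2" -lt $((s1 + n1)) ]
}

# Hypothetical layouts in 512-byte sectors (illustrative numbers only):
#   new partition 1: start 8192,  64 MiB  = 131072 sectors
#   old partition 2: start 57344, 256 MiB = 524288 sectors
if ranges_overlap 8192 131072 57344 524288; then
    echo "new p1 overlaps old p2"
else
    echo "no overlap"
fi
```

With these numbers the grown partition 1 reaches into the area of the old partition 2, which is the situation the comment warns about. In the shrinking direction (64M down to 20M) the new partition 1 stays inside the old partition 1, so no such overlap occurs.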

Curiously, sysupgrading from a 22.03 with 64M /boot down to a version with 20M /boot works without losing the overlay backup. This is also a partition table change and triggers the same alternative code path in platform_do_upgrade(), but apparently works.

It seems to me that platform_copy_config() fails to mount /boot after the /boot partition has grown, which would explain why the backup gets lost. That includes config.txt and cmdline.txt, which platform_copy_config() tries to extract from the backup and apply early so that they are already in place at reboot.

I totally fail to see what could prevent partx from making /boot mountable in this case, and I haven't figured out a way to single-step through this to see what happens.

So any ideas how to approach this are very welcome!

In the meantime, I found a way to analyze this, but no idea how to fix it yet. I got an interactive console shell after the switchover to ramfs (by modifying /lib/upgrade/stage2), so I could manually enter these partx calls and observe their results, and also call ps, lsof etc.:

  • The problem is in line 67 of /lib/upgrade/platform.sh (target/linux/brcm2708/base-files/lib/upgrade/platform.sh in buildroot): the partx -d - /dev/$diskdev call, which expands to

    partx -d - /dev/mmcblk0

    fails because partition 2 is busy ("resource in use").
    As a consequence, the larger new /boot partition cannot be added and cannot be mounted later in platform_copy_config(), so the config backup cannot be saved to /boot and gets lost.

  • I assume this is because the f2fs overlay is still active somehow. ps shows three kernel threads related to f2fs, [f2fs_flush-7:0], [f2fs_discard-7:], [f2fs_gc-7:0].

  • lsof does not show any files open on /dev/mmcblk0p2, but I guess f2fs internals would not show in lsof anyway.
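One possible lead, stated as an assumption rather than a finding: the 7:0 suffix in those thread names looks like a major:minor device number, and major 7 is the Linux loop driver. If that reading is right, the f2fs overlay would sit on /dev/loop0 backed by /dev/mmcblk0p2, and the loop device (not an open file) would be what keeps the partition busy. A minimal sketch to check this from the ramfs shell, using only sysfs; the function names are hypothetical:

```shell
# Extract the major:minor suffix from an f2fs kernel thread name as shown by ps.
f2fs_thread_dev() {
    name=${1#\[}        # strip the leading "["
    name=${name%\]}     # strip the trailing "]"
    echo "${name##*-}"  # keep what follows the last "-"
}

# Print what sysfs knows about a block device given as major:minor.
# The sysfs root is a parameter only so the function can be tried off-target.
describe_blockdev() {
    dev=$1 sys=${2:-/sys}
    node="$sys/dev/block/$dev"
    [ -e "$node" ] || { echo "$dev: not present"; return 1; }
    echo "$dev -> $(readlink "$node")"
    # Loop devices expose their backing file and offset in a "loop" subdirectory.
    [ -r "$node/loop/backing_file" ] && echo "backing file: $(cat "$node/loop/backing_file")"
    [ -r "$node/loop/offset" ] && echo "offset: $(cat "$node/loop/offset")"
    return 0
}

describe_blockdev "$(f2fs_thread_dev '[f2fs_flush-7:0]')" || echo "no such block device here"
```

If the backing file turns out to be /dev/mmcblk0p2, detaching the loop device after the umount (for example with losetup -d /dev/loop0, provided a losetup binary is available in the ramfs) would be the next thing to try before the partx -d call.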

My open questions:

  • Are lines 64/65 in /lib/upgrade/stage2 really sufficient to completely unmount a squashfs/f2fs overlay, such that the /dev/mmcblk0p2 partition is not in use any more?

    /bin/mount -o noatime,remount,ro /overlay
    /bin/umount -l /overlay
    

    I tried executing these lines manually and noticed that the first line (remounting the overlay read-only) apparently had no effect: /overlay was still rw afterwards.
    Judging from the -l (lazy) umount option in the second line, I assume the author expected the volume to still be in use by something. But by what? And at what later step would it become free?

  • anything else that could keep the /dev/mmcblk0p2 partition busy? At the time the switch_to_ramfs is run, all other processes have been killed, so no userspace process should have any files open that could make the partition busy.

  • any way to force the partition to get released? I guess that would be acceptable at this point in the process, because the only thing left to do is copy the config backup .tgz from /tmp to /boot (and extract config.txt early so it is ready at reboot), then reboot.
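To verify the remount question without relying on the output of mount, the state can be read straight from /proc/mounts. This is a small sketch; mount_opts and is_rw are hypothetical helper names, and the optional second argument only exists so the functions can be tried against a saved copy of the mounts table.

```shell
# Print the mount options of a mount point, read from a mounts table
# (default: /proc/mounts). If the mount point is listed more than once,
# the last (topmost) entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Return 0 if the mount point is currently mounted read-write.
is_rw() {
    case ",$(mount_opts "$1" "$2")," in
        *,rw,*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_rw /overlay; then
    echo "/overlay is still read-write"
else
    echo "/overlay is read-only or not mounted"
fi
```

Running the check once after the remount and once after the lazy umount would show whether either step actually changes anything. Note that after umount -l the entry disappears from /proc/mounts even if the filesystem is still alive in the kernel, so "not mounted" here does not prove that the partition has been released.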

Any ideas and hints are welcome!

Also, a confirmed report of a successful upgrade of an RPi with squashfs+f2fs and an expanding /boot partition would be a helpful indication that something in my setup must be special, apart from not using ext4 for rootfs.

Really nobody else having this problem?

Maybe because OpenWrt on RPi is a niche, and using the squashfs variant seems to be a niche in that niche, and upgrading with /boot size increasing even more, so a niche^3 problem? :wink:

Still, I'd be very interested to learn what can keep the f2fs partition on /dev/mmcblk0p2 busy when it is unmounted and all user processes except sysupgrade are already gone. For details, see the original post and the follow-up analysis.

Still no solution, but just today, analyzing another bcm27xx sysupgrade problem, I noticed that the regular upgrade function, default_do_upgrade() in package/base-files/files/lib/upgrade/common.sh, does

  sync
  echo 3 > /proc/sys/vm/drop_caches

The bcm27xx specific implementation, platform_do_upgrade() in target/linux/bcm27xx/base-files/lib/upgrade/platform.sh, does not have that.

The kernel docs say that writing 3 drops the page cache as well as cached dentries and inodes. Could it be that not dropping these caches is what keeps the f2fs partition busy?

Would it make sense to add echo 3 > /proc/sys/vm/drop_caches to bcm27xx platform_do_upgrade()?
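As a sketch of what that addition could look like: the two lines are the ones quoted from default_do_upgrade() above, wrapped in a hypothetical helper named flush_caches. Whether this actually releases the partition is untested here.

```shell
# Write dirty data to storage, then ask the kernel to drop the page cache
# plus cached dentries and inodes. The target path is a parameter only so
# the helper can be exercised without root.
flush_caches() {
    target=${1:-/proc/sys/vm/drop_caches}
    sync
    echo 3 > "$target"
}
```

In platform_do_upgrade() the call would presumably go right before the partx -d line, so that the caches are dropped before the kernel is asked to forget the old partitions. It is also easy to try by hand from the ramfs shell first, running flush_caches and then repeating the failing partx -d call.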

Only general ideas:
Are you able to see the f2fs sysfs entries?
/sys/kernel/debug/f2fs/
/sys/fs/f2fs
Either through that sysfs, or the f2fs_io utility, you should be able to make f2fs GC more urgent

There is also a trace config if you wanted to get really deep: CONFIG_F2FS_IO_TRACE

overlay has plenty of debug prints, so you may be able to build with DYNAMIC_DEBUG, then enable debug for it via kernel boot params, or later via /proc/dynamic_debug/control
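A rough sketch of how these hints could be tried from a shell. It assumes that the kernel exposes a gc_urgent attribute under /sys/fs/f2fs/<device>/ and that dynamic debug is compiled in; both depend on kernel version and config. The function names are made up, and the optional path arguments exist only for trying the helpers off-target.

```shell
# List f2fs instances known to the kernel, one per line (e.g. "loop0").
# Only directories that carry a gc_urgent attribute are reported.
f2fs_instances() {
    root=${1:-/sys/fs/f2fs}
    for d in "$root"/*; do
        [ -e "$d/gc_urgent" ] || continue
        basename "$d"
    done
}

# Ask f2fs to run garbage collection urgently on one instance.
f2fs_set_gc_urgent() {
    dev=$1 root=${2:-/sys/fs/f2fs}
    [ -w "$root/$dev/gc_urgent" ] || { echo "no gc_urgent attribute for $dev" >&2; return 1; }
    echo 1 > "$root/$dev/gc_urgent"
}

# Turn on the debug prints of the overlay module via dynamic debug.
enable_overlay_debug() {
    echo 'module overlay +p' > "${1:-/proc/dynamic_debug/control}"
}

f2fs_instances
```

If f2fs_instances still prints an entry after the lazy umount of /overlay, that alone would show that the filesystem is still alive in the kernel at that point.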

Hi @johnth, thanks for these hints!

I will try that when I manage to get the test setup next time (a bit tedious because it is a point in the midst of the sysupgrade, after the pivot to ram-only operation, where a lot of tooling is missing unless I manually copy the needed executables and libs to ram).

A pending f2fs GC sounds plausible as a reason for the underlying partition still being used, but wouldn't unmounting the f2fs also trigger an urgent GC anyway?