I am in desperate need of some education. I assume the issue and its solution are documented somewhere, but I haven't found an answer I like. I guess what I am looking for is the "Universally Undisputed Correct Solution".
The basic problem is installation of OpenWrt on a device with NAND flash where the OEM firmware provides no writable rootfs, and the bootloader has no UBI support. So we want to split the OEM firmware partition into a kernel mtd the bootloader can read and a UBI mtd for the rootfs and other data the bootloader doesn't need to see.
This is simple in the device tree: We just create an additional UBI partition starting where we want to put it.
The question is: How do we bootstrap the split properly, when installing from either OEM firmware or bootloader? They obviously only know about the OEM partition layout.
Being a simple mind, I've always thought that this is as simple as: Create a "factory" image which is the concatenation of the kernel and UBI partitions, where the kernel is padded to the split point. This image can then be flashed like the OEM firmware from e.g. the bootloader, and when booted it will know about the proper split from the device tree and mount the rootfs and rootfs_data found in the UBI image.
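To make the layout concrete, here is a minimal Python sketch of the "factory" image I have in mind. The function name `build_factory_image` and the 0xFF padding byte are assumptions for illustration only; this is not how the OpenWrt build system actually assembles images.

```python
def build_factory_image(kernel: bytes, ubi: bytes, split_offset: int,
                        pad_byte: int = 0xFF) -> bytes:
    """Concatenate kernel and UBI image, padding the kernel to split_offset.

    split_offset is the offset (relative to the start of the OEM firmware
    partition) where the device tree says the UBI partition begins.
    """
    if len(kernel) > split_offset:
        raise ValueError("kernel is larger than the kernel partition")
    # Pad with 0xFF (erased-flash value; an assumption) up to the split point.
    padding = bytes([pad_byte]) * (split_offset - len(kernel))
    return kernel + padding + ubi


if __name__ == "__main__":
    # Toy sizes only: 10-byte "kernel", split point at offset 16.
    image = build_factory_image(b"K" * 10, b"UBI#" + b"\x00" * 12, 16)
    print(image[16:20])  # the UBI magic sits exactly at the split offset
```

On flash without bad blocks, the offset in the image equals the offset in the partition, so the UBI data lands exactly where the device tree expects it.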
Now I've been "lucky" enough to get hold of a device with a bad block. As expressed by the bootloader:
Check image validation:
Image1 Header Magic Number --> OK
Image2 Header Magic Number --> OK
Image1 Header Checksum --> OK
Image2 Header Checksum --> OK
Image1 Data Checksum --> ................................................ranand_read: skip reading a fact bad block 440000 -> 460000
....................................................OK
Image2 Data Checksum --> ....................................................................................................OK
Image1 Stable Flag --> Not stable
Image1 Try Counter --> 0
Image1: OK Image2: OK
The bad block is in the part of the OEM firmware partition which I must use for the kernel. So I realize that the "factory image" I described is a recipe for disaster. The bad block is properly skipped both when reading and writing. Which means that the kernel partition is one block smaller than my padding calculation assumed, and the UBI image with the rootfs ends up at the wrong position. It's no longer where the device tree says it should be, but offset by one block. Resulting in
[ 3.411400] UBI error: no valid UBI magic found inside mtd4
since that magic now is actually in the second block of mtd4 instead of the first.
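The shift can be reproduced with a small Python simulation. This is only a sketch under my own assumptions: the helper names are hypothetical, the block size is a toy value, and the writer simply skips bad blocks the way the bootloader appears to. The only real-world constant is the UBI erase counter header magic, `UBI#`.

```python
UBI_MAGIC = b"UBI#"  # magic at the start of every UBI erase counter header


def write_skipping_bad_blocks(image, block_size, bad_blocks, total_blocks):
    """Write image block by block, skipping bad physical blocks.

    Returns the flash as a list of physical blocks (None = bad or unwritten).
    """
    flash = [None] * total_blocks
    phys = 0
    for off in range(0, len(image), block_size):
        while phys < total_blocks and phys in bad_blocks:
            phys += 1  # skip the bad block; data moves to the next good one
        if phys >= total_blocks:
            raise ValueError("image does not fit in the remaining good blocks")
        flash[phys] = image[off:off + block_size]
        phys += 1
    return flash


def first_ubi_block(flash):
    """Index of the first physical block that starts with the UBI magic."""
    for index, block in enumerate(flash):
        if block is not None and block.startswith(UBI_MAGIC):
            return index
    return None


if __name__ == "__main__":
    # Toy layout: 4-byte blocks, kernel padded to 4 blocks, then 2 UBI blocks.
    image = b"K" * 12 + b"\xff" * 4 + UBI_MAGIC * 2
    print(first_ubi_block(write_skipping_bad_blocks(image, 4, set(), 8)))
    print(first_ubi_block(write_skipping_bad_blocks(image, 4, {2}, 8)))
```

With no bad blocks the magic lands in block 4, which is where the toy "device tree" split would put the UBI partition. With block 2 marked bad, everything after it slides by one block and the magic ends up in block 5, i.e. the second block of the UBI partition.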
So I know the problem. But I still don't have a solution I like...
I could require a two step installation process, where the user has to boot an initramfs with the proper device tree first and then sysupgrade to the real installation from there. But I would prefer avoiding the additional step if possible.
Or I could replace the fixed partition split with some dynamic splitting method, allowing a combined kernel+rootfs image without any specific assumption on rootfs placement. But this is not possible with UBI AFAIK. Keeping PEB erase counters and doing proper wear levelling means that we can't move, add or remove underlying flash blocks. And I believe we do want UBI under any writable file system on NAND flash? So this method is pretty much ruled out.
Any comments are appreciated, whether you have a solution or not.