I've been chasing down an intermittent failure to restore settings on sysupgrade on a device under development (GL-AR300M NOR flash, NAND-enabled ath79 kernel) and thought I had found it a few times. No, it wasn't #!/bin.sh and it wasn't even that there was some mysterious package that was making it work after the reboot, even though adding packages seemed to resolve the problem.
After determining that the "magic package" was one that just has my build info in it, no executable content, I looked to size alone. Sure enough, adding "incompressible" data to the ROM was enough to prevent/enable the failure.
It seems as if there is a point at which the upgrade image is too small to successfully preserve/restore settings.
Does this ring a bell for anyone?
Edit: Current hypothesis is that while the data for /sysupgrade.tgz is being written in both cases, it seems to be in the "wrong" place for the smaller rootfs image, "stuck" at 0x400000, rather than 0x3f0000.
mtdsplit is done using mtd_get_squashfs_len(), which reads the super-block and is consistent with binwalk.
mtd -j /tmp/sysupgrade.tgz write - looks for deadc0de to determine where to write the JFFS2 data. It finds it at 0x400000 in both cases.
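As a rough way to check this from the host side, the following Python sketch lists the erase-block-aligned offsets at which the deadc0de marker appears in an image. It is only an illustration and not how mtd itself is implemented: the name find_marker_offsets, the 0x10000 erase size default, the assumption that the marker is stored as the bytes de ad c0 de, and the assumption that it sits at the start of an erase block are all mine.

```python
from typing import List

# Assumed on-flash byte order of the end-of-filesystem marker.
JFFS2_EOF_MARKER = bytes.fromhex("deadc0de")


def find_marker_offsets(image: bytes, erase_size: int = 0x10000) -> List[int]:
    """Return the erase-block-aligned offsets at which the marker starts."""
    return [
        off
        for off in range(0, len(image), erase_size)
        if image[off:off + len(JFFS2_EOF_MARKER)] == JFFS2_EOF_MARKER
    ]


# Synthetic demo: three erased (0xff) blocks with the marker in the last one.
demo = bytearray(b"\xff" * 0x30000)
demo[0x20000:0x20004] = JFFS2_EOF_MARKER
print(["0x{:x}".format(off) for off in find_marker_offsets(bytes(demo))])
# prints ['0x20000']
```

To run it against a real image, read the file in binary mode and pass the bytes to find_marker_offsets().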
Curiously, the sysupgrade.bin is the same length in both cases, even though the squashfs portions are of different sizes, crossing an erase block boundary.
On a "successful" upgrade, one can see the JFFS2 file system being initialized early, and the config from /sysupgrade.tgz being restored:
1966080 0x1E0000 Squashfs filesystem, little endian, version 4.0, compression:xz,
size: 2198678 bytes, 852 inodes, blocksize: 262144 bytes, created: 2019-06-29 22:09:24
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
[ 7.393991] eth0: link up (1000Mbps/Full duplex)
[ 7.398814] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 9.588446] jffs2_scan_eraseblock(): End of filesystem marker found at 0x10000
[ 9.598227] jffs2_build_filesystem(): unlocking the mtd device...
[ 9.598263] done.
[ 9.606662] jffs2_build_filesystem(): erasing all blocks after the end marker...
[ 52.629010] done.
[ 52.638785] jffs2: notice: (437) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
[ 52.656267] mount_root: overlay filesystem has not been fully initialized yet
[ 52.671818] mount_root: switching to jffs2 overlay
[ 52.700774] overlayfs: upper fs does not support tmpfile.
- config restore -
[ 52.994026] urandom-seed: Seed file not found (/etc/urandom.seed)
[ 53.146404] eth0: link down
[ 53.161137] procd: - early -
[ 53.165061] procd: - watchdog -
[ 53.774244] procd: - watchdog -
[ 53.777770] procd: - ubus -
[ 53.912826] urandom_read: 5 callbacks suppressed
[ 53.912835] random: ubusd: uninitialized urandom read (4 bytes read)
[ 53.924779] random: ubusd: uninitialized urandom read (4 bytes read)
[ 53.932833] procd: - init -
Please press Enter to activate this console.
However, if the image is too small (same kernel config and executable content), the JFFS2 initialization happens much later, and /sysupgrade.tgz isn't found:
1966080 0x1E0000 Squashfs filesystem, little endian, version 4.0, compression:xz,
size: 2152274 bytes, 842 inodes, blocksize: 262144 bytes, created: 2019-06-29 22:09:24
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
[ 7.393980] eth0: link up (1000Mbps/Full duplex)
[ 7.398796] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 9.580257] mount_root: no usable overlay filesystem found, using tmpfs overlay
[ 9.612919] urandom-seed: Seed file not found (/etc/urandom.seed)
[ 9.711420] eth0: link down
[ 9.727194] procd: - early -
[ 9.730286] procd: - watchdog -
[ 10.360441] procd: - watchdog -
[ 10.364026] procd: - ubus -
[ 10.570984] urandom_read: 5 callbacks suppressed
[ 10.570992] random: ubusd: uninitialized urandom read (4 bytes read)
[ 10.583195] random: ubusd: uninitialized urandom read (4 bytes read)
[ 10.591176] procd: - init -
Please press Enter to activate this console.
[ 11.267951] kmodloader: loading kernel modules from /etc/modules.d/*
[ 11.315830] Loading modules backported from Linux version v4.19.32-0-g3a2156c839c7
[ 11.323711] Backport generated by backports.git v4.19.32-1-0-g1c4f7569
[ 11.395011] xt_time: kernel timezone is -0000
[ 11.483242] urngd: v1.0.0 started.
[ 11.523815] PPP generic driver version 2.4.2
[ 11.543766] NET: Registered protocol family 24
[ 11.674518] ieee80211 phy0: Atheros AR9531 Rev:2 mem=0xb8100000, irq=13
[ 11.773150] kmodloader: done loading kernel modules from /etc/modules.d/*
[ 11.942003] random: crng init done
[ 28.809813] jffs2_scan_eraseblock(): End of filesystem marker found at 0x20000
[ 28.822994] jffs2_build_filesystem(): unlocking the mtd device...
[ 28.823057] done.
[ 28.831416] jffs2_build_filesystem(): erasing all blocks after the end marker...
[ 31.938885] eth0: link up (1000Mbps/Full duplex)
[ 31.954534] br-lan: port 1(eth0) entered blocking state
[ 31.959941] br-lan: port 1(eth0) entered disabled state
[ 31.965728] device eth0 entered promiscuous mode
[ 32.016430] br-lan: port 1(eth0) entered blocking state
[ 32.021837] br-lan: port 1(eth0) entered forwarding state
[ 32.027694] IPv6: ADDRCONF(NETDEV_UP): br-lan: link is not ready
[ 32.148202] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 32.992832] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[ 73.697396] done.
[ 73.699438] jffs2: notice: (1041) jffs2_build_xattr_subsystem: complete building xattr subsystem, 0 of xdatum (0 unchecked, 0 orphan) and 0 of xref (0 dead, 0 orphan) found.
[ 73.872010] overlayfs: upper fs does not support tmpfile.
I've since narrowed it down to between 2,157,546 and 2,162,766 bytes, by adding an already-compressed file to the root fs:
>>> print("0x{:x}".format(2157546))
0x20ebea
>>> print("0x{:x}".format(2162766))
0x21004e
>>> print("0x{:x}".format(2157546+0x1E0000))
0x3eebea
>>> print("0x{:x}".format(2162766+0x1E0000))
0x3f004e
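Taking that arithmetic one step further, each end-of-squashfs offset can be rounded up to the next erase block, which is where I would expect rootfs_data to begin. A small sketch, assuming the 0x10000 erase size of the NOR partitions and assuming the split point is simply the squashfs end rounded up (round_up is just a helper name for the example):

```python
ERASE_SIZE = 0x10000        # NOR erase block size (assumed)
SQUASHFS_START = 0x1E0000   # offset of the squashfs in the image, per binwalk


def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


for size in (2157546, 2162766):
    end = SQUASHFS_START + size
    print("0x{:x} -> 0x{:x}".format(end, round_up(end, ERASE_SIZE)))
# prints:
# 0x3eebea -> 0x3f0000
# 0x3f004e -> 0x400000
```

The two sizes land on either side of the 0x3f0000 erase block boundary, which matches the 0x3f0000 versus 0x400000 difference in the hypothesis.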
From failsafe on reboot, working:
root@(none):/# cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00010000 "u-boot"
mtd1: 00010000 00010000 "u-boot-env"
mtd2: 00fa0000 00010000 "firmware"
mtd3: 001e0000 00010000 "kernel"
mtd4: 00dc0000 00010000 "rootfs"
mtd5: 00ba0000 00010000 "rootfs_data"
mtd6: 00010000 00010000 "art"
mtd7: 00200000 00020000 "nand_kernel"
mtd8: 07e00000 00020000 "nand_ubi"
root@(none):~# hexdump -C -n 1024 /dev/mtd5
00000000 19 85 20 03 00 00 00 0c f0 60 dc 98 19 85 e0 01 |.. ......`......|
00000010 00 00 00 36 5d 44 48 fe 00 00 00 01 00 00 00 00 |...6]DH.........|
00000020 00 00 00 02 00 00 00 00 0e 08 00 00 97 8f 0a 5b |...............[|
00000030 31 ff 3d bc 73 79 73 75 70 67 72 61 64 65 2e 74 |1.=.sysupgrade.t|
00000040 67 7a ff ff 19 85 e0 02 00 00 10 44 ee 2d 30 6f |gz.........D.-0o|
and failing:
root@(none):/# cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00010000 "u-boot"
mtd1: 00010000 00010000 "u-boot-env"
mtd2: 00fa0000 00010000 "firmware"
mtd3: 001e0000 00010000 "kernel"
mtd4: 00dc0000 00010000 "rootfs"
mtd5: 00bb0000 00010000 "rootfs_data"
mtd6: 00010000 00010000 "art"
mtd7: 00200000 00020000 "nand_kernel"
mtd8: 07e00000 00020000 "nand_ubi"
root@(none):/# hexdump -C -n 1024 /dev/mtd5
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00000400
Though I'm not yet convinced either way whether it is a write problem or an MTD-split / mount_root problem.
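For what it's worth, the two /proc/mtd listings can be turned into offsets within the firmware partition with a few lines of Python. This is only a sketch: it assumes rootfs_data runs to the end of firmware (which the kernel and rootfs sizes in the listings are consistent with), and parse_sizes and rootfs_data_offset are names made up for the example.

```python
import re


def parse_sizes(proc_mtd):
    """Map partition name -> size in bytes from /proc/mtd text."""
    sizes = {}
    for line in proc_mtd.splitlines():
        m = re.match(r'mtd\d+:\s+([0-9a-f]+)\s+[0-9a-f]+\s+"([^"]+)"', line.strip())
        if m:
            sizes[m.group(2)] = int(m.group(1), 16)
    return sizes


def rootfs_data_offset(proc_mtd):
    """Offset of rootfs_data inside firmware, assuming it ends where firmware ends."""
    sizes = parse_sizes(proc_mtd)
    return sizes["firmware"] - sizes["rootfs_data"]


# Only the lines that matter, copied from the two listings.
WORKING = '''
mtd2: 00fa0000 00010000 "firmware"
mtd5: 00ba0000 00010000 "rootfs_data"
'''
FAILING = '''
mtd2: 00fa0000 00010000 "firmware"
mtd5: 00bb0000 00010000 "rootfs_data"
'''

print("working: 0x{:x}".format(rootfs_data_offset(WORKING)))  # working: 0x400000
print("failing: 0x{:x}".format(rootfs_data_offset(FAILING)))  # failing: 0x3f0000
```

Under that assumption, rootfs_data starts at 0x400000 in the working case, which is where deadc0de was found, and at 0x3f0000 in the failing case, one erase block earlier.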