Thanks, yes, that was what I was hoping/expecting to see.
The boot loader writes a new "record" on each boot consisting of
- 11 08 11 20 -- "magic number" (marker)
- 01 00 00 00 -- boot count
- 12 08 11 20 -- checksum
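To make the layout concrete, here is a minimal sketch that prints the bytes of a record for a given boot count. It assumes the three fields are little-endian 32-bit values and that the checksum is simply magic + count, which is what the example bytes above suggest (0x20110811 + 1 = 0x20110812); the helper names le32_hex and record_hex are made up for illustration.

```sh
#!/bin/sh
# Sketch only: field widths, byte order and the checksum rule
# (checksum = magic + count) are inferred from the example record above.
MAGIC=$(( 0x20110811 ))

# Print a 32-bit value as four little-endian hex bytes.
le32_hex() {
    v=$1
    printf '%02x %02x %02x %02x' \
        $(( v & 0xff )) $(( (v >> 8) & 0xff )) \
        $(( (v >> 16) & 0xff )) $(( (v >> 24) & 0xff ))
}

# Print the 12 record bytes (magic, count, checksum) for boot count $1.
record_hex() {
    count=$1
    printf '%s %s %s\n' \
        "$(le32_hex "$MAGIC")" \
        "$(le32_hex "$count")" \
        "$(le32_hex $(( MAGIC + count )))"
}

record_hex 1   # prints: 11 08 11 20 01 00 00 00 12 08 11 20
```

Under the same assumptions, record_hex 0 gives the bytes of a "0" record: 11 08 11 20 00 00 00 00 11 08 11 20.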
From what I can tell on my EA8300, if the boot count is 3, the boot loader rewrites the environment and sets the next boot count to 1. I haven't confirmed it, but this should be what "flips" the partition in a boot-loop situation.
On "successful" boot, the OS is responsible for writing a "0" record so that the boot loader knows the boot went well.
The record is not overwritten, but appended. This reduces flash wear and is perhaps also the reason that boot_count from the environment is not used. (I've never seen anything but 0 in there and 3 in boot_part_ready.) When the partition is "full", it gets erased and starts over -- only the last-written record has significant value.
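As a rough illustration of the append-only scheme, the sketch below walks a dump of the partition slot by slot and reports the count byte of the last record found. It is not the boot loader's or OpenWrt's code. It assumes fixed-size slots, that a valid slot starts with the magic bytes 11 08 11 20, and that anything else (such as an erased slot reading ff) marks the end of the log. The function name last_count is made up.

```sh
#!/bin/sh
# Sketch only: scan an append-only boot-count log and print the count
# byte of the last record. Slot size is passed in by the caller.
last_count() {
    file=$1
    recsize=$2
    size=$(( $(wc -c < "$file") ))
    off=0
    last=""
    while [ "$off" -lt "$size" ]; do
        # First 5 bytes of the slot: 4 magic bytes plus the low count byte.
        bytes=$(dd if="$file" bs=1 skip="$off" count=5 2>/dev/null \
                | od -An -tx1 | tr -d ' \n')
        case "$bytes" in
            11081120*) last=${bytes#11081120} ;;  # valid record: remember its count
            *) break ;;                           # erased/unknown slot: log ends here
        esac
        off=$(( off + recsize ))
    done
    echo "${last:-none}"
}
```

On a device with 16-byte records this could be tried as last_count /dev/mtd8ro 16, where mtd8ro is the partition on my EA8300 and will differ on other devices.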
The "challenge" is that NAND flash typically has a minimum write size (2048 bytes, for example). So on a NAND-based Linksys device, these boot-count records are 2048 bytes long. When the Linksys devices were ported to the IPQ40xx platform, someone noticed that it was writing 16-byte records, not the 1-byte record that the MTD parameters show, and hard-coded 16 in for the platform in general. As a result, the "pure NAND" EA8300 had some pretty strange boot behavior because the code to reset the boot count was silently failing.
I've since rewritten package/system/mtd/src/linksys_bootcount.c to
- Actually log success and failure
- Auto-detect if the record size should be 16 or the MTD parameter (E6350v3 being the only exception I have found)
- Return a meaningful error value (and update init scripts to ignore it, so as not to stop boot)
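To illustrate the record-size question, here is one possible heuristic. It is a guess for discussion, not the logic in the rewritten linksys_bootcount.c: use 16 when the MTD write size is smaller than that (the 1-byte case), otherwise use the write size itself (the 2048-byte NAND case). It does not model the E6350v3 exception. The function name pick_record_size is made up; on Linux the write size can be read from /sys/class/mtd/mtdN/writesize.

```sh
#!/bin/sh
# Hypothetical heuristic, NOT the actual patch: choose the boot-count
# record size from the MTD write size passed as $1.
pick_record_size() {
    writesize=$1
    if [ "$writesize" -lt 16 ]; then
        echo 16             # byte-writable flash: fall back to 16-byte records
    else
        echo "$writesize"   # NAND: one record per minimum write unit
    fi
}

pick_record_size 1      # prints: 16
pick_record_size 2048   # prints: 2048
```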
I'll make sure it's clean and put in a PR/patch as it impacts more than just the EA8300. If you have a chance to check it, that would be great!
If you'd like to amuse yourself, here's what I run from /etc/profile to "see" this in action. Yes, my s_env partition "rolls" once a week or so. Yes, it's an ugly script, but cut-and-paste for what was intended to be a diagnostic tool was the fastest as it evolved.
root@OpenWrt:~# boot-info.sh
rootfs: mtd13
boot_part=2
boot_count=0
89 bootcount entries. Last ten: 02 00 01 00 01 00 01 00 01 00
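#!/bin/sh
# Show the active rootfs MTD, the boot variables from the environment,
# and the most recent boot-count records.
printf "rootfs: mtd%i\n" "$( cat /sys/devices/virtual/ubi/ubi0/mtd_num )"
fw_printenv boot_part
fw_printenv boot_count
# Pull the boot-count byte out of every hexdump line that starts with the
# magic number. " +" tolerates the two spaces hexdump -C prints after the offset.
counts=$( hexdump -C /dev/mtd8ro | sed -nEe 's/^[0-9a-f]+00 +11 08 11 20 ([0-9a-f][0-9a-f]).*$/\1/p' )
printf "%i bootcount entries. Last ten: " "$( printf '%s\n' "$counts" | grep -c . )"
printf '%s\n' "$counts" | tail -n 10 | tr '\n' ' '
printf "\n"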