Kernel detects squashfs on MTD but fails to mount root (VFS panic)

I must admit that it's a mystery to me how that ubi-rootfs detection works (I never bothered to inspect it), but it worked on the few boards I tried running OpenWrt on.

Regarding the NAND boot issue, that's a U-Boot error message. Define the KERNEL variable to generate a uImage (check Device/wf-2881) and try flashing it to the kernel partition, either from an initramfs image or from the U-Boot console.

I referred to the GPL code that Linksys has published. The EA8100 seems to have two sets of OS images.


0x0         0x80000   0xC0000    0x100000   0x140000   0x180000             0x2980000            0x5180000      0x5280000            0x8000000
 +-------------+----------+----------+----------+----------+--------------------+--------------------+--------------+---------------------+
 |    u-boot   |u-boot-env|  Factory |   s_env  |  devinfo |        kernel      |     alt_kernel     |   sysdiag    |        syscfg       |
 |   (0x80000) | (0x40000)| (0x40000)| (0x40000)| (0x40000)|      (0x2800000)   |     (0x2800000)    |  (0x100000)  |      (0x2D80000)    |
 |             |          |          |          |          |    +---------------+    +---------------+              |                     |
 |             |          |          |          |          |    |    rootfs     |    |  alt_rootfs   |              |                     |
 |             |          |          |          |          |    |  (0x2400000)  |    |  (0x2400000)  |              |                     |
 +-------------+----------+----------+----------+----------+----+---------------+----+---------------+--------------+---------------------+
                                                              0x580000             0x2D80000
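
As a quick sanity check of the diagram, the boundaries can be recomputed from the sizes alone. This is only a sketch: the names and sizes are copied from the diagram, and the assumption that rootfs / alt_rootfs occupy the tail of each kernel slot is read off the drawing, not taken from the Linksys sources.

```python
# Partition sizes in flash order, copied from the layout diagram above.
PARTS = [
    ("u-boot", 0x80000),
    ("u-boot-env", 0x40000),
    ("Factory", 0x40000),
    ("s_env", 0x40000),
    ("devinfo", 0x40000),
    ("kernel", 0x2800000),
    ("alt_kernel", 0x2800000),
    ("sysdiag", 0x100000),
    ("syscfg", 0x2D80000),
]
# Assumption read off the diagram: each rootfs sits at the end of its kernel slot.
ROOTFS_SIZE = 0x2400000


def layout(parts):
    """Return {name: (start, end)} by accumulating sizes from offset 0."""
    table, offset = {}, 0
    for name, size in parts:
        table[name] = (offset, offset + size)
        offset += size
    return table


table = layout(PARTS)
for name, (start, end) in table.items():
    print(f"{name:<11} 0x{start:08x}-0x{end:08x}")

rootfs_start = table["kernel"][1] - ROOTFS_SIZE
alt_rootfs_start = table["alt_kernel"][1] - ROOTFS_SIZE
print(f"rootfs      starts at 0x{rootfs_start:x}")
print(f"alt_rootfs  starts at 0x{alt_rootfs_start:x}")
```

The sizes add up to 0x8000000 (128 MiB), and the two rootfs offsets come out as 0x580000 and 0x2D80000, matching the bottom line of the diagram.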

It seems that the kernel cannot be loaded from the second partition (alt_kernel).

@danijeltudek @musashino Thanks for the help people!

It went back to booting at bc180000 after I set boot_part_ready=3 and boot_part=1 in the U-Boot env. Maybe these control which "partition" to boot.

Previously both were set to 2 (I had messed with them before). My guess now is that boot_part_ready is a bitmask indicating which partitions are "ready", and boot_part indicates which one to boot from. Setting both to 2 probably told the bootloader that only the 2nd kernel was ready and that it should boot from it. I haven't done anything to confirm this.
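
To make that guess concrete, here is a small sketch of how the two variables would be read under the bitmask hypothesis. The bit assignment (bit 0 = partition 1, bit 1 = partition 2) and the function name are assumptions for illustration only; nothing here comes from the U-Boot source.

```python
def describe_boot_env(boot_part_ready: int, boot_part: int) -> dict:
    """Interpret the env values under the unconfirmed bitmask hypothesis."""
    # Hypothetical mapping: bit 0 -> partition 1 ready, bit 1 -> partition 2 ready.
    ready = [n for n in (1, 2) if boot_part_ready & (1 << (n - 1))]
    return {
        "ready": ready,
        "boot_from": boot_part,
        "selected_is_ready": boot_part in ready,
    }


print(describe_boot_env(3, 1))  # the env that works
print(describe_boot_env(2, 2))  # the earlier setting
```

Under this reading, 3/1 means "both ready, boot the first", and 2/2 means "only the second is ready, boot the second".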

uboot env that works:

MT7621 # printenv
bootargs=console=ttyS1,115200n8 root=/dev/mtdblock6 ro rootfstype=jffs2 init=/sbin/init
bootcmd=tftp
auto_recovery=yes
bootdelay=5
baudrate=115200
ethaddr="00:AA:BB:CC:DD:10"
Image1Stable=1
filesize=3181a0
fileaddr=80A00000
ipaddr=192.168.x.x
serverip=192.168.x.x
autostart=no
bootfile=openwrt-ramips-mt7621-ea8100-initramfs-kernel.bin
stdin=serial
stdout=serial
stderr=serial
Image1Try=0
boot_part_ready=3
boot_part=1

55424923 is actually the hex for the "UBI#" string, which is the magic at the start of a UBI erase block header. I dumped the entire mtd (incl. the 2nd part) but only found these bytes within the first "firmware" mtdblock (first kernel plus first rootfs). So the strange thing now is that, in theory, the bootloader shouldn't complain about 55424923 if it's booting from be980000 (2nd partition).
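
The hex-to-ASCII mapping is easy to check. The second value is the related "UBI!" magic used by the UBI volume ID header, which differs only in the last byte:

```python
# Magic reported by the bootloader, decoded as ASCII.
ec_magic = bytes.fromhex("55424923")
print(ec_magic)

# The UBI volume ID (VID) header magic is "UBI!".
vid_magic_hex = hex(int.from_bytes(b"UBI!", "big"))
print(vid_magic_hex)
```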

Unless! Does boot_part=2 mean "boot from the firmware stored at 0x580000", which is the ubi? There is still a missing puzzle piece before this makes sense, though.

[    2.333965] Creating 8 MTD partitions on "MT7621-NAND":
[    2.339173] 0x000000000000-0x000000080000 : "uboot"
[    2.345142] 0x000000080000-0x0000000c0000 : "uboot_env"
[    2.351397] 0x0000000c0000-0x000000100000 : "factory"
[    2.357388] 0x000000100000-0x000000140000 : "s_env"
[    2.363268] 0x000000140000-0x000000180000 : "devinfo"
[    2.369297] 0x000000180000-0x000002980000 : "kernel"
[    2.375581] 0x000000580000-0x000002980000 : "ubi"
[    2.381554] 0x000002980000-0x000005180000 : "alt_kernel"

OK, anyway, now we have a working image. The next thing is to look at how to flash it without opening up the cover, so that I can give the image to @kelvl for further testing. The factory UI runs some Linksys signature check when I upload the .bin.

Maybe holding reset or another button during boot instructs U-Boot to start TFTP download?

@danijeltudek This didn't work. I requested the Linksys source code last year and they posted it online, but I still haven't done a deep dive into it. A quick grep for 180000 or 2980000 didn't turn up anything useful outside of the Linux kernel patches. Please let us know if you find anything in there.

Anyway... I managed to create a signature that the Linksys UI accepted, and it flashed the image. However, it seems to have gone into alt_kernel, so the ubi address is now not valid.

It's unlikely that Linksys hardcoded alt_kernel as the target. More likely it flashes the partition it did not boot from. We need some way to determine the ubi address based on where we booted from. Will think about this tomorrow and over the weekend.

There seem to be some hints in "/usr/sbin/update_defs" and "/etc/init.d/service_autofwup.sh" in the stock firmware.

I guess that boot_part is determined from some kind of a header present in the image format that Web UI accepts. You should take a look there.

I reverted both partitions to the factory image from Linksys and tried booting from each.

Regardless of which partition I booted up from, it appears the Linksys UI always writes to the 2nd partition.

So I decided to build with EA8100.dts pointing kernel and ubi to the 2nd partition addresses. I flashed the -squashfs-sysupgrade.bin via the Linksys UI, and dmesg shows the new addresses as expected:

[    2.338784] 0x000000000000-0x000000080000 : "uboot"
[    2.344774] 0x000000080000-0x0000000c0000 : "uboot_env"
[    2.350950] 0x0000000c0000-0x000000100000 : "factory"
[    2.357014] 0x000000100000-0x000000140000 : "s_env"
[    2.362924] 0x000000140000-0x000000180000 : "devinfo"
[    2.368923] 0x000002980000-0x000005180000 : "kernel"
[    2.375207] 0x000002d80000-0x000005180000 : "ubi"

However:

[    3.924342] UBI error: no valid UBI magic found inside mtd6

I TFTP-booted the -initramfs-kernel.bin and found that the UBI header is not where the kernel expects it. It sits at 0x20000 instead of at the start of mtd6:

root@OpenWrt:/# head /dev/mtd6 | hexdump  -C | head
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00020000  55 42 49 23 01 00 00 00  00 00 00 00 00 00 00 00  |UBI#............|
00020010  00 00 08 00 00 00 10 00  7e 62 6e 91 00 00 00 00  |........~bn.....|
00020020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00020030  00 00 00 00 00 00 00 00  00 00 00 00 ff d4 03 e1  |................|
00020040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00020800  55 42 49 21 01 01 00 05  7f ff ef ff 00 00 00 00  |UBI!............|
00020810  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

I reflashed with the mtd command:

root@OpenWrt:/tmp# mtd erase kernel
Unlocking kernel ...
Erasing kernel ...

Skipping bad block at 0x220000   
Skipping bad block at 0x540000   
Skipping bad block at 0x820000   
Skipping bad block at 0xf80000   
Skipping bad block at 0x1540000   
Skipping bad block at 0x1a00000   
root@OpenWrt:/tmp# mtd write openwrt-ramips-mt7621-ea8100-squashfs-sysupgrade.bin kernel
Unlocking kernel ...

Writing from openwrt-ramips-mt7621-ea8100-squashfs-sysupgrade.bin to kernel ...  [e]
Skipping bad block at 0x00220000[e]
Skipping bad block at 0x0054000    

However the result is the same.

I know the image is good in terms of alignment:

/home/tftpd# binwalk openwrt-ramips-mt7621-ea8100-squashfs-sysupgrade.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
0             0x0             uImage header, header size: 64 bytes, header CRC: 0x9D840594, created: 2019-01-12 03:52:11, image size: 1896421 bytes, Data Address: 0x80001000, Entry Point: 0x80001000, data CRC: 0x9069E256, OS: Linux, CPU: MIPS, image type: OS Kernel Image, compression type: lzma, image name: "MIPS OpenWrt Linux-4.14.91"
64            0x40            LZMA compressed data, properties: 0x6D, dictionary size: 2097152 bytes, uncompressed size: 6051408 bytes
4194304       0x400000        UBI erase count header, version: 1, EC: 0x0, VID header offset: 0x800, data offset: 0x1000
/home/tftpd# hexdump -C openwrt-ramips-mt7621-ea8100-squashfs-sysupgrade.bin | egrep ^00400000  -A 30 | head
00400000  55 42 49 23 01 00 00 00  00 00 00 00 00 00 00 00  |UBI#............|
00400010  00 00 08 00 00 00 10 00  7e 62 6e 91 00 00 00 00  |........~bn.....|
00400020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00400030  00 00 00 00 00 00 00 00  00 00 00 00 ff d4 03 e1  |................|
00400040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00400800  55 42 49 21 01 01 00 05  7f ff ef ff 00 00 00 00  |UBI!............|
00400810  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00400830  00 00 00 00 00 00 00 00  00 00 00 00 b8 25 64 a8  |.............%d.|
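
For reference, the 64 bytes at 0x400000 can be decoded field by field. The layout used below is the UBI erase counter header (struct ubi_ec_hdr in the kernel's ubi-media.h); the bytes are copied from the hexdump above, and the 0x20000 erase block size is an assumption based on the 0x20000 steps seen in the dumps. This is a sketch for checking the numbers, not a replacement for binwalk.

```python
import struct

# struct ubi_ec_hdr, big-endian, 64 bytes: magic, version, 3 pad bytes,
# erase counter, VID header offset, data offset, image sequence number,
# 32 pad bytes, header CRC.
EC_HDR = struct.Struct(">4sB3xQIII32xI")
ERASE_BLOCK = 0x20000  # assumed erase block size (128 KiB)

# Bytes 0x400000..0x40003f of the sysupgrade image, from the hexdump above.
raw = (
    bytes.fromhex("55424923 01000000 0000000000000000"
                  "00000800 00001000 7e626e91")
    + bytes(32)
    + bytes.fromhex("ffd403e1")
)

magic, version, ec, vid_off, data_off, image_seq, hdr_crc = EC_HDR.unpack(raw)
print(magic, version, ec, hex(vid_off), hex(data_off), hex(image_seq), hex(hdr_crc))

# The UBI part of the image starts on an erase block boundary.
image_offset = 0x400000
aligned = image_offset % ERASE_BLOCK == 0
print("aligned:", aligned)
```

The decoded VID header offset (0x800) and data offset (0x1000) agree with what binwalk reports, and 0x400000 is exactly 32 erase blocks into the image.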

Is it due to bad erase blocks that the data got shifted? How do we work around this?

The mt7621 NAND driver tries to be smart and breaks stuff instead.

The issue is tracked in https://bugs.openwrt.org/index.php?do=details&task_id=1926 and a patch is provided which requires testing.

@mkresin thanks for the link.

Changing to shift_on_bbt = 0; as described in the patch (a patch of the patch) does not solve the problem. It did not seem to make any difference. Either the fix is not working or I am encountering a different bug.

I did check that target-mipsel_24kc_musl/linux-ramips_mt7621/linux-4.14.91/drivers/mtd/nand/mtk_nand2.c was correctly updated and that the changes made it into the output binaries. I TFTP-booted the new -initramfs-kernel.bin and wrote the new -squashfs-sysupgrade.bin to mtd again before trying to boot from the 2nd partition of the NAND. The word "patch1" shows up in dmesg in both boots.

$ git diff target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
diff --git a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
index d50e689110..1f95d5ad7d 100644
--- a/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
+++ b/target/linux/ramips/patches-4.14/0039-mtd-add-mt7621-nand-support.patch
@@ -3576,9 +3576,9 @@ Signed-off-by: John Crispin <blogic@openwrt.org>
 +      err = mtd_device_parse_register(mtd, probe_types, &ppdata,
 +                                      NULL, 0);
 +      if (!err) {
-+              MSG(INIT, "[mtk_nand] probe successfully!\n");
++              MSG(INIT, "[mtk_nand] probe successfully patch1!\n");
 +              nand_disable_clock();
-+              shift_on_bbt = 1;
++              shift_on_bbt = 0;
 +              if (load_fact_bbt(mtd) == 0) {
 +                      int i;
 +                      for (i = 0; i < 0x100; i++)

It seems that in my case the issue isn't exactly as described in 1926. In that one, "reading a page with 0x2e00000 flash data it returns page that contains 0x2e20000 data". In my case, when the kernel was reading the start of the mtd, it got data for the previous erase block (almost as if it ignored the bad block). As you can see from the hexdump of /dev/mtd6, only by reading 0x20000 can I get the data originally meant for 0x0. To me it looks like the direction of the "shift" is reversed.

Another thing: I have 2 bad blocks before 0x000002d80000 so actually the block offset should be shifted by 2 x 0x20000? Not sure how it's supposed to work.

[    0.898127] Bad eraseblock 71 at 0x0000008e0000
[    0.967428] Bad eraseblock 349 at 0x000002ba0000
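
One way to reason about the expected shift is to model a writer that fills good blocks in order and skips bad ones, which is what the "Skipping bad block" messages from mtd suggest (an assumption about its behaviour, not something checked against the mtd source). In that model only bad blocks inside the partition being written can move the data. The function name is made up for this sketch.

```python
ERASE_BLOCK = 0x20000
BAD_BLOCKS = [0x8E0000, 0x2BA0000]  # absolute offsets from the dmesg lines above
PART_START = 0x2980000              # start of the 2nd "kernel" partition
UBI_IN_IMAGE = 0x400000             # offset of the UBI inside the sysupgrade image


def physical_offset(logical, part_start, bad_blocks, block=ERASE_BLOCK):
    """Map an image offset to a flash address for a skip-bad-blocks writer."""
    bad = set(bad_blocks)
    phys = part_start
    remaining = logical // block  # whole blocks written before `logical`
    while True:
        while phys in bad:        # the writer never puts data in a bad block
            phys += block
        if remaining == 0:
            return phys + logical % block
        phys += block
        remaining -= 1


ubi_phys = physical_offset(UBI_IN_IMAGE, PART_START, BAD_BLOCKS)
print(hex(ubi_phys), "shift:", hex(ubi_phys - (PART_START + UBI_IN_IMAGE)))
```

In this model the bad block at 0x8e0000 lies in the first kernel slot and does not count, while 0x2ba0000 is 0x220000 into the partition (the same offset as the first "Skipping bad block at 0x220000" line from mtd). That gives a shift of a single block, 0x20000, which is what the hexdump of /dev/mtd6 shows. It also leaves the start of the partition unshifted.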

Reading the kernel partition works, though, even where we might expect an offset of 1 block. You can see the U-Boot image (uImage) magic 27 05 19 56:

root@OpenWrt:/# head /dev/mtd5 | hexdump  -C | head
00000000  27 05 19 56 ee 16 1b ec  5c 39 80 f3 00 1c ef 94  |'..V....\9......|
00000010  80 00 10 00 80 00 10 00  e3 f3 3c a0 05 05 02 03  |..........<.....|
00000020  4d 49 50 53 20 4f 70 65  6e 57 72 74 20 4c 69 6e  |MIPS OpenWrt Lin|
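
The first 32 bytes of that dump can be decoded with the legacy uImage header layout from U-Boot's include/image.h (magic, header CRC, timestamp, data size, load address, entry point, data CRC, then one byte each for OS, architecture, image type and compression). A sketch, with the bytes copied from the hexdump above:

```python
import struct

# First 32 bytes of /dev/mtd5, copied from the hexdump above.
raw = bytes.fromhex(
    "27051956 ee161bec 5c3980f3 001cef94"
    "80001000 80001000 e3f33ca0 05050203"
)

# Seven big-endian 32-bit words followed by four single-byte fields.
FIELDS = struct.Struct(">7I4B")
magic, hcrc, timestamp, size, load, entry, dcrc, os_, arch, type_, comp = FIELDS.unpack(raw)

print(f"magic=0x{magic:08x} size={size} load=0x{load:08x} entry=0x{entry:08x}")
print(f"os={os_} arch={arch} type={type_} comp={comp}")
```

In U-Boot's numbering, os=5 is Linux, arch=5 is MIPS, type=2 is a kernel image and comp=3 is LZMA, so the header at the start of mtd5 looks like an intact OpenWrt kernel.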

I have started watching 1926 so that I can test new patches as they come out. I hope 1926 describes the same problem as mine and just needs a different fix to work for me (so that I don't have to wait for a separate fix).

@mkresin It looks like the issue in 1926 is solved but it doesn't work in my case.

This is probably what's happening:

Basically, the suspicion is that when the image gets written, every block after the bad erase block contains data meant for the previous block.

Hence, when the kernel tries reading ubi at 0x2d80000, it ends up getting the empty padding data meant for the erase block before it (i.e. the padding in the .bin file).

Is there a way to skip the declaration of ubi@0x2d80000 in the .dts, so that we get one big partition at 0x2980000-0x5180000 that encompasses the kernel, squashfs and rootfs, and still have it boot properly? Maybe then the offset calculation would take the bad erase block into account and allow the kernel to read the non-kernel data properly.

What would need to change in the source?

Raised this bug report. https://bugs.openwrt.org/index.php?do=details&task_id=2097

Now I am using mtdsplit on the firmware partition, so the dts no longer contains information about rootfs/ubi. It auto-detects the correct block for the start of the ubi.
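
The idea of locating the UBI by content rather than by a fixed address can be sketched as a scan over erase block boundaries for the "UBI#" magic. This is not the mtdsplit implementation, only an illustration; the function name and the 0x20000 block size are assumptions.

```python
UBI_EC_MAGIC = b"UBI#"
ERASE_BLOCK = 0x20000  # assumed erase block size


def find_ubi_start(data: bytes, block: int = ERASE_BLOCK):
    """Return the first block-aligned offset that starts with the UBI magic, or None."""
    for offset in range(0, len(data), block):
        if data[offset:offset + len(UBI_EC_MAGIC)] == UBI_EC_MAGIC:
            return offset
    return None


# Synthetic data shaped like the /dev/mtd6 dump: one block of zero padding,
# then a block that begins with the UBI magic.
demo = bytes(ERASE_BLOCK) + UBI_EC_MAGIC + bytes(ERASE_BLOCK - 4)
found = find_ubi_start(demo)
print(hex(found))
```

On a dump of the firmware partition, the same scan would report where the UBI really begins, regardless of how many blocks were skipped during the write.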

https://bugs.openwrt.org/index.php?do=details&task_id=1926 also fixes a problem wrt bad block consideration and works in tandem.

@firemelon

May I know how your work is going? I am also interested in having OpenWrt firmware installed on my EA8100.

I successfully flashed the EA8100 with OpenWrt (firemelon's bins). Is anybody else using the same? It is largely stable, but I will need to see how it holds up over time. There is one issue: after a hard power cycle it sometimes boots the original Linksys firmware from the other boot partition, but after a couple of restarts it comes back to OpenWrt. I will report back after a week or so. Hopefully it remains stable.

Hi could you please provide some instructions on how to flash EA8100? It seems like this link is not working.

Am tweaking server hardware. Site will be down for this weekend. You can use this google drive link for now.

Flash the initramfs-kernel-factory.img via Linksys UI. Then once openwrt loads, log in and flash squashfs-sysupgrade.bin.

You can ask questions here: Help with flashing Linksys EA8100 -MT7621

Thanks for the heads up. I installed initramfs-kernel-factory.img, but now the router LED does not turn on even though the router itself seems to be running. With LAN connected I can ssh into the router from Linux, but I can't open it in a browser, and I don't know how to proceed to the next step.
I was also following the installation process for other Linksys routers and tried to install LuCI, but I am getting this error: "Only have 0kb available on filesystem /overlay, pkg luci needs 1", "Cannot install luci".
How should I go forward?

Did you flash squashfs-sysupgrade.bin in an SSH session after flashing initramfs-kernel-factory?

I didn’t do that, as I couldn’t find out how. Some pointers would be very helpful.
I have now managed to boot the router into its own firmware by restarting it 3 times, so I would also like to know how to reboot into the OpenWrt firmware once I have flashed squashfs-sysupgrade.bin.

Scp the sysupgrade file into /tmp, then follow the "Flash the new OpenWrt firmware" section on this page:
https://openwrt.org/docs/guide-user/installation/sysupgrade.cli

Contrary to what the page says, my ea8100 openwrt bin does not preserve settings across upgrades.