Adding new spi-nand device support (Belkin RT3200 and Linksys E8450)

I added some printk calls to the OEM firmware, in target/linux/mediatek/files/drivers/mtd/nand/mtk_snand.c right after the dev_warn(dev, "[mtk_snand] probe successfully!\n"); call, to understand how the OOB area is divided:

[    1.598120] N53: nand_chip->ecc.size = 2048
[    1.602299] N53: nand_chip->ecc.bytes = 32
[    1.606393] N53: mtd->writesize = 2048
[    1.610138] N53: mtd->oobsize = 64
[    1.613538] N53: mtd->oobavail = 30
[    1.617022] N53: mtd->erasesize = 131072
[    1.620944] N53: snfc->use_bmt = 1
[    1.624341] ### ECC REGION ###
[    1.627394] N53: mtd_ooblayout_ecc(mtd, 0, &ooeccbregion1) = 0
[    1.633222] N53: REGION 0
[    1.635837] N53: ooeccbregion1->offset = 32
[    1.640015] N53: ooeccbregion1->length = 32
[    1.644193] N53: mtd_ooblayout_ecc(mtd, 1, &ooeccbregion2) = -34
[    1.650194] N53: NO REGION 1
[    1.653069] ### FREE REGION ###
[    1.656206] N53: mtd_ooblayout_free(mtd, 0, &oobfreeregion1) = 0
[    1.662207] N53: REGION 0
[    1.664822] N53: oobfreeregion1->offset = 2
[    1.669000] N53: oobfreeregion1->length = 30
[    1.673266] N53: mtd_ooblayout_free(mtd, 1, &oobfreeregion2) = -34
[    1.679439] N53: NO REGION 1

So there is an offset of two bytes, then 30 bytes of free area, and then 32 bytes for ECC. I am testing this configuration right now and it seems promising... Do you know how to do a proper and extensive test? I have installed some big packages and after a few reboots I don't see any CRC errors... :crossed_fingers:
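
To make the layout easier to reason about, here is a minimal sketch that classifies a byte offset inside the 64-byte OOB area. It assumes the split is exactly what the printk output reports (free region at offset 2, length 30; ECC region at offset 32, length 32). oob_region is a hypothetical helper name, not part of the driver or of any tool, and the "reserved" label for the first two bytes is only my naming for the two-byte offset.

```shell
#!/bin/sh
# Sketch only: classify a byte offset inside the 64-byte OOB area using the
# numbers printed by the OEM driver. oob_region is a hypothetical helper.
oob_region() {
    off=$1
    if [ "$off" -lt 0 ] || [ "$off" -ge 64 ]; then
        echo "out-of-range"
    elif [ "$off" -lt 2 ]; then
        echo "reserved"   # the two-byte offset before the free region
    elif [ "$off" -lt 32 ]; then
        echo "free"       # oobfreeregion1: offset 2, length 30 (bytes 2..31)
    else
        echo "ecc"        # ooeccbregion1: offset 32, length 32 (bytes 32..63)
    fi
}
```

For example, oob_region 9 reports the region for the byte where the OEM dump shows the start of the 2c 01 d2 pattern.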

EDIT:
After some more reboots I got the first CRC problem :pensive:

[   23.047630] jffs2: notice: (762) check_node_data: wrong data CRC in data node at 0x003f83a4: read 0xb02be10d, calculated 0xa13b9885.

Just out of curiosity I added nandutils to the OEM FW and got yet another OOB "structure" :upside_down_face::

nanddump -p -o -l 0x1 -s 0x28F000 /dev/mtd12 | grep "OOB"

OEM FW:
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

OPENWRT (OLD FOR COMPARISON):
  OOB Data: ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

U-BOOT (OLD FOR COMPARISON):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2
ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff

Nice detective work.
It would be great to use nandtest from initramfs; that should test the data integrity as well as ECC on the whole NAND.

Ok probably the main problem is the BMT. After another fake bad block problem I dumped the OOB again:

U-BOOT:
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff 85 19 03 20 08 00 ff ff ff ff ff 0a 00 f5 ff ff ff ff ff ff ff ff

OPENWRT:
  OOB Data: ff ff 85 19 03 20 08 00 00 00 ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Excluding the double 0a 00 f5, which is still a mystery, the rest of OpenWrt's OOB data is split into two parts and moved around.
The 00 00 at the end is moved to the first 2 bytes of the OOB area in the U-Boot dump, causing the fake bad blocks.
The rest of the OOB data is moved to the 43rd - 48th bytes.

Probably it's time to ask nbd about this...

New tests... I tried editing the OOB area directly and got confirmation of what I thought.
I wrote this sample data to the OOB area of a TEST partition that I created at 0x000002000000-0x000002300000; it corresponds to /dev/mtd12:

root@OpenWrt:/# hexdump -C /tmp/ZERO_OOB_4.bin 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  ff 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000810  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
00000820  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00000840
root@OpenWrt:/# nandwrite -O /dev/mtd12 /tmp/ZERO_OOB_4.bin
Writing data to block 0 at offset 0x0

And it got written correctly

root@OpenWrt:/# nanddump -c -o -l 0x800  /dev/mtd12 | grep OOB
ECC failed: 0
ECC corrected: 4
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
  OOB Data: ff 01 02 03 04 00 00 06 08 09 0a 0b 0c 0d 0e 0f  |................|
  OOB Data: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f  |................|
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  |................|
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  |................|

However, reading the same address from U-Boot shows this instead:

MT7622> nand dump 0x2000000

[mtk_snand_check_bch_error] ECC-U, PA=65152, S=0
[mtk_snand_check_bch_error] ECC-U, PA=65152, S=1
[mtk_snand_check_bch_error] ECC-U, PA=65152, S=2
NFI, flag byte: 0 NFI, This page is occupied!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
Address 2000000 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
08 09 0a 0b 0c 0d 0e 0f 18 19 1a 1b 1c 1d 1e 1f ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
10 11 12 13 14 15 16 17 ff 01 02 03 04 00 00 06 ff ff ff ff ff 0a 00 f5 ff ff ff ff ff ff ff ff

The rest of the OOB data is in the previous NAND pages:

MT7622> nand dump 0x1fff800

Address 1fff800 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 10 11 12 13 14 15 16 17 ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 ff 01 02 03 04 00 00 06 ff ff ff ff ff ff ff ff
MT7622> nand dump 0x1fff000

Address 1fff000 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 01 02 03 04 00 00 06
ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff ff ff 00 00 ff ff ff ff ff ff ff ff

@nbd can you confirm that the BMT in your staging tree is working correctly for the Elecom WRC-2533gent? (My only change to your BMT was changing BB_TABLE_MAX from 0x2000U to 0x1000U.)

Hi,

I've added preliminary support for E8450 to my staging tree:
https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=summary
I've fixed up the flash chip support patch. Turns out the chip is unstable if support for the X4 ops is not included. I've also fixed the ECC layout - based on my reading it's pretty much the same as on Winbond flash chips, so I copied over the ops from there.
Effectively the distinction between ECC covered and non-ECC areas in OOB is irrelevant, since the driver disables the flash chip's ECC support anyway. The controller handles ECC completely by itself.
I've also changed the BMT support patch to allow configuring the table size via device tree.
Please test it and let me know if it works for you.
There was also a bug in the BMT patch regarding larger writes to OOB, which I've fixed.

Hi nbd,
thanks for your answer and your time :slight_smile:

I was wondering why there was the patch to disable ECC, now I understand it. Thanks for the explanation.

I did some tests and the "fake bad blocks" problem seems to be fixed.
However, I think that we still have some problems with the nand (maybe its cache?).
I still get some wrong data CRC in data node in the bootlog and then data corruption.
This is what I do for testing:

  1. Write 10MB of random data:
    dd if=/dev/urandom of=/root/random.file bs=1M count=10;
  2. Sync the file system (I don't know if it's necessary):
    sync;
  3. Calculate and store the md5 of the aforementioned file:
    md5sum /root/random.file > /root/random.file.md5;
  4. Sync again the file system (I don't know if it's necessary):
    sync;
  5. Reboot the system:
    reboot
  6. After the reboot, calculate the md5 again and compare it with the previously stored md5sum; sometimes the values differ:
root@OpenWrt:/# md5sum /root/random.file 
58b802ed8d52b18b8552d3a9999337f8  /root/random.file
root@OpenWrt:/# cat /root/random.file.md5;
8506b23e9ea97fb6bb325f74c324bf8a  /root/random.file
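
The manual steps above can be wrapped in two small functions so the check is repeatable across many reboots. This is only a sketch: prepare_test and verify_test are hypothetical names, the directory defaults to /root as in the steps above, and only the hash (not the path) is stored so the comparison is a plain string match. The reboot itself is deliberately left out.

```shell
#!/bin/sh
# Sketch of the write / checksum / reboot / verify cycle described above.
# prepare_test and verify_test are hypothetical helper names.

prepare_test() {
    dir=${1:-/root}
    size_mb=${2:-10}
    # Write random data, flush it, then record its hash and flush again.
    dd if=/dev/urandom of="$dir/random.file" bs=1M count="$size_mb" 2>/dev/null || return 1
    sync
    md5sum "$dir/random.file" | cut -d' ' -f1 > "$dir/random.file.md5"
    sync
}

verify_test() {
    dir=${1:-/root}
    stored=$(cat "$dir/random.file.md5") || return 2
    current=$(md5sum "$dir/random.file" | cut -d' ' -f1)
    if [ "$stored" = "$current" ]; then
        echo "OK $current"
    else
        echo "MISMATCH stored=$stored current=$current"
        return 1
    fi
}
```

Run prepare_test before rebooting and verify_test after the system comes back up; a non-zero return from verify_test corresponds to the corruption case shown above.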

This is the complete bootlog:

If my tests are right, I think that we still have some problems.
If you have access to the router, could you test it?

P.s.
I have some small fixes to your commit (lan4 and MT7915 were not working):

diff --git a/target/linux/mediatek/image/mt7622.mk b/target/linux/mediatek/image/mt7622.mk
index 1d8501f579..fb0bc45fa7 100644
--- a/target/linux/mediatek/image/mt7622.mk
+++ b/target/linux/mediatek/image/mt7622.mk
@@ -42,7 +42,7 @@ define Device/linksys_e8450
   DEVICE_DTS := mt7622-linksys-e8450
   DEVICE_DTS_DIR := $(DTS_DIR)/mediatek
   DEVICE_PACKAGES := kmod-usb-ohci kmod-usb2 kmod-usb3 kmod-ata-ahci-mtk \
-                    kmod-mt7615e kmod-mt7615-firmware kmod-mt7915
+                    kmod-mt7615e kmod-mt7615-firmware kmod-mt7915e
 endef
 TARGET_DEVICES += linksys_e8450
 
diff --git a/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network b/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
index 4590c0bd8e..9a03141470 100755
--- a/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
+++ b/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
@@ -10,10 +10,10 @@ mediatek_setup_interfaces()
 
        case $board in
        bananapi,bpi-r64-rootdisk|\
-       bananapi,bpi-r64|\
-       linksys,e8450)
+       bananapi,bpi-r64)
                ucidef_set_interfaces_lan_wan "lan0 lan1 lan2 lan3" wan
                ;;
+       linksys,e8450|\
        mediatek,mt7622-rfb1)
                ucidef_set_interfaces_lan_wan "lan1 lan2 lan3 lan4" wan
                ;;

P.p.s.
I would also like to thank you for your amazing work with MT7622. I can get gigabit speed using the HW NAT Acceleration without any CPU usage! :smiley:
W/O HW NAT Acceleration:


W/ HW NAT Acceleration:

(edit: all work included in upstream OpenWrt by now)
I've hacked up a replacement bl2 (== Preloader) as well as ATF and U-Boot, built more or less completely from source, and enabled support for UBI in U-Boot, allowing for a more robust way to deal with the NAND flash.
I'm still waiting for MTK to update patches for U-Boot and ATF 2.4 in order to get rid of the nandx driver and have support for HWECC also with vanilla U-Boot's mtk-snfi-spi driver.
But as it is, everything already works quite well.

PoC generator for factory installer images

Hi @daniel, that's great news! :slightly_smiling_face:
I have to confess that I was following the IRC channel, so I already knew that great news was coming... But I didn't expect an image that replaces U-Boot automatically! That's very cool! :smiley:
I think I will start testing it very soon, also because I probably filled up the BMT pool due to the "fake" bad block problem I had in the previous weeks.

However, I have a question for you: can I use your "UBI version" even if I have a bad block somewhere before the "factory" partition? I am sure the bad block is there, because if I don't use the BMT the rest of the flash is shifted by one block and the calibration data can't be loaded, while with the BMT it works correctly.

(Sorry for the double post, I hit CTRL-Enter and it got posted automatically)

Sounds dangerous... BMT is switched off for UBI, as UBI handles bad blocks for us. Having offsets change because of bad blocks is a very wrong design and cannot work with UBI.
Hence you will have to make sure that factory starts at the right offset when BMT is switched off, i.e. back it up with BMT switched on and write it back to the correct offset with BMT switched off (the new U-Boot acquires its ethaddr from the factory partition as well)...
So you have to make sure the following areas on flash do not have any bad blocks in them (i.e. with BBT switched off):

0x00000000 - 0x00020000 (Preloader)
0x00080000 - 0x00160000 (BL31+U-Boot)
0x001c0000 - 0x002c0000 (Factory)

I.e. you may be lucky and the bad block resides somewhere between 0x20000 - 0x80000 or 0x160000 - 0x1c0000. If not, you will have to wait for me to port UBI SPL to TF-A...
Try booting non-BMT Linux with initramfs and carefully try erasing, writing and reading back the 128kB blocks in those unused areas. If you win the lottery, you will find the bad block there and it's all good: all that is needed then is making a backup of factory with BBT switched on and writing it back to the right offset with BBT switched off.
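
As a small aid for the check described above, here is a sketch that reports which part of the layout a given flash offset falls into. region_of is a hypothetical helper; the ranges are copied from the list above and treated as half-open [start, end), which is an assumption about how the end addresses are meant.

```shell
#!/bin/sh
# Sketch: report which area of the layout listed above a flash offset falls in.
# region_of is a hypothetical helper; offsets may be given in hex (0x...).
region_of() {
    off=$(($1))
    if   [ "$off" -ge $((0x000000)) ] && [ "$off" -lt $((0x020000)) ]; then echo "preloader"
    elif [ "$off" -ge $((0x080000)) ] && [ "$off" -lt $((0x160000)) ]; then echo "bl31+u-boot"
    elif [ "$off" -ge $((0x1c0000)) ] && [ "$off" -lt $((0x2c0000)) ]; then echo "factory"
    elif [ "$off" -lt $((0x2c0000)) ]; then echo "unused"          # the gaps between the areas
    else echo "after-factory"
    fi
}
```

A bad block whose offset maps to "unused" is the lucky case; one that maps to "preloader", "bl31+u-boot" or "factory" is the case that cannot be handled this way.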

Update: I've improved the installer so it can detect if something happened to factory and, in that case, abort and let the user decide what to do.

Update 2: I've implemented relocation of the eeprom-data block and the MAC addresses block in the factory partition when running the installer, i.e. if BBT/BMT made a mess there before, this is detected and fixed, so offsets are then correct without BBT/BMT running.
See

Hi ... I see that you have managed to completely replace the stock boot loader for the E8450. I was wondering if the same can be done for the E5600. I have a working build thanks to the excellent work from the thread "Adding OpenWrt support for Linksys E5600". The E5600 allows two firmwares (probably as a failsafe), but the free space ends up being very small. The chip is MT7621-based, unlike the one here, and the flash layout looks like this:

root@HomeNetwork:~# cat /proc/mtd
dev: size erasesize name
mtd0: 00080000 00020000 "boot"
mtd1: 00040000 00020000 "u_env"
mtd2: 00040000 00020000 "factory"
mtd3: 00040000 00020000 "s_env"
mtd4: 00040000 00020000 "devinfo"
mtd5: 00400000 00020000 "kernel"
mtd6: 01a00000 00020000 "ubi"
mtd7: 01e00000 00020000 "alt_firmware"
mtd8: 04200000 00020000 "gdata"

I was wondering if it's possible to achieve the same for the E5600 and possibly free up more space.
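
To see how the space is actually divided, the hex sizes in /proc/mtd can be converted to KiB. This is a rough sketch; mtd_sizes is a hypothetical helper name that reads /proc/mtd-style lines on stdin, skips the header and blank lines, and prints one line per partition plus a total.

```shell
#!/bin/sh
# Sketch: read /proc/mtd-style lines on stdin and print each partition's size
# in KiB plus a total. mtd_sizes is a hypothetical helper name.
mtd_sizes() {
    total=0
    while read -r dev size erasesize name; do
        case $dev in
            mtd*:) ;;        # partition line, e.g. mtd6: 01a00000 00020000 "ubi"
            *) continue ;;   # skip the "dev: size erasesize name" header and blanks
        esac
        kib=$(( 0x$size / 1024 ))   # the size column is hexadecimal bytes
        total=$(( total + kib ))
        echo "$dev $name $kib KiB"
    done
    echo "total $total KiB"
}
```

On the device it would be called as mtd_sizes < /proc/mtd. For the table above, "ubi" (01a00000) comes out at 26624 KiB and "alt_firmware" (01e00000) at 30720 KiB, which is the space the second firmware slot takes up.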

You'd have to start building U-Boot for MT7621.

https://github.com/gnubee-git/GnuBee-MT7621-uboot

Edit: hackpascal posted patch for upstream U-Boot https://patchwork.ozlabs.org/project/uboot/list/?series=270363

If you want to get started, the first thing is to think of a way to try your own U-Boot build without risking bricking the board.

I'm not aware of the BootROM of MT7621 providing any fall-back method (such as loading the bootloader via Xmodem or Kermit in case loading from flash fails or by pulling a boot configuration pin).

JTAG details are public for that SoC, so if the board has MIPS EJTAG pads somewhere, that would be a way to go.

Once you've got U-Boot running, recognizing the NAND flash and communicating on Ethernet, you can use a UBI layout similar to what I've done with the E8450.

Wow, thanks. So unless I have a way to resolder the flash after restoring its contents, I guess I only get one shot at it (which makes your flash all the more impressive). If anyone wants to buy this one and give me a working copy I would be happy to test it out, but with my skills I will likely brick mine if I try myself. Also, the guide says you need a TFTP/serial console; is that necessary if you have a working OpenWrt image?

Usually you don't need to unsolder the flash for that. You can dump and write it in-circuit using a SOP8 clamp attached to a USB-SPI adapter, like this:

If this is parallel NAND rather than SPI-NAND, JTAG would be the way to go (in my case the SPI-NAND chip is the BGA package version, so I also opted for JTAG so I wouldn't have to mess directly with the flash chip).

Sorry, may I know how to enable AX Wi-Fi via the configuration file? The web GUI only offers an AC option.
Also, how do I back up the full NAND before flashing the UBI version?
Thanks.

I'm afraid AX support didn't make it to LuCI yet, also support for that in hostapd is still pending. This will all get fixed within the next couple of weeks.

Regarding making a full backup before flashing:
The best way imho is to flash openwrt-mediatek-mt7622-linksys_e8450-ubi-initramfs-recovery.itb (ie. the regular initramfs, not the installer) and use that. This gives you full access to the flash storage (see Save mtdblock contents in the System -> Backup / Flash Firmware tab of LuCI) while not making any changes by itself.
However, you can NOT use that to then directly flash the installer; you will first have to use the vendor's dual-boot mechanism to return to the regular stock firmware.

Thank you so much for your help :pray: Sorry, I'm new to this device: how do I boot into the secondary firmware? Does it need to be configured in U-Boot?

Found your info on: https://openwrt.org/toh/linksys/linksys_e8450 and filled in some of the answers.

What are the main benefits of moving from the stock bootloader (TF-A 2.2, U-Boot 2020.10) to your new bootloader 2021.04-rc3? It looks like the new UBI bootloader allows a) a change from JFFS2 to UBIFS and b) a NAND layout that boots faster and has more free space.

Can I load standard snapshot images from OpenWrt if the new bootloader 2021.04-rc3 is installed, or do I need to load the dangoWRT versions? You must use the UBI versions of OpenWrt if the bootloader is updated: ".BIN" images of OpenWrt are for the stock bootloader and ".ITB" images are for the UBI bootloader.

What uboot updates are being thought about after 2021.04-rc3?