Adding new spi-nand device support (Belkin RT3200 and Linksys E8450)

numero53 · January 5, 2021, 10:12pm

I am trying to port OpenWrt to the Belkin RT3200 and Linksys E8450 (they should be the same device), but the SPI NAND is not recognised:

[    0.667072] mtk-ecc 1100e000.ecc: probed
[    0.671680] spi-nand spi0.0: unknown raw ID 00e571e5
[    0.676656] spi-nand: probe of spi0.0 failed with error -524

The OEM firmware, however, recognises it as FM35X1GA (according to the FCC document it should actually be FM35Q1GA-IB: https://fccid.io/K7S-03572/Test-Report/Test-Report-5G-B1-B4-4857307):

[    1.263993] Recognize NAND: ID [
[    1.267057] e5 71 
[    1.269068] ], [FM35X1GA], Page[2048]B, Spare [64]B Total [128]MB
[    1.275521] nand: device found, Manufacturer ID: 0xe5, Chip ID: 0x71
[    1.281877] nand: FIDELIX SNAND 128MiB 3,3V 8-bit
[    1.286582] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64

I don't have a lot of experience with NAND drivers, but from what I understand a SPI NAND should be easier to get working than a raw NAND.

Looking at the already supported SPI NAND devices (e.g. https://elixir.bootlin.com/linux/v5.4.86/source/drivers/mtd/nand/spi/winbond.c), I think it should be easy to create a new driver. I will try to adapt that driver to my SPI NAND device by replacing the page size, block size, etc.

Am I heading in the right direction?
A wrong raw NAND driver can corrupt or even damage the NAND... Can a wrong SPI NAND driver corrupt the NAND too? I don't have a NAND programmer or the soldering skills to desolder a NAND chip, so if anything goes wrong I can't repair my unit. Is there some sort of "risk-free test mode"?

numero53 · January 11, 2021, 4:10pm

Some updates:

The Fidelix FM35Q1GA seems to be a re-branded version of the Dosilicon DS35Q1GA (Datasheet: http://www.benhong.cn/upload/datasheet/DS35X1GAXXX_100_rev00.pdf). I am quite sure about this because they both share the same NAND ID and Dosilicon bought Fidelix.
I didn't find a "risk-free test mode", so I tested my new driver directly. Reading data seems to work well, but I sometimes have problems writing data, and I don't know whether they are related to my code.

In the meantime I will explain how the OOB area is managed, or at least what I have understood:

Example 1: MX35LF2GE4AB
Linux Kernel

static int mx35lfxge4ab_ooblayout_ecc(struct mtd_info *mtd, int section,
				      struct mtd_oob_region *region)
{
	return -ERANGE;
}

static int mx35lfxge4ab_ooblayout_free(struct mtd_info *mtd, int section,
				       struct mtd_oob_region *region)
{
	if (section)
		return -ERANGE;

	region->offset = 2;
	region->length = mtd->oobsize - 2;

	return 0;
}

Note 1: region->offset = 2 because the first 2 bytes can contain bad block information
Note 2: mtd->oobsize = 64
Source

DATASHEET:

Source

So in the upstream implementation, the whole OOB area is used as free area, excluding the first two bytes, which are used for bad block information. Even the reserved area is used as free area. The ECC area, however, is not defined, probably because of this line in the datasheet: "the ECC parity code can be calculated properly and stored in the additional hidden spare area".

Example 2: MT29F2G01AAAED
Linux Kernel

static int micron_4_ooblayout_ecc(struct mtd_info *mtd, int section,
				  struct mtd_oob_region *region)
{
	struct spinand_device *spinand = mtd_to_spinand(mtd);

	if (section >= spinand->base.memorg.pagesize /
			mtd->ecc_step_size)
		return -ERANGE;

	region->offset = (section * 16) + 8;
	region->length = 8;

	return 0;
}

static int micron_4_ooblayout_free(struct mtd_info *mtd, int section,
				   struct mtd_oob_region *region)
{
	struct spinand_device *spinand = mtd_to_spinand(mtd);

	if (section >= spinand->base.memorg.pagesize /
			mtd->ecc_step_size)
		return -ERANGE;

	if (section) {
		region->offset = 16 * section;
		region->length = 8;
	} else {
		/* section 0 has two bytes reserved for the BBM */
		region->offset = 2;
		region->length = 6;
	}

	return 0;
}

Source

DATASHEET:

Source

Here too, the first two bytes are used for bad block information and the reserved area is used as free area. The ECC information, however, is stored in the "ECC for main/spare#" sections.

robimarko · January 11, 2021, 6:14pm

You can see that the kernel sees an SPI NAND device with ID 00e571e5.

That ID does not match any supported ID, hence the error.
So you need to add support for the NAND under the SPI NAND framework.
It should not be hard, as SPI NANDs are 99% the same and speak the same instructions.

From the looks of it, the ID it returns has a dummy first byte, then the next byte is the manufacturer ID, and the third one is the device ID.
This is really common, and there is a generic function for it; only the manufacturer code needs to be added to the table.

I would urge you not to modify an existing driver but to add a new vendor driver directly.

As far as ECC goes, it seems that the spare area should not be used, as it's not ECC protected.

numero53 · January 11, 2021, 9:14pm

Thanks for the reply.

Yes. If I get it working correctly I would even send it to the Linux kernel... It would be my first contribution!

In the previous post I didn't have time to write up my actual NAND findings:

Datasheet:

According to the datasheet it seems very similar to the MX35LF2GE4AB.
So I should set the same rules for the ECC (0 bytes) and free space (62 bytes), right?

However, if I use that scheme I get a lot of "fake" bad blocks in U-Boot after the first OpenWrt boot:

MT7622> nand bad

Device 0 bad blocks:
Bad block detected at 0x1b80, oob_buf[0] is 0x0
Bad block detected at 0xec40, oob_buf[0] is 0x0
Bad block detected at 0x1c00, oob_buf[0] is 0x0
Bad block detected at 0xec80, oob_buf[0] is 0x0
Bad block detected at 0x1c80, oob_buf[0] is 0x0
Bad block detected at 0xecc0, oob_buf[0] is 0x0
Bad block detected at 0x1d00, oob_buf[0] is 0x0
Bad block detected at 0xed00, oob_buf[0] is 0x0
Bad block detected at 0x1d80, oob_buf[0] is 0x0
Bad block detected at 0xed40, oob_buf[0] is 0x0
Bad block detected at 0x1e00, oob_buf[0] is 0x0
Bad block detected at 0xed80, oob_buf[0] is 0x0
Bad block detected at 0x1e80, oob_buf[0] is 0x0
Bad block detected at 0xedc0, oob_buf[0] is 0x0
Bad block detected at 0x1f00, oob_buf[0] is 0x0
Bad block detected at 0xee00, oob_buf[0] is 0x0
Bad block detected at 0x1f80, oob_buf[0] is 0x0
Bad block detected at 0xee40, oob_buf[0] is 0x0
Bad block detected at 0x2000, oob_buf[0] is 0x0
Bad block detected at 0xee80, oob_buf[0] is 0x0
Bad block detected at 0x2080, oob_buf[0] is 0x0
Bad block detected at 0xeec0, oob_buf[0] is 0x0
Bad block detected at 0x2100, oob_buf[0] is 0x0
Bad block detected at 0xef00, oob_buf[0] is 0x0
Bad block detected at 0x2180, oob_buf[0] is 0x0
Bad block detected at 0xef40, oob_buf[0] is 0x0
Bad block detected at 0x2200, oob_buf[0] is 0x0
Bad block detected at 0xef80, oob_buf[0] is 0x0
Bad block detected at 0x2280, oob_buf[0] is 0x0
Bad block detected at 0xefc0, oob_buf[0] is 0x0
Bad block detected at 0x2300, oob_buf[0] is 0x0
Bad block detected at 0xf000, oob_buf[0] is 0x0
Bad block detected at 0x2380, oob_buf[0] is 0x0
Bad block detected at 0xf040, oob_buf[0] is 0x0
Bad block detected at 0x2400, oob_buf[0] is 0x0
Bad block detected at 0xf080, oob_buf[0] is 0x0
Bad block detected at 0x2480, oob_buf[0] is 0x0
Bad block detected at 0xf0c0, oob_buf[0] is 0x0
Bad block detected at 0x2500, oob_buf[0] is 0x0
Bad block detected at 0xf100, oob_buf[0] is 0x0
Bad block detected at 0x2580, oob_buf[0] is 0x0
Bad block detected at 0xf140, oob_buf[0] is 0x0
Bad block detected at 0x2600, oob_buf[0] is 0x0
Bad block detected at 0xf180, oob_buf[0] is 0x0
Bad block detected at 0x2680, oob_buf[0] is 0x0
Bad block detected at 0xf1c0, oob_buf[0] is 0x0
Bad block detected at 0x2700, oob_buf[0] is 0x0
Bad block detected at 0xf200, oob_buf[0] is 0x0
Bad block detected at 0x2780, oob_buf[0] is 0x0
Bad block detected at 0xf240, oob_buf[0] is 0x0
Bad block detected at 0x2800, oob_buf[0] is 0x0
Bad block detected at 0xf280, oob_buf[0] is 0x0
Bad block detected at 0x2880, oob_buf[0] is 0x0
Bad block detected at 0xf2c0, oob_buf[0] is 0x0
Bad block detected at 0x2900, oob_buf[0] is 0x0
Bad block detected at 0xf300, oob_buf[0] is 0x0
Bad block detected at 0x2980, oob_buf[0] is 0x0
Bad block detected at 0xf340, oob_buf[0] is 0x0
Bad block detected at 0x2a00, oob_buf[0] is 0x0
Bad block detected at 0xf380, oob_buf[0] is 0x0
Bad block detected at 0x2a80, oob_buf[0] is 0x0
Bad block detected at 0xf3c0, oob_buf[0] is 0x0
Bad block detected at 0x2b00, oob_buf[0] is 0x0
Bad block detected at 0xf400, oob_buf[0] is 0x0
Bad block detected at 0x2b80, oob_buf[0] is 0x0
Bad block detected at 0xf440, oob_buf[0] is 0x0
Bad block detected at 0x2c00, oob_buf[0] is 0x0
Bad block detected at 0xf480, oob_buf[0] is 0x0
Bad block detected at 0x2c80, oob_buf[0] is 0x0
Bad block detected at 0xf4c0, oob_buf[0] is 0x0
Bad block detected at 0x2d00, oob_buf[0] is 0x0
Bad block detected at 0xf500, oob_buf[0] is 0x0
Bad block detected at 0x2d80, oob_buf[0] is 0x0
Bad block detected at 0xf540, oob_buf[0] is 0x0
Bad block detected at 0x2e00, oob_buf[0] is 0x0
Bad block detected at 0xf580, oob_buf[0] is 0x0
Bad block detected at 0x2e80, oob_buf[0] is 0x0
Bad block detected at 0xf5c0, oob_buf[0] is 0x0
Bad block detected at 0x2f00, oob_buf[0] is 0x0
Bad block detected at 0xf600, oob_buf[0] is 0x0
Bad block detected at 0x2f80, oob_buf[0] is 0x0
Bad block detected at 0xf640, oob_buf[0] is 0x0
Bad block detected at 0x3000, oob_buf[0] is 0x0
Bad block detected at 0xf680, oob_buf[0] is 0x0
Bad block detected at 0x3080, oob_buf[0] is 0x0
Bad block detected at 0xf6c0, oob_buf[0] is 0x0
Bad block detected at 0x3100, oob_buf[0] is 0x0
Bad block detected at 0xf700, oob_buf[0] is 0x0
Bad block detected at 0x3180, oob_buf[0] is 0x0
Bad block detected at 0xf740, oob_buf[0] is 0x0
Bad block detected at 0x3200, oob_buf[0] is 0x0
Bad block detected at 0xf780, oob_buf[0] is 0x0
Bad block detected at 0x3280, oob_buf[0] is 0x0
Bad block detected at 0xf7c0, oob_buf[0] is 0x0
Bad block detected at 0x3300, oob_buf[0] is 0x0
Bad block detected at 0xf800, oob_buf[0] is 0x0
Bad block detected at 0x3380, oob_buf[0] is 0x0
Bad block detected at 0xf840, oob_buf[0] is 0x0
Bad block detected at 0x3400, oob_buf[0] is 0x0
Bad block detected at 0xf880, oob_buf[0] is 0x0
Bad block detected at 0x3480, oob_buf[0] is 0x0
Bad block detected at 0xf8c0, oob_buf[0] is 0x0
Bad block detected at 0x3500, oob_buf[0] is 0x0
Bad block detected at 0xf900, oob_buf[0] is 0x0
Bad block detected at 0x3580, oob_buf[0] is 0x0
Bad block detected at 0xf940, oob_buf[0] is 0x0
Bad block detected at 0x3600, oob_buf[0] is 0x0
Bad block detected at 0xf980, oob_buf[0] is 0x0
Bad block detected at 0x3680, oob_buf[0] is 0x0
Bad block detected at 0xf9c0, oob_buf[0] is 0x0
Bad block detected at 0x3700, oob_buf[0] is 0x0
Bad block detected at 0xfa00, oob_buf[0] is 0x0
Bad block detected at 0x3780, oob_buf[0] is 0x0
Bad block detected at 0xfa40, oob_buf[0] is 0x0
Bad block detected at 0x3800, oob_buf[0] is 0x0
Bad block detected at 0xfa80, oob_buf[0] is 0x0
Bad block detected at 0x3880, oob_buf[0] is 0x0
Bad block detected at 0xfac0, oob_buf[0] is 0x0
Bad block detected at 0x3900, oob_buf[0] is 0x0
Bad block detected at 0xfb00, oob_buf[0] is 0x0
Bad block detected at 0x3980, oob_buf[0] is 0x0
Bad block detected at 0xfb40, oob_buf[0] is 0x0
Bad block detected at 0x3a00, oob_buf[0] is 0x0
Bad block detected at 0xfb80, oob_buf[0] is 0x0
Bad block detected at 0x3a80, oob_buf[0] is 0x0
Bad block detected at 0xfbc0, oob_buf[0] is 0x0
Bad block detected at 0x3b00, oob_buf[0] is 0x0
Bad block detected at 0xfc00, oob_buf[0] is 0x0
Bad block detected at 0x3b80, oob_buf[0] is 0x0
Bad block detected at 0xfc40, oob_buf[0] is 0x0
Bad block detected at 0x3c00, oob_buf[0] is 0x0
Bad block detected at 0xfc80, oob_buf[0] is 0x0
Bad block detected at 0x3c80, oob_buf[0] is 0x0
Bad block detected at 0xfcc0, oob_buf[0] is 0x0
Bad block detected at 0x3d00, oob_buf[0] is 0x0
Bad block detected at 0xfd00, oob_buf[0] is 0x0
Bad block detected at 0x3d80, oob_buf[0] is 0x0
Bad block detected at 0xfd40, oob_buf[0] is 0x0
Bad block detected at 0x3e00, oob_buf[0] is 0x0
Bad block detected at 0xfd80, oob_buf[0] is 0x0
Bad block detected at 0x3e80, oob_buf[0] is 0x0
Bad block detected at 0xfdc0, oob_buf[0] is 0x0
Bad block detected at 0x3f00, oob_buf[0] is 0x0
Bad block detected at 0xfe00, oob_buf[0] is 0x0
Bad block detected at 0x3f80, oob_buf[0] is 0x0
Bad block detected at 0xfe40, oob_buf[0] is 0x0
Bad block detected at 0x4000, oob_buf[0] is 0x0
Bad block detected at 0xfe80, oob_buf[0] is 0x0
Bad block detected at 0x4080, oob_buf[0] is 0x0
Bad block detected at 0xfec0, oob_buf[0] is 0x0
Bad block detected at 0x4100, oob_buf[0] is 0x0
Bad block detected at 0xff00, oob_buf[0] is 0x0
Bad block detected at 0x4180, oob_buf[0] is 0x0
Bad block detected at 0xff40, oob_buf[0] is 0x0
Bad block detected at 0x4200, oob_buf[0] is 0x0
Bad block detected at 0xff80, oob_buf[0] is 0x0
Bad block detected at 0x4280, oob_buf[0] is 0x0
Bad block detected at 0x42c0, oob_buf[0] is 0x0
Bad block detected at 0x4300, oob_buf[0] is 0x0
Bad block detected at 0x4340, oob_buf[0] is 0x0
Bad block detected at 0x4380, oob_buf[0] is 0x0
Bad block detected at 0x43c0, oob_buf[0] is 0x0
Bad block detected at 0x4400, oob_buf[0] is 0x0
Bad block detected at 0x4440, oob_buf[0] is 0x0
Bad block detected at 0x4480, oob_buf[0] is 0x0
Bad block detected at 0x44c0, oob_buf[0] is 0x0
Bad block detected at 0x4500, oob_buf[0] is 0x0
Bad block detected at 0x4540, oob_buf[0] is 0x0
Bad block detected at 0x4580, oob_buf[0] is 0x0
Bad block detected at 0x45c0, oob_buf[0] is 0x0
Bad block detected at 0x4600, oob_buf[0] is 0x0
Bad block detected at 0xec40, oob_buf[0] is 0x0
Bad block detected at 0xec80, oob_buf[0] is 0x0
Bad block detected at 0xecc0, oob_buf[0] is 0x0
Bad block detected at 0xed00, oob_buf[0] is 0x0
Bad block detected at 0xed40, oob_buf[0] is 0x0
Bad block detected at 0xed80, oob_buf[0] is 0x0
Bad block detected at 0xedc0, oob_buf[0] is 0x0
Bad block detected at 0xee00, oob_buf[0] is 0x0
Bad block detected at 0xee40, oob_buf[0] is 0x0
Bad block detected at 0xee80, oob_buf[0] is 0x0
Bad block detected at 0xeec0, oob_buf[0] is 0x0
Bad block detected at 0xef00, oob_buf[0] is 0x0
Bad block detected at 0xef40, oob_buf[0] is 0x0
Bad block detected at 0xef80, oob_buf[0] is 0x0
Bad block detected at 0xefc0, oob_buf[0] is 0x0
Bad block detected at 0xf000, oob_buf[0] is 0x0
Bad block detected at 0xf040, oob_buf[0] is 0x0
Bad block detected at 0xf080, oob_buf[0] is 0x0
Bad block detected at 0xf0c0, oob_buf[0] is 0x0
Bad block detected at 0xf100, oob_buf[0] is 0x0
Bad block detected at 0xf140, oob_buf[0] is 0x0
Bad block detected at 0xf180, oob_buf[0] is 0x0
Bad block detected at 0xf1c0, oob_buf[0] is 0x0
Bad block detected at 0xf200, oob_buf[0] is 0x0
Bad block detected at 0xf240, oob_buf[0] is 0x0
Bad block detected at 0xf280, oob_buf[0] is 0x0
Bad block detected at 0xf2c0, oob_buf[0] is 0x0
Bad block detected at 0xf300, oob_buf[0] is 0x0
Bad block detected at 0xf340, oob_buf[0] is 0x0
Bad block detected at 0xf380, oob_buf[0] is 0x0
Bad block detected at 0xf3c0, oob_buf[0] is 0x0
Bad block detected at 0xf400, oob_buf[0] is 0x0
Bad block detected at 0xf440, oob_buf[0] is 0x0
Bad block detected at 0xf480, oob_buf[0] is 0x0
Bad block detected at 0xf4c0, oob_buf[0] is 0x0
Bad block detected at 0xf500, oob_buf[0] is 0x0
Bad block detected at 0xf540, oob_buf[0] is 0x0
Bad block detected at 0xf580, oob_buf[0] is 0x0
Bad block detected at 0xf5c0, oob_buf[0] is 0x0
Bad block detected at 0xf600, oob_buf[0] is 0x0
Bad block detected at 0xf640, oob_buf[0] is 0x0
Bad block detected at 0xf680, oob_buf[0] is 0x0
Bad block detected at 0xf6c0, oob_buf[0] is 0x0
Bad block detected at 0xf700, oob_buf[0] is 0x0
Bad block detected at 0xf740, oob_buf[0] is 0x0
Bad block detected at 0xf780, oob_buf[0] is 0x0
Bad block detected at 0xf7c0, oob_buf[0] is 0x0
Bad block detected at 0xf800, oob_buf[0] is 0x0
Bad block detected at 0xf840, oob_buf[0] is 0x0
Bad block detected at 0xf880, oob_buf[0] is 0x0
Bad block detected at 0xf8c0, oob_buf[0] is 0x0
Bad block detected at 0xf900, oob_buf[0] is 0x0
Bad block detected at 0xf940, oob_buf[0] is 0x0
Bad block detected at 0xf980, oob_buf[0] is 0x0
Bad block detected at 0xf9c0, oob_buf[0] is 0x0
Bad block detected at 0xfa00, oob_buf[0] is 0x0
Bad block detected at 0xfa40, oob_buf[0] is 0x0
Bad block detected at 0xfa80, oob_buf[0] is 0x0
Bad block detected at 0xfac0, oob_buf[0] is 0x0
Bad block detected at 0xfb00, oob_buf[0] is 0x0
Bad block detected at 0xfb40, oob_buf[0] is 0x0
Bad block detected at 0xfb80, oob_buf[0] is 0x0
Bad block detected at 0xfbc0, oob_buf[0] is 0x0
Bad block detected at 0xfc00, oob_buf[0] is 0x0
Bad block detected at 0xfc40, oob_buf[0] is 0x0
Bad block detected at 0xfc80, oob_buf[0] is 0x0
Bad block detected at 0xfcc0, oob_buf[0] is 0x0
Bad block detected at 0xfd00, oob_buf[0] is 0x0
Bad block detected at 0xfd40, oob_buf[0] is 0x0
Bad block detected at 0xfd80, oob_buf[0] is 0x0
Bad block detected at 0xfdc0, oob_buf[0] is 0x0
Bad block detected at 0xfe00, oob_buf[0] is 0x0
Bad block detected at 0xfe40, oob_buf[0] is 0x0
Bad block detected at 0xfe80, oob_buf[0] is 0x0
Bad block detected at 0xfec0, oob_buf[0] is 0x0
Bad block detected at 0xff00, oob_buf[0] is 0x0
Bad block detected at 0xff40, oob_buf[0] is 0x0
Bad block detected at 0xff80, oob_buf[0] is 0x0

The system boots correctly, but after some time and some test writes it can't boot anymore:

NAND read: device 0 offset 0x500000, size 0x2000
 8192 bytes read: OK
[do_read_image_blks] This is a FIT image,img_size = 0x326328
[do_read_image_blks] img_blks = 0x64d
[do_read_image_blks] img_align_size = 0x326800

NAND read: device 0 offset 0x500000, size 0x326800
[mtk_snand_check_bch_error] ECC-U, PA=2646, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2656, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2681, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2690, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=3379, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
 3303424 bytes read: OK
bootm flag=0, states=70f
## Loading kernel from FIT Image at 4007ff28 ...
   Using 'config@1' configuration
   Trying 'kernel@1' kernel subimage
     Description:  ARM64 OpenWrt Linux-5.10.4
     Type:         Kernel Image
     Compression:  lzma compressed
     Data Start:   0x40080010
     Data Size:    3277140 Bytes = 3.1 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: 0x44000000
     Entry Point:  0x44000000
     Hash algo:    crc32
     Hash value:   fbc80c8a
     Hash algo:    sha1
     Hash value:   087b7d78fca98c4cbeb2b615b4501093553d2791
   Verifying Hash Integrity ... crc32 error!
Bad hash value for 'hash@1' hash node in 'kernel@1' image node
Bad Data Hash
ERROR: can't get kernel image!
MT7622>

To recover, I have to reflash from U-Boot, which also seems to clear the bad block status.

After some tests, if I avoid the reserved sections of the spare area and change the free layout to this:

static int fm35x1ga_ooblayout_free(struct mtd_info *mtd, int section,
				   struct mtd_oob_region *region)
{
	if (section > 3)
		return -ERANGE;

	region->offset = (8 * section) + 2;
	region->length = 6;

	return 0;
}

U-Boot doesn't find any bad blocks, but after some test writes I get filesystem errors, e.g.:

root@OpenWrt:~# dd if=/dev/urandom of=target-file bs=1M count=10
[   80.999661] jffs2: Data CRC failed on REF_PRISTINE data node at 0x01315650: Read 0x46de0673, calculated 0xee7158e1
10+0 records in
10+0 records out
root@OpenWrt:~#

and later on I get the soft brick:

NAND read: device 0 offset 0x500000, size 0x2000
 8192 bytes read: OK
[do_read_image_blks] This is a FIT image,img_size = 0x3261c0
[do_read_image_blks] img_blks = 0x64d
[do_read_image_blks] img_align_size = 0x326800

NAND read: device 0 offset 0x500000, size 0x326800
[mtk_snand_check_bch_error] ECC-U, PA=3542, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=3688, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
 3303424 bytes read: OK
bootm flag=0, states=70f
## Loading kernel from FIT Image at 4007ff28 ...
   Using 'config@1' configuration
   Trying 'kernel@1' kernel subimage
     Description:  ARM64 OpenWrt Linux-5.10.4
     Type:         Kernel Image
     Compression:  lzma compressed
     Data Start:   0x40080010
     Data Size:    3276780 Bytes = 3.1 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: 0x44000000
     Entry Point:  0x44000000
     Hash algo:    crc32
     Hash value:   8c64d5ab
     Hash algo:    sha1
     Hash value:   f87db5ea221e43f9c9baca8635b9deefc4ba4f39
   Verifying Hash Integrity ... crc32 error!
Bad hash value for 'hash@1' hash node in 'kernel@1' image node
Bad Data Hash
ERROR: can't get kernel image!
MT7622>

However, U-Boot still reports no bad blocks:

MT7622> nand bad

Device 0 bad blocks:
MT7622>

To avoid problems I didn't enable advanced read/write modes such as x2, dual or quad.

You can read the whole (very dirty) test and the rest of the commit here:

https://github.com/DavideFioravanti/openwrt/blob/78c930efce97d5343fe5117512acc75fedaf5469/target/linux/mediatek/patches-5.10/10001-mtd-spinand-Add-support-for-the-Fidelix-FM35X1GA.patch

From ea0df4552efcdcc2806fe6eba0540b5f719d80b6 Mon Sep 17 00:00:00 2001
From: Davide Fioravanti <pantanastyle@gmail.com>
Date: Fri, 8 Jan 2021 15:35:24 +0100
Subject: [PATCH 1/1] mtd: spinand: Add support for the Fidelix FM35X1GA

Datasheet: http://www.hobos.com.cn/upload/datasheet/DS35X1GAXXX_100_rev00.pdf

Signed-off-by: Davide Fioravanti <pantanastyle@gmail.com>
---
 drivers/mtd/nand/spi/Makefile  |  2 +-
 drivers/mtd/nand/spi/core.c    |  1 +
 drivers/mtd/nand/spi/fidelix.c | 80 ++++++++++++++++++++++++++++++++++
 include/linux/mtd/spinand.h    |  1 +
 4 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 drivers/mtd/nand/spi/fidelix.c

diff --git a/drivers/mtd/nand/spi/Makefile b/drivers/mtd/nand/spi/Makefile
index 9662b9c..3518d01 100644
--- a/drivers/mtd/nand/spi/Makefile
+++ b/drivers/mtd/nand/spi/Makefile

I don't know whether the problem is caused by my driver or by something else... The MT7622 is quite a new target, and to get it working I had to use nbd's staging tree because it has the new MediaTek bad block management table, which is currently missing elsewhere. Moreover, I had to use the 5.10 kernel because 5.4 crashes as soon as the system starts mounting the JFFS2 filesystem (see the trace below):

[   17.981591] jffs2_build_filesystem(): erasing all blocks after the end marker... 
[   17.994853] Unable to handle kernel write to read-only memory at virtual address ffffffc0106cf2e8
[   18.011211] Mem abort info:
[   18.014009]   ESR = 0x9600004f
[   18.017061]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.022367]   SET = 0, FnV = 0
[   18.025417]   EA = 0, S1PTW = 0
[   18.028546] Data abort info:
[   18.031422]   ISV = 0, ISS = 0x0000004f
[   18.035252]   CM = 0, WnR = 1
[   18.038216] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000044882000
[   18.044912] [ffffffc0106cf2e8] pgd=000000005ffff003, pud=000000005ffff003, pmd=000000005fffd003, pte=00600000446cf793
[   18.055518] Internal error: Oops: 9600004f [#1] SMP
[   18.060386] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack mt7915e mt7615e mt7615_common mt76 mac80211 ipt_REJECT cfg80211 xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables crc_ccitt compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 leds_gpio xhci_plat_hcd ohci_platform ohci_hcd fsl_mph_dr_of ehci_platform ehci_fsl ehci_hcd gpio_button_hotplug
[   18.121828] CPU: 0 PID: 2196 Comm: mount_root Not tainted 5.4.86 #0
[   18.128083] Hardware name: Belkin RT3200 (DT)
[   18.132430] pstate: 80000005 (Nzcv daif -PAN -UAO)
[   18.137219] pc : __memcpy+0x88/0x180
[   18.140788] lr : spinand_write_page+0x1c8/0x1e0
[   18.145307] sp : ffffffc011bd3930
[   18.148611] x29: ffffffc011bd3930 x28: ffffff801df6a200 
[   18.153913] x27: ffffff801df6a1e8 x26: ffffff801df6a198 
[   18.159215] x25: ffffff801e581580 x24: 0000000000000840 
[   18.164516] x23: 0000000000000840 x22: ffffff800307b840 
[   18.169819] x21: ffffffc011bd39f0 x20: ffffff801e52c080 
[   18.175120] x19: 0000000000000000 x18: 0000000000000000 
[   18.180422] x17: 0000000000000000 x16: 0000000000000000 
[   18.185723] x15: 0000000000000000 x14: ffffffffffffffff 
[   18.191024] x13: ffffffffffffffff x12: ffffffffffffffff 
[   18.196326] x11: ffffffffffffffff x10: 00000000000007f0 
[   18.201629] x9 : ffffffc011bd33c0 x8 : ffffff8003350850 
[   18.206931] x7 : 0000000000000000 x6 : ffffffc0106cf2e8 
[   18.212233] x5 : 00ffffffffffffff x4 : 0000000000000000 
[   18.217534] x3 : 000820031985ffff x2 : 0000000000000008 
[   18.222836] x1 : ffffff800307b808 x0 : ffffffc0106cf2e8 
[   18.228138] Call trace:
[   18.230577]  __memcpy+0x88/0x180
[   18.233797]  spinand_mtd_write+0x13c/0x248
[   18.237886]  mtk_bmt_write+0xcc/0x1a0
[   18.241540]  part_write_oob+0x20/0x28
[   18.245193]  part_write_oob+0x20/0x28
[   18.248845]  mtd_write_oob+0x4c/0xa8
[   18.252414]  jffs2_write_nand_cleanmarker+0x58/0xb8
[   18.257281]  jffs2_erase_pending_blocks+0x548/0x860
[   18.262148]  jffs2_do_mount_fs+0x20c/0x758
[   18.266234]  jffs2_do_fill_super+0x10c/0x278
[   18.270494]  jffs2_fill_super+0xc4/0xd8
[   18.274319]  mtd_get_sb+0x8c/0xd0
[   18.277625]  mtd_get_sb_by_nr+0x40/0x78
[   18.281450]  get_tree_mtd+0x12c/0x1a8
[   18.285102]  jffs2_get_tree+0x14/0x20
[   18.288756]  vfs_get_tree+0x24/0xb0
[   18.292237]  do_mount+0x50c/0x930
[   18.295542]  ksys_mount+0xdc/0xf8
[   18.298848]  __arm64_sys_mount+0x1c/0x28
[   18.302764]  el0_svc_common.constprop.1+0x7c/0x100
[   18.307545]  el0_svc_handler+0x18/0x20
[   18.311285]  el0_svc+0x8/0x1c8
[   18.314333] Code: a8c12027 a88120c7 36180062 f8408423 (f80084c3) 
[   18.320416] ---[ end trace 17acad14a80f4ab9 ]---
[   18.325023] Kernel panic - not syncing: Fatal exception
[   18.330238] SMP: stopping secondary CPUs
[   18.334153] Kernel Offset: disabled
[   18.337632] CPU features: 0x0002,04002004
[   18.341630] Memory Limit: none
[   18.344675] Rebooting in 3 seconds..

In the meantime, thanks for the attention!

robimarko · January 11, 2021, 9:16pm

I don't have time to read through the whole post now, but I hope you were using 5.10.6 because of:

commit b00195241186db6e2fb5698afe67971b05b1a959
Author: Felix Fietkau <nbd@nbd.name>
Date:   Tue Jan 5 11:18:21 2021 +0100

    Revert "mtd: spinand: Fix OOB read"
    
    This reverts stable commit baad618d078c857f99cc286ea249e9629159901f.
    
    This commit is adding lines to spinand_write_to_cache_op, wheras the upstream
    commit 868cbe2a6dcee451bd8f87cbbb2a73cf463b57e5 that this was supposed to
    backport was touching spinand_read_from_cache_op.
    It causes a crash on writing OOB data by attempting to write to read-only
    kernel memory.
    
    Cc: Miquel Raynal <miquel.raynal@bootlin.com>
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Anyway, as far as I know the BBT is handled by the SPI NAND framework; there is nothing MT7622-specific there.

The way I understand the datasheet, you should pretty much avoid touching the reserved/spare areas completely?

numero53 · January 12, 2021, 5:31pm

Thank you! With that patch I no longer get the kernel panic on kernel 5.4. So now both 5.4 and 5.10 boot, but it didn't change the corruption problem.

I thought the same, until I saw this commit in nbd's staging tree:

Moreover, if I don't use that patch, some partitions get shifted by 0x20000 (128 KiB = one block), so I think there is one bad block in the NAND and it is recorded in their custom bad block table (BBT), which is stored in the last valid block. However, I don't think we can use a standard BBT, because the bootloader uses the MediaTek one.

In the MX35LF2GE4AB spare area only M1 is ECC protected, but the Linux kernel uses the whole OOB area (excluding the bad block bytes), even the reserved parts.
However, I tried almost every combination for the free area (1, 2, 4, 6, 32 bytes, split into sections or not), and inevitably after some reboots or writes rootfs_data becomes corrupted.
If I set 0 bytes available for the free area I get this error: "inconsistent device description", because of https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/jffs2/wbuf.c#L1193
If I don't declare the free and ECC areas at all [SPINAND_ECCINFO(NULL, NULL)], the kernel uses a default layout that is not compatible with the BBT, which generates a lot of fake bad block errors.

I don't know what to do anymore...
I am starting to think the problem lies somewhere else... maybe some error in the DTS? For example, I can't get the dual or quad SPI modes working even if I set SPINAND_HAS_QE_BIT...

robimarko · January 12, 2021, 6:14pm

Ahh, MediaTek is touching stuff they are not supposed to touch.
Even Belkin has a patch to use the SPI NAND without BMT; maybe you can try that.
It's 0500-mt7622-snand-without-bmt.patch

I was searching for how they handle the NAND: they simply added the manufacturer ID and then rely on legacy NAND ID matching.

Are you sure that the NAND is even connected in dual or quad mode?
On 99% of routers it's wired in regular single mode, as the SPI controllers inside don't support dual or quad modes.

numero53 · January 16, 2021, 12:56pm

Good catch! Unfortunately I can't find any use of the disable-bmt property in the DTS extracted from the device:

https://gist.github.com/DavideFioravanti/aeb9795b207776ffdd2934052cfc5954

02_dtbdump_mediatek,mt7622-ac2600rfb1.dts

/dts-v1/;

/ {
	compatible = "mediatek,mt7622-ac2600rfb1\0mediatek,mt7622";
	interrupt-parent = <0x01>;
	#address-cells = <0x02>;
	#size-cells = <0x02>;
	model = "MediaTek MT7622 AX3600 board";

	mtcpufreq {

Moreover, in the OEM boot log the BMT seems to start correctly (look for BMT at [1.296239]):

https://gist.github.com/DavideFioravanti/30d933e5277bc475eee80c45b803f93b

belkin_rt3200_oem_bootlog.txt

F0: 102B 0000

F6: 0000 0000

V0: 0000 0000 [0001]

00: 0000 0000

BP: 0000 0041 [0000]

I will start compiling the OEM sources with some printk calls in the next few hours to understand what's going on...

However, I found another strange thing, and I don't know whether it's normal or not... If I dump the OOB of the same area from U-Boot and from OpenWrt, I get similar data in different places. (I don't know if it matters, but in OpenWrt I set the free area to offset = 2 and length = 62.)

OPENWRT:
  OOB Data: ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

U-BOOT:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff ff 0a 00 f5 ff ff ff ff ff ff ff ff

OPENWRT:
  OOB Data: ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

U-BOOT:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff

OPENWRT:
  OOB Data: ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

U-BOOT:
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2
ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
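
To make the comparison above less error-prone, here is a minimal Python sketch (not part of any tool used in this thread) that reports where a byte pattern such as 2c 01 d2 sits inside a 64-byte OOB dump. The hex strings are transcribed from the first OPENWRT/U-BOOT pair above, assuming each printed U-Boot line holds 32 bytes; the helper name find_pattern is purely illustrative.

```python
def find_pattern(oob_hex, pattern_hex):
    """Return every byte offset at which the pattern occurs in the OOB dump."""
    oob = bytes.fromhex(oob_hex)
    pattern = bytes.fromhex(pattern_hex)
    offsets = []
    start = oob.find(pattern)
    while start != -1:
        offsets.append(start)
        start = oob.find(pattern, start + 1)
    return offsets


# First OpenWrt dump above (nanddump prints 16 bytes per line).
OPENWRT_OOB = (
    "ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

# First U-Boot dump above, regrouped into 8-byte units for readability
# (assumption: 32 bytes per printed line, 64 bytes in total).
UBOOT_OOB = (
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff 0a 00 f5 "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff 2c 01 d2 "
    "ff ff ff ff ff 0a 00 f5 "
    "ff ff ff ff ff ff ff ff"
)

if __name__ == "__main__":
    for name, dump in (("OpenWrt", OPENWRT_OOB), ("U-Boot", UBOOT_OOB)):
        print(name, "2c 01 d2 at", find_pattern(dump, "2c 01 d2"))
        print(name, "0a 00 f5 at", find_pattern(dump, "0a 00 f5"))
```

Running this over several pages makes it easier to see whether the marker always lands at the same offset within each view.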

This is the whole output from where I got the previous OOB values:

https://gist.github.com/DavideFioravanti/df246916562f1c454886f23ee971754d

rt3200-oob-openwrt-uboot.txt

root@OpenWrt:/# nanddump -p -o -l 0x1000 -s 0x280000  /dev/mtd12
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00280000 and ending at 0x00281000...
ECC: 1 corrected bitflip(s) at offset 0x00280000
0x00280000: 68 73 71 73 77 0b 00 00 d9 21 46 5f 00 00 04 00
0x00280010: 5f 00 00 00 04 00 12 00 c0 06 01 00 04 00 00 00

This file has been truncated.

numero53 · January 19, 2021, 2:52am

I added some printk calls to the OEM firmware, in target/linux/mediatek/files/drivers/mtd/nand/mtk_snand.c right after the dev_warn(dev, "[mtk_snand] probe successfully!\n"); line, to understand how the OOB area is divided:

[    1.598120] N53: nand_chip->ecc.size = 2048
[    1.602299] N53: nand_chip->ecc.bytes = 32
[    1.606393] N53: mtd->writesize = 2048
[    1.610138] N53: mtd->oobsize = 64
[    1.613538] N53: mtd->oobavail = 30
[    1.617022] N53: mtd->erasesize = 131072
[    1.620944] N53: snfc->use_bmt = 1
[    1.624341] ### ECC REGION ###
[    1.627394] N53: mtd_ooblayout_ecc(mtd, 0, &ooeccbregion1) = 0
[    1.633222] N53: REGION 0
[    1.635837] N53: ooeccbregion1->offset = 32
[    1.640015] N53: ooeccbregion1->length = 32
[    1.644193] N53: mtd_ooblayout_ecc(mtd, 1, &ooeccbregion2) = -34
[    1.650194] N53: NO REGION 1
[    1.653069] ### FREE REGION ###
[    1.656206] N53: mtd_ooblayout_free(mtd, 0, &oobfreeregion1) = 0
[    1.662207] N53: REGION 0
[    1.664822] N53: oobfreeregion1->offset = 2
[    1.669000] N53: oobfreeregion1->length = 30
[    1.673266] N53: mtd_ooblayout_free(mtd, 1, &oobfreeregion2) = -34
[    1.679439] N53: NO REGION 1

So there is an offset of two bytes, followed by 30 bytes of free area and then 32 bytes for ECC. I am testing this configuration right now and it seems promising... Do you know how to do a proper, extensive test? I have installed some big packages, and after a few reboots I don't see any CRC errors...
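
The layout reported by the printk output can be written down as a small lookup, which is handy when reading OOB dumps. This is only a Python sketch: the region names and the classify helper are made up for illustration, and treating the first two bytes as a reserved bad block marker area is an assumption based on the 2-byte offset, not something the log states. The -34 returned for region 1 is -ERANGE, meaning there is no further region.

```python
OOB_SIZE = 64  # mtd->oobsize from the log

# (name, offset, length); "free" and "ecc" are taken from the printk output,
# "reserved" is the assumed bad block marker area in front of the free region.
REGIONS = [
    ("reserved", 0, 2),
    ("free", 2, 30),   # oobfreeregion1: offset 2, length 30
    ("ecc", 32, 32),   # ooeccbregion1: offset 32, length 32
]

# The three regions should tile the whole OOB area without gaps.
assert sum(length for _, _, length in REGIONS) == OOB_SIZE


def classify(byte_offset):
    """Return the name of the OOB region that contains byte_offset."""
    for name, offset, length in REGIONS:
        if offset <= byte_offset < offset + length:
            return name
    raise ValueError("offset %d is outside the %d-byte OOB" % (byte_offset, OOB_SIZE))


if __name__ == "__main__":
    for off in (0, 2, 31, 32, 63):
        print(off, classify(off))
```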

EDIT:
After some more reboots I got the first CRC problem:

[   23.047630] jffs2: notice: (762) check_node_data: wrong data CRC in data node at 0x003f83a4: read 0xb02be10d, calculated 0xa13b9885.

Just out of curiosity I added nandutils to the OEM FW, and I got yet another OOB "structure":

nanddump -p -o -l 0x1 -s 0x28F000 /dev/mtd12 | grep "OOB"

OEM FW:
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

OPENWRT (OLD FOR COMPARISON):
  OOB Data: ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  OOB Data: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

U-BOOT (OLD FOR COMPARISON):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2
ff ff ff ff ff ff ff ff ff ff ff ff ff 2c 01 d2 ff ff ff ff ff 2c 01 d2 ff ff ff ff ff ff ff ff

robimarko · January 19, 2021, 9:39am

Nice detective work.
It would be great to run nandtest from an initramfs; that should test data integrity as well as ECC across the whole NAND.

numero53 · January 19, 2021, 8:31pm

OK, the main problem is probably the BMT. After another fake bad block problem I dumped the OOB again:

U-BOOT:
00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff 85 19 03 20 08 00 ff ff ff ff ff 0a 00 f5 ff ff ff ff ff ff ff ff

OPENWRT:
  OOB Data: ff ff 85 19 03 20 08 00 00 00 ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Leaving aside the double 0a 00 f5, which is still a mystery, the rest of OpenWrt's OOB data is split into two parts and moved around.
The 00 00 that follows the data in the OpenWrt dump ends up in the first 2 bytes of the OOB area in the U-Boot dump, causing the fake bad blocks.
The rest of the OOB data is moved to the 43rd - 48th bytes.
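
The "fake bad block" effect can be illustrated with a small Python sketch. It assumes, as the dumps suggest, that the bootloader considers a block bad when the first two OOB bytes are not ff ff; the actual check in the vendor U-Boot may differ. The helper looks_bad is hypothetical, and the hex strings are transcribed from the two dumps above (assuming 32 bytes per printed U-Boot line).

```python
# U-Boot view of the OOB, regrouped into 8-byte units.
UBOOT_VIEW = (
    "00 00 ff ff ff ff ff ff "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff 0a 00 f5 "
    "ff ff ff ff ff ff ff ff "
    "ff ff 85 19 03 20 08 00 "
    "ff ff ff ff ff 0a 00 f5 "
    "ff ff ff ff ff ff ff ff"
)

# OpenWrt view of the same OOB: 16 bytes of data followed by 48 x ff.
OPENWRT_VIEW = (
    "ff ff 85 19 03 20 08 00 "
    "00 00 ff ff ff ff ff ff " + "ff " * 48
)

# The six bytes OpenWrt stores at OOB offset 2.
MARKER = bytes.fromhex("85 19 03 20 08 00")


def looks_bad(oob_hex):
    """Assumed bad block rule: the first two OOB bytes must both be 0xff."""
    return bytes.fromhex(oob_hex)[:2] != b"\xff\xff"


def marker_offset(oob_hex):
    """Byte offset of MARKER in the dump, or -1 if it is absent."""
    return bytes.fromhex(oob_hex).find(MARKER)


if __name__ == "__main__":
    for name, view in (("OpenWrt", OPENWRT_VIEW), ("U-Boot", UBOOT_VIEW)):
        print(name, "bad:", looks_bad(view), "marker at:", marker_offset(view))
```

The zero-based marker offset of 42 in the U-Boot view corresponds to the "43rd - 48th bytes" mentioned above.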

Probably it's time to ask nbd about this...

numero53 · January 30, 2021, 5:29pm

New tests... I tried editing the OOB area directly and got confirmation of what I thought.
I wrote this sample data to the OOB area of a TEST partition that I created at 0x000002000000-0x000002300000; it corresponds to /dev/mtd12.

root@OpenWrt:/# hexdump -C /tmp/ZERO_OOB_4.bin 
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000800  ff 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
00000810  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
00000820  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
*
00000840

root@OpenWrt:/# nandwrite -O /dev/mtd12 /tmp/ZERO_OOB_4.bin
Writing data to block 0 at offset 0x0

And it got written correctly

root@OpenWrt:/# nanddump -c -o -l 0x800  /dev/mtd12 | grep OOB
ECC failed: 0
ECC corrected: 4
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00000800...
  OOB Data: ff 01 02 03 04 00 00 06 08 09 0a 0b 0c 0d 0e 0f  |................|
  OOB Data: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f  |................|
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  |................|
  OOB Data: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  |................|

However, reading the same address from U-Boot shows this instead:

MT7622> nand dump 0x2000000

[mtk_snand_check_bch_error] ECC-U, PA=65152, S=0
[mtk_snand_check_bch_error] ECC-U, PA=65152, S=1
[mtk_snand_check_bch_error] ECC-U, PA=65152, S=2
NFI, flag byte: 0 NFI, This page is occupied!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
Address 2000000 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
08 09 0a 0b 0c 0d 0e 0f 18 19 1a 1b 1c 1d 1e 1f ff ff ff ff ff ff ff ff ff ff ff ff ff 0a 00 f5
10 11 12 13 14 15 16 17 ff 01 02 03 04 00 00 06 ff ff ff ff ff 0a 00 f5 ff ff ff ff ff ff ff ff

The rest of the OOB data shows up in the preceding NAND pages:

MT7622> nand dump 0x1fff800

Address 1fff800 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 10 11 12 13 14 15 16 17 ff ff ff ff ff 0a 00 f5
ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 ff 01 02 03 04 00 00 06 ff ff ff ff ff ff ff ff

MT7622> nand dump 0x1fff000

Address 1fff000 dump (2048):
ff ff [...] (I cut this part for brevity)

OOB (64):
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 01 02 03 04 00 00 06
ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00 ff ff ff ff ff ff 00 00 ff ff ff ff ff ff ff ff
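
Because the test pattern uses distinct byte values, the relocation can be worked out mechanically. The Python sketch below looks up each 8-byte piece of the OpenWrt read-back in the U-Boot dump of page 0x2000000. The 8-byte unit is an assumption read off the dumps (the 0a 00 f5 markers repeat at that spacing), map_chunks is a hypothetical helper, and the U-Boot lines are assumed to hold 32 bytes each.

```python
CHUNK = 8  # assumed size of the units in which the OOB bytes get shuffled

# OpenWrt read-back of the first 32 OOB bytes (nanddump output above).
OPENWRT_OOB = (
    "ff 01 02 03 04 00 00 06 08 09 0a 0b 0c 0d 0e 0f "
    "10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f"
)

# U-Boot "nand dump 0x2000000" OOB, regrouped into 8-byte units.
UBOOT_OOB = (
    "08 09 0a 0b 0c 0d 0e 0f "
    "18 19 1a 1b 1c 1d 1e 1f "
    "ff ff ff ff ff ff ff ff "
    "ff ff ff ff ff 0a 00 f5 "
    "10 11 12 13 14 15 16 17 "
    "ff 01 02 03 04 00 00 06 "
    "ff ff ff ff ff 0a 00 f5 "
    "ff ff ff ff ff ff ff ff"
)


def map_chunks(linux_hex, uboot_hex):
    """Map each OpenWrt chunk offset to its offset in the U-Boot dump (-1 if absent)."""
    linux = bytes.fromhex(linux_hex)
    uboot = bytes.fromhex(uboot_hex)
    mapping = {}
    for start in range(0, len(linux), CHUNK):
        mapping[start] = uboot.find(linux[start:start + CHUNK])
    return mapping


if __name__ == "__main__":
    for src, dst in map_chunks(OPENWRT_OOB, UBOOT_OOB).items():
        print("OpenWrt OOB offset %2d -> U-Boot OOB offset %2d" % (src, dst))
```

With these inputs the four OpenWrt chunks at offsets 0, 8, 16 and 24 are found at U-Boot offsets 40, 0, 32 and 8, so the two views disagree about the order of the units rather than about their contents.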

@nbd can you confirm that the BMT in your staging tree is working correctly for the Elecom WRC-2533gent? (My only change to your BMT was changing BB_TABLE_MAX from 0x2000U to 0x1000U.)

nbd · February 4, 2021, 1:24pm

Hi,

I've added preliminary support for E8450 to my staging tree:
https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=summary
I've fixed up the flash chip support patch. Turns out the chip is unstable if support for the X4 ops is not included. I've also fixed the ECC layout - based on my reading it's pretty much the same as on Winbond flash chips, so I copied over the ops from there.
Effectively the distinction between ECC covered and non-ECC areas in OOB is irrelevant, since the driver disables the flash chip's ECC support anyway. The controller handles ECC completely by itself.
I've also changed the BMT support patch to allow configuring the table size via device tree.
Please test it and let me know if it works for you.
There was also a bug in the BMT patch regarding larger writes to OOB, which I've fixed.

numero53 · February 5, 2021, 1:31am

Hi nbd,
thanks for your answer and your time

I was wondering why there was the patch to disable ECC, now I understand it. Thanks for the explanation.

I did some tests and the "fake bad blocks" problem seems to be fixed.
However, I think that we still have some problems with the NAND (maybe its cache?).
I still get some "wrong data CRC in data node" messages in the bootlog, followed by data corruption.
This is what I do for testing:

Write 10MB of random data:
dd if=/dev/urandom of=/root/random.file bs=1M count=10;
Sync the file system (I don't know if it's necessary):
sync;
Calculate and store the md5 of the aforementioned file:
md5sum /root/random.file > /root/random.file.md5;
Sync the file system again (I don't know if it's necessary):
sync;
Reboot the system:
reboot
After the reboot, calculate the md5 again and compare it with the previously stored md5sum. Sometimes I get different values:

root@OpenWrt:/# md5sum /root/random.file 
58b802ed8d52b18b8552d3a9999337f8  /root/random.file
root@OpenWrt:/# cat /root/random.file.md5;
8506b23e9ea97fb6bb325f74c324bf8a  /root/random.file
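
The same write, checksum and verify procedure can be sketched as two Python functions, one to run before the reboot and one after it. This is an illustration of the steps above rather than the commands actually used; the function names are made up, and it stores only the hex digest in the .md5 file instead of the full md5sum line.

```python
import hashlib
import os


def write_random_file(path, size=10 * 1024 * 1024):
    """Write `size` random bytes to `path`, store their MD5 next to it, sync both."""
    data = os.urandom(size)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to flash before rebooting
    digest = hashlib.md5(data).hexdigest()
    with open(path + ".md5", "w") as f:
        f.write(digest + "\n")
        f.flush()
        os.fsync(f.fileno())
    return digest


def verify_file(path):
    """Recompute the MD5 of `path` and compare it with the stored digest."""
    with open(path + ".md5") as f:
        expected = f.read().split()[0]
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            md5.update(block)
    return md5.hexdigest() == expected


if __name__ == "__main__":
    import sys
    # Usage: script.py write|verify /root/random.file
    action, target = sys.argv[1], sys.argv[2]
    if action == "write":
        print(write_random_file(target))
    else:
        print("OK" if verify_file(target) else "MISMATCH")
```

Running the write step, rebooting, and then running the verify step in a loop would give a rough failure rate instead of a single observation.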

This is the complete bootlog:

https://gist.github.com/DavideFioravanti/da60059266d7a6541b5f3f5647a5104e

belkin_rt3200_openwrt_nbd_jffs2_corruption.txt

F0: 102B 0000

F6: 0000 0000

V0: 0000 0000 [0001]

00: 0000 0000

BP: 0000 0041 [0000]

This file has been truncated.

If my tests are right, I think that we still have some problems.
If you have access to the router, could you test it?

P.s.
I have some small fixes to your commit (lan4 and MT7915 were not working):

diff --git a/target/linux/mediatek/image/mt7622.mk b/target/linux/mediatek/image/mt7622.mk
index 1d8501f579..fb0bc45fa7 100644
--- a/target/linux/mediatek/image/mt7622.mk
+++ b/target/linux/mediatek/image/mt7622.mk
@@ -42,7 +42,7 @@ define Device/linksys_e8450
   DEVICE_DTS := mt7622-linksys-e8450
   DEVICE_DTS_DIR := $(DTS_DIR)/mediatek
   DEVICE_PACKAGES := kmod-usb-ohci kmod-usb2 kmod-usb3 kmod-ata-ahci-mtk \
-                    kmod-mt7615e kmod-mt7615-firmware kmod-mt7915
+                    kmod-mt7615e kmod-mt7615-firmware kmod-mt7915e
 endef
 TARGET_DEVICES += linksys_e8450
 
diff --git a/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network b/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
index 4590c0bd8e..9a03141470 100755
--- a/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
+++ b/target/linux/mediatek/mt7622/base-files/etc/board.d/02_network
@@ -10,10 +10,10 @@ mediatek_setup_interfaces()
 
        case $board in
        bananapi,bpi-r64-rootdisk|\
-       bananapi,bpi-r64|\
-       linksys,e8450)
+       bananapi,bpi-r64)
                ucidef_set_interfaces_lan_wan "lan0 lan1 lan2 lan3" wan
                ;;
+       linksys,e8450|\
        mediatek,mt7622-rfb1)
                ucidef_set_interfaces_lan_wan "lan1 lan2 lan3 lan4" wan
                ;;

P.p.s.
I would also like to thank you for your amazing work with MT7622. I can get gigabit speed using the HW NAT Acceleration without any CPU usage!
W/O HW NAT Acceleration: [image not preserved]

W/ HW NAT Acceleration: [image not preserved]

daniel · February 23, 2021, 12:20am

(edit: all work included in upstream OpenWrt by now)
I've hacked up a replacement bl2 (== Preloader) as well as ATF and U-Boot, more or less completely from source, and enabled support for UBI in U-Boot, allowing for a more robust way to deal with the NAND flash.
I'm still waiting for MTK to update patches for U-Boot and ATF 2.4 in order to get rid of the nandx driver and have support for HWECC also with vanilla U-Boot's mtk-snfi-spi driver.
But as it is, everything already works quite well.

PoC generator for factory installer images

numero53 · February 23, 2021, 1:23am

Hi @daniel, that's great news!
I have to confess that I was following the IRC channel, so I already knew that great news was coming... But I didn't expect an image that replaces U-Boot automatically! That's very cool!
I think I will start testing it very soon, also because I probably filled up the BMT pool due to the "fake" bad block problem I had in the previous weeks.

numero53 · February 23, 2021, 1:33am

However, I have a question for you: can I use your "UBI version" even if I have a bad block somewhere before the "factory" partition? I am sure the bad block is there, because if I don't use the BMT the rest of the flash is shifted by one block and the calibration data can't be loaded, while with the BMT it works correctly.

(Sorry for the double post, I hit CTRL-Enter and it got posted automatically)

daniel · February 23, 2021, 9:57am

Sounds dangerous... BMT is switched off for UBI, as UBI is handling bad blocks for us. Having offsets change because of bad blocks is a very wrong design and cannot work with UBI.
Hence you will have to make sure that factory starts at the right offset when BMT is switched off, i.e. back it up with BMT switched on and write it back to the correct offset with BMT switched off (the new U-Boot acquires its ethaddr from the factory partition as well)...
So you have to make sure the following areas on flash do not have any bad blocks in them (i.e. with BBT switched off):

0x00000000 - 0x00020000 (Preloader)
0x00080000 - 0x00160000 (BL31+U-Boot)
0x001c0000 - 0x002c0000 (Factory)

I.e. you may be lucky and the bad block resides somewhere between 0x20000 - 0x80000 or 0x160000 - 0x1c0000. If not, you will have to wait for me to port UBI SPL to TF-A...
Try booting non-BMT Linux with initramfs and carefully try erasing, writing and reading back the 128kB blocks in those unused areas -- if you win the lottery, you find the bad block there and it's all good; all that's needed then is making a backup of factory with BBT switched on and writing it back to the right offset with BBT switched off.
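
Once the physical offset of the bad block is known, checking it against the three areas listed above is simple arithmetic. The Python sketch below encodes those ranges and the 128 KiB block size from this thread; the helper name region_hit is made up, and anything past 0x2c0000 is simply reported as not critical here, on the understanding that UBI deals with bad blocks in its own area.

```python
BLOCK_SIZE = 0x20000  # 128 KiB erase block

# Areas that must be free of bad blocks when BMT/BBT is switched off.
CRITICAL = [
    (0x000000, 0x020000, "Preloader"),
    (0x080000, 0x160000, "BL31+U-Boot"),
    (0x1C0000, 0x2C0000, "Factory"),
]


def region_hit(bad_block_offset):
    """Return the critical area containing the bad block, or None if it is harmless."""
    start = bad_block_offset - (bad_block_offset % BLOCK_SIZE)  # align to block start
    end = start + BLOCK_SIZE
    for lo, hi, name in CRITICAL:
        if start < hi and end > lo:  # the block overlaps [lo, hi)
            return name
    return None


if __name__ == "__main__":
    for off in (0x00000, 0x40000, 0x80000, 0x180000, 0x1C0000):
        print(hex(off), region_hit(off) or "unused gap / outside critical areas")
```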

daniel · February 23, 2021, 8:44pm

Update: I've improved the installer so that it can detect if something happened to factory and, in that case, abort and let the user decide what to do.

daniel · February 24, 2021, 4:42pm

Update 2: I've implemented relocation of the eeprom-data block and the mac-addresses block in the factory partition when running the installer, i.e. if BBT/BMT made a mess there before, this is detected and fixed, so offsets are then correct without BBT/BMT running.
See

github.com

dangowrt/linksys-e8450-openwrt-installer/blob/main/files/installer/install.sh#L19


echo
INSTALLER_DIR="/installer"
PRELOADER="$INSTALLER_DIR/openwrt-mediatek-mt7622-linksys_e8450-ubi-preloader.bin"
FIP="$INSTALLER_DIR/openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip"
RECOVERY="$INSTALLER_DIR/openwrt-mediatek-mt7622-linksys_e8450-ubi-initramfs-recovery.fit"
FIT="$INSTALLER_DIR/openwrt-mediatek-mt7622-linksys_e8450-ubi-squashfs-sysupgrade.fit"
HAS_ENV=1
install_fix_factory() {
	local mtddev=$1
	local ebs=$(cat /sys/class/mtd/$(basename $mtddev)/erasesize)
	local off=0
	local skip=0
	local found
	while [ $((off)) -lt $((2 * ebs)) ]; do
		magic="$(hexdump -v -s $off -n 2 -e '"%02x"' $1)"
		if [ "$magic" = "7622" ]; then
			found=1