A wrong raw NAND driver can corrupt or even damage the NAND... Can a wrong SPI-NAND driver corrupt the NAND too? I don't have a NAND programmer or the soldering skills to desolder a NAND, so if anything goes wrong I can't repair my unit. Is there some sort of "risk-free test mode"?
The Fidelix FM35Q1GA seems to be a re-branded version of the Dosilicon DS35Q1GA (Datasheet: http://www.benhong.cn/upload/datasheet/DS35X1GAXXX_100_rev00.pdf). I am quite sure about this because they both share the same NAND ID and Dosilicon bought Fidelix.
I didn't find a "risk-free test mode", so I tested my new driver directly. It seems to read data well, but I sometimes have problems writing data, and I don't know whether that's related to my code.
In the meantime, I'll explain how the OOB area is managed, or at least what I understood of it:
Example 1: MX35LF2GE4AB Linux Kernel
static int mx35lfxge4ab_ooblayout_ecc(struct mtd_info *mtd, int section,
				      struct mtd_oob_region *region)
{
	return -ERANGE;
}

static int mx35lfxge4ab_ooblayout_free(struct mtd_info *mtd, int section,
				       struct mtd_oob_region *region)
{
	if (section)
		return -ERANGE;

	region->offset = 2;
	region->length = mtd->oobsize - 2;

	return 0;
}
Note 1: region->offset = 2 because the first 2 bytes can contain the bad block marker.
Note 2: mtd->oobsize = 64.
So in the upstream implementation, the whole OOB area except the first two bytes (used for bad block information) is declared as free area. Even the reserved area is used as free area. The ECC area, however, is not defined, probably because of this line in the datasheet: "the ECC parity code can be calculated properly and stored in the additional hidden spare area".
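To see what this layout hands to the rest of the stack, here is a small userspace sketch (not kernel code). It copies the logic of mx35lfxge4ab_ooblayout_free() and walks the sections until -ERANGE, which is roughly how the MTD core adds up the free bytes into mtd->oobavail. The structs are minimal stand-ins I made up for illustration, and count_free_bytes() is a hypothetical helper.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel structures; only the
 * fields used below are modelled. */
struct mtd_oob_region {
	unsigned int offset;
	unsigned int length;
};

struct mtd_info {
	unsigned int oobsize;
};

/* Same logic as mx35lfxge4ab_ooblayout_free(): one free region that
 * covers everything except the two bad block marker bytes. */
static int mx35_free(struct mtd_info *mtd, int section,
		     struct mtd_oob_region *region)
{
	if (section)
		return -ERANGE;
	region->offset = 2;
	region->length = mtd->oobsize - 2;
	return 0;
}

/* Hypothetical helper: iterate sections until the callback returns
 * -ERANGE and sum the lengths, like the MTD core's free byte count. */
static unsigned int count_free_bytes(struct mtd_info *mtd)
{
	struct mtd_oob_region region;
	unsigned int total = 0;
	int section;

	for (section = 0; mx35_free(mtd, section, &region) == 0; section++) {
		printf("free section %d: offset %u, length %u\n",
		       section, region.offset, region.length);
		total += region.length;
	}
	return total;
}
```

With a 64-byte OOB this reports a single section at offset 2 with length 62, i.e. 62 free bytes.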
Example 2: MT29F2G01AAAED Linux Kernel
static int micron_4_ooblayout_ecc(struct mtd_info *mtd, int section,
				  struct mtd_oob_region *region)
{
	struct spinand_device *spinand = mtd_to_spinand(mtd);

	if (section >= spinand->base.memorg.pagesize /
		       mtd->ecc_step_size)
		return -ERANGE;

	region->offset = (section * 16) + 8;
	region->length = 8;

	return 0;
}

static int micron_4_ooblayout_free(struct mtd_info *mtd, int section,
				   struct mtd_oob_region *region)
{
	struct spinand_device *spinand = mtd_to_spinand(mtd);

	if (section >= spinand->base.memorg.pagesize /
		       mtd->ecc_step_size)
		return -ERANGE;

	if (section) {
		region->offset = 16 * section;
		region->length = 8;
	} else {
		/* section 0 has two bytes reserved for the BBM */
		region->offset = 2;
		region->length = 6;
	}

	return 0;
}
Here too, the first two bytes are used for bad block information and the reserved area is used as free area. However, the ECC information is stored in the "ECC for main/spare#" sections.
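The per-section layout is easier to see as a byte map. The sketch below is userspace C with made-up stand-in types (fake_mtd, oob_map() and the micron4_* names are hypothetical). It reuses the arithmetic of the two micron_4_* callbacks and assumes a 2048-byte page with 512-byte ECC steps, which gives four sections. In the kernel, pagesize comes from spinand->base.memorg.pagesize. Bytes 0-1 are left unassigned because they hold the bad block marker.

```c
#include <errno.h>
#include <string.h>

struct mtd_oob_region {
	unsigned int offset;
	unsigned int length;
};

/* Hypothetical simplified device description (assumption: these three
 * numbers are all the two callbacks need). */
struct fake_mtd {
	unsigned int oobsize;
	unsigned int pagesize;
	unsigned int ecc_step_size;
};

static int micron4_ecc(const struct fake_mtd *mtd, int section,
		       struct mtd_oob_region *region)
{
	if (section >= (int)(mtd->pagesize / mtd->ecc_step_size))
		return -ERANGE;
	region->offset = (section * 16) + 8;
	region->length = 8;
	return 0;
}

static int micron4_free(const struct fake_mtd *mtd, int section,
			struct mtd_oob_region *region)
{
	if (section >= (int)(mtd->pagesize / mtd->ecc_step_size))
		return -ERANGE;
	if (section) {
		region->offset = 16 * section;
		region->length = 8;
	} else {
		region->offset = 2; /* skip the two BBM bytes */
		region->length = 6;
	}
	return 0;
}

/* Hypothetical helper: fill map[0..oobsize-1] with 'E' (ECC), 'F' (free)
 * or '.' (unassigned). map must hold oobsize + 1 chars. */
static void oob_map(const struct fake_mtd *mtd, char *map)
{
	struct mtd_oob_region r;
	unsigned int i;
	int s;

	memset(map, '.', mtd->oobsize);
	map[mtd->oobsize] = '\0';
	for (s = 0; micron4_ecc(mtd, s, &r) == 0; s++)
		for (i = 0; i < r.length; i++)
			map[r.offset + i] = 'E';
	for (s = 0; micron4_free(mtd, s, &r) == 0; s++)
		for (i = 0; i < r.length; i++)
			map[r.offset + i] = 'F';
}
```

With those numbers the map interleaves free and ECC groups every 8 bytes, for a total of 30 free bytes and 32 ECC bytes.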
You can see that the kernel sees an SPI NAND device with ID 00e571e5
But that ID does not match any supported ID, hence the error.
So, you need to add support for the NAND under the SPI NAND framework.
It should not be hard, as SPI NANDs are 99% the same and speak the same instruction set.
From the looks of it, the returned ID has a dummy first byte; the next byte is the manufacturer ID and the third one is the device ID.
This is really common, and there is a generic function for it; only the manufacturer code needs to be added to the table.
I would urge you not to modify an existing driver, but to add a new vendor driver directly.
As far as ECC goes, it seems the spare area should not be used, as it's not ECC-protected.
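To make the ID layout described in this reply concrete, here is a minimal userspace sketch that splits the raw 00 e5 71 e5 response into its fields. decode_id_with_dummy() and struct spinand_raw_id are hypothetical names, not the kernel API. In 5.8+ trees I believe the matching core method is called SPINAND_READID_METHOD_OPCODE_DUMMY, but please check include/linux/mtd/spinand.h in your tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the two ID bytes we care about. */
struct spinand_raw_id {
	uint8_t manufacturer;
	uint8_t device;
};

/* Hypothetical helper: decode a READ ID response of the form
 * <dummy> <manufacturer> <device> ...  Returns 0 on success. */
static int decode_id_with_dummy(const uint8_t *buf, size_t len,
				struct spinand_raw_id *out)
{
	if (len < 3)
		return -1;
	/* buf[0] is the dummy byte clocked out before the real ID */
	out->manufacturer = buf[1];
	out->device = buf[2];
	return 0;
}
```

For 00e571e5 this yields manufacturer 0xE5 and device 0x71, which is the pair a new vendor table entry would have to match.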
According to the datasheet, it seems very similar to the MX35LF2GE4AB.
So I should use the same rules for the ECC area (0 bytes) and the free area (62 bytes), right?
However, if I use that scheme, I get a lot of "fake" bad blocks in U-Boot after the first OpenWrt boot:
MT7622> nand bad
Device 0 bad blocks:
Bad block detected at 0x1b80, oob_buf[0] is 0x0
Bad block detected at 0xec40, oob_buf[0] is 0x0
Bad block detected at 0x1c00, oob_buf[0] is 0x0
Bad block detected at 0xec80, oob_buf[0] is 0x0
Bad block detected at 0x1c80, oob_buf[0] is 0x0
Bad block detected at 0xecc0, oob_buf[0] is 0x0
Bad block detected at 0x1d00, oob_buf[0] is 0x0
Bad block detected at 0xed00, oob_buf[0] is 0x0
Bad block detected at 0x1d80, oob_buf[0] is 0x0
Bad block detected at 0xed40, oob_buf[0] is 0x0
Bad block detected at 0x1e00, oob_buf[0] is 0x0
Bad block detected at 0xed80, oob_buf[0] is 0x0
Bad block detected at 0x1e80, oob_buf[0] is 0x0
Bad block detected at 0xedc0, oob_buf[0] is 0x0
Bad block detected at 0x1f00, oob_buf[0] is 0x0
Bad block detected at 0xee00, oob_buf[0] is 0x0
Bad block detected at 0x1f80, oob_buf[0] is 0x0
Bad block detected at 0xee40, oob_buf[0] is 0x0
Bad block detected at 0x2000, oob_buf[0] is 0x0
Bad block detected at 0xee80, oob_buf[0] is 0x0
Bad block detected at 0x2080, oob_buf[0] is 0x0
Bad block detected at 0xeec0, oob_buf[0] is 0x0
Bad block detected at 0x2100, oob_buf[0] is 0x0
Bad block detected at 0xef00, oob_buf[0] is 0x0
Bad block detected at 0x2180, oob_buf[0] is 0x0
Bad block detected at 0xef40, oob_buf[0] is 0x0
Bad block detected at 0x2200, oob_buf[0] is 0x0
Bad block detected at 0xef80, oob_buf[0] is 0x0
Bad block detected at 0x2280, oob_buf[0] is 0x0
Bad block detected at 0xefc0, oob_buf[0] is 0x0
Bad block detected at 0x2300, oob_buf[0] is 0x0
Bad block detected at 0xf000, oob_buf[0] is 0x0
Bad block detected at 0x2380, oob_buf[0] is 0x0
Bad block detected at 0xf040, oob_buf[0] is 0x0
Bad block detected at 0x2400, oob_buf[0] is 0x0
Bad block detected at 0xf080, oob_buf[0] is 0x0
Bad block detected at 0x2480, oob_buf[0] is 0x0
Bad block detected at 0xf0c0, oob_buf[0] is 0x0
Bad block detected at 0x2500, oob_buf[0] is 0x0
Bad block detected at 0xf100, oob_buf[0] is 0x0
Bad block detected at 0x2580, oob_buf[0] is 0x0
Bad block detected at 0xf140, oob_buf[0] is 0x0
Bad block detected at 0x2600, oob_buf[0] is 0x0
Bad block detected at 0xf180, oob_buf[0] is 0x0
Bad block detected at 0x2680, oob_buf[0] is 0x0
Bad block detected at 0xf1c0, oob_buf[0] is 0x0
Bad block detected at 0x2700, oob_buf[0] is 0x0
Bad block detected at 0xf200, oob_buf[0] is 0x0
Bad block detected at 0x2780, oob_buf[0] is 0x0
Bad block detected at 0xf240, oob_buf[0] is 0x0
Bad block detected at 0x2800, oob_buf[0] is 0x0
Bad block detected at 0xf280, oob_buf[0] is 0x0
Bad block detected at 0x2880, oob_buf[0] is 0x0
Bad block detected at 0xf2c0, oob_buf[0] is 0x0
Bad block detected at 0x2900, oob_buf[0] is 0x0
Bad block detected at 0xf300, oob_buf[0] is 0x0
Bad block detected at 0x2980, oob_buf[0] is 0x0
Bad block detected at 0xf340, oob_buf[0] is 0x0
Bad block detected at 0x2a00, oob_buf[0] is 0x0
Bad block detected at 0xf380, oob_buf[0] is 0x0
Bad block detected at 0x2a80, oob_buf[0] is 0x0
Bad block detected at 0xf3c0, oob_buf[0] is 0x0
Bad block detected at 0x2b00, oob_buf[0] is 0x0
Bad block detected at 0xf400, oob_buf[0] is 0x0
Bad block detected at 0x2b80, oob_buf[0] is 0x0
Bad block detected at 0xf440, oob_buf[0] is 0x0
Bad block detected at 0x2c00, oob_buf[0] is 0x0
Bad block detected at 0xf480, oob_buf[0] is 0x0
Bad block detected at 0x2c80, oob_buf[0] is 0x0
Bad block detected at 0xf4c0, oob_buf[0] is 0x0
Bad block detected at 0x2d00, oob_buf[0] is 0x0
Bad block detected at 0xf500, oob_buf[0] is 0x0
Bad block detected at 0x2d80, oob_buf[0] is 0x0
Bad block detected at 0xf540, oob_buf[0] is 0x0
Bad block detected at 0x2e00, oob_buf[0] is 0x0
Bad block detected at 0xf580, oob_buf[0] is 0x0
Bad block detected at 0x2e80, oob_buf[0] is 0x0
Bad block detected at 0xf5c0, oob_buf[0] is 0x0
Bad block detected at 0x2f00, oob_buf[0] is 0x0
Bad block detected at 0xf600, oob_buf[0] is 0x0
Bad block detected at 0x2f80, oob_buf[0] is 0x0
Bad block detected at 0xf640, oob_buf[0] is 0x0
Bad block detected at 0x3000, oob_buf[0] is 0x0
Bad block detected at 0xf680, oob_buf[0] is 0x0
Bad block detected at 0x3080, oob_buf[0] is 0x0
Bad block detected at 0xf6c0, oob_buf[0] is 0x0
Bad block detected at 0x3100, oob_buf[0] is 0x0
Bad block detected at 0xf700, oob_buf[0] is 0x0
Bad block detected at 0x3180, oob_buf[0] is 0x0
Bad block detected at 0xf740, oob_buf[0] is 0x0
Bad block detected at 0x3200, oob_buf[0] is 0x0
Bad block detected at 0xf780, oob_buf[0] is 0x0
Bad block detected at 0x3280, oob_buf[0] is 0x0
Bad block detected at 0xf7c0, oob_buf[0] is 0x0
Bad block detected at 0x3300, oob_buf[0] is 0x0
Bad block detected at 0xf800, oob_buf[0] is 0x0
Bad block detected at 0x3380, oob_buf[0] is 0x0
Bad block detected at 0xf840, oob_buf[0] is 0x0
Bad block detected at 0x3400, oob_buf[0] is 0x0
Bad block detected at 0xf880, oob_buf[0] is 0x0
Bad block detected at 0x3480, oob_buf[0] is 0x0
Bad block detected at 0xf8c0, oob_buf[0] is 0x0
Bad block detected at 0x3500, oob_buf[0] is 0x0
Bad block detected at 0xf900, oob_buf[0] is 0x0
Bad block detected at 0x3580, oob_buf[0] is 0x0
Bad block detected at 0xf940, oob_buf[0] is 0x0
Bad block detected at 0x3600, oob_buf[0] is 0x0
Bad block detected at 0xf980, oob_buf[0] is 0x0
Bad block detected at 0x3680, oob_buf[0] is 0x0
Bad block detected at 0xf9c0, oob_buf[0] is 0x0
Bad block detected at 0x3700, oob_buf[0] is 0x0
Bad block detected at 0xfa00, oob_buf[0] is 0x0
Bad block detected at 0x3780, oob_buf[0] is 0x0
Bad block detected at 0xfa40, oob_buf[0] is 0x0
Bad block detected at 0x3800, oob_buf[0] is 0x0
Bad block detected at 0xfa80, oob_buf[0] is 0x0
Bad block detected at 0x3880, oob_buf[0] is 0x0
Bad block detected at 0xfac0, oob_buf[0] is 0x0
Bad block detected at 0x3900, oob_buf[0] is 0x0
Bad block detected at 0xfb00, oob_buf[0] is 0x0
Bad block detected at 0x3980, oob_buf[0] is 0x0
Bad block detected at 0xfb40, oob_buf[0] is 0x0
Bad block detected at 0x3a00, oob_buf[0] is 0x0
Bad block detected at 0xfb80, oob_buf[0] is 0x0
Bad block detected at 0x3a80, oob_buf[0] is 0x0
Bad block detected at 0xfbc0, oob_buf[0] is 0x0
Bad block detected at 0x3b00, oob_buf[0] is 0x0
Bad block detected at 0xfc00, oob_buf[0] is 0x0
Bad block detected at 0x3b80, oob_buf[0] is 0x0
Bad block detected at 0xfc40, oob_buf[0] is 0x0
Bad block detected at 0x3c00, oob_buf[0] is 0x0
Bad block detected at 0xfc80, oob_buf[0] is 0x0
Bad block detected at 0x3c80, oob_buf[0] is 0x0
Bad block detected at 0xfcc0, oob_buf[0] is 0x0
Bad block detected at 0x3d00, oob_buf[0] is 0x0
Bad block detected at 0xfd00, oob_buf[0] is 0x0
Bad block detected at 0x3d80, oob_buf[0] is 0x0
Bad block detected at 0xfd40, oob_buf[0] is 0x0
Bad block detected at 0x3e00, oob_buf[0] is 0x0
Bad block detected at 0xfd80, oob_buf[0] is 0x0
Bad block detected at 0x3e80, oob_buf[0] is 0x0
Bad block detected at 0xfdc0, oob_buf[0] is 0x0
Bad block detected at 0x3f00, oob_buf[0] is 0x0
Bad block detected at 0xfe00, oob_buf[0] is 0x0
Bad block detected at 0x3f80, oob_buf[0] is 0x0
Bad block detected at 0xfe40, oob_buf[0] is 0x0
Bad block detected at 0x4000, oob_buf[0] is 0x0
Bad block detected at 0xfe80, oob_buf[0] is 0x0
Bad block detected at 0x4080, oob_buf[0] is 0x0
Bad block detected at 0xfec0, oob_buf[0] is 0x0
Bad block detected at 0x4100, oob_buf[0] is 0x0
Bad block detected at 0xff00, oob_buf[0] is 0x0
Bad block detected at 0x4180, oob_buf[0] is 0x0
Bad block detected at 0xff40, oob_buf[0] is 0x0
Bad block detected at 0x4200, oob_buf[0] is 0x0
Bad block detected at 0xff80, oob_buf[0] is 0x0
Bad block detected at 0x4280, oob_buf[0] is 0x0
Bad block detected at 0x42c0, oob_buf[0] is 0x0
Bad block detected at 0x4300, oob_buf[0] is 0x0
Bad block detected at 0x4340, oob_buf[0] is 0x0
Bad block detected at 0x4380, oob_buf[0] is 0x0
Bad block detected at 0x43c0, oob_buf[0] is 0x0
Bad block detected at 0x4400, oob_buf[0] is 0x0
Bad block detected at 0x4440, oob_buf[0] is 0x0
Bad block detected at 0x4480, oob_buf[0] is 0x0
Bad block detected at 0x44c0, oob_buf[0] is 0x0
Bad block detected at 0x4500, oob_buf[0] is 0x0
Bad block detected at 0x4540, oob_buf[0] is 0x0
Bad block detected at 0x4580, oob_buf[0] is 0x0
Bad block detected at 0x45c0, oob_buf[0] is 0x0
Bad block detected at 0x4600, oob_buf[0] is 0x0
Bad block detected at 0xec40, oob_buf[0] is 0x0
Bad block detected at 0xec80, oob_buf[0] is 0x0
Bad block detected at 0xecc0, oob_buf[0] is 0x0
Bad block detected at 0xed00, oob_buf[0] is 0x0
Bad block detected at 0xed40, oob_buf[0] is 0x0
Bad block detected at 0xed80, oob_buf[0] is 0x0
Bad block detected at 0xedc0, oob_buf[0] is 0x0
Bad block detected at 0xee00, oob_buf[0] is 0x0
Bad block detected at 0xee40, oob_buf[0] is 0x0
Bad block detected at 0xee80, oob_buf[0] is 0x0
Bad block detected at 0xeec0, oob_buf[0] is 0x0
Bad block detected at 0xef00, oob_buf[0] is 0x0
Bad block detected at 0xef40, oob_buf[0] is 0x0
Bad block detected at 0xef80, oob_buf[0] is 0x0
Bad block detected at 0xefc0, oob_buf[0] is 0x0
Bad block detected at 0xf000, oob_buf[0] is 0x0
Bad block detected at 0xf040, oob_buf[0] is 0x0
Bad block detected at 0xf080, oob_buf[0] is 0x0
Bad block detected at 0xf0c0, oob_buf[0] is 0x0
Bad block detected at 0xf100, oob_buf[0] is 0x0
Bad block detected at 0xf140, oob_buf[0] is 0x0
Bad block detected at 0xf180, oob_buf[0] is 0x0
Bad block detected at 0xf1c0, oob_buf[0] is 0x0
Bad block detected at 0xf200, oob_buf[0] is 0x0
Bad block detected at 0xf240, oob_buf[0] is 0x0
Bad block detected at 0xf280, oob_buf[0] is 0x0
Bad block detected at 0xf2c0, oob_buf[0] is 0x0
Bad block detected at 0xf300, oob_buf[0] is 0x0
Bad block detected at 0xf340, oob_buf[0] is 0x0
Bad block detected at 0xf380, oob_buf[0] is 0x0
Bad block detected at 0xf3c0, oob_buf[0] is 0x0
Bad block detected at 0xf400, oob_buf[0] is 0x0
Bad block detected at 0xf440, oob_buf[0] is 0x0
Bad block detected at 0xf480, oob_buf[0] is 0x0
Bad block detected at 0xf4c0, oob_buf[0] is 0x0
Bad block detected at 0xf500, oob_buf[0] is 0x0
Bad block detected at 0xf540, oob_buf[0] is 0x0
Bad block detected at 0xf580, oob_buf[0] is 0x0
Bad block detected at 0xf5c0, oob_buf[0] is 0x0
Bad block detected at 0xf600, oob_buf[0] is 0x0
Bad block detected at 0xf640, oob_buf[0] is 0x0
Bad block detected at 0xf680, oob_buf[0] is 0x0
Bad block detected at 0xf6c0, oob_buf[0] is 0x0
Bad block detected at 0xf700, oob_buf[0] is 0x0
Bad block detected at 0xf740, oob_buf[0] is 0x0
Bad block detected at 0xf780, oob_buf[0] is 0x0
Bad block detected at 0xf7c0, oob_buf[0] is 0x0
Bad block detected at 0xf800, oob_buf[0] is 0x0
Bad block detected at 0xf840, oob_buf[0] is 0x0
Bad block detected at 0xf880, oob_buf[0] is 0x0
Bad block detected at 0xf8c0, oob_buf[0] is 0x0
Bad block detected at 0xf900, oob_buf[0] is 0x0
Bad block detected at 0xf940, oob_buf[0] is 0x0
Bad block detected at 0xf980, oob_buf[0] is 0x0
Bad block detected at 0xf9c0, oob_buf[0] is 0x0
Bad block detected at 0xfa00, oob_buf[0] is 0x0
Bad block detected at 0xfa40, oob_buf[0] is 0x0
Bad block detected at 0xfa80, oob_buf[0] is 0x0
Bad block detected at 0xfac0, oob_buf[0] is 0x0
Bad block detected at 0xfb00, oob_buf[0] is 0x0
Bad block detected at 0xfb40, oob_buf[0] is 0x0
Bad block detected at 0xfb80, oob_buf[0] is 0x0
Bad block detected at 0xfbc0, oob_buf[0] is 0x0
Bad block detected at 0xfc00, oob_buf[0] is 0x0
Bad block detected at 0xfc40, oob_buf[0] is 0x0
Bad block detected at 0xfc80, oob_buf[0] is 0x0
Bad block detected at 0xfcc0, oob_buf[0] is 0x0
Bad block detected at 0xfd00, oob_buf[0] is 0x0
Bad block detected at 0xfd40, oob_buf[0] is 0x0
Bad block detected at 0xfd80, oob_buf[0] is 0x0
Bad block detected at 0xfdc0, oob_buf[0] is 0x0
Bad block detected at 0xfe00, oob_buf[0] is 0x0
Bad block detected at 0xfe40, oob_buf[0] is 0x0
Bad block detected at 0xfe80, oob_buf[0] is 0x0
Bad block detected at 0xfec0, oob_buf[0] is 0x0
Bad block detected at 0xff00, oob_buf[0] is 0x0
Bad block detected at 0xff40, oob_buf[0] is 0x0
Bad block detected at 0xff80, oob_buf[0] is 0x0
The system boots correctly, but after some time and a few test writes it can't boot anymore:
NAND read: device 0 offset 0x500000, size 0x2000
8192 bytes read: OK
[do_read_image_blks] This is a FIT image,img_size = 0x326328
[do_read_image_blks] img_blks = 0x64d
[do_read_image_blks] img_align_size = 0x326800
NAND read: device 0 offset 0x500000, size 0x326800
[mtk_snand_check_bch_error] ECC-U, PA=2646, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2656, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2681, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=2690, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=3379, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
3303424 bytes read: OK
bootm flag=0, states=70f
## Loading kernel from FIT Image at 4007ff28 ...
Using 'config@1' configuration
Trying 'kernel@1' kernel subimage
Description: ARM64 OpenWrt Linux-5.10.4
Type: Kernel Image
Compression: lzma compressed
Data Start: 0x40080010
Data Size: 3277140 Bytes = 3.1 MiB
Architecture: AArch64
OS: Linux
Load Address: 0x44000000
Entry Point: 0x44000000
Hash algo: crc32
Hash value: fbc80c8a
Hash algo: sha1
Hash value: 087b7d78fca98c4cbeb2b615b4501093553d2791
Verifying Hash Integrity ... crc32 error!
Bad hash value for 'hash@1' hash node in 'kernel@1' image node
Bad Data Hash
ERROR: can't get kernel image!
MT7622>
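The ECC-U lines above can be mapped back to flash offsets, under two assumptions. First, PA in these mtk_snand messages is taken to be a page (row) address. Second, the flash is taken to use the usual 1 Gbit geometry of 2048-byte pages and 64 pages per 128 KiB block (please check the datasheet). Under those assumptions PA=2646 is offset 0x52B000, page 22 of block 41, which is 0x2B000 bytes into the kernel image read from 0x500000. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Assumed geometry; verify against the FM35Q1GA/DS35Q1GA datasheet. */
#define NAND_PAGE_SIZE       2048u
#define NAND_PAGES_PER_BLOCK 64u

/* Hypothetical helpers: convert a page address to a byte offset, to the
 * erase block it belongs to, and to its index inside that block. */
static uint32_t page_to_offset(uint32_t page)
{
	return page * NAND_PAGE_SIZE;
}

static uint32_t page_to_block(uint32_t page)
{
	return page / NAND_PAGES_PER_BLOCK;
}

static uint32_t page_in_block(uint32_t page)
{
	return page % NAND_PAGES_PER_BLOCK;
}
```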
To recover, I have to reflash from U-Boot, which also seems to clear the bad block status.
After some tests, I tried to avoid the reserved bytes in the spare sections by changing the free layout to this:
static int fm35x1ga_ooblayout_free(struct mtd_info *mtd, int section,
				   struct mtd_oob_region *region)
{
	if (section > 3)
		return -ERANGE;

	region->offset = (8 * section) + 2;
	region->length = 6;

	return 0;
}
With this layout U-Boot doesn't find any bad blocks, but after some test writes I get filesystem errors, e.g.:
root@OpenWrt:~# dd if=/dev/urandom of=target-file bs=1M count=10
[ 80.999661] jffs2: Data CRC failed on REF_PRISTINE data node at 0x01315650: Read 0x46de0673, calculated 0xee7158e1
10+0 records in
10+0 records out
root@OpenWrt:~#
and later on I get the soft brick again:
NAND read: device 0 offset 0x500000, size 0x2000
8192 bytes read: OK
[do_read_image_blks] This is a FIT image,img_size = 0x3261c0
[do_read_image_blks] img_blks = 0x64d
[do_read_image_blks] img_align_size = 0x326800
NAND read: device 0 offset 0x500000, size 0x326800
[mtk_snand_check_bch_error] ECC-U, PA=3542, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
[mtk_snand_check_bch_error] ECC-U, PA=3688, S=0
NFI, flag byte: ff NFI, flag byte: ff NFI, This page is empty!
[mtk_nand_exec_read_page]mtk_snand_check_bch_error() FAIL!!!
3303424 bytes read: OK
bootm flag=0, states=70f
## Loading kernel from FIT Image at 4007ff28 ...
Using 'config@1' configuration
Trying 'kernel@1' kernel subimage
Description: ARM64 OpenWrt Linux-5.10.4
Type: Kernel Image
Compression: lzma compressed
Data Start: 0x40080010
Data Size: 3276780 Bytes = 3.1 MiB
Architecture: AArch64
OS: Linux
Load Address: 0x44000000
Entry Point: 0x44000000
Hash algo: crc32
Hash value: 8c64d5ab
Hash algo: sha1
Hash value: f87db5ea221e43f9c9baca8635b9deefc4ba4f39
Verifying Hash Integrity ... crc32 error!
Bad hash value for 'hash@1' hash node in 'kernel@1' image node
Bad Data Hash
ERROR: can't get kernel image!
MT7622>
However, U-Boot still reports no bad blocks:
MT7622> nand bad
Device 0 bad blocks:
MT7622>
To avoid problems, I didn't enable any of the advanced read/write modes such as X2, dual or quad.
You can read the whole (very dirty) test and the rest of the commit here:
I don't know whether the problem is caused by my driver or by something else... The MT7622 is quite a new target, and to get it working I had to use nbd's staging tree, because it has the new MediaTek bad block management table (BMT) support, which is otherwise missing. Moreover, I had to use the 5.10 kernel because 5.4 crashes as soon as the system
Don't have time to read through the whole post now, but I hope you are using 5.10.6, because of:
commit b00195241186db6e2fb5698afe67971b05b1a959
Author: Felix Fietkau <nbd@nbd.name>
Date: Tue Jan 5 11:18:21 2021 +0100
Revert "mtd: spinand: Fix OOB read"
This reverts stable commit baad618d078c857f99cc286ea249e9629159901f.
This commit is adding lines to spinand_write_to_cache_op, wheras the upstream
commit 868cbe2a6dcee451bd8f87cbbb2a73cf463b57e5 that this was supposed to
backport was touching spinand_read_from_cache_op.
It causes a crash on writing OOB data by attempting to write to read-only
kernel memory.
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anyway, as far as I know, the BBT is handled by the SPI-NAND framework; there's nothing MT7622-specific there.
The way I understood the datasheet, you should pretty much avoid touching the reserved/spare areas completely?
Thank you! With that patch I no longer get the kernel panic on 5.4, so now both 5.4 and 5.10 boot. It didn't change the corruption problem, though.
I thought the same, until I saw this commit in nbd's staging tree:
Moreover, if I don't use that patch, some partitions get shifted by 0x20000 (128 KiB = one block)... so I think there is one bad block in the NAND, recorded in their custom bad block table (BBT), which is stored in the last valid block. However, I don't think we can use a standard BBT, because the bootloader uses the MediaTek one.
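One plausible way to end up with a 0x20000 shift is the plain skip-bad-block scheme. In that scheme logical block N is stored in the N-th good physical block, so every logical offset after a bad block moves by one erase block. The sketch below models only that scheme. It is not the MediaTek BMT, which remaps bad blocks into a reserved pool instead. The 128 KiB block size, the sorted bad block list and the helper name logical_to_physical() are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ERASE_BLOCK_SIZE 0x20000u /* 128 KiB erase block (assumed) */

/* Hypothetical skip-bad-block mapping: bad[] holds physical block numbers
 * known to be bad, sorted ascending and without duplicates. Returns the
 * physical byte offset that holds the given logical byte offset. */
static uint32_t logical_to_physical(uint32_t logical_off,
				    const uint32_t *bad, size_t nbad)
{
	uint32_t want = logical_off / ERASE_BLOCK_SIZE; /* logical block */
	uint32_t good_seen = 0;
	uint32_t phys;
	size_t i = 0;

	for (phys = 0;; phys++) {
		if (i < nbad && bad[i] == phys) {
			i++;            /* bad physical block: skip it */
			continue;
		}
		if (good_seen == want)
			break;          /* this good block holds our data */
		good_seen++;
	}
	return phys * ERASE_BLOCK_SIZE + logical_off % ERASE_BLOCK_SIZE;
}
```

With a single bad block at physical block 3, everything from logical 0x60000 onward lands one block later, e.g. logical 0x80000 is read from 0xA0000, while data in front of the bad block stays where it was.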
In the MX35LF2GE4AB spare area, only M1 is ECC-protected, but the Linux kernel uses the whole OOB area (excluding the bad block bytes), even the reserved bytes.
However, I tried almost every combination for the free area (1, 2, 4, 6, 32 bytes, split into sections or not), but inevitably, after some reboots or writes, rootfs_data became corrupted.
If I make 0 bytes available in the free area, I get the error "inconsistent device description", because of https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/fs/jffs2/wbuf.c#L1193
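The linked check is short. As I read jffs2_nand_flash_setup(), JFFS2 keeps its clean markers in the free OOB bytes on NAND, so a layout that advertises zero free bytes is rejected before mounting. The following is a simplified userspace model of that check and not the kernel code itself. jffs2_like_oob_check() is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical simplified model of the oobavail check in
 * jffs2_nand_flash_setup() (fs/jffs2/wbuf.c). */
static int jffs2_like_oob_check(unsigned int oobsize, unsigned int oobavail)
{
	if (!oobsize)
		return 0; /* no OOB area at all: this check does not apply */
	if (oobavail == 0) {
		/* JFFS2 needs free OOB bytes for its clean markers */
		fprintf(stderr, "inconsistent device description\n");
		return -EINVAL;
	}
	return 0;
}
```

So any layout that keeps at least some free OOB bytes passes this particular check, which is consistent with the 0-byte case being the only one that fails at mount time.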
If I don't declare the free and ECC areas at all [SPINAND_ECCINFO(NULL, NULL)], the kernel uses a default layout that is not compatible with the BBT, which generates a lot of fake bad block errors.
I don't know what to do anymore...
I'm starting to think the problem is something else... maybe an error in the DTS? For example, I can't get the dual or quad SPI modes working, even if I set SPINAND_HAS_QE_BIT...
Ahh, MediaTek is touching stuff they are not supposed to touch.
Belkin even has a patch to use the SPI-NAND without BMT; maybe you can try that.
It's 0500-mt7622-snand-without-bmt.patch
I was searching for how they handle the NAND, and they simply added the manufacturer ID and then they rely on legacy NAND ID matching.
Are you sure that the NAND even is connected in Dual or Quad mode?
It's usually wired in regular single mode on 99% of routers, as the SPI controllers inside don't support dual or quad modes.
Good catch! Unfortunately, I can't find any use of the disable-bmt property in the DTS extracted from the device:
Moreover, in the OEM boot log the BMT seems to start correctly (look for BMT at [1.296239]):
I'll start compiling the OEM sources with some printk()s in the next few hours to understand what's going on...
However, I found another strange thing, and I don't know whether it's normal... If I dump the OOB of the same area from U-Boot and from OpenWrt, I get similar data in different places. (I don't know if it matters, but in OpenWrt I set the free area to offset = 2 and length = 62.)
I added some printk()s to the OEM firmware in target/linux/mediatek/files/drivers/mtd/nand/mtk_snand.c, right after dev_warn(dev, "[mtk_snand] probe successfully!\n");, to understand how the OOB area is divided:
So there is a 2-byte offset, then 30 bytes of free area and then 32 bytes of ECC. I'm testing this configuration right now and it seems promising... Do you know how to do a proper, extensive test? I have installed some big packages and after a few reboots I don't see any CRC errors...
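Before trying a layout like this on the device, a quick offline sanity check can catch mistakes. Such a check confirms that every region fits inside the OOB area, that no region covers the bad block marker bytes, and that free and ECC regions don't overlap. The sketch below encodes the 2 + 30 + 32 split found above as a pair of hypothetical callbacks. check_layout() is a hypothetical helper, not part of the MTD API.

```c
#include <errno.h>

struct mtd_oob_region {
	unsigned int offset;
	unsigned int length;
};

typedef int (*layout_fn)(int section, struct mtd_oob_region *r);

/* Hypothetical layout from the printk output: 30 free bytes at 2,
 * 32 ECC bytes at 32 (64-byte OOB). */
static int oem_free(int section, struct mtd_oob_region *r)
{
	if (section)
		return -ERANGE;
	r->offset = 2;
	r->length = 30;
	return 0;
}

static int oem_ecc(int section, struct mtd_oob_region *r)
{
	if (section)
		return -ERANGE;
	r->offset = 32;
	r->length = 32;
	return 0;
}

/* Hypothetical checker: returns 0 if all regions fit in oobsize, avoid
 * the first bbm_bytes and do not overlap each other, else -EINVAL. */
static int check_layout(layout_fn free_fn, layout_fn ecc_fn,
			unsigned int oobsize, unsigned int bbm_bytes)
{
	unsigned char used[256] = { 0 };
	layout_fn fns[2] = { free_fn, ecc_fn };
	struct mtd_oob_region r;
	unsigned int i;
	int f, s;

	if (oobsize > sizeof(used))
		return -EINVAL;
	for (f = 0; f < 2; f++) {
		for (s = 0; fns[f](s, &r) == 0; s++) {
			if (r.offset < bbm_bytes || r.offset + r.length > oobsize)
				return -EINVAL;
			for (i = r.offset; i < r.offset + r.length; i++) {
				if (used[i])
					return -EINVAL; /* overlapping regions */
				used[i] = 1;
			}
		}
	}
	return 0;
}
```

Note that passing this check only means the layout is self-consistent. It says nothing about whether the SoC's ECC engine actually uses the same byte positions.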
EDIT:
After some more reboots I got the first CRC problem
[ 23.047630] jffs2: notice: (762) check_node_data: wrong data CRC in data node at 0x003f83a4: read 0xb02be10d, calculated 0xa13b9885.
Just out of curiosity, I added nandutils to the OEM firmware and got another OOB "structure":
Apart from the doubled 0a 00 f5, which is still a mystery, the rest of OpenWrt's OOB data is split into two parts and moved around.
The 00 00 at the end is moved to the first 2 bytes of the OOB area in the U-Boot dump, causing the fake bad blocks.
The rest of the OOB data is moved to bytes 43-48.
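The U-Boot "nand bad" output earlier in this thread reports "oob_buf[0] is 0x0" for every block it flags. A minimal model of that kind of marker check explains the fake bad blocks. A factory-good block has 0xFF in its first OOB byte(s), so if the trailing 00 00 of OpenWrt's OOB data lands in bytes 0-1 as U-Boot sees them, the block looks marked bad. This is my own simplified model, not the MediaTek U-Boot code. oob_marks_block_bad() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical simplified bad block marker check: any byte other than
 * 0xFF in the first bbm_len OOB bytes is treated as a bad block mark.
 * (The U-Boot log suggests it checks byte 0; the Linux SPI-NAND core
 * looks at two bytes.) */
static int oob_marks_block_bad(const uint8_t *oob, unsigned int bbm_len)
{
	unsigned int i;

	for (i = 0; i < bbm_len; i++)
		if (oob[i] != 0xff)
			return 1;
	return 0;
}
```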
New tests... I tried editing the OOB area directly and got confirmation of what I thought.
I wrote this sample data to the OOB area of a TEST partition I created at 0x000002000000-0x000002300000, which corresponds to /dev/mtd12
@nbd, can you confirm that the BMT in your staging tree works correctly for the Elecom WRC-2533gent? (My only change to your BMT was setting BB_TABLE_MAX to 0x1000U instead of 0x2000U.)
I've added preliminary support for E8450 to my staging tree: https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=summary
I've fixed up the flash chip support patch. It turns out the chip is unstable if support for the X4 ops is not included. I've also fixed the ECC layout: based on my reading, it's pretty much the same as on Winbond flash chips, so I copied over the ops from there.
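For reference, here is the Winbond W25M02GV OOB layout as I recall it from drivers/mtd/nand/spi/winbond.c, rendered as a byte map. Each 16-byte group holds 6 free bytes at +2 and 8 ECC bytes at +8. Please compare this against the actual tree and nbd's patch before relying on it. The function names and winbond_map() are hypothetical, and the unused mtd argument is dropped.

```c
#include <errno.h>
#include <string.h>

struct mtd_oob_region {
	unsigned int offset;
	unsigned int length;
};

/* Layout as recalled from winbond.c (verify against your tree). */
static int wb_ecc(int section, struct mtd_oob_region *region)
{
	if (section > 3)
		return -ERANGE;
	region->offset = (16 * section) + 8;
	region->length = 8;
	return 0;
}

static int wb_free(int section, struct mtd_oob_region *region)
{
	if (section > 3)
		return -ERANGE;
	region->offset = (16 * section) + 2;
	region->length = 6;
	return 0;
}

/* Hypothetical helper: 'E' = ECC, 'F' = free, '.' = unassigned. */
static void winbond_map(char map[65])
{
	struct mtd_oob_region r;
	unsigned int i;
	int s;

	memset(map, '.', 64);
	map[64] = '\0';
	for (s = 0; wb_ecc(s, &r) == 0; s++)
		for (i = 0; i < r.length; i++)
			map[r.offset + i] = 'E';
	for (s = 0; wb_free(s, &r) == 0; s++)
		for (i = 0; i < r.length; i++)
			map[r.offset + i] = 'F';
}
```

Compared with the fm35x1ga layout tried earlier ((8 * section) + 2), the free groups here are spaced 16 bytes apart rather than 8.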
Effectively the distinction between ECC covered and non-ECC areas in OOB is irrelevant, since the driver disables the flash chip's ECC support anyway. The controller handles ECC completely by itself.
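Disabling the chip's on-die ECC comes down to clearing one bit in the configuration feature register. The sketch below uses the common SPI NAND conventions: SET FEATURE is opcode 0x1F, the configuration register is at feature address 0xB0, and ECC enable is bit 4 (Linux calls these REG_CFG and CFG_ECC_ENABLE). These are assumptions to check against the FM35Q1GA/DS35Q1GA datasheet, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Common SPI NAND values (assumed; verify against the datasheet). */
#define SPINAND_OP_SET_FEATURE 0x1F
#define FEATURE_REG_CFG        0xB0
#define CFG_ECC_EN             (1u << 4)

/* Hypothetical helper: configuration value with on-die ECC turned off,
 * all other bits left untouched, so the SoC controller can own ECC. */
static uint8_t cfg_disable_ondie_ecc(uint8_t cfg)
{
	return (uint8_t)(cfg & ~CFG_ECC_EN);
}

/* Hypothetical helper: the 3-byte SET FEATURE transaction
 * (opcode, feature address, value). */
static void build_set_feature(uint8_t reg, uint8_t val, uint8_t out[3])
{
	out[0] = SPINAND_OP_SET_FEATURE;
	out[1] = reg;
	out[2] = val;
}
```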
I've also changed the BMT support patch to allow configuring the table size via device tree.
Please test it and let me know if it works for you.
There was also a bug in the BMT patch regarding larger writes to OOB, which I've fixed.
I was wondering why there was the patch to disable ECC, now I understand it. Thanks for the explanation.
I did some tests and the "fake bad blocks" problem seems to be fixed.
However, I think that we still have some problems with the nand (maybe its cache?).
I still get some "wrong data CRC in data node" messages in the boot log, followed by data corruption.
This is what I do for testing:
Write 10MB of random data: dd if=/dev/urandom of=/root/random.file bs=1M count=10;
Sync the file system (I don't know if it's necessary): sync;
Calculate and store the md5 of the aforementioned file: md5sum /root/random.file > /root/random.file.md5;
Sync the file system again (I don't know if it's necessary): sync;
Reboot the system: reboot
After the reboot, calculate the md5 again and compare it with the previously stored one; sometimes the values differ:
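A variant of this test can also tell you where the corruption starts, not just that it happened. Instead of storing an md5, write seeded pseudo-random data and regenerate the identical stream after the reboot. This is a sketch with hypothetical helper names. xorshift32 is used as the generator, and fsync() replaces the separate sync step.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <unistd.h> /* fsync() */

/* xorshift32: tiny deterministic PRNG; state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return *state = x;
}

/* Hypothetical helper: write nbytes of seeded data and fsync() it.
 * Returns 0 on success, -1 on error. */
static int write_pattern(const char *path, long nbytes, uint32_t seed)
{
	FILE *f = fopen(path, "wb");
	uint32_t st = seed ? seed : 1;
	long i;

	if (!f)
		return -1;
	for (i = 0; i < nbytes; i++) {
		if (fputc((int)(xorshift32(&st) & 0xff), f) == EOF) {
			fclose(f);
			return -1;
		}
	}
	if (fflush(f) != 0 || fsync(fileno(f)) != 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: regenerate the stream and compare. Returns -1 if
 * the file matches, the offset of the first wrong (or missing) byte
 * otherwise, or -2 if the file cannot be opened. */
static long verify_pattern(const char *path, long nbytes, uint32_t seed)
{
	FILE *f = fopen(path, "rb");
	uint32_t st = seed ? seed : 1;
	long i;
	int c;

	if (!f)
		return -2;
	for (i = 0; i < nbytes; i++) {
		c = fgetc(f);
		if (c == EOF || c != (int)(xorshift32(&st) & 0xff)) {
			fclose(f);
			return i;
		}
	}
	fclose(f);
	return -1;
}
```

Call write_pattern("/root/random.file", 10L * 1024 * 1024, seed) before rebooting and verify_pattern() with the same seed afterwards. A non-negative result is a file offset you can correlate with the JFFS2 CRC messages.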
P.p.s.
I would also like to thank you for your amazing work with MT7622. I can get gigabit speed using the HW NAT Acceleration without any CPU usage!
W/O HW NAT Acceleration:
(edit: all work included in upstream OpenWrt by now)
I've hacked up a replacement BL2 (== Preloader) as well as ATF and U-Boot, more or less completely from source, and enabled UBI support in U-Boot, allowing for a more robust way to deal with the NAND flash.
I'm still waiting for MTK to update patches for U-Boot and ATF 2.4 in order to get rid of the nandx driver and have support for HWECC also with vanilla U-Boot's mtk-snfi-spi driver.
But as it is, everything already works quite well.
Hi @daniel, that's great news!
I have to confess that I was following the IRC channel, so I already knew great news was coming... But I didn't expect an image that replaces U-Boot automatically! That's very cool!
I'll start testing it very soon, also because I probably filled up the BMT pool due to the "fake" bad block problem I had over the previous weeks.
However, I have a question for you: can I use your "UBI version" even if I have a bad block somewhere before the "factory" partition? I'm sure there is one, because if I don't use the BMT the rest of the flash is shifted by one block and the calibration data can't be loaded, while with the BMT it works correctly.
(Sorry for the double post, I hit CTRL-Enter and it got posted automatically)
Sounds dangerous... BMT is switched off for UBI, as UBI handles bad blocks for us. Having offsets change because of bad blocks is a very wrong design and cannot work with UBI.
Hence you will have to make sure that factory starts at the right offset when BMT is switched off, i.e. back it up with BMT switched on and write it back to the correct offset with BMT switched off (the new U-Boot also acquires its ethaddr from the factory partition)...
So you have to make sure the following areas of the flash don't have any bad blocks in them (i.e. with BBT switched off)
I.e. you may be lucky and the bad block resides somewhere between 0x20000-0x80000 or 0x160000-0x1c0000. If not, you will have to wait for me to port UBI SPL to TF-A...
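Once you know the offset of the bad block (for example from testing each 128 KiB block as described in the next paragraph), checking it against the two "lucky" windows is simple. The sketch assumes 128 KiB erase blocks and treats each window as half-open ([start, end)). The helper names are hypothetical.

```c
#include <stdint.h>

#define ERASE_BLOCK 0x20000u /* 128 KiB erase block (assumed) */

/* Hypothetical helper: 1 if the erase block containing bad_off lies
 * entirely inside [start, end), else 0. */
static int bad_block_in_range(uint32_t bad_off, uint32_t start, uint32_t end)
{
	uint32_t blk = bad_off & ~(ERASE_BLOCK - 1); /* align down */

	return blk >= start && blk + ERASE_BLOCK <= end;
}

/* Hypothetical helper: check against the two windows quoted above. */
static int bad_block_is_harmless(uint32_t bad_off)
{
	return bad_block_in_range(bad_off, 0x20000, 0x80000) ||
	       bad_block_in_range(bad_off, 0x160000, 0x1c0000);
}
```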
Try booting non-BMT Linux with initramfs and carefully try erasing, writing and reading back the 128 KiB blocks in those unused areas. If you win the lottery, you find the bad block there and it's all good: all that's needed then is to back up factory with BBT switched on and write it back to the right offset with BBT switched off.
Update 2: I've implemented relocation of the eeprom-data and mac-addresses blocks in the factory partition when running the installer, i.e. if BBT/BMT made a mess there before, this is detected and fixed, so the offsets are correct without BBT/BMT running.
See