Kernel changes needed for mpc85xx NAND ECC under 5.10?

Hi all,

A while back, I got OpenWrt working on my EdgeCore ECW7210s. They're Freescale P1020-based and not too different from a few other models.

Support for the device never got merged, mostly because I was too slow and inexperienced to get the PR together in time. Still, I've kept these devices running for myself, since three of them make up my home network.

I went to build a snapshot image, and ran into some trouble. Specifically, the NAND flash wouldn't show up:

nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xdc
nand: Micron MT29F4G08ABADAWP
nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
fsl,elbc-fcm-nand: probe of ffa00000.nand failed with error -22

As near as I can tell, this is because the kernel's NAND core has changed how it describes the ECC engine, and the driver for the P1020's built-in NAND controller (fsl_elbc_nand) hasn't kept up. Adding a new case for NAND_ECC_ENGINE_TYPE_ON_HOST to the driver's ECC handling seems to get it working for me.

(yes, that comment seems to be foreshadowing)

That said, I could be wrong. Maybe there's just some magic incantation that needs to be in the device tree to get it working. Not my area of expertise.
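To make the suspected failure mode concrete, here is a simplified userspace C model of the ECC-engine switch in the eLBC NAND driver's attach step (roughly fsl_elbc_attach_chip() in drivers/mtd/nand/raw/fsl_elbc_nand.c). It is a sketch, not the kernel code: the enum only mirrors the kernel's names, and model_attach_chip(), br_decc_hw and handle_on_host are hypothetical stand-ins. It assumes, as the -22 in the log suggests, that the NAND core now hands the driver NAND_ECC_ENGINE_TYPE_ON_HOST when the device tree doesn't pick an engine, while the driver's switch only handles the NONE and SOFT cases.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in that mirrors (but does not include) the kernel's
 * enum nand_ecc_engine_type. */
enum ecc_engine_type {
	NAND_ECC_ENGINE_TYPE_INVALID,
	NAND_ECC_ENGINE_TYPE_NONE,
	NAND_ECC_ENGINE_TYPE_SOFT,
	NAND_ECC_ENGINE_TYPE_ON_HOST,
	NAND_ECC_ENGINE_TYPE_ON_DIE,
};

/*
 * Hypothetical model of the ECC choice made when the eLBC NAND driver
 * attaches a chip.
 *   engine         - engine type handed over by the NAND core
 *   br_decc_hw     - stand-in for "the chip-select Base Register selects
 *                    full hardware ECC"
 *   handle_on_host - enables the proposed extra ON_HOST case
 * On success, stores the engine actually used in *chosen and returns 0;
 * otherwise returns -EINVAL, as the driver's default case does.
 */
static int model_attach_chip(enum ecc_engine_type engine, bool br_decc_hw,
			     bool handle_on_host, enum ecc_engine_type *chosen)
{
	assert(chosen != NULL);

	switch (engine) {
	case NAND_ECC_ENGINE_TYPE_NONE:
		/* Nothing chosen in DT: decide from the Base Register. */
		*chosen = br_decc_hw ? NAND_ECC_ENGINE_TYPE_ON_HOST
				     : NAND_ECC_ENGINE_TYPE_SOFT;
		return 0;
	case NAND_ECC_ENGINE_TYPE_SOFT:
		/* Software ECC requested in DT: nothing extra to set up. */
		*chosen = NAND_ECC_ENGINE_TYPE_SOFT;
		return 0;
	case NAND_ECC_ENGINE_TYPE_ON_HOST:
		if (handle_on_host) {
			/* Proposed case: use the controller's hardware ECC. */
			*chosen = NAND_ECC_ENGINE_TYPE_ON_HOST;
			return 0;
		}
		/* Unpatched: ON_HOST is not handled, same as default. */
		return -EINVAL;
	default:
		return -EINVAL;
	}
}
```

With handle_on_host set to false, an ON_HOST request returns -EINVAL, which is -22 on Linux and matches the probe failure in the log; setting it to true models the proposed extra case. In this sketch the new case simply selects the controller's hardware ECC; whether a real fix should also consult the Base Register, as the NONE path does, is left open.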

This post isn't a GitHub issue or a PR; I'm looking for some high-level guidance first.

First, if I understand this correctly, this ought to affect other P1020-based devices with NAND, right? That includes the OCEDO Panda and the Sophos RED 15w. Am I just the first one to notice this issue? Is the OpenWrt community effectively everyone who runs the P1020 on newer kernels? Are there even other people running OpenWrt on these devices? Or am I just doing something uniquely wrong?

Second, how should I approach getting this fixed? Should I try to get a patch into the upstream kernel? Or open a PR that adds a kernel patch to OpenWrt? Should I also try again to get support for my EdgeCores merged? Or just keep them running in my own fork? Or is this older hardware that no one else cares about anymore?

Any impressions or suggestions are appreciated.

Funnily enough, the AP370 (another P1020E device with an open PR that hasn't been merged) reports:

[    9.062526] nand: device found, Manufacturer ID: 0xec, Chip ID: 0xd3
[    9.138709] nand: Samsung NAND 1GiB 3,3V 8-bit
[    9.191872] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    9.284045] Bad block table found at page 524224, version 0x01
[    9.354596] Bad block table found at page 524160, version 0x01
[    9.424746] nand_read_bbt: bad block at 0x0000042a0000
[    9.486296] nand_read_bbt: bad block at 0x000031700000
[    9.547787] nand_read_bbt: bad block at 0x000036420000
[    9.609277] nand_read_bbt: bad block at 0x0000368c0000

It runs fine on:

root@ap370-1:~# uname -a
Linux ap370-1 5.10.80 #0 SMP Fri Nov 26 18:05:16 2021 ppc GNU/Linux

My guess (and it is just a guess) is that mine uses hardware ECC whereas yours uses software ECC, based specifically on your new case for NAND_ECC_ENGINE_TYPE_ON_HOST.

Do you know how to create a .patch file with the changes that got your EdgeCore working again? If so, ping @chunkeey and we can see about getting that patch submitted to the linuxppc-dev mailing list. OpenWrt would then carry your patch as a backport until the fix reaches the upstream kernel's stable releases.