OpenWrt Forum Archive

Topic: why nand_correct_data: uncorrectable ECC error

The content of this topic has been archived on 27 Mar 2018. There are no obvious gaps in this topic, but there may still be some posts missing at the end.

Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.411135] blk_update_request: I/O error, dev mtdblock0, sector 0
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.417506] __nand_correct_data: uncorrectable ECC error
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.422842] blk_update_request: I/O error, dev mtdblock0, sector 8
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.429200] __nand_correct_data: uncorrectable ECC error
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.434535] blk_update_request: I/O error, dev mtdblock0, sector 16
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.441072] __nand_correct_data: uncorrectable ECC error
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.446440] blk_update_request: I/O error, dev mtdblock0, sector 24
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.453297] __nand_correct_data: uncorrectable ECC error
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.458670] blk_update_request: I/O error, dev mtdblock0, sector 0
Tue Sep 22 19:34:40 2015 kern.err kernel: [21356.464882] Buffer I/O error on dev mtdblock0, logical block 0, async page read
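
For orientation, the sector numbers in this log can be converted to byte offsets within mtdblock0. The sketch below assumes the block layer's usual 512-byte sectors; the function name failed_offsets is made up for illustration and is not part of any OpenWrt tool.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print, for every
# "I/O error, dev mtdblockN, sector S" line, the device, the sector and
# the byte offset (sector * 512, assuming 512-byte block-layer sectors).
failed_offsets() {
    awk '/I\/O error, dev mtdblock/ {
        dev = ""; sec = ""
        for (i = 1; i < NF; i++) {
            if ($i == "dev")    { dev = $(i + 1); sub(/,$/, "", dev) }
            if ($i == "sector") { sec = $(i + 1) }
        }
        if (dev != "" && sec != "") { printf "%s sector %d offset %d\n", dev, sec, sec * 512 }
    }'
}

# Example (on the device): logread | failed_offsets
```

For the log above this gives offsets 0, 4096, 8192 and 12288, so all reported failures fall within the first 16 KiB of mtdblock0.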

Did you build and/or compile your firmware?

mazilo wrote:

Did you build and/or compile your firmware?

Yes, and I use the config from trunk.

Run make kernel_menuconfig and, if Support software BCH ECC (see below) is selected, make sure to deselect it. Beware that make kernel_menuconfig may make some necessary changes to your target/linux/<platform>/config-<version> file. This may cause subsequent upgrades to fail. See my post on how to cope with this issue.

  .config - Linux/arm 4.1.6 Kernel Configuration
 [...] ers > Memory Technology Device (MTD) support > NAND Device Support
  ┌─────────────────────── NAND Device Support ────────────────────────┐
  │  Arrow keys navigate the menu.  <Enter> selects submenus ---> (or  │  
  │  empty submenus ----).  Highlighted letters are hotkeys.  Pressing │  
  │  <Y> includes, <N> excludes, <M> modularizes features.  Press      │  
  │  <Esc><Esc> to exit, <?> for Help, </> for Search.  Legend: [*]    │  
  │ ┌────────────────────────────────────────────────────────────────┐ │  
  │ │    --- NAND Device Support                                     │ │  
  │ │    [ ]   Support software BCH ECC                              │ │  
  │ │    < >   Support Denali NAND controller                        │ │  
  │ │    < >   GPIO assisted NAND Flash driver                       │ │  
  │ │    < >   Ricoh xD card reader                                  │ │  
  │ │    < >   DiskOnChip 2000, Millennium and Millennium Plus (NAND │ │  
  │ │    < >   Support for DiskOnChip G4                             │ │  
  │ │    < >   NAND support for OLPC CAFÉ chip                       │ │  
  │ │    < >   Support for NAND Flash Simulator                      │ │  
  │ │    -*-   Support for generic platform NAND driver              │ │  
  │ └────v(+)────────────────────────────────────────────────────────┘ │  
  ├────────────────────────────────────────────────────────────────────┤  
  │      <Select>    < Exit >    < Help >    < Save >    < Load >      │  
  └────────────────────────────────────────────────────────────────────┘  
mazilo wrote:

Run make kernel_menuconfig and, if Support software BCH ECC is selected, make sure to deselect it. Beware that make kernel_menuconfig may make some necessary changes to your target/linux/<platform>/config-<version> file. This may cause subsequent upgrades to fail. See my post on how to cope with this issue.

After running make kernel_menuconfig, I get:

love4taylor@ubuntu:~/openwrt/kirkwood/trunk$ make kernel_menuconfig 
make[1]: Entering directory `/home/love4taylor/openwrt/kirkwood/trunk/target/linux'
make[2]: Entering directory `/home/love4taylor/openwrt/kirkwood/trunk/target/linux/kirkwood'
rm -rf /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood
mkdir -p /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood
xzcat /home/love4taylor/openwrt/kirkwood/trunk/dl/linux-3.18.21.tar.xz | tar -C /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood -xf -
rm -rf /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/patches; mkdir -p /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/patches
cp -fpR "/home/love4taylor/openwrt/kirkwood/trunk/target/linux/generic/files"/. /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/
find /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/ -name \*.rej -or -name \*.orig | xargs -r rm -f
touch /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/.quilt_used
touch /home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/.prepared
if [ -s "/home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/patches/series" ]; then (cd "/home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21"; if quilt --quiltrc=- next >/dev/null 2>&1; then quilt --quiltrc=- push -a; else quilt --quiltrc=- top >/dev/null 2>&1; fi ); fi
make[2]: *** [/home/love4taylor/openwrt/kirkwood/trunk/build_dir/target-arm_xscale_uClibc-0.9.33.2_eabi/linux-kirkwood/linux-3.18.21/.quilt_checked] Error 1
make[2]: Leaving directory `/home/love4taylor/openwrt/kirkwood/trunk/target/linux/kirkwood'
make[1]: *** [menuconfig] Error 2
make[1]: Leaving directory `/home/love4taylor/openwrt/kirkwood/trunk/target/linux'
make: *** [kernel_menuconfig] Error 2
mazilo wrote:

Run make kernel_menuconfig and, if Support software BCH ECC is selected, make sure to deselect it. Beware that make kernel_menuconfig may make some necessary changes to your target/linux/<platform>/config-<version> file. This may cause subsequent upgrades to fail. See my post on how to cope with this issue.

Support software BCH ECC is not selected, so what should I do next?

What about the NAND ECC Smart Media byte order option (above NAND Device Support, as shown below)? If it is also not enabled, then your Linux kernel probably detected some errors in the NAND space. BTW, every time you reboot your device, does it always show the same error message with the same sectors?

 .config - Linux/arm 4.1.6 Kernel Configuration
 > Device Drivers > Memory Technology Device (MTD) support ───────────
  ┌──────────── Memory Technology Device (MTD) support ────────────┐
  │  Arrow keys navigate the menu.  <Enter> selects submenus --->  │  
  │  (or empty submenus ----).  Highlighted letters are hotkeys.   │  
  │  Pressing <Y> includes, <N> excludes, <M> modularizes          │  
  │  features.  Press <Esc><Esc> to exit, <?> for Help, </> for    │  
  │ ┌────^(-)────────────────────────────────────────────────────┐ │  
  │ │    [ ]   Retain master device when partitioned             │ │  
  │ │          RAM/ROM/Flash chip drivers  --->                  │ │  
  │ │          Mapping drivers for chip access  --->             │ │  
  │ │          Self-contained MTD device drivers  --->           │ │  
  │ │    [ ]   NAND ECC Smart Media byte order                   │ │  
  │ │    <*>   NAND Device Support  --->                         │ │  
  │ │    < >   OneNAND Device Support  ----                      │ │  
  │ │          LPDDR & LPDDR2 PCM memory drivers  --->           │ │  
  │ │    < >   SPI-NOR device support  ----                      │ │  
  │ │    <*>   Enable UBI - Unsorted block images  --->          │ │  
  │ └────────────────────────────────────────────────────────────┘ │  
  ├────────────────────────────────────────────────────────────────┤  
  │    <Select>    < Exit >    < Help >    < Save >    < Load >    │  
  └────────────────────────────────────────────────────────────────┘  
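
The two options discussed so far can also be checked without opening the menu. As far as I know, in kernels of this era Support software BCH ECC corresponds to CONFIG_MTD_NAND_ECC_BCH and NAND ECC Smart Media byte order to CONFIG_MTD_NAND_ECC_SMC. A minimal sketch, assuming a standard kernel config file; the helper name ecc_option_state and the example path are illustrative assumptions:

```shell
#!/bin/sh
# Report whether a kernel config symbol is enabled in a config file.
# Usage: ecc_option_state <config-file> <CONFIG_SYMBOL>
ecc_option_state() {
    cfg=$1
    sym=$2
    if grep -q "^${sym}=[ym]\$" "$cfg"; then
        echo "enabled"
    elif grep -q "^# ${sym} is not set" "$cfg"; then
        echo "disabled"
    else
        echo "not mentioned"
    fi
}

# Example (the path is a placeholder; point it at your own kernel .config):
# for sym in CONFIG_MTD_NAND_ECC_BCH CONFIG_MTD_NAND_ECC_SMC; do
#     echo "$sym: $(ecc_option_state path/to/linux/.config "$sym")"
# done
```

A symbol that is "not mentioned" in a platform config file may still be set by the generic config, so the .config inside the kernel build directory is probably the most reliable file to check.
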
mazilo wrote:

What about the NAND ECC Smart Media byte order option (above NAND Device Support)? If it is also not enabled, then your Linux kernel probably detected some errors in the NAND space. BTW, every time you reboot your device, does it always show the same error message with the same sectors?

NAND ECC Smart Media byte order is not enabled.
It always shows the same error message when I reboot.

I scanned the NAND, but there are no bad blocks.

Sorry, I have no further ideas on how to troubleshoot this error message. The only thing I can tell you is that when I enabled one of the two ECC options (I don't remember which one) in the kernel, my Seagate DockStar spewed out those error messages. I was able to get rid of them by disabling that option in the kernel, IIRC.

Hey there,

I'm having the same errors on a Linksys EA4500. Have you managed to get rid of them?

I did some more digging. It's quite strange.

1. If I flash a smaller image, built from CC without any packages added, I get no NAND errors, but I get a different UBIFS error:

[    1.139469] UBIFS error (pid 1): init_constants_early: too few LEBs (14), min. is 17

2. If I add some extra packages to the image (like block-mount, openvpn, nfs-client, uhttpd, some luci apps, ddns, fs-ext4, kmod-usb3, nuttcp, kmod-tun, usbutils, usbreset, procps, rsync, ss, tcpdump), the image file grows, obviously, and this leads to NAND errors.

I'm attaching a diff between the two boot logs to show the differences; maybe someone more knowledgeable can figure it out.

https://img42.com/tfuX4
https://img42.com/JbclU

If someone needs the boot logs, I can provide them.

(Last edited by nroberto13 on 17 Mar 2016, 22:51)

OK, some more info. I built a bigger image that just exceeds the 17 LEBs on the rootfs volume (it's using 18 now), and both errors are gone:

1. UBIFS error (pid 1): init_constants_early: too few LEBs (14), min. is 17

2. the nand errors __nand_correct_data: uncorrectable ECC error

This would make me think that the NAND errors appear only when the rootfs partition exceeds a specific size, which leads me to the conclusion that it could be a NAND defect. BUT this router has 2 environments (2 sets of kernel/rootfs partitions), and the same errors appear regardless of the environment used. That would mean the NAND has defects in 2 places, at the same offset relative to the first and second environments. Therefore I'd rule out NAND defects, because it would be quite strange for the same sectors to be defective in both environments.

Input from a dev would be highly appreciated. Thanks.
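
The limit in the UBIFS message can be put into bytes. UBIFS refuses to mount a volume with fewer than 17 LEBs, and on NAND without sub-page writes a LEB is one erase block minus two pages (used for the UBI headers). A rough calculation, assuming 128 KiB erase blocks and 2048-byte pages (typical values, not confirmed for the EA4500 in this thread); the function names are illustrative:

```shell
#!/bin/sh
# leb_size <erase-block-bytes> <page-bytes>
# LEB size when UBI uses one page each for the EC and VID headers
# (i.e. no sub-page support).
leb_size() {
    echo $(( $1 - 2 * $2 ))
}

# volume_bytes <leb-count> <erase-block-bytes> <page-bytes>
volume_bytes() {
    echo $(( $1 * $(leb_size "$2" "$3") ))
}

echo "LEB size:        $(leb_size 131072 2048) bytes"
echo "14 LEBs (error): $(volume_bytes 14 131072 2048) bytes"
echo "17 LEBs (min.):  $(volume_bytes 17 131072 2048) bytes"
```

Under these assumptions a rootfs volume needs at least 2158592 bytes (about 2.06 MiB), which fits the observation that the UBIFS error disappears once the image grows past 17 LEBs.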

I was customizing an image for my EA4500 router and noticed that the NAND errors appear only when I include block-mount (and its dependencies) in the image... Any ideas?

block-mount scans all mtd partitions on boot. The bootloader partition sometimes seems to be written without (correct) ECC, likely because the first block is usually guaranteed to be good (and won't go bad), and the SoC won't do ECC checks anyway. So when block-mount tries to read the partition, the NAND driver thinks the blocks have wrong ECC.

You can easily trigger the errors just by running:

dd if=/dev/mtd0 of=/dev/null

These errors can be seen on other targets with NAND as well, at least on ipq806x I have seen them.
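
To see which partitions trigger the message, the same read can be repeated for every MTD partition, which is roughly what a scan of all partitions amounts to. A sketch, assuming /proc/mtd in its usual format and a kernel log that still holds the relevant lines; list_mtd, count_ecc_errors and probe_mtd are made-up helper names. Since reads from /dev/mtdN may return data even when ECC fails, the script counts new kernel log lines rather than relying on dd's exit status:

```shell
#!/bin/sh
# Print "mtdN name" for each partition in a /proc/mtd-style file
# (defaults to /proc/mtd itself).
list_mtd() {
    sed -n 's/^\(mtd[0-9][0-9]*\): .* "\(.*\)"$/\1 \2/p' "${1:-/proc/mtd}"
}

# Number of "uncorrectable ECC error" lines currently in the kernel log.
count_ecc_errors() {
    dmesg | grep -c 'uncorrectable ECC error'
}

# Read every partition once (read-only) and report how many new ECC
# error lines each read added to the kernel log.
probe_mtd() {
    list_mtd "$@" | while read -r dev name; do
        before=$(count_ecc_errors)
        dd if="/dev/$dev" of=/dev/null bs=64k 2>/dev/null
        after=$(count_ecc_errors)
        echo "$dev ($name): $(( after - before )) new ECC error line(s)"
    done
}

# Example (on the device, as root): probe_mtd
```

The counts are only approximate: if the kernel log buffer wraps during a read, older lines drop out and the difference can come out too low.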

The discussion might have continued from here.