Commit 22d982 ramips: add support for switching between 3-byte and 4-byte addressing on w25q256 flash

Ok, I will report this soft reset hang bug and with this patch on tracker, instead of submit a patch pull request to LEDE github.
Because I do not want to leave my real name with LEDE github 'Signed-off-by' digital finger print. I am living in border of China and the reason of GFW policy.

It was rejected already before:

Instead of sending the same patch again, please report the problem and give as much information about your hardware as you can.

I did consider the simpler patch of resetting the chip before a soft reboot, but as pepe2k already pointed out, this does not work if the watchdog resets the board. My patch is designed to leave the chip in 3-address mode as long as possible in order to minimize the chance of a permanent hang after the watchdog fires.
Yes, we can contact the manufacturer to point out the hardware design flaw, but that won't change the fact that there is hardware with this flaw out there and we have to deal with that.
I guess we can also look into extending my workaround for other flash chips, e.g. MX25L25645G

As I said, most watchdogs will trigger a Hard Rest, just like power up reset, we do not need fix that. If kernel crash, we can do nothing, and we have no way fix anything.
But at lease we need make the normal flow working well, type: reboot, system need reset and let boot loader get control.

Yes I find that, but this patch only for kernel 4.4.x, and I write a patch for kernel 4.9.x.

cat target/linux/ramips/patches-4.9/0911-spi_nor_32MB_soft_reset_hang_fix.patch
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -257,6 +257,16 @@
        return mtd_device_unregister(&flash->spi_nor.mtd);
 }

+static void m25p_shutdown(struct spi_device *spi)
+{
+       struct m25p     *flash = spi_get_drvdata(spi);
+
+       flash->command[0] = 0x66;
+       spi_write(flash->spi, flash->command, 1);
+       flash->command[0] = 0x99;
+       spi_write(flash->spi, flash->command, 1);
+}
+
 /*
  * Do NOT add to this array without reading the following:
  *
@@ -327,6 +337,7 @@
        .id_table       = m25p_ids,
        .probe  = m25p_probe,
        .remove = m25p_remove,
+       .shutdown = m25p_shutdown,

        /* REVISIT: many of these chips have deep power-down modes, which
         * should clearly be entered on suspend() to minimize power use.

On the board that I developed this patch for (ZBT WG3526), a watchdog-issued hard reset does not reset the flash chip. They probably forgot to connect the flash chip reset line to the system reset line.

Just file a bug.
https://bugs.lede-project.org/index.php?do=details&task_id=1165

I understand know. And I think this patch will not conflict with your solution, it will let more devices fix this bug.

One potential issue with your patch is that it will make it harder to figure out which devices need fixing for watchdog resets as well. The way it is now, it is obvious by testing software reset, where additional workarounds are needed. In those cases it probably makes sense to reuse the code from my workaround which switches between 3-byte and 4-byte mode.

Well, first make it work, then make it better.

I find that the only way make spi flash chip reset is cut off power supply, because there is no pin to reset it.
And the soft rest just the patch: send cmd 0x66 0x99.
So watch dog reset issue for 32MB spi flash chip is no solution to mt7620/mt7621, of cause your patch only for W25Q256 can minimize the chance of hangs.
Lets forget about watch dog, fix the normal reboot soft reset first.

If we deal with affected flash chips the way I did with my workaround patch, it takes care of both normal reboot and the watchdog issue. If we apply the generic reset, we won't know which devices still need fixing for the watchdog issue, because nobody will test that anymore.

Fix soft reset bug will not effect with watch dog issue, that is different conditions. after soft reset fixed, router still hangs, It should caused by watch dog hard reset, or other kernel crash.

Of course we need to fix this.
Everybody expects that their hardware will recover (boot again) after a crash and/or watchdog reset. Unfortunately, your patch/approach doesn't cover this situation at all.

Actually, the approach proposed by Felix limits (as I said before, there is no way to fully fix broken hardware design in software) possibility of hang after soft and hardware reset (including the one caused by watchdog) because flash chip is switched to 4-byte address mode only when needed and then switched back. With your patch, the flash is switched to 3-byte mode only at a proper reboot - whatever happens before is not handled at all. This was discussed many times before and rejected, also because of the reason mentioned above.

This is not true. Both of the flash chips you mentioned (W25Q256FV in SOIC16 and MX25L25645G in SOIC8 packages) have a dedicated hardware reset input pins (look into datasheets) and the board manufacturer should have connected them to the SOC reset output.

If we don't start pointing out that mistakes, we will have to deal with such design flaws over and over again in future. I just got some MT762x and QCA95xx based boards, from different manufacturers, with the same mistake. And I'm pretty sure it's not that they forgot - they just don't understand the issue, datasheets and what happens in code. They just do copy&paste chip vendor reference design. And as the 32M SPI NOR chips are getting less expensive, cheap hardware manufacturers just put them to the same board they used with 4-16M before, without taking care of consequences. And now we have to deal with their mistakes and complaining users of their broken hardware :slight_smile:

If I'm right, the MX25L25645G should have support for 4-byte set of commands ("8-1. 256Mb Address Protocol" in datasheet), so this chip probably doesn't need any workaround as Winbond one.

Yes, it does look like it has proper 4-byte address commands. One thing we need to be careful about is the fact that it has an Extended Address Register. The W25Q256FV chip has one as well and all read/write/erase commands overwrite it. If MX25L25645G has the same behavior, then EAR needs to be cleared after every command touching the >16M region.

As you said, to deal with soft reset, my patch is the proper way. Although it not have any function to handle watchdog reset. Apply this patch have any harmful? no! It provide a proper way without side effect for all 32MB SPI NOR flashrom.
Felix make W25Q256FV can handle high percent of opportunity recover from unexpected system failure, I agree, but there is no conflict between two patches.

LEDE seems do not recognized MX25L256645G, in kernel log i paste in bug file, it shows as mx25l25635e, is there any way get the correct flash part ID?

While browsing through QSDK quite some time ago I've found that they have faced probably similar issue. Here are commits that I've managed to find at 1st glance, maybe it can be helpful somehow. Seems like one of them is the solution close to what @ffee proposes

https://source.codeaurora.org/quic/qsdk/oss/kernel/linux-msm/commit/?h=eggplant&id=ab86d30aab517334834b3bb837d696cb0c661233

https://source.codeaurora.org/quic/qsdk/oss/kernel/linux-msm/commit/?h=eggplant&id=ecb93aa88f8b378467a1c5e4c64ebed0fb938f64

@nbd To extend your patch to MX25L256645G, I add SECT_4K | SPI_NOR_4B_READ_OP flag to spi_nor_ids struct in spi-nor.c. When testing soft reset bug, it was fixed.
But I do not find a way get MX25L25635E and my MX25L25645G distinguished. Those chips have the same JEDEC id: c2, 20, 19.

--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -1070,7 +1070,7 @@
        { "mx25u6435f",  INFO(0xc22537, 0, 64 * 1024, 128, SECT_4K) },
        { "mx25l12805d", INFO(0xc22018, 0, 64 * 1024, 256, 0) },
        { "mx25l12855e", INFO(0xc22618, 0, 64 * 1024, 256, 0) },
-       { "mx25l25635e", INFO(0xc22019, 0, 64 * 1024, 512, 0) },
+       { "mx25l25635e", INFO(0xc22019, 0, 64 * 1024, 512, SECT_4K | SPI_NOR_4B_READ_OP) },
        { "mx25u25635f", INFO(0xc22539, 0, 64 * 1024, 512, SECT_4K | SPI_NOR_4B_OPCODES) },
        { "mx25l25655e", INFO(0xc22619, 0, 64 * 1024, 512, 0) },
        { "mx66l51235l", INFO(0xc2201a, 0, 64 * 1024, 1024, SPI_NOR_QUAD_READ) },
1 Like

as i thought, i have to replace the 32MB flash chip to 16MB flash chip, i tested again current trunk:
from 2018-01-19

tested: WG3526 (MT7621) 32MB => soft reset => OK, working good :slight_smile:
cu camel

1 Like