Upgrade from Barrier Breaker 14.07 to latest 21.02 (WD My Net N750)

Could an uninitialised clock be the reason for the boot cycle (periodic reboots)?

This linked issue is related to a cron setting.

  • The poster noted that the device was working and soft booted multiple times before issues
    • All others noted something similar
  • In any case, it worked in version 19
  • In my case, I'm sure I don't have a bad cron config

I'm testing the latest OpenWrt snapshot (r20029-3c06a344e9) with the 5.15.50 testing kernel on a WD N750, but I observed the same SquashFS errors on the first boot after flashing.

Update: Possible improvement! After a reboot there were no errors, and there have been none for the past 7 days. No issues on a second reboot after 7 days of uptime either. In the past, on OpenWrt 21 and 22, I would see a few SquashFS errors immediately upon reboot.

I built the current firmware image from source using the July 7th snapshot with the testing kernel selected, so it is running 5.15.50. I'll keep monitoring and will report back.

  • I have one running 22.03.0-rc5 (no extra packages installed), no issues
  • I have an issue with another one. It continued to have issues saving configs, pre-installing more packages into the image with the Firmware Selector, etc. I believe I reverted to v19 and then accidentally sysupgraded remotely to 22.03.0-rc4 (with needed packages). So, I will update on this one later

After a month of testing I’m sad to report the SquashFS errors continue with my WD N750, even when running the 5.15 kernel.

I was running a custom image of 22.03.0 for about a month. Yesterday, I noticed I was getting ash I/O errors when typing commands and I/O errors when attempting to scp to device A. In order to reboot device A, I had to execute 'busybox -reboot'

I had snmpd and Wireguard LuCI packages installed.
I've now flashed an image (without LuCI) to see if that helps

On a second device, I wanted to flash another 22 image with 'tcpdump' included. It merely reboots without flashing. Same extra packages as device A. I will have to go on-site to troubleshoot device B and install the non-LuCI image with snmpd, tcpdump and Wireguard-tools installed.

A third device, upgraded from 21, runs 22 from the downloads site with only network and wifi configs (dumb AP). Still no issues with device C.

Device D was recently upgraded from 19 to 22 with relayd in the custom image (and LuCI included). No issues thus far.

FYI - I only recall seeing the errors if I installed software or edited a file. I see no errors in the log of device B.

Device B:

root@OpenWrt:~# dmesg | grep CRC
[   13.184693] jffs2: notice: (542) jffs2_get_inode_nodes: Node header CRC failed at 0x00b51c. {677a,ffff,00000044,a4ef223e}
[   13.222067] jffs2: notice: (428) jffs2_get_inode_nodes: Node header CRC failed at 0x00d05c. {6579,ffff,00000044,a4ef223e}
[   41.936318] jffs2: Node CRC 2bdb746e != calculated CRC 2191a72f for node at 0000bf34
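
For anyone comparing logs across several devices, here is a minimal sketch of pulling the jffs2 CRC messages out of a saved dmesg dump. It assumes the two message shapes shown above are the only ones of interest; the function name `summarize_jffs2_crc` and the returned dictionary keys are made up for illustration and are not part of any OpenWrt tool. It is meant to run off-device on a copied log, not on the router itself.

```python
import re

# Patterns for the two jffs2 message shapes seen in the dmesg output above.
HEADER_CRC = re.compile(
    r"jffs2: notice: \(\d+\) \w+: Node header CRC failed at (0x[0-9a-fA-F]+)"
)
NODE_CRC = re.compile(
    r"jffs2: Node CRC [0-9a-fA-F]+ != calculated CRC [0-9a-fA-F]+ "
    r"for node at ([0-9a-fA-F]+)"
)


def summarize_jffs2_crc(dmesg_text):
    """Collect the flash offsets named in jffs2 CRC messages.

    Hypothetical helper: returns a dict with two lists of integer offsets.
    """
    header_offsets = []
    node_offsets = []
    for line in dmesg_text.splitlines():
        match = HEADER_CRC.search(line)
        if match:
            header_offsets.append(int(match.group(1), 16))
            continue
        match = NODE_CRC.search(line)
        if match:
            node_offsets.append(int(match.group(1), 16))
    return {
        "header_crc_failures": header_offsets,
        "node_crc_mismatches": node_offsets,
    }
```

Feeding it the saved output of `dmesg` from each device gives a quick way to see whether the failing offsets repeat between boots or move around.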

EDIT- Device B was repaired by:

  • reset to defaults
  • flash official sysupgrade from Downloads Page (removed image with extra packages)
  • restore device's config archive
  • flashing an image that doesn't contain LuCI from the Firmware Selector (w/ snmpd, wireguard-tools and tcpdump pre-installed as needed)

I have N600s and N750s and have been struggling with these issues. I have some observations that will hopefully be of some use.

I suspect that snmpd writes /usr/lib/snmp/snmpd.conf repeatedly now (not in 19.07), so if there is an issue with writing to jffs2, it will eventually be triggered.

The spi-nor chip in the N750 is a MX25L12835e rather than a MX25L12805d (I opened two of mine and looked at the bottom of the circuit boards). The kernel believes that it sees a MX25L12805d. They are close but in particular the software write protection lock bits are interpreted differently, so if BP0-BP3 are used to lock regions, they will be inconsistent. The MX25L12835e has a minimum lock region of 128K rather than 64K. I'm not certain it matters, but if, for example, the last mtd partition is software locked, the final 64K of the firmware partition will also be inadvertently locked on a MX25L12835e (edited).
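
To make the lock-granularity point concrete, here is a rough arithmetic sketch. It assumes a 16 MiB (128 Mbit) part, that protection covers a region at the top of the flash, and that the last mtd partition is 64 KiB; the last two are assumptions for illustration, and the names `locked_range` and `spill_into_previous` are invented here. It does not model the actual BP0-BP3 encoding of either chip.

```python
KIB = 1024
FLASH_SIZE = 16 * 1024 * KIB  # 128 Mbit part = 16 MiB


def locked_range(requested_bytes, granularity, flash_size=FLASH_SIZE):
    """Return (start, end) of the region actually locked.

    Assumes the lock covers the top of the flash and can only grow in
    whole multiples of `granularity`.
    """
    units = -(-requested_bytes // granularity)  # ceiling division
    locked = units * granularity
    return flash_size - locked, flash_size


def spill_into_previous(requested_bytes, granularity, flash_size=FLASH_SIZE):
    """Bytes locked below the region that was meant to be protected."""
    start, _ = locked_range(requested_bytes, granularity, flash_size)
    intended_start = flash_size - requested_bytes
    return intended_start - start


if __name__ == "__main__":
    last_partition = 64 * KIB  # assumed size of the final mtd partition
    for granularity in (64 * KIB, 128 * KIB):
        spill = spill_into_previous(last_partition, granularity)
        print(f"granularity {granularity // KIB}K: "
              f"{spill // KIB}K of the preceding partition also locked")
```

Under these assumptions, a 64K minimum lock region protects only the last partition, while a 128K minimum also covers the 64K just below it, which is the inadvertent lock on the end of the firmware partition described above.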

The MX25L12835e will also process RDSFDP, which the MX25L12805d does not. I didn't dig far enough into the kernel source to see if SFDP is checked for our situation, but that may be a difference between 19.07 and current.

(To fill in, if this mystifies you: the handling of serial flash memory changed significantly between kernel versions 4 and 5, if I have this right. That is also the boundary between OpenWrt 19 and later, where the N600/N750 became unreliable.)

I have built some kernels with modifications that attempt to turn off the software write protection entirely, setting these bits to zero early, and not processing attempts to lock anything. I've still had some issues, but these are my first tries doing openwrt builds at all, so it's too early to draw many conclusions.

I'm still working on this, but I wanted to get some of it on the record.

Hope this helps!


Here's an update: the software write protection bits were not the whole problem; after installing some fairly large packages, the filesystem errors started again.

I have now built a new kernel which omits the SECT_4K flag for the mx25l128[05d,35e] entry in spi-nor/macronix.c, and this seems to improve things: sysupgrade works (at least on a device that was not yet showing filesystem errors), and I was able to install some large packages without developing errors. There was some MikroTik discussion that inspired trying that.

This is using 22.03.2, with everything standard except the tweaks in macronix.c (so kernel 5.10.146)

Have to wait a while to see if that actually fixes things, I guess.


@bradford Thanks for working on this.

I'm almost at the point of going down the extroot path and calling it a day.


Outstanding work @Bradford. Thanks for the thorough investigation and summary. I hope you keep up the testing and report back with your findings, as many of us continue to use the N600 and N750.

I've linked your post here to a GitHub Issues post that started in October 2021. I think the users there would benefit from knowing what you've discovered as well.

@bradford

Since it says ramips/mt7721, how is it related to the N750?

The original post in that thread is from a WD N750 owner. At some point a moderator arbitrarily changed the title. The reality is the issue seems to affect multiple platforms.


FYI. SquashFS errors persist on WD N750 running a December 7 2022 SNAPSHOT version, standard configuration and default packages.

What happens if you remove luci?

If there is an established dev who is willing to take a look at this issue, I would gladly send you an n750 with serial port attached.

What issue (specific to the N750)?

The most recent posts seem to imply it's an issue on multiple platforms.

BTW, I'm running 22.03.3 - it sysupgraded without issues. I'm using an image made using the Firmware Selector, without LuCI.

The squashfs errors on the n750 which persist in the latest builds.

See my post:

No squashfs errors when using an image that isn't too big.

  • Try luci instead of luci-ssl?

Thanks, I don't use LuCI of any sort. Command line config only.