I'm testing the latest OpenWrt snapshot (r20029-3c06a344e9) on a WD N750, running the 5.15.50 testing kernel, and observed the same SquashFS errors on the initial boot after flashing.
Update: possible improvement! After a reboot there were no errors, and there have been none for the past 7 days. There were also no issues on a second reboot after 7 days of uptime. In the past, on OpenWrt 21 and 22, I would see a few SquashFS errors immediately after a reboot.
I built the current firmware image from source using the July 7th snapshot, selected the testing kernel, and am running 5.15.50. I'll keep monitoring and report back.
I have one running 22.03.0-rc5 (no extra packages installed) with no issues.
I have an issue with another one. It continued to have issues with saving configs, pre-installing more packages into the image with the firmware selector, etc. I believe I reverted to v19 and then accidentally sysupgraded remotely to 22.03.0-rc4 (with the needed packages), so I will update on this one later.
I was running custom images of 22.03.0 for about a month. Yesterday I noticed I was getting ash I/O errors when typing commands, and I/O errors when attempting to scp to device A. To reboot device A, I had to execute 'busybox reboot'.
I had the snmpd and WireGuard LuCI packages installed.
I've now flashed an image (without LuCI) to see if that helps
On a second device (device B), I wanted to flash another 22 image with tcpdump included, but it merely reboots without flashing. It has the same extra packages as device A. I will have to go on-site to troubleshoot device B and install the non-LuCI image with snmpd, tcpdump and wireguard-tools.
A third device (device C) was upgraded from 21 to the stock 22 image from the downloads site, with only network and Wi-Fi configs (dumb AP). Still no issues with device C.
Device D was recently upgraded from 19 to 22 with relayd in the custom image (and LuCI included). No issues thus far.
FYI: I only recall seeing the errors after I installed software or edited a file. I see no errors in the log of device B.
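To check a device's log for the kind of errors described in this thread, a small filter over `logread` or `dmesg` output can help. This is a minimal sketch, assuming you copy the log off the router (for example with `logread > log.txt`) and run Python 3 on another machine. It relies on the kernel prefixing SquashFS errors with `SQUASHFS error:` and JFFS2 messages with `jffs2: `, and it skips JFFS2 `notice:` lines, which can appear during a normal mount.

```python
import re
import sys

# SquashFS logs errors as "SQUASHFS error: ...". JFFS2 prefixes its messages
# with "jffs2: "; "jffs2: notice: ..." lines are informational, so skip them.
PATTERNS = {
    "squashfs": re.compile(r"SQUASHFS error:"),
    "jffs2": re.compile(r"jffs2: (?!notice:)"),
}


def count_fs_errors(lines):
    """Return a dict mapping filesystem name to the number of matching lines."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_log.py log.txt  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(count_fs_errors(f))
```

A clean log should report zero for both entries; any nonzero count is worth reading in full, since the matching lines include the flash offsets involved.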
I have N600s and N750s and have been struggling with these issues. I have some observations that will hopefully be of some use.
I suspect that snmpd now writes /usr/lib/snmp/snmpd.conf repeatedly (it did not in 19.07), so if there is an issue with writing to JFFS2, it will eventually be triggered.
The spi-nor chip in the N750 is an MX25L12835e rather than an MX25L12805d (I opened two of mine and looked at the bottom of the circuit boards), but the kernel believes it sees an MX25L12805d. The two are close, but in particular the software write-protection lock bits are interpreted differently, so if BP0-BP3 are used to lock regions, the regions actually locked will not match what the kernel expects. The MX25L12835e has a minimum lock region of 128K rather than 64K. I'm not certain it matters, but if, for example, the last MTD partition is software-locked, the final 64K of the firmware partition will also be inadvertently locked on an MX25L12835e.
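The 64K vs. 128K point can be made concrete with a little arithmetic. The sketch below is a simplified model, not either chip's actual BP table: it assumes a 16 MiB (128 Mbit) chip, a hypothetical 64K last partition at the very top of flash with the firmware partition ending directly below it, and top-of-flash protection regions that start at the chip's minimum lock size and double from there.

```python
KIB = 1024
FLASH_SIZE = 16 * 1024 * KIB  # assumed 16 MiB (128 Mbit) chip
LAST_PART = 64 * KIB          # assumed size of the last partition


def protected_region(flash_size, request, min_region):
    """Smallest top-of-flash region covering `request` bytes, assuming the
    BP bits can express regions of min_region, 2*min_region, 4*min_region, ..."""
    size = min_region
    while size < request:
        size *= 2
    return flash_size - size, flash_size


def firmware_bytes_locked(flash_size, last_part, min_region):
    """Bytes of firmware caught by the lock, assuming the firmware partition
    ends exactly where the last partition starts."""
    firmware_end = flash_size - last_part
    start, _ = protected_region(flash_size, last_part, min_region)
    return max(0, firmware_end - start)


for chip, min_region in (("MX25L12805d", 64 * KIB), ("MX25L12835e", 128 * KIB)):
    locked = firmware_bytes_locked(FLASH_SIZE, LAST_PART, min_region)
    print(f"{chip}: {locked // KIB}K of firmware locked")
```

Under these assumptions, locking the last partition touches no firmware with a 64K minimum, but with a 128K minimum the lock has to round up and takes the final 64K of the firmware partition with it.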
The MX25L12835e also processes RDSFDP, which the MX25L12805d does not. I didn't dig far enough into the kernel source to see whether SFDP is checked in our situation, but that may be a difference between 19.07 and current.
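For anyone who wants to check whether a given chip answers RDSFDP, the reply begins with a fixed 8-byte header defined by JEDEC JESD216: the ASCII signature `SFDP`, a minor and a major revision byte, a zero-based count of parameter headers, and a final byte (the access protocol in later revisions). The sketch below only parses that header; it assumes you have already captured the raw reply to the RDSFDP command (opcode 0x5A, which Linux names `SPINOR_OP_RDSFDP`) by some other means, such as an external programmer, and the function name is illustrative. A chip that ignores the command would not return the signature; an all-0xFF reply is one plausible result.

```python
SFDP_SIGNATURE = b"SFDP"  # bytes 0x53 0x46 0x44 0x50, in that order


def parse_sfdp_header(data):
    """Parse the 8-byte SFDP header from a raw RDSFDP reply.

    Returns None if the reply is too short or lacks the "SFDP" signature,
    which is what a chip without SFDP support would be expected to produce.
    """
    data = bytes(data)
    if len(data) < 8 or data[:4] != SFDP_SIGNATURE:
        return None
    minor, major, nph, access = data[4], data[5], data[6], data[7]
    return {
        "major": major,
        "minor": minor,
        "param_headers": nph + 1,  # the NPH field is zero-based
        "access_protocol": access,
    }
```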
(For anyone mystified by this: the handling of serial flash memory changed significantly between, I hope I have this right, kernel versions 4 and 5, which is also the boundary between OpenWrt 19 and later releases, where the N600/N750 became unreliable.)
I have built some kernels with modifications that attempt to turn off software write protection entirely, by setting these bits to zero early and not processing attempts to lock anything. I've still had some issues, but these are my first attempts at OpenWrt builds, so it's too early to draw many conclusions.
I'm still working on this, but I wanted to get some of it on the record.
Here's an update: the software write protection bits were not the whole problem; after installing some fairly large packages, the filesystem errors started again.
I have now built a new kernel that omits the SECT_4K flag for the sp25l128[05d,35e] entry in spi-nor/macronix.c, and this seems to improve things: sysupgrade works (at least on a device that was not yet showing filesystem errors), and I was able to install some large packages without developing errors. Some MikroTik discussion inspired trying that.
This is using 22.03.2, with everything standard except the tweaks in macronix.c (so kernel 5.10.146)
I guess I'll have to wait a while to see whether that actually fixes things.
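One way to see what a SECT_4K change does on a running device is to compare the erase size the kernel reports for the MTD partitions in `/proc/mtd` before and after the patch: 0x1000 means 4K sector erases are in use, 0x10000 means 64K block erases. Whether SECT_4K actually changes this also depends on kernel configuration such as `CONFIG_MTD_SPI_NOR_USE_4K_SECTORS`, so the reported value is the thing to check rather than assume. The parser below is a minimal sketch that expects the usual `mtdN: <size hex> <erasesize hex> "name"` line format.

```python
import re

# Typical /proc/mtd entry:  mtd0: 00040000 00010000 "u-boot"
# (size and erasesize are hexadecimal byte counts)
MTD_LINE = re.compile(r'^(mtd\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"(.*)"$')


def parse_proc_mtd(text):
    """Return (device, size, erasesize, name) tuples; the header line is skipped."""
    parts = []
    for line in text.splitlines():
        m = MTD_LINE.match(line.strip())
        if m:
            dev, size, erase, name = m.groups()
            parts.append((dev, int(size, 16), int(erase, 16), name))
    return parts


def erase_sizes(parts):
    """Map each distinct erase size (bytes) to the partitions that report it."""
    sizes = {}
    for dev, _size, erase, name in parts:
        sizes.setdefault(erase, []).append(f"{dev} ({name})")
    return sizes
```

For example, `erase_sizes(parse_proc_mtd(open("/proc/mtd").read()))` on the device (if Python is installed), or the same call on a copy of `cat /proc/mtd` output taken to another machine.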
Outstanding work @Bradford. Thanks for the thorough investigation and summary. I hope you keep up the testing and report back with your findings, as many of us continue to use the N600 and N750.
I've linked your post to a GitHub issue that was opened in October 2021. I think the users there would benefit from knowing what you've discovered as well.
The original post in that thread is from a WD N750 owner. At some point a moderator arbitrarily changed the title. The reality is the issue seems to affect multiple platforms.