Unfortunately this looks like the bootloader got corrupted and can't be loaded. Probably a bit flipped on the NAND chip...
The only way to recover from this is to use JTAG to re-write the bootloader. The JTAG pads are exposed and can be accessed on the board; you can then use OpenOCD to load U-Boot into RAM and use that to re-run the installer, which will re-write everything.
Yeah I figured it was something like that.
I've not done something like that before, but it can't be too bad. It looks like I need a JTAG adapter. I can probably figure out all the hardware bits, but I have done some searching and am not sure how to compile U-Boot for this specific machine. Do I need someone to pull down a working version for this machine, or is it simpler than that?
I found this as an example but will need to do some more reading.
In my case, sometimes my 5GHz radio goes crazy and I have to either restart it or reboot the device. I guess that's because I use 160MHz channels, but IDK. I'll try 80 and see how it goes.
It looks like the bootloader can no longer be read from the flash. It could be a bit-flip on the NAND; after all, that Fidelix FM35Q1GA is a rather low-end, cheap flash chip, so this happening to some of the devices after a longer time of use is not too surprising.
If you have the option to do so, try connecting the JTAG pads and re-flash the bootloader by booting U-Boot into RAM using OpenOCD.
Before that, you could dump what is in the flash right now (incl. OOB data) so that we can find out what has happened (or at least what the damage looks like).
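Once a dump exists, one rough way to see what the damage looks like is to compare it byte for byte against a known-good copy of the same region. The sketch below only uses `cmp -l`; the function names and any file names passed to them are made up for illustration, and it assumes both files were captured with the same layout (for example both without OOB data).

```shell
#!/bin/sh
# Sketch: compare a flash dump against a known-good image of the same region.
# The function names are hypothetical; pass in whatever files you actually have.

# Print the number of bytes that differ between the two files.
count_diff_bytes() {
    # cmp -l prints one line per differing byte and exits non-zero when the
    # files differ, so the exit status is ignored on purpose.
    { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Print the first differing bytes: 1-based decimal offset, then both byte
# values in octal. $3 limits the number of lines shown (default 20).
show_diff_bytes() {
    { cmp -l "$1" "$2" 2>/dev/null || true; } | head -n "${3:-20}"
}
```

A count of one or two differing bytes would point towards isolated bit-flips, while large runs of differences would suggest something else went wrong.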
I was thinking of trying to repair mine this weekend finally, will have a little free time. Am going to fire up the raspberry pi and try that method for now.
As far as dumping what is in the flash right now, is that something I can do as part of re-flashing through the JTAG pads? If so, I can try to do that as well and see what I can get.
I haven't had any hardware issues, at least not any visible signs of one that I'm aware of. But something interesting did happen to my unit once: when I rebooted it over LuCI, it seemingly powered off. No LED or any indications of any kind. I always did wonder if Linux on embedded devices like these could be powered off, and it turns out it can, though not intentionally. I powered it back on by flicking the device's power switch off and on again. Nothing seems wrong with it since that little episode.
This happens because the default min clock speed is too low and the device fails to boot up. You need to set a min speed of 600MHz in the /etc/rc.local file.
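A rough sketch of what that could look like in /etc/rc.local. It assumes the standard Linux cpufreq sysfs layout with a single policy directory (policy0), which is not confirmed anywhere in this thread, and the function name `set_min_freq` is made up for illustration. sysfs expects the value in kHz, so 600MHz is written as 600000.

```shell
#!/bin/sh
# Sketch: raise the cpufreq minimum frequency at boot.
# Assumption: standard cpufreq sysfs layout with one policy (policy0).
POLICY_DIR="${POLICY_DIR:-/sys/devices/system/cpu/cpufreq/policy0}"

# set_min_freq KHZ [POLICY_DIR]
#   KHZ        minimum frequency in kHz (600 MHz = 600000)
#   POLICY_DIR optional override of the sysfs policy directory
set_min_freq() {
    dir="${2:-$POLICY_DIR}"
    if [ -w "$dir/scaling_min_freq" ]; then
        echo "$1" > "$dir/scaling_min_freq"
    else
        echo "cannot write $dir/scaling_min_freq" >&2
        return 1
    fi
}
```

In rc.local the call itself would be `set_min_freq 600000`, placed before the final `exit 0`. Reading `scaling_min_freq` back afterwards shows whether the value was accepted.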
Oh? How did you find out? Also, this happened only once, so does that mean it automatically tries to operate at higher clock, and only this time had to run at min clock for whatever reason?
This only sets the min frequency the CPU runs at. It doesn't mean the CPU won't go faster. It sets how slow it can go. It's actually related to the voltage of the SoC: at such a low speed the voltage is too low, which prevents the system from booting up properly. By setting it to 600MHz, the voltage of the SoC can be high enough to boot up properly.
I was fairly sure the scaling_min_freq issue was mentioned on the toh wiki page; however, it doesn't seem to be there anymore and I can't find it in the history either. I must be misremembering. Should it be added near the governor section? The only other way to know about it is to read through all messages in this thread.
I thought about this a little more and there is another possibility apart from flipped bit(s) on the flash: the RAM being broken. And now that this cpufreq issue came up again, it makes me think: what happens if the device gets stuck trying to calibrate DDR RAM at too low a voltage for many hours or days? Can that break the RAM? Maybe we should have listened to the MediaTek engineers, who very clearly and repeatedly stated that they recommend running MT7622 only at full speed and only ever did QA that way?
If the RAM were broken, that would likely also manifest in other kinds of failures, I would guess. It is a bit hard to imagine a failure that would specifically affect just the bootloader.
But running at full speed with "performance" might still be preferred for several reasons:
Some QoS qdiscs are vulnerable to changing CPU speeds, so a stable CPU speed might help with calculating, especially with bursty traffic that causes CPU speed to vary.
Despite my intuition telling me that the additional thermal cycling associated with scaling up and down under varying loads would accelerate degradation (compared with a more constant, albeit higher, temperature), I had understood that maintaining lower temperatures overall might help increase longevity:
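For anyone who wants to try the "performance" governor, here is a minimal sketch. It assumes the standard cpufreq sysfs files `scaling_governor` and `scaling_available_governors` under a single policy directory (policy0); the function name `set_governor` is made up for illustration.

```shell
#!/bin/sh
# Sketch: switch the cpufreq governor, but only if the kernel offers it.
# Assumption: standard cpufreq sysfs layout with one policy (policy0).
POLICY_DIR="${POLICY_DIR:-/sys/devices/system/cpu/cpufreq/policy0}"

# set_governor NAME [POLICY_DIR]
set_governor() {
    gov="$1"
    dir="${2:-$POLICY_DIR}"
    # scaling_available_governors is a space-separated list; -w matches whole words.
    if grep -qw "$gov" "$dir/scaling_available_governors" 2>/dev/null; then
        echo "$gov" > "$dir/scaling_governor"
    else
        echo "governor '$gov' is not available in $dir" >&2
        return 1
    fi
}
```

Calling `set_governor performance` would pin the CPU at its highest frequency; the setting does not survive a reboot unless it is applied again at boot.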
Any thoughts?
@mikewagnercmp and @NodeNovelty, regarding your failed devices: were you both using 'ondemand', and how regularly were you rebooting your devices? Any special, relevant considerations like proximity to magnets or ambient temperature? Has either of you been able to fix yours?