Belkin RT3200/Linksys E8450 WiFi AX discussion

Reading about the OKD issue has held me back from upgrading the 2 routers I have. They have run stably for a long time now, providing both the 2.4 GHz and 5 GHz networks. I do not have anything special installed on these devices. I remember running the UBI installation to change the partition layout on both of these devices at that time.

Device 1

OpenWrt 22.03-SNAPSHOT, r19385-f765f2f114
grep Built /dev/mtd0ro
Built : 19:55:52, May 6 2022

Device 2

OpenWrt 22.03-SNAPSHOT, r19609-0179ba7851
grep Built /dev/mtd0ro
Built : 19:55:52, May 6 2022

I have a couple of questions:

  1. If I have understood correctly, I should be able to update these to 23.05 by downloading a single image and using the upload button?

  2. Has OKD been seen with the 23.05 versions?

  3. Has anybody who has had OKD been able to read the failing boot partition back and do a hex-dump diff of that partition's data before and after the OKD? I mean, does the corruption always start at the same location in that partition?

  4. I am wondering whether a recovery mechanism could be to duplicate the failing partition and store a checksum of its bytes at the end. During boot, the checksum could be recalculated by reading the bytes and, if it does not match, the backup copy of that partition would be copied over it before booting into it.
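To make questions 3 and 4 concrete, here is a minimal shell sketch that works on ordinary dump files on a PC, not on the router's boot chain. The function names and the `.sum` sidecar file are hypothetical, chosen only for illustration; where such a check could actually run during boot is a separate question that this sketch does not answer.

```shell
#!/bin/sh
# Sketch only: operates on plain files, e.g. saved dumps of a partition.

# Question 3: print the 0-based offset of the first differing byte,
# or nothing if the dumps match over their common length.
first_diff_offset() {
    cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{ print $1 - 1 }'
}

# Question 4: store a SHA-256 of the image next to it (hypothetical layout).
seal_image() {
    sha256sum < "$1" | cut -d' ' -f1 > "$1.sum"
}

# Verify the primary image against its stored checksum. On mismatch,
# copy the backup over it, but only if the backup itself verifies.
verify_or_restore() {
    primary=$1 backup=$2
    want=$(cat "$primary.sum") || return 2
    [ "$(sha256sum < "$primary" | cut -d' ' -f1)" = "$want" ] && return 0
    [ "$(sha256sum < "$backup" | cut -d' ' -f1)" = "$want" ] || return 2
    cp "$backup" "$primary"
}
```

Running `first_diff_offset` over dumps from several affected routers would show whether the corruption tends to start at the same offset.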

I also have one unboxed spare Belkin router that I could flash for testing purposes. Would it be more useful for me to flash the 23.05 version to it or the latest snapshot?

I am running my own snapshot build from a git pull from early Feb 2024 and I have yet to encounter OKD myself on any of my 3 E8450 routers. My routers do not experience abrupt power-off scenarios all that often, so that may be a key reason. Also, my setup basically leaves the flash/UBI storage alone during operation, as I do not store operational data/logs on the flash chip; they mostly go to the tmpfs in RAM. I'm quite paranoid about wearing out my router's flash storage ... heh heh.

All my routers are running installer v1.0.2 or earlier.

At the moment there are no concrete findings on the root cause of the OKD, but lots of theories, ranging from insufficient flash chip drive current and bad flash blocks to bugs in the flash chip drivers in the bootloader (possibly later versions) and/or OpenWrt.

If your spare router is expendable, and you don't mind the headache of OKD and like using bleeding-edge software, maybe try the latest installer with snapshot builds and add more data points on the OKD if and when it happens. Just make sure you have all the factory data backed up in case you need to recover it.

On the plus side, you get a better mt7915e driver, as snapshot builds from early Feb 2024 allow the stable use of WED. WED should not be used with OpenWrt 23.05 or earlier releases.

For the router that currently has OpenWrt installed just update it to 23.05. Don’t update the uboot firmware.

If you wish to run snapshots then you’ll be forced to update the uboot firmware and you won’t be able to go back to stable very easily. There have been many OKD reports with the beta uboot firmware.

For the unboxed router, if it were me, I would install v1.0.2 of the uboot firmware and then install OpenWrt 23.05.x

I still don’t think we’ve seen an OKD reported by someone running <=v1.0.2, which is why I wouldn’t use any newer version.


My router encountered OKD in September 2023 while using v1.0.2; v1.0.3 had not yet been released.

I think the most likely reason is that I upgraded from v1.0.0 to v1.0.1, and then to v1.0.2.

When upgrading the bootloader and u-boot, bits in the written data may be flipped, eventually resulting in OKD.

In other words, the data written through these versions of u-boot is not very stable and is prone to unpredictable errors.

Restated, your theory is that OKD risk is proportional to the number of times the uboot firmware is updated, not the number of times a new sysupgrade image is loaded? As someone who cannot resist loading the latest sysupgrade every month or two, I like this theory!

I have set up two of these. One is running as a dumb AP on the latest snapshot and I feel like I've been rolling the dice on every sysupgrade. Like a moth drawn to the flame LOL...

The second has been running as an all-in-one gateway untouched since loading snapshot circa the initial 23.05 release with the latest uboot update then available. It was a gift to someone known to make slanderous claims that her father had mixed success constantly fiddling with the internet before they left the nest. Consequently, she subscribes to the theory "It works fine now. Please don't touch it!" She and quarky would probably get along :wink:

Both have minimum clock bumped to 600 MHz. No OKD yet. Fingers crossed.


One of my routers has been upgraded twice with the installer. I forget why it was upgraded again, but my other 2 E8450s were only flashed with the installer once and, as I've said previously, all are on the v1.0.2 installer or earlier.

So far all 3 are happily churning away ... fingers crossed.

Apologies if this is unhelpful or has been asked before, but I was just spitballing and thought I'd throw out some ideas (likely not helpful...) that I had.

A: Boot from external?
Would it be at all possible to configure these things to run off of an SD card or USB drive instead of the internal firmware?

B: Hardware modify to boot from external
Heck, even if it meant soldering something to the board to force them to try to boot from external memory device first?

Or, is the issue that the error we all experience occurs so early in the bootup process that any custom configurations that tell things to "boot from external" would be farther down the chain?

C: Any chance related to "backup" behavior?
And, I was kind of wondering too ... is there ANY chance the OKD behavior might be related to the fact that these things have an "on fail, load backup firmware" behavior that the UBOOT setup eliminates? Like, could it be possible that some low level "thing" is looking at an archaic checksum and says "dude, that is messed up, boot from the backup" and then -foom- since there's no backup, something goes into a "no go" condition?

If you liked these, subscribe for more unhelpful ideas of mine in the future! :slight_smile:


The version of U-Boot provided with OpenWRT contains the code and support to enable loading a file from a USB flash drive. Therefore, you could most likely configure the router to load and then boot from a file stored on a USB drive.

Unless you're masochistic and ready to dive deep into 'There Be Hungry Sharks Here' territory, you don't want to go there. It might be possible, but it's really not worth the effort unless your goal is to learn the hard way.

Sadly, that's exactly the problem. The easy and normal place to configure the file from which to boot is done in U-Boot, and the error occurs before U-Boot can be loaded.

I think it unlikely based on the order of operations. Despite the differences in storage and the age of the code, the boot chain stages between stock and OpenWRT are effectively using the same software. The stock firmware had separate BL2 (preloader), ATF-A (ATF), and U-Boot (Bootloader) partitions, but it still had to continue to the same point in the chain (U-Boot) before it could switch between primary and backup firmware images. The difference between the two is that OpenWRT makes use of newer versions of the same boot packages, and it also simplifies the process by combining the preloader and ATF-A into one BL2 partition.

On an OKD-affected device, we still succeed in loading up through BL2+ATF-A. However, it fails in trying to find the fip at its expected location. Since the fip contains U-Boot and it hasn't loaded when OKD presents, the Primary/Secondary boot switch would not have any effect.

To be fair, the UBI installers do already take advantage of the backup boot image idea, and in a reliable manner. If the main firmware image is corrupt and can't boot, the device will fall automatically back into the recovery environment.

Just adding to the conversation with some background and a step-by-step of what seems to have worked for me to get the router booting again, in case someone else is in the same boat and having trouble. I have two Belkin RT3200s that I put into service only about 4 months ago after doing a lot of research and seeing good things about stability/reliability on OpenWrt. I'm using the first as my router and providing wifi, and the second as a wired AP. (So it is distressing to know that I might be just a reboot away from a brick again.)

It appears that I had used the 1.0.3 installer and was running 23.05.0. I had not done any further flashes or upgrades, and the router had been running stably with uptime measured in months.

On Sunday I was hit by the OKD. We had a power outage that day and I have my main RT3200 on a battery backup, so our internet was not interrupted during the 2-hour outage, but later I rebooted via the web while troubleshooting another issue with my VPN and the wifi/network never came back. I had no lights or sign of life upon pulling the power cord and flipping the on-off switch.

Stupidly, I had not backed up the factory or other partitions from the main device yet (yeah, no excuse).

I have now downloaded the four partitions mtd0-3 from the LuCI web interface and scp'd the boot_backup OEM files from my 2nd, working router using the following method, and the files appear to be the correct sizes referenced above. I had seen references to using dd to image the partitions as a backup, but this method, described on the dangowrt GitHub in the section on backing up the stock vendor bootchain, seemed easier. Perhaps a guide should be added to the wiki for this model?

#this is for my router which used the 1.0.3 installer and has UBI but not the newest 1.1.x changes
#from your computer, ssh into the router (terminal, putty)
ssh root@192.168.1.1

#make a directory and mount the boot_backup partition while SSH'd into the router
mkdir /tmp/boot_backup
mount -t ubifs ubi0:boot_backup /tmp/boot_backup

#use powershell to copy files from that folder to your pc in scp mode
scp -r root@192.168.1.1:/tmp/boot_backup C:\Temp\boot_backup
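For comparison, the dd approach mentioned above might look roughly like the sketch below. It is not taken from the wiki or the installer documentation. The function name `backup_mtd` and the output file naming are made up for illustration, and it assumes the usual `/proc/mtd` line format (`mtd0: 00080000 00020000 "bl2"`) and the read-only `/dev/mtdNro` nodes already used in this thread.

```shell
#!/bin/sh
# Sketch: image every partition listed in a /proc/mtd-style table.
# $1 = table file, $2 = directory holding the mtdNro nodes, $3 = output dir.
backup_mtd() {
    table=$1 devdir=$2 outdir=$3
    mkdir -p "$outdir" || return 1
    # Turn each data line (mtd0: 00080000 00020000 "bl2") into "mtd0 bl2".
    sed -n 's/^\(mtd[0-9]*\): [0-9a-fA-F]* [0-9a-fA-F]* "\(.*\)"$/\1 \2/p' "$table" |
    while read -r dev name; do
        # Read from the read-only node so nothing is written by accident.
        dd if="$devdir/${dev}ro" of="$outdir/${dev}_${name}.bin" bs=64k 2>/dev/null || exit 1
    done || return 1
    # Record checksums so the copies can be verified after transfer.
    ( cd "$outdir" && sha256sum ./*.bin > SHA256SUMS )
}
```

On the router this would be called as `backup_mtd /proc/mtd /dev /tmp/mtd_backup`, after which the directory can be copied off with scp as shown above and checked on the PC with `sha256sum -c SHA256SUMS`. Keep in mind that /tmp lives in RAM, so the images need to fit there.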

I tried cooling my OKD'd router in the refrigerator for an hour, and it did not boot successfully on its own (still no LEDs lighting up on the front). My 3.3v USB-serial cable arrived today, and I was able to get mtk_uartboot to boot the router up successfully. The good news is that I was able to download all of mtd0-mtd3 from LuCI, and I was also able to SSH in and save the boot_backup versions as well, so now both of my routers have current and original mtd backups!

But even after multiple tries, it seems that by the time the mtk_uartboot command finishes running and kicks me over into putty.exe, the router has already booted, so I couldn't interrupt the process to get to the Uboot menu and rewrite the FIP.

The thing that worked for me was to control-C the mtk_uartboot shell when the following line showed:

FIP sent.

And switch to another command window where I had pre-pasted the 2nd part of the command and quickly hit return to run it:

putty.exe -serial COM3 -sercfg 115200,8,n,1,N

This dumped me into the serial shell with just 1-2 seconds left on the boot count-down and I was able to arrow down to the Uboot and run the command to re-write my FIP:

#in putty, since I did not run 1.1.1 UBI installer and was on 23.05.0, run command to re-copy the FIP
mtd read fip $loadaddr 0x0 0x140000 && mtd write fip $loadaddr 0x0 0x140000

So far, the router has cold-booted two more times successfully and I placed it back into service.

Here's hoping we get a more permanent fix soon.

I upgraded my Belkin RT3200 from 22.03.0 to 23.05.3 and LuCI didn't work. I saw that there were luci* packages that I had to upgrade, but what I didn't expect is that when I reboot the router it doesn't come back; I have to toggle the power button for it to come back. Has this happened to anyone else?

How do you identify the version of the installer that was used?

I did it from the firmware selector searching for Belkin RT3200 UBI

There was this post in the thread that suggests a possible way.


I upgraded my Belkin RT3200 from 22.03.0 to 23.05.3, and when I do a reboot the router doesn't come back; I have to toggle the power button for it to come back.

this is the installer version:

# grep Built /dev/mtd0ro
Built : 12:38:48, Jan 27 2022
Built : 12:38:48, Jan 27 2022
Built : 12:38:48, Jan 27 2022
Built : 12:38:48, Jan 27 2022
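Since the same stamp shows up several times in that output, a small wrapper can reduce it to the distinct values. This is only a convenience sketch; the function name `bl2_build_stamps` is made up, and it reports the build stamp only, leaving the mapping from stamp to installer version to the posts in this thread.

```shell
#!/bin/sh
# Print each distinct "Built :" stamp found in a bl2 dump once.
# -a treats the binary input as text, -o prints only the matching part.
bl2_build_stamps() {
    grep -ao 'Built : [0-9:]*, [A-Za-z]* *[0-9]* [0-9]*' "$1" | sort -u
}
```

On the router it would be called as `bl2_build_stamps /dev/mtd0ro`; it works equally well on a saved copy of mtd0 on a PC.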

I think I explained myself badly. It is not exactly an OKD; the router simply does not come back when I reboot it. I have to turn it off and on again with the button, and then it starts normally.

That would suggest I used version 1.0.0
No problems with OKD, running 23.05.3
Power outages are extremely rare here, so I only ‘hot reboot’ the router maybe once every 2 months.

The same happened to me. I used the 1.0.2 installer when I bought my RT3200 and was on 22.03, and then after updating to 23.05.x I had problems when issuing a reboot (from either LuCI or the SSH reboot command).

The solution that works for me is to set the minimum CPU frequency to 600MHz instead of the default 437MHz. Other people use the performance governor, which fixes the CPU speed at 1.3GHz (I think). Anyway, here's the command I have in /etc/rc.local (it can be edited in LuCI under System > Startup > Local Startup):

echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

It will probably work for you too. I left the default governor (I think it is ondemand) and the router reboots just fine.
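A slightly more defensive variant of that rc.local line is sketched below. It assumes the standard cpufreq sysfs files `scaling_available_frequencies` and `scaling_min_freq` are present in the policy directory; the function name `set_min_freq` is made up for illustration.

```shell
#!/bin/sh
# Sketch: raise the cpufreq minimum, refusing values the driver does not list.
# $1 = policy directory, $2 = frequency in kHz.
set_min_freq() {
    policy=$1 khz=$2
    avail="$policy/scaling_available_frequencies"
    if [ -r "$avail" ]; then
        # The file holds a space-separated list of supported frequencies.
        case " $(cat "$avail") " in
            *" $khz "*) ;;
            *) echo "unsupported frequency: $khz" >&2; return 1 ;;
        esac
    fi
    echo "$khz" > "$policy/scaling_min_freq"
}
```

In /etc/rc.local this would be called as `set_min_freq /sys/devices/system/cpu/cpufreq/policy0 600000`. If the list of available frequencies is not exposed, the function falls back to writing the value directly, the same as the one-liner above.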

Cheers!

Those dates suggest you installed using installer 0.6.2. So far the OKD issue only (mostly?) seems to affect installer 1.0.3.

Another data point to add. Router had been up for 26 days, rebooted and got OKD today. Recovered immediately via the wiki's mtd read/write fip steps.

N.B. Running installer 1.0.3 and 23.05.3 firmware.

This has worked. Thank you so much
Greetings