Belkin RT3200/Linksys E8450 WiFi AX discussion

That pretty much exactly mimics the history of my OKD'd RT3200. Glad you have one in your hands.

2 Likes

Well, you know how it goes. One is a statistical anomaly. Two is a coincidence. The number of instances we've seen here and all of the common data points have finally narrowed this down to a purely software issue. I think the only reason it took so long is because we had so many other issues and potential causes to rule out, and every single one of them results in the same appearance and error (which also happens to be the first indication most people will ever see of an issue).

I'm glad that I finally have one in my hands to probe and test against, but who would have ever thought I would be happy to find a hard-earned USD 100 turn into a brick by sitting on a shelf? :laughing:

3 Likes

Massive kudos and thanks for trying all these things out and helping move this needle that hasn't moved for many months.

What's left? You've ruled out a few different things now, right?

1 Like

What's left is a good part of, if not everything else, that changed between TF-A 2.4 and TF-A 2.9. Sadly, that's a lot of changes.

1 Like

I know what you mean as I debated whether to say "glad" in my prior post. As always, I'm still happy to run test code on mine if you need a 2nd trial unit.

1 Like

Thank you. As long as you're willing and it's not needed for other purposes, we would be better off testing against multiple devices experiencing the issue. There are still a lot of potential causes and a lot of unknowns, so there are still chances that a patch that works on one might not work on another.

1 Like

Assuming worst case that a resolution isn’t discovered and tested prior to the 24.x freeze - Is sticking with TF-A 2.4, moving more stuff to UBI and maybe increasing reserved space for badblocks an option?

Is it imperative that this now discontinued device with its quirky NAND chip move to a newer TF-A?

Up until now I had mostly decided that without a resolution I would remain on 23.05.x indefinitely (it’s now just a dumb AP so not too concerned about security updates).

The great work done here over the last couple of days gives me new hope.

1 Like

We'd have a very hard time doing that, since TF-A 2.4 doesn't support FIP in UBI. Therefore, we would lose the main reason for making the change while simultaneously forcing users to go through extra steps to update.

4 Likes

If we find the root cause, is it possible to fix it in TF-A?

1 Like

We should be able to do so, yes. Since the router is not actually running with signed binaries, there is nothing preventing us from simply adding a local (to OpenWrt) patch if the fix hasn't been accepted upstream by the time we hit the change freeze for 24.x.

4 Likes

Not wanting to add an unnecessary distraction here -- but is the SPI interface using DMA transfers? If so, I would look into the programming of the DMA channel, the speed of the SPI bus and any DRAM controller settings. Are there diffs in these areas between versions? SPI/DMA timing errors can be totally transparent and indeed look like memory corruption.

Sorry, just my 2 cents.

4 Likes

Yeah, now I’m trying to compare how TF-A 2.4 differs from 2.9 in platform setup.
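For anyone who wants to help narrow down the comparison, here is a minimal sketch that lists which files differ between two local source trees. It assumes you have already checked out the two TF-A versions into separate directories. The function name `changed_files` and any directory names you pass in are hypothetical, and nothing here knows about the actual TF-A layout.

```python
import filecmp
import os


def changed_files(old_root, new_root):
    """Return sorted relative paths that differ or exist on only one side.

    Note: filecmp.dircmp compares file contents only when size or mtime
    differ, and directories present on one side only are listed by name
    without being descended into.
    """
    changed = []

    def walk(cmp, prefix):
        # Files with different content, plus entries unique to either tree.
        for name in cmp.diff_files + cmp.left_only + cmp.right_only:
            changed.append(os.path.join(prefix, name))
        # Recurse into directories that exist in both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(old_root, new_root), "")
    return sorted(changed)
```

Pointing it at just the platform and driver subdirectories of both checkouts should keep the list short enough to read through by hand.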

3 Likes

This scenario is exactly how I ended up with OKD - I flashed from stock via installer 1.0.3, got it configured the way I liked and then put it away for a few weeks until I was ready to use it - it exhibited OKD upon my first attempt to fire it up. This was before I was aware of OKD so it came very close to being declared dead and getting tossed.

Thanks to you and everyone else putting in so much effort to get to the bottom of this!

2 Likes

Please try loading the debug build I made using mtk_uartboot and dump the output:

D:\RT3200>mtk_uartboot.exe -a -s COM3 -p bl2.bin && putty.exe -serial COM3 -sercfg 115200,8,n,1,N
mtk_uartboot - 0.1.1
Using serial port: COM3
Handshake...
hw code: 0x7622
hw sub code: 0x8a00
hw ver: 0xcb00
sw ver: 0x100
Baud rate set to 460800
sending payload to 0x201000...
Checksum: 0xf3ee
Setting baudrate back to 115200
Jumping to 0x201000 in aarch64...

I didn't get any output from PuTTY.

EDIT: I did a quick sanity check and the trimmed bl2 which has TF-A 2.4 did still boot, so it's not a hardware setup issue (I moved my debugging setup from work to home for the weekend).

Sadly, when using Windows, this is the expected result, as PuTTY will not start fast enough and by the time it is ready the show is already over...

1 Like

I'll switch to Linux and see if I see a difference.

EDIT: Here's the result in Linux

./mtk_uartboot -a -s /dev/ttyUSB0 -p bl2.bin && screen -L -h 100000 /dev/ttyUSB0 115200
NOTICE:  WDT: Cold boot
NOTICE:  WDT: disabled
detected page layout 2048+64
using strength 4 with 7 bytes ECC code
decoder config 903c3010
NOTICE:  SPI-NAND: FM35Q1GA (128MB)
ERROR:   BL2: Failed to load image id 5 (-2)
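Since several of us are posting captures, here is a small sketch for pulling the failing image ID and error code out of a saved log so results can be compared across units. The image-name table is an assumption based on TF-A's TBBR image numbering, and reading the negative code as an errno value is also an assumption; please check both against the TF-A tree this build was made from. The function name `find_load_failures` is made up for this example.

```python
import errno
import re

# Assumption: image IDs follow TF-A's TBBR numbering. Verify against the
# tree used for this build before relying on the names.
TBBR_IMAGE_NAMES = {0: "FIP", 1: "BL2", 2: "SCP_BL2", 3: "BL31", 4: "BL32", 5: "BL33"}

# Matches lines such as: ERROR:   BL2: Failed to load image id 5 (-2)
LOAD_FAILURE = re.compile(r"ERROR:\s+BL2: Failed to load image id (\d+) \((-?\d+)\)")


def find_load_failures(log_text):
    """Return one dict per 'Failed to load image' line found in the log."""
    failures = []
    for match in LOAD_FAILURE.finditer(log_text):
        image_id = int(match.group(1))
        code = int(match.group(2))
        # Assumption: a negative code is a negated errno value.
        errno_name = errno.errorcode.get(-code, "unknown") if code < 0 else "unknown"
        failures.append({
            "image_id": image_id,
            "image_name": TBBR_IMAGE_NAMES.get(image_id, "unknown"),
            "code": code,
            "errno_name": errno_name,
        })
    return failures
```

Running it over the file written by `screen -L` gives a short summary that is easier to diff between devices than the raw capture.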

I'm not following all the messages that happened here, but this is to give you an update on my test result of the custom preloader that I'm using now. So, this morning I rebooted the router and sadly got the OKD. This time to recover I've just put the power off and on again.

@daniel - Strangely I don't get all of the output that @NullDev showed running the same exact command.

I only get one line of output in screen which is the failure line:

ERROR:   BL2: Failed to load image id 5 (-2)

I downloaded the file twice and even tried adding --bl2-load-baudrate 115200 to the mtk_uartboot command line.

The sha256sum of the file I got is this in case you want to double check it:

371e27910574ae8a707b15a0a4000dc6231b4c11a1e61c3ab51ea254c23b9e95 bl2.bin
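If you are on a system without `sha256sum`, a minimal Python sketch like the one below does the same check. The helper names `sha256_of_file` and `matches_expected` are just for illustration; the expected value you pass in would be the hash posted by whoever built the binary.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with a posted hash, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch here would point to a corrupted download rather than anything on the router itself.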

Sorry to report that it wasn't a positive outcome.

Thank you all for testing. I'll keep guessing and will come back soon with more binaries to test.

3 Likes