I don't know how Rafał came to that conclusion. CFE always boots nflash0.os2 first. As long as you flash to nflash1.trx2, you should be fine. I have a working EA6300v1. I'd be willing to test whatever you have.
What was the problem originally? Is there a bug report somewhere? I've installed this version without any problems. The console says it is Designated Driver build 50072, bleeding edge. I used the trx2 partition (the one starting at 0x1f00000) and flashed it over the serial port, using the CFE interface and tftp:
(I write addresses with the last five hex digits as a separate group because 0x1 00000 is exactly 1 MiB, so you can tell at a glance that the deep magic is 2 MiB long.)
CFE keeps a count of failed boots somewhere in the deep magic and increments it on every boot. As far as CFE can tell, a boot failed unless the running firmware says otherwise. The stock firmware resets that count to tell CFE the boot succeeded, and OpenWrt does not. So every third boot, CFE switches to the backup firmware and things explode from there.
The real solution is to find the relevant piece of deep magic and zero it on successful boot, just like the stock images do. But as a workaround, it is also sufficient to nuke the entire fallback image partition in the CFE:
CFE> flash -erase nflash1.trx
Then, CFE will do the following:
Invalid boot block on disk
Loader:raw Filesys:tftp Dev:eth0 File:: Options:(null)
Could not load :: Timeout occured
Changed to the other image 0 (maxpartialboots exceeded)
Invalid boot block on disk
nflash1.trx CRC check failed!
Changed to the other image 1
Loader:raw Filesys:raw Dev:nflash0.os2 File: Options:(null)
Loading: ... 1816764 bytes read
Entry at 0x00008000
Starting program at 0x00008000
*** openwrt boots normally starting from here. ***
The fallback image fails its CRC check and the main image boots instead. Score one for the good guys.
The first nvram is supposed to end at 0x2 00000, and the second nvram isn't nvram at all: it's the fallback image. Until that partition layout is fixed, the workaround really should not be attempted anywhere except in the CFE. So I'd say this is not a workaround accessible to the typical user, who, seeing the image in /releases, would reasonably assume it can be flashed using the router's own firmware upgrade mechanism.
Oh my god, this is madness! I'm reading drivers/mtd/bcm47xxpart.c and it's total madness. The partition table isn't static at all. It's built up from a series of wild-ass guesses based on the current contents of the flash, and it could change at any time. If there's a stray header anywhere in the junk that often appears in the stock flash, spurious partitions will be created.
The only sane way to handle the cases where the parsing fails is to use static partition tables for the affected models, including the EA6300v1. In fact, I'd seriously consider using static partition tables for all models.
After all this time, it turns out Rafał was right. In a sense, the EA6300v1 really doesn't have a predictable boot sequence. It doesn't always try trx first and trx2 second; it tries whichever one it used last time first, then tries the other one if that doesn't work. So not only can the boot sequence be changed in a way that persists across reboots, the boot loader itself changes it persistently whenever there's an error.
It actually gets worse than that. The stock firmware has a button in its web UI that changes the boot sequence. So even on a router that has never had custom firmware installed, the boot sequence might not be what came from the factory.
Anyway, this and one other thing have caused me to revise my original workaround slightly. Now that I know I can use trx just as easily and as correctly as trx2, I use trx instead, which leaves more space for the UBIFS overlay. The other thing: it turns out you can tell CFE to act as the tftp server by specifying : as the image source.
I also found out why OpenWrt's partition table was wrong. When I erased trx to force CFE to boot trx2, I had of course also erased trx's header, which is how OpenWrt finds partitions on these models.
I'm still experimenting with the stock firmware's ability to modify the boot sequence. I'll keep you apprised.
If somebody has an EA6300v1 that is in its factory state (or at least has never had any kind of custom firmware installed on it), I'd like you to check which version of the stock firmware resides on trx and on trx2. Ideally you'd use CFE and a serial port, but you can also use the boot sequence switching button in the web UI, which can be found under troubleshooting > switch to backup firmware, or something close to that. Check the firmware version before and after hitting that button. If you need help, just let me know.
It's a pity you didn't help us design a proper solution and develop clean code. It took YEARS for a few experienced Linux developers to sort out the partitioning problem, and even so it's still incomplete.
It seems we could have done that in a month (avoiding madness at the same time) with a little help from you.
I'm pretty sure it does, and it's called bootpartition. See 89a0d9a9f194 ("mtd: bcm47xxpart: support layouts with multiple TRX partitions"). The remaining problem is that the code above doesn't work, because bcm47xx_nvram_getenv fails at that early stage (the NVRAM partition doesn't exist yet).