Sorry for not being clear. I got the sysupgrade compatibility error, and after trying the fix through SSH and rebooting, the router never came back online.
As suggested, I tried multiple times with the switch on and changed the adapter as well. It shows no sign of coming back online. Is there any other way to check whether the issue is hardware or just a bug? I spent a good amount of money to get this router to my country, and I regret not reading this thread in detail regarding this issue.
What do you mean by "fix through ssh"? Did you re-run the updated installer, as the compatibility warning asked you to do? If you didn't, you now have a brick and will need to connect to the internal serial port to fix it, provided that you have backups of the previous flash content.
SPI-NAND flash layout changes require bootloader update. Please run the UBI installer version 1.1.0+ (unsigned) first.
Obviously this is what you had to do. Just removing the warning so you can continue will lead to problems that are very complicated to resolve. The device now being a brick that needs complicated measures to recover is the expected outcome in this case.
Where did you see that just changing the compat_version in UCI would be OK? Please tell us so we can REMOVE IT, because it obviously leads people to think that all they have to do is change the compat_version in UCI (which is not true and defeats the whole purpose of that warning).
Maybe I misread the error and assumed it was a compatibility error. It was late at night while I was playing with the upgrade.
This is the post (4209) I misread; I cancelled the sysupgrade and just ran the command.
Please help me with the recovery. I've ordered a USB-to-TTL adapter in case it's useful. One thing I don't understand: after the error popped up, I cancelled the upgrade, so how did this happen?
I connected a serial cable and followed the instructions in post 4550. I'm back to running snapshots that I upgrade pretty regularly via 'auc', and haven't had an OKD since. (Knock on wood)
What you say is correct in a static system. However, drive current can become important when a signal is changing state. There is a certain amount of capacitance that must be overcome on the receiving end. The capacitance effectively slows down the logic signal transitions. Instead of a nice square wave with sharp vertical edges, the edges start to slope and get rounded. If these edges are too rounded, the receiving device can misread the value that was intended.
A stronger drive current helps to overcome this capacitance and keep the signals square. A stronger drive than necessary can itself be a source of EMI, so it's typically kept to only what the job requires. A scope (in analog mode) attached to the signals between the two devices would show whether there is an issue or not.
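To put rough numbers on the capacitance argument: the current needed to slew a capacitive load follows from I = C dV/dt, and for a simple RC model the 10 to 90 % rise time is about 2.2 RC, where R is the driver's output impedance. The values below (10 pF load, 3.3 V swing, 1 ns edge, 50 Ω driver) are illustrative assumptions, not measurements of any particular board.

```latex
I = C\,\frac{dV}{dt} \approx C\,\frac{\Delta V}{t_r}
  = 10\,\text{pF} \times \frac{3.3\,\text{V}}{1\,\text{ns}} = 33\,\text{mA}

t_{r,\,10\text{--}90\%} \approx 2.2\,R_{\text{out}}\,C
  = 2.2 \times 50\,\Omega \times 10\,\text{pF} = 1.1\,\text{ns}
```

Halving the output impedance (a stronger drive) halves the rise time, which is why drive strength matters during transitions; going further than the receiver needs just produces sharper edges with more high-frequency content, hence the EMI trade-off.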
Btw, I changed the firmware on my RT3200 using 22.03.1 (v1.0.1) last year. For the upcoming 24.xyz release we are expecting something like v1.1.2 or higher.
My question is: we will be able to upgrade directly from 1.0.2 to 1.1.2 or higher, correct? I will wait for the official release; I have no intention of running snapshots, and upgrading to 1.0.3 was neither encouraged nor advised in this thread for users running 1.0.2.
Nothing should be written to flash during a reboot; in particular, nothing should ever be written to the bl2 and fip areas, whether they are on UBI or on raw NAND.
Spectre is about speculative execution leaking information, via the CPU cache, to neighboring processes. It's a vulnerability on servers hosting many virtual machines, as it allows malicious tenants to learn more than they should about the other tenants on the same machine. It's not relevant on routers, which generally have only a single tenant and don't host any VMs. And Cortex-A53 isn't affected and never was. Even if the mitigation were compiled into the kernel, that would only waste space, as Linux is smart enough not to apply mitigations that aren't needed on the CPU it is running on.
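If you want to see what the running kernel itself reports, Linux exposes one file per known CPU vulnerability under /sys/devices/system/cpu/vulnerabilities (assuming the kernel was built with that reporting, which arm64 kernels normally are). Below is a minimal POSIX sh sketch that prints them; the function name show_cpu_vulns and its directory-override argument are illustrative, not part of OpenWrt. On a Cortex-A53 the spectre entries would be expected to read "Not affected".

```sh
# Print the kernel's own report for each CPU vulnerability it knows about.
# /sys/devices/system/cpu/vulnerabilities is the standard kernel location;
# the optional first argument (illustrative) points the function elsewhere.
show_cpu_vulns() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    if [ ! -d "$dir" ]; then
        echo "no vulnerability report at $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Each file holds a single line such as "Not affected",
        # "Vulnerable" or "Mitigation: ..."
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
    done
}
# Usage on the router: show_cpu_vulns
```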
Let's hypothetically assume that Spectre mitigation had accidentally been enabled for the Cortex-A53 on Linux: the result would be a small performance hit.
Let's assume, even more hypothetically, that it resulted in some corruption of the CPU cache: that would still not corrupt data stored on the SPI-NAND flash chip (which also has a page cache, but that has nothing in common with a CPU cache despite the name and despite also being some sort of built-in memory).
But still, let's dig in every direction: better to share a thought and then learn why it was not the cause than to hold it back because it sounds unlikely, only for it to turn out to be the cause.
@daniel -
Thank you for your detailed and courteous reply.
Regarding the cache issue: isn't that from RAM? I was wondering whether early re-reads being corrupted might have been triggered by the mitigation, which would require many re-reads from ROM to RAM precisely to avoid a cache issue.
I know almost nothing, so it might be more direct to say "no, you are an idiot"!
I'm seeing the same thing on my second bricked RT3200.
I tried the same method as on my previous RT3200 (boot, then force-upgrade to the 1.1.0 installer), but got hit with "BL2: Failed to load image id 3 (-2)". I simply can't get the system to boot properly without mtk_uartboot; I can get back to 23.05.3 recovery just fine with mtk_uartboot, but can't seem to make any progress past that.
(I use this to get back to 23.05.3 recovery when needed: ./mtk_uartboot -p bl2-mt7622-1ddr-ram.bin -a -f openwrt-23.05.2-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip -s /dev/ttyUSB0 ; minicom -D /dev/ttyUSB0)
I tried updating the BL31+U-Boot FIP and BL2 preloader with the setup below (probably what @wrt54 did as well).
I used the following command line to boot: ./mtk_uartboot -p bl2-for-mtk_uartboot.bin -a -f openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip -s /dev/ttyUSB0 ; minicom -D /dev/ttyUSB0
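Since the same two-step sequence (load BL2 and FIP over serial, then attach a console) gets repeated with different files, it can help to wrap it so a typo in a long filename is caught before anything is sent to the board. A minimal sketch, assuming mtk_uartboot sits in the current directory as in the commands above; the function name uartboot_and_console and the DRY_RUN switch are hypothetical, and only the mtk_uartboot flags already used in this thread are passed through.

```sh
# Hypothetical wrapper: check that the BL2 and FIP files exist, load them
# over the serial line with mtk_uartboot (same flags as in the posts), then
# attach minicom. Set DRY_RUN=1 to print the commands instead of running them.
uartboot_and_console() {
    bl2="$1"; fip="$2"; tty="${3:-/dev/ttyUSB0}"
    for f in "$bl2" "$fip"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "./mtk_uartboot -p $bl2 -a -f $fip -s $tty"
        echo "minicom -D $tty"
        return 0
    fi
    # Like the ';' in the original one-liner, minicom is opened
    # whether or not mtk_uartboot succeeded.
    ./mtk_uartboot -p "$bl2" -a -f "$fip" -s "$tty"
    minicom -D "$tty"
}
# Usage: uartboot_and_console bl2-for-mtk_uartboot.bin openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip
```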
If I try to flash the BL31+U-Boot FIP and then the BL2 preloader via TFTP (options 7 and 8 in U-Boot), I get the same output wrt54 posted (shortened here):
"spi-nand0" partitions still in use, can't delete them
MTD device fip not found, ret -19
"spi-nand0" partitions still in use, can't delete them
Erasing 0x00000000 ... 0x0007ffff (4 eraseblock(s))
My other, working RT3200 is upgraded to 1.1.0 and SNAPSHOT; I assume I can use it as a donor if needed?
This. I have been using it like this and it works fine; I haven't felt the need to use SQM even though Wave testing shows a B. I use a 400/200 fiber link (fiber to the router).
Also, @dan3, try enabling HW flow offload together with SQM and check. I keep forgetting whether HW offload works in this case; after 4000+ posts in this thread we tend to forget.
Software flow offloading works fine for me when using cake-qos-simple, which leverages nftables, DSCP saving to and restoration from conntrack, and DSCP-based diffserv tinning in CAKE.
SQM works by giving up a fraction of your ISP throughput (shaping slightly below the link rate) to reduce bufferbloat on what is left. In other words, it is entirely expected that throughput with SQM active will be lower (~93%, give or take a bit, in my experience) than throughput without SQM, no matter your ISP speed. This is why the ingress and egress speeds in the SQM setup menu should usually be set lower than your ISP down/up speeds (slow DSL service with very high latency masking the actual ISP capability being a possible exception). It takes some experimentation to see how high they can be set before bufferbloat begins creeping up.
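As a starting point for that experimentation, here is a small POSIX sh helper that scales measured speeds by a chosen percentage (93% by default, matching the rule of thumb above) and prints, rather than runs, the uci commands for sqm-scripts. It assumes a single queue section in /etc/config/sqm with the download/upload options in kbit/s, so check your own config first; the function name sqm_rates is hypothetical.

```sh
# Sketch: derive SQM shaper rates as a percentage of measured ISP speed
# (in Mbit/s) and print the uci commands that would apply them.
# Assumes sqm-scripts with one queue section; rates are in kbit/s.
sqm_rates() {
    down_mbit="$1"; up_mbit="$2"; pct="${3:-93}"
    down_kbit=$(( down_mbit * 1000 * pct / 100 ))
    up_kbit=$(( up_mbit * 1000 * pct / 100 ))
    echo "uci set sqm.@queue[0].download='$down_kbit'"
    echo "uci set sqm.@queue[0].upload='$up_kbit'"
    echo "uci commit sqm"
    echo "/etc/init.d/sqm restart"
}
# Usage: sqm_rates 400 200      (then raise or lower pct and re-test)
```

Printing instead of executing keeps the helper safe to run anywhere; you review the lines, paste them on the router, re-run a bufferbloat test, and adjust the percentage up or down.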
Your results look reasonable to me. If your ISP speed were faster (e.g., 400 Mbps), then the MT7622 CPU in the RT3200/E8450 is probably capable of a bit more SQM throughput. However, as it is, with SQM active you are getting ~94% of your throughput without SQM, which is pretty darn good.
CAKE can only run on a single core and generally requires more CPU than fq_codel/simple. However, I would put the small differences you are seeing between CAKE and fq_codel/simple (e.g., 289 vs 296) down to test-to-test variation. As you approach the limits of a single CPU core with CAKE, fq_codel/simple can give you more throughput, especially on a multi-core CPU, because fq_codel can take advantage of more than one core; but I suspect you're not quite there yet. If you want to investigate further to satisfy your curiosity, you could install the htop package and run it in a window while you do a bufferbloat test to see how your CPU cores are being loaded. But your results are good; I wouldn't stress over it.
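If you'd rather not install htop, a rough equivalent is to snapshot /proc/stat before and after a bufferbloat test and compare per-core idle time. A minimal sketch in POSIX sh and awk; the function name cpu_busy is hypothetical, it counts idle + iowait as idle, and it sums only the user..steal columns (guest time is already included in user/nice).

```sh
# Sketch: per-core CPU busy percentage between two /proc/stat snapshots.
# Columns after "cpuN": user nice system idle iowait irq softirq steal ...
# Busy = (delta total - delta idle) / delta total, idle = idle + iowait.
cpu_busy() {
    awk '
        FNR == NR && /^cpu[0-9]/ {
            t = 0; for (i = 2; i <= 9 && i <= NF; i++) t += $i
            tot[$1] = t; idl[$1] = $5 + $6; next
        }
        /^cpu[0-9]/ {
            t = 0; for (i = 2; i <= 9 && i <= NF; i++) t += $i
            dt = t - tot[$1]; di = ($5 + $6) - idl[$1]
            printf "%s %d%%\n", $1, (dt > 0 ? 100 * (dt - di) / dt : 0)
        }
    ' "$1" "$2"
}
# Usage: cat /proc/stat > /tmp/s1; (run the speed test); cat /proc/stat > /tmp/s2
#        cpu_busy /tmp/s1 /tmp/s2
```

A core pinned near 100% while the others sit mostly idle during the test would be the sign that CAKE has hit its single-core limit.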