Belkin RT3200/Linksys E8450 WiFi AX discussion

Sorry for not being clear. I got the sysupgrade compatibility issue, and after trying the fix through ssh and rebooting, the router never came back online.

As suggested, I tried multiple times with the switch on and changed the adapter as well. It shows no sign of coming back online. Is there any other way to check whether the issue is hardware or just a bug? I've spent a good amount of money to get this router to my country and I regret not reading the thread in detail regarding this issue.

What do you mean by "fix through ssh"? Did you re-run the updated installer as the compatibility warning asked you to do? If you didn't, you now have a brick and will need to connect to the internal serial port to fix it, provided that you have backups of the previous flash content.

No sir. I ran the commands below through ssh and just rebooted the device.

uci set system.@system[0].compat_version=2.0
uci commit system

Basically, this is what I did:

  1. Downloaded the latest UBI snapshot from the firmware selector.

  2. While attempting the sysupgrade, I got the compatibility warning, after which I cancelled the sysupgrade.

  3. Then I entered the above commands over ssh and rebooted, after which the device went dead and would not boot.

Yes, I do have the backup. Could you please share the steps? I'll get back to you.

You did see the warning saying:

SPI-NAND flash layout changes require bootloader update. Please run the UBI installer version 1.1.0+ (unsigned) first.

Obviously this is what you had to do. Just removing the warning so you can continue will obviously lead to problems which are very complicated to resolve. The device now being a brick and needing complicated measures to be recovered is the expected outcome in that case.

Where did you see the information that just changing the compat_version in UCI would be ok? Please tell us, so we can REMOVE IT because it obviously results in people then thinking they are fine and all they have to do is change the compat_version in UCI (which is not true and defies the whole purpose of that warning).

Maybe I misread the error and assumed it was a compatibility error. It was late at night when I was playing with the upgrade.

This is the post (4209) I misread; I cancelled the sysupgrade and just ran the command.

Please help me with the recovery. I've ordered a USB-to-TTL adapter just in case it might be useful. One thing I don't understand: I cancelled the upgrade after the error popped up, so how did this happen?

I connected a serial cable and followed the instructions in post 4550. I'm back to running snapshots that I upgrade pretty regularly via 'auc', and haven't had an OKD since. (Knock on wood)

What you say is correct in a static system. However, drive current can become important when a signal is changing state. There is a certain amount of capacitance that must be overcome on the receiving end. The capacitance effectively slows down the logic signal transitions. Instead of a nice square wave with sharp vertical edges, the edges start to slope and get rounded. If these edges are too rounded, the receiving device can misread the value that was intended.

A stronger drive current helps to overcome this capacitance and keep the signals square. A stronger drive signal than what is necessary can potentially be a source of EMI, so it's typically kept to only what is required for the job. A scope (in analog mode) attached to the signals between the 2 devices would show if there is an issue or not.
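
As a rough illustration of the point above: if the driver is modelled as an ideal source behind an output resistance R, charging a load capacitance C, the 10%-90% rise time of the edge is about R * C * ln(9), or roughly 2.2 * R * C. The Python sketch below uses made-up example values (15 pF load, 100 ohm and 25 ohm drivers); they are assumptions for illustration, not measurements from the RT3200.

```python
import math


def rise_time_10_90(r_ohms: float, c_farads: float) -> float:
    """10%-90% rise time of a first-order RC edge: t = R * C * ln(9)."""
    return r_ohms * c_farads * math.log(9)


# Hypothetical values, chosen only to show the trend.
LOAD_C = 15e-12  # 15 pF of trace plus input capacitance (assumed)
DRIVERS = {"weak drive (100 ohm)": 100.0, "strong drive (25 ohm)": 25.0}

for name, r_out in DRIVERS.items():
    t = rise_time_10_90(r_out, LOAD_C)
    print(f"{name}: {t * 1e9:.2f} ns rise time")
```

In this simple model, a driver with four times lower output resistance gives an edge that is four times faster, which is why a stronger drive setting keeps the signal closer to square.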

some of these reboot issues could be cache hit issues?

when did the spectre mitigation patch get rolled out into openwrt? i see that it was retracted from mt7622 a few days ago.
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=6b007d2512ad02d08f7e9fc7ad74801ce1330924

i'm wondering if the 'fix' misapplied to mt7622 might have broken something?

Btw, i changed my firmware on RT3200 using 22.03.1 (v1.0.1) last year. For the upcoming 24.xyz we are expecting something like v1.1.2 or higher.

My question is: will we be able to upgrade directly from 1.0.2 to 1.1.2 or higher? I will wait for the official release; I have no intention of running snapshots, and upgrading to 1.0.3 was neither encouraged nor advised in this thread for users running 1.0.2.

Nothing should be written to flash during a reboot, especially nothing should ever be written to the bl2 and fip areas, be they on UBI or on raw NAND....

Spectre is about the CPU cache leaking information to neighboring processes. It's a vulnerability on servers hosting many virtual machines, as it allows malicious tenants to learn more than they should about the other tenants on the same machine. It's not relevant on routers, which generally have only a single tenant and don't host any VMs. Also, Cortex-A53 isn't affected and never was. And even if the mitigation were compiled into the kernel, that would only be a waste of space, as Linux is smart enough not to use mitigations which aren't needed on the CPU it is running on.

Let's hypothetically assume that Spectre-mitigation would have been accidentally enabled on the Cortex-A53 on Linux: the result would be a small performance hit.
Let's even more hypothetically assume it would result in some corruption of the CPU cache: that would still not result in corrupting data stored on the SPI-NAND flash chip (which also has a page cache, but that has nothing in common with a CPU cache despite the name and being some sort of built-in memory).

But still, let's dig in every direction: better to share a thought and then learn why it was not the cause than to hold it back because it sounds unlikely when it could have been the cause...

@daniel -
thank you for your detailed and courteous reply.
regarding the cache issue - isn't that from RAM ? i was wondering if something about early re-reads being corrupted might have been triggered by the mitigation, that would require many re-reads from ROM to RAM precisely to avoid a cache issue?
i know almost nothing - so it might be more direct to say "no - you are an idiot"!

Since this appears to be the Belkin RT3200/Linksys E8450 owners thread....

My ISP recently bumped my internet to 300/300 (fios ftw), and I found SQM with Cake/piece_of_cake could no longer keep up.

  • Without SQM, I get 306 down 319 up, 85ms latency on download.
  • With Cake the best I could do was 289 down 317 up, 30ms latency during the download.
  • With fq_codel/simple I get 296 down 317 up and <3ms latency.

Anyone using piece_of_cake with >300Mbps internet and this router?

Not complaining at all.. I paid less for this router than 1 month of service from my ISP.

What about forgetting about SQM and turning on Software & Hardware Flow Offloading?

I'm seeing the same thing on my second bricked RT3200.

Tried the same method as my previous RT3200 (boot and force upgrade 1.1.0 installer), but got hit with "BL2: Failed to load image id 3 (-2)". I simply can't get the system to boot properly without mtk_uartboot; I can get back to 23.05.3 recovery just fine with mtk_uartboot, but can't seem to make any progress past that.

(Using this to get back to 23.05.3 recovery when needed:
./mtk_uartboot -p bl2-mt7622-1ddr-ram.bin -a -f openwrt-23.05.2-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip -s /dev/ttyUSB0 ; minicom -D /dev/ttyUSB0
)

I tried updating the BL31+U-Boot FIP and BL2 preloader with the setup below (probably what @wrt54 did as well).

I have these two files on my TFTP (snapshot from downloads.openwrt.org)

openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip
openwrt-mediatek-mt7622-linksys_e8450-ubi-preloader.bin

Using the following commandline to boot:
./mtk_uartboot -p bl2-for-mtk_uartboot.bin -a -f openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip -s /dev/ttyUSB0 ; minicom -D /dev/ttyUSB0

I am using the 1.1.0 bl2-for-mtk_uartboot.bin and openwrt-mediatek-mt7622-linksys_e8450-ubi-bl31-uboot.fip (snapshot from downloads.openwrt.org) for the command above.

If I try to flash the BL31+U-Boot FIP and then the BL2 preloader via TFTP (options 7 and 8 in U-Boot), I get the same output as wrt54 posted (short version here):

"spi-nand0" partitions still in use, can't delete them
MTD device fip not found, ret -19
"spi-nand0" partitions still in use, can't delete them
Erasing 0x00000000 ... 0x0007ffff (4 eraseblock(s))

I have my other working RT3200 upgraded to 1.1.0 and SNAPSHOT - I expect I can use that as a donor if needed?

Thanks for all your help!

Have you tried tuning SQM cake with reduced settings?

Advanced tab > Check ‘Advanced Configuration’ > Check ‘Dangerous Configuration’

Add one of the following lines to both ‘Qdisc options (ingress)’ and ‘Qdisc options (egress)’

In order from slowest/best to fastest/worst:

besteffort flows
besteffort flowblind
besteffort flows no-split-gso
besteffort flowblind no-split-gso

If that last one doesn’t work then you’ve outgrown the RT3200 for traffic shaping at those speeds.

This. I have been using it like this and it works fine; I haven't felt the need to use SQM even though Wave testing shows a B. I use a 400/200 fiber link (fiber to router).

Also, @dan3, try enabling HW flow offload together with SQM and check. I keep forgetting whether HW offload works in this case; after 4000+ posts in this thread we tend to forget :)

HW offload will bypass SQM, so if you need SQM, turn off SW and HW offloads.

Software flow offloading works fine for me when using cake-qos-simple, which leverages nftables and DSCP saving and restoration from conntracks and DSCP-based diffserv tinning in cake.

root@OpenWrt-1:~# service cake-qos-simple download
qdisc cake 1: root refcnt 2 bandwidth 60Mbit diffserv4 triple-isolate nat nowash ingress no-ack-filter split-gso rtt 100ms noatm overhead 0
 Sent 103899563927 bytes 89674263 pkt (dropped 144625, overlimits 121940272 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 3915192b of 4Mb
 capacity estimate: 60Mbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       46 /    1500
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh       3750Kbit       60Mbit       30Mbit       15Mbit
  target            5ms          5ms          5ms          5ms
  interval        100ms        100ms        100ms        100ms
  pk_delay       2.29ms        728us       1.76ms        430us
  av_delay        631us        422us        250us         56us
  sp_delay         14us          6us          6us          5us
  backlog            0b           0b           0b           0b
  pkts           839677     88402554         5261       571396
  bytes       983133313 102873824639      2113953    236649194
  way_inds          279      4740117            0         3424
  way_miss         2451       173325          203        41254
  way_cols            0            0            0            0
  drops            1120       143500            5            0
  marks               0            0            0            0
  ack_drop            0            0            0            0
  sp_flows            1            5            1            2
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len         67326        68700         6870         3445
  quantum           300         1514          915          457
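
To put the counters above in proportion, the share of dropped packets can be worked out from the "Sent ..." line. This is a minimal Python sketch; `drop_fraction` is a hypothetical helper written for this example, and it assumes the `pkt` counter does not include the dropped packets.

```python
import re

# The statistics line copied from the output above.
SENT_LINE = (
    "Sent 103899563927 bytes 89674263 pkt "
    "(dropped 144625, overlimits 121940272 requeues 0)"
)


def drop_fraction(line: str) -> float:
    """Fraction of packets dropped, from a tc 'Sent ...' statistics line."""
    m = re.search(r"Sent (\d+) bytes (\d+) pkt \(dropped (\d+),", line)
    if m is None:
        raise ValueError("not a tc 'Sent' statistics line")
    _sent_bytes, sent_pkts, dropped = (int(g) for g in m.groups())
    # Assumption: packets offered to the qdisc = sent + dropped.
    return dropped / (sent_pkts + dropped)


print(f"dropped: {drop_fraction(SENT_LINE):.2%} of packets")
```

For the figures shown, this works out to roughly 0.16% of packets.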

@quarky see? This is why I am perpetually confused :)

SQM works by reserving a fraction of your ISP throughput to reduce bufferbloat on what is left. In other words, it is entirely expected that throughput with SQM active will be less (~93% give or take a bit in my experience) than throughput without SQM, no matter your ISP speed. This is why ingress and egress speed in the SQM set-up menu should usually be set lower than your ISP down/up speed (slow DSL service with very high latency masking the actual ISP capability being a possible exception). It takes some experimentation to see how high they can be set before bufferbloat begins creeping up.

Your results look reasonable to me. If your ISP speed were faster (e.g., 400 Mbps), then the MT7622 CPU in the RT3200/E8450 is probably capable of a bit more SQM throughput. However, as it is, with SQM active you are getting ~94% of your throughput without SQM, which is pretty darn good.
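
For reference, the percentages can be reproduced from the download figures reported earlier in the thread (306 Mbps without SQM, 289 with Cake, 296 with fq_codel/simple). A small Python sketch; the helper name `sqm_ratio` is made up for this example.

```python
def sqm_ratio(with_sqm_mbps: float, without_sqm_mbps: float) -> float:
    """Shaped throughput as a fraction of unshaped throughput."""
    return with_sqm_mbps / without_sqm_mbps


# Download figures quoted from the earlier post, in Mbps.
BASELINE_DOWN = 306
SHAPED_DOWN = {"cake/piece_of_cake": 289, "fq_codel/simple": 296}

for name, down in SHAPED_DOWN.items():
    ratio = sqm_ratio(down, BASELINE_DOWN)
    print(f"{name}: {ratio:.1%} of unshaped download")
```

Cake comes out at about 94.4% and fq_codel/simple at about 96.7% of the unshaped download speed.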

CAKE can only run on a single core and requires more CPU in general than fq_codel/simple. However, the small differences you are seeing between CAKE and fq_codel/simple (e.g., 289 vs 296) I would call within the noise of test-to-test variation. As you approach the limits of a CPU core for CAKE, fq_codel/simple can give you more throughput, especially on a multi-core CPU, because fq_codel can take advantage of more than one CPU core, but I suspect you're not quite there yet. You could always install the htop package and run it in a window while you do a bufferbloat test to see how your CPU cores are being loaded if you want to investigate more to satisfy curiosity, but your results are good. I wouldn't stress over it.
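
If you would rather log per-core load than watch htop, the same information can be derived from two snapshots of /proc/stat taken a few seconds apart during the test. The Python sketch below is only an illustration: the helper names and the two sample snapshots are made up, and it assumes a Python interpreter is available wherever the snapshots are processed.

```python
def parse_cpu_times(stat_text: str) -> dict:
    """Map each per-core name (cpu0, cpu1, ...) to (busy, total) jiffies."""
    cores = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip non-CPU lines and the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        # (guest columns are ignored, they are already counted in user/nice).
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def core_load(before: str, after: str) -> dict:
    """Busy fraction per core between two /proc/stat snapshots."""
    first, second = parse_cpu_times(before), parse_cpu_times(after)
    load = {}
    for cpu, (busy1, total1) in second.items():
        busy0, total0 = first[cpu]
        elapsed = total1 - total0
        load[cpu] = (busy1 - busy0) / elapsed if elapsed else 0.0
    return load


# Made-up snapshots: cpu0 saturated (mostly softirq), cpu1 nearly idle.
BEFORE = "cpu 100 0 100 800 0 0 0 0 0 0\ncpu0 50 0 50 400 0 0 0 0 0 0\ncpu1 50 0 50 400 0 0 0 0 0 0\n"
AFTER = "cpu 115 0 105 890 0 0 90 0 0 0\ncpu0 60 0 50 400 0 0 90 0 0 0\ncpu1 55 0 55 490 0 0 0 0 0 0\n"

for cpu, frac in core_load(BEFORE, AFTER).items():
    print(f"{cpu}: {frac:.0%} busy")
```

One core pinned near 100% while the other stays mostly idle during the test would suggest that the shaper is limited by a single core.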