Belkin RT3200/Linksys E8450 WiFi AX discussion

I got the very same with the test build.

NOTICE:  BL2: Built : 20:07:08, Aug  8 2024
NOTICE:  CPU: MT7622
NOTICE:  WDT: Cold boot
NOTICE:  WDT: disabled
detected page layout 2048+64
using strength 4 with 7 bytes ECC code
decoder config 903c3010
NOTICE:  SPI-NAND: FM35Q1GA (128MB)
ERROR:   BL2: Failed to load image id 5 (-2)

Sadly, I'm not surprised. We're still working on it. In short, the posts so far all lead to the discovery that the issue exists in TF-A v2.9 and above. None of the patches tried so far has narrowed it down.

3 Likes

Excuse my ignorance, but I have to ask: @daniel, why not keep the same BL and partition structure for the SNAPSHOTS and future OpenWrt versions?

Is it just for wear leveling? We already have a large part of the flash converted to UBI. What are the other advantages? The current layout for v23.05.x works fine if the 1.0.2 installer is used.

I sincerely apologize; I'm just asking as a regular Joe who loves the community and wishes to learn more every day.

Cheers!

1 Like

... and scrubbing, which means automatically relocating data once one or more bits have flipped and the ECC/BCH engine reports the read as unclean. Especially in light of the OKD issue, this is definitely what we should do, simply because even if the BCH engine worked correctly, it can only fix a few bits per block, and the number of bitflips seems to keep getting worse over months. So with v1.0.2 you will be fine for a while longer (maybe even years, depending on various unknown factors), because up to n bits do get corrected, but once the number of flipped bits exceeds that, you'd still end up with an unbootable device.
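To make the scrubbing idea concrete, here is a minimal sketch of the decision described above. It is not code from TF-A, U-Boot or Linux: the enum, the function name and the "scrub on any corrected bitflip" policy are assumptions for illustration only, and the strength of 4 is simply the value shown in the boot log at the top of the thread.

```c
/* Hypothetical outcome of reading one ECC step (names are illustrative). */
enum read_action {
    READ_CLEAN,         /* no bitflips, nothing to do */
    READ_SCRUB,         /* corrected, but the data should be relocated */
    READ_UNRECOVERABLE  /* more flips than the ECC engine can fix */
};

/*
 * bitflips: number of flipped bits found in one ECC step
 * strength: maximum bits the ECC engine can correct per step
 *           (the log above reports "strength 4")
 */
static enum read_action classify_read(unsigned int bitflips,
                                      unsigned int strength)
{
    if (bitflips == 0)
        return READ_CLEAN;
    if (bitflips > strength)
        return READ_UNRECOVERABLE;
    /* Assumed policy: relocate as soon as any bit had to be corrected. */
    return READ_SCRUB;
}
```

Without scrubbing, a block sits in the READ_SCRUB state until enough additional bits flip to push it into READ_UNRECOVERABLE, which is the failure mode described above.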

Also, downgrading TF-A in OpenWrt for the sake of rescuing those MT7622 devices would mean that we would need two separate packages, arm-trusted-firmware-filogic (with v2.10 which supports MT798x) and arm-trusted-firmware-mt7622 (v2.4, only for MT7622). While this is of course possible in principle, it would still be much nicer to find and fix the issue. I think we are on a good path to do this, and as we speak an OKD'ed device is on its way to my desk which will allow me to do lots more debugging soon. Imho it's realistic to get this done before the 24.xx branching, or worst-case till the release.

All that being said, of course, for users who are just looking for a stable build right now, the v1.0.2 installer is a very good option and there was never a reason to update TF-A and U-Boot for people who just want to use the release intended for end-users.

12 Likes

I have to respectfully disagree from the risk management perspective.

Recommending the v1.0.2 installer and a stable OpenWrt release for regular users is the correct step.

However, keeping snapshots on the modified layout (with FIP in UBI) known to be incompatible with non-buggy bootloaders is irresponsible. There is no guarantee that the issue will be fixed in time. While it is true that those who used a new installer in February would have to go through another step of undoing the layout change, I still think that reverting to the known-good setup is the right move.

We can move FIP to UBI later, once the boot loader is fixed.

This OKD issue looks severe enough that it can cause some people (e.g., me) to reconsider contributions in the form of snapshot testing. Such people can no longer be asked whether any issue that they reported is fixed in a snapshot. Keeping things as they are now (with snapshots too dangerous to test) will thus result in a lower-quality future release.

2 Likes

I've managed to create a synthetic OKD device which behaves just like the devices reported by @wavejumper00 and @NullDev. Starting from the v1.0.2 installer, I've flipped a single bit in the fip partition (using mtd read.raw.oob ... and mtd write.raw.oob ...), which ends up corrected and causes no problems with the bl2 from the v1.0.2 installer that is now running. Updating bl2 to the version contained in the 23.05.4 release (v2.9) shows that bitflips are not being corrected and the device shows typical OKD symptoms.

Now I went into debugging the ECC decoder and dumping its status, and it looks like the decoder itself is working fine -- however, reading the data from memory returns the data with bitflips, which hints towards a problem with DMA memory setup or CPU caches.
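For anyone wondering what "flipping a single bit" amounts to, it is just toggling one bit in the raw page dump between the raw read and the raw write, leaving the OOB/ECC bytes untouched so that the stored ECC no longer matches the data. The post does not say how the bit was actually flipped, so the helper below is purely a hypothetical illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Toggle one bit in a raw page dump; bit_index counts from bit 0 of byte 0. */
static void flip_bit(uint8_t *buf, size_t bit_index)
{
    buf[bit_index / 8] ^= (uint8_t)(1u << (bit_index % 8));
}
```

Calling it twice with the same index restores the original data, which makes it easy to undo the experiment on a dump.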

Just sharing these results for now, I continue to dig...

Edit: I noticed that in v2.4 we got mempool.c, which enforced 512-byte alignment on allocated addresses, while mtk_snand_mem_alloc() in v2.9+ doesn't enforce such alignment. I've modified the function in v2.9 to also enforce alignment, but sadly that didn't help, even though it looked very promising :confused:
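For reference, enforcing alignment on an allocated address usually boils down to rounding it up to the next multiple of the alignment. This is a generic sketch, not the code from mempool.c or mtk_snand_mem_alloc(); the function name is made up and it assumes the alignment is a power of two (as 512 is):

```c
#include <stdint.h>

/* Round addr up to the next multiple of align; align must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

With align set to 512, an address of 0x1001 becomes 0x1200, while an already aligned address is returned unchanged.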

14 Likes

@daniel, I noticed that v2.9 is using a different binary blob for the mt7622 DRAM driver compared to v2.4.

You may want to use the binary blob from v2.4 here and try if that’s the problem:

2 Likes

That's funny because I was just wondering about an hour ago if it was possible to force a bit flip and get a device to artificially become OKD. Nice work!

2 Likes

Ok, @NullDev @grauerfuchs @wavejumper00 I think I found something. We were all overthinking this. The answer is a simple logic bug. Please try this:

(bl2.img is for writing to flash, bl2.bin is for mtk_uartboot)

12 Likes

Success!

NOTICE:  BL2: Built : 20:07:08, Aug  8 2024
NOTICE:  CPU: MT7622   
NOTICE:  WDT: Cold boot
NOTICE:  WDT: disabled 
NOTICE:  SPI-NAND: FM35Q1GA (128MB)
NOTICE:  corrected up to 1 bitflips per page while reading
NOTICE:  BL2: Booting BL31
NOTICE:  BL31: v2.9(release):OpenWrt v2023-07-24-00ac6db3-2 (mt7622-snand-1ddr)
NOTICE:  BL31: Built : 22:09:42, Mar 22 2024


U-Boot 2023.07.02-OpenWrt-r23809-234f1a2efa (Mar 22 2024 - 22:09:42 +0000)

CPU:   MediaTek MT7622 
Model: mt7622-linksys-e8450-ubi
DRAM:  512 MiB
Core:  48 devices, 21 uclasses, devicetree: separate
MMC:
Loading Environment from UBI... SPI-NAND: FM35Q1GA (128MB)
Read 126976 bytes from volume ubootenv to 000000005f7bf200
Read 126976 bytes from volume ubootenv2 to 000000005f7de240
...

Where was the bug?

6 Likes

Oh wow! This seems like a big moment in the history of OKD.

So a fix seems probable rather than possible now?

Time to revise your votes gents:

1 Like

diff --git a/plat/mediatek/apsoc_common/bl2/bl2_dev_snfi_init.c b/plat/mediatek/apsoc_common/bl2/bl2_dev_snfi_init.c
index 301d8c249..9346dce57 100644
--- a/plat/mediatek/apsoc_common/bl2/bl2_dev_snfi_init.c
+++ b/plat/mediatek/apsoc_common/bl2/bl2_dev_snfi_init.c
@@ -29,8 +29,10 @@ static int snfi_mtd_read_page(struct nand_device *nand, unsigned int page,
        int ret;
 
        ret = mtk_snand_read_page(snf, addr, (void *)buffer, NULL, false);
-       if (ret == -EBADMSG)
+       if (ret > 0) {
+               NOTICE("corrected %d bitflips while reading page %u\n", ret, page);
                ret = 0;
+       }
 
        return ret;
 }

"Quit Thinking and Look"
Credits: https://www.amazon.com/exec/obidos/ASIN/0814474578/debuggingrule-20

14 Likes

That would do it, all right. If the rest of the code works the same way as what I've gone through of it, then it would pass the non-zero return all the way back up the calling chain and that would be instantly interpreted as a failure code rather than as an informational message. Excellent find!

2 Likes

The NAND framework in TF-A is kind of dumb: unlike Linux or U-Boot, it doesn't deal with ECC information. All it cares about is whether a read operation was successful or not. 0 means success, non-0 means failure.
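A small self-contained model of that convention may help show why the one-line change matters. Only the two `if` conditions are taken from the diff above; fake_read_page(), the wrapper names and caller_sees_failure() are hypothetical stand-ins, not TF-A functions.

```c
#include <errno.h>

/* Hypothetical stand-in for the low-level driver: returns <0 on error,
 * 0 for a clean read, >0 for the number of corrected bitflips. */
static int fake_read_page(int simulated_result)
{
    return simulated_result;
}

/* Wrapper as it was before the patch: a positive bitflip count
 * is passed on to the caller unchanged. */
static int read_page_old(int simulated_result)
{
    int ret = fake_read_page(simulated_result);

    if (ret == -EBADMSG)
        ret = 0;
    return ret;
}

/* Wrapper after the patch: corrected bitflips count as success. */
static int read_page_fixed(int simulated_result)
{
    int ret = fake_read_page(simulated_result);

    if (ret > 0)
        ret = 0;
    return ret;
}

/* Caller in the style described above: 0 is success, anything else fails. */
static int caller_sees_failure(int ret)
{
    return ret != 0;
}
```

In this model a read with one corrected bitflip is reported as a failure through read_page_old() and as success through read_page_fixed(), while a negative error code still fails in the fixed version.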

1 Like

nice work daniel ngl tho kind of disappointed that the problem wasn't more interesting

2 Likes

Can ret be negative? If so, would that benefit from logging too?

Works for me to get to U-Boot. And it will boot the router if I choose it in the U-Boot menu or let the menu time out.

But somehow it hangs my entire Linux box most of the way through the router's boot-up process, after screen has started. I tried several times and it hangs my system every time. I can't even ping it; only a power cycle will restore it. Strange. Maybe it's because I'm using the (I believe fake) Prolific serial interface. None of the other bl2 files I've tried caused this behavior, though.

Anyway, here is example output leading up to the U-Boot menu:

NOTICE:  corrected up to 1 bitflips per page while reading
NOTICE:  corrected up to 1 bitflips per page while reading
NOTICE:  corrected up to 1 bitflips per page while reading
NOTICE:  BL2: Booting BL31
NOTICE:  BL31: v2.9(release):OpenWrt v2023-07-24-00ac6db3-2 (mt7622-snand-1ddr)
NOTICE:  BL31: Built : 13:38:11, Nov 14 2023


U-Boot 2023.07.02-OpenWrt-r23630-842932a63d (Nov 14 2023 - 13:38:11 +0000)

CPU:   MediaTek MT7622
Model: mt7622-linksys-e8450-ubi
DRAM:  512 MiB
Core:  48 devices, 21 uclasses, devicetree: separate
MMC:   
Loading Environment from UBI... SPI-NAND: FM35Q1GA (128MB)
Read 126976 bytes from volume ubootenv to 000000005f7bf200
Read 126976 bytes from volume ubootenv2 to 000000005f7de240
OK
In:    serial@11002000
Out:   serial@11002000
Err:   serial@11002000
reset button found
Loading Environment from UBI... UBI partition 'ubi' already selected
Read 126976 bytes from volume ubootenv to 000000005f7bf200
Read 126976 bytes from volume ubootenv2 to 000000005f7de240
OK
Net:   eth0: ethernet@1b100000

Here is what is in screenlog.0 from right after the U-Boot menu timeout up to the point before the system hangs:

No size specified -> Using max size (9904128)
Read 9904128 bytes from volume fit to 0000000048000000

## Checking Image at 48000000 ...
   FIT image found
   FIT description: ARM64 OpenWrt FIT (Flattened Image Tree)
    Image 0 (kernel-1)
     Description:  ARM64 OpenWrt Linux-5.15.137
     Type:         Kernel Image
     Compression:  gzip compressed
     Data Start:   0x48001000
     Data Size:    5136023 Bytes = 4.9 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: 0x44000000
     Entry Point:  0x44000000
     Hash algo:    crc32
     Hash value:   2500c1a9
     Hash algo:    sha1
     Hash value:   4af1e7d57271cdafdb5f08fcaeb4b09b089617d0
    Image 1 (fdt-1)
     Description:  ARM64 OpenWrt linksys_e8450-ubi device tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0x484e7000
     Data Size:    30483 Bytes = 29.8 KiB
     Architecture: AArch64
     Hash algo:    crc32
     Hash value:   8bfe6f4a
     Hash algo:    sha1
     Hash value:   6e246b55f03211a052e3d9e86f9b022da611ea3d
    Image 2 (rootfs-1)
     Description:  ARM64 OpenWrt linksys_e8450-ubi rootfs
     Type:         Filesystem Image
     Compression:  uncompressed
     Data Start:   0x484ef000
     Data Size:    4694016 Bytes = 4.5 MiB
     Hash algo:    crc32
     Hash value:   1da48a04
     Hash algo:    sha1
     Hash value:   9bf073d473f09f857dca267e7ec487b6b8b721e9
    Default Configuration: 'config-1'
    Configuration 0 (config-1)
     Description:  OpenWrt linksys_e8450-ubi
     Kernel:       kernel-1
     FDT:          fdt-1
     Loadables:    rootfs-1
## Checking hash(es) for FIT Image at 48000000 ...
   Hash(es) for Image 0 (kernel-1): crc32+ sha1+ 
   Hash(es) for Image 1 (fdt-1): crc32+ sha1+ 
   Hash(es) for Image 2 (rootfs-1): crc32+ sha1+ 
## Loading kernel from FIT Image at 48000000 ...
   Using 'config-1' configuration
   Trying 'kernel-1' kernel subimage
     Description:  ARM64 OpenWrt Linux-5.15.137
     Type:         Kernel Image
     Compression:  gzip compressed
     Data Start:   0x48001000
     Data Size:    5136023 Bytes = 4.9 MiB
     Architecture: AArch64
     OS:           Linux
     Load Address: 0x44000000
     Entry Point:  0x44000000
     Hash algo:    crc32
     Hash value:   2500c1a9
     Hash algo:    sha1
     Hash value:   4af1e7d57271cdafdb5f08fcaeb4b09b089617d0
   Verifying Hash Integrity ... crc32+ sha1+ OK
## Loading fdt from FIT Image at 48000000 ...
   Using 'config-1' configuration
   Trying 'fdt-1' fdt subimage
     Description:  ARM64 OpenWrt linksys_e8450-ubi device tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0x484e7000
     Data Size:    30483 Bytes = 29.8 KiB
     Architecture: AArch64
     Hash algo:    crc32
     Hash value:   8bfe6f4a
     Hash algo:    sha1
     Hash value:   6e246b55f03211a052e3d9e86f9b022da611ea3d
   Verifying Hash Integrity ... crc32+ sha1+ OK
   Booting using the fdt blob at 0x484e7000
Working FDT set to 484e7000
## Loading loadables from FIT Image at 48000000 ...
   Trying 'rootfs-1' loadables subimage
     Description:  ARM64 OpenWrt linksys_e8450-ubi rootfs
     Type:         Filesystem Image
     Compression:  uncompressed
     Data Start:   0x484ef000
     Data Size:    4694016 Bytes = 4.5 MiB
     Hash algo:    crc32
     Hash value:   1da48a04
     Hash algo:    sha1
     Hash value:   9bf073d473f09f857dca267e7ec487b6b8b721e9
   Verifying Hash Integrity ... crc32+ sha1+ OK
   Uncompressing Kernel Image
   Loading Device Tree to 000000005e7e0000, end 000000005e7ea712 ... OK
Working FDT set to 5e7e0000
Add 'ramoops@42ff0000' node failed: FDT_ERR_EXISTS

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000] Linux version 5.15.137 (builder@buildhost) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 12.3.0 r23630-842932a63d) 12.3.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Tue Nov 14 13:38:11 2023
[    0.000000] Machine model: Linksys E8450 (UBI)
[    0.000000] earlycon: uart8250 at MMIO32 0x0000000011002000 (options '')
[    0.000000] printk: bootconsole [uart8250] enabled
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x0000000042ffffff]
[    0.000000]   node   0: [mem 0x0000000043000000-0x000000004302ffff]
[    0.000000]   node   0: [mem 0x0000000043030000-0x000000005fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.4
[    0.000000] percpu: Embedded 17 pages/cpu s30040 r8192 d31400 u69632
[    0.000000] pcpu-alloc: s30040 r8192 d31400 u69632 alloc=17*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration
[    0.000000] CPU features: detected: ARM erratum 843419
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 129024
[    0.000000] Kernel command line: earlycon=uart8250,mmio32,0x11002000 console=ttyS0,115200n1 swiotlb=512
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 502064K/524288K available (8384K kernel code, 902K rwdata, 1416K rodata, 448K init, 306K bss, 22224K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] Root IRQ handler: 0xffffffc008421b34
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] arch_timer: cp15 timer(s) running at 12.50MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x2e2049cda, max_idle_ns: 440795202628 ns
[    0.000000] sched_clock: 56 bits at 12MHz, resolution 80ns, wraps every 4398046511080ns
[    0.008247] Calibrating delay loop (skipped), value calculated using timer frequency.. 25.00 BogoMIPS (lpj=125000)
[    0.018647] pid_max: default: 32768 minimum: 301
[    0.023563] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    0.030910] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    0.040048] rcu: Hierarchical SRCU implementation.
[    0.045329] smp: Bringing up secondary CPUs ...
[    0.050235] Detected VIPT I-cache on CPU1
[    0.050245] CPU features: SANITY CHECK: Unexpected variation in SYS_CNTFRQ_EL0. Boot CPU: 0x00000000bebc20, CPU1: 0x00000000000000
[    0.050265] CPU features: Unsupported CPU feature variation detected.
[    0.050294] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.050366] smp: Brought up 1 node, 2 CPUs
[    0.083292] SMP: Total of 2 processors activated.
[    0.088008] CPU features: detected: 32-bit EL0 Support
[    0.093164] CPU features: detected: CRC32 instructions
[    0.098349] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
[    0.106794] CPU: All CPU(s) started at EL2
[    0.110906] alternatives: patching kernel code
[    0.119831] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.129735] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[    0.136697] pinctrl core: initialized pinctrl subsystem
[    0.142813] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.149088] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
[    0.156221] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.164010] DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.172245] thermal_sys: Registered thermal governor 'fair_share'
[    0.172249] thermal_sys: Registered thermal governor 'bang_bang'
[    0.178362] thermal_sys: Registered thermal governor 'step_wise'
[    0.184392] thermal_sys: Registered thermal governor 'user_space'
[    0.190617] ASID allocator initialised with 65536 entries
[    0.202521] pstore: Registered ramoops as persistent store backend
[    0.208729] ramoops: using 0x10000@0x42ff0000, ecc: 0
[    0.235504] cryptd: max_cpu_qlen set to 1000
[    0.242466] SCSI subsystem initialized
[    0.246420] libata version 3.00 loaded.
[    0.251518] clocksource: Switched to clocksource arch_sys_counter
[    0.258264] NET: Registered PF_INET protocol family
[    0.263306] IP idents hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.271053] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.279490] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.287274] TCP established hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.295075] TCP bind hash table entries: 4096 (order: 4, 65536 bytes, linear)
[    0.302294] TCP: Hash tables configured (established 4096 bind 4096)
[    0.308757] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.315326] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.322472] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.328168] PCI: CLS 0 bytes, default 64
[    0.334735] workingset: timestamp_bits=46 max_order=17 bucket_order=0
[    0.344863] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.350728] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.393044] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251)
[    0.401667] mt7622-pinctrl 10211000.pinctrl: invalid group "pwm_ch7_2" for function "pwm"
[    0.414536] mt-pmic-pwrap 10001000.pwrap: unexpected interrupt int=0x1
[    0.429602] Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
[    0.438312] printk: console [ttyS0] disabled
[    0.462809] 11002000.serial: ttyS0 at MMIO 0x11002000 (irq = 125, base_baud = 1562500) is a ST16650V2
[    0.472132] printk: console [ttyS0] enabled
[    0.472132] printk: console [ttyS0] enabled
[    0.480496] printk: bootconsole [uart8250] disabled
[    0.480496] printk: bootconsole [uart8250] disabled
[    0.511057] 11004000.serial: ttyS1 at MMIO 0x11004000 (irq = 126, base_baud = 1562500) is a ST16650V2
[    0.521187] 1100c000.serial: ttyS2 at MMIO 0x1100c000 (irq = 130, base_baud = 17499995) is a MediaTek BTIF
[    0.531014] serial serial0: tty port ttyS2 registered
[    0.536903] mtk_rng 1020f000.rng: registered RNG driver
[    0.537080] random: crng init done
[    0.549336] loop: module loaded
[    0.552524] Loading iSCSI transport class v2.0-870.
[    0.558661] mtk-ecc 1100e000.ecc: probed
[    0.565503] spi-nand spi2.0: Fidelix SPI NAND was found.
[    0.570828] spi-nand spi2.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
[    0.578988] mtk-snand 1100d000.spi: ECC strength: 4 bits per 512 bytes
[    0.585797] 4 fixed-partitions partitions found on MTD device spi2.0
[    0.592212] OF: Bad cell count for /spi@1100d000/flash@0/partitions
[    0.598495] OF: Bad cell count for /spi@1100d000/flash@0/partitions
[    0.605036] Creating 4 MTD partitions on "spi2.0":
[    0.609832] 0x000000000000-0x000000080000 : "bl2"
[    0.615500] 0x000000080000-0x0000001c0000 : "fip"
[    0.621939] 0x0000001c0000-0x0000002c0000 : "factory"
[    0.628452] 0x000000300000-0x000008000000 : "ubi"
[    0.895126] mtk_soc_eth 1b100000.ethernet eth0: mediatek frame engine at 0xffffffc009440000, irq 141
[    0.905054] i2c_dev: i2c /dev entries driver
[    0.910639] mtk-wdt 10212000.watchdog: IRQ index 0 not found
[    0.916496] mtk-wdt 10212000.watchdog: Watchdog enabled (timeout=31 sec, nowayout=0)
[    0.926657] NET: Registered PF_INET6 protocol family
[    0.932762] Segment Routing with IPv6
[    0.936439] In-situ OAM (IOAM) with IPv6
[    0.940392] NET: Registered PF_PACKET protocol family
[    0.945513] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[    0.958653] 8021q: 802.1Q VLAN Support v1.8
[    0.964056] pstore: Using crash dump compression: deflate
[    0.979562] mtk-pcie 1a143000.pcie: host bridge /pcie@1a143000 ranges:
[    0.986154] mtk-pcie 1a143000.pcie: Parsing ranges property...
[    0.992005] mtk-pcie 1a143000.pcie:      MEM 0x0020000000..0x0027ffffff -> 0x0020000000
[    1.137943] mtk-pcie 1a143000.pcie: PCI host bridge to bus 0000:00
[    1.144157] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.149645] pci_bus 0000:00: root bus resource [mem 0x20000000-0x27ffffff]
[    1.156530] pci_bus 0000:00: scanning bus
[    1.160735] pci 0000:00:00.0: [14c3:3258] type 01 class 0x060400
[    1.166931] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x1ffffffff 64bit pref]
[    1.176930] pci_bus 0000:00: fixups for bus
[    1.181127] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 0
[    1.187840] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.195920] pci 0000:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[    1.203055] pci_bus 0000:01: scanning bus
[    1.207279] pci 0000:01:00.0: [14c3:7915] type 00 class 0x000280
[    1.213476] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit pref]
[    1.220803] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00003fff 64bit pref]
[    1.228137] pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00000fff 64bit pref]
[    1.236117] pci 0000:01:00.0: supports D1 D2
[    1.240382] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    1.247023] pci 0000:01:00.0: PME# disabled
[    1.251532] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:00.0 (capable of 4.000 Gb/s with 5.0 GT/s PCIe x1 link)
[    1.294948] pci_bus 0000:01: fixups for bus
[    1.299157] pci_bus 0000:01: bus scan returning with max=01
[    1.304769] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[    1.311410] pci_bus 0000:00: bus scan returning with max=01
[    1.317006] pci 0000:00:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[    1.324662] pci 0000:00:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[    1.332674] pci 0000:00:00.0: BAR 8: assigned [mem 0x20000000-0x201fffff]
[    1.339460] pci 0000:01:00.0: BAR 0: assigned [mem 0x20000000-0x200fffff 64bit pref]
[    1.347290] pci 0000:01:00.0: BAR 2: assigned [mem 0x20100000-0x20103fff 64bit pref]
[    1.355118] pci 0000:01:00.0: BAR 4: assigned [mem 0x20104000-0x20104fff 64bit pref]
[    1.362945] pci 0000:00:00.0: PCI bridge to [bus 01]
[    1.367907] pci 0000:00:00.0:   bridge window [mem 0x20000000-0x201fffff]
[    1.374841] pcieport 0000:00:00.0: assign IRQ: got 146
[    1.379986] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[    1.386099] pcieport 0000:00:00.0: enabling bus mastering
[    1.391540] mtk-pcie 1a143000.pcie: msi#0 address_hi 0x0 address_lo 0x44d6d0c0
[    1.398943] pcieport 0000:00:00.0: PME: Signaling with IRQ 146
[    1.404885] pcieport 0000:00:00.0: saving config space at offset 0x0 (reading 0x325814c3)
[    1.413077] pcieport 0000:00:00.0: saving config space at offset 0x4 (reading 0x100006)
[    1.421078] pcieport 0000:00:00.0: saving config space at offset 0x8 (reading 0x6040000)
[    1.429167] pcieport 0000:00:00.0: saving config space at offset 0xc (reading 0x10000)
[    1.437082] pcieport 0000:00:00.0: saving config space at offset 0x10 (reading 0xc)
[    1.444739] pcieport 0000:00:00.0: saving config space at offset 0x14 (reading 0x0)
[    1.452393] pcieport 0000:00:00.0: saving config space at offset 0x18 (reading 0x40010100)
[    1.460650] pcieport 0000:00:00.0: saving config space at offset 0x1c (reading 0x4200000)
[    1.468825] pcieport 0000:00:00.0: saving config space at offset 0x20 (reading 0x20102000)
[    1.477089] pcieport 0000:00:00.0: saving config space at offset 0x24 (reading 0x0)
[    1.484743] pcieport 0000:00:00.0: saving config space at offset 0x28 (reading 0x0)
[    1.492397] pcieport 0000:00:00.0: saving config space at offset 0x2c (reading 0x0)
[    1.500047] pcieport 0000:00:00.0: saving config space at offset 0x30 (reading 0x0)
[    1.507701] pcieport 0000:00:00.0: saving config space at offset 0x34 (reading 0x50)
[    1.515441] pcieport 0000:00:00.0: saving config space at offset 0x38 (reading 0x0)
[    1.523094] pcieport 0000:00:00.0: saving config space at offset 0x3c (reading 0x20192)
[    1.531649] mtk-pcie 1a145000.pcie: host bridge /pcie@1a145000 ranges:
[    1.538201] mtk-pcie 1a145000.pcie: Parsing ranges property...
[    1.544044] mtk-pcie 1a145000.pcie:      MEM 0x0028000000..0x002fffffff -> 0x0028000000
[    1.771590] mtk-pcie 1a145000.pcie: Port1 link down
[    1.776676] mtk-pcie 1a145000.pcie: PCI host bridge to bus 0001:00
[    1.782887] pci_bus 0001:00: root bus resource [bus 00-ff]
[    1.788376] pci_bus 0001:00: root bus resource [mem 0x28000000-0x2fffffff]
[    1.795258] pci_bus 0001:00: scanning bus
[    1.801031] pci_bus 0001:00: fixups for bus
[    1.805223] pci_bus 0001:00: bus scan returning with max=00
[    1.811406] mtk_hsdma 1b007000.dma-controller: MediaTek HSDMA driver registered
[    1.861835] mt7530-mdio mdio-bus:00: configuring for fixed/2500base-x link mode
[    1.872098] mt7530-mdio mdio-bus:00: Link is Up - 2.5Gbps/Full - flow control rx/tx
[    1.880296] mt7530-mdio mdio-bus:00 lan1 (uninitialized): PHY [mt7530-0:00] driver [MediaTek MT7531 PHY] (irq=147)
[    1.901287] mt7530-mdio mdio-bus:00 lan2 (uninitialized): PHY [mt7530-0:01] driver [MediaTek MT7531 PHY] (irq=148)
[    1.921675] mt7530-mdio mdio-bus:00 lan3 (uninitialized): PHY [mt7530-0:02] driver [MediaTek MT7531 PHY] (irq=149)
[    1.942067] mt7530-mdio mdio-bus:00 lan4 (uninitialized): PHY [mt7530-0:03] driver [MediaTek MT7531 PHY] (irq=150)
[    1.962683] mt7530-mdio mdio-bus:00 wan (uninitialized): PHY [mt7530-0:04] driver [MediaTek MT7531 PHY] (irq=151)
[    1.973969] DSA: tree 0 setup
[    1.977736] UBI: auto-attach mtd3
[    1.981066] ubi0: default fastmap pool size: 50
[    1.985606] ubi0: default fastmap WL pool size: 25
[    1.990389] ubi0: attaching mtd3
[    2.295953] ubi0: scanning is finished
[    2.305472] ubi0: attached mtd3 (name "ubi", size 125 MiB)
[    2.310975] ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 126976 bytes
[    2.317866] ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 2048
[    2.324666] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[    2.331630] ubi0: good PEBs: 1000, bad PEBs: 0, corrupted PEBs: 0
[    2.337718] ubi0: user volume: 6, internal volumes: 1, max. volumes count: 128
[    2.344943] ubi0: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 1538370547
[    2.354075] ubi0: available PEBs: 0, total reserved PEBs: 1000, PEBs reserved for bad PEB handling: 20
[    2.363390] ubi0: background thread "ubi_bgt0d" started, PID 514
[    2.365084] FIT: Selected configuration: "config-1" (OpenWrt linksys_e8450-ubi)
[    2.376731] FIT:           kernel sub-image 0x00001000..0x004e6e96 "kernel-1" (ARM64 OpenWrt Linux-5.15.137) 
[    2.386665] FIT:          flat_dt sub-image 0x004e7000..0x004ee712 "fdt-1" (ARM64 OpenWrt linksys_e8450-ubi device tree blob) 
[    2.398064] FIT:       filesystem sub-image 0x004ef000..0x00968fff "rootfs-1" (ARM64 OpenWrt linksys_e8450-ubi rootfs) 
[    2.408848] FIT: selecting configured loadable "rootfs-1" to be root filesystem
[    2.416151]  ubiblock0_4: p1(rootfs-1)
[    2.416349] block ubiblock0_4: created from ubi0:4(fit)
[    2.428763] VFS: Mounted root (squashfs filesystem) readonly on device 259:0.
[    2.436137] Freeing unused kernel memory: 448K
[    2.471683] Run /sbin/init as init process
[    2.475790]   with arguments:
[    2.478754]     /sbin/init
[    2.481459]   with environment:
[    2.484641]     HOME=/
[    2.487000]     TERM=linux
[    2.675614] init: Console is alive
[    2.679147] init: - watchdog -
[    3.057689] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    3.083679] usbcore: registered new interface driver usbfs
[    3.089209] usbcore: registered new interface driver hub
[    3.094594] usbcore: registered new device driver usb
[    3.104990] xhci-mtk 1a0c0000.usb: xHCI Host Controller
[    3.110247] xhci-mtk 1a0c0000.usb: new USB bus registered, assigned bus number 1
[    3.119342] xhci-mtk 1a0c0000.usb: hcc params 0x01403198 hci version 0x96 quirks 0x0000000000210010
[    3.128451] xhci-mtk 1a0c0000.usb: irq 135, io mem 0x1a0c0000
[    3.134316] xhci-mtk 1a0c0000.usb: xHCI Host Controller
[    3.139544] xhci-mtk 1a0c0000.usb: new USB bus registered, assigned bus number 2
[    3.146955] xhci-mtk 1a0c0000.usb: Host supports USB 3.0 SuperSpeed
[    3.153727] hub 1-0:1.0: USB hub found
[    3.157524] hub 1-0:1.0: 2 ports detected
[    3.161989] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    3.170524] hub 2-0:1.0: USB hub found
[    3.174375] hub 2-0:1.0: 1 port detected
[    3.182738] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.200117] init: - preinit -
[    3.522221] mtk_soc_eth 1b100000.ethernet eth0: configuring for fixed/2500base-x link mode
[    3.530686] mtk_soc_eth 1b100000.ethernet eth0: Link is Up - 2.5Gbps/Full - flow control rx/tx
[    3.545491] mt7530-mdio mdio-bus:00 lan1: configuring for phy/gmii link mode
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level

Even after rebooting the Linux box, I cannot reconnect to the router through serial. But the router is up and I can ssh into it. It also seems strange to me that the log shows a 2.5Gbps link speed in the screen output. I don't think I've noticed that before.

Thank you for the work on this! It certainly seems promising. Although I've never had this Linux box hang like this before.

Yes, ret can be negative, and that would indicate an actual non-recoverable read error, which will (rightly) trigger the error message we had been seeing all along (ERROR: BL2: Failed to load image...).

2 Likes

The fake Prolific chips, especially under Windows, are notorious for flaking out or being intentionally bricked by Prolific's drivers. I'm not having any boot issues, but I'm also using a CH341-based USB to UART adapter.

I should note I've also had EMI issues with these cheap adapters, however. The only way to solve those issues has been to disconnect, wait a while, and reconnect. You might also have a USB port whose hardware doesn't like the chipset much. I've experienced that as well. USB3 ports tend to hang when using these cheap adapters, but a proper USB2 port (even if through a USB2 hub) usually works fine. This issue isn't unique to UART adapters; there are reports all over the globe of old USB2 or USB1 devices refusing to work with USB3 hardware.

1 Like

I can also confirm that this allowed mine to boot as well. Awesome work!

Well then, it looks like that's that. OKD is resolved! :clap: :clap: :clap:

Now we just need to wait for the change to be made upstream. If that looks like it might take a while, then it's definitely still possible to apply a patch directly in OpenWrt and release the repaired versions. Meanwhile, I'm compiling unofficial versions that I'll put up on my GitHub repo for those already affected and anxious to update.

2 Likes