GCC11 images are not working on CPE510v3

Hi all,

After a couple of hours of bisecting, I found that commit da5bb885e17cf77caea70adcf473f1fb95448553 breaks my TP-Link CPE510v3 image: the kernel no longer decompresses, so the device does not boot.

error description:
-after the bootloader (TP-LINK SafeLoader) starts the kernel, the system hangs at the "Decompressing kernel..." stage without any error message
-boot output:

TP-LINK SafeLoader (Build time: Jun 14 2017 - 15:42:06)
CPU: 560MHz AHB: 225MHz DDR: 64MB
Performing LED check..  PASS
Press CTRL+B to enter SafeLoader: 1
Manufacturer_id: 0xc8
device_id: 0x4017
open user-config failed.
open user-config failed.
Allocated memory for elf segment ok: addr: 0x80060000, size 0x16cc
Loading .text @ 0x80060000 (5836 bytes)

Starting kernel



OpenWrt kernel loader for AR7XXX/AR9XXX
Copyright (C) 2011 Gabor Juhos <juhosg@openwrt.org>
Looking for OpenWrt image... found at 0xbf043000
Decompressing kernel...

affected targets
-validated on TP-Link CPE510 v3
(other hardware versions such as v2 and v1 are probably affected too)

affected OpenWrt builds:
-at least all trunk builds for the CPE510v3 (including those in the snapshot download folder) after commit da5bb885e17cf77caea70adcf473f1fb95448553.

workaround
-git revert da5bb885e17cf77caea70adcf473f1fb95448553

bug fix
TBD

I am willing to help fix this GCC11 issue. Let me know if you need extra information or some testing, as I have my CPE510v3 with serial access next to me.

Greetings Thomas

wipe your build_dir and staging_dir and try again.

Hi,

I tested with a build from a fresh git clone and also with the current snapshot; neither boots. All git bisect steps were done after make dirclean.
Reverting the GCC11 commit as a workaround makes current trunk boot properly again.
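
For anyone who wants to repeat the bisect, here is a rough Python sketch of the search logic. It is only an illustration: `first_bad`, the revision labels and the `boots` check are all made up, and in practice `git bisect` drives the search while each "check" is a make dirclean, rebuild, flash and a look at the serial console.

```python
def first_bad(revisions, boots):
    """Return the first revision for which boots(rev) is False.

    Assumes revisions are ordered oldest-first, the oldest one boots,
    the newest one does not, and nothing boots again once it is broken.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(revisions[mid]):
            lo = mid   # still good, culprit is later
        else:
            hi = mid   # already broken, culprit is here or earlier
    return revisions[hi]


# Hypothetical history of 16 revisions where "r109" introduced the breakage.
revs = [f"r{n}" for n in range(100, 116)]
culprit = "r109"
found = first_bad(revs, lambda r: revs.index(r) < revs.index(culprit))
print(found)
```

With 16 candidate revisions this needs only 4 boot tests, which is why a bisect over a long commit range is feasible even when every step is a full rebuild.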

greetings thomas

GCC10 is still available under advanced toolchain configuration. No need to revert.

Is there a target-specific GCC option, so that we could set GCC10 for those targets that currently do not boot? I guess more targets are affected than just the CPE510v3 that I tested.

That's done in toolchain/gcc/Config.in

just tested a TPLink CPEv2 ... it is also affected and does not boot with a GCC11 image currently

troubleshooting status update, with the help of nbd:

-educated guess for a starting point -> lzma-loader
(1) compared the lzma-loader assembly between GCC11 and GCC10:
-> a lot of differences, nothing obvious
(2) disabled GCC11 LTO (link-time optimization) and re-tested:
-> the CPE510v3 image built without LTO does NOT boot either; it freezes at the same
"Decompressing kernel..." stage

current state: for the time being, GCC10 must be used as a workaround for the ath79 CPE510-series targets.

Greetings Thomas

Makes sense. I don't think my Archer C7 v2 uses lzma-loader.

update on tested devices:

affected devices that do not boot with GCC11-compiled images or the current snapshot, all tested on my desk:
-TP-Link CPE510 v1, v2, v3
-TP-Link CPE210v1

there are probably more affected models that I do not have at hand

TP-Link TL-WR1043ND V1 (8/32 :see_no_evil: ) also affected. Checked with the latest snapshot: r17675-ed7769aa40

I've run into this issue while building custom images for my 1043ND v1 (64 MB RAM mod) and my 941ND v3 (4/32 -> 8/64MB mod). The 1043ND is running the stock bootloader; the 941ND is running the Breed bootloader (plus the required tweaks in the .dts file: I changed the partition table to the 1043ND's out-of-the-box 8MB layout, as the only real difference between these two devices is the switch and the USB port).

Apart from choosing my required packages and features, these are built with "-O2 -mno-mips16 -march=24kc -mtune=24kc" and no MIPS16 for packages, in hopes of getting a bit more performance out of their aging AR9132s.

These devices use "tplink-4m" (941NDv3) and "tplink-8m" (1043NDv1) images by default. These don't boot. Both get stuck at "Decompressing kernel..." on the serial output, with both Breed and the stock U-Boot. "tplink-8m" for the 941ND v3 doesn't work either.

I then tried building the alternative "tplink-8mlzma" for both, I mean why not? Turns out the 1043NDv1 bootloops, but the 941ND v3 booted successfully :thinking:

Then I installed the Breed bootloader on the 1043ND... and it booted the very same LZMA-compressed image it had just bootlooped on.

Kernel log:

[    0.000000] Linux version 5.4.145 (******@******) (gcc version 11.2.0 (OpenWrt GCC 11.2.0 r17581-2c9a07ed28)) #0 Sat Sep 25 17:28:54 2021
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 00019374 (MIPS 24Kc)
[    0.000000] MIPS: machine is TP-Link TL-WR1043ND v1
[    0.000000] SoC: Atheros AR9132 rev 2
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000] On node 0 totalpages: 16384
[    0.000000]   Normal zone: 144 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 16384 pages, LIFO batch:3
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 16240
[    0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
[    0.000000] Inode-cache hash table entries: 4096 (order: 2, 16384 bytes, linear)
[    0.000000] Writing ErrCtl register=000729d0
[    0.000000] Readback ErrCtl register=000729d0
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 57096K/65536K available (5021K kernel code, 189K rwdata, 1108K rodata, 1224K init, 196K bss, 8440K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 51
[    0.000000] random: get_random_bytes called from start_kernel+0x354/0x548 with crng_init=0
[    0.000000] CPU clock: 400.000 MHz
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 9556302233 ns
[    0.000012] sched_clock: 32 bits at 200MHz, resolution 5ns, wraps every 10737418237ns
[    0.007966] Calibrating delay loop... 265.42 BogoMIPS (lpj=1327104)
[    0.094267] pid_max: default: 32768 minimum: 301
[    0.099146] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.106511] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.119842] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.129798] futex hash table entries: 256 (order: -1, 3072 bytes, linear)
[    0.136775] pinctrl core: initialized pinctrl subsystem
[    0.143322] NET: Registered protocol family 16
[    0.184036] workqueue: max_active 576 requested for napi_workq is out of range, clamping between 1 and 512
[    0.197975] clocksource: Switched to clocksource MIPS
[    0.204644] NET: Registered protocol family 2
[    0.209335] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.217458] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.225970] TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.233716] TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.240840] TCP: Hash tables configured (established 1024 bind 1024)
[    0.247429] UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.254079] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.261469] NET: Registered protocol family 1
[    0.265895] PCI: CLS 0 bytes, default 32
[    0.274630] workingset: timestamp_bits=14 max_order=14 bucket_order=0
[    0.291306] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.297174] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.326268] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[    0.333651] printk: console [ttyS0] disabled
[    0.338090] 18020000.uart: ttyS0 at MMIO 0x18020000 (irq = 9, base_baud = 12500000) is a 8250
[    0.346681] printk: console [ttyS0] enabled
[    0.355105] printk: bootconsole [early0] disabled
[    0.373232] spi-nor spi0.0: s25sl064p (8192 Kbytes)
[    0.378269] 3 fixed-partitions partitions found on MTD device spi0.0
[    0.384652] Creating 3 MTD partitions on "spi0.0":
[    0.389505] 0x000000000000-0x000000020000 : "u-boot"
[    0.395821] 0x000000020000-0x0000007f0000 : "firmware"
[    0.402485] 2 tplink-fw partitions found on MTD device firmware
[    0.408512] Creating 2 MTD partitions on "firmware":
[    0.413510] 0x000000000000-0x0000001fa20e : "kernel"
[    0.419738] 0x0000001fa210-0x0000007d0000 : "rootfs"
[    0.425817] mtd: device 3 (rootfs) set to be root filesystem
[    0.431627] 1 squashfs-split partitions found on MTD device rootfs
[    0.437849] 0x000000600000-0x0000007d0000 : "rootfs_data"
[    0.444487] 0x0000007f0000-0x000000800000 : "art"
[    0.451267] Realtek RTL8366RB ethernet switch driver version 0.2.4
[    0.457587] rtl8366rb rtl8366rb: cannot find mdio node phandle
[    0.577995] rtl8366rb rtl8366rb: using GPIO pins 18 (SDA) and 19 (SCK)
[    0.584707] rtl8366rb rtl8366rb: RTL5937 ver. 3 chip found
[    0.812363] libphy: rtl8366rb: probed
[    0.818141] libphy: Fixed MDIO Bus: probed
[    1.160031] ag71xx 19000000.eth: connected to PHY at fixed-0:00 [uid=00000000, driver=Generic PHY]
[    1.169951] eth0: Atheros AG71xx at 0xb9000000, irq 4, mode: rgmii
[    1.176427] i2c /dev entries driver
[    1.182878] NET: Registered protocol family 10
[    1.193702] Segment Routing with IPv6
[    1.197549] NET: Registered protocol family 17
[    1.202176] 8021q: 802.1Q VLAN Support v1.8
[    1.213627] VFS: Mounted root (squashfs filesystem) readonly on device 31:3.
[    1.228359] Freeing unused kernel memory: 1224K
[    1.232915] This architecture does not have kernel memory protection.
[    1.239422] Run /sbin/init as init process
[    2.036865] init: Console is alive
[    2.041034] init: - watchdog -
[    2.044591] init: Watchdog has previously reset the system
[    2.467985] random: fast init done
[    3.714023] kmodloader: loading kernel modules from /etc/modules-boot.d/*
[    3.845316] usbcore: registered new interface driver usbfs
[    3.850999] usbcore: registered new interface driver hub
[    3.856472] usbcore: registered new device driver usb
[    3.871139] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    3.880629] ehci-fsl: Freescale EHCI Host controller driver
[    3.888963] ehci-platform: EHCI generic platform driver
[    3.894499] ehci-platform 1b000100.usb: EHCI Host Controller
[    3.900318] ehci-platform 1b000100.usb: new USB bus registered, assigned bus number 1
[    3.908361] ehci-platform 1b000100.usb: irq 3, io mem 0x1b000100
[    3.937992] ehci-platform 1b000100.usb: USB 2.0 started, EHCI 1.00
[    3.945401] hub 1-0:1.0: USB hub found
[    3.949742] hub 1-0:1.0: 1 port detected
[    3.958863] kmodloader: done loading kernel modules from /etc/modules-boot.d/*
[    3.976791] init: - preinit -
[    5.867776] random: jshn: uninitialized urandom read (4 bytes read)
[    6.303948] random: jshn: uninitialized urandom read (4 bytes read)
[    6.540624] random: jshn: uninitialized urandom read (4 bytes read)
[    6.910970] urandom_read: 3 callbacks suppressed
[    6.910982] random: jshn: uninitialized urandom read (4 bytes read)
[    7.690769] eth0: link up (1000Mbps/Full duplex)
[    7.695439] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    7.718206] IPv6: ADDRCONF(NETDEV_CHANGE): eth0.1: link becomes ready
[    7.809972] random: procd: uninitialized urandom read (4 bytes read)
[   12.152409] jffs2: notice: (537) jffs2_build_xattr_subsystem: complete building xattr subsystem, 31 of xdatum (9 unchecked, 22 orphan) and 33 of xref (22 dead, 0 orphan) found.
[   12.170317] mount_root: switching to jffs2 overlay
[   12.185230] overlayfs: upper fs does not support tmpfile.
[   12.201404] urandom-seed: Seeding with /etc/urandom.seed
[   12.360627] eth0: link down
[   12.392640] procd: - early -
[   12.396015] procd: - watchdog -
[   12.399763] procd: Watchdog has previously reset the system
[   13.032744] procd: - watchdog -
[   13.036390] procd: Watchdog has previously reset the system
[   13.045093] procd: - ubus -
[   13.172285] random: ubusd: uninitialized urandom read (4 bytes read)
[   13.182071] random: ubusd: uninitialized urandom read (4 bytes read)
[   13.193137] procd: - init -
[   14.627500] kmodloader: loading kernel modules from /etc/modules.d/*
[   15.140339] GACT probability on
[   15.149720] Mirror/redirect action on
[   15.192726] u32 classifier
[   15.195459]     input device check on
[   15.199205]     Actions configured
[   15.269874] Loading modules backported from Linux version v5.10.68-0-g4d8524048a35
[   15.277496] Backport generated by backports.git v5.10.68-1-0-ga4f9ba32
[   15.343724] urngd: v1.0.2 started.
[   15.421093] xt_time: kernel timezone is -0000
[   15.807926] PPP generic driver version 2.4.2
[   15.829718] NET: Registered protocol family 24
[   15.848804] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   15.856680] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[   16.007377] random: crng init done
[   16.114129] ath: EEPROM regdomain sanitized
[   16.114145] ath: EEPROM regdomain: 0x64
[   16.114150] ath: EEPROM indicates we should expect a direct regpair map
[   16.114173] ath: Country alpha2 being used: 00
[   16.114178] ath: Regpair used: 0x64
[   16.156995] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   16.159737] ieee80211 phy0: Atheros AR9100 MAC/BB Rev:7 AR2133 RF Rev:a2 mem=0xb80c0000, irq=2
[   16.420160] kmodloader: done loading kernel modules from /etc/modules.d/*
[   38.063110] eth0: link up (1000Mbps/Full duplex)
[   38.067813] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   38.084152] br-GUEST: port 1(eth0.25) entered blocking state
[   38.089939] br-GUEST: port 1(eth0.25) entered disabled state
[   38.096057] device eth0.25 entered promiscuous mode
[   38.101158] device eth0 entered promiscuous mode
[   38.130961] br-GUEST: port 1(eth0.25) entered blocking state
[   38.136670] br-GUEST: port 1(eth0.25) entered forwarding state
[   38.632810] br-lan: port 1(eth0.1) entered blocking state
[   38.638338] br-lan: port 1(eth0.1) entered disabled state
[   38.644191] device eth0.1 entered promiscuous mode
[   38.695467] br-lan: port 1(eth0.1) entered blocking state
[   38.700979] br-lan: port 1(eth0.1) entered forwarding state
[   39.058594] IPv6: ADDRCONF(NETDEV_CHANGE): br-GUEST: link becomes ready

It seems Breed (r1163, dated 26/12/17) can read these images, but won't read the non-LZMA images. The old stock U-Boot on these devices (~2012-13 IIRC; I was using the latest TP-Link factory image on them before installing OpenWrt) won't work with the newer images in any format.

Shot in the dark, and I am probably completely wrong: is there a chance that non-compressed kernels as of 5.4+ have become too big to fit in their specified partition size in the image (these are 2009-2010 devices...) and get cut off when the image is built, while compression brings the kernel back to a size that fits and can be read successfully?
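
One way to sanity-check the size theory is to pull the partition boundaries out of the kernel log above and turn them into byte counts. A rough Python sketch follows; the `partition_sizes` helper is made up for illustration, and the two sample lines are copied from the log above. It only reports what the booted image contains. It says nothing about which limit the bootloader or the image builder actually enforces, so treat it as a starting point for comparing builds, not as proof either way.

```python
import re

# Matches kernel log lines such as:
#   [    0.413510] 0x000000000000-0x0000001fa20e : "kernel"
MTD_LINE = re.compile(r'0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+) : "([^"]+)"')


def partition_sizes(log_text):
    """Map partition name -> size in bytes (end offset minus start offset)."""
    sizes = {}
    for start, end, name in MTD_LINE.findall(log_text):
        sizes[name] = int(end, 16) - int(start, 16)
    return sizes


# Two lines taken from the boot log of the LZMA image.
log = '''
[    0.413510] 0x000000000000-0x0000001fa20e : "kernel"
[    0.419738] 0x0000001fa210-0x0000007d0000 : "rootfs"
'''

sizes = partition_sizes(log)
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / 2**20:.2f} MiB)")
```

Running the same helper over the log of a non-LZMA build (if one can be captured from a bootloader that still starts it) would show directly whether the kernel part grew past what the layout expects.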

Last year, when I got the 8MB mod done for my 941ND v3, it was running OpenWrt r13927 (kernel 5.4.52, 24 July 2020, GCC 8.4). It was already running Breed, but I didn't have to resort to the LZMA images for it to boot.

Another shot in the dark: maybe there was a breaking change somewhere lately that affected the non-LZMA images but not the LZMA images, and that's why the latter can be booted?


As for OP, @thuehn
CPE210v1 uses AR9344
CPE510v1, v2 and v3 all seem to use the AR9350

These are old Atheros SoCs, somewhat newer than the AR9132 in the 1043NDv1 and the 941ND v3 in this post. It is not unreasonable to think that they are all hit by the same issue.

Another thing to try is the latest LZMA SDK. My wild guess is that there may be an alignment issue when reading from flash in the ancient LZMA decoder code (circa 2006). I don't have any OKLI devices, and lzma-loader works fine on my Ubiquiti EdgeRouter X. If you want to try the latest master with the new LZMA SDK, see https://github.com/lipnitsk/openwrt/tree/okli.

Will do, when I get my device back! Thanks for the refresh!

A fix (temporarily reverting the bogus upstream commit) has landed in master.

Please test your devices! It's already in the snapshot builds. :+1:

r18107-db34b93331

Thanks @lipnitsk for the reproducer and bug report!

I just built a new firmware after pulling the latest changes (r18297-7e89421a7). I tried booting my bricked TL-WR1043ND-v1 using the tpl / tftp method, but it still hangs at decompressing. I noticed in the bin/targets/ath79/generic folder that the initramfs-kernel.bin for v1 is 9478396 vs 6904911 for v2. I did a make clean before the build.

I just tried r18107, but it yields the same 9.1MB initramfs, and it hangs at decompressing.

Latest SNAPSHOT build (now: OpenWrt SNAPSHOT, r18395-5e67cd63c4) works nicely for me:

eth0 up
eth0
is_auto_upload_firmware=0
Autobooting in 1 seconds## Booting image at bf020000 ...
   Uncompressing Kernel Image ... OK

Starting kernel ...



OpenWrt kernel loader for AR7XXX/AR9XXX
Copyright (C) 2011 Gabor Juhos <juhosg@openwrt.org>
Looking for OpenWrt image... found at 0xbf022000
Decompressing kernel... done!
Starting kernel at 80060000...

[    0.000000] Linux version 5.10.87 (builder@buildhost) (mips-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r18395-5e67cd63c4) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 Thu Dec 23 18:18:56 2021
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 00019374 (MIPS 24Kc)
[    0.000000] MIPS: machine is TP-Link TL-WR1043ND v1
[    0.000000] SoC: Atheros AR9132 rev 2
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes
...

It does have a few eraseblocks free too:

root@OpenWrt:~# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/root                 3.0M      3.0M         0 100% /rom
tmpfs                    11.9M     64.0K     11.9M   1% /tmp
tmpfs                    11.9M     40.0K     11.9M   0% /tmp/root
tmpfs                   512.0K         0    512.0K   0% /dev
/dev/mtdblock4            2.7M    220.0K      2.5M   8% /overlay
overlayfs:/overlay        2.7M    220.0K      2.5M   8% /

[    0.422949] spi-nor spi0.0: s25sl064p (8192 Kbytes)
[    0.427934] 3 fixed-partitions partitions found on MTD device spi0.0
[    0.434365] Creating 3 MTD partitions on "spi0.0":
[    0.439194] 0x000000000000-0x000000020000 : "u-boot"
[    0.448953] 0x000000020000-0x0000007f0000 : "firmware"
[    0.455714] 2 tplink-fw partitions found on MTD device firmware
[    0.461690] Creating 2 MTD partitions on "firmware":
[    0.466778] 0x000000000000-0x0000002453ea : "kernel"
[    0.471768] mtd: partition "kernel" doesn't end on an erase/write block -- force read-only
[    0.483532] 0x0000002453ec-0x0000007d0000 : "rootfs"
[    0.488548] mtd: partition "rootfs" doesn't start on an erase/write block boundary -- force read-only
[    0.499050] mtd: device 3 (rootfs) set to be root filesystem
[    0.504867] 1 squashfs-split partitions found on MTD device rootfs
[    0.511099] 0x000000520000-0x0000007d0000 : "rootfs_data"
[    0.519777] 0x0000007f0000-0x000000800000 : "art"

LuCI works too (included with ImageBuilder).

As you can see in the serial log, it's GCC11! :grinning:

Thanks to @lipnitsk for backporting the GCC11 upstream fix! :+1: