Master regression - boot loop due to kernel panic on latest snapshot (mt7621/Archer C6 v3)

I built master for the Archer C6 v3 yesterday, and the device no longer boots: it is stuck in a boot loop.

With a UART attached, I can see the kernel panic below.

Reverting to a build I did a couple of weeks ago (also from master) solves the problem.

I would appreciate it if any dev could take a look at this. If needed, I can file a bug report.

Thanks.

[    0.797651] CPU 1 Unable to handle kernel paging request at virtual address 5050404, epc == 80588ef8, ra == 801fe360
[    0.808162] Oops[#1]:
[    0.810387] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.159 #0
[    0.816345] $ 0   : 00000000 00000001 8fc304a4 00000108
[    0.821525] $ 4   : 05050404 80621000 8064d18f 00000061
[    0.826712] $ 8   : fffffffc 80594b3c 00000045 006d6873
[    0.831893] $12   : 015ede76 08fca8f3 9715a5ed 5c2e1039
[    0.837079] $16   : 8ff9cc00 8fc2523c 05050404 8064d184
[    0.842261] $20   : 0000000b 8fc06e00 8ff9cc8c 806ebd24
[    0.847448] $24   : 00000010 76ec2f43
[    0.852629] $28   : 8fc40000 8fc41d50 38e38e39 801fe360
[    0.857816] Hi    : 00000000
[    0.860665] Lo    : 006c0400
[    0.863546] epc   : 80588ef8 strlen+0x0/0x2c
[    0.867771] ra    : 801fe360 insert_header+0x140/0x4f8
[    0.872847] Status: 11007c03 KERNEL EXL IE
[    0.876997] Cause : 40800008 (ExcCode 02)
[    0.880969] BadVA : 05050404
[    0.883821] PrId  : 0001992f (MIPS 1004Kc)
[    0.887882] Modules linked in:
[    0.890910] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), ts=00000000)
[    0.898940] Stack : 08fca8f3 00000000 2ab4a599 00000dc0 00000000 8fc06e30 8f06e00 801fc880
[    0.907235]         80862190 00000000 00000000 8fe57007 00000000 8ff9cc00 8f06e00 80860000
[    0.915528]         8fc06e00 00000001 00000000 801feaec 806f0000 80830000 8024aec 00000000
[    0.923822]         8064d008 8fd64a00 806eb18c 8fc06e00 8063ab2c 8063ab50 0000001 8fc06e00
[    0.932118]         8fe57007 806ebc4c 8fe57000 806ebc04 00000001 806eb18c 8030000 80830000
[    0.940412]         ...
[    0.942832] Call Trace:
[    0.945261] [<80588ef8>] strlen+0x0/0x2c
[    0.949146] [<801fe360>] insert_header+0x140/0x4f8
[    0.953897] [<801feaec>] __register_sysctl_table+0x30c/0x630
[    0.959516] [<801ff154>] __register_sysctl_paths+0xf4/0x1e8
[    0.965067] [<8070de10>] ipc_sysctl_init+0x14/0x24
[    0.969793] [<800015c8>] do_one_initcall+0x50/0x1a8
[    0.974641] [<806fbeec>] kernel_init_freeable+0x1ec/0x2d0
[    0.979997] [<80594e78>] kernel_init+0x10/0xf8
[    0.984398] [<80006478>] ret_from_kernel_thread+0x14/0x1c
[    0.989755] Code: a066ffff  1000fff7  00000000 <80820000> 10400007  00000000 00801025  80430001  1460fffe
[    0.999424]
[    1.000995] ---[ end trace d1818afedd9795ac ]---

Full log:

Boot 1.1.3 (May 13 2020 - 19:39:06)

Board: Ralink APSoC DRAM:  128 MB
relocate_code Pointer at: 87f58000

Config XHCI 40M PLL
flash manufacture id: ef, device id 40 18
find flash: W25Q128BV
*** Warning - bad CRC, using default environment

============================================
Ralink UBoot Version: 5.0.0.0
--------------------------------------------
ASIC MT7621A DualCore (MAC to MT7530 Mode)
DRAM_CONF_FROM: Auto-Detection
DRAM_TYPE: DDR3
DRAM bus: 16 bit
Xtal Mode=3 OCP Ratio=1/3
Flash component: SPI Flash
Date:May 13 2020  Time:19:39:06
============================================
THIS IS uboot
icache: sets:256, ways:4, linesz:32 ,total:32768
dcache: sets:256, ways:4, linesz:32 ,total:32768

 ##### The CPU freq = 880 MHZ ####
 estimate memory size =128 Mbytes

Press '4' or 't' to break the booting process

Press 'x' to enter recovery web server
 0
nm_init:791
nm_initFwupPtnStruct:276
nm_lib_readPtnTable:738
[NM_Debug](nm_lib_readPtnTable) 00743: NM_PTN_TABLE_BASE = 0xfe0000
[NM_Debug](nm_lib_readPtnFromNvram) 00569: partition_used_len = 1054, requried len = 8192
[NM_Debug](nm_lib_readPtnTable) 00751: Reading Partition Table from NVRAM ... OK

[NM_Debug](nm_lib_readPtnTable) 00759: Parsing Partition Table ... OK

[NM_Debug](nm_lib_readPtnFromNvram) 00569: partition_used_len = 2, requried len= 2
factory boot check integer ok.


3: System Boot system code via Flash.
## Booting image at bc040000 ...
   Image Name:   MIPS OpenWrt Linux-5.4.159
   Image Type:   MIPS Linux Kernel Image (lzma compressed)
   Data Size:    2383645 Bytes =  2.3 MB
   Load Address: 80001000
   Entry Point:  80001000
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... OK
No initrd
## Transferring control to Linux (at address 80001000) ...
## Giving linux memsize in MB, 128

Starting kernel ...

[    0.000000] Linux version 5.4.159 (dsouza@dsouza00) (gcc version 11.2.0 (OpenWrt GCC 11.2.0 r18195-d1c7df9c4b)) #0 SMP Fri Nov 26 20:33:09 2021
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[    0.000000] MIPS: machine is TP-Link Archer C6 v3
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] VPE topology {2,2} total 4
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x0000000023ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000023ffffff]
[    0.000000] percpu: Embedded 14 pages/cpu s26448 r8192 d22704 u57344
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 130496
[    0.000000] Kernel command line: console=ttyS0,115200n8 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Writing ErrCtl register=0006a040
[    0.000000] Readback ErrCtl register=0006a040
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 510004K/524288K available (5739K kernel code, 200K rwdata, 1196K rodata, 1236K init, 226K bss, 14284K reserved, 0K cma-reserved, 262144K highmem)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] random: get_random_bytes called from start_kernel+0x368/0x580 with crng_init=0
[    0.000000] CPU Clock: 880MHz
[    0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcaf478ab4, max_idle_ns: 440795247997 ns
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 4343773742 ns
[    0.000009] sched_clock: 32 bits at 440MHz, resolution 2ns, wraps every 488045118ns
[    0.007805] Calibrating delay loop... 586.13 BogoMIPS (lpj=2930688)
[    0.073964] pid_max: default: 32768 minimum: 301
[    0.078751] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.085951] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.096261] rcu: Hierarchical SRCU implementation.
[    0.101567] smp: Bringing up secondary CPUs ...
[    0.107706] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.107715] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.107726] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.107821] CPU1 revision is: 0001992f (MIPS 1004Kc)
[    0.166329] Synchronize counters for CPU 1: done.
[    0.207168] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.207177] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.207184] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.207233] CPU2 revision is: 0001992f (MIPS 1004Kc)
[    0.257046] Synchronize counters for CPU 2: done.
[    0.288371] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.288379] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.288386] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.288440] CPU3 revision is: 0001992f (MIPS 1004Kc)
[    0.342231] Synchronize counters for CPU 3: done.
[    0.372092] smp: Brought up 1 node, 4 CPUs
[    0.380382] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.390168] futex hash table entries: 1024 (order: 3, 32768 bytes, linear)
[    0.397173] pinctrl core: initialized pinctrl subsystem
[    0.403734] NET: Registered protocol family 16
[    0.433411] workqueue: max_active 576 requested for napi_workq is out of range, clamping between 1 and 512
[    0.444529] clocksource: Switched to clocksource GIC
[    0.451073] NET: Registered protocol family 2
[    0.455573] IP idents hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.463300] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.471595] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.479184] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.486277] TCP: Hash tables configured (established 2048 bind 2048)
[    0.492672] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.499144] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.506290] NET: Registered protocol family 1
[    0.510577] PCI: CLS 0 bytes, default 32
[    0.754479] 4 CPUs re-calibrate udelay(lpj = 2924544)
[    0.761037] workingset: timestamp_bits=14 max_order=17 bucket_order=3
[    0.772581] random: fast init done
[    0.780855] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.786625] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.797651] CPU 1 Unable to handle kernel paging request at virtual address 5050404, epc == 80588ef8, ra == 801fe360
[    0.808162] Oops[#1]:
[    0.810387] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.4.159 #0
[    0.816345] $ 0   : 00000000 00000001 8fc304a4 00000108
[    0.821525] $ 4   : 05050404 80621000 8064d18f 00000061
[    0.826712] $ 8   : fffffffc 80594b3c 00000045 006d6873
[    0.831893] $12   : 015ede76 08fca8f3 9715a5ed 5c2e1039
[    0.837079] $16   : 8ff9cc00 8fc2523c 05050404 8064d184
[    0.842261] $20   : 0000000b 8fc06e00 8ff9cc8c 806ebd24
[    0.847448] $24   : 00000010 76ec2f43
[    0.852629] $28   : 8fc40000 8fc41d50 38e38e39 801fe360
[    0.857816] Hi    : 00000000
[    0.860665] Lo    : 006c0400
[    0.863546] epc   : 80588ef8 strlen+0x0/0x2c
[    0.867771] ra    : 801fe360 insert_header+0x140/0x4f8
[    0.872847] Status: 11007c03 KERNEL EXL IE
[    0.876997] Cause : 40800008 (ExcCode 02)
[    0.880969] BadVA : 05050404
[    0.883821] PrId  : 0001992f (MIPS 1004Kc)
[    0.887882] Modules linked in:
[    0.890910] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), ts=00000000)
[    0.898940] Stack : 08fca8f3 00000000 2ab4a599 00000dc0 00000000 8fc06e30 8f06e00 801fc880
[    0.907235]         80862190 00000000 00000000 8fe57007 00000000 8ff9cc00 8f06e00 80860000
[    0.915528]         8fc06e00 00000001 00000000 801feaec 806f0000 80830000 8024aec 00000000
[    0.923822]         8064d008 8fd64a00 806eb18c 8fc06e00 8063ab2c 8063ab50 0000001 8fc06e00
[    0.932118]         8fe57007 806ebc4c 8fe57000 806ebc04 00000001 806eb18c 8030000 80830000
[    0.940412]         ...
[    0.942832] Call Trace:
[    0.945261] [<80588ef8>] strlen+0x0/0x2c
[    0.949146] [<801fe360>] insert_header+0x140/0x4f8
[    0.953897] [<801feaec>] __register_sysctl_table+0x30c/0x630
[    0.959516] [<801ff154>] __register_sysctl_paths+0xf4/0x1e8
[    0.965067] [<8070de10>] ipc_sysctl_init+0x14/0x24
[    0.969793] [<800015c8>] do_one_initcall+0x50/0x1a8
[    0.974641] [<806fbeec>] kernel_init_freeable+0x1ec/0x2d0
[    0.979997] [<80594e78>] kernel_init+0x10/0xf8
[    0.984398] [<80006478>] ret_from_kernel_thread+0x14/0x1c
[    0.989755] Code: a066ffff  1000fff7  00000000 <80820000> 10400007  00000000 00801025  80430001  1460fffe
[    0.999424]
[    1.000995] ---[ end trace d1818afedd9795ac ]---
[    1.005529] Kernel panic - not syncing: Fatal exception
[    1.010723] Rebooting in 1 seconds..

BTW, sometimes it boots OK, but most of the time the boot process does not continue.

So it seems somewhat random. The issue appeared on three different C6 v3 units, so it is definitely a software (OpenWrt) issue, not a hardware one.

OK, looking at the logs, I believe I've found the root cause.

It seems that the amount of RAM is sometimes detected incorrectly. When the boot fails, the kernel reports a "HighMem" zone that does not exist on this device:

Wrong HighMem Memory Detected causes Kernel Panic during boot:

(...)
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x0000000023ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
(...)
[    0.000000] Memory: 510004K/524288K available (5739K kernel code, 200K rwdata, 1196K rodata, 1236K init, 226K bss, 14284K reserved, 0K cma-reserved, 262144K highmem)

According to the log above, 512MB of RAM was detected, when in fact this device has only 128MB. Whenever this happens, the boot fails with a kernel panic.
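
To double-check the 512MB figure, the two "node 0" ranges from the failing log can simply be added up. The snippet below is only an arithmetic sketch; the helper names `range_size_mib` and `misdetected_total_kib` are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Size in MiB of an inclusive physical range [start, end], as printed in
 * the "node 0: [mem start-end]" lines of the boot log. */
static uint64_t range_size_mib(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start + 1) / (1024 * 1024);
}

/* Total detected memory in KiB for the two ranges of the failing boot. */
static uint64_t misdetected_total_kib(void)
{
    uint64_t first  = range_size_mib(0x00000000, 0x1bffffff); /* 448 MiB */
    uint64_t second = range_size_mib(0x20000000, 0x23ffffff); /*  64 MiB */

    return (first + second) * 1024;
}
```

The total comes out as 524288 KiB, which matches the "510004K/524288K" line in the failing log. The same helper gives 128 MiB for a single range of 0x00000000-0x07ffffff.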

After a couple of power cycles, the memory is correctly detected as 128MB (no HighMem), as shown below, and the device boots OK:

Correct Memory Size Detected boots OK:

(...)
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000]   HighMem  empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000007ffffff]
(...)
[    0.000000] Memory: 120916K/131072K available (5739K kernel code, 200K rwdata, 1196K rodata, 1236K init, 226K bss, 10156K reserved, 0K cma-reserved, 0K highmem)
(...)

Any advice about how this can be fixed would be appreciated.

Thanks!

I just created a bug report for this issue per below:

https://bugs.openwrt.org/index.php?do=details&task_id=4158

Just in case anyone faces this problem, there is a workaround, but it requires doing your own build.

The workaround is to disable HighMem support, which for some reason started failing.

To do this, just edit the file "./target/linux/ramips/mt7621/config-5.4" and disable HighMem support by changing the line below to "n":

CONFIG_HIGHMEM=n

After this redo the build. This will solve the bootloop and will have no side effect on this device since it has only 128MiB of RAM, and HighMem support is required only for devices with 256MiB of RAM or more. I will update the bug report with this info.

Just in case some dev is looking at this topic, I've tracked this issue to the code below. I believe this code has not changed recently, and I don't understand its logic well enough to investigate why it started failing (though I suspect that the "detect_magic" check is flawed):

/arch/mips/ralink/mt7621.c

static void __init mt7621_memory_detect(void)
{
	void *dm = &detect_magic;
	phys_addr_t size;

	for (size = 32 * SZ_1M; size < 256 * SZ_1M; size <<= 1) {
		if (!__builtin_memcmp(dm, dm + size, sizeof(detect_magic)))
			break;
	}

	if ((size == 256 * SZ_1M) &&
	    (CPHYSADDR(dm + size) < MT7621_LOWMEM_MAX_SIZE) &&
	    __builtin_memcmp(dm, dm + size, sizeof(detect_magic))) {
		memblock_add(MT7621_LOWMEM_BASE, MT7621_LOWMEM_MAX_SIZE);
		memblock_add(MT7621_HIGHMEM_BASE, MT7621_HIGHMEM_SIZE);
	} else {
		memblock_add(MT7621_LOWMEM_BASE, size);
	}
}

mt7621_memory_detect() determines the amount of memory installed by testing whether the physical memory content is mirrored to the next-higher possible memory size. For example if you have 128MB of memory, the value (detect_magic) at (say) 1MB and 33MB would be different, the value at 1MB and 65MB would be different, but the value at 1MB and 129MB would be identical because the 0-128MB range is mirrored to 128MB-256MB.
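
To make the mirroring idea concrete, here is a small userspace simulation of the same probing logic. It is only a sketch under simplifying assumptions: the `sim_*` names, the marker value and the marker location are hypothetical, the DRAM is modelled as wrapping modulo its size, and the `CPHYSADDR()` bound check from the kernel function is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1M        (1024ULL * 1024ULL)
#define SIM_MAGIC    0xdeadbeefU   /* hypothetical marker value */
#define SIM_MAGIC_AT (1 * SZ_1M)   /* hypothetical marker location (1MB) */

/* Read one word from the simulated bus. DRAM of ram_size bytes repeats
 * across the address space, so addresses wrap modulo ram_size. Every cell
 * reads as 0 except the one holding the marker. */
static uint32_t sim_read(uint64_t addr, uint64_t ram_size)
{
    uint64_t cell = addr % ram_size;

    return cell == SIM_MAGIC_AT ? SIM_MAGIC : 0;
}

/* Nonzero if the marker is also visible 'offset' bytes above its location. */
static int sim_mirrored(uint64_t offset, uint64_t ram_size)
{
    return sim_read(SIM_MAGIC_AT + offset, ram_size) ==
           sim_read(SIM_MAGIC_AT, ram_size);
}

/* Same control flow as the quoted function: probe 32MB, 64MB and 128MB for
 * a mirror, then tell 256MB from 512MB with one more comparison. */
static uint64_t sim_memory_detect(uint64_t ram_size)
{
    uint64_t size;

    assert(ram_size > SIM_MAGIC_AT);

    for (size = 32 * SZ_1M; size < 256 * SZ_1M; size <<= 1) {
        if (sim_mirrored(size, ram_size))
            break;
    }

    if (size == 256 * SZ_1M && !sim_mirrored(size, ram_size))
        return 512 * SZ_1M;

    return size;
}
```

With 128MB of simulated RAM, the probes at +32MB and +64MB read 0, the probe at +128MB wraps back onto the marker, and the function returns 128MB. Note that in this model, if no comparison ever matches, the function falls through to the 512MB result, which is the shape seen in the failing log.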

The if-statement in the second half of the function is just due to the way MT7621 lays out its memory; if you have 512MB memory then the I/O region will end up in the middle of your physical memory. So that conditional sets up the first 448MB as lowmem and the remaining 64MB on the other side of the I/O region as highmem.

Unfortunately this means disabling highmem isn't an effective workaround; when the memory size is misdetected, you'll just end up with 448MB of lowmem, which is obviously still quite wrong. I suspect that your no-highmem build has merely perturbed the kernel layout in a way that avoids the problem occurring.

If you're able to reproduce the problem with different builds, it might be worth instrumenting it to see why mt7621_memory_detect() is failing, but if you just want a reliable workaround I suggest removing target/linux/ramips/patches-5.4/105-mt7621-memory-detect.patch which should return to the upstream memory detection algorithm.

Thank you! I've now updated all four of my Archer C6 v3 units with "CONFIG_HIGHMEM=n" and so far the issue has not reappeared, but as you said, that might just be a (positive) side effect.

This week I will try your suggestion of removing 105-mt7621-memory-detect.patch and check the results.

Good news! I followed your suggestion, re-enabled "CONFIG_HIGHMEM=y" and removed the patch **target/linux/ramips/patches-5.4/105-mt7621-memory-detect.patch**.

Issue solved! So it seems that in fact this patch is the culprit.

I will mark your post as the solution for future reference, and I will add this information to the bug report.

cc @981213

Well, I think I spoke too soon. In a rush to test it last night, I just removed the patch and rebooted the router, and it booted OK, without a boot loop.

But today, after a closer inspection of the log, I noticed that the RAM size is being detected incorrectly: 256MB instead of 128MB:

[    0.000000] Linux version 5.4.162 (dsouza@dsouza00) (gcc version 11.2.0 (OpenWrt GCC 11.2.0 r18233-0a4f5d06c2)) #0 SMP Sun Nov 28 20:15:10 2021
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[    0.000000] MIPS: machine is TP-Link Archer C6 v3
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] VPE topology {2,2} total 4
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000001bffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x0000000023ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000023ffffff]
[    0.000000] On node 0 totalpages: 65536
[    0.000000]   Normal zone: 576 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 65536 pages, LIFO batch:15
[    0.000000] percpu: Embedded 14 pages/cpu s26480 r8192 d22672 u57344
[    0.000000] pcpu-alloc: s26480 r8192 d22672 u57344 alloc=14*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64960
[    0.000000] Kernel command line: console=ttyS0,115200n8 rootfstype=squashfs,jffs2
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Writing ErrCtl register=0005a010
[    0.000000] Readback ErrCtl register=0005a010
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 250320K/262144K available (6107K kernel code, 201K rwdata, 1240K rodata, 1272K init, 206K bss, 11824K reserved, 0K cma-reserved)

I will do a new build disabling HIGHMEM to see if it has any effect. So while removing the patch prevented the bootloop, the memory is still being incorrectly identified.

This should skip the memory detection and force the memory to be 128M:

--- a/target/linux/ramips/dts/mt7621_tplink_archer-x6-v3.dtsi
+++ b/target/linux/ramips/dts/mt7621_tplink_archer-x6-v3.dtsi
@@ -18,6 +18,11 @@
 		bootargs = "console=ttyS0,115200n8";
 	};
 
+	memory@0 {
+		device_type = "memory";
+		reg = <0x0 0x8000000>;
+	};
+
 	keys {
 		compatible = "gpio-keys";
 
Thank you! This change solved the problem. I'm not sure why the memory size detection started failing on this device. Should this .dtsi change be a permanent solution for this issue?

Well, some good and unexpected news.

Today (Dec 5th, 2021) I did a new build (r18287-f9a28d216d) without the above patch to skip the memory detection in the .dtsi file, and now everything is working fine again (tested with 4 Archer C6 v3).

So I'm convinced this was in fact a regression that has now been fixed (by some unidentified commit), and the above patch is not required anymore.

The mystery now is to identify which change/commit fixed this issue... :upside_down_face:

Hi @981213!

I noticed that you landed a patch, "ramips: mt7621: do memory detection on KSEG1".

I contacted @dsouza about revisiting this unresolved bootloop issue.
We exchanged messages, and I tried backporting your fix to kernel 5.4, directly on top of the failing commit: r18195-d1c7df9c4b

@dsouza compiled it and reported back that it fixes the bootloop.

While he reported that a later build (r18287-f9a28d216d) doesn't reproduce the issue, it's clear to me that your patch is the real solution for this problem.

@981213, as kernel 5.4 is still used in release 21.02, may I backport (or ask for a backport of) your commit 2f024b793311 to 21.02?

Hi! As you've already done the backport, feel free to submit your work to the mailing list or as a GitHub pull request :smiley:

@dsouza just pushed the 21.02 backport:

Please test this too!
And if you don't mind, provide a Tested-by: too, thanks!

@xablocs, I'm willing to test, but how do you want me to test?

I mean, in the previous test, I applied your backported patch (to kernel 5.4) on the failing build (r18195-d1c7df9c4b) and confirmed it worked.

I understand that the patch is now committed to master, but the current master dropped support for kernel 5.4 and only builds with kernel 5.10.

So, how do you think I could test it? Perhaps by applying this patch to the 21.02 head (which is still on kernel 5.4) and testing a 21.02 head build?