Failed to start up an ipq806x router on kernel 5.4

Hi, I have an ipq8064 router with 4GB eMMC. I wanted to try the 5.4 kernel, but unfortunately it fails to boot.
Here is the failed startup log:

[    1.543797] libphy: ipq8064_mdio_bus: probed
[    1.595937] switch0: Atheros AR8337 rev. 2 switch registered on 37000000.mdio-mii
[    2.702404] ar8327: qca,phy-rgmii-en is not specified
[    2.702828] libphy: Fixed MDIO Bus: probed
[    2.707144] ipq806x-gmac-dwmac 37200000.ethernet: IRQ eth_wake_irq not found
[    2.710526] ipq806x-gmac-dwmac 37200000.ethernet: IRQ eth_lpi not found
[    2.717866] ipq806x-gmac-dwmac 37200000.ethernet: PTP uses main clock
[    2.724726] ipq806x-gmac-dwmac 37200000.ethernet: User ID: 0x10, Synopsys ID: 0x37
[    2.730689] ipq806x-gmac-dwmac 37200000.ethernet:    DWMAC1000
[    2.738039] ipq806x-gmac-dwmac 37200000.ethernet: DMA HW capability register supported
[    2.743967] ipq806x-gmac-dwmac 37200000.ethernet: RX Checksum Offload Engine supported
[    2.751662] ipq806x-gmac-dwmac 37200000.ethernet: COE Type 2
[    2.759476] ipq806x-gmac-dwmac 37200000.ethernet: TX Checksum insertion supported
[    2.765375] ipq806x-gmac-dwmac 37200000.ethernet: Wake-Up On Lan supported
[    2.772750] ipq806x-gmac-dwmac 37200000.ethernet: Enhanced/Alternate descriptors
[    2.779444] ipq806x-gmac-dwmac 37200000.ethernet: Enabled extended descriptors
[    2.787087] ipq806x-gmac-dwmac 37200000.ethernet: Ring mode enabled
[    2.794117] ipq806x-gmac-dwmac 37200000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    2.800204] ipq806x-gmac-dwmac 37200000.ethernet: device MAC address 96:b5:3a:b7:12:bc
[    2.809857] ipq806x-gmac-dwmac 37400000.ethernet: IRQ eth_wake_irq not found
[    2.816704] ipq806x-gmac-dwmac 37400000.ethernet: IRQ eth_lpi not found
[    2.824127] ipq806x-gmac-dwmac 37400000.ethernet: PTP uses main clock
[    2.830506] ipq806x-gmac-dwmac 37400000.ethernet: User ID: 0x10, Synopsys ID: 0x37
[    2.836740] ipq806x-gmac-dwmac 37400000.ethernet:    DWMAC1000
[    2.844294] ipq806x-gmac-dwmac 37400000.ethernet: DMA HW capability register supported
[    2.850017] ipq806x-gmac-dwmac 37400000.ethernet: RX Checksum Offload Engine supported
[    2.857814] ipq806x-gmac-dwmac 37400000.ethernet: COE Type 2
[    2.865717] ipq806x-gmac-dwmac 37400000.ethernet: TX Checksum insertion supported
[    2.871528] ipq806x-gmac-dwmac 37400000.ethernet: Wake-Up On Lan supported
[    2.878835] ipq806x-gmac-dwmac 37400000.ethernet: Enhanced/Alternate descriptors
[    2.885680] ipq806x-gmac-dwmac 37400000.ethernet: Enabled extended descriptors
[    2.893240] ipq806x-gmac-dwmac 37400000.ethernet: Ring mode enabled
[    2.900190] ipq806x-gmac-dwmac 37400000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[    2.906428] ipq806x-gmac-dwmac 37400000.ethernet: device MAC address 56:3d:9e:a5:5c:2e
[    2.916140] i2c /dev entries driver
[    2.923160] i2c_qup 124a0000.i2c: using default clock-frequency 100000
[    2.928857] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations
[    2.933035] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations
[    2.941541] mmci-pl18x 12400000.sdcc: mmc0: PL180 manf 51 rev0 at 0x12400000 irq 35,0 (pio)
[    2.947360] mmci-pl18x 12400000.sdcc: DMA channels RX dma0chan1, TX dma0chan2
[    2.983089] sdhci: Secure Digital Host Controller Interface driver
[    2.983118] sdhci: Copyright(c) Pierre Ossman
[    2.988158] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.997267] NET: Registered protocol family 10
[    3.000858] Segment Routing with IPv6
[    3.002661] NET: Registered protocol family 17
[    3.007766] 8021q: 802.1Q VLAN Support v1.8
[    3.010893] Registering SWP/SWPB emulation handler
[    3.051995] qcom_rpm 108000.rpm: RPM firmware 3.0.16777364
[    3.073080] s1a: supplied by regulator-dummy
[    3.073234] s1a: Bringing 0uV into 1050000-1050000uV
[    3.076866] s1b: supplied by regulator-dummy
[    3.081565] s1b: Bringing 0uV into 1050000-1050000uV
[    3.086077] s2a: supplied by regulator-dummy
[    3.087906] mmc0: new high speed MMC card at address 0001
[    3.090750] s2a: Bringing 0uV into 800000-800000uV
[    3.095552] mmcblk0: mmc0:0001 P1XXXX 3.60 GiB 
[    3.100694] s2b: supplied by regulator-dummy
[    3.105145] mmcblk0boot0: mmc0:0001 P1XXXX partition 1 2.00 MiB
[    3.109346] s2b: Bringing 0uV into 800000-800000uV
[    3.114070] mmcblk0boot1: mmc0:0001 P1XXXX partition 2 2.00 MiB
[    3.121981] Speed bin: 0
[    3.124446] mmcblk0rpmb: mmc0:0001 P1XXXX partition 3 128 KiB, chardev (248:0)
[    3.130062] PVS bin: 1
[    3.141576]  mmcblk0: p1 p2 p3 p4 <p5>
[    3.148880] VFS: Mounted root (squashfs filesystem) readonly on device 179:3.
[    3.149508] Freeing unused kernel memory: 1024K

It then gets stuck here and never finishes booting. I also noticed that the switch output looks a bit abnormal.
Here is the DTS for 4.14, which works fine: https://paste.ubuntu.com/p/cdCStSSCqt/
Here is the DTS I ported to 5.4: https://paste.ubuntu.com/p/dkGBYwb9nk/
So where did I go wrong, or is this maybe a bug?

What is this router you're talking about?

The only supported ipq806x device with eMMC I know of is the ZyXEL Armor Z2 / NBG6817, but that is an ipq8065 (and mine works fine with kernel 5.4.32).

That is a router from China, not officially supported.
Specification:

  • CPU: ipq8064
  • Flash size: 32MB SPI, 4GB eMMC
  • RAM size: 2GB
  • Wireless: 2x QCA9880
  • Switch: QCA8337

Are you sure it has an 8075 PHY? IPQ8064 does not have PSGMII.

I wrote it wrong, it should be QCA8337. Sorry for the mistake.

Here is the eMMC model: https://www.micron.com/products/managed-nand/emmc/part-catalog/mtfc4gacaaam-1m-wt . I don't know if it helps in analyzing this problem.

@Ansuel Have you experienced something like this?
I have now hit a similar issue while trying to bring up a new board.
For whatever reason, the init scripts are not running.
It even mounts the rootfs, but simply gets stuck after that.

Mh, no. On my R7800 I can boot an initramfs image correctly.


Hm, this is really weird behaviour then.
I can even flash it to NAND: it boots until it mounts the rootfs and then simply gets stuck there, with no errors or panics.
I disabled all peripherals except SPI, NAND and UART.
Can you share your bootargs/cmdline?

It does not run init at all.

@Ansuel Have you seen this warning before:
OF: fdt: Ignoring memory range 0x40000000 - 0x42000000

My VR2600v does not have that

Anyway, to make sure it's not related to cpufreq, just remove the nvmem qcom flag.

Could be that the range is already defined?

Tried disabling the nvmem driver.
No difference.

Well, I have no idea why the kernel would reject that range.
I have left the memory node as <0x0 0x0> so that the bootloader populates it.
The kernel sees 229376K of memory, which makes sense since it's ignoring a 32MB chunk.
The device has 256MB of RAM so that adds up, but why would it ignore a whole 32MB of RAM?

I have never seen behaviour like this, where the kernel simply stops booting after mounting the rootfs.

This all smells like a really bad bootloader: even when booting from flash, PCI errors about an incorrect mode are present.
I mean, even the bootloader only sees 235MiB of RAM.

U-Boot 2012.07 [Chaos Calmer V1.0.0,unknown] (Mar 10 2020 - 16:31:25)

smem ram ptable found: ver: 0 len: 5
DRAM:  235 MiB
NAND:  SF: Detected W25Q128FW with page size 64 KiB, total 16 MiB
ipq_spi: page_size: 0x100, sector_size: 0x10000, size: 0x1000000
144 MiB
MMC:   
PCI1 Link Intialized
PCI2 Link Intialized
In:    serial
Out:   serial
Err:   serial
MMC Device 0 not found
cdp: get part failed for 0:HLOS
MMC Device 0 not found
cdp: get part failed for rootfs
Net:   MAC2 addr:3c:2c:99:f4:50:62
Full duplex link
Link 2 up, Phy_status = b0a0a
Port:2 speed 1000Mbps
MAC3 addr:3c:2c:99:f4:50:63
Port:3 speed 10Mbps
eth0, eth1
Erasing Nand...
Erasing at 0x170000 -- 100% complete.
Writing to Nand... done
Hit any key to stop autoboot:  0

Ok, so it looks like the vendor used a really rare IPQ806x model, the IPQ8068.
I only figured it out now; even the DTS inside the device identifies itself as AP160-v2/IPQ8064.

With the IPQ8065 DTSI it boots, but then throws segmentation fault errors and panics.
So it was most likely missing the two regulator definitions that the IPQ8065 DTSI has and the IPQ8064 one does not.

This is gonna be painful, as there is little to no information on this SoC model.

How did you discover the new SoC?

Anyway, about the PCIe error, I was thinking... since U-Boot just resets it, can't we do the same in the PCIe driver (knowing the right interrupt or GPIO to reset PCI)?

I opened the device up and removed the heatsink and thermal paste.
The FCC images fooled me before I even had the device: in the SoC close-up photo, the last digit is covered by thermal paste.
So I thought it had to be an 8064/8065.

I don't know exactly what U-Boot does; it does something with PCIe in the bootqca command.

@Ansuel Have you ever seen OpenWrt crash on kmodloader?

[    3.224950] 8<--- cut here ---
[    3.227079] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    3.230282] pgd = (ptrval)
[    3.238449] [00000000] *pgd=4e3c1835, *pte=00000000, *ppte=00000000
[    3.241084] Internal error: Oops: 17 [#1] SMP ARM
[    3.247132] Modules linked in: phy_qcom_dwc3 ahci fsl_mph_dr_of ehci_platform ehci_fsl sd_mod ahci_platform libahci_platform libahci libata scsi_mod ehci_hcd gpio_button_hotplug
[    3.252012] CPU: 1 PID: 91 Comm: kmodloader Not tainted 5.4.36 #0
[    3.267793] Hardware name: Generic DT based system
[    3.273875] PC is at find_exported_symbol_in_section+0x64/0xd8
[    3.278557] LR is at find_exported_symbol_in_section+0x54/0xd8
[    3.284372] pc : [<c039e6b4>]    lr : [<c039e6a4>]    psr: a0000013
[    3.290190] sp : cc3a3dd0  ip : 00000000  fp : c160e2d8
[    3.296349] r10: 00000000  r9 : 00000000  r8 : cc3a3e30
[    3.301559] r7 : 00000000  r6 : bf0c1080  r5 : cc3a3e9c  r4 : 00000000
[    3.306770] r3 : cc3a3df4  r2 : 00000000  r1 : 00000000  r0 : 00000001
[    3.313369] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    3.319878] Control: 10c5787d  Table: 4e3bc06a  DAC: 00000051
[    3.327082] Process kmodloader (pid: 91, stack limit = 0x(ptrval))
[    3.332813] Stack: (0xcc3a3dd0 to 0xcc3a4000)
[    3.338892] 3dc0:                                     c039e53c c030cd7c c10a61c0 cc3a3e08
[    3.343328] 3de0: bf0c1080 c039e650 cc3a3e9c c039f3b4 c095d25c 00000000 00000000 00000000
[    3.351487] 3e00: 00000000 c10a6100 00000000 00000000 00000000 00000001 00000000 00000000
[    3.359648] 3e20: 00000000 00000000 00000002 c0403800 00000001 cc3a3eb8 bf0cd048 bf0cd054
[    3.367807] 3e40: c039e650 00000001 bf0ce04c bf0ce040 c1604e48 c03a2160 bf0ce040 c165a4f8
[    3.375968] 3e60: cc763ca0 c165a4f4 cc7631c8 00000001 cc3a3f34 bf0ce1bc c096661c c0966768
[    3.384126] 3e80: bf0ce04c c1604e48 00000000 c0966574 c09665cc c1604e48 c1099f80 bf0cdb96
[    3.392286] 3ea0: 00000001 c1095980 0000000a 0000000a 00000000 00000000 bf0cd018 00000005
[    3.400445] 3ec0: 00000000 00000000 6e72656b 00006c65 00000000 00000000 00000000 00000000
[    3.408605] 3ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    3.416767] 3f00: 00000000 e7b77a5a 00000000 0000918c cedc318c 00000000 01baf7fc ffffe000
[    3.424926] 3f20: 000129d5 00000051 00000000 c03a2d28 00000000 cedbf308 cedbfb40 cedba000
[    3.433084] 3f40: 0000918c cedc2b9c cedc2a2c cedc0e4c 00008000 00008220 00002950 000083b1
[    3.441245] 3f60: 00000000 00000000 00000000 00002940 00000023 00000024 0000001d 00000000
[    3.449404] 3f80: 00000017 00000000 00000000 00000000 00000005 00000080 c0301204 cc3a2000
[    3.457565] 3fa0: 00000080 c0301000 00000000 00000000 01ba6670 0000918c 000129d5 00000002
[    3.465724] 3fc0: 00000000 00000000 00000005 00000080 0000918c 00000000 00023650 00000000
[    3.473884] 3fe0: befbccf4 befbccd8 00011e14 b6ef1e00 60000010 01ba6670 00000000 00000000
[    3.482043] [<c039e6b4>] (find_exported_symbol_in_section) from [<c039f3b4>] (each_symbol_section+0x118/0x160)
[    3.490200] [<c039f3b4>] (each_symbol_section) from [<c03a2160>] (load_module+0x188c/0x22ec)
[    3.500093] [<c03a2160>] (load_module) from [<c03a2d28>] (sys_init_module+0x168/0x19c)
[    3.508689] [<c03a2d28>] (sys_init_module) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[    3.516407] Exception stack(0xcc3a3fa8 to 0xcc3a3ff0)
[    3.524657] 3fa0:                   00000000 00000000 01ba6670 0000918c 000129d5 00000002
[    3.529613] 3fc0: 00000000 00000000 00000005 00000080 0000918c 00000000 00023650 00000000
[    3.537767] 3fe0: befbccf4 befbccd8 00011e14 b6ef1e00
[    3.545926] Code: e2503000 01a00003 0a000010 e5d50004 (e5947000) 
[    3.551030] ---[ end trace 600d35f1330fe722 ]---
[    3.557030] Kernel panic - not syncing: Fatal exception
[    3.561728] CPU0: stopping
[    3.566672] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D           5.4.36 #0
[    3.569448] Hardware name: Generic DT based system
[    3.576671] [<c030f954>] (unwind_backtrace) from [<c030b96c>] (show_stack+0x14/0x20)
[    3.581534] [<c030b96c>] (show_stack) from [<c08a3380>] (dump_stack+0x94/0xa8)
[    3.589429] [<c08a3380>] (dump_stack) from [<c030eb80>] (handle_IPI+0x184/0x1b8)
[    3.596463] [<c030eb80>] (handle_IPI) from [<c05b547c>] (gic_handle_irq+0xb4/0xb8)
[    3.604012] [<c05b547c>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[    3.611378] Exception stack(0xc1601ee0 to 0xc1601f28)
[    3.618950] 1ee0: 00000000 00000000 0c852000 cddd2a00 cc19a000 00000000 cddd1df0 00000000
[    3.623989] 1f00: 00000000 00000000 d44b81e0 d42010a0 00000015 c1601f30 c0705458 c070545c
[    3.632130] 1f20: 80000013 ffffffff
[    3.640290] [<c0301a8c>] (__irq_svc) from [<c070545c>] (cpuidle_enter_state+0x94/0x498)
[    3.643593] [<c070545c>] (cpuidle_enter_state) from [<c07058a4>] (cpuidle_enter+0x30/0x4c)
[    3.651579] [<c07058a4>] (cpuidle_enter) from [<c034a5dc>] (do_idle+0x1d8/0x240)
[    3.659910] [<c034a5dc>] (do_idle) from [<c034a8ec>] (cpu_startup_entry+0x1c/0x20)
[    3.667470] [<c034a8ec>] (cpu_startup_entry) from [<c0b00e5c>] (start_kernel+0x4dc/0x4e8)
[    3.674838] Rebooting in 1 seconds..
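For what it's worth, the `Code:` line of an ARM oops puts the instruction at the faulting PC in brackets, here `(e5947000)`. A cross objdump on vmlinux is the usual way to read it, but the word can also be decoded by hand. Below is a small Python sketch (the function name is made up for this example) that handles only the A32 LDR/STR immediate-offset form, which is what both `e5d50004` and `e5947000` are; anything else is rejected.

```python
def decode_ldr_str_imm(word: int) -> str:
    """Decode an unconditional A32 LDR/STR/LDRB/STRB with immediate offset.

    Sketch only: other instruction classes, other condition codes and the
    post-indexed / write-back forms raise ValueError.
    """
    if (word >> 28) & 0xF != 0xE:          # condition field must be AL
        raise ValueError("only AL-condition instructions are handled")
    if (word >> 25) & 0x7 != 0b010:        # bits 27-25 = 010: immediate offset
        raise ValueError("not an LDR/STR immediate-offset instruction")
    p = (word >> 24) & 1   # 1 = pre-indexed
    u = (word >> 23) & 1   # 1 = add offset, 0 = subtract
    b = (word >> 22) & 1   # 1 = byte access
    w = (word >> 21) & 1   # 1 = write base back
    l = (word >> 20) & 1   # 1 = load, 0 = store
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    imm = word & 0xFFF
    if not p or w:
        raise ValueError("post-indexed / write-back forms are not handled")
    mnemonic = ("ldr" if l else "str") + ("b" if b else "")
    if imm == 0:
        return f"{mnemonic} r{rd}, [r{rn}]"
    sign = "" if u else "-"
    return f"{mnemonic} r{rd}, [r{rn}, #{sign}{imm}]"


print(decode_ldr_str_imm(0xE5D50004))  # ldrb r0, [r5, #4]
print(decode_ldr_str_imm(0xE5947000))  # ldr r7, [r4]
```

The faulting instruction is a plain load through r4, and the register dump shows `r4 : 00000000`, which matches the NULL address in the first line of the oops. It only tells you what was dereferenced, not why r4 was NULL.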

The strange part is that it doesn't crash in a specific function of a module.
Are you sure it doesn't crash randomly?
It could be that the modem writes to some reserved space and causes the crash.

Yeah, it actually crashes in different functions.

And a couple of times in a row now, it didn't crash but got stuck after loading some USB drivers.
It's gotta be some reserved memory area.

But which one? The stock DTS does not define any.
OpenWrt defines some in the DTSI; I tried adding the R7800 ones, but no change.

It looks like the bootloader patches some in, but running cat on the reg property shows nothing.
Update: you can pipe it to hexdump -C to see the bytes (the property is binary, so cat prints nothing readable).

root@OpenWrt:/proc/device-tree/reserved-memory# ls
#address-cells       name                 ranges               smem@41000000        wigig_dump@44400000
#size-cells          nss@40000000         rsvd@41200000        wifi_dump@44000000
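Since `reg` is a list of 32-bit big-endian cells, grouped according to the parent node's `#address-cells` and `#size-cells`, the hexdump bytes can be turned into base/size pairs. Below is a rough Python sketch that does that for the nodes under /proc/device-tree/reserved-memory; the function names are made up for this example, and if the router has no Python you can copy that directory to a PC and pass the copy to `dump_reserved_memory()`.

```python
import struct
from pathlib import Path


def decode_reg(raw: bytes, address_cells: int = 1, size_cells: int = 1) -> list:
    """Split a device-tree 'reg' property into (base, size) tuples.

    Each cell is a 32-bit big-endian word; values spanning several cells
    are stored most-significant cell first.
    """
    if len(raw) % 4:
        raise ValueError("reg length is not a multiple of 4 bytes")
    cells = struct.unpack(f">{len(raw) // 4}I", raw)
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("reg length does not match #address-cells/#size-cells")

    def join(chunk) -> int:
        value = 0
        for cell in chunk:
            value = (value << 32) | cell
        return value

    return [
        (join(cells[i:i + address_cells]), join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


def read_u32(path: Path) -> int:
    # #address-cells and #size-cells each hold a single big-endian u32
    return struct.unpack(">I", path.read_bytes())[0]


def dump_reserved_memory(root: Path) -> None:
    # The children's reg layout is defined by the parent's cell counts
    address_cells = read_u32(root / "#address-cells")
    size_cells = read_u32(root / "#size-cells")
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        reg = child / "reg"
        if not reg.exists():
            continue  # e.g. dynamically placed regions that only have 'size'
        for base, size in decode_reg(reg.read_bytes(), address_cells, size_cells):
            print(f"{child.name}: 0x{base:08x} - 0x{base + size:08x} ({size // 1024} KiB)")


if __name__ == "__main__":
    dt_root = Path("/proc/device-tree/reserved-memory")
    if dt_root.is_dir():
        dump_reserved_memory(dt_root)
```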

One way to test this would be to limit the memory to something small (32-64 MB?) and position it in the middle of the RAM space.
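Working out where such a window goes is simple arithmetic. The sketch below assumes a single RAM bank starting at 0x40000000 with 256MB (the numbers discussed in this thread) and a memory node using one cell each for address and size; the helper name is hypothetical.

```python
MiB = 1024 * 1024


def centered_memory_reg(ram_base: int, ram_size: int, window: int) -> str:
    """Build a single-cell DTS 'reg' value for `window` bytes centred in RAM."""
    if window > ram_size:
        raise ValueError("window is larger than the RAM bank")
    # Leave equal amounts of RAM unused below and above the window
    base = ram_base + (ram_size - window) // 2
    return f"reg = <0x{base:08x} 0x{window:08x}>;"


print(centered_memory_reg(0x40000000, 256 * MiB, 64 * MiB))  # reg = <0x46000000 0x04000000>;
print(centered_memory_reg(0x40000000, 256 * MiB, 32 * MiB))  # reg = <0x47000000 0x02000000>;
```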

I tried placing 64MB in the middle, but the kernel still detected the whole RAM.
So it looks like the bootloader overrides the nodes with whatever it wants.

I added all of the reserved memory nodes, but no luck with that either.

The bootloader actually adds a memory node that starts at 0x40000000, while all other devices start at 0x42000000.

That is where the "Ignoring memory range" message comes from.
So the bootloader will always append its nodes to the DTS, which is really bad.
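As far as I can tell from the 5.4 source, the warning is printed by early_init_dt_add_memory_arch() in drivers/of/fdt.c: any part of a memory bank below the lowest physical address the kernel can use (PHYS_OFFSET on 32-bit ARM) is dropped. Below is a simplified Python model of that low-end check, not the kernel code itself: page alignment and the upper-limit checks are left out, the function name is hypothetical, and the 0x42000000 offset is inferred from the warning text rather than read from the kernel config. Fed a 256MB bank at 0x40000000, it reproduces both the warning and the 229376K figure.

```python
from typing import Optional, Tuple

MiB = 1024 * 1024


def clamp_memory(base: int, size: int, phys_offset: int) -> Tuple[Optional[Tuple[int, int]], Optional[str]]:
    """Return ((usable_base, usable_size) or None, warning text or None).

    Simplified model of the low-end check only; the real code also
    page-aligns the bank and trims it against an upper limit.
    """
    end = base + size
    if end < phys_offset:
        # The whole bank lies below what the kernel can use: dropped entirely
        return None, f"Ignoring memory block 0x{base:x} - 0x{end:x}"
    if base < phys_offset:
        # Only the part below phys_offset is dropped
        warning = f"Ignoring memory range 0x{base:x} - 0x{phys_offset:x}"
        return (phys_offset, end - phys_offset), warning
    return (base, size), None


# Numbers from this thread: 256MB reported at 0x40000000, kernel starting at 0x42000000
usable, warning = clamp_memory(0x40000000, 256 * MiB, 0x42000000)
print(warning)                                   # Ignoring memory range 0x40000000 - 0x42000000
print(hex(usable[0]), usable[1] // 1024, "KiB")  # 0x42000000 229376 KiB
```

In other words, 256MB minus the 32MB below 0x42000000 leaves 224MB, i.e. 229376K.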