SolidRun ClearFog CN9130 Pro

Hello,
I recently received a ClearFog CN9130 Pro and added the board to the cortexa72.mk file:

+++ b/target/linux/mvebu/image/cortexa72.mk
@@ -63,6 +63,17 @@ define Device/marvell_clearfog-gt-8k
 endef
 TARGET_DEVICES += marvell_clearfog-gt-8k
 
+define Device/marvell_clearfog-cn9130-pro
+  $(call Device/Default-arm64)
+  DEVICE_VENDOR := SolidRun
+  DEVICE_MODEL := ClearFog
+  DEVICE_VARIANT := CN9130-Pro
+  DEVICE_DTS := cn9130-db
+  SUPPORTED_DEVICES := marvell,clearfog-cn9130-pro
+endef
+TARGET_DEVICES += marvell_clearfog-cn9130-pro

I am seeing RCU stalls during boot (log below). The last time I saw this, it was because I didn't have the right device tree file.
The right file looks to be upstream on kernel.org, but I don't know how to tell the OpenWrt build system to use the dtsi file, if that is even the right thing to do.
There was mention of the device tree files being broken, but I assume there are a lot of configurations for this SoC?
Is there a device tree guru reading?

Marvell>> booti $kernel_addr_r - $fdt_addr_r;
## Flattened Device Tree blob at 06f00000
   Booting using the fdt blob at 0x6f00000
   Loading Device Tree to 000000007f5d1000, end 000000007f5d9c50 ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.10.131 (mrbojangles@builder) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r20153-22ffbbf04a) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Thu Jul 28 23:40:24 2022
[    0.000000] Machine model: Marvell Armada CN9130-DB
[    0.000000] earlycon: uart8250 at MMIO32 0x00000000f0512000 (options '')
[    0.000000] printk: bootconsole [uart8250] enabled
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000]   node   0: [mem 0x0000000004000000-0x00000000041fffff]
[    0.000000]   node   0: [mem 0x0000000004200000-0x00000000bfffffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000013fffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] percpu: Embedded 16 pages/cpu s27736 r8192 d29608 u65536
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: Spectre-v2
[    0.000000] CPU features: detected: Spectre-BHB
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
[    0.000000] Kernel command line: root=PARTUUID=8161ad66-02 rw rootwait console=ttyS0,115200 earlycon=uart8250,mmio32,0xf0512000
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000bc000000-0x00000000c0000000] (64MB)
[    0.000000] Memory: 4042508K/4194304K available (7934K kernel code, 894K rwdata, 2044K rodata, 448K init, 270K bss, 151796K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32)
[    0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32)
[    0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32)
[    0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32)
[    0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287]
[    0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[    0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns
[    0.008213] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.00 BogoMIPS (lpj=250000)
[    0.018711] pid_max: default: 32768 minimum: 301
[    0.023451] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.030995] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.039461] rcu: Hierarchical SRCU implementation.
[    0.044339] dyndbg: Ignore empty _ddebug table in a CONFIG_DYNAMIC_DEBUG_CORE build
[    0.052272] smp: Bringing up secondary CPUs ...
[    0.057263] Detected PIPT I-cache on CPU1
[    0.057291] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[    0.057686] Detected PIPT I-cache on CPU2
[    0.057707] CPU2: Booted secondary processor 0x0000000100 [0x410fd083]
[    0.058107] Detected PIPT I-cache on CPU3
[    0.058122] CPU3: Booted secondary processor 0x0000000101 [0x410fd083]
[    0.058153] smp: Brought up 1 node, 4 CPUs
[    0.094386] SMP: Total of 4 processors activated.
[    0.099177] CPU features: detected: 32-bit EL0 Support
[    0.104379] CPU features: detected: CRC32 instructions
[    0.109637] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
[    0.118120] CPU: All CPU(s) started at EL2
[    0.122325] alternatives: patching kernel code
[    0.128399] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.138407] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.145444] pinctrl core: initialized pinctrl subsystem
[    0.151067] NET: Registered protocol family 16
[    0.155809] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.163017] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.170955] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.179094] thermal_sys: Registered thermal governor 'step_wise'
[    0.179429] cpuidle: using governor ladder
[    0.189745] ASID allocator initialised with 65536 entries
[    0.202391] cryptd: max_cpu_qlen set to 1000
[    0.207629] SCSI subsystem initialized
[    0.211581] usbcore: registered new interface driver usbfs
[    0.217143] usbcore: registered new interface driver hub
[    0.222554] usbcore: registered new device driver usb
[    0.228133] clocksource: Switched to clocksource arch_sys_counter
[    0.234554] NET: Registered protocol family 2
[    0.239186] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.247522] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.256233] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.264383] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.272012] TCP: Hash tables configured (established 32768 bind 32768)
[    0.278724] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.285542] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.292910] NET: Registered protocol family 1
[    0.297352] PCI: CLS 0 bytes, default 64
[    0.302696] workingset: timestamp_bits=46 max_order=20 bucket_order=0
[    0.310455] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.316388] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.327622] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
[    0.336670] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
[    0.347082] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
[    0.353409] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver
[    0.359781] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver
[    0.366134] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver
[    0.372530] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver
[    0.378901] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver
[    0.385081] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
[    0.391793] printk: console [ttyS0] disabled
[    0.416254] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 16, base_baud = 12500000) is a 16550A
[    0.425369] printk: console [ttyS0] enabled
[    0.425369] printk: console [ttyS0] enabled
[    0.433799] printk: bootconsole [uart8250] disabled
[    0.433799] printk: bootconsole [uart8250] disabled
[    0.444007] omap_rng f2760000.trng: Random Number Generator ver. 203b34c
[    0.444256] random: crng init done
[    0.455288] loop: module loaded
[    0.458864] ahci f2540000.sata: supply ahci not found, using dummy regulator
[    0.465984] ahci f2540000.sata: supply phy not found, using dummy regulator
[    0.473086] platform f2540000.sata:sata-port@0: supply target not found, using dummy regulator
[    0.481858] platform f2540000.sata:sata-port@1: supply target not found, using dummy regulator
[    0.491397] spi-nor spi2.0: s25fl064k (8192 Kbytes)
[    0.496366] 2 fixed-partitions partitions found on MTD device spi2.0
[    0.502755] Creating 2 MTD partitions on "spi2.0":
[    0.507566] 0x000000000000-0x000000200000 : "U-Boot-0"
[    0.512858] 0x000000200000-0x000001000000 : "Filesystem-0"
[    0.518370] mtd: partition "Filesystem-0" extends beyond the end of device "spi2.0" -- size truncated to 0x600000
[    0.529973] hwmon hwmon0: temp1_input not attached to any thermal zone
[    0.536727] mdio_bus f212a200.mdio-mii: MDIO device at address 1 is missing.
[    0.544077] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    0.559432] mvpp2 f2000000.ethernet eth0: Using random mac address 2e:cd:f0:43:d2:2f
[    0.568726] mvpp2 f2000000.ethernet eth1: Using random mac address c2:9d:86:14:0b:8f
[    0.583923] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.590490] ehci-platform: EHCI generic platform driver
[    0.595801] ehci-orion: EHCI orion driver
[    0.600148] usbcore: registered new interface driver usb-storage
[    0.606331] armada38x-rtc f2284000.rtc: registered as rtc0
[    0.611857] armada38x-rtc f2284000.rtc: setting system clock to 2030-01-22T20:21:24 UTC (1895343684)
[    0.621074] i2c /dev entries driver
[    0.624804] pca953x 0-0021: supply vcc not found, using dummy regulator
[    0.631493] pca953x 0-0021: using no AI
[    0.635468] pca953x 0-0021: failed writing register
[    0.640410] pca953x: probe of 0-0021 failed with error -5
[    0.659465] sdhci: Secure Digital Host Controller Interface driver
[    0.665673] sdhci: Copyright(c) Pierre Ossman
[    0.670091] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.676055] xenon-sdhci f2780000.sdhci: Got CD GPIO
[    0.676217] NET: Registered protocol family 10
[    0.685766] Segment Routing with IPv6
[    0.689474] NET: Registered protocol family 17
[    0.694002] 8021q: 802.1Q VLAN Support v1.8
[    0.700397] armada8k-pcie f2600000.pcie: host bridge /cp0/pcie@f2600000 ranges:
[    0.707748] armada8k-pcie f2600000.pcie:      MEM 0x00c0000000..0x00dfefffff -> 0x00c0000000
[   21.738125] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   21.744072] rcu: 	2-...0: (1 GPs behind) idle=aa2/1/0x4000000000000000 softirq=63/65 fqs=844 
[   21.752632] 	(detected by 3, t=2103 jiffies, g=-1127, q=147)
[   21.758313] Task dump for CPU 2:
[   21.761552] task:kworker/2:8     state:R  running task     stack:    0 pid:  548 ppid:     2 flags:0x0000000a
[   21.771521] Workqueue: events deferred_probe_work_func
[   21.776679] Call trace:
[   21.779136]  __switch_to+0x9c/0xfc
[   21.782551]  deferred_probe_work_func+0x54/0xb4
[   84.788124] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   84.794069] rcu: 	2-...0: (1 GPs behind) idle=aa2/1/0x4000000000000000 softirq=63/65 fqs=3378 
[   84.802715] 	(detected by 3, t=8408 jiffies, g=-1127, q=147)
[   84.808395] Task dump for CPU 2:
[   84.811635] task:kworker/2:8     state:R  running task     stack:    0 pid:  548 ppid:     2 flags:0x0000000a
[   84.821595] Workqueue: events deferred_probe_work_func
[   84.826753] Call trace:
[   84.829209]  __switch_to+0x9c/0xfc
[   84.832622]  deferred_probe_work_func+0x54/0xb4

It's simple: copy the correct DTS into target/linux/mvebu/files/arch/arm64/boot/dts/marvell and update DEVICE_DTS.
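
As a rough sketch of the copy step, assuming an upstream kernel checkout is available locally (both tree locations below are placeholders), something like this would work. `copy_dts` is a made-up helper name, and the list of `.dtsi` includes in the example call is an assumption: check the `#include` lines of whichever DTS you pick.

```sh
#!/bin/sh
# Sketch: copy upstream DTS sources into OpenWrt's mvebu kernel file overlay.
# usage: copy_dts KERNEL_TREE OPENWRT_TREE FILE...
copy_dts() {
    src="$1/arch/arm64/boot/dts/marvell"
    dst="$2/target/linux/mvebu/files/arch/arm64/boot/dts/marvell"
    shift 2
    mkdir -p "$dst" || return 1
    for f in "$@"; do
        # Stop at the first missing file so a forgotten include is noticed early.
        cp "$src/$f" "$dst/" || return 1
    done
}

# Example call; both tree locations are placeholders (assumptions):
# copy_dts "$HOME/linux" "$HOME/openwrt" cn9130-crb-A.dts cn9130-crb.dtsi cn9130.dtsi
```

After that, DEVICE_DTS in cortexa72.mk names the file without its extension (for example `DEVICE_DTS := cn9130-crb-A`), the same way the original definition used `cn9130-db`.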

@robimarko, I took your advice and ended up having to copy a few device tree files from upstream. I am still getting the RCU hang (below), so I think either I have the "B" version of the board / SoC and am trying the "A" version device tree files, or more work is needed. I have only a VERY rough idea of how device trees work, and I have a post on the manufacturer's forum asking whether the upstream device tree is all good, so hopefully they can shed some light.

Marvell>> ext4ls mmc 1;ext4load mmc 1 $kernel_addr_r Image;
<DIR>       4096 .
<DIR>       4096 ..
<DIR>       4096 lost+found
        11663368 Image
             568 boot.scr
           23686 cn9130-crb-A.dtb
11663368 bytes read in 1031 ms (10.8 MiB/s)
Marvell>> ext4load mmc 1 $fdt_addr_r cn9130-crb-A.dtb;
23686 bytes read in 15 ms (1.5 MiB/s)
Marvell>> booti $kernel_addr_r - $fdt_addr_r;
## Flattened Device Tree blob at 06f00000
   Booting using the fdt blob at 0x6f00000
   Loading Device Tree to 000000007f5d1000, end 000000007f5d9c85 ... OK

Starting kernel ...

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] Linux version 5.10.131 (mrbojangles@build) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r20153-22ffbbf04a) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Jul 29 01:35:48 2022
[    0.000000] Machine model: Marvell Armada CN9130-CRB-A
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]
[    0.000000]   node   0: [mem 0x0000000004000000-0x00000000041fffff]
[    0.000000]   node   0: [mem 0x0000000004200000-0x00000000bfffffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000013fffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] percpu: Embedded 16 pages/cpu s27736 r8192 d29608 u65536
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: Spectre-v2
[    0.000000] CPU features: detected: Spectre-BHB
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
[    0.000000] Kernel command line: 
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0x00000000bc000000-0x00000000c0000000] (64MB)
[    0.000000] Memory: 4042508K/4194304K available (7934K kernel code, 894K rwdata, 2044K rodata, 448K init, 270K bss, 151796K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000
[    0.000000] GIC: Using split EOI/Deactivate mode
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32)
[    0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32)
[    0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32)
[    0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255]
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32)
[    0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287]
[    0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns
[    0.000001] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns
[    0.000066] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.00 BogoMIPS (lpj=250000)
[    0.000071] pid_max: default: 32768 minimum: 301
[    0.000127] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000144] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.000660] rcu: Hierarchical SRCU implementation.
[    0.000712] dyndbg: Ignore empty _ddebug table in a CONFIG_DYNAMIC_DEBUG_CORE build
[    0.000847] smp: Bringing up secondary CPUs ...
[    0.001224] Detected PIPT I-cache on CPU1
[    0.001253] CPU1: Booted secondary processor 0x0000000001 [0x410fd083]
[    0.001650] Detected PIPT I-cache on CPU2
[    0.001672] CPU2: Booted secondary processor 0x0000000100 [0x410fd083]
[    0.002071] Detected PIPT I-cache on CPU3
[    0.002086] CPU3: Booted secondary processor 0x0000000101 [0x410fd083]
[    0.002114] smp: Brought up 1 node, 4 CPUs
[    0.002124] SMP: Total of 4 processors activated.
[    0.002127] CPU features: detected: 32-bit EL0 Support
[    0.002130] CPU features: detected: CRC32 instructions
[    0.002155] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching
[    0.002159] CPU: All CPU(s) started at EL2
[    0.002169] alternatives: patching kernel code
[    0.003694] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.003702] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.003746] pinctrl core: initialized pinctrl subsystem
[    0.004035] NET: Registered protocol family 16
[    0.004397] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.004476] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.004547] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.004681] thermal_sys: Registered thermal governor 'step_wise'
[    0.004978] cpuidle: using governor ladder
[    0.004997] ASID allocator initialised with 65536 entries
[    0.012121] cryptd: max_cpu_qlen set to 1000
[    0.013164] SCSI subsystem initialized
[    0.013297] usbcore: registered new interface driver usbfs
[    0.013309] usbcore: registered new interface driver hub
[    0.013323] usbcore: registered new device driver usb
[    0.013378] usb_phy_generic cp0_usb3_phy0: supply vcc not found, using dummy regulator
[    0.013426] usb_phy_generic cp0_usb3_phy0: dummy supplies not allowed for exclusive requests
[    0.014055] clocksource: Switched to clocksource arch_sys_counter
[    0.014291] NET: Registered protocol family 2
[    0.014458] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.015304] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.015331] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.015435] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.015692] TCP: Hash tables configured (established 32768 bind 32768)
[    0.015752] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.015794] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.015888] NET: Registered protocol family 1
[    0.015902] PCI: CLS 0 bytes, default 64
[    0.017363] workingset: timestamp_bits=46 max_order=20 bucket_order=0
[    0.018608] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.018612] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME) (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
[    0.019946] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver
[    0.020268] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver
[    0.021965] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver
[    0.022235] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver
[    0.022503] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver
[    0.022768] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver
[    0.023068] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver
[    0.023336] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver
[    0.023436] Serial: 8250/16550 driver, 2 ports, IRQ sharing disabled
[    0.023680] printk: console [ttyS0] disabled
[    0.043815] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 16, base_baud = 12500000) is a 16550A
[    0.729345] printk: console [ttyS0] enabled
[    0.733948] omap_rng f2760000.trng: Random Number Generator ver. 203b34c
[    0.734271] random: crng init done
[    0.745241] loop: module loaded
[    0.749469] spi-nor spi2.0: s25fl064k (8192 Kbytes)
[    0.754444] 2 fixed-partitions partitions found on MTD device spi2.0
[    0.760824] Creating 2 MTD partitions on "spi2.0":
[    0.765638] 0x000000000000-0x000000200000 : "U-Boot"
[    0.770757] 0x000000200000-0x000001000000 : "Filesystem"
[    0.776095] mtd: partition "Filesystem" extends beyond the end of device "spi2.0" -- size truncated to 0x600000
[    0.787574] hwmon hwmon0: temp1_input not attached to any thermal zone
[    0.826113] mv88e6085: probe of f212a200.mdio-mii:06 failed with error -110
[    0.843505] mvpp2 f2000000.ethernet: using 8 per-cpu buffers
[    0.854595] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.861148] ehci-platform: EHCI generic platform driver
[    0.866452] ehci-orion: EHCI orion driver
[    0.870647] xhci-hcd f2500000.usb3: xHCI Host Controller
[    0.875991] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 1
[    0.883548] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x0000000000010010
[    0.892831] xhci-hcd f2500000.usb3: irq 34, io mem 0xf2500000
[    0.898811] hub 1-0:1.0: USB hub found
[    0.902582] hub 1-0:1.0: 1 port detected
[    0.906598] xhci-hcd f2500000.usb3: xHCI Host Controller
[    0.911933] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2
[    0.919451] xhci-hcd f2500000.usb3: Host supports USB 3.0 SuperSpeed
[    0.925850] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    0.934094] hub 2-0:1.0: USB hub found
[    0.937866] hub 2-0:1.0: 1 port detected
[    0.942052] usbcore: registered new interface driver usb-storage
[    0.948236] armada38x-rtc f2284000.rtc: registered as rtc0
[    0.953753] armada38x-rtc f2284000.rtc: setting system clock to 2030-01-23T20:22:06 UTC (1895430126)
[    0.962968] i2c /dev entries driver
[    0.980417] sdhci: Secure Digital Host Controller Interface driver
[    0.986630] sdhci: Copyright(c) Pierre Ossman
[    0.991042] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.996927] xenon-sdhci f2780000.sdhci: Got CD GPIO
[    0.997194] NET: Registered protocol family 10
[    1.006560] Segment Routing with IPv6
[    1.010257] NET: Registered protocol family 17
[    1.014779] 8021q: 802.1Q VLAN Support v1.8
[    1.021114] armada8k-pcie f2600000.pcie: host bridge /cp0/pcie@f2600000 ranges:
[    1.028477] armada8k-pcie f2600000.pcie:      MEM 0x00c0000000..0x00dfefffff -> 0x00c0000000
[   22.074044] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   22.079990] rcu: 	2-...0: (1 GPs behind) idle=2be/1/0x4000000000000000 softirq=39/40 fqs=980 
[   22.088550] 	(detected by 3, t=2103 jiffies, g=-1135, q=223)
[   22.094231] Task dump for CPU 2:
[   22.097471] task:kworker/2:4     state:R  running task     stack:    0 pid:  710 ppid:     2 flags:0x0000000a
[   22.107439] Workqueue: events deferred_probe_work_func
[   22.112598] Call trace:
[   22.115055]  __switch_to+0x9c/0xfc
[   22.118470]  deferred_probe_work_func+0x54/0xb4
[   85.124043] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[   85.129988] rcu: 	2-...0: (1 GPs behind) idle=2be/1/0x4000000000000000 softirq=39/40 fqs=3916 
[   85.138634] 	(detected by 3, t=8408 jiffies, g=-1135, q=223)
[   85.144315] Task dump for CPU 2:
[   85.147554] task:kworker/2:4     state:R  running task     stack:    0 pid:  710 ppid:     2 flags:0x0000000a
[   85.157514] Workqueue: events deferred_probe_work_func
[   85.162672] Call trace:
[   85.165127]  __switch_to+0x9c/0xfc
[   85.168541]  deferred_probe_work_func+0x54/0xb4
[  148.174043] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  148.179988] rcu: 	2-...0: (1 GPs behind) idle=2be/1/0x4000000000000000 softirq=39/40 fqs=6831 
[  148.188634] 	(detected by 3, t=14713 jiffies, g=-1135, q=223)
[  148.194403] Task dump for CPU 2:
[  148.197642] task:kworker/2:4     state:R  running task     stack:    0 pid:  710 ppid:     2 flags:0x0000000a
[  148.207601] Workqueue: events deferred_probe_work_func
[  148.212759] Call trace:
[  148.215215]  __switch_to+0x9c/0xfc
[  148.218629]  deferred_probe_work_func+0x54/0xb4
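
A side note for anyone comparing logs like the two above: in both of them the last message before the first stall report comes from armada8k-pcie, followed by roughly 21 seconds of silence. That hints at, but does not prove, the PCIe host bridge probe being where CPU 2 gets stuck. A small sketch for pulling that line out of a saved console log (the function name and log file name are made up for illustration):

```sh
#!/bin/sh
# Sketch: print the last kernel message logged before the first RCU stall report.
# usage: last_before_stall LOGFILE
last_before_stall() {
    # Remember each line; when the first stall report appears, print the
    # line that preceded it and stop reading.
    awk '/rcu_sched detected stalls/ { print prev; exit } { prev = $0 }' "$1"
}

# Example call; console.log is a placeholder for a saved serial capture:
# last_before_stall console.log
```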

5.10 is getting ancient by now, so it's quite possible that the SoC and CP DTSes are out of sync, along with any number of drivers.

So, you should try it with a much newer kernel.

You may have to check with more recent kernels to see whether a fix has been backported.
CN9130 and Armada 8040 share quite a few drivers right now, and those have gone through some rounds of breakage here and there. It is a very new board, so 5.10 may be a bit old. If it still fails after some more trying, it is worth reporting to the driver developers. On my board the switch is exposed as a single port (eth2) for lan@ because of the cn91x driver changes, but it works. It may be a customization adopted from FreeBSD, where at best you fix one thing and break something else while people work out the best way to use device trees and the comphy driver. It may be that all you need is a newer DTS. Have you looked at U-Boot? One question, just so that it's asked: you don't have an oddly behaving card attached to a port that isn't PCIe but that the kernel thinks is PCIe? The ports are made so they can be switched between SATA, USB3 and PCIe with DTB files, so there is no keying like M.2 to help you.

I tried pulling the DTS files from upstream, still no luck. I am still getting the hang of the OpenWrt build system, so grabbing a newer kernel isn't something I'm familiar with yet.
The vendor posts Ubuntu images, which I have yet to try, but I did look at their patch files, which were actually just updated! So I might give it another shot soon.

I just saw this for the first time on a ClearFog GT 8K, along with a crash. I tried to strip things down to get it more stable.
Partial console output:

[86758.943289] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[86758.949268] rcu:     1-...0: (7 ticks this GP) idle=1e2/1/0x4000000000000000 softirq=668658/66 
[86758.958887]  (detected by 0, t=2102 jiffies, g=1212889, q=50)
[86758.964664] Task dump for CPU 1:
[86758.967912] task:kworker/1:2     state:R  running task     stack:    0 pid:23459 ppid:     2 fa
[86758.977901] Workqueue: events 0xffffffc0105a74d0
[86758.982549] Call trace:
[86758.985011]  0xffffffc010007da0
[86758.988172]  0xffffff813ffa2f00

I hadn't really caught this before, and had been leaving the console attached to catch it.
So it turns out that it may be the same issue; hip hooray for pin cables being able to go past the corners of the USB3 port. This is relevant for both Armada 8040 and CN9130 Pro now.

I have a Compex wave 5 card in the slot marked SRD2. I haven't checked what it's called in U-Boot or the DTS, but it is likely CON1, with CON0 as SFP; the back side is CON2. The PCIe drivers have never been very stable, or perhaps it's the electromechanics?

@mrbojangles3 have you reported it yet?

SolidRun has, or is advertising, OpenWrt functionality and support.
If it is a timing problem at chip level that can be fixed in drivers,
a report should eventually end up where it can be fixed.

I have a SATA disk attached, which works currently but doesn't with an older kernel,
and a PCIe Wi-Fi card that is barely used or put under load, which shouldn't matter if it's timing.

For my intermittently crashing board, I am trying to see how stable it is after several crashes today, not all of which reported anything on the console. I want to see whether keeping min and max frequency between 666 MHz and 1 GHz does something. It does smell like a Marvell issue, where vendors' mileage may vary. I have seen node sync errors on x86 before; catching it red-handed among lots of identical servers ended with a tightening of DRAM timings, but I may be offering a red herring.

@mrbojangles3, have you got any other DIMMs, or is it one of those where they can be changed? That may send it off to those people quicker.

Tight, as in more within the digital spec and not loose at the edges, in which case low or high frequency scaling can add jitter. And you can run at 2.2 GHz? Then perhaps the same general, underlying problem got carried over. Larger manufacturers can have 1000 units just spin for a few months to ensure it's tight enough.

A crash once per day could be hardware, and as I see it that is perhaps more likely, so it is worth testing for me.
for p in 0 2; do echo 666666 > /sys/devices/system/cpu/cpufreq/policy$p/scaling_min_freq; done

For the same reasons, "industrial" variants may just have better tolerance on the digital clock, which is annoying.
You may as well run an "industrial" board at the highest clock it offers, which is lower than the regular max clock, since it is expected to be under load and responsive at all times. This is also a problem for drivers, as better compilers creating smaller code can lead to more efficient execution in the pipeline and therefore to a crash. This now-edited post might be entirely irrelevant, but if the tolerance is not tight, all of a sudden the bug is efficient execution that draws more power and causes the board to trip. And then 'mv_ddr_binaries' needs tolerances tightened for PCIe and DRAM.

I decided to go all the way with stability testing,
in rc.local:

echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy2/scaling_min_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy2/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
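
The same pinning could be written as a loop that also reads the limits back, which makes it easier to see whether the kernel accepted them. This is only a sketch: `pin_cpufreq` is a made-up helper name, and the optional second argument exists purely so the function can be pointed at a directory other than the real sysfs path.

```sh
#!/bin/sh
# Sketch: pin every cpufreq policy to a single frequency (value in kHz).
# usage: pin_cpufreq FREQ_KHZ [CPUFREQ_DIR]
pin_cpufreq() {
    freq="$1"
    base="${2:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$base"/policy*; do
        [ -d "$policy" ] || continue
        # Same order as the rc.local lines: minimum first, then maximum.
        echo "$freq" > "$policy/scaling_min_freq"
        echo "$freq" > "$policy/scaling_max_freq"
        # Read both limits back so a write that did not stick is visible.
        echo "${policy##*/}: min=$(cat "$policy/scaling_min_freq") max=$(cat "$policy/scaling_max_freq")"
    done
}

# Example call, matching the 1 GHz value used above:
# pin_cpufreq 1000000
```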

Hoping the crashbox redeems itself eventually.

Edit: Nope, hard crash and lockup. It is more stable without the radio/PCIe card running/enabled.
Whether "enterprise" or "industrial" parts are otherwise tighter is still an annoying question to have.
It is probably long overdue that they hold an electromechanical, firmware and PCIe driver workshop, as quirkiness is seen too often. And this certainly hasn't been helpful, as I can only really test whether the reason is that the DIMMs are not as good as before.

Please fix these platforms with workshops outside of major industries and specific use cases of major actors?

@bldur,
When I had rcu_sched stalls, they were related to the device tree file, and it meant my system didn't boot at all. I haven't changed the device tree file, but some people have had success on the SolidRun forum.

I don't really have anything to report for my use case. I am pretty sure that the GT 8K is End of Life, so I don't expect a lot of conversation about it. I love mine and I am looking for an enclosure for it (know anyone selling?). I also have the CN9130 Pro, and I am looking for the magic DTS files for that, as it doesn't boot OpenWrt. I haven't tried their Ubuntu image yet.

I have a corsair vengeance 4GiB SO-DIMM, haven't had problems but he hasn't entered heavy service yet (need that enclosure).
I also have a Lite-ON MSTATA drive in whats labeled CON3 on the board, it didn't work / show up in CON4 (just to the right of CON3 on the top of the board).

Hope this helps

PCIe was quirky for me on the A388 as well (stable if the PCIe device showed up at boot); I have not had as much success here.
Thanks for the info about the DIMM; mine was included with the board and may not be top shelf.
The driver/firmware ecosystem seems to depend on lucky, go-getter DIY hacks, and kernel developers often seem to struggle with the rapid changes and conventions, half of which are tied to U-Boot.
I'm left with the feeling that Marvell doesn't keep up with code maintenance, and that support disappears as the rug is pulled out from under old conventions, which doesn't help when there is one driver to rule them all.

My board and DIMM are largely stable without PCIe, with at least more than a day between hard lockups.

In either case, DTS issues or not, it says something about how things fail.
Crashing instead of reporting an error is close to a bug in itself, and points to general neglect of the subsystems.

I hope Marvell comes to its senses and hires people so that open source can be invited in; as it stands there is no support group where one can pick up the phone and ask the platform people about their test setups.

I'm seeing this as well.

I have pulled the device tree file out of the image provided by SolidRun. I decompiled it with dtc, and happily things seem to be working. I suspect that a decompiled dts file is probably not suitable for upstreaming, so I have asked SolidRun for the actual source file. There is a small chance it isn't theirs to give, so we shall see.

I went through SolidRun's GitHub page to find the dts file they are using. I am using the testing kernel in OpenWrt, 5.15.79. The board boots and seems to work. Because of how I want to use the board, I am looking to use the SFP+ port. I can assign an IP to the port and pass traffic, but I get intermittent messages about the interface going up and down. I don't see these messages on the other side of the link, so I think it is only happening on the CN9130 Pro side.
I am looking for help debugging where the messages could be coming from and/or how to turn up their verbosity. I have already tried pressing 4 when the system boots to increase the logging, but maybe I missed the window?

[411973.067579] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[411978.662765] mvpp2 f2000000.ethernet eth0: Link is Down
[411978.668041] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[413262.817490] mvpp2 f2000000.ethernet eth0: Link is Down
[413262.822777] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[415240.748679] mvpp2 f2000000.ethernet eth0: Link is Down
[415240.753967] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[415463.869028] mvpp2 f2000000.ethernet eth0: Link is Down
[415463.874316] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[415695.312440] mvpp2 f2000000.ethernet eth0: Link is Down
[415695.317722] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx
[416622.397914] mvpp2 f2000000.ethernet eth0: Link is Down
[416622.403201] mvpp2 f2000000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx

root@OpenWrt:/# uname -a
Linux OpenWrt 5.15.79 #0 SMP Tue Nov 22 21:33:53 2022 aarch64 GNU/Linux

If I leave a ping running it will drop packets, which I assume is associated with the interface going up and down.
I am using the same optics module on both ends, a 1/10 Gbit SFP+ from fs.com.
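To put numbers on the flapping, the kernel log can be summarised with a bit of awk. This is only a rough sketch: `link_flaps` is a made-up helper name, and it assumes the messages keep the exact "Link is Down" / "Link is Up" wording and the bracketed timestamps shown in the log above. For that log, each Down/Up pair is roughly 5 ms apart.

```shell
# Summarise link flaps for one interface from dmesg-style input on stdin.
# link_flaps is an illustrative helper, not part of OpenWrt.
link_flaps() {
    iface="$1"
    awk -v iface="$iface" '
        {
            # Pull the "[seconds.micros]" timestamp out of the line.
            if (match($0, /\[ *[0-9]+\.[0-9]+\]/) == 0) next
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        }
        index($0, " " iface ": Link is Down") {
            down = ts; have_down = 1; flaps++
            next
        }
        have_down && index($0, " " iface ": Link is Up") {
            # Report how long the link stayed down before coming back.
            printf "down at %.6f, back up after %.3f ms\n", down, (ts - down) * 1000
            have_down = 0
        }
        END { printf "%d flap(s) on %s\n", flaps, iface }
    '
}
```

On the board this would be run as `dmesg | link_flaps eth0`. Comparing the flap timestamps against the times of the dropped pings would show whether the two really line up.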

Hi @mrbojangles3,
Like you, I have been arguing with the ClearFog CN9130 Pro for some time.
Unfortunately I'm not a programmer, and compiling a kernel is a mountain that I haven't been able to climb yet.
I wanted to ask if you could help me?
Thank you

Hi,
I have the code I use to build OpenWrt for the CN9130 Pro on my GitHub page here, so if you are comfortable doing a git clone and then building OpenWrt you should be at least where I'm at :) I did submit a PR to the kernel mailing lists to have the dts file merged upstream. Hopefully that goes okay.

I got rid of rcu stalls on my Clearfog GT 8K by locking the CPU frequency to max. Seems like the correct frequency scaling driver is not built with the current kernel config. Haven't tried to build the correct driver since I'm fine with running the CPU at max frequency. Don't know if the same applies to the CN9130.

I get the stalls even if the scaling driver seems to be correct. How did you lock the frequency?

I have this in my /etc/rc.local:

echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy2/scaling_governor

Not had any stalls after doing this, uptime of 229 days now.
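For boards with a different number of policies, the same thing can be written as a loop that also prints the scaling driver, since that came up earlier in the thread. A minimal sketch only: `set_governor` and its optional second argument (an alternate sysfs root for a dry run) are hypothetical names. The files it touches, `scaling_available_governors`, `scaling_governor` and `scaling_driver`, are the standard cpufreq sysfs attributes.

```shell
# Set one governor on every cpufreq policy and show which driver is in use.
# set_governor and its second argument are illustrative names only.
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu/cpufreq}"
    for policy in "$root"/policy*; do
        [ -d "$policy" ] || continue
        # Refuse to write a governor the kernel does not list as available.
        if ! grep -qw "$governor" "$policy/scaling_available_governors" 2>/dev/null; then
            echo "${policy##*/}: governor $governor not available" >&2
            continue
        fi
        echo "$governor" > "$policy/scaling_governor"
        echo "${policy##*/}: governor=$(cat "$policy/scaling_governor") driver=$(cat "$policy/scaling_driver" 2>/dev/null)"
    done
}
```

Called as `set_governor performance` from /etc/rc.local, this should have the same effect as the two echo lines above on a board with policy0 and policy2.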
