Netgear R7800 exploration (IPQ8065, QCA9984)

@Ansuel

Haven't debugged it yet, but I get this patch error:

Applying /Openwrt/r7800/target/linux/ipq806x/patches-5.15/106-02-ARM-dts-qcom-ipq8064-add-ipq8062-variant.patch using plaintext: 
patching file arch/arm/boot/dts/qcom-ipq8062-smb208.dtsi
The next patch would create the file arch/arm/boot/dts/qcom-ipq8062.dtsi,
which already exists!  Applying it anyway.
patching file arch/arm/boot/dts/qcom-ipq8062.dtsi
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file arch/arm/boot/dts/qcom-ipq8062.dtsi.rej

Fixed! I forgot to remove a file!


Is it a small fix that I can apply by hand (instead of cleaning and reapplying the whole series)?

From a quick comparison with the previous version, do I just remove target/linux/ipq806x/files-5.15/arch/arm/boot/dts/qcom-ipq8062.dtsi ????

@hnyman wait a second, I noticed I still have to update the DTS... sorry

@hnyman OK, sorry, I refreshed everything and now they all compile/build correctly. (I think you have to revert the previous changes, as there are many DTS changes.)


I still got this:

Applying /Openwrt/r7800/target/linux/ipq806x/patches-5.15/117-v6.0-02-clk-qcom-clk-krait-unlock-spin-after-mux-completion.patch using plaintext: 
patching file drivers/clk/qcom/clk-krait.c
Hunk #1 FAILED at 32.
1 out of 1 hunk FAILED -- saving rejects to file drivers/clk/qcom/clk-krait.c.rej
Patch failed!  Please fix /Openwrt/r7800/target/linux/ipq806x/patches-5.15/117-v6.0-02-clk-qcom-clk-krait-unlock-spin-after-mux-completion.patch!
make[3]: *** [Makefile:37: /Openwrt/r7800/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.15.67/.prepared_3ebe8393afaba762f9f4ce98daf61d9d] Error 1
$ cat build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/linux-ipq806x_generic/linux-5.15.67/drivers/clk/qcom/clk-krait.c.rej
--- drivers/clk/qcom/clk-krait.c
+++ drivers/clk/qcom/clk-krait.c
@@ -32,11 +32,16 @@ static void __krait_mux_set_sel(struct k
 		regval |= (sel & mux->mask) << (mux->shift + LPL_SHIFT);
 	}
 	krait_set_l2_indirect_reg(mux->offset, regval);
-	spin_unlock_irqrestore(&krait_clock_reg_lock, flags);
 
 	/* Wait for switch to complete. */
 	mb();
 	udelay(1);
+
+	/*
+	 * Unlock now to make sure the mux register is not
+	 * modified while switching to the new parent.
+	 */
+	spin_unlock_irqrestore(&krait_clock_reg_lock, flags);
 }
 
 static int krait_mux_set_parent(struct clk_hw *hw, u8 index)

I guess that patch got backported... I have an old kernel in my repo.

Your patch has apparently already been upstreamed in the newest Linux:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/clk/qcom/clk-krait.c?h=v5.15.67&id=9ed2640eb88374a234949198651775c3f2d26917

So it looks like :frowning:

And the errata patch too.

Thanks, now the patches apply.
I will make a build.


I should really clean my buildroot... too much junk and too many changes at once...


My test build at least boots ok :wink:

 OpenWrt SNAPSHOT, r20629-01e2184c49
 -----------------------------------------------------
root@router1:~# uname -a
Linux router1 5.15.67 #0 SMP Tue Sep 13 18:33:34 2022 armv7l GNU/Linux

But apparently cpufreq is missing?
There is nothing in /sys/devices/system/cpu/cpufreq/
(and nothing in /sys/class/devfreq/ either).

Looking at the kernel boot log, there are some new error messages, possibly related to this:

[    0.019975] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.021434] thermal_sys: Registered thermal governor 'step_wise'
[    0.023421] cpuidle: using governor ladder
[    0.023501] cpuidle: using governor menu
[    0.041572] qcom_rpm 108000.rpm: RPM firmware 3.0.16777364
[    0.072796] qcom_rpm_reg 108000.rpm:regulators: invalid frequency 500000
[    0.072825] qcom_rpm_reg 108000.rpm:regulators: driver callback failed to parse DT for regulator s1a
[    0.073019] s1a: supplied by regulator-dummy
[    0.073186] qcom_rpm_reg 108000.rpm:regulators: invalid frequency 500000
[    0.073207] qcom_rpm_reg 108000.rpm:regulators: driver callback failed to parse DT for regulator s1b
[    0.073380] s1b: supplied by regulator-dummy
[    0.073521] qcom_rpm_reg 108000.rpm:regulators: invalid frequency 500000
[    0.073540] qcom_rpm_reg 108000.rpm:regulators: driver callback failed to parse DT for regulator s2a
[    0.073711] s2a: supplied by regulator-dummy
[    0.073853] qcom_rpm_reg 108000.rpm:regulators: invalid frequency 500000
[    0.073873] qcom_rpm_reg 108000.rpm:regulators: driver callback failed to parse DT for regulator s2b
[    0.074050] s2b: supplied by regulator-dummy
[    0.074601] usbcore: registered new interface driver usbfs
[    0.074675] usbcore: registered new interface driver hub
[    0.074739] usbcore: registered new device driver usb

Also something about a bad cell count:

[    1.657034] nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
[    1.661138] 8 fixed-partitions partitions found on MTD device qcom_nand.0
[    1.668535] OF: Bad cell count for /soc/nand-controller@1ac00000/nand@0/partitions
[    1.675306] OF: Bad cell count for /soc/nand-controller@1ac00000/nand@0/partitions
[    1.683503] Creating 8 MTD partitions on "qcom_nand.0":
[    1.690382] 0x000000000000-0x000000c80000 : "qcadata"

Full bootlogs at https://gist.github.com/hnyman/27a1e4f106baa9b52c0381f4983c0b6f

Also the CPU speed / network throughput was limited to some extent:

My normal SQM limits are 190/55 Mbit and 5.10 handles that speed OK.
(I have a 200 Mbit connection from ISP.)

But 5.15 seems to limit that to 110/55.
I take that as an indication that the CPU might be locked to a lowish speed?

Mhhh, I have 2 patches, but I'm somewhat afraid they will cause instability. Let's test... (my test R7800 is not reliable anymore, since lightning broke 3 LAN ports, so I can't really trust it... the other one I have is for development, so I can't use it for stability testing)

If you want to test this new version, it should not have cpufreq working.

I will take the new commit and will test.
Did you actually mean "should now have cpufreq" ?

Yes, sorry, an extra "not" slipped in.

That commit causes a reboot loop.
Full log in https://gist.github.com/hnyman/af00a342293ee583142bee4174d60b91

[    2.959358] Registering SWP/SWPB emulation handler
[    3.008274] thermal thermal_zone0: failed to read out thermal zone (-110)
[    3.019620] 8<--- cut here ---
[    3.019655] Unable to handle kernel paging request at virtual address e16d4204
[    3.020383] 8<--- cut here ---
[    3.021576] pgd = (ptrval)
[    3.028773] Unable to handle kernel paging request at virtual address e16d4204
[    3.031810] [e16d4204] *pgd=00000000
[    3.034508] pgd = (ptrval)
[    3.034511]
[    3.041707] [e16d4204] *pgd=00000000
[    3.045444] Internal error: Oops: 5 [#1] SMP ARM
[    3.047954]
[    3.057768] Modules linked in:
[    3.059247] CPU: 1 PID: 88 Comm: kworker/1:6 Not tainted 5.15.67 #0
[    3.062114] Hardware name: Generic DT based system
[    3.068279] Workqueue: events dbs_work_handler
[    3.073137] PC is at __mutex_lock.constprop.0+0x8c/0x5b0
[    3.077564] LR is at 0xc1da0600
[    3.083029] pc : [<c0a0940c>]    lr : [<c1da0600>]    psr: a0000013
[    3.085898] sp : c1d89e28  ip : c08ae408  fp : 00000000
[    3.092147] r10: c0d903f0  r9 : c1715408  r8 : c1d88000
[    3.097354] r7 : 00000002  r6 : 0005dc00  r5 : ffffe000  r4 : c1715408
[    3.102565] r3 : e16d41f8  r2 : e58d8010  r1 : 00000002  r0 : c1715408
[    3.109164] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    3.115676] Control: 10c5787d  Table: 4220406a  DAC: 00000051
[    3.122876] Register r0 information: slab kmalloc-1k start c1715400 pointer offset 8 size 1024
[    3.128611] Register r1 information: non-paged memory
[    3.137111] Register r2 information: non-paged memory
[    3.142233] Register r3 information: non-paged memory
[    3.147269] Register r4 information: slab kmalloc-1k start c1715400 pointer offset 8 size 1024
[    3.152308] Register r5 information: non-paged memory
[    3.160809] Register r6 information: non-paged memory
[    3.165931] Register r7 information: non-paged memory
[    3.170966] Register r8 information: non-slab/vmalloc memory
[    3.176001] Register r9 information: slab kmalloc-1k start c1715400 pointer offset 8 size 1024
[    3.181735] Register r10 information: non-slab/vmalloc memory
[    3.190152] Register r11 information: NULL pointer
[    3.195965] Register r12 information: non-slab/vmalloc memory
[    3.200654] Process kworker/1:6 (pid: 88, stack limit = 0x(ptrval))
[    3.206473] Stack: (0xc1d89e28 to 0xc1d8a000)
[    3.212552] 9e20:                   00000000 c06bf0dc 00000000 c161d600 00000000 c1d3f580
[    3.217071] 9e40: c0d928a4 0005dc00 c1d89ec0 c1715400 c1715408 c0d903f0 00000000 c0826620
[    3.225230] 9e60: 00000000 00000001 fffffffe 00000001 c1d89ec0 00000000 c0d903f0 c03438c8
[    3.233391] 9e80: c1c12e00 c1d89ec0 00000002 00000001 0005dc00 00000000 000927c0 c07e1124
[    3.241549] 9ea0: c1c12e00 c1d89ec0 00000000 c07e1178 c1c12e00 c0dcffc4 00000000 c07e200c
[    3.249710] 9ec0: c1c12e00 0005dc00 000927c0 00000024 dd99d480 c1c12e00 c1d3d700 c1d3d680
[    3.257871] 9ee0: c1d3d700 c1d34dc0 c1d3d680 dd9a0305 c1cc65c0 c07e5254 c1d3d738 00000000
[    3.266030] 9f00: c1d3d704 c0d906d4 00000000 00000040 dd9a0305 c07e5f30 c1d3d738 c1cc6580
[    3.274188] 9f20: dd99d040 dd9a0300 00000000 c0339fb8 c1d88000 dd99d040 00000008 c1cc6580
[    3.282348] 9f40: c1cc6598 dd99d040 00000008 dd99d058 c0d03d00 dd99d200 c1d88000 c033a2b8
[    3.290510] 9f60: c0d0c36c c0d9aadc c1d85ecc c150a740 c150a800 c033a244 c1cc6580 c1d88000
[    3.298669] 9f80: c1d85ecc c150a820 00000000 c0341f84 c150a740 c0341e2c 00000000 00000000
[    3.306827] 9fa0: 00000000 00000000 00000000 c0300130 00000000 00000000 00000000 00000000
[    3.314988] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    3.323146] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    3.331310] [<c0a0940c>] (__mutex_lock.constprop.0) from [<c0826620>] (cpufreq_passive_notifier_call+0xc8/0x100)
[    3.339468] [<c0826620>] (cpufreq_passive_notifier_call) from [<c03438c8>] (srcu_notifier_call_chain+0x7c/0xf4)
[    3.349712] [<c03438c8>] (srcu_notifier_call_chain) from [<c07e1124>] (cpufreq_notify_transition+0xc4/0xec)
[    3.359520] [<c07e1124>] (cpufreq_notify_transition) from [<c07e1178>] (cpufreq_freq_transition_end+0x2c/0xa4)
[    3.369240] [<c07e1178>] (cpufreq_freq_transition_end) from [<c07e200c>] (__cpufreq_driver_target+0x1b4/0x238)
[    3.379310] [<c07e200c>] (__cpufreq_driver_target) from [<c07e5254>] (od_dbs_update+0xcc/0x1a0)
[    3.389291] [<c07e5254>] (od_dbs_update) from [<c07e5f30>] (dbs_work_handler+0x38/0x74)
[    3.397882] [<c07e5f30>] (dbs_work_handler) from [<c0339fb8>] (process_one_work+0x230/0x4bc)
[    3.405871] [<c0339fb8>] (process_one_work) from [<c033a2b8>] (worker_thread+0x74/0x5d4)
[    3.414550] [<c033a2b8>] (worker_thread) from [<c0341f84>] (kthread+0x158/0x174)
[    3.422622] [<c0341f84>] (kthread) from [<c0300130>] (ret_from_fork+0x14/0x24)
[    3.429996] Exception stack(0xc1d89fb0 to 0xc1d89ff8)
[    3.437027] 9fa0:                                     00000000 00000000 00000000 00000000
[    3.442156] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    3.450316] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    3.458479] Code: e5932014 e3520000 0a00010f e5933004 (e593300c)
[    3.464897] Internal error: Oops: 5 [#2] SMP ARM
[    3.464897] ---[ end trace f3d54433cff978cc ]---

No more crashes now, @hnyman.

Thanks @Ansuel,
your current PR 10703, ending with commit 99d156a, seems to work OK as
my new kernel-5.15-test-master-r20646 build.

Download speeds are back at 190/55 with A+ quality (like with 5.10), and CPU frequency scaling also seems to work.

Bootlogs at https://gist.github.com/hnyman/fac515ad3cb785c1f67274457524b10a


@hnyman btw, I discovered something really bad this time...

I'll keep the explanation short, in case anyone is interested.
CPU frequency scaling on this system (Krait CPU) works with 2 muxes (a mux is like a switch that decides where the clock is taken from).

We have the HFPLL (the high-frequency source) and an HFPLL/2, and each mux has a safe selection to switch to while the HFPLL is being reprogrammed.
It is a cascade: the primary mux's safe selection is connected to the secondary mux. So when we change frequency, for example from 600 MHz to 800 MHz, we first have to source the CPU from the secondary mux, and then switch back to the primary after the HFPLL has been reprogrammed for the new frequency.

(On top of this, the acpu_aux clock is another mux that can source from pxo_board or from pll8_vote.)

Now the problem... It seems the secondary mux is never set up correctly and never actually enabled. This causes 2 problems.
First bug: any change to the secondary mux is ignored, as the clock is never enabled.
Second bug: for some reason the kernel never sets the frequency in the first place, so the secondary mux is left in an undefined state.

(you can check all of this in /sys/kernel/debug/clk/clk_summary)

Currently the secondary muxes are set as follows:

  • cpu 0 is at 382 MHz, sourcing from pll8_vote
  • cpu 1 is at 25 MHz, sourcing from pxo_board
  • cache is at 25 MHz, sourcing from pxo_board

Now, I have no idea if this is the main cause of all the instability, but it is certainly different from the original firmware, and these mismatched clocks are not good for the system. Consider also that the CPU and cache use these clocks every time, as they are the safe selection when switching from one frequency to another. Obviously, setting the governor to performance, or not having cpufreq at all, never changes the mux configuration, so having the secondary mux in this state makes no difference... but when we do change frequency... WHO knows...

And anyway, the confusing thing is that if the secondary mux clocks were never enabled, then the requests were ignored, so at the idle frequency the system was probably running at the wrong frequency.
