R7500v2 kernel 4.19 test

anon98444528 · July 17, 2019, 4:10am

cpuidle update: progress of sorts.

I want to see if "arm-idle" does something and there are sysfs entries set up to show some usage stats as described here... only they didn't show up. This is due the fact I took out the dev setup in cpuidle-arm.c (up to now leaving it in causes a boot loop).

I put the dev reg stuff back into cpuidle-arm.c and worked out that the boot loop is caused by trying to call the "qcom,idle-state-spc" via the spm driver (which for reasons I still don't understand) isn't used/probed (perhaps its not supposed to be for ipq806x).

I neutered the qcom,idle-state-spc functionality (comment out the body of qcom_cpu_spc in spm.c and have it return 0, i also changed

use_scm_power_down = false

spm.c qcom_cpuidle_init but I don't know if this is necessary) and the boot loop is gone.

Now

/sys/devices/system/cpu/cpu0/cpuidle/
/sys/devices/system/cpu/cpu1/cpuidle/

shows up. For me cpu[0|1]/cpuidle/state1 corresponds to "spc" as opposed to WFI (I think WFI is "arm-idle") so I

echo 1 > cpu[0|1]/cpuidle/state1/disable

to disable spc for good measure. Now I have something to look at... not sure if it means anything yet. Also I'm not sure if I can compare "none" vs "arm-idle" this way.

FWIW it is also possible to change the govenor via the cpuidle_sysfs_switch boot option as described in the cpuidle sysfs docs linked in this post without all the hacking above.

I'll update my patches in a bit if anyone want to look or try it.

anon98444528 · July 17, 2019, 1:59pm

cpuidle update:

I've got the spm driver working by adding:

        { .compatible = "qcom,ipq8064-saw2-v1.1-cpu",
          .data = &spm_reg_8064_cpu },

in spm.c and editing qcom-ipq8064.dtsi:

		saw0: regulator@2089000 {
                        compatible = "qcom,saw2",
                                     "qcom,ipq8064-saw2-v1.1-cpu",
			             "syscon";
                        reg = <0x02089000 0x1000>, <0x02009000 0x1000>;
                        regulator;
		};

                saw1: regulator@2099000 {
                        compatible = "qcom,saw2",
                                     "qcom,ipq8064-saw2-v1.1-cpu",
		                     "syscon";
			reg = <0x02099000 0x1000>, <0x02009000 0x1000>;
		        regulator;
                };

I tried a similar change under saw_l2. I observed (via pr_err statements) spm_dev_prob starts for this but (gracefully) does not finish so I've left it out.

WARNING: I don't know if this approach will get the spm driver working on ipq8065 (r7800) systems. Since spm.c adjusts "power" to a physical device (the cpu) any prospective tester should consider if they can afford a new router before trying this. The data hard coded in &spm_reg_8064_cpu in spm.c appears to be necessary (router won't boot for me without it). I don't know if this data will work for ipq8065 systems.

I can also get both idle states "arm-idle" and "qcom,idle-state-spc" working as shown by

/sys/devices/system/cpu/cpu[0|1]/cpuidle/state[0|1]/usage

Note there is a cleaner way to disable "qcom,idle-state-spc" than by the spm.c hackery in my prior post. Turns out adding status = "disabled":

                idle-states {                                                   
                        CPU_SPC: spc {                                          
                                compatible = "qcom,idle-state-spc",             
                                                "arm,idle-state";
                                status = "disabled"
                                entry-latency-us = <400>;                       
                                exit-latency-us = <900>;                        
                                min-residency-us = <3000>;                      
                        };

in qcom-ipq8064.dtsi will disable only "qcom,idle-state-spc" and leave "arm,idle-state" functional. The "dev" reg code in cpuidle-arm.c is still necessary to get the additional sysfs info. Remove status = "disabled" to get the spc idle state. I observe after ~40 min up with no load, the spc driver gets used the most.

I have not yet posted patches to my github k419 branch but I plan to do so and will post a link once its ready.

anon50098793 · July 17, 2019, 3:04pm

didn't fry my 8065... usb init froze tho' ( EDIT: see edit 1 ) ( likely my hacky DTS ) ..... see if same happens on other "known semi-well-configured-for-usb" 806x systems...

did get two found operations without error tho'

[    5.816223] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x0090367f]                            
[    5.821936] qcom-tsens 900000.thermal-sensor: tsens init failed                                           
[    5.831637] qcom-tsens: probe of 900000.thermal-sensor rejects match -19   
[    5.887099] Speed bin: 0                                                                                  
[    5.895006] PVS bin: 5                                                                                    
[    5.899124] Registering platform device 'cpufreq-dt.0'. Parent at platform                                
[    5.899596] device: 'cpufreq-dt.0': device_add                                                            
[    5.906550] bus: 'platform': add device cpufreq-dt.0                                                      
[    5.911138] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations                           
[    5.916280] cpuidle: enable-method property 'qcom,kpss-acc-v1' found operations                           
[

EDIT1: ( maybe usb was a guesstimate it's pre-internal mmc-init.... )

[    1.772518 ] cpu cpu1: Linked as a consumer to regulat
HANGSHERE
[    1.801974 ] mmc0: new high speed MMC card at address 0001

anon98444528 · July 17, 2019, 3:11pm

i neglected to mention that I changed qcom,kpss-acc-v1 to qcom,kpss-acc-v2 in qcom-ipq8064.dtsi for my testing above (just to see...).

i'm changing it back to v1 now and will test as i suspect it does not matter

EDIT 0: i don't see an effect of v1 vs v2 for cpuidle or usb... (i'll need to do a full flash to fully test usb...)

EDIT 1: my github k419 commits contains edits for usb in qcom-ipq8064.dtsi specific to my edits in qcom-ipq8064-r7500v2.dts . This can cause usb issues for r7800 users using these patches - I'll conform to whatever the community decides as a standard.

anon98444528 · July 17, 2019, 8:24pm

EDIT: It looks like a few changes to the qcom-ipq8064.dtsi file is all that is needed to get cpuidle working on the r7500v2 and the R7800.

To enable cpuidle on the R7800, see this post below to edit qcom-ipq8064.dtsi. See also the WARNING for R7800 testers in this post above - @anon50098793 has tested the concept on the R7800 and no "smoke" reported yet.

WARNING My patch "ipq806x: k419 cpuidle fix" contains edits for usb in qcom-ipq8064.dtsi specific to my edits in qcom-ipq8064-r7500v2.dts (also on my github site as "ipq806x: r7500v2 k419 extend overlay and usb fix") R7800 users using my patch may have usb issues.

It would be good to have a common "ipq806x k419 usb fix" that will work on all devices that use qcom-ipq8064.dtsi. I'll follow the community... See @Ansuel's post to get started adjusting the dtsi and dts files for usb on the R7800.

Lastly @Ansuel and/or @anon50098793 , assuming everything works out after sufficient testing, would you mind submitting the cpuidle pull request(s)? For personnel reasons related to privacy, I choose not to identify myself in public online forums; consequently, I can't submit the pull request per openwrt identity requirements (a policy I fully support BTW). I'm not bothered about attribution to me - a link to this thread is more than sufficient.

Ansuel · July 17, 2019, 8:36pm

i'm currently away from the router

Anyway i will test and publish a commit ASAP...

anon98444528 · July 17, 2019, 9:10pm

a quick thankyou to invisiblek (@invisiblek in the forum here?) and karlp on irc for helping me think through the "workflow" when building initramfs images for testing.

I ran into some errors after make distclean followed by make target/linux/{clean,prepare} V=s and make target/linux/{compile,install} V=s. The short of it is that one can not expect a functional initramfs image to be built from these commands without first building a full flash image... you need things beyond just the kernel of course.

anon98444528 · July 18, 2019, 1:31am

thank you

by "test" I hope you mean demonstrate a benefit and stability resulting from these changes (reduced power at low cpu usage, better performance under high cpu usage, no crashing). See my edit above starting with "Case 3 is problematic" i.e. there is no point in submitting a PR if there is no benefit.

I'll take a shot at demonstrating a benefit and showing stability as well...

Ansuel · July 18, 2019, 12:55pm

Personally I can test stability as it's my main router

anon98444528 · July 18, 2019, 7:16pm

After some thought, I've updated my github k419 branch in favor of a simple solution that does not involve patching the kernel drivers. Its likely a few changes to the qcom-ipq8064.dtsi file is all that is needed and is a flexible solution (I'll explain below).

To get cpuidle working (the wfi, spc, and the additional sysfs information), add status = "okay" and "qcom,apq8064-saw2-v1.1-cpu" to the qcom-ipq8064.dtsi along the lines of:

...
                idle-states {                                                   
                        CPU_SPC: spc {                                          
                                compatible = "qcom,idle-state-spc",             
                                                "arm,idle-state";
                                status = "okay"
                                entry-latency-us = <400>;                       
                                exit-latency-us = <900>;                        
                                min-residency-us = <3000>;                      
                        };    
...
		saw0: regulator@2089000 {
                        compatible = "qcom,saw2",
                                    "qcom,apq8064-saw2-v1.1-cpu",
			             "syscon";
                        reg = <0x02089000 0x1000>, <0x02009000 0x1000>;
                        regulator;
		};

                saw1: regulator@2099000 {
                        compatible = "qcom,saw2",
                                     "qcom,apq8064-saw2-v1.1-cpu",
		                     "syscon";
			reg = <0x02099000 0x1000>, <0x02009000 0x1000>;
		        regulator;
                };
...

note the a (not i) in "qcom,apq8064-saw2-v1.1-cpu"

I'd like to keep the status = "okay" line as one could use status = "disabled" to turn off spc at compile time (this does not disable the wfi idle state).

The spc idle state can be disabled/enabled at run time via:

echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/state1/disable
echo 1 > /sys/devices/system/cpu/cpu1/cpuidle/state1/disable

to disable spc, or

echo 0 > /sys/devices/system/cpu/cpu0/cpuidle/state1/disable
echo 0 > /sys/devices/system/cpu/cpu1/cpuidle/state1/disable

to enable spc. For me state1 is spc, state0 is wfi. I can not disable wfi with this method (even with spc enabled).

This solution resolves the issue and is flexible; consequently I think it has a better chance of acceptance (in openwrt master) than would patching kernel drivers.

Testing for stability only should be fine in this case.

Your approach to "do some magic in the dts" is correct...

anon98444528 · July 19, 2019, 11:34am

Regarding the hang before MMC...

I don't see "MMC" in my dmesg output (r7500v2).

Have you tried just using the edits to dtsi mentioned above? Try just the dtsi edits above with status = "disabled" and status = "okay" and see if either affects the hang.

idk if its related but I'm pretty sure the spm "system" is checked for possible probing very early (it's anoying that I can't see pr_err statements from spm_dev_probe function in spm.c unless there is a possible working dts configuration) and then probed at a later time (but well before the enable-method property messages).

Lastly, on boot just after a full flash to nvram with the dtsi changes (and also with my now defunct kernel driver patches), I did get a strange hang near the end of dmesg. On reboot (and subsequent reboot/power cycles) no hang. I also never saw it booting initramfs images.

anon50098793 · July 19, 2019, 11:42am

tried;

> CONFIG_CONSOLE_LOGLEVEL_DEFAULT=11 ( or 9 )
> CONFIG_CONSOLE_LOGLEVEL_QUIET=5
> CONFIG_DEBUG_DRIVER=y
> CONFIG_DEBUG_INFO=y
> CONFIG_DEVFREQ_THERMAL=y
> CONFIG_DEVTMPFS=y
> CONFIG_KALLSYMS=y
> CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7
> CONFIG_SCHED_DEBUG=y
> CONFIG_THERMAL_GOV_USER_SPACE=y
> CONFIG_THERMAL_STATISTICS=y
> CONFIG_THERMAL_WRITABLE_TRIPS=y

i get the feeling the deferred probe on cpuidle also holds back spm....

yeah, sticking to the basic edit.... the mmc thing i think is false positive and it's the cpuORregulator handover glitching.... like i said.... my device is a little out of the norm so my tests are to be taken with 3 grains of salt

wait for someone with an r7800 or nbg6817 to test it out before factoring in any of my bugs!

an old log... not sure what i was running config wise and the -19 should be disregarded .... but shows some init logic...

[    6.037817] bus: 'platform': add driver cpufreq-dt                                                        
[    6.045948] Speed bin: 0                                                                                  
[    6.050326] PVS bin: 5                                                                                    
[    6.054531] Registering platform device 'cpufreq-dt.0'. Parent at platform                                
[    6.055200] device: 'cpufreq-dt.0': device_add                                                            
[    6.062061] bus: 'platform': add device cpufreq-dt.0                                                      
[    6.066646] bus: 'platform': driver_probe_device: matched device cpufreq-dt.0 with driver cpufreq-dt      
[    6.071650] bus: 'platform': really_probe: probing driver cpufreq-dt with device cpufreq-dt.0             
[    6.080745] cpufreq-dt cpufreq-dt.0: no default pinctrl state                                             
[    6.089265] platform cpufreq-dt.0: Driver cpufreq-dt requests probe deferral                              
[    6.094875] platform cpufreq-dt.0: Added to deferred list                                                 
[    6.102032] DT idle-states: Parsing idle state node /cpus/idle-states/spc failed with err -19

anon98444528 · July 19, 2019, 12:16pm

~~I'll try the elevated log messages.~~ Actually, next time I have an issue like this I'll review this. However, the pr_err statements (added to the spm_dev_probe function in spm.c) do show up for me... only if i include "qcom,apq8064-saw2-v1.1-cpu" in the dts saw config. It seems like the fdt (dtb from dts?) file is parsed first and checked for what the kernel should do when booting. I'm guessing that if it finds something in fdt it should check regarding spm, then it will probe spm otherwise it skips the driver completely.

I suspect the "deferred probe" (added by 0070-qcom-spm-fix-probe-order.patch) does not happen for R7800 and r7500v2 - I recall this is for a different device but I can't find the link atm. EDIT and should issues around the deferred probe come up, I'll review this and the link above as well.

Ansuel · July 19, 2019, 12:31pm

Some times ago I also tried to use apq saw driver in DTS but my router bootloped... Do you what could be the problem? Missing reg?

anon98444528 · July 19, 2019, 12:33pm

idk we'd need to reproduce it now.

I never tried without the status = "okay" (or "disable") line under idle states. Also "some time ago" could be kernel 4.4, 4.9, 4.14... there have been a few changes to the drivers over that time frame. Perhaps you did it after seeing this (I think it refers using an apq setting under scm firmware dts node rather than saw nodes BTW)?

anon98444528 · July 20, 2019, 12:06am

@ansuel, regarding thermal sensors:

see EDITs: below

in k414, qcom-ipq8064.dtsi, under the label: node "tsens: thermal-sensor@900000" there's

interrupts = <0 178 0>;

in k419 its:

interrupts = <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>;

I found:

linux-4.19.57/include/dt-bindings/interrupt-controller/arm-gic.h:#define GIC_SPI 0

but

linux-4.19.57/include/dt-bindings/interrupt-controller/irq.h:#define IRQ_TYPE_LEVEL_HIGH 4

have you tried:
interrupts = <0 178 0>;
on k419 yet?

~~Also, it looks like irq.h is not in the dtsi include files at the top... so may try that next...~~ nvmd, its included in arm-gic.h...

EDIT 0: I got a chance to try it:
dmesg before:

root@OpenWrt:/# dmesg | grep tsens
[    2.384480] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x0090367f]
[    2.386855] qcom-tsens 900000.thermal-sensor: tsens init failed

dmesg after:

root@OpenWrt:/# dmesg | grep tsens
[    3.104406] [<c05bcee8>] (platform_get_irq) from [<c0667210>] (tsens_probe+0x120/0x2b0)
[    3.112210] [<c0667210>] (tsens_probe) from [<c05bcae4>] (platform_drv_probe+0x3c/0x84)
[    3.229016] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x0090367f]
[    3.233830] qcom-tsens 900000.thermal-sensor: tsens init failed

so a change..., but need to look into the resource error

EDIT 1: there is a possible solution to the resource error here (by making tsens a sub-node of the gcc node just above) which makes some sense given this. I'll try it when I have a chance.

I see that IRQ_TYPE_LEVEL_HIGH (4) is used in other places in the dtsi file where in k414 it was 0.

EDIT 2: making tsens a sub node of gcc in the style of the link above will remove the error but no thermal sensors (I tried a few variants here). If I return the tsens back the way it was but change the reg property under tsens to

reg = <0x900000 0x4000>;

I get:

root@OpenWrt:/# dmesg | grep tsens
[    2.490618] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x00903fff]

so it looks like the driver is exhausitng its memory allocation.

Looking at the k414 patches (also in k419 but I have not checked for diffs yet), there have been a few changes since the linked thread above about getting the thermal sensors working on ipq806x. This post links back to the caf site that the current patches seem to have been pulled from.

After reading about dts interrupts, I suspect the interrupt change from 0 to 4 is ok (and either likely will work).

At this point I have some clues to work on and will go back to thinking about it.

anon98444528 · July 22, 2019, 12:57pm

Thermal sensors update:

I found a way to get the thermal sensors working without changing patches. ~~, but the specific method below is not a solution as it kills my wifi (I'm using ath10k-ct with the htt firmware).~~ After repeat test, wifi is working. Test with caution

edit the thermal-sensor node in qcom-ipq8064.dtsi to be:

                tsens: thermal-sensor@904000 {                                  
                        compatible = "qcom,ipq8064-tsens";                      
                        reg = <0x904000 0x3680>;                                
                        nvmem-cells = <&tsens_calib>, <&tsens_backup>;          
                        nvmem-cell-names = "calib", "calib_backup";             
                        interrupts = <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>;         
                        #thermal-sensor-cells = <1>;                            
                };

i.e. I'm hoping that just because the thermal sensors can share the gcc memory space (with an appropriate dts/kernel driver), does not mean they have to when using the exiting kernel driver patches.

Now should I find an unused memory region for tsens to occupy? Or should I work harder at figuring out how to define a shared memory region for gcc and tsens?

It be good if actually understood what the problem is or how memory is mapped...

anon50098793 · July 22, 2019, 1:57pm

spent a few hours on this yesterday;

some highlights ( just areas of interest - does not work but you can see where 14 and 19 seem to be muddling with the initialisation )

+++ b/drivers/thermal/qcom/tsens-ipq8064.c
@@ -263,7 +263,7 @@ static void tsens_scheduler_fn(struct wo
 			schedule_work(&tmdev->sensor[0].notify_work);
 			regmap_read(tmdev->map, sensor_addr, &adc_code);
 			pr_debug("Trigger (%d degrees) for sensor %d\n",
-				code_to_degC(adc_code, &tmdev->sensor[0]), 0);
+				code_to_degC(code, &tmdev->sensor[0]), 0);
 		}
 	}
 	regmap_write(tmdev->map, STATUS_CNTL_8064, reg & mask);
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -146,6 +146,7 @@ static int tsens_probe(struct platform_d
 	int ret, i;
 	struct device *dev;
 	struct device_node *np;
+	struct tsens_sensor *s;
 	struct tsens_device *tmdev;
 	const struct tsens_data *data;
 	const struct of_device_id *id;
@@ -174,9 +175,12 @@ static int tsens_probe(struct platform_d
 		return -EINVAL;
 	}
 
-	tmdev = devm_kzalloc(dev,
-			     struct_size(tmdev, sensor, num_sensors),
-			     GFP_KERNEL);
+	/* tmdev = devm_kzalloc(dev, */
+		     /* struct_size(tmdev, sensor, num_sensors), */
+	/*		     GFP_KERNEL); */
+	tmdev = devm_kzalloc(dev, sizeof(*tmdev) +	
+		data->num_sensors * sizeof(*s), GFP_KERNEL);	
+
 	if (!tmdev)
 		return -ENOMEM;
 
--- a/drivers/thermal/qcom/tsens.h
+++ b/drivers/thermal/qcom/tsens.h
@@ -87,8 +87,10 @@ struct tsens_device {
 	u32				num_sensors;
 	u32				tsens_irq;
 	struct regmap			*map;
+	struct regmap_field		*status_field;
 	u32				tm_offset;
 	struct tsens_context		ctx;
+ 	bool				trdy;
 	const struct tsens_ops		*ops;
 	struct work_struct		tsens_work;
 	struct tsens_sensor		sensor[0];
@@ -100,8 +102,8 @@ int init_common(struct tsens_device *);
 int get_temp_common(struct tsens_device *, int, int *);
 
 /* TSENS v1 targets */
-extern const struct tsens_data data_8916, data_8974, data_8960, data_ipq8064;
+extern const struct tsens_data data_8916, data_8974, data_8960, data_8996, data_ipq8064;
 /* TSENS v2 targets */
-extern const struct tsens_data data_8996, data_tsens_v2;
+extern const struct tsens_data data_tsens_v2;
 
 #endif /* __QCOM_TSENS_H__ */

So between the struct change, and when it is probed....something is initialised incorrectly?....possibly add some prints around the functions mentioned....

Might be worth taking out the v2 references for fun and see what happens....

Of note or not... if you change the tsens compatible to qcom,tsens-v2 ... you get a "invalid number of sensors" probably not relevant but something different

Anyways, your findings kinda debunk my theories from yesterday.... unless they contribute to allocating more memory somehow?

anon98444528 · July 22, 2019, 2:58pm

the error says it can't reserve the region, but it does not say why. This is what I don't understand.

Is the root cause in: the gcc driver (needs more memory than before)?, the thermal sensor driver? (seems unlikely now as I can make it work outside of shared memory with gcc), or something different - like the rules changed and now one must explicitly define a shared memory region in the dts for gcc and tsens

reserved-memory and no-map look interesting. Just adding no-map under gcc or tsens or both has no effect. I made several attempts to split the 0x900000 to 0x904000 memory region between gcc and tsens but router won't even boot.

Lastly,

the way it killed my wifi is odd... I

built and tested an initramfs with dts mod above - thermal sensors worked - yea!
flashed the sysupgrade image to nvram that was built when building the initramfs image, kept settings, wifi working ok here, added a bunch o packages via opkg (still no issues), power cycled the router (as is my norm after nvram flash and updates), a kernel oops/reboot on boot after power on, and then successful watchdog reboot but no wifi but thermal sensors still working. (Reboot/power cycling did not fix it).
thinking latest opkg additions might be responsible, I rebuilt (kernel only) again without tsens dts mod above, sysupgrade flashed to nvram, opkg updated, power cycle and wifi working...

I'm still sifting through this mess (via screen logs) so I wouldn't read too much into it yet.

EDIT: I could not reproduce the sequence above so I suspect I had a bad sysupgrade flash - perhaps due to flashing a sysupgrade image to nvram that was built from make target/linux/{compile,install} V=s

Ansuel · July 22, 2019, 4:20pm

If the struct changed it could be that we need to enlarge the reserved space. Coukd be that the driver load the entire struct in memory and as they changed size they clash or they goes out if memory (or the driver try to read/write in non valid memory)

I would randomly increase the reserved space and see if something change.