R7500v2 kernel 4.19 test

@ansuel, regarding thermal sensors:

see EDITs: below

in k414, qcom-ipq8064.dtsi, under the label: node "tsens: thermal-sensor@900000" there's

interrupts = <0 178 0>;

in k419 its:

interrupts = <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>;

I found:

linux-4.19.57/include/dt-bindings/interrupt-controller/arm-gic.h:#define GIC_SPI 0

but

linux-4.19.57/include/dt-bindings/interrupt-controller/irq.h:#define IRQ_TYPE_LEVEL_HIGH 4

have you tried:
interrupts = <0 178 0>;
on k419 yet?

Also, it looks like irq.h is not in the dtsi include files at the top... so may try that next... nvmd, its included in arm-gic.h...

EDIT 0: I got a chance to try it:
dmesg before:

root@OpenWrt:/# dmesg | grep tsens
[    2.384480] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x0090367f]
[    2.386855] qcom-tsens 900000.thermal-sensor: tsens init failed

dmesg after:

root@OpenWrt:/# dmesg | grep tsens
[    3.104406] [<c05bcee8>] (platform_get_irq) from [<c0667210>] (tsens_probe+0x120/0x2b0)
[    3.112210] [<c0667210>] (tsens_probe) from [<c05bcae4>] (platform_drv_probe+0x3c/0x84)
[    3.229016] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x0090367f]
[    3.233830] qcom-tsens 900000.thermal-sensor: tsens init failed

so a change..., but need to look into the resource error

EDIT 1: there is a possible solution to the resource error here (by making tsens a sub-node of the gcc node just above) which makes some sense given this. I'll try it when I have a chance.

I see that IRQ_TYPE_LEVEL_HIGH (4) is used in other places in the dtsi file where in k414 it was 0.

EDIT 2: making tsens a sub node of gcc in the style of the link above will remove the error but no thermal sensors (I tried a few variants here). If I return the tsens back the way it was but change the reg property under tsens to

reg = <0x900000 0x4000>;

I get:

root@OpenWrt:/# dmesg | grep tsens
[    2.490618] qcom-tsens 900000.thermal-sensor: can't request region for resource [mem 0x00900000-0x00903fff]

so it looks like the driver is exhausitng its memory allocation.

Looking at the k414 patches (also in k419 but I have not checked for diffs yet), there have been a few changes since the linked thread above about getting the thermal sensors working on ipq806x. This post links back to the caf site that the current patches seem to have been pulled from.

After reading about dts interrupts, I suspect the interrupt change from 0 to 4 is ok (and either likely will work).

At this point I have some clues to work on and will go back to thinking about it.

1 Like

Thermal sensors update:

I found a way to get the thermal sensors working without changing patches. , but the specific method below is not a solution as it kills my wifi (I'm using ath10k-ct with the htt firmware). After repeat test, wifi is working. Test with caution

edit the thermal-sensor node in qcom-ipq8064.dtsi to be:

                tsens: thermal-sensor@904000 {                                  
                        compatible = "qcom,ipq8064-tsens";                      
                        reg = <0x904000 0x3680>;                                
                        nvmem-cells = <&tsens_calib>, <&tsens_backup>;          
                        nvmem-cell-names = "calib", "calib_backup";             
                        interrupts = <GIC_SPI 178 IRQ_TYPE_LEVEL_HIGH>;         
                        #thermal-sensor-cells = <1>;                            
                };                                                              

i.e. I'm hoping that just because the thermal sensors can share the gcc memory space (with an appropriate dts/kernel driver), does not mean they have to when using the exiting kernel driver patches.

Now should I find an unused memory region for tsens to occupy? Or should I work harder at figuring out how to define a shared memory region for gcc and tsens?

It be good if actually understood what the problem is or how memory is mapped...

spent a few hours on this yesterday;

some highlights ( just areas of interest - does not work but you can see where 14 and 19 seem to be muddling with the initialisation )

+++ b/drivers/thermal/qcom/tsens-ipq8064.c
@@ -263,7 +263,7 @@ static void tsens_scheduler_fn(struct wo
 			schedule_work(&tmdev->sensor[0].notify_work);
 			regmap_read(tmdev->map, sensor_addr, &adc_code);
 			pr_debug("Trigger (%d degrees) for sensor %d\n",
-				code_to_degC(adc_code, &tmdev->sensor[0]), 0);
+				code_to_degC(code, &tmdev->sensor[0]), 0);
 		}
 	}
 	regmap_write(tmdev->map, STATUS_CNTL_8064, reg & mask);
--- a/drivers/thermal/qcom/tsens.c
+++ b/drivers/thermal/qcom/tsens.c
@@ -146,6 +146,7 @@ static int tsens_probe(struct platform_d
 	int ret, i;
 	struct device *dev;
 	struct device_node *np;
+	struct tsens_sensor *s;
 	struct tsens_device *tmdev;
 	const struct tsens_data *data;
 	const struct of_device_id *id;
@@ -174,9 +175,12 @@ static int tsens_probe(struct platform_d
 		return -EINVAL;
 	}
 
-	tmdev = devm_kzalloc(dev,
-			     struct_size(tmdev, sensor, num_sensors),
-			     GFP_KERNEL);
+	/* tmdev = devm_kzalloc(dev, */
+		     /* struct_size(tmdev, sensor, num_sensors), */
+	/*		     GFP_KERNEL); */
+	tmdev = devm_kzalloc(dev, sizeof(*tmdev) +	
+		data->num_sensors * sizeof(*s), GFP_KERNEL);	
+
 	if (!tmdev)
 		return -ENOMEM;
 
--- a/drivers/thermal/qcom/tsens.h
+++ b/drivers/thermal/qcom/tsens.h
@@ -87,8 +87,10 @@ struct tsens_device {
 	u32				num_sensors;
 	u32				tsens_irq;
 	struct regmap			*map;
+	struct regmap_field		*status_field;
 	u32				tm_offset;
 	struct tsens_context		ctx;
+ 	bool				trdy;
 	const struct tsens_ops		*ops;
 	struct work_struct		tsens_work;
 	struct tsens_sensor		sensor[0];
@@ -100,8 +102,8 @@ int init_common(struct tsens_device *);
 int get_temp_common(struct tsens_device *, int, int *);
 
 /* TSENS v1 targets */
-extern const struct tsens_data data_8916, data_8974, data_8960, data_ipq8064;
+extern const struct tsens_data data_8916, data_8974, data_8960, data_8996, data_ipq8064;
 /* TSENS v2 targets */
-extern const struct tsens_data data_8996, data_tsens_v2;
+extern const struct tsens_data data_tsens_v2;
 
 #endif /* __QCOM_TSENS_H__ */

So between the struct change, and when it is probed....something is initialised incorrectly?....possibly add some prints around the functions mentioned....

Might be worth taking out the v2 references for fun and see what happens....

Of note or not... if you change the tsens compatible to qcom,tsens-v2 ... you get a "invalid number of sensors" probably not relevant but something different :wink:

Anyways, your findings kinda debunk my theories from yesterday.... unless they contribute to allocating more memory somehow?

the error says it can't reserve the region, but it does not say why. This is what I don't understand.

Is the root cause in: the gcc driver (needs more memory than before)?, the thermal sensor driver? (seems unlikely now as I can make it work outside of shared memory with gcc), or something different - like the rules changed and now one must explicitly define a shared memory region in the dts for gcc and tsens

reserved-memory and no-map look interesting. Just adding no-map under gcc or tsens or both has no effect. I made several attempts to split the 0x900000 to 0x904000 memory region between gcc and tsens but router won't even boot.

Lastly,

the way it killed my wifi is odd... I

  1. built and tested an initramfs with dts mod above - thermal sensors worked - yea!
  2. flashed the sysupgrade image to nvram that was built when building the initramfs image, kept settings, wifi working ok here, added a bunch o packages via opkg (still no issues), power cycled the router (as is my norm after nvram flash and updates), a kernel oops/reboot on boot after power on, and then successful watchdog reboot but no wifi but thermal sensors still working. (Reboot/power cycling did not fix it).
  3. thinking latest opkg additions might be responsible, I rebuilt (kernel only) again without tsens dts mod above, sysupgrade flashed to nvram, opkg updated, power cycle and wifi working...

I'm still sifting through this mess (via screen logs) so I wouldn't read too much into it yet.

EDIT: I could not reproduce the sequence above so I suspect I had a bad sysupgrade flash - perhaps due to flashing a sysupgrade image to nvram that was built from make target/linux/{compile,install} V=s

If the struct changed it could be that we need to enlarge the reserved space. Coukd be that the driver load the entire struct in memory and as they changed size they clash or they goes out if memory (or the driver try to read/write in non valid memory)

I would randomly increase the reserved space and see if something change.

@Ansuel @anon50098793,

I repeated the changes to the dts in this post above. This time no kernel oops on boot/reboot and no loss of wifi. This makes me doubt that I tried rebooting/powercycles after the wifi outage... but I'm pretty sure I did (bad nvram flash?).

In any event, it may not have been related and this might be worth testing (I'm using it now with functional wifi and thermal sensors). Upon nvram flash only part of my settings were retained (I lost some files in ~/ (easily restored). This type of update strangeness I have experienced before so it may not be related. Or the 2KB after 0x904000 is used for something and causing the nvram flashing inconsistencies.

If your brave, would you mind testing these dts changes as well?

For the record, I also tried several other variants none of which worked. If I expand the gcc mem range to 0x8000 and try several variants of placing the thermal sensor range in that range, I get the same error. This makes me think this is not a lack of memory issue but an inability to use the gcc range in 19 as was done in 14. The router will not boot if I make the gcc range size smaller - it wants to be 0x4000.

I made some attempts at using reserved-memory but the best result I got from this indicates I'll need to edit thermal sensor driver code to get further... and I don't know if it will work even if I do it correctly.

24 hours up on 4.19.57 with cpuidle, usb, and thermal sensors functioning.

No obvious errors or issues.

1 Like

5+ days up and no obvious errors or issues; however, at least on R7800 user can't boot after building an image based on my k419 git hub branch. I don't know the reason for this boot failure yet but I suspect the thermal sensor patch.

My best guess about why thermal sensors fails on 4.19 but works on 4.14 has to do with how the current thermal sensor driver requests memory. The thermal sensor driver (init_common in drivers/thermal/qcom/tsense_common.c) makes calls to devm_ioremap_resource (in lib/devres.c). Subsequent function calls from the devm_ioremap_resource function check if another driver has requested the memory region which of course the gcc driver has. This might have worked in 4.14 but was likely a bug that has been fixed in 4.19 (I've seen a reference to a similar type of bug).

If all this speculation is relavent, one solution might be keep the 4.14 device tree definition and change the devm_ioremap_resource function to devm_ioremap (also in /lib/devres.c - I'll need to work out how to do the function args for this) ref.

It will be 2-3 weeks before I'll have chance to try this... unless someone else wants to have a go in the interim.

1 Like

I wonder how the driver never used the memory of the other. Can't we just split it?

tried several variants splitting 0x900000 to 0x904000 between gcc and tsens - all would not boot. It seems gcc has a minimum size but I don't know what that is. I do know (0x4000-0x3680) = 0x980 is too small.

The "range" device tree property has functionality that seems relevant to allocating a memory range but I suspect that it won't work here; however, I have not tried it.

Several questions I don't have the answer for:

the kernel can address memory by logical, virtual, of physical addresses. Are the addresses in the device tree physical address?

Is some firmware code secretly using (physical) addresses 0x904000+?

Is there another memory location that can be safely utilized? (I tried having tsens use 0x42000000+0x3680 but that would not boot.)

If I do request more memory like I did from 0x904000 + 0x3680 do I have to adjust some other parameter in the device tree when I do that?

@anon50098793 has pointed out that the dev structures have changed (increased and need more memory?). Is there now more than one issue related to sharing the 0x900000+0x4000 memory region between gcc and tsens?

Sorry for all the questions, I don't expect you or others to answer them, I'm just admitting what I know that I don't know.

I've had 17+ days up (including the temperature sensor patch mentioned above); however, I've also had a few unexplained crashes (one where the 5 ghz radio quit working). I've not been paying close attention to why but I continue to suspect the thermal sensor patch.

I tried updating my source today and ran into multiple (likely unrelated to my patches) build failures. One failure was PF_RING kernal module - easy enough to de-select as I don't use this. However, ath10k-ct-htt complains about wanting to install the board.bin file but can't do due to conflicts from ath10k-ct (non htt).

In sort, it seems there is little or no ipq806x progress (especially related to kernal 4.19) so I've so set this aside for now and will use 4.14.

Thermal sensor update: in short, I found another way to get them working without any hint of strange boot or wireless issues.

Base on the speculation above, I made the following changes to init_common in drivers/thermal/qcom/tsense_common.c:

int __init init_common(struct tsens_device *tmdev)                              
{                                                                               
        resource_size_t size;                                                   
        void __iomem *base;                                                     
        struct resource *res;                                                   
        struct platform_device *op = of_find_device_by_node(tmdev->dev->of_node\
);                                                                              
                                                                                
        if (!op)                                                                
                return -EINVAL;                                                 
                                                                                
        /* The driver only uses the TM register address space for now */        
        if (op->num_resources > 1) {                                            
                tmdev->tm_offset = 0;                                           
        } else {                                                                
                /* old DTs where SROT and TM were in a contiguous 2K block */   
                tmdev->tm_offset = 0x1000;                                      
        }                                                                       
                                                                                
        res = platform_get_resource(op, IORESOURCE_MEM, 0);                     
        size = resource_size(res);                                              
        base = devm_ioremap(&op->dev, res->start, size);                        
        //base = devm_ioremap_resource(&op->dev, res);                          
        if (IS_ERR(base))                                                       
                return PTR_ERR(base);                                           
                                                                                
        tmdev->map = devm_regmap_init_mmio(tmdev->dev, base, &tsens_config);    
        if (IS_ERR(tmdev->map))                                                 
                return PTR_ERR(tmdev->map);                                     
                                                                                
        return 0;                                                               
}

and this works (no changes made to dts(i) files). No non-reproducable boot issues or wireless failures - it just works (so far). I'll test for a week or two and report back then.

I'll try to post a patch to my git tree, but I'd prefer if non r7500v2 users not use my branch to test this change for now. If you'd like to try it, obviously go right ahead, just make the change "by hand" yourself and please report back how it turned out.

1 Like

This week I will return from vacation so I can test cpuidle and tsense on r7800

Also I can't understand why ipq806x is still on 4.14 after all this time.

nice work!... will test when i get a chance but definitely looks like you nailed it in the guts... should be much easier to backpedal, to any struct issues etc. later.... now that the root is found.

EDIT: loaded... nothing in dmesg...

cat /sys/bus/platform/drivers/qcom-tsens/900000.thermal-sensor/subsystem/drivers_autoprobe
1
1 Like

as in the apq8064 definition the register that control the cpuidle are called "generic 806x" i wonder if it's better to also patch the driver and adds the compatible in the cpuidle driver

I mean it's strange to see apq8064 in ipq8064 dtsi.
I mean as we need to add a patch for this... patch also the driver while we are there.

I thought about that...

I could be wrong but the conclusion I came to after reading about dts etiquette and conventions (see here under " Understanding the compatible Property") is that it should be left as apq.

up 2+ days no issues

root@OpenWrt:~# uname -a
Linux OpenWrt 4.19.69 #0 SMP Tue Sep 3 00:16:19 2019 armv7l GNU/Linux
root@OpenWrt:~# uptime
 07:25:49 up 2 days, 10:19,  load average: 0.00, 0.04, 0.00
root@OpenWrt:~# cat /sys/devices/system/cpu/cpuidle/current_driver 
arm_idle
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name 
WFI
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state0/usage
18524385
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu1/cpuidle/state0/name
WFI
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu1/cpuidle/state0/usage
14295090
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/name 
spc
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu0/cpuidle/state1/usage
19018080
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu1/cpuidle/state1/name
spc
root@OpenWrt:~# cat /sys/devices/system/cpu/cpu1/cpuidle/state1/usage
7054843
root@OpenWrt:~# cat /sys/devices/virtual/thermal/thermal_zone0/temp
44112
root@OpenWrt:~# cat /sys/devices/virtual/thermal/thermal_zone9/temp
46786
root@OpenWrt:~# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

some update on r7800....

cpuidle brick the router (no panic but the bootlog just froze)

what cpu have r7500 ? (are you sure that the changes to dtsi are the only change?)

tsense no log in bootlog with your patch... any way to test if it does work ?

glad you tested it... but I'm under the impression @anon50098793 tested it above. Perhaps not, and the warnings above are valid.

r7500v2 is IPQ8064
r7800 is IPQ8065

I'm running an image built off the patch set here. So as far as cpuidle goes only change should be in dtsi (but my patch set makes other changes for usb).

regarding tsense, try:

cat /sys/bus/platform/drivers/qcom-tsens/900000.thermal-sensor/subsystem/drivers_autoprobe
cat /sys/devices/virtual/thermal/thermal_zone9/temp

EDIT: if it will help, I can send you a diffconfig, build log, and logs from my build scripts...

root@No-Lag-Router:~# cat /sys/bus/platform/drivers/qcom-tsens/900000.thermal-se
nsor/subsystem/drivers_autoprobe
1
root@No-Lag-Router:~# cat /sys/devices/virtual/thermal/thermal_zone9/temp
65970

will check if my dtsi have more options or not...