Supporting thermal sensors on ipq806x

There's a strange message in dmesg saying that tsens is not calibrated; it seems to need some tuning.

I get the same message, I guess:

[    1.990990] qcom-tsens 900000.clock-controller:thermal-sensor@900000: tsens calibration failed

The core functionality seems to work, though.

I tested by launching two simultaneous "openssl benchmark" processes to make sure that both cores were fully utilised. CPU utilisation jumped to 100%, temps ramped up rather nicely, and the LuCI statistics picked that up.
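For anyone who wants to repeat that kind of test, here is a rough sketch. It assumes the benchmark meant above is `openssl speed`; the function name `burn_and_watch` and the `LOAD_CMD` / `THERMAL_DIR` variables are made up for this example and are not part of any package.

```sh
#!/bin/sh
# Sketch: start one load generator per core, sample the thermal zones while
# they run, then stop the load again.
# LOAD_CMD and THERMAL_DIR are hypothetical knobs so the sketch can be tried
# without openssl or real sensors.
LOAD_CMD="${LOAD_CMD:-openssl speed}"
THERMAL_DIR="${THERMAL_DIR:-/sys/class/thermal}"

burn_and_watch() {
    workers="${1:-2}"    # two cores on ipq806x
    samples="${2:-5}"    # how many temperature readings to take
    interval="${3:-2}"   # seconds between readings
    pids=""
    n=0
    while [ "$n" -lt "$workers" ]; do
        # Intentionally unquoted so that "openssl speed" splits into words.
        $LOAD_CMD >/dev/null 2>&1 &
        pids="$pids $!"
        n=$((n + 1))
    done
    s=0
    while [ "$s" -lt "$samples" ]; do
        # One line per sample: raw readings of all zones (millidegrees Celsius).
        echo "sample $s: $(cat "$THERMAL_DIR"/thermal_zone*/temp 2>/dev/null | tr '\n' ' ')"
        sleep "$interval"
        s=$((s + 1))
    done
    # Stop the load generators and reap them.
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null
    return 0
}
```

Calling `burn_and_watch 2 10 3` would start two workers and print ten samples three seconds apart.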

So I would say that it works to a large extent.

If I got it right, this is the first/original version of the upstream driver from 2015, which went through several adjustments before being accepted into upstream Linux in 2016.

May I ask for your help here... The R7800 is new to me and I have not bricked it so far. Lucky me.

But what is the best way to recover a non-booting R7800 after a "bad flash"?
Is there a TFTP recovery mode that can be triggered via a button during boot (like in some older Netgear routers), or what do you do?

Yep, it's the first version of the driver that went upstream.
Recovery is not a big deal. You should:

  1. Turn off the power, push and hold the reset button with a pin
  2. Turn on the power and wait until the power LED starts flashing white.
  3. Release the pin and tftp the img in binary mode
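
Step 3 could look roughly like this from a Linux PC. This is only a sketch: it assumes the tftp-hpa client (other clients take different options), a PC with a static address in the router's subnet, and 192.168.1.1 as the recovery address, none of which is stated above. `tftp_flash` and the `TFTP` override are names invented for this example.

```sh
#!/bin/sh
# Hypothetical helper: upload a firmware image to a router that is waiting
# in TFTP recovery mode (power LED flashing white).
# Assumes the tftp-hpa client; set TFTP to something else for a dry run.
tftp_flash() {
    image="$1"
    host="${2:-192.168.1.1}"   # assumed recovery address, adjust if needed
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    # "-m binary" selects binary (octet) transfer mode,
    # "-c put FILE" uploads the file and exits.
    ${TFTP:-tftp} -m binary "$host" -c put "$image"
}
```

Usage would be something like `tftp_flash firmware.img`, where the file name is a placeholder for the image you want to flash.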

I suspect that we should step through each iteration of the driver and check at which one it stops working. If they all work with the same problems, then I guess it's better to stick to the full-Celsius version with your patch, as it seems to be the only bug-free one.

I am not quite sure about it being completely bug-free. With my patch the "trip_point" temp values were crazy, e.g. -75000. I am not sure whether that anomaly was due to my patch, or whether the values were already botched when the original patch converted the millicelsius logic to Celsius. (But I don't think I checked the values at sensors 7-10 that your new version uses. So it is quite possible that some of the other registers are uninitialised, but those four might have sensible values.)

Your commit 65fb10e0 from yesterday works for me and the "trip_point" temp values are sensible 75000 and 95000 like my example above shows.

I did not quite understand what went wrong for you later.

Just for reference, this is the data after applying my millicelsius patch to the PR's full-Celsius driver.

As you can see (below), the temps themselves look OK, but the trip_point temp values are crazy: critical at 133 °C, configurable_hi at 113 °C, etc. I think that something goes wrong there. I am not sure if those trip_point temps have any value for us, but in any case that looks strange.

root@lede:/sys/class/thermal# cat thermal_zone*/temp
55000
54000
55000
53000
55000
53000
56000
54000
55000
52000
56000
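
A small helper makes a listing like this easier to read. Keep in mind that the shell expands `thermal_zone*` in lexical order, so the third line of such output belongs to thermal_zone10, not thermal_zone2. The sketch below prints the zone name next to each value and converts millidegrees to degrees; `print_zone_temps` is just an illustrative name.

```sh
#!/bin/sh
# Sketch: print every thermal zone with its temperature in degrees Celsius.
# sysfs reports millidegrees Celsius, so 55000 means 55.0 C.
print_zone_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -d "$zone" ] || continue
        name="${zone##*/}"
        if raw=$(cat "$zone/temp" 2>/dev/null) && [ -n "$raw" ]; then
            sign=""
            if [ "$raw" -lt 0 ]; then
                sign="-"
                raw=$((0 - raw))
            fi
            # Whole degrees plus one decimal digit (truncated, not rounded).
            printf '%s: %s%d.%d C\n' "$name" "$sign" $((raw / 1000)) $((raw % 1000 / 100))
        else
            # Reads can fail, e.g. with "Invalid argument" or "I/O error".
            printf '%s: unreadable\n' "$name"
        fi
    done
}
```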

Full data for thermal_zone9:

root@lede:/sys/class/thermal# ls thermal_zone9/*
thermal_zone9/available_policies  thermal_zone9/mode                thermal_zone9/temp                thermal_zone9/trip_point_2_type
thermal_zone9/integral_cutoff     thermal_zone9/offset              thermal_zone9/trip_point_0_temp   thermal_zone9/trip_point_3_temp
thermal_zone9/k_d                 thermal_zone9/passive             thermal_zone9/trip_point_0_type   thermal_zone9/trip_point_3_type
thermal_zone9/k_i                 thermal_zone9/policy              thermal_zone9/trip_point_1_temp   thermal_zone9/type
thermal_zone9/k_po                thermal_zone9/slope               thermal_zone9/trip_point_1_type   thermal_zone9/uevent
thermal_zone9/k_pu                thermal_zone9/sustainable_power   thermal_zone9/trip_point_2_temp

thermal_zone9/subsystem:
thermal_zone0   thermal_zone10  thermal_zone3   thermal_zone5   thermal_zone7   thermal_zone9
thermal_zone1   thermal_zone2   thermal_zone4   thermal_zone6   thermal_zone8

root@lede:/sys/class/thermal# cat thermal_zone9/*
step_wise
cat: read error: I/O error
cat: read error: I/O error
cat: read error: I/O error
cat: read error: I/O error
cat: read error: I/O error
enabled
cat: read error: I/O error
0
step_wise
cat: read error: I/O error
cat: read error: Is a directory
cat: read error: I/O error
56000
133000
critical
113000
configurable_hi
8000
configurable_low
-88000
critical_low
tsens_tz_sensor9
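
Since `cat thermal_zone9/*` mixes trip points with unrelated files and read errors, here is a sketch that prints only the trip points, pairing each type with its temperature. It relies on the `trip_point_N_temp` / `trip_point_N_type` file names shown in the listing above; the function name `show_trip_points` is made up.

```sh
#!/bin/sh
# Sketch: list the trip points of one thermal zone as "index type temp".
# Temperatures are printed as reported by sysfs (millidegrees Celsius).
show_trip_points() {
    zone="${1:-/sys/class/thermal/thermal_zone9}"
    i=0
    # Walk trip_point_0, trip_point_1, ... until the next file is missing.
    while [ -r "$zone/trip_point_${i}_temp" ]; do
        temp=$(cat "$zone/trip_point_${i}_temp" 2>/dev/null)
        kind=$(cat "$zone/trip_point_${i}_type" 2>/dev/null)
        printf 'trip_point_%d %s %s\n' "$i" "${kind:-unknown}" "${temp:-unreadable}"
        i=$((i + 1))
    done
}
```

Run against the zone above, this would be expected to list the four trip points (critical, configurable_hi, configurable_low, critical_low), one per line.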

The calibration failure shouldn't be ignored, as it may lead to inconsistent values after all.
As for the trip_points, that seems to be expected behaviour in the full-Celsius patch, because it doesn't set them at all; it's pure temp reading and nothing more.

What troubles me in the last patch is that it selects sensors 7, 8, 9 and 10, which represent cores 1-4 in the apq8064 SoC, but for ipq806x it should be only sensors 0, 1 and 2: 0 and 1 for the two cores, and 2 seems to be general. But you can't select 0, 1 and 2, because then it stops booting.

I'll try to port next iterations of the driver and see how it goes.

If you want, you can test a new one: it's v3 of the upstream driver and seems to be in line with kernel 4.4.

Does not work.

root@lede:/sys/class/thermal# cat thermal_zone*/temp
cat: read error: Invalid argument
cat: read error: Invalid argument
cat: read error: Invalid argument
cat: read error: Invalid argument


I might have found the missing part: I updated the patch for the GCC controller (patch 309) to match the corresponding patch for apq8064.

I'll test it in a couple of hours.

I compiled your "test V3" 88e0baa9 and it seems to work:

root@lede:/sys/class/thermal# cat thermal_zone*/temp
57577
52640
57577
54904

But I get the same warning in dmesg:

 gcc-ipq806x 900000.clock-controller: tsens calibration failed

In any case, all four thermal zones report temps and temps change quickly when burning CPU with openssl benchmark.

I am not sure if that warning is too serious. I think that the change in temps under load vs. idle is the important thing for users. It seems quite reasonable to me that the cores run at ~55-60 °C in normal conditions and then ramp up to ~70 °C under heavy CPU load. Similarly, network traffic causes a mild temp rise.

So I wouldn't worry too much at this point about the calibration failure.

To get this properly fixed, we probably need more data about the sensors in ipq8065. Hopefully qcom will upstream the correct dts at some point.

Woohoo! It means that the upstream patch will work as well! That was my main goal :)

So I'll try the upstream patch next for easier management, and then we'll focus on solving the calibration issue :)
Thank you very much, mate, it would have been a pain without your assistance :)

It seems qcom won't, since they haven't sent the gcc patch upstream as they did for apq8064, so we had to figure it out ourselves.

I compiled and tested your newest, 4a9cd7a, but that bricked the device. The R7800 got into a reboot loop, so I used TFTP to recover. Thanks for your advice regarding that. It worked like a charm, just like it used to on my old WNDR3700.

I think that the backport from 4.9 contains too much additional non-qcom stuff, which may cause breakage.

Yeah, I've encountered the same :) Don't test it for now. I think it's better for you to stick to the V3 version while I keep tuning things to figure out what's OK and what's not. I'll let you know when it's safe to pull.

Just mentioning here that a stable solution was found, and the backported upstream driver now works nicely on the IPQ8065-based R7800.

@dissent1 has updated https://github.com/lede-project/source/pull/533 in case somebody wants to try it before the PR gets accepted.

Have you tried checking wireless temp?

cat /sys/class/ieee80211/phy*/device/hwmon/hwmon2/temp1_input
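
The hwmon index (hwmon2 here) can differ between devices and boots, so a glob over `hwmon*` may be more robust than a fixed path. A minimal sketch, with `print_wifi_temps` as a made-up name; by hwmon convention `temp1_input` is in millidegrees Celsius:

```sh
#!/bin/sh
# Sketch: print the temperature of every wireless phy that exposes a hwmon node.
print_wifi_temps() {
    base="${1:-/sys/class/ieee80211}"
    found=0
    for f in "$base"/phy*/device/hwmon/hwmon*/temp1_input; do
        # If the glob matches nothing it stays literal and is skipped here.
        [ -r "$f" ] || continue
        found=1
        phy="${f#"$base"/}"   # strip the base directory
        phy="${phy%%/*}"      # keep only the phyN component
        echo "$phy: $(cat "$f" 2>/dev/null)"
    done
    [ "$found" -eq 1 ] || echo "no hwmon temperature node found under $base"
}
```

If the driver was built without thermal support, this prints the "no hwmon temperature node found" line instead of a reading.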

I see no hwmon in /sys/class/ieee80211/phy1/device/

Okay, it's missing from the mac80211 backports Makefile; we need to define an additional package.
ath10k_core-$(CPTCFG_ATH10K_THERMAL) += thermal.o

I've filed a bug report: https://bugs.lede-project.org/index.php?do=details&task_id=311