CPU frequency scaling driver for mvebu (WRT3200ACM etc.)


#21

@hnyman: I've just upgraded to the latest LEDE build and added the new mwlwifi driver.
The only thing that doesn't seem to work anymore is collectd: no stats are collected, and the graphs stay empty.
Could you by any chance check on your device whether it is me or a general bug in LEDE?

edit: nevermind, my error.
The system is running really nicely, I must say. Still no issues at all with the frequency scaling driver.


#22

Sooo this will be implemented?


#23

Hi, tried the patch on a WRT1200. Confirmed working, but the temperature stays the same even though the CPU runs at half speed.

I also tried reverting the CPU idle disabling patch, but the temperature RISES by 10 degrees, which is very strange.


#24

I have been waiting to see if the wrt1900ac v1 crash problems with kernel 4.9 can be solved first. I think that we should not add new CPU related complexity as long as there are visible problems.

And I am not quite sure if this brings actual benefits. The router temps stay at almost the same level.

So far nobody has reported about real negative effects with this, so the patch looks good in that sense.

I have the gut feeling that the mvebu power management is somewhat half-baked at the moment. I have not looked into the cpuidle things that @onja mentioned.

Ps.
My own main interest is more on the IPQ806x side with R7800. This mvebu thing is more like a side show, thanks to the Linksys giveaway :wink:


#25

Good to know there is a patch for CPU scaling, but the temperature reported for the CPU seems to be wrong (armada_thermal-virtual-0).

The other temps are correct (tmp421-i2c-0-4c temp1 and temp2).

How can the CPU possibly be at almost 60°C at cold boot? (The router remained powered off for about 9 hours, at a room temperature of about 23°C.)
The other temps at boot are correct (about 34°C).

After running the router for 3 hours, I measured its temperature with an infrared thermometer, and the maximum did not surpass 48°C. That is about the same as the tmp421-i2c-0-4c temp1 and temp2 sensors, not the CPU (at almost 80°C).

The DD-WRT firmware also shows the CPU temp, and there it is about 50°C, which seems right.

I have been running the router for 2 months; if the CPU temp were really always 80°C, it would have fried by now...


#26

the SoC is specified to run at up to 105 degrees C...


#27

IIRC, on the mamba, the temperatures (°C) that engage the fan at low speed are:

CPU  80
DDR  60
WIFI 100

and add 5°C to each to turn the fan to high.
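Those thresholds can be sketched as a tiny decision function (the numbers are the ones quoted above; sensor plumbing and the actual mamba fan driver are left out, so treat this purely as an illustration):

```shell
#!/bin/sh
# Map CPU/DDR/WiFi temperatures (°C) to a fan state using the quoted
# mamba thresholds: low at 80/60/100, high at +5 on each.
fan_speed() {
  cpu=$1; ddr=$2; wifi=$3
  if [ "$cpu" -ge 85 ] || [ "$ddr" -ge 65 ] || [ "$wifi" -ge 105 ]; then
    echo high
  elif [ "$cpu" -ge 80 ] || [ "$ddr" -ge 60 ] || [ "$wifi" -ge 100 ]; then
    echo low
  else
    echo off
  fi
}

fan_speed 82 55 90   # prints low
```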


#28

I use cat /sys/class/thermal/thermal_zone*/temp (note: the glob must be unquoted so the shell expands it) and it shows 72-82°C (idle/loaded) on the WRT1200, matching both the LEDE and DD-WRT GUIs.

This seems legit for a non-idling CPU. I'd still like to lower it, though, despite the 105-degree limit.
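For reference, those sysfs files report millidegrees Celsius (the standard Linux thermal sysfs interface), so a small helper makes the raw readings readable:

```shell
#!/bin/sh
# thermal_zone*/temp reports millidegrees Celsius; convert to plain °C.
mdeg_to_c() { awk -v t="$1" 'BEGIN { printf "%.1f\n", t / 1000 }'; }

# Print every available thermal zone (skips silently if none exist).
for z in /sys/class/thermal/thermal_zone*/temp; do
  [ -r "$z" ] || continue
  echo "$z: $(mdeg_to_c "$(cat "$z")") C"
done

mdeg_to_c 72000   # prints 72.0
```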


#29

Any news about the commit to LEDE ?
Will it be on 4.4, 4.9...?


#30

Nobody can answer?


#31

Answers to your questions are mostly in the thread. Just read it in full.

I published the patch in a PR for 4.9, but I have also built it for 4.4 in the 17.01 branch.
Both builds are actually available from my download directory, link above.

If the PR gets accepted, it will be merged to master with 4.9, not to stable 17.01 with 4.4.

But I don't see any hurry to merge it in, as 4.9 still causes crashes for wrt1900ac v1. No need to add additional uncertainty.

Additionally, the benefits from frequency scaling seem rather small in mvebu compared to some other platforms. No real impact on device temps.


#32

Thanks for the abstract !


#33

thanks for spelling it out


#34

@hnyman
any chance you can update the patch for kernel version 4.14?


#35

I will look into it.
Mvebu was bumped to 4.14 yesterday, so I need the patch for my own master build.


#36

master-r6365-45fdb12258-20180303-cpufreq in the download dir now also contains the 4.14 patches, but the clock does not keep the right pace :frowning:

As support for 1866 MHz has been added in Linux upstream, I initially dropped the 805- patch.

I flashed my own router with r6365 and it seems to work okay otherwise, but the clock lags badly.

A major challenge with this frequency scaling patch is that it is based on work done by upstream developers in 2015, originally for 4.4 or earlier, I think, but then apparently abandoned for reasons that are not obvious. It is hard to say whether the processor family simply does not fully support the scaling after all, or something similar, so I have not created a proper pull request for this.

EDIT:
I added back the 805- patch to see if that would help with the real-time clock, but no. The clock still lags badly.
https://github.com/hnyman/openwrt/commit/fba5acfdc8ad2225627ae88bc8adbc5c1c035093.patch


#37

Great! Much appreciated

@hnyman not sure if you see the same, but after a day of running I see that it messes up the clock: it falls hours behind within a matter of hours. Reverting to a build without the frequency scaling now.


#38

You are right. I can see the same.
Might be due to dropping the 805- patch, as it also set some supporting values alongside the basic 1866 MHz support. I will test with a modified version of that patch.

EDIT:
I tested compiling with the 805- patch back in, but that does not help.
Apparently something has changed since kernel 4.9 :frowning:


#39

No worries, thanks for trying.


#40

I also compiled a 4.14 version without CPU frequency scaling, and that naturally works ok.

With the frequency scaling patch, the router's clock seems to lag by some 40-50%, which is a lot and clearly indicates that something is calculated wrongly (and/or does not get adjusted along with the CPU frequency, so the real-time clock ticks at the half-pace 933 MHz while being read as if it were running at the full 1866 MHz).
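The observed magnitude fits that suspicion: if a 933 MHz timer is converted to wall-clock time using an assumed 1866 MHz rate, elapsed time is under-counted by exactly half. A back-of-the-envelope check:

```shell
# Sketch: a timer ticking at 933 MHz but converted to time with an
# assumed 1866 MHz rate under-counts elapsed time by half.
real_mhz=933
assumed_mhz=1866
lag_pct=$(( (assumed_mhz - real_mhz) * 100 / assumed_mhz ))
echo "expected clock lag: ${lag_pct}%"   # expected clock lag: 50%
```

A 50% theoretical lag is close to the 40-50% seen in practice, so a frequency mix-up of this kind is a plausible culprit.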

Comparing the kernel bootlogs of 4.9 and 4.14 reveals that something is different in the clock handling. Apparently some things have moved from 32-bit to 64-bit, and the clock resolution changed from 40 ns to 1 ns?

kernel 4.9:

[    0.000000] Switching to timer-based delay loop, resolution 40ns
[    0.000003] sched_clock: 32 bits at 25MHz, resolution 40ns, wraps every 85899345900ns
[    0.000008] clocksource: armada_370_xp_clocksource: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417870 ns
[    0.000113] Calibrating local timer... 933.19MHz.
[    0.060030] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.00 BogoMIPS (lpj=250000)

kernel 4.14:

[    0.000005] sched_clock: 64 bits at 933MHz, resolution 1ns, wraps every 4398046511103ns
[    0.000015] clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x1ae5b571769, max_idle_ns: 881590513431 ns
[    0.000026] Switching to timer-based delay loop, resolution 1ns
[    0.000125] Ignoring duplicate/late registration of read_current_timer delay
[    0.000131] clocksource: armada_370_xp_clocksource: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417870 ns
[    0.000245] Calibrating delay loop (skipped), value calculated using timer frequency.. 1866.00 BogoMIPS (lpj=9330000)

Not sure how that affects the CPU frequency scaling, but it is likely part of the challenge. It also highlights why it would be great to have the frequency scaling functionality upstreamed in Linux itself.
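As a cross-check, the BogoMIPS figures in both bootlogs above are internally consistent with the printed lpj values, assuming the usual HZ=100 tick rate (the kernel reports lpj/(500000/HZ) as BogoMIPS):

```shell
# BogoMIPS = lpj * HZ / 500000; HZ=100 matches both bootlogs above.
bogomips() { awk -v lpj="$1" -v hz=100 'BEGIN { printf "%.2f\n", lpj * hz / 500000 }'; }

bogomips 250000    # 4.9:  50.00   (25 MHz timer-based delay loop)
bogomips 9330000   # 4.14: 1866.00 (933 MHz arm_global_timer)
```

This only confirms the logs are self-consistent; it does not pinpoint where the scaling code itself miscounts.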

It is also possible that something relevant has changed in the kernel config and/or devicetree files.

I have no great motivation to look into fixing this right now, as I am not really a clock specialist (and mvebu is not my main router). I will leave the patches there in case somebody wants to experiment further.