IEI PUZZLE M902 Fan speed is too low

PWM doesn't seem to work (the temperature is over 80 degrees Celsius).

Should this be fixed?

As an emergency workaround, I added the following to rc.local.

On 21.02:
echo "255" > /sys/class/hwmon/hwmon0/pwm1
echo "255" > /sys/class/hwmon/hwmon0/pwm2

On 22.03:
echo "255" > /sys/class/hwmon/hwmon6/pwm1
echo "255" > /sys/class/hwmon/hwmon6/pwm2
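Since the hwmon index differs between releases (hwmon0 on 21.02, hwmon6 on 22.03), it may be more robust to look the controller up by name instead of hard-coding the number. Below is a minimal sketch, assuming the controller's hwmon "name" file contains iei_wt61p803_puzzle (the prefix of the chip ID that sensors prints on this board). The function name find_hwmon and the HWMON_ROOT override are made up for illustration.

```shell
#!/bin/sh
# Print the hwmon directory whose "name" file matches the given chip name.
# HWMON_ROOT can be overridden to try the function against a fake tree.
find_hwmon() {
    want=$1
    root=${HWMON_ROOT:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        if [ "$(cat "$dir/name")" = "$want" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Possible use in rc.local (chip name is an assumption, see above):
#   dir=$(find_hwmon iei_wt61p803_puzzle) && echo 255 > "$dir/pwm1"
```

If the name does not match on your build, `cat /sys/class/hwmon/hwmon*/name` shows what is actually there.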

My PUZZLE-M902 fan is running at full speed, what should I do? The firmware is 22.03. Thanks.

Have you tried the solution above, with other values?

I did not try it and the device was returned.

Then why are you asking...? :thinking:

I like it very much, if I can solve it, I will buy it again.

I see, best of luck then, fingers X:ed.

I just configured mine and noticed my temps were crazy high. Thanks for this.

before
fan1: 1200 RPM

after
fan1: 5160 RPM

Hopefully I can find a happy medium where the fan has some longevity and the temps aren't too high.

I've spent two solid days with it, and plan to write up some takeaways on the device soon.

Cheers,
th0th

OK, I could not find any fan curve settings or software in the repository, so this "solution", and I use the term very loosely, is super kludgy, but it's simple and easy. After several minutes of grepping /sys/class/hwmon, changing the values for /sys/class/hwmon/hwmon6/pwm1 and comparing it with the output from sensors, I made a few observations.

  1. One of the sensors consistently runs hotter than the others (in my case it's the reading at /sys/class/hwmon/hwmon3/temp1_input), so I'll use this sensor for my fan setting. It reads around 56C when the fan is at full bore. At the same time, the two ISA sensors are at 39C and 41C.
  2. There is only one fan in my PUZZLE M902; changing /sys/class/hwmon/hwmon6/pwm2 has no effect, only pwm1 matters.
  3. I'm sure this is coincidence, but there seems to be a ~20x multiple between the pwm setting and the fan speed in RPM: pwm=1 -> fanspeed=0, 50 -> 900, 100 -> 1980, 150 -> 3000, 200 -> 4020, 255 -> 5160.
  4. Setting pwm1 to 0 stops the fan; I had thought maybe it would set it to auto. Does anyone know what the original settings were? I'm curious why my fan was originally running at 1200 RPM (and ~80C) and what the settings were.
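To put a number on the "~20x" observation in point 3, here is a small sketch that divides each measured RPM by its PWM value. The data points are the ones quoted above; the function name rpm_per_pwm is just for illustration.

```shell
#!/bin/sh
# Print RPM per PWM step for one observed (pwm, rpm) pair, to one decimal.
rpm_per_pwm() {
    awk -v pwm="$1" -v rpm="$2" 'BEGIN { printf "%.1f\n", rpm / pwm }'
}

# Data points quoted from the list above, as pwm:rpm pairs.
for pair in 50:900 100:1980 150:3000 200:4020 255:5160; do
    pwm=${pair%%:*}
    rpm=${pair##*:}
    echo "pwm=$pwm rpm=$rpm ratio=$(rpm_per_pwm "$pwm" "$rpm")"
done
```

The ratios come out between 18.0 and 20.2, so the relation is close to linear over this range but not exactly 20x.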

So I wrote a bash script that would look at the temp for hwmon3; if it was higher than 63C it would increase the PWM setting by 20, below 58C it would lower it by 20. I am running it via cron every 10 minutes.

#!/bin/bash
# Step the fan PWM up or down by 20 depending on the hwmon3 temperature.
pwmsetting=$(cat /sys/class/hwmon/hwmon6/pwm1)
temp=$(cat /sys/class/hwmon/hwmon3/temp1_input)   # millidegrees Celsius
tempC=$((temp / 1000))                            # whole degrees, for the log
fanspeed=$(cat /sys/class/hwmon/hwmon6/fan1_input)
now=$(date)
if [ "$temp" -gt 63000 ]
then
        if [ "$pwmsetting" -eq 255 ]
        then
                echo "$now: ALERT temp at $tempC C. and fanspeed at maximum." >> /var/log/fanscript.log
        elif [ "$pwmsetting" -gt 235 ]
        then
                echo "$now: ALERT temp at $tempC C., changing fanspeed to maximum -- pwm1=255." >> /var/log/fanscript.log
                echo 255 > /sys/class/hwmon/hwmon6/pwm1
        else
                newsetting=$((pwmsetting + 20))
                echo "$newsetting" > /sys/class/hwmon/hwmon6/pwm1
                echo "$now: temp at $tempC C. and fanspeed at $fanspeed. Raised pwm setting from $pwmsetting to $newsetting" >> /var/log/fanscript.log
        fi
elif [ "$temp" -lt 58000 ]
then
        newsetting=$((pwmsetting - 20))
        # pwm1 only accepts 0..255, so never write a negative value
        [ "$newsetting" -lt 0 ] && newsetting=0
        echo "$newsetting" > /sys/class/hwmon/hwmon6/pwm1
        echo "$now: temp at $tempC C. and fanspeed at $fanspeed. Lowered pwm setting from $pwmsetting to $newsetting" >> /var/log/fanscript.log
else
        echo "$now: temp at $tempC C. and fanspeed at $fanspeed. Leaving pwm setting unchanged at $pwmsetting" >> /var/log/fanscript.log
fi

I understand that this is not very elegant, so hopefully someone will come along with a real solution. In the meantime I guess I'll use this and check the logs to do some fine-tuning. Maybe I can add email alerts if the fan is at max and the device is still running too hot, etc.
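For fine-tuning, the step logic of the script above can be pulled into one small function that only computes the next PWM value, which makes it easy to try out by hand. This is a sketch, not a replacement. The 63C/58C thresholds and the step of 20 are from the post, while MIN_PWM=60 is a made-up floor so the fan never stops (pwm1=0 stops it, as noted above), and the name next_pwm is hypothetical.

```shell
#!/bin/sh
# Compute the next PWM value from a temperature in millidegrees Celsius.
HIGH_MC=63000   # raise the PWM above this temperature
LOW_MC=58000    # lower the PWM below this temperature
STEP=20
MIN_PWM=60      # assumed floor, pick whatever keeps your fan spinning
MAX_PWM=255

next_pwm() {
    temp_mc=$1
    pwm=$2
    if [ "$temp_mc" -gt "$HIGH_MC" ]; then
        pwm=$((pwm + STEP))
    elif [ "$temp_mc" -lt "$LOW_MC" ]; then
        pwm=$((pwm - STEP))
    fi
    # clamp the result to the allowed range
    [ "$pwm" -gt "$MAX_PWM" ] && pwm=$MAX_PWM
    [ "$pwm" -lt "$MIN_PWM" ] && pwm=$MIN_PWM
    echo "$pwm"
}
```

For example, `next_pwm 64000 250` prints 255 and `next_pwm 50000 70` prints 60; between the two thresholds the value is returned unchanged.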

Theoretically, these tasks should be handled by the device tree,
but I don't know why that didn't work.

You can optimize this shell script:

#!/bin/sh

# OpenWRT fan control

# SLEEP_DURATION and CPU_TEMP_CHECK need to be multiples of each other
EMERGENCY_COOLDOWN_DURATION=30
SLEEP_DURATION=5
CPU_TEMP_CHECK=20
DEFAULT_SPEED=100
EMERGENCY_COOLDOWN_TEMP_CHANGE=5                         

# DON'T MESS WITH THESE
VERBOSE=0
LAST_FAN_SPEED=$DEFAULT_SPEED
EMERGENCY_COOLDOWN=0                
EMERGENCY_COOLDOWN_TIMER=0                         
ELAPSED_TIME=0 
CPU_TEMP=0

# determine verbose mode
if [ ! -z "$1" ]; then
    VERBOSE=1
fi

echo "determine fan controller"
# determine fan controller
if [ -d /sys/class/hwmon/hwmon6 ]; then
    FAN_CTRL=/sys/class/hwmon/hwmon6/pwm1
elif [ -d /sys/devices/platform/pwm_fan ]; then
    FAN_CTRL=/sys/devices/platform/pwm_fan/pwm1
else
    exit 0
fi
echo "start fan control"
# retrieve new cpu temps (sysfs reports millidegrees; keep whole degrees)
get_temps() {
    CPU_TEMP=$(( $(cat /sys/class/hwmon/hwmon6/temp1_input) / 1000 ))
}

# use this to make setting the fan a bit easier
#     set_fan WHAT VALUE
set_fan() {
    LAST_FAN_SPEED=`cat ${FAN_CTRL}`

    if [ $LAST_FAN_SPEED -ne $2 ]; then
        if [ $VERBOSE == 1 ]; then
            echo "setting fan to ${2} (${1}) ${FAN_CTRL}"
        fi

        # write the new speed to the fan controller
        echo $2 > ${FAN_CTRL}
    else
        if [ $VERBOSE == 1 ]; then
            echo "keeping fan speed at ${LAST_FAN_SPEED}"
        fi
    fi
}

# floating-point greater-than-or-equals-to using awk 'cause ash doesn't
# like floats. instead of this:
#     if [ $VALUE_1 >= $VALUE_2 ];
# use this:
#     if [ $(float_ge $VALUE_1 $VALUE_2) == 1 ];
float_ge() {
    awk -v n1=$1 -v n2=$2 "BEGIN { if ( n1 >= n2 ) exit 1; exit 0; }"
    echo $?
}

# start the emergency cooldown mode
start_emergency_cooldown() {
    if [ $VERBOSE == 1 ]; then
        echo
        echo "Starting Emergency Cooldown!"
    fi

    # toggle the cooldown bit to on and reset the timer
    EMERGENCY_COOLDOWN=1
    EMERGENCY_COOLDOWN_TIMER=$EMERGENCY_COOLDOWN_DURATION

    set_fan EMERGENCY 255
}              

# check for load averages of 2.5 or more
check_load() {
    # loop over each load value (1 min, 5 min, 15 min)
    for LOAD in $(cut -d " " -f1-3 /proc/loadavg); do
        if [ $VERBOSE == 1 ]; then
            echo "Checking Load ${LOAD}"
        fi

        # trigger the emergency cooldown if any load average reaches 2.5
        if [ $(float_ge $LOAD 2.5) == 1 ]; then
            start_emergency_cooldown

            break
        fi
    done
}

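# makes sure that the temperature hasn't risen by
# $EMERGENCY_COOLDOWN_TEMP_CHANGE degrees or more since the last check
#     check_temp_change WHAT OLD_TEMP NEW_TEMP
check_temp_change() {
    TEMP_CHANGE=$(($3 - $2));

    if [ $VERBOSE == 1 ]; then
        echo "${1} original temp: ${2} | new temp: ${3} | change: ${TEMP_CHANGE}"
    fi

    if [ $(float_ge $TEMP_CHANGE $EMERGENCY_COOLDOWN_TEMP_CHANGE) == 1 ]; then
       start_emergency_cooldown;

       # relies on the shell applying continue to the caller's loop;
       # POSIX leaves that unspecified
       continue;
    fi
}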

# set fan speeds based on CPU temperatures
check_cpu_temp() {
    if [ $VERBOSE == 1 ] ; then
        echo "Checking CPU Temp ${CPU_TEMP}"
    fi

    if [ $CPU_TEMP -ge 70 ]; then
        set_fan CPU 255
    elif [ $(float_ge $CPU_TEMP 67.5) == 1 ]; then
        set_fan CPU 223
    elif [ $CPU_TEMP -ge 65 ]; then
        set_fan CPU 191
    elif [ $(float_ge $CPU_TEMP 62.5) == 1 ]; then
        set_fan CPU 159
    elif [ $CPU_TEMP -ge 60 ]; then
        set_fan CPU 127
    elif [ $CPU_TEMP -ge 55 ]; then
        set_fan CPU 95
    elif [ $CPU_TEMP -ge 50 ]; then
        set_fan CPU 80
    elif [ $CPU_TEMP -ge 45 ]; then
        set_fan CPU 63
    fi
}

# start the fan initially to $DEFAULT_SPEED
set_fan START $DEFAULT_SPEED

# and get the initial system temps
get_temps

# the main program loop:
# - look at load averages every $SLEEP_DURATION seconds
# - look at temperature deltas every $SLEEP_DURATION seconds
# - look at raw cpu temp every $CPU_TEMP_CHECK seconds
while true ; do

    # handle emergency cooldown stuff
    if [ $EMERGENCY_COOLDOWN == 1 ]; then

        # reduce the number of seconds left in emergency cooldown mode
        EMERGENCY_COOLDOWN_TIMER=$((${EMERGENCY_COOLDOWN_TIMER} - ${SLEEP_DURATION}))

        # do we still need to be in cooldown?
        if [ $EMERGENCY_COOLDOWN_TIMER -le 0 ]; then

            set_fan LAST $LAST_FAN_SPEED                              

            EMERGENCY_COOLDOWN=0                                      

            if [ $VERBOSE == 1 ]; then
                echo "Exiting Emergency Cooldown Mode!"
                echo
            fi

        else
            if [ $VERBOSE == 1 ]; then
                echo "Still in Emergency Cooldown. ${EMERGENCY_COOLDOWN_TIMER} seconds left."
            fi

            sleep $SLEEP_DURATION

            continue
        fi
    fi

    # save the previous temperatures                                    
    LAST_CPU_TEMP=$CPU_TEMP                                                                                                  

    # and re-read the current temperatures
    get_temps 

    # check the load averages
    check_load

    # check to see if the cpu temp has spiked (old reading first, then new)
    check_temp_change CPU $LAST_CPU_TEMP $CPU_TEMP

    # check the raw CPU temps every $CPU_TEMP_CHECK seconds...
    if [ $(( $ELAPSED_TIME % $CPU_TEMP_CHECK )) == 0 ]; then
        check_cpu_temp
    fi

    # wait $SLEEP_DURATION seconds and do this again
    if [ $VERBOSE == 1 ]; then
        echo "waiting ${SLEEP_DURATION} seconds..."
        echo
    fi

    sleep $SLEEP_DURATION;

    ELAPSED_TIME=$(($ELAPSED_TIME + $SLEEP_DURATION))
done
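The threshold table in check_cpu_temp could also be written without the awk float helper by staying in millidegrees, which is the unit the hwmon temp1_input files report. The following is only a sketch of that idea. The thresholds and PWM values are copied from the script above, and the function name pwm_for_temp is made up.

```shell
#!/bin/sh
# Map a temperature in millidegrees Celsius to a PWM value.
# Prints nothing below 45 C, so the caller can leave the fan alone.
pwm_for_temp() {
    t=$1
    if   [ "$t" -ge 70000 ]; then echo 255
    elif [ "$t" -ge 67500 ]; then echo 223
    elif [ "$t" -ge 65000 ]; then echo 191
    elif [ "$t" -ge 62500 ]; then echo 159
    elif [ "$t" -ge 60000 ]; then echo 127
    elif [ "$t" -ge 55000 ]; then echo 95
    elif [ "$t" -ge 50000 ]; then echo 80
    elif [ "$t" -ge 45000 ]; then echo 63
    fi
}
```

A caller would do something like `pwm=$(pwm_for_temp "$(cat /sys/class/hwmon/hwmon6/temp1_input)")` and only write to pwm1 when the result is non-empty.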

fix done

Hmm, I have just updated to the "23.05-SNAPSHOT" version with kernel 5.15.150.

It doesn't seem to work: it sets the fan speed once at boot and it stays the same.

Any ideas why?

I'm not sure if I understand the question correctly.
I had OpenWrt 23.05.2 installed before and have it installed again now; its kernel version is 5.15.137.

But in this version the RTC and the fan control do not work.

Both should be fixed by now.

The fan control was fixed (see commit) in 5.15.140.

So I installed the current snapshot version with kernel 5.15.150.
The RTC works but apparently the fan control does not work as before.

I made an incorrect assumption, I apologize.

Fan control works in the sense that the PWM is adjusted according to the temperature. However, I'm also experiencing that after some time of use the PWM needs to be set to higher values, because the friction of the fan apparently increases and the original value is then no longer sufficient. If you have time to play with it, try adjusting the PWM values for the fan in the DTS, and make sure that the cooling device (pwm-fan) is associated with all thermal zones. As we also have a tacho, it would be nice to just set the target speed in RPM instead of trying to figure out PWM values, which may not work equally well on all devices (and all ages of devices).
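To illustrate the RPM-target idea in userspace (this is not how the kernel's thermal framework or the DTS cooling maps work), a script could compare the tacho reading with a wanted speed and nudge the PWM accordingly. Everything in this sketch is an assumption: the function name adjust_pwm, the default tolerance of 120 RPM and the default step of 5.

```shell
#!/bin/sh
# Nudge a PWM value toward a target fan speed.
#     adjust_pwm TARGET_RPM CURRENT_RPM CURRENT_PWM [TOLERANCE] [STEP]
adjust_pwm() {
    target=$1
    rpm=$2
    pwm=$3
    tolerance=${4:-120}
    step=${5:-5}
    if [ "$rpm" -lt $((target - tolerance)) ]; then
        pwm=$((pwm + step))
    elif [ "$rpm" -gt $((target + tolerance)) ]; then
        pwm=$((pwm - step))
    fi
    # keep the result inside the 0..255 range that pwm1 accepts
    [ "$pwm" -gt 255 ] && pwm=255
    [ "$pwm" -lt 0 ] && pwm=0
    echo "$pwm"
}
```

Called periodically with the values of fan1_input and pwm1, this would compensate for a fan that needs more PWM as it ages, at the cost of running yet another script.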

Well...

The problem is that it only adjusts the PWM setting once at system startup.

When the router is cold, the PWM value is set to 102 at boot and then never changed again.
I have run several benchmarks to heat up the router, the temperatures have increased but the fan speed has not changed.

Then I restarted the router and then the fan speed was set to 170 at system startup.
I could observe how the temperatures dropped from 65°C to 55°C (f212a600mdiomii08-mdio-8)

However, the fan speed did not change even after the router had cooled down again.

The CPU cores also often reach over 75°C without the fan speed changing, so I would argue that the automatic control is not working at all.

See:

root@OCTEON-TX2:~# reboot
root@OCTEON-TX2:~# Connection to 192.168.1.1 closed by remote host.
Connection to 192.168.1.1 closed.


BusyBox v1.36.1 (2023-11-14 13:38:11 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.2, r23630-842932a63d
 -----------------------------------------------------
root@OCTEON-TX2:~# sensors
iei_wt61p803_puzzle-isa-f2702000
Adapter: ISA adapter
fan1:        1980 RPM
fan2:           0 RPM
fan3:           0 RPM
fan4:           0 RPM
fan5:           0 RPM
temp1:        +42.0°C  
temp2:        +39.0°C  

f612a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +47.1°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +54.8°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +55.7°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f612a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +48.4°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +49.2°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +57.9°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

root@OCTEON-TX2:~# cat /sys/class/hwmon/hwmon6/pwm1
102
root@OCTEON-TX2:~# stress-ng --cpu 4 --cpu-method matrixprod  --metrics-brief --perf -t 60
stress-ng: info:  [7468] setting to a 1 min, 0 secs run per stressor
stress-ng: info:  [7468] dispatching hogs: 4 cpu
root@OCTEON-TX2:~# cat /sys/devices/virtual/thermal/thermal_zone4/temp
75129
root@OCTEON-TX2:~# sensors
iei_wt61p803_puzzle-isa-f2702000
Adapter: ISA adapter
fan1:        1920 RPM
fan2:           0 RPM
fan3:           0 RPM
fan4:           0 RPM
fan5:           0 RPM
temp1:        +50.0°C  
temp2:        +43.0°C  

f612a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +55.9°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +62.0°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +63.0°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

f612a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +56.7°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +56.1°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +65.1°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

root@OCTEON-TX2:~# cat /sys/class/hwmon/hwmon6/pwm1
102
root@OCTEON-TX2:~# reboot
root@OCTEON-TX2:~# Connection to 192.168.1.1 closed by remote host.

BusyBox v1.36.1 (2023-11-14 13:38:11 UTC) built-in shell (ash)

  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 -----------------------------------------------------
 OpenWrt 23.05.2, r23630-842932a63d
 -----------------------------------------------------
root@OCTEON-TX2:~# sensors
iei_wt61p803_puzzle-isa-f2702000
Adapter: ISA adapter
fan1:        3360 RPM
fan2:           0 RPM
fan3:           0 RPM
fan4:           0 RPM
fan5:           0 RPM
temp1:        +48.0°C  
temp2:        +43.0°C  

f612a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +55.1°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +62.1°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +62.9°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

f612a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +55.5°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +55.9°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +65.1°C  (low  = +10.0°C, high = +60.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +70.0°C)

root@OCTEON-TX2:~# cat /sys/class/hwmon/hwmon6/pwm1
170

20 minutes later...........
The router has cooled down and the PWM value for the fan is still 170.

I am not a developer but I can tell you that it does not work and the fan is regulated once at system startup - and never again.

root@OCTEON-TX2:~# uptime
 09:28:33 up 20 min,  load average: 0.00, 0.00, 0.00
root@OCTEON-TX2:~# sensors
iei_wt61p803_puzzle-isa-f2702000
Adapter: ISA adapter
fan1:        3300 RPM
fan2:           0 RPM
fan3:           0 RPM
fan4:           0 RPM
fan5:           0 RPM
temp1:        +42.0°C  
temp2:        +39.0°C  

f612a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +46.8°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +55.4°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii00-mdio-0
Adapter: MDIO adapter
temp1:        +56.2°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f612a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +48.1°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f412a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +49.2°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

f212a600mdiomii08-mdio-8
Adapter: MDIO adapter
temp1:        +58.4°C  (low  = +10.0°C, high = +60.0°C)
                       (crit low =  +0.0°C, crit = +70.0°C)

root@OCTEON-TX2:~# cat /sys/class/hwmon/hwmon6/pwm1
170

It would be nice if this worked without having to set it manually according to load.
If you need information and someone to test it, I would be happy to help you.
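For anyone who wants to collect more data on this, the kernel's thermal sysfs interface shows what the thermal framework itself thinks is going on. Here is a rough sketch that prints each thermal zone with its temperature and governor, plus the state of each cooling device. The files used (type, temp, policy, cur_state, max_state) are standard thermal sysfs attributes; the function name show_thermal and its optional root argument are only for illustration.

```shell
#!/bin/sh
# Print thermal zones and cooling devices from the thermal sysfs tree.
#     show_thermal [ROOT]    (ROOT defaults to /sys/class/thermal)
show_thermal() {
    root=${1:-/sys/class/thermal}
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        printf '%s type=%s temp=%s policy=%s\n' "${zone##*/}" \
            "$(cat "$zone/type" 2>/dev/null)" \
            "$(cat "$zone/temp" 2>/dev/null)" \
            "$(cat "$zone/policy" 2>/dev/null)"
    done
    for dev in "$root"/cooling_device*; do
        [ -r "$dev/cur_state" ] || continue
        printf '%s type=%s state=%s/%s\n' "${dev##*/}" \
            "$(cat "$dev/type" 2>/dev/null)" \
            "$(cat "$dev/cur_state" 2>/dev/null)" \
            "$(cat "$dev/max_state" 2>/dev/null)"
    done
}
```

Running `show_thermal` before and after a stress-ng run would show whether the fan's cur_state ever moves while the zone temperatures change.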

Please try this patch:

diff --git a/target/linux/mvebu/files/arch/arm64/boot/dts/marvell/puzzle-thermal.dtsi b/target/linux/mvebu/files/arch/arm64/boot/dts/marvell/puzzle-thermal.dtsi
index 94677532f2..d9857261d1 100644
--- a/target/linux/mvebu/files/arch/arm64/boot/dts/marvell/puzzle-thermal.dtsi
+++ b/target/linux/mvebu/files/arch/arm64/boot/dts/marvell/puzzle-thermal.dtsi
@@ -1,4 +1,7 @@
 #define PUZZLE_FAN_THERMAL(_cname, _fan)					\
+	polling-delay-passive = <500>;
+	polling-delay = <1000>;
+
 	trips {									\
 		_cname##_active_high: cpu-active-high {				\
 			temperature = <80000>;					\

I built an image for a friend of mine with this patch and got a compile error:
Error: /media/egc/linuxdata/openwrt/build_dir/target-aarch64_cortex-a72_musl/linux-mvebu_cortexa72/linux-5.15.150/arch/arm64/boot/dts/marvell/puzzle-thermal.dtsi:3.2-15 syntax error

This is the patched puzzle-thermal.dtsi:

#define PUZZLE_FAN_THERMAL(_cname, _fan)					\
	polling-delay-passive = <500>;
	polling-delay = <1000>;

	trips {									                        \
		_cname##_active_high: cpu-active-high {				\
			temperature = <80000>;					                \
			hysteresis = <2000>;					                \
			type = "active";					                       \
		};								                               \
		_cname##_active_med: cpu-active-med {				\
			temperature = <72000>;					                \
			hysteresis = <2000>;					                \
			type = "active";					                        \
		};								                                \
		_cname##_active_low: cpu-active-low {				        \
			temperature = <65000>;					                \
			hysteresis = <2000>;					                \
			type = "active";					                       \
		};								                               \
		_cname##_active_idle: cpu-active-idle {				\
			temperature = <60000>;					               \
			hysteresis = <2000>;					               \
			type = "active";					                       \
		};								                               \
	};									                               \
	cooling-maps {								               \
		cpu-active-high {						                       \
			trip = <&_cname##_active_high>;				         \
			cooling-device = <_fan 3 THERMAL_NO_LIMIT>;		\
		};								                                \
		cpu-active-med {						                        \
			trip = <&_cname##_active_med>;				          \
			cooling-device = <_fan 2 THERMAL_NO_LIMIT>;		\
		};								                                   \
		cpu-active-low {						                           \
			trip = <&_cname##_active_low>;				            \
			cooling-device = <_fan 1 THERMAL_NO_LIMIT>;		\
		};								                                      \
		cpu-active-idle {						                               \
			trip = <&_cname##_active_idle>;				               \
			cooling-device = <_fan 0 THERMAL_NO_LIMIT>;		\
		};								                                        \
	}

Maybe missing some line continuation characters?
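That guess can be checked mechanically. A multi-line #define only continues while each line ends with a backslash, so the first line without one ends the macro. The sketch below (the function name macro_extent is made up) reports where each macro body ends.

```shell
#!/bin/sh
# For every "#define" in a file, report the line where the macro body ends,
# i.e. the first line that does not finish with a backslash
# (trailing blanks after the backslash are ignored).
macro_extent() {
    awk '
        !in_macro && /^[ \t]*#[ \t]*define/ { in_macro = 1; start = NR }
        in_macro && !/\\[ \t]*$/ {
            printf "macro at line %d ends at line %d\n", start, NR
            in_macro = 0
        }
    ' "$1"
}
```

Run against the patched puzzle-thermal.dtsi as posted above, this should report that the macro opened on line 1 already ends at line 2, because the two added polling-delay lines (and the blank line after them) have no trailing backslash. That fits the reported syntax error at line 3.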