OpenWrt Support for Armor G5 (NBG7815)

The nbg7815 branch is based on the OpenWrt main branch. I will also push commits that I think will be good for this router.

The nbg7815-23.05 branch is based on the openwrt-23.05 branch. As with the previous branch, I'll update it with commits that have been tested first.

The test branch is for testing commits that I think can fix a problem or a lack of performance.

The led_fan_support branch is there as a fixed reference for the LED and fan support. It will only be updated when something changes in main OpenWrt that would break that support.

If you want the more stable branch, go for nbg7815-23.05.

I'm trying to go back to the original firmware to test whether Wi-Fi performance is the same on the original firmware and on OpenWrt, but ...

The step from "Return to original firmware from OpenWrt":
sh change_boot_partition.sh

returns this error:

OpenWrt release
1+0 records in
1+0 records out
1+0 records in
1+0 records out
Could not open mtd device: /dev/mtd2
Can't open device for writing!
Could not open mtd device: /dev/mtd3
Can't open device for writing!

Can someone help me?

If you have updated OpenWrt since you installed it, you will need serial access.

I don't think I updated it. How can I check?

I'm sorry, I have no knowledge of this matter, so I can't help you.

My goal is to build a firmware with LED and fan driver support that includes the OpenWrt master changes. Is there any LED and fan driver support in the nbg7815-23.05 branch? I saw some fan-related changes in that branch, but I wanted to ask to confirm.

Yes, it has led and fan support

Hey @robimarco.
I took Enrico's patch for fan support on the nbg7815.

Initially I simplified the patch to determine what temperature may be appropriate.

I have changed the temperature from 70ºC to 75ºC because at 70ºC the fan runs all the time under normal conditions in a room at 24ºC.

Without a fan, the temperature reached by the CPU is around 77ºC in the aforementioned room.

You can see the commit here.

The problem is that the hysteresis doesn't seem to work: once the CPU reaches 75ºC the fan starts turning on and off at very short intervals (1 or 2 seconds), ignoring the 5ºC that I initially assigned.

I have tried different values for hysteresis and it is always the same.
Can you review the code? Is there a bug that prevents normal operation?
Thanks.

I don't really have time to debug tsens; last time I tested it, it properly waited for the hysteresis to trip again.

There are some debug prints that you can enable in tsens to see what's going on.

Ok. Thanks for the hint.

I've tried to adjust the settings as well and I'm trying to fine-tune them. So far I'm struggling to monitor the temperature via collectd so that I can easily verify my settings over a longer period. Without thermal in the DTS, collectd produces a graph. As soon as it is included in the DTS, it no longer does.

But why did you strip aqr_thermal? This is actually what controls when the fan starts and for how long. I think the CPU is not relevant and its entries are just there.

We have 3 sensors:

ath11k_hwmon-isa-c000000 = always very hot, >=80°C (around 85°C even with the fan running); this sensor shows up twice and cannot be read reliably because of this behaviour. Something is wrong there. On top of that, the second one has a range from -255 to +255 (just garbage).

tmp103-i2c-0-70 = is always about 10°C lower than 90000mdio108-mdio-8

90000mdio108-mdio-8 = is the relevant sensor for us. I installed the OEM firmware to test this and measured the time from a cold device running idle until the fan started for the first time. The fan starts when 90000mdio108-mdio-8 reaches 70°C. To be sure, I also measured the time OpenWrt takes to reach 70°C on 90000mdio108-mdio-8. It was about the same.

The on/off behaviour is controlled by the following (the CPU is IMO irrelevant here because we have a sensor):
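A timing measurement like the one described above could be scripted roughly as follows. This is only a sketch: the function name seconds_until_temp is hypothetical, and the sensor file has to be passed in because the hwmon index differs between builds.

```shell
#!/bin/sh
# seconds_until_temp FILE THRESHOLD [INTERVAL]
# Poll FILE (a hwmon temp*_input file, values in millidegrees Celsius)
# every INTERVAL seconds (default 5) and print how many seconds passed
# until the value reached THRESHOLD. Returns 1 if FILE cannot be read.
seconds_until_temp() {
	file=$1
	threshold=$2
	interval=${3:-5}
	start=$(date +%s)
	while :; do
		temp=$(cat "$file" 2>/dev/null) || return 1
		[ "$temp" -ge "$threshold" ] && break
		sleep "$interval"
	done
	echo $(( $(date +%s) - start ))
}
```

Example call, assuming (hypothetically) that the MDIO sensor is hwmon1 on your build: seconds_until_temp /sys/class/hwmon/hwmon1/temp1_input 70000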

	thermal-zones {
		aqr_thermal: aqr-thermal {
			polling-delay-passive = <30000>;
			polling-delay = <90000>;
			thermal-sensors = <&aqr113c>;
		};
	};

&aqr_thermal {
	trips {
		aqr_thermal_active: aqr-thermal-active {
			temperature = <75000>;
			hysteresis = <5000>;
			type = "active";
		};
	};

	cooling-maps {
		map1 {
			trip = <&aqr_thermal_active>;
			cooling-device = <&fan THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
		};
	};
};

This is my test setting at the moment. The polling delay is 90 seconds, which means the fan has to run a minimum of 90 seconds after 75°C has been exceeded for more than 5 s (hysteresis). If the fan is not running, aqr113c is checked every 30 seconds (polling-delay-passive) to see whether it should be set active.
As I read it, hysteresis gives a time span of 5 s during which the sensor can be over its limit before the fan is set to active. Once it is set to active, the fan starts on the next check. That is how I understand it.
With this setting the fan starts roughly every 10-15 minutes when the device is idle with a USB3 device connected.

Despite that, I still have what I'll call hiccups, where the fan starts for a few seconds and stops again after a few seconds. But I think this is due to the CPU settings, which I haven't touched so far. I will strip them once I have a script to monitor the sensor without collectd.

EDIT: The device itself is very hot around the 10G connector. That's why I think aqr113c was chosen. And the web told me it has an MDIO management interface, which I think provides temperatures as well? So it makes sense to me to use it.
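A collectd-free monitor could look roughly like the sketch below. It only reads the standard hwmon sysfs files (name and temp1_input); the function name sample_temps and the HWMON_ROOT override are assumptions made for this example.

```shell
#!/bin/sh
# Assumed default location of the hwmon tree; override HWMON_ROOT to test.
HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"

# sample_temps: for every hwmonN directory that exposes temp1_input, print
#   <epoch seconds> <hwmonN> <driver name> <temperature in millidegrees C>
# Sensors that cannot be read at the moment are skipped.
sample_temps() {
	now=$(date +%s)
	for dir in "$HWMON_ROOT"/hwmon*; do
		[ -r "$dir/temp1_input" ] || continue
		name=$(cat "$dir/name" 2>/dev/null)
		temp=$(cat "$dir/temp1_input" 2>/dev/null) || continue
		echo "$now ${dir##*/} ${name:-unknown} $temp"
	done
}
```

To log every 30 seconds: while sleep 30; do sample_temps >> /tmp/temps.log; done. Printing the hwmonN index next to the name keeps identically named sensors apart in the log.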

Hello @pwned
My approach was initially to simplify it as much as possible to see if it works correctly.
With a single element it is easier for me to check the operation.
I chose cpu0 for simplicity.

Yes, I know.
ath11k_hwmon-isa-c000000 is problematic. It has 3 sensors with the same name (hwmon3, hwmon4 and hwmon5 if you have gpio_fan defined, and hwmon2, hwmon3 and hwmon4 without gpio_fan). If you put it in luci-statistics it generates an awful graph (if you can get one at all), at least for me.

I think that the hysteresis value is expressed in 1/1000 ºC and not in 1/1000 seconds. Are you sure it is in seconds?
For me, the important thing is that the fan works, and I see that you have managed to make it work in a reasonable way without getting to the bottom of the problem with the hysteresis parameter: I have changed this parameter from 50 to 30000 and it always behaves the same way.

PS: I always get the hottest temperature on the Wi-Fi radio2 sensor (hwmon5) if I have it enabled.
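One way to check whether a changed hysteresis value actually reaches the kernel is to read the trip points back from the thermal sysfs tree. The sketch below uses the standard thermal_zone*/trip_point_N_temp, trip_point_N_type and trip_point_N_hyst files; the function name show_trips and the THERMAL_ROOT override are assumptions made for this example.

```shell
#!/bin/sh
# Assumed default location of the thermal tree; override THERMAL_ROOT to test.
THERMAL_ROOT="${THERMAL_ROOT:-/sys/class/thermal}"

# show_trips: for each thermal zone, print one line per trip point:
#   <zone type> trip<N> <trip type> temp=<millidegrees C> hyst=<millidegrees C>
show_trips() {
	for zone in "$THERMAL_ROOT"/thermal_zone*; do
		[ -r "$zone/type" ] || continue
		ztype=$(cat "$zone/type")
		for f in "$zone"/trip_point_*_temp; do
			[ -r "$f" ] || continue
			base=${f%_temp}               # .../trip_point_N
			n=${base##*trip_point_}       # N
			ttype=$(cat "${base}_type" 2>/dev/null)
			hyst=$(cat "${base}_hyst" 2>/dev/null)
			echo "$ztype trip$n $ttype temp=$(cat "$f") hyst=$hyst"
		done
	done
}
```

If the hyst= value printed here follows the DTS change but the fan behaviour does not, the value is being parsed and the problem lies in how it is applied.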

I have 2 nbg7815 devices. I use one as the main router and one as an AP in mesh mode. I installed the firmware on the AP device. There don't seem to be any major problems. The LED works, but I'm not sure if the fan is working. It seems that the script /sbin/fan_ctrl.sh fails when run from cron. When I run the script manually, no error is returned. I don't know whether this affects the fan or not.
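A quick way to see whether the fan is spinning, assuming the build registers it through the gpio_fan hwmon driver (the name that appears in the sensors output elsewhere in this thread), is to read its fan1_input file. The function name fan_rpm and the HWMON_ROOT override are hypothetical.

```shell
#!/bin/sh
# Assumed default location of the hwmon tree; override HWMON_ROOT to test.
HWMON_ROOT="${HWMON_ROOT:-/sys/class/hwmon}"

# fan_rpm: print the RPM reported by the first hwmon device whose name is
# "gpio_fan". Returns 1 if no such device is registered.
fan_rpm() {
	for dir in "$HWMON_ROOT"/hwmon*; do
		[ "$(cat "$dir/name" 2>/dev/null)" = "gpio_fan" ] || continue
		cat "$dir/fan1_input"
		return
	done
	return 1
}
```

A value of 0 means the driver currently has the fan switched off. A non-zero return status means no gpio_fan device exists, in which case the firmware drives the fan some other way.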

Thanks so much for your hard work.

Fri Jun  2 10:35:00 2023 cron.err crond[2007]: USER root pid 3695 cmd /sbin/fan_ctrl.sh
Fri Jun  2 10:40:00 2023 cron.err crond[2007]: USER root pid 3714 cmd /sbin/fan_ctrl.sh
Fri Jun  2 10:45:00 2023 cron.err crond[2007]: USER root pid 3722 cmd /sbin/fan_ctrl.sh
Fri Jun  2 10:50:00 2023 cron.err crond[2007]: USER root pid 3738 cmd /sbin/fan_ctrl.sh
Fri Jun  2 10:55:00 2023 cron.err crond[2007]: USER root pid 3747 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:00:00 2023 cron.err crond[2007]: USER root pid 3764 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:05:00 2023 cron.err crond[2007]: USER root pid 3773 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:10:00 2023 cron.err crond[2007]: USER root pid 3790 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:15:00 2023 cron.err crond[2007]: USER root pid 3799 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:20:00 2023 cron.err crond[2007]: USER root pid 3807 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:25:00 2023 cron.err crond[2007]: USER root pid 3824 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:30:00 2023 cron.err crond[2007]: USER root pid 3832 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:35:00 2023 cron.err crond[2007]: USER root pid 3850 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:40:00 2023 cron.err crond[2007]: USER root pid 3857 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:45:00 2023 cron.err crond[2007]: USER root pid 3874 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:50:00 2023 cron.err crond[2007]: USER root pid 3883 cmd /sbin/fan_ctrl.sh
Fri Jun  2 11:55:00 2023 cron.err crond[2007]: USER root pid 3900 cmd /sbin/fan_ctrl.sh
Fri Jun  2 12:00:00 2023 cron.err crond[2007]: USER root pid 3909 cmd /sbin/fan_ctrl.sh
Fri Jun  2 12:05:00 2023 cron.err crond[2007]: USER root pid 3926 cmd /sbin/fan_ctrl.sh
Fri Jun  2 12:10:00 2023 cron.err crond[2007]: USER root pid 3935 cmd /sbin/fan_ctrl.sh
Fri Jun  2 12:15:00 2023 cron.err crond[2007]: USER root pid 3945 cmd /sbin/fan_ctrl.sh

Check this post just in case you got the bad fan_ctrl.sh script.

Yes, I'm sure.

Let's say the temperature is constantly fluctuating between 74°C and 75°C. The fan is then constantly switching ON/OFF (as it does with the original DTS patch and its low polling delays).

If you interpret it as a °C value, a value like 1/1000 °C would not make much sense, because the silicon would run an extra 5°C on top of 75°C. A temperature combined with a time period makes sense, because it does not damage the silicon if it runs 5 s out of spec. And it explains why the fan is switched on/off constantly.

Those were my thoughts before I found this:

https://www.kernel.org/doc/Documentation/devicetree/bindings/thermal/thermal.txt

And it says it's seconds.

Yes, Wi-Fi is running hotter. But I don't know whether the value is real, and I think the OEM firmware also uses 90000mdio108-mdio-8 as its reference. So for me it is more interesting how often and how long the fan runs on the OEM firmware, to compare it with OpenWrt and be on the safe side. For that I have to write a script that collects temperature values on OEM and OpenWrt.

I just did a build with the CPU thermal zone stripped out of the DTS. I will let it run for a few hours and see what happens with the hiccups.

EDIT: Stripping out CPU and Cluster (leaving just aqr) makes the sensors output look better:

root@NBG7815:~# sensors
ath11k_hwmon-isa-c000000
Adapter: ISA adapter
temp1:        +65.0°C  

tmp103-i2c-0-70
Adapter: QUP I2C adapter
temp1:        +65.0°C  (low  = -10.0°C, high = +60.0°C)

90000mdio108-mdio-8
Adapter: MDIO adapter
temp1:        +74.0°C  (low  = +70.0°C, high = +75.0°C)  ALARM (CRIT)
                       (crit low =  +0.0°C, crit = +70.0°C)

ath11k_hwmon-isa-c000000
Adapter: ISA adapter
temp1:        +83.0°C  

ath11k_hwmon-isa-c000000
Adapter: ISA adapter
temp1:        +63.0°C  

gpio_fan-isa-0000
Adapter: ISA adapter
fan1:           0 RPM  (min =    0 RPM, max = 4500 RPM)

ath11k suddenly has values that make sense here. And I haven't noticed any hiccups so far. The fan runs about every 15 minutes for 90 seconds with the settings I mentioned previously.

What I still don't understand is where the values for low (low I can imagine), high (high I understand), crit low and crit come from, or where I would set them in the DTS, if that is possible.

Thanks for the link.
But you can read there that the hysteresis parameter is in millicelsius.

* Trip points

The trip node is a node to describe a point in the temperature domain
in which the system takes an action. This node describes just the point,
not the action.

Required properties:
- temperature:		An integer indicating the trip temperature level,
  Type: signed		in millicelsius.
  Size: one cell

- hysteresis:		A low hysteresis value on temperature property (above).
  Type: unsigned	This is a relative value, in millicelsius.
  Size: one cell

I saw it somewhere. I remember it because it caught my attention. 70ºC as a critical temperature seemed low to me.

Hey @sahindirek
If you can run the script from SSH, it is working as it should.
The log message just reports that the script has been launched. This is how the fan support has been implemented.

Oops, then you are right and I need to put on my glasses next time. :)

Anyway, it does not change the outcome at all.
75+5, so we end up at 80?
E.g. if I let a fan start at 75°C, it would start there, cool down to 74°C, switch off again right away and then start again right away at 75°C. Then I add hysteresis and say the fan has to start at 80°C (75+5) and cool down until 75°C is reached again, which is then only held for a short period of time before the fan starts again (which does not work in reality, as you wrote). That is why we have the delay times. I hope I got it right now.

For now it looks like it is fluctuating between 73 and 78 (I never saw 80 in my previous tests).

I will test with 3000ms and compare.

Don't waste your time. This is exactly what is not working.
It works the same way whether you put 30000 or 50. That is what I asked for help with.

EDIT: I rewrote my whole post because I misunderstood how thermal works.

I initially set these values to test its operation. It would be more logical to set a hysteresis value of 2000-2500 to have a 4º or 5º oscillation. 10º seems too high.
But first it is necessary to understand why 'hysteresis' is not working.

With temperature = 75000 and hysteresis = 5000 the fan should switch on when the sensor reaches 75ºC and should switch off when the sensor drops to 70ºC.
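The expected on/off behaviour can be written down as a tiny state function. This is an idealised model of the semantics described above, not a statement about what the kernel governor in use actually does. The function name fan_next_state is hypothetical, and all values are in millidegrees Celsius as in the DTS.

```shell
#!/bin/sh
# fan_next_state CUR TEMP TRIP HYST
# CUR is the current fan state (1 = on, 0 = off).
# Prints the state the fan should have after a reading of TEMP:
#   TEMP >= TRIP          -> on
#   TEMP <= TRIP - HYST   -> off
#   anything in between   -> keep the current state
fan_next_state() {
	cur=$1
	temp=$2
	trip=$3
	hyst=$4
	if [ "$temp" -ge "$trip" ]; then
		echo 1
	elif [ "$temp" -le $((trip - hyst)) ]; then
		echo 0
	else
		echo "$cur"
	fi
}
```

For example, fan_next_state 1 72000 75000 5000 keeps the fan on because 72ºC lies inside the 70-75ºC band, while a reading of 70000 switches it off. A fan that toggles every second or two around 75ºC is behaving as if HYST were 0.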

I have a different point of view.

The way your configuration should work should be the following.
The system checks the temperature and, if it is higher than 75º+hys, the fan turns on and stays on for at least 90 seconds or until the temperature is lower than 75º-hys. At that point the fan rests for at least 30 seconds and does not turn on again until the temperature reaches 75º+hys, and so we return to the beginning.

With your config I assume that the fan initially starts working when the designated element reaches 75º, stays on for 90 seconds and turns off if the temperature has dropped below 75º. It then rests for at least 30 seconds, after which the temperature is checked and, if it is higher than 75º, the fan turns on again.

The way your configuration should work should be the following.

The system checks the temperature and, if it is higher than 75º, the fan turns on and stays on until a new reading shows a temperature under 75º-hys; readings are taken every 90 s. The fan switches on again at 75º and the cycle starts again.

With your config I assume the fan initially starts working when the temperature of the designated element reaches 75º and stays on for 90 s (polling-delay). If the temperature is then under 75º, the system stops the fan and waits 90 s to get a new reading.

polling-delay-passive is for passive cooling (lowering frequency, lowering voltage, ...) and is not relevant to what we are looking for.

Same problem here. Can anyone help?