MT7621 kernel 4.14.36: very high load average (up to 2.75) without visible CPU-consuming processes

LEDE 17.01.4 kernel 4.14.36
As you can see in the top output below, no single process is using the CPU.

Are you running a LuCI page with AUTO REFRESH ON enabled at the same time?


Yes, but as you can see, it doesn't take any processor time at all.

  • Are you saying that because of your picture?
  • You do understand how top works?
  • You believe that your hostapd and SSH Server take more resources than LuCI?

Those are just snapshots per sample interval (once per second). When I use SNMP to get my usage, LuCI can take up to 30% (or more) of the average CPU time during the SNMP sample interval (5 minutes), depending on how many tabs are open.
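To make the difference between a short snapshot and a longer average concrete, here is a minimal Python sketch (not from the thread) that derives CPU utilisation from two readings of the aggregate `cpu` line in /proc/stat. The function name `cpu_busy_fraction` and the sample numbers are hypothetical. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard Linux one.

```python
def cpu_busy_fraction(before, after):
    """Return the busy CPU fraction between two aggregate 'cpu' lines of /proc/stat."""
    def idle_and_total(line):
        # Drop the leading "cpu" label and keep the numeric jiffy counters.
        fields = [int(x) for x in line.split()[1:]]
        # idle + iowait count as "not busy"; iowait may be missing on very old kernels.
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
        # Sum only user..steal: guest time is already included in user time.
        return idle, sum(fields[:8])

    idle_a, total_a = idle_and_total(before)
    idle_b, total_b = idle_and_total(after)
    total_delta = total_b - total_a
    if total_delta <= 0:
        return 0.0  # no time elapsed between the samples
    return (total_delta - (idle_b - idle_a)) / total_delta


# Hypothetical readings taken some time apart (values are made up).
first = "cpu  100 0 100 800 0 0 0 0 0 0"
second = "cpu  130 0 170 1700 0 0 0 0 0 0"
print(f"busy: {cpu_busy_fraction(first, second):.0%}")
```

The two lines can be copied from the router with `head -n1 /proc/stat` a few minutes apart. The longer the gap, the closer the result is to an SNMP-style average rather than a one-second snapshot.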

Hope this helps.

Hello. This load stays stable for at least 5 minutes, which is very strange. After it goes down, it sometimes goes up again.
The strangest thing, which I can't understand, is that I don't see any process that takes more than 1% of CPU time.
If I understand correctly, a load average greater than 2 means the CPU load must be more than 100%. But I don't see that, and I have been watching our routers for a long time. I have 27 routers with the same firmware, connected in a daisy chain. Most of them have a load average below 0.2, but some of them sometimes reach a load average above 3. If somebody has any clue, I will be glad to hear it.

Close any and all LuCI sessions connected to the router, and run top or htop via SSH. What is your idle CPU % there?

LuCI is not an accurate way of measuring load, as it can generate a lot of CPU load itself while it is running.


How did you measure a 5 minute sample interval (as top doesn't do that)? Did you install SNMP?

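Load and CPU usage are not necessarily the same thing. Load depends on how many processes are running or waiting to run across all available CPUs (on Linux, processes blocked in uninterruptible sleep, typically waiting for I/O, are counted as well, so load can be high while the CPU is idle). On a single core, a load of 1 corresponds to 100% CPU usage. On a dual core, a load of 2 corresponds to 100% usage. Some MIPS and ARM CPUs can run 2 processes in a clock cycle, offload TCP and encryption, etc., though.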
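As a small illustration of the per-core arithmetic, the following Python sketch (not from the thread) divides the three averages from a /proc/loadavg line by the number of logical CPUs. The function name `load_per_cpu` and the sample line are hypothetical. The CPU count of 4 assumes the MT7621 shows up as four logical CPUs (two cores with two hardware threads each); check /proc/cpuinfo on the actual device.

```python
def load_per_cpu(loadavg_line, cpu_count):
    """Divide the 1, 5 and 15 minute load averages by the number of logical CPUs."""
    # /proc/loadavg looks like: "2.75 2.00 1.00 1/80 11206"
    # (three averages, runnable/total tasks, last PID); only the first three are used.
    one, five, fifteen = (float(x) for x in loadavg_line.split()[:3])
    return (one / cpu_count, five / cpu_count, fifteen / cpu_count)


# Hypothetical sample using the 2.75 figure from the thread title.
sample = "2.75 2.00 1.00 1/80 11206"
print(load_per_cpu(sample, cpu_count=4))
```

A value near or above 1.0 per CPU suggests the run queue is saturated. A value well below that, together with an idle CPU, points at tasks that are waiting rather than computing.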

In addition, LuCI runs various scripts (e.g. top, uptime, ifconfig, iwconfig, swconfig, ARP, etc.) on the various pages with AUTOREFRESH ON. This is why I suggest not running LuCI while attempting to get an accurate system load, CPU and memory reading.

Are you running a LuCI page with AUTO REFRESH ON enabled at the same time?

I was worried that my slightly complicated VPN setup was choking my router. I did see a ridiculously high load average when I checked it via LuCI. I tried disabling services and all, but no dice. It was still pegged at 6++ load average (1-core router).

And then, I saw your post.

Started monitoring load average using top via SSH. When I closed all LuCI tabs, load average plummeted to around 0.2, reaching as low as 0.03.

Just an anecdote for anyone facing the same issue.


Hi,
I know exactly what you are talking about!
And I have a full solution that drops the load to between 0.02 and 0.10 while continuously using LuCI.

I investigated this problem in depth, found where it occurs, and found a solution.

The BIG problem starts with the LuCI interface Lua scripts that run on OpenWrt's embedded mini HTTP server (I will explain).

Introduction:
The LuCI interface makes heavy use of CGI-BIN Lua scripts. They are necessary to do almost everything in the interface. Some of them exist to update things in real time, like indicators and dynamic text changes (just to mention a few). While the page is open in your browser, scripts are called periodically [via AUTOREFRESH] to retrieve updated information and display it in the browser.
[Nothing new, everybody here knows how it works.]

.
But why does this cause high load (sometimes almost freezing the router for lack of resources)?

OpenWrt uses an embedded HTTP server called uhttpd, which is responsible for processing the Lua scripts that belong to the LuCI interface whenever AUTOREFRESH calls them.

The root of the problem:
uhttpd comes pre-configured to run a maximum of 3 scripts in parallel, and LuCI pages call several different scripts. They keep running all the time to update indicators and graphs in the LuCI interface [while it is open in a browser], and those scripts get stuck while uhttpd is trying to run 3 of them at the same time!

This is the origin of the HUGE LOAD that occurs while autorefresh is ON. It's the uhttpd cgi-bin subprocesses that generate the load while running several scripts at the same time.
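As a rough illustration of why several scripts that are runnable at the same time push the load average up, the following Python sketch models the 1-minute load average as an exponentially damped average of the task count, sampled every 5 seconds. That is approximately how the Linux kernel computes it. The function name `simulate_load` and the inputs are hypothetical. Real kernels use fixed-point arithmetic and also count tasks in uninterruptible sleep, so this is a model, not a measurement.

```python
import math


def simulate_load(task_counts, window_s=60.0, tick_s=5.0, start=0.0):
    """Exponentially damped average of per-tick task counts (simplified load-average model)."""
    decay = math.exp(-tick_s / window_s)  # weight kept from the previous value
    load = start
    for n in task_counts:
        load = load * decay + n * (1.0 - decay)
    return load


# Ten minutes (120 ticks) with 3 scripts always runnable, versus only 1.
parallel = simulate_load([3] * 120)
sequential = simulate_load([1] * 120)
print(f"3 at a time: {parallel:.2f}, 1 at a time: {sequential:.2f}")
```

In this model the average converges towards the number of tasks that are runnable at each sample, so three scripts that never finish settle near 3 and a single one near 1. Values far below 1, as reported in this post, additionally require each script to finish quickly, so that most samples see no runnable script at all.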

The proposed fix:
To fix it, we have to change the way uhttpd handles script processing, by configuring uhttpd to run scripts sequentially [only 1 script at a time]. This will drop the load below 0.10 [and you can have as many LuCI windows open as you want (with autorefresh ON)] :grinning:.
With this change, all open LuCI windows keep updating in real time, and the load is no longer a problem for your router. The CPU will also be 95%+ idle, even with autorefresh ON and many browser windows open and updating.

The logic:
Running 1 script at a time makes them run sequentially, and sequentially they run in milliseconds. This eliminates the high load and all the lag in the interface, and there are no side effects (I extensively tested and debugged this behavioral change and have not found any problem so far).

Changing it results in benefits and optimization.

.
.
Procedure:
Edit /etc/config/uhttpd

Change option max_requests value to 1.

option max_requests '1'

Save the file, reboot the router.
No more high load average, and you get free CPU cycles to run other processes and programs. Your router will run smooth and fast!

.
.
Technical details:

That parameter refers to SCRIPT REQUESTS, not HTTP connections. It determines how many CGI-BIN Lua scripts can be executed simultaneously. (The maximum number of HTTP connections is a separate parameter, max_connections, and doesn't need to be changed.)

Analysis from the Application point of view [and usage]

  1. There's no technical reason [and no necessity] to run more than 1 Lua script at the same time on the web interface.
  2. When run sequentially, they run in milliseconds.
  3. Other LuCI script calls on the web page are automatically queued by uhttpd and processed in sequence [so no lost information, no lost time, no lost attempts].
  4. Being queued, each one runs immediately after the previous one has finished [and we are talking about milliseconds here]. This means the new information [returned by the script to the interface] appears in the LuCI interface within milliseconds.
  5. For us humans, milliseconds means real time, so AUTOREFRESH stays real time in our practical experience.
  6. The LuCI interface is informative, I love it, but it must not [it should be prohibited to] cause a high load on the router. The router has many more important jobs to do with its resources and CPU cycles.
  7. We love to see real-time things in the LuCI interface, we need to see them, and sometimes we need to keep the interface open for hours. This is common, and it should not harm the router itself.
  8. We cannot afford to waste the router's precious [and limited] resources on a stuck script in the interface.
  9. And again, there is no need [as far as I investigated] for those scripts to run in parallel. (Please correct me if I missed something or if I'm wrong here.)
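The queueing behaviour described in the list above can be sketched in Python with a worker pool whose size stands in for max_requests. This is only an illustration of the idea, not uhttpd's actual implementation: `run_requests`, `handler` and the job names are hypothetical, and real LuCI scripts are separate processes rather than threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_requests(job_ids, max_requests):
    """Run hypothetical script jobs with at most max_requests in flight.

    Returns (results in submission order, peak number of jobs running at once).
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def handler(job_id):
        # Track how many handlers are running concurrently.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return f"done:{job_id}"  # stand-in for the script's output
        finally:
            with lock:
                state["active"] -= 1

    # Jobs beyond the worker count wait in the pool's queue; none are dropped.
    with ThreadPoolExecutor(max_workers=max_requests) as pool:
        results = list(pool.map(handler, job_ids))
    return results, state["peak"]


results, peak = run_requests(["status", "wireless", "graph"], max_requests=1)
print(results, "peak concurrency:", peak)
```

With a single worker every job still completes and the results come back in the order they were submitted, which matches the "no lost information" point in item 3.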

.
Analysis from the Router point of view [and health]

  1. This modification completely eliminates the high load and high CPU usage / high SYS usage (that LuCI scripts cause).
  2. This modification allows more resources to be used for other processes and jobs.
  3. It stops the LuCI interface from slowing down the router.

.
Results:

  1. A faster router, faster applications, faster throughput, because the LuCI interface no longer causes high load.
  2. Faster LuCI interface rendering, with pleasant navigation and comfortable usage, hundreds of times faster, yes, hundreds (as I measured).

.
Final personal observations:
The LuCI interface seems to be the origin of high load on various OpenWrt routers. I looked around this forum and found similar topics from people suffering from the same problem, some of them from a long time ago, with no solution. My intention and focus in this analysis was to understand the origin of the problem and fix it. Beyond that, when I found the cause, I asked myself: "Why? Why do we need to run the interface scripts in parallel?"

Certainly there may be [technically] something wrong happening in a layer below the CGI-BIN script layer [of uhttpd] that causes the load. But again, do we need to run 3 scripts in parallel in the interface? Why 3? Why this number? Probably because three is the default in uhttpd's basic configuration. But for our goal, for the LuCI interface, there's no need for parallel execution. Or is there a reason for it that I don't know about?

I have worked with data networks for decades, and this is similar to a simple data congestion problem, so let's solve it the same way QoS does for data networks, the same way disks do when writing data physically and concurrently: let's queue it to be processed super fast and keep things running smoothly!

LuCI is an amazing interface, really good, beautiful and well done. With this modification, it becomes fast and very light for the router to run.

.
Please make my proposed change on your router and share your feedback here.

Regards,
Rafael Prado


I think something is wrong with the load average on the MT7621, even with the new OpenWrt 21.02.0.
Sometimes it shows almost 0 and most of the time 1.00, while the CPU % doesn't change (~0%).


Edit: Random example after a period of time