High average load but low CPU usage

My system load never (NEVER!) goes below 1.0 (single core CPU). Previously I was using LEDE and the load average used to be around 0.1 generally. My usage pattern has not changed.

  1. How do I troubleshoot this? I understand that load average is not the same thing as CPU usage. How can I find out what is causing the load?
  2. How do I sort top results by CPU usage and by memory usage? This is busybox top, and neither the -O switch nor SHIFT+m works.

I am using a GL-AR150 router running the manufacturer's own firmware, version 3.0.5.

[Screenshot: busybox top output]

No idea what is "different" with your current device and its firmware (which device and firmware would help).

I find htop (available as a package) a good tool, in general. Doubly so compared to the busybox top applet!

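Load is not only CPU dependent. On Linux, tasks stuck in uninterruptible sleep (usually waiting on I/O) count toward the load average as well, so the cause could also be a slow storage medium or other random I/O.
You'll have to give us some more info about which target you're using and which packages are installed.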

A process that writes data to a storage medium will also cause the load indicators to rise, and so will watching LuCI with auto-refresh on, which causes network load.

Your best bet is probably to disable a bunch of processes (services) one at a time until the 1-minute load indicator drops. It is mostly trial and error.
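
To watch the 1-minute figure while stopping services, the numbers can be read straight from /proc/loadavg. Below is a minimal Python sketch, assuming Python is available on the device (on a small router it often is not, in which case the text can be copied off and parsed elsewhere). The function name `parse_loadavg` is made up for illustration.

```python
from pathlib import Path


def parse_loadavg(text):
    """Split one /proc/loadavg line into named fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    # The fourth field looks like "1/58": runnable tasks / total tasks.
    running, total = tasks.split("/")
    return {
        "load_1m": float(one),
        "load_5m": float(five),
        "load_15m": float(fifteen),
        "runnable": int(running),
        "total_tasks": int(total),
        "last_pid": int(last_pid),
    }


if __name__ == "__main__":
    path = Path("/proc/loadavg")
    if path.exists():  # only present on Linux
        print(parse_loadavg(path.read_text()))
```

Note that the "runnable" count does not include tasks in uninterruptible sleep, even though those tasks do count toward the load averages, so a load near 1.0 with few runnable tasks is a hint to look at process states.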

I've seen something similar on a Linux system, where it was caused by processes being created and dying in rapid succession. That can push the loadavg up while CPU usage looks low, because top does not include the statistics of those dead processes. htop could capture that and show you the exact process causing the problem.

Also, from the screenshot, it seems there are two newly created sleep processes, spawned by mwan3track and gl_monitor. Try stopping one of them and see whether the loadavg goes down.
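
One rough way to test the "processes created and dying quickly" theory without htop is the `processes` counter in /proc/stat, which counts forks since boot. Sampling it twice gives an approximate fork rate. This is only a sketch: the helper names `forks_since_boot` and `fork_rate` are invented here, and what counts as a "high" rate depends on the system.

```python
import time
from pathlib import Path


def forks_since_boot(stat_text):
    """Return the 'processes' counter (forks since boot) from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def fork_rate(before, after, seconds):
    """Average number of new processes per second between two samples."""
    return (after - before) / seconds


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.exists():  # only present on Linux
        interval = 1.0  # seconds between the two samples
        first = forks_since_boot(stat.read_text())
        time.sleep(interval)
        second = forks_since_boot(stat.read_text())
        print("forks per second:", fork_rate(first, second, interval))
```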

Looks like you are running GL.iNet firmware. If you are having problems with that firmware, why not run the clean version and report back?

I've had problems with mwan3 doing weird stuff. Try stopping it and see if that helps. If it fixes the problem, you can try manually copying the newest version from git. There have been a lot of fixes and changes compared to the version in 18.06.1.

I actually solved the issue. Please read on.

Thank you @jeff for your comment. The OpenWrt forum didn't send me a notification email about your reply, so I thought nobody had replied. I only got notified about the new replies today.

@Timeless Yes, that's what I did.

@rtau How did you know which parent processes created the sleep processes? I don't have detailed knowledge. So, I want to learn.

@sammo Yes, I am using a GL.iNet GL-AR150. The current firmware is in the testing stage, and there are quite a few optimization problems IMHO.

@Per Thank you for your comment, but it was not a mwan3 issue in my case.

Solution:
GL.iNet adds some of their own software on top of OpenWrt for their admin panel. An init script (/etc/rc.d/S99gl_tertf) loads a module named ter-traffic after boot, and it showed up in the top output with process state D (uninterruptible sleep). (You can see the tertf process in the top screenshot in my question.) The D flag raised my suspicion, so I unloaded the module: rmmod ter-traffic. The load eventually decreased to about 0.2.

I was told later that it is a traffic statistics module for each client. I don't have the source code, so I don't know why the module was in D state or whether it would have affected my router's performance, but I don't need to see client statistics. So I have permanently removed the init.d script.
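
As a general way to spot this kind of culprit without relying on top, every process's state letter is in /proc/<pid>/stat, so the D-state ones can be listed directly. Here is a minimal Python sketch, assuming Python is available on the box. The helper names (`parse_stat`, `tasks_in_state`, `read_all_stats`) are invented for illustration, and a single scan only catches processes that are in D state at that instant, so it is worth repeating a few times. Brief D states are normal; a process that is always there is the suspicious one.

```python
from pathlib import Path


def parse_stat(text):
    """Parse /proc/<pid>/stat text into (pid, name, state, ppid)."""
    # The name sits in parentheses and may itself contain spaces or
    # parentheses, so locate the first "(" and the last ")".
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    name = text[open_paren + 1:close_paren]
    rest = text[close_paren + 1:].split()
    return pid, name, rest[0], int(rest[1])


def tasks_in_state(stat_texts, wanted="D"):
    """Return (pid, name) for every process whose state letter matches."""
    result = []
    for text in stat_texts:
        pid, name, state, _ppid = parse_stat(text)
        if state == wanted:
            result.append((pid, name))
    return result


def read_all_stats(proc=Path("/proc")):
    """Read the stat file of every numeric directory under /proc."""
    texts = []
    for entry in proc.iterdir():
        if entry.name.isdigit():
            try:
                texts.append((entry / "stat").read_text())
            except OSError:
                pass  # the process exited while we were scanning
    return texts


if __name__ == "__main__":
    if Path("/proc/self/stat").exists():  # only present on Linux
        for pid, name in tasks_in_state(read_all_stats()):
            print(pid, name)
```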

In the future, while seeking help, please inform the community that you are not using the OpenWrt firmware.

Thank you @lleachii. I will remember that. I edited my current post to include the info.

I wanted to know how to troubleshoot load issues in general, i.e., whether there is a general way to find out which process is causing the load even when it is not using much CPU, and how to sort the busybox top output. That is why I omitted the custom firmware part.
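
For the sorting part, one rough workaround that does not depend on top's key bindings is to read the memory figures from /proc/<pid>/status and sort them yourself. The sketch below covers memory only (sorting by CPU would need two samples of CPU time and a difference), and it assumes Python is available. The names `parse_status` and `sort_by_rss` are made up for this example.

```python
from pathlib import Path


def parse_status(text):
    """Extract (name, resident memory in kB) from /proc/<pid>/status text."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            # The line looks like "VmRSS:     1200 kB".
            rss_kb = int(value.split()[0])
    # Kernel threads have no VmRSS line and are reported as 0 kB.
    return name, rss_kb


def sort_by_rss(status_texts):
    """Return (name, rss_kb) pairs, largest resident memory first."""
    return sorted(
        (parse_status(text) for text in status_texts),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    proc = Path("/proc")
    if (proc / "self" / "status").exists():  # only present on Linux
        texts = []
        for entry in proc.iterdir():
            if entry.name.isdigit():
                try:
                    texts.append((entry / "status").read_text())
                except OSError:
                    pass  # the process exited while we were scanning
        for name, rss_kb in sort_by_rss(texts)[:10]:
            print(f"{rss_kb:>8} kB  {name}")
```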

From your screenshot, the PPID column shows the parent process ID. For the meaning of other columns, please refer to the manpage.
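
The same parent lookup can be done from /proc, where the fourth field of /proc/<pid>/stat is the PPID. Below is a small Python sketch of the idea, assuming Python is available. `parse_stat` and `parents_of` are hypothetical helper names, and the PIDs in the usage comment are made up, not taken from the screenshot.

```python
from pathlib import Path


def parse_stat(text):
    """Parse /proc/<pid>/stat text into (pid, name, ppid)."""
    # The name is wrapped in parentheses and may contain spaces.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    name = text[open_paren + 1:close_paren]
    rest = text[close_paren + 1:].split()
    return pid, name, int(rest[1])  # rest[0] is the state, rest[1] the PPID


def parents_of(stat_texts, child_name):
    """Return sorted (pid, ppid, parent name) for processes named child_name."""
    table = {}
    for text in stat_texts:
        pid, name, ppid = parse_stat(text)
        table[pid] = (name, ppid)
    result = []
    for pid, (name, ppid) in table.items():
        if name == child_name:
            parent_name = table.get(ppid, ("?", 0))[0]
            result.append((pid, ppid, parent_name))
    return sorted(result)


if __name__ == "__main__":
    proc = Path("/proc")
    if (proc / "self" / "stat").exists():  # only present on Linux
        texts = []
        for entry in proc.iterdir():
            if entry.name.isdigit():
                try:
                    texts.append((entry / "stat").read_text())
                except OSError:
                    pass  # the process exited while we were scanning
        # Example: who started the "sleep" processes?
        for pid, ppid, parent in parents_of(texts, "sleep"):
            print(f"sleep pid={pid} parent pid={ppid} ({parent})")
```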

Thank you Raymond 🙂