Possible cause of R7800 latency issues

Just wanted to briefly say: to all attempting to solve this issue, it is appreciated. I'm unfortunately not in a position to play around with or test my R7800 (in fact, I'm on stock due to the ping issues, and other issues with netlink bandwidth monitor).

For what it's worth, my unit has 'Antenna #' printed on the antennas, and I also experience ping spikes. After switching over to stock, as expected, the latency issues go away.

Not as detailed and thorough as some of your posts here, but have a look at my own evidence in this post here.

As far as I can remember, the kernel log is the same as hnyman's when booting, so same flash (Micron), RAM (Nanya), CPU stepping/revision, etc.

It truly is very strange that some users don't experience this issue... So maybe it has something to do with the connection between a certain modem and the R7800's WAN port. Maybe there is something odd in the negotiation between the QCA8337 and the respective user's modem (does that chip even control the WAN port?). I'm using an SB8200 as the modem, so it's a Broadcom BCM3390Z.

Here is a proper answer: http://devarea.com/understanding-linux-kernel-preemption/. Preemption is about a low-priority thread not blocking a higher-priority one. So it makes sense that voluntary preemption has improved latency, but a fully preemptive kernel is the best. It's not real-time, though.

On embedded systems with soft real-time requirements it is best practice to use this option, but on a server system, where we usually work asynchronously, the first option is better: fewer context switches means more CPU time.
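For reference, the preemption models discussed in that article correspond to these mutually exclusive kernel config symbols (names taken from mainline Kconfig; in OpenWrt they would be set via the kernel configuration):

```
# Kernel type -> Preemption Model: exactly one of these is set.
# "No Forced Preemption (Server)" -- fewest context switches:
# CONFIG_PREEMPT_NONE=y
# "Voluntary Kernel Preemption (Desktop)" -- explicit preemption points:
CONFIG_PREEMPT_VOLUNTARY=y
# "Preemptible Kernel (Low-Latency Desktop)" -- fully preemptive:
# CONFIG_PREEMPT=y
```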

The default kernel configuration is definitely wrong. It is unfortunate that the best option does not boot.

We need to find out why it doesn't boot... This should be the stock setting for a stock image.

I have just realized that I used @hnyman's build with his patches. I will try building plain vanilla firmware now: there is a chance of a conflict...

No such luck: the firmware built from master does not boot either. Anyone here with serial access willing to lend a hand? I can share pre-built images...

This is what I am getting when pinging 8.8.8.8 with a voluntary preemption kernel and performance governor.

[ping latency plot]

How does it look with the default cpu governor?

Hey guys, some data points here:

Following up on what huaracheguarache suspected as the root cause of the latency spikes, that is, a kworker thread hogging 40%+ CPU every 2-3 seconds, correlating in time with the observed spikes, I found the following:

On an R7500v2, based on the same ipq806x platform, there is one kernel thread exhibiting exactly the same issue: 10h18:14 of total CPU time consumed by a kworker thread on a router that has been up for 10 days 15 hours.
The kernel version is:

root@X4:~# uname -a
Linux X4 4.9.87 #0 SMP Tue Mar 20 20:45:27 2018 armv7l GNU/Linux

On an f9k1115v2 with a recent kernel (ar71xx) that has run for 7 days 13 hours, a similar kworker thread consumed 1h23:00 of CPU time:

root@f9k1115v2:~# uname -a
Linux f9k1115v2 4.9.87 #0 Sat Mar 17 23:59:18 2018 mips GNU/Linux

However, on a 4.4.x kernel for an n150r that has been up for 130+ days, a kworker thread consumed only 58:49.64 of CPU time.

Similarly, for a wrt1900 v1 with a 4.4.x kernel that has also been up for 130+ days, there are three kworker threads, with 0:20.62, 0:10.57, and 0:47.13 of CPU time consumed respectively.

TLDR:
This seems to indicate that, in general, the 4.4.x -> 4.9.x switch introduced some kernel work that hogs the CPU more in 4.9.x than in 4.4.x. I also observe CPU spikes in one of the kworker threads on the 4.4.x kernel, but they are less frequent and the max CPU% I've seen is only ~8%.

Evidence: on 4.4.x kernels on the ar71xx and mvebu platforms, the total kworker thread CPU time is less than 2 hours for two devices that have been up for 130+ days.

On 4.9.x kernels on the ar71xx and ipq806x platforms, the total kworker thread CPU time is 1h23min over 7 days 13 hours and 10h18min over 10 days 15 hours respectively, which is a substantially larger portion. In addition, on both of these devices I observe a kworker thread hogging 10-30% (ar71xx) or 40-80% (ipq806x) of the CPU every 2-3 seconds.
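To put these numbers in proportion, here is a quick back-of-the-envelope calculation using the uptimes and CPU times quoted above:

```python
# kworker CPU time as a fraction of uptime, from the numbers in this thread

def frac(cpu_s, up_s):
    """Return CPU time as a percentage of uptime."""
    return 100.0 * cpu_s / up_s

# 4.9.x ipq806x (r7500v2): 10h18:14 over 10 days 15 hours
ipq = frac(10 * 3600 + 18 * 60 + 14, 10 * 86400 + 15 * 3600)
# 4.9.x ar71xx (f9k1115v2): 1h23:00 over 7 days 13 hours
ar71 = frac(1 * 3600 + 23 * 60, 7 * 86400 + 13 * 3600)
# 4.4.x (n150r): 58:49.64 over 130 days (using 130 days as a lower bound)
n150 = frac(58 * 60 + 49.64, 130 * 86400)

print(f"ipq806x 4.9.x: {ipq:.2f}% of uptime")   # 4.04%
print(f"ar71xx  4.9.x: {ar71:.2f}% of uptime")  # 0.76%
print(f"n150r   4.4.x: {n150:.3f}% of uptime")  # 0.031%
```

So the 4.9.x devices spend roughly 25-100x more of their uptime in kworker threads than the 4.4.x device, which matches the subjective observation.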


Not sure if anyone in this thread has already tried this:

Kernel 4.4 support for ipq806x was removed by this commit not long ago:
https://github.com/openwrt/openwrt/commit/3a3564ead5e4cf2f6ff73302c1e680b5575079ec

Perhaps you guys can try building with the 4.4.x kernel and see if there's any difference for the r7800.

The fully preemptive kernel boots fine here on a wrt1200.
First thing I noticed: boot time has noticeably decreased.
Btw, when you compile your own firmware, don't use O3 optimization.
I noticed that on my device O3 noticeably increases boot time,
and I guess it also slows down other things.
I always stick to O2.

Where do I change that?

In menuconfig
Under Advanced Configurations Options -> Target Options -> Target Optimizations
On my device default is:

-Os -pipe -mcpu=cortex-a9 -mfpu=vfpv3-d16

So I changed that to:

-O2 -pipe -mcpu=cortex-a9 -mfpu=vfp3

Os optimizes for size.
Change that to O2 to optimize for performance.

There was also a discussion on IRC about whether it is better to copy the target options over to the additional compiler options, so that those optimizations get used more throughout the build process.
You can find "Additional compiler options" directly under the advanced configuration options.
The default here on my device is: -fno-caller-saves -fno-plt
So you would combine both:
-O2 -pipe -mcpu=cortex-a9 -mfpu=vfp3 -fno-caller-saves -fno-plt
Enter that under Advanced configuration options -> Additional compiler options,
and clear the target optimizations in Advanced configuration options -> Target Options -> Target Optimizations.


Now the image is too big at 2231183 Bytes = 2178.89 kB = 2.13 MB. What could I safely remove from it? I have not added any USB modules, etc.

2.13 MB actually looks quite small?

And the R7800 has 128 MB flash? With around 24 MB or so actually usable,
like on the wrt* devices.
My image is roughly 8 MB.

I think 2 MB is the size of the kernel partition. I have found steps to extend it to 4 MB, but I'm not sure I could revert back to stock if I did so.

So the r7800 has only a 2 MB kernel partition?
wrt* has a 40 MB kernel partition.
//edit
it is 6 MB

Hmm, that's odd.
Before you mess around with removing things,
I would recommend undoing the changes and leaving it at Os.

Hi guys,

I compiled a new firmware with strace and ftrace, plus 1000hz timer clock + voluntary preemption for a r7500v2. I disabled firewall and unloaded all nf_conntrack modules.

Here's what I found by correlating the high CPU usage (40-80%) that appears in one kworker thread every 2-3 seconds. Previously, the OP was able to correlate the high CPU usage with the latency spikes he observed. Here's the log:

 kworker/0:1-25    [000] d.s.   595.265983: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
 kworker/0:1-25    [000] d...   597.067342: workqueue_queue_work: work struct=cde0aa78 function=vmstat_update workqueue=cd9f4700 req_cpu=0 cpu=0
 kworker/0:1-25    [000] d...   597.067353: workqueue_queue_work: work struct=cde18a78 function=vmstat_update workqueue=cd9f4700 req_cpu=1 cpu=1
 kworker/0:1-25    [000] dns.   597.428010: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
 kworker/0:1-25    [000] d.s.   599.604000: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
 kworker/0:1-25    [000] d...   600.043072: workqueue_queue_work: work struct=cde18a78 function=vmstat_update workqueue=cd9f4700 req_cpu=1 cpu=1
 kworker/0:1-25    [000] d.s.   601.708078: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
 kworker/0:1-25    [000] d...   602.077568: workqueue_queue_work: work struct=cde0aa78 function=vmstat_update workqueue=cd9f4700 req_cpu=0 cpu=0

Basically I have two SSH sessions open: one running htop -d1, the other running

cat /sys/kernel/debug/tracing/trace_pipe | grep kworker

The cpu usage spike for the kworker thread happens at the same time as the netdev_trig_work / vmstat_update function call.
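To quantify which work items dominate, here is a small (hypothetical) parser over trace_pipe-style output; the sample lines are copied from the log above:

```python
import re
from collections import Counter

# Sample lines in the format of the workqueue_queue_work trace_pipe output above
trace = """\
 kworker/0:1-25    [000] d.s.   595.265983: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
 kworker/0:1-25    [000] d...   597.067342: workqueue_queue_work: work struct=cde0aa78 function=vmstat_update workqueue=cd9f4700 req_cpu=0 cpu=0
 kworker/0:1-25    [000] d...   597.067353: workqueue_queue_work: work struct=cde18a78 function=vmstat_update workqueue=cd9f4700 req_cpu=1 cpu=1
 kworker/0:1-25    [000] dns.   597.428010: workqueue_queue_work: work struct=cd131484 function=netdev_trig_work workqueue=cd880100 req_cpu=4 cpu=0
"""

# Count how often each work function gets queued
counts = Counter(re.findall(r"function=(\w+)", trace))
for func, n in counts.most_common():
    print(func, n)
```

Run against a longer capture from /sys/kernel/debug/tracing/trace_pipe, this makes it easy to see whether netdev_trig_work or vmstat_update dominates the queueing.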

I just tested the TP-Link Archer C7 v2:

[ping latency plots: archerc7, archerc72]


I ran two 30-minute sessions and got only isolated spikes of under 15 ms latency. The router was connected to two laptops: one acting as the ping target on the WAN port, and the other sending the pings from a LAN ethernet port. A ping session with the laptops connected directly to each other produced no spikes:

[ping latency plot: laptops connected directly]
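As an aside, counting spikes in a ping session can be sketched with a short script (hypothetical helper; the sample output and the 10 ms threshold are illustrative assumptions, not from the tests above):

```python
import re

# Sample output in the format of Linux `ping` (assumed format)
ping_output = """\
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.412 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.398 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=14.2 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.405 ms
"""

# Extract latencies and flag anything above a spike threshold
latencies = [float(t) for t in re.findall(r"time=([\d.]+) ms", ping_output)]
spikes = [t for t in latencies if t > 10.0]
print(f"{len(latencies)} samples, {len(spikes)} spike(s) over 10 ms, "
      f"max {max(latencies)} ms")
```

Piping a real session through it (ping -c 1800 ... | python3 spikes.py with stdin instead of the sample string) would give the same spike counts the plots show.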


I took a look at what htop -d1 produced and kworker activity was much lower compared to the R7800. Every ~2 seconds a kworker briefly used 5-20% of the CPU. On the R7800 this activity uses about 40-80% of one CPU core during those brief spikes.

In a previous post I tracked these kworker spikes to gc_worker [nf_conntrack]. Someone needs to take a look at why gc_worker is producing these spikes.

Hmm, so it's not gc_worker [nf_conntrack] then?

I think it's likely that the CPU spike has nothing to do with nf_conntrack. If I load the nf_conntrack module, the trace pipe is simply filled with nf_conntrack logs like this:

[screenshot: trace_pipe filled with nf_conntrack entries]

Btw, I also did some pings with a kernel that uses a 1000 Hz timer clock + voluntary preemption. Even in the presence of those high CPU usage spikes, the latency plot is much cleaner compared to the previous plots I produced on the same router.

[latency plot: Figure_1]

The latency plot with nf_conntrack module loaded looks about the same.

Also, I checked that netdev_trig_work has something to do with the LED trigger. From memory, there was some work done on the ipq806x platform LEDs, and they are also only partially working. Could this be a cause of the high CPU usage?


Btw, I have no hard numbers to back up my claim, but it seems like setting the timer clock to 1000 Hz plus voluntary preemption (maybe only one of them is sufficient) makes webpage loading feel much snappier.

From now on I think all my custom firmware will be built with these settings.
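For anyone who wants to build with the same settings, they should roughly correspond to these kernel config symbols (assuming mainline Kconfig names; set via the kernel configuration):

```
# Kernel type -> Timer frequency: 1000 Hz tick
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# Kernel type -> Preemption Model: voluntary preemption
CONFIG_PREEMPT_VOLUNTARY=y
```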
