Netgear R7800 exploration (IPQ8065, QCA9984)

fantom-x · March 18, 2018, 12:57pm

And surprisingly, not all of them are affected. BTW, I just built the latest master and the latency spikes seem to have become lower and further apart...

huaracheguarache · March 18, 2018, 1:12pm

What about the Zyxel NBG6817? It would be interesting if someone could run some tests on that router since it also has the IPQ8065 and QCA9984.

That's what I find strange. Although there has been talk about different hardware versions I find it more likely that the changes are just cosmetic. If they changed the internal workings of the router wouldn't they need to apply for a new FCC certification?

slh · March 18, 2018, 3:08pm

No latency spikes on the nbg6817, the variation between ping times is very small.

huaracheguarache · March 18, 2018, 3:09pm

Here's a plot of the latency I experience: https://imgur.com/a/1gcx1

I'm currently running snapshot r6469-5862f01.

fantom-x · March 18, 2018, 3:21pm

Care to try this: Netgear R7800 exploration (IPQ8065, QCA9984) - #954 by bouwew

huaracheguarache · March 18, 2018, 3:28pm

I'll take a look when I have time later today.

bouwew · March 19, 2018, 9:40am

About the latency issue: I finally managed to go back to stock (Voxel's .49SF) using the Windows tftp command, and then ran a long ping session for 3000 seconds.
The results are: min ping time 7ms, max ping time 35ms (until about 2800 secs it was 21ms, between 2800 and 3000 secs it went to 35ms), average ping time 11ms, 1 ping attempt failed.
My hardware is the version with the printing on the antennas (2x antenna 1, antenna 2, antenna 3).

huaracheguarache · March 19, 2018, 10:02am

I ran a ping session last night on stock firmware (1.0.0.2.46 downloaded from NETGEAR website). Here are the results: https://imgur.com/a/2Y0VK

The ping session had a duration of just under 7.5 hours. Even on stock firmware I get the occasional spike, but they are much less frequent than when running OpenWrt. I had the stock firmware QoS switched on during this test.

@fantom-x I ended up not having time yesterday to try out your suggestion. I'll see if I can do it today.

fantom-x · March 19, 2018, 11:52am

I have looked at both of your latency charts, and I am not sure the spikes are less frequent with stock. Both charts look pretty much the same to me...

huaracheguarache · March 19, 2018, 11:56am

I ran the ping test much longer on stock (7.5 hours) compared to the previous test. It looks the same because those 7.5 hours have been compressed to the same plot size as the plot from when I ran OpenWrt.
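One way to compare runs of different length without relying on the plot scale is to count spikes per hour. This is only a minimal Python sketch: it assumes the ping output was saved as text in the usual `time=... ms` format with one ping per second, and the 50 ms threshold and the function names are my own illustrative choices, not something taken from the tests above.

```python
import re

# Matches the round-trip time in lines such as
# "64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=11.2 ms"
RTT_RE = re.compile(r"time=([\d.]+)\s*ms")


def parse_rtts(ping_output):
    """Return the round-trip times (ms) found in saved ping output."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def spikes_per_hour(rtts, threshold_ms=50.0, interval_s=1.0):
    """Count samples above threshold_ms, normalised to one hour of pinging.

    Assumes one reply per interval_s seconds; lost replies are not counted.
    """
    if not rtts:
        return 0.0
    duration_h = len(rtts) * interval_s / 3600.0
    spikes = sum(1 for rtt in rtts if rtt > threshold_ms)
    return spikes / duration_h
```

Feeding the OpenWrt log and the 7.5 hour stock log through `spikes_per_hour(parse_rtts(text))` would give two numbers on the same scale, whatever the plot width.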

fantom-x · March 19, 2018, 12:05pm

Right, I missed the scale...

huaracheguarache · March 20, 2018, 7:40am

Here are the results from my latest ping test: https://imgur.com/a/nKp07

As you can see, the spikes have improved by quite a bit. I set CONFIG_CMDLINE="isolcpus=1" and moved wifi0, wifi1, eth0 and eth1 to CPU1. In addition I set the scaling governor to performance.
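For anyone who wants to reproduce the affinity and governor part, here is a rough Python sketch of the idea. It assumes the interrupts show up in `/proc/interrupts` under the names you pass in (the actual names on your build may differ, so check the file first) and that it is run as root on the router. The helper names are hypothetical; only the `/proc/irq/<n>/smp_affinity` and `scaling_governor` paths are standard kernel interfaces.

```python
import re


def cpu_mask(cpu):
    """Hex bitmask for a single CPU, as /proc/irq/<n>/smp_affinity expects."""
    return format(1 << cpu, "x")


def find_irqs(interrupts_text, names):
    """Map each requested device name to the IRQ numbers listed for it.

    interrupts_text is the content of /proc/interrupts. A line matches when
    one of its whitespace-separated fields equals the name (a trailing
    comma on the field is ignored).
    """
    found = {name: [] for name in names}
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)
        if not m:
            continue  # header line or named rows such as "IPI0:"
        fields = [f.rstrip(",") for f in line.split()]
        for name in names:
            if name in fields[1:]:
                found[name].append(int(m.group(1)))
    return found


def apply_affinity(irqs, cpu):
    """Write the mask for `cpu` to each IRQ (needs root on the router)."""
    for irq in irqs:
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(cpu_mask(cpu))


def set_governor(cpu, governor="performance"):
    """Select the cpufreq governor for one CPU via sysfs (needs root)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"
    with open(path, "w") as f:
        f.write(governor)
```

CPU1 corresponds to mask `2`, so `apply_affinity(irqs, 1)` writes `2` to each IRQ's `smp_affinity` file. The settings do not survive a reboot, so they would have to be reapplied at startup.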

This is just a workaround for the problem, and the root cause still needs to be found.

fantom-x · March 20, 2018, 11:25am

That is really not necessary. It did not make any difference in my testing.

draigun · March 20, 2018, 7:18pm

Indeed. Moreover, I wonder what consequences exist when using isolcpus=1?

fantom-x · March 20, 2018, 10:03pm

You are effectively making your CPU single core. No other consequences.

UPDATE: And it allows you to use the other core for some other tasks, like servicing network interrupts, without competing with the kernel and other processes for CPU cycles.

bouwew · March 21, 2018, 6:55pm

Don't you mean to say that it makes the 2 processor cores behave in a static way, with no more switching between cores to balance the load?

fantom-x · March 21, 2018, 7:02pm

By making this change, one core is removed from the kernel’s scheduler and the only way to use that core is to explicitly assign tasks to it.
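To illustrate what "explicitly assign" means for an ordinary process, here is a small Python sketch using the standard `os.sched_setaffinity` call (Linux only). The `pin_to_cpu` name is just for illustration, and the choice of CPU 1 in the usage note assumes the `isolcpus=1` setup discussed above.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (0 = the caller) to a single CPU.

    Returns the affinity set the kernel reports afterwards.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

On a box booted with `isolcpus=1`, calling `pin_to_cpu(1)` would move the calling process onto the isolated core, which is roughly what `taskset -c 1 <command>` does from the shell.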

huaracheguarache · March 21, 2018, 10:04pm

I've been looking at the threads that are running with htop -d1 since it has a much shorter update interval than top. What I noticed was that one kworker thread was consuming between 40 and 80 percent of one CPU core every 2 seconds. This CPU usage spike lasts for a fraction of a second. I used ftrace to check what the kworker thread was doing and got this result: function=gc_worker [nf_conntrack].

Is this normal behaviour?

draigun · March 22, 2018, 5:57pm

For those who may not know: 6 days ago, board-2.bin was updated for QCA9984.

Ansuel · March 22, 2018, 7:32pm

Another random version with no changelog, LOVE IT