And surprisingly, not all of them are affected. BTW, I just built the latest master and the latency spikes seem to have become lower and further apart...
What about the Zyxel NBG6817? It would be interesting if someone could run some tests on that router since it also has the IPQ8065 and QCA9984.
That's what I find strange. Although there has been talk about different hardware versions, I find it more likely that the changes are just cosmetic. If they changed the internal workings of the router, wouldn't they need to apply for a new FCC certification?
No latency spikes on the nbg6817, the variation between ping times is very small.
Here's a plot of the latency I experience: https://imgur.com/a/1gcx1
I'm currently running snapshot r6469-5862f01.
Care to try this: Netgear R7800 exploration (IPQ8065, QCA9984) - #954 by bouwew
I'll take a look when I have time later today.
About the latency issue: I finally managed to go back to stock (Voxel's .49SF) using the Windows tftp command, and then ran a long ping session of 3000 seconds.
The results are: min ping time: 7ms, max ping time 35ms (until about 2800 secs it was 21ms, between 2800 and 3000 secs it went to 35ms), average ping time 11ms, 1 ping attempt failed.
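For anyone who wants to produce the same min/max/average/failed numbers from a saved ping log, here is a minimal sketch. The function name `summarize_ping` is made up for illustration, and the failure detection assumes the log contains lines such as "Request timed out." (as Windows ping prints); other ping tools may report losses differently.

```python
import re
from statistics import mean

# Matches the "time=11ms" / "time=11.3 ms" / "time<1ms" field of a reply line.
_RTT = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)
# Assumption: lost pings show up as lines containing one of these phrases.
_FAIL = re.compile(r"timed out|unreachable", re.IGNORECASE)


def summarize_ping(lines):
    """Return min/max/avg round-trip time in ms and the number of failed pings."""
    rtts = []
    failed = 0
    for line in lines:
        match = _RTT.search(line)
        if match:
            rtts.append(float(match.group(1)))
        elif _FAIL.search(line):
            failed += 1
    if not rtts:
        return {"min": None, "max": None, "avg": None, "failed": failed}
    return {"min": min(rtts), "max": max(rtts), "avg": mean(rtts), "failed": failed}
```

Feeding it the lines of a saved log, for example `summarize_ping(open("ping.log"))` with a hypothetical file name, gives figures that can be compared directly between firmware versions.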
My hardware is the version with the printing on the antennas (2x antenna 1, antenna 2, antenna 3).
I ran a ping session last night on stock firmware (1.0.0.2.46 downloaded from NETGEAR website). Here are the results: https://imgur.com/a/2Y0VK
The ping session had a duration of just under 7.5 hours. Even on stock firmware I get the occasional spike, but they are much less frequent than when running OpenWrt. I had the stock firmware QoS switched on during this test.
@fantom-x I ended up not having time yesterday to try out your suggestion. I'll see if I can do it today.
I have looked at both of your latency charts, and I am not sure they are less frequent with stock. Both charts look pretty much the same to me...
I ran the ping test much longer on stock (7.5 hours) compared to the previous test. It looks the same because those 7.5 hours have been compressed to the same plot size as the plot from when I ran OpenWrt.
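Since the two plots cover very different durations, one way to compare them without eyeballing the scale is to normalise the spike count by run time. This is only a rough sketch: the function name and the threshold are assumptions, and every sample above the threshold is counted as one spike.

```python
def spikes_per_hour(rtts_ms, duration_s, threshold_ms):
    """Number of samples above threshold_ms, scaled to a one-hour window."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    spikes = sum(1 for rtt in rtts_ms if rtt > threshold_ms)
    return spikes * 3600.0 / duration_s
```

Running it once on the stock capture and once on the OpenWrt capture with the same threshold gives two rates that can be compared regardless of how long each test ran.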
Right, I missed the scale...
Here are the results from my latest ping test: https://imgur.com/a/nKp07
As you can see, the spikes have improved by quite a bit. I set CONFIG_CMDLINE="isolcpus=1" and moved wifi0, wifi1, eth0 and eth1 to CPU1. In addition I set the scaling governor to performance.
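For reference, the IRQ and governor part of this can be sketched roughly as below. It assumes the interrupts show up in /proc/interrupts under the names wifi0, wifi1, eth0 and eth1 (check on your own build), and the helper names are made up. The `smp_affinity` and `scaling_governor` files are the standard kernel interfaces; writing to them requires root.

```python
def cpu_mask(cpu):
    """Hex bitmask with only `cpu` set, the format smp_affinity expects."""
    return format(1 << cpu, "x")


def find_irqs(interrupts_text, names):
    """Map each requested device name to the IRQ numbers listed for it."""
    found = {name: [] for name in names}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header row or named entries such as IPI counters
        # Shared IRQs list several devices separated by commas.
        fields = [field.rstrip(",") for field in rest.split()]
        for name in names:
            if name in fields:
                found[name].append(int(head))
    return found


def pin_irqs(irqs, cpu, proc_root="/proc"):
    """Route each IRQ to a single CPU by writing its affinity mask."""
    mask = cpu_mask(cpu)
    for irq in irqs:
        with open(f"{proc_root}/irq/{irq}/smp_affinity", "w") as f:
            f.write(mask)


def set_governor(cpu, governor="performance", sys_root="/sys"):
    """Select the cpufreq scaling governor for one CPU."""
    path = f"{sys_root}/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor"
    with open(path, "w") as f:
        f.write(governor)
```

A typical use would be to read /proc/interrupts, pass the text to `find_irqs` with the four interface names, and hand the resulting IRQ numbers to `pin_irqs(..., cpu=1)`. The settings do not survive a reboot, so they would have to be reapplied at startup.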
This is just a workaround to the problem, and the root cause needs to be found.
That is really not necessary. It did not make any difference in my testing.
Indeed. Moreover, I wonder what consequences exist when using isolcpus=1?
You are effectively making your CPU single core. No other consequences.
UPDATE: And it allows you to use the other core for some other tasks, like servicing network interrupts, without competing with the kernel and other processes for CPU cycles.
Don't you mean to say that it makes the two processor cores behave in a static way, with no more switching between cores to balance the load?
By making this change, one core is removed from the kernel's scheduler, and the only way to use that core is to explicitly assign tasks to it.
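To illustrate what "explicitly assign" means for a process (as opposed to an IRQ), here is a small sketch using the Linux affinity calls exposed by Python's `os` module. The helper name `pin_to_cpu` is made up, and this assumes Python is available on the device at all, which is not a given on a router.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict `pid` (0 means the calling process) to one CPU.

    Returns the affinity set reported by the kernel afterwards.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

With `isolcpus=1` on the kernel command line, calling `pin_to_cpu(1, pid)` for a chosen process would be one way to place it on the isolated core. The `taskset` utility does the same job from a shell.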
I've been looking at the threads that are running with htop -d1 since it has a much shorter update interval than top. What I noticed was that one kworker thread was consuming between 40 and 80 percent of one CPU core every 2 seconds. This CPU usage spike lasts for a fraction of a second. I used ftrace to check what the kworker thread was doing and got this result: function=gc_worker [nf_conntrack].
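If someone wants to repeat this check, a quick way to see which work functions dominate a captured trace is to tally them. The sketch below assumes the workqueue trace events print the handler as `function=<name>` (as in the result quoted above) or `function <name>`. The exact format depends on the kernel version and on which event is enabled, so treat the pattern as an assumption.

```python
import re
from collections import Counter

# Assumed formats: "function=gc_worker [nf_conntrack]" or "function gc_worker".
_FUNC = re.compile(r"function[= ](\S+)")


def count_work_functions(trace_lines):
    """Count how often each work function name appears in ftrace output lines."""
    counts = Counter()
    for line in trace_lines:
        match = _FUNC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The input would normally be the lines of the tracing buffer, typically /sys/kernel/debug/tracing/trace when debugfs is mounted, and `count_work_functions(lines).most_common(5)` lists the busiest handlers.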
Is this normal behaviour?
For those who may not know: 6 days ago, board-2.bin was updated for QCA9984.
Another random version with no changelog. LOVE IT.