[SOLVED] Router (Netgear R7800) introduced latency spikes >100ms

Yes, I have. But I am no longer looking at the last ms: I am looking at the 50..100ms latency spikes twice a minute or so.

I have read all the posts I could find, including that entire thread, and unless I missed anything, people are talking about just an increased latency of 2..3ms, and I have already made my peace with that. I am just puzzled as to why I am the only one experiencing these order-of-magnitude spikes. No one in this thread said they experience them either.

Maybe no one else has a VDSL modem bridged to a R7800...

It may well be that it is something in your package selection, configuration and resource utilisation that every now and then causes a brief CPU or I/O utilisation spike.

To find the possible culprit package combination, you may need to disable all your packages and, if there are then no spikes, start the packages one by one until you find what adds enough load to cause the spikes.

Just for reference, I tested ping to an external site while watching a streamed HD video, and the latency is steadily 5.6-8.2 ms. No spikes for my R7800. (And I have rather similar packages as yours. adblock, luci statistics, sqm, ddns (no actual usage), nlbwmon, but I am not logging anything to USB.)

--- www.tut.fi ping statistics ---
199 packets transmitted, 199 packets received, 0% packet loss
round-trip min/avg/max = 5.588/6.356/8.169 ms
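
Since the summary line only shows min/avg/max, occasional spikes in a long run are easier to count from the per-packet lines. A minimal sketch, assuming the usual `time=... ms` field that ping prints for each reply; the function name `find_spikes`, the 50 ms threshold, and the sample lines are made up for illustration and are not measurements from this thread:

```python
import re

# Matches the "time=6.36 ms" field that ping prints for each reply.
TIME_RE = re.compile(r"time=(\d+(?:\.\d+)?)\s*ms")

def find_spikes(ping_lines, threshold_ms=50.0):
    """Return (rtts, spikes); spikes lists (reply_index, rtt) above the threshold."""
    rtts = []
    for line in ping_lines:
        match = TIME_RE.search(line)
        if match:  # skip headers, summaries and other non-reply lines
            rtts.append(float(match.group(1)))
    spikes = [(i, rtt) for i, rtt in enumerate(rtts) if rtt > threshold_ms]
    return rtts, spikes

# Hypothetical sample input, not real data.
sample = [
    "64 bytes from 8.8.8.8: seq=0 ttl=56 time=6.1 ms",
    "64 bytes from 8.8.8.8: seq=1 ttl=56 time=104.7 ms",
    "64 bytes from 8.8.8.8: seq=2 ttl=56 time=5.9 ms",
]
rtts, spikes = find_spikes(sample)
print(f"{len(rtts)} replies, {len(spikes)} above threshold: {spikes}")
```

Feeding it the saved output of a ping run lasting several minutes gives a spike count that can be compared before and after a configuration change.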

My router is configured with five SSIDs: three for 5GHz (two guest networks) and two for 2.4GHz (one guest network), plus one VLAN on one of the switch ports. All three guest WiFi networks and the VLAN are in the same firewall guest zone. I can easily have at least 15..20 devices connected at any given moment, most of them wireless and mostly idle. Besides that, I have added BCP38 and a few firewall rules for the guest network, as well as a rule to force the use of local DNS only.

Is there a reasonable limit to how many SSIDs and/or connected devices this router (R7800) is expected to handle? Right after a factory reset, there is only one device connected and the spikes are smaller and further apart. I am using your latest master build.

I agree with @jwoods...chasing .001 seconds with pinpoint accuracy will be difficult.

I've watched this thread for quite some time, and I'm surprised it's flowed this far.

  • Your "baseline" was connecting a PC directly to your DSL modem, versus through your R7800. This will obviously add latency.
  • Another note: any software you run on the router to test will add latency.

YES! As @slh mentioned, it's quite normal to see ~1 ms added when attaching a NAT-based router (i.e. a "hop").

Multiple things could be occurring:

  • If you're a gamer...someone could be cheating and crafting/spoofing/generating packets to your WAN port, causing things such as ICMP-Echo-Replies, ICMP-Host/Destination-Unreachable, etc. Many types of these packets cause CPU usage to increase on the router. For an example, see: https://www.cisco.com/c/en/us/about/security-center/ttl-expiry-attack.html (note: this example specifically mitigates Traceroute attacks)
  • Using the Zone/Global Option REJECT instead of DROP will increase CPU usage during an attack, or when dealing with a misconfigured remote host, recently closed TCP sessions (e.g. during speed tests), etc. This is because the router's CPU must create the proper ICMP or RST messages for each REJECT.
  • A high TCP port timeout combined with fast, port-intensive connections (e.g. back-to-back speed tests or P2P, which may benefit from smaller TCP timeouts)
  • TCP connections take more CPU resources than UDP...if you use P2P such as for file sharing, CPU usage on the router will increase, as these TCP connections are built and destroyed in the NAT table.
  • Utilizing nearly the maximum of your connection (e.g. during a speed test) will actually cause a CPU increase, as that's when traffic shaping actually "activates."
  • Being logged into LuCI with Autorefresh ON increases CPU usage.
  • An improperly configured QoS table can cause such latency as well. It's possible you dedicated Priority bandwidth to something that should have less priority.

Yeah, that is how it began and I agree that it is ok. But then I started chasing the 50..100ms latency spikes (at least) a couple of times a minute. They do not happen with the stock firmware nor with a PC directly connected to the modem.

  • As I noted about the directly-connected PC

Regarding the stock firmware:

Many have observed that phenomenon as well and noted it in the forums. I can speculate about many reasons for it, but it is mostly attributable to the fact that software improvements in OSes (i.e. the Linux Kernel) tend to add some CPU lag. I observe this increase even as I upgrade versions of the same OS on PCs. One major known cause of latency in OpenWRT as compared to stock firmware is discussed in depth here: Hardware NAT For LEDE

Yes, I started looking at the wrong number during a 10 second ping test, but once I ran it for minutes I noticed the real issue, which was latency spikes.

No games played when I run the test

Everything is set to DROP

No-one else was using the network

Just 10 ping 8.8.8.8 sessions; no-one else connected.

Learned that quickly and even disabled uhttpd

Using SQM

I read that, but got the impression that the latency increase would be relatively constant and not an order of magnitude. Did I misunderstand it?

Interestingly the max spikes dropped from over 100ms to under 50ms once I switched the CPU scaling governors from ondemand to performance. Everything looks much better now even though not perfect.
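
For reference, a minimal sketch of checking and switching the governor through the standard Linux cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`). It is written in Python purely for illustration; a default OpenWrt image usually does not include Python, and the same files can be read and written from the shell. The helper names `read_governors` and `set_governor` are made up for this example, and writing to the real files requires root:

```python
import glob
import os

def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each scaling_governor file path to the governor it currently holds."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            governors[path] = f.read().strip()
    return governors

def set_governor(name, sysfs_root="/sys/devices/system/cpu"):
    """Write the governor name (e.g. "performance") to every core found."""
    for path in read_governors(sysfs_root):
        with open(path, "w") as f:
            f.write(name)

# Example (as root on the router):
#   print(read_governors())
#   set_governor("performance")
```

A change made this way does not survive a reboot, so it would have to be reapplied at startup if it turns out to help.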

Good catch!

I wonder how that will affect the life of the CPU, since it will now be running at maximum frequency.

This brings back vague memories of working with an underpowered phone running Android on its Linux kernel. There was a lot of tuning of the parameters around the governor required to get "snappy" response without draining the battery or putting the CPU into thermal protection.

It will probably shorten it somewhat, but I have not noticed any difference in the temperature: it is the same 45 Degrees Celsius when the network is not in use. Only time will tell; I am still experimenting.

cpuinfo_transition_latency for this CPU is 100,000 nanoseconds, or 100 microseconds. I am no expert, but five switches back and forth (ten transitions) and 1ms of latency is added...
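
The back-of-the-envelope arithmetic, as a small sketch. It assumes that every frequency transition costs the full reported latency and that the costs simply add up in the packet path, which is a simplification; the function name `added_latency_ms` is made up for illustration:

```python
TRANSITION_LATENCY_NS = 100_000  # cpuinfo_transition_latency quoted above

def added_latency_ms(round_trips, latency_ns=TRANSITION_LATENCY_NS):
    """Latency added by `round_trips` switches up and back down again."""
    transitions = 2 * round_trips  # one switch up plus one switch down
    return transitions * latency_ns / 1_000_000  # ns -> ms

print(added_latency_ms(5))  # 10 transitions * 100,000 ns = 1.0 ms
```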

I don't seem to have cpufreq on my Archer C7, but this link may have some helpful information. Post #4 is a "CPU Governor tuning guide"

https://forum.xda-developers.com/general/general/ref-to-date-guide-cpu-governors-o-t3048957

Edit: Now that the dust is clearing, sampling_rate_min, up_threshold and down_differential are where I'd start.

Yeah, that is too hardcore for me just yet. It is so much simpler to get myself an Archer C7 v2, flash @r00t's build on it, and immediately start regretting the purchase of the R7800 unit. There are no more latency spikes and the router seems to introduce almost no additional latency. It is pretty much like connecting directly to the modem. I just need to set it up as a family router to give it a good long-term test.

The C7 behaved beautifully in my tests, but I decided to give my R7800 another chance, and I seem to have mostly eliminated the spikes as described here: Netgear R7800 exploration (IPQ8065, QCA9984), so I will mark this as solved.

@fantom-x this post specifically refers to people experiencing latency issues with the R7800. Maybe this build can solve your issues completely?: Build for Netgear R7800

Thx, I tried it already and it did not make any difference.

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.