[SOLVED] Router (Netgear R7800) introduced latency spikes >100ms

UPDATE: While troubleshooting the latency issue, I started out looking at the wrong value. The real problem turned out not to be the additional 1ms of latency but recurring latency spikes from 11ms to 50..100ms a few times a minute. The thread evolved accordingly, so I updated the subject.

Hi, I have just tried the "ping 8.8.8.8" test, first with my computer connected directly to a VDSL modem (bridged) and then through a router (Netgear R7800 running LEDE 17.01.4). Without the router the ping times are ~10.3ms and all stay within the 10.1 to 10.5ms range over a few minutes of monitoring.
Once I connect via the router (wired, of course), the ping times jump by about 1ms and hover between 11.2 and 11.5ms most of the time, with a spike to 50 to 100ms two or three times a minute.
Tried with and without SQM and did not see a big difference.

Is that normal for a router to add 1ms latency?

What can be causing those odd ping spikes to 50 or 100ms or more a few times a minute? They do not happen when connected to the modem directly.


If you have the VDSL modem bridged to the R7800, and the firewall is turned on for both, that would introduce latency.

The firewall is only turned on on the router; the modem is in bridged mode. This is not a double NAT setup. I just did not expect 1ms to be added by the router.

I would look at the firewall and network configs, as well as how many tasks are running on the R7800.

1ms is pretty small.

What exactly should I be looking for? Just the number of rules? The only rule is to make everyone use the router's DNS.

SSH in to the router and run the following...

cat /etc/config/network

cat /etc/config/firewall

Post the results.

We are talking about ONE ms more; I don't see the problem.

There isn't one...the OP had a question.

Right, there is no problem. I just was not expecting the router to consistently add 1ms of latency. Below are the contents of /etc/config/firewall, which I have modified very little.

config defaults
	option syn_flood '1'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'DROP'
	option drop_invalid '1'

config zone
	option name 'lan'
	option input 'ACCEPT'
	option output 'ACCEPT'
	option forward 'ACCEPT'
	option network 'lan'

config zone
	option name 'wan'
	option output 'ACCEPT'
	option masq '1'
	option mtu_fix '1'
	option input 'DROP'
	option forward 'DROP'
	option network 'wan wan6'

config forwarding
	option src 'lan'
	option dest 'wan'

config rule
	option target 'ACCEPT'
	option proto 'tcp udp'
	option name 'Guest DNS'
	option dest_port '53'
	option src 'guest'

config rule
	option target 'ACCEPT'
	option proto 'udp'
	option name 'Guest DHCP'
	option src 'guest'
	option dest_port '67-68'

config include
	option path '/etc/firewall.user'

config zone
	option name 'guest'
	option output 'ACCEPT'
	option input 'DROP'
	option forward 'DROP'
	option network 'pluto'

config forwarding
	option dest 'wan'
	option src 'guest'

config redirect 'dns_override_lan'
	option name 'DNS Override (lan)'
	option src 'lan'
	option proto 'tcp udp'
	option src_dport '53'
	option dest_port '53'
	option target 'DNAT'

config redirect 'dns_override_guest'
	option name 'DNS Override (guest)'
	option src 'guest'
	option proto 'tcp udp'
	option src_dport '53'
	option dest_port '53'
	option target 'DNAT'

config rule
	option target 'ACCEPT'
	option src 'guest'
	option name 'Printer'
	option dest_ip '192.168.1.100'
	option dest 'lan'
	option proto 'tcp udp'

config include 'bcp38'
	option type 'script'
	option path '/usr/lib/bcp38/run.sh'
	option family 'IPv4'
	option reload '1'

Every (active) hop will add a tiny delay, most of them outside of your control.


Just guessing here but perhaps the softirq timer is about 1ms?

Question, does the spike also happen with SQM?

Yes, it does. With SQM on or off, the latency spikes look like the output below: frequent, significant, but isolated. Those spikes do not happen when I connect directly to the modem. My router is a Netgear R7800, and one would expect it to have enough muscle to run smoothly. I am not using WiFi for these tests; everything is wired.

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=60 time=11.3 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=60 time=11.3 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=60 time=68.8 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=10 ttl=60 time=11.0 ms
64 bytes from 8.8.8.8: icmp_seq=11 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=12 ttl=60 time=11.0 ms
64 bytes from 8.8.8.8: icmp_seq=13 ttl=60 time=11.0 ms
64 bytes from 8.8.8.8: icmp_seq=14 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=60 time=11.3 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=60 time=11.0 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=60 time=50.2 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=60 time=20.0 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=25 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=26 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=27 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=28 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=29 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=30 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=31 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=32 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=33 ttl=60 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=60 time=11.2 ms
64 bytes from 8.8.8.8: icmp_seq=35 ttl=60 time=11.2 ms

And I get this when connected directly to the modem, which is in the same bridged mode. I am using the same NIC on my PC, but a different cable. Not a single spike over 11ms in ~5 minutes of observation.

PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=10 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=11 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=12 ttl=61 time=10.6 ms
64 bytes from 8.8.8.8: icmp_seq=13 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=14 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=15 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=16 ttl=61 time=10.6 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=25 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=26 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=27 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=28 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=29 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=30 ttl=61 time=10.4 ms
64 bytes from 8.8.8.8: icmp_seq=31 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=32 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=33 ttl=61 time=10.6 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=61 time=10.3 ms
64 bytes from 8.8.8.8: icmp_seq=35 ttl=61 time=10.4 ms

And one last test from the router: all the additional latency and spikes seem to come from the router.

With SQM

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=61 time=10.983 ms
64 bytes from 8.8.8.8: seq=1 ttl=61 time=10.877 ms
64 bytes from 8.8.8.8: seq=2 ttl=61 time=11.055 ms
64 bytes from 8.8.8.8: seq=3 ttl=61 time=11.302 ms
64 bytes from 8.8.8.8: seq=4 ttl=61 time=11.187 ms
64 bytes from 8.8.8.8: seq=5 ttl=61 time=12.466 ms
64 bytes from 8.8.8.8: seq=6 ttl=61 time=10.777 ms
64 bytes from 8.8.8.8: seq=7 ttl=61 time=79.327 ms
64 bytes from 8.8.8.8: seq=8 ttl=61 time=11.278 ms
64 bytes from 8.8.8.8: seq=9 ttl=61 time=10.936 ms
64 bytes from 8.8.8.8: seq=10 ttl=61 time=11.018 ms
64 bytes from 8.8.8.8: seq=11 ttl=61 time=11.060 ms
64 bytes from 8.8.8.8: seq=12 ttl=61 time=10.963 ms
64 bytes from 8.8.8.8: seq=13 ttl=61 time=11.082 ms
64 bytes from 8.8.8.8: seq=14 ttl=61 time=11.068 ms
64 bytes from 8.8.8.8: seq=15 ttl=61 time=10.974 ms
64 bytes from 8.8.8.8: seq=16 ttl=61 time=10.868 ms
64 bytes from 8.8.8.8: seq=17 ttl=61 time=11.123 ms
64 bytes from 8.8.8.8: seq=18 ttl=61 time=11.128 ms
64 bytes from 8.8.8.8: seq=19 ttl=61 time=11.003 ms
64 bytes from 8.8.8.8: seq=20 ttl=61 time=10.973 ms
64 bytes from 8.8.8.8: seq=21 ttl=61 time=40.033 ms
64 bytes from 8.8.8.8: seq=22 ttl=61 time=11.011 ms
64 bytes from 8.8.8.8: seq=23 ttl=61 time=10.994 ms
64 bytes from 8.8.8.8: seq=24 ttl=61 time=11.063 ms
64 bytes from 8.8.8.8: seq=25 ttl=61 time=11.109 ms
64 bytes from 8.8.8.8: seq=26 ttl=61 time=10.793 ms

Without SQM

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=61 time=11.105 ms
64 bytes from 8.8.8.8: seq=1 ttl=61 time=10.959 ms
64 bytes from 8.8.8.8: seq=2 ttl=61 time=11.119 ms
64 bytes from 8.8.8.8: seq=3 ttl=61 time=10.479 ms
64 bytes from 8.8.8.8: seq=4 ttl=61 time=11.116 ms
64 bytes from 8.8.8.8: seq=5 ttl=61 time=10.998 ms
64 bytes from 8.8.8.8: seq=6 ttl=61 time=11.269 ms
64 bytes from 8.8.8.8: seq=7 ttl=61 time=11.059 ms
64 bytes from 8.8.8.8: seq=8 ttl=61 time=10.900 ms
64 bytes from 8.8.8.8: seq=9 ttl=61 time=34.492 ms
64 bytes from 8.8.8.8: seq=10 ttl=61 time=11.149 ms
64 bytes from 8.8.8.8: seq=11 ttl=61 time=10.997 ms
64 bytes from 8.8.8.8: seq=12 ttl=61 time=11.026 ms
64 bytes from 8.8.8.8: seq=13 ttl=61 time=11.061 ms
64 bytes from 8.8.8.8: seq=14 ttl=61 time=10.528 ms
64 bytes from 8.8.8.8: seq=15 ttl=61 time=10.724 ms
64 bytes from 8.8.8.8: seq=16 ttl=61 time=37.894 ms
64 bytes from 8.8.8.8: seq=17 ttl=61 time=10.975 ms
64 bytes from 8.8.8.8: seq=18 ttl=61 time=10.945 ms
64 bytes from 8.8.8.8: seq=19 ttl=61 time=43.298 ms
64 bytes from 8.8.8.8: seq=20 ttl=61 time=11.026 ms
64 bytes from 8.8.8.8: seq=21 ttl=61 time=11.046 ms
64 bytes from 8.8.8.8: seq=22 ttl=61 time=11.129 ms
64 bytes from 8.8.8.8: seq=23 ttl=61 time=11.125 ms
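
For anyone who wants to compare logs like the ones above without eyeballing them, here is a minimal Python sketch that pulls the time= values out of saved ping output and flags the outliers. It is only an illustration: the function names and the "2x the median" spike threshold are my own assumptions, not anything standard.

```python
import re
import statistics

# Matches the RTT field in both iputils ("time=11.3 ms") and
# BusyBox ("time=10.983 ms") ping output.
TIME_RE = re.compile(r"time=([0-9]+(?:\.[0-9]+)?) ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in raw ping output, in order."""
    return [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]


def summarize(rtts, spike_factor=2.0):
    """Return (median, spikes).

    A spike is any RTT above spike_factor * median. The factor is an
    arbitrary illustrative threshold (assumption), not a standard value.
    Each spike is reported as (index_in_list, rtt).
    """
    median = statistics.median(rtts)
    spikes = [(i, rtt) for i, rtt in enumerate(rtts) if rtt > spike_factor * median]
    return median, spikes


if __name__ == "__main__":
    # Three lines copied from the ping log taken through the router.
    sample = (
        "64 bytes from 8.8.8.8: icmp_seq=6 ttl=60 time=11.2 ms\n"
        "64 bytes from 8.8.8.8: icmp_seq=7 ttl=60 time=68.8 ms\n"
        "64 bytes from 8.8.8.8: icmp_seq=8 ttl=60 time=11.2 ms\n"
    )
    median, spikes = summarize(parse_rtts(sample))
    print(f"median={median} ms, spikes={spikes}")
```

Using the median rather than the mean as the baseline keeps a handful of large spikes from shifting the reference value.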

Can you run the ping test again while you are also SSHed in to the R7800 running 'top', and let us know what the idle percentage is when these spikes occur and what it is on average when they are not occurring? Also, does your VDSL modem/router combo by chance have an Intel Puma 6 chip in it?
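
If watching 'top' by eye is awkward, the idle percentage can also be worked out from two samples of the first ("cpu") line of /proc/stat. Below is a rough Python sketch of that calculation. It assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal), the function name is hypothetical, and it is meant to run on a PC against lines copied from the router, since a stock router image most likely has no Python installed.

```python
def idle_percent(stat_before, stat_after):
    """Idle CPU percentage between two samples of the aggregate 'cpu' line.

    Each argument is the first line of /proc/stat as text. Only the first
    eight counters are summed (user nice system idle iowait irq softirq
    steal), because the guest counters are already included in user/nice.
    """
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        return [int(x) for x in parts[1:9]]

    before = counters(stat_before)
    after = counters(stat_after)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    # Idle is the fourth counter (index 3) in the assumed field order.
    return 100.0 * deltas[3] / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample values for illustration only, not measurements
    # from the R7800 in this thread.
    before = "cpu  100 0 50 800 10 0 40 0 0 0"
    after = "cpu  104 0 52 890 10 0 44 0 0 0"
    print(f"idle: {idle_percent(before, after):.1f}%")
```

The two input lines could be captured on the router with `head -n 1 /proc/stat`, run a second or so apart. The result should be roughly comparable to the idle figure 'top' reports, and the softirq counter (seventh field) can be inspected the same way.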

Good thought, but the Puma series are DOCSIS cable modem SoCs, so they will not be used in a VDSL modem (and the Puma issue should also show up when connecting the PC directly to the modem).

Thanks for the tests; yes, I agree it is related to the router. Could you fully disable the wifi radios for a test (or wrap the router in aluminum foil to basically isolate the radios from outside influences)? I once had a case where a wifi router with no station attached had cyclically repeating latency/bandwidth issues that went away once I powered down the 2.4 GHz radio in that router (in case you wonder, a speedport w723v type A).
Spikes up to 80ms, even if just occasional, will make a number of applications awkward to use (VoIP, video chat, online games...), so getting rid of them looks like a good idea...

The modem is SR505n https://wikidevi.com/wiki/SmartRG_SR505N, which seems to be Broadcom based.

I first assumed that the use of aluminum foil was a joke... I will run the other tests with top and wifi disabled later today.

Ok, so usually idle is at 95..96%, but during a latency spike it drops to 90..91% and the top process seems to always be [kworker/0:2], which consumes 4..5% CPU alone at that moment.

UPDATE: It is now [kworker/0:0] after a reboot.