[SOLVED] Router (Netgear R7800) introduced latency spikes >100ms

These show clearly why wifi is problematic, but this is orthogonal to your reported wired latency spikes, so I would leave these alone for now.

In this case your two hosts are basically connected by a switch and, lo and behold, the overall RTT and the lack of (much) variability support that as well.

This is not terrible, but you already see more variability than with the switch test just above. If you let this run longer, will you also see those RTTs > 20ms?
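
A minimal sketch for spotting those outliers in a longer run, assuming your ping prints replies in the usual "time=12.3 ms" form. The function name filter_spikes and the address in the usage comment are made up for illustration; the 20 ms default just mirrors the threshold mentioned above.

```shell
#!/bin/sh
# Print only the ping replies whose RTT exceeds a threshold in ms.
# Reads ping output on stdin; the threshold defaults to 20 ms.
filter_spikes() {
    awk -v limit="${1:-20}" '
        /time=/ {
            # Keep only the number that follows "time=".
            rtt = $0
            sub(/.*time=/, "", rtt)
            sub(/ .*/, "", rtt)
            if (rtt + 0 > limit + 0) print
        }'
}

# Usage (hypothetical target, replace with the host you are testing):
#   ping -c 600 192.168.1.1 | filter_spikes 20
```

Replies at or below the limit are dropped, so a quiet ten-minute run prints nothing and every printed line is a spike worth correlating with other logs.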

Finally, could you try to look at your VDSL modem's error counters as well? Maybe increases in those correlate with the observed spikes.

Nope, no errors on the modem.

                        Bytes      Pkts        Errs  Drops  Bytes      Pkts        Errs  Drops
atm0.1  br_4_0_35       0          0           0     0      0          0           0     0
ptm0.1  br_4_1_1.35     957136274  2866450647  0     0      605737079  1735017611  0     0

Ah, I am more interested in the VDSL errors and parameters like sync, TX and RX power, CRC errors, FEC errors...

Well, there are no latency spikes if I connect directly to the modem.

Going out on a limb here, but the router might introduce RF noise when connected to the modem, causing transient, repeated VDSL hiccups; admittedly not very likely...

They’re at least 10m apart and connected via a Cat6 cable.

That would not rule out noise issues; EM noise can and does travel along cables. Anything below 17MHz can disturb VDSL2 (that is pre Annex Q, which shifts the threshold to 35MHz). But this is not very likely anyway...

Do you have wireless and DHCP enabled on both the VDSL modem and the R7800?

If so, turn them off on the modem and let the R7800 do all the work.

The modem is in the bridge mode and the R7800 is doing all the work.

Have you done any testing with any of the tools posted above to see what is happening on the route?

That is the plan for later today: need to wait for the network to stop being so heavily used.

Found it. I have observed it for several minutes and the number of errors does not change during the latency spikes.

When was the last time the modem was rebooted?

A long while ago, according to the uptime below. I never reboot it, and the pings are very stable (and 1ms lower) when I connect my computer (PPPoE client) to the bridged modem.

Uptime: 95D 9H 16M 24S

This has the flavor of some kind of kernel interrupt process occasionally hogging the CPU, perhaps writing to flash or a USB disk, or a bug in a wifi driver, or something similar, so that the network interrupts aren't being serviced.
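
One way to check that hunch is to compare two snapshots of /proc/interrupts taken around a spike and see which counters jumped. The sketch below is only an illustration: the function name irq_delta and the /tmp file names are made up, and it assumes the usual Linux layout (IRQ label, one counter per CPU, then descriptive text).

```shell
#!/bin/sh
# Show which interrupt counters grew between two /proc/interrupts snapshots.
# Prints "<growth> <irq label> <last field of the line>", largest growth first.
irq_delta() {
    awk '
        $1 ~ /:$/ {
            # Sum the per-CPU counters that follow the IRQ label.
            total = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            if (FNR == NR) before[$1] = total          # first snapshot
            else if (total > before[$1]) print total - before[$1], $1, $NF
        }' "$1" "$2" | sort -rn
}

# Usage on the router (file names are arbitrary):
#   cat /proc/interrupts > /tmp/irq.before
#   sleep 10
#   cat /proc/interrupts > /tmp/irq.after
#   irq_delta /tmp/irq.before /tmp/irq.after | head
```

If a USB or wifi interrupt dominates the list during a spike while the ethernet one stalls, that would point in the direction described above; it does not prove it.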


I do have a USB flash drive attached to store system logs. I installed the packages below on top of the default LEDE image and have some of them (Adblock, nlbwmon) use the flash drive as their storage. I will reset everything to default later today and try with that.

opkg update && \
opkg install luci-ssl-openssl && \
opkg install wget && \
opkg install ca-certificates && \
opkg install luci-app-ddns && \
opkg install luci-app-sqm && \
opkg install luci-app-adblock && \
opkg install luci-app-bcp38 && \
opkg install htop && \
opkg install diffutils && \
opkg install block-mount && \
opkg install kmod-usb-storage && \
opkg install kmod-fs-ext4 && \
opkg install rsync && \
opkg install sysstat && \
opkg install blkid && \
opkg install e2fsprogs && \
opkg install lsof && \
opkg install ethtool && \
opkg install zip && \
opkg install unzip && \
opkg install luci-app-nlbwmon

Ok, with a factory reset and only PPPoE configured, the issue is still there. I will test with the tools suggested in this thread tomorrow.

UPDATE: but it was not happening as often as it is now that I have rebuilt the router back to its original state...

It seems @dlakelan has good intuition (as so often), but this is going to be tricky to figure out. Are you using a stable 17.01.4 build or a recent snapshot? If you are using stable, it might be time to test with a recent snapshot to avoid chasing an old and potentially fixed issue...

Already did and had the same issue. Are there tools I can use to figure out what is hogging the CPU?

Running either the standard top command, or installing and running htop would give you that information.
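
As a complement to top, here is a rough sketch that shows how CPU time was split over a short interval, using the aggregate "cpu" line of /proc/stat. The function name cpu_breakdown is made up for illustration, and it only looks at the first seven counters (user, nice, system, idle, iowait, irq, softirq).

```shell
#!/bin/sh
# Print how CPU time was spent between two aggregate "cpu" lines of /proc/stat.
# $1 is the line sampled before the interval, $2 the line sampled after it.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) a[i] = $i; next }
        {
            # Fields 2..8: user nice system idle iowait irq softirq
            total = 0
            for (i = 2; i <= 8; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) exit
            printf "user+nice %d%% system %d%% idle %d%% iowait %d%% irq+softirq %d%%\n",
                100 * (d[2] + d[3]) / total, 100 * d[4] / total,
                100 * d[5] / total, 100 * d[6] / total,
                100 * (d[7] + d[8]) / total
        }'
}

# Usage on the router:
#   before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$before" "$after"
```

A high irq+softirq or iowait share during a latency spike would fit the interrupt or flash-write theory from earlier in the thread, whereas a high user share points at an ordinary process that top or htop should name.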