Possible cause of R7800 latency issues

Divide and conquer. Let's move all MIB Counters discussions to MIB Counters for QCA8337 (Netgear R7800)


Report this to mailing list

as a follow-up, i've been running with the low-latency kernel for a couple of days now and it has spontaneously rebooted on me twice. the first time it was just master with the MIB counter fix and the low-latency kernel; this time it was r6628 with PR#669 and PR#632, the MIB counter fix and the low-latency kernel. i suspect the low-latency kernel because, with a sample size of 2, it is one common factor. i'm going to try the same setup but with the default preemption model.

By low-latency kernel do you mean the "Preemptible Kernel (Low-Latency Desktop)"? If yes, then I would recommend switching to Voluntary Kernel Preemption (Desktop) since it appears to be stable.

I am running the low latency one with no issues...

Now that the latency over a wired link is more or less understood, @nbd can you suggest where/how to start debugging the 300+ms spikes over a wireless connection?

[Chart: wireless]

How are you measuring that latency? Is it still there with the preemptible kernel?

I run this command while the router is either not in use or used lightly.

ping -c 9001 -i 0.2 8.8.8.8 | tee 8.8.8.8.wireless
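A saved log like the one produced by the command above can be scanned for spikes after the fact. The following is only a rough sketch, assuming the per-reply line format printed by iputils ping (`icmp_seq=... time=... ms`); the function names `parse_ping_log` and `spikes` are made up for illustration, and the threshold is whatever you consider a spike.

```python
import re

# Matches per-reply lines in the iputils ping format (an assumption), e.g.
# "64 bytes from 8.8.8.8: icmp_seq=12 ttl=57 time=9.62 ms"
REPLY = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+) ms")


def parse_ping_log(text):
    """Return a list of (icmp_seq, rtt_ms) tuples found in ping output."""
    return [(int(m.group(1)), float(m.group(2))) for m in REPLY.finditer(text)]


def spikes(samples, threshold_ms):
    """Return only the samples whose RTT exceeds threshold_ms."""
    return [(seq, rtt) for seq, rtt in samples if rtt > threshold_ms]


def report(path, threshold_ms):
    """Print how many replies in the log at `path` exceed the threshold."""
    with open(path) as f:
        samples = parse_ping_log(f.read())
    found = spikes(samples, threshold_ms)
    print(f"{len(found)} of {len(samples)} replies above {threshold_ms} ms")
    for seq, rtt in found:
        print(f"  icmp_seq={seq} rtt={rtt} ms")
```

For example, `report("8.8.8.8.wireless", 100)` would list every reply slower than 100 ms together with its sequence number, which makes it easier to see whether the spikes come at regular intervals.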

The chart above is with a preemptible kernel and MIB counters disabled. With the default one it used to be much worse, but I will test again today with MIB counters disabled and with the commit you recommended.

You're running that ping on your wireless client? Or what's the path between where you ping from and the internet?

Yes, from my laptop. I usually run two concurrent sessions: one over a wired connection and one over wireless. The wired ping chart is below and was taken over the same 30-minute interval. Note that both were run at the same time. I get similar results if I ping the first several hops. I tried both tests with the stock firmware and they were both much better.

@nbd, here is another latency chart over wifi with MIB Counters disabled: the spikes are reaching 300ms. The difference between this one and the one a few posts above is that this time I have not disabled Export mac80211 internals in DebugFS and Kernel Debug FS. A concurrent wired ping did not have these issues and topped out at <20ms only a few times.
In both cases a preemptible kernel is used, because it is much worse with the default one.

What is using the information from Export mac80211 internals in DebugFS? Can it be safely and permanently disabled?

UPDATE: the wifi client is within 5..10 ft from the router and is using 5GHz band.

[Chart: debugfs]

Let's take this step by step. First of all, did my changes resolve the switch-related latency on wired connections? Or does that issue remain and wired latency still needs the preempt change?

Fair enough. Do you want me to test just your change without actually disabling the mib counters?
To be safe, I compile the firmware from scratch after every change and it takes a while. Is running make reliable enough to pick up any config and code changes if I keep reusing the same workspace?

make clean only has to be used if you are running into issues. Just using make should be fine and will massively speed up future compilations :)


Yes, please test my change without disabling MIB counters. Just make should be enough.
If you really want to make sure all changes are applied properly, you can run make target/linux/clean before running make again (the same will happen anyway when kernel changes are being picked up).


what are these changes of which you speak? :)

The change mentioned in the discussion in the PR by fantom-x: https://github.com/openwrt/openwrt/pull/849#issuecomment-379462723
Details:
https://git.openwrt.org/?p=openwrt/staging/nbd.git;a=commitdiff;h=02ce5e4e3d6a17b846c8254700ace7bb3fd985fa


I think that your change for ar8216 has at least decreased the spikes.

I tested with my normal R7800 build plus your change, and to me it looks like the deviation of the ping times has decreased.
( build: master-r6643-08ccfdea78-20180407-spike-test )

In tests of wired traffic from the PC via the router to the ISP next hop (2000 pings at 200ms intervals), there was just one minor 12-17 ms spike and no 60ms spikes. So a minor spike shows up in about 0.05% of pings. Even with 100ms intervals, there was no increase in spikes.

0.2 sec interval:
Packets: sent=2000, rcvd=2000, error=0, lost=0 (0.0% loss) in 399.801976 sec
RTTs in ms: min/avg/max/dev: 1.559 / 1.980 / 12.447 / 0.310

Packets: sent=2000, rcvd=2000, error=0, lost=0 (0.0% loss) in 399.802504 sec
RTTs in ms: min/avg/max/dev: 1.583 / 1.978 / 14.179 / 0.303

Packets: sent=2000, rcvd=2000, error=0, lost=0 (0.0% loss) in 399.803274 sec
RTTs in ms: min/avg/max/dev: 1.571 / 1.982 / 6.182 / 0.147

0.1 sec interval:
Packets: sent=2000, rcvd=2000, error=0, lost=0 (0.0% loss) in 199.902437 sec
RTTs in ms: min/avg/max/dev: 1.569 / 1.966 / 17.444 / 0.371

Two weeks ago, the standard deviation was clearly higher:

On the third try there was one 22 ms spike. Otherwise a steady 2 ms round trip.
Packets: sent=500, rcvd=500, error=0, lost=0 (0.0% loss) in 99.813805 sec
RTTs in ms: min/avg/max/dev: 1.575 / 2.038 / 21.846 / 1.107

But let's see what @fantom-x notices.

I pinged 8.8.8.8 for 30 minutes. Compared to without @nbd's patch it's a lot better, but not as good as when MIB counters are disabled. Here's a plot of the ping:

[Plot: fietkau_test]

And some additional statistics:

--- 8.8.8.8 ping statistics ---
9001 packets transmitted, 9001 received, 0% packet loss, time 1807675ms
rtt min/avg/max/mdev = 9.419/9.782/41.556/0.471 ms

Percentage of pings above 30 ms: 0.01%
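For anyone who wants to reproduce figures like these from raw round-trip times, here is a minimal sketch. It assumes the iputils ping convention, where `mdev` is the population standard deviation, sqrt(mean(rtt^2) - mean(rtt)^2); the helper name `summarize` and its dictionary keys are hypothetical and not part of any tool used in this thread.

```python
import math


def summarize(rtts_ms, threshold_ms):
    """Compute min/avg/max/mdev and the share of RTTs above threshold_ms.

    mdev follows the iputils ping convention (an assumption here):
    sqrt(mean(rtt^2) - mean(rtt)^2), the population standard deviation.
    """
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mean_sq = sum(r * r for r in rtts_ms) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    mdev = math.sqrt(max(mean_sq - avg * avg, 0.0))
    above = sum(1 for r in rtts_ms if r > threshold_ms)
    return {
        "min": min(rtts_ms),
        "avg": avg,
        "max": max(rtts_ms),
        "mdev": mdev,
        "above": above,
        "above_pct": 100.0 * above / n,
    }
```

As a sanity check on the numbers above: one reply out of 9001 is about 0.011%, so a reported 0.01% above 30 ms corresponds to roughly a single slow ping in the whole run, which fits with the 41.556 ms maximum.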

very little in it for me. patched with PR#632, PR#669, MIB counters disabled and the voluntary pre-emption kernel:

--- 192.168.0.1 ping statistics ---
2000 packets transmitted, 2000 received, 0% packet loss, time 401723ms
rtt min/avg/max/mdev = 1.156/1.437/7.063/0.199 ms

with just nbd's patch, default kernel:

--- 192.168.0.1 ping statistics ---
2000 packets transmitted, 2000 received, 0% packet loss, time 401646ms
rtt min/avg/max/mdev = 1.116/1.463/7.093/0.268 ms