Possible cause of R7800 latency issues

The "1 CPU port" thing is a bit misleading. DSA drivers are usually faster, as they sometimes offer some hardware acceleration. They also sit in kernel space, whereas swconfig (I believe) is in user space.

So the slowdown isn't 2x in most cases.

Then there are devices with only 1 CPU port (my Archer C7v4, for example) where this is not an issue. Not that there's DSA for it yet, although it would be a good idea to use qca8k for it once it is ported to ath79.


That is all good, but it may be getting off topic. Either way, DSA is a long-term solution, but the problem exists now. As far as I am concerned, the latency problem is mostly solved: I am running a custom firmware that is very fast and also eliminates most latency spikes over wifi. The latency over wired is even better. It does take a number of patches/hacks, but it can be done now without having to wait, and it feels almost as fast as the stock firmware.

Yes, I've also built one with the low-latency kernel and the MIB counters disabled; it seems fine. Brief ping stats:

--- 192.168.0.1 ping statistics ---
1037 packets transmitted, 1037 received, 0% packet loss, time 208020ms
rtt min/avg/max/mdev = 1.252/1.477/6.930/0.315 ms

I used to get a max of 80+ ms.
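To compare runs like this one against the old 80+ ms maximum without eyeballing, the summary line can be parsed. A minimal sketch, assuming iputils-style output as shown above; the BusyBox "round-trip min/avg/max" variant is an assumption about its format, and the function name is hypothetical:

```python
import re

# Summary line from iputils ping, e.g.
#   rtt min/avg/max/mdev = 1.252/1.477/6.930/0.315 ms
# BusyBox ping (assumed format) prints "round-trip min/avg/max = a/b/c ms",
# so both prefixes are accepted and mdev is optional.
SUMMARY_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max(?:/mdev)? = "
    r"([\d.]+)/([\d.]+)/([\d.]+)(?:/([\d.]+))? ms"
)


def parse_rtt_summary(text):
    """Return {'min', 'avg', 'max'[, 'mdev']} in ms, or None if no summary line."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    keys = ("min", "avg", "max", "mdev")
    # Pair each key with its capture group; mdev is None for BusyBox output.
    return {k: float(v) for k, v in zip(keys, match.groups()) if v is not None}
```

For example, `parse_rtt_summary(open("ping.log").read())["max"]` gives the worst-case RTT of a saved run, which is the number that used to be 80+ ms.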

Now the real problem is pushing all of this to OpenWrt master...
And also the increased kernel partition and dissent's patch about cache scaling.

The same here. I am also seeing great improvements to the ping latency over wifi.

Agreed, and the first step could be to have everyone interested add comments to the PRs explaining the benefits of merging them. The more people who are interested and benefit, the more likely the PRs are to be merged.

Here is a list of changes that I think make this router great again.

  1. https://github.com/openwrt/openwrt/pull/632
  2. https://github.com/openwrt/openwrt/pull/669
  3. https://github.com/openwrt/openwrt/pull/848
  4. https://github.com/openwrt/openwrt/pull/849
  5. -O2 or -O3 compiler optimization: https://gist.github.com/fantom-x/58fe0e1bb6534d73e6f3820e41423239
  6. A 4MB kernel partition as per https://github.com/dissent1/r7800/commit/18b149dbbcd744fad7dd35f7e45bc9b559eda8ef
  7. Disable MIB counters as per this thread's findings: https://gist.github.com/fantom-x/85f7841ecdbf9f111ff38ca1822c0a79
  8. Disable "Export mac80211 internals in DebugFS", then disable DebugFS in the kernel. This seems to have a positive impact on wifi pings, but I do not know what it could have broken :wink: Independent verification is required. https://gist.github.com/fantom-x/78564320f785b977b6058f9c08ece49e
  9. Set option cachesize 50000 in /etc/config/dhcp (the dnsmasq DNS cache), as the default is only 150: this router has plenty of RAM to handle more entries.
  10. If using SQM, move only eth0's interrupts to CPU1. SQM is very CPU-intensive (much more so than the wifi interfaces), so let SQM have that core to itself. I leave eth1, wifi0, and wifi1 on CPU0 for this reason.
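Item 10 boils down to writing a hex CPU bitmask into /proc/irq/<n>/smp_affinity (bit 0 is CPU0, bit 1 is CPU1). A minimal sketch, assuming the IRQ lines in /proc/interrupts end with the names eth0, eth1, wifi0 and wifi1 exactly as in the list above; check your own /proc/interrupts, since names can differ between drivers and builds. The helper names are hypothetical, and the sketch only prints the echo commands so they can be reviewed before running them as root:

```python
def affinity_mask(cpus):
    """Hex bitmask for smp_affinity: bit i set means CPU i may take the IRQ."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def irq_numbers(proc_interrupts, name):
    """IRQ numbers whose /proc/interrupts line ends with the token `name`."""
    irqs = []
    for line in proc_interrupts.splitlines():
        head, sep, rest = line.partition(":")
        head = head.strip()
        # Skip the "CPU0 CPU1" header and non-numeric rows such as "IPI0:".
        if not sep or not head.isdigit():
            continue
        fields = rest.split()
        if fields and fields[-1] == name:
            irqs.append(int(head))
    return irqs


def affinity_commands(proc_interrupts, plan):
    """Turn a {irq_name: [cpu, ...]} plan into shell commands for review."""
    commands = []
    for name, cpus in plan.items():
        mask = affinity_mask(cpus)
        for irq in irq_numbers(proc_interrupts, name):
            commands.append(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
    return commands


# Item 10: eth0 gets CPU1 to itself for SQM; everything else stays on CPU0.
SQM_PLAN = {"eth0": [1], "eth1": [0], "wifi0": [0], "wifi1": [0]}
```

Usage: read /proc/interrupts into a string and print each entry of `affinity_commands(text, SQM_PLAN)`. Generating commands instead of writing directly keeps the sketch safe to try on any machine.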

UPDATE

  1. Make collectd (https://github.com/openwrt/packages/pull/5875), nlbwmon (https://github.com/openwrt/packages/pull/5876), and uhttpd run with the lowest priority.
  2. Use the performance scaling governor for both cores: it helps with wifi latency too. In the absence of any hardware acceleration, and considering that the CPU takes 100 us to switch frequencies, this is a good stop-gap solution.
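Switching both cores to the performance governor means writing the governor name into each CPU's cpufreq scaling_governor file in sysfs. A minimal sketch, assuming the standard cpufreq sysfs layout and that Python is available wherever it runs; `set_governor` is a hypothetical helper, and the `sysfs_root` parameter exists only so it can be tried on a copy of the directory tree:

```python
import glob
import os


def set_governor(governor="performance", sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to cpuN/cpufreq/scaling_governor for every CPU found.

    Raises ValueError if a CPU does not list the governor as available.
    Returns the paths that were written. Needs root on a real router.
    """
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the cpufreq/ or cpuidle/ dirs.
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    paths = sorted(glob.glob(pattern))
    # Check every CPU first so a missing governor changes nothing at all.
    for path in paths:
        available = os.path.join(os.path.dirname(path), "scaling_available_governors")
        if os.path.exists(available):
            with open(available) as f:
                if governor not in f.read().split():
                    raise ValueError(f"{governor!r} not available for {path}")
    for path in paths:
        with open(path, "w") as f:
            f.write(governor + "\n")
    return paths
```

Calling `set_governor()` as root applies it to both cores; the same effect can be had from a shell by echoing performance into those files.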

Divide and conquer. Let's move all MIB counter discussions to the "MIB Counters for QCA8337 (Netgear R7800)" thread.


Report this to the mailing list.

As a follow-up: I've been running with the low-latency kernel for a couple of days now, and it has spontaneously rebooted on me twice. The first time it was just master with the MIB counter fix and the low-latency kernel; this time it was r6628 with PR#669 and PR#632, the MIB counter fix and the low-latency kernel. I suspect the low-latency kernel since, with a sample size of 2, it is one common factor. I'm going to try the same but with the default preemption model.

By low-latency kernel do you mean the "Preemptible Kernel (Low-Latency Desktop)"? If yes, then I would recommend switching to Voluntary Kernel Preemption (Desktop) since it appears to be stable.
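When comparing reboot reports it helps to confirm which preemption model a build actually used. A minimal sketch that reads a kernel .config (for example from the build tree); it only knows the classic three-way menu choice, and newer kernels add options it does not handle. The function name is hypothetical:

```python
# Kconfig symbols behind the three classic "Preemption Model" menu choices.
PREEMPT_MODELS = {
    "CONFIG_PREEMPT_NONE": "No Forced Preemption (Server)",
    "CONFIG_PREEMPT_VOLUNTARY": "Voluntary Kernel Preemption (Desktop)",
    "CONFIG_PREEMPT": "Preemptible Kernel (Low-Latency Desktop)",
}


def preemption_model(kconfig_text):
    """Return the menu label of the preemption model enabled in a kernel .config."""
    enabled = set()
    for line in kconfig_text.splitlines():
        line = line.strip()
        # Only "SYMBOL=y" lines count; "# CONFIG_X is not set" lines are comments.
        if not line.startswith("#") and line.endswith("=y"):
            enabled.add(line[: -len("=y")])
    for symbol, label in PREEMPT_MODELS.items():
        if symbol in enabled:
            return label
    return None
```

Exact symbol matching matters here: CONFIG_PREEMPT_COUNT=y appears in many configs and must not be mistaken for CONFIG_PREEMPT=y.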

I am running the low latency one with no issues...

Now that the latency over a wired link is more or less understood, @nbd, can you suggest where/how to start debugging the 300+ ms spikes over a wireless connection?

[Chart: ping latency over wireless]

How are you measuring that latency? Is it still there with the preemptible kernel?

I run this command while the router is either not in use or used lightly.

ping -c 9001 -i 0.2 8.8.8.8 | tee 8.8.8.8.wireless

The chart above is with a preemptible kernel and MIB counters disabled. With the default one it used to be much worse, but I will test again today with MIB counters disabled and with the commit you recommended.
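The log saved by the ping | tee command above can also be summarised numerically, which makes wired and wireless runs easier to compare than charts. A minimal sketch, assuming iputils reply lines (icmp_seq=) or the BusyBox form (seq=); the 100 ms spike threshold and the function names are arbitrary choices for illustration:

```python
import re
import statistics

# Reply lines from iputils ("icmp_seq=12") or BusyBox ("seq=12") ping, e.g.
#   64 bytes from 8.8.8.8: icmp_seq=12 ttl=117 time=14.2 ms
REPLY_RE = re.compile(r"\b(?:icmp_)?seq=(\d+)\b.*?\btime=([\d.]+) ms")


def load_rtts(text):
    """Map sequence number -> round-trip time (ms) for every reply in the log."""
    rtts = {}
    for line in text.splitlines():
        match = REPLY_RE.search(line)
        if match:
            rtts[int(match.group(1))] = float(match.group(2))
    return rtts


def spike_report(rtts, threshold_ms=100.0):
    """Summarise a run: median, max, replies above threshold, and missing seqs."""
    if not rtts:
        return None
    seqs = sorted(rtts)
    return {
        "replies": len(rtts),
        "median": statistics.median(list(rtts.values())),
        "max": max(rtts.values()),
        "spikes": [s for s in seqs if rtts[s] > threshold_ms],
        # Gaps between the first and last reply are lost or timed-out pings.
        "missing": [s for s in range(seqs[0], seqs[-1] + 1) if s not in rtts],
    }
```

Running `spike_report(load_rtts(open("8.8.8.8.wireless").read()))` and the same on the concurrent wired log shows whether the spikes line up with particular sequence numbers on only one of the paths.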

You're running that ping on your wireless client? Or what's the path between where you ping from and the internet?

Yes, from my laptop. I usually run two concurrent sessions: one over a wired connection and one over wireless. The wired ping chart below was taken over the same 30-minute interval; note that both were run at the same time. I get similar results if I ping the first several hops. I tried both tests with the stock firmware and both were much better.

@nbd, here is another latency chart over wifi with MIB counters disabled: the spikes reach 300 ms. The difference between this one and the one a few posts above is that this time I have not disabled "Export mac80211 internals in DebugFS" and kernel DebugFS. A concurrent wired ping did not have these issues and topped out at <20 ms only a few times.
In both cases a preemptible kernel is used, because it is much worse with the default one.

What is using the information from "Export mac80211 internals in DebugFS"? Can it be safely and permanently disabled?

UPDATE: the wifi client is within 5 to 10 ft of the router and is using the 5 GHz band.

[Chart: wifi ping latency (debugfs)]

Let's take this step by step. First of all, did my changes resolve the switch related latency on wired connections? Or does that issue remain and wired latency still needs the preempt change?

Fair enough. Do you want me to test just your change without actually disabling the MIB counters?
To be safe, I compile the firmware from scratch after every change and it takes a while. Is running make reliable enough to pick up any config and code changes if I keep reusing the same workspace?

make clean only has to be used if you are running into issues. Just using make should be fine and will massively speed up future compilations :slight_smile:


Yes, please test my change without disabling MIB counters. Just make should be enough.
If you really want to make sure all changes are applied properly, you can run make target/linux/clean before running make again (the same will happen anyway when kernel changes are being picked up).
