Possible cause of R7800 latency issues

What's worth noting is how steady the ping is when pinging the router itself. Here's a graph of a ping session I ran just now (note the y-axis scale):

[Graph: ping_self]

--- 192.168.1.1 ping statistics ---
1801 packets transmitted, 1801 packets received, 0% packet loss
round-trip min/avg/max = 0.107/0.322/0.880 ms

Whatever's causing the spikes is obviously not affecting the pings in this case.

Linux has an HRTICK feature, disabled by default. Has anyone looked at this? Is it of any help to anyone with this issue?
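For anyone who wants to check this on their own build: on kernels with scheduler debugging enabled and debugfs mounted, the scheduler features are listed in /sys/kernel/debug/sched_features, and a disabled feature appears with a NO_ prefix. The helper below is a minimal sketch under that assumption. The name hrtick_state is made up for illustration, and the file may well be absent on a stock OpenWrt kernel.

```shell
# Hypothetical helper: report whether the HRTICK scheduler feature is on.
# Assumes the usual space-separated format of the features file,
# e.g. "GENTLE_FAIR_SLEEPERS START_DEBIT NO_HRTICK ...".
hrtick_state() {
    file="${1:-/sys/kernel/debug/sched_features}"
    [ -r "$file" ] || { echo "unavailable"; return 1; }
    if tr ' ' '\n' < "$file" | grep -qx 'HRTICK'; then
        echo "enabled"
    elif tr ' ' '\n' < "$file" | grep -qx 'NO_HRTICK'; then
        echo "disabled"
    else
        echo "unknown"
    fi
}
```

Run hrtick_state as root. If it reports "disabled", writing the feature name into the same file (echo HRTICK > /sys/kernel/debug/sched_features) should switch it on until the next reboot.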

Hmm, that made me go and check whether the kernel is compiled as a low-latency one. It is not. Recompiling with a low-latency config to test this theory.

Do you know how to enable it in make kernel_menuconfig? I cannot seem to find it.

When I previously tested a new r7800 and found no latency spikes across the wan-lan interface, I had wan and lan bridged with all netfilter kmods unloaded. (It was a transparent sqm bridge setup, testing r7800 cake throughput.)

I reread your first post. Can you try disabling nf_conntrack, either by unloading all netfilter-related kmods or just with -j NOTRACK in the raw table?

I don't know how to do that. Can you describe the procedure in more detail?

Try this:

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

If it complains that it is unable to initialize the raw table, install this kmod:

kmod-ipt-raw

After you are done, you can run /etc/init.d/firewall reload to reinitialize the iptables rules.

  • Note: this will break the internet connection if your r7800 is the internet-facing gateway router (or if, for any reason, it is not the gateway router but still does NAT or filters out packets for non-established connections). If your router is not supposed to be doing NAT, you can (and should) disable the firewall with /etc/init.d/firewall stop.

I'm getting an error:

~# iptables -t raw -I PREROUTING -j NOTRACK
iptables v1.6.2: Couldn't load target `NOTRACK':No such file or directory

I've installed kmod-ipt-raw.
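The "Couldn't load target" message reads as if iptables cannot find the userspace NOTRACK extension, which kmod-ipt-raw alone does not supply. The iptables-extensions documentation describes -j NOTRACK as equivalent to -j CT --notrack, so the CT form may be worth trying instead. The sketch below is untested on the r7800, and whether the CT target is present depends on which iptables extensions and kernel modules the build includes. The function name add_notrack and the IPT variable are made up for illustration.

```shell
# Sketch: insert untracked rules using the CT target instead of NOTRACK.
# Set IPT=echo for a dry run that only prints the iptables arguments.
IPT="${IPT:-iptables}"

add_notrack() {
    for chain in PREROUTING OUTPUT; do
        $IPT -t raw -I "$chain" -j CT --notrack || return 1
    done
}
```

With IPT=echo the function only prints the two rule lines. Run with the default as root to apply them, and use /etc/init.d/firewall reload to get the normal rules back. The same warning about breaking NAT applies.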

To save time I did 100ms interval ping.
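For anyone repeating this kind of test, a small filter makes it easier to compare runs than eyeballing graphs. This is a minimal sketch, not the method used for the graphs in this thread. It assumes ping prints the usual time=... field per reply, and the name count_spikes is made up.

```shell
# count_spikes THRESHOLD_MS: read ping output on stdin, then print how many
# replies took longer than the threshold and the worst round trip seen.
count_spikes() {
    awk -v limit="$1" '
        match($0, /time=[0-9.]+/) {
            # skip the 5 characters of "time=" and keep the number
            rtt = substr($0, RSTART + 5, RLENGTH - 5) + 0
            total++
            if (rtt > limit) spikes++
            if (rtt > worst) worst = rtt
        }
        END {
            printf "%d of %d replies above %s ms, worst %.3f ms\n", spikes, total, limit, worst
        }
    '
}
```

Example, run from a wired Linux host: ping -i 0.1 -c 1800 192.168.1.1 | count_spikes 5. Sub-second intervals may need root on some systems.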

My r7800 is supposedly a "new" one with numbers printed on the antenna instead of yellow stickers.

Ping host is ethernet wired to r7800, the r7800 has the physical wan port bridged to lan, and running wireless client mode for internet at the same time. Server that responds to ping is ethernet wired to physical wan port. Both server and host are on same subnet.

I also experience no lags in csgo (consistent <30ms latency) using my r7800 as a wireless client.

Note the pings were done while I was using the computer. The wan and lan interfaces are bridged; internet is through the wlan0 interface.

Default: ping with conntrack enabled. Note that even though NAT is enabled in this case, I don't technically need it for internet because I offloaded NAT to the main gateway router, which handles NAT for 192.168.4.0/24.


variant1: ping with empty iptables rules for nat and filter, but conntrack is still enabled. i.e.

cat /proc/net/nf_conntrack

is not empty and shows connections


variant2: ping with all netfilter modules unloaded
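To confirm which of the three states a router is actually in before pinging, something like the following could be used. It is a sketch with made-up function names. It assumes conntrack exposes /proc/net/nf_conntrack (as in variant1 above) and that loaded modules are listed in /proc/modules.

```shell
# conntrack_entries [FILE]: print the number of tracked connections, or
# "not loaded" when the proc file is missing or unreadable (run as root).
conntrack_entries() {
    file="${1:-/proc/net/nf_conntrack}"
    if [ -r "$file" ]; then
        wc -l < "$file" | tr -d ' '
    else
        echo "not loaded"
    fi
}

# netfilter_modules [FILE]: list loaded modules whose names look
# netfilter-related; empty output corresponds to variant2.
netfilter_modules() {
    awk '$1 ~ /^(nf_|nft_|xt_|ipt_|ip6t_|iptable_|ip6table_|x_tables|ip_tables)/ { print $1 }' \
        "${1:-/proc/modules}"
}
```

In the default and variant1 setups conntrack_entries should print a non-zero count. In variant2 it should print "not loaded" and netfilter_modules should print nothing.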

So, the low latency kernel does not boot, but the voluntary kernel preemption one with timer frequency = 1000Hz boots just fine and drops the latency spikes (wired) from 100ms+ down to no more than 20..25ms (seemingly only when I run a speed test from a different device) without having to use isolcpus=1 and re-assign the interrupts.
Only tested for an hour so far.

UPDATE: Even wifi feels faster and pages load in less time, but this is all subjective.

Hi, sorry I don’t, I’m still on a steep learning curve, I was just reading about Linux kernel latency spikes and wondered if it could help.

What is a low latency kernel?

Interesting read about kernel, low latency and what to choose: https://askubuntu.com/questions/126664/why-choose-a-low-latency-kernel-over-a-generic-or-realtime-one

Yup, unfortunately the low latency one does not boot. But the second best does. The default, Server, is optimized for throughput rather than response time.

Here are the steps in case anyone wants to give it a shot. The latency is not as great as the stock, but this leaves both CPUs available to the kernel.

  1. @hnyman's build (I disable USB modules, 3G modem support, gdb, and a few other things I do not use)
  2. https://github.com/openwrt/openwrt/pull/632
  3. https://github.com/openwrt/openwrt/pull/669 (not sure if either makes a difference, but they will probably end up in master at some point so might as well try them)
  4. make kernel_menuconfig / Kernel Features
  • Maximum Number of CPUs == 2 (not sure if this makes any impact though)
  • Voluntary Kernel Preemption
  • Timer Frequency == 1000Hz
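For reference, the three menu entries above should correspond to the mainline Kconfig symbols CONFIG_NR_CPUS=2, CONFIG_PREEMPT_VOLUNTARY=y and CONFIG_HZ=1000. A quick way to confirm they ended up in the kernel .config after leaving menuconfig is sketched below. The function name is made up, and the path to the .config depends on your build tree, so it is passed as an argument.

```shell
# check_latency_config FILE: report which of the three options are set in a
# kernel .config. Returns non-zero if any of them is missing.
check_latency_config() {
    cfg="$1"
    rc=0
    for want in 'CONFIG_NR_CPUS=2' 'CONFIG_PREEMPT_VOLUNTARY=y' 'CONFIG_HZ=1000'; do
        # -x: match the whole line, -F: treat the pattern as a fixed string
        if grep -qxF "$want" "$cfg"; then
            echo "ok      $want"
        else
            echo "missing $want"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: check_latency_config path/to/linux/.config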

On the router I reassign interrupts to cores to try to balance them evenly, and lower the priority of collectd, nlbwmon, and uhttpd. The performance governor could also be used.
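The runtime tweaks mentioned here go through /proc/irq/<n>/smp_affinity, which takes a hexadecimal CPU bitmask. A minimal sketch of that step follows. The function names are made up, and IRQ numbers differ between builds, so look them up in /proc/interrupts rather than copying any number from an example.

```shell
# cpu_mask CPU: print the hex bitmask that smp_affinity expects for a
# single CPU index (CPU 0 -> 1, CPU 1 -> 2, CPU 4 -> 10).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# pin_irq IRQ CPU: bind one interrupt to one core (needs root).
pin_irq() {
    cpu_mask "$2" > "/proc/irq/$1/smp_affinity"
}
```

For example, pin_irq N 1 moves interrupt N to the second core. Process priority can be lowered with renice if your busybox includes it, and the governor can be set by writing performance to /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor.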

The low latency kernel could potentially provide better results, but it is not booting and I do not have serial access to see what is going on.

What does low latency kernel mean?
Full preempt + 1000hz?

I also use voluntary preempt but 250hz on my wrt1200.
Indeed, 1000hz will give you lower latency but also reduces throughput.
Bufferbloat.net also recommends 1000hz, but I chose something of a middle way here.
And I can also confirm that everything feels snappier.
I don't know how voluntary preempt can make that much of a difference. I thought it should only affect user space programs?
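To put numbers on the timer frequency trade-off: the periodic tick fires HZ times per second, so one tick lasts 1/HZ seconds. The arithmetic is sketched below. The function name is made up, and with high-resolution timers or tickless operation the real picture is more nuanced than this.

```shell
# tick_us HZ: length of one timer tick in microseconds (integer division).
tick_us() {
    echo $((1000000 / $1))
}

for hz in 100 250 300 1000; do
    echo "HZ=$hz -> $(tick_us "$hz") us per tick"
done
```

So going from 250 to 1000 shortens the tick from 4 ms to 1 ms, at the cost of four times as many timer interrupts.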

Is it true that the kernel HZ should match your power grid frequency?
For example, if your power grid frequency is 50hz, should you use 250hz, 500hz or 1000hz?
And for 60hz, 300hz, 600hz or 1200hz?
But I guess that's a myth because the power grid voltage gets converted anyway.

An RT kernel is only useful if you have applications that can make use of it.
However, I also tried the RT kernel thing x)
I could make the patches apply and OpenWrt booted just fine, but it was buggy all over the place. For example, viewing the graphs generated an insane amount of CPU usage, then the system crashed :confused:

Yes, that is what I meant. It is called “Low Latency Desktop” in kernel_menuconfig I think. I was surprised that it did not boot.

I do not know, but the 100ms spikes are gone. There has been no independent confirmation, though. Don't dnsmasq, hostapd, etc. run as user space apps?

Related to the WIFI-latency, some searching gave the following results:

A reason for the latency spikes to occur: http://blog.cerowrt.org/post/disabling_channel_scans/

A solution is described here:

https://answers.microsoft.com/en-us/windows/forum/windows_10-networking/is-there-any-way-to-stop-windows-10-from-scanning/3870b3d1-0f07-4875-8779-bb5c11fce0a8

Also, at the end of this help-thread there is a program mentioned that can be found on this page:

It is old but reported to work in Windows 10.

Please note: this is the result of my searches on the subject of latency, I have not tested this yet. Will try to do this during the weekend.

Yes they do.
But as I understand that entire preempt thing...
What it does is allow user space programs to interrupt the kernel.
Or am I wrong here?
And the difference between voluntary and full preempt is that voluntary adds some "interrupt points" to the kernel and full adds even more.
So I would rather expect the opposite from enabling preempt.

I haven't tested full preempt, so I can't tell whether it boots or not.
I think 250hz + voluntary is also the default on Ubuntu servers (and Debian?), so I will stick with that.

@bouwew
Thanks for the links.
But the majority of clients here are Android based.
The wifi got better since I switched to voluntary.
But a couple of commits in the latest trunk tree also improved it quite a bit.
Maybe the lag also comes from the power saving feature of Android.
Or from the adblock I use here.
I'm not quite sure. But it did improve :wink:
And the mwlwifi driver is still a bit buggy.

I think it allows any thread to interrupt the kernel; in particular it allows kernel tasks to interrupt other kernel tasks, which might allow packet processing to interrupt, say, garbage-collection-type maintenance operations.