Possible cause of R7800 latency issues

Do you know how to enable it in make kernel_menuconfig? I can't seem to find it.

When I previously tested a new R7800 and found no latency spikes across the WAN-LAN interface, I had WAN and LAN bridged with all netfilter kmods unloaded (it was a transparent SQM bridge setup for testing the R7800's cake throughput).

I reread your first post. Can you try disabling nf_conntrack, either by unloading all netfilter-related kmods or just by adding -j NOTRACK rules in the raw table?

I don't know how to do that. Can you give a more detailed description of the procedure?

Try this:

iptables -t raw -I PREROUTING -j NOTRACK
iptables -t raw -I OUTPUT -j NOTRACK

If it complains that it is unable to initialize the raw table, install this kmod:

kmod-ipt-raw

After you are done, you can run /etc/init.d/firewall reload to reinitialize the iptables rules.

  • Note: this will break the internet connection if your R7800 is the internet-facing gateway router (or if, for whatever reason, it is not the gateway but is still doing NAT or filtering out packets for non-established connections). If your router is not supposed to be doing NAT, you can (and should) disable the firewall with /etc/init.d/firewall stop.

I'm getting an error:

~# iptables -t raw -I PREROUTING -j NOTRACK
iptables v1.6.2: Couldn't load target `NOTRACK':No such file or directory

I've installed kmod-ipt-raw.
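A minimal sketch of the NOTRACK test above as shell functions, assuming OpenWrt with iptables and ip6tables available. It tries `-j NOTRACK` first and falls back to `-j CT --notrack`, which the iptables-extensions man page documents as the equivalent form; whether the CT extension is installed on a given build is an assumption, and the IPv6 rules additionally need the IPv6 raw table module. The function names and the `DRY_RUN` switch are hypothetical.

```shell
# Sketch only: with DRY_RUN=1 the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-0}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Exempt all traffic from conntrack in the raw table for one iptables binary.
notrack_all() {
    ipt=$1
    for chain in PREROUTING OUTPUT; do
        # Try NOTRACK first; "CT --notrack" is the equivalent target
        run "$ipt" -t raw -I "$chain" -j NOTRACK 2>/dev/null ||
            run "$ipt" -t raw -I "$chain" -j CT --notrack
    done
}

# IPv4 and IPv6 (the IPv6 part needs the IPv6 raw table as well).
apply_notrack() {
    notrack_all iptables
    notrack_all ip6tables
}

# Undo, as suggested above: reload the firewall to reinitialize its rules.
revert_notrack() { run /etc/init.d/firewall reload; }
```

Setting `DRY_RUN=1` before calling `apply_notrack` prints the four rules without touching the firewall; `revert_notrack` is the `firewall reload` from the instructions above.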

To save time, I pinged at a 100 ms interval.

My R7800 is supposedly a "new" one, with numbers printed on the antennas instead of yellow stickers.

The ping host is wired via Ethernet to the R7800, which has its physical WAN port bridged to LAN while also running in wireless client mode for internet access. The server that responds to the pings is wired via Ethernet to the physical WAN port. Both server and host are on the same subnet.
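To repeat the 100 ms interval test and summarise the spikes, a small awk filter over ping's output could look like this. It assumes Linux-style `time=... ms` reply lines; `count_spikes`, the threshold, and `SERVER_IP` (standing in for the server wired to the WAN port) are illustrative.

```shell
# Summarise ping output read from stdin: number of replies, replies above a
# threshold in ms (first argument), and the worst round-trip time seen.
count_spikes() {
    awk -v t="$1" '
        BEGIN { n = 0; s = 0; max = 0 }
        match($0, /time=[0-9.]+/) {
            v = substr($0, RSTART + 5, RLENGTH - 5) + 0   # number after "time="
            n++
            if (v > t + 0) s++
            if (v > max) max = v
        }
        END { printf "%d replies, %d over %s ms, max %.1f ms\n", n, s, t, max }
    '
}
```

For example, `ping -i 0.1 -c 600 SERVER_IP | count_spikes 30` covers one minute of pings; with iputils ping, intervals this short may require root.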

I also experience no lag in CS:GO (consistently <30 ms latency) when using my R7800 as a wireless client.

Note that the pings were done while I was using the computer. The WAN and LAN interfaces are bridged; internet access is through the wlan0 interface.

Default: ping with conntrack enabled. Note that even though NAT is enabled in this case, I don't technically need it for internet access, because I offloaded NAT to the main gateway router, which handles NAT for 192.168.4.0/24.


variant1: ping with empty iptables rules for nat and filter, but with conntrack still enabled, i.e.

cat /proc/net/nf_conntrack

is not empty and shows connections
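Variant 1 could be set up roughly like this: empty the IPv4 filter and nat tables while leaving nf_conntrack loaded, then confirm that entries still appear in /proc/net/nf_conntrack. This is a sketch only; the helper names and `DRY_RUN` are hypothetical, and IPv6 (ip6tables) is left untouched.

```shell
# Sketch for "variant1"; with DRY_RUN=1 the commands are printed instead.
DRY_RUN=${DRY_RUN:-0}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Empty the IPv4 filter and nat tables while nf_conntrack stays loaded.
empty_filter_and_nat() {
    # Open the default policies first so that flushing cannot lock us out
    for chain in INPUT FORWARD OUTPUT; do
        run iptables -t filter -P "$chain" ACCEPT
    done
    for table in filter nat; do
        run iptables -t "$table" -F   # delete every rule
        run iptables -t "$table" -X   # delete the now-unreferenced user chains
    done
}

# Count tracked connections; prints 0 if the file is missing (no conntrack).
conntrack_entries() {
    f=${1:-/proc/net/nf_conntrack}
    if [ -r "$f" ]; then wc -l < "$f" | tr -d ' '; else echo 0; fi
}
```

A non-zero `conntrack_entries` after `empty_filter_and_nat` matches the variant 1 condition; `/etc/init.d/firewall reload` should bring the normal rules back afterwards.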


variant2: ping with all netfilter modules unloaded
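For variant 2, a sketch that stops the firewall and then repeatedly tries `rmmod` on every loaded netfilter-looking module, since a module still used by another one refuses to unload until its dependents are gone. Which name prefixes count as "netfilter" is an assumption; check the result against `lsmod` on your build. A reboot should presumably load the modules again.

```shell
# List netfilter-related module names from a /proc/modules-style file.
# The prefixes below are an assumption; adjust them for your build.
nf_modules() {
    awk '$1 ~ /^(nf_|nft_|xt_|ipt_|ip6t_|iptable_|ip6table_|ebt_|ebtable_|nfnetlink)/ ||
         $1 == "x_tables" || $1 == "ebtables" || $1 == "br_netfilter" { print $1 }' \
        "${1:-/proc/modules}"
}

unload_nf_modules() {
    /etc/init.d/firewall stop   # firewall rules keep references to the modules
    pass=0
    # Modules still in use by others fail to unload, so retry a few passes
    while [ "$pass" -lt 5 ] && [ -n "$(nf_modules)" ]; do
        for m in $(nf_modules); do
            rmmod "$m" 2>/dev/null
        done
        pass=$((pass + 1))
    done
    nf_modules   # anything printed here is still loaded
}
```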

So, the low-latency kernel does not boot, but the one with voluntary kernel preemption and a timer frequency of 1000 Hz boots just fine. It reduces the (wired) latency spikes from 100+ ms to no more than 20-25 ms (and those seemingly only when I run a speed test from a different device), without my having to use isolcpus=1 and reassign the interrupts.
I have only tested it for an hour so far.

UPDATE: Even Wi-Fi feels faster and pages load in less time, but this is all subjective.


Hi, sorry, I don't. I'm still on a steep learning curve; I was just reading about Linux kernel latency spikes and wondered if it could help.

What is a low-latency kernel?

An interesting read about kernels, low latency, and which one to choose: https://askubuntu.com/questions/126664/why-choose-a-low-latency-kernel-over-a-generic-or-realtime-one

Yup, unfortunately the low-latency one does not boot, but the second-best one does. The default ("No Forced Preemption (Server)") is optimized for throughput rather than response time.

Here are the steps in case anyone wants to give it a shot. The latency is not as good as with the stock firmware, but this leaves both CPUs available to the kernel.

  1. @hnyman's build (I disable USB modules, 3G modem support, gdb, and a few other things I do not use)
  2. https://github.com/openwrt/openwrt/pull/632
  3. https://github.com/openwrt/openwrt/pull/669 (not sure if either makes a difference, but they will probably end up in master at some point, so I might as well try them)
  4. make kernel_menuconfig / Kernel Features
  • Maximum Number of CPUs == 2 (not sure if this makes any impact though)
  • Voluntary Kernel Preemption
  • Timer Frequency == 1000Hz
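After `make kernel_menuconfig`, one way to confirm that the options from step 4 actually landed is to grep the generated kernel `.config`. In menuconfig they correspond to Kernel Features > "Preemption Model" > "Voluntary Kernel Preemption (Desktop)" and the "Timer frequency" choice set to 1000. The helper is hypothetical, and the location of the full `.config` (somewhere in the kernel tree under `build_dir/` for the ipq806x target) is an assumption about your build layout.

```shell
# Hypothetical check that a kernel .config contains the options listed above.
check_kconfig() {
    cfg=$1
    status=0
    for opt in CONFIG_NR_CPUS=2 CONFIG_PREEMPT_VOLUNTARY=y CONFIG_HZ_1000=y CONFIG_HZ=1000; do
        if grep -qx "$opt" "$cfg"; then
            echo "ok      $opt"
        else
            echo "missing $opt"
            status=1
        fi
    done
    return $status
}
```

It prints one line per option and returns non-zero if any is missing, so it can gate a build script.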

On the router, I reassign CPUs to cores to try to balance them evenly, and I lower the priority of **collectd**, **nlbwmon**, and **uhttpd**. The performance governor could also be used.
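The core balancing, daemon priorities, and governor mentioned above could be scripted roughly as follows. This assumes the balancing means IRQ affinity (as with the interrupt reassignment mentioned earlier in the thread) and that BusyBox `renice` and `pidof` are present. Which IRQs to move is left open; the function names and the `ROOT` test hook are hypothetical.

```shell
# Sketch of the tuning above; ROOT is only a test hook, leave it empty on the router.
ROOT=${ROOT:-}

# Hex bitmask selecting a single CPU, as /proc/irq/N/smp_affinity expects.
cpu_mask() { printf '%x\n' $((1 << $1)); }

# Pin one IRQ (numbers are listed in /proc/interrupts) to one CPU.
pin_irq() { cpu_mask "$2" > "$ROOT/proc/irq/$1/smp_affinity"; }

# Select a cpufreq governor, e.g. "performance", on every CPU that exposes one.
set_governor() {
    for g in "$ROOT"/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$g" ]; then echo "$1" > "$g"; fi
    done
}

# Lower the scheduling priority (nice +10) of the background daemons.
lower_priority() {
    for name in collectd nlbwmon uhttpd; do
        for pid in $(pidof "$name"); do
            renice -n 10 -p "$pid"
        done
    done
}
```

For example, `pin_irq N 1` for an IRQ number N taken from /proc/interrupts, then `set_governor performance` and `lower_priority`. None of this survives a reboot; calling it from /etc/rc.local would be one option.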

The low-latency kernel could potentially provide better results, but it does not boot, and I do not have serial access to see what is going on.

What does "low-latency kernel" mean?
Full preempt + 1000 Hz?

I also use voluntary preemption, but with 250 Hz, on my WRT1200.
Indeed, 1000 Hz will give you lower latency, but it also reduces throughput.
Bufferbloat.net also recommends 1000 Hz, but I chose something of a middle way here.
And I can also confirm that everything feels snappier.
I don't know how voluntary preemption can make that much of a difference. Shouldn't it only affect user-space programs?
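A back-of-the-envelope view of the 250 Hz vs 1000 Hz trade-off, assuming a periodic tick that is not stopped on idle CPUs (i.e. ignoring tickless/NO_HZ operation):

$$
T_{\text{tick}} = \frac{1}{\mathrm{HZ}}, \qquad
T_{\text{tick}}\big|_{\mathrm{HZ}=250} = 4\ \text{ms}, \qquad
T_{\text{tick}}\big|_{\mathrm{HZ}=1000} = 1\ \text{ms}
$$

So 1000 Hz cuts the tick period, and with it timer and scheduler granularity, by a factor of four, at the price of four times as many tick interrupts (on a dual-core router, 2000 instead of 500 per second), which is one plausible source of the throughput cost.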

Is it true that the kernel HZ should match your power grid frequency?
For example, if your power grid frequency is 50 Hz, should you use 250 Hz, 500 Hz, or 1000 Hz?
And for 60 Hz, 300 Hz, 600 Hz, or 1200 Hz?
But I guess that's a myth, because the power grid voltage gets converted anyway.

An RT kernel is only useful if you have applications that can make use of it.
However, I also tried the RT kernel thing x)
I got the patches to apply and OpenWrt booted just fine, but it was buggy all over the place. For example, viewing the graphs generated an insane amount of CPU usage, and then the system crashed :confused:

Yes, that is what I meant. It is called "Preemptible Kernel (Low-Latency Desktop)" in kernel_menuconfig, I think. I was surprised that it did not boot.

I do not know, but the 100 ms spikes are gone. There has been no independent confirmation, though. Don't dnsmasq, hostapd, etc. run as user-space apps?

Related to the Wi-Fi latency, some searching gave the following results:

A reason for the latency spikes to occur: http://blog.cerowrt.org/post/disabling_channel_scans/

A solution is described here:

https://answers.microsoft.com/en-us/windows/forum/windows_10-networking/is-there-any-way-to-stop-windows-10-from-scanning/3870b3d1-0f07-4875-8779-bb5c11fce0a8

Also, at the end of this help-thread there is a program mentioned that can be found on this page:

It is old but reported to work in Windows 10.

Please note: this is the result of my searches on the subject of latency; I have not tested this yet. I will try to do so during the weekend.

Yes, they do.
But as I understand that entire preempt thing...
What it does is allow user-space programs to interrupt the kernel.
Or am I wrong here?
And the difference between voluntary and full preemption is that voluntary adds some "interrupt points" to the kernel, and full adds even more.
So I would rather expect the opposite from enabling preemption.

I haven't tested full preemption, so I can't tell whether it boots or not.
I think 250 Hz + voluntary is also the default on Ubuntu servers (and Debian?), so I will stick with that.

@bouwew
Thanks for the links.
But the majority of clients here are Android-based.
The Wi-Fi has gotten better since I switched to voluntary preemption.
But a couple of commits in the latest trunk tree also improved it quite a bit.
Maybe the lag also comes from Android's power-saving features.
Or from the adblock I use here.
I'm not quite sure. But it did improve :wink:
And the mwlwifi driver is still a bit buggy.

I think it allows any thread to interrupt the kernel; in particular, it allows kernel tasks to interrupt other kernel tasks, which might let packet processing interrupt, say, garbage-collection-type maintenance operations.

Just wanted to briefly say to everyone attempting to solve this issue: it is appreciated. Unfortunately, I'm not in a position to play around with or test my R7800 (in fact, I'm on stock firmware due to the ping issues and other issues with the netlink bandwidth monitor).

For what it's worth, my unit has 'Antenna #' on the antennas, and I also experience ping spikes. As expected, when I switch over to stock, the latency issues go away.

Not as detailed and thorough as some of your posts here, but have a look at my own evidence located in this post here.

As far as I can remember, my kernel log when booting is the same as hnyman's, so same flash (Micron), RAM (Nanya), CPU stepping/revision, etc.

It is truly very strange that some users don't experience this issue... So maybe it has something to do with the connection between a certain modem and its WAN connection to the R7800. Maybe there is something odd with the switching between the QCA8337 and the respective user's modem (does the QCA8337 even control the WAN port?). I'm using an SB8200 as the modem, so it's a Broadcom BCM3390Z.

Here is a proper answer: http://devarea.com/understanding-linux-kernel-preemption/. Preemption is about a low-priority thread not blocking a higher-priority one. So it makes sense that voluntary preemption has improved latency, but a preemptive kernel is the best. Not real time, though.

On embedded systems with soft real-time requirements, it is best practice to use this option, but in a server system, where we usually work asynchronously, the first option is better: fewer context switches, more CPU time.

The default kernel configuration is definitely wrong. It is unfortunate that the best option does not boot.

We need to find out why it doesn't boot... This should be the default for a stock image.