Netgear R7800 exploration (IPQ8065, QCA9984)

Can you please share your "reduce latency" settings?
I'm experiencing bad latency spikes of up to 199 ms, and that's in latency-sensitive games :(
Cake doesn't make much difference.

Setup
FTTH modem in bridge mode (single NAT)
Unbound for DNS
SQM with cake
irqbalance ON
Packet steering ON
CPU governor: tried both ondemand (default settings) and performance
Adblocker
Nothing fancy...

Are you sure you have a correct SQM configuration? A wrong one can increase ping in VoIP/gaming scenarios where upload and download are used at the same time.

Yes, here it is.
I was using the same configuration with a WNDR3700v4 and it was always free of latency problems.
I've been struggling with the R7800 and had given up on it, so as a last resort I'm bothering you guys here for assistance.

config queue 'eth1'
	option qdisc 'cake'
	option script 'piece_of_cake.qos'
	option ingress_ecn 'ECN'
	option itarget 'auto'
	option etarget 'auto'
	option enabled '1'
	option interface 'eth0.2'
	option download '45000'
	option upload '45000'
	option debug_logging '0'
	option verbosity '5'
	option qdisc_advanced '1'
	option squash_dscp '1'
	option squash_ingress '1'
	option egress_ecn 'NOECN'
	option qdisc_really_really_advanced '1'
	option iqdisc_opts 'nat dual-dsthost ingress'
	option eqdisc_opts 'nat dual-srchost'
	option linklayer 'ethernet'
	option overhead '44'
	option linklayer_advanced '1'
	option tcMTU '2047'
	option tcTSIZE '128'
	option tcMPU '0'
	option linklayer_adaptation_mechanism 'default'

Everything looks OK to me in the SQM settings, and speed tests show normal results for everything. It's in games that this triggers: latency goes up to 199 ms (unusable), comes back after a few seconds, and then the same thing repeats.
BTW, I'm on OpenWrt SNAPSHOT r20776-9e08724634 / LuCI Master git-22.260.19132-34dd31a

I keep trying every new release to see if it improves anything for me, but no luck so far :(

Does your problem show up as well if your game PC is connected to a LAN port? If it only shows up for WIFI connected game PC, try to switch from the default ath10k-ct to the mainline ath10k. R7800 works great with the mainline ath10k firmware/driver. Some WIFI clients don't work well with ath10k-ct.

Turn off packet steering and irqbalance. ;)
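
For reference, a minimal sketch of how both could be switched off from the shell. It assumes a recent OpenWrt build where packet steering is stored as `network.globals.packet_steering` and where the irqbalance package is installed and reads `irqbalance.irqbalance.enabled`. The function name is made up, and the option paths should be checked against your own config before relying on it.

```shell
#!/bin/sh
# Sketch only: the option paths are assumptions, verify them first with
# `uci show network.globals` and `uci show irqbalance`.
disable_steering_and_irqbalance() {
    # Packet steering is a global option in the network config.
    uci set network.globals.packet_steering='0'
    # The irqbalance service only runs when enabled is '1'.
    uci set irqbalance.irqbalance.enabled='0'
    # Write both changes to /etc/config.
    uci commit network
    uci commit irqbalance
}
```

After calling it, `/etc/init.d/network reload` and `/etc/init.d/irqbalance stop` (or simply a reboot) should be needed for the change to take effect.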

@Ansuel @quarky @hnyman and all experienced users.
Currently my R7800 is running this @qosmio branch with NSS. I had two reboots in the first three days of use.
As there were opinions that the reboots were due to a CPU frequency scaling issue, I then switched to the performance governor (without irqbalance and packet steering). It has been running for almost four days without a reboot.
Currently the @ACwifidude 5.10 NSS master branch is running on 4 other R7800s without irqbalance and packet steering, but with the default ondemand governor and these optimized CPU settings:

    echo 600000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
    echo 600000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
    echo 25 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
    echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

All of them have been free of spontaneous reboots for more than 15 days.
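
The ondemand settings quoted above can be wrapped in a small function so they are easy to reapply at boot (for example from /etc/rc.local). This is only a sketch under the same sysfs layout: the function name and the optional root argument, which exists purely for dry runs against a scratch directory, are hypothetical.

```shell
#!/bin/sh
# Sketch: apply the quoted ondemand tuning to every CPU core found.
# Usage: apply_ondemand_tuning [sysfs_cpu_root]
apply_ondemand_tuning() {
    root="${1:-/sys/devices/system/cpu}"
    # Raise the minimum frequency on each core (cpu0, cpu1, ...).
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_min_freq; do
        [ -w "$f" ] && echo 600000 > "$f"
    done
    # The ondemand tunables only exist while that governor is active,
    # so silently skip them otherwise.
    od="$root/cpufreq/ondemand"
    [ -w "$od/up_threshold" ] && echo 25 > "$od/up_threshold"
    [ -w "$od/sampling_down_factor" ] && echo 10 > "$od/sampling_down_factor"
    return 0
}
```

Calling `apply_ondemand_tuning` with no argument writes to the real sysfs tree; the values do not survive a reboot on their own.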

Now, first of all, excuse my lack of competence on the subject, and my great appreciation goes to everyone who works on this, but I really wonder:
How could something as fundamental as CPU scaling, present on the vast majority of hardware and software (Linux, Windows, Android, iOS and others) for a long time, cause such an issue?

Many systems handle the low-level stuff themselves, like cache scaling or voltage. The ipq806x is very flexible and leaves control to the system, so cache corruption is just around the corner if something goes wrong in the scaling.

Unfortunately, the problem exists on wired clients just as it does on Wi-Fi clients.
Any suggestions?

That doesn't help; I tried it before as well.

If the clock is used by some part of the CPU that is buggy at that frequency (like a cache), then when that part fails (locks up, corrupts data, ...), predictable, deterministic behavior goes out the window. To put it mildly.

Hardware released into production isn't bug-free.

I once had to perform DRAM refresh in software, because the CPU support chip that was supposed to do that ... didn't.

And it's often cheaper to fix things in software than spin (correct) the silicon again.

Ideally we'd be working from the CPU errata but that doesn't seem to be published by Qualcomm. Dammit.

Try setting this to 1 - see if anything interesting is logged. It's also possible that logging could further screw up your performance, but it only has to be enabled long enough to get something to look at.

I've no idea what gets logged or if it's useful.

@Ansuel
Any idea why the download/upload rates got inverted? Folks with the NSS build seem to have no such issues. The router had been running stable without a reboot for 15 days, though, until I downgraded.

5.15 build:

5.10 build:

The data above was captured over Wi-Fi, but I observed a download rate of 5-20 Mbps even when testing over Ethernet on 5.15. Speedtest.net results were normal on both Wi-Fi and Ethernet, though.

Try configuring the router manually instead of setting it up from a backup. It helped me once. Also check what happens when you disable SQM or use simple.qos.

That asymmetry between upload and download is happening to everyone. We don't know exactly why yet. We found that mimicking the AC_VI queue in the AC_BE queue levels them out. However, I doubt it is a good solution at this stage.

Speedtest doesn't saturate upload and download simultaneously.

@amteza

I think it depends a lot on the WIFI client being used as well. With Intel Wireless 8260 and Linux 5.19.1-arch2-1, I got relatively symmetric TCP download/upload speeds (~ 75/75) with the default aql_txq_limit 5000/12000. The TCP download speed only dropped when I reduced the aql_txq_limit.

Good to know; I was surprised, to be honest. In my case, that asymmetry disappears when using a computer connected to another AP acting as a client. Slower speeds along with lower latency are expected with a 2000/2000 aql_txq_limit, though. So it depends on what you prioritise. Nevertheless, a connection with a latency of only 8-10 ms in my network is much better than increasing my raw speed from 400-450 Mbps to 600 Mbps, mainly because only two devices in my network are 3x3 MIMO, and none require tremendous speeds or are close enough to the AP to take advantage of a better MCS. I reckon this scenario is similar to that of most other users.

Thank you for the graphs. I saw them, by the way. Unfortunately, I've been a tad busy lately, so I'm a little less active in the forums.
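
For anyone who wants to try the 2000/2000 limits discussed above, here is a minimal sketch. It assumes the AQL limits are exposed through the mac80211 debugfs file `aql_txq_limit` under /sys/kernel/debug/ieee80211/<phy>/ and that each write takes the form "<ac> <low> <high>". The function name, the `phy0` name in the usage note and the optional directory argument (for dry runs) are illustrative, not part of any tool.

```shell
#!/bin/sh
# Sketch: write the same low/high AQL limit for all four access categories.
# Usage: set_aql_txq_limit <phy> <low> <high> [debugfs_dir]
set_aql_txq_limit() {
    phy="$1"; low="$2"; high="$3"
    dir="${4:-/sys/kernel/debug/ieee80211}"
    f="$dir/$phy/aql_txq_limit"
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    # One write per access category (assumed: 0=VO, 1=VI, 2=BE, 3=BK).
    for ac in 0 1 2 3; do
        echo "$ac $low $high" > "$f" || return 1
    done
}
```

Something like `set_aql_txq_limit phy0 2000 2000` would apply the lower limits, and `set_aql_txq_limit phy0 5000 12000` would restore the defaults quoted earlier. Reading the file back with `cat` should show what is currently in effect; the setting is not persistent across reboots.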

@hnyman do you by chance have a boot log of 5.15? I need the first few lines of the kernel bootup.

kernel 5.15 commit b2c507b

(your PR just before your final merge)

I'm working on kernel 6.1 and man, it's so buggy...

Don't go too far ahead, unless there is a real plan to move OpenWrt directly to it for the next release. 5.15 should be stable for all targets first ;)
