Netgear R7800 exploration (IPQ8065, QCA9984)

I can do it but I can't test it.

I can help test it if it is mergeable into the stable release. I hear that master has a less stable wifi driver...

Well, that would be awesome :). I can hack my way around a lot of codebases, but I've never done any work on Linux; I would be poking around in the dark without a torch.

I run hnyman's build of master (actually I build my own off his scripts because of a couple of extra kmods), so I can also help with testing.

What is this achieving?

I just did a spot check and it looks like this was actually merged.

I did a spot check before I asked about it and I didn't think it was :). I mean, I can't find the changes in the very first file in the changeset (target/linux/ipq806x/files-4.9/arch/arm/boot/dts/qcom-ipq8064.dtsi). The first patch file he deleted is still there (target/linux/ipq806x/patches-4.9/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch).

My bad, I must have been smoking something.

No worries, it would have been nice if it had been merged under the radar! Oh well...

Hi,
I'm seeing random reboots on my newly purchased Netgear R7800 (Nighthawk X4S). I'm not sure why, and I'm looking for advice or tips on how to debug.

My setup is the following: I have a gigabit uplink and two VLANs, but otherwise it is a simple setup with a 2.4 GHz and 5 GHz WLAN.

What I've tried is the following:

  1. Stock OpenWrt 18.06. To increase performance I tried both the ondemand scaling governor with lowered thresholds (see the sketch just after this list) and the performance governor. I noticed more frequent reboots with the former.
  2. hnyman's master build with ath10k (r8960-9b9274342c). With default settings I still noticed reboots.
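
Roughly what I mean by "lowered thresholds" on ondemand, for reference (just a sketch; sysfs paths as on my box, and the tunable names and defaults vary by kernel):

echo ondemand > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo ondemand > /sys/devices/system/cpu/cpufreq/policy1/scaling_governor
# scale up sooner than the default
echo 50 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
# hold the higher frequency longer before sampling back down
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor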

Is there any way to collect logs on what is going on? Can I redirect the logs to a different device? Can I hook up a USB serial adapter, or do I need to attach to the actual serial header on the inside?
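
(I did notice that /etc/config/system has remote-logging options; I haven't tried them yet, but presumably something like this would ship the log to another box. The IP is just an example:)

config system
	option log_ip '192.168.1.10'
	option log_port '514'
	option log_proto 'udp'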

Any suggestions?

Br,
Martin

Test disabling flow-offloading.
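
If it helps: that should be the flow_offloading knob in the firewall defaults. A minimal sketch of what to set in /etc/config/firewall, assuming the stock defaults section (the hw variant may not be present on every build):

config defaults
	option flow_offloading '0'
	option flow_offloading_hw '0'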

What kind of SQM bandwidth can the R7800 be expected to handle?

I will give that a try, but that would reduce my LAN-to-WAN throughput significantly. Curiously, after the latest reboot the router now has 2 days of uptime.

More generally: what is the most appropriate way of debugging? Do I need to hook up a serial interface?

thanks.

I monitored the CPU temperature with the performance and ondemand governors (with optimized settings); without high load the CPU temperature is practically equal. (Linksys EA8500, 1.4 GHz)
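
For reference, I read the temperature straight from sysfs; a minimal sketch, assuming the build exposes thermal zones (zone numbering varies by board):

cat /sys/class/thermal/thermal_zone*/temp   # value in millidegrees Celsius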

I made a few tests with different CPU / L2 cache / memory frequencies. For small quantities of data the difference is zero or small, but in other tests I saw a dramatic difference.

(Linksys EA8500, IPQ8064 1.4 GHz)
For example, compression with xz (LZMA2 method, 8 MB dictionary, compression level 6, 95 MB memory used):

performance governor, using a RAM disk in /tmp:

tar -cf - /usr/ > /tmp/test.tar
ls -la /tmp/test.tar
-rw-r--r-- 1 root root 18684928 Jan 13 23:31 /tmp/test.tar

top
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
12928 12927 root R 96220 20% 50% xz -6 -c test.tar
ps | grep xz
12927 root 1064 S time xz -6 -c test.tar
12928 root 96220 R xz -6 -c test.tar

test:
time xz -6 -c test.tar > test.tar.xz

echo 384000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 384000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
384000
384000

real 2m 20.71s
real 2m 21.17s
that's the CPU at 384 MHz; the L2 cache / memory frequencies stay the same (low)

echo 600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 600000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
600000
600000

real 1m 44.50s (user 1m 43.84s sys 0m 0.58s)
real 1m 44.47s
that's the CPU at 600 MHz; the L2 cache / memory frequencies stay the same

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
800000
800000

real 1m 27.01s
real 1m 26.96s
that's the CPU at 800 MHz; the L2 cache / memory frequencies stay the same

echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 0m 59.55s
real 0m 59.53s
that's the CPU at 1000 MHz with high (for 1000 MHz) L2 cache / memory frequencies

echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1200000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1200000
1200000

real 1m 8.98s
real 1m 9.02s
that's the CPU at 1200 MHz with low (for 1200 MHz) L2 cache / memory frequencies

echo 1400000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1400000
1400000

real 1m 3.71s
real 1m 3.59s
that's the CPU at 1400 MHz with low (for 1400 MHz) L2 cache / memory frequencies

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1400000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1400000
1400000

real 0m 45.12s
real 0m 45.19s
that's the CPU at 1400 MHz with high (for 1400 MHz) L2 cache / memory frequencies

echo 800000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 800000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 0m 59.56s
that's the CPU at 1000 MHz with high (for 1000 MHz) L2 cache / memory frequencies

echo 1200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1200000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1000000
1000000

real 1m 16.21s
that's the CPU at 1000 MHz with low L2 cache / memory frequencies
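
So on this board the L2 cache / memory clocks apparently depend on the direction you approach a CPU frequency from, not just the frequency itself. A quick sketch to re-run the whole sweep in one go (untested as a script; same policy paths and test.tar as above, output discarded to spare /tmp):

for f in 384000 600000 800000 1000000 1200000 1400000; do
    echo $f > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
    echo $f > /sys/devices/system/cpu/cpufreq/policy1/scaling_max_freq
    echo "=== $f ==="
    # note: per the results above, the L2/mem clocks you get at $f
    # depend on the frequency you stepped from
    time xz -6 -c /tmp/test.tar > /dev/null
done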

(I haven't had time to look at OpenWrt/R7800 stuff in almost a month.)

It moves and forces the network interfaces' receive and transmit queues to the 2nd CPU core in the default config (IIRC). It is also the reason why the commonly used IRQ-balancing attempts on the R7800 aren't as effective as they should be, as those attempts do not move all of the CPU work caused by a certain IRQ.

I also took another look at the script, and it seems there is a config option to disable it. It might be better to use that than to remove or edit the script itself :slight_smile:

The interrupt is first handled by the 1st core, but part of the computation the interrupt causes is forced onto the 2nd core.
Example: first the interrupt waits for the 1st core to wake up, and then fully processing the received network packet also needs to wake up the 2nd core. Under continuous heavy traffic this doesn't matter, as the cores are already active, but in e.g. web browsing this adds latency. (There might also be some cache-locality issues adding latency even under continuous traffic.) Disabling the script allows the kernel to use both cores for the queues (IIRC), and the work seemed to stay on the already-active core. I did not do thorough testing or test heavy loads.
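
For anyone who wants to poke at this without editing the script: if I read it right, the steering ends up in the per-queue RPS masks under sysfs. A minimal sketch, assuming eth0 and queue rx-0 (interface and queue names vary):

# hex CPU mask: '2' = 2nd core only, '1' = 1st core only, '0' = RPS disabled
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
# '3' = let the kernel use either core for this queue
echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus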

So, if we run irqbalance, should we disable the script or not?

So, this is the option (in /etc/config/network) you are talking about, and it indeed disables the script. I will give it a shot.

config globals 'globals'
	option default_ps '0'
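
The same thing via uci, in case editing the file by hand is inconvenient (assuming the section really is named 'globals' as above):

uci set network.globals.default_ps='0'
uci commit network
/etc/init.d/network restart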

see nbg6817-openwrt-rebooting-constantly
and FS#2026

Thanks Paul. I disabled flow-offloading as suggested by @slh. Whether or not this has anything to do with it I can't say, but my uptime record is now 6 days. I'm running hnyman's master build in the non-ct variant.

If you can help me out with the most appropriate way to capture logs, I could append them to your bug report.

Br,
Martin

FYI, there is new official firmware again, 5 days old.