Netgear R7800 exploration (IPQ8065, QCA9984)

@Ansuel how do you send patches to the mailing list?

I tried using git send-email but I think the port got blocked by my uni.

EDIT: nvm I fixed it. Just a typo in my config lol

Builds and loads fine. A few hours in, no issues. I plan to do another 5.10 dsa build later today, load that, and then I'll let that one go for as long as I can.

r7500v2 # uptime
 15:49:08 up  4:59,  load average: 0.00, 0.02, 0.05
r7500v2 # mbw 32
Long uses 4 bytes. Allocating 2*8388608 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0       Method: MEMCPY  Elapsed: 0.04399        MiB: 32.00000   Copy: 727.372 MiB/s
1       Method: MEMCPY  Elapsed: 0.04337        MiB: 32.00000   Copy: 737.769 MiB/s
2       Method: MEMCPY  Elapsed: 0.04375        MiB: 32.00000   Copy: 731.445 MiB/s
3       Method: MEMCPY  Elapsed: 0.04338        MiB: 32.00000   Copy: 737.701 MiB/s
4       Method: MEMCPY  Elapsed: 0.04313        MiB: 32.00000   Copy: 742.012 MiB/s
5       Method: MEMCPY  Elapsed: 0.04322        MiB: 32.00000   Copy: 740.329 MiB/s
6       Method: MEMCPY  Elapsed: 0.04290        MiB: 32.00000   Copy: 745.938 MiB/s
7       Method: MEMCPY  Elapsed: 0.04343        MiB: 32.00000   Copy: 736.886 MiB/s
8       Method: MEMCPY  Elapsed: 0.04314        MiB: 32.00000   Copy: 741.788 MiB/s
9       Method: MEMCPY  Elapsed: 0.04287        MiB: 32.00000   Copy: 746.443 MiB/s
AVG     Method: MEMCPY  Elapsed: 0.04332        MiB: 32.00000   Copy: 738.725 MiB/s
0       Method: DUMB    Elapsed: 0.22585        MiB: 32.00000   Copy: 141.686 MiB/s
1       Method: DUMB    Elapsed: 0.22003        MiB: 32.00000   Copy: 145.434 MiB/s
2       Method: DUMB    Elapsed: 0.21724        MiB: 32.00000   Copy: 147.301 MiB/s
3       Method: DUMB    Elapsed: 0.21546        MiB: 32.00000   Copy: 148.521 MiB/s
4       Method: DUMB    Elapsed: 0.21550        MiB: 32.00000   Copy: 148.494 MiB/s
5       Method: DUMB    Elapsed: 0.22330        MiB: 32.00000   Copy: 143.308 MiB/s
6       Method: DUMB    Elapsed: 0.21580        MiB: 32.00000   Copy: 148.285 MiB/s
7       Method: DUMB    Elapsed: 0.21538        MiB: 32.00000   Copy: 148.575 MiB/s
8       Method: DUMB    Elapsed: 0.21532        MiB: 32.00000   Copy: 148.616 MiB/s
9       Method: DUMB    Elapsed: 0.21756        MiB: 32.00000   Copy: 147.085 MiB/s
AVG     Method: DUMB    Elapsed: 0.21814        MiB: 32.00000   Copy: 146.692 MiB/s
0       Method: MCBLOCK Elapsed: 0.04245        MiB: 32.00000   Copy: 753.846 MiB/s
1       Method: MCBLOCK Elapsed: 0.04873        MiB: 32.00000   Copy: 656.639 MiB/s
2       Method: MCBLOCK Elapsed: 0.04319        MiB: 32.00000   Copy: 740.947 MiB/s
3       Method: MCBLOCK Elapsed: 0.04271        MiB: 32.00000   Copy: 749.169 MiB/s
4       Method: MCBLOCK Elapsed: 0.04331        MiB: 32.00000   Copy: 738.894 MiB/s
5       Method: MCBLOCK Elapsed: 0.04275        MiB: 32.00000   Copy: 748.556 MiB/s
6       Method: MCBLOCK Elapsed: 0.04321        MiB: 32.00000   Copy: 740.518 MiB/s
7       Method: MCBLOCK Elapsed: 0.04291        MiB: 32.00000   Copy: 745.712 MiB/s
8       Method: MCBLOCK Elapsed: 0.04304        MiB: 32.00000   Copy: 743.460 MiB/s
9       Method: MCBLOCK Elapsed: 0.04350        MiB: 32.00000   Copy: 735.598 MiB/s
AVG     Method: MCBLOCK Elapsed: 0.04358        MiB: 32.00000   Copy: 734.265 MiB/s
r7500v2 # uname -a
Linux r7500v2 5.10.39 #0 SMP Sun May 23 11:12:18 2021 armv7l GNU/Linux

I've run iperf3 for an hour now, temps are around 58C... router is stable, no crashes.
Summary:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-3600.01 sec   358 GBytes   855 Mbits/sec    0             sender
[  5]   0.00-3600.00 sec   358 GBytes   855 Mbits/sec                  receiver

It could be shortened/fixed/unified a little:

#!/bin/sh /etc/rc.common

START=99
set_irq_affinity() {
	local name="$1" val="$2"
	local irq
	# Pick the IRQ number (first column) of the matching /proc/interrupts line.
	case "$name" in
		wifi0)  irq=$(grep -E 'qcom-pcie-msi' /proc/interrupts | sed "1q;d" | cut -d: -f1 | tr -d ' ') ;;
		wifi1)  irq=$(grep -E 'qcom-pcie-msi' /proc/interrupts | sed "2q;d" | cut -d: -f1 | tr -d ' ') ;;
		*)      irq=$(grep -E "$name"         /proc/interrupts | sed "1q;d" | cut -d: -f1 | tr -d ' ') ;;
	esac

	# Stop here if nothing matched, otherwise we would write to /proc/irq//smp_affinity.
	[ -n "$irq" ] || { echo "$name irq not found."; return 1; }
	echo "$val" > "/proc/irq/$irq/smp_affinity"
}
boot() {
	set_irq_affinity wifi0 2
	set_irq_affinity wifi1 1
	set_irq_affinity eth0 1
	set_irq_affinity eth1 2
}

You can't force set the IRQ affinity for wifi on the R7800, it's always bound to cpu 0.

That's NOT correct at least for version 19.07:

# cat /proc/interrupts
           CPU0       CPU1
 16:    4761738   11572469     GIC-0  18 Edge      gp_timer
 18:         33          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Edge      ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 28:   17348416   38919326     GIC-0  67 Edge      qcom-pcie-msi
 29:         35          0     GIC-0  89 Edge      qcom-pcie-msi
 30:     251968          0     GIC-0 202 Edge      adm_dma
 31:   17039152          0     GIC-0 255 Level     eth0
 32:         27    7491548     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 35:          0          0   PCI-MSI   0 Edge      aerdrv
 36:   17348416   38919326   PCI-MSI   1 Edge      ath10k_pci
 68:          0          0   PCI-MSI   0 Edge      aerdrv
 69:         35          0   PCI-MSI   1 Edge      ath10k_pci
101:          7          0     GIC-0 184 Level     msm_serial0
102:          2          0   msmgpio   6 Edge      keys
103:          2          0   msmgpio  54 Edge      keys
104:          2          0   msmgpio  65 Edge      keys
105:          0          0     GIC-0 142 Level     xhci-hcd:usb1
106:          0          0     GIC-0 237 Level     xhci-hcd:usb3
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:    9684241   17396011  Rescheduling interrupts
IPI3:    3140607    1079852  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:    3506309    5909129  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0
# uname -a
Linux R7800 4.14.221 #0 SMP Mon Feb 22 15:36:55 2021 armv7l GNU/Linux

# ubus call system board
{
        "kernel": "4.14.221",
        "hostname": "R7800",
        "system": "ARMv7 Processor rev 0 (v7l)",
        "model": "Netgear Nighthawk X4S R7800",
        "board_name": "netgear,r7800",
        "release": {
                "distribution": "OpenWrt",
                "version": "19.07-SNAPSHOT",
                "revision": "r11312-e9c0c5021c",
                "target": "ipq806x/generic",
                "description": "OpenWrt 19.07-SNAPSHOT r11312-e9c0c5021c"
        }
}

Will put this here just for the LULZ
My overclocked router still hasn't crashed
Uptime 7d 5h 15m 23s
(1.9 GHz 1.7 Cache)


I have never managed to change it, not on kernel 4.14 or 5.10 running the ath10k-ct driver.

 53:   30060003          0   PCI-MSI 524288 Edge      ath10k_pci
 54:   14394481          0   PCI-MSI 134742016 Edge      ath10k_pci

# echo 2 > /proc/irq/54/smp_affinity
ash: write error: Invalid argument
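
A quick way to tell a refused write apart from one that silently did nothing is to read the mask back afterwards. The helper below is only a sketch: try_set_affinity and the IRQ_BASE override are names made up for this example (IRQ_BASE just lets it run against a scratch directory instead of /proc/irq), and it assumes a single hex mask without commas, which is what a dual-core box shows.

```shell
#!/bin/sh
# Sketch: set an IRQ affinity mask and confirm it by reading it back.
# IRQ_BASE is a hypothetical override for testing, not a kernel interface.

try_set_affinity() {
	local irq="$1" mask="$2" file got
	file="${IRQ_BASE:-/proc/irq}/$irq/smp_affinity"

	[ -e "$file" ] || { echo "irq $irq: $file not found"; return 2; }

	# The kernel can reject the write outright (e.g. "Invalid argument").
	if ! echo "$mask" 2>/dev/null > "$file"; then
		echo "irq $irq: kernel refused mask $mask"
		return 1
	fi

	# Compare numerically so "2" and "00000002" count as the same mask.
	got=$(cat "$file")
	if [ "$((0x$got))" -eq "$((0x$mask))" ]; then
		echo "irq $irq: affinity is now $got"
	else
		echo "irq $irq: asked for $mask, still $got"
		return 1
	fi
}
```

Something like `try_set_affinity 54 2` would then report the refusal in one line instead of leaving you to guess from the ash error.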

Also regarding the script, this won't match anything on the R7800:
grep -E 'qcom-pcie-msi' /proc/interrupts

This is what @robimarko said about changing IRQ:

As I wrote above, I am using 19.07 with the 4.14.221 kernel and the ath10k-ct driver.
I am using the script mentioned above to relocate the IRQs related to the WiFi devices.
This script is working flawlessly on my system, as you can see from the /proc/interrupts statistics.
Answering your note regarding DWC PCI: the script is not trying to relocate the PCI devices' IRQs, because that is not possible.
It relocates only the GIC MSI IRQs, which it finds by searching for qcom-pcie-msi entries in the interrupts list.

Could it be that your R7800 is one of the first revisions, which do not support GIC MSI for the wireless devices?

Perhaps, I don't know, I haven't seen any revisions of the R7800, I know Netgear use different flash memory tho' if I remember correctly.

I know you were replying to another post, but since this thread is about the R7800: for the script to work, people need to change qcom-pcie-msi to ath10k_pci, then it will output things correctly. But it's not guaranteed it will allow remapping the IRQ for wifi.
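
Since both names show up in this thread, the lookup could try one and fall back to the other. This is only a sketch, not a tested replacement for the script: find_irq and find_wifi_irq are made-up names, and the optional file argument exists purely so it can be tried on a saved copy of /proc/interrupts. It matches the last column exactly rather than grepping the whole line.

```shell
#!/bin/sh
# Sketch: print the Nth IRQ number whose last /proc/interrupts column equals
# a given name. The optional third argument (a file to read) is an addition
# for testing and is not part of the script posted above.

find_irq() {
	local name="$1" nth="${2:-1}" src="${3:-/proc/interrupts}"
	awk -v name="$name" -v nth="$nth" '
		$NF == name && ++seen == nth { sub(/:$/, "", $1); print $1; exit }
	' "$src"
}

# Prefer the GIC "qcom-pcie-msi" entry, fall back to "ath10k_pci".
find_wifi_irq() {
	local nth="$1" src="${2:-/proc/interrupts}" irq
	irq=$(find_irq qcom-pcie-msi "$nth" "$src")
	[ -n "$irq" ] || irq=$(find_irq ath10k_pci "$nth" "$src")
	[ -n "$irq" ] && echo "$irq"
}
```

On the 19.07 listing above, `find_wifi_irq 2` would pick 29; on a system that only lists ath10k_pci it would pick the second of those instead. Finding the number still says nothing about whether the kernel accepts a new mask for it.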

By searching the forum you can confirm that other people have 'qcom-pcie-msi' devices on 19.07:

You could also search for '/proc/irq/28/smp_affinity' to confirm that many people report having GIC IRQs 28 and 29 on systems running 19.07.

As far as I can tell, you are running something other than 19.07 right now.
Maybe it would be more accurate to make such categorical statements only about the build you are actually running?

I've been running master builds since the very beginning, I'm pretty sure my wifi devices have always been named ath10k_pci for me but I could be wrong. Weird and I'm sorry but I thought the R7800 was the same for all.
Perhaps some developer can explain why some have ath10k_pci and some qcom-pcie-msi? Are there different HW revisions after all? Or is it because of the kernel change?

On kernel 4.14 the IRQs were 45 and 46 for me.
On kernel 5.10 using DSA, which I'm running now, the IRQs are 53 and 54.

r7500v2 # uname -a
Linux r7500v2 5.10.39 #0 SMP Mon May 24 10:47:46 2021 armv7l GNU/Linux
r7500v2 # uptime
 15:02:07 up 7 days,  4:38,  load average: 0.17, 0.10, 0.09

no issues

This build includes commit e5d50f6 and dsa.

The build just prior to this (also with dsa, but without the aforementioned commit) ran 3+ days, also with no issues.

HTH

my theory about defective device is becoming real....
i just need to find time and backtrack this...

  • could be bad code in the low-level scaling driver
  • cpufreq driver reading the wrong pvs (1 instead of 0 ?)
  • defective device that also crashes with the oem firmware
  • possible workaround: add some userspace way to declare a higher pvs manually

For the sake of everyone with an ipq806x device, I hope not. Our devices, while similar, are different, so I hold out hope.

the problem is strictly about voltage and frequency scaling. It looks like the system becomes unstable after some time... This shows up a lot with the nss builds, but even there, there are guys with uptimes of 30+ days....

Out of curiosity, can you give me output of regulator_summary? I'm curious to check your pvs


Let me know if you want to see anything else

#
# /bin/cat /sys/kernel/debug/regulator/regulator_summary
#
 regulator                      use open bypass  opmode voltage current     min     max
---------------------------------------------------------------------------------------
 regulator-dummy                 13   13      0 unknown     0mV     0mA     0mV     0mV 
    29000000.sata-target          1                                 0mA     0mV     0mV
    29000000.sata-phy             1                                 0mA     0mV     0mV
    29000000.sata-ahci            1                                 0mA     0mV     0mV
    1b700000.pci-vdda_refclk      1                                 0mA     0mV     0mV
    1b700000.pci-vdda_phy         1                                 0mA     0mV     0mV
    1b700000.pci-vdda             1                                 0mA     0mV     0mV
    1b500000.pci-vdda_refclk      1                                 0mA     0mV     0mV
    1b500000.pci-vdda_phy         1                                 0mA     0mV     0mV
    1b500000.pci-vdda             1                                 0mA     0mV     0mV
    s1a                           1    1      0 unknown  1100mV     0mA  1050mV  1150mV 
       soc:l2-cache-l2            1                                 0mA  1100mV  1100mV
    s1b                           0    0      0 unknown  1050mV     0mA  1050mV  1150mV 
    s2a                           1    1      0 unknown  1050mV     0mA   800mV  1250mV 
       cpu0-cpu                   1                                 0mA  1045mV  1155mV
    s2b                           1    1      0 unknown  1050mV     0mA   800mV  1250mV 
       cpu1-cpu                   1                                 0mA  1045mV  1155mV
 SDCC Power                       1    0      0 unknown  3300mV     0mA  3300mV  3300mV 

#
# END /bin/cat /sys/kernel/debug/regulator/regulator_summary
#

For reference, here is the regulator summary recorded May 23 from the "stable" 5.4.99 build up for 52+ days. Should they be different?

#
# /bin/cat /sys/kernel/debug/regulator/regulator_summary
#
 regulator                      use open bypass  opmode voltage current     min     max
---------------------------------------------------------------------------------------
 regulator-dummy                 10   13      0 unknown     0mV     0mA     0mV     0mV
    29000000.sata                 1                                 0mA     0mV     0mV
    29000000.sata                 1                                 0mA     0mV     0mV
    29000000.sata                 1                                 0mA     0mV     0mV
    1b700000.pci                  1                                 0mA     0mV     0mV
    1b700000.pci                  1                                 0mA     0mV     0mV
    1b700000.pci                  1                                 0mA     0mV     0mV
    1b500000.pci                  1                                 0mA     0mV     0mV
    1b500000.pci                  1                                 0mA     0mV     0mV
    1b500000.pci                  1                                 0mA     0mV     0mV
    s1a                           0    2      0 unknown  1100mV     0mA  1050mV  1150mV
       cpu1                       0                                 0mA  1100mV  1150mV
       cpu0                       0                                 0mA  1100mV  1150mV
    s1b                           0    0      0 unknown  1050mV     0mA  1050mV  1150mV
    s2a                           0    1      0 unknown  1100mV     0mA   800mV  1250mV
       cpu0                       0                                 0mA  1100mV  1100mV
    s2b                           0    1      0 unknown  1100mV     0mA   800mV  1250mV
       cpu1                       0                                 0mA  1100mV  1100mV
 SDCC Power                       1    0      0 unknown  3300mV     0mA  3300mV  3300mV

#
# END /bin/cat /sys/kernel/debug/regulator/regulator_summary
#
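
To compare two dumps like these without eyeballing the columns, the rail voltages can be pulled out with awk. A minimal sketch, assuming the dumps were saved to files and that the rails of interest are named like s1a/s2a as they are here; rail_voltages is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print "<rail> <voltage>" for the top-level sNx rows of a saved
# regulator_summary dump. Consumer rows (cpu0, cpu1-cpu, ...) have fewer
# columns and are skipped, as are the header and the separator line.
# The s[0-9] name filter is an assumption based on the dumps in this thread.

rail_voltages() {
	awk 'NF >= 9 && $1 ~ /^s[0-9]/ && $(NF-3) ~ /^[0-9]+mV$/ { print $1, $(NF-3) }' "$1"
}
```

Running it on each saved dump and diffing the two outputs shows only the rails that differ. For the two dumps posted here that is s2a and s2b: 1050mV on the 5.10 build and 1100mV on the 5.4.99 one. Each dump is a single snapshot, though, so a difference like that may simply reflect the CPU frequency at the moment it was taken.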

do you have the performance governor set?

no, but the cpu min freq. is limited to 800 MHz and I put eth0/eth1 on cpu 1. I.e. from /etc/rc.local:

echo 800000 > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo 75 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
echo 1000000 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate
echo 2 > /proc/irq/37/smp_affinity 
echo 2 > /proc/irq/38/smp_affinity
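
One thing to watch with hard-coded paths in rc.local: the wifi IRQ numbers mentioned earlier moved between kernels (45/46 on 4.14, 53/54 on 5.10), and as far as I know the ondemand directory is only there while that governor is in use. A small guard keeps a missing file from passing unnoticed. This is just a sketch, and set_knob is a made-up helper name, not an OpenWrt function.

```shell
#!/bin/sh
# Sketch: write a tunable only if the target exists and is writable,
# and say so when it is skipped. set_knob is a hypothetical helper.

set_knob() {
	local value="$1" path="$2"
	if [ ! -w "$path" ]; then
		echo "skip: $path (missing or not writable)"
		return 1
	fi
	echo "$value" 2>/dev/null > "$path" || { echo "fail: $path"; return 1; }
}

# Example use in rc.local (same values as the lines above):
# set_knob 800000 /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
# set_knob 75     /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
# set_knob 2      /proc/irq/37/smp_affinity
```

The messages go to stdout, so they end up wherever rc.local output goes on your build.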

and

r7500v2 # cat /proc/interrupts
           CPU0       CPU1       
...
 37:          6   81672625     GIC-0 255 Level     eth0
 38:          8      49389     GIC-0 258 Level     eth1
...

This is an AP only, load average is 0.15-0.25 most of the time, and I rarely see it go above 0.5.

I can use netperf from 2-5 clients (most by wifi but one or two wired if u like) and put it under some load if that helps.

I don't think it's related, but 5-6 months back I did crash a 5.4 image doing netperf with two wifi clients, one sending to and the other receiving from the server. So far I have not been able to reproduce this, and I have no logs from that one event.

EDIT: just tried netperf -t TCP_MAERTS -l 60 -D 1s -H xxx.xxx.45.26 from one wifi client and netperf -l 60 -D 1s -H xxx.xxx.45.26 from another wifi client, both against netperf running on the router, several times with no issues. During the test the 5.10 + dsa AP load was:

load average: 0.03, 0.05, 0.06

so I guess not the best way to put my AP under any kind of load.

you can have pvs 5 3 or 0
can you give me clk_summary with regulator_summary ?
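
If it helps, both files can be grabbed in one go. A rough sketch, assuming debugfs is mounted at the usual /sys/kernel/debug; dump_summaries is a made-up name and the DEBUGFS override is only there so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: print clk_summary and regulator_summary with the same "#" banners
# used in the dumps above. DEBUGFS is a hypothetical override for testing;
# it defaults to the usual debugfs mount point.

dump_summaries() {
	local base="${DEBUGFS:-/sys/kernel/debug}" f
	for f in "$base/clk/clk_summary" "$base/regulator/regulator_summary"; do
		echo "#"; echo "# /bin/cat $f"; echo "#"
		if [ -r "$f" ]; then
			cat "$f"
		else
			echo "(not readable: is debugfs mounted?)"
		fi
		echo "#"; echo "# END /bin/cat $f"; echo "#"
	done
}
```

Redirecting the output of `dump_summaries` to a file gives a single attachment with both summaries taken at nearly the same moment.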