R7800 performance

           CPU0       CPU1
 16:      41612     111819     GIC-0  18 Edge      gp_timer
 18:         33          0     GIC-0  51 Edge      qcom_rpm_ack
 19:          0          0     GIC-0  53 Edge      qcom_rpm_err
 20:          0          0     GIC-0  54 Edge      qcom_rpm_wakeup
 26:          0          0     GIC-0 241 Edge      ahci[29000000.sata]
 27:          0          0     GIC-0 210 Edge      tsens_interrupt
 28:     180811      18578     GIC-0  67 Edge      qcom-pcie-msi
 29:     188100      85549     GIC-0  89 Edge      qcom-pcie-msi
 30:     205667         25     GIC-0 202 Edge      adm_dma
 31:       3798      11330     GIC-0 255 Level     eth0
 32:        131         26     GIC-0 258 Level     eth1
 33:          0          0     GIC-0 130 Level     bam_dma
 34:          0          0     GIC-0 128 Level     bam_dma
 35:          0          0   PCI-MSI   0 Edge      aerdrv
 36:     180811      18578   PCI-MSI   1 Edge      ath10k_pci
 68:          0          0   PCI-MSI   0 Edge      aerdrv
 69:     188100      85549   PCI-MSI   1 Edge      ath10k_pci
101:         10          0     GIC-0 184 Level     msm_serial0
102:          2          0   msmgpio   6 Edge      gpio-keys
103:          2          0   msmgpio  54 Edge      gpio-keys
104:          2          0   msmgpio  65 Edge      gpio-keys
105:          0          0     GIC-0 142 Level     xhci-hcd:usb1
106:          0          0     GIC-0 237 Level     xhci-hcd:usb3
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:      35221      75618  Rescheduling interrupts
IPI3:         36      20041  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:      49216      85866  IRQ work interrupts
IPI6:          0          0  completion interrupts
Err:          0

A cursory conclusion would be that the IRQs are still not being balanced properly, with core #0 receiving the bulk of the interrupts from the most active devices.
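To put a number on that imbalance, here is a minimal sketch that computes the CPU0 share per IRQ from /proc/interrupts-style text. The sample rows are copied from the table above. The helper name `cpu0_share` is made up for illustration, and it simply takes the last field as the device name, which is adequate for device rows but not for the IPI rows.

```python
# Sketch: how are interrupt counts split between cores?
# SAMPLE holds four rows copied from the table above; cpu0_share is a hypothetical helper.

SAMPLE = """\
           CPU0       CPU1
 28:     180811      18578     GIC-0  67 Edge      qcom-pcie-msi
 29:     188100      85549     GIC-0  89 Edge      qcom-pcie-msi
 30:     205667         25     GIC-0 202 Edge      adm_dma
 31:       3798      11330     GIC-0 255 Level     eth0
"""


def cpu0_share(text):
    """Return {"<irq> <name>": (total_count, fraction_on_cpu0)} for rows with activity."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())          # header row has one column per CPU
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < ncpu:
            continue                      # e.g. the single-column "Err:" row
        if not all(f.isdigit() for f in fields[:ncpu]):
            continue
        counts = [int(f) for f in fields[:ncpu]]
        total = sum(counts)
        if total == 0:
            continue                      # idle IRQ, nothing to compare
        name = fields[-1] if len(fields) > ncpu else ""
        result[f"{label.strip()} {name}".strip()] = (total, counts[0] / total)
    return result


for irq, (total, frac) in cpu0_share(SAMPLE).items():
    print(f"{irq:20s} total={total:7d} cpu0={frac:6.1%}")
```

On these rows, adm_dma and the first qcom-pcie-msi line land almost entirely on CPU0, while eth0 is handled mostly by CPU1.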

I am in the middle of debugging a similar situation myself.

It would seem so. Hopefully, someone will provide a fix soon.

It's hard to argue with the raw figures you have in your rrd graphs - but I wonder whether a lot of this isn't architecture specific.

Looking at a couple of ARM-based boards, I'm seeing similar imbalances in the IRQs raised: most of the device IRQs are raised against a single core. However, in those cases it looks like at least some of the work is being rescheduled later (presumably via the rescheduling interrupts).

I suppose the test would be to actually max out one of the cores and check that the load is redistributed properly at that point. Certainly it looks like, at low load, the first core takes on a supervisory function.

FYI: @cannesahs has a few interesting notes to help improve latency here: Netgear R7800 exploration (IPQ8065, QCA9984)

Today I tested a Linksys EA8500 (same hardware as the R7800, but with a 1.4 GHz CPU instead of your 1.7 GHz)
with OpenWrt 18.06.1,
default settings + light tuning (net buffers).
Static address, NAT, 2 PCs (one on the LAN, one on the WAN),
port forwarding for the iperf port (for tests in both directions).
A few simple firewall rules (for ssh, ipsec).

Tests: iperf and ftp (the ftp test used passive mode).

iperf (TCP, 2 streams, 250K buffers),
WAN to LAN (about the same for LAN to WAN):

without software offloading,
default settings for CPU governor & power management :
540-560 Mbits/sec. (~70-80 %sirq)

with software offloading,
default settings for CPU governor & power management :
635-650 Mbits/sec. (but less %sirq)

without any software offloading,
optimized settings for CPU governor & power management :
870-900 Mbits/sec. (~50-65 %sirq)

And 900 Mbits/sec. isn't 100% load; the router may manage more (for example, in full duplex).

ftp, 1 stream, without any software offloading, WAN<->LAN (both directions),
default settings for CPU governor & power management :
65-70 Mbytes/sec.
optimized settings for CPU governor & power management :
95-103 Mbytes/sec.

Next test: routing disabled, WiFi AP only; speed is limited by the WAN link to the other router (100/100 Mbits).
Even for this light load, download to the WiFi client was ~90 in both cases, but upload was worse with the default CPU governor settings.

Optimized settings for the CPU governor & power management (either option):

  1. Tune the ondemand governor:
     35 for /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
     (with up_threshold = 30 or 40 I did not detect any difference)
     10 for /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

  2. Or set the performance governor; no additional settings are needed.
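A minimal sketch of applying the two ondemand values from the post above. The directory and the values are the ones quoted there. The helper name `apply_ondemand_tunables` is hypothetical, and Python is used only for illustration (a stock OpenWrt image may not ship it). Files that do not exist are skipped, since the ondemand directory is only present while that governor is in use.

```python
import os
import tempfile

# Directory and values as quoted in the post above.
ONDEMAND_DIR = "/sys/devices/system/cpu/cpufreq/ondemand"
TUNABLES = {"up_threshold": "35", "sampling_down_factor": "10"}


def apply_ondemand_tunables(base=ONDEMAND_DIR, tunables=TUNABLES):
    """Write each tunable to its file under `base`; return the names actually written."""
    written = []
    for name, value in tunables.items():
        path = os.path.join(base, name)
        if not os.path.exists(path):      # ondemand not active, or a different sysfs layout
            continue
        with open(path, "w") as fh:
            fh.write(value)
        written.append(name)
    return written


# Dry run against a throwaway directory, so this is safe to execute anywhere.
demo = tempfile.mkdtemp()
for name in TUNABLES:
    open(os.path.join(demo, name), "w").close()
print(apply_ondemand_tunables(base=demo))
```

On the router itself you would call `apply_ondemand_tunables()` with the default path as root. The values do not survive a reboot unless they are reapplied at boot.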

Conclusion:
CPU governor & power management settings are
very important for this CPU on high-speed links.
The default CPU governor & power management settings in OpenWrt 18.06.1 (and 17.01.xx too) are very poor for any router based on IPQ80xx or any other CPU with advanced power & frequency management.
Yes, I know the same values are used in many other "stock kernels", but they are not meant for routers & firewalls (nor for many other specialized devices)!

Switching from one SoC/CPU frequency to another, or from a power-saving "sleep state" (e.g. C1E on the x86 Intel side) to active, takes ages compared with the time scale of data transfer.

Try leaving only two available frequencies for ondemand to pick from, if you really want to use it.

Under heavy or moderate load it doesn't matter so much, as the CPU stays active. But in "normal" use there is enough time between computing / IRQ wake-ups for the CPU to drop into power saving, and then waking up for the first packet of a burst takes too long. Ondemand tuning and the performance governor can reduce the issue, but won't make it go away fully.

Latency also matters for reaching full line speed on a single TCP connection if the TCP window isn't big enough.
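The window/latency point can be made concrete with the usual bound: a single TCP connection can have at most one window of data in flight per round trip, so throughput is capped at window / RTT. A rough sketch follows; the window and RTT figures are illustrative assumptions, not measurements from this thread.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_needed_bytes(rate_mbps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return rate_mbps * 1e6 * rtt_seconds / 8


# Illustrative numbers only.
print(max_tcp_throughput_mbps(64 * 1024, 0.001))   # 64 KiB window, 1 ms RTT
print(max_tcp_throughput_mbps(64 * 1024, 0.002))   # same window, 2 ms RTT: half the ceiling
print(window_needed_bytes(1000, 0.002))            # window needed for 1 Gbit/s at 2 ms
```

With a fixed window, any extra wake-up delay that adds to the round-trip time lowers the ceiling proportionally.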

Yes, IMHO that's the reason to enable "performance" mode.

It's microseconds for some packets, yet
frequency tuning and the performance governor give me a 1.5-2x difference on real traffic.

" without software offloading,
default settings for CPU governor & power management :
540-560 Mbits/sec. (~70-80 %sirq)

without any software offloading,
optimized settings for CPU governor & power management :
870-900 Mbits/sec. (~50-65 %sirq)"

Could someone please explain if it's feasible to use the ondemand scheduler in combination with setting the scaling governor, as well as setting the maximum CPU frequency, as advised here: Netgear R7800 exploration (IPQ8065, QCA9984)?

I'm having a hard time breaking 100 Mbps through NAT with the two R7800s that I have. Right after rebooting the router, I get about 600-700 Mbps. However, as soon as the router has been up for a minute, the performance drops below 100 Mbps. I've tried a Linksys WRT3200ACM and consistently get 900 Mbps, but of course the R7800 radios work way better than the WRT ones, so I'd like to stick with the R7800.

I see people on here complaining of "bad" performance around 300 Mbps, which I would love to get consistently. Does anyone have suggestions on what I might be doing wrong? Some more context:

  • OpenWrt 18.06.4 r7808-ef686b7292
  • iperf3 to the router from the LAN gets 900 Mbps, so it seems NAT related
  • iperf3 from the router to the WAN gets terrible performance (though if I plug in a computer or a linksys router, the WAN gives fantastic performance)
  • I have 10 firewall rules and 4 port forwards
  • 4 static routes to /20 IPv4 prefixes for other routers on the LAN side, and typically ~20 devices showing in arp -a
  • In steady state conntrack -L | wc -l ranges between 150-400 connections, about half look like DNS (port 53).
  • Just two firewall zones (lan and wan)
  • Top does not show much CPU load even when saturating NAT at ~93 Mbps.
  • I've tried all three of "Software based offloading" checked, both Software and hardware checked, and none checked, and it doesn't seem to make much difference to performance or CPU utilization.

Thanks for any suggestions.

Under the firewall menu, I have something that says "Routing/NAT Offloading", but no check box. Next to it reads "Experimental feature not fully compatible with QoS/SQM". Given that I have near line-rate upstream, I really don't care about QoS, but don't see any place to disable it.

Any suggestions on how I might at least get to 300 Mbps or whatever people on here consider the bad performance? I have played with CPU governors and such, but as expected that stuff doesn't matter because something else must be going on as my CPU shows plenty of idle time.

Did you try 19.07? There was a performance-affecting bug fixed back in May.
You would have to compile 19.07 yourself at this time.

I was able to get ~750 Mbps without SQM.

I tried the snapshot and 18.06.4, and neither can break 100 Mbps. How do I get 19.07?

If the snapshot is from within the last couple of months, then it is the same as 19.07.

This looks very suspicious, as it is almost the maximum for a 100 Mbps link: are you sure all your devices are negotiating a 1 Gbps connection and not 100 Mbps? Maybe a bad cable or port?
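For reference, a back-of-the-envelope sketch of why ~93 Mbps looks like a 100 Mbps link. It assumes IPv4, full-size 1500-byte frames, no VLAN tag, and the 12-byte TCP timestamps option (commonly enabled on Linux); the function name is made up for illustration.

```python
def tcp_goodput_mbps(link_mbps, mtu=1500, ip_hdr=20, tcp_hdr=20, tcp_opts=12):
    """TCP payload rate on Ethernet with full-size frames and no losses."""
    # Bytes on the wire per frame: preamble+SFD 8, Ethernet header 14, FCS 4, inter-frame gap 12.
    wire_bytes = mtu + 8 + 14 + 4 + 12
    payload = mtu - ip_hdr - tcp_hdr - tcp_opts   # 12 bytes = TCP timestamps option
    return link_mbps * payload / wire_bytes


print(round(tcp_goodput_mbps(100), 1))    # 94.1
print(round(tcp_goodput_mbps(1000), 1))   # 941.5
```

A measured ~93 Mbps sits right at the 100 Mbps ceiling, while the ~900 Mbps results elsewhere in the thread are close to the gigabit one.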

I can share my build with you if you want to try: it includes some perf optimizations that are not a part of the default image.

Oh yes, I just tried https://downloads.openwrt.org/snapshots/targets/ipq806x/generic/openwrt-ipq806x-netgear_r7800-squashfs-factory.img tonight after reading your messages. (It was strange because that build is missing luci, but it was routing packets okay at the 100 Mbps rate.)

Seems very unlikely, because A) I can briefly get much faster right after rebooting the router, and B) I can get 900 Mbps with a Linksys router. Oh, and I also own two R7800s and two Linksys routers, so the common factor is always the R7800.

So the only thing I can think is that my ISP could be throttling based on MAC address or something.

I haven't yet done a controlled experiment, so what I could do is set up one of my R7800s NATed to my local network instead of trying to benchmark through my ISP.

How do you connect to your ISP? PPPoE?

Also, if you have a gigabit switch, place it between your R7800 and the ISP modem.

My ISP installed some kind of box with fiber coming in on one side and an Ethernet port. I connected that Ethernet port to a 1Gig switch, and the R7800's WAN port plugs into that switch. The devices plugged into the switch (including the R7800) get public IP addresses from the ISP via DHCP.

That's already how it's configured (so I can bypass the R7800's poor NAT performance...)

Can you share your tweaks? I am using hnyman's build and I am wondering if those will work with it.

This sounds like a result of the Linux kernel initially starting with the performance governor before reverting to the ondemand governor shortly after (1-2 minutes?) settling in to normal operation, which is apparently what happens.

You have said that you've experimented with the governor settings; however, I'd suggest you look at these again, because there are a couple of things that can be overlooked:

  • there's a separate governor for each CPU core, so you need to set each one separately
  • unless the commands to set the governor are saved in a place like /etc/rc.local, changes made via SSH are lost when you reboot

The simplest approach is just to set the governor to performance for each core, which then means each core runs at maximum speed. If you want to keep the CPU cores running at low speeds when there is little activity, then @ddwrt_refugee's settings here are your best bet.
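A minimal sketch of the "set it on every core" step, assuming the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout. The helper name `set_governor` is hypothetical, and Python is only used for illustration; on the router the same per-core writes can be done from /etc/rc.local.

```python
import glob
import os
import tempfile

CPU_ROOT = "/sys/devices/system/cpu"      # assumed standard sysfs location


def set_governor(governor, cpu_root=CPU_ROOT):
    """Write `governor` to every cpuN/cpufreq/scaling_governor; return the paths written."""
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling directories such as cpufreq/.
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as fh:
            fh.write(governor)
        written.append(path)
    return written


# Dry run on a fake two-core tree, so the sketch is safe to execute anywhere.
fake = tempfile.mkdtemp()
for n in (0, 1):
    os.makedirs(os.path.join(fake, f"cpu{n}", "cpufreq"))
    open(os.path.join(fake, f"cpu{n}", "cpufreq", "scaling_governor"), "w").close()
print(len(set_governor("performance", cpu_root=fake)))
```

Checking the number of files written against the number of cores is a quick way to catch the "only set it on cpu0" mistake.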

You probably also need to look into moving some interrupt handling off the first core (cpu0) as well.
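As a rough sketch of what moving an interrupt involves: the kernel exposes a per-IRQ CPU bitmask in /proc/irq/<n>/smp_affinity, written as hex with bit i standing for CPU i. The helper names are hypothetical. Which IRQs are worth moving, and whether this platform's interrupt controller accepts the change for a given IRQ, are assumptions you would need to verify on the device; IRQ numbers can also differ between builds.

```python
import os
import tempfile


def cpu_mask(*cpus):
    """Hex bitmask in the form smp_affinity expects (bit i set = CPU i allowed)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_irq(irq, cpus, proc_irq="/proc/irq"):
    """Write the mask for `cpus` to <proc_irq>/<irq>/smp_affinity and return it."""
    mask = cpu_mask(*cpus)
    with open(os.path.join(proc_irq, str(irq), "smp_affinity"), "w") as fh:
        fh.write(mask)
    return mask


# Dry run: IRQ 29 is one of the qcom-pcie-msi lines in the table at the top of the thread.
fake = tempfile.mkdtemp()
os.makedirs(os.path.join(fake, "29"))
print(pin_irq(29, [1], proc_irq=fake))    # mask that allows CPU1 only
```

After a real change, re-reading /proc/interrupts under load shows whether the counts for that IRQ now grow on the second core.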

These settings definitely affect network throughput once you're looking at handling over 100-150Mbps of traffic with any form of QoS.

LOL, 100-150 Mbps with the current official OpenWrt build? Seriously, you need to fix your builds. A 1.7 GHz ARM core can easily do 1 Gbps NAT in software. In fact, I have a custom ipq8x board here with a custom fixed OpenWrt kernel that does this :-)