Archer C7 5GHz performance, sirq 99%

First of all, set the performance governor. You should be able to do this. The ondemand governor adds latency because the CPU needs to ramp up to its maximum frequency. The script below shows the current governor of each CPU when run without arguments, and sets the governor when you pass one as an argument. It only works under bash, as it uses a bash-specific regular expression comparison.

#!/bin/bash
# Show the current cpufreq governor of every CPU, or set it when one is given.

CPUS=$(grep -c ^processor /proc/cpuinfo)
AVAILABLE_GOVERNORS=$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors)

show_governor()
{
	echo "available governors: $AVAILABLE_GOVERNORS"
	for ((i = 0; i < CPUS; i++)); do
		echo "cpu$i: $(cat "/sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor")"
	done
}

# No argument: just report and exit.
if [[ -z "$1" ]]; then
	show_governor
	exit 0
fi

# The padding spaces force a whole-word match, so "on" does not match "ondemand".
if [[ " $AVAILABLE_GOVERNORS " =~ " $1 " ]]; then

	# Writing to scaling_governor needs root.
	for ((i = 0; i < CPUS; i++)); do
		echo "$1" > "/sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor"
	done

else

	echo "$(basename "$0"): unknown governor $1" >&2
	echo "supply a valid governor or no arguments to show the available and current governors" >&2
	exit 1

fi

Make sure you get the iperf direction right. To simulate a download with the iperf server on the LAN side and the client on the WLAN side, the client needs the -R flag. Without it the client sends data to the server (effectively an upload), and then it is contending for the medium with the other clients. With the -R flag the server sends data to the wireless client, and that traffic is not contending for the medium with anything else, since the OpenWrt device itself schedules the download.
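As a rough sketch of the two directions, assuming iperf3 is what's installed on both ends (the function names are made up here, and 192.168.1.10 is only a placeholder for the address of your wired LAN host):

```shell
#!/bin/bash
# Placeholder address of the wired LAN host running "iperf3 -s"; change it.
SERVER=${SERVER:-192.168.1.10}

# On the wired LAN host, start the server first:
#   iperf3 -s

# On the wireless client. -R reverses the flow, so the server sends and the
# client receives (download). Without -R the client sends (upload).
download_test() { iperf3 -c "$SERVER" -R -t 30; }
upload_test()   { iperf3 -c "$SERVER" -t 30; }
```

Run download_test on the wireless client to measure the download direction; upload_test is only there for comparison.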

The sirqs are almost certainly being generated by network TX or RX. You can check with

cat /proc/softirqs

Do it before and after an iperf run, copy/paste the two outputs into an Excel sheet using the text import paste function, and check the difference to see where the softirqs are being generated. Depending on the direction of your iperf transfer, most of them will likely be in either NET_TX or NET_RX.
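If you'd rather skip the spreadsheet, the same before/after comparison can be sketched in the shell. This is only a sketch: softirq_diff and the /tmp file names are made up for illustration, and it assumes both snapshots come from the same machine so the rows and CPU columns line up.

```shell
#!/bin/bash
# softirq_diff BEFORE AFTER
# Print, per softirq type and per CPU, how much each counter grew between
# two saved copies of /proc/softirqs.
softirq_diff() {
	awk '
		FNR == 1 { next }              # skip the CPU0 CPU1 ... header row
		NR == FNR {                    # first file: remember the counts
			for (i = 2; i <= NF; i++) before[$1, i] = $i
			next
		}
		{                              # second file: print the deltas
			printf "%-10s", $1
			for (i = 2; i <= NF; i++) printf " %12d", $i - before[$1, i]
			printf "\n"
		}
	' "$1" "$2"
}

# Typical use:
#   cat /proc/softirqs > /tmp/softirqs.before
#   ... run the iperf test ...
#   cat /proc/softirqs > /tmp/softirqs.after
#   softirq_diff /tmp/softirqs.before /tmp/softirqs.after
```

The rows with the largest deltas are the ones to look at; each column after the name is one CPU.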

Once you've got the iperf direction right and you've checked the softirqs, you may get some performance improvement by tuning the smp_affinity of the irqs on your network adapters. For example, if most of the softirqs are NET_RX, put half of the network adapter hardware irqs on one core and half on the other core to spread the load a bit. The kernel will likely run the softirq on the same core as the initial hardware irq for that flow.

You can see the hardware irqs in /proc/interrupts and you pin them to a core by echoing a hex mask to /proc/irq/<irq>/smp_affinity, where in a two core system the binary mask 11 or hex 3 means use both cores, 1 means use core 0 and 2 means use core 1. On an untuned system, you'll likely see most network card irqs cluster on core 0.
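To avoid working the masks out by hand, something like the following could be used. It's a minimal sketch: cpu_mask and pin_irq are names invented here, the irq numbers in the comments are placeholders (take the real ones from /proc/interrupts), and writing to smp_affinity needs root.

```shell
#!/bin/bash
# cpu_mask CORE...
# Print the hex smp_affinity mask that selects the given core numbers.
cpu_mask() {
	local mask=0 core
	for core in "$@"; do
		mask=$(( mask | (1 << core) ))
	done
	printf '%x\n' "$mask"
}

# pin_irq IRQ CORE...
# Write the mask for the given cores to /proc/irq/IRQ/smp_affinity.
pin_irq() {
	local irq=$1
	shift
	cpu_mask "$@" > "/proc/irq/$irq/smp_affinity"
}

# Typical use, with placeholder irq numbers:
#   grep -i eth /proc/interrupts   # find the adapter irqs
#   pin_irq 40 0                   # irq 40 on core 0 (mask 1)
#   pin_irq 41 1                   # irq 41 on core 1 (mask 2)
```

cpu_mask 0 1 gives 3, which is the "use both cores" mask mentioned above.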

Are you using SQM scripts or some kind of QoS? If so, this will definitely generate a whole lot more softirqs and you should turn it off.

EDIT: even when you're not CPU-bound, you're unlikely to get wireless performance much above 500 Mbps on a 2x2 MIMO client, whether you use 80 MHz channels or not.