Netgear R7800 exploration (IPQ8065, QCA9984)

The NSS driver is dependent on the NSS gmac driver. Splitting them looks interesting. Difficult or doable?

Actually nope... the NSS core (the drv package) can be loaded without the gmac package. What we need to understand is whether it can work without it (i.e. whether the gmac part is mandatory, or whether the NSS firmware can offload, for example, only wifi packets).

On the kmod side, they can be loaded separately; in fact you can load nss-drv first and then nss-gmac without a problem.

A quick test would be to remove the gmac driver and check whether the wifi offload works (but you need to enable the old gmac binding to make the master driver work).
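
Before running that test it may help to confirm which modules are actually loaded. A minimal sketch, assuming the modules show up as `qca_nss_drv` and `qca_nss_gmac` (those names are my assumption, check what `lsmod` prints on your build); `check_modules` is just a hypothetical helper name:

```shell
# Report whether each named kernel module appears in a modules list.
# $1 = path to a /proc/modules-style file, remaining args = module names.
# The module names you pass in are assumptions; use what lsmod shows.
check_modules() {
    list="$1"; shift
    for mod in "$@"; do
        # /proc/modules starts each line with the module name and a space
        if grep -q "^${mod} " "$list"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}
```

On the router this would be called as `check_modules /proc/modules qca_nss_drv qca_nss_gmac`. It only tells you the module state; whether wifi offload really works still has to be checked with traffic.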

I’ll give it a try later this week when I have a second. :sunglasses:

The nss-drv needs the nss-gmac driver as nss-gmac has hooks for the nss-drv to tap into. So nss-drv needs nss-gmac. Using the in-tree gmac driver will disable all NSS functionalities, as the nss-drv will not load, IIRC.

This statement is not quite true.

According to this commit:

Remove RPS/XPS support from netifd core, move the logic to a hotplug script that uses a different policy... bla bla bla
https://github.com/openwrt/openwrt/commit/916e33fa1e14b97daf8c9bf07a1b36f9767db679

Packet steering was broken in 18.06 and was later carried over to 19.07 in the same state.
You can examine the source code of the referenced script to see the cause of the problem.

As a workaround, many users remove or disable the '/etc/hotplug.d/net/20-smp-tune' script and then add several commands to the init script:

https://forum.openwrt.org/t/r7800-cache-scaling-issue/44187/18

Or:
https://bugs.openwrt.org/index.php?do=details&task_id=2573

echo 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 3 > /sys/class/net/eth0/queues/tx-0/xps_cpus
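
The `3` above is a hex CPU bitmask (bit 0 = cpu0, bit 1 = cpu1), so on the dual-core R7800 it means "both cores". A small sketch of how the mask can be computed and applied to every queue instead of just `eth0`; `cpu_mask` and `apply_steering` are hypothetical helper names, not part of OpenWrt:

```shell
# Print the hex CPU mask that selects the first N CPUs.
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Write a mask to every rps_cpus/xps_cpus file under a sysfs-style root.
# $1 = root directory (normally /sys/class/net), $2 = hex mask.
apply_steering() {
    root="$1"; mask="$2"
    for f in "$root"/*/queues/rx-*/rps_cpus "$root"/*/queues/tx-*/xps_cpus; do
        # skip unmatched globs and files that cannot be written
        [ -w "$f" ] && echo "$mask" > "$f" 2>/dev/null
    done
    return 0
}
```

On the router this would be something like `apply_steering /sys/class/net "$(cpu_mask 2)"`. Some drivers reject writes to `xps_cpus`, which is why errors are silenced.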

Feel free to search the forum.
You will find many topics related to the question "How to restore normal (original) operation of RPS / XPS on 19.07".

(self promotion since i like to spam and i should be banned 1??!?!)

[PR] [WIP] ipq806x: request for kernel 5.10 + dsa - For Developers - OpenWrt Forum

@ACwifidude I'm checking the ath10k-ct offload patch... Did you say that it doesn't work?

This was the breakdown for why it doesn’t work and why it doesn’t cause any issues.

I’m in over my head on fixing it. :grin:

When you have time (I think you use the normal ath10k driver),
can you test the Candela Tech firmware with the plain ath10k driver?
(Just replace the firmware with this one: https://www.candelatech.com/downloads/ath10k-9984-10-4b/ath10k-fw-beta/firmware-5-ct-full-community.bin )

IMHO the Candela firmware just doesn't have the feature compiled in, since in my case it just crashes the firmware and no traffic is transmitted. Also, the patch for ath10k-ct has some errors anyway that cause a kernel crash on any firmware crash, so I would just drop it.

By testing whether the ath10k-ct firmware works with the plain ath10k driver, we would find out if the feature is actually compiled in, and we would understand whether the problem is in the driver itself, which may require some extra """care""" to make this work. I checked the driver code a bit and WOW, SO MANY CHECKS... They surely slow down all the wifi operations. I wonder if by just cutting them we would gain some extra Mb of traffic...
From what I can understand, the Candela driver really tries to work in the cleanest way, even at the cost of some performance. There is actually the htt-mgt version, which I think uses some type of offload, but the extra checks slow down all the operations.

I briefly tested the ct driver with encap offload and upstream firmware, and it works fine, so I guess the problem could be that the ct special feature can't deal with 802.3 frames.

Created an issue on ath10k-ct... if anyone wants to give some extra feedback about the benefits of the patch feel free to put a comment...

Anyway, yesterday greearb and I wasted some time trying to find the cause of the firmware crash...
From what I can understand the feature should work, but it's really hard to find the cause. We found the version that introduced the crash but still have no clue why, and in any case even a very old firmware that doesn't crash results in no connection. Let's hope greearb wants to continue fixing this and we find something.

Can we try to upstream the patch?

Are we allowed to do that?

Yes, we should be allowed to do that, but it does require some changes. We need to check the patchwork mailing list discussion about this patch and make the requested changes. From a quick read, they complained about this not being compatible with some old chip versions, so a solution would be to limit this feature to a specific set of firmware. In short, the patch as is will never be accepted.

I think the main reason it wasn't accepted is that it didn't implement the part for the 64-bit target (WCN3990).

EDIT: I did a quick search; WCN3990 is a chip for mobile phones,

and the complaint was that it probably wasn't supported, so I think the patch needs to be changed to address this.

I'm curious why wireless is bound to one CPU; AFAIK irqbalance etc. can't change it since it's of type "PCI-MSI" and not "GIC-0".

 53:    4332913          0   PCI-MSI 524288 Edge      ath10k_pci
 54:         28          0   PCI-MSI 134742016 Edge      ath10k_pci

Is there a way or do we have to live with the fact that it's always using cpu0 for wireless?

DWC PCI controller only has one MSI interrupt so you can't really balance it out.
Only newer revisions have moved to use the GICv2m MSI extension to handle interrupts dynamically.
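
To see how the wifi interrupts are spread on your own box, the per-CPU counters can be pulled out of `/proc/interrupts`. A rough sketch, assuming a dual-core system (two count columns) and that the IRQ name is the last field on the line; `irq_counts` is a hypothetical helper name:

```shell
# Print "IRQ cpu0_count cpu1_count" for every line whose last field
# equals the given name.
# $1 = IRQ name (e.g. ath10k_pci), $2 = /proc/interrupts-style file.
irq_counts() {
    awk -v name="$1" '$NF == name { sub(":", "", $1); print $1, $2, $3 }' "$2"
}
```

Called as `irq_counts ath10k_pci /proc/interrupts`, a second column stuck at 0 matches the single-MSI behaviour described above.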

Ok! Thanks.

Btw, if anyone is interested, here are the tweaks I use for the DSA build; they can be adjusted for the normal swconfig build if you change the IRQs for eth0, eth1 and adm dma.

echo 20 > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

# bind eth0 and eth1 to different cpu cores for even distribution,
# IRQ31 - WAN, IRQ32 - LAN for nonDSA build
echo 1 > /proc/irq/37/smp_affinity
echo 2 > /proc/irq/38/smp_affinity

# set adm dma to cpu core 2, mostly because it's the least used cpu core due to the above wifi "issue".
echo 2 > /proc/irq/39/smp_affinity

# reduce latency
ethtool -C eth0 tx-usecs 0
ethtool -C eth1 tx-usecs 0
ethtool -C eth0 rx-usecs 31
ethtool -C eth1 rx-usecs 31

# tweak interfaces to use more than one cpu for rx/tx
# this can also be done with:
# uci set network.globals.packet_steering=1
for FILE in /sys/class/net/*/queues/[rt]x-0/[rx]ps_cpus; do
    [ -w "$FILE" ] && echo 3 > "$FILE" 2>/dev/null
done

# There is no need for collectd to run above nice == 19
if ! grep -q "NICEPRIO=19" /etc/init.d/collectd; then
   sed -i 's/^NICEPRIO.*/NICEPRIO=19/g' /etc/init.d/collectd
   # Restart does not pick up the above change right away
   (sleep 300 ; /etc/init.d/collectd stop; sleep 15; /etc/init.d/collectd start) &
fi

# There is no need for uhttpd to run above nice == 19
if ! grep -q "nice -n 19" /etc/init.d/uhttpd; then
   sed -i "s/procd_set_param command/procd_set_param command nice -n 19/g" /etc/init.d/uhttpd
   # Restart does not pick up the above change right away
   (sleep 300 ; /etc/init.d/uhttpd stop; sleep 15; /etc/init.d/uhttpd start) &
fi
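
Since the IRQ numbers differ between the DSA and swconfig builds, one option is to look them up by name instead of hard-coding 37/38/39. A minimal sketch, assuming the name in the last column of `/proc/interrupts` is what you expect (`eth0` and `eth1` are likely; the adm dma name is something you need to check yourself); `set_affinity_by_name` is a hypothetical helper:

```shell
# Set smp_affinity for every IRQ whose /proc/interrupts name matches.
# $1 = IRQ name, $2 = hex CPU mask,
# $3 = interrupts file (normally /proc/interrupts),
# $4 = irq directory root (normally /proc/irq).
set_affinity_by_name() {
    for irq in $(awk -v name="$1" '$NF == name { sub(":", "", $1); print $1 }' "$3"); do
        echo "$2" > "$4/$irq/smp_affinity"
    done
}
```

For example, `set_affinity_by_name eth0 1 /proc/interrupts /proc/irq` and `set_affinity_by_name eth1 2 /proc/interrupts /proc/irq` would replace the two hard-coded `echo` lines for the ethernet IRQs.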

Some updates about ath10k-ct and encap offload...
We fixed the firmware to the latest version but still no traffic. I'm working on it by sniffing traffic and checking whether things can be fixed. I'm positive and I think there is hope.
