Netgear R7800 exploration (IPQ8065, QCA9984)

[quote="ddunford, post:311, topic:285"]
I have recently upgraded my R7800 to both the latest snapshot version, and 17.01 (inc RC1 and RC2) versions and I've lost the use of wifi.
[/quote]
Your log showed nothing strange.
And 17.01 has not had relevant radio changes for some time.
If you have trouble with both the snapshot and the 17.01 RC versions, I think you likely have a faulty wifi config. For example, you may previously have run a build with the full wpad, which generated config items that do not work with wpad-mini.

What errors does logread show when you run "wifi up"?

Or delete /etc/config/wireless and start from scratch after a reboot.


Here is the error when I run wifi up.

Sat Apr 1 14:25:58 2017 daemon.notice netifd: radio0 (32215): command failed: Not supported (-95)
Sat Apr 1 14:25:58 2017 daemon.notice netifd: radio1 (32229): command failed: Not supported (-95)
Sat Apr 1 14:25:59 2017 daemon.err hostapd: Configuration file: /var/run/hostapd-phy0.conf
Sat Apr 1 14:25:59 2017 daemon.err hostapd: Line 38: unknown configuration item 'ieee80211w'
Sat Apr 1 14:25:59 2017 daemon.err hostapd: 1 errors found in configuration file '/var/run/hostapd-phy0.conf'
Sat Apr 1 14:25:59 2017 daemon.err hostapd: Failed to set up interface with /var/run/hostapd-phy0.conf
Sat Apr 1 14:25:59 2017 daemon.err hostapd: Failed to initialize interface
Sat Apr 1 14:25:59 2017 daemon.notice netifd: radio0 (32215): cat: can't open '/var/run/wifi-phy0.pid': No such file or directory
Sat Apr 1 14:25:59 2017 daemon.notice netifd: radio0 (32215): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process path ()
Sat Apr 1 14:25:59 2017 daemon.notice netifd: radio0 (32215): Command failed: Invalid argument
Sat Apr 1 14:25:59 2017 daemon.notice netifd: radio0 (32215): Device setup failed: HOSTAPD_START_FAILED

Guessing that's the problem :slight_smile:

Removing ieee80211w from /etc/config/wireless fixed the issue.
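For anyone hitting the same hostapd error, here is a minimal sketch of how one might scan a copy of /etc/config/wireless for a given option before deleting the whole file. It assumes the usual UCI `option <name> '<value>'` line syntax; the function name `find_option` and the sample config text are made up for illustration and are not taken from the poster's router.

```python
import re


def find_option(config_text, option):
    """Return (line_number, value) for every 'option <name> <value>' line."""
    pattern = re.compile(
        r"^\s*option\s+" + re.escape(option) + r"\s+['\"]?([^'\"\s]*)['\"]?\s*$"
    )
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = pattern.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical excerpt of a wireless config, for illustration only.
sample = """\
config wifi-iface
    option device 'radio0'
    option mode 'ap'
    option encryption 'psk2'
    option ieee80211w '1'
"""

for lineno, value in find_option(sample, "ieee80211w"):
    print(f"line {lineno}: ieee80211w = {value}")
```

Any line it reports is a candidate for removal when the installed wpad variant does not know that option, as in the log above.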

Thanks!

I wonder if anybody has tested how much "ipq806x: enable QCE hardware crypto inside the kernel" could increase SSL/OpenVPN throughput or decrease CPU load.


Well, I can give you some numbers for the IPQ40XX (Asus RT-AC58U at 666 MHz rather than the full 716 MHz, with QCE enabled).
Note: all values are in MiB/s, not Mbit/s!

This is with the qcrypto/QCE active:

root@xbow:~# cryptsetup benchmark
# Tests are approximate using memory only (no storage IO).
[...] (PBKDF benchmark - not important)
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    63.7 MiB/s    63.3 MiB/s
     aes-cbc   256b    55.5 MiB/s    55.4 MiB/s
     aes-xts   256b    50.5 MiB/s    60.7 MiB/s
     aes-xts   512b           N/A           N/A

AES-XTS with a 512-bit key does not work with the QCE, but it does work with the software ciphers.

Without hardware crypto (i.e. with qcrypto unbound via sysfs, which can be done at runtime):

#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    11.1 MiB/s    11.9 MiB/s
     aes-cbc   256b     8.0 MiB/s     9.0 MiB/s
     aes-xts   256b    10.7 MiB/s    11.6 MiB/s
     aes-xts   512b     8.9 MiB/s     8.9 MiB/s

Again, these are in MiB/s!
So even the software cipher manages aes-128-cbc at 11.1 MiB/s, which is 11.1 MiB/s * 8 bits/byte = 88.8 Mibit/s (about 93 Mbit/s).

With hardware acceleration, aes-128-cbc at 63.3 MiB/s is about 506.4 Mibit/s (roughly 531 Mbit/s).
If you have a reasonably easy way to check the VPN, I can provide those numbers as well. I expect them to be lower due to the overhead.
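The unit conversion can be spelled out in a small sketch. Strictly speaking 1 MiB is 1,048,576 bytes, so the decimal Mbit/s figure comes out about 5% higher than a plain multiplication by 8. The helper name `mib_per_s_to_mbit_per_s` is made up for illustration; the two rates are the aes-cbc 128b encryption results from the tables above.

```python
MIB = 1024 * 1024  # bytes in one MiB


def mib_per_s_to_mbit_per_s(mib_per_s):
    """Convert a rate in MiB/s to decimal megabits per second."""
    return mib_per_s * MIB * 8 / 1_000_000


# aes-cbc 128b encryption rates quoted in the benchmark tables.
for label, rate in [("software", 11.1), ("QCE", 63.7)]:
    print(f"{label}: {rate} MiB/s = {mib_per_s_to_mbit_per_s(rate):.1f} Mbit/s")
```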

Edit: fixed units.


Thanks @chunkeey! 6x is a nice factor :smile:

I spent a few hours testing the IGMP snooping feature of the R7800 switch (Atheros AR8337). My conclusion is that IGMPv2 works fine, but IGMPv3 is broken. With or without the global igmp_v3 key set to 1, the switch drops all IGMPv3 queries and reports on its ports (at ingress).

root@LEDE:~# swconfig dev switch0 show
Global attributes:
     [...]
        igmp_snooping: 1
        igmp_v3: 1

According to the Qualcomm QCA8337N datasheet, there are many registers for controlling IGMP/MLD. I should try to look at the driver, but I'm not sure I would be efficient at it. Do you know someone who could help me?

I've upgraded my R7800 to the latest snapshot release.
I'm still seeing errors like these in my logs:

[301291.884900] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[301291.884982] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[301291.891547] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[301291.898902] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[314444.989139] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 130 tid 0
[314444.989220] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 130 tid 0
[314444.995644] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 130 tid 0
[314445.003171] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 130 tid 0
[315044.988668] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 0
[315044.988773] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 0
[315044.995327] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 0
[315045.002654] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 0
[315045.011077] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 1
[315045.017397] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 1
[315045.024841] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 1
[315045.032212] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 1
[316845.002705] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 0
[316845.002784] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 0
[316845.009310] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 0
[316845.016713] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 0
[316845.024132] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 1
[316845.031391] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 1
[316845.038857] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 1
[316845.046273] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 135 tid 1
[324543.323520] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 137 tid 0
[324543.323602] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 137 tid 0
[324543.330025] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 137 tid 0
[324543.337545] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 137 tid 0

Does anyone have any idea what could be causing these, or how I could prevent them?
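A long run of identical messages is easier to discuss once it is condensed. As a rough sketch, the lines can be counted per peer_id/tid pair. The function name `count_txq_failures` is made up for illustration, and the sample contains only three of the lines from the log above.

```python
import re
from collections import Counter

# Matches the ath10k message text shown in the log above.
LINE_RE = re.compile(r"failed to lookup txq for peer_id (\d+) tid (\d+)")


def count_txq_failures(log_text):
    """Count 'failed to lookup txq' messages per (peer_id, tid) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


sample = """\
[301291.884900] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[301291.884982] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 117 tid 0
[315045.011077] ath10k_pci 0001:01:00.0: failed to lookup txq for peer_id 131 tid 1
"""

for (peer_id, tid), n in sorted(count_txq_failures(sample).items()):
    print(f"peer_id {peer_id} tid {tid}: {n} message(s)")
```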

Hi All--

Is it possibly significant that there isn't an ID listed for the QCA9984 chips in pci.ids.gz? I mean, other previous/recent QCAs are listed in there.

Thanks for any comments/insight on this.

Ben--

Never heard about that file in Openwrt/LEDE context.
How is it related to the build?

It's part of the pciutils package (provides 'lspci') ... I ask because of the ath10k_pci error I see in my R7800 and like one of the last posts above.

Error=errors

It's kind of a stab in the dark, but I'm wondering whether the missing PCI ID for the QCA9984 is insignificant or perhaps contributes to some of the ath10k_pci problems I've seen on my router (corrupt rx/tx ring buffers, "received unexpected tx_fetch_ind event", the error from the guy above, etc.) ...

Not sure if it's related or just adds to it, but whenever I've built my own images (snapshots, LEDE 17.01 and 17.01.1) I always get this error towards the end of the build, from the pciutils package build:

/usr/share/hwdata/pci.ids is read-only, exiting.
rm: cannot remove 'pci.ids.gz.old': No such file or directory

When circling back on that, I went looking into the file and noticed that there's no QCA9984 subdevice listed.

lspci shows the following on my R7800:

Network controller [0280]: Qualcomm Atheros Device [168c:0046]
Subsystem: Qualcomm Atheros Device [168c:cafe]

In pci.ids.gz there is no Atheros device "0046":

168c Qualcomm Atheros

0040 QCA9980/9990 802.11ac Wireless Network Adapter
0041 QCA6164 802.11ac Wireless Network Adapter
0042 QCA9377 802.11ac Wireless Network Adapter
0050 QCA9887 802.11ac Wireless Network Adapter

Where does the 168c:cafe subdevice come from?

gunzip -c /usr/share/pci.ids.gz | grep -i cafe
4101 OLPC Cafe Controller Secure Digital Controller
cafe VirtualBox Guest Service
cafe Chrysalis-ITS
cafe Kona SD

I've looked in the installed pci.ids.gz file and on both https://pci-ids.ucw.cz/v2.2/pci.ids and pcidatabase.com.
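The lookup itself is simple to reproduce by hand. Below is a minimal sketch of a vendor/device search, assuming the standard pci.ids layout: vendor lines start in column 0, device lines are indented with one tab, and subsystem lines with two tabs. The function name `lookup_device` is made up for illustration, and the sample holds only two of the entries quoted above.

```python
def lookup_device(pci_ids_text, vendor, device):
    """Return the device name for vendor:device, or None if it is not listed."""
    current_vendor = None
    for line in pci_ids_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("\t\t"):
            continue  # subsystem entry, not needed for this lookup
        if line.startswith("\t"):
            dev_id, _, name = line.strip().partition("  ")
            if current_vendor == vendor and dev_id == device:
                return name.strip()
        else:
            current_vendor = line.split(None, 1)[0].lower()
    return None


# Small excerpt in pci.ids layout, for illustration only.
sample = (
    "168c  Qualcomm Atheros\n"
    "\t0040  QCA9980/9990 802.11ac Wireless Network Adapter\n"
    "\t0050  QCA9887 802.11ac Wireless Network Adapter\n"
)

print(lookup_device(sample, "168c", "0040"))
print(lookup_device(sample, "168c", "0046"))
```

With this excerpt the second lookup finds nothing, which is the situation lspci is in when it prints only "Device [168c:0046]".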

Any idea who the proper person is to add the info into the IDs database? I emailed the admin address at pci-ids.ucw.cz to see what they say.

The PCI ID's absence from that external PCI database package is likely insignificant. The "pciutils" package is not part of the default package set for the R7800, and I have never used it. It should have no impact on normal ath10k operation.

I think that the ath10k driver (or the firmware blob) has a small database of the supported PCI IDs needed for identifying the radio hardware.

The package Makefile here ( https://github.com/openwrt/packages/blob/master/utils/pciutils/Makefile ) gives the URL http://mj.ucw.cz/sw/pciutils/
That page says

If lspci doesn't recognize some device in your machine and you know what the device is, please submit an update to the database.

The link leads to a page with more details on submitting: http://pci-ids.ucw.cz/

PS: Why have you installed pciutils?

That's just it, the "missing board data" messages have left me feeling unsettled. Perhaps I'm making too much of it and going down a rabbit hole. Figured I'd ask you all since you're deep into the builds.

I've been trying the different builds, either my own compiled image or occasionally one of yours, and always seem to end up with some version of this (from 17.01-SNAPSHOT r3356-8b9f7bd7bd):

ath10k_pci 0001:01:00.0: failed to fetch board data for bus=pci,vendor=168c,device=0046,subsystem-vendor=168c,subsystem-device=cafe from ath10k/QCA9984/hw1.0/board-2.bin

(sometimes it's garbled text or has code points in it)
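The identifier in that message is a comma-separated list of key=value fields. As a rough sketch, it can be split up for comparison with the lspci output. The function name `parse_board_id` is made up for illustration; this only parses the string from the log line and does not read board-2.bin itself.

```python
def parse_board_id(board_id):
    """Split 'key=value,key=value,...' into a dict."""
    return dict(field.split("=", 1) for field in board_id.split(","))


# Identifier copied from the kernel message above.
board_id = "bus=pci,vendor=168c,device=0046,subsystem-vendor=168c,subsystem-device=cafe"
fields = parse_board_id(board_id)

print(f"device    {fields['vendor']}:{fields['device']}")
print(f"subsystem {fields['subsystem-vendor']}:{fields['subsystem-device']}")
```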

This may just be my lack of knowledge, but somehow the chip is being identified with unpublished data that doesn't line up with one or two of the QCA9984-related firmware files. I started poking around to see if I could track down what was going on.

So I installed pciutils to use lspci to interrogate the device tree to see if there was anything interesting to glean.

What originally got me onto this is that I've been trying to figure out why the calibration files are seemingly useless, even though they seem important to this chip and other QCAs:

ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0001:01:00.0.bin failed with error -2
ath10k_pci 0001:01:00.0: Falling back to user helper
firmware ath10k!pre-cal-pci-0001:01:00.0.bin: firmware_loading_store: map pages failed
ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/cal-pci-0001:01:00.0.bin failed with error -2

Most of the firmware versions look for just cal-pci files, other versions look for pre-cal-pci, and this version is looking for both. So those seem like somewhat moving targets.

Is firmware-5.bin the only important firmware blob for the QCA9984?

You should read the discussion in this thread starting around 22 March, which contains e.g. links to the ath10k development mailing list. They pretty much cover what information there is about those warnings. The driver/firmware for the 9984 does not support OTP identification, and ath10k logs these errors/warnings for the various methods it tries until it finds a suitable method to get the board data.

But that has nothing to do with external PCI device ID databases.

Thanks, and also thanks for the other patient responses. Please trust me when I say that I have read through the part of this thread you're talking about, and pretty much any other thread that mentions the R7800, ath10k, or QCA9984. That's not to imply that I fully understood everything either. I got tired of reading, then trying something I read, over and over with no results. I'm not a big fan of any errors, even supposedly benign ones, and figured I'd chime in about the odd PCI ID thing.

Going briefly OT: I'm really digging the project overall. Can you or anyone else point me to where I could get into helping/contributing?

@blogic is there any update on qca8k implementation in lede? Really looking forward to testing it.

@hnyman have you noticed that the default network configuration is now based on VLAN interfaces ethX.Y, i.e. eth0.2 for WAN (VLAN 2) and eth1.1 for LAN (VLAN 1)?
This leads to a problem: when you try to configure VLANs at the switch level, it totally breaks connectivity until the interfaces are adjusted as well.
I don't think that's the expected behavior, and it seems to have been introduced with these commits:
https://git.lede-project.org/?p=source.git;a=commit;h=5e0441aaf0531e18222093e4084f4795fcba2343
https://git.lede-project.org/?p=source.git;a=commit;h=73d923ed6baabe3f8844f13216c50a6383a79a46

Setting the WAN/LAN interfaces back to eth0/eth1 brings back the former behavior and fixes the issue.

@jow