17.01.1/2/3 QCA988x ath10k 5GHz firmware crashed - zyxel nbg6716

Can you modify the /etc/hotplug.d/firmware/11-ath10k-caldata file by adding the "pre"-prefix there in the block containing NBG6716? Then remove the previously generated file under /lib/firmware/ath10k and reboot. Seems that 6716 has incomplete calibration data in flash, which needs to be loaded differently.

Edit: Better still, could someone provide a full dump of the RFdata partition somewhere?

Support for QCA9984 is better in master than in the lede-17.01 branch. Especially when there are problems, I would suggest concentrating on snapshots instead of the stable release.

I'd be happy to give that a try if I knew how :slight_smile:
It's not clear to me where the detection call/pre-cal is made (where is $FIRMWARE set?).
I've moved the NBG6716 bit to the "pre-" block, but that didn't work.
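
On the $FIRMWARE question: as far as I know, it is not set by any script. When a driver requests a firmware file, the kernel emits a firmware uevent carrying FIRMWARE=<requested file name>, and the scripts in /etc/hotplug.d/firmware/ run with that variable in their environment. The sketch below only illustrates branching on it; it is a simplified illustration, not the shipped 11-ath10k-caldata script, and `describe_request` is a made-up helper name.

```shell
#!/bin/sh
# Simplified illustration only, NOT the shipped 11-ath10k-caldata script.
# describe_request is a hypothetical helper: it reports which kind of
# calibration file a firmware request name refers to.
describe_request() {
    case "$1" in
    ath10k/pre-cal-pci-*.bin) echo "pre-cal" ;;
    ath10k/cal-pci-*.bin)     echo "cal" ;;
    *)                        echo "other" ;;
    esac
}

# Inside a hotplug script the argument would be "$FIRMWARE".
describe_request "ath10k/pre-cal-pci-0000:01:00.0.bin"
describe_request "ath10k/cal-pci-0000:01:00.0.bin"
```

So moving a board into the "pre-" block only matters if the driver actually asks for the pre-cal file name.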

I've upgraded to a snapshot, but also to no avail.

As for dumping RFdata, I'd be happy to oblige, just give me the magic incantation for dd :smiley:
cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00010000 "u-boot"
mtd1: 00010000 00010000 "env"
mtd2: 00010000 00010000 "RFdata"
mtd3: 00fa0000 00010000 "nbu"
mtd4: 00200000 00020000 "zyxel_rfsd"
mtd5: 00200000 00020000 "romd"
mtd6: 00100000 00020000 "header"
mtd7: 00200000 00020000 "kernel"
mtd8: 0f900000 00020000 "ubi"

There haven't been any related changes in master, so that won't help. You can just run cat /dev/mtd2 > /tmp/RFdata.
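
If you'd rather not hard-code mtd2 (the numbering can differ between builds), a small helper can look the device up by partition name. This is only a sketch: `mtd_dev_by_name` is a hypothetical function name, and it assumes the /proc/mtd layout shown in the post above.

```shell
#!/bin/sh
# Hypothetical helper: print the /dev/mtdN path for a partition name.
# Reads a /proc/mtd-formatted table (defaults to /proc/mtd itself).
mtd_dev_by_name() {
    name="$1"
    table="${2:-/proc/mtd}"
    # Table lines look like: mtd2: 00010000 00010000 "RFdata"
    awk -v n="\"$name\"" '$4 == n { sub(":", "", $1); print "/dev/" $1 }' "$table"
}

# Usage on the router (the partition is only read, never written):
#   dev=$(mtd_dev_by_name RFdata)
#   [ -n "$dev" ] && cat "$dev" > /tmp/RFdata
```

The resulting /tmp/RFdata can then be copied off the router with scp.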

Here you go!
https://filebin.ca/3nosUDe16dyq/RFdata

Ok, there's no ath10k calibration data there whatsoever. Probably the embedded PCI-E card has the calibration data stored on-board. Can you just

  1. Remove the /lib/firmware/ath10k/(pre-)cal-pci-0000:01:00.0.bin files
  2. Remove the /etc/hotplug.d/firmware/11-ath10k-caldata file
  3. Reboot

Then check if 5 GHz WLAN works and whether it has a sensible MAC address.
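
The steps above can be sketched as a small shell function. Treat it as a sketch under assumptions: `remove_ath10k_caldata` and its optional root-prefix argument are made up here (the prefix only exists so the function can be tried against a scratch directory first), and wlan0 is an assumed interface name that may differ on your device.

```shell
#!/bin/sh
# Sketch of steps 1 and 2. The optional first argument is a hypothetical
# root prefix (empty on the router) for a dry run in a scratch directory.
remove_ath10k_caldata() {
    root="${1:-}"
    rm -f "$root/lib/firmware/ath10k/cal-pci-0000:01:00.0.bin" \
          "$root/lib/firmware/ath10k/pre-cal-pci-0000:01:00.0.bin" \
          "$root/etc/hotplug.d/firmware/11-ath10k-caldata"
}

# On the router:
#   remove_ath10k_caldata && reboot
# After the reboot, check the MAC address (wlan0 is an assumed name):
#   cat /sys/class/net/wlan0/address
```

Copying the hotplug script somewhere safe before deleting it makes it easy to put back if 5 GHz does not come up.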

(Not near my NBG6716 right now, but:)
Without the pre-cal files the ath10k modules refuse to load.
Without the 11-ath10k-caldata script no (pre-)cal file is generated.
Ergo, no 5 GHz?

Or, put differently: how do I check the MAC etc. without the ath10k modules loaded?

It doesn't load because the (nonexistent) calibration data from flash is bogus and crashes the firmware. If you remove the files I listed, the on-board calibration data is tried next, and it just might work after reboot.

Well... with the bogus calibration data in the pre-cal file, 5 GHz works just fine.
Without that file present the firmware crashes; cf. the logs I posted earlier, e.g.

from 17.01.3

I recall that bogus pre-calibration data can't crash the firmware. It runs, but probably uncalibrated.

You are probably correct.
So preferably we figure out where/how to get proper calibration data,
or figure out how to get the hotplug script to create the pre-cal file instead (but let it run uncalibrated),
or maybe use this ath10k option: skip_otp=y (haven't tried it yet, but I'm assuming it'll run uncalibrated as well).

There are basically four ways to store calibration data:

  1. On the PCI-E card itself
  2. On the PCI-E card itself, but with the MAC address coming from elsewhere
  3. On the flash in a separate partition
  4. On the flash in the file system of the original firmware

If you buy a QCA988x card from Compex (I have one), option 1 is used. NBG6616 and Archer C7 v2 (both of which I also have) use option 3. NBG6716 obviously uses some other option. I'd guess 1 or 2, since 4 is quite rare. Following my previous instructions will tell us which it is. Pre-cal is only used with a board-2.bin file, so that is not the correct way to do this.

I stand corrected... I guess it must be option 1.

Removed the (pre-)cal files
Removed the ath10k hotplug script
Rebooted

5 GHz is up with a valid Zyxel MAC address

Perfect! I'll send a patch.

Great, looking forward to seeing the NBG6716's 5 GHz fixed in an official release.
Many thanks for looking into this.

As LEDE 17.01.5 is around the corner, can someone please apply the patch by malaakso to the lede-17.01 branch?

18.06 is around the corner, and "beta" builds are already up.

You'd have to check the source code to see if the patch has been applied or if the issue still exists.

The patch only got applied to master and is included in the openwrt-18.06 branch, but it is still missing from the lede-17.01 branch, which will be used for the 17.01.5 release. So yes, this issue still exists in the lede-17.01 branch, which is the reason I asked for a cherry-pick to fix it for the 17.01.5 release.

Ask in the mailing list. Devs don't care otherwise.

Did that, devs don't care either.