17.01.1/2/3 QCA988x ath10k 5GHz firmware crashed - zyxel nbg6716

I got the same thing with the latest 17.01.2 on a COMFAST CF-E380AC v2, but not on a UniFi AC Lite.

To add, NAC 5 GHz comes up on an R6100 with kmod-ath10k, but it's not stable and will crash after some unpredictable amount of time. The CandelaTech driver seems to last longer, but it also crashes eventually.

Just updating the thread with the sad news that even 17.01.3 doesn't fix the issue: 5 GHz still won't load.

[   14.290324] ath10k_pci 0000:01:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
[   14.511420] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
[   14.522306] ath10k_pci 0000:01:00.0: Falling back to user helper
[   14.644228] firmware ath10k!pre-cal-pci-0000:01:00.0.bin: firmware_loading_store: map pages failed
[   14.653595] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/cal-pci-0000:01:00.0.bin failed with error -2
[   14.664118] ath10k_pci 0000:01:00.0: Falling back to user helper
[   15.038729] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[   15.048120] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   15.061177] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00016 api 5 features no-p2p,raw-mode,mfp crc32 0c5668f8
[   15.071961] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed with error -2
[   15.082566] ath10k_pci 0000:01:00.0: Falling back to user helper
[   15.162933] firmware ath10k!QCA988X!hw2.0!board-2.bin: firmware_loading_store: map pages failed
[   15.184547] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[   16.290631] ath10k_pci 0000:01:00.0: firmware crashed! (uuid d6bb0e34-3b28-4307-ba75-7bd7216ef49c)
[   16.299762] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[   16.309118] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   16.322178] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00016 api 5 features no-p2p,raw-mode,mfp crc32 0c5668f8
[   16.332909] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[   16.340334] ath10k_pci 0000:01:00.0: htt-ver 0.0 wmi-op 5 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 1
[   16.351975] ath10k_pci 0000:01:00.0: firmware register dump:
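
For anyone comparing their own boot log against the one above, here is a rough sketch for pulling out the interesting lines. The function names `failed_fw_loads` and `fw_crashed` are made up for this example, and the patterns only assume the message wording seen in this log; other kernel versions may phrase things differently.

```shell
# List the firmware files whose direct load failed, reading kernel log
# text on stdin. Hypothetical helper; it only matches the
# "Direct firmware load for X failed with error" wording seen above.
failed_fw_loads() {
    sed -n 's/.*Direct firmware load for \(.*\) failed with error.*/\1/p'
}

# Succeed if the log on stdin contains an ath10k "firmware crashed" line.
fw_crashed() {
    grep -q 'ath10k.*firmware crashed'
}

# Usage on the router (not executed here):
#   dmesg | failed_fw_loads
#   dmesg | fw_crashed && echo "ath10k firmware crashed"
```

A failed direct load with error -2 only means the file was not found, so the list on its own does not prove anything is broken; it just shows which files the driver went looking for.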

Good news! I managed to get 5 GHz up (but no DFS?)

I'm guessing the problem can be traced back to a "unification" effort wrt firmware loading.
https://github.com/openwrt/openwrt/commit/53434afeed4a4f873eb7d66cc120a5473c289c40#diff-7b07534eccd69561f1430487965149eb

I couldn't figure out where, why or how $FIRMWARE gets (or needs to be) set differently for the NBG6716, but a quick fix is to simply rename the generated calibration file from cal... to **pre-**cal...:
mv /lib/firmware/ath10k/cal-pci-0000:01:00.0.bin /lib/firmware/ath10k/pre-cal-pci-0000:01:00.0.bin

Currently I'm running 10.2.4-1.0-0029 (latest from kvalo). It complains that the board id is missing from OTP, but otherwise initial tests seem fine. Let me know how you fare!

Try a reboot after configuring. 5 GHz was always twitchy on OpenWrt IMO and used to just need a reboot to get back on track. I'm just glad the latest update fixed my 5 GHz woes AND I can use AC for the first time since the original firmware. It took three updates but, so far so good, I'd say LEDE is absolutely delivering right now. Solid work.

mv /lib/firmware/ath10k/cal-pci-0000:01:00.0.bin /lib/firmware/ath10k/pre-cal-pci-0000:01:00.0.bin

The same works for me now with a Compex WLE900VX on LEDE Reboot 17.01.4 r3560-79f57e422d. Thank you @undef! Hopefully it will be fixed upstream.

Can you modify the /etc/hotplug.d/firmware/11-ath10k-caldata file by adding the "pre"-prefix there in the block containing NBG6716? Then remove the previously generated file under /lib/firmware/ath10k and reboot. Seems that 6716 has incomplete calibration data in flash, which needs to be loaded differently.

Edit: Better still, could someone provide a full dump of the RFdata partition somewhere?

Support for QCA9984 is in better shape in master than in the lede-17.01 branch. Especially when there are problems, I would suggest concentrating on snapshots instead of the stable release.

I'd be happy to give that a try if I knew how :slight_smile:
It's not clear to me where the detection call/pre-cal is made (where is $FIRMWARE set?).
I've moved the NBG6716 bit to the "pre-" block, but that didn't work.

I've upgraded to a snapshot, but also to no avail.

as for dumping RFdata, I'd be happy to oblige, just give me the magic incantation for dd :smiley:
cat /proc/mtd
dev: size erasesize name
mtd0: 00040000 00010000 "u-boot"
mtd1: 00010000 00010000 "env"
mtd2: 00010000 00010000 "RFdata"
mtd3: 00fa0000 00010000 "nbu"
mtd4: 00200000 00020000 "zyxel_rfsd"
mtd5: 00200000 00020000 "romd"
mtd6: 00100000 00020000 "header"
mtd7: 00200000 00020000 "kernel"
mtd8: 0f900000 00020000 "ubi"

There haven't been any related changes in master, so that won't help. You can just cat /dev/mtd2 > /tmp/RFdata.
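
If you'd rather not hard-code the mtd number (it can differ between builds), here is a small sketch that looks the partition up by name in the /proc/mtd listing. The helper name `mtd_dev_by_name` is made up for this example, and it assumes partition names without spaces, as in the listing above.

```shell
# Print the /dev node of the MTD partition with the given name.
# Hypothetical helper; the optional second argument is a file in
# /proc/mtd format and defaults to /proc/mtd itself.
mtd_dev_by_name() {
    name=$1
    table=${2:-/proc/mtd}
    # The name column is quoted in /proc/mtd, so compare against "name".
    awk -v n="\"$name\"" '$4 == n { sub(/:$/, "", $1); print "/dev/" $1 }' "$table"
}

# Usage on the router (not executed here):
#   dev=$(mtd_dev_by_name RFdata)
#   [ -n "$dev" ] && cat "$dev" > /tmp/RFdata
```

With the partition table posted above, this should resolve RFdata to /dev/mtd2, which matches the one-liner.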

Here you go!
https://filebin.ca/3nosUDe16dyq/RFdata

OK, there's no ath10k calibration data in there whatsoever. Probably the embedded PCI-E card has the calibration data stored on-board. Can you just:

  1. Remove the /lib/firmware/ath10k/(pre-)cal-pci-0000:01:00.0.bin files
  2. Remove /etc/hotplug.d/firmware/11-ath10k-caldata file
  3. Reboot

Then check if 5 GHz WLAN works and whether it has a sensible MAC-address.
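
The steps above can be sketched roughly as below. The function name `remove_ath10k_caldata` and its optional root-prefix argument are made up for this example (the prefix lets you try it on a copy of the filesystem first); the file paths are the ones mentioned in this thread, and the sysfs path for the MAC address assumes a standard mac80211 setup.

```shell
# Remove the generated ath10k calibration files and the hotplug script
# that creates them. Hypothetical helper; the optional argument is a
# directory prefix for testing, leave it out on the router itself.
remove_ath10k_caldata() {
    root=${1:-}
    rm -f "$root/lib/firmware/ath10k/cal-pci-0000:01:00.0.bin" \
          "$root/lib/firmware/ath10k/pre-cal-pci-0000:01:00.0.bin" \
          "$root/etc/hotplug.d/firmware/11-ath10k-caldata"
}

# Usage on the router (not executed here):
#   remove_ath10k_caldata && reboot
# After the reboot, look at the MAC address of each radio:
#   cat /sys/class/ieee80211/phy*/macaddress
```

Keep a copy of the hotplug script somewhere before deleting it, in case you need to put it back.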

(not near my nbg6716 right now but)
without the pre-cal files the ath10k modules refuse to load.
without the 11-ath10k-caldata no (pre-)cal file
ergo no 5Ghz?

or put differently how to check mac etc, without the ath10k modules loaded?

It doesn't load because the (nonexistent) calibration data from flash is bogus and crashes the firmware. If you remove the files I listed, the on-board calibration data is tried next, and it just might work after reboot.

Well... with the bogus calibration data in the pre-cal file, 5 GHz works just fine.
Without that file present the firmware crashes, cf. the logs I posted earlier, e.g. from 17.01.3.

I recall that bogus pre-calibration data can't crash the firmware. It runs, but probably uncalibrated.

you are probably correct.
so preferably we figure out where/how to get proper calibration data.
or figure out how to get the hotplug script to create the pre-cal file instead (but let it run uncalibrated)
or maybe use this option with ath10k: skip_otp=y (haven't tried it yet, but i'm assuming it'll run uncalibrated as well)

There are basically four ways to store calibration data:

  1. On the PCI-E card itself
  2. On the PCI-E card itself, but with the MAC address coming from elsewhere
  3. On the flash in a separate partition
  4. On the flash in the file system of the original firmware

If you buy a QCA988x card from Compex (I have one), option 1 is used. NBG6616 and Archer C7 v2 (both of which I also have) use option 3. NBG6716 obviously uses some other option, I'd guess 1 or 2, since 4 is quite rare. Following my previous instructions will tell us which it is. Pre-cal is only used with board-2.bin file, so that is not the correct way to do this.
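
One way to see which source the driver actually picked is the `cal` field on the `htt-ver` line in dmesg; the crash log earlier in the thread shows `cal file`. Here is a minimal sketch for extracting it. The helper name `ath10k_cal_mode` is made up for this example, and it assumes the line layout seen in that log. From memory the driver prints a different value (such as `otp`) when the card's own data is used, but treat that as an assumption and check your own output.

```shell
# Print the calibration mode ath10k reports on its "htt-ver" line,
# reading kernel log text on stdin. Hypothetical helper; it assumes
# the "... cal <mode> max-sta ..." layout seen in the log above.
ath10k_cal_mode() {
    sed -n 's/.*ath10k.* htt-ver .* cal \([^ ]*\) .*/\1/p'
}

# Usage on the router (not executed here):
#   dmesg | ath10k_cal_mode
```

Comparing the value before and after removing the cal files should show whether the driver switched away from the file-based data.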

I stand corrected... I guess it must be option 1.

removed (pre-)cal files
removed hotplug ath10k script
rebooted

5 GHz is up with a valid Zyxel MAC address

Perfect! I'll send a patch.