The dmesg logs you posted on the ct GitHub site indicate warnings, not firmware crashes.

I've posted warnings (unrelated to yours) before, and so far the response has been that they are harmless.

Reassurance from the dev is always nice though so hopefully he will respond and let you know.

Coming from mvebu (wrt1900acs) to the r7800 with hnyman's r11333 (ath10k-ct) build, I am very impressed by the wifi stability. After using it for a good week I have not had a single driver crash. Furthermore, wifi is fast (mvebu sucked in that regard). VHT160 does not work for me, but that might be because I live in a crowded area; I will investigate that further.

One thing is odd though - restarting the router takes about 8 minutes. Most of the waiting time is caused by the direct firmware loading mechanism in ath10k_pci. It waits approximately a minute when it fails to load a firmware file (because it does not exist) until it tries the next firmware file. My logfile is here: https://pastebin.com/sUt3jsnP . Do you guys have an idea what's causing this? Am I the only one?

Sounds strange and not typical for R7800.
I have not noticed anything similar. But I will check with my own router again tonight.

That would be great, thank you! If you need anything (configs?) just let me know.

Looks like you might have some fs/flash issues... must be 50 I/O error messages in that bootlog...

Buffer I/O error on dev mtdblock0, logical block 0, async page read

What version have you got... and what version did you flash from? ( don't panic yet... tho' ... can most likely be resolved - recent master might have also had some changes around that firmware load business ... so you may be seeing something from that ... )
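
To see how many of those I/O errors there really are and which devices they hit, something along these lines could be run against a saved boot log. This is only a sketch: the function name count_io_errors is made up, and it assumes the message format of the line quoted above.

```shell
#!/bin/sh
# count_io_errors: read kernel log text on stdin and print one line per
# block device with the number of "Buffer I/O error" messages seen.
# The function name is hypothetical; the message format is assumed to
# match the log line quoted in the thread.
count_io_errors() {
    awk '
        /Buffer I\/O error on dev/ {
            for (i = 1; i < NF; i++) {
                if ($i == "dev") {
                    dev = $(i + 1)
                    sub(/,$/, "", dev)   # drop the trailing comma
                    count[dev]++
                    break
                }
            }
        }
        END { for (d in count) printf "%s %d\n", d, count[d] }
    ' | sort
}
```

On the router it would be used as `dmesg | count_io_errors`. If only mtdblock0 and mtdblock1 show up, that would fit the "protected region" theory better than a failing NAND.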

Yes, I also saw these. But I kinda ignored them since the sample log in the r7800 TOH also has them; see: https://openwrt.org/toh/netgear/r7800 . Of course I could have a bad NAND but I hope mtd0/1 are just partially "secure" and therefore not readable in specific regions...
I am running build r11333 by hnyman (ath10k-ct). Before that I had r11302 by hnyman (ath10k-ct), and before that it was running the latest official firmware by Netgear.

No idea so far on what could cause it.
My own router shows a quite normal timeline with rapid progress to the next attempted file:

[   17.423514] ath10k_pci 0000:01:00.0: assign IRQ: got 67
[   17.423533] ath10k 4.19 driver, optimized for CT firmware, probing pci device: 0x46.
[   17.423950] ath10k_pci 0000:01:00.0: enabling device (0140 -> 0142)
[   17.430485] ath10k_pci 0000:01:00.0: enabling bus mastering
[   17.430958] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   17.628184] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/fwcfg-pci-0000:01:00.0.txt failed with error -2
[   17.628221] ath10k_pci 0000:01:00.0: Falling back to user helper
[   18.330135] firmware ath10k!fwcfg-pci-0000:01:00.0.txt: firmware_loading_store: map pages failed
[   18.330369] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
[   18.338093] ath10k_pci 0000:01:00.0: Falling back to user helper
[   18.480591] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-5.bin failed with error -2
[   18.480633] ath10k_pci 0000:01:00.0: Falling back to user helper
[   18.533552] firmware ath10k!QCA9984!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[   18.533849] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-2.bin failed with error -2
[   18.541833] ath10k_pci 0000:01:00.0: Falling back to user helper
[   18.575356] firmware ath10k!QCA9984!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[   18.575562] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[   18.583634] ath10k_pci 0000:01:00.0: Falling back to user helper
[   18.611454] firmware ath10k!QCA9984!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[   18.813735] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[   18.813768] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[   18.825076] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fW-012-54863dff2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 97790349
[   21.151473] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 85498734
[   26.992129] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[   26.992156] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[   27.074098] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[   27.074956] ath10k_pci 0000:01:00.0: wmi print 'free: 81768 iram: 23364 sram: 14184'
[   27.270184] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
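
For anyone comparing their own boot log against the timeline above, a rough sketch like the following can flag long pauses between consecutive ath10k_pci lines. The function name find_stalls and the 30-second default threshold are my own assumptions, not anything from the driver; it only parses the bracketed dmesg timestamps.

```shell
#!/bin/sh
# find_stalls [LIMIT]: read dmesg-style text on stdin and report every
# pause longer than LIMIT seconds (default 30) between two consecutive
# ath10k_pci lines. Name and threshold are illustrative assumptions.
find_stalls() {
    awk -v limit="${1:-30}" '
        /ath10k_pci/ && match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets to get the timestamp.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev > limit)
                printf "%.1f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }
    '
}
```

On the router this would be run as `dmesg | find_stalls`. A healthy boot like the one above prints nothing at the default threshold, while a roughly one-minute wait per missing firmware file would show up as one line per stall.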

Thank you for looking into it. It is really weird that only I have the problem. I will try looking into this further in the coming weekend.

UPDATE: I tried a rmmod ath10k_pci followed by a modprobe ath10k_pci and the driver loaded in a reasonable timeframe:

[235775.719206] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 0
[235775.719345] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 0
[235775.725031] ath10k_pci 0001:01:00.0: peer-unmap-event: unknown peer id 0
[235775.833272] br-lan: port 2(wlan1) entered disabled state
[235775.853351] device wlan1 left promiscuous mode
[235775.853408] br-lan: port 2(wlan1) entered disabled state
[235776.095841] ath10k_pci 0001:01:00.0: disabling bus mastering
[235776.099762] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
[235776.100584] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
[235776.107455] ath10k_pci 0000:01:00.0: peer-unmap-event: unknown peer id 0
[235776.205203] br-lan: port 3(wlan0) entered disabled state
[235776.219188] device wlan0 left promiscuous mode
[235776.219231] br-lan: port 3(wlan0) entered disabled state
[235776.385919] ath10k_pci 0000:01:00.0: disabling bus mastering
[235784.439962] ath10k_pci 0000:01:00.0: assign IRQ: got 67
[235784.439999] ath10k 4.19 driver, optimized for CT firmware, probing pci device: 0x46.
[235784.444802] ath10k_pci 0000:01:00.0: enabling bus mastering
[235784.452608] ath10k_pci 0000:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[235784.627612] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/fwcfg-pci-0000:01:00.0.txt failed with error -2
[235784.627656] ath10k_pci 0000:01:00.0: Falling back to user helper
[235784.666507] firmware ath10k!fwcfg-pci-0000:01:00.0.txt: firmware_loading_store: map pages failed
[235784.666921] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-5.bin failed with error -2
[235784.674411] ath10k_pci 0000:01:00.0: Falling back to user helper
[235784.736912] firmware ath10k!QCA9984!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[235784.737221] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-2.bin failed with error -2
[235784.745131] ath10k_pci 0000:01:00.0: Falling back to user helper
[235784.794166] firmware ath10k!QCA9984!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[235784.794435] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[235784.802517] ath10k_pci 0000:01:00.0: Falling back to user helper
[235784.856651] firmware ath10k!QCA9984!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[235784.858189] ath10k_pci 0000:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[235784.864521] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[235784.877649] ath10k_pci 0000:01:00.0: firmware ver 10.4b-ct-9984-fW-012-54863dff2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 97790349
[235787.206912] ath10k_pci 0000:01:00.0: board_file api 2 bmi_id 0:1 crc32 85498734
[235793.056284] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[235793.056318] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[235793.138131] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[235793.138958] ath10k_pci 0000:01:00.0: wmi print 'free: 81768 iram: 23364 sram: 14184'
[235793.395380] ath10k_pci 0000:01:00.0: htt-ver 2.2 wmi-op 6 htt-op 4 cal pre-cal-file max-sta 32 raw 0 hwcrypto 1
[235793.498739] ath: EEPROM regdomain: 0x0
[235793.498779] ath: EEPROM indicates default country code should be used
[235793.501392] ath: doing EEPROM country->regdmn map search
[235793.508132] ath: country maps to regdmn code: 0x3a
[235793.513457] ath: Country alpha2 being used: US
[235793.518302] ath: Regpair used: 0x3a
[235793.523081] ath: EEPROM regdomain: 0x8028
[235793.526590] ath: EEPROM indicates we should expect a country code
[235793.530392] ath: doing EEPROM country->regdmn map search
[235793.536650] ath: country maps to regdmn code: 0x37
[235793.542018] ath: Country alpha2 being used: AT
[235793.546799] ath: Regpair used: 0x37
[235793.551219] ath: regdomain 0x8028 dynamically updated by user
[235793.560858] ath10k_pci 0001:01:00.0: assign IRQ: got 100
[235793.560906] ath10k 4.19 driver, optimized for CT firmware, probing pci device: 0x46.
[235793.566926] ath10k_pci 0001:01:00.0: enabling bus mastering
[235793.574520] ath10k_pci 0001:01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[235793.750817] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/fwcfg-pci-0001:01:00.0.txt failed with error -2
[235793.750853] ath10k_pci 0001:01:00.0: Falling back to user helper
[235794.067085] firmware ath10k!fwcfg-pci-0001:01:00.0.txt: firmware_loading_store: map pages failed
[235794.067555] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-5.bin failed with error -2
[235794.074990] ath10k_pci 0001:01:00.0: Falling back to user helper
[235794.136451] firmware ath10k!QCA9984!hw1.0!ct-firmware-5.bin: firmware_loading_store: map pages failed
[235794.136674] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/ct-firmware-2.bin failed with error -2
[235794.144712] ath10k_pci 0001:01:00.0: Falling back to user helper
[235794.205976] firmware ath10k!QCA9984!hw1.0!ct-firmware-2.bin: firmware_loading_store: map pages failed
[235794.206201] ath10k_pci 0001:01:00.0: Direct firmware load for ath10k/QCA9984/hw1.0/firmware-6.bin failed with error -2
[235794.214330] ath10k_pci 0001:01:00.0: Falling back to user helper
[235794.275302] firmware ath10k!QCA9984!hw1.0!firmware-6.bin: firmware_loading_store: map pages failed
[235794.275657] ath10k_pci 0001:01:00.0: qca9984/qca9994 hw1.0 target 0x01000000 chip_id 0x00000000 sub 168c:cafe
[235794.283198] ath10k_pci 0001:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 0
[235794.295991] ath10k_pci 0001:01:00.0: firmware ver 10.4b-ct-9984-fW-012-54863dff2 api 5 features mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,set-special-CT,tx-rc-CT,cust-stats-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT crc32 97790349
[235796.662476] ath10k_pci 0001:01:00.0: board_file api 2 bmi_id 0:2 crc32 85498734
[235799.536869] ath10k_pci 0000:01:00.0: 10.4 wmi init: vdevs: 16  peers: 48  tid: 96
[235799.536903] ath10k_pci 0000:01:00.0: msdu-desc: 2500  skid: 32
[235799.618234] ath10k_pci 0000:01:00.0: wmi print 'P 48/48 V 16 K 144 PH 176 T 186  msdu-desc: 2500  sw-crypt: 0 ct-sta: 0'
[235799.619070] ath10k_pci 0000:01:00.0: wmi print 'free: 81768 iram: 23364 sram: 14184'
[235799.973688] ath10k_pci 0000:01:00.0: Firmware lacks feature flag indicating a retry limit of > 2 is OK, requested limit: 4

As a result I think the firmware-loading part is not the actual culprit. Maybe something else is causing this because the problem only exists at boot-time...

no issue here. or try renaming the file to ct-firmware-5.bin

cd /lib/firmware/ath10k/QCA9984/hw1.0/ && mv firmware-5.bin ct-firmware-5.bin

Noticed the latest build (10/31) boots up ~2x faster. For those with 160 MHz not working, try replacing the ct fw with the official one, since there's a bug. Though based on the bug report, the current latest ct fw 10.4b-ct-9984-fH-012-54863dff2 is good.
Also, sqm is now able to reach ~100 Mbps on my line; it only got up to ~80 Mbps before.

cd /lib/firmware/ath10k/QCA9984/hw1.0/
rm board-2.bin && rm firmware-5.bin
wget http://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/3.10/firmware-5.bin_10.4-3.10-00047 -O ct-firmware-5.bin
wget http://github.com/kvalo/ath10k-firmware/raw/master/QCA9984/hw1.0/board-2.bin

Not sure what you mean; to my knowledge, the error that you referenced is visible in master but not in the stable 19.07 branch.

19.07 is not the same as the master. 19.07 was branched off in June and the later changes to the wifi driver are not visible there.

So, at the moment I have no intention of doing additional 19.07 builds deviating from the default. If you want to test that, feel free to clone my patches and scripts and modify the sources as you need for compiling the firmware.

i see, thought it was an issue in 19.07 as well. thanks for clearing that up

about the build script, I've been getting an error at the package/install step

make[2]: *** [package/install] Error 255
make[1]: *** [/OwrtLEDE/owrt1907/staging_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/stamp/.package_install] Error 2
make: *** [world] Error 1

Thanks for the advice. However, your solution is a hack and I want to locate the actual root-cause. Something is clearly wrong on my end and I just need to find it.

Alright, I narrowed down the actual root-cause for the long boot-times.

I noticed that the problem disappears after a sysupgrade and does not reappear even after several reboots. So, clearly one of the post-sysupgrade operations that I always do must be causing the problem. Usually, I install the following packages after every sysupgrade: socat, idn, iperf3, haproxy. After an hour of not-fun testing and rebooting I found out that the haproxy package is causing the long boot-times. It has some dependencies, but none of them causes the problem; I also tested that. Interestingly, mvebu (my old device) did not show that behavior.

TLDR: haproxy does something at boot-time which causes the delays. I will investigate further in that direction shortly. Fun fact... I am one of the maintainers of the haproxy package...

UPDATE: the procd hotplug script that the haproxy package ships is causing the issues. I am reworking that script now. Should be fixed soon.
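
For anyone who wants to check whether one of their own packages ships hotplug scripts before blaming the wifi driver, here is a small sketch. It assumes hotplug scripts are installed under /etc/hotplug.d/ and that `opkg files` prints one path per line; the helper name hotplug_scripts is made up.

```shell
#!/bin/sh
# hotplug_scripts: filter a file list on stdin down to paths under
# /etc/hotplug.d/ (assumed location of hotplug scripts on OpenWrt).
# The helper name is hypothetical.
hotplug_scripts() {
    grep '^/etc/hotplug\.d/' || true   # no match is not an error here
}

# Intended use on the router (not run here):
#   opkg files haproxy | hotplug_scripts
```

Temporarily moving a listed script out of the way and rebooting is a quicker test than removing the whole package.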

UPDATE2: The PR with the fix is here: https://github.com/openwrt/packages/pull/10430

I'm still getting package/install error with build script... full output https://pastebin.com/wvV5AbUW

Please help, thanks.

That is the high-level summary style console output, not the full logs...

Please look at the actual build logs in the logs directory, e.g. logs/build.log or the package-specific logs there, to find out more precisely what happens.
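
A quick way to narrow down where to look could be a recursive grep over that directory. This is a sketch only: the function name scan_build_logs and the list of patterns are my assumptions about what typically marks the real failure, and the default directory name follows the logs directory mentioned above.

```shell
#!/bin/sh
# scan_build_logs [DIR]: search a log directory (default ./logs) for
# lines that usually mark the real failure: opkg's "Collected errors:"
# and "Cannot install package", or make's "*** [target] Error N".
# The function name and the pattern list are illustrative assumptions.
scan_build_logs() {
    grep -rnE 'Collected errors:|Cannot install package|\*\*\* \[.*\] Error [0-9]+' "${1:-logs}"
}
```

Run from the top of the build tree as `scan_build_logs`, or pass another directory as the first argument. Each hit is printed as file:line:text, so the surrounding lines can then be read in the named log.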

Saw this in build.log, as I'm trying to build with mainline/official fw instead of ct

Configuring coCollected errors:
 * check_data_file_clashes: Package kmod-ath10k wants to install file /Owrt1907/owrt1907/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/root-ipq806x/lib/modules/4.14.151/ath10k_core.ko
	But that file is already provided by package  * kmod-ath10k-ct
 * check_data_file_clashes: Package kmod-ath10k wants to install file /Owrt1907/owrt1907/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/root-ipq806x/lib/modules/4.14.151/ath10k_pci.ko
	But that file is already provided by package  * kmod-ath10k-ct
 * opkg_install_cmd: Cannot install package kmod-ath10k.
 * check_data_file_clashes: Package ath10k-firmware-qca9984 wants to install file /Owrt1907/owrt1907/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/root-ipq806x/lib/firmware/ath10k/QCA9984/hw1.0/board-2.bin
	But that file is already provided by package  * ath10k-firmware-qca9984-ct
 * check_data_file_clashes: Package ath10k-firmware-qca9984 wants to install file /Owrt1907/owrt1907/build_dir/target-arm_cortex-a15+neon-vfpv4_musl_eabi/root-ipq806x/lib/firmware/ath10k/QCA9984/hw1.0/firmware-5.bin
	But that file is already provided by package  * ath10k-firmware-qca9984-ct
 * opkg_install_cmd: Cannot install package ath10k-firmware-qca9984.
mgt.

and this was the only thing I changed in main.patch and .config / .config.init

+## # Normal ath10k wifi firmware and driver instead of -ct
+CONFIG_PACKAGE_ath10k-firmware-qca9984=y
+## # CONFIG_PACKAGE_ath10k-firmware-qca9984-ct is not set
+CONFIG_PACKAGE_kmod-ath10k=y
+## # CONFIG_PACKAGE_kmod-ath10k-ct is not set
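
The clash messages in that log can be condensed to the package pairs involved with a few lines of awk. This is a sketch under the assumption that the opkg output keeps the two-line format shown above; the function name clash_pairs is made up.

```shell
#!/bin/sh
# clash_pairs: read opkg error output on stdin and print each distinct
# "wanted package clashes with providing package" pair once.
# Assumes the two-line check_data_file_clashes format from the log in
# this thread; the function name is hypothetical.
clash_pairs() {
    awk '
        /check_data_file_clashes: Package/ {
            for (i = 1; i < NF; i++)
                if ($i == "Package") { want = $(i + 1); break }
        }
        /already provided by package/ && want != "" {
            pair = want " clashes with " $NF
            if (!(pair in seen)) { seen[pair] = 1; print pair }
            want = ""
        }
    '
}
```

Fed with the log above, this boils the errors down to the mainline packages colliding with their -ct counterparts, which means both variants are selected at the same time.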

That is not properly formatted.
If you are going to disable -ct, you really need to disable it (as it is a device default package).
Now you have only edited the lines that enable mainline ath10k, but you have not edited the lines needed for disabling -ct.

They need to be like:
# CONFIG_PACKAGE_ath10k-firmware-qca9984-ct is not set
not
## # CONFIG_PACKAGE_ath10k-firmware-qca9984-ct is not set

You should have removed the ## from all of those lines (except from the first textual comment line, where it makes no difference.)
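
If it helps, the leftover prefixes can be found and stripped mechanically. The sketch below assumes a plain .config-style file (no leading "+" as in a patch); the function names list_double_commented and fix_double_commented are made up for illustration.

```shell
#!/bin/sh
# list_double_commented FILE: show lines where a "## " prefix was left
# in front of a "# CONFIG_... is not set" line.
list_double_commented() {
    grep -n '^## # CONFIG_' "$1"
}

# fix_double_commented FILE: print FILE to stdout with that extra "## "
# prefix removed; the file itself is not modified. Textual comment
# lines such as "## # Normal ath10k ..." are left alone.
fix_double_commented() {
    sed 's/^## # CONFIG_/# CONFIG_/' "$1"
}
```

For example, `fix_double_commented .config.init > .config.init.new` writes a corrected copy that can be diffed against the original before replacing it.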

I have compiled master based on master-r11367-c2675bb0ce-20191101-ct (R7800-master-r11372-2d00cf7515-20191102-1305-sysupgrade actually). It failed to boot up fully, and when I tried to downgrade, the end result was a bricked router.
It might be specific to my build and/or config, but I'm posting it as a heads-up.

Well, I am running r11372-2d00cf7515 with -ct right now. No problems so far.
