QCN9274 crashes the system during driver load

I acquired a QCN9274 Wi-Fi 7 module (it reports hw2.0, see logs below) and the system keeps crashing with it.

I run OpenWrt in a VM, currently stable 24.10.2 (r28739-d9340319c6).

The stock Ubuntu 24.04 host with HWE kernel 6.14 fails to find the board file, which is fine since I don't plan to use it on the host system anyway:

ath12k_pci 0000:46:00.0: BAR 0 [mem 0x92400000-0x925fffff 64bit]: assigned
ath12k_pci 0000:46:00.0: MSI vectors: 16
ath12k_pci 0000:46:00.0: Hardware name: qcn9274 hw2.0
ath12k_pci 0000:46:00.0: memory type 10 not supported
ath12k_pci 0000:46:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0x401a2200
ath12k_pci 0000:46:00.0: fw_version 0x141580c7 fw_build_timestamp 2024-11-11 10:47 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
ath12k_pci 0000:46:00.0: failed to fetch board data for bus=pci,qmi-chip-id=0,qmi-board-id=255 from ath12k/QCN9274/hw2.0/board-2.bin
ath12k_pci 0000:46:00.0: failed to fetch board.bin from QCN9274/hw2.0
ath12k_pci 0000:46:00.0: qmi failed to load bdf:
ath12k_pci 0000:46:00.0: qmi failed to load board data file:-2

I assign it to a KVM virtual machine using vfio, and as soon as I start the VM the whole PC hard resets.

After the reboot, the motherboard BIOS says:

Warning: PCI-Express PERR/SERR error detected.

I then did what I have done with all other Qualcomm modules in the past (and initially forgot to do with this one): bind the device to the vfio-pci driver:

# /etc/modprobe.d/vfio.conf
options vfio-pci ids=17cb:1109
softdep ath12k pre: vfio-pci
softdep ath12k_pci pre: vfio-pci

Now I can boot the host and VM just fine.
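
To double-check that the binding actually took effect, the sysfs `driver` symlink of the device can be inspected. Below is a minimal sketch, assuming the host-side address 0000:46:00.0 from the logs above; the function name and the `sysfs_root` parameter are made up for illustration.

```python
import os

def bound_driver(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None."""
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None  # no driver bound, or the device is not present
    # The symlink points at .../bus/pci/drivers/<driver-name>
    return os.path.basename(os.readlink(link))

# 0000:46:00.0 is the host-side address taken from the logs above.
print(bound_driver("0000:46:00.0"))
```

With the modprobe config above in place I would expect this to print vfio-pci on the host before the VM starts; anything else (ath12k_pci or None) would suggest the softdep ordering did not work.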


The problem now is that the whole PC crashes again when the ath12k driver is loaded in the OpenWrt VM. The last lines I see before the machine resets itself are:

0000:01:00.0: MSI vectors: 1
0000:01:00.0: Hardware name: qcn9274 hw2.0

This all sounds like the issue on BPI boards fixed by https://github.com/openwrt/openwrt/pull/15945. According to the repo branch, the fix should already be included in the OpenWrt release that I'm currently running. The PR also says that it doesn't affect x86-64, but maybe there is something special about the fact that I'm using a VM?

The PCIe pass-through config (libvirtd) is very simple and has worked fine with the other Qualcomm devices I have used for years:

<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x46" slot="0x00" function="0x0"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</hostdev>

Any ideas what else it could be? The fact that the whole physical machine just resets out of nowhere is particularly annoying.

Are you sure this is even supposed to work in 24.10?

Very good question, I honestly don't know. I can try the nightly snapshot instead if it is supposed to work there; I have used snapshots for some time in the past.

The driver (ath12k) is present, so the first step would be to at least get the system to not crash when it loads (and maybe complain about lack of firmware instead). Now that you asked, I checked and there is no /lib/firmware/ath12k and no package with QCN9274 firmware on 24.10 :thinking:
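
A quick way to see whether the board files the driver asks for are present at all is sketched below. The file names and the ath12k/QCN9274/hw2.0 path come from the driver's own error messages earlier in the thread; the function name and the `firmware_root` parameter are assumptions for illustration. Note that a present board-2.bin does not guarantee it contains an entry matching qmi-board-id=255, so this only rules out the simplest case.

```python
from pathlib import Path

# File names taken from the driver's error messages: it tries
# board-2.bin first and then falls back to board.bin.
BOARD_FILES = ("board-2.bin", "board.bin")

def find_board_files(firmware_root="/lib/firmware",
                     subdir="ath12k/QCN9274/hw2.0"):
    """Map each expected board file name to whether it exists on disk."""
    base = Path(firmware_root) / subdir
    return {name: (base / name).is_file() for name in BOARD_FILES}

print(find_board_files())
```

This needs Python on the system being checked, which a default OpenWrt image does not ship, so pointing `firmware_root` at a mounted or unpacked image from the host may be more convenient.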

I guess I'll need to look into the latest snapshot now (UPD: I see QCN9274 firmware package in the latest snapshot, will give it a try).

I fought Wi-Fi 6E (QCN9074) in the past and eventually won (kind of, the 6 GHz band is still not allowed in my country, so I still can't use it :smiling_face_with_tear:).

So I expect some roadblocks and am willing to do some testing and help get it into better shape if possible (though a hard system crash was not part of my plans).

Tried the snapshot. Just like before, the system hard reset right after I installed the kmod-ath12k package (I installed the firmware package right before that just to be sure).

I was booting the EFI image, OpenWrt SNAPSHOT (r30806-070d8eb4d5).

These are the last logs I received from the VM's output:

ath12k_pci 0000:04:00.0: BAR 0 [mem 0x89400000-0x895fffff 64bit]: assigned
ath12k_pci 0000:04:00.0: MSI vectors: 16

And the host crashed after that.

For the initial debugging, get rid of the virtual environment and test OpenWrt on bare iron; running from a USB stick is fine. And with Wi-Fi 7 hardware, you really, really want current main snapshots rather than 24.10. If you can get that to work, you have a chance of moving it into a VM (with PCIe pass-through), but when something goes wrong the VM adds another layer of potential problems.

Well, my host machine is itself a router and AP, and runs many other things in VMs and containers. I can try to disable Secure Boot and boot a pre-configured disk image on bare metal through Ventoy, but that is a lot of ceremony.

The host was crashing when the ath12k driver was unloading (even without board files), and the virtual environment crashes when loading the driver (with and without board files).

I've been running OpenWrt in a VM since 2017 with QCA9880, QCA9984, QCN9074 and even AR5418 as a fallback for some time. None of them had issues this dramatic (though I have seen host system crashes during not so soft VM reboots under certain conditions with QCA9984) :thinking:

Yep, already tried, but the crash is the same :confused:

For now I think I'm looking for ways to better understand what exactly is going on there right before the crash.

Tried on bare metal. The behavior is almost identical to Ubuntu 24.04, except that the error code on the last line changed from -2 to -12:

ath12k_pci 0000:46:00.0: BAR 0 [mem 0x92400000-0x925fffff 64bit]: assigned
ath12k_pci 0000:46:00.0: MSI vectors: 16
ath12k_pci 0000:46:00.0: Hardware name: qcn9274 hw2.0
mhi mhi0: Requested to power ON
mhi mhi0: Power on setup success
mhi mhi0: Wait for device to enter SBL or Mission mode
ath12k_pci 0000:46:00.0: memory type 10 not supported
ath12k_pci 0000:46:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0x401a2200
ath12k_pci 0000:46:00.0: fw_version 0x150c0673 fw_build_timestamp 2025-04-24 15:13 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
ath12k_pci 0000:46:00.0: failed to fetch board data for bus=pci,qmi-chip-id=0,qmi-board-id=255 from ath12k/QCN9274/hw2.0/board-2.bin
ath12k_pci 0000:46:00.0: failed to fetch board.bin from QCN9274/hw2.0
ath12k_pci 0000:46:00.0: qmi failed to load bdf:
ath12k_pci 0000:46:00.0: qmi failed to load board data file:-12

So it turns out that loading the driver in the VM and unloading it on the host are indeed what cause the issues here.
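
For reference, the trailing numbers in these messages are negated Linux errno values. The sketch below decodes the ones that show up in ath12k messages in this thread; the table is hand-written from the generic Linux errno definitions, and the helper name is made up. It only translates the number and says nothing about why the driver returned it.

```python
import re

# Generic Linux errno values that appear in the logs in this thread.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    12: ("ENOMEM", "Out of memory"),
    34: ("ERANGE", "Math result not representable"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

def explain(line):
    """Decode a trailing negative error code in a kernel log line, if any."""
    m = re.search(r"(?:^|[\s:])-(\d+)\s*$", line)
    if not m:
        return None
    code = int(m.group(1))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in this table"))
    return f"-{code} = {name} ({text})"

for line in (
    "qmi failed to load board data file:-2",
    "qmi failed to load board data file:-12",
):
    print(explain(line))
```

So the Ubuntu run ended with "no such file" while the bare-metal OpenWrt run ended with "out of memory", even though the preceding "failed to fetch board data" lines are the same.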

So the question is: how do I debug this further?

Some updates here.

With pci=noaer on the host I was finally able to prevent the host system from resetting, which is progress.

When the VM booted I got this:
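
To confirm that the running host kernel really was booted with the parameter (and not just that the bootloader config was edited), /proc/cmdline can be checked. A minimal sketch with a hypothetical helper name:

```python
import os

def has_kernel_param(param, cmdline_path="/proc/cmdline"):
    """Return True if the exact parameter is on the kernel command line."""
    with open(cmdline_path) as f:
        # Parameters are whitespace-separated; match whole tokens only.
        return param in f.read().split()

if os.path.exists("/proc/cmdline"):
    print("pci=noaer active:", has_kernel_param("pci=noaer"))
```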

[    7.393886] ath12k_pci 0000:04:00.0: BAR 0 [mem 0x89400000-0x895fffff 64bit]: assigned
[    7.402838] ath12k_pci 0000:04:00.0: MSI vectors: 16
[    7.407763] ath12k_pci 0000:04:00.0: Hardware name: qcn9274 hw2.0
[    7.542240] ath12k_pci 0000:04:00.0: link down error during global reset
[    7.570471] mhi mhi0: BHI offset: 0xffffffff is out of range: 0x200000
[    7.577614] ath12k_pci 0000:04:00.0: failed to set mhi state: INIT(0)
[    7.578488] ath12k_pci 0000:04:00.0: failed to start mhi: -34
[    7.579312] ath12k_pci 0000:04:00.0: failed to power up :-34
[    7.615653] ath12k_pci 0000:04:00.0: failed to create soc core: -34
[    7.617815] ath12k_pci 0000:04:00.0: unable to create hw group
[    7.655683] ath12k_pci 0000:04:00.0: failed to init core: -34
[    8.123971] ath12k_pci 0000:04:00.0: probe with driver ath12k_pci failed with error -34

Interestingly, this happened on the first and every other odd-numbered run. On even-numbered runs (second, fourth, etc.) I got this instead:

[    7.158610] ath12k_pci 0000:04:00.0: BAR 0 [mem 0x89400000-0x895fffff 64bit]: assigned
[    7.172045] ath12k_pci 0000:04:00.0: MSI vectors: 16
[    7.177707] ath12k_pci 0000:04:00.0: Hardware name: qcn9274 hw2.0
[    7.291634] ath12k_pci 0000:04:00.0: link down error during global reset
[    7.320049] mhi mhi0: Requested to power ON
[    7.368077] mhi mhi0: Power on setup success
[    7.538116] mhi mhi0: Wait for device to enter SBL or Mission mode
[    8.036845] ath12k_pci 0000:04:00.0: qmi dma allocation failed (29360128 B type 1), will try later with small size
[    8.046735] ath12k_pci 0000:04:00.0: memory type 10 not supported
[    8.050328] kmodloader: done loading kernel modules from /etc/modules.d/*
[    8.053813] ath12k_pci 0000:04:00.0: chip_id 0x0 chip_family 0xb board_id 0xff soc_id 0x401a2200
[    8.063240] ath12k_pci 0000:04:00.0: fw_version 0x150c0673 fw_build_timestamp 2025-04-24 15:13 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.WBE.1.5-01651-QCAHKSWPL_SILICONZ-1
[    8.086352] ath12k_pci 0000:04:00.0: failed to fetch board data for bus=pci,qmi-chip-id=0,qmi-board-id=255 from ath12k/QCN9274/hw2.0/board-2.bin
[    8.087598] ath12k_pci 0000:04:00.0: failed to fetch board.bin from QCN9274/hw2.0
[    8.093428] ath12k_pci 0000:04:00.0: qmi failed to load bdf:
[    8.094398] ath12k_pci 0000:04:00.0: qmi failed to load board data file:-12

That is even further than before, but since it didn't find the board file, I decided to upgrade from the r30806-070d8eb4d5 snapshot to the r31110-41aaebad98 snapshot in the hope of firmware upgrades. Something has definitely changed, but I'm not sure it is for the better. The boot now looks like this:

[    7.208812] ath12k_pci 0000:04:00.0: BAR 0 [mem 0x89400000-0x895fffff 64bit]: assigned
[    7.224830] ath12k_pci 0000:04:00.0: MSI vectors: 16
[    7.230381] ath12k_pci 0000:04:00.0: Hardware name: qcn9274 hw2.0
[    7.342952] ath12k_pci 0000:04:00.0: link down error during global reset
[    7.371390] mhi mhi0: Requested to power ON
[    7.417659] mhi mhi0: Power on setup success
[  164.087661] mhi mhi0: Device failed to enter MHI Ready
[  164.090720] mhi mhi0: MHI did not enter READY state
[  164.091615] ath12k_pci 0000:04:00.0: failed to set mhi state: POWER_ON(2)
[  164.093214] ath12k_pci 0000:04:00.0: failed to start mhi: -110
[  164.094584] ath12k_pci 0000:04:00.0: failed to power up :-110
[  164.127713] ath12k_pci 0000:04:00.0: failed to create soc core: -110
[  164.132306] ath12k_pci 0000:04:00.0: unable to create hw group
[  164.167670] ath12k_pci 0000:04:00.0: failed to init core: -110
[  164.632064] ath12k_pci 0000:04:00.0: probe with driver ath12k_pci failed with error -110

Note that it hangs the boot process (no HTTP server for LuCI, refusing to shut down or reboot) for quite some time before giving up and moving on with the normal boot process. After that, it now looks like the odd runs, with mhi mhi0: BHI offset: 0xffffffff is out of range: 0x200000.

Does anyone have an idea what to debug next, and what could be done to make pci=noaer unnecessary on the host? My primitive understanding is that something quite wrong is happening that gets propagated to the host and should be fixed.
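
Since the outcome differs from run to run, it may help to label each captured boot log automatically while repeating the test. The sketch below sorts a log into the three failure modes seen so far; the labels and the function name are invented for illustration, and the matched strings are copied from the logs above. As far as I understand, an all-ones value such as 0xffffffff is what a read from a PCIe device that does not respond typically returns, which would be consistent with the "link down error during global reset" line, but that is an assumption on my side.

```python
# (label, substring) pairs; the substrings are copied from the logs above.
PATTERNS = [
    ("bhi-offset-invalid", "BHI offset: 0xffffffff is out of range"),
    ("mhi-ready-timeout", "Device failed to enter MHI Ready"),
    ("board-data-missing", "qmi failed to load board data file"),
]

def classify(log_text):
    """Return the label of the first known failure mode found in the log."""
    for label, needle in PATTERNS:
        if needle in log_text:
            return label
    return "unknown"

sample = "mhi mhi0: BHI offset: 0xffffffff is out of range: 0x200000"
print(classify(sample))
```

Feeding it the saved dmesg output of each VM boot would give a quick tally of how often each mode occurs.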

At least I can now test things more easily without having to reboot the host system, which on this server board takes a few minutes to complete.