AsiaRF card fails to reset when passed to VM

I'm trying to pass my AsiaRF AW7916-NPD through to an OpenWrt virtual machine, but the card fails to reset, which I assume is required when passing a PCIe device to a VM.

Trying to start the VM results in

Error starting domain: internal error: Unknown PCI header type '127' for device '0000:03:00.0'
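
For what it's worth, the '127' here is 0x7f, the same value lspci reports as "Unknown header type 7f". The header type is the byte at offset 0x0e of PCI config space, with bit 7 acting as the multi-function flag, so a device whose config reads come back as all ones (0xff) decodes to exactly 0x7f. A minimal sketch for checking this from sysfs; the helper name pci_header_type is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the header type (low 7 bits of the byte at
# config-space offset 0x0e) from a PCI config-space file given as $1.
pci_header_type() {
    byte=$(od -An -tx1 -j14 -N1 "$1" | tr -d ' \n')
    echo $(( 0x$byte & 0x7f ))
}

# Usage on the host (path built from the device address in this thread):
#   pci_header_type /sys/bus/pci/devices/0000:03:00.0/config
# A normal endpoint prints 0; 127 suggests config reads return all ones,
# i.e. the device is not responding on the bus.
```

If this prints 127, the problem is probably below libvirt: the card itself has dropped off the bus.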

Libvirt XML for the vfio passthrough

    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
      </source>
      <rom bar="on"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </hostdev>

# lspci -v -s 0000:03:00.0
0000:03:00.0 Network controller: MEDIATEK Corp. Device 7906
        Subsystem: MEDIATEK Corp. Device 7906
        !!! Unknown header type 7f
        Memory at 60e0000000 (64-bit, prefetchable) [size=1M]
        Memory at 84a00000 (64-bit, non-prefetchable) [size=32K]
        Memory at 60e0100000 (64-bit, prefetchable) [size=4K]
        Kernel driver in use: vfio-pci
        Kernel modules: mt7915e

# echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset

# sudo dmesg | tail -n 15
[   15.885854] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   15.885932] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[  392.349836] vfio-pci 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
[  392.414675] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  393.140122] vfio-pci 0000:03:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  394.366807] vfio-pci 0000:03:00.0: not ready 1023ms after FLR; waiting
[  395.406751] vfio-pci 0000:03:00.0: not ready 2047ms after FLR; waiting
[  397.566768] vfio-pci 0000:03:00.0: not ready 4095ms after FLR; waiting
[  401.833481] vfio-pci 0000:03:00.0: not ready 8191ms after FLR; waiting
[  410.153538] vfio-pci 0000:03:00.0: not ready 16383ms after FLR; waiting
[  426.580319] vfio-pci 0000:03:00.0: not ready 32767ms after FLR; waiting
[  460.713922] vfio-pci 0000:03:00.0: not ready 65535ms after FLR; giving up
[  461.756035] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  478.640501] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  478.640681] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
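
The log shows the kernel attempting a function level reset (FLR) and giving up after roughly 65 seconds. As far as I know, on kernels 5.15 and newer the reset mechanisms the kernel is willing to try for a device are listed in the reset_method sysfs attribute, which may help show whether anything other than FLR is on offer. A rough sketch; the function name is my own, and the attribute is simply absent on older kernels:

```shell
#!/bin/sh
# Hypothetical helper: show which reset methods the kernel advertises for a
# PCI device. $1 is the device's sysfs directory.
pci_reset_methods() {
    if [ -r "$1/reset_method" ]; then
        cat "$1/reset_method"
    else
        echo "reset_method not available"
    fi
}

# Usage on the host:
#   pci_reset_methods /sys/bus/pci/devices/0000:03:00.0
```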

I guess this is a driver issue. Does anyone know how to fix this?

I had similar issues with a MT7915 device.

I somehow managed to make it work, though it was not straightforward. I think the magic was disabling ASPM with pcie_aspm=off on the kernel command line. I can stop and restart the guest without problems. I have two MT7915 devices on the host, each passed through to a different virtual machine (one for the 2.4 GHz band, one for the 5 GHz band).

Host kernel:

# uname -a
Linux server 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
# cat /etc/debian_version
11.6

Try adding "pcie_aspm=off" to your kernel command line:

/etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_aspm=off vfio_pci.ids=14c3:7915"
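
On Debian, an edit to /etc/default/grub only takes effect after the GRUB config is regenerated (update-grub) and the host is rebooted. A small sketch for confirming that the parameter actually reached the running kernel; has_kparam is a hypothetical helper, and its second argument exists only so it can be pointed at a file other than /proc/cmdline:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact parameter $1 is present on the
# kernel command line (read from $2, default /proc/cmdline).
has_kparam() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage after `update-grub` and a reboot:
#   has_kparam pcie_aspm=off && echo "pcie_aspm=off is active"
# Where available, the ASPM policy can also be read from
#   /sys/module/pcie_aspm/parameters/policy
```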

guest.xml:

    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </hostdev>

# lspci -v
03:00.0 Unclassified device [0002]: MEDIATEK Corp. Device 7915 (prog-if 80)
        Subsystem: MEDIATEK Corp. Device 7915
        Flags: bus master, fast devsel, latency 0, IRQ 149, IOMMU group 18
        Memory at 6000200000 (64-bit, prefetchable) [size=1M]
        Memory at 6000300000 (64-bit, prefetchable) [size=16K]
        Memory at 6000304000 (64-bit, prefetchable) [size=4K]
        Capabilities: [80] Express Endpoint, MSI 00
        Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
        Capabilities: [f8] Power Management version 3
        Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
        Capabilities: [108] Latency Tolerance Reporting
        Capabilities: [110] L1 PM Substates
        Capabilities: [200] Advanced Error Reporting
        Kernel driver in use: vfio-pci
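
Note that lspci -v only lists the capabilities; the link's ASPM state shows up on the LnkCtl line of lspci -vv (run as root so the capability details are readable). A sketch for pulling that field out, assuming the usual "LnkCtl: ASPM <state>; ..." layout that pciutils prints; the function name is invented:

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print the ASPM
# state found on each LnkCtl line (for example "Disabled" or "L1 Enabled").
aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Usage on the host (as root):
#   lspci -vv -s 03:00.0 | aspm_state
```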

Stable Linux kernels 5.15 and newer no longer work for me: the host freezes when the VM is stopped and restarted. Kernel 5.10 works.

Well, I gave it a shot on Linux 6.1.8, but no dice. Because I need Intel's SR-IOV support, I don't think I can go back as far as 5.10, unfortunately.

I tried it both with and without vfio_pci.ids, but neither worked. I also tried blacklisting mt7915e and switching the hostdev managed attribute between yes and no. I still got the same errors.

$ sudo dmesg | head -n 3
[    0.000000] microcode: microcode updated early to revision 0x112, date = 2022-12-19
[    0.000000] Linux version 6.1.8-1-intel-lts-sriov (linux-intel-lts-sriov@archlinux) (gcc (GCC) 12.2.1 20230201, GNU ld (GNU Binutils) 2.40) #2 SMP PREEMPT_DYNAMIC Tue, 28 Mar 2023 08:14:01 +0000
[    0.000000] Command line: root=UUID=e454dd40-3ef8-4a9d-bb18-f3accf0a3596 rootflags=atgc rw video=HDMI-A-1:1920x1080@60me console=tty0 console=ttyS4,115200 default_hugepagesz=1G hugepagesz=1G hugepages=24 intel_iommu=on pcie_aspm=off i915.enable_guc=7
$ sudo dmesg | tail -n 5
[   16.189319] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   16.189400] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[   74.046119] vfio-pci 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
[   74.108649] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   74.108764] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

So I decided to give it another shot, but this time I went into the BIOS and disabled a bunch of ASPM options, along with everything else I could find related to PCIe power saving. After a reboot the passthrough worked. I did, however, have to blacklist the mt7915e module, because each time I restarted the VM a CPU core would soft-lock. I assume this was caused by mt7915e not properly releasing or grabbing the card, since blacklisting the module fixed it.
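
For anyone following along, blacklisting on the host is typically a one-line modprobe.d file. A minimal sketch; the file name is arbitrary and the helper is hypothetical. Depending on the distro, the initramfs may need regenerating afterwards, and keep in mind that a blacklist entry only prevents automatic loading, not a manual modprobe:

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe.d blacklist entry for module $1
# into directory $2 (default /etc/modprobe.d, which needs root).
blacklist_module() {
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$1" > "$dir/blacklist-$1.conf"
}

# Usage on the host (as root), then reboot or unload the module:
#   blacklist_module mt7915e
#   lsmod | grep mt7915e    # should print nothing once it is unloaded
```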

@hhowrt Thanks for pointing me in the right direction!!
