I'm trying to pass my AsiaRF AW7916-NPD through to an OpenWrt virtual machine, but the card fails to reset, which I believe is required when passing a PCIe device to a virtual machine.
Trying to start the VM results in:
Error starting domain: internal error: Unknown PCI header type '127' for device '0000:03:00.0'
# lspci -v -s 0000:03:00.0
0000:03:00.0 Network controller: MEDIATEK Corp. Device 7906
Subsystem: MEDIATEK Corp. Device 7906
!!! Unknown header type 7f
Memory at 60e0000000 (64-bit, prefetchable) [size=1M]
Memory at 84a00000 (64-bit, non-prefetchable) [size=32K]
Memory at 60e0100000 (64-bit, prefetchable) [size=4K]
Kernel driver in use: vfio-pci
Kernel modules: mt7915e
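A header type of 127 (0x7f) is itself a clue. lspci reports only the low seven bits of the header-type byte (bit 7 is the multi-function flag), and a device that does not answer config-space reads returns all ones, so 0xff becomes 0x7f = 127. That is the value both libvirt and lspci complain about, which points to the card not responding at all (consistent with the "device inaccessible" messages in the log) rather than to a genuinely unusual header. The hypothetical helper below decodes the byte the same way:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a PCI header-type byte into its two fields.
# Bits 0-6 select the layout (0 = endpoint, 1 = PCI-to-PCI bridge,
# 2 = CardBus bridge); bit 7 marks a multi-function device.
decode_header_type() {
  local byte=$(( $1 ))
  printf 'layout=%d multifunction=%d\n' $(( byte & 0x7f )) $(( (byte >> 7) & 1 ))
}

decode_header_type 0x00   # prints: layout=0 multifunction=0 (healthy endpoint)
decode_header_type 0xff   # prints: layout=127 multifunction=1 (all-ones read)
```

To decode the card's real byte, `setpci -s 0000:03:00.0 HEADER_TYPE.b` prints it in hex (e.g. `ff`), so it can be passed as `decode_header_type 0x$(setpci -s 0000:03:00.0 HEADER_TYPE.b)`.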
# echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset
# sudo dmesg | tail -n 15
[ 15.885854] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 15.885932] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[ 392.349836] vfio-pci 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
[ 392.414675] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 393.140122] vfio-pci 0000:03:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 394.366807] vfio-pci 0000:03:00.0: not ready 1023ms after FLR; waiting
[ 395.406751] vfio-pci 0000:03:00.0: not ready 2047ms after FLR; waiting
[ 397.566768] vfio-pci 0000:03:00.0: not ready 4095ms after FLR; waiting
[ 401.833481] vfio-pci 0000:03:00.0: not ready 8191ms after FLR; waiting
[ 410.153538] vfio-pci 0000:03:00.0: not ready 16383ms after FLR; waiting
[ 426.580319] vfio-pci 0000:03:00.0: not ready 32767ms after FLR; waiting
[ 460.713922] vfio-pci 0000:03:00.0: not ready 65535ms after FLR; giving up
[ 461.756035] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 478.640501] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 478.640681] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
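The "not ready" lines show the kernel polling the card after the FLR with a doubling back-off (1023 ms, 2047 ms, ... 65535 ms) before giving up, which again suggests the card never came back on the bus. When comparing attempts across kernels, BIOS settings, or kernel parameters, a small filter such as this hypothetical `flr_summary` helper pulls the last FLR wait for one device out of the log:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and report the last
# "not ready ...ms after FLR" message for one PCI device (given as its BDF).
flr_summary() {
  local bdf="$1"
  grep -F -- "$bdf" |
    grep -Eo 'not ready [0-9]+ms after FLR; (waiting|giving up)' |
    tail -n 1 |
    sed -E "s/not ready ([0-9]+)ms after FLR; (.*)/$bdf waited \1ms after FLR: \2/"
}
```

For the log above, `sudo dmesg | flr_summary 0000:03:00.0` would print `0000:03:00.0 waited 65535ms after FLR: giving up`; it prints nothing if the device never needed to wait.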
I guess this is a driver issue. Does anyone know how to fix this?
I somehow managed to make it work, though it was not straightforward. I think the key was disabling ASPM with pcie_aspm=off on the kernel command line. I can stop and restart the guest without problems. I have two mt7915 devices on the host system, each passed to a different virtual machine (one on the 2.4 GHz band, one on 5 GHz).
Host kernel:
# uname -a
Linux server 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
# cat /etc/debian_version
11.6
Try adding "pcie_aspm=off" to your kernel command line.
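On Debian with GRUB, the usual way to do this is to append the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run `sudo update-grub`, and reboot (other distributions and boot loaders differ). Afterwards it is worth confirming that the parameter actually reached the running kernel; the `cmdline_has` function below is a hypothetical sketch of such a check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the kernel command line in $1 contains the
# exact space-separated parameter $2 (e.g. "pcie_aspm=off").
cmdline_has() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the running kernel ($(...) strips the trailing newline of /proc/cmdline).
if cmdline_has "$(cat /proc/cmdline)" pcie_aspm=off; then
  echo "pcie_aspm=off is active"
else
  echo "pcie_aspm=off is NOT on the running kernel's command line"
fi
```

Matching whole space-separated words avoids false positives such as a differently prefixed parameter that merely ends in `pcie_aspm=off`.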
Well, I gave it a shot on Linux 6.1.8, but no dice. Because I need Intel's SR-IOV support, I unfortunately don't think I can go back as far as 5.10.
I tried it both with and without vfio_pci ids, but neither worked. I also tried blacklisting mt7915e and switching the guest's managed attribute between off and on. I still got the same errors.
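For reference, a common modprobe.d approach to both of those is a file that hands the card's vendor:device ID to vfio-pci and blacklists the host driver. MediaTek's PCI vendor ID is 14c3 and lspci reports device 7906, so the ID here would presumably be 14c3:7906 (an assumption; confirm with `lspci -nn -s 0000:03:00.0`). The generator function and file name below are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print modprobe.d lines that let vfio-pci claim the
# given vendor:device ID(s) and stop the named host driver from auto-loading.
vfio_modprobe_conf() {
  local ids="$1" driver="$2"
  printf 'options vfio-pci ids=%s\n' "$ids"
  printf 'blacklist %s\n' "$driver"
}

# Preview the file contents for the MT7916 card (ID assumed to be 14c3:7906).
vfio_modprobe_conf 14c3:7906 mt7915e
```

To install it, something like `vfio_modprobe_conf 14c3:7906 mt7915e | sudo tee /etc/modprobe.d/vfio-mt7916.conf`, followed on Debian by `sudo update-initramfs -u` (if vfio-pci is loaded from the initramfs) and a reboot. Note that `blacklist` only stops alias-based auto-loading; an explicit `modprobe mt7915e` still loads the driver.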
$ sudo dmesg | head -n 3
[ 0.000000] microcode: microcode updated early to revision 0x112, date = 2022-12-19
[ 0.000000] Linux version 6.1.8-1-intel-lts-sriov (linux-intel-lts-sriov@archlinux) (gcc (GCC) 12.2.1 20230201, GNU ld (GNU Binutils) 2.40) #2 SMP PREEMPT_DYNAMIC Tue, 28 Mar 2023 08:14:01 +0000
[ 0.000000] Command line: root=UUID=e454dd40-3ef8-4a9d-bb18-f3accf0a3596 rootflags=atgc rw video=HDMI-A-1:1920x1080@60me console=tty0 console=ttyS4,115200 default_hugepagesz=1G hugepagesz=1G hugepages=24 intel_iommu=on pcie_aspm=off i915.enable_guc=7
$ sudo dmesg | tail -n 5
[ 16.189319] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 16.189400] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[ 74.046119] vfio-pci 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
[ 74.108649] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 74.108764] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
So I decided to give it another shot, but this time I disabled a bunch of ASPM options and everything else I could find related to PCIe power saving in the BIOS. After a reboot, passthrough worked. I did, however, have to blacklist the mt7915e module, because each time I restarted the VM a CPU core would soft-lock. I assume this was caused by mt7915e not properly releasing or grabbing the card, since blacklisting the module fixed it.
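This fits the kernel's documentation for pcie_aspm=off, which says the kernel then leaves ASPM configuration untouched, keeping whatever the firmware set up, rather than forcing ASPM off; with the BIOS still enabling ASPM, the parameter alone would not be enough. One way to confirm the link state after changing BIOS settings is the LnkCtl line of `lspci -vv`. The `aspm_state` filter below is a hypothetical sketch that extracts it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lspci -vv` output on stdin and print the ASPM
# state from each "LnkCtl:" line (e.g. "Disabled" or "L1 Enabled").
# "LnkCtl2:" lines are not matched because the pattern requires "LnkCtl:".
aspm_state() {
  grep -Eo 'LnkCtl:[[:space:]]+ASPM [^;]+' | sed -E 's/^LnkCtl:[[:space:]]+ASPM //'
}
```

Usage would be `sudo lspci -vv -s 0000:03:00.0 | aspm_state`; since a PCIe link has two ends, running it against the card's upstream port as well is probably worthwhile.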
@hhowrt Thanks for pointing me in the right direction!!