Mt7915e wifi not working

Yet quite a lot has changed since your first paste of /etc/config/wireless.

Did you paste the entire config?

No, I reinstalled because I botched the earlier installation while trying to add a newly installed Debian on the same drive to GRUB. So I ended up installing Debian first, then writing the OpenWrt image after that.
I gave an example with Manjaro because I have a live USB with it, and the card works both in the Manjaro live session and in the newly installed Debian.
The firmware in Debian is the same as the one in OpenWrt. I also manually downloaded an updated firmware and the issue persisted, so I can cross that out as a possible cause.
The new OpenWrt installation is the stock x86-64 image from the repo plus a few packages I installed. The point is, the card is not working even on a clean install.
The card worked with OpenWrt 22 stable, but given that it will no longer be supported in 3 weeks, I don't really wish to go back to it. Besides, even though it worked, there were some serious issues with the speed, so I reckon it was just barely working.
What I noticed is that radio1 was listed as MediaTek MT7915E and radio2 as MediaTek MT7975 on OpenWrt 22.
On this version, both radios are listed as MT7915E.

So I tried the new 23.05.3 today and the issue is still present.
One thing I had noticed in LuCI was the missing encryption option under the 'wireless security' tab, same as in 23.05.2, but after installing wpad and some other packages, I can now select password encryption.
However, under advanced settings for the wireless interface configuration, it still shows an undefined MAC address.
Running iwinfo shows the MAC addresses of wlan0 and wlan1, but it takes close to 2 minutes to print the output (very strange, it should be instantaneous).
Any suggestions as to what I might be doing wrong?
It hangs just after this line:

Tx-Power: 3 dBm  Link Quality: unknown/70

the whole output is:

wlan0     ESSID: unknown
          Access Point: XX:XX:XX:XX:XX:XX
          Mode: Client  Channel: unknown (unknown)  HT Mode: NOHT
          Center Channel 1: unknown 2: unknown
          Tx-Power: 3 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: unknown
          Bit Rate: unknown
          Encryption: unknown
          Type: nl80211  HW Mode(s): 802.11ax/b/g/n
          Hardware: 14C3:7915 14C3:7915 [MediaTek MT7915E]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

wlan1     ESSID: unknown
          Access Point: XX:XX:XX:XX:XX:XY
          Mode: Client  Channel: unknown (unknown)  HT Mode: NOHT
          Center Channel 1: unknown 2: unknown
          Tx-Power: 3 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: unknown
          Bit Rate: unknown
          Encryption: unknown
          Type: nl80211  HW Mode(s): 802.11ac/ax/n
          Hardware: 14C3:7915 14C3:7915 [MediaTek MT7915E]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy1

I reverted to 22.03.6 r20265-f85a79bcb4 and the card works perfectly without issues.
I guess the newer kernel breaks things for me.
That's unfortunate, as 22.03 won't get support after 2 weeks.

Guess that's it. I will have to look for another card.
By the way, I booted into Debian again before reverting to 22.03.6, just to test things out, and the card is not behaving properly there either. I can connect to it and I can set up a hotspot, but only on the 2.4GHz radio. The speed is 50ish Mbps, which makes this card pretty much useless if I want to use it to its full extent.
I will have to accept that something in the kernel borked the card.
There just does not seem to be any explanation as to why it is not working.
A lot of people complained about issues after 5.15, so I guess that's where things went wrong.

Adding my related topic, just in case something changes.

Not sure if this helps.....

But I've been having the same issues with my 7915-npd card.

The WiFi card worked fine in OpenWrt 22, and also in DD-WRT, but it didn't work in OpenWrt 23.

I'm using a PC Engines APU2, and I got it working by updating my APU2 coreboot firmware.

Now it's running perfectly.

Command timeouts look like power management issues.

That was my very first guess, so I cut the 3.3V supply from the PCIe slot and fed the card from a separate 20W DC-DC converter, independent of the mainboard. Still, it works on 22 and doesn't on 23.

You need to disable the power saving options in the BIOS.

Now the WiFi card gets no power from the motherboard at all. It runs in a virtualized setup over PCIe passthrough with a separate 3.3V supply. On 22 it works perfectly, and on the very same hardware and system config on 23 it doesn't. As I wrote in my topic, I saw the same behavior on bare metal and in the virtualized environment, with and without the separate power feed to the WiFi card. All combinations tested. So I don't see how it could be a hardware issue of the system. Driver, firmware, kernel: possible. But not hardware.

Power management here means the PCI slot powering down and the device sending a wake-up back to the system. It has nothing to do with extra power.

echo performance > /sys/module/pcie_aspm/parameters/policy
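
A minimal sketch for checking that the policy change actually took effect, assuming the kernel exposes the usual sysfs file in which the active policy is shown in square brackets. The helper name active_aspm_policy is made up for illustration.

```shell
# Hypothetical helper: extract the active policy from the contents of
# /sys/module/pcie_aspm/parameters/policy (the active entry is in brackets,
# e.g. "default [performance] powersave powersupersave").
active_aspm_policy() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

policy_file=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy_file" ]; then
    echo "active ASPM policy: $(active_aspm_policy < "$policy_file")"
else
    # Assumption: a missing file means ASPM support is not available here.
    echo "no ASPM policy file found (ASPM support may not be built into this kernel)"
fi
```

A value written with echo only lasts until the next reboot, so it would have to be reapplied on every boot to stay in effect.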

I'm necroing my own thread as I am once again trying to make this card work, this time under 23.05.5.

In contrast to my initial post, on a fresh 23.05.5, OpenWrt can't even register the card. No /etc/config/wireless even exists.
dmesg gives this:

dmesg | grep "7915"
[    2.178856] pci 0000:04:00.0: [14c3:7915] type 00 class 0x000280
[    8.304021] mt7915e 0000:04:00.0: enabling device (0000 -> 0002)
[    8.436848] mt7915e 0000:04:00.0: Direct firmware load for mediatek/mt7915_rom_patch.bin failed with error -2
[    8.439998] mt7915e 0000:04:00.0: Falling back to sysfs fallback for: mediatek/mt7915_rom_patch.bin
[    8.447944] mt7915e: probe of 0000:04:00.0 failed with error -12

but mt7915_rom_patch.bin is in /lib/firmware/mediatek:

root@OpenWrt:/lib/firmware/mediatek# ls -lah
drwxr-xr-x    2 root     root        4.0K Dec 27 22:10 .
drwxr-xr-x    6 root     root        4.0K Jul  8 19:13 ..
-rw-r--r--    1 root     root      500.0K Jul 10 21:07 BT_RAM_CODE_MT7922_1_1_hdr.bin
-rw-r--r--    1 root     root      520.4K Jul 10 21:07 BT_RAM_CODE_MT7961_1_2_hdr.bin
-rw-r--r--    1 root     root      134.4K Jul  8 19:13 WIFI_MT7922_patch_mcu_1_1_hdr.bin
-rw-r--r--    1 root     root       90.0K Jul  8 19:13 WIFI_MT7961_patch_mcu_1_2_hdr.bin
-rw-r--r--    1 root     root      994.8K Jul  8 19:13 WIFI_RAM_CODE_MT7922_1.bin
-rw-r--r--    1 root     root      778.6K Jul  8 19:13 WIFI_RAM_CODE_MT7961_1.bin
-rw-r--r--    1 root     root      141.2K Jul  8 19:13 mt7915_rom_patch.bin
-rw-r--r--    1 root     root      113.4K Jul  8 19:13 mt7915_wa.bin
-rw-r--r--    1 root     root        1.2M Jul  8 19:13 mt7915_wm.bin
-rw-r--r--    1 root     root        8.5K Jul  8 19:13 mt7916_rom_patch.bin
-rw-r--r--    1 root     root      496.0K Jul  8 19:13 mt7916_wa.bin
-rw-r--r--    1 root     root        1.6M Jul  8 19:13 mt7916_wm.bin

lspci gives the following output:

04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter [14c3:7915]

Makes files unidentifiable.

ls only lists files and directories, so I don't understand what you mean by "it's making the files unidentifiable".
In any case, I hadn't rebooted, which is why the system couldn't find the card.
The card is now identified but the old error persists: whenever I try enabling the wireless networks, LuCI becomes unresponsive, the only way to control the system is via SSH, and dmesg is still giving the same old message timeout error.
This happens even though I changed the pcie_aspm policy to performance.

[   33.857603] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout
[  192.001299] mt7915e 0000:04:00.0: Message 000007ed (seq 7) timeout
[  212.454709] mt7915e 0000:04:00.0: Message 00005aed (seq 8) timeout
[  232.918618] mt7915e 0000:04:00.0: Message 00005aed (seq 9) timeout
[  253.388109] mt7915e 0000:04:00.0: Message 00005aed (seq 10) timeout
[  273.858413] mt7915e 0000:04:00.0: Message 00005aed (seq 11) timeout
[  294.329483] mt7915e 0000:04:00.0: Message 00005aed (seq 12) timeout
[  314.801253] mt7915e 0000:04:00.0: Message 00005aed (seq 13) timeout
[  335.273629] mt7915e 0000:04:00.0: Message 00005aed (seq 14) timeout
[  355.746603] mt7915e 0000:04:00.0: Message 00005aed (seq 15) timeout
[  376.220100] mt7915e 0000:04:00.0: Message 00005aed (seq 1) timeout
[  396.694100] mt7915e 0000:04:00.0: Message 00005aed (seq 2) timeout
[  417.168563] mt7915e 0000:04:00.0: Message 00005aed (seq 3) timeout
[  437.643455] mt7915e 0000:04:00.0: Message 00005aed (seq 4) timeout
[  458.118744] mt7915e 0000:04:00.0: Message 00005aed (seq 5) timeout
[  478.594399] mt7915e 0000:04:00.0: Message 00005aed (seq 6) timeout
[  499.070399] mt7915e 0000:04:00.0: Message 00005aed (seq 7) timeout
[  519.546713] mt7915e 0000:04:00.0: Message 00005aed (seq 8) timeout
[  540.023308] mt7915e 0000:04:00.0: Message 00005aed (seq 9) timeout
[  560.500153] mt7915e 0000:04:00.0: Message 00005aed (seq 10) timeout
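
From seq 7 onward the timeouts in the log above arrive roughly every 20.5 seconds. A small sketch for measuring that spacing straight from dmesg, in case it changes between boots. timeout_gaps is a hypothetical helper name, and it assumes the bracketed timestamp format shown above.

```shell
# Hypothetical helper: print the gap in seconds between consecutive
# mt7915e "timeout" lines. Reads dmesg-style text on stdin.
timeout_gaps() {
    grep 'mt7915e.*timeout' |
    sed 's/^\[ *\([0-9.]*\)].*/\1/' |
    awk 'NR > 1 { printf "%.1f\n", $1 - prev } { prev = $1 }'
}

# Usage on the router:
#   dmesg | timeout_gaps
```

A constant gap would suggest the driver is retrying the same command on a fixed timer rather than the card failing at random moments, but that is only an interpretation of the log.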

full description of lspci -vvv:

04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter (prog-if 80)
	Subsystem: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 179
	Region 0: Memory at 6008200000 (64-bit, prefetchable) [size=1M]
	Region 2: Memory at 6008300000 (64-bit, prefetchable) [size=16K]
	Region 4: Memory at 6008304000 (64-bit, prefetchable) [size=4K]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <4us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <8us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
		Address: 00000000fee00998  Data: 0000
		Masking: fffffffe  Pending: 00000000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: mt7915e

I do believe this might be related to some obscure power setting. I have disabled all power saving options in UEFI. As per AsiaRF, the card shouldn't be consuming more than 10W (https://asiarf.com/product/wi-fi-6-11ax-2x2-dbdc-1800mbps-mini-pcie-module-mt7915-aw7915-npd/)

I am starting to think that the slot I have put this card in might not be providing enough amps. However, I am not sure how to test that theory. After all, the card is working fine in OpenWrt 22 (EDIT - amps are fine, that is why I am blurring this)

EDIT:
This is getting ridiculously weird. I have restarted the device several times. I am not sure what I did, but it appears the card is working now. Looking through the logs, I see no message timeouts. I literally changed nothing. I am now afraid to restart it as it might break something but oh well, let's try I guess.

EDIT2: After a reboot, the problem reappears. The entire system hangs, and I wait for 5+ minutes until I am finally able to even log in through SSH. LuCI is still unresponsive. I added disabled '1' to the radios in /etc/config/wireless and rebooted again, and only this fixes LuCI. After the second reboot, the message timeout error appears once during the boot process and does not reappear until I enable the radios. When it worked correctly, the message timeout error did not show even during boot (radios were off, but then I managed to switch them on).

So to summarize:
If the error manifests itself at boot even though the radios are switched off, then switching the radios on makes the system inaccessible: LuCI and any new SSH session break (an SSH connection that was already established keeps working).

last entries in dmesg:

[   19.324678] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   34.492826] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout

if the error manifests, it also breaks what is shown in lspci -vvv:

04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter (prog-if 80)
	Subsystem: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
	!!! Unknown header type 7f
	Interrupt: pin ? routed to IRQ 179
	Region 0: Memory at 6008200000 (64-bit, prefetchable) [size=1M]
	Region 2: Memory at 6008300000 (64-bit, prefetchable) [size=16K]
	Region 4: Memory at 6008304000 (64-bit, prefetchable) [size=4K]
	Kernel driver in use: mt7915e

Specifically, the following line:
!!! Unknown header type 7f
makes me think it is a power issue, maybe some power saving setting somewhere.
I am looking through
/sys/bus/pci/devices/0000:04:00.0
because I want to restart the card without powering off the device, but I am not sure how to accomplish that.
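
One way to try restarting the card from software is the generic PCI remove/rescan pair in sysfs. The sketch below wraps it in a hypothetical helper, pci_replug. The SYSFS_PCI override is also an invention of this example, there only so the function can be dry-run against a scratch directory. Whether this revives a card in the "Unknown header type 7f" state is not guaranteed: that state usually means config space reads come back as all ones, and the device may need a real power cycle.

```shell
# Hypothetical helper: detach a PCI function and ask the kernel to
# re-enumerate the bus. Run as root on the real system.
# SYSFS_PCI is an assumption of this sketch, used only for dry runs.
pci_replug() {
    dev="$1"
    base="${SYSFS_PCI:-/sys/bus/pci}"
    if [ ! -e "$base/devices/$dev/remove" ]; then
        echo "no such PCI device: $dev" >&2
        return 1
    fi
    echo 1 > "$base/devices/$dev/remove"   # unbind the driver and drop the device
    sleep 1                                # give the kernel a moment to tear it down
    echo 1 > "$base/rescan"                # re-enumerate all PCI buses
}

# Usage on the router (address taken from the lspci output above):
#   pci_replug 0000:04:00.0
```

After the rescan, dmesg should show the mt7915e probe running again if the device came back.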

In the meantime, I will be powering off the router, disconnecting it from the power source, discharging the capacitors, and restarting later with the radios OFF at startup to see if the timeout errors appear at boot.

Cold boot did nothing, error still showing at boot even though the radios are switched off.

It just is so damn random.

I have added both parameters to grub.cfg:
pcie_aspm=off pcie_port_pm=off

I have also disabled d3cold:
root@OpenWrt:/sys/bus/pci/devices/0000:04:00.0# echo 0 > d3cold_allowed

Still nothing.

What weirds me out is how long it takes to show the error message even when the radios are off:

root@OpenWrt:~# dmesg | grep "7915"
[    2.177574] pci 0000:04:00.0: [14c3:7915] type 00 class 0x000280
[    8.293117] mt7915e 0000:04:00.0: enabling device (0000 -> 0002)
[    8.428073] mt7915e 0000:04:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220929104113a
[    8.444616] mt7915e 0000:04:00.0: WM Firmware Version: ____000000, Build Time: 20220929104145
[    8.463340] mt7915e 0000:04:00.0: WA Firmware Version: DEV_000000, Build Time: 20220929104205
[    8.615843] mt7915e 0000:04:00.0: registering led 'mt76-phy0'
[    8.658005] mt7915e 0000:04:00.0: registering led 'mt76-phy1'
[   34.492969] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout

It takes almost 30 seconds before it spits out the error. It is also the last message to appear during the boot process, the previous one appearing at around the 19-second mark:

[   19.274661] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[   34.492969] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout

I managed to boot up properly with the radio working, so it does not matter whether the radio is enabled or disabled upon reboot.
It appears the behavior is totally random.
Once the system boots properly, the WiFi works absolutely flawlessly. It's just that more often than not, something goes wrong during booting.

If it were an improper connection or another hardware problem, I would expect the card to fail at some point after boot, but this isn't the case. That is why I am sure the problem is software-based.

I cut the +3.3V traces on the board and wired a mechanical switch in there. I was just fed up with the sheer number of cold reboots while testing :slight_smile:
As far as I can recall, the issue is not so much "at system boot" as "at initialization". When I booted with the card switched off and turned it on after the system was up, the symptoms were the same as if I had booted with it (all those lovely timeouts in the logs).

That means it is not a matter of the order in which the boot process loads things. By that logic, the problem is with the card firmware and/or the driver, but then again, OpenWrt 22 works completely fine with this card. I did notice, however, that the 5GHz speed on 23 (when it works) reaches 500Mbps (it could be more, but 500Mbps is my ISP's rated speed), while under OpenWrt 22 the card can't provide more than about half of that.

I agree that it looks like a pure software issue. Unfortunately I am ops, not dev :sweat_smile:

Fair enough, I was kind of mentally prepared that this rabbit hole would lead to programming at some point. :sleepy:
I am not even sure how the Linux kernel works, and even less sure how OpenWrt's kmod packages interact with it, i.e. whether the issue is with the kernel, with the kmod package (driver) or even with the firmware. Hell, I am not even sure why the firmware of the mt7915e chip is located on the root filesystem. I would expect it to be on the card itself, but oh well.
I am eager to go deeper into troubleshooting, just unsure where to look.
I wonder where I can find a shoulder to cry on (i.e. to bitch about this damn chip) :joy:

Looking through lspci again, I notice that for some devices, it appears the ASPM hasn't been disabled:

Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #21, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

even though I have pcie_aspm=off in grub.cfg
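
As far as I understand the kernel documentation, pcie_aspm=off tells the kernel not to touch the ASPM configuration at all, so whatever the firmware set up stays in place. That could explain a root port still reporting ASPM as enabled. It is still worth confirming that both parameters reached the running kernel and not just grub.cfg. A sketch, where has_karg is a made-up helper name:

```shell
# Hypothetical helper: succeed if parameter $1 appears as a whole word on
# the kernel command line. $2 is optional text to check instead of
# /proc/cmdline (handy for trying the function out).
has_karg() {
    cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

for p in pcie_aspm=off pcie_port_pm=off; do
    if has_karg "$p"; then
        echo "$p: present on the running command line"
    else
        echo "$p: NOT on the running command line"
    fi
done
```

If a parameter shows up as missing, the edit probably went into a grub.cfg other than the one the bootloader actually reads.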

I tried booting a live installation on the x86 router. Out of the 3 ISOs on my Ventoy USB stick, only Debian 12 booted up properly (kernel version 6.1.0-18).
I couldn't boot a live environment for either Ubuntu or Manjaro, as both would kernel panic. They boot just fine if I disconnect the WiFi adapter, which makes it definitely the culprit.

In Debian however, I inspected dmesg and I could see that one of the PCI bridges is throwing errors, namely the same PCI bridge which is connected to the wifi adapter.

I booted into OpenWrt with grub option pci=nomsi and got kernel panic.

It is clear that the PCI bridge is not doing something it should be.