Yet quite a lot has changed since your first paste of /etc/config/wireless.
Did you paste the entire config?
No. I reinstalled because I had botched the installation while trying to add a newly installed Debian on the same drive to GRUB. So I ended up installing Debian first and then writing the OpenWrt image after that.
I gave Manjaro as an example because I have a live USB with it. The card works both in the Manjaro live session and in the newly installed Debian.
The firmware on Debian is the same as the one on OpenWrt. Besides, I manually downloaded updated firmware and the issue persisted, so I can cross that out as a possible cause.
So the new OpenWrt installation is the stock x86-64 target image from the repo plus a few packages I installed. The point is, the card does not work even on a clean install.
The card worked with OpenWrt 22 stable, but since that release will no longer be supported in 3 weeks, I don't really want to go back to it. Besides, even though it worked, there were serious speed issues, so I reckon it was barely working.
One thing I noticed: on OpenWrt 22, radio1 was listed as MediaTek MT7915E and radio2 as MediaTek MT7975.
On this version, both radios are listed as MT7915E.
So I tried the new 23.05.3 today, and the issue is still present.
One thing I noticed in LuCI was that the encryption option was missing under the 'wireless security' tab, same as in 23.05.2. After installing wpad and some other packages, I can now select password encryption.
However, under the advanced settings of the wireless interface configuration, it still shows an undefined MAC address.
Running iwinfo shows the MAC addresses of wlan0 and wlan1, but it takes close to 2 minutes to print the output (very strange; it should be instantaneous).
Any suggestions as to what I might be doing wrong?
It hangs just after this line:
Tx-Power: 3 dBm Link Quality: unknown/70
The whole output is:
wlan0 ESSID: unknown
Access Point: XX:XX:XX:XX:XX:XX
Mode: Client Channel: unknown (unknown) HT Mode: NOHT
Center Channel 1: unknown 2: unknown
Tx-Power: 3 dBm Link Quality: unknown/70
Signal: unknown Noise: unknown
Bit Rate: unknown
Encryption: unknown
Type: nl80211 HW Mode(s): 802.11ax/b/g/n
Hardware: 14C3:7915 14C3:7915 [MediaTek MT7915E]
TX power offset: none
Frequency offset: none
Supports VAPs: yes PHY name: phy0
wlan1 ESSID: unknown
Access Point: XX:XX:XX:XX:XX:XY
Mode: Client Channel: unknown (unknown) HT Mode: NOHT
Center Channel 1: unknown 2: unknown
Tx-Power: 3 dBm Link Quality: unknown/70
Signal: unknown Noise: unknown
Bit Rate: unknown
Encryption: unknown
Type: nl80211 HW Mode(s): 802.11ac/ax/n
Hardware: 14C3:7915 14C3:7915 [MediaTek MT7915E]
TX power offset: none
Frequency offset: none
Supports VAPs: yes PHY name: phy1
I reverted to 22.03.6 r20265-f85a79bcb4, and the card works perfectly.
I guess the newer kernel breaks things for me.
That's unfortunate, as 22.03 won't be supported after 2 weeks.
Guess that's it; I will have to look for another card.
By the way, I booted into Debian again before reverting to 22.03.6, just to test things, and the card is not behaving properly there either. I can connect to it and set up a hotspot, but only on the 2.4 GHz radio. The speed is around 50 Mbps, which makes this card pretty much useless if I want to use it to its full extent.
I will have to accept that something in the kernel borked the card.
There just does not seem to be any explanation as to why it is not working.
A lot of people have complained about issues after kernel 5.15, so I guess that's where things went wrong.
Adding my related topic, just in case something changes.
Not sure if this helps...
But I've been having the same issues with my 7915-npd card.
The WiFi card worked fine in OpenWrt 22 and also in DD-WRT, but it didn't work in OpenWrt 23.
I'm using a PC Engines APU2, and I got it working by updating the APU2's coreboot firmware.
Now it's running perfectly.
Command timeouts look like power management issues.
It was my very first guess, so I cut the 3.3 V supply from the PCIe slot and put a 20 W DC-DC converter there to power the card independently of the mainboard. Still, it works on 22 and doesn't on 23.
You need to disable power saving in the BIOS.
Now the WiFi card gets no power from the motherboard at all; it runs virtualized via PCIe passthrough with a separate 3.3 V supply. On 22 it works perfectly, and on the very same hardware/system configuration on 23 it doesn't. As I wrote in my topic, I saw the same behavior on bare metal and in a virtualized environment, with and without a separate power lane to the WiFi card. All combinations were tested. So I don't see how it's a hardware issue of the system. Driver, firmware, kernel: possible. But not hardware.
Power management here means the PCI slot being powered down, with the device having to send a wakeup back to the system; it has nothing to do with extra power.
echo performance > /sys/module/pcie_aspm/parameters/policy
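For anyone trying this: the policy file lists every available policy and wraps the active one in brackets, so it's worth reading it back after the write. Below is a minimal sketch for busybox ash. It assumes the file exists on your build; if the kernel was booted with pcie_aspm=off, the write may simply be refused. The function names are my own.

```sh
# Sketch: set the PCIe ASPM policy and read back which one is active.
# POLICY_FILE defaults to the standard sysfs location (assumed present).
POLICY_FILE=${POLICY_FILE:-/sys/module/pcie_aspm/parameters/policy}

# Print the active policy, i.e. the word wrapped in [brackets].
aspm_active_policy() {
    tr ' ' '\n' < "${1:-$POLICY_FILE}" | sed -n 's/^\[\(.*\)\]$/\1/p'
}

# Write the requested policy, then confirm the kernel reports it as active.
aspm_set_policy() {
    if ! echo "$1" > "$POLICY_FILE"; then
        echo "kernel refused policy '$1'" >&2
        return 1
    fi
    [ "$(aspm_active_policy)" = "$1" ]
}
```

Usage: `aspm_set_policy performance && echo "ASPM policy: $(aspm_active_policy)"`.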
I'm necroing my own thread as I am once again trying to make this card work under 23.05.5.
In contrast to my initial post, on a fresh 23.05.5, OpenWrt can't even register the card. /etc/config/wireless doesn't even exist.
dmesg gives this:
dmesg | grep "7915"
[ 2.178856] pci 0000:04:00.0: [14c3:7915] type 00 class 0x000280
[ 8.304021] mt7915e 0000:04:00.0: enabling device (0000 -> 0002)
[ 8.436848] mt7915e 0000:04:00.0: Direct firmware load for mediatek/mt7915_rom_patch.bin failed with error -2
[ 8.439998] mt7915e 0000:04:00.0: Falling back to sysfs fallback for: mediatek/mt7915_rom_patch.bin
[ 8.447944] mt7915e: probe of 0000:04:00.0 failed with error -12
But mt7915_rom_patch.bin is in /lib/firmware/mediatek:
root@OpenWrt:/lib/firmware/mediatek# ls -lah
drwxr-xr-x 2 root root 4.0K Dec 27 22:10 .
drwxr-xr-x 6 root root 4.0K Jul 8 19:13 ..
-rw-r--r-- 1 root root 500.0K Jul 10 21:07 BT_RAM_CODE_MT7922_1_1_hdr.bin
-rw-r--r-- 1 root root 520.4K Jul 10 21:07 BT_RAM_CODE_MT7961_1_2_hdr.bin
-rw-r--r-- 1 root root 134.4K Jul 8 19:13 WIFI_MT7922_patch_mcu_1_1_hdr.bin
-rw-r--r-- 1 root root 90.0K Jul 8 19:13 WIFI_MT7961_patch_mcu_1_2_hdr.bin
-rw-r--r-- 1 root root 994.8K Jul 8 19:13 WIFI_RAM_CODE_MT7922_1.bin
-rw-r--r-- 1 root root 778.6K Jul 8 19:13 WIFI_RAM_CODE_MT7961_1.bin
-rw-r--r-- 1 root root 141.2K Jul 8 19:13 mt7915_rom_patch.bin
-rw-r--r-- 1 root root 113.4K Jul 8 19:13 mt7915_wa.bin
-rw-r--r-- 1 root root 1.2M Jul 8 19:13 mt7915_wm.bin
-rw-r--r-- 1 root root 8.5K Jul 8 19:13 mt7916_rom_patch.bin
-rw-r--r-- 1 root root 496.0K Jul 8 19:13 mt7916_wa.bin
-rw-r--r-- 1 root root 1.6M Jul 8 19:13 mt7916_wm.bin
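In that dmesg output, error -2 is ENOENT (file not found) and -12 is ENOMEM. A file that exists now but failed to load at probe time suggests it wasn't available yet when the driver loaded. Below is a rough sketch that cross-checks each failed firmware load against the firmware directory. It assumes /lib/firmware is the search path. FW_DIR and the function names are placeholders of mine.

```sh
# Sketch: for each "Direct firmware load for <file> failed" line in dmesg,
# report whether <file> exists under FW_DIR (assumed /lib/firmware).
FW_DIR=${FW_DIR:-/lib/firmware}

# Read dmesg text on stdin; print the requested firmware paths, deduplicated.
failed_firmware() {
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed.*/\1/p' | sort -u
}

# Print "present <path>" or "MISSING <path>" for each failed load.
check_failed_firmware() {
    failed_firmware | while read -r fw; do
        if [ -f "$FW_DIR/$fw" ]; then
            echo "present $fw"
        else
            echo "MISSING $fw"
        fi
    done
}
```

Usage: `dmesg | check_failed_firmware`.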
lspci gives the following output:
04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter [14c3:7915]
Makes files unidentifiable.
ls only lists files and directories; I don't understand what you mean by "it's making the files unidentifiable".
In any case, I hadn't rebooted, which is why the system couldn't find the card.
The card is now identified, but the old error persists. Whenever I try to enable the wireless networks, LuCI becomes unresponsive, and the only way to control the system is via SSH. dmesg still shows the same old message timeout error.
This is even though I changed the pcie_aspm policy to performance.
[ 33.857603] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout
[ 192.001299] mt7915e 0000:04:00.0: Message 000007ed (seq 7) timeout
[ 212.454709] mt7915e 0000:04:00.0: Message 00005aed (seq 8) timeout
[ 232.918618] mt7915e 0000:04:00.0: Message 00005aed (seq 9) timeout
[ 253.388109] mt7915e 0000:04:00.0: Message 00005aed (seq 10) timeout
[ 273.858413] mt7915e 0000:04:00.0: Message 00005aed (seq 11) timeout
[ 294.329483] mt7915e 0000:04:00.0: Message 00005aed (seq 12) timeout
[ 314.801253] mt7915e 0000:04:00.0: Message 00005aed (seq 13) timeout
[ 335.273629] mt7915e 0000:04:00.0: Message 00005aed (seq 14) timeout
[ 355.746603] mt7915e 0000:04:00.0: Message 00005aed (seq 15) timeout
[ 376.220100] mt7915e 0000:04:00.0: Message 00005aed (seq 1) timeout
[ 396.694100] mt7915e 0000:04:00.0: Message 00005aed (seq 2) timeout
[ 417.168563] mt7915e 0000:04:00.0: Message 00005aed (seq 3) timeout
[ 437.643455] mt7915e 0000:04:00.0: Message 00005aed (seq 4) timeout
[ 458.118744] mt7915e 0000:04:00.0: Message 00005aed (seq 5) timeout
[ 478.594399] mt7915e 0000:04:00.0: Message 00005aed (seq 6) timeout
[ 499.070399] mt7915e 0000:04:00.0: Message 00005aed (seq 7) timeout
[ 519.546713] mt7915e 0000:04:00.0: Message 00005aed (seq 8) timeout
[ 540.023308] mt7915e 0000:04:00.0: Message 00005aed (seq 9) timeout
[ 560.500153] mt7915e 0000:04:00.0: Message 00005aed (seq 10) timeout
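In the log above, message 00005aed times out roughly every 20.5 s, and the seq counter wraps from 15 back to 1. A small awk helper makes that cadence visible in any dmesg paste. It is only a sketch and assumes the default bracketed dmesg timestamp format; the function name is mine.

```sh
# Sketch: list mt7915e "Message ... (seq N) timeout" lines from dmesg on
# stdin as "<timestamp> <message id> +<gap since previous timeout>".
timeout_gaps() {
    awk '
    /Message [0-9a-f]+ \(seq [0-9]+\) timeout/ {
        # Timestamp looks like "[  192.001299]"; strip the brackets.
        ts = $0
        sub(/^\[ */, "", ts)
        sub(/].*/, "", ts)
        # The message ID is the field right after "Message".
        for (i = 1; i < NF; i++) if ($i == "Message") id = $(i + 1)
        if (n++ > 0) printf "%s %s +%.1fs\n", ts, id, ts - prev
        else         printf "%s %s\n", ts, id
        prev = ts
    }'
}
```

Usage: `dmesg | timeout_gaps`.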
Full output of lspci -vvv:
04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter (prog-if 80)
Subsystem: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 179
Region 0: Memory at 6008200000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 6008300000 (64-bit, prefetchable) [size=16K]
Region 4: Memory at 6008304000 (64-bit, prefetchable) [size=4K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <1us, L1 <4us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <8us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
Address: 00000000fee00998 Data: 0000
Masking: fffffffe Pending: 00000000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: mt7915e
I do believe this might be related to some obscure power setting. I have disabled all power saving options in UEFI. As per AsiaRF, the card shouldn't consume more than 10 W (https://asiarf.com/product/wi-fi-6-11ax-2x2-dbdc-1800mbps-mini-pcie-module-mt7915-aw7915-npd/).
I am starting to think that the slot I put this card in might not be providing enough amps. However, I am not sure how to test that theory; after all, the card works fine in OpenWrt 22. (EDIT: the amps are fine, which is why I am blurring this.)
EDIT:
This is getting ridiculously weird. I have restarted the device several times. I am not sure what I did, but the card appears to be working now. Looking through the logs, I see no message timeouts. I literally changed nothing. I am now afraid to restart it because it might break something, but oh well, let's try, I guess.
EDIT2: After a reboot, the problem reappears. The entire system hangs, and I wait 5+ minutes before I can even log in through SSH. LuCI is still unresponsive. I added option disabled '1' to the radios in /etc/config/wireless and rebooted again; only this fixes LuCI. After the second reboot, the message timeout error appears once during the boot process and doesn't reappear until I enable the radios. When it worked correctly, the message timeout error did not show up even during boot (the radios were off, but then I managed to switch them on).
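You can make the same change from the shell with uci instead of editing the file by hand. Below is a sketch that assumes the radios are wifi-device sections, as in a stock config. set_radios_disabled and wifi_devices are hypothetical helper names; uci set, uci commit and wifi reload are the standard OpenWrt commands.

```sh
# Sketch: set "disabled" on every wifi-device section (1 = off, 0 = on),
# the shell equivalent of adding option disabled '1' to each radio.

# Read `uci show wireless` on stdin; print wifi-device section names.
wifi_devices() {
    sed -n 's/^wireless\.\([^.=]*\)=wifi-device$/\1/p'
}

set_radios_disabled() {
    val=${1:-1}
    for r in $(uci show wireless | wifi_devices); do
        uci set "wireless.$r.disabled=$val"
    done
    uci commit wireless
    wifi reload   # apply without a reboot
}
```

Usage: `set_radios_disabled 1` before a test reboot, `set_radios_disabled 0` to bring the radios back.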
So to summarize:
If the error shows up at boot even though the radios are switched off, then switching the radios on makes the system inaccessible (unless an SSH connection has already been established): LuCI and any new SSH session break.
Last entries in dmesg:
[ 19.324678] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[ 34.492826] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout
If the error manifests, it also breaks the lspci -vvv output:
04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter (prog-if 80)
Subsystem: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
!!! Unknown header type 7f
Interrupt: pin ? routed to IRQ 179
Region 0: Memory at 6008200000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 6008300000 (64-bit, prefetchable) [size=16K]
Region 4: Memory at 6008304000 (64-bit, prefetchable) [size=4K]
Kernel driver in use: mt7915e
Specifically, the following line:
!!! Unknown header type 7f
makes me think it is a power issue, maybe some power saving setting somewhere.
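For what it's worth, lspci masks the header-type byte with 0x7f, so "Unknown header type 7f" is what you get when that byte reads as 0xff. A device that has dropped off the bus returns all ones for config reads. Below is a quick sketch to check that directly from sysfs. The default path is the card's address from this thread, and the function name is mine.

```sh
# Sketch: succeed if the first 4 config-space bytes (vendor ID + device ID)
# read back as ff ff ff ff, i.e. the device no longer answers config reads.
# A healthy MT7915E should read c3 14 15 79 (14c3:7915).
pci_config_all_ones() {
    cfg=${1:-/sys/bus/pci/devices/0000:04:00.0/config}
    bytes=$(od -An -tx1 -N4 "$cfg" | tr -d ' \n')
    [ "$bytes" = "ffffffff" ]
}
```

Usage: `pci_config_all_ones && echo "card has dropped off the bus"`.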
I'm looking through /sys/bus/pci/devices/0000:04:00.0 because I want to restart the card without powering off the whole device, but I'm not sure how to accomplish that.
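The usual way to do that without a reboot is the sysfs remove/rescan pair. Writing 1 to the device's remove attribute detaches it, and writing 1 to /sys/bus/pci/rescan enumerates it again and re-runs the driver probe. It does not cut power to the slot, so it is only a soft restart. Below is a sketch to run as root. SYSFS is a hypothetical override that exists only so the logic can be tried on a scratch directory.

```sh
# Sketch: soft-restart a PCI device by removing it and rescanning the bus.
SYSFS=${SYSFS:-/sys}

pci_remove_rescan() {
    bdf=${1:-0000:04:00.0}
    dev="$SYSFS/bus/pci/devices/$bdf"
    if [ ! -e "$dev/remove" ]; then
        echo "no such device: $bdf" >&2
        return 1
    fi
    echo 1 > "$dev/remove" || return 1   # driver unbinds, device disappears
    sleep 1
    echo 1 > "$SYSFS/bus/pci/rescan"     # re-enumerate; mt7915e probes again
}
```

Usage: `pci_remove_rescan 0000:04:00.0 && dmesg | tail`.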
In the meantime, I will power off the router, disconnect it from the power source, and discharge the capacitors. Then I will restart it later with the radios OFF at startup to see if the timeout errors appear at boot.
Cold boot did nothing; the error still shows up at boot even though the radios are switched off.
It's just so damn random.
I have added both parameters to grub.cfg:
pcie_aspm=off pcie_port_pm=off
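It's worth double-checking that the parameters actually reached the running kernel, since editing the wrong grub.cfg is an easy mistake. Below is a sketch that compares the requested parameters against /proc/cmdline. CMDLINE and check_cmdline are my own names.

```sh
# Sketch: report whether each given kernel parameter is on the running
# kernel's command line; return non-zero if any is missing.
CMDLINE=${CMDLINE:-/proc/cmdline}

check_cmdline() {
    rc=0
    for p in "$@"; do
        if tr ' ' '\n' < "$CMDLINE" | grep -qx -- "$p"; then
            echo "ok $p"
        else
            echo "missing $p"
            rc=1
        fi
    done
    return $rc
}
```

Usage: `check_cmdline pcie_aspm=off pcie_port_pm=off`.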
I have also disabled d3cold:
root@OpenWrt:/sys/bus/pci/devices/0000:04:00.0# echo 0 > d3cold_allowed
Still nothing.
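d3cold_allowed is per device, so the bridge above the card keeps its own setting. Below is a sketch that walks from the card up through its bridges using the resolved sysfs path and clears the flag on each. Whether the bridge setting matters for this card is an assumption, and the helper names are hypothetical.

```sh
# Sketch: clear d3cold_allowed on the card and on every bridge above it.
# Relies on /sys/bus/pci/devices/<bdf> being a symlink into the
# /sys/devices/pci<domain>:<bus>/... hierarchy.

# Given a resolved sysfs path, print the PCI addresses from the root port down.
pci_chain() {
    printf '%s\n' "$1" | tr '/' '\n' |
        grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
}

disable_d3cold_chain() {
    path=$(readlink -f "/sys/bus/pci/devices/${1:-0000:04:00.0}") || return 1
    for d in $(pci_chain "$path"); do
        f="/sys/bus/pci/devices/$d/d3cold_allowed"
        if [ -w "$f" ]; then
            echo 0 > "$f" && echo "d3cold disabled on $d"
        fi
    done
}
```

Usage: `disable_d3cold_chain 0000:04:00.0`.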
What weirds me out is how long it takes to show the error message even when the radios are off:
root@OpenWrt:~# dmesg | grep "7915"
[ 2.177574] pci 0000:04:00.0: [14c3:7915] type 00 class 0x000280
[ 8.293117] mt7915e 0000:04:00.0: enabling device (0000 -> 0002)
[ 8.428073] mt7915e 0000:04:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220929104113a
[ 8.444616] mt7915e 0000:04:00.0: WM Firmware Version: ____000000, Build Time: 20220929104145
[ 8.463340] mt7915e 0000:04:00.0: WA Firmware Version: DEV_000000, Build Time: 20220929104205
[ 8.615843] mt7915e 0000:04:00.0: registering led 'mt76-phy0'
[ 8.658005] mt7915e 0000:04:00.0: registering led 'mt76-phy1'
[ 34.492969] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout
It takes almost 30 seconds before it spits out the error. It is also the last message to appear during the boot process; the previous one appears at around the 19-second mark:
[ 19.274661] IPv6: ADDRCONF(NETDEV_CHANGE): br-lan: link becomes ready
[ 34.492969] mt7915e 0000:04:00.0: Message 00001eed (seq 6) timeout
I managed to boot properly with the radios working, so it does not matter whether the radios are enabled or disabled at boot.
It appears the behavior is totally random.
Once the system boots properly, the WiFi works absolutely flawlessly. It's just that, more often than not, something goes wrong during booting.
If it were a bad connection or another hardware problem, I would expect the card to fail at some point after boot, but this isn't the case. That is why I am sure the problem is software-based.
I did break the +3.3 V traces on the board and wired a mechanical switch there. I was just pissed off enough with the sheer number of cold reboots while testing.
As far as I can recall, the issue is not so much "at system boot" as "at initialization". When I booted with the card switched off and turned it on after the system had booted, the symptoms were the same as if I had booted with it on (all those lovely timeouts in the logs).
That means it is not a matter of the order in which the boot process loads things. Going by that logic, the problem is with the card firmware and/or driver, but then again, OpenWrt 22 works completely fine with this card. I did notice, however, that the 5 GHz speed on 23 (when it works) reaches 500 Mbps (it could be more, but 500 Mbps is my ISP's rated speed), while under OpenWrt 22 the card can't provide more than about half of that.
I agree that it looks like a pure software issue. Unfortunately, I am ops, not dev.
Fair enough, I was kind of mentally prepared that this rabbit hole would lead to programming at some point.
I'm not even sure how the Linux kernel works, and even less sure how OpenWrt's kmod packages interact with it, i.e. whether the issue is with the kernel, the kmod package (driver), or even the firmware. Hell, I am not even sure why the firmware for the MT7915E chip lives on the root filesystem; I would expect it to be on the card itself, but oh well.
I am eager to dig deeper into troubleshooting, just unsure where to look.
I wonder where I can find a shoulder to cry on (i.e. to bitch about this damn chip).
Looking through lspci again, I notice that ASPM doesn't appear to have been disabled for some devices:
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #21, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
even though I have pcie_aspm=off in grub.cfg.
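That matches the kernel's documentation for pcie_aspm=off. It says the kernel then leaves any ASPM configuration done by firmware unchanged, so a link the BIOS enabled stays enabled. Below is a sketch that lists every device whose LnkCtl reports ASPM as enabled, using lspci -vvv output. It assumes lspci's default header format (no -D), and the function name is mine.

```sh
# Sketch: read `lspci -vvv` on stdin and print "<device>: <ASPM state>"
# for every device whose LnkCtl line does not say "ASPM Disabled".
aspm_enabled_devices() {
    awk '
    # Device header lines start with a bus:dev.fn address, e.g. "00:1c.0 ".
    /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { dev = $1 }
    /LnkCtl:/ && /ASPM/ && !/ASPM Disabled/ {
        line = $0
        sub(/^.*LnkCtl:[ \t]*/, "", line)
        sub(/;.*/, "", line)
        print dev ": " line
    }'
}
```

Usage: `lspci -vvv | aspm_enabled_devices`.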
I tried booting a live system on the x86 router. Out of the 3 ISOs on my Ventoy USB stick, only Debian 12 booted properly; the kernel version is 6.1.0-18.
I couldn't boot a live environment for either Ubuntu or Manjaro; both would kernel panic. They boot just fine if I disconnect the WiFi adapter, which makes it definitely the culprit.
In Debian, however, I inspected dmesg and could see that one of the PCI bridges is throwing errors, namely the same PCI bridge the WiFi adapter is connected to.
I booted into OpenWrt with the GRUB option pci=nomsi and got a kernel panic.
It is clear that the PCI bridge is not doing what it should.
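To make the bridge errors easier to share, below is a sketch that filters dmesg for the bridge's address and for AER lines. The bridge address is an argument; lspci -tv shows which bridge the card sits behind. 0000:00:1c.0 in the usage line is only a placeholder.

```sh
# Sketch: filter dmesg text on stdin for lines that mention a PCI bridge
# (by address) or PCIe AER. The leading "0000:" domain is stripped so both
# the long and short address forms match.
bridge_errors() {
    grep -F -e "${1#0000:}" -e 'AER:'
}
```

Usage: `dmesg | bridge_errors 0000:00:1c.0`.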