[solved] Crash and reboot. 24.10.0 bthh5. May be DSL not used but present or ath10k-pci related

Background: a bthh5 running a relay bridge between the 5G and 2G networks. I've been using this setup on builds since 19.xx. On the 22.xx and 23.05 builds the router crashed in the same way.
Crashes occur anywhere between 1 hour and several days apart. This has been happening for years. I have tried different hardware (both hh5 units and power bricks). Today I updated to 24.10.0 and a crash happened within about an hour. Nothing in the logs that I can spot (I've got them forwarded to an rsyslog server). After the reboot (green light, green flashing, blue) it is OK until the next time.
I can post more of the configuration. I am trying to set up my own build environment as I post.

Other Information
The LAN ports are connected to 4 machines. DSL is not connected. IPv6 is not being used. 192.168.2.0/24 is the network and address for router control only. The main network is 192.168.115.0/24 with DHCP assigned from the house router.

root@OpenWrt:/etc# cat openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='24.10.0'
DISTRIB_REVISION='r28427-6df0e3d02a'
DISTRIB_TARGET='lantiq/xrx200'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 24.10.0 r28427-6df0e3d02a'
DISTRIB_TAINTS=''
root@OpenWrt:/etc# cat openwrt_version 
r28427-6df0e3d02a
root@OpenWrt:/etc# cat config/network 

config interface 'loopback'
	option device 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'

config globals 'globals'
	option ula_prefix 'fd59:d4de:6024::/48'
	option packet_steering '1'

config atm-bridge 'atm'
	option vpi '1'
	option vci '32'
	option encaps 'llc'
	option payload 'bridged'
	option nameprefix 'dsl'

config dsl 'dsl'
	option annex 'a'
	option tone 'av'
	option ds_snr_offset '0'

config device
	option name 'br-lan'
	option type 'bridge'
	list ports 'lan1'
	list ports 'lan2'
	list ports 'lan3'
	list ports 'lan4'
	option ipv6 '0'

config device
	option name 'lan1'
	option macaddr '00:37:b7:22:80:26'

config device
	option name 'lan2'
	option macaddr '00:37:b7:22:80:26'

config device
	option name 'lan3'
	option macaddr '00:37:b7:22:80:26'

config device
	option name 'lan4'
	option macaddr '00:37:b7:22:80:26'

config interface 'lan'
	option device 'br-lan'
	option proto 'static'
	option ipaddr '192.168.2.4'
	option netmask '255.255.255.0'
	option ip6assign '60'

config device
	option name 'dsl0'
	option macaddr '00:37:b7:22:80:27'

config interface 'wwan'
	option proto 'dhcp'

config interface 'Relaybridge'
	option proto 'relay'
	option ipaddr '192.168.1.2'
	list network 'lan'
	list network 'wwan'
	option expiry '30'
	option retry '5'

Found in forwarded logs, seems to be associated with crash, from the kernel up times:

2025-02-24T09:55:58+00:00 OpenWrt kernel: [59212.223855] ath10k_pci 0000:02:00.0: Cannot communicate with firmware, previous wmi cmds: 36954:5890997 36954:5890979 36954:5890960 36954:5890942, jiffies: 5891304, attempting restart restart firmware, dev-flags: 0 x142
2025-02-24T09:55:58+00:00 OpenWrt kernel: [59212.242278] ath10k_pci 0000:02:00.0: failed to send wmi nop: -143
2025-02-24T09:55:58+00:00 OpenWrt kernel: [59212.248256] ath10k_pci 0000:02:00.0: could not request stats (type -268435456 ret -143 specifier 1)
2025-02-23T16:22:04+00:00 OpenWrt kernel: [   65.340974] ath10k_pci 0000:02:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
2025-02-23T16:22:04+00:00 OpenWrt kernel: [   65.356335] ath10k_pci 0000:02:00.0: wmi print 'P 128 V 8 T 410'
2025-02-23T16:22:04+00:00 OpenWrt kernel: [   65.360991] ath10k_pci 0000:02:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
2025-02-23T16:22:04+00:00 OpenWrt kernel: [   65.369691] ath10k_pci 0000:02:00.0: wmi print 'alloc rem: 24984 iram: 38672'
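
For anyone digging through forwarded logs the same way, here is a rough sketch of pulling out the kernel uptime at each ath10k firmware failure. It only assumes the line format shown above (`kernel: [seconds.micros] ath10k_pci ... Cannot communicate with firmware`); the function name and the log path in the usage comment are made up for illustration.

```shell
# Hypothetical helper: read syslog text on stdin and print the kernel uptime
# (hours and minutes) of every "Cannot communicate with firmware" line.
ath10k_crash_uptimes() {
    grep 'ath10k_pci.*Cannot communicate with firmware' |
    sed -n 's/.*kernel: \[ *\([0-9]*\)\.[0-9]*].*/\1/p' |
    while read -r secs; do
        # Whole seconds since boot, converted to NNhMMm
        printf '%dh%02dm\n' $((secs / 3600)) $((secs % 3600 / 60))
    done
}

# Usage on the rsyslog host (path is an assumption):
#   ath10k_crash_uptimes < /var/log/openwrt.log
```

Fed the first log line above (kernel time 59212 s), this prints 16h26m, which makes it easy to compare the time-to-failure across several reboots.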

Crash after about 30 hours. Got some ath10k_pci debug just before the crash, but still don't know where to go from here.

My bthh5 (& pnho) units have been stable on OpenWrt, being configured as either modem-router on ADSL, then as a router on FTTP; and as a dumb AP.
You state that the DSL port is not used, yet you have settings configured for it.
The installation guide suggested that this could cause instability.
On my units I have deleted all settings for the DSL port, and disabled the "dsl_control" service in the "system/startup" tab of LuCI.
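
A rough sketch of what that could look like from the shell rather than LuCI, based on the network config posted above. Assumptions: the section names `atm` and `dsl`, and the `dsl0` device being the sixth `config device` section (index 5), are taken from that posted config and may differ on other units. The `run` helper and the `DRY_RUN` variable are hypothetical, added so nothing changes until the plan has been checked.

```shell
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remove_unused_dsl() {
    # Drop the DSL-related sections from /etc/config/network
    run uci delete network.atm
    run uci delete network.dsl
    # Assumption: dsl0 is the sixth anonymous device section in the posted config;
    # confirm with: uci show network | grep dsl0
    run uci delete 'network.@device[5]'
    run uci commit network
    # Stop the DSL control service now and keep it from starting at boot
    run /etc/init.d/dsl_control stop
    run /etc/init.d/dsl_control disable
    run /etc/init.d/network restart
}
```

Calling `remove_unused_dsl` prints the plan; `DRY_RUN=0 remove_unused_dsl` applies it. Taking a backup of /etc/config/network first is a sensible precaution.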

opkg list-installed | grep ath10k

uptime today is about 18 hours.

root@OpenWrt:~# uptime
 11:37:38 up 17:40,  load average: 0.04, 0.01, 0.00
root@OpenWrt:~# opkg list-installed | grep ath10
ath10k-board-qca988x - 20241110-r1
ath10k-firmware-qca988x-ct - 2020.11.08-r1
kmod-ath10k-ct - 6.6.73.2024.07.30~ac71b14d-r2

@alistair Will try this. Both device settings and stopping the service.

Replace the two -ct packages with their equivalents without -ct.

This says you are using non-ct firmware, so the package list does not really reflect what is crashing.

2025-02-23T16:22:04+00:00 OpenWrt kernel: [   65.340974] ath10k_pci 0000:02:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
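
A small sketch for working out which packages the -ct to non-ct swap would touch. It assumes the `opkg list-installed` output format shown earlier (`name - version`) and that each mainline equivalent has the same name minus the `-ct` suffix, which matches the package names in this thread but is not guaranteed for every package. The function name is hypothetical, and it only prints names rather than changing anything.

```shell
# Hypothetical helper: read `opkg list-installed` output on stdin and print
# "<ct-package> <mainline-package>" pairs for installed ath10k -ct packages.
ct_to_mainline() {
    sed -n 's/^\([^ ]*ath10k[^ ]*\)-ct - .*$/\1-ct \1/p'
}

# Usage on the router (prints suggested commands, does not run them):
#   opkg list-installed | ct_to_mainline | while read -r old new; do
#       echo "opkg remove $old; opkg install $new"
#   done
```

On apk-based builds the package listing has a different format, so the pattern would need adjusting there.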

Hi,
I'm running two of them. After removing the DSL port and service:
One is up for 4d 21h
The other was up for three and a half days before I accidentally rebooted it.
So, the setting looks promising. I'll keep monitoring.

It is rather puzzling how you ended up with the -ct packages installed but non-ct firmware loaded by the kernel. It seems reinstalling the packages shook the wasps' nest enough to get it working.

I have two running as I said

I've now changed the firmware and driver on the v24.10.0 version, building an image. The 5G radio seems to be working OK.

root@OpenWrt:~# opkg list-installed | grep ath10
ath10k-board-qca988x - 20241110-r1
ath10k-firmware-qca988x - 20241110-r1
kmod-ath10k - 6.6.73.6.12.6-r1
root@OpenWrt:~# cat /etc/openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='24.10.0'
DISTRIB_REVISION='r28427-6df0e3d02a'
DISTRIB_TARGET='lantiq/xrx200'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 24.10.0 r28427-6df0e3d02a'
DISTRIB_TAINTS='no-all busybox'

The other is built from git latest release, and uses ath10k-ct.
This one uses apk

root@OpenWrt:~# apk list | grep ath10 | grep installed
ath10k-board-qca988x-20241110-r1 mips_24kc {feeds/base/firmware/linux-firmware} () [installed]
ath10k-firmware-qca988x-ct-2020.11.08-r1 mips_24kc {feeds/base/firmware/ath10k-ct-firmware} () [installed]
kmod-ath10k-ct-6.6.79.2024.07.30~ac71b14d-r2 mips_24kc {feeds/base/kernel/ath10k-ct} (GPLv2) [installed]
root@OpenWrt:~#

I will continue to monitor uptime.

Both still stable. One 6d21h the other 40h. Looking very hopeful.

Run dmesg | grep ath on the stable one; "ct" should be visible in the firmware version.
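
As a sketch of that check, the function below reads dmesg text and reports whether the loaded ath10k firmware looks like a CT build. It assumes the `firmware ver ...` line format that ath10k_pci prints at boot and treats a `-ct-` substring in the version string as the CT marker, which is an assumption based on the hint above rather than a documented rule. The function name is hypothetical.

```shell
# Hypothetical helper: read dmesg output on stdin and print
# "ct <version>", "mainline <version>", or "unknown".
ath10k_fw_flavour() {
    # Take the version token that follows "firmware ver" on the first matching line
    ver=$(sed -n 's/.*ath10k_pci.*firmware ver \([^ ]*\).*/\1/p' | head -n 1)
    case "$ver" in
        '')     echo "unknown" ;;
        *-ct-*) echo "ct $ver" ;;
        *)      echo "mainline $ver" ;;
    esac
}

# Usage on the router:
#   dmesg | ath10k_fw_flavour
```

Comparing its answer with the installed driver package (kmod-ath10k-ct or kmod-ath10k) shows whether the driver and the loaded firmware are the same flavour.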

I think I have one with ct firmware and one without.

root@OpenWrt:~# dmesg | grep firmware
[   19.649871] ath10k 6.10 driver, optimized for CT firmware, probing pci device: 0x3c.
[   25.882241] ath10k_pci 0000:02:00.0: firmware ver 10.1-ct-8x-__fW-022-ecad3248 api 2 features wmi-10.x,has-wmi-mgmt-tx,mfp,txstatus-noack,wmi-10.x-CT,ratemask-CT,txrate-CT,get-temp-CT,tx-rc-CT,cust-stats-CT,retry-gt2-CT,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT crc32 3e4cf97f
[   58.317582] ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware
[   61.789419] ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware
root@OpenWrt:~# uptime
 17:32:06 up 7 days,  5:20,  load average: 0.00, 0.00, 0.00

The other one:

root@OpenWrt:~# dmesg | grep firmware
[   21.796745] ath10k_pci 0000:02:00.0: firmware ver 10.2.4-1.0-00047 api 5 features no-p2p,raw-mode,mfp,allows-mesh-bcast crc32 35bd9258
[   61.109470] ath10k_pci 0000:02:00.0: pdev param 0 not supported by firmware
root@OpenWrt:~# uptime
 17:31:23 up 1 day, 23:53,  load average: 0.01, 0.01, 0.00

That's very strange; the factory image contains both -ct packages and no mainline bits.
The basic limitation is that the driver and the firmware file actually loaded should match. With a mismatch, some basic functions will work, but likely not for long or for many clients.

Marking as solved. One stayed up for 8d 21h (then I pulled the power by accident) and the other has been up for 3d 18h. I'll mark the DSL interface removal and service disabling as the fix, since the two units are using different ath10k firmware and drivers and both are now stable.
