Archer C7 : Atheros QCA9880 802.11nac random crashes since kernel 4.14.94

Hello,

I currently run a snapshot build of OpenWrt on the ath79 target on my Archer C7 v2, because I have 1 Gbit/s fiber and need flow offloading.

Performance is very good on the LAN and good on the WLAN.

I have updated every month. Since January and kernel 4.14.94, I have had random crashes of the Atheros QCA9880 802.11nac radio (radio0) with no message in dmesg or logread. The only way to restore service is to toggle the physical WLAN button or reboot the router.

When it has crashed, my devices still see the network but cannot associate with the router. If I try a wifi reload, the SSID disappears, and ip addr sh wlan0 shows the wlan0 interface still DOWN.

My current OpenWrt version is: OpenWrt 19.07-SNAPSHOT r10207-158a716215 / LuCI openwrt-19.07 branch (f138fc93)
The kernel is 4.14.125.
The device is a TP-Link Archer C7 v2.

Do you think it is a hardware problem or a kernel issue?

Thanks,

File : /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option channel '36'
	option hwmode '11a'
	option path 'pci0000:00/0000:00:00.0'
	option htmode 'VHT80'
	option country 'FR'
	option legacy_rates '0'

Which drivers and firmware? ath10k-ct, or no -ct?

You can increase the debug level (to painfully verbose), see

https://www.candelatech.com/ath10k-ug.php and, for definitions

I will try this. Which debug level should I set?

I use ath10k-firmware-qca988x-ct with kmod-ath10k-ct; it is the default configuration for this device in openwrt/target/linux/ath79/image/generic-tp-link.mk

In case of ath10k-ct, the best way to report an issue would be via https://github.com/greearb/ath10k-ct/issues

I see another case here: Ath79 builds with all kmod packages through opkg [flow offloading]

It seems to be the same case: QCA9880 crashes since 4.14.91.

Can I try switching to the non-ct firmware?

This can be done by echoing a value to the debugfs file /sys/kernel/debug/ieee80211/phy0/ath10k/debug_level

Or, as I recall, by adding it after the module name in, for example /etc/modules.d/ath10k-ct, such as

ath10k_pci debug_mask=0x3f

for some very verbose logging.

My git log suggests that I used /etc/modules.d/ath10k_core, but that was on a different target and my Archer C7v2 units run the "non-CT" drivers as I need 802.11s mesh.
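
To make the debugfs route concrete, here is a minimal sketch. The path and the 0x3f mask are the ones mentioned above; the DEBUG_FILE variable and the set_ath10k_debug helper are names made up for illustration, and phy0 is an assumption about which radio is the QCA9880.

```shell
# Sketch only: write an ath10k debug mask to the debugfs file named in this thread.
# DEBUG_FILE and set_ath10k_debug are illustrative names, not part of any package.
DEBUG_FILE="${DEBUG_FILE:-/sys/kernel/debug/ieee80211/phy0/ath10k/debug_level}"

set_ath10k_debug() {
    # $1: debug mask, e.g. 0x3f for very verbose logging
    mask="$1"
    if [ ! -w "$DEBUG_FILE" ]; then
        echo "cannot write $DEBUG_FILE (debugfs not mounted, or not phy0?)" >&2
        return 1
    fi
    echo "$mask" > "$DEBUG_FILE"
}

# Usage on the router:
#   set_ath10k_debug 0x3f
```

A value written this way does not survive a reboot, which is why the module-parameter route is the other option.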

My second Archer C7 runs regular OpenWrt 18.06.1 and has an uptime of 100 days without crashes.

It runs the non-ct version, but on the ar71xx target.
I don't know whether non-ct works with ath79; I will try.

OK, I wrote 0x3f to /sys/kernel/debug/ieee80211/phy0/ath10k/debug_level.

Now we wait for the crash (every 1-3 days), but it is very verbose. How can I clear the log to free memory?

It’s a ring buffer, so things just rotate through it.

If anything you might want to bump it up to 256 kB or so (64 kB is default, as I vaguely recall)
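
A minimal sketch of bumping it, assuming the ring buffer meant here is the one kept by logd (what logread shows); the kernel's own dmesg buffer is sized separately. The log_size option is given in KiB, and 256 is simply the value suggested above. The option goes into the existing system section of /etc/config/system:

```
config system
	option log_size '256'
```

The log service has to be restarted (or the router rebooted) before the new size takes effect.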

I got a log:

[172981.509845] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.517464] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.535824] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.638227] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.740633] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.836206] ath10k_pci 0000:00:00.0: wmi command 36954 timeout, restarting hardware
[172981.844183] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.945437] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.047852] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.150250] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.252696] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.506132] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.513743] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.559901] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.662243] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.764647] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.867054] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.969461] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.071846] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.174245] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.507129] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.514684] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.536457] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 877bec00 vdev: 0 addr: c0:e8:62:a1:75:c4 
[172983.547715] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 847db800 vdev: 0 addr: 74:8d:08:69:96:dd 
[172983.558935] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 8726ac00 vdev: 0 addr: b8:e8:56:36:3e:60 
[172983.570148] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 8685b800 vdev: 0 addr: c4:6e:1f:4f:d5:19 
[172984.202423] irq 15: nobody cared (try booting with the "irqpoll" option)
[172984.209323] CPU: 0 PID: 13080 Comm: kworker/u2:0 Not tainted 4.14.125 #0
[172984.216275] Workqueue: ath10k_wq ath10k_core_create [ath10k_core]
[172984.222538] Stack : ffffffff 00000003 80653ae0 800b2bd4 804ac8e4 00000000 00000001 86881b14
[172984.231107]         804a8a0c 87c07c9c 80500000 800b3a34 80653ae0 00000000 87c07c78 e5599a39
[172984.239680]         00000000 00000000 00000000 00010a08 00000000 00000000 00000008 00000000
[172984.248250]         00000cdd 80500000 00000cdc 20617468 00000000 805096c0 0000000f 00011000
[172984.256822]         ffffffff 00000003 80653ae0 80504ac8 00000008 8026c264 00000000 80650000
[172984.265393]         ...
[172984.267964] Call Trace:
[172984.270557] [<8006a9ec>] show_stack+0x58/0x100
[172984.275154] [<800b72fc>] __report_bad_irq.isra.0+0x54/0xf0
[172984.280809] [<800b7684>] note_interrupt+0x284/0x330
[172984.285860] [<800b4c18>] handle_irq_event_percpu+0x4c/0x64
[172984.291511] [<800b4c6c>] handle_irq_event+0x3c/0x70
[172984.296544] [<800b8058>] handle_level_irq+0x11c/0x160
[172984.301754] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.306975] [<802d71d0>] ar724x_pci_irq_handler+0xa4/0xdc
[172984.312542] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.317756] [<8021bcc4>] ath79_intc_irq_handler+0x94/0xec
[172984.323322] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.328543] [<804124ec>] do_IRQ+0x1c/0x2c
[172984.332702] [<8021bbd0>] plat_irq_dispatch+0xc0/0x120
[172984.337912] [<800658d8>] handle_int+0x138/0x144
[172984.342593] [<8021bbd0>] plat_irq_dispatch+0xc0/0x120
[172984.347801] handlers:
[172984.350203] [<875734a0>] ath10k_pci_irq_msi_fw_mask [ath10k_pci]
[172984.356377] Disabling IRQ #15
[172984.501866] ieee80211 phy0: Hardware restart was requested
[172985.896271] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16  peers: 127  tid: 256
[172986.006189] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[172986.012392] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424  sw-crypt: 0 ct-sta: 0'
[172986.020623] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24648 iram: 26168'
[172987.136284] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
[172987.623976] ath10k_pci 0000:00:00.0: set-coverage-class, phyclk: 88  value: 0
[172991.366545] ath10k_pci 0000:00:00.0: device successfully recovered
[172991.373236] ath10k_pci 0000:00:00.0: Invalid state: 3 in ath10k_htt_tx_32, warning will not be repeated.
[172991.373240] ------------[ cut here ]------------
[172991.373318] WARNING: CPU: 0 PID: 32608 at /home/openwrt/openwrt/build_dir/target-mips_24kc_musl/linux-ath79_generic/ath10k-ct-2019-05-08-f98b6dc4/ath10k-4.19/htt_tx.c:1182 ath10k_htt_tx_alloc_msdu_id+0x160/0x1110 [ath10k_core]
[172991.373320] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[172991.373480] CPU: 0 PID: 32608 Comm: kworker/0:0 Not tainted 4.14.125 #0
[172991.373607] Workqueue: events_freezable ieee80211_alloc_hw_nm [mac80211]
[172991.373612] Stack : 00000009 0000049e 87d76468 800b2bd4 804ac8e4 00000000 00000001 86821f14
[172991.373633]         804a8a0c 87c07c8c 80500000 800b3a34 87d76468 00000000 87c07c68 15691cb3
[172991.373653]         00000000 00000000 00000000 00000000 00000004 8040d32c 00000001 68775f6e
[172991.373671]         735f6672 80500000 00000d4a 65657a61 00000000 00000000 877553dc 87723f9c
[172991.373690]         00000009 0000049e 87d76468 86900bf0 00000000 00000000 00000000 80650000
[172991.373709]         ...
[172991.373714] Call Trace:
[172991.373736] [<8006a9ec>] show_stack+0x58/0x100
[172991.373760] [<80085060>] __warn+0xe4/0x118
[172991.373771] [<80085124>] warn_slowpath_null+0x1c/0x28
[172991.373831] [<87723f9c>] ath10k_htt_tx_alloc_msdu_id+0x160/0x1110 [ath10k_core]
[172991.373874] ---[ end trace b25120ecf4bc689d ]---
[172991.373887] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373900] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373907] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.373931] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373942] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373949] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.373965] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373976] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373983] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.479573] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.479587] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.479595] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.486975] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.486985] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.486993] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.520361] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.520372] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.520380] ath10k_pci 0000:00:00.0: failed to push frame: -19
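
Since the crash only shows up every few days, it may help to check for it automatically. Below is a rough sketch that counts the two signatures visible in the log above (the wmi command timeout and the repeated "failed to push frame: -19"). The function name is made up, and treating these two lines as the crash signature is an assumption based on this single log.

```shell
# Sketch only: count lines on stdin that match the crash signatures from the log above.
# ath10k_crash_count is an illustrative name.
ath10k_crash_count() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status clean.
    grep -c -E 'wmi command [0-9]+ timeout, restarting hardware|failed to push frame: -19' || true
}

# Usage on the router:
#   dmesg | ath10k_crash_count
```

A non-zero count from a cron job would at least give the time of the crash without having to notice it from a client.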

I found two similar reports:

Ath10k problems on TP-Link Archer C7 on Snapshot and https://bugs.openwrt.org/index.php?do=details&task_id=2220

Same issue as mine.

As in my case, it is stable on 18.06.2 but not on newer snapshots with the 4.14 kernel, and I really need 4.14 for offloading.

This looks to be a firmware/driver issue to me. See

I'd normally recommend trying a snapshot, but today is the first time I can remember there being "bootability" issues on master in a couple years. Holding off until the builds from Friday's commits are built (which should also include the Linux CVE patches) would be my recommendation. They'll probably be available some time on Saturday, as the buildbots will be unusually busy rebuilding three or four different branches.

In parallel, during this investigation, I tested on the second Archer C7 a 4.14 ath79 build with git HEAD at January 07. ath10k does not crash there, so there is a regression between 07-01-2019 and 22-01-2019, the date of the first build with this issue.

I hope this helps you find it.

Were there modifications to the firmware in that window?
Or a kernel change that is incompatible with the firmware?

There is no commit in the ath10k-ct repo in the period from 7/1/2019 to 22/1/2019, except one, and it is for 4.19 and 4.20.
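
To narrow the window on the OpenWrt side as well, something like the following sketch could list the candidate commits. The dates are the ones from this post; the helper name is made up, and which directories are worth filtering on is an assumption.

```shell
# Sketch only: list commits in a git checkout that landed inside the suspected window.
# commits_in_window is an illustrative name; the dates come from the post above.
commits_in_window() {
    # $1: path to the git checkout; remaining args: optional path filters
    repo="$1"
    shift
    git -C "$repo" log --oneline \
        --since='2019-01-07 00:00' --until='2019-01-22 23:59' -- "$@"
}

# Usage (the checkout location and the directories are guesses at where to look):
#   commits_in_window ~/openwrt package/kernel/ath10k-ct package/kernel/mac80211 \
#       target/linux/generic target/linux/ath79
```

If the list is short, building the commit in the middle and testing it for a few days is a manual bisect, although with a crash every 1-3 days each step is slow.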

Yeah, I didn't see anything obvious on master either.

Hello @jeff, is it stable with non-ct on your Archer C7?

If so, I will rebuild without ct.

Data point: C7 v2 running ath79 / 4.19.x / -ct variant, and not seeing this issue. Is this occurring only with a particular client?

You can "uninstall" the ath10k-ct drivers/firmware and install the others as a package to try it there without a rebuild.
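
A rough sketch of that swap, assuming the non-ct packages are kmod-ath10k and ath10k-firmware-qca988x and that the package feed matches the running kernel (which may not hold for a self-built image). The run and swap_to_non_ct helpers are made-up names; by default the commands are only printed, and APPLY=1 actually runs them.

```shell
# Sketch only: swap the -ct driver/firmware for the non-ct packages via opkg.
# run and swap_to_non_ct are illustrative names.
run() {
    # Print the command unless APPLY=1 is set, in which case execute it.
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

swap_to_non_ct() {
    run opkg update &&
    run opkg remove kmod-ath10k-ct ath10k-firmware-qca988x-ct &&
    run opkg install kmod-ath10k ath10k-firmware-qca988x
}

# Usage on the router, over a wired connection:
#   swap_to_non_ct            # dry run, prints the three opkg commands
#   APPLY=1 swap_to_non_ct    # really do it, then reboot
```

The 5 GHz radio is down between the remove and the install, so this is best done from the LAN side.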

The snapshot is on 4.14; maybe the bug is fixed in 4.19.

The bug was introduced with 4.14.94, and if it disappears with 4.19 that is good news.

@jeff, drivers and kmod ?

master moved to 4.19 a few days back, but I doubt that was the fix.