jeff
June 18, 2019, 9:06pm
6
djgreg13:
how debug level
This can be done by echoing a value to the debugfs file /sys/kernel/debug/ieee80211/phy0/ath10k/debug_level
Or, as I recall, by adding it after the module name in, for example, /etc/modules.d/ath10k-ct, such as
ath10k_pci debug_mask=0x3f
for some very verbose logging.
My git log suggests that I used /etc/modules.d/ath10k_core, but that was on a different target, and my Archer C7v2 units run the "non-CT" drivers as I need 802.11s mesh.
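For reference, a minimal sketch of the debugfs route described above. The helper name and its optional second argument are illustrative only; the default path is the one quoted in this thread, and the hex-only check is a choice of this sketch, not something the driver is known to require.

```shell
#!/bin/sh
# Sketch: write an ath10k debug mask to the debugfs file named in this thread.
# ASSUMPTION: set_ath10k_debug and its optional file argument are hypothetical
# helpers for illustration; only the default path comes from the post above.

set_ath10k_debug() {
    mask="$1"
    file="${2:-/sys/kernel/debug/ieee80211/phy0/ath10k/debug_level}"

    # Accept only hex literals such as 0x3f (a restriction of this sketch).
    case "$mask" in
        0x|0x*[!0-9a-fA-F]*) echo "invalid mask: $mask" >&2; return 1 ;;
        0x*) ;;
        *) echo "invalid mask: $mask" >&2; return 1 ;;
    esac

    # The file only exists when debugfs is mounted and the driver exposes it.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "$mask" > "$file"
}
```

Calling `set_ath10k_debug 0x3f` on the router would write the same value as the modules.d example. A value written this way lasts only until the driver is reloaded or the device reboots, which is where the /etc/modules.d entry is useful.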
The second Archer C7 runs regular OpenWrt 18.06.1 and has a huge uptime of 100 days without crashes.
It runs the non-CT version, but on the ar71xx target.
I don't know if non-CT works with ath79; I will try.
OK, I wrote 0x3f to /sys/kernel/debug/ieee80211/phy0/ath10k/debug_level.
Now we wait for the crash (it happens every 1-3 days), but the logging is very verbose. How can I clear the log to free memory?
jeff
June 19, 2019, 1:32pm
9
It's a ring buffer, so things just rotate through it.
If anything you might want to bump it up to 256 kB or so (64 kB is default, as I vaguely recall)
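A minimal sketch of raising the buffer size, assuming the buffer meant here is the one kept by logd and sized by the `log_size` option (in KiB) in /etc/config/system. The helper name and the `UCI` override variable are illustrative only.

```shell
#!/bin/sh
# Sketch: enlarge the system log ring buffer through UCI.
# ASSUMPTIONS: the buffer in question is logd's, controlled by the log_size
# option (KiB) of the first "system" section; set_log_size_kib and the UCI
# variable are hypothetical names used for illustration.

UCI="${UCI:-uci}"

set_log_size_kib() {
    size="$1"

    # Only accept a plain whole number of KiB.
    case "$size" in
        ''|*[!0-9]*) echo "size must be a whole number of KiB" >&2; return 1 ;;
    esac

    "$UCI" set "system.@system[0].log_size=$size" &&
    "$UCI" commit system
}
```

On the router this would be used as `set_log_size_kib 256 && /etc/init.d/log restart`. Because the buffer is a fixed-size ring, a larger value trades a little RAM for more history rather than growing without bound.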
Here is the log I captured:
[172981.509845] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.517464] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.535824] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.638227] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.740633] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.836206] ath10k_pci 0000:00:00.0: wmi command 36954 timeout, restarting hardware
[172981.844183] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172981.945437] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.047852] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.150250] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.252696] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.506132] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.513743] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.559901] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.662243] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.764647] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.867054] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172982.969461] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.071846] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.174245] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.507129] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.514684] ath10k_pci 0000:00:00.0: SWBA overrun on vdev 0, skipped old beacon
[172983.536457] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 877bec00 vdev: 0 addr: c0:e8:62:a1:75:c4
[172983.547715] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 847db800 vdev: 0 addr: 74:8d:08:69:96:dd
[172983.558935] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 8726ac00 vdev: 0 addr: b8:e8:56:36:3e:60
[172983.570148] ath10k_pci 0000:00:00.0: removing peer, cleanup-all, deleting: peer 8685b800 vdev: 0 addr: c4:6e:1f:4f:d5:19
[172984.202423] irq 15: nobody cared (try booting with the "irqpoll" option)
[172984.209323] CPU: 0 PID: 13080 Comm: kworker/u2:0 Not tainted 4.14.125 #0
[172984.216275] Workqueue: ath10k_wq ath10k_core_create [ath10k_core]
[172984.222538] Stack : ffffffff 00000003 80653ae0 800b2bd4 804ac8e4 00000000 00000001 86881b14
[172984.231107] 804a8a0c 87c07c9c 80500000 800b3a34 80653ae0 00000000 87c07c78 e5599a39
[172984.239680] 00000000 00000000 00000000 00010a08 00000000 00000000 00000008 00000000
[172984.248250] 00000cdd 80500000 00000cdc 20617468 00000000 805096c0 0000000f 00011000
[172984.256822] ffffffff 00000003 80653ae0 80504ac8 00000008 8026c264 00000000 80650000
[172984.265393] ...
[172984.267964] Call Trace:
[172984.270557] [<8006a9ec>] show_stack+0x58/0x100
[172984.275154] [<800b72fc>] __report_bad_irq.isra.0+0x54/0xf0
[172984.280809] [<800b7684>] note_interrupt+0x284/0x330
[172984.285860] [<800b4c18>] handle_irq_event_percpu+0x4c/0x64
[172984.291511] [<800b4c6c>] handle_irq_event+0x3c/0x70
[172984.296544] [<800b8058>] handle_level_irq+0x11c/0x160
[172984.301754] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.306975] [<802d71d0>] ar724x_pci_irq_handler+0xa4/0xdc
[172984.312542] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.317756] [<8021bcc4>] ath79_intc_irq_handler+0x94/0xec
[172984.323322] [<800b40f0>] generic_handle_irq+0x38/0x50
[172984.328543] [<804124ec>] do_IRQ+0x1c/0x2c
[172984.332702] [<8021bbd0>] plat_irq_dispatch+0xc0/0x120
[172984.337912] [<800658d8>] handle_int+0x138/0x144
[172984.342593] [<8021bbd0>] plat_irq_dispatch+0xc0/0x120
[172984.347801] handlers:
[172984.350203] [<875734a0>] ath10k_pci_irq_msi_fw_mask [ath10k_pci]
[172984.356377] Disabling IRQ #15
[172984.501866] ieee80211 phy0: Hardware restart was requested
[172985.896271] ath10k_pci 0000:00:00.0: 10.1 wmi init: vdevs: 16 peers: 127 tid: 256
[172986.006189] ath10k_pci 0000:00:00.0: wmi print 'P 128 V 8 T 410'
[172986.012392] ath10k_pci 0000:00:00.0: wmi print 'msdu-desc: 1424 sw-crypt: 0 ct-sta: 0'
[172986.020623] ath10k_pci 0000:00:00.0: wmi print 'alloc rem: 24648 iram: 26168'
[172987.136284] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
[172987.623976] ath10k_pci 0000:00:00.0: set-coverage-class, phyclk: 88 value: 0
[172991.366545] ath10k_pci 0000:00:00.0: device successfully recovered
[172991.373236] ath10k_pci 0000:00:00.0: Invalid state: 3 in ath10k_htt_tx_32, warning will not be repeated.
[172991.373240] ------------[ cut here ]------------
[172991.373318] WARNING: CPU: 0 PID: 32608 at /home/openwrt/openwrt/build_dir/target-mips_24kc_musl/linux-ath79_generic/ath10k-ct-2019-05-08-f98b6dc4/ath10k-4.19/htt_tx.c:1182 ath10k_htt_tx_alloc_msdu_id+0x160/0x1110 [ath10k_core]
[172991.373320] Modules linked in: ath9k ath9k_common pppoe ppp_async ath9k_hw ath10k_pci ath10k_core ath pppox ppp_generic nf_conntrack_ipv6 mac80211 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt compat ledtrig_usbport nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
[172991.373480] CPU: 0 PID: 32608 Comm: kworker/0:0 Not tainted 4.14.125 #0
[172991.373607] Workqueue: events_freezable ieee80211_alloc_hw_nm [mac80211]
[172991.373612] Stack : 00000009 0000049e 87d76468 800b2bd4 804ac8e4 00000000 00000001 86821f14
[172991.373633] 804a8a0c 87c07c8c 80500000 800b3a34 87d76468 00000000 87c07c68 15691cb3
[172991.373653] 00000000 00000000 00000000 00000000 00000004 8040d32c 00000001 68775f6e
[172991.373671] 735f6672 80500000 00000d4a 65657a61 00000000 00000000 877553dc 87723f9c
[172991.373690] 00000009 0000049e 87d76468 86900bf0 00000000 00000000 00000000 80650000
[172991.373709] ...
[172991.373714] Call Trace:
[172991.373736] [<8006a9ec>] show_stack+0x58/0x100
[172991.373760] [<80085060>] __warn+0xe4/0x118
[172991.373771] [<80085124>] warn_slowpath_null+0x1c/0x28
[172991.373831] [<87723f9c>] ath10k_htt_tx_alloc_msdu_id+0x160/0x1110 [ath10k_core]
[172991.373874] ---[ end trace b25120ecf4bc689d ]---
[172991.373887] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373900] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373907] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.373931] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373942] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373949] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.373965] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.373976] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.373983] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.479573] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.479587] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.479595] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.486975] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.486985] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.486993] ath10k_pci 0000:00:00.0: failed to push frame: -19
[172991.520361] ath10k_pci 0000:00:00.0: failed to transmit packet, dropping: -19
[172991.520372] ath10k_pci 0000:00:00.0: failed to submit frame: -19
[172991.520380] ath10k_pci 0000:00:00.0: failed to push frame: -19
I found two similar reports:
"Ath10k problems on TP-Link Archer C7 on Snapshot" and https://bugs.openwrt.org/index.php?do=details&task_id=2220
They describe the same issue as mine.
As in my case, it is stable on 18.06.2 but not on newer snapshots with the 4.14 kernel, and I really need 4.14 for offloading.
jeff
June 20, 2019, 7:55pm
12
This looks to be a firmware/driver issue to me. See
I'd normally recommend trying a snapshot, but today is the first time I can remember there being "bootability" issues on master in a couple of years. Holding off until the builds from Friday's commits are built (which should also include the Linux CVE patches) would be my recommendation. They'll probably be available some time on Saturday, as the buildbots will be unusually busy rebuilding three or four different branches.
In parallel, during this investigation, I tested on the second Archer C7 a 4.14 ath79 build with git HEAD at January 7, 2019, and ath10k did not crash. So there is a regression between January 7 and January 22, 2019, the date of the first build with this issue.
I hope this can help you find it.
Were there firmware changes in that period?
Or a kernel change that is incompatible with the firmware?
There is only one commit in the ath10k-ct repo between 7/1/2019 and 22/1/2019, and it is for 4.19 and 4.20.
jeff
June 20, 2019, 8:52pm
15
Yeah, I didn't see anything obvious on master either.
Hello @jeff, is it stable with non-CT on your Archer C7?
If yes, I will rebuild without CT.
Data point: C7V2 running ath79 / 4.19.x / -ct variant and not seeing this issue. Is this occurring only for a particular client?
jeff
June 21, 2019, 6:32pm
18
You can "uninstall" the ath10k-ct drivers/firmware and install the others as a package to try it there without a rebuild.
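A minimal sketch of that package swap with opkg. The package names are assumptions for an Archer C7 v2 (QCA988x radio) and should be checked against `opkg list-installed | grep ath10k` first; the helper name and the `OPKG` override variable are illustrative only.

```shell
#!/bin/sh
# Sketch: replace the -ct driver and firmware packages with the non-CT ones.
# ASSUMPTIONS: the four package names below are guesses for an Archer C7 v2
# and may differ per device or release; swap_to_stock_ath10k and the OPKG
# variable are hypothetical names used for illustration.

OPKG="${OPKG:-opkg}"

swap_to_stock_ath10k() {
    # Refresh the package lists first, while the network is still known good.
    "$OPKG" update &&
    "$OPKG" remove kmod-ath10k-ct ath10k-firmware-qca988x-ct &&
    "$OPKG" install kmod-ath10k ath10k-firmware-qca988x
}
```

Running this over a wired connection is the safer choice, since the 5 GHz radio has no driver between the remove and install steps. A reboot afterwards is the simplest way to be sure the newly installed modules are the ones loaded.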
The snapshot is on 4.14; maybe the bug is fixed in 4.19.
The bug was introduced with 4.14.94, and if it disappears with 4.19, that is good news.
@jeff, drivers and kmod?
master moved to 4.19 a few days back, but I doubt that was the fix.
jeff
June 21, 2019, 7:34pm
21
I self-build all my images, so my versions are whatever happened to be on master that day.
I am migrating my install to 4.19 this evening and will tell you if it crashes. Today it happened after 2 days of uptime.
HW offloading is broken with the 4.19 kernel, so I will reinstall a 4.14 kernel; my gateway is overloaded all the time with my gigabit connection.
NAT passthrough is 390 Mbit/s with 4.19 versus 890 Mbit/s with 4.14.
Currently HW offload is only available on MT7621 devices, so you should only be enabling SW offload on C7V2.
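A minimal sketch of enabling software flow offloading only, assuming the standard `flow_offloading` and `flow_offloading_hw` options in the defaults section of /etc/config/firewall. The helper name and the `UCI` override variable are illustrative only.

```shell
#!/bin/sh
# Sketch: turn on software flow offloading and leave hardware offloading off.
# ASSUMPTIONS: the firewall defaults section carries the flow_offloading and
# flow_offloading_hw options; enable_sw_offload_only and the UCI variable are
# hypothetical names used for illustration.

UCI="${UCI:-uci}"

enable_sw_offload_only() {
    "$UCI" set "firewall.@defaults[0].flow_offloading=1" &&
    "$UCI" set "firewall.@defaults[0].flow_offloading_hw=0" &&
    "$UCI" commit firewall
}
```

On the router this would be followed by `/etc/init.d/firewall restart` so the new rules are loaded.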
I don't know why NAT passthrough is so bad with OpenWrt SNAPSHOT r10307-629e6538a1 / LuCI Master (f138fc93):
[ 6] 0.00-1.00 sec 39.4 MBytes 331 Mbits/sec
[ 6] 1.00-2.00 sec 34.2 MBytes 287 Mbits/sec
[ 6] 2.00-3.00 sec 29.4 MBytes 246 Mbits/sec