IPQ806x NSS Drivers

Never seen that myself...

Well, the logs are here as proof, so it's either a software or a hardware problem. Any clue?

I've never debugged Linux warnings/oops, so I can't tell what that error actually is.
@quarky @ansuel Sorry to tag you, but any ideas?

At first glance it appears to be related to the ath10k driver. I'm not sure whether it's related to the issue I'm encountering.

Agree with @quarky: it looks like errors with ath (a general OpenWrt problem). There is a good discussion in the exploration thread about troubleshooting the more recent Wi-Fi issues some people are having. It might be related, or it might be another bug.

I was tired of seeing NSS_TX_FAILURE_TOO_SHORT on wlan1 in the logs all the time (things seem to work fine despite that failure). This patch only fixes the log spam; it doesn't fix the actual issue. I think I got the code right; it doesn't crash, anyway :slight_smile:
It should be applied after the 999-mac80211-NSS-support.patch that's in @ACwifidude's repository.

--- a/net/mac80211/iface.c	2022-02-17 09:28:56.041204675 +0100
+++ b/net/mac80211/iface.c	2022-02-17 09:27:50.454300512 +0100
@@ -1206,8 +1206,8 @@
 		skb_push(skb, ETH_HLEN);
 		ret = nss_virt_if_tx_buf(sdata->nssctx, skb);
 		if (unlikely(ret)) {
-			if (net_ratelimit()) {
+			if (net_ratelimit() && ret != NSS_TX_FAILURE_TOO_SHORT) {
 				sdata_err(sdata, "NSS TX failed with error: %s\n",
 					nss_tx_status_str(ret));
 			}
 			skb_pull(skb, ETH_HLEN);

Patch updated to what quarky mentioned below.

Your code will not work: you can't compare two strings that way in C. Unless I'm mistaken, you should still see the error. This should work:

if (net_ratelimit() && ret != NSS_TX_FAILURE_TOO_SHORT) {

The error actually disappeared from the logs, but I'm sure your code is more correct. It sure makes more sense than what I did :slight_smile: Of course, ret holds the status code, not the string.

I can add that in the next build. Is there a way to fix the root cause of the TOO_SHORT error, or is it just log spam in your opinion? (@shelterx I don't know what the error means either, and I haven't noticed any issues related to it.)

The issue seems to be a result of having mesh nodes in the network. Those nodes seem to send zero-length frames, so I think there's nothing that can be done at the AP end.

The TX error on wlan1 has to come from the only device I have connected to the 2.4 GHz Wi-Fi, which is a Yamaha receiver. All other devices are connected to the 5 GHz Wi-Fi.

I don't have any mesh networking going on, just an extender that I use as a bridge so I can have Ethernet clients behind it. It doesn't actually extend the Wi-Fi; that function is turned off.

I can reproduce this Wi-Fi crash. It happens the same way on my R7800. The build I am using is OpenWrt 21.02-SNAPSHOT r16474+17-97b95ef8b9, from the image named R7800-20220127-Stable2012NSS-factory. The log is the same. Hope the information helps.

To be sure, is that build you’re referring to running with the ath10k-ct driver or the plain ath10k driver?

It is ath10k-ct driver.

How do you reproduce it?

Flash the firmware from https://github.com/ACwifidude/openwrt/tree/openwrt-21.02-nss-qsdk11.0/bin/targets/ipq806x/generic on an R7800, set the country to US and the 5 GHz channel to 100, then just wait until the Wi-Fi drops.

Okay, since this happened on the NSS build, I'll post it here. I'm running the performance governor. @quarky @ansuel, any ideas?

[26287.383125] rcu: INFO: rcu_sched self-detected stall on CPU
[26287.383157] rcu:     0-...!: (2100 ticks this GP) idle=9ce/1/0x40000002 softirq=755651/755651 fqs=0 
[26287.387503]  (t=2100 jiffies g=1291965 q=2)
[26287.396524] rcu: rcu_sched kthread starved for 2100 jiffies! g1291965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[26287.400523] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[26287.410938] rcu: RCU grace-period kthread stack dump:
[26287.419960] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000
[26287.425101] [<c09c6864>] (__schedule) from [<c09c6bfc>] (schedule+0x68/0x110)
[26287.433252] [<c09c6bfc>] (schedule) from [<c09ca7cc>] (schedule_timeout+0x74/0xd8)
[26287.440548] [<c09ca7cc>] (schedule_timeout) from [<c038576c>] (rcu_gp_kthread+0x544/0xd18)
[26287.448011] [<c038576c>] (rcu_gp_kthread) from [<c033e3d8>] (kthread+0x15c/0x160)
[26287.456252] [<c033e3d8>] (kthread) from [<c0300148>] (ret_from_fork+0x14/0x2c)
[26287.463801] Exception stack(0xc1469fb0 to 0xc1469ff8)
[26287.470921] 9fa0:                                     00000000 00000000 00000000 00000000
[26287.476050] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[26287.484208] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[26287.492366] NMI backtrace for cpu 0
[26287.498780] CPU: 0 PID: 8046 Comm: kworker/u4:3 Not tainted 5.10.100 #0
[26287.502254] Hardware name: Generic DT based system
[26287.508973] Workqueue: ecm_nss_ipv4_workqueue ecm_nss_ipv4_stats_sync_req_work [ecm]
[26287.513722] [<c030e32c>] (unwind_backtrace) from [<c030a1ac>] (show_stack+0x14/0x20)
[26287.521620] [<c030a1ac>] (show_stack) from [<c062f3e8>] (dump_stack+0x94/0xa8)
[26287.529343] [<c062f3e8>] (dump_stack) from [<c0637990>] (nmi_cpu_backtrace+0xdc/0x108)
[26287.536374] [<c0637990>] (nmi_cpu_backtrace) from [<c0637adc>] (nmi_trigger_cpumask_backtrace+0x120/0x158)
[26287.544277] [<c0637adc>] (nmi_trigger_cpumask_backtrace) from [<c0381334>] (rcu_dump_cpu_stacks+0xe8/0x118)
[26287.553913] [<c0381334>] (rcu_dump_cpu_stacks) from [<c0386dc8>] (rcu_sched_clock_irq+0x728/0x8f8)
[26287.563550] [<c0386dc8>] (rcu_sched_clock_irq) from [<c038dec4>] (update_process_times+0x64/0x90)
[26287.572583] [<c038dec4>] (update_process_times) from [<c03a0824>] (tick_sched_timer+0x88/0x130)
[26287.581516] [<c03a0824>] (tick_sched_timer) from [<c038e4c8>] (__hrtimer_run_queues+0x184/0x254)
[26287.590022] [<c038e4c8>] (__hrtimer_run_queues) from [<c038f500>] (hrtimer_interrupt+0x130/0x374)
[26287.599058] [<c038f500>] (hrtimer_interrupt) from [<c07e44a4>] (msm_timer_interrupt+0x3c/0x4c)
[26287.607823] [<c07e44a4>] (msm_timer_interrupt) from [<c0377608>] (handle_percpu_devid_irq+0x84/0x178)
[26287.616325] [<c0377608>] (handle_percpu_devid_irq) from [<c037115c>] (__handle_domain_irq+0x90/0xf4)
[26287.625613] [<c037115c>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.634813] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.642880] Exception stack(0xc52b7ed0 to 0xc52b7f18)
[26287.650521] 7ec0:                                     bf985100 00000000 0000b0ad 0000b0ab
[26287.655565] 7ee0: bf9850f0 c4e7ed00 c2136000 c1408e00 00000000 00000080 00000000 c52b6000
[26287.663723] 7f00: 00006575 c52b7f20 bf96025c c09cb498 80000013 ffffffff
[26287.671880] [<c0300b0c>] (__irq_svc) from [<c09cb498>] (_raw_spin_lock_bh+0x44/0x58)
[26287.678362] [<c09cb498>] (_raw_spin_lock_bh) from [<bf96025c>] (ecm_nss_ipv4_stats_sync_req_work+0x20/0x148 [ecm])
[26287.686401] [<bf96025c>] (ecm_nss_ipv4_stats_sync_req_work [ecm]) from [<c033820c>] (process_one_work+0x1fc/0x470)
[26287.696448] [<c033820c>] (process_one_work) from [<c03384f4>] (worker_thread+0x74/0x5d4)
[26287.706774] [<c03384f4>] (worker_thread) from [<c033e3d8>] (kthread+0x15c/0x160)
[26287.715019] [<c033e3d8>] (kthread) from [<c0300148>] (ret_from_fork+0x14/0x2c)
[26287.722395] Exception stack(0xc52b7fb0 to 0xc52b7ff8)
[26287.729427] 7fa0:                                     00000000 00000000 00000000 00000000
[26287.734557] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[26287.742715] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[26287.750870] Sending NMI from CPU 0 to CPUs 1:
[26287.758776] NMI backtrace for cpu 1
[26287.758779] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.100 #0
[26287.758781] Hardware name: Generic DT based system
[26287.758783] PC is at _raw_spin_lock_bh+0x44/0x58
[26287.758785] LR is at ecm_nss_ported_ipv4_connection_destroy_callback+0x21c/0x4c0 [ecm]
[26287.758787] pc : [<c09cb498>]    lr : [<bf9628c4>]    psr: 80000113
[26287.758789] sp : c146d490  ip : 00000000  fp : 00000000
[26287.758791] r10: 00000004  r9 : c7382f28  r8 : c146d80c
[26287.758793] r7 : c9499af4  r6 : bf985160  r5 : 00000000  r4 : c9499a80
[26287.758795] r3 : 0000b0ab  r2 : 0000b0ac  r1 : 00000000  r0 : bf985100
[26287.758797] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[26287.758799] Control: 10c5787d  Table: 46e5406a  DAC: 00000051
[26287.758801] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.100 #0
[26287.758803] Hardware name: Generic DT based system
[26287.758805] [<c030e32c>] (unwind_backtrace) from [<c030a1ac>] (show_stack+0x14/0x20)
[26287.758807] [<c030a1ac>] (show_stack) from [<c062f3e8>] (dump_stack+0x94/0xa8)
[26287.758809] [<c062f3e8>] (dump_stack) from [<c0637978>] (nmi_cpu_backtrace+0xc4/0x108)
[26287.758811] [<c0637978>] (nmi_cpu_backtrace) from [<c030cf84>] (do_handle_IPI+0x74/0x184)
[26287.758813] [<c030cf84>] (do_handle_IPI) from [<c030d0b0>] (ipi_handler+0x1c/0x2c)
[26287.758815] [<c030d0b0>] (ipi_handler) from [<c037115c>] (__handle_domain_irq+0x90/0xf4)
[26287.758817] [<c037115c>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.758820] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.758821] Exception stack(0xc146d440 to 0xc146d488)
[26287.758823] d440: bf985100 00000000 0000b0ac 0000b0ab c9499a80 00000000 bf985160 c9499af4
[26287.758826] d460: c146d80c c7382f28 00000004 00000000 00000000 c146d490 bf9628c4 c09cb498
[26287.758827] d480: 80000113 ffffffff
[26287.758829] [<c0300b0c>] (__irq_svc) from [<c09cb498>] (_raw_spin_lock_bh+0x44/0x58)
[26287.758832] [<c09cb498>] (_raw_spin_lock_bh) from [<bf9628c4>] (ecm_nss_ported_ipv4_connection_destroy_callback+0x21c/0x4c0 [ecm])
[26287.758835] [<bf9628c4>] (ecm_nss_ported_ipv4_connection_destroy_callback [ecm]) from [<bf963028>] (ecm_nss_ported_ipv4_connection_defunct_callback+0xfc/0x164 [ecm])
[26287.758837] [<bf963028>] (ecm_nss_ported_ipv4_connection_defunct_callback [ecm]) from [<bf951934>] (ecm_db_connection_make_defunct+0x34/0xa0 [ecm])
[26287.758840] [<bf951934>] (ecm_db_connection_make_defunct [ecm]) from [<bf95fd4c>] (ecm_conntrack_ipv4_event+0xbc/0xd8 [ecm])
[26287.758842] [<bf95fd4c>] (ecm_conntrack_ipv4_event [ecm]) from [<c0341200>] (atomic_notifier_call_chain+0x64/0x94)
[26287.758844] [<c0341200>] (atomic_notifier_call_chain) from [<bf75ce9c>] (nf_conntrack_eventmask_report+0xa8/0x324 [nf_conntrack])
[26287.758847] [<bf75ce9c>] (nf_conntrack_eventmask_report [nf_conntrack]) from [<bf7522ec>] (nf_ct_delete+0x5c/0x144 [nf_conntrack])
[26287.758849] [<bf7522ec>] (nf_ct_delete [nf_conntrack]) from [<bf7524a0>] (nf_ct_kill_acct+0xcc/0x530 [nf_conntrack])
[26287.758851] [<bf7524a0>] (nf_ct_kill_acct [nf_conntrack]) from [<bf753f6c>] (nf_conntrack_tuple_taken+0x44c/0x750 [nf_conntrack])
[26287.758854] [<bf753f6c>] (nf_conntrack_tuple_taken [nf_conntrack]) from [<bf96c728>] (ecm_classifier_nl_connection_added+0x19c/0x34c [ecm])
[26287.758856] [<bf96c728>] (ecm_classifier_nl_connection_added [ecm]) from [<bf95374c>] (ecm_db_connection_add+0x4a4/0x6a8 [ecm])
[26287.758858] [<bf95374c>] (ecm_db_connection_add [ecm]) from [<bf965478>] (ecm_nss_ported_ipv4_process+0xffc/0x1080 [ecm])
[26287.758861] [<bf965478>] (ecm_nss_ported_ipv4_process [ecm]) from [<bf960ad0>] (ecm_nss_ipv4_init+0x6cc/0x1240 [ecm])
[26287.758863] [<bf960ad0>] (ecm_nss_ipv4_init [ecm]) from [<bf961804>] (ecm_nss_ipv4_post_routing_hook+0xe4/0x118 [ecm])
[26287.758865] [<bf961804>] (ecm_nss_ipv4_post_routing_hook [ecm]) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[26287.758868] [<c089c41c>] (nf_hook_slow) from [<c08ad658>] (ip_output+0x138/0x170)
[26287.758869] [<c08ad658>] (ip_output) from [<c09b46f0>] (ip_sabotage_in+0x60/0x70)
[26287.758872] [<c09b46f0>] (ip_sabotage_in) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[26287.758874] [<c089c41c>] (nf_hook_slow) from [<c08a7280>] (ip_rcv+0x68/0xe0)
[26287.758876] [<c08a7280>] (ip_rcv) from [<c0824ccc>] (__netif_receive_skb_one_core+0x48/0x58)
[26287.758878] [<c0824ccc>] (__netif_receive_skb_one_core) from [<c0824f60>] (process_backlog+0x100/0x1e8)
[26287.758880] [<c0824f60>] (process_backlog) from [<c0824654>] (__napi_poll+0x34/0x150)
[26287.758882] [<c0824654>] (__napi_poll) from [<c0824970>] (net_rx_action+0xdc/0x270)
[26287.758884] [<c0824970>] (net_rx_action) from [<c03012f8>] (__do_softirq+0x110/0x2b8)
[26287.758886] [<c03012f8>] (__do_softirq) from [<c0322a18>] (irq_exit+0xb8/0x118)
[26287.758888] [<c0322a18>] (irq_exit) from [<c0371160>] (__handle_domain_irq+0x94/0xf4)
[26287.758890] [<c0371160>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.758892] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.758894] Exception stack(0xc146df18 to 0xc146df60)
[26287.758896] df00:                                                       00000000 000017e3
[26287.758899] df20: 1cd58000 dd99fd00 00000000 9e25f0a0 c1c69040 00000000 dd99efb0 000017e3
[26287.758901] df40: 00000000 000017e3 00301100 c146df68 c07b6c84 c07b6ca4 60000013 ffffffff
[26287.758903] [<c0300b0c>] (__irq_svc) from [<c07b6ca4>] (cpuidle_enter_state+0x180/0x380)
[26287.758905] [<c07b6ca4>] (cpuidle_enter_state) from [<c07b6ef4>] (cpuidle_enter+0x3c/0x5c)
[26287.758907] [<c07b6ef4>] (cpuidle_enter) from [<c034dfb0>] (do_idle+0x208/0x2a4)
[26287.758909] [<c034dfb0>] (do_idle) from [<c034e308>] (cpu_startup_entry+0x1c/0x20)
[26287.758911] [<c034e308>] (cpu_startup_entry) from [<4230152c>] (0x4230152c)

Did your router reboot with the above kernel panic log?

Yes. Captured with pstore enabled.

This is the first time I've seen such a trace. It looks like an out-of-memory issue.