IPQ806x NSS Drivers

To be sure, is that build you’re referring to running with the ath10k-ct driver or the plain ath10k driver?

It is the ath10k-ct driver.

How can it be reproduced?

Flash the firmware from https://github.com/ACwifidude/openwrt/tree/openwrt-21.02-nss-qsdk11.0/bin/targets/ipq806x/generic on an R7800. Set the country to US and the 5 GHz channel to 100, then just wait until the Wi-Fi drops.

Okay, since this happened on the NSS build, I'll post it here. I'm running the performance governor. @quarky @ansuel ?

[26287.383125] rcu: INFO: rcu_sched self-detected stall on CPU
[26287.383157] rcu:     0-...!: (2100 ticks this GP) idle=9ce/1/0x40000002 softirq=755651/755651 fqs=0 
[26287.387503]  (t=2100 jiffies g=1291965 q=2)
[26287.396524] rcu: rcu_sched kthread starved for 2100 jiffies! g1291965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
[26287.400523] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[26287.410938] rcu: RCU grace-period kthread stack dump:
[26287.419960] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000
[26287.425101] [<c09c6864>] (__schedule) from [<c09c6bfc>] (schedule+0x68/0x110)
[26287.433252] [<c09c6bfc>] (schedule) from [<c09ca7cc>] (schedule_timeout+0x74/0xd8)
[26287.440548] [<c09ca7cc>] (schedule_timeout) from [<c038576c>] (rcu_gp_kthread+0x544/0xd18)
[26287.448011] [<c038576c>] (rcu_gp_kthread) from [<c033e3d8>] (kthread+0x15c/0x160)
[26287.456252] [<c033e3d8>] (kthread) from [<c0300148>] (ret_from_fork+0x14/0x2c)
[26287.463801] Exception stack(0xc1469fb0 to 0xc1469ff8)
[26287.470921] 9fa0:                                     00000000 00000000 00000000 00000000
[26287.476050] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[26287.484208] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[26287.492366] NMI backtrace for cpu 0
[26287.498780] CPU: 0 PID: 8046 Comm: kworker/u4:3 Not tainted 5.10.100 #0
[26287.502254] Hardware name: Generic DT based system
[26287.508973] Workqueue: ecm_nss_ipv4_workqueue ecm_nss_ipv4_stats_sync_req_work [ecm]
[26287.513722] [<c030e32c>] (unwind_backtrace) from [<c030a1ac>] (show_stack+0x14/0x20)
[26287.521620] [<c030a1ac>] (show_stack) from [<c062f3e8>] (dump_stack+0x94/0xa8)
[26287.529343] [<c062f3e8>] (dump_stack) from [<c0637990>] (nmi_cpu_backtrace+0xdc/0x108)
[26287.536374] [<c0637990>] (nmi_cpu_backtrace) from [<c0637adc>] (nmi_trigger_cpumask_backtrace+0x120/0x158)
[26287.544277] [<c0637adc>] (nmi_trigger_cpumask_backtrace) from [<c0381334>] (rcu_dump_cpu_stacks+0xe8/0x118)
[26287.553913] [<c0381334>] (rcu_dump_cpu_stacks) from [<c0386dc8>] (rcu_sched_clock_irq+0x728/0x8f8)
[26287.563550] [<c0386dc8>] (rcu_sched_clock_irq) from [<c038dec4>] (update_process_times+0x64/0x90)
[26287.572583] [<c038dec4>] (update_process_times) from [<c03a0824>] (tick_sched_timer+0x88/0x130)
[26287.581516] [<c03a0824>] (tick_sched_timer) from [<c038e4c8>] (__hrtimer_run_queues+0x184/0x254)
[26287.590022] [<c038e4c8>] (__hrtimer_run_queues) from [<c038f500>] (hrtimer_interrupt+0x130/0x374)
[26287.599058] [<c038f500>] (hrtimer_interrupt) from [<c07e44a4>] (msm_timer_interrupt+0x3c/0x4c)
[26287.607823] [<c07e44a4>] (msm_timer_interrupt) from [<c0377608>] (handle_percpu_devid_irq+0x84/0x178)
[26287.616325] [<c0377608>] (handle_percpu_devid_irq) from [<c037115c>] (__handle_domain_irq+0x90/0xf4)
[26287.625613] [<c037115c>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.634813] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.642880] Exception stack(0xc52b7ed0 to 0xc52b7f18)
[26287.650521] 7ec0:                                     bf985100 00000000 0000b0ad 0000b0ab
[26287.655565] 7ee0: bf9850f0 c4e7ed00 c2136000 c1408e00 00000000 00000080 00000000 c52b6000
[26287.663723] 7f00: 00006575 c52b7f20 bf96025c c09cb498 80000013 ffffffff
[26287.671880] [<c0300b0c>] (__irq_svc) from [<c09cb498>] (_raw_spin_lock_bh+0x44/0x58)
[26287.678362] [<c09cb498>] (_raw_spin_lock_bh) from [<bf96025c>] (ecm_nss_ipv4_stats_sync_req_work+0x20/0x148 [ecm])
[26287.686401] [<bf96025c>] (ecm_nss_ipv4_stats_sync_req_work [ecm]) from [<c033820c>] (process_one_work+0x1fc/0x470)
[26287.696448] [<c033820c>] (process_one_work) from [<c03384f4>] (worker_thread+0x74/0x5d4)
[26287.706774] [<c03384f4>] (worker_thread) from [<c033e3d8>] (kthread+0x15c/0x160)
[26287.715019] [<c033e3d8>] (kthread) from [<c0300148>] (ret_from_fork+0x14/0x2c)
[26287.722395] Exception stack(0xc52b7fb0 to 0xc52b7ff8)
[26287.729427] 7fa0:                                     00000000 00000000 00000000 00000000
[26287.734557] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[26287.742715] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[26287.750870] Sending NMI from CPU 0 to CPUs 1:
[26287.758776] NMI backtrace for cpu 1
[26287.758779] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.100 #0
[26287.758781] Hardware name: Generic DT based system
[26287.758783] PC is at _raw_spin_lock_bh+0x44/0x58
[26287.758785] LR is at ecm_nss_ported_ipv4_connection_destroy_callback+0x21c/0x4c0 [ecm]
[26287.758787] pc : [<c09cb498>]    lr : [<bf9628c4>]    psr: 80000113
[26287.758789] sp : c146d490  ip : 00000000  fp : 00000000
[26287.758791] r10: 00000004  r9 : c7382f28  r8 : c146d80c
[26287.758793] r7 : c9499af4  r6 : bf985160  r5 : 00000000  r4 : c9499a80
[26287.758795] r3 : 0000b0ab  r2 : 0000b0ac  r1 : 00000000  r0 : bf985100
[26287.758797] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[26287.758799] Control: 10c5787d  Table: 46e5406a  DAC: 00000051
[26287.758801] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.100 #0
[26287.758803] Hardware name: Generic DT based system
[26287.758805] [<c030e32c>] (unwind_backtrace) from [<c030a1ac>] (show_stack+0x14/0x20)
[26287.758807] [<c030a1ac>] (show_stack) from [<c062f3e8>] (dump_stack+0x94/0xa8)
[26287.758809] [<c062f3e8>] (dump_stack) from [<c0637978>] (nmi_cpu_backtrace+0xc4/0x108)
[26287.758811] [<c0637978>] (nmi_cpu_backtrace) from [<c030cf84>] (do_handle_IPI+0x74/0x184)
[26287.758813] [<c030cf84>] (do_handle_IPI) from [<c030d0b0>] (ipi_handler+0x1c/0x2c)
[26287.758815] [<c030d0b0>] (ipi_handler) from [<c037115c>] (__handle_domain_irq+0x90/0xf4)
[26287.758817] [<c037115c>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.758820] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.758821] Exception stack(0xc146d440 to 0xc146d488)
[26287.758823] d440: bf985100 00000000 0000b0ac 0000b0ab c9499a80 00000000 bf985160 c9499af4
[26287.758826] d460: c146d80c c7382f28 00000004 00000000 00000000 c146d490 bf9628c4 c09cb498
[26287.758827] d480: 80000113 ffffffff
[26287.758829] [<c0300b0c>] (__irq_svc) from [<c09cb498>] (_raw_spin_lock_bh+0x44/0x58)
[26287.758832] [<c09cb498>] (_raw_spin_lock_bh) from [<bf9628c4>] (ecm_nss_ported_ipv4_connection_destroy_callback+0x21c/0x4c0 [ecm])
[26287.758835] [<bf9628c4>] (ecm_nss_ported_ipv4_connection_destroy_callback [ecm]) from [<bf963028>] (ecm_nss_ported_ipv4_connection_defunct_callback+0xfc/0x164 [ecm])
[26287.758837] [<bf963028>] (ecm_nss_ported_ipv4_connection_defunct_callback [ecm]) from [<bf951934>] (ecm_db_connection_make_defunct+0x34/0xa0 [ecm])
[26287.758840] [<bf951934>] (ecm_db_connection_make_defunct [ecm]) from [<bf95fd4c>] (ecm_conntrack_ipv4_event+0xbc/0xd8 [ecm])
[26287.758842] [<bf95fd4c>] (ecm_conntrack_ipv4_event [ecm]) from [<c0341200>] (atomic_notifier_call_chain+0x64/0x94)
[26287.758844] [<c0341200>] (atomic_notifier_call_chain) from [<bf75ce9c>] (nf_conntrack_eventmask_report+0xa8/0x324 [nf_conntrack])
[26287.758847] [<bf75ce9c>] (nf_conntrack_eventmask_report [nf_conntrack]) from [<bf7522ec>] (nf_ct_delete+0x5c/0x144 [nf_conntrack])
[26287.758849] [<bf7522ec>] (nf_ct_delete [nf_conntrack]) from [<bf7524a0>] (nf_ct_kill_acct+0xcc/0x530 [nf_conntrack])
[26287.758851] [<bf7524a0>] (nf_ct_kill_acct [nf_conntrack]) from [<bf753f6c>] (nf_conntrack_tuple_taken+0x44c/0x750 [nf_conntrack])
[26287.758854] [<bf753f6c>] (nf_conntrack_tuple_taken [nf_conntrack]) from [<bf96c728>] (ecm_classifier_nl_connection_added+0x19c/0x34c [ecm])
[26287.758856] [<bf96c728>] (ecm_classifier_nl_connection_added [ecm]) from [<bf95374c>] (ecm_db_connection_add+0x4a4/0x6a8 [ecm])
[26287.758858] [<bf95374c>] (ecm_db_connection_add [ecm]) from [<bf965478>] (ecm_nss_ported_ipv4_process+0xffc/0x1080 [ecm])
[26287.758861] [<bf965478>] (ecm_nss_ported_ipv4_process [ecm]) from [<bf960ad0>] (ecm_nss_ipv4_init+0x6cc/0x1240 [ecm])
[26287.758863] [<bf960ad0>] (ecm_nss_ipv4_init [ecm]) from [<bf961804>] (ecm_nss_ipv4_post_routing_hook+0xe4/0x118 [ecm])
[26287.758865] [<bf961804>] (ecm_nss_ipv4_post_routing_hook [ecm]) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[26287.758868] [<c089c41c>] (nf_hook_slow) from [<c08ad658>] (ip_output+0x138/0x170)
[26287.758869] [<c08ad658>] (ip_output) from [<c09b46f0>] (ip_sabotage_in+0x60/0x70)
[26287.758872] [<c09b46f0>] (ip_sabotage_in) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[26287.758874] [<c089c41c>] (nf_hook_slow) from [<c08a7280>] (ip_rcv+0x68/0xe0)
[26287.758876] [<c08a7280>] (ip_rcv) from [<c0824ccc>] (__netif_receive_skb_one_core+0x48/0x58)
[26287.758878] [<c0824ccc>] (__netif_receive_skb_one_core) from [<c0824f60>] (process_backlog+0x100/0x1e8)
[26287.758880] [<c0824f60>] (process_backlog) from [<c0824654>] (__napi_poll+0x34/0x150)
[26287.758882] [<c0824654>] (__napi_poll) from [<c0824970>] (net_rx_action+0xdc/0x270)
[26287.758884] [<c0824970>] (net_rx_action) from [<c03012f8>] (__do_softirq+0x110/0x2b8)
[26287.758886] [<c03012f8>] (__do_softirq) from [<c0322a18>] (irq_exit+0xb8/0x118)
[26287.758888] [<c0322a18>] (irq_exit) from [<c0371160>] (__handle_domain_irq+0x94/0xf4)
[26287.758890] [<c0371160>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[26287.758892] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[26287.758894] Exception stack(0xc146df18 to 0xc146df60)
[26287.758896] df00:                                                       00000000 000017e3
[26287.758899] df20: 1cd58000 dd99fd00 00000000 9e25f0a0 c1c69040 00000000 dd99efb0 000017e3
[26287.758901] df40: 00000000 000017e3 00301100 c146df68 c07b6c84 c07b6ca4 60000013 ffffffff
[26287.758903] [<c0300b0c>] (__irq_svc) from [<c07b6ca4>] (cpuidle_enter_state+0x180/0x380)
[26287.758905] [<c07b6ca4>] (cpuidle_enter_state) from [<c07b6ef4>] (cpuidle_enter+0x3c/0x5c)
[26287.758907] [<c07b6ef4>] (cpuidle_enter) from [<c034dfb0>] (do_idle+0x208/0x2a4)
[26287.758909] [<c034dfb0>] (do_idle) from [<c034e308>] (cpu_startup_entry+0x1c/0x20)
[26287.758911] [<c034e308>] (cpu_startup_entry) from [<4230152c>] (0x4230152c)

Your router rebooted with the above kernel panic log?

Yes. Captured with pstore enabled.
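For anyone who wants to capture the same kind of trace: after a crash, pstore records normally appear as files under /sys/fs/pstore on the next boot. Below is a minimal sketch for copying them somewhere safe. The function name save_pstore and the default destination directory are made up for illustration, and it assumes pstore/ramoops is already enabled in the build.

```shell
#!/bin/sh
# Copy any pstore crash records to a destination directory.
# Usage: save_pstore [SRC_DIR] [DST_DIR]
# Both defaults are assumptions; /tmp is RAM-backed on OpenWrt, so pick a
# persistent DST_DIR if the copies should survive another reboot.
save_pstore() {
    src="${1:-/sys/fs/pstore}"
    dst="${2:-/tmp/pstore-logs}"
    mkdir -p "$dst" || return 1
    count=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip when the glob matches nothing
        cp "$f" "$dst/" && count=$((count + 1))
    done
    echo "$count"                    # number of records copied
}
```

Calling it as `save_pstore /sys/fs/pstore /root/crashlogs` (the second path is only an example) prints how many records were copied.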

First time I've seen such a trace. Looks like an out-of-memory issue.

Well, we'll see if it happens again. Trust me, memory isn't the issue: I've kept an eye on it and it's fine, with 200 MB free. Unless something happens and it goes kaboom.

EDIT:
An "INFO: rcu_sched self-detected stall on CPU" message could be caused by pretty much anything, but I think it's directly connected to the NSS/ECM driver somehow.

Please do post the logs if it happens again.

I do think, though, that the NSS drivers are fairly stable, as they are not doing high-intensity tasks. They basically coordinate data packets between the Linux network stack and the NSS firmware.

For what it's worth, I have an Askey RT4230W running 21.02 NSS that has been up for more than 22 days, doing full IPv4 and 6rd IPv6 routing, with IPTV multicast (igmpproxy) and a few tun, tap and WireGuard tunnels. I did disable the Wi-Fi radios, as they were causing high latency issues (which I suspect is due to the new virtual time-base airtime algorithm). I have other APs connected to it via LAN cables. So far no complaints ... yet.

Try this in a startup script:

echo 0 > /sys/kernel/debug/ieee80211/phy0/aql_enable
echo 0 > /sys/kernel/debug/ieee80211/phy1/aql_enable

Since I disabled AQL, I no longer suffer from occasional stalls: 7 days of uptime without a single one. At my place, the Wi-Fi used to stall for 1-2 minutes from time to time when a family member returned home with a smartphone.

P.S. I also added this to the bootcount startup script in my latest 21 build.
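A variant of the two echo lines above that loops over whatever radios exist, so it does not fail if a phy is missing. This is only a sketch: the function name disable_aql is made up, and it assumes debugfs is mounted at /sys/kernel/debug and that the kernel exposes aql_enable for each phy.

```shell
#!/bin/sh
# Write 0 to aql_enable for every phy found under the mac80211 debugfs directory.
# Usage: disable_aql [DEBUGFS_DIR]
disable_aql() {
    base="${1:-/sys/kernel/debug/ieee80211}"
    for f in "$base"/phy*/aql_enable; do
        [ -e "$f" ] || continue    # no such phy, or no AQL knob in this kernel
        echo 0 > "$f" && echo "AQL disabled: $f"
    done
}
```

In a startup script it would be called once with no arguments, after the wireless drivers have loaded.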

Can you think of a simple way to debug this?
@quarky has tried, I think, everything to find the source of the problem, but if it's really linked to new (or already known?) clients joining the Wi-Fi, we can probably try something.
This evening, once I'm back from the office, I'll try monitoring the latency while connecting my phone to the Wi-Fi, but to be honest I've never faced this issue when connecting to the WLAN at home.
Thanks

Edit: Removed previous analysis, as it was incorrect. Setting aql_enable to 0 essentially makes ieee80211_txq_airtime_check() always return true. This in turn makes ieee80211_next_txq() simply schedule the txqs for transmission from left to right of the red-black tree in every scheduling round. A side effect is that the first node (i.e. txq) of the rb tree is always eligible for transmission, regardless of how much airtime that txq has already spent. Maybe this is the source of the high latency?

So it seems that I'm not the only one affected.

In any case, I've reverted my R7800 back to the old round-robin algo. So far with 3 days uptime without issue (touch wood.) Will monitor it for a bit longer.

How many Wi-Fi devices do you have? I have about 15-20 connected at the same time throughout the day.
My iPhone doesn't trigger it, but the Samsungs do.

It is probably related to this and there are other ways to reproduce it:

http://lists.infradead.org/pipermail/ath10k/2022-February/013341.html

I don't have nearly as many as you do. My R7800 mostly has 3-4 devices connected to it, but these 3-4 do roam between the house's APs, so they connect to and disconnect from the R7800. Sometimes another 1-2 will be connected to it on and off.

So it doesn't look like the number of connected devices is what triggers the issue. I now suspect it could be the frequent connecting and disconnecting of clients, building up over time, that causes it. I don't have any clue how to prove it, though. :stuck_out_tongue:

Edit: Oh, and my R7800 only has the 5 GHz radio enabled. The 2.4 GHz radio is disabled.

I'm at around 20, the majority on 2.4 GHz (IoT devices, basically). I have an Android phone, so I can try... maybe by triggering speed tests repeatedly.
I assume this is linked to the Wi-Fi itself, not to a single SSID (I think I have more than 5 devices on one SSID used only for IoT devices; I need to check).
Sorry, forgot to ask: are you using the CT or the non-CT drivers/firmware? I think quarky has non-CT (and has the problem), and so do I (and I do NOT have the problem).
A really curious issue.

So it's a bug in ath10k and not directly related to OpenWrt... interesting.

Well, it happened again... @quarky @Ansuel

[40715.214927] rcu: INFO: rcu_sched self-detected stall on CPU
[40715.214960] rcu:     0-...!: (2141 ticks this GP) idle=f9a/1/0x40000004 softirq=1001412/1001414 fqs=24
[40715.219307]  (t=2100 jiffies g=1829977 q=3)
[40715.228676] rcu: rcu_sched kthread starved for 2053 jiffies! g1829977 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[40715.232586] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[40715.243001] rcu: RCU grace-period kthread stack dump:
[40715.252024] task:rcu_sched       state:I stack:    0 pid:   11 ppid:     2 flags:0x00000000
[40715.257167] [<c09c6864>] (__schedule) from [<c09c6bfc>] (schedule+0x68/0x110)
[40715.265315] [<c09c6bfc>] (schedule) from [<c09ca7cc>] (schedule_timeout+0x74/0xd8)
[40715.272612] [<c09ca7cc>] (schedule_timeout) from [<c038576c>] (rcu_gp_kthread+0x544/0xd18)
[40715.280074] [<c038576c>] (rcu_gp_kthread) from [<c033e3d8>] (kthread+0x15c/0x160)
[40715.288318] [<c033e3d8>] (kthread) from [<c0300148>] (ret_from_fork+0x14/0x2c)
[40715.295865] Exception stack(0xc1469fb0 to 0xc1469ff8)
[40715.302984] 9fa0:                                     00000000 00000000 00000000 00000000
[40715.308115] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[40715.316272] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[40715.324427] NMI backtrace for cpu 0
[40715.330845] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.100 #0
[40715.334317] Hardware name: Generic DT based system
[40715.340573] [<c030e32c>] (unwind_backtrace) from [<c030a1ac>] (show_stack+0x14/0x20)
[40715.345264] [<c030a1ac>] (show_stack) from [<c062f3e8>] (dump_stack+0x94/0xa8)
[40715.353162] [<c062f3e8>] (dump_stack) from [<c0637990>] (nmi_cpu_backtrace+0xdc/0x108)
[40715.360192] [<c0637990>] (nmi_cpu_backtrace) from [<c0637adc>] (nmi_trigger_cpumask_backtrace+0x120/0x158)
[40715.368095] [<c0637adc>] (nmi_trigger_cpumask_backtrace) from [<c0381334>] (rcu_dump_cpu_stacks+0xe8/0x118)
[40715.377730] [<c0381334>] (rcu_dump_cpu_stacks) from [<c0386dc8>] (rcu_sched_clock_irq+0x728/0x8f8)
[40715.387368] [<c0386dc8>] (rcu_sched_clock_irq) from [<c038dec4>] (update_process_times+0x64/0x90)
[40715.396400] [<c038dec4>] (update_process_times) from [<c03a0824>] (tick_sched_timer+0x88/0x130)
[40715.405334] [<c03a0824>] (tick_sched_timer) from [<c038e4c8>] (__hrtimer_run_queues+0x184/0x254)
[40715.413840] [<c038e4c8>] (__hrtimer_run_queues) from [<c038f500>] (hrtimer_interrupt+0x130/0x374)
[40715.422875] [<c038f500>] (hrtimer_interrupt) from [<c07e44a4>] (msm_timer_interrupt+0x3c/0x4c)
[40715.431642] [<c07e44a4>] (msm_timer_interrupt) from [<c0377608>] (handle_percpu_devid_irq+0x84/0x178)
[40715.440143] [<c0377608>] (handle_percpu_devid_irq) from [<c037115c>] (__handle_domain_irq+0x90/0xf4)
[40715.449430] [<c037115c>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[40715.458630] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[40715.466698] Exception stack(0xc0d01408 to 0xc0d01450)
[40715.474340] 1400:                   bf970100 00000000 00007940 0000793f c749ea80 00000000
[40715.479381] 1420: bf970160 c749eaf4 c0d017d4 c749e628 00000004 00000000 00000000 c0d01458
[40715.487537] 1440: bf94d8c4 c09cb498 80000113 ffffffff
[40715.495695] [<c0300b0c>] (__irq_svc) from [<c09cb498>] (_raw_spin_lock_bh+0x44/0x58)
[40715.500850] [<c09cb498>] (_raw_spin_lock_bh) from [<bf94d8c4>] (ecm_nss_ported_ipv4_connection_destroy_callback+0x21c/0x4c0 [ecm])
[40715.508713] [<bf94d8c4>] (ecm_nss_ported_ipv4_connection_destroy_callback [ecm]) from [<bf94e028>] (ecm_nss_ported_ipv4_connection_defunct_callback+0xfc/0x164 [ecm])
[40715.520218] [<bf94e028>] (ecm_nss_ported_ipv4_connection_defunct_callback [ecm]) from [<bf93c934>] (ecm_db_connection_make_defunct+0x34/0xa0 [ecm])
[40715.535057] [<bf93c934>] (ecm_db_connection_make_defunct [ecm]) from [<bf94ad4c>] (ecm_conntrack_ipv4_event+0xbc/0xd8 [ecm])
[40715.548023] [<bf94ad4c>] (ecm_conntrack_ipv4_event [ecm]) from [<c0341200>] (atomic_notifier_call_chain+0x64/0x94)
[40715.559443] [<c0341200>] (atomic_notifier_call_chain) from [<bf74de9c>] (nf_conntrack_eventmask_report+0xa8/0x324 [nf_conntrack])
[40715.569600] [<bf74de9c>] (nf_conntrack_eventmask_report [nf_conntrack]) from [<bf7432ec>] (nf_ct_delete+0x5c/0x144 [nf_conntrack])
[40715.581308] [<bf7432ec>] (nf_ct_delete [nf_conntrack]) from [<bf7434a0>] (nf_ct_kill_acct+0xcc/0x530 [nf_conntrack])
[40715.592937] [<bf7434a0>] (nf_ct_kill_acct [nf_conntrack]) from [<bf744f6c>] (nf_conntrack_tuple_taken+0x44c/0x750 [nf_conntrack])
[40715.603668] [<bf744f6c>] (nf_conntrack_tuple_taken [nf_conntrack]) from [<bf957728>] (ecm_classifier_nl_connection_added+0x19c/0x34c [ecm])
[40715.615266] [<bf957728>] (ecm_classifier_nl_connection_added [ecm]) from [<bf93e74c>] (ecm_db_connection_add+0x4a4/0x6a8 [ecm])
[40715.627590] [<bf93e74c>] (ecm_db_connection_add [ecm]) from [<bf950478>] (ecm_nss_ported_ipv4_process+0xffc/0x1080 [ecm])
[40715.639047] [<bf950478>] (ecm_nss_ported_ipv4_process [ecm]) from [<bf94bad0>] (ecm_nss_ipv4_init+0x6cc/0x1240 [ecm])
[40715.650159] [<bf94bad0>] (ecm_nss_ipv4_init [ecm]) from [<bf94c804>] (ecm_nss_ipv4_post_routing_hook+0xe4/0x118 [ecm])
[40715.660699] [<bf94c804>] (ecm_nss_ipv4_post_routing_hook [ecm]) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[40715.671225] [<c089c41c>] (nf_hook_slow) from [<c08ad658>] (ip_output+0x138/0x170)
[40715.680855] [<c08ad658>] (ip_output) from [<c09b46f0>] (ip_sabotage_in+0x60/0x70)
[40715.688405] [<c09b46f0>] (ip_sabotage_in) from [<c089c41c>] (nf_hook_slow+0x48/0xd8)
[40715.695870] [<c089c41c>] (nf_hook_slow) from [<c08a7280>] (ip_rcv+0x68/0xe0)
[40715.703685] [<c08a7280>] (ip_rcv) from [<c0824ccc>] (__netif_receive_skb_one_core+0x48/0x58)
[40715.710716] [<c0824ccc>] (__netif_receive_skb_one_core) from [<c0824f60>] (process_backlog+0x100/0x1e8)
[40715.719136] [<c0824f60>] (process_backlog) from [<c0824654>] (__napi_poll+0x34/0x150)
[40715.728248] [<c0824654>] (__napi_poll) from [<c0824970>] (net_rx_action+0xdc/0x270)
[40715.736234] [<c0824970>] (net_rx_action) from [<c03012f8>] (__do_softirq+0x110/0x2b8)
[40715.743699] [<c03012f8>] (__do_softirq) from [<c0322a18>] (irq_exit+0xb8/0x118)
[40715.751683] [<c0322a18>] (irq_exit) from [<c0371160>] (__handle_domain_irq+0x94/0xf4)
[40715.758803] [<c0371160>] (__handle_domain_irq) from [<c0649740>] (gic_handle_irq+0x90/0xb8)
[40715.766790] [<c0649740>] (gic_handle_irq) from [<c0300b0c>] (__irq_svc+0x6c/0x90)
[40715.774945] Exception stack(0xc0d01ee0 to 0xc0d01f28)
[40715.782590] 1ee0: 00000000 00002502 1cd49000 dd990d00 00000000 c61109c0 c1c68840 00000000
[40715.787630] 1f00: dd98ffb0 00002502 00000000 00002502 ff1819c0 c0d01f30 c07b6c84 c07b6ca4
[40715.795782] 1f20: 60000013 ffffffff
[40715.803942] [<c0300b0c>] (__irq_svc) from [<c07b6ca4>] (cpuidle_enter_state+0x180/0x380)
[40715.807243] [<c07b6ca4>] (cpuidle_enter_state) from [<c07b6ef4>] (cpuidle_enter+0x3c/0x5c)
[40715.815575] [<c07b6ef4>] (cpuidle_enter) from [<c034dfb0>] (do_idle+0x208/0x2a4)
[40715.823646] [<c034dfb0>] (do_idle) from [<c034e308>] (cpu_startup_entry+0x1c/0x20)
[40715.831199] [<c034e308>] (cpu_startup_entry) from [<c0c00eb0>] (start_kernel+0x53c/0x54c)

Hmm, that really looks like a genuine stall caused by a spinlock that is never released...
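To see at a glance which code paths are stuck on the lock, the frames that called _raw_spin_lock_bh can be pulled out of a saved copy of the trace. In the first log that gives ecm_nss_ipv4_stats_sync_req_work on CPU 0 and ecm_nss_ported_ipv4_connection_destroy_callback on CPU 1. A rough sketch, assuming the trace was saved to a text file; the function name lock_waiters is made up:

```shell
#!/bin/sh
# Print the unique callers of _raw_spin_lock_bh found in a saved ARM backtrace.
# Usage: lock_waiters TRACE_FILE
lock_waiters() {
    # Match "(_raw_spin_lock_bh) from [<addr>] (caller+0xoff/0xsize ..." and
    # keep only the "caller+0xoff/0xsize" part.
    sed -n 's/.*(_raw_spin_lock_bh) from \[<[0-9a-f]*>\] (\([^ )]*\).*/\1/p' "$1" | sort -u
}
```

This only shows who is waiting, not who holds the lock, so it narrows the search rather than answering it.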

If I do manage to run this through gdb, what values do you want me to put into gdb?
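Not an authoritative answer, but a common approach is to feed gdb the symbol+offset pairs from the backtrace, for example `list *(ecm_nss_ported_ipv4_connection_destroy_callback+0x21c)`, against a copy of the module that still has debug info. The helper below just builds that command from a frame as printed in the log. The name gdb_list_cmd and the ecm.ko path are placeholders, and it assumes an unstripped module from the same build is available.

```shell
#!/bin/sh
# Turn a backtrace frame such as "func+0x21c/0x4c0" into a gdb invocation
# that lists the source around that offset. The command is printed rather
# than executed, so it can be reviewed first.
# Usage: gdb_list_cmd FRAME [MODULE_FILE]
gdb_list_cmd() {
    frame="${1%%/*}"          # drop the "/<function size>" suffix
    module="${2:-ecm.ko}"     # placeholder path to an unstripped module
    printf "gdb -batch -ex 'list *(%s)' %s\n" "$frame" "$module"
}
```

For the other waiter in the first trace, the frame to pass would be `ecm_nss_ipv4_stats_sync_req_work+0x20/0x148`.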