[SOLVED] BT Home Hub 5A (lantiq xrx200) odd problem, possibly SMP related

Folks, hi. I have a really perplexing problem with a fairly recent build of LEDE on a BT Home Hub 5A. It works perfectly until any Android device with a Mediatek chipset connects to the 2.4GHz radio, at which point it locks up.

LEDE Reboot SNAPSHOT r3315-985c90d / LuCI Master (git-17.036.30808-2292327) built yesterday.
Console logs:
[ 199.097484] INFO: rcu_sched self-detected stall on CPU
[ 199.101192] 0-...: (1 GPs behind) idle=a59/2/0 softirq=15156/15158 fqs=14972
[ 199.108393] (t=15000 jiffies g=3549 c=3548 q=997)
[ 199.113267] Task dump for CPU 0:
[ 199.116491] swapper/0 R running 0 0 0 0x00100004
[ 199.122844] Stack : 00000000 800732d8 00000000 800732d8 00000000 00000006 00000006 80680000
00000000 00000000 00000000 00000000 80686da2 00000000 80686da0 00000000
00000000 00000001 805378f8 8053b820 8053b820 80530000 00000000 80530000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 dc8ba62f
...
[ 199.158354] Call Trace:
[ 199.160828] [<80019614>] show_stack+0x50/0x84
[ 199.165176] [<8007e0dc>] rcu_dump_cpu_stacks+0x98/0xf8
[ 199.170300] [<80081a60>] rcu_check_callbacks+0x2d0/0x7d4
[ 199.175611] [<80083fc0>] update_process_times+0x3c/0x78
[ 199.180844] [<80093f30>] tick_sched_timer+0x22c/0x294
[ 199.185881] [<800850a8>] __hrtimer_run_queues+0xfc/0x1d4
[ 199.191183] [<8008532c>] hrtimer_interrupt+0xfc/0x2c0
[ 199.196238] [<8001d3a0>] c0_compare_interrupt+0x98/0xcc
[ 199.201494] [<80074c18>] handle_irq_event_percpu+0x7c/0x1b4
[ 199.207035] [<80079348>] handle_percpu_irq+0x88/0xb8
[ 199.211988] [<800742e0>] generic_handle_irq+0x40/0x58
[ 199.217034] [<800155a4>] do_IRQ+0x1c/0x2c
[ 199.221070] [<8000e7d4>] plat_irq_dispatch+0xac/0xdc
[ 199.226001] [<80002980>] except_vec_vi_end+0xb4/0xc0
[ 199.230952]
[ 379.109482] INFO: rcu_sched self-detected stall on CPU
[ 379.113384] 0-...: (1 GPs behind) idle=a59/2/0 softirq=15156/15158 fqs=59740
[ 379.120591] (t=60003 jiffies g=3549 c=3548 q=6654)
[ 379.125555] Task dump for CPU 0:
[ 379.128769] swapper/0 R running 0 0 0 0x00100004
[ 379.135124] Stack : 00000000 800732d8 00000000 800732d8 00000000 00000006 00000006 80680000
00000000 00000000 00000000 00000000 80686da2 00000000 80686da0 00000000
00000000 00000001 805378f8 8053b820 8053b820 80530000 00000000 80530000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 dc8ba62f
...
[ 379.170634] Call Trace:
[ 379.173095] [<80019614>] show_stack+0x50/0x84
[ 379.177475] [<8007e0dc>] rcu_dump_cpu_stacks+0x98/0xf8
[ 379.182581] [<80081a60>] rcu_check_callbacks+0x2d0/0x7d4
[ 379.187892] [<80083fc0>] update_process_times+0x3c/0x78
[ 379.193106] [<80093f30>] tick_sched_timer+0x22c/0x294
[ 379.198179] [<800850a8>] __hrtimer_run_queues+0xfc/0x1d4
[ 379.203464] [<8008532c>] hrtimer_interrupt+0xfc/0x2c0
[ 379.208519] [<8001d3a0>] c0_compare_interrupt+0x98/0xcc
[ 379.213751] [<80074c18>] handle_irq_event_percpu+0x7c/0x1b4
[ 379.219334] [<80079348>] handle_percpu_irq+0x88/0xb8
[ 379.224269] [<800742e0>] generic_handle_irq+0x40/0x58
[ 379.229314] [<800155a4>] do_IRQ+0x1c/0x2c
[ 379.233335] [<8000e7d4>] plat_irq_dispatch+0xac/0xdc
[ 379.238299] [<80002980>] except_vec_vi_end+0xb4/0xc0
[ 422.001558] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 15001 jiffies s: 35
[ 422.009072] Task dump for CPU 0:
[ 422.012374] swapper/0 R running 0 0 0 0x00100004
[ 422.018676] Stack : 80529fe0 0000ff00 00000020 00000024 00000000 00010d90 0048ac01 ffffffff
80530000 805345ec 8053470c 8057d2a4 00000001 80530000 00000001 80530000
00000000 80015494 1100ff03 00000000 80528000 80529e98 804a0000 8006c5d0
1100ff03 00000000 0003e800 805345ec 90800400 800154ac 87cc5ae4 800096b4
80680000 805773d4 80680000 80530000 80680000 805773d4 87cc0000 87cc5ae4
...
[ 422.054156] Call Trace:
[ 422.056613] [<800091e8>] __schedule+0x7cc/0x900
[ 422.061191] [<80015494>] r4k_wait_irqoff+0x0/0x20
[ 422.065865]
[ 559.121484] INFO: rcu_sched self-detected stall on CPU
[ 559.125187] 0-...: (1 GPs behind) idle=a59/2/0 softirq=15156/15158 fqs=104591
[ 559.132480] (t=105006 jiffies g=3549 c=3548 q=10423)
[ 559.137618] Task dump for CPU 0:
[ 559.140833] swapper/0 R running 0 0 0 0x00100004
[ 559.147187] Stack : 00000000 800732d8 00000000 800732d8 00000000 00000006 00000006 80680000
00000000 00000000 00000000 00000000 80686da2 00000000 80686da0 00000000
00000000 00000001 805378f8 8053b820 8053b820 80530000 00000000 80530000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 dc8ba62f
...
[ 559.182698] Call Trace:
[ 559.185156] [<80019614>] show_stack+0x50/0x84
[ 559.189531] [<8007e0dc>] rcu_dump_cpu_stacks+0x98/0xf8
[ 559.194643] [<80081a60>] rcu_check_callbacks+0x2d0/0x7d4
[ 559.199955] [<80083fc0>] update_process_times+0x3c/0x78
[ 559.205170] [<80093f30>] tick_sched_timer+0x22c/0x294
[ 559.210241] [<800850a8>] __hrtimer_run_queues+0xfc/0x1d4
[ 559.215527] [<8008532c>] hrtimer_interrupt+0xfc/0x2c0
[ 559.220583] [<8001d3a0>] c0_compare_interrupt+0x98/0xcc
[ 559.225829] [<80074c18>] handle_irq_event_percpu+0x7c/0x1b4
[ 559.231396] [<80079348>] handle_percpu_irq+0x88/0xb8
[ 559.236331] [<800742e0>] generic_handle_irq+0x40/0x58
[ 559.241378] [<800155a4>] do_IRQ+0x1c/0x2c
[ 559.245398] [<8000e7d4>] plat_irq_dispatch+0xac/0xdc
[ 559.250362] [<80002980>] except_vec_vi_end+0xb4/0xc0
[ 559.255296]
[ 602.077580] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 60020 jiffies s: 35
[ 602.085092] Task dump for CPU 0:
[ 602.088373] swapper/0 R running 0 0 0 0x00100004
[ 602.094712] Stack : 80529fe0 0000ff00 00000020 00000024 00000000 00010d90 0048ac01 ffffffff
80530000 805345ec 8053470c 8057d2a4 00000001 80530000 00000001 80530000
00000000 80015494 1100ff03 00000000 80528000 80529e98 804a0000 8006c5d0
1100ff03 00000000 0003e800 805345ec 90800400 800154ac 87cc5ae4 800096b4
80680000 805773d4 80680000 80530000 80680000 805773d4 87cc0000 87cc5ae4
...
[ 602.130177] Call Trace:
[ 602.132634] [<800091e8>] __schedule+0x7cc/0x900
[ 602.137209] [<80015494>] r4k_wait_irqoff+0x0/0x20
[ 602.141912]
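
For anyone comparing traces, here is a rough Python sketch that pulls the stall summary lines out of a saved console log. It is not part of the original capture: the function name `summarize_stalls` and the regex are my own assumptions, and the sample lines are copied from the log above. It lists each report's timestamp, stalled jiffies and grace-period number, then estimates the tick rate from the first and last reports.

```python
import re

# Matches the per-stall summary line, e.g.
# "[ 199.108393] (t=15000 jiffies g=3549 c=3548 q=997)"
STALL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s*\(t=(\d+) jiffies g=(\d+) c=(\d+) q=(\d+)\)"
)

def summarize_stalls(log_text):
    """Return (timestamp_seconds, jiffies_stalled, grace_period) per stall report."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in STALL_RE.finditer(log_text)
    ]

# Summary lines copied from the console log above.
sample = """\
[ 199.108393] (t=15000 jiffies g=3549 c=3548 q=997)
[ 379.120591] (t=60003 jiffies g=3549 c=3548 q=6654)
[ 559.132480] (t=105006 jiffies g=3549 c=3548 q=10423)
"""

stalls = summarize_stalls(sample)
for ts, jiffies, gp in stalls:
    print(f"{ts:10.3f}s  stalled {jiffies} jiffies  grace period {gp}")

# Rough tick-rate estimate from the first and last reports (an inference,
# not something the kernel prints in these lines).
(t0, j0, _), (t1, j1, _) = stalls[0], stalls[-1]
print(f"approx. jiffies per second: {(j1 - j0) / (t1 - t0):.1f}")
```

The grace-period number `g=3549` is the same in all three self-detected reports, which suggests one continuous stall on CPU 0 rather than three separate ones.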

Any ideas, please?

EDIT: I can confirm backing out this commit stops the lockup. Quite why this would be the case with only Mediatek devices connecting I have no idea. Qualcomm devices are fine, Intel wifi on my laptop is fine on both bands, even the Amazon FireTV stick connects fine. Very strange edge case problem.

Would you please add your information to https://bugs.lede-project.org/index.php?do=details&task_id=471.

Done, thanks.