19.x-mac80211.ko-symbols - Lockup in 'D' state when reading debugfs (?)

Hello,

From time to time, my router locks up. The state is:

  • Network interfaces still up, responds to ping internally and IPv4 routing is up
  • Internal DHCP is unresponsive (dnsmasq is in 'D' state) - must set static IP on clients
  • Wifi clients can't connect (hostapd is in 'D' state)
  • SSH is responsive
# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.2'
DISTRIB_REVISION='r10947-65030d81f3'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 19.07.2 r10947-65030d81f3'
DISTRIB_TAINTS=''

# uname -a
Linux mat 4.14.171 #0 Thu Feb 27 21:05:12 2020 mips GNU/Linux

Here is output of 'ps':

# ps
  PID USER       VSZ STAT COMMAND
    1 root      1568 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    4 root         0 IW<  [kworker/0:0H]
    6 root         0 IW<  [mm_percpu_wq]
    7 root         0 SW   [ksoftirqd/0]
   86 root         0 SW   [oom_reaper]
   87 root         0 IW<  [writeback]
   89 root         0 SW   [kcompactd0]
   90 root         0 IW<  [crypto]
   92 root         0 IW<  [kblockd]
  111 root         0 IW<  [watchdogd]
  132 root         0 SW   [kswapd0]
  222 root         0 SW   [spi0]
  365 root         0 IW<  [ipv6_addrconf]
  367 root         0 IW<  [dsa_ordered]
  374 root         0 IW<  [kworker/0:1H]
  439 root         0 SWN  [jffs2_gcd_mtd8]
  452 root         0 SW   [scsi_eh_0]
  453 root         0 IW<  [scsi_tmf_0]
  454 root         0 SW   [usb-storage]
  604 root      1252 S    /sbin/ubusd
  608 root       924 S    /sbin/askfirst /usr/libexec/login.sh
  636 root      1028 S    /sbin/urngd
  784 root         0 IW<  [cfg80211]
  982 root      2216 S    /sbin/logd -S 1024
  984 root      1368 S    /sbin/logread -f -h mat -r 104.131.51.57 1468 -p /var/run/logread.2.pid
 1009 root      2312 S    /sbin/rpcd -s /var/run/ubus.sock -t 30
 1105 root      1080 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -s -a -p 22 -K 300 -T 3
 1171 root      1748 S    /sbin/netifd
 1214 root       952 D    /root/collect -o /mnt/sda1/prom -T 100000000
 1240 root      1216 S    /usr/sbin/crond -f -c /etc/crontabs -l 5
 1280 root      1168 S    /sbin/namematch -h cly.nanobit.org -h lun.nanobit.org -h mob.nanobit.org
 1316 root      1356 S    /usr/sbin/uhttpd -f -h /www -r mat -x /cgi-bin -t 60 -T 30 -k 20 -A 1 -n 3 -N 100 -R -p 0.0.0.0:80 -p [:
 1673 root      1780 D    /usr/sbin/hostapd -s -P /var/run/wifi-phy0.pid -B /var/run/hostapd-phy0.conf
 1761 root      1288 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.ntp.org -p 2.
 1878 dnsmasq   1360 D    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
 8467 root      1212 D    udhcpc -p /var/run/udhcpc-eth1.pid -s /lib/netifd/dhcp.script -f -t 0 -i eth1 -x hostname:mat -C -O 121
 8553 root      1420 S    /bin/sh /usr/lib/ddns/dynamic_dns_updater.sh -v 0 -S freedns -- start
14414 root         0 IW   [kworker/u2:1]
25398 root         0 IW   [kworker/u2:2]
29153 root         0 DW   [kworker/0:2]
29155 root         0 DW   [kworker/0:1]
29165 root      1464 S    /bin/gzip
29481 root      1148 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -s -a -p 22 -K 300 -T 3
30000 root         0 IW   [kworker/0:3]
30352 root         0 IW   [kworker/0:0]
30579 root      1148 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -s -a -p 22 -K 300 -T 3
30580 root      1220 S    -ash
30590 root         0 IW   [kworker/0:4]
30598 root      1212 R    ps
30600 root      1212 S    /usr/bin/nslookup mat.nanobit.org

Note many core processes stuck in 'D' (disk) state: hostapd, dnsmasq, udhcpc. 'collect' is my custom daemon for collecting statistics (including reading debugfs).

I used sysrq-trigger to get kernel stack traces for all blocked tasks:

sysrq: SysRq : Show Blocked State
  task                PC stack   pid father
collect         D    0  1214      1 0x08100000
Stack : 00000016 8014c8c4 00000007 80432c54 00000000 804c0000 87571d24 00007ffb
        804e5208 fffffff8 00000002 87571d58 fffffffd 00000001 00000000 804e520c
        00000000 804360ec 87964000 8014e31c 00000081 878ad640 00000001 80436318
        877e2b00 87964500 873171e0 801d2548 87571df0 80437944 87571dec 87f7e630
        873171e0 87964500 873171e8 87571ef4 801d2484 873171e0 868f3e20 804e520c
        ...
Call Trace:
[<8014c8c4>] 0x8014c8c4
[<80432c54>] 0x80432c54
[<804360ec>] 0x804360ec
[<8014e31c>] 0x8014e31c
[<80436318>] 0x80436318
[<801d2548>] 0x801d2548
[<80437944>] 0x80437944
[<801d2484>] 0x801d2484
[<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[<877cb6d4>] 0x877cb6d4 [mac80211@87780000+0x6c160]
[<801501b4>] 0x801501b4
[<877cb78c>] 0x877cb78c [mac80211@87780000+0x6c160]
[<80140d54>] 0x80140d54
[<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[<801d1168>] 0x801d1168
[<8013f2d0>] 0x8013f2d0
[<80155a28>] 0x80155a28
[<80140eb8>] 0x80140eb8
[<80160738>] 0x80160738
[<80140f70>] 0x80140f70
[<80155924>] 0x80155924
[<8007148c>] 0x8007148c

hostapd         D    0  1673      1 0x08100000
Stack : 873b4a00 876f8000 870fd0b0 870590c0 014000c0 00000000 804d1658 00000000
        7fffffff 00000002 870fd480 00000000 870593c0 00000000 876b9cac 87def800
        00000028 804360ec 00000000 87e5a760 87e5a760 800a72fc 00000028 804390c0
        87a0e500 00004000 000000cc 804daa84 81102100 87c04000 00000001 8009ac74
        87a0e500 8015d63c 00000000 878aa990 879eaaa0 876b9b7c 00000002 870fd480
        ...
Call Trace:
[<804360ec>] 0x804360ec
[<800a72fc>] 0x800a72fc
[<804390c0>] 0x804390c0
[<8009ac74>] 0x8009ac74
[<8015d63c>] 0x8015d63c
[<80437414>] 0x80437414
[<800a73ec>] 0x800a73ec
[<800be628>] 0x800be628
[<800bdd50>] 0x800bdd50
[<877cdeec>] 0x877cdeec [mac80211@87780000+0x6c160]
[<877884e0>] 0x877884e0 [mac80211@87780000+0x6c160]
[<87788540>] 0x87788540 [mac80211@87780000+0x6c160]
[<877885f0>] 0x877885f0 [mac80211@87780000+0x6c160]
[<8770f0f8>] 0x8770f0f8 [cfg80211@87700000+0x36230]
[<803397e8>] 0x803397e8
[<800a70f4>] 0x800a70f4
[<803394fc>] 0x803394fc
[<80337e94>] 0x80337e94
[<80338760>] 0x80338760
[<80335d0c>] 0x80335d0c
[<80337558>] 0x80337558
[<80337a34>] 0x80337a34
[<802e0c5c>] 0x802e0c5c
[<80337658>] 0x80337658
[<802e0ebc>] 0x802e0ebc
[<802e2d70>] 0x802e2d70
[<803b09f0>] 0x803b09f0
[<800ae450>] 0x800ae450
[<800a70f4>] 0x800a70f4
[<800a72fc>] 0x800a72fc
[<800a9fe8>] 0x800a9fe8
[<802e1744>] 0x802e1744
[<802df748>] 0x802df748
[<8007148c>] 0x8007148c
[<802e14e4>] 0x802e14e4

dnsmasq         D    0  1878      1 0x08100000
Stack : 87c1a840 80376b5c 00000000 8021673c 00000200 8694f380 00000000 87353d6c
        804e5208 fffffff8 00000002 87353dc8 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 00000040 804c0000 879cf6e0 7f767dcc 00000010 80436318
        7f767de4 802def64 00000010 77ee0bd8 00000001 80437944 87353ee8 00000159
        00000040 804c0000 879cf6e0 802e106c 3b9aca00 87353e0c 804e520c 8690de18
        ...
Call Trace:
[<80376b5c>] 0x80376b5c
[<8021673c>] 0x8021673c
[<804360ec>] 0x804360ec
[<80436318>] 0x80436318
[<802def64>] 0x802def64
[<80437944>] 0x80437944
[<802e106c>] 0x802e106c
[<8037f80c>] 0x8037f80c
[<8031f790>] 0x8031f790
[<802e0548>] 0x802e0548
[<8015536c>] 0x8015536c
[<80155ef4>] 0x80155ef4
[<802e1950>] 0x802e1950
[<80155458>] 0x80155458
[<8007148c>] 0x8007148c

udhcpc          D    0  8467   1171 0x08100000
Stack : 80087978 00080014 80540000 8047139c 87c272a0 873d1200 00000012 80087978
        804e5208 fffffff8 00000002 8690de18 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 87c272a0 8690de80 00000000 00005b00 00000001 80436318
        873d1200 00000001 00000055 77f9d03c 87d3a540 80437944 87d350d8 00000000
        8690de80 8690de4c 00008927 8031f790 80490000 80491b60 87353dc8 868f3e20
        ...
Call Trace:
[<80087978>] 0x80087978
[<80087978>] 0x80087978
[<804360ec>] 0x804360ec
[<80436318>] 0x80436318
[<80437944>] 0x80437944
[<8031f790>] 0x8031f790
[<8031fa7c>] 0x8031fa7c
[<80371d70>] 0x80371d70
[<803814d0>] 0x803814d0
[<802dea30>] 0x802dea30
[<8015536c>] 0x8015536c
[<802deafc>] 0x802deafc
[<80141750>] 0x80141750
[<8009fb90>] 0x8009fb90
[<80155458>] 0x80155458
[<80067250>] 0x80067250

kworker/0:2     D    0 29153      2 0x00100000
Workqueue: ipv6_addrconf 0x803bf1e8
Stack : 87ce5480 802c8f10 868f3e20 00000005 87f6dc00 00000070 00000000 04a2424c
        804e5208 fffffff8 00000002 868f3e20 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 00000000 00000000 00000000 00000000 00000000 80436318
        87c4fa80 00000005 87e87084 80234ac8 00000000 80437944 00000000 00000000
        87e87274 87faaa98 804d1658 87e585a0 804d1658 800aa2a0 8690de18 87571d58
        ...
Call Trace:
[<802c8f10>] 0x802c8f10
[<804360ec>] 0x804360ec
[<80436318>] 0x80436318
[<80234ac8>] 0x80234ac8
[<80437944>] 0x80437944
[<87faaa98>] 0x87faaa98 [gpio_button_hotplug@87faa000+0x1400]
[<800aa2a0>] 0x800aa2a0
[<800a968c>] 0x800a968c
[<803bf1f8>] 0x803bf1f8
[<8009bcf8>] 0x8009bcf8
[<8009c1c4>] 0x8009c1c4
[<8009bea4>] 0x8009bea4
[<800a1594>] 0x800a1594
[<800a146c>] 0x800a146c
[<800a146c>] 0x800a146c
[<80067118>] 0x80067118

kworker/0:1     D    0 29155      2 0x00100000
Workqueue: events 0x800be42c
Stack : 00000001 804daa84 00000000 800be50c 00000000 00000000 804e0000 fffffffe
        876b9b74 804daa84 804daa6e 804daa74 00000000 00000000 804e0000 fffffffe
        804d0000 804360ec 00000000 804d1658 87e5a760 8009d478 87e5a760 800be4ec
        87c261c0 804d1658 00000000 80436034 00000000 87e5a760 804daa74 804daa74
        00000000 804daa84 86a04100 00000000 81102100 8009bcf8 804d0d64 804d0be8
        ...
Call Trace:
[<800be50c>] 0x800be50c
[<804360ec>] 0x804360ec
[<8009d478>] 0x8009d478
[<800be4ec>] 0x800be4ec
[<80436034>] 0x80436034
[<8009bcf8>] 0x8009bcf8
[<8009c1c4>] 0x8009c1c4
[<8009bea4>] 0x8009bea4
[<800a1594>] 0x800a1594
[<800a146c>] 0x800a146c
[<800a146c>] 0x800a146c
[<80067118>] 0x80067118

Unfortunately seems kernel symbols are nowhere to be found - though there is a __ksymtab section in the kernel, and I used this to write a crude symbolizer (exported symbols only so take the symbols with a grain of salt!!):

sysrq: SysRq : Show Blocked State
  task                PC stack   pid father
collect         D    0  1214      1 0x08100000
Stack : 00000016 8014c8c4 00000007 80432c54 00000000 804c0000 87571d24 00007ffb
        804e5208 fffffff8 00000002 87571d58 fffffffd 00000001 00000000 804e520c
        00000000 804360ec 87964000 8014e31c 00000081 878ad640 00000001 80436318
        877e2b00 87964500 873171e0 801d2548 87571df0 80437944 87571dec 87f7e630
        873171e0 87964500 873171e8 87571ef4 801d2484 873171e0 868f3e20 804e520c
        ...
Call Trace:
[<8014c8c4> lookup_one_len+0x428] 0x8014c8c4
[<80432c54> scnprintf+0x24] 0x80432c54
[<804360ec> schedule+0x6c] 0x804360ec
[<8014e31c> vfs_rename+0xf8c] 0x8014e31c
[<80436318> __cond_resched_softirq+0xec] 0x80436318
[<801d2548> debugfs_create_automount+0x20f4] 0x801d2548
[<80437944> mutex_trylock+0x2b0] 0x80437944
[<801d2484> debugfs_create_automount+0x2030] 0x801d2484
[<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[<877cb6d4>] 0x877cb6d4 [mac80211@87780000+0x6c160]
[<801501b4> vfs_rename+0x2e24] 0x801501b4
[<877cb78c>] 0x877cb78c [mac80211@87780000+0x6c160]
[<80140d54> kernel_write+0x3e8] 0x80140d54
[<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[<801d1168> debugfs_create_automount+0xd14] 0x801d1168
[<8013f2d0> vfs_clone_file_prep_inodes+0x5c4] 0x8013f2d0
[<80155a28> iterate_dir+0x588] 0x80155a28
[<80140eb8> kernel_write+0x54c] 0x80140eb8
[<80160738> fd_install+0x2a4] 0x80160738
[<80140f70> kernel_write+0x604] 0x80140f70
[<80155924> iterate_dir+0x484] 0x80155924
[<8007148c> _save_fp+0x90c] 0x8007148c

hostapd         D    0  1673      1 0x08100000
Stack : 873b4a00 876f8000 870fd0b0 870590c0 014000c0 00000000 804d1658 00000000
        7fffffff 00000002 870fd480 00000000 870593c0 00000000 876b9cac 87def800
        00000028 804360ec 00000000 87e5a760 87e5a760 800a72fc 00000028 804390c0
        87a0e500 00004000 000000cc 804daa84 81102100 87c04000 00000001 8009ac74
        87a0e500 8015d63c 00000000 878aa990 879eaaa0 876b9b7c 00000002 870fd480
        ...
Call Trace:
[<804360ec> schedule+0x6c] 0x804360ec
[<800a72fc> __cond_resched_lock+0x188] 0x800a72fc
[<804390c0> schedule_timeout+0x30] 0x804390c0
[<8009ac74> call_usermodehelper_setup+0x21cc] 0x8009ac74
[<8015d63c> __destroy_inode+0x150] 0x8015d63c
[<80437414> wait_for_completion+0x9c] 0x80437414
[<800a73ec>] 0x800a73ec
[<800be628> probe_irq_off+0x2164] 0x800be628
[<800bdd50> probe_irq_off+0x188c] 0x800bdd50
[<877cdeec>] 0x877cdeec [mac80211@87780000+0x6c160]
[<877884e0>] 0x877884e0 [mac80211@87780000+0x6c160]
[<87788540>] 0x87788540 [mac80211@87780000+0x6c160]
[<877885f0>] 0x877885f0 [mac80211@87780000+0x6c160]
[<8770f0f8>] 0x8770f0f8 [cfg80211@87700000+0x36230]
[<803397e8> genl_unregister_family+0x4e8] 0x803397e8
[<800a70f4> set_user_nice+0x20c] 0x800a70f4
[<803394fc> genl_unregister_family+0x1fc] 0x803394fc
[<80337e94> netlink_rcv_skb+0xd4] 0x80337e94
[<80338760> genlmsg_put+0xdc] 0x80338760
[<80335d0c> netlink_broadcast+0x768] 0x80335d0c
[<80337558> netlink_unicast+0x180] 0x80337558
[<80337a34> netlink_unicast+0x65c] 0x80337a34
[<802e0c5c> kernel_sendmsg+0x5dc] 0x802e0c5c
[<80337658> netlink_unicast+0x280] 0x80337658
[<802e0ebc> kernel_sendmsg+0x83c] 0x802e0ebc
[<802e2d70> sock_init_data+0x344] 0x802e2d70
[<803b09f0> xfrm_replay_seqhi+0x4d28] 0x803b09f0
[<800ae450> remove_wait_queue+0x220] 0x800ae450
[<800a70f4> set_user_nice+0x20c] 0x800a70f4
[<800a72fc> __cond_resched_lock+0x188] 0x800a72fc
[<800a9fe8> default_wake_function+0x2bfc] 0x800a9fe8
[<802e1744> kernel_sendmsg+0x10c4] 0x802e1744
[<802df748> sock_create_kern+0xc8] 0x802df748
[<8007148c> _save_fp+0x90c] 0x8007148c
[<802e14e4> kernel_sendmsg+0xe64] 0x802e14e4

dnsmasq         D    0  1878      1 0x08100000
Stack : 87c1a840 80376b5c 00000000 8021673c 00000200 8694f380 00000000 87353d6c
        804e5208 fffffff8 00000002 87353dc8 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 00000040 804c0000 879cf6e0 7f767dcc 00000010 80436318
        7f767de4 802def64 00000010 77ee0bd8 00000001 80437944 87353ee8 00000159
        00000040 804c0000 879cf6e0 802e106c 3b9aca00 87353e0c 804e520c 8690de18
        ...
Call Trace:
[<80376b5c> udp_encap_enable+0x4e8] 0x80376b5c
[<8021673c> import_iovec+0x40] 0x8021673c
[<804360ec> schedule+0x6c] 0x804360ec
[<80436318> __cond_resched_softirq+0xec] 0x80436318
[<802def64> sock_create_lite+0x134] 0x802def64
[<80437944> mutex_trylock+0x2b0] 0x80437944
[<802e106c> kernel_sendmsg+0x9ec] 0x802e106c
[<8037f80c> inet_confirm_addr+0x1aec] 0x8037f80c
[<8031f790> dev_load+0x18] 0x8031f790
[<802e0548> kernel_sock_ip_overhead+0x55c] 0x802e0548
[<8015536c> generic_block_fiemap+0x8f8] 0x8015536c
[<80155ef4> poll_schedule_timeout+0x58] 0x80155ef4
[<802e1950> kernel_sendmsg+0x12d0] 0x802e1950
[<80155458> generic_block_fiemap+0x9e4] 0x80155458
[<8007148c> _save_fp+0x90c] 0x8007148c

udhcpc          D    0  8467   1171 0x08100000
Stack : 80087978 00080014 80540000 8047139c 87c272a0 873d1200 00000012 80087978
        804e5208 fffffff8 00000002 8690de18 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 87c272a0 8690de80 00000000 00005b00 00000001 80436318
        873d1200 00000001 00000055 77f9d03c 87d3a540 80437944 87d350d8 00000000
        8690de80 8690de4c 00008927 8031f790 80490000 80491b60 87353dc8 868f3e20
        ...
Call Trace:
[<80087978> abort+0x1b0] 0x80087978
[<80087978> abort+0x1b0] 0x80087978
[<804360ec> schedule+0x6c] 0x804360ec
[<80436318> __cond_resched_softirq+0xec] 0x80436318
[<80437944> mutex_trylock+0x2b0] 0x80437944
[<8031f790> dev_load+0x18] 0x8031f790
[<8031fa7c> dev_load+0x304] 0x8031fa7c
[<80371d70> ip4_datagram_connect+0x894] 0x80371d70
[<803814d0> inet_release+0x60] 0x803814d0
[<802dea30> sock_register+0xd4] 0x802dea30
[<8015536c> generic_block_fiemap+0x8f8] 0x8015536c
[<802deafc> sock_register+0x1a0] 0x802deafc
[<80141750> kernel_write+0xde4] 0x80141750
[<8009fb90> __task_pid_nr_ns+0xd74] 0x8009fb90
[<80155458> generic_block_fiemap+0x9e4] 0x80155458
[<80067250> mips_elf_read_implies_exec+0x3d0] 0x80067250

kworker/0:2     D    0 29153      2 0x00100000
Workqueue: ipv6_addrconf 0x803bf1e8
Stack : 87ce5480 802c8f10 868f3e20 00000005 87f6dc00 00000070 00000000 04a2424c
        804e5208 fffffff8 00000002 868f3e20 fffffffd 00000000 00000000 804e520c
        00000000 804360ec 00000000 00000000 00000000 00000000 00000000 80436318
        87c4fa80 00000005 87e87084 80234ac8 00000000 80437944 00000000 00000000
        87e87274 87faaa98 804d1658 87e585a0 804d1658 800aa2a0 8690de18 87571d58
        ...
Call Trace:
[<802c8f10> fixed_phy_update_state+0x6618] 0x802c8f10
[<804360ec> schedule+0x6c] 0x804360ec
[<80436318> __cond_resched_softirq+0xec] 0x80436318
[<80234ac8> devm_gpio_free+0x2414] 0x80234ac8
[<80437944> mutex_trylock+0x2b0] 0x80437944
[<87faaa98>] 0x87faaa98 [gpio_button_hotplug@87faa000+0x1400]
[<800aa2a0> default_wake_function+0x2eb4] 0x800aa2a0
[<800a968c> default_wake_function+0x22a0] 0x800a968c
[<803bf1f8> ipv6_dev_get_saddr+0xe68] 0x803bf1f8
[<8009bcf8> cancel_delayed_work+0x274] 0x8009bcf8
[<8009c1c4> cancel_delayed_work+0x740] 0x8009c1c4
[<8009bea4> cancel_delayed_work+0x420] 0x8009bea4
[<800a1594> kthread_create_on_node+0x14c] 0x800a1594
[<800a146c> kthread_create_on_node+0x24] 0x800a146c
[<800a146c> kthread_create_on_node+0x24] 0x800a146c
[<80067118> mips_elf_read_implies_exec+0x298] 0x80067118

kworker/0:1     D    0 29155      2 0x00100000
Workqueue: events 0x800be42c
Stack : 00000001 804daa84 00000000 800be50c 00000000 00000000 804e0000 fffffffe
        876b9b74 804daa84 804daa6e 804daa74 00000000 00000000 804e0000 fffffffe
        804d0000 804360ec 00000000 804d1658 87e5a760 8009d478 87e5a760 800be4ec
        87c261c0 804d1658 00000000 80436034 00000000 87e5a760 804daa74 804daa74
        00000000 804daa84 86a04100 00000000 81102100 8009bcf8 804d0d64 804d0be8
        ...
Call Trace:
[<800be50c> probe_irq_off+0x2048] 0x800be50c
[<804360ec> schedule+0x6c] 0x804360ec
[<8009d478> flush_workqueue+0x734] 0x8009d478
[<800be4ec> probe_irq_off+0x2028] 0x800be4ec
[<80436034> clkdev_alloc+0x1758] 0x80436034
[<8009bcf8> cancel_delayed_work+0x274] 0x8009bcf8
[<8009c1c4> cancel_delayed_work+0x740] 0x8009c1c4
[<8009bea4> cancel_delayed_work+0x420] 0x8009bea4
[<800a1594> kthread_create_on_node+0x14c] 0x800a1594
[<800a146c> kthread_create_on_node+0x24] 0x800a146c
[<800a146c> kthread_create_on_node+0x24] 0x800a146c
[<80067118> mips_elf_read_implies_exec+0x298] 0x80067118

Packaging kernel symbols for releases before stripping would really help here!

Any idea what might be going on? hostapd seems like it called into netlink, which entered mac80211 but I'm not sure about the later frames.

'collect' may have been doing iterate_dir, which is plausible, that's what it does most of the time, but not sure what went wrong after that.

I'm thinking about setting some simple watchdog that checks for too many 'D' states on the system and triggers a crash. Any existing OpenWrt packages / options for this?

Happened again:

$ ./ksymoops.py openwrt-19.07.2-ar71xx-generic-vmlinux.elf  <dmesg-20201029.txt
[2897523.633865] sysrq: SysRq : Show Blocked State
[2897523.638408]   task                PC stack   pid father
[2897523.643875] collect         D    0  1215      1 0x08100000
[2897523.649548] Stack : 00000016 8014c8c4 00000007 80432c54 87369ce4 804c0000 87369d24 00007ffb
[2897523.658068]         804e5208 fffffff8 00000002 87369d58 fffffffd 00000000 00000000 804e520c
[2897523.666605]         00000000 804360ec 87964000 8014e31c 00000081 878ad640 00000001 80436318
[2897523.675140]         877e2b00 87964500 8698ebe0 801d2548 87369df0 80437944 87369dec 8747d630
[2897523.683790]         8698ebe0 87964500 8698ebe8 87369ef4 801d2484 8698ebe0 8757be20 804e520c
[2897523.692340]         ...
[2897523.694955] Call Trace:
[2897523.694966] [<8014c8c4> lookup_one_len+0x428] 0x8014c8c4
[2897523.701238] [<80432c54> scnprintf+0x24] 0x80432c54
[2897523.704961] [<804360ec> schedule+0x6c] 0x804360ec
[2897523.708626] [<8014e31c> vfs_rename+0xf8c] 0x8014e31c
[2897523.712306] [<80436318> __cond_resched_softirq+0xec] 0x80436318
[2897523.715970] [<801d2548> debugfs_create_automount+0x20f4] 0x801d2548
[2897523.719648] [<80437944> mutex_trylock+0x2b0] 0x80437944
[2897523.723344] [<801d2484> debugfs_create_automount+0x2030] 0x801d2484
[2897523.727047] [<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[2897523.733143] [<877cb6d4>] 0x877cb6d4 [mac80211@87780000+0x6c160]
[2897523.739221] [<801501b4> vfs_rename+0x2e24] 0x801501b4
[2897523.742923] [<877cb78c>] 0x877cb78c [mac80211@87780000+0x6c160]
[2897523.749010] [<80140d54> kernel_write+0x3e8] 0x80140d54
[2897523.752678] [<877cb620>] 0x877cb620 [mac80211@87780000+0x6c160]
[2897523.758783] [<801d1168> debugfs_create_automount+0xd14] 0x801d1168
[2897523.762487] [<8013f2d0> vfs_clone_file_prep_inodes+0x5c4] 0x8013f2d0
[2897523.766142] [<80155a28> iterate_dir+0x588] 0x80155a28
[2897523.769867] [<80140eb8> kernel_write+0x54c] 0x80140eb8
[2897523.773569] [<80160738> fd_install+0x2a4] 0x80160738
[2897523.777217] [<80140f70> kernel_write+0x604] 0x80140f70
[2897523.780882] [<80155924> iterate_dir+0x484] 0x80155924
[2897523.784536] [<8007148c> _save_fp+0x90c] 0x8007148c
[2897523.788194]
[2897523.789862] hostapd         D    0  1668      1 0x08100000
[2897523.795510] Stack : 87cf05d0 87c261f0 804d1688 800aa6ec 014000c0 00000000 804d1658 00000000
[2897523.804039]         7fffffff 00000002 87304480 00000000 873d5e40 00000000 869c5cac 875c4000
[2897523.812568]         00000028 804360ec 00000000 87e59680 87e59680 800a72fc 00000028 804390c0
[2897523.821096]         87a0e500 00004000 000000cc 804daa84 81102100 87c04000 00000001 8009ac74
[2897523.829625]         87a0e500 8015d63c 00000000 87bd2330 879eb770 869c5b7c 00000002 87304480
[2897523.838146]         ...
[2897523.840768] Call Trace:
[2897523.840808] [<800aa6ec> default_wake_function+0x3300] 0x800aa6ec
[2897523.847104] [<804360ec> schedule+0x6c] 0x804360ec
[2897523.850798] [<800a72fc> __cond_resched_lock+0x188] 0x800a72fc
[2897523.854454] [<804390c0> schedule_timeout+0x30] 0x804390c0
[2897523.858133] [<8009ac74> call_usermodehelper_setup+0x21cc] 0x8009ac74
[2897523.861817] [<8015d63c> __destroy_inode+0x150] 0x8015d63c
[2897523.865503] [<80437414> wait_for_completion+0x9c] 0x80437414
[2897523.869192] [<800a73ec>] 0x800a73ec
[2897523.872885] [<800be628> probe_irq_off+0x2164] 0x800be628
[2897523.876534] [<800bdd50> probe_irq_off+0x188c] 0x800bdd50
[2897523.880206] [<877cdeec>] 0x877cdeec [mac80211@87780000+0x6c160]
[2897523.886316] [<877884e0>] 0x877884e0 [mac80211@87780000+0x6c160]
[2897523.892473] [<87788540>] 0x87788540 [mac80211@87780000+0x6c160]
[2897523.898612] [<877885f0>] 0x877885f0 [mac80211@87780000+0x6c160]
[2897523.904752] [<8770f0f8>] 0x8770f0f8 [cfg80211@87700000+0x36230]
[2897523.910882] [<803397e8> genl_unregister_family+0x4e8] 0x803397e8
[2897523.914537] [<800a70f4> set_user_nice+0x20c] 0x800a70f4
[2897523.918312] [<803394fc> genl_unregister_family+0x1fc] 0x803394fc
[2897523.921995] [<80337e94> netlink_rcv_skb+0xd4] 0x80337e94
[2897523.925683] [<80338760> genlmsg_put+0xdc] 0x80338760
[2897523.929346] [<80335d0c> netlink_broadcast+0x768] 0x80335d0c
[2897523.933037] [<80337558> netlink_unicast+0x180] 0x80337558
[2897523.936711] [<80337a34> netlink_unicast+0x65c] 0x80337a34
[2897523.940397] [<802e0c5c> kernel_sendmsg+0x5dc] 0x802e0c5c
[2897523.944074] [<80337658> netlink_unicast+0x280] 0x80337658
[2897523.947726] [<802e0ebc> kernel_sendmsg+0x83c] 0x802e0ebc
[2897523.951411] [<802e2d70> sock_init_data+0x344] 0x802e2d70
[2897523.955061] [<803b09f0> xfrm_replay_seqhi+0x4d28] 0x803b09f0
[2897523.958714] [<800ae450> remove_wait_queue+0x220] 0x800ae450
[2897523.962404] [<800a70f4> set_user_nice+0x20c] 0x800a70f4
[2897523.966082] [<800a72fc> __cond_resched_lock+0x188] 0x800a72fc
[2897523.969779] [<800a9fe8> default_wake_function+0x2bfc] 0x800a9fe8
[2897523.973471] [<802e1744> kernel_sendmsg+0x10c4] 0x802e1744
[2897523.977139] [<802df748> sock_create_kern+0xc8] 0x802df748
[2897523.980811] [<8007148c> _save_fp+0x90c] 0x8007148c
[2897523.984469] [<802e14e4> kernel_sendmsg+0xe64] 0x802e14e4
[2897523.988122]
[2897523.989792] dnsmasq         D    0  1879      1 0x08100000
[2897523.995442] Stack : 86b4ecc0 80376b5c 00000000 8021673c 00000200 869a1900 00000000 87331d6c
[2897524.003972]         804e5208 fffffff8 00000002 87331dc8 fffffffd 00000000 00000000 804e520c
[2897524.012499]         00000000 804360ec 00000040 804c0000 879cf9a0 7f9f6bec 00000010 80436318
[2897524.021028]         7f9f6c04 802def64 00000010 77f2cbd8 00000001 80437944 87331ee8 0000012c
[2897524.029557]         00000040 804c0000 879cf9a0 802e106c 3b9aca00 87331e0c 804e520c 8757be20
[2897524.038078]         ...
[2897524.040711] Call Trace:
[2897524.040718] [<80376b5c> udp_encap_enable+0x4e8] 0x80376b5c
[2897524.046974] [<8021673c> import_iovec+0x40] 0x8021673c
[2897524.050671] [<804360ec> schedule+0x6c] 0x804360ec
[2897524.054340] [<80436318> __cond_resched_softirq+0xec] 0x80436318
[2897524.057987] [<802def64> sock_create_lite+0x134] 0x802def64
[2897524.061654] [<80437944> mutex_trylock+0x2b0] 0x80437944
[2897524.065334] [<802e106c> kernel_sendmsg+0x9ec] 0x802e106c
[2897524.069022] [<8037f80c> inet_confirm_addr+0x1aec] 0x8037f80c
[2897524.072702] [<8031f790> dev_load+0x18] 0x8031f790
[2897524.076357] [<802e0548> kernel_sock_ip_overhead+0x55c] 0x802e0548
[2897524.080050] [<8015536c> generic_block_fiemap+0x8f8] 0x8015536c
[2897524.083705] [<80155ef4> poll_schedule_timeout+0x58] 0x80155ef4
[2897524.087373] [<802e1950> kernel_sendmsg+0x12d0] 0x802e1950
[2897524.091052] [<80155458> generic_block_fiemap+0x9e4] 0x80155458
[2897524.094702] [<8007148c> _save_fp+0x90c] 0x8007148c
[2897524.098361]
[2897524.100031] kworker/u2:0    D    0 12848      2 0x00100000
[2897524.105691] Workqueue: phy0 0x87796404 [mac80211@87780000+0x6c160]
[2897524.112037] Stack : 874c9838 00000000 00000000 801002d8 874c984c 113905f8 804e0000 801f81f8
[2897524.120561]         87079034 fffffff8 00000002 86423df0 fffffffd 00000000 00000000 87079038
[2897524.129082]         00000000 804360ec 80479d40 801003e4 874c9838 1138fa40 86423db0 80436318
[2897524.137622]         874c9838 874c98a8 00004382 00000000 00000d7f 80437944 00000000 00000000
[2897524.146147]         00000000 00000000 804e0000 00000001 874c98c4 8016d484 87079038 87079038
[2897524.154678]         ...
[2897524.157293] Call Trace:
[2897524.157321] [<801002d8> balance_dirty_pages_ratelimited+0xe00] 0x801002d8
[2897524.163609] [<801f81f8> blk_finish_plug+0x20] 0x801f81f8
[2897524.167286] [<804360ec> schedule+0x6c] 0x804360ec
[2897524.170945] [<801003e4> balance_dirty_pages_ratelimited+0xf0c] 0x801003e4
[2897524.174613] [<80436318> __cond_resched_softirq+0xec] 0x80436318
[2897524.178291] [<80437944> mutex_trylock+0x2b0] 0x80437944
[2897524.181968] [<8016d484> sync_inode_metadata+0x388] 0x8016d484
[2897524.185724] [<877964cc>] 0x877964cc [mac80211@87780000+0x6c160]
[2897524.191813] [<800a968c> default_wake_function+0x22a0] 0x800a968c
[2897524.195493] [<80436034> clkdev_alloc+0x1758] 0x80436034
[2897524.199170] [<8009bcf8> cancel_delayed_work+0x274] 0x8009bcf8
[2897524.202912] [<8009c1c4> cancel_delayed_work+0x740] 0x8009c1c4
[2897524.206567] [<8009bea4> cancel_delayed_work+0x420] 0x8009bea4
[2897524.210248] [<800a1594> kthread_create_on_node+0x14c] 0x800a1594
[2897524.213899] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.217552] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.221219] [<80067118> mips_elf_read_implies_exec+0x298] 0x80067118
[2897524.224875]
[2897524.226531] kworker/0:1     D    0 22241      2 0x00100000
[2897524.232191] Workqueue: events 0x800be42c
[2897524.236279] Stack : 00000001 804daa84 00000000 800be50c 87c4fa80 00000005 87e87084 80234ac8
[2897524.244809]         869c5b74 804daa84 804daa6c 804daa74 00000000 00000000 804e0000 fffffffe
[2897524.253335]         804d0000 804360ec 00000000 804d1658 87e59680 8009d478 87e59680 800be4ec
[2897524.261864]         87c261c0 804d1658 00000000 80436034 00000001 87e59680 804daa74 804daa74
[2897524.270393]         00000000 804daa84 87f7d080 00000000 81102100 8009bcf8 804d0d64 804d0be8
[2897524.278914]         ...
[2897524.281538] Call Trace:
[2897524.281544] [<800be50c> probe_irq_off+0x2048] 0x800be50c
[2897524.287834] [<80234ac8> devm_gpio_free+0x2414] 0x80234ac8
[2897524.291495] [<804360ec> schedule+0x6c] 0x804360ec
[2897524.295163] [<8009d478> flush_workqueue+0x734] 0x8009d478
[2897524.298829] [<800be4ec> probe_irq_off+0x2028] 0x800be4ec
[2897524.302508] [<80436034> clkdev_alloc+0x1758] 0x80436034
[2897524.306185] [<8009bcf8> cancel_delayed_work+0x274] 0x8009bcf8
[2897524.309882] [<8009c1c4> cancel_delayed_work+0x740] 0x8009c1c4
[2897524.313617] [<8009bea4> cancel_delayed_work+0x420] 0x8009bea4
[2897524.317281] [<800a1594> kthread_create_on_node+0x14c] 0x800a1594
[2897524.320942] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.324621] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.328288] [<80067118> mips_elf_read_implies_exec+0x298] 0x80067118
[2897524.331953]
[2897524.333615] kworker/0:0     D    0 22502      2 0x00100000
[2897524.339264] Workqueue: ipv6_addrconf 0x803bf1e8
[2897524.343963] Stack : 87ce5480 802c8f10 8757be20 00000005 87f73c00 00000032 00000000 0b6acc02
[2897524.352488]         804e5208 fffffff8 00000002 8757be20 fffffffd 00000000 00000000 804e520c
[2897524.361017]         00000000 804360ec 00000000 00000000 00000000 00000000 00000000 80436318
[2897524.369546]         87c4fa80 00000005 87e87084 80234ac8 00000000 80437944 00000000 00000000
[2897524.378067]         87e87274 87f90a98 804d1658 87c945a0 804d1658 800aa2a0 87331dc8 87369d58
[2897524.386596]         ...
[2897524.389210] Call Trace:
[2897524.389238] [<802c8f10> fixed_phy_update_state+0x6618] 0x802c8f10
[2897524.395542] [<804360ec> schedule+0x6c] 0x804360ec
[2897524.399198] [<80436318> __cond_resched_softirq+0xec] 0x80436318
[2897524.402894] [<80234ac8> devm_gpio_free+0x2414] 0x80234ac8
[2897524.406548] [<80437944> mutex_trylock+0x2b0] 0x80437944
[2897524.410258] [<87f90a98>] 0x87f90a98 [gpio_button_hotplug@87f90000+0x1400]
[2897524.417212] [<800aa2a0> default_wake_function+0x2eb4] 0x800aa2a0
[2897524.420913] [<800a968c> default_wake_function+0x22a0] 0x800a968c
[2897524.424569] [<803bf1f8> ipv6_dev_get_saddr+0xe68] 0x803bf1f8
[2897524.428236] [<8009bcf8> cancel_delayed_work+0x274] 0x8009bcf8
[2897524.431898] [<8009c1c4> cancel_delayed_work+0x740] 0x8009c1c4
[2897524.435550] [<8009bea4> cancel_delayed_work+0x420] 0x8009bea4
[2897524.439218] [<800a1594> kthread_create_on_node+0x14c] 0x800a1594
[2897524.442884] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.446535] [<800a146c> kthread_create_on_node+0x24] 0x800a146c
[2897524.450200] [<80067118> mips_elf_read_implies_exec+0x298] 0x80067118
[2897524.453859]

your issue seems to be with custom code and not openwrt... either way... you should really test under the latest release...

stracing your binary and or lsof/stat/perf/inotify/vm.block_dump etc. should be enough for you to track down what is blocking... otherwise without code... nobody can guess what you are doing...

having said that... the error is pretty self explanatory

I think the adequate fix, it to correct the runaway / unchecked behavior in your primary code...

Sure, collect is really nothing special. Here is the source: https://github.com/cpatulea/src/blob/master/mat/prometheus/collect.c

It is basically a loop:

for F in /sys/kernel/debug/ieee80211/phy0/*; do
  cat $F > ...
done

I have been trying to symbolize the stack frames in modules, like [mac80211@87780000+0x6c160], but am having difficulty. readelf -s mac80211.ko shows some symbols, but they are all below ~0x59000, so I can't seem to make sense of the "+0x6c160" offset.

Also, unlike the main kernel image, seems kernel modules have no section load address (in readelf -S), so I guess there is someone unknown section load address which I would need to be able to make sense of the "+0x6c160". Is there some typical layout the kernel uses for module sections, which I could use to decode that offset?

[edit] Just realized "+0x6c160" is not the text offset! Maybe it is the module size, but anyway looking more closely at the stack:

[2897523.876534] [<800bdd50>] 0x800bdd50
[2897523.880206] [<877cdeec>] 0x877cdeec [mac80211@87780000+0x6c160]
[2897523.886316] [<877884e0>] 0x877884e0 [mac80211@87780000+0x6c160]
[2897523.892473] [<87788540>] 0x87788540 [mac80211@87780000+0x6c160]

the text offsets are 0x877cdeec - 0x87780000 = 0x4deec, 0x84e0, etc, which I think I can map using my script!

1 Like

typically, some targets have a kernel.debug in the downloads... I checked for your board/release but did not find one... ( possibly snapshot only )

you can also build from source and specify CFLAGS that retain/add debug symbols...

also, snapshot builds contain additional symbols not available in release builds...

not too sure about that one... have to defer to guru-level support...

you really should be at minimum testing ( -s &| -f &| -L etc. ) here... ( but as it's C... and you've been a good sport and shared the source... hopefully someone will give you some tips to potentiate your while/opendir logic )

Hmm, seems there are supposed to be some files in /sys/module to give section addresses:

but my router doesn't have these. Maybe it's the wrong kernel version or this feature is disabled in openwrt.

I read about debugging kernel modules in gdb: https://svenschwermer.de/2016/11/02/debugging-openwrt_lede-kernel.html

tried loading the module:

$ openwrt-sdk-19.07.2-ar71xx-generic_gcc-7.5.0_musl.Linux-x86_64/staging_dir/toolchain-mips_24kc_gcc-7.5.0_musl/bin/mips-openwrt-linux-gdb openwrt-19.07.2-ar71xx-generic-vmlinux.elf
...
(gdb) add-symbol-file mac80211.ko 0x87780000
add symbol table from file "mac80211.ko" at
	.text_addr = 0x87780000
(y or n) y
Reading symbols from mac80211.ko...
(No debugging symbols found in mac80211.ko)
(gdb) info functions
All defined functions:

Non-debugging symbols:
0x00000000  _458
0x0000004c  _530
0x00000144  _24
0x87780000  _305
0x87780060  _57
0x87780804  _424
0x877808f4  _152
0x877809bc  _287
0x877814ec  _93
0x877816a4  _171
0x87781998  _69
0x87781ae0  _304
0x87781b00  _294
...

which matches the symbols I saw in the module section ksymtab, so I think there is something special about module symbols and they don't directly map to their function names.

Also here is the kernel module load section layout logic:

I think it would be difficult to reproduce it exactly, but possible.

Sorry, I'm still on 4.14.171 and mac80211 locked up on this again. Relevant processes in D state:

 1383 root      1780 D    /usr/sbin/hostapd -s -P /var/run/wifi-phy0.pid -B /v
 1472 root      1088 D    /root/collect -o /mnt/evo/prom -T 100000000
 3029 root      1780 D    /usr/sbin/hostapd -s -P /var/run/wifi-phy1.pid -B /v
21882 root         0 DW   [kworker/u2:2]
25690 root         0 DW   [kworker/0:0]
25902 root         0 DW   [kworker/0:3]

Looks like the file that my metrics collector was reading was this one:

root@mat:/proc/1472/fd# ls -l
...
lr-x------    1 root     root            64 Feb 13 20:03 5 -> /sys/kernel/debug/ieee80211
l-wx------    1 root     root            64 Feb 13 18:51 6 -> pipe:[557767]
lr-x------    1 root     root            64 Feb 13 20:03 7 -> /sys/kernel/debug/ieee80211/phy1/statistics
lr-x------    1 root     root            64 Feb 13 20:03 8 -> /sys/kernel/debug/ieee80211/phy1/statistics/dot11RTSSuccessCount

collect syscall is:

root@mat:/proc/1472# cat syscall
0 0x8 0x7f749480 0x2 0x77fd3830 0x0 0x0 0x7f749468 0x77f92344

and hostapd are stuck on syscalls:

root@mat:/proc/1472# pgrep hostapd
1383
3029
root@mat:/proc/1472# cat /proc/1383/syscall
0 0x5 0x7fe0b748 0x0 0x0 0x0 0x0 0x7fe0b678 0x77ed78a4
root@mat:/proc/1472# cat /proc/3029/syscall
0 0x5 0x7fb49610 0x0 0x0 0x0 0x0 0x7fb49540 0x77e678a4

note sure how to look this up in Linux MIPS syscall table (https://syscalls.w3challs.com/?arch=mips_o32) as syscalls there appear to start at 4000. [edit:] Ah, looks like /proc/$pid/syscall shows the relative syscall number from __NR_O32_Linux (https://elixir.bootlin.com/linux/v4.14.171/source/arch/mips/kernel/scall32-o32.S#L128). [edit:] but hmm the first field is "0" in both so that's syscall 0? https://elixir.bootlin.com/linux/v4.14.171/source/fs/proc/base.c#L648

er, is syscall_get_nr broken on mips?

x86 pulls nr from regs (makes sense): https://elixir.bootlin.com/linux/v4.14.171/source/arch/x86/include/asm/syscall.h#L43

mips always get it from current task: https://elixir.bootlin.com/linux/v5.16.10/source/arch/mips/include/asm/syscall.h#L38

Looks like recent builds have started publishing kernel symbols!

https://downloads.openwrt.org/releases/21.02.2/targets/malta/be/kernel-debug.tar.zst

This is great news, it will make it much easier to debug problems like this.

Also asked on #mipslinux about the syscall file, it seems like it's a genuine bug in upstream kernel.