[workaround] Hostapd terminated by oom killer

I've got two TP-Link RE350 units serving the same SSID, but on one of them hostapd keeps dying because the oom-killer terminates the process. Any clues on how to figure out what's going on? They're identical hardware running the same build from trunk. Both units typically have close to 20 MB of free RAM, so I'm puzzled why the OOM killer runs at all, and only on one of the two.

[ 1606.389347] kthreadd invoked oom-killer: gfp_mask=0x15000c0(GFP_KERNEL_ACCOUNT), nodemask=(null),  order=1, oom_score_adj=0
[ 1606.411595] COMPACTION is disabled!!!
[ 1606.418898] CPU: 3 PID: 2 Comm: kthreadd Not tainted 4.14.34 #0
[ 1606.430674] Stack : 00000006 80540000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1606.447330]         00000000 00000000 00000000 00000000 00000000 00000001 83c4db88 2c45ddf5
[ 1606.463989]         83c4dc20 00000000 00000000 00004fb8 00000038 8042d778 00000007 00000000
[ 1606.480647]         00000000 804c0000 00066452 00000000 83c4db68 00000000 804d0000 8046c5ac
[ 1606.497314]         00000001 00200000 ffffffff 00000023 00000002 8026c268 0000000c 8051000c
[ 1606.513962]         ...
[ 1606.518821] Call Trace:
[ 1606.518840] [<8042d778>] 0x8042d778
[ 1606.530623] [<8026c268>] 0x8026c268
[ 1606.537551] [<8000fe28>] 0x8000fe28
[ 1606.544503] [<8000fe30>] 0x8000fe30
[ 1606.551442] [<80416bcc>] 0x80416bcc
[ 1606.558368] [<8006fcf4>] 0x8006fcf4
[ 1606.565338] [<800c0b54>] 0x800c0b54
[ 1606.572280] [<800bffa8>] 0x800bffa8
[ 1606.579207] [<800c09d0>] 0x800c09d0
[ 1606.586138] [<800c507c>] 0x800c507c
[ 1606.593078] [<8004af28>] 0x8004af28
[ 1606.600005] [<8002bdb8>] 0x8002bdb8
[ 1606.606952] [<8004af28>] 0x8004af28
[ 1606.613886] [<8002d310>] 0x8002d310
[ 1606.620819] [<8042f888>] 0x8042f888
[ 1606.627759] [<8004af28>] 0x8004af28
[ 1606.634700] [<8002d55c>] 0x8002d55c
[ 1606.641642] [<8042f9a0>] 0x8042f9a0
[ 1606.648572] [<8004bf78>] 0x8004bf78
[ 1606.655512] [<8004be74>] 0x8004be74
[ 1606.662471] [<8004be74>] 0x8004be74
[ 1606.669414] [<8000af78>] 0x8000af78
[ 1606.676358]
[ 1606.679565] Mem-Info:
[ 1606.684143] active_anon:6 inactive_anon:8 isolated_anon:0
[ 1606.684143]  active_file:101 inactive_file:218 isolated_file:0
[ 1606.684143]  unevictable:0 dirty:0 writeback:0 unstable:0
[ 1606.684143]  slab_reclaimable:330 slab_unreclaimable:1628
[ 1606.684143]  mapped:32 shmem:0 pagetables:42 bounce:0
[ 1606.684143]  free:2712 free_pcp:3 free_cma:0
[ 1606.746540] Node 0 active_anon:44kB inactive_anon:40kB active_file:532kB inactive_file:936kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:204kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1606.790270] Normal free:10568kB min:8192kB low:10240kB high:12288kB active_anon:52kB inactive_anon:52kB active_file:556kB inactive_file:936kB unevictable:0kB writepending:0kB present:65536kB managed:59424kB mlocked:0kB kernel_stack:512kB pagetables:168kB bounce:0kB free_pcp:20kB local_pcp:0kB free_cma:0kB
[ 1606.844144] lowmem_reserve[]: 0 0 0
[ 1606.851135] Normal: 131*4kB (UMEH) 106*8kB (UMEH) 30*16kB (UMEH) 24*32kB (UMEH) 18*64kB (ME) 9*128kB (UMH) 5*256kB (MH) 2*512kB (M) 1*1024kB (U) 1*2048kB (H) 0*4096kB = 10300kB
[ 1606.882559] 466 total pagecache pages
[ 1606.889856] 26 pages in swap cache
[ 1606.896646] Swap cache stats: add 388, delete 362, find 8/121
[ 1606.908110] Free swap  = 27900kB
[ 1606.914556] Total swap = 29692kB
[ 1606.921007] 16384 pages RAM
[ 1606.926571] 0 pages HighMem/MovableOnly
[ 1606.934264] 1528 pages reserved
[ 1606.940544] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[ 1606.957604] [  502]     0   502      298        0       4       0       15             0 ubusd
[ 1606.974814] [  503]     0   503      226        0       3       0        9             0 askfirst
[ 1606.992540] [  843]     0   843      308       73       4       0       17             0 logd
[ 1607.009582] [  844]     0   844      338        1       4       0       20             0 logread
[ 1607.027133] [  861]     0   861      383        0       3       0       24             0 rpcd
[ 1607.044181] [  919]     0   919      415       34       4       0       35             0 netifd
[ 1607.061604] [ 1110]     0  1110      267        0       3       0        9             0 dropbear
[ 1607.079391] [ 1228]     0  1228      384        0       4       0       24             0 uhttpd
[ 1607.096791] [ 1334]     0  1334      302        0       3       0       10             0 ntpd
[ 1607.113823] [ 1564]     0  1564      426        0       3       0       36             0 hostapd
[ 1607.131354] [ 1579]     0  1579      426        0       3       0       33             0 hostapd
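
For context, here's roughly how I've been watching memory on the affected unit while waiting for the next OOM kill; a minimal sketch in plain BusyBox shell, and the 30-second interval and /tmp/meminfo.log path are just values I picked. Since the failing allocation above is order=1 (an 8 kB contiguous block) with compaction disabled, the higher-order columns of /proc/buddyinfo are the interesting part, not just the MemFree total.

# Log free memory, slab usage and the buddy allocator's free lists every 30 s.
while true; do
    date
    grep -E 'MemFree|Slab' /proc/meminfo
    cat /proc/buddyinfo
    sleep 30
done >> /tmp/meminfo.log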

A quick follow-up: if I disable the 5 GHz radio and run a single hostapd instance, the issue goes away. What's the reason for running multiple hostapd processes anyway?
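
For reference, disabling the 5 GHz radio was just a UCI toggle; a sketch, assuming the 5 GHz radio is named radio1 in /etc/config/wireless (on some builds it's radio0, so check yours first):

# Disable the 5 GHz radio so only one hostapd instance gets started.
uci set wireless.radio1.disabled='1'
uci commit wireless
wifi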

Found a workaround. It seems like the vm.min_free_kbytes setting is too high at 8 MB. I've tweaked that down to 4 MB and set vm.swappiness to 1, and that seems to have stopped the OOM killer.
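
Concretely, this is what I changed; a sketch only, assuming the persistent settings live in /etc/sysctl.conf on your build (newer images may prefer a file under /etc/sysctl.d/ instead):

# Apply immediately (4096 kB = 4 MB).
sysctl -w vm.min_free_kbytes=4096
sysctl -w vm.swappiness=1

# Make it persistent across reboots.
cat >> /etc/sysctl.conf <<'EOF'
vm.min_free_kbytes=4096
vm.swappiness=1
EOF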

If your problem is solved, you can mark it as [Solved] in the topic headline.

I don't think it's solved though, just worked around for now. I'm still trying to figure out why this unit's memory usage forces the kernel's free-memory reserve to be lowered in the first place.