From watch /proc/meminfo, it looks like Slab and SUnreclaim seem to go up consistently, while Active, Active(anon), AnonPages, Mapped, and Committed_AS went up a bit at first and seemed to stabilize over the hour I was watching, so... kernel leak?
Can you try this command to see what might be using memory?
ps -e -o pid,vsz,comm= | sort -n -k 2
root@LEDE-3200ACM:~# ps -e -o pid,vsz,comm= | sort -n -k 2
PID VSZ
1 1328 procd
2 0 kthreadd
4 0 kworker/0:0H
6 0 mm_percpu_wq
7 0 ksoftirqd/0
8 0 rcu_sched
9 0 rcu_bh
10 0 migration/0
11 0 cpuhp/0
12 0 cpuhp/1
13 0 migration/1
14 0 ksoftirqd/1
16 0 kworker/1:0H
209 0 oom_reaper
210 0 writeback
211 0 crypto
213 0 kblockd
215 0 ata_sff
247 0 watchdogd
281 0 kswapd0
350 0 pencrypt
352 0 pdecrypt
450 0 scsi_eh_0
451 0 scsi_tmf_0
454 0 scsi_eh_1
455 0 scsi_tmf_1
618 0 irq/43-mmc0
651 0 irq/39-f1090000
652 0 irq/40-f1090000
674 0 ipv6_addrconf
676 0 dsa_ordered
714 0 ubi_bgt0d
721 0 kworker/1:1H
722 0 kworker/0:1H
726 0 irq/47-gpio-key
727 0 irq/48-gpio-key
836 0 ubifs_bgt0_1
848 0 ubi_bgt1d
853 0 ubifs_bgt1_0
1031 988 ubusd
1032 676 askfirst
1100 0 bond0
1489 0 cryptodev_queue
1495 0 cfg80211
1965 0 btmrvl_main_ser
1966 0 kworker/u5:0
1975 0 kworker/u5:2
2011 0 krfcommd
2362 1004 logd
2379 1332 rpcd
2389 1520 haveged
2470 1448 netifd
2571 1220 odhcpd
2619 816 dropbear
4153 1616 hostapd
4268 1688 dnscrypt-proxy
4675 1620 hostapd
5021 3408 uhttpd
5153 2576 smbd
5154 2620 nmbd
5413 3464 collectd
5848 1064 sh
5878 1068 ntpd
18225 0 kworker/0:1
19287 880 dropbear
19288 1064 ash
19481 0 kworker/1:2
21336 0 kworker/0:2
21814 0 kworker/1:0
22931 0 kworker/u4:0
23284 0 kworker/u4:1
23574 1064 sleep
23663 0 kworker/u4:2
23842 1460 ps
23843 1064 sort
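To put a single number on the listing above, the VSZ column can be summed. This is a minimal sketch; the function name sum_vsz is made up for illustration. VSZ is virtual size in kB and counts shared mappings once per process, so the total overstates what userspace really holds, and kernel threads always report 0.

```shell
# sum_vsz: read `ps -e -o pid,vsz,comm=` output on stdin and print the
# total of the VSZ column in kB. (Hypothetical helper name.)
sum_vsz() {
    # Only rows whose second column is a number are counted, which
    # skips the "PID VSZ" header line.
    awk '$2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
}

# Usage:
#   ps -e -o pid,vsz,comm= | sum_vsz
```

If the total stays roughly flat while MemFree keeps dropping, that points away from userspace processes.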
and /proc/meminfo is at
root@LEDE-3200ACM:~# cat /proc/meminfo
MemTotal: 510920 kB
MemFree: 144496 kB
MemAvailable: 135136 kB
Buffers: 6488 kB
Cached: 19220 kB
SwapCached: 0 kB
Active: 22280 kB
Inactive: 6808 kB
Active(anon): 4320 kB
Inactive(anon): 256 kB
Active(file): 17960 kB
Inactive(file): 6552 kB
Unevictable: 4 kB
Mlocked: 4 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 510920 kB
LowFree: 144496 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 3408 kB
Mapped: 5884 kB
Shmem: 1196 kB
Slab: 169496 kB
SReclaimable: 5916 kB
SUnreclaim: 163580 kB
KernelStack: 664 kB
PageTables: 348 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255460 kB
Committed_AS: 8980 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Notable from meminfo compared to an hour ago: Slab was at 167764, SUnreclaim was at 161860, and MemAvailable was at 155344, but I can't even find 20MB worth of additional memory usage in /proc/meminfo's breakdown?
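The mismatch can be checked from the numbers quoted above: Slab grew by 169496 - 167764 = 1732 kB and SUnreclaim by 163580 - 161860 = 1720 kB, while MemAvailable fell by 155344 - 135136 = 20208 kB. Below is a rough sketch for doing that comparison over every field, assuming two snapshots were saved with something like `cat /proc/meminfo > file`. The function name meminfo_delta and the file names are made up for illustration.

```shell
# meminfo_delta OLD NEW [MIN_KB]
# Print each /proc/meminfo field whose value differs by at least MIN_KB
# (default 1) between two saved snapshots, with the signed change.
# (Hypothetical helper name.)
meminfo_delta() {
    awk -v min="${3:-1}" '
        # First file: remember each value, keyed by field name ("Slab:").
        NR == FNR { old[$1] = $2; next }
        # Second file: compare against the remembered value.
        ($1 in old) {
            d = $2 - old[$1]
            if (d >= min || -d >= min) {
                name = $1
                sub(/:$/, "", name)
                printf "%s %+d kB\n", name, d
            }
        }
    ' "$1" "$2"
}

# Usage:
#   cat /proc/meminfo > /tmp/meminfo.old
#   ... wait an hour ...
#   cat /proc/meminfo > /tmp/meminfo.new
#   meminfo_delta /tmp/meminfo.old /tmp/meminfo.new 1024
```

With a threshold of 1024 kB, only fields that moved by a megabyte or more are listed, which makes it easier to see whether anything besides MemFree and MemAvailable accounts for the loss.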
Usual suspect on rango is mwlwifi.
lsmod doesn't indicate that any module's memory size has changed in the past hour, though. Unless lsmod doesn't show the current amount of memory allocated for the module and I misunderstood the size column?
I don’t believe you can use a simple offset for zImage.
For reference, another hour later:
MemTotal: 510920 kB
MemFree: 102180 kB
MemAvailable: 92824 kB
Buffers: 6488 kB
Cached: 19220 kB
SwapCached: 0 kB
Active: 23080 kB
Inactive: 6808 kB
Active(anon): 5120 kB
Inactive(anon): 256 kB
Active(file): 17960 kB
Inactive(file): 6552 kB
Unevictable: 4 kB
Mlocked: 4 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 510920 kB
LowFree: 102180 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 4220 kB
Mapped: 6128 kB
Shmem: 1196 kB
Slab: 173324 kB
SReclaimable: 5924 kB
SUnreclaim: 167400 kB
KernelStack: 672 kB
PageTables: 348 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255460 kB
Committed_AS: 10052 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
root@LEDE-3200ACM:~# ps -e -o pid,vsz,comm= | sort -n -k 2
PID VSZ
1 1328 procd
2 0 kthreadd
4 0 kworker/0:0H
6 0 mm_percpu_wq
7 0 ksoftirqd/0
8 0 rcu_sched
9 0 rcu_bh
10 0 migration/0
11 0 cpuhp/0
12 0 cpuhp/1
13 0 migration/1
14 0 ksoftirqd/1
16 0 kworker/1:0H
209 0 oom_reaper
210 0 writeback
211 0 crypto
213 0 kblockd
215 0 ata_sff
247 0 watchdogd
281 0 kswapd0
350 0 pencrypt
352 0 pdecrypt
450 0 scsi_eh_0
451 0 scsi_tmf_0
454 0 scsi_eh_1
455 0 scsi_tmf_1
618 0 irq/43-mmc0
651 0 irq/39-f1090000
652 0 irq/40-f1090000
674 0 ipv6_addrconf
676 0 dsa_ordered
714 0 ubi_bgt0d
721 0 kworker/1:1H
722 0 kworker/0:1H
726 0 irq/47-gpio-key
727 0 irq/48-gpio-key
836 0 ubifs_bgt0_1
848 0 ubi_bgt1d
853 0 ubifs_bgt1_0
1031 988 ubusd
1032 676 askfirst
1100 0 bond0
1489 0 cryptodev_queue
1495 0 cfg80211
1965 0 btmrvl_main_ser
1966 0 kworker/u5:0
1975 0 kworker/u5:2
2011 0 krfcommd
2362 1004 logd
2379 1332 rpcd
2389 1520 haveged
2470 1448 netifd
2571 1220 odhcpd
2619 816 dropbear
4153 1616 hostapd
4268 1688 dnscrypt-proxy
4675 1620 hostapd
5021 3408 uhttpd
5153 2576 smbd
5154 2620 nmbd
5413 3464 collectd
5848 1064 sh
5878 1068 ntpd
18225 0 kworker/0:1
19287 880 dropbear
19288 1068 ash
19481 0 kworker/1:2
21336 0 kworker/0:2
27994 0 kworker/1:1
29583 0 kworker/u4:0
30213 1064 sleep
30246 0 kworker/u4:2
30811 0 kworker/u4:1
30844 1460 ps
30845 1064 sort
Is all the RAM being used mainly cache?
Please run the following and take a look to see if the amount of RAM used has gone down.
sync; echo 1 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches
Unfortunately this doesn't seem to be the issue. It does increase MemFree a little bit, but we're still well beyond what should be in use.
Before sync:
MemTotal: 510920 kB
MemFree: 30436 kB
MemAvailable: 21080 kB
Buffers: 6488 kB
Cached: 19220 kB
SwapCached: 0 kB
Active: 22908 kB
Inactive: 6804 kB
Active(anon): 4944 kB
Inactive(anon): 256 kB
Active(file): 17964 kB
Inactive(file): 6548 kB
Unevictable: 4 kB
Mlocked: 4 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 510920 kB
LowFree: 30436 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 4012 kB
Mapped: 6140 kB
Shmem: 1196 kB
Slab: 179416 kB
SReclaimable: 5928 kB
SUnreclaim: 173488 kB
KernelStack: 672 kB
PageTables: 348 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255460 kB
Committed_AS: 9704 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
After sync:
MemTotal: 510920 kB
MemFree: 50684 kB
MemAvailable: 30400 kB
Buffers: 240 kB
Cached: 7224 kB
SwapCached: 0 kB
Active: 7796 kB
Inactive: 3688 kB
Active(anon): 4272 kB
Inactive(anon): 944 kB
Active(file): 3524 kB
Inactive(file): 2744 kB
Unevictable: 4 kB
Mlocked: 4 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 510920 kB
LowFree: 50684 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 4012 kB
Mapped: 6128 kB
Shmem: 1196 kB
Slab: 175796 kB
SReclaimable: 2312 kB
SUnreclaim: 173484 kB
KernelStack: 696 kB
PageTables: 344 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255460 kB
Committed_AS: 9704 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
Wireless router crashed and the memory at the end looked like this:
MemTotal: 510920 kB
MemFree: 18100 kB
MemAvailable: 0 kB
Buffers: 1580 kB
Cached: 4412 kB
SwapCached: 0 kB
Active: 6748 kB
Inactive: 3300 kB
Active(anon): 4328 kB
Inactive(anon): 924 kB
Active(file): 2420 kB
Inactive(file): 2376 kB
Unevictable: 4 kB
Mlocked: 4 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 510920 kB
LowFree: 18100 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 4064 kB
Mapped: 2416 kB
Shmem: 1196 kB
Slab: 176892 kB
SReclaimable: 2288 kB
SUnreclaim: 174604 kB
KernelStack: 728 kB
PageTables: 344 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255460 kB
Committed_AS: 10004 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
It'll be a couple more days until it bugs out again and starts leaking afaict
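Since the leak takes days to show up, it may help to log a few fields at intervals instead of pasting snapshots by hand. This is a minimal sketch; the function name meminfo_sample, the log path and the 10-minute interval are all placeholders. On OpenWrt /tmp is normally RAM-backed, so a log kept there will not survive the crash; pointing it at USB storage or a remote host would be needed for that.

```shell
# meminfo_sample [FILE]
# Print one line: "epoch MemFree Slab SUnreclaim" (values in kB).
# FILE defaults to /proc/meminfo. (Hypothetical helper name.)
meminfo_sample() {
    awk -v now="$(date +%s)" '
        $1 == "MemFree:"    { free = $2 }
        $1 == "Slab:"       { slab = $2 }
        $1 == "SUnreclaim:" { sun  = $2 }
        END { print now, free, slab, sun }
    ' "${1:-/proc/meminfo}"
}

# Usage (log path and interval are placeholders):
#   while :; do meminfo_sample >> /tmp/meminfo.log; sleep 600; done
```

Plotting or just eyeballing the SUnreclaim column against the timestamps should show whether growth is steady or only starts after some event.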
Did you try enabling kernel threads in htop and then ordering by memory usage? That should give you the leaker.
I was also still checking ps, which displays the kernel threads, and no process appeared to be leaking any memory. It looks like most of the leaking (if not all) is in slab allocations. Unfortunately, there's no /proc/slabinfo to peer into in this kernel build.
What other modules are installed on this build? This would be on top of what the build already comes with.
Nothing at all, normally I actually remove a lot of modules, but for the purpose of testing I did not when I upgraded to the latest build a couple days ago. I run 2 SSIDs on each radio, 1 on vlan 1 and the other on vlan 2, other than that it runs as a dumb AP, no dhcp, no dns, nothing. My server handles all the services for dns, dhcp, routing, pppoe, etc.
(honestly the only reason I even need openwrt/lede is for vlans and multiple ssids, Nest Smoke Detectors send out router advertisements for ipv6 to set up their own network, which were screwing up my network, so I had to isolate them)
/etc/config/network
config interface 'loopback'
option ifname 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals 'globals'
option ula_prefix 'fddf:d092:9cae::/48'
config interface 'lan'
option type 'bridge'
option ifname 'eth0.1'
option proto 'static'
option netmask '255.255.255.0'
option ipaddr '192.168.1.249'
option gateway '192.168.1.1'
option dns '192.168.1.1 8.8.8.8'
config switch
option name 'switch0'
option reset '1'
option enable_vlan '1'
config switch_vlan
option device 'switch0'
option vlan '1'
option ports '0 1 2 3 4 5t'
option vid '1'
config switch_vlan
option device 'switch0'
option vlan '2'
option ports '4t 6t'
option vid '2'
config interface 'VLAN2'
option type 'bridge'
option proto 'static'
option ifname 'eth1.2'
option ipaddr '192.168.254.249'
option netmask '255.255.255.0'
option gateway '192.168.254.1'
/etc/config/wireless
config wifi-device 'radio0'
option type 'mac80211'
option hwmode '11a'
option htmode 'VHT80'
option channel '44'
option country 'CA'
option path 'soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'
config wifi-iface 'default_radio0'
option device 'radio0'
option network 'lan'
option mode 'ap'
option ssid 'a'
option encryption 'psk2'
option key '------'
config wifi-device 'radio1'
option type 'mac80211'
option hwmode '11g'
option htmode 'HT20'
option channel '6'
option country 'CA'
option path 'soc/soc:pcie/pci0000:00/0000:00:02.0/0000:02:00.0'
config wifi-iface 'default_radio1'
option device 'radio1'
option network 'lan'
option mode 'ap'
option ssid 'a'
option encryption 'psk2'
option key '-------'
config wifi-iface
option device 'radio0'
option mode 'ap'
option ssid 'a-nest'
option network 'VLAN2'
option encryption 'psk2'
option key '------'
config wifi-iface
option device 'radio1'
option mode 'ap'
option ssid 'a-nest'
option network 'VLAN2'
option encryption 'psk2'
option key '-----'
From this point you might try removing even more modules.
Have you removed luci-app-statistics?
opkg remove --autoremove luci-app-statistics
Wouldn't that have had some indication in ps from a process as having taken up a lot of memory if it was part of luci, or anything except some of the kmod packages?
@davidc502 Any chance you can add CONFIG_SLABINFO=y to the kernel whenever you do a new build next so that I can try to get further insight into what's leaking, since it seems to be kernel related?
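Once a build with CONFIG_SLABINFO=y is running, something like the following could rank the caches by approximate size. It is a sketch that assumes the usual slabinfo 2.x column order (name, active_objs, num_objs, objsize, ...); the function name slab_top is made up. The estimate is num_objs * objsize, which ignores per-slab overhead, and /proc/slabinfo is normally readable only by root.

```shell
# slab_top [FILE] [N]
# List the N largest slab caches (default 10) as "kB name", estimated
# as num_objs * objsize. FILE defaults to /proc/slabinfo.
# (Hypothetical helper name.)
slab_top() {
    awk '
        # Skip the version line and the column-header line.
        /^slabinfo/ || /^#/ { next }
        # $1 = cache name, $3 = num_objs, $4 = objsize in bytes.
        { printf "%d %s\n", $3 * $4 / 1024, $1 }
    ' "${1:-/proc/slabinfo}" | sort -rn | head -n "${2:-10}"
}

# Usage:
#   slab_top                   # top 10 caches on the running system
#   slab_top /proc/slabinfo 5  # top 5
```

Running it an hour apart and comparing the top entries should show which cache is growing along with SUnreclaim.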
No problem..
@david -
i know there's a lot going on with openwrt/lede, the forums, new forks, and your builds. i appreciate your efforts tremendously.
your 5/18 build seemed to have more wifi probs on my wrt32x than the previous build had on my wrt3200acm (same silicon).
for what it's worth, i have loaded 5/23 lede 18.06 snapshot onto my wrt32x and it has been up for 4 hours now. i'll provide further info later. is there anything in particular that i can do to help you?
thanks again
@davidc502
Can you please include this feed in your builds: https://github.com/InkblotAdmirer/custom_feed
This feed has a dnscrypt-proxy-v2 package.