Need New DavidC502 Thread

From watching /proc/meminfo, it looks like Slab and SUnreclaim go up consistently, while Active, Active(anon), AnonPages, Mapped and Committed_AS went up a bit at first and then seemed to stabilize over the hour I was watching. So... kernel leak?
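Since the leak takes days to show, a timestamped log of just the interesting fields is easier to read back than watching the whole file. Here is a minimal POSIX sh sketch. It assumes awk and date +%s are available, which BusyBox provides. The name meminfo_sample, the log path and the 60-second interval are my own placeholders, not anything LEDE ships.

```shell
#!/bin/sh
# Print one line: the current Unix time followed by the meminfo fields that
# matter for a slab leak (values in kB). The file argument defaults to
# /proc/meminfo so the function can also be pointed at a saved copy.
meminfo_sample() {
    file=${1:-/proc/meminfo}
    awk -v now="$(date +%s)" '
        # Field names in meminfo end with a colon, e.g. "Slab:"
        $1 == "MemFree:" || $1 == "MemAvailable:" || $1 == "Slab:" ||
        $1 == "SReclaimable:" || $1 == "SUnreclaim:" {
            sub(":", "", $1)               # strip the trailing colon
            line = line " " $1 "=" $2      # append name=value
        }
        END { print now line }
    ' "$file"
}

# Hypothetical usage: append a sample every 60 s to a log file
#   while true; do meminfo_sample >> /tmp/meminfo.log; sleep 60; done
```

/tmp is RAM-backed on LEDE, so a log kept there slowly uses a little memory itself (it shows up under Shmem). Copying it off the box now and then keeps it from skewing the numbers.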

Can you try this command to see what might be using memory?

ps -e -o pid,vsz,comm= | sort -n -k 2
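VSZ is each process's virtual size, which overstates what the process actually holds in RAM. To compare userspace against the Slab line, summing VmRSS is closer. It still double-counts shared pages such as libc, so treat it as an upper bound. The sketch below assumes the usual Linux /proc layout. sum_vmrss is a made-up name, and the proc root is a parameter only so the function can be tried against a copy.

```shell
#!/bin/sh
# Sum the resident set size (VmRSS, in kB) of every process under a
# /proc-style directory. Kernel threads have no VmRSS line in their
# status file, so they contribute nothing to the total.
sum_vmrss() {
    root=${1:-/proc}
    total=0
    for status in "$root"/[0-9]*/status; do
        [ -r "$status" ] || continue   # no match, or the process already exited
        rss=$(awk '$1 == "VmRSS:" { print $2 }' "$status" 2>/dev/null)
        total=$((total + ${rss:-0}))
    done
    echo "$total"
}
```

On the router, something like echo "userspace RSS: $(sum_vmrss) kB"; grep -E '^(Slab|SUnreclaim):' /proc/meminfo puts the two side by side. Memory a driver allocates never appears in any process's RSS, so a small total next to a large Slab figure points at the kernel rather than at a process.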

root@LEDE-3200ACM:~# ps -e -o pid,vsz,comm= | sort -n -k 2
  PID    VSZ
    1   1328 procd
    2      0 kthreadd
    4      0 kworker/0:0H
    6      0 mm_percpu_wq
    7      0 ksoftirqd/0
    8      0 rcu_sched
    9      0 rcu_bh
   10      0 migration/0
   11      0 cpuhp/0
   12      0 cpuhp/1
   13      0 migration/1
   14      0 ksoftirqd/1
   16      0 kworker/1:0H
  209      0 oom_reaper
  210      0 writeback
  211      0 crypto
  213      0 kblockd
  215      0 ata_sff
  247      0 watchdogd
  281      0 kswapd0
  350      0 pencrypt
  352      0 pdecrypt
  450      0 scsi_eh_0
  451      0 scsi_tmf_0
  454      0 scsi_eh_1
  455      0 scsi_tmf_1
  618      0 irq/43-mmc0
  651      0 irq/39-f1090000
  652      0 irq/40-f1090000
  674      0 ipv6_addrconf
  676      0 dsa_ordered
  714      0 ubi_bgt0d
  721      0 kworker/1:1H
  722      0 kworker/0:1H
  726      0 irq/47-gpio-key
  727      0 irq/48-gpio-key
  836      0 ubifs_bgt0_1
  848      0 ubi_bgt1d
  853      0 ubifs_bgt1_0
 1031    988 ubusd
 1032    676 askfirst
 1100      0 bond0
 1489      0 cryptodev_queue
 1495      0 cfg80211
 1965      0 btmrvl_main_ser
 1966      0 kworker/u5:0
 1975      0 kworker/u5:2
 2011      0 krfcommd
 2362   1004 logd
 2379   1332 rpcd
 2389   1520 haveged
 2470   1448 netifd
 2571   1220 odhcpd
 2619    816 dropbear
 4153   1616 hostapd
 4268   1688 dnscrypt-proxy
 4675   1620 hostapd
 5021   3408 uhttpd
 5153   2576 smbd
 5154   2620 nmbd
 5413   3464 collectd
 5848   1064 sh
 5878   1068 ntpd
18225      0 kworker/0:1
19287    880 dropbear
19288   1064 ash
19481      0 kworker/1:2
21336      0 kworker/0:2
21814      0 kworker/1:0
22931      0 kworker/u4:0
23284      0 kworker/u4:1
23574   1064 sleep
23663      0 kworker/u4:2
23842   1460 ps
23843   1064 sort

And /proc/meminfo is currently at:

root@LEDE-3200ACM:~# cat /proc/meminfo
MemTotal:         510920 kB
MemFree:          144496 kB
MemAvailable:     135136 kB
Buffers:            6488 kB
Cached:            19220 kB
SwapCached:            0 kB
Active:            22280 kB
Inactive:           6808 kB
Active(anon):       4320 kB
Inactive(anon):      256 kB
Active(file):      17960 kB
Inactive(file):     6552 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510920 kB
LowFree:          144496 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          3408 kB
Mapped:             5884 kB
Shmem:              1196 kB
Slab:             169496 kB
SReclaimable:       5916 kB
SUnreclaim:       163580 kB
KernelStack:         664 kB
PageTables:          348 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      255460 kB
Committed_AS:       8980 kB
VmallocTotal:     507904 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB

Compared to an hour ago, Slab was at 167764 kB, SUnreclaim at 161860 kB and MemAvailable at 155344 kB. So MemAvailable has dropped by about 20 MB while Slab only grew by about 1.7 MB, and I can't find the rest of that 20 MB anywhere in the /proc/meminfo breakdown.
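Comparing two saved copies of /proc/meminfo field by field makes the hour-to-hour change explicit. Adding up the itemized consumers also shows how much of the used memory the breakdown doesn't attribute to anything. This is a sketch that assumes both snapshots were saved with something like cat /proc/meminfo > /tmp/meminfo.1. The "unitemized" figure is my own rough estimate, not a kernel-provided number:

MemTotal - MemFree - Buffers - Cached - AnonPages - Slab - KernelStack - PageTables

Pages a driver takes straight from the page allocator would land in that bucket. So would vmalloc memory: as far as I know, 4.x kernels report VmallocUsed as 0 regardless of actual use, which matches these dumps. meminfo_diff is a made-up name.

```shell
#!/bin/sh
# Compare two /proc/meminfo snapshots. For every field whose value changed,
# print old, new and the delta in kB; then print the same for a rough
# "unitemized" figure: used memory that none of the main lines explain.
meminfo_diff() {
    awk '
        # First file: remember old values. Second file: new values, in order.
        FNR == NR { old[$1] = $2; next }
        { new[$1] = $2; order[++n] = $1 }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (new[k] != old[k])
                    printf "%-14s %8d -> %8d  (%+d kB)\n", k, old[k], new[k], new[k] - old[k]
            }
            u0 = unitemized(old); u1 = unitemized(new)
            printf "%-14s %8d -> %8d  (%+d kB)\n", "Unitemized:", u0, u1, u1 - u0
        }
        # Used memory minus the consumers /proc/meminfo itemizes.
        function unitemized(m,   u) {
            u = m["MemTotal:"] - m["MemFree:"] - m["Buffers:"] - m["Cached:"]
            u = u - m["AnonPages:"] - m["Slab:"] - m["KernelStack:"] - m["PageTables:"]
            return u
        }
    ' "$1" "$2"
}
```

Hypothetical usage: meminfo_diff /tmp/meminfo.1 /tmp/meminfo.2. If Slab barely moves while the unitemized line grows, the leak is in memory that the meminfo breakdown doesn't list.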

Usual suspect on rango is mwlwifi.

lsmod doesn't show any module's size changing over the past hour, though. Unless lsmod doesn't report the memory a module currently has allocated, and I've misunderstood the Size column?

I don’t believe you can use a simple offset for zImage.

For reference, another hour later:

MemTotal:         510920 kB
MemFree:          102180 kB
MemAvailable:      92824 kB
Buffers:            6488 kB
Cached:            19220 kB
SwapCached:            0 kB
Active:            23080 kB
Inactive:           6808 kB
Active(anon):       5120 kB
Inactive(anon):      256 kB
Active(file):      17960 kB
Inactive(file):     6552 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510920 kB
LowFree:          102180 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          4220 kB
Mapped:             6128 kB
Shmem:              1196 kB
Slab:             173324 kB
SReclaimable:       5924 kB
SUnreclaim:       167400 kB
KernelStack:         672 kB
PageTables:          348 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      255460 kB
Committed_AS:      10052 kB
VmallocTotal:     507904 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
root@LEDE-3200ACM:~# ps -e -o pid,vsz,comm= | sort -n -k 2
  PID    VSZ
    1   1328 procd
    2      0 kthreadd
    4      0 kworker/0:0H
    6      0 mm_percpu_wq
    7      0 ksoftirqd/0
    8      0 rcu_sched
    9      0 rcu_bh
   10      0 migration/0
   11      0 cpuhp/0
   12      0 cpuhp/1
   13      0 migration/1
   14      0 ksoftirqd/1
   16      0 kworker/1:0H
  209      0 oom_reaper
  210      0 writeback
  211      0 crypto
  213      0 kblockd
  215      0 ata_sff
  247      0 watchdogd
  281      0 kswapd0
  350      0 pencrypt
  352      0 pdecrypt
  450      0 scsi_eh_0
  451      0 scsi_tmf_0
  454      0 scsi_eh_1
  455      0 scsi_tmf_1
  618      0 irq/43-mmc0
  651      0 irq/39-f1090000
  652      0 irq/40-f1090000
  674      0 ipv6_addrconf
  676      0 dsa_ordered
  714      0 ubi_bgt0d
  721      0 kworker/1:1H
  722      0 kworker/0:1H
  726      0 irq/47-gpio-key
  727      0 irq/48-gpio-key
  836      0 ubifs_bgt0_1
  848      0 ubi_bgt1d
  853      0 ubifs_bgt1_0
 1031    988 ubusd
 1032    676 askfirst
 1100      0 bond0
 1489      0 cryptodev_queue
 1495      0 cfg80211
 1965      0 btmrvl_main_ser
 1966      0 kworker/u5:0
 1975      0 kworker/u5:2
 2011      0 krfcommd
 2362   1004 logd
 2379   1332 rpcd
 2389   1520 haveged
 2470   1448 netifd
 2571   1220 odhcpd
 2619    816 dropbear
 4153   1616 hostapd
 4268   1688 dnscrypt-proxy
 4675   1620 hostapd
 5021   3408 uhttpd
 5153   2576 smbd
 5154   2620 nmbd
 5413   3464 collectd
 5848   1064 sh
 5878   1068 ntpd
18225      0 kworker/0:1
19287    880 dropbear
19288   1068 ash
19481      0 kworker/1:2
21336      0 kworker/0:2
27994      0 kworker/1:1
29583      0 kworker/u4:0
30213   1064 sleep
30246      0 kworker/u4:2
30811      0 kworker/u4:1
30844   1460 ps
30845   1064 sort

Is most of the RAM in use just cache?

Please run the following and check whether the amount of RAM in use goes down.

sync; echo 1 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches 
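Writing 3 already covers both 1 (page cache) and 2 (reclaimable slab objects, which include dentries and inodes), so a single write is enough. The sketch below records the slab split before and after. That makes it obvious whether the growth sits in SReclaimable, which dropping caches can free, or in SUnreclaim, which it cannot. It must run as root on the router. Both paths are parameters only so it can be dry-run against copies, and slab_split and drop_and_compare are made-up names.

```shell
#!/bin/sh
# Print MemFree, SReclaimable and SUnreclaim from a meminfo-style file.
slab_split() {
    awk '$1 == "MemFree:" || $1 == "SReclaimable:" || $1 == "SUnreclaim:" {
            printf "%s %s kB  ", $1, $2
         }
         END { print "" }' "$1"
}

# Drop page cache and reclaimable slab once (3 = 1 + 2), showing the
# slab split before and after.
drop_and_compare() {
    meminfo=${1:-/proc/meminfo}
    dropfile=${2:-/proc/sys/vm/drop_caches}
    echo "before: $(slab_split "$meminfo")"
    sync                    # write out dirty data so those pages can be dropped too
    echo 3 > "$dropfile"
    echo "after:  $(slab_split "$meminfo")"
}
```

On the router, running drop_and_compare with no arguments uses the real /proc files.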

Unfortunately that doesn't seem to be the issue. It does increase MemFree a bit (30436 kB before, 50684 kB after), but SUnreclaim stays at about 173 MB, so far more is still in use than should be.

Before dropping caches:

MemTotal:         510920 kB
MemFree:           30436 kB
MemAvailable:      21080 kB
Buffers:            6488 kB
Cached:            19220 kB
SwapCached:            0 kB
Active:            22908 kB
Inactive:           6804 kB
Active(anon):       4944 kB
Inactive(anon):      256 kB
Active(file):      17964 kB
Inactive(file):     6548 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510920 kB
LowFree:           30436 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          4012 kB
Mapped:             6140 kB
Shmem:              1196 kB
Slab:             179416 kB
SReclaimable:       5928 kB
SUnreclaim:       173488 kB
KernelStack:         672 kB
PageTables:          348 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      255460 kB
Committed_AS:       9704 kB
VmallocTotal:     507904 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB

After dropping caches:

MemTotal:         510920 kB
MemFree:           50684 kB
MemAvailable:      30400 kB
Buffers:             240 kB
Cached:             7224 kB
SwapCached:            0 kB
Active:             7796 kB
Inactive:           3688 kB
Active(anon):       4272 kB
Inactive(anon):      944 kB
Active(file):       3524 kB
Inactive(file):     2744 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510920 kB
LowFree:           50684 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          4012 kB
Mapped:             6128 kB
Shmem:              1196 kB
Slab:             175796 kB
SReclaimable:       2312 kB
SUnreclaim:       173484 kB
KernelStack:         696 kB
PageTables:          344 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      255460 kB
Committed_AS:       9704 kB
VmallocTotal:     507904 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB

The router eventually crashed. This is what memory looked like at the end:

MemTotal:         510920 kB
MemFree:           18100 kB
MemAvailable:          0 kB
Buffers:            1580 kB
Cached:             4412 kB
SwapCached:            0 kB
Active:             6748 kB
Inactive:           3300 kB
Active(anon):       4328 kB
Inactive(anon):      924 kB
Active(file):       2420 kB
Inactive(file):     2376 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510920 kB
LowFree:           18100 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          4064 kB
Mapped:             2416 kB
Shmem:              1196 kB
Slab:             176892 kB
SReclaimable:       2288 kB
SUnreclaim:       174604 kB
KernelStack:         728 kB
PageTables:          344 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      255460 kB
Committed_AS:      10004 kB
VmallocTotal:     507904 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB

As far as I can tell, it'll be a couple more days until it bugs out again and starts leaking.

Did you try enabling kernel threads in htop and then sorting by memory usage? That should show you the leaker.

I was also checking ps, which does list the kernel threads, and no process appeared to be leaking any memory. It looks like most (if not all) of the leak is in slab allocations. Unfortunately, this kernel build has no /proc/slabinfo to look into.

What other modules have you installed on top of what the build already comes with?

Nothing at all. Normally I remove a lot of modules, but for testing purposes I didn't when I upgraded to the latest build a couple of days ago. I run 2 SSIDs on each radio, one on VLAN 1 and the other on VLAN 2. Other than that it runs as a dumb AP: no DHCP, no DNS, nothing. My server handles all the services (DNS, DHCP, routing, PPPoE, etc.).

(Honestly, the only reason I even need OpenWrt/LEDE is for VLANs and multiple SSIDs. Nest smoke detectors send out IPv6 router advertisements to set up their own network, which was messing up my network, so I had to isolate them.)

/etc/config/network

config interface 'loopback'
        option ifname 'lo'
        option proto 'static'
        option ipaddr '127.0.0.1'
        option netmask '255.0.0.0'

config globals 'globals'
        option ula_prefix 'fddf:d092:9cae::/48'

config interface 'lan'
        option type 'bridge'
        option ifname 'eth0.1'
        option proto 'static'
        option netmask '255.255.255.0'
        option ipaddr '192.168.1.249'
        option gateway '192.168.1.1'
        option dns '192.168.1.1 8.8.8.8'

config switch
        option name 'switch0'
        option reset '1'
        option enable_vlan '1'

config switch_vlan
        option device 'switch0'
        option vlan '1'
        option ports '0 1 2 3 4 5t'
        option vid '1'

config switch_vlan
        option device 'switch0'
        option vlan '2'
        option ports '4t 6t'
        option vid '2'

config interface 'VLAN2'
        option type 'bridge'
        option proto 'static'
        option ifname 'eth1.2'
        option ipaddr '192.168.254.249'
        option netmask '255.255.255.0'
        option gateway '192.168.254.1'

/etc/config/wireless

config wifi-device 'radio0'
        option type 'mac80211'
        option hwmode '11a'
        option htmode 'VHT80'
        option channel '44'
        option country 'CA'
        option path 'soc/soc:pcie/pci0000:00/0000:00:01.0/0000:01:00.0'

config wifi-iface 'default_radio0'
        option device 'radio0'
        option network 'lan'
        option mode 'ap'
        option ssid 'a'
        option encryption 'psk2'
        option key '------'

config wifi-device 'radio1'
        option type 'mac80211'
        option hwmode '11g'
        option htmode 'HT20'
        option channel '6'
        option country 'CA'
        option path 'soc/soc:pcie/pci0000:00/0000:00:02.0/0000:02:00.0'

config wifi-iface 'default_radio1'
        option device 'radio1'
        option network 'lan'
        option mode 'ap'
        option ssid 'a'
        option encryption 'psk2'
        option key '-------'

config wifi-iface
        option device 'radio0'
        option mode 'ap'
        option ssid 'a-nest'
        option network 'VLAN2'
        option encryption 'psk2'
        option key '------'

config wifi-iface
        option device 'radio1'
        option mode 'ap'
        option ssid 'a-nest'
        option network 'VLAN2'
        option encryption 'psk2'
        option key '-----'

From this point you might try removing even more modules.

Have you removed luci-app-statistics?

opkg remove --autoremove luci-app-statistics

Wouldn't that show up in ps as some process using a lot of memory, if it were part of LuCI or anything other than one of the kmod packages?

@lantis1008 -
Thank you. At offset 3018 is the LZMA image header, 5D 00 00.
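For anyone repeating this, one quick way to list candidate offsets is to scan the file byte by byte for 5D 00 00. This sketch assumes an od that supports -A, -t and -v (GNU coreutils does; BusyBox builds vary). It prints decimal offsets. In the 13-byte .lzma header, 5D 00 00 is just the properties byte plus the two low bytes of the little-endian dictionary size, so it can report false positives that need checking by hand. find_lzma_offsets is a made-up name.

```shell
#!/bin/sh
# Print the decimal byte offset of every occurrence of 5d 00 00 in a file.
# od dumps one hex byte per field (-v disables "*" duplicate-line folding);
# awk keeps only the previous two bytes, so the file is streamed rather
# than held in memory.
find_lzma_offsets() {
    od -An -v -tx1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                if (p2 == "5d" && p1 == "00" && $i == "00")
                    print off - 2          # offset of the 5d byte
                p2 = p1; p1 = $i; off++
            }
        }'
}
```

Hypothetical usage: find_lzma_offsets zImage | head.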

@davidc502 Any chance you could add CONFIG_SLABINFO=y to the kernel config for your next build? It seems to be kernel related, and that would let me get further insight into what's leaking.

No problem.
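Once /proc/slabinfo exists, ranking caches by the memory they occupy is usually the quickest way to spot the leaking one. procps' slabtop does the same interactively, but it isn't necessarily on a LEDE image. This sketch assumes the standard slabinfo 2.x layout: two header lines, then name, active_objs, num_objs, objsize and so on. The size estimate is num_objs × objsize, which ignores per-slab overhead. Reading /proc/slabinfo usually needs root, and slab_top is a made-up name.

```shell
#!/bin/sh
# List the N slab caches using the most memory, largest first, as "kB name".
# Size estimate: num_objs (column 3) * objsize (column 4, bytes) / 1024.
slab_top() {
    file=${1:-/proc/slabinfo}
    n=${2:-10}
    # Skip the two header lines ("slabinfo - version: ..." and "# name ...").
    awk 'NR > 2 { printf "%d %s\n", $3 * $4 / 1024, $1 }' "$file" |
        sort -rn | head -n "$n"
}
```

Running slab_top a few hours apart and comparing the lists should show which cache keeps growing.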

@david -
I know there's a lot going on with OpenWrt/LEDE, the forums, new forks, and your builds. I appreciate your efforts tremendously.
Your 5/18 build seemed to have more Wi-Fi problems on my WRT32X than the previous build had on my WRT3200ACM (same silicon).
For what it's worth, I have loaded the 5/23 LEDE 18.06 snapshot onto my WRT32X and it has been up for 4 hours now. I'll provide further info later. Is there anything in particular I can do to help you?
Thanks again

@davidc502
Can you please include this feed in your builds: https://github.com/InkblotAdmirer/custom_feed
This feed has a dnscrypt-proxy-v2 package.