LuCI slows down to death (WR1043N V1)

Snotmann · March 1, 2017, 11:26am

Supply the following if possible:

WR1043N V1
Default Packages with luci-ssl or without

After a while with the default config, the web interface slows down to death. The CPU load climbs higher and higher until I close the browser window.
I tried it with and without SSL, both the same. I tried the material skin, but it is the same as with the bootstrap skin...

I don't know what else I should do...

CPU: 3% usr 95% sys 0% nic 0% idle 0% io 0% irq 1% sirq
Load average: 1.85 2.20 1.56 4/51 22942

PID PPID USER STAT VSZ %VSZ %CPU COMMAND

22836 21534 root R 3464 12% 30% {luci} /usr/bin/lua /www/cgi-bin/luci

CPU: 16% usr 82% sys 0% nic 0% idle 0% io 0% irq 1% sirq
Load average: 1.68 2.12 1.58 5/48 23210

PID PPID USER STAT VSZ %VSZ %CPU COMMAND

23112 21534 root R 3040 11% 40% {luci} /usr/bin/lua /www/cgi-bin/luci

After a reboot it works fine, but after 12 hours without a reboot it is not usable any more.

Mem: 28176 20812 7364 1212 568 2500
-/+ buffers/cache: 17744 10432
Swap: 0 0 0

root@gw:~# df -h
Filesystem Size Used Available Use% Mounted on
/dev/root 2.3M 2.3M 0 100% /rom
tmpfs 13.8M 1.2M 12.6M 9% /tmp
/dev/mtdblock3 4.4M 1.4M 3.0M 32% /overlay
overlayfs:/overlay 4.4M 1.4M 3.0M 32% /
tmpfs 512.0K 0 512.0K 0% /dev

Does anyone here have an idea?

Snot
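The free output in the post above has lost its header row. In BusyBox free the "Mem:" columns are total, used, free, shared, buffers and cached, all in kB, and the "-/+ buffers/cache" row is derived from them. A minimal sketch that redoes that arithmetic with the posted numbers (the helper name mem_summary is made up for illustration). Note that on this kernel generation "cached" also counts tmpfs contents such as /tmp, so not all of it can really be reclaimed:

```shell
#!/bin/sh
# Recompute the "-/+ buffers/cache" row from the "Mem:" row of BusyBox `free`.
# Fields after "Mem:" (kB): total used free shared buffers cached.
mem_summary() {
    awk '$1 == "Mem:" {
        used_nc = $3 - $6 - $7    # used, not counting buffers and cache
        free_pc = $4 + $6 + $7    # free, plus buffers and cache
        printf "used_without_cache=%d free_plus_cache=%d\n", used_nc, free_pc
    }'
}

# The "Mem:" row exactly as posted in this thread
echo "Mem: 28176 20812 7364 1212 568 2500" | mem_summary
```

The two results are the same values as the "-/+ buffers/cache: 17744 10432" row in the post, so roughly 10 MB is nominally free or cache-backed while LuCI is crawling.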

hnyman · March 1, 2017, 11:33am

Does that CPU load come from all LuCI pages, or only if the front page (or some other page that refreshes its content every 5 seconds) is open?
- Try disabling auto-refresh (button in the top right corner)
When the load is high, are there other processes shown in "ps" causing the load?
Which exact LEDE version (and LuCI version) are you using?
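One way to answer the "which process is causing the load" question is to take a batch-mode snapshot of top and pick out the line with the highest %CPU. A minimal sketch, assuming the BusyBox top column layout seen in the first post (PID PPID USER STAT VSZ %VSZ %CPU COMMAND); the helper name busiest is hypothetical:

```shell
#!/bin/sh
# Print the process line with the highest %CPU from BusyBox `top -b` output.
# STAT can be one or two fields ("S" or "S <"), so the %CPU column is located
# as the second field that looks like "NN%" instead of by fixed position.
busiest() {
    awk 'BEGIN { max = -1 }
    $1 ~ /^[0-9]+$/ {                 # only lines that start with a PID
        seen = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                seen++
                if (seen == 2) {      # first match is %VSZ, second is %CPU
                    cpu = $i + 0
                    if (cpu > max) { max = cpu; line = $0 }
                    break
                }
            }
        }
    }
    END { if (max >= 0) print line }'
}

# Sample input: lines taken from the top output in the first post
printf '%s\n' \
    'CPU: 3% usr 95% sys 0% nic 0% idle 0% io 0% irq 1% sirq' \
    'PID PPID USER STAT VSZ %VSZ %CPU COMMAND' \
    '22836 21534 root R 3464 12% 30% {luci} /usr/bin/lua /www/cgi-bin/luci' \
    | busiest

# On the router (not run here): top -b -n 1 | busiest
```

Running the last line from a loop with a sleep in between would show whether the busiest process is always luci or changes over the hours before the slowdown.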

Snotmann · March 1, 2017, 11:38am

Loading a single page takes 2-3 minutes; if I'm on a page with auto-refresh, the content is not shown at all.
No, I get high load on every LuCI page when the page loads.

LEDE Reboot 17.01.0 r3205-59508e3 / LuCI lede-17.01 branch (git-17.051.53299-a100738)

But it only happens after some time... if I reboot the router all is fine... 12 hours later it is not.
I tried killing and restarting uhttpd, but that doesn't fix it.

Here is the ps output under high load:
PID USER VSZ STAT COMMAND
1 root 1532 S /sbin/procd
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
5 root 0 SW< [kworker/0:0H]
26 root 0 SW [kworker/u2:2]
68 root 0 SW< [writeback]
69 root 0 SW< [crypto]
71 root 0 SW< [bioset]
73 root 0 SW< [kblockd]
99 root 0 SW [kworker/0:1]
106 root 0 SW [kswapd0]
158 root 0 SW [fsnotify_mark]
167 root 0 SW [spi0]
201 root 0 SW< [bioset]
206 root 0 SW< [bioset]
211 root 0 SW< [bioset]
216 root 0 SW< [bioset]
221 root 0 SW< [bioset]
226 root 0 SW< [bioset]
297 root 0 SW< [ipv6_addrconf]
304 root 0 SW< [deferwq]
306 root 0 SW< [kworker/0:1H]
379 root 0 SWN [jffs2_gcd_mtd3]
440 root 1184 S /sbin/ubusd
441 root 896 S /sbin/askfirst /usr/libexec/login.sh
568 root 0 SW< [cfg80211]
678 root 1228 S /sbin/logd -S 64
687 root 1444 S /sbin/rpcd
734 root 1636 S /sbin/netifd
760 root 1416 S /usr/sbin/odhcpd
778 root 1188 S /usr/sbin/crond -f -c /etc/crontabs -l 9
818 root 0 SW [kworker/0:2]
879 root 1020 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 eth0.2
951 root 1060 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 192.168.0.2:22 -p fd7d:129a:3131::1:22 -K 300
1908 dnsmasq 1072 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg02411c -k -x /var/run/dnsmasq/dnsmasq.cfg02411c.pid
2092 root 1188 S < /usr/sbin/ntpd -n -N -l -S /usr/sbin/ntpd-hotplug -p 0.lede.pool.ntp.org -p 1.lede.pool.ntp.org -p 2.lede.pool.ntp.org -p 3.lede.pool.ntp.org
2165 root 1688 S /usr/sbin/hostapd -s -P /var/run/wifi-phy0.pid -B /var/run/hostapd-phy0.conf
2228 root 956 S /usr/sbin/vnstatd -d
2474 root 1352 S {dynamic_dns_upd} /bin/sh /usr/lib/ddns/dynamic_dns_updater.sh -v 0 -S snotnet -- start
12252 root 0 SW [kworker/u2:0]
16767 root 1128 S /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 192.168.0.2:22 -p fd7d:129a:3131::1:22 -K 300
16768 root 1192 S -ash
19836 root 0 SW [kworker/u2:1]
20099 root 1540 S /usr/sbin/uhttpd -f -h /www -r schneider.gw -x /cgi-bin -u /ubus -t 60 -T 30 -k 0 -A 0 -n 1 -N 10 -R -p 0.0.0.0:8080 -p [::]:8080
20255 root 1184 S sleep 600
20587 root 2356 R {luci} /usr/bin/lua /www/cgi-bin/luci
20588 root 1184 R ps

Snotmann · March 1, 2017, 3:26pm

Maybe it's a memory problem...

hnyman · March 1, 2017, 3:41pm

I suspect that some setting causes some service to loop trying to do something.

You should test disabling unnecessary services.
In the process list I spot at least ddns.
Also that "sleep 600" looks rather strange.

2474 root 1352 S {dynamic_dns_upd} /bin/sh /usr/lib/ddns/dynamic_dns_updater.sh -v 0 -S snotnet -- start
20255 root 1184 S sleep 600

Is the problem there if you only have the basic firmware and Luci, and no additional services like ddns, adblock, qos, sqm, mwan3 etc. ?

r43k3n · March 1, 2017, 4:17pm

I have the same issue, but for me it happens after about 5 minutes. I have a script that makes the QSS LED blink according to system load, and the load becomes so big that the LED is "crashing". A few minutes later the WiFi stops working: I can see the network but I cannot connect to it. SSH also stops working; it stops taking commands and just hangs, so I cannot access the log or see how much free RAM is still available. After a reboot everything is OK for a few minutes.

It's been happening for me since RC1 (I didn't really test previous builds) and it's happening on stable too. For now I just went back to Chaos Calmer. No issues like that there.

Snotmann · March 1, 2017, 4:19pm

Yeah, it would be better if I went back to CC too... the issues with too little space and so on are annoying me.
I only have SQM, DDNS and vnStat running... everything else is default.

r43k3n, do you have a 1043N V1 too?

jow · March 1, 2017, 6:58pm

I suspect SQM, iirc there were some other reports about it "killing" devices.

Snotmann · March 1, 2017, 7:55pm

Yeah OK, SQM could be the reason, but if I stop SQM the issue is the same... Also, the CPU load is low if I connect via SSH. Only if I open LuCI via HTTP or HTTPS does the load go through the roof...

CPU: 3% usr 94% sys 0% nic 0% idle 0% io 0% irq 1% sirq
Load average: 3.39 2.66 1.76 4/45 27036
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
26924 15188 root R 3416 12% 26% [luci]

jow · March 1, 2017, 7:58pm

Hm, that is quite interesting indeed, but at the moment I have no idea at all why. Could you do me a favor, install strace using opkg and provide me the output of running strace -p $(pidof luci) ? Maybe this gives some clues about what the luci process is actually doing when spending this CPU time...
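Since the luci CGI process only exists while a page is being generated, it can help to wait for it to show up before attaching. A rough sketch built around the strace command above; the helper name wait_for_pid, the retry count and the output path /tmp/luci.strace are assumptions for illustration, while -tt (timestamps), -p (attach to PID) and -o (write to file) are standard strace options:

```shell
#!/bin/sh
# Wait until a process with the given name exists, then print its PID(s).
# $1 = process name, $2 = number of one-second retries (default 30).
wait_for_pid() {
    name=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # pidof exits non-zero when nothing matches
        pid=$(pidof "$name") && { echo "$pid"; return 0; }
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# On the router, with strace installed via opkg (not run here):
#   pid=$(wait_for_pid luci) && strace -tt -p "${pid%% *}" -o /tmp/luci.strace
```

Start the helper over SSH first, then load a LuCI page in the browser. The trace file lands in the ramdisk, so it should be copied off and deleted afterwards on a device this short on memory.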

Snotmann · March 1, 2017, 8:01pm

I will do... but I got an idea: I cleaned up /tmp/ via "rm -rf opkg-*" and now it works fine... it seems to have to do with the cache from uhttpd, I think...

Here is the output... I'm too stupid to read it: https://www.file-upload.net/download-12346148/out.txt.html

Here is the long version:
https://www.file-upload.net/download-12346217/teraterm.log.html

r43k3n · March 2, 2017, 2:33pm

Yes, I do. Version 1.8 to be precise.
I thought it had something to do with having only 32 MB RAM, but to be honest I didn't see any indication that there was a problem with too little RAM available. I have swap enabled too.

The difference for me is, I don't have LuCI installed at all.
I have installed: kmod-usb-core kmod-usb2 kmod-usb-printer p910nd ntpdate dnscrypt-proxy-resolvers dnscrypt-proxy hostip iodine libsodium dnsmasq-full vnstat sqm-scripts ddns-scripts miniupnpd etherwake curl wget

If it is SQM's fault then it's a bummer since the only reason I updated to LEDE is because of cake. Now I'm back to Chaos Calmer and it's working great but unfortunately no cake there.

BTW, is it possible to use cake on Chaos Calmer?

Snotmann · March 2, 2017, 2:35pm

Swap enabled with a USB stick?

r43k3n · March 2, 2017, 2:39pm

Yes, a not very fast old MP3 player xD
I also tried zRAM but it didn't work well for me.

Snotmann · March 2, 2017, 4:04pm

Hm, I'm pretty sure it's a memory problem, but I don't know what is happening. It has to do with the opkg update issue on 32 MB devices, because if I delete the opkg-lists the issue does not happen... after I update the lists, the device becomes slow and gets into trouble when I open LuCI.

hnyman · March 2, 2017, 4:23pm

"opkg lists" are just files. But they are stored in the ramdisk and thus decrease the memory available for running the system. Lists seem to take some 460 kB at the moment.

Most likely the device has just enough free memory to run, and saving the opkg packages lists to ramdisk takes away the crucial amount, tipping the router toward crash due to memory exhaustion.
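To see how much of the ramdisk the package lists are taking compared with the memory that is still free, something like the following could be run before and after "opkg update". A minimal sketch, assuming the lists live in /tmp/opkg-lists as the cleanup earlier in the thread suggests; the helper names dir_kb and mem_free_kb are made up for illustration:

```shell
#!/bin/sh
# Print the size of a directory in kB, or 0 if it does not exist.
dir_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | awk '{ print $1 }'
    else
        echo 0
    fi
}

# Print the MemFree value (kB) from /proc/meminfo-style text on stdin.
mem_free_kb() {
    awk '$1 == "MemFree:" { print $2 }'
}

lists_kb=$(dir_kb /tmp/opkg-lists)
if [ -r /proc/meminfo ]; then
    free_kb=$(mem_free_kb < /proc/meminfo)
else
    free_kb=unknown
fi
echo "opkg lists in ramdisk: ${lists_kb} kB, MemFree: ${free_kb} kB"
```

If the slowdown only appears once the first number is non-zero, that supports the memory exhaustion theory; deleting the lists after installing packages gives the memory back.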

You might try backporting the cake package and editing the Makefile a bit. If I remember correctly, a restriction to kernel 4.0+ was added to cake some time ago. But originally cake was developed on 3.18-based firmwares, I think. Note that you also need to patch "tc" in the iproute2 package.

In principle it is rather straightforward. I wrote the original two patches that imported cake into LEDE. The support is created by just two files:

Yeah, currently the cake Makefile is restricted so that it does not build on 3.18. See the discussion at https://github.com/lede-project/source/commit/9c49c937ab74506f70c651d1207948e3943b60fc

Snotmann · March 2, 2017, 4:26pm

Hey hnyman,

Thanks for your help. But there was enough memory free, and the issue is the same as here:
http://stackoverflow.com/questions/34112053/openwrt-cant-install-packages-memory-issue

Only if I comment out some parts of distfeeds.conf can I run the opkg update.

same here:
https://bugs.lede-project.org/index.php?do=details&task_id=120

Snotmann · March 3, 2017, 6:50am

Got the same again after 9 hours of runtime:
Filesystem Size Used Available Use% Mounted on
/dev/root 2.3M 2.3M 0 100% /rom
tmpfs 13.8M 220.0K 13.5M 2% /tmp
/dev/mtdblock3 4.4M 1.5M 2.9M 34% /overlay
overlayfs:/overlay 4.4M 1.5M 2.9M 34% /
tmpfs 512.0K 0 512.0K 0% /dev

top:
Mem: 20664K used, 7512K free, 220K shrd, 836K buff, 1776K cached
CPU: 25% usr 73% sys 0% nic 0% idle 0% io 0% irq 0% sirq
Load average: 2.73 2.07 0.93 4/46 16443
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
1545 1 root S 2420 9% 25% /usr/sbin/uhttpd -f -h /www -r schnei
99 2 root RW 0 0% 11% [kworker/0:1]
16441 16116 root R 1184 4% 9% top
770 1 root S 1416 5% 4% /usr/sbin/odhcpd
1 0 root S 1532 5% 3% /sbin/procd
440 1 root S 1188 4% 3% /sbin/ubusd
106 2 root SW 0 0% 2% [kswapd0]
3 2 root SW 0 0% 1% [ksoftirqd/0]
1960 1 root S 1688 6% 1% /usr/sbin/hostapd -s -P /var/run/wifi
747 1 root S 1704 6% 0% /sbin/netifd
16070 1005 root S 1128 4% 0% /usr/sbin/dropbear -F -P /var/run/dro
26 2 root SW 0 0% 0% [kworker/u2:2]
16442 1545 root R 2420 9% 0% /usr/sbin/uhttpd -f -h /www -r schnei
700 1 root S 1444 5% 0% /sbin/rpcd
3538 1 root S 1356 5% 0% {dynamic_dns_upd} /bin/sh /usr/lib/dd
793 1 root S 1188 4% 0% /usr/sbin/crond -f -c /etc/crontabs -
1711 1 root S < 1188 4% 0% /usr/sbin/ntpd -n -N -l -S /usr/sbin/
16116 16070 root S 1188 4% 0% -ash
15918 3538 root S 1184 4% 0% sleep 600
691 1 root S 1180 4% 0% /sbin/logd -S 16

I don't understand why I have 3 MB of free RAM after a reboot, but 7 MB when LuCI bogs down...

Dammit!

r43k3n · March 3, 2017, 4:49pm

I don't think it's related to LuCI.
I have the same issue without LuCI installed.

There is something else wrong there.

Snotmann · March 3, 2017, 11:51pm

I rebuilt my config from scratch and use zram now... so far it seems to be OK!