How do I prevent my router from crashing on low memory?

Router: TP-Link TL-WR741ND v2 https://openwrt.org/toh/tp-link/tl-wr741nd
OpenWrt version: 19.07.7, Image Builder firmware. I removed LuCI and PPPoE, among other things, and added curl (for scripts, because of HTTPS) and zram.

Problem: The router occasionally stops responding and reboots itself (about once a day). This occurs after (re)connecting a device to the network (wired or wireless), like turning on the computer in the morning. It doesn't happen every time I connect a device, but when it happens, it's always right after connecting one.

My assumption is that it runs out of RAM while handling DHCP. I can only assume, because I have no access to logs (it reboots).

Output of free during normal operation:

              total        used        free      shared  buff/cache   available
Mem:          28100       18800        2752        1288        6548        5680
Swap:         13308        2816       10492     

With 18.06.9 (also Image Builder firmware), I have no reboot issues.

My question is: Is there any way I can make the router purge memory or restart services to prevent the reboot (even if this causes a temporary slowdown)?
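
One way to approach this, as a rough sketch rather than a tested fix: a small script run from cron that checks MemAvailable and, when it drops below a threshold, drops the page cache and restarts dnsmasq. The function names, the 3000 kB threshold, and the choice of dnsmasq as the service to restart are all assumptions for illustration; nothing here is confirmed to prevent the reboot.

```shell
#!/bin/sh
# Hypothetical low-memory guard. Names and threshold are assumptions.

# Print MemAvailable in kB from a meminfo-style file (default: /proc/meminfo).
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2; exit }' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when available memory is below the threshold in kB.
# $1 = threshold in kB, $2 = optional meminfo-style file.
is_low_memory() {
    threshold_kb=$1
    avail_kb=$(mem_available_kb "$2")
    [ -n "$avail_kb" ] && [ "$avail_kb" -lt "$threshold_kb" ]
}

# Drop the page cache first, then restart dnsmasq (assumed culprit).
relieve_memory() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
    /etc/init.d/dnsmasq restart
}

# Example use from cron or a loop (not executed here):
#   is_low_memory 3000 && relieve_memory
```

Restarting dnsmasq briefly interrupts DNS and DHCP for clients, which matches the "temporary slowdown" trade-off mentioned above.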

Alternatively: Is there any way I can see what is actually going on right before it reboots?
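
For catching the state just before a reboot, one hedged option is to write periodic memory snapshots to a file that survives the reboot, capped to a few lines to limit flash wear. This sketch uses made-up function names and a hypothetical log path (/root/mem.log); it only records what /proc/meminfo reports and will not capture a kernel panic message.

```shell
#!/bin/sh
# Hypothetical snapshot logger. Names and paths are assumptions.

# Build one compact log line from a meminfo-style file (default: /proc/meminfo).
mem_snapshot() {
    awk -v ts="$(date +%s)" '
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapFree:"     { swap = $2 }
        END { printf "%s free=%s avail=%s swapfree=%s\n", ts, free, avail, swap }
    ' "${1:-/proc/meminfo}"
}

# Append a line to a log file and keep only the newest max_lines lines.
# $1 = log file, $2 = max lines, $3 = line to append.
append_capped() {
    log=$1
    max_lines=$2
    line=$3
    printf '%s\n' "$line" >> "$log"
    tail -n "$max_lines" "$log" > "$log.tmp" && mv "$log.tmp" "$log"
}

# Example crontab entry body (not executed here):
#   append_capped /root/mem.log 60 "$(mem_snapshot)"
```

After an unexpected reboot, the last few lines of the log would show whether free and available memory were falling beforehand.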

Also: Thoughts?

I know this is one of those 4MB Flash / 32MB RAM routers, and that in and of itself might mean there's no solution for my problem, but I hope there is.

Post what packages you have installed, and a screenshot of the System -> Startup window.

Thank you for your reply @frollic!

Installed packages:

~# opkg list-installed
base-files - 204.2-r11306-c4a6851c72
busybox - 1.30.1-6
ca-bundle - 20200601-1
curl - 7.66.0-3
dnsmasq - 2.80-16.3
dropbear - 2019.78-2
firewall - 2019-11-22-8174814a-3
fstools - 2020-05-12-84269037-1
fwtool - 2
getdns - 1.6.0-5
getrandom - 2019-06-16-4df34a4d-3
hostapd-common - 2019-08-08-ca8c2bd2-5
ip6tables - 1.8.3-1
iptables - 1.8.3-1
iw - 5.0.1-1
iwinfo - 2019-10-16-07315b6f-1
jshn - 2020-05-25-66195aee-1
jsonfilter - 2018-02-04-c7e938d6-1
kernel - 4.14.221-1-29c4ad98a33daa6bb57350a8723eac22
kmod-ath - 4.14.221+4.19.161-1-1
kmod-ath9k - 4.14.221+4.19.161-1-1
kmod-ath9k-common - 4.14.221+4.19.161-1-1
kmod-cfg80211 - 4.14.221+4.19.161-1-1
kmod-crypto-acompress - 4.14.221-1
kmod-gpio-button-hotplug - 4.14.221-3
kmod-ip6tables - 4.14.221-1
kmod-ipt-conntrack - 4.14.221-1
kmod-ipt-core - 4.14.221-1
kmod-ipt-nat - 4.14.221-1
kmod-ipt-offload - 4.14.221-1
kmod-lib-crc-ccitt - 4.14.221-1
kmod-lib-lz4 - 4.14.221-1
kmod-lib-lzo - 4.14.221-1
kmod-mac80211 - 4.14.221+4.19.161-1-1
kmod-nf-conntrack - 4.14.221-1
kmod-nf-conntrack6 - 4.14.221-1
kmod-nf-flow - 4.14.221-1
kmod-nf-ipt - 4.14.221-1
kmod-nf-ipt6 - 4.14.221-1
kmod-nf-nat - 4.14.221-1
kmod-nf-reject - 4.14.221-1
kmod-nf-reject6 - 4.14.221-1
kmod-zram - 4.14.221-1
libblobmsg-json - 2020-05-25-66195aee-1
libc - 1.1.24-2
libcurl4 - 7.66.0-3
libgcc1 - 7.5.0-2
libip4tc2 - 1.8.3-1
libip6tc2 - 1.8.3-1
libiwinfo20181126 - 2019-10-16-07315b6f-1
libjson-c2 - 0.12.1-3.1
libjson-script - 2020-05-25-66195aee-1
libmbedtls12 - 2.16.9-1
libnl-tiny - 0.1-5
libopenssl1.1 - 1.1.1k-1
libpthread - 1.1.24-2
libubox20191228 - 2020-05-25-66195aee-1
libubus20191227 - 2019-12-27-041c9d1c-1
libuci20130104 - 2019-09-01-415f9e48-4
libuclient20160123 - 2020-06-17-51e16ebf-1
libxtables12 - 1.8.3-1
libyaml - 0.2.2-1
logd - 2019-06-16-4df34a4d-3
mtd - 24
netifd - 2021-01-09-753c351b-1
odhcp6c - 2021-01-09-64e1b4e7-16
odhcpd-ipv6only - 2020-05-03-49e4949c-3
openwrt-keyring - 2019-07-25-8080ef34-1
opkg - 2021-01-31-c5dccea9-1
procd - 2020-03-07-09b9bd82-1
stubby - 0.3.0-1
swconfig - 12
uboot-envtools - 2018.03-3.1
ubox - 2019-06-16-4df34a4d-3
ubus - 2019-12-27-041c9d1c-1
ubusd - 2019-12-27-041c9d1c-1
uci - 2019-09-01-415f9e48-4
uclient-fetch - 2020-06-17-51e16ebf-1
urandom-seed - 1.0-1
urngd - 2020-01-21-c7f7b6b6-1
usign - 2020-05-23-f1f65026-1
wireless-regdb - 2020.11.20-1
wpad-mini - 2019-08-08-ca8c2bd2-5
zram-swap - 1.1-3

I'm not running LuCI, so by System -> Startup I assume you mean the contents of /etc/rc.local?

rc.local:

# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.

# Connect to wifix (this is a script to login into a wifi portal)
sleep 5
sh /etc/scripts/wifix.sh test
until wget -qsT 10 -O - "http://www.google.com" &> /dev/null; do sleep 15; done

# Install Stubby startup (I'm installing Stubby to RAM, yes)
opkg update
opkg install stubby -d ram
rm /tmp/opkg-lists/*
sleep 5
export LD_LIBRARY_PATH="/tmp/lib:/tmp/usr/lib" && /tmp/usr/sbin/stubby -C /etc/scripts/stubby.yml -g

# Blocklists for dnsmasq (these are just small adblocking and malware blocking hosts lists, no more than 150 Kb total)
sleep 10
sh /etc/scripts/block.sh
exit 0 

Yeah, you did write that, I forgot.

I meant all the running daemons.

stubby and dnsmasq are probably not a good combo on a 32 MB device.
There's also a bug in dnsmasq where it eats all available RAM if too many requests come in at the same time, when combined with a large block list.
Read about it in the thread "Opening Taxi App - Oom_reaper kills dnsmasq".

You will probably get away with stubby, but not with both.

  PID USER       VSZ STAT COMMAND
    1 root      1556 S    /sbin/procd
    2 root         0 SW   [kthreadd]
    4 root         0 IW<  [kworker/0:0H]
    6 root         0 IW<  [mm_percpu_wq]
    7 root         0 SW   [ksoftirqd/0]
    8 root         0 IW   [kworker/u2:1]
   80 root         0 SW   [oom_reaper]
   81 root         0 IW<  [writeback]
   83 root         0 SW   [kcompactd0]
   84 root         0 IW<  [crypto]
   86 root         0 IW<  [kblockd]
  112 root         0 IW   [kworker/0:1]
  120 root         0 SW   [kswapd0]
  177 root         0 SW   [spi0]
  319 root         0 IW<  [ipv6_addrconf]
  321 root         0 IW<  [dsa_ordered]
  328 root         0 IW<  [kworker/0:1H]
  378 root         0 IW   [kworker/0:2]
  409 root         0 SWN  [jffs2_gcd_mtd3]
  455 root      1208 S    /sbin/ubusd
  456 root       920 S    /sbin/askfirst /usr/libexec/login.sh
  473 root      1024 S    /sbin/urngd
  540 root         0 IW<  [cfg80211]
  666 root      1244 S    /sbin/logd -S 64
  833 root      1744 S    /sbin/netifd
  866 root      1444 S    /usr/sbin/odhcpd
  896 root      1212 S    /usr/sbin/crond -f -c /etc/crontabs -l 8
 1086 root      1076 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 192.168.2.1:22 -p fa6f:9001:cd7d::1:22 -
 1254 root      1712 S    /usr/sbin/hostapd -s -P /var/run/wifi-phy0.pid -B /var/run/hostapd-phy0.conf
 1268 root      1732 S    /usr/sbin/wpa_supplicant -B -s -P /var/run/wpa_supplicant-wlan0.pid -D nl80211 -i wlan0 -c /
 1291 root      1208 S    udhcpc -p /var/run/udhcpc-wlan0.pid -s /lib/netifd/dhcp.script -f -t 0 -i wlan0 -x hostname:
 1766 root      7100 S    /tmp/usr/sbin/stubby -C /etc/scripts/stubby.yml -g
 1864 dnsmasq   2160 S    /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411
 1917 root      1212 S<   /usr/sbin/ntpd -n -N -S /usr/sbin/ntpd-hotplug -p 0.openwrt.pool.ntp.org -p 1.openwrt.pool.n
 3101 root         0 IW   [kworker/u2:2]
 3189 root      1144 S    /usr/sbin/dropbear -F -P /var/run/dropbear.1.pid -p 192.168.2.1:22 -p fa6f:9001:cd7d::1:22 -
 3203 root      1216 S    -ash
 3214 root      1208 R    ps

Thanks,

Yeah, I'm aware I'm pushing the limits here. This exact configuration does not crash on 18.06.9 (2 months of uptime, until a reboot because the power went out temporarily).

The block list I'm using isn't large (<150 KB), but maybe it is for the amount of RAM I'm working with.

OK, that is interesting information about dnsmasq. Hopefully we'll see this patch in an upcoming version of dnsmasq (and that version made available for 19.07).

It depends on whether the maintainer of dnsmasq agrees with us about it being a problem, or not.
I wouldn't expect it to be fixed in 19.07, 21.02 at best.

You could try disabling the block list to see if the router still crashes without it.

You don't need curl for HTTPS support. Use uclient-fetch with libustream-openssl (since you're already using libopenssl).

You don't need iwinfo without LuCI.

It's a 4 MiB flash device, you're not using opkg, so you don't need either openwrt-keyring or opkg.

Running stubby on a 32 MiB RAM device is definitely pushing your luck.

Without opkg, you also don't need usign.

Why wpad-mini? Do you need both STA and AP functionality? If not, use hostapd-openssl (since you're already using libopenssl).

Do you need odhcpd? You already have dnsmasq.
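
The removals suggested above could be passed to the Image Builder roughly like this. It's only a sketch: the helper name is made up, `<your-profile>` is a placeholder, and the package lists are just examples taken from the names in this post. A leading `-` in PACKAGES is how the Image Builder excludes a package.

```shell
#!/bin/sh
# Hypothetical helper: turn "remove" and "add" lists into a PACKAGES string.
# $1 = space-separated packages to exclude, $2 = space-separated packages to add.
build_packages_arg() {
    remove=$1
    add=$2
    out=
    for p in $remove; do out="$out -$p"; done
    for p in $add; do out="$out $p"; done
    printf '%s\n' "${out# }"
}

# Example lists (assumptions, adjust to what your scripts really need).
REMOVE="opkg openwrt-keyring usign iwinfo"
ADD="libustream-openssl"
PACKAGES=$(build_packages_arg "$REMOVE" "$ADD")

# Print the command instead of running it, so it can be reviewed first.
echo "make image PROFILE=<your-profile> PACKAGES=\"$PACKAGES\""
```

Whether each removal is safe depends on what the remaining packages and scripts rely on, so it is worth checking the resulting image before flashing.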


Thank you for your careful analysis! I know I'm definitely pushing my luck with this router's limitations, no question about it.

I'd say my problem is RAM, not flash. I've managed to create an image that fits in the 4 MB, and one of the ways I achieved this is by installing stubby to RAM, which in turn reduces the amount of free RAM. So, if I can reduce the flash footprint further, I guess it's possible to fit stubby in there.

To give a bit of context, one of the things this router does is connect to a wireless network that has a captive portal, log in to that portal, and then act as a repeater. For the login I use curl, and I need the AP and STA capabilities to act as a repeater.

To replace curl with uclient-fetch, I would need it to be able to do something similar to these commands I use in some of my scripts:

curl -s -m 20 -X POST -F "username=..." -F "password=..." "https://...."

and

curl -s -m 5 --data-urlencode email=$EMAIL --data-urlencode password=$PASS --data-urlencode token=$TOKEN -H "content-type: application/x-www-form-urlencoded" -H "x-auth-formtoken:$TOKEN" "https://..."

I use iwinfo in one of my scripts to check if it's connected to the AP, but maybe there's another way:

iwinfo wlan0 assoclist

odhcpd is for IPv6, right?

I need opkg to install stubby to RAM, but... if it's already in the firmware image I don't need opkg!

So, it seems like a very good idea to replace opkg (and related packages) with stubby in the image builder, and that way I can achieve a smaller RAM footprint indirectly.

EDIT:
replaced

iwinfo wlan0 assoclist

with

iw wlan0 station dump
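
A minimal sketch of how the `iw` output could be turned into a yes/no check in a script. It assumes each associated peer is reported on a line starting with "Station ", and the function name is made up for illustration.

```shell
#!/bin/sh
# Succeed when `iw ... station dump` output on stdin lists at least one station.
has_station() {
    grep -q '^Station '
}

# Example use on the router (not executed here; interface name from the post):
#   if iw wlan0 station dump | has_station; then
#       echo "associated"
#   else
#       echo "not associated"
#   fi
```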

In that case, it's acting both as STA and AP, so you need to keep wpad-mini.

This may be too complex for uclient-fetch, but I wonder if it could be done, manually encoding all the POST data as a simple string…?
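
As a rough sketch of that idea: the form fields can be percent-encoded in plain shell and joined into one string. The function names are made up, the encoder only handles ASCII input, and the commented uclient-fetch call assumes its `--post-data` option is enough for the portal. Whether the custom `x-auth-formtoken` header can be sent that way is not confirmed here.

```shell
#!/bin/sh
# Percent-encode an ASCII string for application/x-www-form-urlencoded data.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Join alternating key/value arguments into key1=value1&key2=value2...
form_body() {
    body=
    while [ $# -ge 2 ]; do
        body="${body:+$body&}$(urlencode "$1")=$(urlencode "$2")"
        shift 2
    done
    printf '%s\n' "$body"
}

# Example use (not executed here; URL elided as in the original commands):
#   uclient-fetch -q -T 5 -O - \
#       --post-data="$(form_body email "$EMAIL" password "$PASS" token "$TOKEN")" \
#       "https://..."
```

Note that the first curl command uses `-F`, which sends multipart form data rather than a URL-encoded string, so this sketch only corresponds to the second command.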

Yes, but there's also dnsmasq-dhcpv6 and dnsmasq-full. If you don't need IPv6 at all, you can just remove odhcpd. Otherwise, you could try and replace dnsmasq with dnsmasq-dhcpv6 and remove both dnsmasq and odhcpd.

That was going to be my suggestion too. :)


A bit of an update here. My plan to remove opkg and iwinfo to put stubby in the firmware didn't work, simply because getdns requires libopenssl, which is about 440 KB, and libgetdns is almost 290 KB.

So my next attempt was to reduce the memory footprint by removing IPv6 (Global build settings / "Enable IPv6 support in packages" unchecked), odhcpd, and iwinfo, while keeping all the rest as before (curl, dnsmasq blocklists, stubby installed to RAM, and opkg to install stubby).

Strangely, the result didn't change much in terms of memory footprint:

              total        used        free      shared  buff/cache   available 
Mem:          28356       19816        2216        1280        6324        4772
Swap:         13308        2892       10416

Except for the total memory, which was 28100 and is now 28356, so I gained 256 KB there (yay?).

As a consequence, though, I haven't had any crashes. I don't believe just 256 KB is what stopped the OOM problems with dnsmasq, which might mean that the problem/thread @frollic alluded to isn't the issue here, or that IPv6 causes dnsmasq to create more child processes?

So, the current status is that, with IPv6 removed, stubby and dnsmasq are running fine on OpenWrt 19.07.7 on this 4/32 router.

Edit: I CAN crash the router if I open 97 tabs in Chrome at the same time. It works fine with 37 tabs, so the limit is somewhere in between.

Edit 2: I made a new build, this time using the new ATH79 target instead of ar71xx (again with no IPv6). All settings are the same (except updated network settings due to the switch configuration change), and although the total memory is now lower (27908), I cannot crash the router anymore, even when opening more than 100 tabs.

If your problem is solved, please consider marking this topic as [Solved]. See How to mark a topic as [Solved] for a short how-to.

Thank you. I'm not sure it's solved yet; I'll wait a week or so to see the router's behaviour under normal usage, if that's OK.


It's been a few days and the router is behaving well. I cannot crash it.

The only difference is the target used for the custom build: using (the new) ATH79 instead of ar71xx makes it stable. That's my conclusion.

And it doesn't seem to be a memory issue either: if anything, the ATH79 build consumes more flash and more RAM, not less. It just seems to handle connections better?
