Router : TP-Link TL-WR741ND v2 https://openwrt.org/toh/tp-link/tl-wr741nd
OpenWrt version: 19.07.7, Image Builder firmware; removed LuCI and PPPoE, among other things, and added curl (for scripts, because of HTTPS) and zram.
Problem: The router occasionally stops responding and reboots itself (about once a day). This occurs after (re)connecting a device to the network (wired or wireless), like turning on the computer in the morning. It doesn't happen every time I connect a device, but when it happens, it's always right after connecting one.
My assumption is that it runs out of RAM while handling DHCP for the new device. I can only assume, because I have no access to logs (they are lost when it reboots).
Output of free during normal operation:
              total        used        free      shared  buff/cache   available
Mem:          28100       18800        2752        1288        6548        5680
Swap:         13308        2816       10492
With 18.06.9 (also Image Builder firmware), I have no reboot issues.
My question is: Is there any way I can make the router purge its memory or restart services to prevent it from rebooting (even if this causes a temporary slowdown)?
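One possible stopgap for the question above, as a minimal sketch rather than a tested fix: poll MemAvailable from /proc/meminfo and restart dnsmasq when it drops below a threshold. The function names, the 3000 kB threshold, the 30 second interval and the choice of dnsmasq as the service to restart are all assumptions. If memory runs out within a single burst of requests, a polling loop like this may simply be too slow.

```shell
#!/bin/sh
# Print MemAvailable in kB from a meminfo-style file (default: /proc/meminfo).
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) when available memory is below the threshold.
# $1: threshold in kB, $2: optional meminfo-style file.
is_low_memory() {
    avail=$(mem_available_kb "$2")
    [ -n "$avail" ] && [ "$avail" -lt "$1" ]
}

# Example loop (assumed threshold and interval), e.g. started from rc.local:
# while sleep 30; do
#     is_low_memory 3000 && /etc/init.d/dnsmasq restart
# done &
```

The optional file argument is only there so the parsing can be tried against a saved copy of /proc/meminfo.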
Alternatively: Is there any way I can see what is actually going on right before it reboots?
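For seeing what happens before a reboot, one rough option is to keep a small, capped log on flash so the last snapshots survive. This is only a sketch: the helper name, the /root/mem.log path and the 200-line cap are assumptions, and rewriting a file on a 4 MB flash every minute causes wear, so it is better used for a few days of debugging than left running.

```shell
#!/bin/sh
# Append one line to a log file and keep only the newest MAX lines.
# $1: log file, $2: maximum number of lines to keep, $3: line to append.
append_capped() {
    file=$1
    max=$2
    line=$3
    printf '%s\n' "$line" >> "$file"
    tail -n "$max" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example (assumed path and interval), run every minute from cron:
# append_capped /root/mem.log 200 "$(date) $(free | sed -n 2p)"
```

If another machine on the LAN can receive syslog, the log_ip option in /etc/config/system avoids flash writes entirely by sending the system log over the network.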
Also: Thoughts?
I know this is one of those 4MB Flash / 32MB RAM routers, and that in and of itself might mean there's no solution for my problem, but I hope there is.
I'm not running Luci, so by system -> startup I assume you mean the contents of /etc/rc.local?
rc.local:
# Put your custom commands here that should be executed once
# the system init finished. By default this file does nothing.
# Connect to wifix (this is a script to login into a wifi portal)
sleep 5
sh /etc/scripts/wifix.sh test
until wget -qsT 10 -O - "http://www.google.com" &> /dev/null; do sleep 15; done
# Install Stubby startup (I'm installing Stubby to RAM, yes)
opkg update
opkg install stubby -d ram
rm /tmp/opkg-lists/*
sleep 5
export LD_LIBRARY_PATH="/tmp/lib:/tmp/usr/lib" && /tmp/usr/sbin/stubby -C /etc/scripts/stubby.yml -g
# Blocklists for dnsmasq (these are just small adblocking and malware blocking hosts lists, no more than 150 Kb total)
sleep 10
sh /etc/scripts/block.sh
exit 0
Stubby and dnsmasq are probably not a good combo on a 32 MB device.
There's also a bug in dnsmasq where it eats all available RAM if too many requests come in at the same time and it is combined with a large block list.
Read about it in the thread "Opening Taxi App - Oom_reaper kills dnsmasq".
You will probably get away with stubby alone, but not with both.
Yeah, I'm aware I'm pushing the limits here, but this exact configuration does not crash on 18.06.9 (2 months of uptime until a reboot because the power went out temporarily).
The block list I'm using isn't large, it's <150 KB, but maybe it is large for the amount of RAM I'm working with.
OK, now that is interesting information about dnsmasq. Hopefully we'll see this patch implemented soon in a future version of dnsmasq (and that version made available for 19.07).
It depends on whether the maintainer of dnsmasq agrees with us about it being a problem, or not.
I wouldn't expect it to be fixed in 19.07, 21.02 at best.
You could try disabling the block list to see if the router still crashes without it.
Thank you for your careful analysis! I know I'm definitely pushing my luck with this router's limitations, no question about it.
I could say that my problem is RAM and not Flash, I've managed to create an image that fits the 4 MB, one of the ways I achieved this is by installing stubby to RAM, which in turn reduces the amount of free RAM. So, if I can reduce the Flash footprint further, I guess it's possible to get stubby in there.
So, to give a bit of context, one of the things this router does is connect to a wireless network that has a captive portal, log in to that portal, and then act as a repeater. For the login I use curl, and I need the AP and STA capabilities to act as a repeater.
To replace curl with uclient-fetch I would need it to be able to do something similar to these commands I use in some of my scripts:
I use iwinfo in one of my scripts to check if it's connected to the AP, but maybe there's another way:
iwinfo wlan0 assoclist
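One possible way to do this check without iwinfo is to read the interface's carrier flag from sysfs. This is a sketch under an assumption that is worth verifying on the device: that the carrier flag of the station (STA) interface follows its association state. On an AP interface it would only show that the AP is up. The function name and the optional second argument are made up for illustration.

```shell
#!/bin/sh
# Succeed when the interface reports carrier (link up).
# $1: interface name, $2: sysfs net directory (default: /sys/class/net).
has_carrier() {
    f="${2:-/sys/class/net}/$1/carrier"
    [ -r "$f" ] && [ "$(cat "$f" 2>/dev/null)" = "1" ]
}

# Example: report when the station interface has lost its link.
# has_carrier wlan0 || echo "wlan0 is not associated"
```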
odhcpd is for IPv6, right?
I need opkg to install stubby to RAM, but... if it's already in the firmware image I don't need opkg!
So, it seems like a very good idea to replace opkg (and related packages) with stubby in the image builder, and that way I can achieve a smaller RAM footprint indirectly.
In that case, it's acting both as STA and AP, so you need to keep wpad-mini.
This may be too complex for uclient-fetch, but I wonder if it could be done, manually encoding all the POST data as a simple string…?
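To illustrate the "manually encoded string" idea, here is a minimal sketch of percent-encoding form fields in plain shell and joining them into one POST body. The field names, the variables and the portal URL in the commented example are hypothetical, since the real portal's form isn't shown in this thread. The encoder only handles single-byte (ASCII) characters correctly, and whether the portal accepts a request from uclient-fetch (cookies, redirects, TLS) is a separate question.

```shell
#!/bin/sh
# Percent-encode a string for use in application/x-www-form-urlencoded data.
# Handles ASCII only; multi-byte characters are not encoded correctly.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s=$rest
    done
    printf '%s' "$out"
}

# Join name/value pairs into one body: form_body name1 value1 name2 value2 ...
form_body() {
    body=
    while [ $# -ge 2 ]; do
        body="${body:+$body&}$(urlencode "$1")=$(urlencode "$2")"
        shift 2
    done
    printf '%s' "$body"
}

# Example (hypothetical field names and URL):
# uclient-fetch -q -O - \
#     --post-data="$(form_body username "$PORTAL_USER" password "$PORTAL_PASS")" \
#     "$PORTAL_URL"
```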
Yes, but there's also dnsmasq-dhcpv6 and dnsmasq-full. If you don't need IPv6 at all, you can just remove odhcpd. Otherwise, you could try and replace dnsmasq with dnsmasq-dhcpv6 and remove both dnsmasq and odhcpd.
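With Image Builder, that kind of swap is expressed through the PACKAGES variable, where a leading "-" removes a package. A small sketch of composing that string follows. The helper name is made up, and the package names are simply the ones mentioned in this thread; the exact names in a given release's default list may differ, so check the output of "make info" first.

```shell
#!/bin/sh
# Build an Image Builder PACKAGES string.
# $1: space-separated packages to add, $2: space-separated packages to remove.
packages_arg() {
    out=$1
    for p in $2; do
        out="${out:+$out }-$p"
    done
    printf '%s' "$out"
}

# Example (profile name left as a placeholder):
# make image PROFILE="<profile>" \
#     PACKAGES="$(packages_arg 'dnsmasq-dhcpv6' 'dnsmasq odhcpd')"
```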
A bit of an update here. My plan to remove opkg and iwinfo in order to put stubby in the firmware didn't work, simply because getdns requires libopenssl, which is about 440 KB, and libgetdns is almost 290 KB.
So my next attempt was to reduce the memory footprint by removing IPv6 (Global build settings / "Enable IPv6 support in packages" unchecked) and odhcpd, and by removing iwinfo, while keeping everything else as I had it before (curl, dnsmasq blocklists, stubby installed to RAM, and opkg to install stubby).
The result strangely didn't change much in terms of memory footprint:
              total        used        free      shared  buff/cache   available
Mem:          28356       19816        2216        1280        6324        4772
Swap:         13308        2892       10416
The exception is the total memory, which was 28100 and is now 28356, so I gained 256 KB there (yay?).
As a consequence I haven't had any crashes, but I doubt that a mere 256 KB fixed an OOM problem with dnsmasq. That might mean the problem/thread @frollic alluded to is not the issue here, or perhaps that IPv6 causes dnsmasq to create more child processes?
So, the current status is that, with IPv6 removed, stubby and dnsmasq are running fine on OpenWrt 19.07.7 on this 4/32 router.
Edit: I CAN crash the router if I open 97 tabs in Chrome at the same time, works fine with 37 tabs, so, somewhere in between is the limit.
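A more repeatable stand-in for the "open N tabs" test could be a burst of parallel DNS lookups fired from a client machine. This is only an approximation (tabs also open many TCP connections, not just DNS queries), and the helper name, the lookup command and the router address in the example are assumptions.

```shell
#!/bin/sh
# Run a command N times in parallel, discard its output, and wait for all
# copies to finish. Usage: burst N command [args...]
burst() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        i=$((i + 1))
    done
    wait
}

# Example (assumed router address), run from a client on the LAN:
# burst 100 nslookup example.com 192.168.1.1
```

Raising N step by step between the 37 and 97 mentioned above would narrow down where the limit is.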
Edit 2: Made a new build, this time using the new ath79 target instead of ar71xx (again with no IPv6). All settings are the same (except for updated network settings due to the switch configuration change), and although the total memory is now lower (27908), I cannot crash the router anymore, even when opening more than 100 tabs.
It's been a few days and the router is behaving well. I cannot crash it.
The only difference is the target used for the custom build: using the new ath79 target instead of ar71xx makes it stable. That's my conclusion.
And it doesn't seem to be a memory issue either: the ath79 build, if anything, consumes more flash and more RAM, not less. It just seems to handle connections better?