Hi, lately dnsmasq has started disappearing on my Archer C7 v5. I usually notice it when DNS stops working, because I've got my systems set up to use the router's DNS.
It seems to happen when I connect a particular computer to the LAN over a wired connection.
When I try to manually restart dnsmasq from ssh with /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -8 -, this happens:
dnsmasq[16405]: started, version 2.84 cachesize 150
dnsmasq[16405]: DNS service limited to local subnets
dnsmasq[16405]: compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-cryptohash no-DNSSEC no-ID loop-detect inotify dumpfile
dnsmasq[16405]: UBus support enabled: connected to system bus
[snip]
dnsmasq-dhcp[16386]: read /etc/ethers - 0 addresses
dnsmasq-dhcp[16386]: DHCPREQUEST(br-lan) ***
dnsmasq[16386]: could not get memory
dnsmasq[16386]: FAILED to start up
*** Here I've deleted the IP and MAC addresses that followed.
However, free -h seems to show plenty of memory left:
              total        used        free      shared  buff/cache   available
Mem:         123564       44276       57656         304       21632       45644
Swap:             0           0           0
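Since the numbers above were taken after dnsmasq had already died, it may help to read the kernel's own estimate right before starting it by hand. Here is a minimal sketch, assuming the kernel exposes a MemAvailable line in /proc/meminfo; the function name mem_available_kb is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the MemAvailable value (in kB) from a
# meminfo-style file. Defaults to /proc/meminfo when no path is given.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Running mem_available_kb just before the manual dnsmasq start gives a figure to compare against the "available" column of free.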
Rebooting fixes the problem (for a while, at least). It only seems to affect dnsmasq, not any other services that I've noticed. I am not using an ad-blocker.
I don't think it is, because in the failing state, ps shows that no dnsmasq processes are running. When I then try to run it once in the foreground, that single process immediately fails.
Dnsmasq could be trying to spawn a ton of processes at the very moment it starts, but that's rather hard to see because it exits so quickly. I'll try to run strace on dnsmasq next time I get my router into the failing state.
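For reference, the trace I have in mind would look roughly like this. It is only a sketch: it assumes strace is installed on the router (it is not part of a default image), the output path /tmp/dnsmasq.strace is arbitrary, and the wrapper name trace_dnsmasq and the DRY_RUN switch are made up for illustration. The dnsmasq arguments are the ones from my manual start above.

```shell
#!/bin/sh
# Config path as quoted earlier in the thread; OUT is an assumed location.
CONF="${CONF:-/var/etc/dnsmasq.conf.cfg01411c}"
OUT="${OUT:-/tmp/dnsmasq.strace}"

# Hypothetical wrapper: run one foreground dnsmasq under strace.
trace_dnsmasq() {
    # strace: -f follows child processes, -o writes the trace to a file.
    # dnsmasq: -k keeps it in the foreground, "-8 -" logs to stderr.
    set -- strace -f -o "$OUT" /usr/sbin/dnsmasq -C "$CONF" -k -8 -
    if [ -n "$DRY_RUN" ]; then
        echo "$*"    # only print the command that would be run
    else
        "$@"
    fi
}
```

With -f, any forked children show up in the trace file too, so it should settle whether dnsmasq is spawning processes or failing on its own.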
What's SFE and have you tried a recent, official OpenWrt master image?
I haven't seen any OOM issues with dnsmasq recently, and I'd assume there would be reports, since plenty of people run master builds and dnsmasq is installed by default.
SFE is a branch that includes the Shortcut Forwarding Engine, which offloads some packet processing to a fast path.
I managed to reproduce the error and did an strace on the dnsmasq version on the router, 2.84-1, after it got stuck in the failing state. It did not fork, but seemed to try to allocate a ton of memory after getting a DHCP request:
I then used opkg to upgrade dnsmasq to 2.85-1, and the problem hasn't reappeared yet. If it does, I'll try an official OpenWrt master image, and if that doesn't work, I'll post more about it.
After upgrading to 2.85-1, dnsmasq started throwing segmentation faults instead at the same spot. I suspected some kind of in-memory corruption and upgraded from SFE to OpenWrt 19.07.7 r11306-c4a6851c72, and I haven't had any problems since.
Unless it starts misbehaving again, I'm going to chalk this up to some kind of kernel misbehavior or incompatibility in the SFE branch.