Hi, lately dnsmasq has started disappearing on my Archer C7 v5. I usually notice it when DNS stops working, because I've got my systems set up to use the router's DNS.
It seems to happen when I connect one particular computer to the LAN (wired connection).
When I try to restart dnsmasq manually over ssh by running
/usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -8 -
this happens:
dnsmasq: started, version 2.84 cachesize 150
dnsmasq: DNS service limited to local subnets
dnsmasq: compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-cryptohash no-DNSSEC no-ID loop-detect inotify dumpfile
dnsmasq: UBus support enabled: connected to system bus
dnsmasq-dhcp: read /etc/ethers - 0 addresses
dnsmasq-dhcp: DHCPREQUEST(br-lan) ***
dnsmasq: could not get memory
dnsmasq: FAILED to start up
(*** marks where I've removed the IP and MAC address that followed.)
However, free -h seems to show plenty of memory left:
              total        used        free      shared  buff/cache   available
Mem:         123564       44276       57656         304       21632       45644
Swap:             0           0           0
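To put a number on "plenty", the available column can be turned into a percentage of total. This is only a quick sketch: the function name mem_available_pct is made up, and it assumes free prints the seven-column layout shown above, with available as the last field on the Mem: line.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): reads `free` output on stdin
# and prints available memory as a whole-number percentage of total.
# Assumes the layout shown above: total is field 2, available is field 7.
mem_available_pct() {
    awk '$1 == "Mem:" { printf "%d\n", $7 * 100 / $2 }'
}

# Usage on the router:
#   free | mem_available_pct
```

Fed the output above, this prints 36, so roughly 45 MB out of about 120 MB is still available. That suggests the failing allocation is probably not down to the system as a whole running out of memory.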
I don't think it is, because in the failing state, ps shows that no dnsmasq processes are running. When I then try to run it once in the foreground, that single process immediately fails.
Dnsmasq could be trying to spawn a ton of processes at the very moment it starts, but that's rather hard to see because it exits so quickly. I'll try to run strace on dnsmasq next time I get my router into the failing state.
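For anyone wanting to try the same thing, a rough sketch of that strace run might look like the following. It assumes strace is already installed on the router (it is not part of a default image) and reuses the config path from my command above. The function name trace_dnsmasq and the output file /tmp/dnsmasq.strace are just placeholders I picked.

```shell
#!/bin/sh
# Sketch only: run dnsmasq in the foreground under strace.
# CONF is the config path from the manual restart command above;
# OUT is an arbitrary placeholder location for the trace.
CONF=/var/etc/dnsmasq.conf.cfg01411c
OUT=/tmp/dnsmasq.strace

trace_dnsmasq() {
    # -f   follow child processes, in case dnsmasq does fork
    # -tt  timestamp every syscall
    # -o   write the trace to a file instead of stderr
    strace -f -tt -o "$OUT" /usr/sbin/dnsmasq -C "$CONF" -k -8 -
}

# Usage, once the router is in the failing state:
#   trace_dnsmasq
#   grep -E 'mmap|brk|ENOMEM' /tmp/dnsmasq.strace | tail
```

The -f flag is there to settle the "spawning a ton of processes" question, and the grep afterwards should surface any oversized allocation request made just before the exit.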
SFE is a branch that offloads some processing to hardware.
I managed to reproduce the error and ran strace on the router's dnsmasq (version 2.84-1) once it was stuck in the failing state. It did not fork, but it did seem to try to allocate a ton of memory right after getting a DHCP request:
After upgrading to 2.85-1, dnsmasq started segfaulting at the same spot instead. I suspected some kind of in-memory corruption, so I switched from the SFE build to OpenWrt 19.07.7 r11306-c4a6851c72, and I haven't had any problems since.
Unless it starts misbehaving again, I'm going to chalk this up to some kind of kernel misbehavior or incompatibility in the SFE branch.