Dnsmasq out of memory error with memory available

Hi, lately dnsmasq has started disappearing on my Archer C7 v5. I usually notice it when DNS stops working, because I've got my systems set up to use the router's DNS.

Lately it seems to happen when I connect a particular computer to the LAN (wired connection).

When I try to manually restart dnsmasq from ssh with /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -8 -, this happens:

dnsmasq[16405]: started, version 2.84 cachesize 150
dnsmasq[16405]: DNS service limited to local subnets
dnsmasq[16405]: compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua TFTP no-conntrack no-ipset no-auth no-cryptohash no-DNSSEC no-ID loop-detect inotify dumpfile
dnsmasq[16405]: UBus support enabled: connected to system bus
dnsmasq-dhcp[16386]: read /etc/ethers - 0 addresses
dnsmasq-dhcp[16386]: DHCPREQUEST(br-lan) ***
dnsmasq[16386]: could not get memory
dnsmasq[16386]: FAILED to start up

*** Here I've deleted the IP and MAC address that follows.

However, free -h seems to show plenty of memory left:

              total        used        free      shared  buff/cache   available
Mem:         123564       44276       57656         304       21632       45644
Swap:             0           0           0

Distribution info (/etc/openwrt_release) is

DISTRIB_DESCRIPTION='OpenWrt SFE r15569-2d8422842c'

Rebooting fixes the problem (for a while, at least). It only seems to affect dnsmasq, not any other services that I've noticed. I am not using an ad-blocker.

What's going on, and how can I fix it?

Check if this isn't the cause of it

I don't think it is, because in the failing state, ps shows that no dnsmasq processes are running. When I then try to run it once in the foreground, that single process immediately fails.

Dnsmasq could be trying to spawn a ton of processes at the very moment it starts, but that's rather hard to see because it exits so quickly. I'll try to run strace on dnsmasq next time I get my router into the failing state.

What's SFE and have you tried a recent, official OpenWrt master image?

I haven't seen any OOM issues with dnsmasq recently. I'd assume there would be reports if there were, since plenty of people run master builds and dnsmasq is installed by default.

SFE is a branch that offloads some processing to hardware.

I managed to reproduce the error and did an strace on the dnsmasq version on the router, 2.84-1, after it got stuck in the failing state. It did not fork, but seemed to try to allocate a ton of memory after getting a DHCP request:

recvmsg(5, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base={{len=20, type=NLMSG_DONE, flags=NLM_F_MULTI, seq=4, pid=7358}, 0}, iov_len=792}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
getpid()                                = 7358
getpid()                                = 7358
write(41, "dnsmasq-dhcp[7358]: DHCPREQUEST("..., 70) = 70
mmap2(NULL, 2011389952, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Out of memory)
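
To put that failed allocation in perspective, here is a rough sketch in Python, meant to be run on a desktop against the pasted trace rather than on the router. The helper name failed_mmap_sizes and the regular expression are purely illustrative, and it assumes the "total" column from free above is in KiB.

```python
import re

# Matches strace lines such as:
#   mmap2(NULL, 2011389952, ...) = -1 ENOMEM (Out of memory)
MMAP_ENOMEM = re.compile(r"mmap2?\(NULL, (\d+),.*= -1 ENOMEM")


def failed_mmap_sizes(trace_lines):
    """Return the byte sizes of mmap/mmap2 calls that failed with ENOMEM."""
    sizes = []
    for line in trace_lines:
        match = MMAP_ENOMEM.search(line)
        if match:
            sizes.append(int(match.group(1)))
    return sizes


# Two lines copied from the strace output above.
trace = [
    'write(41, "dnsmasq-dhcp[7358]: DHCPREQUEST("..., 70) = 70',
    "mmap2(NULL, 2011389952, PROT_READ|PROT_WRITE, "
    "MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Out of memory)",
]

# "total" from the free output earlier in the thread, assumed to be KiB.
total_ram_bytes = 123564 * 1024

for size in failed_mmap_sizes(trace):
    mib = size / 2**20
    ratio = size / total_ram_bytes
    print(f"{mib:.0f} MiB requested, {ratio:.1f}x total RAM")
```

A single request of roughly 1.9 GB is many times the router's total RAM, so it fails no matter how much memory free reports as available. A size like that suggests a bogus length value rather than real memory pressure.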

I then used opkg to upgrade dnsmasq to 2.85-1, and the problem hasn't appeared yet. If it does, I'll try to use an official OpenWrt master image; and if that doesn't work, I'll post more about it.

After upgrading to 2.85-1, dnsmasq started throwing segmentation faults instead at the same spot. I suspected some kind of in-memory corruption and upgraded from SFE to OpenWrt 19.07.7 r11306-c4a6851c72, and I haven't had any problems since.

Unless it starts misbehaving again, I'm going to chalk this up to some kind of kernel misbehavior or incompatibility in the SFE branch.

It might be worth trying 21.02 RC1 to see whether it also happens in vanilla OpenWrt. RC2 is rumoured to be around the corner.

Just a heads-up since 19.07 will be phased out rather quickly after 21.02 goes 'gold'.
