Unbound DNS resolver process gradually consumes RAM until none left, stops resolving

I have an issue with the unbound DNS resolver gradually consuming RAM over the course of up to three weeks until all 512 MB are consumed, at which point DNS resolution stops working.

Is this a known issue? Is there a way I can get visibility into what unbound is doing with all this memory to help figure out the problem?

root@link:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='18.06.0'
DISTRIB_REVISION='r7188-b0b5c64c22'
DISTRIB_TARGET='mvebu/cortexa9'
DISTRIB_ARCH='arm_cortex-a9_vfpv3'
DISTRIB_DESCRIPTION='OpenWrt 18.06.0 r7188-b0b5c64c22'
DISTRIB_TAINTS=''
root@link:~# uptime
 13:49:53 up 14 days, 21:25,  load average: 0.00, 0.00, 0.00
root@link:~# grep ^Vm /proc/$(pidof unbound)/status
VmPeak:   315772 kB
VmSize:   315772 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    268760 kB
VmRSS:    268760 kB
VmData:   312472 kB
VmStk:       132 kB
VmExe:       172 kB
VmLib:      2904 kB
VmPTE:       316 kB
VmPMD:         0 kB
VmSwap:        0 kB
root@link:~# grep -v -e '^#' -e '^ +#' -e '^\t#' -e '^$' /etc/unbound/*.conf
/etc/unbound/unbound.conf:server:
/etc/unbound/unbound.conf:      verbosity: 1
/etc/unbound/unbound.conf:      username: "unbound"
/etc/unbound/unbound.conf:      directory: "/var/lib/unbound"
/etc/unbound/unbound.conf:      chroot: "/var/lib/unbound"
/etc/unbound/unbound.conf:      pidfile: "/var/run/unbound.pid"
/etc/unbound/unbound.conf:      num-threads: 1
/etc/unbound/unbound.conf:      msg-cache-slabs: 1
/etc/unbound/unbound.conf:      rrset-cache-slabs: 1
/etc/unbound/unbound.conf:      infra-cache-slabs: 1
/etc/unbound/unbound.conf:      key-cache-slabs: 1
/etc/unbound/unbound.conf:      interface: 0.0.0.0
/etc/unbound/unbound.conf:      interface: ::0
/etc/unbound/unbound.conf:      access-control: 0.0.0.0/0 allow
/etc/unbound/unbound.conf:      access-control: ::0/0 allow
/etc/unbound/unbound.conf:      outgoing-num-tcp: 1
/etc/unbound/unbound.conf:      incoming-num-tcp: 1
/etc/unbound/unbound.conf:      outgoing-port-permit: "10240-65335"
/etc/unbound/unbound.conf:      outgoing-range: 60
/etc/unbound/unbound.conf:      num-queries-per-thread: 30
/etc/unbound/unbound.conf:      msg-buffer-size: 8192
/etc/unbound/unbound.conf:      infra-cache-numhosts: 200
/etc/unbound/unbound.conf:      msg-cache-size: 100k
/etc/unbound/unbound.conf:      rrset-cache-size: 100k
/etc/unbound/unbound.conf:      key-cache-size: 100k
/etc/unbound/unbound.conf:      neg-cache-size: 10k
/etc/unbound/unbound.conf:      target-fetch-policy: "2 1 0 0 0 0"
/etc/unbound/unbound.conf:      harden-large-queries: yes
/etc/unbound/unbound.conf:      harden-short-bufsize: yes
/etc/unbound/unbound.conf:
/etc/unbound/unbound.conf:python:
/etc/unbound/unbound.conf:remote-control:
/etc/unbound/unbound_ext.conf:forward-zone:
/etc/unbound/unbound_ext.conf:  name: "."
/etc/unbound/unbound_ext.conf:  forward-addr: 1.0.0.1@853
/etc/unbound/unbound_ext.conf:  forward-ssl-upstream: yes
/etc/unbound/unbound_srv.conf:use-syslog: no
/etc/unbound/unbound_srv.conf:logfile: /mnt/sda1/unbound_log
/etc/unbound/unbound_srv.conf:log-queries: yes
/etc/unbound/unbound_srv.conf:log-replies: yes
root@link:~# opkg info unbound
Package: unbound
Version: 1.7.3-2
Depends: libc, libopenssl, libunbound
Status: install ok installed
Architecture: arm_cortex-a9_vfpv3
Conffiles:
 /etc/config/unbound efedbbe21c74a37135e98d9344aba20a23f69a7dc3b581659f1ceabdb744c02a
 /etc/unbound/unbound.conf 4ec559b4a9331fa1af4aea21d4dfffe1111cb107aabc256048c363b29a560da2
 /etc/unbound/unbound_ext.conf e8b6b7e5d1a92385ca39ac47e9c763a44411ca17e294a5caee3c09117c486755
 /etc/unbound/unbound_srv.conf 62b6e7d5d67d146f98c1a4687742a7929e53217f98e9068bb547ca4fbbfd984b
Installed-Time: 1533529446

root@link:~#

If there's a better place to report this and/or find assistance, please let me know.

The unbound package is located in the packages feed repo:
https://github.com/openwrt/packages/issues

and @EricLuehrsen is the unbound maintainer.

But your problem possibly originates upstream, in unbound itself, in which case the issue might need to be reported there.

Increasing the log level may help. From https://nlnetlabs.nl/documentation/unbound/unbound.conf/

       verbosity: <number>
              The verbosity number, level 0 means no verbosity,  only  errors.
              Level  1  gives  operational information. Level 2 gives detailed
              operational information. Level 3 gives query level  information,
              output  per  query.   Level 4 gives algorithm level information.
              Level 5 logs client identification for cache misses.  Default is
              level  1.  The verbosity can also be increased from the command-
              line, see unbound(8).

I'd also double check that your config isn't one tailored for high-volume use. (Edit: OP has config extract in their post above, no "smoking gun" seen.) A caching resolver is going to consume memory for the cache. Creeping RAM usage might come from a slowly filling cache ("slowly" compared to an enterprise deployment answering thousands of queries per second). Many of the example configs for unbound assume RAM sizes in the gigabytes.
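
One way to tell a slowly filling cache from unbounded growth is to sample the resident set size periodically and see whether it ever levels off. A minimal sketch, assuming a BusyBox-style shell with awk and pidof available; the function names and the default log path are made up for illustration:

```shell
#!/bin/sh
# Print the VmRSS value (in kB) from a /proc/<pid>/status style file.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Append one timestamped RSS sample for the running unbound process.
# $1 is the log file; /tmp/unbound_rss.log is a hypothetical default.
log_sample() {
    pid=$(pidof unbound | awk '{ print $1 }')
    [ -n "$pid" ] || return 1
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(rss_kb "/proc/$pid/status")" >> "${1:-/tmp/unbound_rss.log}"
}
```

Calling log_sample from an hourly cron job would give a curve to look at: a cache filling up should flatten out near its configured limits, while a leak keeps climbing.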

Edit: I took a quick look at unbound-control stats_noreset on one of my DNS servers (v1.7.0) and I don't see anything there immediately useful around cache or memory utilization. status didn't show anything RAM/cache-related either.
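
If extended-statistics is enabled and remote control is working (both are assumptions; neither is visible in the config extract above), unbound-control stats_noreset should also print mem.* counters, in bytes, for what unbound itself accounts for in its caches and modules. Summing them and comparing against VmRSS gives a rough idea of how much memory sits outside unbound's own bookkeeping. A sketch; sum_mem_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# Read "key=value" statistics lines on stdin and print the sum of all
# values whose key starts with "mem." (prints 0 if there are none).
sum_mem_bytes() {
    awk -F= '/^mem\./ { total += $2 } END { printf "%d\n", total }'
}
```

Usage would be something like: unbound-control stats_noreset | sum_mem_bytes. If the total stays in the low megabytes while VmRSS is in the hundreds of megabytes, the growth is probably not in the caches.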

I could add that I have also had some funky behaviour with unbound lately, but I also have an openssl PR in play.

The only changes I made outside of LuCI were to add DNS-over-TLS encryption via /etc/unbound/unbound_ext.conf (included in the troubleshooting output above).

In LuCI I have "Memory Resource: default" which I assumed would result in a sane value for embedded devices. (This is one of those WRT3200ACMs with half a gig of RAM, so I'd expect the default to be well below what this device can handle.)

I just saw the config extract while scrolling down, and I didn't immediately see anything inconsistent with what man unbound.conf suggests for "settings that reduce memory usage":

       # example settings that reduce memory usage
       server:
            num-threads: 1
            outgoing-num-tcp: 1 # this limits TCP service, uses less buffers.
            incoming-num-tcp: 1
            outgoing-range: 60  # uses less memory, but less performance.
            msg-buffer-size: 8192   # note this limits service, 'no huge stuff'.
            msg-cache-size: 100k
            msg-cache-slabs: 1
            rrset-cache-size: 100k
            rrset-cache-slabs: 1
            infra-cache-numhosts: 200
            infra-cache-slabs: 1
            key-cache-size: 100k
            key-cache-slabs: 1
            neg-cache-size: 10k
            num-queries-per-thread: 30
            target-fetch-policy: "2 1 0 0 0 0"
            harden-large-queries: "yes"
            harden-short-bufsize: "yes"

I created an issue in github, trying to cover all the bases:

Do you have SSL upstream enabled? I noticed this RAM / memory leak issue only with SSL when unbound has to use TCP on forward servers. Without TLS, unbound uses regular UDP so memory use remained static.

Edit: I checked your GitHub issue entry and saw that you have SSL upstream enabled. That is the culprit, at least for my setup.

Further discussion should be managed on GitHub, but in short: yes, we are seeing something related to OpenSSL's own buffer allocations.

Thanks @EricLuehrsen. Would you like it to be kept in this state to gather info later, or are you already able to reproduce it without too much effort?

If it's trivial for you to reproduce, I might throw in a cron job to restart unbound (or unbound-control reload if that frees the memory) as a workaround until you folks can come up with a better workaround or fix.

I don't mind keeping it in this state if it might help, there's probably another week before it becomes a problem again.

cron and possibly configuration tweaks to stretch the duration ... see github

For what it's worth, unbound-control reload doesn't release the leaked memory. Restarting it with /etc/init.d/unbound restart was more effective.
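
A cron-driven check along those lines could restart unbound only once its resident memory passes a threshold, rather than on a fixed schedule. A minimal sketch, assuming /etc/init.d/unbound restart as mentioned above; the 200000 kB limit and the function names are arbitrary, illustrative choices:

```shell
#!/bin/sh
# Hypothetical threshold in kB; override by setting LIMIT_KB in the environment.
LIMIT_KB=${LIMIT_KB:-200000}

# Succeeds when $1 (current RSS in kB) is greater than $2 (limit in kB).
over_limit() {
    [ "$1" -gt "$2" ]
}

# Restart unbound if its VmRSS exceeds LIMIT_KB; do nothing if it is not running.
check_unbound() {
    pid=$(pidof unbound | awk '{ print $1 }')
    [ -n "$pid" ] || return 0
    rss=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status")
    [ -n "$rss" ] || return 0
    if over_limit "$rss" "$LIMIT_KB"; then
        /etc/init.d/unbound restart
    fi
}
```

Saved as a script that ends with a call to check_unbound and run from cron every half hour or so, this keeps the cache warm for as long as memory allows. It is only a stopgap until a proper fix is available.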

Resolved by PR #7112

The patch is now in the 18.06 snapshot. I'm happy to report that unbound memory utilisation is stable with SSL upstream enabled.

Thanks @EricLuehrsen