Slow dnsmasq replies with nftset

I am seeing odd but consistent behavior with dnsmasq: replies to DNS queries are very slow whenever the queried domain is a member of an ipset.

For example, I have the following ipset defined in /etc/config/dhcp:

config ipset
	list name 'nextdns_hosts_6'
	list domain 'ipv6-vultr-atl-1.edge.nextdns.io'
	option table_family 'inet'
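For context, here is a rough sketch of the dnsmasq directive I would expect this UCI section to turn into. The nftset=/domain/6#family#table#set layout follows the dnsmasq man page, and the values (6, inet, fw4) are copied from the log line further down rather than derived from the config, so treat the exact rendering as an assumption. nftset_directive is just a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print a dnsmasq nftset directive for one domain.
# $1 = domain, $2 = IP version (4 or 6), $3 = table family, $4 = table, $5 = set
nftset_directive() {
    printf 'nftset=/%s/%s#%s#%s#%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values taken from the log output in this post (assumed, not read from UCI).
nftset_directive ipv6-vultr-atl-1.edge.nextdns.io 6 inet fw4 nextdns_hosts_6
# prints: nftset=/ipv6-vultr-atl-1.edge.nextdns.io/6#inet#fw4#nextdns_hosts_6
```

Comparing that against the nftset lines in the generated runtime config should confirm whether the directive really looks like this on a given build.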

When I query for the AAAA record for ipv6-vultr-atl-1.edge.nextdns.io, the DNS reply arrives >700ms later:

❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 1% cpu 0.778 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 1% cpu 0.808 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 1% cpu 0.835 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 1% cpu 0.763 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 0% cpu 0.801 total

If I remove that host from my ipset, restart dnsmasq, and then perform the same query, I see "normal" response times:

❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 10% cpu 0.080 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.01s system 12% cpu 0.063 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.00s system 6% cpu 0.099 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.00s system 7% cpu 0.070 total
❯ time host ipv6-vultr-atl-1.edge.nextdns.io
ipv6-vultr-atl-1.edge.nextdns.io has IPv6 address 2001:19f0:5401:1d93:5400:2ff:fece:25e9
host ipv6-vultr-atl-1.edge.nextdns.io  0.00s user 0.00s system 7% cpu 0.093 total
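
To put a number on the difference, this small sketch averages the "total" wall-clock times from the two runs above. The average function is only a hypothetical helper; the inputs are the ten totals copied from the output in this post.

```sh
#!/bin/sh
# Hypothetical helper: print the mean of its arguments to three decimals.
average() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# Domain in the ipset (slow case)
average 0.778 0.808 0.835 0.763 0.801   # prints 0.797

# Domain removed from the ipset (normal case)
average 0.080 0.063 0.099 0.070 0.093   # prints 0.081
```

That works out to roughly a 10x slowdown (about 0.8s versus about 0.08s) for the same query.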

With a domain in an ipset, such as the example domain from above, this is what the dnsmasq log indicates:

Sun Jan  7 00:06:04 2024 daemon.info dnsmasq[20859]: 255 192.168.xx.yy/51810 query[AAAA] ipv6-vultr-atl-1.edge.nextdns.io from 192.168.xx.yy
Sun Jan  7 00:06:04 2024 daemon.info dnsmasq[20859]: 255 192.168.xx.yy/51810 forwarded ipv6-vultr-atl-1.edge.nextdns.io to 127.0.0.1#53535
Sun Jan  7 00:06:05 2024 daemon.info dnsmasq[20859]: 255 192.168.xx.yy/51810 nftset add 6 inet fw4 nextdns_hosts_6 2001:19f0:5401:1d93:5400:2ff:fece:25e9 ipv6-vultr-atl-1.edge.nextdns.io
Sun Jan  7 00:06:05 2024 daemon.info dnsmasq[20859]: 255 192.168.xx.yy/51810 reply ipv6-vultr-atl-1.edge.nextdns.io is 2001:19f0:5401:1d93:5400:2ff:fece:25e9

The nftset add step (third line in the log above) pushes a CPU core to 100% for the dnsmasq process. The spike lasts for nearly the entire query response time, which is >700ms in the example I provided. If I add more domains to the ipset, the time grows linearly with the number of domains.

Since my full ipset contains 26 domains, this pegs a CPU core on my host for >18 seconds each time ANY of the domains in the ipset is queried. During that CPU spike, dnsmasq itself becomes extremely sluggish and all other DNS queries get bogged down as a result.
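
The >18 second figure is just the linear scaling applied to 26 domains. A quick sketch of the arithmetic, assuming roughly 0.7s of added delay per configured domain (the >700ms seen with a single domain); estimate_stall is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: estimate the stall per query under linear scaling.
# $1 = number of domains in the ipset, $2 = assumed seconds added per domain
estimate_stall() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n * t }'
}

estimate_stall 26 0.7   # prints 18.2
```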

To be clear, the IP address(es) from the upstream query response are stored by dnsmasq in the appropriate nft set. My hope here is to figure out whether this is an OpenWrt-specific issue or specific to dnsmasq 2.89.

Is anyone else seeing the behavior I have described?

Additional Details
root@OpenWrt:~# dnsmasq --version
Dnsmasq version 2.89  Copyright (c) 2000-2022 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus UBus no-i18n no-IDN DHCP no-DHCPv6 no-Lua no-TFTP conntrack no-ipset nftset no-auth cryptohash DNSSEC loop-detect inotify dumpfile

root@OpenWrt:~# cat /etc/os-release
NAME="OpenWrt"
VERSION="SNAPSHOT"
ID="openwrt"
ID_LIKE="lede openwrt"
PRETTY_NAME="OpenWrt SNAPSHOT"
VERSION_ID="snapshot"
HOME_URL="https://openwrt.org/"
BUG_URL="https://bugs.openwrt.org/"
SUPPORT_URL="https://forum.openwrt.org/"
BUILD_ID="r24762-c0d7842bf2"
OPENWRT_BOARD="x86/64"
OPENWRT_ARCH="x86_64"
OPENWRT_TAINTS="no-all busybox"
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"
OPENWRT_DEVICE_PRODUCT="Generic"
OPENWRT_DEVICE_REVISION="v0"
OPENWRT_RELEASE="OpenWrt SNAPSHOT r24762-c0d7842bf2"

A couple of additional things I've tried:

  • Turned on dnsmasq cache -- Did not help
  • Manually added a few IPs to an existing set -- Takes 0.008 seconds per nft add element command
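
For anyone who wants to repeat the manual test, this is roughly the shape of the command I mean. It is only a sketch: nft_add_cmd is a hypothetical helper that prints the command instead of running it, and the family, table, and set names are the ones from the log above.

```sh
#!/bin/sh
# Hypothetical helper: build (but do not run) an nft command that adds
# one address to an existing set.
# $1 = table family, $2 = table, $3 = set, $4 = address
nft_add_cmd() {
    printf 'nft add element %s %s %s { %s }\n' "$1" "$2" "$3" "$4"
}

nft_add_cmd inet fw4 nextdns_hosts_6 2001:19f0:5401:1d93:5400:2ff:fece:25e9
# prints: nft add element inet fw4 nextdns_hosts_6 { 2001:19f0:5401:1d93:5400:2ff:fece:25e9 }

# To time it on the router, run the printed command as root, e.g.:
#   time sh -c "$(nft_add_cmd inet fw4 nextdns_hosts_6 <address>)"
```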

I'm narrowing in on this being a dnsmasq-specific issue rather than an fw4 one, but if anyone has any other thoughts or tests, I'm all ears.


Update 1:
I just rebuilt dnsmasq from the latest commit in the dnsmasq master repo. One recent commit in particular caught my eye; it looks like it was introduced by @ldir:
https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=b8b5b734b4175311e7d432d86a9ca69401b0717d

Interestingly, this new dnsmasq build has sped up the nftset add. I am now getting reply times in the 300-400ms range. Still not ideal, but that's about half the reply time I had been seeing, so it's a win of sorts.

Update 2:
Not sure what happened, but query times are back in the 600+ms range for domains in dnsmasq's nftset config. Odd.


Update 3:
Until something else comes along to try, I've just commented out the xappend line in /etc/init.d/dnsmasq that writes nftsets to the generated dnsmasq runtime config (the /var/etc/dnsmasq.conf.cfg0*411c file). Instead, I'll just keep my nft sets updated with the tried-and-true method here: