[Solved] Luci floods dnsmasq log with PTR queries

Barney · February 19, 2021, 11:42am

I'm using OpenWrt SNAPSHOT r14740-0b31713c85 on a FriendlyARM NanoPi R1.

For dnsmasq (version 2.82-6) I've enabled logging with the option to log queries (log-facility=/var/log/dnsmasq.log).

Just by accident I discovered that LuCI floods dnsmasq with PTR queries about every 5 seconds while the status overview is selected. The flooding continues until I choose another menu.

Here's an excerpt from the dnsmasq log:

Feb 19 12:24:02 dnsmasq[8644]: 216639 127.0.0.1/39204 query[PTR] 73.40.89.10.in-addr.arpa from 127.0.0.1
Feb 19 12:24:02 dnsmasq[8644]: 216639 127.0.0.1/39204 DHCP 10.89.40.73 is TABLET-SMART
Feb 19 12:24:02 dnsmasq[8644]: 216640 127.0.0.1/39204 query[PTR] 21.25.91.10.in-addr.arpa from 127.0.0.1
Feb 19 12:24:02 dnsmasq[8644]: 216640 127.0.0.1/39204 /tmp/hosts/dhcp.cfg01411c 10.91.25.21 is LTC-PC
Feb 19 12:24:07 dnsmasq[8644]: 216641 127.0.0.1/40081 query[PTR] 50.25.91.10.in-addr.arpa from 127.0.0.1
Feb 19 12:24:07 dnsmasq[8644]: 216641 127.0.0.1/40081 /tmp/hosts/dhcp.cfg01411c 10.91.25.50 is DSA-NAS
Feb 19 12:24:07 dnsmasq[8644]: 216642 127.0.0.1/40081 query[PTR] 90.25.91.10.in-addr.arpa from 127.0.0.1
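
Log lines like the ones above can be tallied per queried address. What follows is only a minimal Python sketch, meant to be run off-router on a copy of the log. The regular expression is an assumption derived from the excerpt shown here, not from dnsmasq's documentation, and the helper names `ptr_name_to_ip` and `count_ptr_queries` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
#   "... query[PTR] 73.40.89.10.in-addr.arpa from 127.0.0.1"
PTR_QUERY = re.compile(r"query\[PTR\] (\S+)\.in-addr\.arpa from \S+")


def ptr_name_to_ip(labels: str) -> str:
    """Reverse the dotted labels of an in-addr.arpa name to get the IPv4 address."""
    return ".".join(reversed(labels.split(".")))


def count_ptr_queries(lines):
    """Return a Counter mapping each queried IPv4 address to its number of PTR queries."""
    counts = Counter()
    for line in lines:
        match = PTR_QUERY.search(line)
        if match:  # reply lines ("... is TABLET-SMART") do not match and are skipped
            counts[ptr_name_to_ip(match.group(1))] += 1
    return counts


# Three lines copied from the excerpt above.
sample = [
    "Feb 19 12:24:02 dnsmasq[8644]: 216639 127.0.0.1/39204 query[PTR] 73.40.89.10.in-addr.arpa from 127.0.0.1",
    "Feb 19 12:24:02 dnsmasq[8644]: 216639 127.0.0.1/39204 DHCP 10.89.40.73 is TABLET-SMART",
    "Feb 19 12:24:07 dnsmasq[8644]: 216642 127.0.0.1/40081 query[PTR] 90.25.91.10.in-addr.arpa from 127.0.0.1",
]

for ip, n in count_ptr_queries(sample).items():
    print(ip, n)
```

Feeding it the whole file (for example `count_ptr_queries(open("dnsmasq.log"))`, with the path being whatever copy of the log is at hand) would show which addresses are queried most often.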

LuCI asks for the PTR records of every device that has connected to the router since the last boot. That includes a lot of devices that are currently not connected to the router.

The frequency and number of queries result in a rapidly growing dnsmasq log file. Because the file resides in a tmpfs, this behaviour can make the router inoperable (out of RAM).

Is there any chance to change this behaviour of luci?

vgaetera · February 19, 2021, 11:48am

It should be fine, since OpenWrt log uses ring buffer and its size is fixed:
https://openwrt.org/docs/guide-user/base-system/log.essentials

frollic · February 19, 2021, 11:48am

Tried disabling the upper right corner auto update ?

Barney · February 19, 2021, 8:21pm

It is not fine, because I've defined a logfile for dnsmasq (see my OP), and this logfile isn't a ring buffer, so it grows and grows ....

frollic · February 19, 2021, 8:24pm

That's what logrotate's for.

Barney · February 19, 2021, 8:30pm

Just tried it and the flooding stopped.

The default is REFRESHING; to avoid the flooding it should be PAUSED.

If I want an update of the page, I can press the reload button of my browser. I think that's better than the flooding caused by LuCI's refreshing.

Barney · February 19, 2021, 8:37pm

I know logrotate and have used it for several decades. But I prefer to get rid of the cause rather than mitigate the symptoms.

vgaetera · February 19, 2021, 9:46pm

System log should rotate as specified in the settings:

uci show system; logread | wc

Dnsmasq query logging is disabled by default.
So, your problem is the result of your own customization.

If you believe this is a common issue, file a ticket here:
https://github.com/openwrt/luci/issues

Barney · February 19, 2021, 10:29pm

I think you are mixing up cause and effect. The cause is LuCI's query flooding; the effect is the growing dnsmasq logfile. My defining a logfile for dnsmasq merely exposes LuCI's misbehaviour.

vgaetera · February 20, 2021, 2:49am

This behavior is likely intentional and not a problem for most users.
You can report it to the LuCI bug tracker as mentioned above.

Barney · February 23, 2021, 8:39pm

I don't concur with you.

I can't see any need to query PTR records for devices that are neither connected nor have a valid lease record.

For this to be seen as a problem, users must first be aware of the flooding. I've been using OpenWrt since 2017 and noticed this behaviour only a few weeks ago.

I'll give you some numbers to better describe this behaviour. The numbers are based on my network environment (about 10-15 connected devices).

The flooding adds about 5 MiB/hour to the log; normal is about 30 KiB. In 1 hour the log records 23878 queries, of which 23760 are PTR queries, and 4355 of those relate to devices that are not connected or have no lease record.

As far as I remember, on a fresh OpenWrt install the log buffer is set to 64 KiB. This 64 KiB ring buffer will be cluttered within 46 secs. Even if you raise it to 1 MiB, it takes only 12 minutes to wipe out all useful information.
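
The fill times follow directly from the growth rate. As a rough check in Python, assuming a steady 5 MiB/hour (the approximate figure quoted above) and using the hypothetical helper `seconds_to_fill`:

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: the log grows at a constant 5 MiB per hour, as quoted in the post.
GROWTH_PER_HOUR = 5 * MIB


def seconds_to_fill(buffer_bytes: int) -> float:
    """Seconds until a buffer of the given size has been completely overwritten."""
    return buffer_bytes * 3600 / GROWTH_PER_HOUR


print(seconds_to_fill(64 * KIB))       # 45.0 seconds for the 64 KiB buffer
print(seconds_to_fill(1 * MIB) / 60)   # 12.0 minutes for a 1 MiB buffer
```

The 45 seconds computed here is in line with the roughly 46 seconds quoted, since the 5 MiB/hour figure is itself rounded.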

In my case I had defined a logfile solely for dnsmasq. This logfile grows by 5 MiB every hour. Depending on the RAM of your device, it's just a matter of time until you run out of memory. Your router reboots and (because all log info resides in volatile memory) you have no clue why it happened.

And you really think that's not a problem?

I've been working for more than 4 decades as a system engineer and have reviewed thousands, perhaps even millions, of lines of code. The code responsible for the flooding was written by someone who never really reflected on its consequences.

If the developers of OpenWrt are interested in reliability and stability, they should improve this part of the code, because doing so eliminates a cause of mysterious crashes/reboots.

vgaetera · February 24, 2021, 12:22am

Not a problem until you properly report it to the bug tracker.
To be clear, this forum is not a bug tracker.

lleachii · February 26, 2021, 8:52pm

Wait...you leave the page open long enough to get 5MiB of data you don't want anyway???

Nonetheless...have you seen this:

[Screenshot: screen18]

I doubt that you do, since current versions of OpenWrt do not do those PTR lookups automatically.

Whether or not you agree that it's a bug, it's very rare that the developers will backport a feature request (i.e. the software already exists in a newer version). My advice to you is to upgrade.

I'm really having a hard time understanding what the OP calls/describes as a bug...

I should be clear - this statement is false regarding "refreshing hostname lookups"; and the default while the page loads IPs is not to lookup hostnames.

Barney · February 28, 2021, 3:29pm

Indeed, the page was open long enough. In my case that was intentional, for testing. But it can also happen by accident: imagine the postman rings twice and then your neighbour's wife comes by asking you to do her a favour. Hours later you finally return to your PC and find that your router has stalled. You know: shit always happens.

No, I haven't. But I must admit I've only looked at [Network]->[DHCP and DNS].

Does this mean that only a few past versions showed this behaviour? If so, then newer versions of OpenWrt shouldn't exhibit it. Right?

lleachii · February 28, 2021, 8:51pm

I assume this is in base LuCI
I'm not sure what version you're on - as you run a Snapshot with no LuCI pre-installed
So I'm merely saying the current versions don't do this...the auto lookups were a slightly separate issue long ago - that was fixed by not doing them automatically, hence the addition of that button...

That button is located at Status > Realtime Graphs > Connections. I thought this is where the issue arises for you. Is that not the case?

Barney · March 1, 2021, 8:25pm

No, it is not. I observed the PTR query flood while on the status overview page. Those queries were made only for devices on the LAN side.

The flood starts when you enter the page. It stops when you leave the page or when you press the REFRESHING button.

I noticed the Enable/Disable DNS lookups button only after reading your post. There, the PTR query flood starts after pressing Enable DNS lookups. The PTR queries triggered by that button ask for hosts on both the LAN and WAN side.

This is true for the Status > Realtime Graphs > Connections page, but not for the status overview page.

So it seems to me that there are two different parts of code: one is responsible for the query flood in the status overview, the other for the query flood under Status > Realtime Graphs > Connections.

akardam · August 30, 2021, 11:10pm

I seem to be experiencing a similar (if not the same) issue as the OP.

I have a RPI4 running 21.02.0-rc4, acting as a Dumb AP. To that end, per the instructions on said topic, I've disabled dnsmasq, firewall, and odhcpd, and confirmed they were all inactive (stopped) after a reboot.

On the LAN interface, I've configured a static IP, netmask, and gateway, and I've configured a custom DNS server, pointing at my internal recursive resolver.

My DNS server logs, wireshark (on the switch port to which the PI is connected) and tcpdump (on the PI itself) all show numerous PTR lookups originating from the PI, looking for the PTR for the PI itself, and the default gateway. All of the sources above show responses being returned, and tcpdump running on the PI seems to show the packets actually reaching the PI (so no firewall/blocking issues in the line).

From the data collected, it looks like the PI is receiving the response, but then ignoring it? I can't immediately come up with any explanation as to why it would ask again almost immediately (you figure it would cache the response for some time in accordance with the TTL), and continue to do so repeatedly.

As the OP states, this appears to occur when logged into the web GUI on the status (landing) page. I also find it occurs when on the interfaces page. If I remove the custom DNS server from the LAN interface config, then the queries, obviously, stop.

Further, if I shell into the box and issue a query via nslookup, the request goes out, the response comes back, and nslookup shows the response correctly. For each invocation of nslookup, only one query and one response are seen - they do not repeat, as they do when the web GUI is being accessed.

I'd configured DNS so that I could ping machines from the PI by hostname rather than IP, and a few other reasons (for example I don't know if SSH on OpenWRT makes use of reverse lookups or not). So it's a matter of convenience, I suppose, to have a DNS server for the PI itself configured, as the AP appears to work just fine w/o one. But, it's still a nice-to-have.

Before I consider this a bug, I wanted to see if there was a configuration setting I've overlooked to reduce or eliminate this behavior.

Thank you for your time.

jow · August 31, 2021, 1:41pm

The LuCI ui has various pages which refresh data every 5s or so. Some of those pages directly or indirectly acquire reverse DNS names to show hostname hints for locally known IP addresses. There is no local caching of the received DNS replies as the assumption is that a local DNS cache (such as dnsmasq) is installed or that a remote DNS is configured which performs caching on behalf of the system.

By disabling dnsmasq locally you essentially removed any local DNS reply caching capability and frequent requests will translate directly to upstream requests.
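
To illustrate what such a caching layer contributes, here is a minimal Python sketch of a time-limited lookup cache. It is not LuCI or dnsmasq code; the class name `TTLCache`, the fixed 60 second lifetime, and the injectable `resolve` and `clock` parameters are all assumptions made for the example (a real DNS cache would honour the TTL of each record).

```python
import time


class TTLCache:
    """Remember lookup results for a fixed time so repeated requests do not go upstream."""

    def __init__(self, resolve, ttl=60.0, clock=time.monotonic):
        self._resolve = resolve   # function performing the real (upstream) lookup
        self._ttl = ttl           # assumed fixed lifetime in seconds
        self._clock = clock       # injectable so the behaviour can be tested
        self._entries = {}        # key -> (time stored, value)

    def lookup(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]       # still fresh: answered locally
        value = self._resolve(key)  # missing or expired: ask upstream
        self._entries[key] = (now, value)
        return value
```

With a cache like this in front of the resolver, a page polling every 5 seconds would cause one upstream query per address per minute instead of twelve. Without one, every poll becomes an upstream query.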

If you’d run the nslookup command in a 5 second interval loop, you’d also see corresponding upstream requests every 5s.
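
The same experiment can be sketched in Python instead of an nslookup loop. This is only an illustration: `poll_reverse_dns` is a hypothetical helper, and whether each call really produces an upstream query depends on the resolver setup of the machine it runs on.

```python
import socket
import time


def poll_reverse_dns(ip, rounds, interval=5.0,
                     resolve=socket.gethostbyaddr, sleep=time.sleep):
    """Do one reverse lookup of `ip` per round, pausing `interval` seconds in between."""
    names = []
    for i in range(rounds):
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            names.append(resolve(ip)[0])
        except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failure
            names.append(None)
        if i < rounds - 1:
            sleep(interval)
    return names


# Example call (performs real DNS lookups; the address is just a placeholder):
# poll_reverse_dns("192.168.1.1", rounds=3)
```

Watching the DNS server log or tcpdump while this runs should show whether one request appears per round, as described above for the 5 second nslookup loop.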

tmomas · August 31, 2021, 2:21pm

@Barney Does this also answer your question?

Barney · August 31, 2021, 3:51pm

@jow's post explains and confirms the behaviour, but it doesn't offer a solution to prevent the flooding of dnsmasq's log file.