dnsmasq strict-order mode and upstream switch timeout issues

Hi

I have two problems with dnsmasq on OpenWrt.

In resolv.conf.auto, I added two DNS servers as shown in the picture. 2.2.2.2 is a fake DNS server.

image

In strict-order mode, dnsmasq will switch to the secondary DNS server when the primary DNS server does not send a response. I can see this behavior on the host side (192.168.1.xxx).
But on the OpenWrt console (127.0.0.1), when I use ping to look up a name, dnsmasq always sends the DNS query to the primary DNS server and never switches to the secondary DNS server.

Can someone tell me why dnsmasq only sends DNS queries to the primary DNS server when the query comes from localhost?

When the host side (192.168.1.xxx) uses ping to look up a name, I can see that dnsmasq switches to the secondary DNS server when no response has been received for more than 5 seconds.
I added "options timeout:1" to resolv.conf.auto, but it did not change the timeout. It was still 5 seconds.

Can I change the timeout of this switch in dnsmasq?

Is your /etc/resolv.conf a symlink to /tmp/resolv.conf.auto? When pinging from the OpenWrt localhost, the name resolution was likely done by the C library, with dnsmasq not involved.

Run grep localuse /etc/init.d/dnsmasq to see whether you can enable option localuse 1 in /etc/config/dhcp. See http://git.openwrt.org/c17a68cc61a0f8a28e19c7f60b24beaf1a1a402d for details.

I am answering based on my reading of the dnsmasq code. I could be wrong.

For each query, dnsmasq retries the next upstream server if sendto() fails. dnsmasq does not initiate any retry on a response timeout. My guess is that the 5-second timeout came from your 192.168.1.xxx host.

In /etc/resolv.conf, I point the nameserver to 127.0.0.1, so name resolution is handed to dnsmasq:
search lan
nameserver 127.0.0.1

In the dnsmasq log, I did see that the DNS query from localhost was forwarded to 2.2.2.2.

I ran grep localuse /etc/init.d/dnsmasq to check, but did not see anything about option localuse, so there is no default setting for it.
Do you mean I have to set option localuse to 0? I think the default for option localuse on my OpenWrt is already 0.

After reading the dnsmasq code, I have the same opinion as you. But I saw a 5-second timeout in Wireshark, which made me think the resolver's default timeout is 5 seconds.
Do you mean the host tells dnsmasq to use a 5-second timeout?

localuse is only a recent addition to OpenWrt. It is intended to make it explicit that /tmp/resolv.conf should point to udp:127.0.0.1:53 of dnsmasq.

When the host retries 5 seconds later, dnsmasq finds the previous unanswered struct forward record and tries the next server in the upstream server list.

dnsmasq has a hardcoded timeout for the struct forward record. But when this timeout expires, it just frees the memory occupied and does not retry forwarding the query to another upstream server.

My conclusion is that there is no such setting in dnsmasq to "change the timeout of this switch in dnsmasq"

For musl-libc, the resolver only sends one query to each entry in /etc/resolv.conf. It does not do timeout and retry (https://wiki.musl-libc.org/functional-differences-from-glibc.html). This is different from your 192.168.1.xxx host, which sends another query when the first one times out.

I finally understand that the timeout you mentioned comes from the host side.

For musl-libc, the resolver only sends one query to each entry in /etc/resolv.conf. It does not do timeout and retry (https://wiki.musl-libc.org/functional-differences-from-glibc.html).

On OpenWrt, I set it up to browse the same website every 3 seconds. But in dnsmasq.log, I see that dnsmasq always forwards DNS queries from localhost to 2.2.2.2. Why does dnsmasq not switch to the secondary DNS server when it receives the second DNS query from localhost, as it does when it receives the second DNS query from the host?

From my understanding of the dnsmasq implementation, it will only switch to the next server in the upstream list if the query comes from the same source and asks for the same domain and DNS record type. My guess is that the source ports of the queries from the musl-libc resolver were different.

Sharing tcpdump output can help clarify things here.

Sorry, I don't have tcpdump installed. The picture is captured from dnsmasq.log.


In the picture, 192.168.1.135 is the host. We can see that the host used different source ports to send DNS queries to dnsmasq.

In /etc/config/dhcp, I set option domain 'lan' under "config dnsmasq". Could it be caused by a different domain?

From udp address 192.168.1.135:43860, 2 queries for graph.facebook.com were sent, 5 seconds apart. One was forwarded to 2.2.2.2 and the other to 8.8.8.8

As for the queries from 127.0.0.1 for acs.ais.co.th, they were sent at 8-second intervals and the source ports were all different from each other.
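To make the difference between the two cases easier to see, here is a toy model in Python of the behaviour as I understand it. It is only a sketch and not dnsmasq's code: the names ToyForwarder, handle_query and FORWARD_TTL are made up, the lifetime value is illustrative, the 127.0.0.1 source ports are invented, and starting from the first upstream for every new question is my assumption for strict-order mode.

```python
# Toy model of the behaviour described in this thread; NOT dnsmasq's actual code.
# ToyForwarder, handle_query and FORWARD_TTL are hypothetical names.

FORWARD_TTL = 10.0  # assumed lifetime of a pending record in seconds (illustrative)


class ToyForwarder:
    def __init__(self, upstreams):
        self.upstreams = list(upstreams)
        # (source ip, source port, name, record type) -> (upstream index, time sent)
        self.pending = {}

    def handle_query(self, src_ip, src_port, name, qtype, now):
        """Return the upstream this query would be forwarded to."""
        # Expired records are simply dropped; nothing is re-sent on expiry.
        self.pending = {k: v for k, v in self.pending.items()
                        if now - v[1] < FORWARD_TTL}
        key = (src_ip, src_port, name, qtype)
        if key in self.pending:
            # Same client repeating the same question: try the next upstream.
            index = (self.pending[key][0] + 1) % len(self.upstreams)
        else:
            # New question or new source port: start from the first upstream.
            index = 0
        self.pending[key] = (index, now)
        return self.upstreams[index]


fwd = ToyForwarder(["2.2.2.2", "8.8.8.8"])

# LAN host repeats its query from the same source port 5 seconds later.
host_first = fwd.handle_query("192.168.1.135", 43860, "graph.facebook.com", "A", now=0.0)
host_retry = fwd.handle_query("192.168.1.135", 43860, "graph.facebook.com", "A", now=5.0)

# Local resolver uses a fresh source port each time (port numbers are made up).
local_first = fwd.handle_query("127.0.0.1", 40001, "acs.ais.co.th", "A", now=10.0)
local_second = fwd.handle_query("127.0.0.1", 40002, "acs.ais.co.th", "A", now=18.0)

print(host_first, host_retry)
print(local_first, local_second)
```

In this model the host's repeated query matches its pending record and moves on to 8.8.8.8, while each query from 127.0.0.1 looks like a brand new question and goes to 2.2.2.2 again.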

That's for the search directive in /etc/resolv.conf as written by the DHCP client. I assume it's a separate topic we can pursue later if needed :wink:

In other words, if I use the same source port to send the second DNS query, dnsmasq will switch to the second DNS server?

Yes, I think so. Please do note that these are implementation details that may be helpful for debugging and may change in future versions. There are also many details not discussed here.
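If you want to try this from the router, a small test client along these lines might help. It is a sketch under assumptions: it needs Python on the device, the helper names build_query and query_twice are mine, and it reuses the same query ID for both attempts in case the ID is also part of what dnsmasq matches on. Both attempts go out through one socket, so they share one source port.

```python
import socket
import struct


def build_query(name, query_id, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query_twice(server, name, wait=5.0):
    """Send the same query up to twice from one socket (one source port)."""
    packet = build_query(name, query_id=0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(wait)
        for attempt in (1, 2):
            sock.sendto(packet, (server, 53))
            try:
                reply, _ = sock.recvfrom(512)
                print(f"attempt {attempt}: got {len(reply)} bytes")
                return reply
            except socket.timeout:
                print(f"attempt {attempt}: no reply within {wait} s")
    return None
```

Calling query_twice("127.0.0.1", "acs.ais.co.th") while watching dnsmasq.log should show whether the second attempt is forwarded to the other upstream server.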