Temporary DNS query outages

Dear colleagues,
I've been using OpenWrt on a Comtrend VR-3026e for the last six months or so, and with every build I've installed I run into this specific DNS problem several times a day: I'm browsing a website, clicking links, and suddenly I get a DNS_PROBE_FINISHED_NXDOMAIN error. From that point on, the internal network is unable to resolve that particular domain name for the next 20 minutes. What I've tried so far:

  • flashing a different OpenWrt build: no change
  • restarting the internal (LAN) router interface: works
  • waiting 20 minutes: works
  • renewing the client PC's TCP/IP configuration: no change
  • querying the ISP's DNS server directly using nslookup: works, but querying the OpenWrt LAN interface fails again

Syslog:

Sat Feb 29 15:44:28 2020 daemon.info dnsmasq[1391]: 12937 192.168.1.213/60636 query[A] www.youtube.com from 192.168.1.213
Sat Feb 29 15:44:28 2020 daemon.info dnsmasq[1391]: 12937 192.168.1.213/60636 cached www.youtube.com is <CNAME>
Sat Feb 29 15:44:28 2020 daemon.info dnsmasq[1391]: 12937 192.168.1.213/60636 cached www.youtube.com is NODATA-IPv4
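
To spot these episodes in the log, a small filter along these lines might help. This is only a sketch: negative_cache_hits is a made-up helper name, and it assumes query logging is enabled (option logqueries '1') and that the log lines look like the ones above.

```shell
# Print dnsmasq log lines where a negative answer (NXDOMAIN or NODATA)
# was served from the cache. Reads log text on stdin.
negative_cache_hits() {
    grep -E 'dnsmasq\[[0-9]+\]: .* cached [^ ]+ is (NXDOMAIN|NODATA)'
}

# On the router this would typically be fed from the system log:
#   logread | negative_cache_hits
```

Each matching line names the client and the domain, so it shows which lookups were answered from a cached negative entry rather than forwarded upstream.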

nslookup:

www.youtube.com
Server:  OpenWrt.lan
Address:  192.168.1.1
Non-authoritative answer:
Name:    www.youtube.com
(and no address)

network:

config interface 'loopback'
	option ifname 'lo'
	option proto 'static'
	option ipaddr '127.0.0.1'
	option netmask '255.0.0.0'
config globals 'globals'
	option ula_prefix 'fd62:509e:73f8::/48'
config interface 'lan'
	option type 'bridge'
	option ifname 'eth0.1'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'
config switch
	option name 'switch0'
	option reset '1'
	option enable_vlan '1'
config switch_vlan
	option device 'switch0'
	option vlan '1'
	option ports '1 2 3 8t'
config switch_vlan
	option device 'switch0'
	option vlan '2'
	option ports '0 8t'
config interface 'WWAN'
	option ifname 'eth0.2'
	option proto 'static'
	option ipaddr '100.64.65.66'
	option gateway '100.64.65.65'
	option netmask '255.255.255.252'
	list dns '100.64.65.65'
	list dns '10.70.238.2'
	list dns '10.70.237.6'

ipconfig

Ethernet adapter Ethernet 2:
   Connection-specific DNS Suffix  . : lan
   Description . . . . . . . . . . . : Intel(R) Ethernet Connection (2) I219-V
   Physical Address. . . . . . . . . : 30-9C-23-48-24-90
   DHCP Enabled. . . . . . . . . . . : Yes
   Autoconfiguration Enabled . . . . : Yes
   IPv4 Address. . . . . . . . . . . : 192.168.1.213(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Lease Obtained. . . . . . . . . . : Thursday, February 27, 2020 22:45:35
   Lease Expires . . . . . . . . . . : Sunday, March 1, 2020 3:20:10
   Default Gateway . . . . . . . . . : 192.168.1.1
   DHCP Server . . . . . . . . . . . : 192.168.1.1
   DNS Servers . . . . . . . . . . . : 192.168.1.1
   NetBIOS over Tcpip. . . . . . . . : Enabled

Does anyone have an idea what could be causing these temporary DNS query outages? It never happens when I use my ISP's or a public DNS server directly on my LAN client devices. Thanks.

OpenWrt version? Adblock?

Also, which extra packages have you installed?
Output of uci export dhcp?

config dnsmasq
        option localise_queries '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option nonwildcard '1'
        option authoritative '1'
        option localservice '0'
        option logqueries '1'
        option rebind_protection '0'
        option serversfile '/tmp/adb_list.overall'
config dhcp 'lan'
        option interface 'lan'
        option start '100'
        option limit '150'
        option leasetime '12h'
        option dhcpv6 'server'
        option ra 'server'
        option ra_management '1'
        option force '1'
config dhcp 'wan'
        option interface 'wan'
        option ignore '1'
config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

I have only adblock installed, but it doesn't affect this particular misbehaviour. It was already happening before I installed adblock, and removing it doesn't help. Still, it's a good idea, since adblock uses the DNS mechanism; I'll try disabling it just to be doubly sure. Thanks. EDIT: I forgot, adblock hasn't been installed since I flashed this version:

OpenWrt version: OpenWrt SNAPSHOT r12322-8f93c05a59 / LuCI Master git-20.055.55429-c6ce80e

This phenomenon also occurred on 18.06.5 stable, which was the first version I tried.

First of all, please fix the post above and use preformatted text (the </> button) instead of the blockquote you used. It's hard to read without proper indentation.

Nonwildcard is 1, but there is no interface list. If unsure, change it to 0.
If you are not using adblock, remove the entry option serversfile '/tmp/adb_list.overall'
Just in case, add the option allservers '1'
And while you're at it, post the following as well:
ls -l /etc/resolv.* /tmp/resolv.*; head -n -0 /etc/resolv.* /tmp/resolv.*
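
On the command line, the three changes above might look roughly like this. It is a sketch only, assuming the dnsmasq section is the first (and only) one in /etc/config/dhcp, as in the export above, so that it can be addressed as @dnsmasq[0].

```shell
# Assumes a single 'config dnsmasq' section, addressed as @dnsmasq[0].
uci set 'dhcp.@dnsmasq[0].nonwildcard=0'     # no interface list is configured
uci delete 'dhcp.@dnsmasq[0].serversfile'    # leftover adblock entry
uci set 'dhcp.@dnsmasq[0].allservers=1'      # query all upstream servers
uci commit dhcp                              # write the changes to /etc/config/dhcp
/etc/init.d/dnsmasq restart                  # restart dnsmasq so they take effect
```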

This may sound a bit dumb, but have you tried setting a public DNS server, for example Cloudflare, Google or Quad9, on the WWAN interface?

This is not a valid result; it should have returned an NXDOMAIN response.

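None of these three DNS servers has a public IP address: 10.0.0.0/8 is private space, and 100.64.0.0/10 is the shared range used for carrier-grade NAT. Have you tried using a public DNS server, as @Dratas suggested?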

Please test with known Public DNS servers.

Thank you for these hints. I'll try them (e.g. using public DNS services) and get back to you. It may take some time, as the outage isn't reproducible on demand and happens randomly.
The strange thing is that if I query the ISP's non-public DNS servers directly, they answer and resolve the currently affected domain name just fine. If I go back to the setup where the gateway is also the DNS server, I get nothing again.
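
A side-by-side check of this kind could be scripted roughly as follows. This is just a sketch: compare_resolvers is a hypothetical helper name, and the addresses in the usage comment are the router and the first ISP DNS server from the config above.

```shell
# Ask each listed DNS server for the same name, one after another, so the
# answers can be compared side by side.
# Usage: compare_resolvers NAME SERVER [SERVER...]
compare_resolvers() {
    name="$1"
    shift
    for server in "$@"; do
        echo "== $name via $server =="
        nslookup "$name" "$server"
    done
}

# Example (router first, then the ISP's resolver):
#   compare_resolvers www.youtube.com 192.168.1.1 100.64.65.65
```

If the ISP server returns an address while the router returns an empty or negative answer, the stale entry is most likely sitting in the router's own cache.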

Will get back, thanks!

I'm back with some more experience on this case. First, I set the public DNS server 8.8.8.8 as the primary on my WAN interface; it resolves all public addresses fine, and all the problems described above went away.

The secondary DNS server on my WAN interface is my ISP's internal one, used for internal names, mainly SMTP. These lookups fail quite often, so there must be something I misunderstand about how OpenWrt queries this particular DNS server or handles its cache. Every now and then, when I try to send an e-mail, I get "SMTP server not found" in my e-mail client, while the OpenWrt log says:

Fri Apr 17 09:03:56 2020 daemon.info dnsmasq[1503]: 14031 192.168.1.213/59521 query[A] smtp.svata.net from 192.168.1.213
Fri Apr 17 09:03:56 2020 daemon.info dnsmasq[1503]: 14031 192.168.1.213/59521 cached smtp.svata.net is NXDOMAIN

and that's it. Restarting the LAN interface in OpenWrt using LuCI (or waiting 15-20 minutes) clears the problem, and the log then shows:

Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14139 192.168.1.213/56751 query[A] smtp.svata.net from 192.168.1.213
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14139 192.168.1.213/56751 forwarded smtp.svata.net to 8.8.8.8
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14139 192.168.1.213/56751 forwarded smtp.svata.net to 100.64.65.65
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14139 192.168.1.213/56751 reply smtp.svata.net is 10.70.237.38
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14140 192.168.1.213/54298 query[A] smtp.svata.net from 192.168.1.213
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14140 192.168.1.213/54298 cached smtp.svata.net is 10.70.237.38
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14141 192.168.1.213/60516 query[AAAA] smtp.svata.net from 192.168.1.213
Fri Apr 17 09:15:44 2020 daemon.info dnsmasq[1503]: 14141 192.168.1.213/60516 forwarded smtp.svata.net to 100.64.65.65

Any ideas, please?

Sounds like a round-robin issue... smtp.svata.net is NOT publicly resolvable, so any query that ends up at 8.8.8.8 comes back as NXDOMAIN.

You could configure OpenWrt to ask all servers instead of using strict order from the resolv file.
But I think it is best to add a DNS forwarding entry /svata.net/1.2.3.4, where 1.2.3.4 is your ISP's name server.
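
A sketch of such a forwarding rule set via uci. The address 100.64.65.65 is an assumption: it is one of the ISP-side servers from the WAN DNS list earlier in the thread, so substitute whichever ISP name server actually answers for svata.net. It also assumes a single dnsmasq section, addressed as @dnsmasq[0].

```shell
# Send only svata.net lookups to the ISP resolver; everything else keeps
# using the normal upstream servers. 100.64.65.65 is an assumed address.
uci add_list 'dhcp.@dnsmasq[0].server=/svata.net/100.64.65.65'
uci commit dhcp
/etc/init.d/dnsmasq restart
```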

Getting back to thank @trendy for the hint about querying all servers: the option "Query all available upstream DNS servers" in LuCI improved my situation dramatically. I don't have to bite my keyboard 10 times a day any more.

For me, the case is closed, thanks to all.
