Randomly losing DNS

For a couple of days now I've woken up in the morning to find that, whilst my LAN is fine, no device on the network, including the router, can resolve anything via DNS. I can ping by IP address from any device on the network, including the router (obviously!), and SSH to my VPS (by IP address), but no joy resolving hostnames. Restarting the WAN interface didn't fix the problem; only rebooting the router seems to resolve it.

My network configuration looks as follows:

config interface 'wan'
	option proto 'pppoe'
	option ifname 'eth1'
	option username '##########'
	option password '##########'
	option ipv6 'auto'
	option mtu '1500'
	option keepalive '0'
	option _orig_ifname 'eth1'
	option _orig_bridge 'false'
	option peerdns '0'
	option dns '1.1.1.1 8.8.8.8 9.9.9.10'

config interface 'lan'
	option type 'bridge'
	option ifname 'eth0'
	option proto 'static'
	option ipaddr '192.168.1.1'
	option netmask '255.255.255.0'
	option ip6assign '60'

config interface 'wan6'
	option ifname 'eth1'
	option proto 'dhcpv6'

The relevant section of my /etc/config/dhcp

config dnsmasq
	option domainneeded '1'
	option localise_queries '1'
	option rebind_protection '1'
	option rebind_localhost '1'
	option local '/lan/'
	option domain 'lan'
	option expandhosts '1'
	option authoritative '1'
	option readethers '1'
	option leasefile '/tmp/dhcp.leases'
	option resolvfile '/tmp/resolv.conf.auto'
	option localservice '1'
	option serversfile '/tmp/adb_list.overall'

config dhcp 'lan'
	option interface 'lan'
	option start '2'
	option limit '20'
	option leasetime '12h'
	option dhcpv6 'server'
	option ra 'server'
	option ra_management '1'
	list dhcp_option 'option:dns-server,192.168.1.1'

I have tried installing dnscrypt and have the configuration below. On re-reading the linked page, though, I don't think I ever configured it correctly: I did not comment out the line option resolvfile /tmp/resolv.conf.auto or add the lines that point dnsmasq at dnscrypt. So I don't think it can be the issue, but I'm including it for completeness.

config global
	# start dnscrypt-proxy from procd interface trigger rather than immediately in init
	# if needed you can restrict trigger to certain interface(s)
	# list procd_trigger 'wan'
	# list procd_trigger 'wan6'

config dnscrypt-proxy ns1
	option address '127.0.0.1'
	option port '5353'
	option resolver 'fvz-anyone'

Strange thing is it's been fine for a long time (like a year) and it's just started playing up without any update.

I found in the logs...

Tue Nov 26 06:00:32 2019 user.err adblock-3.6.5[11635]: 'dnsmasq' not running or not executable

...suggesting that the dnsmasq process may have died, but nothing indicating why it would have died.

I've tried using uci but it results in...

~$ uci show network
uci: I/O error

On the back of which I searched and found this thread suggesting the flash memory may be on its way out :-/

Any advice or suggestions on how to investigate this or any other possible causes/solutions to the DNS issue would be appreciated.

Thanks in advance.

At first sight, I would have said that your flash is having an issue, but this is the second time someone has posted a DNS-related issue that also involves an I/O error message; this is weird. Have a look at the logs, perhaps there is some message there. I would also try reflashing the device (without saving any configuration) and then reconfiguring it.
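
To narrow the logs down to anything that hints at storage trouble, a filter along these lines might help. This is only a sketch: the function name storage_errors is made up for illustration, and the pattern list is my own guess at useful keywords, not an exhaustive catalogue.

```shell
# Hypothetical helper: read log text on stdin and keep only the lines
# that mention common storage-related keywords.
# The keyword list is an assumption; extend it for your flash layout.
storage_errors() {
    grep -iE 'i/o error|jffs2|ubifs|squashfs|bad block'
}
```

On the router you could feed it the kernel and system logs with `dmesg | storage_errors` and `logread | storage_errors`. Some of these keywords also show up in normal boot messages, so read the hits in context.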

I have very, very vague memories of uci show coming up with I/O error being related to malformed config files.

It would have been uci: Parse error.

This would check the config:
for CONF in /etc/config/* ; do uci show "${CONF##*/}" > /dev/null || echo "${CONF}"; done

Try changing that to

list dhcp_option '6,192.168.1.1'
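
For reference, 6 is the DHCP option number for the DNS server list, and as far as I know dnsmasq should accept either the name or the number. The sketch below just shows how the two spellings map onto each other. The function names are made up for illustration and the table only covers a handful of common options.

```shell
# Hypothetical helper: map a few dnsmasq option names to their
# DHCP option numbers. Only a small subset is listed here.
dhcp_opt_code() {
    case "$1" in
        netmask)     echo 1 ;;
        router)      echo 3 ;;
        dns-server)  echo 6 ;;
        domain-name) echo 15 ;;
        ntp-server)  echo 42 ;;
        *)           echo "unknown option name: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the numeric "code,value" form of a
# name/value pair, as it would appear in a dhcp_option list entry.
numeric_dhcp_option() {
    code=$(dhcp_opt_code "$1") || return 1
    echo "$code,$2"
}
```

Running `numeric_dhcp_option dns-server 192.168.1.1` prints `6,192.168.1.1`, which is the value used in the line above.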

Thanks for taking the time to read my problem and reply.

Tried that, no output at all.

I'll give that a whirl. I looked it up; is the 6 based on this table?

Having not made any changes to the configuration or the system/ROM on this router, it's strange that all of a sudden it's stopped working.

That is a good thing.

Is it possible that your adblock list is too big and crashing dnsmasq? How much RAM does the router have? And run wc -l /tmp/adb_list.overall.
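
If you want to grab both numbers in one go, something like the following would do. It is a rough sketch: the function names are made up, and it assumes the kernel exposes MemAvailable in /proc/meminfo.

```shell
# Hypothetical helper: count entries in a blocklist file
# (one entry per line).
list_entries() {
    wc -l < "$1"
}

# Hypothetical helper: print MemAvailable in kB from a meminfo-style
# file. Defaults to /proc/meminfo when no file is given.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

On the router that would be `list_entries /tmp/adb_list.overall` and `mem_available_kb`. Comparing the two before and after adblock reloads its lists should show whether memory drops sharply at that point.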

Phew!

Plenty of RAM free on this...

# wc -l /tmp/adb_list.overall
44637 /tmp/adb_list.overall
# free -m
              total        used        free      shared  buff/cache   available
Mem:         510912       79968      378804        5960       52140      387372
Swap:             0           0           0

I've given @Broseidon's suggestion a whirl and will wait to see if DNS disappears overnight. Wife got home this PM and it wasn't working; rebooted the router (easiest as she's not technically minded) and DNS was lost again after an hour (no bad thing that daughter couldn't watch Netflix for a while, though!).

If it does, it would be a good idea to check if dnsmasq is actually running and get some more data.

ps -w | grep dnsmasq
netstat -nap | grep dnsmasq
grep dnsmasq /var/log/system.log
grep -i 'cut here' /var/log/system.log
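
Since the failure seems to happen while nobody is watching, it may be worth recording whether dnsmasq is alive at regular intervals. Below is a minimal sketch of that idea. The function names and the log format are made up for illustration, and it assumes /proc/PID/comm is available. If /var/log/system.log does not exist on your build, `logread` is the usual way to read the system log.

```shell
# Hypothetical helper: succeed if any process has exactly this name.
# Walks /proc/<pid>/comm, so it does not depend on pidof or pgrep.
is_running() {
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        read -r name < "$comm" 2>/dev/null || continue
        [ "$name" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical helper: append a timestamped status line to a file.
# $1 = process name, $2 = output file
record_state() {
    if is_running "$1"; then
        state="running"
    else
        state="NOT running"
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 $state" >> "$2"
}
```

Calling `record_state dnsmasq <file>` from cron every few minutes would give a rough time of death to line up against the system log. Pick an output file that survives the reboot, since /tmp is cleared.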

On a side note, you don't need to specify this option. The default behavior is for the router to advertise itself as the DNS server. You would only want it if you needed to advertise a different DNS server to DHCP clients.

A malformed file should never produce an "I/O error"; if this is the case, it should be considered a bug in "uci", IMHO.

So far so good, although one phone had trouble obtaining an IP address.

Going to keep an eye on things, as it "feels" like the hardware is becoming flaky with strange problems like this cropping up, and I will consider getting a new router. I have two to drop in, but no OpenWrt image is available for either (Fritz!Box 7530 and a Zumax P####), so I would replace this with another Linksys.

Thanks all for your time and assistance.