DNS - Stumped Again

I have a problem I have been unable to solve. My current build was at 23 days uptime without any issues until 3 days ago.

My network/firewall/dhcp hadn't been touched for the last 7 builds. Suddenly, as of 3 days ago, I only had local DNS resolution - bye-bye WAN access.

I do get a lease from my ISP on boot:

Tue Nov 16 13:06:08 2021 daemon.notice netifd: Network device 'wan' link is up
Tue Nov 16 13:06:18 2021 daemon.notice netifd: Interface 'wan' has link connectivity
Tue Nov 16 13:06:18 2021 daemon.notice netifd: Interface 'wan' is setting up now
Tue Nov 16 13:06:21 2021 daemon.notice netifd: wan (18807): udhcpc: broadcasting discover
Tue Nov 16 13:06:22 2021 daemon.notice netifd: wan (18807): udhcpc: broadcasting select for 100.114.74.83, server 10.43.50.36
Tue Nov 16 13:06:23 2021 daemon.notice netifd: wan (18807): udhcpc: lease of 100.114.74.83 obtained from 10.43.50.36, lease time 11706
Tue Nov 16 13:06:23 2021 daemon.notice netifd: Interface 'wan' is now up

I can see Link, RX, and TX on the LEDs. I've rung out the patch cable and changed to a new one, so I don't see any hardware issue. I also went back to a September 9 build on the second partition that ran 15 days without issue and, surprise, I have the same issue!!!

Dnsmasq startup looks normal in the log, and my local subnet is running normally including local resolution:

root@RuralRoots:~# ping HP-OfficeJet-3830-AIO
PING HP-OfficeJet-3830-AIO (10.10.1.110): 56 data bytes
64 bytes from 10.10.1.110: seq=0 ttl=255 time=2.556 ms
64 bytes from 10.10.1.110: seq=1 ttl=255 time=49.064 ms
64 bytes from 10.10.1.110: seq=2 ttl=255 time=46.670 ms
64 bytes from 10.10.1.110: seq=3 ttl=255 time=1.190 ms
^C
--- HP-OfficeJet-3830-AIO ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 1.190/24.870/49.064 ms

I get this from the syslog:

Tue Nov 16 13:07:32 2021 daemon.notice dnscrypt-proxy[1626]: System DNS configuration not usable yet, exceptionally resolving [download.dnscrypt.info] using bootstrap resolvers over tcp
Tue Nov 16 13:07:54 2021 daemon.notice dnscrypt-proxy[1626]: Bootstrap resolvers didn't respond - Trying with the system resolver as a last resort

I was starting to figure it was hardware, but the fact that I get a lease from my ISP leaves me not knowing where to look next. I see absolutely nothing out of line.

Are you using your ISP's DNS, or do you override it?

What device do you have and what build are you running now?
Do you count 7 random snapshots or 7 stable builds?
What OpenWrt version is build 1?

Do you use original configs or have you made your own custom settings?

The original DHCP config file changed massively between 19.07.6 and 19.07.7.

Not to mention that the introduction of DSA in 21.02 changed almost everything in the network file.

Using it now direct to their modem, but I've used dnscrypt-proxy2 from day one.

That assumes the device

  1. Has a switch
  2. Is one of the devices that has been converted to DSA

I saved time by multitasking in the post.

Your lease time is 3.25 days... this fits quite well with the description of 3 days and 15 days with respect to the problem occurrences. You should check to see if you can ping 8.8.8.8 or any public IP address when you appear to have a DNS issue. If you cannot ping via IP, it means that your DHCP lease has likely expired and not been properly renewed (or there is something happening on the ISP side that is causing issues). If pinging by IP still works, it is plausible that there is an issue with the upstream DNS server definitions you are using (especially if you're using the DHCP advertised ones).
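
The check described above (ping by IP first, then by name) can be sketched as a small script. This is only a rough sketch, not a definitive test: the function names `classify` and `run_triage` are made up for illustration, and the default lookup name `openwrt.org` is an assumption, so swap in whatever public hostname you prefer.

```shell
# classify <ip_ping_exit_status> <dns_lookup_exit_status>
# Hypothetical helper: turns the two probe results into a hint.
classify() {
    if [ "$1" -ne 0 ]; then
        echo "no IP connectivity: suspect the WAN lease or the ISP side"
    elif [ "$2" -ne 0 ]; then
        echo "IP works but names fail: suspect the upstream DNS servers"
    else
        echo "IP and DNS both work"
    fi
}

# run_triage [hostname]
# Hypothetical wrapper: probes a public IP, then a name lookup.
# The default hostname is an assumption, not something from this thread.
run_triage() {
    ping -c 3 -W 2 8.8.8.8 >/dev/null 2>&1
    ip_status=$?
    nslookup "${1:-openwrt.org}" >/dev/null 2>&1
    dns_status=$?
    classify "$ip_status" "$dns_status"
}
```

Paste it into a shell on the router and call `run_triage` while the problem is happening. The first branch matches the "lease or ISP side" case, the second the "upstream DNS definitions" case.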

You can check the general DHCP status by issuing this command

logread | grep dhcpc
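
If you only want the lease times out of that output, something like the following might help. It is a sketch that assumes the udhcpc log lines look like the ones quoted earlier in this thread ("lease of ... obtained from ..., lease time N"); `lease_seconds` is a made-up helper name, not an existing tool.

```shell
# lease_seconds: reads syslog lines on stdin and prints the lease time
# (in seconds) from each udhcpc "lease of ... obtained" line.
# Hypothetical helper; assumes the log format shown in this thread.
lease_seconds() {
    sed -n 's/.*udhcpc: lease of .* obtained.*lease time \([0-9][0-9]*\).*/\1/p'
}
```

Usage would be `logread | lease_seconds`, which prints one number per lease or renewal, so a lease time that differs from the usual one stands out.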

WRT1900ACSV2
r17781 Kernel 5.10.74
I've never used stable.
Custom
Long passed any DSA issues.

have you tried the troubleshooting at

Are you certain about this? I could be mistaken, but there are very rarely any significant changes in service releases unless they are directly related to a bug or security vulnerability.
The 19.07.7 release notes do not mention any changes to the DHCP config file and should not have affected anything in the file except for the odhcp6c fix that was noted in the network userland and dnsmasq back port fixes.

I have a customized .toml including logging to usb, relays, load balancing, . . . - last change was April when the .toml underwent some changes.

That doesn't mean anything to me, I asked if you tried the troubleshooting steps on that page

Only issue:
==> /tmp/resolv.conf.d <==
head: /tmp/resolv.conf.d: I/O error

Edit:
root@RuralRoots:/tmp/resolv.conf.d# ls -l
-rw-r--r-- 1 root root 0 Nov 16 15:08 resolv.conf.auto

well, that's a folder ... so, that's not an issue at all

So, I guess you see nothing wrong in the results of that troubleshooting - you've confirmed that the configured resolver is actually operational

I can ping any device on my local network by both IP and hostname. I can ping nothing outside. My normal DHCP lease from my ISP modem is 14440 seconds (4 Hrs), and I can track it every half-life right down to renewal. I thought it notable that the latest renewal time differs.
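
For anyone following along, the half-life renewal mentioned above lines up with the default DHCP timers in RFC 2131: the client first tries to renew at T1 = 50% of the lease and falls back to rebinding at T2 = 87.5%. A quick sketch of that arithmetic is below. `renewal_times` is a made-up helper name, and the exact retry schedule of a given client such as udhcpc may differ from these defaults.

```shell
# renewal_times <lease_seconds>
# Hypothetical helper: prints the RFC 2131 default renew (T1) and
# rebind (T2) offsets in whole seconds (integer division rounds down).
renewal_times() {
    lease=$1
    echo "T1=$((lease / 2)) T2=$((lease * 7 / 8))"
}

renewal_times 11706   # prints: T1=5853 T2=10242
renewal_times 14440   # prints: T1=7220 T2=12635
```

The two calls use the lease from the boot log and the usual lease quoted above, so the expected renewal points can be compared against what shows up in the syslog.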

I get no return, and see nothing from find / -iname dhcpd or dhcp*

I was leaning toward that side of things, but when I bypass the router by moving the switch port from the router to the ISP modem and changing the subnet, everybody is working, so it's not from the ISP end.

Doh! I didn't divide by 24...silly mistake. Yes, it was 3.25 hours (not days... lol).
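
For the record, the conversion is just seconds divided by 3600. A tiny sketch follows (`lease_hours` is a made-up helper name; awk is used because plain shell arithmetic is integer-only):

```shell
# lease_hours <lease_seconds>
# Hypothetical helper: converts a lease time in seconds to hours,
# rounded to two decimals.
lease_hours() {
    awk -v s="$1" 'BEGIN { printf "%.2f\n", s / 3600 }'
}

lease_hours 11706   # prints: 3.25
```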

This would be consistent with an issue with the connection to the ISP not working (DHCP or otherwise).

So is the ISP modem a modem+router, or just a modem?

So you run master and complain about it not working?

Have you forgotten the dnsmasq crash?
The dhcp config file got a big chunk of config text at the top with the new dnsmasq.

I concur - BUT, my ISP gives me a lease on wan. If I move my uplink from my switch direct to my ISP modem (that is the only wire that goes to the WRT1900 lan port), we all work.

It's a Viasat modem, so no, it's a PTP device.

This started as a long 3+ Hr outage due to snow on the dish obstructing the signal. When it came back - well, here we are. What I cannot conceive is why an older working build has the same issue - bizarre.

Under what circumstances? I just compared the dhcp files from 19.07.6 and 19.07.7 and they are identical.