The problem is as follows: OpenWrt works fine for Linux, Android, iOS, and IoT clients (over both WiFi and ethernet). But when a Windows machine joins the network, name resolution stops working. Internet connectivity is still there (I can ping e.g. 1.1.1.1), but names can't be resolved. This affects both WiFi and ethernet clients. After some time (usually 5-15 minutes) everything starts working again. This happens with two machines running Windows 10...
I've checked the logs; nothing unusual shows up there. I've restarted services (dnsmasq, firewall, odhcpd, among others), but without effect. Rebooting OpenWrt helps.
My current OpenWrt version is 22.03.0, but this also happened with the two previous versions... A few days ago I did a fresh install of OpenWrt, and apart from changing the network address and setting some static IPs I haven't changed anything.
I'm on Linksys EA8300 (Dallas).
I would appreciate any tips on what could cause the problem or how to diagnose it.
So your network configuration appears to be pretty standard. There is nothing that I would suspect in the router's configuration.
So...
Do you experience the same name resolution issue with all systems and the router at the same time?
Have you tried using a public DNS server (instead of what is assigned by DHCP) in your wan configuration on OpenWrt? This could fix the issue if the actual root cause is, for example, the ISP's DNS servers.
Do you experience the same name resolution issue with all systems and the router at the same time?
Yes, when the Windows machine joins the network, name resolution stops working on all connected computers (including those connected by ethernet cable) and on the OpenWrt router itself (I can ssh to the router, but pinging openwrt.org fails because the name can't be resolved).
Have you tried using a public DNS server (instead of what is assigned by DHCP) in your wan configuration on OpenWrt? This could fix the issue if the actual root cause is, for example, the ISP's DNS servers.
Yes. In previous OpenWrt configs I used Quad9, Cloudflare, or Google DNS; for some time I also had DoH enabled on my router. The effect was always the same...
Some other observations:
This doesn't happen when we use another router instead of the OpenWrt one (e.g. an Android phone acting as a WiFi router).
I've checked whether the Windows machines start their own DNS or DHCP servers; they don't (at least as far as I can tell).
The "resolution problem" usually resolves itself after 5-15 minutes (sometimes longer, sometimes never), so something must happen on the OpenWrt side (a service restart or something similar?) or on the Windows machines...
I would appreciate any tips on how to debug this issue.
# nslookup openwrt.org; nslookup openwrt.org localhost; nslookup openwrt.org 8.8.8.8
Server: ::1
Address: [::1]:53
Non-authoritative answer:
Name: openwrt.org
Address: 139.59.209.225
Non-authoritative answer:
Name: openwrt.org
Address: 2a03:b0c0:3:d0::1af1:1
Server: localhost
Address: [::1]:53
Non-authoritative answer:
Name: openwrt.org
Address: 139.59.209.225
Non-authoritative answer:
Name: openwrt.org
Address: 2a03:b0c0:3:d0::1af1:1
;; connection timed out; no servers could be reached
## testing other NS
# nslookup openwrt.org 1.1.1.1 & nslookup openwrt.org 8.8.8.8 & nslookup openwrt.org 9.9.9.9 &
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached
Linux client:
# nslookup openwrt.org; nslookup openwrt.org 10.0.0.200; nslookup openwrt.org 8.8.8.8
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached
;; connection timed out; no servers could be reached
I wasn't able to test it on a Windows client; I will try to do so in the next few days.
Interestingly, during such NS "blackouts" I can still successfully ping these nameservers (e.g. ping 1.1.1.1).
I've also set up a "probe" on one Linux client on the network. It tests name resolution against various nameservers every minute. I've noticed that there are frequent problems with name resolution via the Quad9, Google, or Cloudflare servers, but not via localhost or the OpenWrt router...
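For anyone who wants to reproduce such a probe, here is a minimal sketch in POSIX shell. It is not the script used above. The names SERVERS, NAME, LOOKUP and probe_once are made up for illustration, and the sketch assumes that nslookup exits with a non-zero status when no server can be reached, which is worth verifying on your own system.

```shell
#!/bin/sh
# Hypothetical DNS probe: prints one line per nameserver per run.
# SERVERS, NAME and LOOKUP are illustrative defaults, not taken from the thread's script.
SERVERS="${SERVERS:-10.0.0.200 1.1.1.1 8.8.8.8 9.9.9.9}"
NAME="${NAME:-openwrt.org}"
LOOKUP="${LOOKUP:-nslookup}"

probe_once() {
    # One timestamp per run so all lines of a run can be grouped later.
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    for server in $SERVERS; do
        # Assumption: the lookup command exits non-zero on timeout or failure.
        if $LOOKUP "$NAME" "$server" >/dev/null 2>&1; then
            status=ok
        else
            status=FAIL
        fi
        echo "$ts $server $status"
    done
}
```

Calling probe_once once a minute (from cron, or from a loop with sleep 60) and appending the output to a file gives one line per nameserver per minute, which makes it easy to see which resolvers fail and when.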
If I'm reading your chart properly, your local lookups (10.0.0.200 and localhost) work all the time, but the external ones fail sporadically. To me, this looks like a possible ISP issue (despite pings going through, there could be issues with higher-level protocols like TCP/UDP).
During these DNS blackouts, are you able to browse the internet using an IP address (so openwrt.org -> 139.59.209.225)? Obviously anything not hosted at that IP would fail, but it is still worth a quick test.
nslookup openwrt.org 2620:fe::fe
nslookup openwrt.org 2001:4860:4860::8888
nslookup openwrt.org 2606:4700:4700::1111
# and other mentioned
Nothing is displayed...
A few other observations:
The blackout doesn't happen when the Windows machine first connects to another "source of the internet" (e.g. via a mobile router) and then reconnects to the OpenWrt network.
The blackout doesn't happen when the Windows machine connects to a VPN immediately after startup.
Blackout also affects the name resolution by the ISP-provided DNS servers
During the blackout, if I connect my laptop directly to the upstream ISP cable modem, everything works fine. So I don't think the ISP blocking queries or name servers is the cause here...
Then run /etc/init.d/network reload and try again afterwards. There is no need to specify public IPv6 DNS servers if you don't have public IPv6, so eliminate this as a possible cause.
ip -4 route and ip -6 route give the same output under normal conditions and during the "blackout" (run from one of the Linux clients):
$ ip -4 route
default via 10.0.0.200 dev wlp3s0 proto dhcp metric 600
10.0.0.0/24 dev wlp3s0 proto kernel scope link src 10.0.0.165 metric 600
169.254.0.0/16 dev wlp3s0 scope link metric 1000
$ ip -6 route
::1 dev lo proto kernel metric 256 pref medium
fd07:942f:62c5::1c9 dev wlp3s0 proto kernel metric 600 pref medium
fd07:942f:62c5::/64 dev wlp3s0 proto ra metric 600 pref medium
fd07:942f:62c5::/48 via fe80::3223:3ff:fe73:5a76 dev wlp3s0 proto ra metric 600 pref medium
fe80::/64 dev wlp3s0 proto kernel metric 1024 pref medium
I've also started monitoring whether the connected computers can access web pages (using a wget probe and regular addresses). In most cases they cannot.
Look at the Windows network configuration -- what is the IP address (if static) or what is the IP that is issued (if DHCP)? You should also be able to see the MAC address in the hardware properties of your network adapter.
Windows can be configured with multiple IP addresses. Make sure you check all the IP addresses configured on your Windows machine.
If it gets its address via DHCP, check that you have not given it a reserved IP.
If the DNS service goes down after you run a particular app on Windows, such as the VPN client, you should check the IP address of your Windows machine again after running the app. It might modify the IP address of your Windows machine.
You might have a misbehaving app or even malware on your Windows machine that is carrying out a denial-of-service attack.
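One way to look for an address conflict, sketched here under assumptions: from a Linux client, check which MAC address answers for the router's IP (10.0.0.200 in this thread), once under normal conditions and once during a blackout. If the MAC differs, another device is claiming the router's address. The helper name mac_for_ip is hypothetical; it only filters the output of ip neigh.

```shell
#!/bin/sh
# Hypothetical helper: reads `ip neigh` output on stdin and prints the
# link-layer (MAC) address recorded for the IP address given as $1.
mac_for_ip() {
    awk -v ip="$1" '
        $1 == ip {
            # The MAC is the field that follows the keyword "lladdr".
            for (i = 2; i < NF; i++)
                if ($i == "lladdr") print $(i + 1)
        }'
}

# Example (run on a Linux client, normally and again during a blackout):
#   ip -4 neigh show | mac_for_ip 10.0.0.200
```

The printed address can then be compared with the MAC of the router's LAN interface.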
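If a client really is flooding the router with connections, one place this may show up is the router's connection-tracking table. The sketch below assumes the standard Linux netfilter counters under /proc/sys/net/netfilter are present on the router (they exist only when connection tracking is loaded). conntrack_usage is a hypothetical helper name, and the optional directory argument is there only so the function can be tried against sample files.

```shell
#!/bin/sh
# Hypothetical helper: prints how full the connection-tracking table is.
# $1 (optional) overrides the directory holding the counters.
conntrack_usage() {
    dir="${1:-/proc/sys/net/netfilter}"
    # Current number of tracked connections and the configured limit.
    count=$(cat "$dir/nf_conntrack_count") || return 1
    max=$(cat "$dir/nf_conntrack_max") || return 1
    echo "conntrack: $count/$max ($((count * 100 / max))%)"
}

# Example (run on the router over ssh, normally and during a blackout):
#   conntrack_usage
```

A count close to the maximum during a blackout would point toward connection exhaustion; a low count makes that explanation less likely.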