I'm writing this to share my experience with, and my solution to, a very weird problem, in the hope that it might help others in the future. I searched the Internet and this forum and found no mention of this problem or a solution, so I'm sharing my findings below. If anyone has a better way to solve this issue, please feel free to share!
1. The Context
I installed AdGuardHome on a NanoPi R4S (though I believe this applies to other devices as well) and everything worked as expected. I rebooted OpenWrt many times, AdGuardHome came up every time, and life was good.
2. The Problem
However, after a power cycle (disconnecting and reconnecting power), AdGuardHome stopped working, and the only fix was to remove and reinstall it. It then worked until the next power cycle. This meant that any transient power outage could cost me internet connectivity, because AdGuardHome stayed offline after the power came back.
The AdGuardHome logs gave no clear indication of any problem; AdGuardHome simply stopped responding to DNS queries.
It took me many hours of trial-and-error experiments to identify the problem. Once I found it, the solution was easy, and I summarize it below.
3. The Root Cause
I have AdGuardHome configured to use DoH (DNS over HTTPS) upstream to NextDNS (https://dns.nextdns.io/xxxxxx). AdGuardHome has a bootstrap DNS setting that resolves the DoH hostname to an IP address for the initial connection, so resolving that name was not the issue. The problem was that the HTTPS connection to the DoH server was failing because the system time was wrong (I believe certificate validation is what fails). It took me a while to find this out, since the log showed no clear error about it. Also, a normal reboot of OpenWrt preserves the correct date and time, so only a power cycle triggered the problem, and I needed a "eureka" moment to understand it.
Once I realized that the wrong system date/time was the root cause, the rest made sense. By default, OpenWrt is configured to update the system date/time from NTP servers (e.g. 0.openwrt.pool.ntp.org). However, since AdGuardHome does not come up, the NTP client cannot resolve the NTP server names to update the date/time (the router itself is also configured to use AdGuardHome as its primary DNS). And with the wrong date/time, AdGuardHome cannot connect to the DoH server (I believe because HTTPS certificate validation fails). A classic chicken-and-egg problem.
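Assuming, as suspected above, that certificate validation is what fails: a TLS client only accepts a server certificate if the current time lies between the certificate's notBefore and notAfter dates. The POSIX shell sketch below illustrates that check with purely hypothetical dates; `cert_time_status` is an illustrative name, not something provided by AdGuardHome or OpenWrt. On the router, `date +%s` gives the current time in the same epoch-seconds format, and `openssl x509 -noout -dates` shows a real certificate's dates if the OpenSSL command-line tool is installed.

```shell
#!/bin/sh
# Hypothetical helper: where does the current time fall relative to a
# certificate's validity window? All three arguments are Unix epoch seconds.
cert_time_status() {
  # $1 = notBefore, $2 = notAfter, $3 = current system time
  if [ "$3" -lt "$1" ]; then
    echo "not yet valid"   # clock is behind: the cert seems to come from the future
  elif [ "$3" -gt "$2" ]; then
    echo "expired"         # clock is past the cert's end date
  else
    echo "valid"
  fi
}

# Hypothetical certificate valid from 2023-01-01 to 2024-01-01 (UTC).
cert_time_status 1672531200 1704067200 1688169600   # clock correct (2023-07-01) -> valid
cert_time_status 1672531200 1704067200 0            # clock at 1970-01-01 -> not yet valid
```

The 1970 value is just an extreme example of a clock that came up in the past; any time before notBefore makes an otherwise valid certificate fail the check in the same way.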
4. The Solution
After many hours down this rabbit hole, the solution (really a workaround) turned out to be very simple: I replaced the default NTP servers in OpenWrt with IP addresses, as follows:
- 0.openwrt.pool.ntp.org -> 198.199.14.69
- 1.openwrt.pool.ntp.org -> 54.36.152.158
- 2.openwrt.pool.ntp.org -> 194.0.5.123
- 3.openwrt.pool.ntp.org -> 200.189.40.8
This breaks the chicken-and-egg cycle (DoH needs the correct date and time, and NTP needs DNS to set it). With this simple change, AdGuardHome now starts fine even after a power cycle. The change can be made either via LuCI or via the CLI; for convenience, here are the UCI commands that replace the default NTP servers with their respective IP addresses:
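```shell
# Remove the default pool hostnames from the NTP server list
uci -q delete system.ntp.server
# Add fixed IP addresses so the NTP client does not need DNS
uci add_list system.ntp.server="198.199.14.69"
uci add_list system.ntp.server="54.36.152.158"
uci add_list system.ntp.server="194.0.5.123"
uci add_list system.ntp.server="200.189.40.8"
uci commit system
# Restart the NTP client so it picks up the new server list
/etc/init.d/sysntpd restart
```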
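To double-check that none of the configured NTP servers still depends on DNS, a small POSIX shell sketch like the one below could be used. `is_ipv4` and `ntp_servers_need_dns` are hypothetical helper names; only IPv4 literals are recognized, so an IPv6 address would also be reported.

```shell
#!/bin/sh
# Hypothetical helpers: report configured NTP servers that are not IPv4
# literals, i.e. entries that still need DNS before NTP can sync.

# is_ipv4 STRING: succeeds if STRING is a dotted-quad IPv4 address.
# The body runs in a subshell, so the IFS and noglob changes do not leak.
is_ipv4() (
  case $1 in
    .*|*.|*..*) exit 1 ;;     # reject empty octets at the start, end or middle
  esac
  set -f                      # no pathname expansion while splitting
  IFS=.
  set -- $1                   # split the address into octets
  [ "$#" -eq 4 ] || exit 1
  for octet in "$@"; do
    case $octet in
      ''|*[!0-9]*) exit 1 ;;  # each octet must be non-empty and all digits
    esac
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || exit 1
  done
)

# ntp_servers_need_dns SERVER...: prints every argument that is not an IPv4 literal.
ntp_servers_need_dns() {
  for server in "$@"; do
    is_ipv4 "$server" || printf '%s\n' "$server"
  done
}
```

After sourcing these functions on the router, `ntp_servers_need_dns $(uci -q get system.ntp.server)` (where `uci get` prints the list values separated by spaces) should print nothing once every entry is an IP address; with the default configuration it would list the four `*.openwrt.pool.ntp.org` hostnames.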
Another possible solution would be to point OpenWrt's own (local) DNS at a different server, so that OpenWrt could resolve the NTP server hostnames regardless of whether AdGuardHome is working. But I want OpenWrt to use AdGuardHome, both to resolve local hostnames on my home network and to be protected by the AdGuardHome filters.