Edit
I first mistakenly posted this to the dd-wrt forum, because honestly until recently I hadn't noticed that they were different projects. That shows how much I know about them. Anyway, I referenced dd-wrt because I didn't clean up the copy/paste; since it's already been noted in a response, I'm going to do strike-through corrections.
So, this is my hail-mary. I understand that this is a difficult configuration to answer questions about; regardless, I'm hoping someone will say: "oh, that problem" and have an answer.
BLUF: 2-3 times a day, LAN host name resolution will stop working. I have to log into the server and run a command, after which everything works for a while. This always happens overnight; it usually happens at some point during the day, as well.
I have a GL-iNet AX1800 running in router mode; it's running whatever customized OpenWRT GL-iNet puts on their routers -- I have not reflashed this. The router claims it's running OpenWrt 21.02-SNAPSHOT r16399+173-c67509efd7. I have it configured to connect to Mullvad via Wireguard; I'm excluding a small group of IPs from the VPN for other work-related VPN connections. Although I know almost nothing about ~~dd-wrt~~ OpenWRT, I've poked around in the shell to add some LAN cnames.
For a long while, I was struggling to get named LAN hosts resolved by the router; it was inconsistent at best: sometimes `dig sting.lan` would work, other times not, and some hosts would just never resolve. During all of this, I think I installed the `dnsmasq-full` package -- replacing the default `dnsmasq` -- in any case, `dnsmasq-full` is what's currently installed. The end result is that I got LAN hosts to reliably resolve. Some time (a few weeks? and maybe a firmware update) passed.
With that explained, to my issue: LAN host resolution now sporadically, but reliably, stops working. It stops working overnight, every night, and then 1-3 times during the day. I've tracked it down enough to know a minimum command to run to fix it, but I don't know why it works. I'm also concerned by the number of dnsmasq instances that are running. WAN resolution never stops working.
When LAN resolution starts failing, my work-around is to ssh into the server and run `/etc/init.d/vpnpolicy-apply restart`. This may be a GL-iNet script, but it's only a few lines long. What it does is:
- Sets `$mode` to `uci -q get vpnpolicy.route_policy.proxy_mode`
- Based on the value of `$mode`, runs either `/usr/bin/vpn_domain_update.sh`, or `/usr/bin/route_policy $mode`, or both.
- In my case, I know that `vpnpolicy.route_policy.proxy_mode` is "3", because this is the only value that causes both scripts to be run, and I know from tracing that both are being executed.
One other thing I've noticed is that I have four (4) `dnsmasq` instances running at once, which seems suspicious: two pairs, each with identical arguments:
5668 root 2704 S /usr/sbin/dnsmasq -C /etc/dnsmasq.conf.vpn -x /var/run/dnsmasq/dnsmasq.vpn.pid --server=193.138.219.228 --no-resolv
5669 root 2676 S /usr/sbin/dnsmasq -C /etc/dnsmasq.conf.vpn -x /var/run/dnsmasq/dnsmasq.vpn.pid --server=193.138.219.228 --no-resolv
6002 dnsmasq 2724 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
6007 root 2692 S /usr/sbin/dnsmasq -C /var/etc/dnsmasq.conf.cfg01411c -k -x /var/run/dnsmasq/dnsmasq.cfg01411c.pid
The configurations for each are different, although both have the same `dns-leasefile` value, and that file does indeed contain the LAN leases and host names. And -- regardless -- even with all 4 processes running, it works... until it doesn't. I know for a fact that the non-vpn config versions are being run by the `/usr/bin/route_policy` script; I don't know what's starting the vpn-config versions, although I suspect `/usr/bin/vpn_domain_update.sh`. I do know that only the vpn process(es) are necessary for all domain resolution to work.
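One way to check whether each identical-looking pair is a parent process plus a child it forked, rather than two independently started instances, is to print every dnsmasq PID next to its parent PID. A minimal sketch, assuming a Linux `/proc` filesystem and that `pidof`, `awk`, and `tr` are available on the router (not verified on GL-iNet firmware); `proc_summary` is a helper name made up for illustration:

```shell
#!/bin/sh
# Print "<pid> parent=<ppid> <command line>" for one process.
proc_summary() {
    pid="$1"
    # The PPid field of /proc/<pid>/status holds the parent process ID.
    ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
    # cmdline is NUL-separated; turn the separators into spaces.
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    echo "$pid parent=$ppid $cmd"
}

# Summarize every running dnsmasq process (prints nothing if there are none).
for pid in $(pidof dnsmasq 2>/dev/null); do
    proc_summary "$pid"
done
```

If, for example, 5669 reported `parent=5668`, that pair would be one instance and its child rather than a duplicate start.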
The issue isn't transient. It works until it doesn't, and then continues not working until I run the `service vpnpolicy-apply restart` command.
Things I've tried/considered:
- I've tried uninstalling `dnsmasq-full`, but that just breaks all client internal and external DNS resolution. I haven't tried uninstalling `dnsmasq-full` and installing `dnsmasq`; I don't get the feeling that the problem is in the `dnsmasq-full` package, and I do have a vague feeling that it was installing `-full` that caused LAN host resolution to start working.
- I've tried stopping and disabling the dnsmasq service. Indeed, it kills the non-vpn-config pair, and both LAN and WAN DNS resolution continues to work without them. However, it doesn't prevent the issue occurring, and it just gets started back up by `/usr/bin/route_policy` when I run `service vpnpolicy-apply restart`.
- I've renamed `/etc/init.d/dnsmasq`. This causes `/usr/bin/route_policy` to complain, does prevent the second set of dnsmasq instances from running, and it leaves LAN/WAN DNS resolution in a working state -- but it's obviously not a long-term solution nor does it tell me what I'm doing wrong.
- I've considered just running the damned `service vpnpolicy-apply restart` command every hour via a cron job, but that's such a horrible OPS-ey solution, I'd really rather figure out what I've got wrong than do that.
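A less blunt variant of the cron idea would run the restart only when a LAN lookup actually fails, and log the time so the failures can be correlated with whatever else is happening on the router. A rough sketch, assuming `nslookup`, `logger`, and `pidof` exist on the router and that dnsmasq answers on 127.0.0.1; the `lan-dns-watch` tag, the variable names, and the function names are all hypothetical:

```shell
#!/bin/sh
# Hypothetical watchdog: restart the policy service only when a LAN name
# stops resolving, and log when that happens.
PROBE_HOST="${PROBE_HOST:-sting.lan}"    # a host that should always resolve
RESOLVER="${RESOLVER:-127.0.0.1}"        # assumed address of the local resolver
FIX_CMD="${FIX_CMD:-/etc/init.d/vpnpolicy-apply restart}"

# Returns 0 when the probe host resolves, non-zero otherwise.
lan_dns_ok() {
    nslookup "$PROBE_HOST" "$RESOLVER" >/dev/null 2>&1
}

check_once() {
    if lan_dns_ok; then
        return 0
    fi
    # Record the failure and the dnsmasq PIDs alive at that moment.
    logger -t lan-dns-watch "lookup of $PROBE_HOST failed; running fix"
    logger -t lan-dns-watch "dnsmasq PIDs: $(pidof dnsmasq 2>/dev/null)"
    # Left unquoted on purpose so the command and its argument are split.
    $FIX_CMD
}

# Run the check only when invoked with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    check_once
fi
```

Called with the `run` argument from a cron entry every few minutes, this is still a band-aid, but the logged timestamps would at least show when resolution actually breaks.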
I know that GL-iNet isn't "pure" ~~DD-WRT~~ OpenWRT, and that it's a long shot; does anyone see anything in what I've posted that looks obviously misconfigured, or have any suggestions for what I could try to get DNS LAN resolution consistently working?
Thanks,