I am using dnsmasq-full in order to use DNSSEC. Dnsmasq points to a local Stubby instance on port 5453 with the default configuration. Both "DNSSEC" and "DNSSEC check unsigned" are enabled in dnsmasq. This setup had been working just fine for many weeks. However, since today dnsmasq keeps crashing and won't come up. A reboot does not fix the issue:
[ 38.416666] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[ 38.434564] epc = 00411a2b in dnsmasq[400000+31000]
[ 38.444661] ra = 00411a11 in dnsmasq[400000+31000]
I disabled the DNSSEC options, and removed the pointer to Stubby and dnsmasq is again working fine. Does anyone have any idea what is wrong? Device I am using is the Asus AC57U.
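For anyone comparing configurations, the setup that crashes corresponds roughly to the following `/etc/config/dhcp` fragment. This is a sketch rather than a copy of my file: the loopback address `127.0.0.1` and the `noresolv` line are assumptions, and only port 5453 and the two DNSSEC options are taken from the description above.

```
config dnsmasq
	# Forward queries to the local Stubby instance (address assumed, port as described)
	list server '127.0.0.1#5453'
	# Assumption: do not use any other upstream resolvers
	option noresolv '1'
	# The two options that were enabled when the crashes started
	option dnssec '1'
	option dnsseccheckunsigned '1'
```

Removing the `server` entry and the two DNSSEC options corresponds to the workaround mentioned above.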
I have the same problem, also using DNSSEC, dnsmasq-full, and a Stubby resolver. Exact same memory pointers and also started crashing today. I have virtually no idea what's happening. Dnsmasq stuck in a crash loop.
I'm running OpenWrt 19.07.1 on Mi Wifi Mini.
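For what it's worth, the log lines above can be read like this: `epc` is the program counter at the time of the fault, `ra` is the return address, and `dnsmasq[400000+31000]` gives the start and size of the mapping that contains them. A read from `00000000` points to a NULL pointer dereference. The small shell sketch below only does the arithmetic to show where inside the mapping the fault lies; whether that offset can be mapped to a source line depends on having a matching unstripped build, which I am assuming none of us has at hand.

```shell
# Values copied from the kernel log in the first post.
epc=0x00411a2b    # faulting instruction address
base=0x400000     # start of the dnsmasq mapping
size=0x31000      # size of that mapping

# Offset of the faulting instruction from the start of the mapping.
offset=$(printf '0x%x' $((epc - base)))
echo "fault offset: $offset"

# Sanity check: the faulting address lies inside the dnsmasq mapping.
if [ $((epc - base)) -lt $((size)) ]; then
    echo "epc is inside dnsmasq's own code"
fi
```

Identical `epc` and `ra` values on two routers suggest both are hitting the same code path in the same build.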
I guess Cloudflare might crash dnsmasq, as Stubby uses 126.96.36.199 by default and there were some problems with DNSSEC when using 188.8.131.52.
I managed to stop the crashes by moving DNSSEC validation from Dnsmasq to Stubby and using Dnsmasq only to proxy DNSSEC responses (I previously had Dnsmasq validating these responses, ignoring Stubby). I obviously broke something, as I can't access any website from my browser now, but dig runs just fine and reports that my DNSSEC validation is working (although inconsistently: sometimes I get the DNSSEC flag, sometimes not).
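To put a number on the "sometimes I get the flag, sometimes not" part, a small helper like the one below could be used. It is only a sketch: `has_ad_flag` is a name made up for this post, and it assumes the flag in question is the `ad` (authenticated data) bit in the `;; flags:` header line that dig prints. The address `127.0.0.1` and the domain `example.org` in the commented usage are placeholders as well.

```shell
# has_ad_flag: hypothetical helper. Reads dig output on stdin and succeeds
# only if the ";; flags:" header line contains the "ad" flag.
has_ad_flag() {
    grep -Eq '^;; flags:[^;]* ad[ ;]'
}

# Example usage (not run here): ask the local resolver five times and
# report for each answer whether the ad flag was set.
# for i in 1 2 3 4 5; do
#     dig +dnssec example.org @127.0.0.1 | has_ad_flag \
#         && echo "ad set" || echo "ad missing"
# done
```

If the flag comes and goes for the same signed name, that would at least confirm the inconsistency is in the resolver path and not in the browser.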
Update: I switched from Cloudflare to Quad9 and my setup started working again.
Isn't it worrisome that apparently something Cloudflare is doing is triggering segmentation faults in dnsmasq? That might hint at a bug in dnsmasq's DNSSEC support which might or might not be exploitable.
It might make sense to capture the traffic between Stubby and dnsmasq now, in order to have a reliable reproducer once Cloudflare has fixed whatever is broken.
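A minimal sketch of such a capture, assuming tcpdump is installed on the router and Stubby listens on the loopback interface. The output path is just an example, and the script only prints the command so it can be reviewed before running it as root.

```shell
port=5453                       # Stubby's port, from the first post
out=/tmp/dnsmasq-stubby.pcap    # example output path
capture_cmd="tcpdump -i lo -n -w $out port $port"

# "port" without a protocol keyword matches both UDP and TCP, which matters
# here because large DNSSEC answers may be retried over TCP.
echo "$capture_cmd"

# Run as root, reproduce the crash, then stop the capture with Ctrl-C:
#   $capture_cmd
```

The resulting pcap file can then be attached to a bug report or replayed against a test build.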
A little perspective, gentlemen. What is in OpenWrt is actually not vanilla dnsmasq v2.80; it has a number of patches backported from upstream. It is very likely that one or more of those backports opened the hole that Cloudflare managed to fall into. Indeed, there are further patches in dnsmasq master that fix crashes related to DNSSEC validation. Those fixes have not been backported.
Jan 6th 2020 was the last (single) patch backported. Prior to that, the backports were last updated on Jan 18th 2019 (!) with 32 patches. I attempted to bring those patches up to date on Mar 9 2019 (now 57 patches), but an issue arose, so that got reverted as I didn't have time to investigate.
There is clearly risk associated with backporting patches. There is also risk associated with NOT backporting patches.
Dnsmasq has had a bit of churn in its TCP handling, mainly in that it now caches TCP responses (it used to throw that information away). Because DNSSEC responses tend to be large, exceeding what fits in a UDP DNS packet, TCP is likely to be more involved, and as DNSSEC becomes more popular/required, it makes sense to cache those responses.
My mistake was to start backporting any of the patches from what is to become 2.81. I am hopeful that 2.81 final comes out soon!
I don't think you are right, though. As one of the earlier links pointed out, Pi-holes were also crashing, and they use FTL, a fork of upstream dnsmasq. They are not using OpenWrt's version of dnsmasq, hence this issue does not seem to be OpenWrt specific.