Dnsmasq-full crashing

I am using dnsmasq-full in order to use DNSSEC. Dnsmasq points to a local Stubby instance on port 5453 with the default configuration. Both DNSSEC and DNSSEC check unsigned are enabled in dnsmasq. This setup had been working fine for many weeks. However, since today dnsmasq keeps crashing and won't come up. A reboot does not fix the issue:

[ 38.416666] do_page_fault(): sending SIGSEGV to dnsmasq for invalid read access from 00000000
[ 38.434564] epc = 00411a2b in dnsmasq[400000+31000]
[ 38.444661] ra = 00411a11 in dnsmasq[400000+31000]

I disabled the DNSSEC options and removed the pointer to Stubby, and dnsmasq works fine again. Does anyone have any idea what is wrong? The device I am using is an Asus AC57U.

Hi,

I have the same problem, also using DNSSEC, dnsmasq-full, and a Stubby resolver. Exact same memory addresses, and it also started crashing today. I have virtually no idea what's happening; dnsmasq is stuck in a crash loop.

I'm running OpenWrt 19.07.1 on Mi Wifi Mini.

My guess is that Cloudflare is crashing dnsmasq, as Stubby uses 1.1.1.1 by default and there were some problems with DNSSEC when using 1.1.1.1.

Using Cloudflare as well. But I have the same setup at a different location, and that one is working fine.

Edit: Regardless of any issues at Cloudflare, dnsmasq should of course handle that gracefully instead of crashing hard.

I managed to stop the crashes by moving DNSSEC validation from dnsmasq to Stubby and using dnsmasq only to proxy DNSSEC responses (previously I had dnsmasq validating these responses, ignoring Stubby). I obviously broke something, as I now can't access any website from my browser, but dig runs just fine and reports that DNSSEC validation is working (although inconsistently: sometimes I get the DNSSEC flag, sometimes not).
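The "DNSSEC flag" that dig shows only some of the time is presumably the AD (Authenticated Data) bit in the response header, which a validating resolver sets when the answer passed DNSSEC validation. A minimal Python sketch for checking it in a raw DNS response (for example one saved from a capture); the function name `header_flags` is hypothetical, and the bit masks are the standard DNS header layout (RFC 1035, with AD/CD from RFC 4035).

```python
import struct

# Bit masks for the 16-bit flags field (bytes 2-3 of the DNS header).
FLAG_BITS = {
    "qr": 0x8000,  # this message is a response
    "aa": 0x0400,  # authoritative answer
    "tc": 0x0200,  # truncated
    "rd": 0x0100,  # recursion desired
    "ra": 0x0080,  # recursion available
    "ad": 0x0020,  # authenticated data (answer was DNSSEC-validated)
    "cd": 0x0010,  # checking disabled (client asked for no validation)
}


def header_flags(message: bytes) -> dict:
    """Decode the ID, flag bits and RCODE from the 12-byte header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    msg_id, flags = struct.unpack("!HH", message[:4])
    decoded = {name: bool(flags & mask) for name, mask in FLAG_BITS.items()}
    decoded["id"] = msg_id
    decoded["rcode"] = flags & 0x000F  # 0 = NOERROR, 2 = SERVFAIL, ...
    return decoded
```

For example, a header whose flags field is `0x81a0` decodes to `qr`, `rd`, `ra` and `ad` set with RCODE 0. Comparing the AD bit across several captured responses would show whether the upstream is validating consistently.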

Update: I switched from Cloudflare to Quad9 and my setup started working again.

Cloudflare issue

Same dnsmasq + Stubby setup here, but using Google's DNS servers and validating in dnsmasq. So far, it works.

Same setup here as well, but using Quad9; no issues.

It seems to be a Cloudflare + DNSSEC + dnsmasq specific issue. I will patiently await a fix from Cloudflare.

Isn't it worrisome that something Cloudflare is doing apparently triggers segmentation faults in dnsmasq? That might hint at a bug in dnsmasq's DNSSEC support, which might or might not be exploitable.

It might make sense to capture the traffic between Stubby and dnsmasq in order to have a reliable reproducer once Cloudflare has fixed whatever is broken.
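One way to get such a reproducer is a small logging relay between dnsmasq and Stubby that saves every query and upstream response byte-for-byte. A minimal single-threaded Python sketch, assuming UDP only (queries dnsmasq sends over TCP would bypass it) and a machine with Python available; the `relay` function, its parameters and the file naming are hypothetical. Running tcpdump on the loopback interface and saving a pcap would achieve much the same without custom code.

```python
import socket
from pathlib import Path
from typing import Optional, Tuple


def relay(listen_sock: socket.socket, upstream: Tuple[str, int], capture_dir: str,
          max_queries: Optional[int] = None, timeout: float = 5.0) -> None:
    """Forward UDP DNS queries to `upstream`, saving each query/response pair to disk.

    Handles one query at a time: a capture aid for building a reproducer,
    not a production proxy.
    """
    out = Path(capture_dir)
    out.mkdir(parents=True, exist_ok=True)
    handled = 0
    while max_queries is None or handled < max_queries:
        query, client = listen_sock.recvfrom(65535)
        # Use a fresh socket per query so each response maps to exactly one query.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as up:
            up.settimeout(timeout)
            up.sendto(query, upstream)
            try:
                response, _ = up.recvfrom(65535)
            except socket.timeout:
                response = None  # upstream did not answer; nothing to forward
        (out / f"{handled:06d}-query.bin").write_bytes(query)
        if response is not None:
            (out / f"{handled:06d}-response.bin").write_bytes(response)
            listen_sock.sendto(response, client)
        handled += 1
```

Usage would be to bind a UDP socket on an unused loopback port (say 127.0.0.1:5454, an arbitrary choice), call `relay(sock, ("127.0.0.1", 5453), "dns-capture")`, and change dnsmasq's `server=` entry from `127.0.0.1#5453` to `127.0.0.1#5454`. The saved `.bin` responses could later be replayed against a patched dnsmasq.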

Having run into the same problem yesterday, I tried updating dnsmasq to 2.81rc1, which at least fixed the crashes. So it seems this is already fixed upstream. Maybe by this commit?
http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=04db1483d1a86823240d986e10cfdbf75e1b9954

It definitely is. As @Mushoz said, dnsmasq should handle this gracefully instead of crashing altogether.

A little perspective, gentlemen. What is in OpenWrt is actually not vanilla dnsmasq v2.80. It has a number of patches backported from upstream. It is very likely that one or more of those backports opened the hole that Cloudflare managed to trip over. Indeed, there are further patches in dnsmasq master that fix crashes related to DNSSEC/validation. Those fix patches have not been backported.

The last backport was a single patch on Jan 6th 2020. Before that, the backports were last updated on Jan 18th 2019 (!), with 32 patches. I attempted to bring those patches up to date on Mar 9 2019 (57 patches at that point), but an issue arose, so that was reverted because I didn't have time to investigate.

So if we assume this problem has been around since Jan 18th 2019, it took until 3rd March 2020 for Cloudflare to misconfigure something that exposed the problem. I suspect the fix was committed upstream on Oct 11th 2019 (http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=04db1483d1a86823240d986e10cfdbf75e1b9954)

There is clearly risk associated with backporting patches. There is also risk associated with NOT backporting patches.

Dnsmasq has had a bit of churn in its TCP handling, mainly in now caching TCP responses (it used to throw that information away). Because DNSSEC responses tend to be large, exceeding UDP DNS size limits, TCP is more likely to be involved, and as DNSSEC becomes more popular/required, it makes sense to cache those responses.
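For context on why TCP comes into play: a plain DNS-over-UDP message is limited to 512 bytes unless the client advertises a larger buffer with EDNS0. When an answer does not fit, the server sets the TC (truncated) bit and the client is expected to retry over TCP, where each message is preceded by a two-byte length field (RFC 1035, section 4.2.2). A small Python sketch of those generic protocol mechanics, not dnsmasq's own code; the function names are hypothetical.

```python
import struct

TC_BIT = 0x0200  # "truncated" flag in the 16-bit header flags field


def is_truncated(message: bytes) -> bool:
    """True if a DNS response has the TC bit set, i.e. the client should retry over TCP."""
    if len(message) < 4:
        raise ValueError("message too short to contain DNS header flags")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte big-endian length, as DNS over TCP requires."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


def read_tcp_messages(stream: bytes) -> list:
    """Split a captured DNS-over-TCP byte stream back into individual messages."""
    messages, offset = [], 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack("!H", stream[offset:offset + 2])
        end = offset + 2 + length
        if end > len(stream):
            break  # incomplete trailing message; stop rather than guess
        messages.append(stream[offset + 2:end])
        offset = end
    return messages
```

The length prefix is why a TCP response can be handled as a whole message regardless of size, which is what makes caching large DNSSEC answers received over TCP practical.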

My mistake was to start backporting any of the patches from what is to become 2.81. I am hopeful that 2.81 final comes out soon!

I don't think you are right, though. As one of the earlier links pointed out, Pi-holes were also crashing, and they use FTL, a fork of upstream dnsmasq. They are not using OpenWrt's version of dnsmasq, so this issue does not seem to be OpenWrt-specific.

The issue was probably fixed by Cloudflare; worth checking.

Whatever response Cloudflare was sending, dnsmasq should never crash. Perhaps Cloudflare has stopped doing whatever they were doing that triggered the bug. But this is a bug in dnsmasq.

Agreed. However, I haven't captured the responses from Cloudflare that triggered the crash, so it's going to be very difficult to pinpoint the bug.

Have to say, no issues seen here. Here's mine, and I have CF in there: