Dnsmasq: operation not permitted

trendy · November 7, 2022, 10:18am

I have noticed recently some issues in resolving. Internet works fine, that means pinging to some IP address works. However I noticed that dnsmasq is not responding to queries, the logs are pilling up with

Mon Nov  7 07:41:29 2022 daemon.err dnsmasq[5359]: failed to send packet: Operation not permitted

and the Recv-Q and Send-Q in netstat are increasing a lot.
Following the discussion from this post by @ldir, I am posting the strace results, in case it can provide any help understanding the problem.

root@barracuda:[~]#strace -Z --trace=sendmsg -p 30892
strace: Process 30892 attached
sendmsg(11, {msg_name={sa_family=AF_INET, sin_port=htons(48417), sin_addr=inet_addr("10.0.2.1")}, msg_namelen=16, msg_iov=[{iov_base="q&\201\203\0\1\0\0\0\1\0\0\1c\1f\19\1a\0014\0013\0015\0013\0010\0015"..., iov_len=172}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EPERM (Operation not permitted)
sendmsg(11, {msg_name={sa_family=AF_INET, sin_port=htons(48417), sin_addr=inet_addr("10.0.2.1")}, msg_namelen=16, msg_iov=[{iov_base="\302\354\205\200\0\1\0\1\0\0\0\0\00250\0012\0010\00210\7in-addr\4a"..., iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = -1 EPERM (Operation not permitted)
^Cstrace: Process 30892 detached

root@barracuda:[~]#readlink /proc/30892/fd/11
socket:[196094017]
root@barracuda:[~]#grep 196094017 /proc/30892/net/*
/proc/30892/net/udp: 1935: 0102000A:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 196094017 2 0000000080e95bd3 0
root@barracuda:[~]#ubus call system board
{
        "kernel": "5.4.188",
        "hostname": "barracuda",
        "system": "Intel(R) Celeron(R) CPU N3350 @ 1.10GHz",
        "model": "Barracuda Networks, Inc Barracuda NG Firewall F12",
        "board_name": "barracuda-networks-inc-barracuda-ng-firewall-f12",
        "release": {
                "distribution": "OpenWrt",
                "version": "21.02.3",
                "revision": "r16554-1d4dea6d4f",
                "target": "x86/64",
                "description": "OpenWrt 21.02.3 r16554-1d4dea6d4f"
        }
}
dnsmasq-full - 2.85-8

config dnsmasq 'lan_dns'
        option domainneeded '1'
        option localise_queries '1'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases.lan'
        option domain 'mrv'
        option local '/mrv/'
        option tftp_root '/tmp/'
        option dhcp_boot 'vmlinux'
        option enable_tftp '1'
        option queryport '5354'
        option filterwin2k '1'
        option sequential_ip '1'
        option rebind_protection '0'
        option confdir '/tmp/dnsmasq.d'
        option localservice '0'
        option localuse '1'
        list interface 'lan'
        list interface 'elvetias'
        list interface 'roadwarrior'
        list server '127.0.0.1'
        list server '::1'
        list server '78.47.140.199'
        list server '127.0.0.1#5453'
        list server '::1#5453'
        list server '/elv/10.0.1.1'
        option dnsseccheckunsigned '0'
        list notinterface 'loopback'
        option noresolv '1'
        option allservers '0'
        option quietdhcp '1'

After a restart it starts working again.

ulmwind · November 7, 2022, 11:38am

It has been updated recently, check again.

anon54515283 · November 8, 2022, 8:48pm

I had this message on a device a few month ago in a log too. But I cannot remember exactly what I did to get rid of it resepectivley I was tinkering with a fitting setup. For this setup I've removed some options and I'm under the impression that deleting the option filterwin2k explicitly fixed it. Even if default is "0" anyway. It is highly likely that I error about it completely and it is a more complex bug. So don't be angry if I write nonsense here.

In case its another options or this specific setup is not influenced by this bug. For my setup script I'm squashing out:

del dhcp.@dnsmasq[0].nonwildcard
del dhcp.@dnsmasq[0].resolvfile
del dhcp.@dnsmasq[0].boguspriv
del dhcp.@dnsmasq[0].filterwin2k
del dhcp.@dnsmasq[0].nonegcache

End product looks:

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option localservice '1'
        option ednspacket_max '1232'
        list server '127.0.0.1#5453'
        list server '0::1#5453'
        option noresolv '1'
        option localuse '1'

I don't have this message above. Maybe it helps to compare settings.

trendy · November 8, 2022, 11:20pm

Thank you, I removed filterwin2k as this was the only one I had from your list and will monitor

trendy · November 9, 2022, 2:03pm

The message is still there in the logs, but for the time being dnsmasq is not having the problematic behaviour as before to become unresponsive. I'll leave it for a bit more in case there is some other idea, otherwise I'll mark your answer as the solution

anon54515283 · November 9, 2022, 6:02pm

Didn't expect that it has an impact. But I'm glad it was not a complete fail. I was a bit scared to post this because it is so unrelated (imo).

Sadly I don't know what I did in addition to that. I just remembering one point. I had an older setup script which I've used on the 21.-branch and it influenced dnsmasq negativ on the 22.-branch. So I decided to see whats in config/dhcp after a boot with a plain 22.03.2 image.

What I do not see in your configuration is the line: option ednspacket_max '1232'

I would never have added this line by myself. So maybe its activated by default and has to be present due to a bug? So it might be that your upstream resolver is using the larger packages (e. g. BIND is using it by default nowdays even if DNSSEC is not activated). But both sides must support it to work error free. Just another wild guess.

Yes, that is a good idea. I'm interested in what is causing this issue and what the proper solution would be too.

trendy · November 9, 2022, 8:47pm

A bad idea is still better than a no idea

I never messed with it, I left it at the default value of 1280. But I can give it a try to reduce it to 1232.

It is not connected to DNSSEC. The upstream resolver does support it and it is necessary for me to be able to identify the original host making the query.

anon54515283 · November 9, 2022, 10:39pm

Me neither. This value is just there. I never added it and I never adjusted it. Maybe a bug? In your config abstract from above the whole entry is not there also.

trendy · November 21, 2022, 10:27am

The messages have stopped for more than 24 hours already after I disabled the Adblock service. Also there are no interruptions with resolving.
It's weird because there were no errors coming from adblock.

Sun Nov 20 08:01:01 2022 user.info adblock-4.1.3[29151]: adblock instance started ::: action: reload, priority: 10, pid: 29151
Sun Nov 20 08:01:47 2022 user.info adblock-4.1.3[29151]: blocklist with overall 258946 blocked domains loaded successfully (Barracuda Networks, Inc Barracuda NG Firewall F12, OpenWrt 21.02.3 r16554-1d4dea6d4f)
Sun Nov 20 09:01:01 2022 user.info adblock-4.1.3[22583]: adblock instance started ::: action: reload, priority: 10, pid: 22583
Sun Nov 20 09:01:50 2022 user.info adblock-4.1.3[22583]: blocklist with overall 258946 blocked domains loaded successfully (Barracuda Networks, Inc Barracuda NG Firewall F12, OpenWrt 21.02.3 r16554-1d4dea6d4f)
Sun Nov 20 09:05:04 2022 user.info adblock-4.1.3[29576]: adblock is currently disabled, please set the config option 'adb_enabled' to '1' to use this service

Looking at the adblock thread, there didn't seem to be any relevant posts. The other cases where users had this issue were concerning misconfigured IPv6 and had errors in adblock logs. @dibdot for a heads up on that.

Edit: Some mrtg graphs showing less CPU utilization after adblock was disabled.

anon54515283 · November 21, 2022, 3:54pm

Hmm, weird. I don't have installed adblock on my routers. I've outsourced the public name resolution part to my own VPS since ~ 2 years now. So if this workaround works for you the issue lies somewhere else in your case.

But your post brought me to a new impression that it could be a timing issue. Because "failed to send" in connection with "not permitted" could be indicate such an issue (yeah just an idea).
So ... we have both set noresolv. --> No upstream servers available. It could be that this causing issues with several things within OpenWrt itself which could lead to such an error. Esp. on startup where you need timeservers present. I remember now that I've linked a local timeserver (with plain IP) into my configuration as I was searching for a solution for this error in the past.

Could it be that adblock service is looking for an upstream server via an internal OpenWrt routine first and after that failed asking interal DNS? And if so maybe there is a timedrift within the answer and we have issues like above and have to ask again?

trendy · November 21, 2022, 5:04pm

The only way I see it can be connected is the this warning here

Beware of race condition with Adblock service when using DNS encryption.

dibdot · November 21, 2022, 5:19pm

Sorry, but I have neither 21.02 nor dnsmasq in use - maybe an update to 22.03 would be an option for you?

Nope, there is no "adblock service". Adblock only downloads and prepares a blocklist for the selected DNS backend - nothing more. There is no active service running after that ...

anon54515283 · November 21, 2022, 6:44pm

I was thinking about this too:

Resolve the race condition with sysntpd service. When running dnsmasq with noresolv and localuse options and using DNS encryption for local system.

Sry. Didn't find it before. It's simply to long in the past to remember everything I was searching for at this issue.

Ty, for clearifying it. It's just guesswork atm.

trendy · November 22, 2022, 7:44am

Yes, at some point I'll have to upgrade, but I keep postponing it for now. I guess I'll have to live without adblock for a while.

It should not be the case though. It is covering the case that one router boots up, has wrong time, tries to contact some DoH or DoT nameserver, fails due to bad time sync, and cannot resolve anything eventually.
My problems occurred while device was up already for months with proper time and I am also using a non-encrypted upstream server.