Adblock-lean: set up adblock using dnsmasq blocklist

@Wizballs and I have put together a new, ultra-simple and lightweight adblocking solution for OpenWrt: adblock-lean. The intention is: set once and forget.

adblock-lean is available here:

adblock-lean is a super simple and lightweight adblocking solution that leverages the major rewrite of the DNS server and domain handling code in dnsmasq 2.86, which drastically improves performance and reduces memory footprint, facilitating the use of very large blocklists even on older, low-performance devices.

adblock-lean was designed primarily for use with the dnsmasq variant of the oisd blocklist, which is used by major adblockers and is intended to block ads without interfering with normal use.

adblock-lean is written as a service and 'service adblock-lean start' will download a new blocklist file and set up dnsmasq with it. Various checks are performed and, in dependence upon the outcome of those checks, the script will either: accept the new blocklist file; fall back to a previous blocklist file if available; or restart dnsmasq with no blocklist file. A rough sketch of this flow follows the feature list below.

adblock-lean includes, inter alia, the following features:

  • operates entirely in RAM, so no wear on the precious router flash memory
  • attempts to download a new blocklist file from a configurable blocklist URL (default: https://big.oisd.nl/dnsmasq2), with up to 3 retries
  • checks that the downloaded blocklist file size does not exceed a configurable maximum (default: 20 MB)
  • checks for rogue entries in the blocklist file (e.g. redirection to a specific IP rather than 0.0.0.0)
  • checks that the number of good lines in the blocklist file exceeds a configurable minimum (default: 100,000)
  • sets up dnsmasq with the new blocklist file and saves any previous blocklist file as a compressed file
  • performs checks on the restarted dnsmasq with the new blocklist file
  • reverts to the previous blocklist file if those checks fail
  • if checks on the previous blocklist file also fail, reverts to not using any blocklist file
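
To give a feel for how those steps fit together, here is a rough shell sketch of the start logic. It is purely illustrative - the function, variable and file names below are made up for this post and are not the actual adblock-lean code:

#!/bin/sh
# Illustrative sketch only - names and paths are hypothetical, not the real adblock-lean source.

BLOCKLIST_URL="https://big.oisd.nl/dnsmasq2"
NEW_LIST="/tmp/blocklist"
PREV_LIST="/tmp/prev_blocklist.gz"
MIN_GOOD_LINES=100000

download_and_check() {
	# up to 3 download attempts, each followed by sanity checks
	for attempt in 1 2 3; do
		wget -qO "$NEW_LIST" "$BLOCKLIST_URL" || continue
		# rogue-entry check: reject redirection to any IP other than 0.0.0.0
		grep -E 'address=/.*/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' "$NEW_LIST" \
			| grep -qv '/0\.0\.0\.0' && return 1
		# require a sane number of good-looking lines
		[ "$(grep -cE '^(local|server|address)=/' "$NEW_LIST")" -ge "$MIN_GOOD_LINES" ] && return 0
	done
	return 1
}

install_list() {
	cp "$1" /tmp/dnsmasq.d/blocklist
	/etc/init.d/dnsmasq restart
	# post-restart check: dnsmasq must still answer queries
	nslookup openwrt.org 127.0.0.1 >/dev/null 2>&1
}

if download_and_check && install_list "$NEW_LIST"; then
	gzip -c "$NEW_LIST" > "$PREV_LIST"              # keep a compressed copy for fallback
elif [ -f "$PREV_LIST" ] && gunzip -c "$PREV_LIST" > "$NEW_LIST" && install_list "$NEW_LIST"; then
	logger "adblock-lean: checks failed, reverted to previous blocklist"
else
	rm -f /tmp/dnsmasq.d/blocklist                  # last resort: run with no blocklist at all
	/etc/init.d/dnsmasq restart
fi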

History of development can be traced in this thread.

Whilst there are already a few users (including myself of course), adblock-lean will benefit from testing and feedback from more users.

9 Likes

Cool, not a bad idea to start a separate/clean topic for adblock-lean. Can focus on more specific development here vs the general discussion in the other thread.

3 Likes

Have been running the latest adblock-lean for a few days with 22.03.4 release, all is good. PS laughed at thundering herd name :smile:

2 Likes

Not sure if this is the right thread, but I've been testing the adblock-lean project and so far I'm happy, although I modded the script a bit just to use other lists: the HaGeZi ones.

The updates and cron are working great. Would it be possible to add more than 2 links to the list?

One other thought: the adblocking is working really well. I'm using it with DNS https proxy, and that way I hijack all the DoH/DoT DNS from my network and force it through my DNS with adblock-lean, and it works great.

But sometimes I need a device without the blocking, because I need to check something with GA/Marketing. I think maybe in the future it would be nice to have a LuCI UI, to add extra lists or more functionality. I'm replacing NextDNS with adblock-lean and I'm really impressed, thank you.

Quick question regarding adding a second list: I know that I could modify the file size and obviously my device will be affected by the RAM consumed.

blocklist_url="https://big.oisd.nl/dnsmasq2"

But would it be possible to add a second list?

blocklist_url="https://big.oisd.nl/dnsmasq2, link2"

Glad that you are finding this useful. I just have it on all the time and forget it's running. Every now and then I check the logs to make sure it's updating the list daily.

That's pretty easy - just set up DNS hijacking in LuCI to send requests from a particular MAC address to a specific DNS server e.g. 1.1.1.1.
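
If you prefer doing it from the command line, something along these lines ought to work (untested sketch - the rule name and MAC address are placeholders):

# Untested sketch: DNAT port-53 traffic from one client (placeholder MAC)
# straight to 1.1.1.1, bypassing the local dnsmasq + blocklist.
uci add firewall redirect
uci set firewall.@redirect[-1].name='dns-bypass-adblock'
uci set firewall.@redirect[-1].src='lan'
uci set firewall.@redirect[-1].src_mac='AA:BB:CC:DD:EE:FF'
uci set firewall.@redirect[-1].src_dport='53'
uci set firewall.@redirect[-1].proto='tcp udp'
uci set firewall.@redirect[-1].dest_ip='1.1.1.1'
uci set firewall.@redirect[-1].target='DNAT'
uci commit firewall
/etc/init.d/firewall restart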

I suppose it would be, but it would add another layer of complexity. Any thoughts on that @Wizballs? Mindful that multiple lists could give rise to a lot of duplicate entries, do we already have any special handling to deal with duplicate entries? If not, I wonder what dnsmasq does with duplicate entries? If it handles those just fine, then a crude solution might be just to download all lists then combine them into one, then proceed as we were proceeding before?

I really like this project because I can set it up and forget about it. At first, I used a Raspberry Pi with AdguardHome and Unbound, but I had to repurpose it for another project. The problem was that I couldn't enforce DoH/DoT on my network without using complex scripts and hotplugs due to OpenWrt 19.X - 22.03.x's limitations. (IPTables to NFTables)

To solve this issue, I switched to using DNS https Proxy packages to force all DoH/DoT traffic on my network. However, I needed to filter out annoying ads, and that's where Adblock-Lean came in handy.

While I like the oisd lists, I prefer HaGeZi; he has some extra lists that can complement Adblock-Lean in my case. Nonetheless, using just one list works fine for me. Initially, I was worried that the cron job wouldn't work with a different list, but it loaded and updated everything perfectly.

I'll keep testing.

What gets blocked by HaGeZi that doesn't get blocked by OISD I wonder? If it's significant then it should really get folded into OISD since the idea behind OISD is one list of lists to rule them all in terms of ad-blocking without affecting general internet use.

By the way, in case you didn't know, there are a few more sophisticated and mature ad-blocking solutions already available for OpenWrt - see:

None were entirely to my taste - personally I find them a little too fussy, which is why I wrote my own with help from @Wizballs. But you should consider all of them, and especially simple-adblock.

Thank you for your input. I have already tried some other ad-blocking solutions, and I agree that they can be too fussy. However, I think everyone's experience might differ based on the devices they use. For me, Adblock-Lean works nicely.

As for HaGeZi, he has done an excellent job curating several lists and layers, including ones for telemetry, Apple, social networks and Microsoft. When I used his list with AdguardHome, I noticed that around 30% of my queries came from IoT devices like Alexa and Microsoft Windows clients. In total, I had around 1.2 million queries per month.

I would also like to add that HaGeZi is featured in NextDNS, so that's another option to consider.
While I like OISD and started with them when I was learning to use pihole, I eventually moved to AdguardHome and discovered HaGeZi. However, I think it comes down to personal preference. I'm glad that your project allows me to change the list or test a different one. I really like this!

1 Like

If I understand you correctly you'd see value in OISD big list plus HaGeZi? I have in mind a way to combine lists, and I'd just need to figure out with input from @Wizballs how best to handle duplicates.

To clarify, I don't want to use multiple adblock lists, although it would be nice to have that option. However, I'm not sure if every device would be able to handle it. For me, the adblock list from OISD is good enough, but my personal preference is the one from HaGeZi.

He curates several layers or different blocklists for different situations and adds more layers for additional security. When I was using AdguardHome on my Raspberry Pi, I added some extra lists from HaGeZi that were focused on telemetry, malware, and threat intelligence. I understand that this approach may not be for everyone, and I'm happy to use the OISD list with Adblock-Lean regularly.

I think it's a great solution. My comment was simply to suggest adding those extra lists, but I understand that other people may not need them.

I'm using this on my EdgeRouter-X (production) with OpenWrt 22.03.3; it has enough RAM to support the OISD list as well as bigger ones. I'm also testing on a Linksys E5600 (testing) with OpenWrt 22.03.3, where CPU and memory are more limited, so OISD is enough.

Hello,
I'll test some duplicate removal options over the next couple of days and see what works. I suspect this could be a CPU-heavy task, let's find out!

Multiple blocklist options are a good idea though. How many are doable? eg could you have 5, 10....?

Are you thinking: download and test for rogue entries individually, then combine? Or possibly even keep separate files per blocklist in /tmp/dnsmasq.d. Testing individually would be ideal, as e.g. 3 lists might pass and 1 might fail - in that case the 3 lists are updated, and the one that failed would keep using the previous good list.
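
Roughly what I have in mind for the per-list approach - purely a sketch, where passes_checks stands in for the existing size/rogue/line-count checks and the second URL is just a placeholder:

# Sketch only - passes_checks and the second URL are placeholders, not real adblock-lean code.
BLOCKLIST_URLS="https://big.oisd.nl/dnsmasq2 https://example.com/other-list.dnsmasq"

i=0
for url in $BLOCKLIST_URLS; do
	i=$((i + 1))
	tmp="/tmp/blocklist.$i"
	if wget -qO "$tmp" "$url" && passes_checks "$tmp"; then
		mv "$tmp" "/tmp/dnsmasq.d/blocklist.$i"   # passed: install/update this list
	else
		rm -f "$tmp"                              # failed: leave the previous good file in place
		logger "adblock-lean: list $i failed checks, keeping previous version"
	fi
done
/etc/init.d/dnsmasq restart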

Also, how would you feel about switching to curl for the download, as it can test the file size before downloading to /tmp and abort the download if it's over a set size? Might be useful when downloading multiple large blocklists, as a RAM usage safeguard. But only a thought, as wget works perfectly well as is.
curl --max-filesize 20971520 --max-time 60 --retry 3

@a-z A LuCI GUI has been mentioned before. Is anyone out there able to assist with getting this project into a package?

1 Like

Actually that seems like a better idea than what I had in mind. I forgot that dnsmasq can work with multiple files in /tmp/dnsmasq.d/.

Depending on how dnsmasq handles duplicates, perhaps there would even be no need to weed them out?
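
(For reference: on a stock OpenWrt build the generated dnsmasq config should already contain a conf-dir entry pointing at /tmp/dnsmasq.d, so any files dropped there get picked up on restart. Something like this can confirm it on a given device - exact paths can differ between releases:)

# Quick check (paths can vary by release): is dnsmasq reading a conf-dir,
# and is our blocklist actually sitting in it?
grep -h 'conf-dir' /var/etc/dnsmasq.conf* 2>/dev/null
ls -l /tmp/dnsmasq.d/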

Yes good point. Is curl a default package? If not, then that would be one reservation I'd have, as right now there are no dependencies I think?

I'm open minded. On the one hand, the existing solution seems elegant to me in that it involves just one simple service file to put into /etc/init.d/ and we don't even have any dependencies if I remember correctly. So it's easy to try out and tweak or ditch. On the other hand, I can see the benefit in a package with a very simple LuCi page.

If anyone reading has any thoughts on the above or fancies getting involved in this respect then please chip in!

3 Likes

Using multiple lists can be complex if you end up combining a wildcard domain list with a non-wildcard list that includes multiple hosts entries from a domain. You could end up wasting memory.
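
For example (illustrative domains only): a wildcard entry already covers every subdomain beneath it, so keeping both of these lines just wastes memory, and exact-line de-duplication won't catch it because the lines differ:

local=/ads.example.com/
local=/tracker.ads.example.com/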

The more features you add, the less lean you get.

adblock-lean, simple-adblock, adblock. There are target audiences for each.

2 Likes

Correct, curl is not a default package. And I totally understand your POV. Maybe if curl exists then....otherwise wget. Maybe something I can work on. It's not functionally required as is anyhow.

And agreed re a LuCI option, i.e. it's not necessary. It might help some people switch if they find the manual method too difficult. Would be nice to use resume/pause buttons though!

1 Like

Ah yes nice idea - optimization if curl binary present, else fall back. Seems worthwhile experimenting with implementing this. Feel free to try or I will when I'm back from Munich.
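
A minimal sketch of that, assuming we keep the current single blocklist_url and the existing 20 MB default cap:

# Sketch: prefer curl (which can enforce a size cap) when it's installed, else fall back to wget.
max_size=20971520   # 20 MB, matching the current default
if command -v curl >/dev/null 2>&1; then
	curl --fail --retry 3 --max-time 60 --max-filesize "$max_size" -o /tmp/blocklist "$blocklist_url"
else
	wget -qO /tmp/blocklist "$blocklist_url"
	# wget can't pre-limit the download, so enforce the cap after the fact
	[ "$(wc -c < /tmp/blocklist)" -le "$max_size" ] || { rm -f /tmp/blocklist; exit 1; }
fi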

We have this to some extent already given the service file aspect. I mean user can just click on service start or stop in the LuCi service script page.

Yes good points - I totally agree. We have to be careful here.

1 Like

Great point! I missed that / didn't think of it. Cheers

1 Like

@Lynx

If multiple files are going to be used, I think they should all be in the same format. The output below will be in the same syntax as OISD, e.g. local=/ad.site.com/ . Otherwise, finding duplicates will be very difficult later.

Here is a single line that will clean whitespace, convert ('server' or 'address') to 'local', and remove any trailing #'s.
e.g. server=/ad.site.com/# will become local=/ad.site.com/
Unless anyone has convincing arguments to use server or address instead of local?

sed -i -e '\~^\s*$~d;s/^[ \t]*//;s/[ \t]*$//;s/^address/local/;s/^server/local/;s/#$//' /tmp/blocklist

There is a really elegant/simple solution to removing duplicates across multiple files - and it's super fast also! Bbbbbuuuuuutttttttt it requires the gawk package, at ~250 kB. Can use up to 12 files.

gawk -i inplace '!seen[$0]++' file1 file2 ... file12

I'll look for another solution using inbuilt packages, but I don't think any will be this simple.
Also, this looks for exact matches per line, and so doesn't take into account wildcard domains combined with non-wildcard entries. Unsure how much real-world impact this will have. To be reviewed.

1 Like

@Lynx realise you mentioned travelling at the moment, but I have to write this here before I forget it :wink:

Awk definitely seems like the correct tool for removing duplicates. Tested on an r7800: 623,000 lines took approx 11 seconds. IMO that is keeping in line with the lean philosophy.

Keep only unique lines from multiple files, using the OpenWrt native version of awk, and write back to separate files as per the original inputs. This needs to write to a separate file, as in-place editing isn't available with OpenWrt's awk (in-place editing is only available in the gawk package). Therefore the original files will need to be removed afterwards.

awk ' { if (!seen[$0]++) print $0 > "/tmp/dnsmasq.d/" FILENAME } ' /tmp/blocklist1 /tmp/blocklist2

This method does not catch subdomains already covered by a parent (wildcard) domain, as @dave14305 mentioned. However, as a test I de-duped HaGeZi multi pro & oisd big - only 16k lines were left in the OISD file. And randomly testing a bunch of these, I didn't find any examples of duplicate top-level domains anyway. The exact-line de-duplication method is very fast to process. Even if there were a few thousand extra lines due to such domains, the redundancy / increased resource usage is relatively small.

I think we should stick to removing exact line duplicates, especially if users are a little discerning when selecting multiple lists. This duplicate handling can always be revisited in future if the need ever arises, but I really can't see that it will need to change.

1 Like

Sounds great. Can you elaborate on what the above command does? Does that write out non-duplicates to FILENAME?

FILENAME in awk is a variable, which means it will write to the original file name. So in this example, reading from /tmp/blocklist1 and /tmp/blocklist2, and writing to /tmp/dnsmasq.d/blocklist1 and /tmp/dnsmasq.d/blocklist2.

This command will keep the first instance of a unique line it encounters, writing to the original file name albeit in a different folder.

Example

Input:

/tmp/blocklist1:
A
B
C
D
Z

/tmp/blocklist2:
A
B
F
G
H

Output:

/tmp/dnsmasq.d/blocklist1:
A
B
C
D
Z

/tmp/dnsmasq.d/blocklist2:
F
G
H

1 Like