Adblock-lean: set up adblock using dnsmasq blocklist

under development. @Lynx I've been traveling, and I have a chance to do some testing today before I travel again in two days, so I need to take adblock-lean with me. Just to confirm: I should be using the second, improved branch, correct?

I've just tweaked things a little again to make blocklist compression optional. The code is still very fresh and needs further work.

Please download the latest code once more.

Now please enter:

compress_blocklist=1

in the adblock-lean config (or generate a new config and compare differences) and also add the line:

        list addnmount '/bin/busybox'

at the end of the dnsmasq section of /etc/config/dhcp or alternatively run:

uci add_list dhcp.@dnsmasq[0].addnmount='/bin/busybox'
uci commit

This is to ensure that dnsmasq can use busybox to extract the compressed blocklist when it loads.
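To illustrate what that extraction step amounts to, here is a minimal sketch of a compress/extract round trip. The function names and file paths are hypothetical and this is not adblock-lean's actual code. On OpenWrt, gzip is typically a busybox applet, which is why dnsmasq needs to be able to reach /bin/busybox.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from adblock-lean.

# Compress a plain-text blocklist to <file>.gz, leaving the original in place.
compress_blocklist_file() {
    gzip -c "$1" > "$1.gz"
}

# Write the decompressed blocklist to stdout, so no plain-text copy
# has to be stored alongside the compressed one.
extract_blocklist() {
    gzip -dc "$1"
}
```

For example, extract_blocklist /tmp/blocklist.gz would stream the entries to whatever reads them, with /tmp/blocklist.gz standing in for wherever the compressed list actually lives.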

Yes, I put the original line in comments.

Edit: this is with the main branch.

I grabbed the latest of both branches just before running them

I love what you guys are doing, but I have one question which you undoubtedly can answer.

You (and alternative packages) are using local=/blockeddomain/
I understand that this works as queries in these domains are answered from /etc/hosts or DHCP only.

I think you can also use address=/blockeddomain/# which should resolve that domain to NULL, so should also work.

But what is the advantage of using local ?

One advantage is the ability to block query types besides A and AAAA, such as the SVCB and HTTPS (Type65) queries, prevalent on iOS devices.
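For a list that ships in the address=/domain/# form, converting it to the local=/domain/ form could look like the sketch below. The function name is hypothetical, and the pattern assumes one entry per line ending in /#; lines in any other form pass through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: rewrite address=/domain/# entries read from stdin
# as local=/domain/ entries on stdout. Other lines are left untouched.
address_to_local() {
    sed -E 's~^address=/(.+)/#$~local=/\1/~'
}
```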

I just tested using:

https://raw.githubusercontent.com/FiltersHeroes/KADhosts/master/KADdnsmasq.txt

The wildcard entries are accepted, but there is a rejection, as follows:

Checking for any rogue elements.
Rogue element: '2278: local=/zgarniij_vouchher.skroc.pl/' identified originating in blocklist file part from: https://raw.githubusercontent.com/FiltersHeroes/KADhosts/master/KADdnsmasq.txt.

Should that be rejected?

Hi, just a reminder that all of these things are already in my testing version but, it seems, nobody was interested:

There is also a technique to reduce /tmp and RAM usage during download as well, by splitting the files by first character and sorting them on that basis.
:wink:

I was interested, albeit this thread is intended for adblock-lean. Both I and, I'm sure, @Wizballs would welcome any contributions if you are interested. It took me a while to mostly catch up with the memory savings you captured in your alternative script. Your split is clever. If I understand it correctly, you end up with roughly evenly sized chunks that you can then delete as you pass them on to sort? By the way, the busybox copy can be avoided by adding an appropriate 'addnmount' to /etc/config/dhcp.

Yes, the idea was to split the files during download and do 8 serial sorts instead of 1 big sort, as that uses about 1/8 of the peak RAM. Btw, if you need to, you could simply split by 16, for example.
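A minimal sketch of that split-then-sort idea follows. It is not the script discussed here: the function name is hypothetical, it assumes plain domains (one per line) on stdin, and it buckets by every leading letter or digit rather than into exactly 8 chunks. Because identical lines always share a first character, de-duplicating each chunk on its own still removes every duplicate.

```shell
#!/bin/sh
# Hypothetical sketch: de-duplicate domains from stdin by sorting small
# per-character chunks one at a time instead of running one big sort.
split_sort_dedupe() {
    workdir=$(mktemp -d) || return 1

    # Route each non-empty line to a chunk file chosen by its first character.
    # Anything outside a-z and 0-9 shares a single "other" chunk.
    awk -v dir="$workdir" '
        length($0) > 0 {
            c = tolower(substr($0, 1, 1))
            bucket = index("abcdefghijklmnopqrstuvwxyz0123456789", c) ? c : "other"
            print > (dir "/chunk." bucket)
        }
    '

    # Sort the chunks serially, so sort only ever works on one chunk at a
    # time, and delete each chunk as soon as it has been emitted.
    for chunk in "$workdir"/chunk.*; do
        [ -f "$chunk" ] || continue
        sort -u "$chunk"
        rm -f "$chunk"
    done

    rmdir "$workdir"
}
```

The output is sorted within each chunk and free of duplicates overall; the order of the chunks themselves follows the shell's glob ordering.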

Nice finding - this saves another 300 kb of RAM usage, if I remember correctly. My idea for the test script was not to change anything in the config: simply delete the script and maybe reboot later, and all of my test script is gone :wink:
In my script, I used every trick I could find to save only compressed files on /tmp, with a minimum of RAM and /tmp usage, of course at the cost of much longer CPU processing. But I think this is only a problem after a reboot, as the next run is an update and the old list is still active...

I think that's the right approach. Who cares about some CPU use during the night, especially with your 'nice -19' trick? As you state, it is just once per day or so.

Hmmm I'm reading from various sources that underscores are not permitted in domains?

Either way, this line should allow it

rogue_element=$(sed -nE '\~(^(local|server|address)=/)[[:alnum:]*][[:alnum:]*._-]+(/$)|^#|^\s*$~d;{p;=;q}' /tmp/blocklist.${blocklist_id} | { read match; read line; [[ ! -z "${match}" ]] && echo "${line}: ${match}"; })
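A standalone sketch of the same kind of check can be tried against a sample file. The function name is hypothetical. Note that the hyphen is placed last inside the bracket expression so that it is read as a literal hyphen and not as a range between its neighbours.

```shell
#!/bin/sh
# Hypothetical helper: print "LINE_NO: TEXT" for the first line of file $1
# that is not a local=/server=/address= entry, a comment, or a blank line.
# Prints nothing when every line is acceptable.
find_rogue_element() {
    sed -nE '\~^(local|server|address)=/[[:alnum:]*][[:alnum:]*._-]+/$|^#|^[[:space:]]*$~d;{p;=;q}' "$1" |
    {
        # sed emits the offending line first, then its line number.
        read -r match
        read -r line
        if [ -n "${match}" ]; then
            echo "${line}: ${match}"
        fi
    }
}
```

With this pattern an entry such as local=/zgarniij_vouchher.skroc.pl/ passes, as do hyphenated names, while an entry containing a space or a second slash is reported.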

If it goes this way of smaller chunks, it would be a good idea to process allowlist entries on the smaller chunks also...

And if it doesn't go this way of chunks, it would still be better to process the allowlist on the individual blocklist downloads rather than at the end on the single combined list.

abl_remove_duplicate | abl_add_blacklist | abl_remove_allow | abl_check_min_line_count | nice -n 19 gzip > /tmp/adblock-lean-wd/adblock-lean-blocklist.new.gz

It is done in the stream: 8 serialized "sort -u" runs, and on the output the blacklist is added and the whitelist is removed.

Nice. I’ll test and all being well replace the rogue elements check with this.

Why? Would it be faster or use less memory?

Here is where this could be done at the moment:

So prior to this loop we check whether an allowlist exists; if it does, we call the sed to generate the global allowlist template, and during the loop, right after the chunk extraction, we also call the awk before piping to sort. Seem OK?

And just to check, @Wizballs: in the awk call we have two files appended, the allowlist then the blocklist. Can I remove the blocklist file and instead pipe the uncompressed chunk to the awk command with the allowlist still listed at the end? Could you give me the right syntax for that?

Hey guys, adblock-lean doesn’t work for me unless I create a dns hijacking rule in my firewall, is this normal?

Also, even though I have enabled the service (adblock-lean enable), it doesn’t work after a router reboot; I have to restart the service manually.

Any suggestion?

The router should pass its own IP to clients as the DNS server. @dave14305 is an expert here. Hijacking can help, albeit some clients can bypass the router altogether by encrypting DNS requests, and those can’t be dealt with. Personally I don’t hijack, since I am fine with clients using their own DNS servers (and losing out on caching and adblocking) instead of the one my router serves.

After a reboot, is your connection perhaps not yet up, so that the download and update of the new blocklist isn’t possible? Can you post the output of logread | grep adblock?

Yes, it will use less peak memory and temp space, as it is not processing the combined list (and writing to another output file) in one go. This is at the expense of some CPU usage, as the allowlist will be processed and removed in each blocklist part, and duplicates haven't been removed yet. This isn't really a concern though.

Good question, I'll look at this later on:
If you specify no files or you specify a dash (-) as a file, awk reads data from standard input (stdin)
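Building on that, a sketch of reading the allowlist from a file and the blocklist chunk from stdin could look like this. The function name is hypothetical and it only drops lines that match an allowlist line exactly, which is simpler than whatever matching the real awk call does. The allowlist is listed first here so that it is fully loaded before the piped data is read.

```shell
#!/bin/sh
# Hypothetical helper: filter blocklist lines arriving on stdin, dropping
# any line that appears verbatim in the allowlist file given as $1.
remove_allowed() {
    # With an empty or missing allowlist the NR == FNR test below would
    # treat stdin as the first file, so pass the stream through unchanged.
    [ -s "$1" ] || { cat; return; }
    # First file (allowlist): remember each line. Second input ("-" = stdin):
    # print only lines that were not remembered.
    awk 'NR == FNR { allow[$0]; next } !($0 in allow)' "$1" -
}
```

In a pipeline this would sit between the chunk extraction and the sort, for example: gunzip -c chunk.gz | remove_allowed allowlist | sort -u, with chunk.gz and allowlist as placeholder names.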

I’ve just disabled Peer DNS and entered Cloudflare servers in the wan and wan6 interfaces, but ads were appearing until I used the DNS hijacking rule.

I restarted my router and ads started reappearing; it fails to download blocklists even though connectivity is up and the time is correct.
Here’s the output:

logread | grep adblock
Thu Feb 15 05:35:52 2024 user.notice adblock-lean: Started adblock-lean.
Thu Feb 15 05:35:52 2024 user.notice adblock-lean: No local blocklist identified.
Thu Feb 15 05:35:52 2024 user.notice adblock-lean: Downloading new blocklist file part(s).
Thu Feb 15 05:35:52 2024 user.notice adblock-lean: Downloading new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt.
Thu Feb 15 05:35:57 2024 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt failed.
Thu Feb 15 05:35:57 2024 user.notice adblock-lean: Sleeping for 5 seconds after failed download attempt.
Thu Feb 15 05:36:02 2024 user.notice adblock-lean: Downloading new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt.
Thu Feb 15 05:36:08 2024 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt failed.
Thu Feb 15 05:36:08 2024 user.notice adblock-lean: Sleeping for 5 seconds after failed download attempt.
Thu Feb 15 05:36:13 2024 user.notice adblock-lean: Downloading new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt.
Thu Feb 15 05:36:18 2024 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/light.txt failed.
Thu Feb 15 05:36:18 2024 user.notice adblock-lean: Sleeping for 5 seconds after failed download attempt.
Thu Feb 15 05:36:23 2024 user.notice adblock-lean: Exiting after three failed download attempts.
Thu Feb 15 05:36:23 2024 user.notice adblock-lean: Failed to generate preprocessed blocklist file with at least one line.

Just make sure that clients are served the router IP as the DNS server. No need to hijack.

I guess we should add a configurable startup delay for when it is not run from a terminal.

Can you try adding sleep 120 at the top of the start() function and then reboot?
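A configurable version of that delay might look like the sketch below. The function name, passing the delay as an argument, and using a terminal check on stdin to decide whether the run is interactive are all assumptions; none of this is adblock-lean's current code.

```shell
#!/bin/sh
# Hypothetical helper: sleep for $1 seconds, but only when stdin is not a
# terminal (for example when started at boot rather than by hand).
maybe_delay_startup() {
    delay="${1:-0}"
    # Treat anything that is not a plain non-negative integer as "no delay".
    case "$delay" in
        ''|*[!0-9]*) delay=0 ;;
    esac
    # [ -t 0 ] is true when stdin is attached to a terminal.
    if [ ! -t 0 ] && [ "$delay" -gt 0 ]; then
        sleep "$delay"
    fi
}
```

Called as maybe_delay_startup 120 at the top of start(), this would wait two minutes on boot but start immediately when the service is restarted from a terminal session.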