Optimization of adblock-lean

While waiting for a connection in the Vienna airport, I thought to touch on another topic. The faster_processing branch is now almost 2150 lines. Some of it are empty lines, lots of comments. Some lines count (probably nearly 100) could be decreased by simply condensing these structures

if [ condition ]
then
[dosomething]
fi

for i in a b c
do
[dosomething]
done

To these

if [ condition ]; then
[dosomething]
fi

for i in a b c; do
[dosomething]
done

This would be nice. However the script would still be too long to comfortably navigate.

So I'd suggest splitting a library out of it. In particular, the 2 biggest things we have are processing-related and config-related. Probably splitting out the config-related functions (around 560 lines including spaces and comments) into a separate library would make sense. The processing code is actually less than that at about 450 lines (including everything again) but still pretty big. So also makes sense to consider splitting it into its own library.

I know that this doesn't exactly fit with "one script, super easy install" philosophy but I suppose that this philosophy was devised way before anyone knew about the path this project actually took. And I'd suggest to consider adjusting it at this point.

Maybe not with the upcoming release but eventually.

How's ya'll doing? I'm back from vacation.
Worked some more on the URLs conversion logic. I think it's in good shape now, anyone wanna test?

Welcome back! Yes but hopefully tomorrow. Totally knackered after the last week. Work work work work work work!

1 Like

Hey how was the holiday?? I could use one also lol

Sure thing, I'll log tonight and tomorrow morning. let you know!

1 Like

I joined my father in his vacation on some fairly random beach in Montenegro xD. Nothing too exciting to tell (except almost missing the plane, twice), but I enjoyed it.

Please do!

1 Like

I like that backup of config into /tmp, as that will automatically disappear after the next reboot, but still gives you a window to find it still if needed (although it's a pretty simple config, should be easy to recreate from scratch anyway).

Also I see that flag file "new_urls_format" so the process will only happen once:

On the first run with the new script , when using gen_config this happened. Is this related to the new location maybe?

root@OpenWrt:~# service adblock-lean gen_config
Generating new default config for adblock-lean.
Error: Failed to validate the new config.

But running adblock-lean start did fix that error, after following the y/n prompts. Then gen_config worked on each subsequent run.

These two should probs be moved do /etc also do you think?

local_allowlist_path="/root/adblock-lean/allowlist"
local_blocklist_path="/root/adblock-lean/blocklist"

When starting with dnsmasq lists, and no flag file "new_urls_format", that second prompt Convert known dnsmasq-formatted list URLs which are currently configured as "raw domains" to "raw domain" list URLs? might not be needed? As when I hit yes on that, I end up with no blocklists, and if I hit no, then I get an invalid config:

root@OpenWrt:~# service adblock-lean start

NOTE: adblock-lean now supports both dnsmasq-formatted lists and "raw domain" lists.
The "raw domain" lists are functionally identical but their filesize is smaller and they are faster to process.
Hagezi and OISD dnsmasq-formatted list URLs can be automatically converted to "raw domain" list URLs.
If you are using other dnsmasq-formatted lists, they will be moved into separate config options and continue to work,
but you may want to look for their raw domain equivalents (sometimes called "wildcard domains").

Current "raw domains" URL config entries:
blocklist_urls=""
blocklist_ipv4_urls=""
allowlist_urls=""


Current "dnsmasq" URL config entries:
dnsmasq_blocklist_urls=""
dnsmasq_blocklist_ipv4_urls="https://nsfw.oisd.nl/dnsmasq2 https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt"
dnsmasq_allowlist_urls=""


Convert known dnsmasq-formatted list URLs which are currently configured as "raw domains" to "raw domain" list URLs?
(you will be asked to confirm before writing the config)
y|n: y

Updated "raw domain" URL config entries:
blocklist_urls=""
blocklist_ipv4_urls=""
allowlist_urls=""

Updated dnsmasq URL config entries:
dnsmasq_blocklist_urls=""
dnsmasq_blocklist_ipv4_urls=""
dnsmasq_allowlist_urls=""

Confirm URLs update in config?
y|n:
1 Like

Great testing, as always.

That's my oversight and easy to fix.

When generating default config, the default paths for local will get changed to /etc/. Also when converting existing config, and the files /root/adblock-lean/*list do not exist, the paths will be changed. Only if you already have these files in /root/, the paths will be migrated to the new config. I think this is the least trouble for the user and for us.

Hmm, I'll try to reproduce this.

1 Like

Just pushed a new revision fixing both issues (gen_config when .new_urls_format is missing and existing dnsmasq_* URLs not getting migrated. @Wizballs

Hmm, I think I created another bug - for some reason when the dnsmasq_blocklist_ipv4_urls config entry doesn't exist, wrong URLs end up in that entry.
Edit: nah, I just accidentally put the URLs in the wrong entry to begin with :slight_smile:
Edit2: ah, I just copied the entries from your config, and that's where the trouble started. Those URLs are supposed to be in dnsmasq_blocklist_urls rather than
in dnsmasq_blocklist_ipv4_urls.

1 Like

My bad, I did a few runs and must have pasted in the the wrong config line at least one. But even when placing into the correct dnsmasq_blocklist_urls config, the same issue occured:

However, tried your updated git and while not the the same issue as before, it also doesn't seem to convert dnsmasq to raw lists:

My config still contains:
dnsmasq_blocklist_urls="https://big.oisd.nl/dnsmasq2 https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt"


root@OpenWrt:~# service adblock-lean start

NOTE: adblock-lean now supports both dnsmasq-formatted lists and "raw domain" lists.
The "raw domain" lists are functionally identical but their filesize is smaller and they are faster to process.
Hagezi and OISD dnsmasq-formatted list URLs can be automatically converted to "raw domain" list URLs.
If you are using other dnsmasq-formatted lists, they will be moved into separate config options and continue to work,
but you may want to look for their raw domain equivalents (sometimes called "wildcard domains").

Current "raw domains" URL config entries:
blocklist_urls=""
blocklist_ipv4_urls=""
allowlist_urls=""


Current "dnsmasq" URL config entries:
dnsmasq_blocklist_urls="https://big.oisd.nl/dnsmasq2 https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt"
dnsmasq_blocklist_ipv4_urls=""
dnsmasq_allowlist_urls=""


Convert known dnsmasq-formatted list URLs which are currently configured as "raw domains" to "raw domain" list URLs?
(you will be asked to confirm before writing the config)
y|n: y

Updated "raw domain" URL config entries:
blocklist_urls=""
blocklist_ipv4_urls=""
allowlist_urls=""

Updated dnsmasq URL config entries:
dnsmasq_blocklist_urls="https://big.oisd.nl/dnsmasq2 https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt"
dnsmasq_blocklist_ipv4_urls=""
dnsmasq_allowlist_urls=""

Confirm URLs update in config?
y|n: y

Old config file was saved as /tmp/adblock-lean_config.old.

Saving new config file to '/etc/adblock-lean/config'.
Started adblock-lean.

gawk detected so using gawk for fast (sub)domain match removal.
coreutils-sort detected so sort will be fast.

No existing compressed or uncompressed blocklist identified.

Found local allowlist. Sanitizing.
Successfully processed allowlist (source file size: 1 B, sanitized line count: 0).
Not using any allowlist for blocklist processing.

Found local blocklist. Sanitizing and compressing.
Successfully processed blocklist (source file size: 1 B, sanitized line count: 0).

Starting dnsmasq blocklist part(s) download.

Downloading, checking and sanitizing dnsmasq blocklist part from: https://big.oisd.nl/dnsmasq2.
1 Like

The algorithm is only intended to convert URLs in the non-dnsmasq entries. My assumption is that if someone decided to put the dnsmasq URLs in the dnsmasq config entries then they want to keep these URLs as is. I suppose it's better to not offer conversion when there is nothing to convert, but that would require some additional logic, and since for the typical user the URL conversion will only run once when they don't have anything in the dnsmasq entries yet, and have dnsmasq URLs in the non-dnsmasq entries, I'm not sure adding more logic is justified.

Ah yep I see, the intent is really to convert the existing user database. So that is working for me. Roundabout way but I got there!

1 Like

Going back a week here, but I think while adblock-lean is still primarily installed via command line, best to keep as one file.

But as you mentioned there are some line consolidation options to reduce line count. Also, was some of this code inteneded to be temporary? Ie after x months, can assume that most people have migrated config and/or lists, and those parts can be removed from the main branch?

1 Like

We will need @Lynx's permission to do that :slight_smile:

Yep, the temporary things are currently: some small part of the parse_config() function intended for config migration from initial format to v2 (or higher), which was added in August, and now URLs conversion code and the config file migration logic. It is all marked in the code.

Edit: found a bug with oisd URLs conversion, fixed in the current revision.

1 Like

Sounds good but I’ve forgotten what those are?

Yeah I think so. Big long file is just fine. Line count doesn’t seem such an issue to me.

These:

(and while [condition]; do as well)
Edit: ah, and function() { rather than

function()
{

But no pressure, if any of this makes you uncomfortable then I'm lifting the suggestion.

Ah yeah that’s very much stylistic and I strongly prefer the extra lines. Hope that’s OK?

1 Like

After some thinking, I decided to make some improvements for the URLs conversion:

  • Don't offer conversion when there is nothing to convert (silently creates the .new_urls_format file)
  • When only unknown URLs are found, skip the dialog asking about conversion (move the unknown URLs to the dnsmasq_* config entries and ask for approval)

This is already implemented in the newest revision.

I think this should be good to go now. Should I open a PR? @Lynx

2 Likes

Yes I think so - seems like a good time!

1 Like

PR opened. I'm wondering if it's a good idea to request community feedback on this version before merging it?