Done, force-pushed. Note that now when the part size is too large, we will skip it and continue, regardless of the value of ${download_failed_action}.

[ also sometime ago I suggested unifying all the possible fail options into 1 option, e.g. part_fail_action, this would cover all possible fail conditions, while making the logic a bit simpler, and making it easier for the user to understand the config options, and I still think that this is a good idea ]

2 Likes

Might be a good idea. I wondered about something like tracking a state then case statement or something. I’m not sure haven’t thought through but the logic flow does indeed seem a bit flaky at the moment and open to improvement.

So I guess PR is ready to go, albeit maybe you'd like to reorder functions in same PR (e.g. with different commit) or different PR? Please let me know when you'd like me to merge in.

1 Like

Yes, I'll submit a second commit into this PR, working on it rn.

1 Like

Found and fixed a bug (see comment in the PR), and submitted a 2nd commit with the functions reordered.

2 Likes

Another idea I have is to make the check of whether the blocklist was loaded a bit more robust. To my understanding, currently it relies on the presence of the blocklist file in the /tmp/dnsmasq.d directory, and on dnsmasq process running and responding to dns requests. This seems a bit flimsy since no verification of the specific blocklist entries is performed.

My idea is to add a custom entry to the end of the blocklist file, e.g. local=/adblock-lean-test123.info/ and then when verifying that the blocklist was loaded, make a dns request for that entry and check the result. It's quite simple to implement. What do you guys think?

1 Like

Yes, much better.

Also right now this line can be improved:

@dave14305 gave a way to get a direct output, but I've forgotten.

Ah, maybe this:

I agree, probably we should use the ubus call. It might be slightly less efficient, but it is simpler and more robust.

Edit: looking at the output of ubus call dnsmasq metrics, the output is different from kill -USR1 $(pidof dnsmasq) (I replaced pgrep with pidof because this makes it less likely to accidentally send a signal to the wrong process).

kill -USR1 $(pidof dnsmasq):

Thu Aug  8 19:29:33 2024 daemon.info dnsmasq[1]: time 1723145373
Thu Aug  8 19:29:33 2024 daemon.info dnsmasq[1]: cache size 1000, 0/23 cache insertions re-used unexpired cache entries.
Thu Aug  8 19:29:33 2024 daemon.info dnsmasq[1]: queries forwarded 9, queries answered locally 9
Thu Aug  8 19:29:33 2024 daemon.info dnsmasq[1]: pool memory in use 0, max 0, allocated 0
Thu Aug  8 19:29:33 2024 daemon.info dnsmasq[1]: server 192.168.201.0#53: queries sent 9, retried 0, failed 0, nxdomain replies 0, avg. latency 0ms

ubus call dnsmasq metrics (with some sed processing to remove the json stuff):

dns_cache_inserted: 23
dns_cache_live_freed: 0
dns_queries_forwarded: 9
dns_auth_answered: 0
dns_local_answered: 9
dns_stale_answered: 0
dns_unanswered: 0
bootp: 0
pxe: 0
dhcp_ack: 0
dhcp_decline: 0
dhcp_discover: 0
dhcp_inform: 0
dhcp_nak: 0
dhcp_offer: 0
dhcp_release: 0
dhcp_request: 0
noanswer: 0
leases_allocated_4: 0
leases_pruned_4: 0
leases_allocated_6: 0
leases_pruned_6: 0

I'm not sure which one is more useful, opinions? @Lynx

So I looked closer at the generate_preprocessed_blocklist_file_parts() function and found a few bugs (list id's not getting incremented; reported list line count might be incorrect in some scenarios; allowlist and blocklist checks are different for no apparent reason), and lots of code duplication. The bugs, I believe, are mostly a result of the code getting too chaotic. So I decided to rewrite it. Kinda in the process of doing that rn. Just letting you guys know what I'm up to.

2 Likes

It's just awesome the effort you (and everyone included) is putting in to this. And that's on top of the efforts prior to yourself joining.

2 Likes

So I have a working result now, which obviously needs testing (and probably has some bugs). To make logic as straightforward as possible, I already implemented this idea:

Unify all the possible part fail options into 1 option, e.g. part_fail_action. This would cover all possible fail conditions, while making the logic a bit simpler, and making it easier for the user to understand the config options.

Because this changes some config options, I suggest @Lynx to create a new branch for this PR. I'm considering to implement some automated mechanism for config migration, so if this happens, we will merge this into the same branch before merging to master, so the users (hopefully) will get a new version without the headache of manual config migration.

2 Likes

I can run test combinations now if you like? Just send the link if so, I can't see where the branch is.

1 Like

Just pushed to Github:

https://github.com/friendly-bits/adblock-lean/blob/improve-processing-logic/adblock-lean

1 Like

Would you like new branch to be same? So improve-processing-logic? Or testing or development? Just let me know and I’ll create it.

1 Like

I suppose something like "Development" is good.

Especially important to test with local allowlist, local blocklist, either and both, and with downloaded allowlist. These are combinations I haven't tested.

@Lynx Besides testing, I think manual code review is needed from you, since I changed lots of things and testing might not reveal all potential problems. Mainly this applies to gen_list_parts(), download_and_process_list_part(). I also made some changes to generate_and_process_blocklist_file() but not very significant.

2 Likes

That combining all fail conditions into one is nice, and shortens the config file.

Using large blocklists, downloaded allowlist, local blocklist, local allowlist, and final compression ON:

  • rogue blocklist handled correctly
  • oversized file part handled correctly
  • oversized final blocklist handled correctly
  • min final line count handled correctly
  • local allowlist added correctly
  • local blocklist FAIL - does not add entry.

Am incorrectly getting rogue element in downloaded allowlist now.
https://raw.githubusercontent.com/hagezi/dns-blocklists/main/wildcard/whitelist-referral-onlydomains.txt
The allowlist from Hagezi is formatted in eg suite14.emarsys.net without any other eg server=/.../# wrapper.
Maybe we should allow this format in blocklist also and wrap it in server=/.../ not urgent anyway.

These two log entries could be combined onto one line if you like to save a couple of log entries:

Found local allowlist.
Sanitizing the local allowlist.
Found local blocklist.
Sanitizing and compressing the local blocklist.

Maybe another redundant log entry here with skipping file part:

Downloaded blocklist part size reached the maximum value set in config (15000 KB).
Consider either increasing this value in the config or removing the corresponding blocklist url.
Skipping blocklist part and continuing.
Skipping file part and continuing.

But also, I'm done for the night, falling asleep at the keyboard. More testing from me tomorrow :wink:

2 Likes

Nice and thorough testing there! Sleep well.

1 Like

Great job, as always. Good night!

1 Like