Yeah I see no issues yet. But I also don't notice any difference as compared to just having used OISD. Since I am not seeing any negative effects, I'm just leaving both on for the time being.
I think I've forgotten what seeing adverts looks like now.
When I get a chance I'll code up your awk-based allow list approach. Nobody has raised any concerns yet.
@Wizballs here is my first attempt using your awk-based approach:
Please can you test (with and without an allowlist, and with various different allowlists) using the new service script available in the 'testing' branch:
Example output without an allowlist:
root@OpenWrt-1:/etc/init.d# ./adblock-lean start
Started adblock-lean.
Downloading new blocklist file.
Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Removing duplicates from downloaded blocklist file part(s).
No allowlist identified.
Checking new blocklist file.
New blocklist file check passed.
Restarting dnsmasq.
Checking dnsmasq instance.
The dnsmasq check passed with new blocklist file.
New blocklist installed with good line count: 681046.
Example output with an allowlist:
root@OpenWrt-1:/etc/init.d# ./adblock-lean start
Started adblock-lean.
Downloading new blocklist file.
Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Removing duplicates from downloaded blocklist file part(s).
Found allowlist with 2 lines.
Modifying blocklist based on allowlist.
Blocklist modification based on allowlist complete.
Checking new blocklist file.
New blocklist file check passed.
Restarting dnsmasq.
Checking dnsmasq instance.
The dnsmasq check passed with new blocklist file.
New blocklist installed with good line count: 680998.
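For anyone following along, here is a minimal sketch of how an awk-based allowlist pass over a dnsmasq-format blocklist could look. It is not the code from the testing branch. The temporary paths, the one-domain-per-line allowlist format, and the rule that an allowlisted domain also releases its subdomains are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sample files, created in a throwaway directory for this sketch only.
workdir=$(mktemp -d)
blocklist="$workdir/blocklist"
allowlist="$workdir/allowlist"

printf '%s\n' \
    'local=/ads.example.com/' \
    'local=/tracker.test/' \
    'local=/cdn.tracker.test/' \
    'local=/keep.example.org/' > "$blocklist"
printf '%s\n' 'tracker.test' > "$allowlist"

# Skip the pass entirely for a missing or zero-length allowlist,
# otherwise the NR == FNR test below would swallow the blocklist.
if [ -s "$allowlist" ]
then
    awk -F/ '
        # First file (allowlist): remember every non-empty line as a domain.
        NR == FNR { if ($0 != "") allow[$0]; next }
        # Second file (blocklist): $2 is the domain in local=/domain/.
        # Walk from the full name up through its parent domains and
        # drop the line as soon as one of them is allowlisted.
        {
            d = $2
            while (d != "") {
                if (d in allow) next
                i = index(d, ".")
                if (i == 0) break
                d = substr(d, i + 1)
            }
            print
        }
    ' "$allowlist" "$blocklist" > "$blocklist.new"
    mv "$blocklist.new" "$blocklist"
fi

result=$(cat "$blocklist")
printf '%s\n' "$result"
```

With the sample data, the two tracker.test entries are dropped and the other two lines are kept.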
Allowlist processing time is as expected overall against 680k blocklist lines: double the blocklist lines, double the processing time compared with my testing on 300k blocklist lines.
With a ~2k-line allowlist, the processing time to remove allowlist entries was approx 47 seconds (05:17:17 to 05:18:04 in the log below). Expected entries were also removed correctly:
Thu May 18 05:16:40 2023 user.notice adblock-lean: Stopped adblock-lean.
Thu May 18 05:16:43 2023 user.notice adblock-lean: Started adblock-lean.
Thu May 18 05:16:43 2023 user.notice adblock-lean: Downloading new blocklist file.
Thu May 18 05:16:57 2023 user.notice adblock-lean: Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Thu May 18 05:17:01 2023 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Thu May 18 05:17:01 2023 user.notice adblock-lean: Removing duplicates from downloaded blocklist file part(s).
Thu May 18 05:17:17 2023 user.notice adblock-lean: Found allowlist with 2066 lines.
Thu May 18 05:17:17 2023 user.notice adblock-lean: Modifying blocklist based on allowlist.
Thu May 18 05:18:04 2023 user.notice adblock-lean: Blocklist modification based on allowlist complete.
Thu May 18 05:18:04 2023 user.notice adblock-lean: Checking new blocklist file.
Thu May 18 05:18:31 2023 user.notice adblock-lean: New blocklist file check passed.
Thu May 18 05:18:31 2023 user.notice adblock-lean: Restarting dnsmasq.
Thu May 18 05:18:51 2023 user.notice adblock-lean: Checking dnsmasq instance.
Thu May 18 05:18:51 2023 user.notice adblock-lean: The dnsmasq check passed with new blocklist file.
Thu May 18 05:18:51 2023 user.notice adblock-lean: New blocklist installed with good line count: 677830.
With an allowlist of 20 lines, allowlist removal took 54 seconds:
Thu May 18 05:41:16 2023 user.notice adblock-lean: Downloading new blocklist file.
Thu May 18 05:41:31 2023 user.notice adblock-lean: Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Thu May 18 05:41:34 2023 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Thu May 18 05:41:34 2023 user.notice adblock-lean: Removing duplicates from downloaded blocklist file part(s).
Thu May 18 05:41:49 2023 user.notice adblock-lean: Found allowlist with 19 lines.
Thu May 18 05:41:49 2023 user.notice adblock-lean: Modifying blocklist based on allowlist.
Thu May 18 05:42:43 2023 user.notice adblock-lean: Blocklist modification based on allowlist complete.
Thu May 18 05:42:43 2023 user.notice adblock-lean: Checking new blocklist file.
Thu May 18 05:43:11 2023 user.notice adblock-lean: New blocklist file check passed.
Thu May 18 05:43:11 2023 user.notice adblock-lean: Restarting dnsmasq.
Thu May 18 05:43:31 2023 user.notice adblock-lean: Checking dnsmasq instance.
Thu May 18 05:43:31 2023 user.notice adblock-lean: The dnsmasq check passed with new blocklist file.
Thu May 18 05:43:31 2023 user.notice adblock-lean: New blocklist installed with good line count: 681012.
No allowlist:
Thu May 18 05:34:33 2023 user.notice adblock-lean: Downloading new blocklist file.
Thu May 18 05:34:49 2023 user.notice adblock-lean: Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Thu May 18 05:34:52 2023 user.notice adblock-lean: Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Thu May 18 05:34:52 2023 user.notice adblock-lean: Removing duplicates from downloaded blocklist file part(s).
Thu May 18 05:35:08 2023 user.notice adblock-lean: No allowlist identified.
Thu May 18 05:35:08 2023 user.notice adblock-lean: Checking new blocklist file.
Thu May 18 05:35:36 2023 user.notice adblock-lean: New blocklist file check passed.
Thu May 18 05:35:36 2023 user.notice adblock-lean: Restarting dnsmasq.
Thu May 18 05:35:56 2023 user.notice adblock-lean: Checking dnsmasq instance.
Thu May 18 05:35:56 2023 user.notice adblock-lean: The dnsmasq check passed with new blocklist file.
Thu May 18 05:35:56 2023 user.notice adblock-lean: New blocklist installed with good line count: 681046.
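The allowlist timings quoted with these logs were read off the timestamps by hand. As a small sketch, the same subtraction can be done with awk. It uses the two allowlist lines from the ~2k-entry run as sample input and assumes the log format shown above, with both lines falling on the same day. On a router the input would come from the system log rather than a variable.

```shell
#!/bin/sh
# Two lines copied from the ~2k allowlist run above.
log='Thu May 18 05:17:17 2023 user.notice adblock-lean: Modifying blocklist based on allowlist.
Thu May 18 05:18:04 2023 user.notice adblock-lean: Blocklist modification based on allowlist complete.'

elapsed=$(printf '%s\n' "$log" | awk '
    # Convert HH:MM:SS (field 4 of each log line) to seconds since midnight.
    function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Modifying blocklist based on allowlist/ { start = secs($4) }
    /Blocklist modification based on allowlist complete/ { print secs($4) - start }
')

echo "Allowlist step took ${elapsed} seconds"
```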
Suggest changing this line:
# Clean whitespace
sed -i '\~^\s*$~d;s/^[ \t]*//;s/[ \t]*$//' /tmp/blocklist
to this:
# Clean whitespace and format all entries as local=/.../
sed -i -e '\~^\s*$~d;s/^[ \t]*//;s/[ \t]*$//;s/^address/local/;s/^server/local/;s/#$//' /tmp/blocklist
Reason: oisd uses the local=/.../ syntax, while Hagezi uses address=/.../#. The awk duplicate line remover won't pick up duplicates across different syntaxes, so many duplicates currently remain.
OK will do. Seems like the allowlist code is working in general though, which is good. By the way what's the best default location? Should it actually be /root/adblock-lean/allowlist? Since otherwise it's perhaps not clear what the file actually relates to.
The current order (duplicates first, whitespace second) won't remove duplicates written in different syntax, e.g. local=/.../ (oisd syntax) vs address=/.../# (Hagezi syntax).
Which is why this line still needs to remove whitespace, but also needs to convert all syntax to local=/.../
# Clean whitespace and format all entries as local=/.../
sed -i -e '\~^\s*$~d;s/^[ \t]*//;s/[ \t]*$//;s/^address/local/;s/^server/local/;s/#$//' /tmp/blocklist
and then remove duplicates afterwards.
There will be a big memory saving once the duplicates between oisd and Hagezi are removed.
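A small demonstration of why the order matters, using a made-up sample rather than the real lists. The sed expression is the one proposed above. The `awk '!seen[$0]++'` deduplication is a common idiom and only an assumption about what the script's duplicate remover does.

```shell
#!/bin/sh
# Hypothetical sample: one domain appears in both oisd style and Hagezi style.
workdir=$(mktemp -d)
sample="$workdir/blocklist"

printf '%s\n' \
    'local=/ads.example.com/' \
    'address=/ads.example.com/#' \
    '  local=/tracker.example.net/  ' \
    '' \
    'server=/metrics.example.org/' > "$sample"

# Deduplicating the raw file: no two lines are textually identical,
# so every non-empty line survives.
dedup_only=$(awk '!seen[$0]++' "$sample" | grep -c .)

# Normalising first (strip blank lines and whitespace, rewrite
# address=/server= to local=, drop the trailing #), then deduplicating.
sed -i -e '\~^\s*$~d;s/^[ \t]*//;s/[ \t]*$//;s/^address/local/;s/^server/local/;s/#$//' "$sample"
awk '!seen[$0]++' "$sample" > "$sample.dedup"
normalised=$(grep -c . "$sample.dedup")

echo "dedup only: ${dedup_only} lines, normalise then dedup: ${normalised} lines"
cat "$sample.dedup"
```

In the sample, the two ads.example.com entries only collapse into one once both are written as local=/.../.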
Started adblock-lean.
Downloading new blocklist file.
Download of new blocklist file part from: https://big.oisd.nl/dnsmasq2 suceeded.
Download of new blocklist file part from: https://raw.githubusercontent.com/hagezi/dns-blocklists/main/dnsmasq/pro.txt suceeded.
Processing and checking new blocklist file formed from downloaded blocklist file part(s).
Removing duplicates from downloaded blocklist file.
Duplicates removed.
No allowlist identified.
Checking for any rogue elements.
New blocklist file check passed.
Restarting dnsmasq.
Checking dnsmasq instance.
The dnsmasq check passed with new blocklist file.
New blocklist installed with good line count: 405222.
Please let me know how you get on with testing and any further fixes or improvements you can think of.
Looking good! Duplicates are being removed properly now: you are down to 405k entries, from 680k beforehand. Well worth it for the memory saving, processing speed etc. I'll get through this last workday of the week, then do some test runs and report back over the weekend.
I've experimented with awk vs sed on relatively powerful routers (Marvell Armada and Intel Atom) and found there was enough performance difference on very large lists to use sed for allow-lists. Just my 2 cents.
PS. With the level of complexity you've added/are adding, maybe you'd consider contributing to simple-adblock instead (rather than duplicating its features here). I'd especially welcome the dnsmasq health checking.
I think there's room for adblock-lean in that adblock-lean is simply a service script with zero dependencies on newer OpenWrt versions. Just a couple of configuration lines. I like that about adblock-lean. Hopefully not so much to go wrong. The goal is set and forget. And that's been achieved for me at least on my router.
And to be fair, the use of the dnsmasq lists was @Wizballs' idea, and as far as I'm aware that wasn't implemented yet in simple-adblock or adblock. So there's also a sense of simple-adblock duplicating features of adblock-lean now, isn't there? And it seems like soon you might duplicate our dnsmasq health checking!
Otherwise, I think we put together the various processing lines (well almost exclusively @Wizballs) rather independently without too much regard for the implementations in simple-adblock and adblock.
Hmmm ok, I couldn't find a good sed option to perform this (so far anyway, I'm going to revisit). Yes, agreed, sed is generally faster in like-for-like operations. It also has the added bonus of in-place editing, which OpenWrt's awk lacks.
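One possible shape for a sed-based allowlist, offered purely as a sketch: turn each allowlist entry into a sed delete command, then apply the generated script in a single pass. The paths, the one-domain-per-line allowlist format, and the choice to release subdomains as well are assumptions. Whether this actually beats awk on a large list would need measuring on the router, since every blocklist line is tested against every generated pattern.

```shell
#!/bin/sh
# Hypothetical sample files for this sketch only.
workdir=$(mktemp -d)
printf '%s\n' \
    'local=/ads.example.com/' \
    'local=/tracker.test/' \
    'local=/cdn.tracker.test/' \
    'local=/nottracker.test/' > "$workdir/blocklist"
printf '%s\n' 'tracker.test' > "$workdir/allowlist"

# Build a sed script from the allowlist:
#   1. drop empty lines
#   2. escape the dots in each domain
#   3. wrap the domain in a delete command that matches local=/domain/
#      with an optional "anything." prefix for subdomains
sed -e '/^$/d' \
    -e 's/\./\\./g' \
    -e 's|.*|/^local=\\/\\(.*\\.\\)\\{0,1\\}&\\/$/d|' \
    "$workdir/allowlist" > "$workdir/allow.sed"

# Apply every generated delete command in one pass over the blocklist.
result=$(sed -f "$workdir/allow.sed" "$workdir/blocklist")
printf '%s\n' "$result"
```

With the sample data, tracker.test and cdn.tracker.test are removed, while nottracker.test stays because the prefix has to end in a dot.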
I look at it a bit differently. adblock-lean is less than 300 lines of code, so I find it quite easy to follow the script path. Even so, it has plenty of checks and functions. What keeps it somewhat 'lean' is that it only accepts native dnsmasq files rather than all the other available formats.
I have had a look over simple-adblock and honestly, why don't some of the best bits from lean get added into simple (and vice versa)? Especially the multiple dnsmasq file logic etc. But yeah, definitely room for both. I think both @Lynx and I are pretty happy with our little effort. Also happy to help out simple where possible, it's all one community here.
Yes seconded. And really, we are just a couple of cheeky upstarts ourselves here right?
And at the same time, @stangri has been pretty supportive and helpful in respect of our little endeavour. Yes, it's a great little community, and I'm eager to collaborate and share ideas too.
check_dnsmasq()
{
    log_msg "Checking dnsmasq instance."

    # Fail if no dnsmasq process is running at all
    if ! pgrep -x dnsmasq > /dev/null 2>&1
    then
        log_msg "No instance of dnsmasq detected with new blocklist."
        return 1
    fi

    # Fail if a well-known domain now resolves to 0.0.0.0,
    # i.e. the new blocklist is blocking something it should not
    for domain in google.com amazon.com microsoft.com
    do
        if nslookup "${domain}" | grep -A1 ^Name | grep -q '^Address: *0\.0\.0\.0$'
        then
            log_msg "Lookup of '${domain}' resulted in 0.0.0.0 with new blocklist"
            return 1
        fi
    done

    return 0
}
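To see what the lookup test in check_dnsmasq is matching, the same grep pipeline can be run against canned text instead of a live nslookup. The two sample outputs below are hand-written approximations of nslookup output, not captured from a router, and `is_blackholed` is a helper name made up for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: reads nslookup-style text on stdin and succeeds
# if the Address line directly after a Name line is exactly 0.0.0.0.
is_blackholed()
{
    grep -A1 '^Name' | grep -q '^Address: *0\.0\.0\.0$'
}

# Hand-written sample of a normal answer (192.0.2.10 is a documentation address).
healthy='Server:    127.0.0.1
Address:   127.0.0.1:53

Name:      example.com
Address: 192.0.2.10'

# Hand-written sample of an answer for a domain the blocklist has blackholed.
blocked='Server:    127.0.0.1
Address:   127.0.0.1:53

Name:      example.com
Address: 0.0.0.0'

if printf '%s\n' "$healthy" | is_blackholed; then healthy_result=blocked; else healthy_result=ok; fi
if printf '%s\n' "$blocked" | is_blackholed; then blocked_result=blocked; else blocked_result=ok; fi

echo "healthy sample: ${healthy_result}"
echo "blocked sample: ${blocked_result}"
```

Only the Address line that follows a Name line is inspected, so the resolver's own Address line at the top never triggers the check.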
@Lynx All running great. Ran several tests, including one with a zero-length allowlist file.
Sent a GitHub pull request for a couple of lines (testing branch). It doesn't change functionality in the slightest, just shortens a couple of the processing lines.