Adblock-lean: set up adblock using dnsmasq blocklist

Wizballs · April 30, 2023, 9:52am

@Lynx I realise you mentioned you're travelling at the moment, but I have to write this down here before I forget it.

Awk definitely seems like the correct tool for removing duplicates. Tested on an R7800, 623,000 lines took approx. 11 seconds. IMO that is in keeping with the lean philosophy.

Keep only unique lines from multiple files, using OpenWrt's native awk, and write them back to separate files matching the original inputs. The output has to go to separate files because in-place editing isn't available with OpenWrt's awk (it is only available in the gawk package), so the original files need to be removed afterwards.

awk 'FNR == 1 { out = FILENAME; sub(/.*\//, "", out); out = "/tmp/dnsmasq.d/" out } !seen[$0]++ { print > out }' /tmp/blocklist1 /tmp/blocklist2
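For anyone wanting the whole step in one go (dedupe, then remove the originals), here is a minimal sketch wrapped in a shell function. The function name `dedupe_lists` and the `dir` variable are my own for illustration, not anything from adblock-lean. It strips the directory part of FILENAME so each output lands directly in the target directory, and it assumes the input files live outside that directory and have distinct base names.

```shell
#!/bin/sh
# Hypothetical helper: dedupe_lists OUT_DIR FILE...
# Writes the first occurrence of each line to OUT_DIR/<basename of the file it
# came from>, then removes the input files only if awk exited successfully.
dedupe_lists() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    awk -v dir="$out_dir" '
        # At the start of each input file, work out its output path once.
        FNR == 1 { out = FILENAME; sub(/.*\//, "", out); out = dir "/" out }
        # Print a line only the first time it is seen across all inputs.
        !seen[$0]++ { print > out }
    ' "$@" && rm -f "$@"
}

# Example call using the paths from this post:
# dedupe_lists /tmp/dnsmasq.d /tmp/blocklist1 /tmp/blocklist2
```

Note that a list whose lines are all duplicates of earlier lists produces no output file at all, which is probably what you want here anyway.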

This method does not consolidate subdomains under their top-level domains, as @dave14305 mentioned. However, as a test I de-duped Hagezi multi pro & oisd big, and only 16k lines were left in the OISD file. Spot-checking a bunch of these at random, I didn't find any examples of duplicated top-level domains anyway. Removing exact duplicate lines is very fast to process. Even if there were a few thousand extra lines due to such domains, the redundancy / increased resource usage would be relatively small.
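If anyone wants to check this on their own lists rather than spot-checking by hand, something like the sketch below could count how many entries are already covered by a listed parent domain. This is only a rough check, not something I'd put in the script. The function name `count_covered` is made up, and it assumes dnsmasq-style lines such as `local=/example.com/`, where the domain is the second "/"-separated field.

```shell
#!/bin/sh
# Hypothetical helper: count_covered FILE...
# Assumes dnsmasq-style lines like local=/example.com/ (domain = 2nd "/" field).
# Prints how many listed domains have a parent domain that is also listed.
count_covered() {
    awk -F/ '
        # Collect every domain from all input files.
        NF >= 2 && $2 != "" { listed[$2] = 1 }
        END {
            n = 0
            for (d in listed) {
                parent = d
                # Strip one leading label at a time and look for a listed parent.
                while (sub(/^[^.]*\./, "", parent)) {
                    if (parent in listed) { n++; break }
                }
            }
            print n
        }
    ' "$@"
}

# Example call (paths as used earlier in this post):
# count_covered /tmp/blocklist1 /tmp/blocklist2
```

Like the dedupe itself, this holds every domain in memory, so it's a one-off test rather than something to run on every update.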

I think we should stick to removing exact line duplicates, especially if users are a little discerning when selecting multiple lists. This de-duplication method can always be revisited in future if the need ever arises, but I really can't see that it will need to change.