I think it’s worth revisiting why the allowlist processing still needs to iterate through each part of the fully qualified domain name. Since you also add the server=/example.com/# syntax, allowlist processing can be “lazier” when comparing against the blocklist.
For any given allowed domain, there are four scenarios:
1. The allowlist entry exactly matches a blocklist entry.
2. The allowlist entry is not found in the blocklist at all.
3. The allowlist entry matches multiple entries in the blocklist (the allowlist entry is a higher-level domain).
4. The allowlist entry is a subdomain of a blocked domain.
In my experience, #2 and #4 are covered by the server=/example.com/# syntax. #1 is solvable by that syntax or by explicit removal from the blocklist.
#3 is generally solvable by searching the blocklist for the allowed domain with a pattern like "(^|/|\.)example\.com(/|$)", which matches the exact domain or any blocked subdomains, depending on the file format.
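
To make the "lazy" idea concrete, here is a rough Python sketch of it (not the script from this thread). It assumes the blocklist is a list of lines in address=/domain/0.0.0.0 form; the function names allow_pattern and lazy_allow and the sample domains are made up for illustration. Each allowed domain gets a server=/domain/# line, and any blocklist line matching the pattern above is dropped, with no walk through the individual labels of the name.

```python
import re

def allow_pattern(domain):
    # Escape the dots so "example.com" cannot match "exampleXcom".
    # The leading group needs ^, "/" or "." before the domain; the trailing
    # group needs "/" or end of line after it.
    return re.compile(r"(^|/|\.)" + re.escape(domain) + r"(/|$)")

def lazy_allow(blocklist_lines, allowed_domains):
    """Drop blocklist lines covered by the allowlist and build override lines."""
    patterns = [allow_pattern(d) for d in allowed_domains]
    # Scenarios 1 and 3: remove exact matches and blocked subdomains.
    kept = [line for line in blocklist_lines
            if not any(p.search(line) for p in patterns)]
    # Scenarios 2 and 4: left to the server=/domain/# lines.
    overrides = [f"server=/{d}/#" for d in allowed_domains]
    return kept, overrides

# Assumed sample data in address=/domain/0.0.0.0 format.
blocklist = [
    "address=/example.com/0.0.0.0",
    "address=/ads.example.com/0.0.0.0",
    "address=/notexample.com/0.0.0.0",
    "address=/example.com.evil.test/0.0.0.0",
    "address=/tracker.test/0.0.0.0",
]
kept, overrides = lazy_allow(blocklist, ["example.com", "cdn.tracker.test"])
print(kept)
print(overrides)
```

With this sample, the example.com and ads.example.com lines are removed, while notexample.com and example.com.evil.test stay because the pattern is anchored on both sides. A hosts-style file ("0.0.0.0 example.com") would need a space added to the leading group, which is the "depending on the file format" caveat.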
Just something to think about after reading @antonk's question in the other awk thread.