Oops, this was copy-pasted from my test. Should read as use_allowlist==1.
Basically this is a shortened if statement: if use_allowlist equals 1, then print the current line, skip the rest of the commands, and go to the next line.
Edit: Bruh, you also wanted it to be NOT equal to 1. So use_allowlist!=1
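For reference, the pattern-action form being described looks roughly like this (the variable name comes from the thread; the second action is a placeholder I made up to show what "skip the rest of the commands" skips):

```shell
# In awk, 'CONDITION { ACTION }' runs ACTION only when CONDITION is true.
# Here: when use_allowlist != 1, print the line, and 'next' skips all
# later rules, jumping straight to the next input line.
printf 'one\ntwo\n' |
awk -v use_allowlist=0 'use_allowlist!=1 {print; next} {print "would-filter:", $0}'
```

With use_allowlist=0 the lines pass straight through; with use_allowlist=1 only the second (placeholder) action runs.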
Shell code is not pretty in general, but I don't see anything particularly ugly in this one. Think about performance: your native shell code performs one if evaluation and then prints the entire file, while the awk command performs as many if evaluations as there are lines in the file, reading and printing the file line by line. Now that is ugly, from my perspective.
And yeah, I don't see any way to make it look more elegant or to avoid using cat.
Edit:
Looking closer at the awk command, I'm wondering if it can be improved for better performance. It looks like the idea is to eliminate duplicate IPv4 addresses. What I don't understand is why it needs to split each address into octets and then run a for loop making four comparisons against octets [1,2,3,4] for each address, rather than comparing the whole address at once.
Oh I incorrectly assumed that it was comparing ip addresses rather than URLs. I'd like to further understand this command though. Could you perhaps post sample input files /tmp/blocklist.${blocklist_id} and /tmp/allowlist?
I'll drop in some short examples here if that also helps. Keep in mind we are often running a blocklist of maybe 1,000,000 entries and an allowlist of 1,000 entries.
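Here are the short samples I had in mind. Everything below is made up for illustration (domains invented, and I'm assuming blocklist_id=1); the formats follow what's been discussed in the thread:

```shell
# Hypothetical sample inputs: allowlist is bare domains, blocklist is
# dnsmasq local=/.../ entries. Real lists are orders of magnitude larger.
cat > /tmp/allowlist <<'EOF'
example.com
allowed.org
EOF

cat > /tmp/blocklist.1 <<'EOF'
local=/ads.example.net/
local=/example.com/
local=/tracker.allowed.org/
local=/badsite.io/
EOF
```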
I gotta say, I tried to optimize the awk command, but I found that I can't improve on the existing code, at least when running on the sample files you posted.
I read over this and thought, ok, we can wrap all the allowlist entries in both local=/xxx/ and also .xxx/ forms (so essentially doubling the allowlist entries, but who cares if it's faster overall, right?). And it would achieve the same result as the current awk.
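Concretely, the doubling I mean would look something like this (a sketch only; the file name and the exact second form are assumptions on my part):

```shell
# Emit each allowlist entry in the two forms a blocklist line can contain:
#   local=/example.com/   -- an exact dnsmasq entry
#   .example.com/         -- a subdomain suffix
printf 'example.com\n' > /tmp/allowlist.demo   # tiny stand-in for the real list
awk '{print "local=/" $0 "/"; print "." $0 "/"}' /tmp/allowlist.demo
```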
Before doubling the entries, I tried it as is, in the current list formats:
1,500 allowlist entries in example.com format
1.4 million blocklist entries in local=/example.com/ format
current awk: 1m 20s
grep -v -i -f allowlist blocklist > outputfile
I cancelled the grep run at 27 mins, and it was only about 15% through, judging by the output file size.
Why?
awk is using a hash table (an associative array) to compare, so each line is a single lookup.... much much faster.
grep, I believe, is just comparing each line against every pattern one by one (no hashing).
So yeah, feel free to speed test this, but my router is still cooling down.
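To make the hashing point concrete, this is the standard awk idiom I have in mind. It's an exact-match sketch only, not the thread's actual command (which also has to handle the subdomain forms):

```shell
# NR==FNR is true only while reading the first file (the allowlist):
# store each line as a key in the associative array 'allow'.
# For the second file, print only lines whose key is absent --
# one hash lookup per line instead of a scan over every pattern.
printf 'local=/example.com/\n' > /tmp/allow.demo
printf 'local=/example.com/\nlocal=/badsite.io/\n' > /tmp/block.demo
awk 'NR==FNR {allow[$0]; next} !($0 in allow)' /tmp/allow.demo /tmp/block.demo
```

This is why the runtime scales roughly with the size of the two files rather than with their product, which matters at 1.4 million blocklist lines.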
I'm just testing on my computer. Less scientific but at least I don't have to wait so long.
I was thinking that integrating the sed command into awk might speed things up, but the opposite is true: sed is just much faster. So instead, here is a slightly shortened sed command which, at least on my computer, speeds things up by about 20% with the sample inputs.
sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/; s/\([^#]\)#.*$/\1/g; s/\(#$\|[ \t]*\)//; /^\(#\|$\)/d; s/\(^address=\|^server=\)/local=/; $a\'
I don't know what some parts of this command do, so I didn't touch those parts. For instance, what does this do: s/\([^#]\)#.*$/\1/g ? I'm confused by the \1 here. And what does the $a\ do?
sed being 20% faster than awk is pretty much what we found in most cases, and is why we use sed wherever possible. Ok, cool, let me try your sed version on the router - always good to test final performance on the SoC itself.
s/\([^#]\)#.*$/\1/g
This strips trailing comments, e.g. example.com # trailing comment loses everything from the # onward. The \([^#]\) captures the character just before the #, and \1 writes it back, so only the # and the text after it are removed; whole-line comments are handled by the /^\(#\|$\)/d delete instead.
$a\
This (in GNU sed) makes sure the last line ends with a newline, appending one if it's missing - helpful when combining multiple files so lists don't run together.
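A quick demo of both answers (GNU sed assumed):

```shell
# \([^#]\) matches and captures the character just before '#'; \1 writes
# it back, so only the '#' and everything after it is dropped. A line that
# is entirely a comment has no non-# character before its '#', so it is
# left alone here and deleted later by /^\(#\|$\)/d.
printf 'example.com # trailing comment\n' | sed 's/\([^#]\)#.*$/\1/g'

# '$a\' with no text: GNU sed ensures the last line ends with a newline.
printf 'last-line-no-newline' | sed '$a\'
```

Note the first command keeps the captured space before the '#', so the output still has a trailing space at this stage.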