My final blacklist is around 511K lines and there is no chance it will ever get smaller, so I decided to do some serious performance benchmarking. The router is a GL-B1300 (Atheros IPQ4028, quad-core ARM, 717MHz) with 256MB RAM
and it currently takes ~1 hour to prepare the list.
3.8.14 has already dropped the processing time from ~65 minutes to ~58 minutes (probably due to a simpler regex).
For the tests below I made a copy of the Adblock work directory and put all the files under /tmp/test1.
Then I compared egrep from BusyBox against grep -E from the grep package, redirecting all output to /dev/null to eliminate the impact of the filesystem, etc. The difference is about 31m vs 34s.
time /bin/egrep -vf /tmp/test1/tmp.rem.whitelist /tmp/test1/adb_list.overall | /usr/bin/awk '{print "server=/"$0"/"; }' > /dev/null
real 31m 8.40s
user 30m 49.43s
sys 0m 13.96s
vs
time /usr/bin/grep -Evf /tmp/test1/tmp.rem.whitelist /tmp/test1/adb_list.overall | /usr/bin/awk '{print "server=/"$0"/"; }' > /dev/null
real 0m 33.63s
user 0m 22.75s
sys 0m 1.10s
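For anyone who wants to repeat the comparison on a smaller list, here is a minimal sketch of the same pipeline wrapped in a function. The names filter_to_dnsmasq and GREP_BIN are made up for illustration and are not part of adblock.sh; GREP_BIN is assumed to be a single path or command name with no spaces.

```shell
#!/bin/sh
# Sketch only: filter_to_dnsmasq and GREP_BIN are illustrative names,
# not something adblock.sh provides.
GREP_BIN="${GREP_BIN:-grep}"

# $1 = file with whitelist patterns (one extended regex per line)
# $2 = file with blocklist domains (one per line)
# Prints every domain NOT matched by the whitelist as a dnsmasq "server=" line.
filter_to_dnsmasq() {
    "$GREP_BIN" -Evf "$1" "$2" | awk '{print "server=/"$0"/"; }'
}
```

Setting GREP_BIN=/usr/bin/grep (the grep package, as in the command above) before sourcing this, and then leaving it unset to fall back to the default grep, should show the same kind of gap on a large enough list.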
The next issue is the writes. The script uses ">" and ">>", and those write every single line (one at a time) to flash, which is slow, and even more so if the router uses JFFS2 (a compressed filesystem). Any basic buffering here should improve the throughput tremendously, so I compared "> ./t" against "| tee ./t >/dev/null": 37m vs 3m. BTW, writes to files under /tmp should also improve a lot if the redirects are replaced with tee or tee -a.
time /usr/bin/grep -Evf /tmp/test1/tmp.rem.whitelist /tmp/test1/adb_list.overall | /usr/bin/awk '{print "server=/"$0"/"; }' > ./t
real 37m 3.74s
user 0m 23.00s
sys 0m 1.01s
vs
time /usr/bin/grep -Evf /tmp/test1/tmp.rem.whitelist /tmp/test1/adb_list.overall | /usr/bin/awk '{print "server=/"$0"/"; }' | tee t >/dev/null
real 3m 22.62s
user 0m 23.26s
sys 0m 1.19s
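Before swapping redirects for tee, it is worth confirming that both produce the same file. The sketch below only checks that the output is byte-identical; it does not measure speed, and how much tee helps will likely depend on the tee implementation and the filesystem. The function names are made up for illustration, and it assumes mktemp and cmp are available on the router.

```shell
#!/bin/sh
# Sketch only: write_redirect, write_tee and same_output are illustrative names.

# Write stdin to file $1 with a plain shell redirect.
write_redirect() {
    cat > "$1"
}

# Write stdin to file $1 through tee, discarding tee's copy on stdout.
write_tee() {
    tee "$1" > /dev/null
}

# Return 0 if both methods produce byte-identical files for input file $1.
same_output() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    write_redirect "$tmp_a" < "$1"
    write_tee "$tmp_b" < "$1"
    cmp -s "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

Running same_output on a generated list and checking the exit status is enough to show that the swap does not change the result.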
@dibdot Do you mind reviewing these results? In this particular use case, the processing time could be dropped from ~1 hour down to ~5 minutes for a massive blacklist. What is also important is that the buffering will minimize the writes to flash thus extending its life.
As a quick all-in test, I modified one line in adblock.sh and the time to prepare adb_list.overall immediately dropped from 58m to only 7m (I think another minute or so can be shaved off by using buffering via tee for the temporary files).
diff adblock.sh /usr/bin/adblock.sh
674c674
< egrep -vf "${adb_tmpdir}/tmp.rem.whitelist" "${adb_tmpdir}/${adb_dnsfile}" | eval "${adb_dnsdeny}" >> "${adb_dnsdir}/${adb_dnsfile}"
---
> grep -Evf "${adb_tmpdir}/tmp.rem.whitelist" "${adb_tmpdir}/${adb_dnsfile}" | eval "${adb_dnsdeny}" | tee -a "${adb_dnsdir}/${adb_dnsfile}" >/dev/null
I have not noticed any changes in memory utilization (there is 256MB of RAM on this router).
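The one-line change in the diff assumes the grep package is installed. A minimal sketch of a fallback, so the script still works on a router that only has BusyBox, could look like this. pick_egrep is a hypothetical helper, not something adblock.sh provides, and it assumes that an executable at the given path (for example /usr/bin/grep, as in my tests) is the standalone grep.

```shell
#!/bin/sh
# Sketch only: pick_egrep is an illustrative helper, not part of adblock.sh.

# Print the command to use for extended-regex matching:
# "<path> -E" if $1 is an executable file, otherwise plain "egrep".
pick_egrep() {
    if [ -x "$1" ]; then
        printf '%s -E\n' "$1"
    else
        printf 'egrep\n'
    fi
}
```

For example, adb_grep=$(pick_egrep /usr/bin/grep) followed by an unquoted $adb_grep -vf ... in the pipeline would use the grep package when present and BusyBox egrep otherwise.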
UPDATE: I did a few more tests by installing the coreutils-tee, gzip, and tar packages one at a time, and they made no difference performance-wise. So only the two changes below provide a major performance boost:
- grep & grep -E (from the grep package) instead of grep & egrep (from BusyBox)
- tee & tee -a instead of > & >>