Lynx
81
This seems to work:
and should simplify performance analysis between different runs and versions.
Processing time for blocklist generation and import: 1m:29s.
2 Likes
Lynx
82
Another great improvement it seems. But I’m slightly lost - I wonder if you could sunmarize the changes for me to make since my last commit?
antonk
83
@Lynx I pulled this from a project where I needed to measure time spent performing a command. IIRC precision is +-10ms on OpenWrt. Now I modified this a bit for generic use and haven't tested after modification but I think this should work. Call it this way: measure_time <any command>
measure_time() {
get_uptime() {
read -r uptime_raw dummy < /proc/uptime
curr_time="${uptime_raw%.*}${uptime_raw#*.}0"
}
exec_command="$1"; shift
get_uptime
start_time="$curr_time"
$exec_command "$@"
get_uptime
echo "*** time: $(( curr_time - start_time )) ms"
}
1 Like
Cool no worries.
Master ran in this order:
sed | awk > file > rogue check > gzip
in the test branch tried this to leverage piping straight into gzip
rogue check > cat | tr | sed | awk | gzip
But this would require, although not yet implemented, the rogue check to run more processing (some duplicate commands from the sed before the awk)
So the best overall flow is still:
cat | tr | sed | awk > file > rogue check > gzip
antonk has revised the sed rogue check into grep also. No actual speed benefit here, but some other benefits were pointed out, see what you think
1 Like
Lynx
85
Yes I like the overall concept and use of /proc/uptime here.
I have weaved this in like so:
get_uptime_ms()
{
read -r uptime_ms _ < /proc/uptime
printf "${uptime_ms%.*}${uptime_ms#*.}0"
}
get_elapsed_time_str()
{
# To use, first set initial uptime: initial_uptime_ms=$(get_uptime_ms)
# Then call this function to get elapsed time string at desired intervals, e.g.:
# printf "Elapsed time: $(get_elapsed_time_str)\n"
elapsed_time_s=$(( ($(get_uptime_ms)-${initial_uptime_ms:-$(get_uptime_ms)})/1000 ))
printf '%dm:%ds' $((elapsed_time_s/60)) $((elapsed_time_s%60))
}
2 Likes
Lynx
86
@Wizballs before this we had:
So I got the line number and match I think(?). But our new code just prints out the rogue element I think, right? But perhaps we don't need line number?
I think as long as which list and what rogue element is identified, the line number probably isn't really necessary, but still a nice addition.
grep speed is still the same as sed also. Some other benefits that antonk pointed out, but does add more lines to the code. So could go either way on this one. Leave as is, or change?
antonk
88
If you only care about seconds then you could simplify this a bit:
get_uptime_s()
{
read -r uptime_s _ < /proc/uptime
printf %s "${uptime_s%.*}"
}
get_elapsed_time_str()
{
# To use, first set initial uptime: initial_uptime_s=$(get_uptime_s)
# Then call this function to get elapsed time string at desired intervals, e.g.:
# printf "Elapsed time: $(get_elapsed_time_str)\n"
elapsed_time_s=$(( $(get_uptime_s)-${initial_uptime_s:-$(get_uptime_s)} ))
printf '%dm:%ds' $((elapsed_time_s/60)) $((elapsed_time_s%60))
}
1 Like
Lynx
89
Is this what you mean:
It doesn't seem to improve the performance for me - I think it takes a few seconds longer. Is that expected?
Lynx
90
True, albeit won't working with milliseconds give a slightly more accurate determination of the elapsed seconds?
This is correct again now, thanks for changing it back to this order. Yes this is expected (or should be roughly the same speed). The problem is that moving the rogue check first means that the the rogue check would need additional commands added to it, which added about 20 seconds from my trial. So changing this back is really just avoiding that otherwise required 20 seconds time increase.
1 Like
Lynx
92
Ah, I get it. I think. So the code prior to this commit was not actually doing all the necessary rogue element checking? But now we are?
antonk
93
Hmm perhaps, if you take special care about rounding. Which your code currently doesn't. This command:
$(( ($(get_uptime_ms)-${initial_uptime_ms:-$(get_uptime_ms)})/1000 ))
Always rounds down to nearest second. Which is also what my code does, just throwing away anything after the decimal point. I may be missing something...
Yes correct. Rogue check was working when running first, but it could have easily failed if there was eg whitespace in the raw blocklist. The sanitise is performing remove white space, which then doesn't need to be done again if the rouge check remains after the sanitise.
Lynx
95
@antonk I liked your grep example code. Is there any way to get grep to output / retain the line number of detection? I hacked this like so:
rogue_element=$(sed -nE '\~(^(local|server|address)=/)[[:alnum:]*][[:alnum:]*_.-]+(/$)|^#|^\s*$|^bogus-nxdomain=[0-9.]+$~d;{p;=;q}' /tmp/blocklist.${blocklist_id} | { read match; read line; [[ ! -z "${match}" ]] && echo "${line}: ${match}"; })
antonk
96
I don't quite understand how this works. Also I'm unfamiliar with the = command in sed. Does it output the line number?
Lynx
97
One for @Wizballs - I'm clueless with sed, awk and grep.
antonk
98
Well, anyway, grep can produce line numbers with the option -n, example:
# echo "bruh" | grep -n bruh
1:bruh
1 Like
Lynx
99
Any chance you can elaborate on the duplicated sanitizing aspect? I'm still a bit muddled about this in my mind. Not sure I've totally got this right in the latest commit now.
Lynx
100
In that case I'll force push over with your grep code. Better.