Defining a function within a function limits the scope of the inner function to the namespace of the outer function. Simply put, the inner function is only available within the outer function. I've used this in multiple places in adblock-lean, mainly to make it visually clear that the inner function is a subroutine of the outer function and has no other applications within the larger program. I think this makes the code more readable.

In this particular instance, it also has a second purpose. Between lines 1602 and 1617, we are creating a subshell, and within that subshell we are sourcing the downloaded newer version. This replaces all global function definitions with the function definitions from the sourced newer version (doing this within a subshell ensures that the replacement is temporary). Since we don't have information about the functions in that future version or about how they changed, but we want certainty that our code executes correctly, we make sure that the functions we rely on are local, so they won't be replaced.
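A minimal sketch of the pattern (hypothetical names, not adblock-lean's actual code; strictly speaking, POSIX sh functions are not lexically scoped - a nested definition simply takes effect each time the enclosing function runs, which is what guarantees our own body is the one in effect):

check_version() {
    # inner helper: (re)defined each time check_version() runs
    report() { printf 'report(): %s\n' "${1}"; }

    (
        # subshell: any function redefinitions picked up here die with it.
        # Sourcing only executes the file's top-level code, so it replaces
        # top-level functions but not a nested helper like report() -
        # provided the new version defines no top-level function of that name.
        . ./adblock-lean.new
        report "still using the helper defined above"
    )
}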

Hope this makes sense.

2 Likes

Hello,

Tried the usual battery of tests on this one, and everything is behaving correctly. The PR is good to go from my testing!

  • Multiple combinations of local/downloaded blocklist/allowlist
  • Rogue elements handled correctly
  • Max single/total file size handled correctly
  • Min single/total line counts handled correctly

That is what I was thinking. It might be a bit ambitious, I guess, due to the variability of available RAM. And there doesn't seem to be a reliable way to determine file size before download using OpenWrt's built-in tools.

2 Likes

Great work by you, as always.
Any difference in performance?

1 Like

I suppose we could implement the memory use restriction via the existing size-limiting mechanism, i.e. head -c [X]k. One possibility would be something like this (simplified):

F=$(get_free_memory)
X=$(calculate_max_list_size_kb "${F}")
total_parts_size=0
for part in ${parts}
do
    # head -c caps how much of each part can ever enter memory;
    # ${part_file} stands for wherever this part's output lands
    process_part "${part}" | head -c "${X}k" > "${part_file}"
    this_part_size=$(( $(wc -c < "${part_file}") / 1024 ))

    total_parts_size=$((total_parts_size + this_part_size))
    # a part hitting the cap means it was truncated - treat as an error
    if [ "${this_part_size}" -ge "${X}" ] || [ "${total_parts_size}" -ge "${X}" ]
    then
        error_out
        exit 1
    fi
done

The other challenges of this approach still apply, but this way we don't need advance knowledge of the remote file sizes.

All good, just trying to help out where I can :+1:. Performance seems pretty much identical, excluding the removal of the 'sed delete empty lines' step after sort.

1 Like

That's a pretty good approach given the restrictions. Would it still essentially download and process each file before piping it through | head?
But maybe it is just best to keep memory management simple and leave it up to the user if that is the case. I.e. something as simple as an over-large downloaded file would be enough to OOM in low-RAM circumstances.

We already have the head command in the part's processing pipe, so the above idea would utilize it. I believe it comes right after the download, if that matters. In the example sketched above, I just wanted to illustrate the algorithm, not necessarily the order in which the pipe processes data.

I'm not sure I'm following. These two sentences seem to contradict each other :slight_smile:
It's not like I feel strongly about implementing this idea, but I'm not sure there is a good reason to dismiss it either.

1 Like

Let me put it this way: with enough safety margin, we can essentially prevent OOMs with this sort of algorithm. The question is: how big does the safety margin need to be? The size of the safety margin is a function of how precisely we can estimate the maximum blocklist size which will not cause OOM, given the known free memory.
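For illustration only (a rough sketch - the margin and overhead figures here are invented placeholders, not measured values):

free_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
margin_pct=50         # the safety margin in question - assumed value
overhead_factor=3     # assumed peak memory use per KB of final list
max_list_kb=$(( free_kb * (100 - margin_pct) / 100 / overhead_factor ))

The smaller the uncertainty in overhead_factor, the smaller margin_pct can safely be.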

1 Like

I'm not strongly inclined to go that way either. It was really just an idea for discussion. It would change the way adblock-lean is configured somewhat, so it would definitely need a 'majority vote' before even considering this anyway. Happy to park this for now though; it would only be low priority anyway.

I should have separated those two sentences a little better lol. What I was trying to understand was: if a file is downloaded with uclient-fetch, then piped through head, is the whole file downloaded into RAM first, and then piped through head?

E.g. if I try to download a 600 MB blocklist (which would definitely OOM my router), is it possible to head only, say, the first 100 MB of that, without downloading the remaining 500 MB into RAM?

Only for discussion's sake, but this may have to be user-configurable anyway. Even with a buffer, there is no point in pushing RAM usage to the absolute maximum. The user may want to keep some extra RAM available for regular router duties and any other installed packages etc.

Yes, this is exactly what head -c [X]k does. It stops printing data after processing X KB. So if X=1024 then it will stop after 1 MB and no more data will get into memory.
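To make that concrete (a sketch - the URL and the 100 MB cap are arbitrary examples): once head has output its quota it exits, the next write by uclient-fetch hits a closed pipe (SIGPIPE), and the rest of the file is never downloaded.

uclient-fetch -O - 'https://example.com/huge-blocklist.txt' | head -c 102400k > /tmp/blocklist.part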

1 Like

There could be an option of this sort but this is a separate thing from the safety margin I'm talking about.

I'll give an example. Let's say that you need to fit a table into a corner of your room, and that corner has exactly 1 meter by 1 meter of available space. Let's say that you want to maximize the usable surface of the table within this limitation. So ideally, you would want to buy a table exactly 1m x 1m in size.

However, there is a problem: tables in real life have some deviation in size, and your measuring precision is limited by various factors, for example the precision of your ruler. So in order to avoid the risk of buying a table which would be just slightly too big (and so completely useless), you have to allow for some "safety margin". The size of that margin depends on the actual deviation of the table from its stated size, and on the precision of your ruler.

Now if you are not the final user of this table but it's your responsibility to deliver a fitting table for someone else, does it make sense to ask them about the safety margin? I say no, because the user has no idea about the deviation, or the precision of your tools. It is up to you to calculate and measure, and deliver a product which is guaranteed to fit. A question which you could ask the user would be "how much additional free space in that corner would you like to have?" - but this is unrelated to the safety margin.

I hope this makes sense.

1 Like

So I have a working implementation of the session log. It turned out a bit trickier than I was expecting, because we need to take care to preserve the previous session log until we've verified that we have the lock (a.k.a. the lockfile).

So essentially I had to tie session log writing to having acquired the lock file. This means that for commands which do not need the lock file, no session log is recorded. Also, no log is recorded at the early stages of command initialization, before the lock is acquired.
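Conceptually something like this (a simplified sketch with assumed names and paths, not the actual implementation):

abl_lockfile="/var/run/abl.lock"
session_log="/tmp/abl_session.log"

log_msg() {
    # only record once we own the lock; early-init messages are dropped
    [ -n "${lock_acquired}" ] && printf '%s\n' "${1}" >> "${session_log}"
}

acquire_lock() {
    # naive lockfile check, just to illustrate the ordering
    [ -f "${abl_lockfile}" ] && return 1
    echo "$$" > "${abl_lockfile}" && lock_acquired=1
    # only now is it safe to overwrite the previous session log
    : > "${session_log}"
}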

Specifically with the current implementation, these commands trigger writing session log:
start, boot, stop, pause, resume

These commands do not:
status, gen_config, gen_stats, help, enable, disable, enabled

The restart and reload commands are handled as a composite of stop and start, and the lock is first acquired when the stop command is executed, so that is where the session log starts to get written.

I went a bit further and implemented a second, separate log, exclusive to the update command. So with the current implementation, there will be abl_session.log and abl_update.log. This is in order to prevent the update log from overwriting the session log and vice versa.
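For example (hypothetical variable names - just to illustrate the split):

case "${command}" in
    update) cur_log="${log_dir}/abl_update.log" ;;
    *)      cur_log="${log_dir}/abl_session.log" ;;
esac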

The report_failure() function is now called with the current session log attached to the message (passed via the 1st argument).

Also, the luci_log variable is set when a current log is available (either the session log or the update log). @Ree9

I don't want to torture @Lynx too much, so for now this is on my GitHub:

https://github.com/friendly-bits/adblock-lean/tree/Implement_session_log

Testing and comments are welcome, of course. BTW, should the session log record date and time?

And @Lynx let me know when you are ready to accept more PR's - no rush :slight_smile:

Edit: forgot to mention, the print_log command is also implemented, which prints the most recent session log (not the update log).

Edit2: bruh, I created the branch on GitHub but forgot to push the commit. Now fixed.

2 Likes

Nice idea. So now one can call up the last log from adblock-lean itself, without logger (which could have rotated the relevant log lines out anyway), right?

Should we use /var/log for this?

1 Like

Yes. This has several applications:

  • Make the current session log available to the luci app
  • Make the current session log survive log rotation
  • Make the current session log easy to retrieve separately from previous session logs (which can be utilized by the custom script to get context for the failure messages, or directly by the user)

Perhaps, if it's volatile. Somehow I didn't think of it.

1 Like

Yes. Excellent.

So I think we probably should use /var/log - it is indeed volatile and, I think, where log files are expected to go. We could use log and log.old for the previous log file on rotation, if you think that's helpful.

This reminds me of the log file handling that I put together for cake-autorate with buffered writes, flushing, rotation, etc:

1 Like

Changed to /var/log in current revision.

I'll look into implementing this. Shouldn't be too hard.

Wait, I'm not saying we should, necessarily. Just mentioning the possibility. Is there a benefit in retaining a log of the old run? I'm not sure.

1 Like

I'm not sure either. Probably most of the time this would not be necessary, but it's fairly easy to imagine an occasional scenario where it could be useful. Say the user tried to run the start command and got an error, and then they want to see if there were errors at boot.

1 Like

The current revision implements simple log rotation (for now, 1 previous log is kept). It didn't take much work or much code.
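As a sketch of what one-generation rotation boils down to (filenames assumed from the earlier posts, not necessarily the actual ones):

rotate_log() {
    # keep exactly one previous session log; call before starting a new one
    [ -f /var/log/abl_session.log ] &&
        mv -f /var/log/abl_session.log /var/log/abl_session.log.old
}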

This question is still unanswered:

1 Like