[WIP] smartblock - Large blacklists with minimal overhead

Inspired by Blokada's "Smart List" feature, I built this to have something similar on OpenWrt. It lets you use huge blacklists with minimal run-time overhead by only putting domains that actually get accessed into the "live" list used by dnsmasq. It works by enabling dnsmasq's query logging and running a small "service" that saves the requested domain names for later processing. Once a day, a cron job processes the log, and any domains that match an entry in the configured blacklists are promoted to the live list, so that further DNS requests for them are blocked from then on.
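To illustrate the logging half: dnsmasq's query log lines contain `query[TYPE]` followed by the requested domain. The following is a minimal POSIX sh sketch that pulls the domain out of such lines. It assumes the usual `query[TYPE] name from client` layout; the function name `extract_domains` and the sample lines are illustrative only, not taken from a real router.

```shell
#!/bin/sh
# Sketch: extract queried domains from dnsmasq log-queries lines on stdin.
# Assumes the "query[TYPE] name from client" layout; sample lines are made up.
extract_domains() {
    while IFS= read -r line; do
        case $line in
            *"query["*)
                domain=${line#*query\[*\] }    # drop everything up to "query[X] "
                printf '%s\n' "${domain%% *}"  # keep the first word: the domain
                ;;
        esac
    done
}

printf '%s\n' \
    'Sun Mar  1 12:00:00 2020 daemon.info dnsmasq[1234]: query[A] ads.example.net from 192.168.1.10' \
    'Sun Mar  1 12:00:01 2020 daemon.info dnsmasq[1234]: reply ads.example.net is 1.2.3.4' \
    'Sun Mar  1 12:00:02 2020 daemon.info dnsmasq[1234]: query[AAAA] example.org from 192.168.1.10' |
    extract_domains
# prints:
# ads.example.net
# example.org
```

Matching on the `query[...]` marker rather than a fixed field number keeps the extraction independent of the timestamp width (single-digit days are padded with an extra space).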

The advantages are that it's very efficient in terms of RAM overhead and runtime performance (lookup time), since only what is really needed is actually blocked. The disadvantage is that you might have to put up with ads/tracking for up to a day, depending on what you do, e.g. when you visit a new website that uses previously unseen ad domains.

It's not a stand-alone adblock solution; it piggy-backs off (simple-)adblock, which provides the back-end, so to speak. Importing lists and auto-configuration are currently only supported for simple-adblock, since that's what I use, but manual configuration is pretty simple anyway.

It's still WIP and could be improved in some places: regular adblock support, binary search in the processing stage, and using procd for service management (I'll need to check my backups for my dynamic IPv6 firewall updater script, where I've already done that; I'm also going to share it some time later). And of course general cleanup here and there. I'm still working on it, but it's been doing well for the past two days.
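On the processing stage: the script checks each logged domain with its own grep over the blocklist, and its TODO mentions binary search. A different option, sketched here rather than binary search, is to treat the lists as sorted sets and use `comm` for linear merges, so each file is read once per run. This assumes one domain per line and that the blocklist is sorted under the same collation (`LC_ALL=C`); `process_sorted` and the file names are hypothetical, not part of smartblock.

```shell
#!/bin/sh
# Sketch only: set-based log processing with sorted merges (comm) instead of
# one grep over the blocklist per domain. Assumes one domain per line and a
# blocklist sorted under the same collation (LC_ALL=C).
LC_ALL=C
export LC_ALL

# process_sorted LOG SEEN BLOCKLIST (hypothetical helper)
# Prints domains from LOG that were never checked before and are on
# BLOCKLIST, then records all new domains from LOG in SEEN.
process_sorted() {
    log=$1 seen=$2 blocklist=$3
    work=$(mktemp -d) || return 1
    touch "$seen"
    sort -u "$log" > "$work/today"
    comm -23 "$work/today" "$seen" > "$work/new"   # not in SEEN yet
    comm -12 "$work/new" "$blocklist"              # ...and on the blocklist
    sort -u -o "$seen" "$seen" "$work/new"         # SEEN stays sorted
    rm -rf "$work"
}

# Example run on scratch files
dir=$(mktemp -d)
printf '%s\n' ads.example.net tracker.example.com > "$dir/blocklist.txt"
printf '%s\n' example.org ads.example.net example.org > "$dir/log.txt"
process_sorted "$dir/log.txt" "$dir/seen.txt" "$dir/blocklist.txt"   # prints ads.example.net
process_sorted "$dir/log.txt" "$dir/seen.txt" "$dir/blocklist.txt"   # prints nothing: all seen
rm -rf "$dir"
```

In smartblock terms, the printed lines are what would be appended to the smartlist. Memory use stays small because `comm` streams both inputs instead of loading the blocklist.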

Installation is simple. Put this script:

#!/bin/sh

if [ -e /lib/functions.sh ]; then
    . /lib/functions.sh

    config_load smartblock
    config_get BASE_PATH general path "/var/smartblock/"
    config_get BLOCKLIST general blocklist "blocklist.txt"
    config_get SEENLIST general seenlist "seen.txt"
    config_get LOGLIST general loglist "log.txt"
    config_get SMARTLIST general smartlist "smartlist.txt"
    config_get SMARTLIST_PATH general smartlist_path "/www/smartlist.txt"
    config_get SMARTLIST_URL general smartlist_url "127.0.0.1/smartlist.txt"
fi

if [ ! -e /etc/config/smartblock ] && [ "$1" != "-i" ] && [ "$1" != "-d" ]; then
    echo No config file found at /etc/config/smartblock.
    echo Run this script with arg: -i to install!
    exit 1
fi

# For debugging
if [ "$1" = "-d" ]; then
    shift
    BASE_PATH="./"
    BLOCKLIST="blocklist.txt"
    SMARTLIST_PATH="./smartlist.txt"
    HOSTS_SOURCES="https://adaway.org/hosts.txt"
fi

## Blocklist updating
# By default, runs once a week via cronjob and updates the local list
# of blocked domains.
update_domains() {
    [ -n "$1" ] || return 0    # skip the empty placeholder entry
    echo "Fetching domain source: $1"
    rm -f tmplist.txt
    uclient-fetch -O tmplist.txt "$1" || return 1
    # Drop empty lines and comments
    grep -v -e '^[[:space:]]*$' -e '#' tmplist.txt >> "${BASE_PATH}/${BLOCKLIST}.tmp"
}

update_hosts() {
    [ -n "$1" ] || return 0
    echo "Fetching hosts source: $1"
    rm -f tmplist.txt
    uclient-fetch -O tmplist.txt "$1" || return 1
    # Strip comments and keep the host name (2nd column); awk splits on
    # both spaces and tabs, unlike cut -d " "
    awk '{ sub(/#.*/, "") } NF >= 2 { print $2 }' tmplist.txt >> "${BASE_PATH}/${BLOCKLIST}.tmp"
}

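update_lists() {
    mkdir -p "${BASE_PATH}"
    cd "${BASE_PATH}" || return 1
    # Start from an empty temporary list (also discards leftovers from an
    # interrupted run)
    : > "${BASE_PATH}/${BLOCKLIST}.tmp"

    config_list_foreach domains url update_domains

    config_list_foreach hosts url update_hosts

    rm -f tmplist.txt

    sort -u "${BASE_PATH}/${BLOCKLIST}.tmp" > "${BASE_PATH}/${BLOCKLIST}"
    rm -f "${BASE_PATH}/${BLOCKLIST}.tmp"
}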

# ## Logged domain name processing
#
# Runs once a day via cronjob and checks if the logged domains are
# part of any blacklists. A list of previously visited domains is
# maintained as an optimization, so that the potentially very large
# blacklist can be skipped when checking previously checked domains
# TODO: implement binary search instead
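check_domain() {
    # Already checked on an earlier run: skip it (if it was blocked, it is
    # already in the smartlist). -F: dots are literal, -x: whole line only
    grep -qsxF "$1" "${BASE_PATH}/${SEENLIST}" && return 1
    echo "$1" >> "${BASE_PATH}/${SEENLIST}"
    grep -qxF "$1" "${BASE_PATH}/${BLOCKLIST}"
}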

process_logs() {
    echo LOGLIST: $LOGLIST
    echo SEENLIST: $SEENLIST

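    cd "${BASE_PATH}" || return 1
    # Make sure the smartlist exists even if nothing gets promoted
    touch "${BASE_PATH}/${SMARTLIST}"
    [ -e "${LOGLIST}" ] || return 0

    # Move the log aside first so that domains logged while this runs go
    # into a fresh log for the next run
    mv "${LOGLIST}" "${LOGLIST}.work"
    sort -u "${LOGLIST}.work" > "${LOGLIST}.tmp"
    rm -f "${LOGLIST}.work"

    while read -r domain; do
        echo "domain: $domain"
        if check_domain "$domain"; then
            echo "$domain" >> "${BASE_PATH}/${SMARTLIST}"
        fi
    done < "${LOGLIST}.tmp"

    rm -f "${LOGLIST}.tmp"
}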

update_smartlist() {
    # -f replaces an existing link (or file) at SMARTLIST_PATH
    ln -sf "${BASE_PATH}/${SMARTLIST}" "${SMARTLIST_PATH}"
}

# ## Domain name query logging
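# Logs all DNS queries ran through dnsmasq for processing later. By
# default processing happens once a day.
log_queries() {
    mkdir -p "${BASE_PATH}"
    logread -f -e "dnsmasq.*query" |
        while read -r line; do
            # Take the word after "query[TYPE]" rather than a fixed field
            # number: the syslog timestamp pads single-digit days with an
            # extra space, which shifts the fields cut would see
            case $line in
                *"query["*) ;;
                *) continue ;;
            esac
            domain=${line#*query\[*\] }
            domain=${domain%% *}
            logger -t smartblock "Logged domain: ${domain}"
            echo "$domain" >> "${BASE_PATH}/${LOGLIST}"
        done
}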

## (Un)/Installing
# Installs default configuration file, imports lists from
# simple-adblock and replaces (or restores) simple-adblock's lists
# with the smart list. Also sets and restores dnsmasq's query logging
# setting.
install_smartblock() {
    if [ ! -e /etc/config/smartblock ]; then
        install_config
    fi

    import_lists
    replace_simple_adblock_lists
    setup_cronjobs
    install_deps
    setup_dnsmasq
    install_service
    restart_simple_adblock
}

install_config() {
        cat <<EOF > /etc/config/smartblock
config smartblock general
       option path '/var/smartblock'
       option blocklist 'blocklist.txt'
       option seenlist 'seen.txt'
       option loglist 'log.txt'
       option smartlist_path '/www/smartlist.txt'
       option smartlist_url '127.0.0.1/smartlist.txt'

config original dnsmasq

config original simple_adblock

config sources domains
       list url ''

config sources hosts
       list url ''
EOF
}

add_domains() {
    uci add_list "smartblock.domains.url=${1}"
}

add_hosts() {
    uci add_list "smartblock.hosts.url=${1}"
}

import_lists() {
    config_load simple-adblock

    config_list_foreach config blocked_domains_url \
                        add_domains
    config_list_foreach config blocked_hosts_url \
                        add_hosts

    uci commit
    config_load smartblock
}

# The originals are saved in the "config original simple_adblock" section
add_adblock_domains() {
    uci add_list "smartblock.simple_adblock.blocked_domains_url=${1}"
}

add_adblock_hosts() {
    uci add_list "smartblock.simple_adblock.blocked_hosts_url=${1}"
}

replace_simple_adblock_lists() {
    config_load simple-adblock

    config_list_foreach config blocked_domains_url \
                        add_adblock_domains
    config_list_foreach config blocked_hosts_url \
                        add_adblock_hosts

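    # blocked_domains_url is a list option: clear it, then add the smartlist
    uci -q delete simple-adblock.config.blocked_domains_url
    uci -q delete simple-adblock.config.blocked_hosts_url
    uci add_list "simple-adblock.config.blocked_domains_url=127.0.0.1/smartlist.txt"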

    uci commit
    config_load smartblock
}

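# OpenWrt's cron reads /etc/crontabs/root. Delete existing smartblock
# entries first so that there are no dupes.
setup_cronjobs() {
    mkdir -p /etc/crontabs
    touch /etc/crontabs/root
    sed -i -e "/smartblock/d" /etc/crontabs/root

    # Process logs daily at midnight, update blocklists weekly (Sunday)
    cat <<EOF >> /etc/crontabs/root
0 0 * * * smartblock -p
0 0 * * 0 smartblock -u
EOF
    /etc/init.d/cron restart
}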

install_deps() {
    NEEDS=""
    opkg list-installed | grep coreutils-nohup >/dev/null || {
        echo "coreutils-nohup isn't installed. Installing it shortly."
        NEEDS="coreutils-nohup"
    }
    opkg list-installed | grep "^curl" >/dev/null || {
        echo "curl isn't installed. Installing it shortly."
        NEEDS="$NEEDS curl"
    }

    if [ -n "$NEEDS" ]; then
        opkg update
        for need in $NEEDS; do
            opkg install $need
        done
    fi
}

install_service() {
    cat <<EOF > /etc/init.d/smartblock
#!/bin/sh /etc/rc.common

# Needed by rc.common to create the boot symlinks on enable
START=99
STOP=10

start() {
    nohup smartblock -l >/dev/null 2>/dev/null &
}

stop() {
    smartblock -k
}
EOF

    chmod +x /etc/init.d/smartblock
    /etc/init.d/smartblock enable
    /etc/init.d/smartblock start
}

# Enable logging DNS requests so that the domains can be checked later
# for potential blocking. Save original setting for uninstall (not
# implemented yet)
setup_dnsmasq() {
    ORIG_LOGQUERIES=`uci -q get dhcp.@dnsmasq[0].logqueries`
    uci set dhcp.@dnsmasq[0].logqueries=1
    uci set "smartblock.dnsmasq.logqueries=${ORIG_LOGQUERIES:-0}"
    uci commit
}

setup_simple_adblock() {
    ORIG_CACHE=`uci get simple-adblock.config.compressed_cache`
    uci set simple-adblock.config.compressed_cache=0
    uci set smartblock.simple_adblock.cache=$ORIG_CACHE
    uci commit
}

restart_simple_adblock() {
    # Make sure it exists or simple-adblock complains
    if [ ! -e "${SMARTLIST_PATH}" ]; then
        touch "${SMARTLIST_PATH}"
    fi
    /etc/init.d/simple-adblock restart
}

case $1 in
    -i)
        install_smartblock
        ;;
    -l)
        echo $$ > /var/run/smartblock.pid
        log_queries
        ;;
    -k)
        if [ -e /var/run/smartblock.pid ]; then
            kill `cat /var/run/smartblock.pid`
            rm /var/run/smartblock.pid
        fi
        ;;
    -p)
        process_logs
        update_smartlist
        /etc/init.d/simple-adblock restart
        ;;
    -u)
        update_lists
        ;;
    -r)
        rm -f "${BASE_PATH}/${SEENLIST}" "${BASE_PATH}/${SMARTLIST}"
        process_logs
        update_smartlist
        /etc/init.d/simple-adblock restart
        ;;
    *)
        echo "Usage: smartblock [-d] -i|-l|-k|-p|-u|-r"
        exit 1
        ;;
esac

somewhere in your $PATH on your OpenWrt box, named smartblock (e.g. /usr/bin/smartblock, because the cron jobs and the init script call it by that name), make it executable and run smartblock -i. If you're using simple-adblock, everything should be set up automatically. It requires curl and coreutils-nohup, which will be installed automatically if they're missing. If you don't want to wait until midnight for the log processing to happen, you can also run

smartblock -p
and it will immediately start blocking any blacklisted domains it has found in the logs so far.
There's also
smartblock -u
to update your blocklists immediately (otherwise this happens once a week via cron job)
smartblock -r
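to reset the smartlist: the list of already checked domains is cleared as well, and the smartlist is rebuilt from the domains logged since the last processing run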
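The weekly -u update also has to reduce hosts-format sources to bare domains. A sketch of that step in POSIX sh plus awk, assuming entries of the form `0.0.0.0 domain` or `127.0.0.1 domain`, separated by spaces or tabs, with optional trailing comments; `hosts_to_domains` is a hypothetical helper, not part of smartblock.

```shell
#!/bin/sh
# Sketch: reduce a hosts-format blocklist to one domain per line.
# Assumes entries like "0.0.0.0 ads.example.net" or
# "127.0.0.1<TAB>ads.example.net  # comment"; hosts_to_domains is a
# hypothetical helper, not part of smartblock.
hosts_to_domains() {
    awk '
        { sub(/#.*/, "") }                          # drop comments
        $1 == "0.0.0.0" || $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)               # one line may list several names
                if ($i != "localhost") print $i
        }
    ' "$@" | sort -u
}

printf '127.0.0.1 localhost\n0.0.0.0\tads.example.net # ad server\n# just a comment\n0.0.0.0 tracker.example.com\n' |
    hosts_to_domains
# prints:
# ads.example.net
# tracker.example.com
```

awk's default field splitting treats runs of spaces and tabs alike, which is why it is used here instead of `cut -d " "`.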

Manual installation (for regular adblock) involves the following:

  1. Place default config in /etc/config/smartblock
  2. Add as many lists as you like
  3. ln -s /var/smartblock/smartlist.txt /www/smartlist.txt
  4. Copy the cron job definitions into /etc/crontabs/root and restart cron (/etc/init.d/cron restart)
  5. Copy script somewhere into $PATH (/bin, /usr/bin, etc.)
  6. Copy service file into /etc/init.d/smartblock and run /etc/init.d/smartblock enable/start
  7. Configure adblock to use 127.0.0.1/smartlist.txt as its only domain-based list
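Step 4 can be done idempotently, so re-running it never duplicates entries. A minimal sketch, assuming the schedule the installer uses (daily -p at midnight, weekly -u on Sunday at midnight); the crontab path is a parameter so it can be tried on a scratch file first, and `install_cron_entries` is a hypothetical name.

```shell
#!/bin/sh
# Sketch for step 4: write smartblock's cron entries idempotently.
# On OpenWrt the file would be /etc/crontabs/root (followed by
# /etc/init.d/cron restart); install_cron_entries is a hypothetical name.
install_cron_entries() {
    crontab_file=$1
    touch "$crontab_file"
    # Keep every non-smartblock line, then append fresh entries
    grep -v smartblock "$crontab_file" > "$crontab_file.new"
    cat >> "$crontab_file.new" <<'EOF'
0 0 * * * smartblock -p
0 0 * * 0 smartblock -u
EOF
    mv "$crontab_file.new" "$crontab_file"
}

f=$(mktemp)
echo '30 4 * * * /usr/bin/some-other-job' > "$f"
install_cron_entries "$f"
install_cron_entries "$f"   # running it twice does not duplicate anything
cat "$f"
rm -f "$f"
```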

It should work after that. Alternatively, you could probably just run the installer and then configure adblock appropriately. That should work apart from some error messages caused by simple-adblock not being installed, but I haven't tested this.

To uninstall:

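  1. Stop and disable the service (/etc/init.d/smartblock stop, /etc/init.d/smartblock disable)
  2. Delete the script
  3. Delete /etc/init.d/smartblock
  4. Remove the smartblock entries from /etc/crontabs/root
  5. rm -rf /var/smartblock /www/smartlist.txt
  6. Restore the (simple-)adblock config and dnsmasq's logqueries setting; the installer saves the originals in /etc/config/smartblock
  7. rm /etc/config/smartblock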

If you have bugs, suggestions or ideas, (or patches.. :)), I'd be happy to hear from you.
