Sysupgrade config preserve question/details

Hi,

I regularly build images with a fairly large number of custom files, using the "files" subdirectory in the source tree.
On various devices I adjust or extend some of those files once in a while, and sporadically I sync them back into the source/files directory so that the next build contains the newest file set.
But since it is more important to me that a re-flash keeps the version currently on the device, I also add all of these files/directories to sysupgrade.conf and re-flash while keeping the configuration.

If I understand correctly, all files marked for preservation are still copied back, even when the new rom already contains the identical version.

At least overlayfs does not detect identical files; rewriting a file with its own content still creates a copy in the upper layer:

root@test:/etc# ll /overlay/upper/etc/banner
ls: /overlay/upper/etc/banner: No such file or directory
root@test:/etc# ll /etc/banner
-rw-r--r--    1 root     root           441 Feb 20 18:13 /etc/banner
root@test:/etc# cp -p /etc/banner /tmp/banner
root@test:/etc# cp -p /tmp/banner /etc/banner
root@test:/etc# ll /overlay/upper/etc/banner
-rw-r--r--    1 root     root           441 Feb 20 18:13 /overlay/upper/etc/banner

So here is my question: is there an option or any magic that skips copying a config file after re-flashing if the rom already contains the same version?
And if not, wouldn't it make sense to add such a feature, or is there some hook where I could perform the necessary diff checks and thereby filter the configuration files to be written?

The alternative would be to take a sysupgrade backup.tgz, remove from it the files that the new rom will contain anyway, and then flash with this adjusted config.

By default I think it watches /etc/config, not the entire /etc; this can be modified using
https://wiki.openwrt.org/doc/howto/notuci.config#etcsysupgradeconf

The "magic" is in OverlayFS: it is in charge of recognizing whether a file was touched or not; new and deleted files are tracked too. More info at: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/overlayfs.txt
In your sample, since you are using cp, you are forcing the creation of a new inode... if you use:

root@test:~# ls /overlay/upper/etc/banner
ls: /overlay/upper/etc/banner: No such file or directory
root@test:~# cat /etc/banner > /tmp/banner
root@test:~# cat /tmp/banner > /etc/banner
root@test:~# ls /overlay/upper/etc/banner
/overlay/upper/etc/banner

The same thing happens, since the file is touched when it is written, even though the inode remains the same.

# Here is the interesting thing, if you delete file from Overlay:
root@test:~# rm /overlay/upper/etc/banner
# it will actually be deleted
root@test:~# ls /overlay/upper/etc/banner -lath
ls: /overlay/upper/etc/banner: No such file or directory

No matter whether you later create it again or write other contents, the file is no longer linked and OverlayFS ignores it; I don't know why.
Update: the /overlay was somehow left "dirty" by doing this; after a reboot (remount) the contents of /etc/banner are again the same as the /overlay one.

I showed this example because when config files are restored after flashing a new rom that already contains the same files as the config backup, they still end up in overlay/upper. So overlayfs does not turn these writes into no-ops.
Therefore there should be some option/magic in sysupgrade that skips copying identical files.

Given that sysupgrade should have some space to play with, what with some post-upgrade
packages not being installed yet, it could probably get away with doing some sort of dedup
fsck once it has booted... it has to run some fixups on some platforms anyway. Where magic
is needed is in opkg so minor upgrades to base packages in the lower filesystem can prevent
overlays where possible.
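
A post-boot check along those lines could start by merely listing the duplicates. The following is only a sketch: the function name list_dups is made up, it assumes cmp is available (it is a BusyBox applet, but may not be built into every image), and it compares content only, not ownership or permissions. It deliberately deletes nothing, since changing the upper directory of a mounted overlay behind its back is not safe.

```shell
#!/bin/sh
# list_dups UPPER_DIR ROM_DIR   (hypothetical helper, not part of sysupgrade)
# Prints every regular file below UPPER_DIR whose content is byte-identical
# to the file with the same relative path below ROM_DIR.
list_dups() {
    upper="$1"
    rom="$2"
    ( cd "$upper" && find . -type f ) | while IFS= read -r f
    do
        f="${f#./}"                      # strip the leading "./" added by find
        if [ -f "$rom/$f" ] && cmp -s "$upper/$f" "$rom/$f"
        then
            printf '%s\n' "$f"
        fi
    done
}
```

On a device this would presumably be called as: list_dups /overlay/upper /rom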

Yes, but I'm actually looking for something that happens earlier, so that I don't have to worry about running out of space.
E.g.: 1MB of compressed files which require this 1MB in rom, but extracted into overlay/upper may take 3MB.
If this 1MB directory is added/configured into sysupgrade.conf, as well as compiled via source/files into rom, then only the initial clean flash may work, but a sysupgrade with restoring configs may run out of space.
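
To get a rough feeling for whether a restore would fit, the unpacked size of the backup can be compared with the free space on the overlay. This is a sketch only: the helper names are made up, the decompressed tar stream includes headers and padding, and flash filesystems such as jffs2 compress data themselves, so the numbers are an estimate rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helpers, not part of sysupgrade.

# unpacked_bytes TGZ: size in bytes of the decompressed tar stream
unpacked_bytes() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# free_kb DIR: free space in KiB on the filesystem holding DIR
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# fits TGZ DIR: succeeds if the unpacked archive is no larger than the free space
fits() {
    need_kb=$(( ($(unpacked_bytes "$1") + 1023) / 1024 ))
    [ "$need_kb" -le "$(free_kb "$2")" ]
}
```

For example: fits /tmp/backup.tgz /overlay || echo "restore may run out of space"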

Sysupgrade does not know whether a file is actually in the upper overlay or not. When the backup is restored, overlayfs does nothing special: it doesn't actively inspect or compare files in any way, neither their stats (such as modification date) nor their contents.

OverlayFS, at least in the OpenWRT / LEDE implementation, does not have any de-duplication mechanism enabled.

There is the possibility of writing a script that checks each file against the overlay and adds it to backup.tgz only if it was actually modified relative to the read-only rootfs version. This can also be done while restoring, but it is easier to do on creation.
It may be easier than somehow cleaning up existing files in the upper OverlayFS.
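
Such a creation-time filter might look roughly like this. It is a sketch under assumptions: changed_files is a made-up name, cmp has to be available, and only file content is compared (not ownership, permissions or mtime). The two directories are parameters so that it can be tried on scratch directories first; on a device they would presumably be /rom and /.

```shell
#!/bin/sh
# changed_files ROM_DIR ROOT_DIR   (hypothetical helper)
# Reads relative paths (one per line) on stdin and prints those that are
# missing from ROM_DIR or whose content differs from the ROM_DIR copy.
changed_files() {
    rom="$1"
    root="$2"
    while IFS= read -r path
    do
        [ -f "$root/$path" ] || continue      # not a regular file: nothing to back up
        if [ -f "$rom/$path" ] && cmp -s "$rom/$path" "$root/$path"
        then
            continue                          # identical to the read-only copy: skip
        fi
        printf '%s\n' "$path"
    done
}
```

The output is a plain list of paths, one per line, which could then be handed to tar when the backup archive is created.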

You could try to use:

uci export > backup.uci
uci import < backup.uci
# or is?
uci -f backup.uci import

That will back up all uci config files from /etc/config.
In usual environments that is more than enough configuration. If you have other files you could manually create a tar.gz/.tgz file; if you use the same path layout as the original backup mechanism, it will be compatible too.
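
A hand-made archive in that layout could be built roughly as below. The assumption is that the restore simply unpacks the archive from /, so paths inside the archive are relative to / (etc/config/network rather than /etc/config/network). make_backup is a made-up name.

```shell
#!/bin/sh
# make_backup ROOT_DIR OUT_TGZ PATH...   (hypothetical helper)
# Packs the given paths, relative to ROOT_DIR, into a gzip-compressed tar
# so that unpacking from / puts every file back in place.
make_backup() {
    root="$1"
    out="$2"
    shift 2
    tar -czf "$out" -C "$root" "$@"
}
```

For example: make_backup / /tmp/backup.tgz etc/config etc/banner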

OK. So to recap: there is currently no such feature, and from the responses so far it looks like I'm the only one really interested in such a feature.
Then back to the last question of my original post: where do I have to hook in a "tar" wrapper which performs the necessary checks at the time when the backup.tar gets extracted over the newly flashed rootfs?

A possible tar wrapper:

#!/bin/sh

BASE="$(basename "$0")"

usage() {
    cat <<EOF >&2
    Usage: $BASE [--ignore-meta] [--checksum=SUM] TAR_FILE DEST_DIR [TAR_OPTS]
        extracts given tar file by skipping identical files in DEST_DIR
        TAR_OPTS pass further 'tar' options to underlying 'tar' command
                 "-C \$DEST_DIR" is already added

        --ignore-meta   base identical file checks only on its content and
                        ignore permissions, ownership and times.
        --checksum=SUM  use "SUM" binary for checksum building when comparing contents.
                        Default is to use "diff" for an exact content comparison.
    RETURN 0 on success.
$*
EOF
    exit 1
}

STAT="stat"
CHECKSUM=

while [ -n "$1" ]
do
    case "$1" in
        --ignore-meta)  STAT=""; shift ;;
        --checksum=*)   CHECKSUM="${1#--checksum=}"; shift ;;
        *)              break ;;
    esac
done

[ -z "$1" ] && usage "Missing arguments"
TAR="$1"
shift
[ ! -f "$TAR" ] && usage "Not file: '$TAR'"


[ -z "$1" ] && usage "Missing DEST_DIR"
DEST_DIR="$1"
shift
[ ! -d "$DEST_DIR" ] && usage "Not a directory: '$DEST_DIR'"

[ -n "$STAT" ] && ! type "$STAT" >/dev/null 2>&1 && usage "Requires '$STAT' package without --ignore-meta"

if [ -z "$CHECKSUM" ]
then
    type diff >/dev/null 2>&1 || usage "Requires 'diff' package without --checksum"

    if [ -n "$STAT" ]
    then
        cmp_file() {
            local s1 s2
            s1="$(stat -c "%A %U:%G %s %Y" "$1")" || return 1
            s2="$(stat -c "%A %U:%G %s %Y" "$2")" || return 1
            # compare the collected metadata, not the file names
            [ "$s1" != "$s2" ] && return 1

            diff -q "$1" "$2"
            return $?
        }
    else
        cmp_file() {
            diff -q "$1" "$2"
            return $?
        }
    fi
else
    if [ -n "$STAT" ]
    then
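        cmp_file() {
            local s1 s2 c1 c2
            s1="$(stat -c "%A %U:%G %s %Y" "$1")" || return 1
            s2="$(stat -c "%A %U:%G %s %Y" "$2")" || return 1
            # compare the collected metadata, not the file names
            [ "$s1" != "$s2" ] && return 1

            c1=$("$CHECKSUM" "$1") || return 1
            c2=$("$CHECKSUM" "$2") || return 1

            # keep only the hash; the checksum output also contains the file name
            [ "${c1%% *}" = "${c2%% *}" ]
            return $?
        }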
    else
        cmp_file() {
            local c1 c2

            c1=$("$CHECKSUM" "$1") || return 1
            c2=$("$CHECKSUM" "$2") || return 1

            # keep only the hash; the checksum output also contains the file name
            [ "${c1%% *}" = "${c2%% *}" ]
            return $?
        }
    fi

fi

CTAR="$(readlink -f "$TAR")"
DUPS="${TMP:-/tmp}/tmp-$BASE.dups.$$"
touch "$DUPS"
WORK="${TMP:-/tmp}/tmp-$BASE.work.$$"

tar -t "$@" -f "$CTAR" | while read -r FILE
do
    [ ! -f "$DEST_DIR/$FILE" ] && continue

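    # this loop runs in a pipeline subshell, so use 'exit' rather than 'return'
    mkdir -p "$WORK" || exit 1
    tar -x "$@" -f "$CTAR" -C "$WORK" "$FILE" || exit 1

    # call cmp_file directly; '[ cmp_file ... ]' would never run it
    cmp_file "$WORK/$FILE" "$DEST_DIR/$FILE" && echo "$FILE" >>"$DUPS"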

    rm -rf "$WORK"
done
RET="$?"

rm -rf "$WORK" 2>/dev/null

[ "$RET" -ne 0 ] && exit "$RET"

echo "Skipping identical file(s):" >&2
cat "$DUPS" >&2

tar -x "$@" -f "$CTAR" -C "$DEST_DIR" -X "$DUPS"
exit $?

Oh, I'd be interested, for sure.

ISTR the tar does not get extracted over the newly flashed rootfs, rather it gets converted into a jffs2 partition containing the initial contents of the upper layer and flashed directly, along with any adjustments needed to the image length/checksums to make the bootloader happy with the image.

Meanwhile I looked around the base sources a bit.
There I also found that the sysupgrade binary passes sysupgrade.tgz to mtd, so it gets flashed as it is (no extraction here).
But looking into /lib/preinit/80_mount_root, it seems that after the reboot it still exists as a plain tar file and gets extracted there. And my understanding is that at that point root (/) is already the final root file system and not some overlay/upper.

Can anyone confirm this?

Here would be the final patch.
I tested /lib/upgrade/untar-minimal.sh separately with various setups (e.g. with diff and stat installed).
But before I screw up my device I would like some confirmation that I'm not doing complete crap here.

Thanks

--- /rom/lib/preinit/80_mount_root
+++ /lib/preinit/80_mount_root
@@ -8,7 +8,12 @@
        [ -f /sysupgrade.tgz ] && {
                echo "- config restore -"
                cd /
-               tar xzf /sysupgrade.tgz
+                if [ -x /lib/upgrade/untar-minimal.sh ]; then
+                       echo "- config restore minimal -"
+                        /lib/upgrade/untar-minimal.sh /sysupgrade.tgz /
+                else
+                       tar xzf /sysupgrade.tgz
+                fi
        }
 }
 
--- /dev/null
+++ /lib/upgrade/untar-minimal.sh
@@ -0,0 +1,141 @@
+#!/bin/sh
+
+BASE="$(basename "$0")"
+
+usage() {
+    cat <<EOF >&2
+    Usage: $BASE [-v] [-d] TAR_FILE [DEST_DIR]
+        extracts TAR_FILE to DEST_DIR (default is '/') and tries to avoid
+        extracting identical files with same content, permissions, mtime and ownership.
+        This avoids occupying unnecessary flash space on overlayfs upper file systems.
+        File comparison is done using 'diff', 'sha512sum' or 'sha256sum' depending
+        on what is installed on the system and prioritized in given order.
+        Falls back to standard 'tar' if none of the above is available.
+        -v verbose output
+        -d dry run. Show only, but don't change anything in \$DEST_DIR
+    RETURN 0 on success
+$*
+EOF
+    exit 1
+}
+
+DRYRUN=""   # useful for debugging this script
+VERBOSE=""
+[ "$1" = "-v" ] && VERBOSE="v" && shift
+[ "$1" = "-d" ] && DRYRUN="-d" && VERBOSE="v" && shift
+[ "$1" = "-v" ] && VERBOSE="v" && shift
+
+[ -z "$1" ] && usage "Missing arguments"
+TAR="$1"
+shift
+[ ! -f "$TAR" ] && usage "Not file: '$TAR'"
+
+
+# DEST_DIR is optional and defaults to '/', as stated in the usage text
+DEST_DIR="${1:-/}"
+[ -n "$1" ] && shift
+[ ! -d "$DEST_DIR" ] && usage "Not a directory: '$DEST_DIR'"
+
+TYPE="z" # default assuming tgz
+[ -z "${TAR%%*.tar}" ] && TYPE=""
+[ -z "${TAR%%*.bz}" ] && TYPE="j"
+[ -z "${TAR%%*.tbz}" ] && TYPE="j"
+[ -z "${TAR%%*.xz}" ] && TYPE="J"
+[ -z "${TAR%%*.txz}" ] && TYPE="J"
+
+if type stat >/dev/null 2>&1
+then
+    [ -n "$VERBOSE" ] && echo "Using 'stat' for file permission, mtime and ownership comparison" >&2
+    cmp_stat() {
+        local s1 s2
+        s1="$(stat -c "%A %U:%G %s %Y" "$1")" || return 1
+        s2="$(stat -c "%A %U:%G %s %Y" "$2")" || return 1
+        [ "$s1" = "$s2" ]
+        return $?
+    }
+else
+    LS="/bin/ls"
+    [ ! -x "$LS" ] && LS="/usr/bin/ls"
+    [ -n "$VERBOSE" ] && echo "Using '$LS' for file permission, mtime and ownership comparison" >&2
+    # compare size mtime permissions and ownership of given two files
+    cmp_stat() {
+        local s1 s2
+        [ ! -f "$1" ] && return 1
+        [ ! -f "$2" ] && return 1
+        # tricky part since 'ls' output may not be stable. Skipping link count and file name
+        # Example "ls -le" output of busybox ls
+        # -rw-r--r--    1 root     root           441 Mon Feb 20 18:13:44 2017 /etc/banner
+        s1="$("$LS" -le "$1" | sed -e 's/[\t ]\+/ /g' | cut -d' ' -f '1,3-10')" || return 1
+        s2="$("$LS" -le "$2" | sed -e 's/[\t ]\+/ /g' | cut -d' ' -f '1,3-10')" || return 1
+        [ "$s1" = "$s2" ]
+        return $?
+    }
+fi
+
+tar_cmp() {
+    local ret
+    local ctar="$(readlink -f "$TAR")"
+    local dups="${TMP:-/tmp}/tmp-$BASE.dups.$$"
+    local work="${TMP:-/tmp}/tmp-$BASE.work.$$"
+    touch "$dups"
+
+    tar "-t$TYPE" -f "$ctar" | while read -r FILE
+    do
+        [ ! -f "$DEST_DIR/$FILE" ] && continue
+
+        mkdir -p "$work" || continue
+        tar "-x$TYPE" -f "$ctar" -C "$work" "$FILE" || continue
+
+        cmp_files "$work/$FILE" "$DEST_DIR/$FILE" && echo "$FILE" >>"$dups" \
+            && [ -n "$VERBOSE" ] && echo "Skipping identical: $FILE" >&2
+
+        rm -rf "$work" >/dev/null 2>&1
+    done
+    rm -rf "$work" >/dev/null 2>&1
+    [ -n "$DRYRUN" ] || tar "-x$VERBOSE$TYPE" -f "$TAR" -C "$DEST_DIR" -X "$dups"
+    ret="$?"
+    rm "$dups"
+    return $ret
+}
+
+if type "diff" >/dev/null 2>&1
+then
+    [ -n "$VERBOSE" ] && echo "Using 'diff' to compare file content" >&2
+    cmp_files() {
+        cmp_stat "$1" "$2" || return 1
+        diff -q "$1" "$2"
+        return $?
+    }
+    tar_cmp
+elif type "sha512sum" >/dev/null 2>&1
+then
+    [ -n "$VERBOSE" ] && echo "Using 'sha512sum' to compare file content" >&2
+    cmp_files() {
+        local s1 s2
+        cmp_stat "$1" "$2" || return 1
+        s1="$(sha512sum "$1")" || return 1
+        s2="$(sha512sum "$2")" || return 1
+        [ "${s1%% *}" = "${s2%% *}" ]
+        return $?
+    }
+    tar_cmp
+elif type "sha256sum" >/dev/null 2>&1
+then
+    [ -n "$VERBOSE" ] && echo "Using 'sha256sum' to compare file content" >&2
+    cmp_files() {
+        local s1 s2
+        cmp_stat "$1" "$2" || return 1
+        s1="$(sha256sum "$1")" || return 1
+        s2="$(sha256sum "$2")" || return 1
+        [ "${s1%% *}" = "${s2%% *}" ]
+        return $?
+    }
+    tar_cmp
+else
+    [ -n "$VERBOSE" ] && echo "No diff, sha512sum nor sha256sum installed. Falling back to standard 'tar'" >&2
+    [ -n "$DRYRUN" ] || tar "-x$VERBOSE$TYPE" -f "$TAR" -C "$DEST_DIR"
+fi
+
+exit $?
+