Git 2.11 and commit hash abbrev length

This is mainly for fun and is not LEDE-specific, but it might be useful background for devs whose build scripts use abbreviated git hashes.

After updating git to 2.11 on my buildhost, I noticed that the abbreviated git hashes (shown e.g. with git log --oneline) had varying lengths in the LEDE main repo and in the LuCI, packages and routing feeds. The lengths were 10, 9, 8 and 7 chars respectively. 7 chars has been the git default for ages, so what had changed?

I got intrigued, checked the git release notes and commit log, and found out that git 2.11 now calculates the default abbrev length from the number of objects in the repo, so that abbrevs are more likely to stay unique in the near future as the repo grows.

The release notes mention it only briefly, but a good longer explanation can be found in the GitHub blog:
https://github.com/blog/2288-git-2-11-has-been-released

Also the commit message by Linus Torvalds explains the change rather clearly:
https://github.com/git/git/commit/e6c587c733b4634030b353f4024794b08bc86892
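
For the curious, the length calculation can be sketched roughly like this. It is only my reading of the logic, not git's actual code: the function name auto_abbrev is made up, and git works from an approximate object count, so results close to a power of two may differ. The idea is to take the number of bits needed to represent the object count, halve it (rounding up), and never go below the old default of 7.

```bash
#!/bin/bash
# Hypothetical helper (not part of git): estimate the automatic abbrev
# length that git 2.11+ would choose for a repo with the given object count.
auto_abbrev() {
    local count=$1 bits=0 len
    # Number of bits needed to represent the count (highest set bit + 1).
    while [ "$count" -gt 0 ]; do
        count=$((count >> 1))
        bits=$((bits + 1))
    done
    # With about 2^bits objects a collision is expected around bits/2 bits
    # of prefix, and a hex char holds 4 bits, so this comes out as bits/2
    # hex chars, rounded up.
    len=$(( (bits + 1) / 2 ))
    # Small repos keep the traditional minimum of 7.
    if [ "$len" -lt 7 ]; then
        len=7
    fi
    echo "$len"
}

# Object counts taken from the stats in this post.
for n in 367174 83516 35476 4971; do
    printf "%6d objects -> %d chars\n" "$n" "$(auto_abbrev "$n")"
done
```

With the object counts from the stats in this post this gives 10, 9, 8 and 7 chars, which matches the lengths I saw in git log --oneline.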

I then found a nice script at https://blog.cuviper.com/2013/11/10/how-short-can-git-abbreviate/ that calculates the number of ambiguous objects at each abbrev length. Interestingly, both the LEDE and LuCI repos need 9 chars, while 8 is enough for packages and 6 for routing, to keep all object abbrevs unique. I modified my own build scripts to use the "--abbrev=10" git option to ensure that at least 10 chars are used.
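
As an illustration of that kind of build script change, here is a minimal sketch. The helper name short_rev and the REVISION variable are made up for this example, not taken from any real build script. Note that --abbrev only sets a minimum: git still prints more chars if 10 would be ambiguous.

```bash
#!/bin/bash
# Hypothetical helper for a build script: print the hash of a revision
# (default: HEAD) abbreviated to at least 10 hex chars.
short_rev() {
    git log -1 --abbrev=10 --format=%h "${1:-HEAD}"
}

# Example use inside a git checkout (variable name is illustrative):
# REVISION="$(short_rev)"
```

If you would rather not touch every git call, setting core.abbrev to 10 with git config should give the same minimum length for the whole repo.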

Detailed stats are below. Each line has the format
abbrev length: number of objects with a non-unique abbrev / number of distinct abbrevs shared by those objects

The LEDE main repo needs 9 chars for unique abbrevs:

367174 objects
 4: 365838 / 63953
 5: 108187 / 50956
 6: 7772 / 3873
 7: 500 / 250
 8: 30 / 15
 9: 0 / 0

LuCI actually needs the full 9 chars that git 2.11 now picks as the default for its 83k objects:

83516 objects
 4: 60209 / 23897
 5: 6354 / 3135
 6: 438 / 219
 7: 22 / 11
 8: 2 / 1
 9: 0 / 0

Packages needs 8 chars:

35476 objects
 4: 14961 / 6763
 5: 1161 / 578
 6: 48 / 24
 7: 4 / 2
 8: 0 / 0

Routing would manage with 6 chars:

4971 objects
 4: 351 / 174
 5: 16 / 8
 6: 0 / 0

The script:

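#!/bin/bash
# git-unique-abbrev
# Counts ambiguous object abbrevs at each prefix length (needs GNU uniq).

OBJECTS="$(mktemp)"
# List all reachable objects, keep only the 40-char SHA-1, and sort for uniq.
git rev-list --all --objects | cut -c1-40 | sort >"$OBJECTS"
printf "%d objects\n" $(wc -l <"$OBJECTS")
for abbrev in $(seq 4 40); do
    DUPES="$(mktemp)"
    # Keep only the hashes that share their first $abbrev chars with another hash.
    uniq -D -w $abbrev <"$OBJECTS" >"$DUPES"
    count=$(wc -l <"$DUPES")
    # Number of distinct ambiguous prefixes at this length.
    acount=$(uniq -w $abbrev <"$DUPES" | wc -l)
    printf "%2d: %d / %d\n" $abbrev $count $acount
    # Once nothing is ambiguous, print the hashes that still collided at the previous length.
    test $count -eq 0 && cat "$OBJECTS"
    # Only the remaining duplicates can still collide at the next length.
    mv "$DUPES" "$OBJECTS"
    test $count -eq 0 && break
done
rm -f "$OBJECTS"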