Builds might not be reproducible during multi-threaded compilations

Hello all,
I ran into an issue the other day while trying to build OpenWRT on a new machine. All of my builds run with -j20. My build failed midway through for unrelated reasons, and when I tried to continue building from where I left off, I ended up with this error:

Package libfreetype is missing dependencies for the following libraries:
libharfbuzz.so.0

I have been building OpenWRT with this configuration on different machines for the longest time with no issue. Then I tried to build it on a brand-new 20-something-core AMD machine and started running into all sorts of issues like these. This makes me concerned that there may be a good subset of packages out there that build differently depending on what was built before it when the packages have no dependency relationships, and it calls into question how reproducible a build is, especially when it is multi-threaded. I recognize that not every package has a reproducible binary output, but the differences there are most likely benign (e.g. build timestamps). The difference here seems to really affect the behavior of the package.

I would like to propose a solution: make the staging_dir specific to each package Makefile, and only include files from packages that show up in the PKG_BUILD_DEPENDS variable of said package. On Linux this could easily be achieved with the overlay filesystem.
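To make the idea a bit more concrete, here is a rough Python sketch of a per-package staging view. It is only an illustration, not how OpenWrt works today: it assumes a hypothetical layout where each package's staged files live under their own subdirectory (staging/<package>/...), and the function name build_staging_view, the package list, and the file names are all made up. It also uses symlinks instead of an overlay mount so that it runs without root.

```python
import os
import tempfile


def build_staging_view(staging_root, build_depends, view_root):
    """Symlink only the declared dependencies' staged files into view_root.

    Assumes a hypothetical layout: staging_root/<package>/<installed files>.
    """
    os.makedirs(view_root, exist_ok=True)
    for dep in build_depends:
        dep_dir = os.path.join(staging_root, dep)
        if not os.path.isdir(dep_dir):
            raise FileNotFoundError(f"declared dependency not staged yet: {dep}")
        for dirpath, _dirnames, filenames in os.walk(dep_dir):
            rel = os.path.relpath(dirpath, dep_dir)
            target_dir = os.path.normpath(os.path.join(view_root, rel))
            os.makedirs(target_dir, exist_ok=True)
            for name in filenames:
                os.symlink(os.path.join(dirpath, name),
                           os.path.join(target_dir, name))
    return view_root


# Demo with made-up packages: harfbuzz is staged, but the package being
# built only declares zlib, so libharfbuzz must not be visible to it.
with tempfile.TemporaryDirectory() as tmp:
    staging = os.path.join(tmp, "staging")
    for pkg, lib in [("harfbuzz", "libharfbuzz.so.0"), ("zlib", "libz.so.1")]:
        os.makedirs(os.path.join(staging, pkg, "usr", "lib"))
        open(os.path.join(staging, pkg, "usr", "lib", lib), "w").close()

    view = build_staging_view(staging, ["zlib"], os.path.join(tmp, "view"))
    visible = sorted(os.listdir(os.path.join(view, "usr", "lib")))
    print(visible)  # ['libz.so.1']
```

With a view like this, a configure script cannot accidentally pick up a library that the package never declared, no matter what happened to be built earlier.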

I could be wrong, but I think this is more a matter of "make doesn't handle errors during parallel builds very gracefully", not

that there may be a good subset of packages out there that build differently depending on what was built before it when the packages have no dependency relationships

I've rebuilt OpenWrt multi-threaded quite a few times, and oftentimes the easiest way to resume building after a make error is to start again from scratch (e.g., run make distclean). Sometimes, if you run

make <FLAGS> download
make <FLAGS> check
make <FLAGS> prepare
make <FLAGS> 

then make can recover from these errors, but it is hit-and-miss.

I think the problem is that when one make run errors out, any other packages that make is currently building in parallel sort of just get dropped half-finished. This tends to make make think that these packages have been built already, so make doesn't re-make them, even though they were not fully compiled.

If you can figure out which packages were only half made, removing their build directories and re-running make usually makes things work right again.

Side note: on a few occasions, I was building (multi-threaded), had a make error, tweaked the configuration, ran make {download,check,prepare,}, and produced a firmware image. I then restarted from scratch (re-cloning the openwrt GitHub repo and using the tweaked working configuration right from the start) and built a second firmware image with an error-free make run. I compared the two images and they were identical.

This seemingly confirms that multi-threaded builds can (some of the time at least) be reproducible.
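For anyone who wants to repeat that comparison, a minimal Python sketch could look like the following. The helper names sha256_of and images_identical are made up for illustration, and hashing is just one way of checking that two files are byte-for-byte the same.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_identical(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

You would call images_identical() with the paths of the two firmware images, one from each build tree.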

I think what I am trying to get at is that usually the builds will be reproducible, but not always.

For example, on a -j3 build, with four packages, the following is likely to happen:

Thread 1   | Thread 2  | Thread 3
Package A  | Package B | Package C
Package D  |           |

In this example, Package D is built after all the other packages.
But if, by some miracle, one of those threads were to slow down a little bit (you started watching a movie, or the draft from the window sped up one of the CPU cores by a few cycles), Package C might get finished before Package A, so the scheduling would then look like this:

Thread 1   | Thread 2  | Thread 3
Package A  | Package B | Package C
           |           | Package D

The difference here is that Package D gets built before Package A, and if you replace Package D with harfbuzz and Package A with freetype, then you get the error that I described in the first post.

(This is a simplification; add a bunch more packages and this becomes a real problem.)

There is nothing here that guarantees behavior A, which is the definition of a race condition. In a normal build, it is unlikely for freetype to be built after harfbuzz because they just happen to be ordered 50 packages apart, but in an extreme case there is nothing preventing that from happening.
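The race can be reproduced with a toy model. The Python sketch below is not the real make scheduler: it assumes a simple greedy -jN scheduler that starts the next queued package as soon as a slot frees up, it assumes a configure script runs at the very start of a package's build and picks up any library that has already finished building, and pkg_x and all the durations are made up.

```python
import heapq


def simulate(durations, jobs, queue):
    """Return {package: (start, finish)} for a greedy scheduler with `jobs` slots."""
    pending = list(queue)
    running = []  # min-heap of (finish_time, package)
    times = {}
    now = 0
    while pending or running:
        # Fill every free slot with the next queued package.
        while pending and len(running) < jobs:
            pkg = pending.pop(0)
            times[pkg] = (now, now + durations[pkg])
            heapq.heappush(running, (now + durations[pkg], pkg))
        # Advance time to the next package that finishes.
        now, _ = heapq.heappop(running)
    return times


def freetype_sees_harfbuzz(times):
    # Assumption: freetype's configure step runs when its build starts and
    # detects harfbuzz only if harfbuzz has already finished by then.
    return times["harfbuzz"][1] <= times["freetype"][0]


queue = ["pkg_x", "harfbuzz", "freetype"]

# Same queue, same -j2, only the timings differ between the two runs.
run_1 = simulate({"pkg_x": 1, "harfbuzz": 5, "freetype": 3}, 2, queue)
run_2 = simulate({"pkg_x": 5, "harfbuzz": 1, "freetype": 3}, 2, queue)

print(freetype_sees_harfbuzz(run_1))  # False
print(freetype_sees_harfbuzz(run_2))  # True
```

Nothing about the package definitions changed between the two runs, yet freetype ends up configured differently.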

Well, this looks like a bug/omission in the package definitions.
Buildbot (which builds all packages) runs into similar problems every now and then. Overeager configure scripts find unwanted/unnecessary capabilities from already built libraries.

The proper solution would be to either

  • declare dependency for harfbuzz in freetype, so that harfbuzz will always be built first, or
  • patch the freetype configure script to ignore harfbuzz capabilities even if found.
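As a rough illustration of the first option, the Python sketch below uses the standard library's graphlib to show that once a dependency is declared, the dependent package is never handed out for building until the dependency is marked done, whatever the timing. The graph is made up (pkg_x is a placeholder), and this is not how the OpenWrt build system is implemented internally.

```python
from graphlib import TopologicalSorter


def build_order(build_depends):
    """Return one valid build order for a {package: [dependencies]} mapping."""
    return list(TopologicalSorter(build_depends).static_order())


# Hypothetical declaration: freetype lists harfbuzz as a build dependency.
deps = {"freetype": ["harfbuzz"], "harfbuzz": [], "pkg_x": []}
order = build_order(deps)
print(order.index("harfbuzz") < order.index("freetype"))  # True

# The same graph, stepped through the way a parallel scheduler would.
ts = TopologicalSorter(deps)
ts.prepare()
first_wave = set(ts.get_ready())      # harfbuzz and pkg_x may start now
ts.done("pkg_x")
after_pkg_x = set(ts.get_ready())     # still empty: freetype keeps waiting
ts.done("harfbuzz")
after_harfbuzz = set(ts.get_ready())  # only now is freetype released
print(first_wave, after_pkg_x, after_harfbuzz)
```

The second option does not need any ordering at all, since the configure script would then ignore the library whether or not it is present.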

I don't think applying the proper solution to every package is feasible? It's so easy to slip up because it's so hard to notice, especially when using a competent build system. If I thought it was just the one package then that's an easy fix, but how many packages are there with bugs like these?

Well, the dependencies are meant to handle the build order. There is no simple solution.

Think also that as packages get version bumps, new versions may bring new features that may introduce new optional usage of functions from libraries. So, new errors surface regularly, and need to be evaluated and patched. (Is the new dependency really needed for the new/changed features, or can it be patched away from the configure script?)

So going back to the solution I proposed in the original question, such a mechanism would be a helper system, in the same way that the "Package x is missing dependencies for the following libraries" check is. It would also tend towards "more correct" Makefiles, because while that check validates the DEPENDS variable, this one would validate the PKG_BUILD_DEPENDS variable.

The gist of it simply is: makefiles and dependency resolution are hard. The only way is to fix issues as they turn up (and the likelihood of them occurring varies widely with concurrency).