I recently started running a few builds based on individual community work, since the platforms aren't officially supported (BPI-R4, R6S, etc.). However, I found one problem: it's difficult to make use of all the cores on my CPU. At first I thought the problem was my old laptop, so I decided to let my little RPi4 do the work instead. I ran make -j4 V=s on it, hoping to see the result when I woke up, but it stopped about 2 hours later with an error and no reason given; it just said "failed".
I was scratching my head, then re-ran it with -j1. Nothing else changed, but this time I can see it running past the point where it stopped before. Why? It's not a CPU overheating issue or some other instability (I ran a stress test and there was no problem).
Now it compiles, but painfully slowly with just a single thread. How can I improve the situation?
OpenWrt usually builds fine with multiple threads, even at high concurrency. That does not mean every corner case has been identified and fixed, and a missed one is probably what you are encountering. For build concurrency to work, all dependencies need to be known and declared; anything missing leads to race conditions that depend on the exact order in which the individual components happen to be built. As long as you stick to -j1, the build order and results are predictable. Beyond that, and even more so with uncommon and/or very high values, you are more likely to see issues. Sadly, these are hard to debug because they aren't easily reproducible (even adding logging might shift the timing just enough for things to succeed by accident).
On very fast, many-core systems you will even see a kind of ceiling on useful concurrency: the inherent build dependencies make you wait for some of the larger components (e.g. gcc and the kernel, but there are more than those) to build, while all the small things finish quite quickly.
From a practical point of view, in cases like this you can simply continue the -j4 build after it fails, and only resort to debugging the actual issue with -j1 after 2-3 failures.
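That retry-then-fall-back routine could be scripted roughly like this. It's only a sketch: build_with_fallback and the BUILD_CMD variable are names made up for illustration, not anything OpenWrt ships, and it assumes you run it from the top of the buildroot.

```shell
#!/bin/sh
# Sketch: retry a parallel build a few times, then fall back to -j1 V=s.
# build_with_fallback and BUILD_CMD are hypothetical names, not OpenWrt tooling.
# BUILD_CMD defaults to "make"; override it to try the logic without a real build.

build_with_fallback() {
    jobs="${1:-4}"        # job count for the parallel attempts
    attempts="${2:-3}"    # parallel attempts before falling back
    cmd="${BUILD_CMD:-make}"

    i=1
    while [ "$i" -le "$attempts" ]; do
        echo "attempt $i: $cmd -j$jobs"
        # make only rebuilds what is missing, so each retry continues the build
        if $cmd "-j$jobs"; then
            return 0
        fi
        i=$((i + 1))
    done

    # Still failing: rerun single-threaded and verbose so the error is readable
    echo "fallback: $cmd -j1 V=s"
    $cmd -j1 V=s
}
```

Calling build_with_fallback 4 3 would try -j4 up to three times and then drop to -j1 V=s. If the single-threaded run also fails, it's probably not a race and the verbose output should show the real error.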
Personally, it's been quite a while since I last witnessed a concurrency-related issue, across a wide variety of systems and -j%d values. But given the diverse range of components built in one go, I have no illusions that this will ever be totally free of race conditions.
There is always a reason. You just haven’t scrolled back far enough in the log.
But to be honest, when just building and not actually developing, the only error I ever see is a URL resolution error, and that most often happens on weekends. For some reason, many of the download servers we use seem to be turned off in Cambridge on the weekends.
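On scrolling back through the log: if the build output was saved to a file, something like the following can jump to the first failure instead of scrolling by hand. This is a rough sketch; first_make_error is a made-up helper name, build.log is an assumed file name, and the pattern only covers GNU make's usual "*** [target] Error N" lines.

```shell
#!/bin/sh
# Sketch: print the first make failure line in a saved build log, with its line number.
# first_make_error is a hypothetical helper; build.log is an assumed file name,
# e.g. captured with:  make -j4 V=s 2>&1 | tee build.log

first_make_error() {
    log="${1:-build.log}"
    # GNU make reports a failed recipe as "make[N]: *** [target] Error N"
    grep -n -E '\*\*\* .*Error [0-9]+' "$log" | head -n 1
}
```

The line number it prints is where to start reading upwards. The actual compiler or script message usually sits somewhere above that line, and in a parallel build it can be mixed in with output from other jobs.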