Question

Projects like WebKit, the Linux kernel, and many others take over ten minutes, if not hours, for a full build. What's the practical development cycle for these large projects? Specifically, how does a developer fix and test an identified bug? Can you give some concrete examples?


Solution

Projects like WebKit, the Linux kernel, and many others take over ten minutes, if not hours, for a full build.

Ten minutes or half an hour is not a prohibitive build time.

When I started my career (in 1987, on a Sun3/160 workstation running Unix), the full (cold) build of a 200KLOC program that I wrote alone took more than half an hour.

With incremental builds (combined with parallel builds, perhaps as crude as make -j, or even parallel and distributed builds with distcc, icecream, or better) and a fast (multi-core) desktop, a Linux kernel (only 25MLOC) typically builds in less than ten minutes (and most incremental builds need less than a minute, so rebooting into the new kernel could take more time than rebuilding it). That is OK. See this benchmark (it shows that a Linux kernel can be cold-built in less than 15 minutes on a powerful desktop).

And in the 1970s, programmers (then coding programs of less than 2KLOC on punched cards) could often afford only one, or perhaps two or three, compilations per day. They just thought more carefully about their own code. Their attitude was different (not thinking something like: I forgot the syntax for that, I'll just compile it and see).

Sorry to sound condescending (but I was born in 1959), but you are a spoiled kid if you cannot wait ten minutes (or even half an hour) for a build. But I am spoiled too these days....

Of course, you should think about your software build and design it wisely.

Some proprietary products (think of Google's indexing engine) are rumored to have more than half a billion lines of C++ code (which builds much more slowly than C) going into a single executable. Link time is then an issue. But Google funded Ian Lance Taylor to design and develop gold (precisely to accelerate that). And Google developed its own build automation engines... and uses high-end servers for that.

Some current HPC code (e.g. a simulation of galaxy collisions) needs weeks of CPU time today. Debugging it is probably harder, if the bug needs a whole day to occur (and is a numerical bug). But then the code is designed differently (e.g. you think about persistence at code design time).

Sometimes, a reproducible bug may take 30 minutes of CPU time to reach (and that can happen even in a tiny personal project of only two thousand lines), and you have to live with that. If you have designed your program with some persistence in mind, things are different; however, adding persistence or checkpointing to an existing program is generally such a large architectural change that you can't afford it.
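
To make that concrete, here is a minimal, hypothetical sketch (in C++; the checkpoint file name and the State struct are invented for illustration) of the kind of checkpointing such a design implies: the long-running computation periodically saves its state, so a run that needs 30 minutes of CPU time to reach the bug can be resumed near the interesting point instead of restarting from scratch.

    // checkpoint_sketch.cpp - a minimal, hypothetical checkpointing sketch.
    // The state (a step counter and an accumulator) stands in for whatever a
    // real long-running computation would need to persist.
    #include <cstdint>
    #include <fstream>
    #include <iostream>

    struct State {
        std::uint64_t step = 0;
        double accumulator = 0.0;
    };

    // Try to resume from a previous run; return a fresh State otherwise.
    static State load_checkpoint(const char* path) {
        State s;
        std::ifstream in(path, std::ios::binary);
        if (in)
            in.read(reinterpret_cast<char*>(&s), sizeof s);
        return s;
    }

    static void save_checkpoint(const char* path, const State& s) {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out.write(reinterpret_cast<const char*>(&s), sizeof s);
    }

    int main() {
        const char* path = "run.ckpt";              // hypothetical checkpoint file
        State s = load_checkpoint(path);
        std::cout << "resuming at step " << s.step << "\n";

        for (; s.step < 1'000'000'000; ++s.step) {
            s.accumulator += 1.0 / (1.0 + s.step);  // stand-in for the real work
            if (s.step % 10'000'000 == 0)           // checkpoint periodically
                save_checkpoint(path, s);
        }
        save_checkpoint(path, s);
        std::cout << "done: " << s.accumulator << "\n";
    }

A real program would persist its actual domain state (ideally in a portable format), but the pattern - load a checkpoint at startup, save one periodically - is the same, and it is much cheaper to plan for at design time than to retrofit.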

Also, modularity is now widely used (even C++20 might have modules). Most programming environments these days have some notion of libraries. Many software projects use the PIMPL idiom. And recent programming languages (Go, Rust, OCaml) know about modules. Also, huge software systems tend to be distributed and split into several executables (e.g. the microservice approach), in particular because you want to take advantage of process isolation (which requires defining program interfaces more carefully).
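
For instance, the PIMPL idiom keeps a class's implementation details out of its public header, so editing them only recompiles a single translation unit. A minimal sketch (the class and file names here are made up):

    // widget.h - the only header clients include; it exposes no
    // implementation details, so editing them does not rebuild clients.
    #pragma once
    #include <memory>

    class Widget {
    public:
        Widget();
        ~Widget();
        void draw();
    private:
        struct Impl;                  // defined only in widget.cpp
        std::unique_ptr<Impl> pimpl;
    };

    // widget.cpp - the private implementation; heavy headers stay here.
    #include "widget.h"
    #include <iostream>
    #include <vector>

    struct Widget::Impl {
        std::vector<int> cache;       // internal state hidden from clients
    };

    Widget::Widget() : pimpl(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;      // defined where Impl is complete

    void Widget::draw() {
        std::cout << "drawing with " << pimpl->cache.size() << " cached items\n";
    }

With such a layout, changing Widget::Impl (or the headers it pulls in) forces a recompile of widget.cpp only; the many files that merely include widget.h are untouched, and the project just relinks.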

Finally, the total volume of software is growing much less than the data managed by it (look into Software Heritage for a clue; an estimate of the total source code size on Earth is perhaps a few hundred terabytes, many orders of magnitude less than the total amount of data on the Internet, which is many exabytes, perhaps a few zettabytes). Even an entire Linux distribution (about 20GLOC of code) can be built in less than a week on a desktop. Try a source-based Linux distribution (e.g. Gentoo).

You might look inside several large free software projects (e.g. Qt, GTK, GCC, LibreOffice, Firefox, the Linux or *BSD kernels, the Java JRE, ...) to better understand how they are built and developed by their open communities. Details will vary (because the architecture and design of these systems, as well as the programming languages and build automation tools, differ).

Notice that in 2018, and in most developed countries, the cost of skilled labour (that of competent developers) is much higher than the cost of the computers used to build software. So it makes sense to provide developers with powerful build systems.

Other tips

If a full build takes prohibitively long, so that you cannot test the system after every change, there are three things you can do:

  1. make the full build faster
  2. don't make a full build
  3. don't test

Real-world projects do any one of (or any combination of) those three.

What's the practical development cycle for these large projects?

  • Allow developers to commit their code (1)
  • Install triggers with your preferred CI system (Jenkins, TeamCity, etc.)
  • Give developers feedback as fast as possible about any build or unit test failures
  • Once all low-level build artifacts are available, collect them into a deployable build and start system integration tests on it
  • Let someone in charge decide whether the build meets all requirements for a QA-relevant testing milestone, and publish it

Specifically, how does a developer fix and test an identified bug?

They should get a quick response from the CI system, as mentioned.

Of course, there will be all kinds of compiler errors and unit test results pointing them to exactly where to dig in: fixing their dev environment, or starting the debugger and working until the unit tests are green again.

Can you give some concrete examples?

Of course I could, but that seems a bit futile here, since examples of how to fix such bugs vary very broadly.

Causes can range from false-positive failures detected by the build server to blatantly hardcoded values in the source code.

The main point in managing a larger group of developers on a large-scale project is that you should notify them about such failures as fast as possible.


(1) Make them aware, though, that they shouldn't leave their workplace and become unavailable until the CI system signals green.

A very effective approach is to use a gating CI system capable of bundling multiple candidate changesets for pre-commit verification.

Developers can use incremental builds in their own environment/workspace to speed up preparing their changesets. But they don't commit them directly to the integration branch - their private verifications aren't reliable enough. Instead, they submit them to the gating CI system for final verification and (automated!) commit/merge into the integration branch.
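
As a rough illustration of the bundling idea, here is a toy sketch in C++ with entirely hypothetical names (a real gating system such as Zuul also drives the builds, the tests and the code review/SCM tooling): the gate verifies a whole bundle of pending changesets at once, merges them all if the bundle is green, and otherwise bisects so that good changesets still get merged while offending ones get rejected.

    // gating_sketch.cpp - a toy model of a gating CI queue. The names and the
    // verify()/merge() stubs are hypothetical; a real system would run the full
    // builds and test suites and talk to the SCM / code review tools.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Changeset { std::string id; };

    // Stand-in for "run the full pre-commit verification on this bundle";
    // it always passes here just to keep the sketch self-contained.
    static bool verify(const std::vector<Changeset>& bundle) { return !bundle.empty(); }

    // Stand-in for "merge this changeset into the integration branch".
    static void merge(const Changeset& c) { std::cout << "merged " << c.id << "\n"; }

    // Verify a whole bundle at once; on failure, bisect to isolate and reject
    // the offending changesets while still merging the good ones.
    static void gate(const std::vector<Changeset>& bundle) {
        if (bundle.empty())
            return;
        if (verify(bundle)) {
            for (const auto& c : bundle) merge(c);
            return;
        }
        if (bundle.size() == 1) {
            std::cout << "rejected " << bundle.front().id << "\n";
            return;
        }
        auto mid = bundle.begin() + bundle.size() / 2;
        gate({bundle.begin(), mid});
        gate({mid, bundle.end()});
    }

    int main() {
        std::vector<Changeset> pending = {{"c1"}, {"c2"}, {"c3"}};  // submitted changesets
        gate(pending);
    }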

As (IMHO positive) side effects, such a process can:

  • guarantee prevention of breakages, since the centralized, orchestrated pre-commit CI verifications can be made 100% reliable
  • eliminate blame - the CI system, not the developers, is responsible for any breakages

One such CI system I designed for a previous employer decently served over 1000 devs doing trunk-based development on the main branch, with commit gating criteria including 4-6h builds and 5-12h smoke tests, at an average rate of 60-80 commits/day.

Another example would be OpenStack's CI gating system, based on Zuul/Gerrit; several presentations about it are available.

Licensed under: CC-BY-SA with attribution