Which artifact should I deploy to prod when using Git Flow?

https://softwareengineering.stackexchange.com/questions/400297

03-03-2021
|

Pergunta

Suppose I have the following series of commits when using Git Flow

We have said that no new features will be included in the next release and have created a release branch (D). QA has started testing and has found a bug, which we have fixed (E).

QA conclude their testing, so we call the release done and merge back to master and develop and tag the version on master (F).

Our CI release process observes the new tag created on master, builds the code and produces an artifact.

The problem is that while the code might be identical, the artifact that is produced at E and F is not guaranteed to be 100% binary identical. Some aspect of the build may not be able to reproduced in the same way. For example, if you rely on a package manager to supply dependencies over HTTP, you are at the mercy of an external system to act identically for both builds. You would hope that most of the time it would but there is no guarantee.

Which artifact should I deploy to prod?

If I deploy the artifact from E, isn't it a bit strange to have tagged the master branch as v1? I have to kind of retrospectively name a commit as being the release version.

If I deploy the artifact from F, there is a (small) danger that it's broken in a way which E was not. QA could re-test F, but then that's somewhat of a wasted cycle and what happens if they find a critical bug? We'd need a second release to fix it and v1.0.0 becomes unused and pointless.

Solução

If the artefacts you build don't contain any information that is tied to the presence/absence of a tag during build time, then there is a way out without rebuilding the artefacts.

If you arrange your environment such that the master branch only accepts fast-forward merges, then it can be guaranteed that "commit F" and "commit E" are actually one and the same commit (same commit hash). In that case, the tag would actually be placed on "commit E" and the exact same artefacts that QA approved can be released as being version 1.0.0.

This works, because in a fast-forward commit, no new commit is created but only marker for the head of the target branch is moved. Fast forward merges are only possible if there are no commits on master that are not also present on the release branch.

Outras dicas

The problem is that while the code might be identical, the artifact that is produced at E and F is not guaranteed to be 100% binary identical. Some aspect of the build may not be able to reproduced in the same way.

This is not a GitFlow problem, it's a problem with your build process.

If your build process does not produce the same result with the same source code, then that is a useless process. Your build process needs to be determinate. GitFlow is about managing source code, not about making your builds determinate. Your builds need to be determinate regardless of what source control strategy you use.

To address your example:

For example, if you rely on a package manager to supply dependencies over HTTP, you are at the mercy of an external system to act identically for both builds. You would hope that most of the time it would but there is no guarantee.

You should never be dependent on a system you don't control for your release builds

It's perfectly OK to use package managers as part of your development and build process. But, if you are going to create release builds that need to be exactly the same, then you should not be dependent on something that behaves in ways that will surprise you. You can eliminate the surprise from package managers by:

Pinning the versions of your dependencies. Any package manager worth using will let you do this. You should definitely be doing this already.
Caching the fetched dependent artifacts that the software you wish to release depends upon. That way, it doesn't matter what happens on the upstream system, you always have the correct dependent artifact when you need it. There are many solutions that provide this sort of capability depending on what kind of artifacts and what package manager you are using.

EDIT: Based on the comments, there's a lot of confusion about why a determinate build process is desirable or even possible. Adding to this confusion is the additional trivial details of GitFlow itself, which are actually completely unimportant to addressing the actual problem in the question, that being that it is not known what the resulting binary from the build process actually is.

You need a determinate build process

The reason why you need a build process that produces the same result from the same source code, is to eliminate an entire class of errors or deliberate security vulnerabilities that can be introduced during the build process. If you know with certainty that the process always produces the same binaries from the same source code, then you and other parties who build the software (e.g. other developers on the team and your QA department, or in open source world, other developers on the Internet) can determine if the build itself is "correct" and know that any deviations require scrutiny because they likely signify larger problems.

More relevant to the example posed in the question, if we take the assumption that commit E and commit F contain identical source code, then it does not matter if the build is produced from commit E or commit F, because the result of the process should be identical. The input to the process is identical, so the output from the process should be identical. If it is not, then you have a very big problem, because if the build of commit E and the build of commit F are not the same, then multiple builds of commit E and commit F across time will not be the same either.

That state of affairs is bad because the entire point of having a revision control system is to ensure that the state of the resulting software as it existed during a specific instance in history is preserved. The entire point of having a build system is to make sure the software builds in a consistent, error free way. If this isn't how things work in practice, then you are doing a lot of extra work to gain none of the benefits of stability that these tools are supposed to give you.

"Okay, so which commit should I release?"

It doesn't matter which commit you release if they actually are the same source code. Obviously, you should only release a build that has been tested. You should always release builds to customers that you have tested. But whether it is commit E and commit F doesn't matter. Your compiler doesn't even know what a commit is. Focusing on which commit is the input to the build process is missing the actual problems you have from not knowing what the output of the build process is.

"But reproducible builds are impossible"

Many major software companies, security professionals, and the Debian project would disagree with you rather strongly about that. You can read more about how to make reproducible builds here: https://reproducible-builds.org/

"Why would I ever want to build the same code twice?"

There are lots of reasons why you'd want to build the same code twice, here's a non-exhaustive list of reasons:

Sometimes you need to replace a release build that the customer has lost and that the CI system no longer keeps a copy of.
Other people may want to build the software themselves, e.g. other developers on your team, the open source community, or customers who paid for your source code. If the build is not reproducible, how will any of them know they did it right?
Suppose your build system breaks or you need to recreate it on another machine. How will you validate that the new system is working correctly?
Depending on the architecture of your application, your build process may include compiling a collection of modules independently which are then assembled to form the final artifact to deploy. You would certainly want the modules you did not change to build exactly the same regardless of how many commits ago the changes were made (even better, you'd cache these modules someplace to speed up the process if they are unchanged).
Even better, if you know with certainty that a module does not change if the source code does not change, you can skip building those modules completely and speed up your build process.
If your software is cross-platform, you may have builds for each platform from the same source code.

"But the second build off of master in the way I understand GitFlow is totally pointless!"

Nowhere here did I say you should do a second build off of master, or follow GitFlow however you understand it with robot-like devotion. The very first words in the answer are "This is not a GitFlow problem" exactly because the trivia of what commit gets built does not change the fact that the same input to the build process should result in the same output.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a softwareengineering.stackexchange