Git Tagging for SaaS application with CD and SemVer

https://softwareengineering.stackexchange.com/questions/403953

06-03-2021

Question

I'm developing a SaaS application where I'm required to keep track of and publish every change in a changelog. I've started to follow a Semantic Versioning approach and am also using Continuous Delivery.

Because I use nvie's branching model, let's say:

I have version 1.0 in master.
develop and master are the same.
from develop, I fix a bug
merge it to master and publish to production.

Now I have version 1.0.1 (according to SemVer) in master. The question is, should I tag it?

If I do, what's happening is that I have 1.0.1, 1.0.2, 1.0.3,... and every single merge in master tagged. Sometimes more than one a day.

Seems like SemVer and CD are not compatible, are they? Or is it OK to have (soon hundreds) of tags for every PATCH release?

Solution

Semantic versioning (SemVer) is intended for software products that have an API, and for which multiple versions will be available at the same time. It's most useful for software libraries, which have an API made up of exposed classes, functions, etc., but it can also apply to software with an HTTP API.

Developers of software that depends on something using semver will read the version number to decide which version of their dependency to install, often with the help of automatic dependency resolution tools.
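As a rough illustration of what such a resolution tool does, here is a minimal sketch that picks the newest available version that stays within the same major version as a required minimum. The function names `parse` and `pick_compatible` are hypothetical, the version list is made up, and the sketch assumes plain MAJOR.MINOR.PATCH strings with a major version of 1 or higher (no prerelease tags, no special handling of 0.x).

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def pick_compatible(minimum: str, available: Iterable[str]) -> Optional[str]:
    """Return the highest available version with the same major version
    as `minimum` that is not older than `minimum`, or None if there is none."""
    floor = parse(minimum)
    candidates = [
        v for v in available
        if parse(v)[0] == floor[0] and parse(v) >= floor
    ]
    return max(candidates, key=parse, default=None)


# Hypothetical list of published versions of some dependency.
published = ["1.1.9", "1.2.3", "1.4.0", "2.0.0"]
print(pick_compatible("1.2.0", published))
```

This only works because the publisher promises that everything within one major version is backward compatible, which is the promise a single-version SaaS product has no one to make to.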

If you only make one version of your SaaS product publicly available at any one time, then SemVer is unlikely to be relevant. It's certainly not relevant if your product does not have an API.

You may not need explicit version numbers at all. If you need a unique code to refer to each version, I'd suggest something like the build number from your build server (e.g. 42) or the exact time that the version was built (e.g. 2020-01-17_19:45:05).
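A minimal sketch of such an identifier, assuming the build server can hand the script a build number and falling back to a UTC timestamp otherwise. The function name `build_identifier` and its parameters are hypothetical, not part of any particular build tool.

```python
from datetime import datetime, timezone
from typing import Optional


def build_identifier(build_number: Optional[int] = None,
                     built_at: Optional[datetime] = None) -> str:
    """Return the build number if one is given, else a build timestamp.

    The timestamp format mirrors the example in the text:
    YYYY-MM-DD_HH:MM:SS.
    """
    if build_number is not None:
        return str(build_number)
    # Assumption: builds are stamped in UTC so identifiers sort consistently.
    moment = built_at or datetime.now(timezone.utc)
    return moment.strftime("%Y-%m-%d_%H:%M:%S")


print(build_identifier(42))
print(build_identifier(
    built_at=datetime(2020, 1, 17, 19, 45, 5, tzinfo=timezone.utc)))
```

Either form is unique per build and carries no promise about compatibility, which is all a single-version deployment needs.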

Nvie's branching model, aka Git Flow, is not useful for continuous delivery. Instead, you should choose some form of Trunk-Based Development.

OTHER TIPS

You should not be applying version strings to each commit. The Git repo already has a commit hash for that. The SemVer string comes into play on the packaging/publishing side of the production line. That's where you analyze what has been included in the new release and whether the changes are breaking (major), non-breaking features (minor), or bug fixes/cosmetic changes (patch), and then generate a SemVer-compliant version.
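The classification step can be sketched as follows: given the current version and the kinds of change found in a release, the most significant change decides which part of the version is incremented. The `Change` enum and `next_version` function are hypothetical names, and the sketch assumes the change kinds have already been determined by whatever analysis the release process uses.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Change(IntEnum):
    """Kinds of change, ordered by how much of the version they affect."""
    PATCH = 1  # bug fixes, cosmetic changes
    MINOR = 2  # non-breaking features
    MAJOR = 3  # breaking changes


def next_version(current: Tuple[int, int, int],
                 changes: Iterable[Change]) -> Tuple[int, int, int]:
    """Bump `current` according to the most significant change included."""
    major, minor, patch = current
    highest = max(changes, default=None)
    if highest is None:
        return current  # nothing changed, nothing to release
    if highest == Change.MAJOR:
        return (major + 1, 0, 0)
    if highest == Change.MINOR:
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


print(next_version((1, 0, 0), [Change.PATCH]))
print(next_version((1, 0, 1), [Change.PATCH, Change.MINOR]))
```

Because the input is the full set of changes in a release rather than a single commit, this naturally runs at packaging time, not at commit time.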

Tagging the commits is pointless and risky. What if post-build testing or analysis indicates that the chosen change type (major, minor, patch) is incorrect? Do you change the tag in the repo? I suppose you could just rinse and repeat that process until you get it right, but that seems wasteful and potentially error-prone, particularly when there are multiple actively developed clones of the repository.

Tags are transient by design. A tag can point to one commit hash one day and to another the next. You risk exposing multiple mappings of the same tag to different commit hashes. In a large team, or any public repository, any number of clones could have different content with exactly the same label applied to them. This is a recipe for disaster, even if everyone is very careful and fully cognizant of the risks. People make mistakes, particularly in agile environments, and managing repository tags correctly is non-trivial.

The SemVer spec is clear: there shall be only one publicly available version <=> API/package mapping. So the many common processes that map semantic version strings to commit hashes by using tags in their source repositories aren't technically out of spec from the public's view of a product's version history, provided the repository is kept private. The problem with this scheme is that internal testing can't correctly use the same publish/release tooling that is used for external publication, because of the internal violations of semantic versioning rules. It also carries the risk that version candidates that never made the cut leak into the public domain.

The build should record the commit hash in its outputs. Humans, assisted by process and automation, should then determine the version label to put on the product/package. This may involve analysis of semantic commit comments in the repository, the results of unit, integration and acceptance tests, internal customer advocacy, etc. When a determination is made to label a build output, an indelible record should be made that includes:

The new version label.
The commit hash the build operated on.
The feeds that the build output was published to, with the associated package keys (name, version, ID, or whatever) and package hashes.
The names and versions of all tooling used to produce the output.
The content hash of the build output (though this is not always useful).
Links to any records of process (discussions, process documentation, etc.).

A reference to this record should be embedded in the package(s). External publication of some or all of this data is optional. The public will have access to the package and version; they can find the commit hash in a manifest, changelog, or other package/tooling-appropriate data file. Post-publication tagging of the Git repo is okay for developer convenience, provided there is a well-defined and secured process that takes its inputs from the publication process. These tags should never be forwarded to the build system.
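The release record described above could look something like the following sketch. The `ReleaseRecord` class, its field names, and every value in the example are hypothetical placeholders. A frozen dataclass only prevents accidental changes in memory, so making the record truly indelible would additionally require storing it somewhere append-only, which is outside this sketch.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ReleaseRecord:
    version: str                               # the new version label
    commit_hash: str                           # commit the build operated on
    published_packages: List[Dict[str, str]]   # feed, package key, package hash
    tooling: Dict[str, str]                    # tool name -> tool version
    output_sha256: str                         # content hash of the build output
    process_links: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize with sorted keys so the same record always yields the same text."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    def record_id(self) -> str:
        """A stable reference that can be embedded in the package(s)."""
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


# All values below are made-up placeholders for illustration.
record = ReleaseRecord(
    version="1.0.1",
    commit_hash="0123456789abcdef0123456789abcdef01234567",
    published_packages=[{
        "feed": "internal-feed",
        "name": "example-app",
        "version": "1.0.1",
        "sha256": hashlib.sha256(b"example package bytes").hexdigest(),
    }],
    tooling={"python": "3.11", "example-build-tool": "2.4.0"},
    output_sha256=hashlib.sha256(b"example build output").hexdigest(),
    process_links=["link-to-release-discussion"],
)
print(record.record_id())
```

Embedding `record_id()` in the package gives a single short reference that leads back to everything else in the record.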

Note that it is okay to use fully automated processes, with no human in the loop, to generate prerelease semantic versions for the packages/content that CI systems feed to the human-in-the-loop process(es). Those processes then determine which of the prerelease packages will be issued as non-prerelease publications.
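One way such automation might build a prerelease version is sketched below. The `ci_prerelease` function, the `ci` prerelease label, and the `sha` build-metadata label are assumptions chosen for illustration; SemVer only requires that the prerelease part follow a hyphen and the build metadata follow a plus sign.

```python
import re

# MAJOR.MINOR.PATCH with no leading zeros, as SemVer requires.
_SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
_COMMIT_HASH = re.compile(r"[0-9a-f]{7,40}")


def ci_prerelease(target: str, build_number: int, commit_hash: str) -> str:
    """Build a prerelease version such as '1.1.0-ci.42+sha.0123456'.

    `target` is the version this build is a candidate for. The prerelease
    part ('ci.<build number>') makes it sort before the final release, and
    the build metadata ('sha.<short hash>') records the commit without
    affecting precedence.
    """
    if not _SEMVER_CORE.fullmatch(target):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {target!r}")
    if build_number < 0:
        raise ValueError("build number must not be negative")
    if not _COMMIT_HASH.fullmatch(commit_hash):
        raise ValueError(f"not a lowercase hex commit hash: {commit_hash!r}")
    return f"{target}-ci.{build_number}+sha.{commit_hash[:7]}"


# The commit hash below is a made-up placeholder.
print(ci_prerelease("1.1.0", 42, "0123456789abcdef0123456789abcdef01234567"))
```

Promoting a candidate then means republishing the same build output under the plain target version, with the decision recorded by the human-in-the-loop process.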

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange