Question

I am part of a small team which develops several internal apps for our company. We are in the process of becoming more Agile, which especially includes lots of automated testing. For the one or two apps we have worked on most recently, we now generally make small, reversible changes, run lots of tests, and deploy quite fast with little human intervention.

I would consider us still a long way off from doing 'real' CI. For the one or two apps mentioned above we could probably get very close to it pretty soon. However, I am having a hard time imagining what our setup will look like once we have started using CI for most of our legacy code.

Suppose we have several standalone apps, which all can be deployed independently to different servers. We also have some shared code which is used by many of them, and which we want to use consistently between them. This includes utility functions, code which enforces things like a consistent interface over some parts of all apps, and the ORM definitions for our database (shared across all apps).

I see two alternatives, neither of which looks very easy or elegant:

  • All code gets merged into one huge repo. Deployment means running all the tests for all the code, integration tests for everything and acceptance tests for all the apps, before rolling out everything in one go. This makes deployment a much Bigger Deal than it was before, contrary to the philosophy of CI, which says it should be fast and easy. It also means we have no separation between different parts of our codebase: pieces that never work together end up living in the same repo forever, just because they both depend on some third thing.
  • We keep each app and each component of the shared code separate. Deployment means testing the latest version of one component extensively, before 'dropping it in' to a working system consisting of all the other components. This seems like a cleaner design. However, it implies that we have to manage dependencies and versioning for all these things. Every integration/acceptance test has to have some foreknowledge of which versions of the other components it will be used with and can tolerate. In other words, although every component becomes a lot more reliable, we have to worry far more about how the pieces fit together, and about integration bugs. When the pieces everything else depends on change, there is potential for breakage everywhere.

The way out of this dilemma is probably to do a limp version of CI, where the individual components get tested extensively and deployed quickly, but we stick to having big 'flag day' releases of the utilities and database schema. But it really seems like there are lots of benefits of proper CI/CD which we would be missing out on. It isn't an aesthetically pleasing solution, and we aren't looking to satisfy our bosses that we have ticked a box, but we want to improve our working practices.

How should we organize the code for proper CI, and what are the key lessons to learn about planning and designing the architecture, both for legacy and future newly written code?


Solution

Imagine your code not as a monolithic system, but rather as a series of packages. Some packages depend on others. Some, such as jQuery, are external to your company; others are developed by your company and made public; others, developed by your company, are eventually made private.

For instance, if you develop a web application in Python, you may have a dependency on Flask—a popular Python web framework—and on a bunch of other packages, both external and internal.

What happens to your CI when the developers of Flask release a new version? Right, nothing. It's up to you to go and edit your project file to say that from now on you are using version y of Flask rather than version x. Once you do that, CI considers that your project changed, and launches all the tests which ensure that the application still works with the new version of Flask. If it doesn't, well, you'll probably fix your application, except in the very rare cases where you actually reveal a bug in Flask itself.
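As a concrete sketch, in the Python world that "project file" might be a pip requirements file, and the upgrade is a one-line edit (the package names and versions below are purely illustrative):

```text
# requirements.txt (hypothetical pins)
Flask==2.3.3        # was Flask==2.2.5; bumping this pin is the change CI reacts to
requests==2.31.0
```

Committing that one-line change is what triggers the pipeline; nothing happens automatically when Flask's maintainers publish a release.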

The same logic can be applied to any code produced within your company. A shared library becomes a Python package, either shared through PyPI—the public package repository for Python—or stored on a private pip server which can be used only within your company.
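A minimal sketch of what such a shared package might look like, assuming a modern `pyproject.toml`-based layout (the name `acme-orm` and every version number here are invented for illustration):

```toml
# pyproject.toml for a hypothetical internal shared library
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "acme-orm"                    # invented name for the shared ORM definitions
version = "1.4.0"
dependencies = ["SQLAlchemy>=2.0"]   # whatever the library itself needs
```

Apps then depend on it like any external package, e.g. `pip install acme-orm --index-url https://pypi.internal.example/simple/`, where the index URL is a placeholder for your private server.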

It then makes it particularly clear what broke the build. When you publish the new version of a package A, CI runs the corresponding tests and indicates whether the version contains regressions or not. Then, if a problem is encountered at the stage when you ask a package B to use the new version of the package A, it's the package B which broke the build by being incompatible with the new version of the package A: in the same way, your app may not be compatible with a newer version of Flask or jQuery.

Notice that you don't really have to manage dependencies yourself: the packaging system does it for you. The only task which requires your intervention is updating the references, that is, telling a given package to use a different version of another package: if you modified a package which is used a lot in your code base, it may take a while to track down and modify all the projects which use it.
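Tracking down those projects can itself be automated. Here is a small sketch, assuming each project is a subdirectory with a pip-style `requirements.txt`, that reports which projects pin a given package and at which version (the layout and the helper name are assumptions, not part of any standard tool):

```python
import re
from pathlib import Path

def projects_pinning(package: str, root: Path) -> dict:
    """Return {project_name: pinned_version} for every project under
    `root` whose requirements.txt pins `package` with `==`."""
    pattern = re.compile(rf"^{re.escape(package)}==(\S+)", re.IGNORECASE)
    pins = {}
    for req in root.glob("*/requirements.txt"):
        for line in req.read_text().splitlines():
            match = pattern.match(line.strip())
            if match:
                pins[req.parent.name] = match.group(1)
    return pins
```

Running something like this over a checkout of all your repositories gives you the list of references that still need updating after a release.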

As for version control, it really doesn't matter. You may have one team working with Git and another working with SVN. As soon as the two teams agree on using PyPI or specific pip server(s)¹ for internal packages, everything will be all right. Similarly, you shouldn't care whether the developers of Flask or jQuery use Git, SVN, SourceSafe or TFS.

Note: I've used an example from the Python world, but the same logic applies to other languages as well. There is npm for Node.js, NuGet for .NET Framework languages, etc. You can set up private package repositories for those as well.

Further reading: Packages, dependencies and interaction between teams, my article based on this question.


¹ Nothing forces you to have a single package server in your company. Different teams can deploy their own servers; the only constraint is that other teams should know the location of the server in order to use the packages.

Other tips

CI systems support the organization of the work that they do into projects. So, you can create as many different integration projects as you need, and each one of them will have a completely different set of preferences, including, of course, which repository root to check out from, where to deploy, etc.
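For example, each repository can carry its own small pipeline definition; the sketch below uses GitHub Actions syntax, but any CI system has an equivalent (the steps and versions are illustrative, not prescriptive):

```yaml
# .github/workflows/ci.yml in one app's repository
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest
```

Each app's project checks out only its own repository, installs its pinned dependencies (including internal packages from your private index), and runs its own tests.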

That having been said, if you have modules that have tight dependencies on each other, then these modules should ideally be on the same repository.

By "tight dependencies" I do not mean something specific, but as a ballpark rule I would say that if you need to perform integration testing (as opposed to unit testing) between them, then that's probably worth calling it a tight dependency.

By "ideally" I mean that the benefits of keeping them in the same repository usually outweigh the disadvantages. But of course you know your code better, so it is up to you to decide.

You could also work with a "lineup" of all your individual component repositories (your 2nd alternative) treated in a monolithic manner (see my answer to this question for a possible approach), handling them as a single mega-project, which in a sense would allow you to sidestep the issue of managing the dependencies between components (much more complex, IMHO).

Every change to the shared code components would be considered to impact all the components that depend on them. This would allow you to focus on testing the actual apps instead of doing extensive testing on the shared components (which, IMHO, can't by itself guarantee anything for the apps using those shared components).

License: CC-BY-SA with attribution