How to set up our codebase for efficient code sharing and development?

https://softwareengineering.stackexchange.com/questions/421196

21-03-2021

Question

Our situation

At first, our company had 1 product. Custom hardware with firmware we wrote ourselves.

Now more projects are starting to be added. Many can reuse most of the components of our first product, but of course the business logic is different. Also the hardware could change, and the remote device monitoring interfaces, as the sensors and available data could change.

Now we are looking at how to structure and manage our codebase. Currently we are leaning towards making a repository that will include all the non-project-specific firmware code. This includes battery management, remote device management skeleton, hardware drivers, etc. Everything that the different projects may share. This way, fixes and new features for these modules only need to be committed once.

Furthermore, we would create repositories per project, where the project-specific code is stored.

I think this is called multi-repo.

My thoughts

1. Project setup and management becomes harder (it would e.g. perhaps need a script to get the right version of the non-project-specific repo)
2. Each project can have its own rules (branching strategy)
3. We would have to setup CI for each extra repo (build validation, code style, policies)
4. Because of 1-3, would monorepo be better? Won't build validation and such become a lot harder because not all code is meant to be together (e.g. different projects)? How do we keep our freelancers out of the code they don't need?
5. Are there other (better) alternatives in our case?

Solution

If you've never seen or heard about snow in your life, and I put you in the middle of a snowy field, you're going to be asking yourself things that seem naive to someone who has a modicum of experience with snow.

That is what you've done here. You come from what most of us would call a very outdated concept, you're on the cusp of entering our world (i.e. that of modern-day development standards), and you're asking questions from a "funnily outdated" perspective, worrying about things that we don't even have to think about.

It's perfectly understandable to not really "get it" right away, as you haven't worked in this new system yet and don't quite see how it balances itself. But it's also hard to respond to every possible question you come up with, because the premise of some of them is very misguided and it requires a repeated back and forth to find out where the premise went off the rails.

I did my best to answer your specific worries below, but it often boils down to "believe me that it is X". It is good to be a critical thinker, and you've put your due diligence in this question. But it's also good to sometimes realize that you don't quite understand something yet, and thus rely on the fact that (a) others tell you it is good and (b) many, many others are using and advocating for this system you're inexperienced with.

Project setup and management becomes harder (it would e.g. perhaps need a script to get the right version of the non-project-specific repo)

Quite logically, having more than one repository brings with it the overhead cost of having to handle multiple repositories. That is inevitable logic.

But without a shadow of a doubt, the overhead cost is massively worth what you get in return. There's a reason why pretty much every modern development team splits its code across repositories. It cuts down on so many conflicts, so much repetition, and so much code juggling.

Each project can have its own rules (branching strategy)

Unless a project has a concrete reason to follow a different branching strategy, I strongly suggest that every project follows the same one.

Uniformity brings with it an innate understanding of how things work even when you're new to the project. If you've worked with projects A, B and C, all with the same structure and branching strategy, then you're going to be able to hit the ground running when you start on project D with the same structure and branching strategy.

Note that a lot of modern-day development principles are all about reducing complexity as much as possible to keep things manageable. Having wildly varying branching strategies is an added complexity that you don't need.

We would have to setup CI for each extra repo (build validation, code style, policies)

You said you were going from one company project to now having multiple projects. Having multiple build pipelines was always going to happen.

Note that coding style validation should be uniform across projects (see previous point), and potentially whatever you mean by "policies" as well.

Secondly, having extra pipelines isn't particularly an issue. They're mostly copy/pasteable, with some small alterations between projects. It's usually a matter of setting one up once and then potentially tweaking it a handful of times.

Setting up a build pipeline for a project is not a meaningful amount of work relative to the development of the project itself.
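To illustrate what "mostly copy/pasteable, with some small alterations" can look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the step names, the style file name, the project names and the `pipeline_for` helper are hypothetical, and a real CI system would have its own configuration format.

```python
import copy

# Hypothetical shared defaults; step names and the style file are placeholders.
SHARED_PIPELINE = {
    "steps": ["checkout", "build", "unit_tests", "style_check"],
    "style_config": "company-style.cfg",
}


def pipeline_for(project, overrides=None):
    """Copy the shared pipeline and apply a project's small alterations."""
    config = copy.deepcopy(SHARED_PIPELINE)
    config["project"] = project
    config.update(overrides or {})
    return config


# One project uses the defaults as-is, the other appends one extra step.
product_one = pipeline_for("product_one")
product_two = pipeline_for(
    "product_two",
    {"steps": SHARED_PIPELINE["steps"] + ["hardware_smoke_test"]},
)

print(product_one["steps"])
print(product_two["steps"])
```

The point of the sketch is only that the style check and the bulk of the steps stay identical across projects, so each extra pipeline is a small delta rather than new work.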

[..] Won't build validation and such become a lot harder because not all code is meant to be together (e.g. different projects)?

Your end products, i.e. the solutions that are the deliverable project, will still run their build including the common libraries (i.e. your non-project-specific code).

So what you're worried about is a non-issue. The build of a project still validates the entire stack of code, as it should.
The build of your non-project-specific code obviously doesn't include any project-specific code, because it is specifically project-agnostic. But that's the idea, so not an issue.

[..] How do we keep our freelancers out of the code they don't need?

I don't quite understand. How would you be doing it on a monolithic repository?

Having separate repositories enables you to provide access control to specific repositories. It doesn't create a problem here, it provides a solution.

Because of 1-3, would monorepo be better?

Other than the feedback I already provided on points 1-3, monorepo is just not a good approach.

Every time we develop a new technology, we first make it monolithic. The first car had its parts welded together. The first computer had no discrete components and was just one big circuit. The first application was a single-file single-project application. The first code was not OOP and instead used global statics.

And with all those cases, once they started improving on it, they noticed that they needed to subdivide things more, so that things would be easier to build, easier to fix, and easier to configure.

Monorepo (for multiple projects with an independent life cycle) has many, many disadvantages. You're not really perceiving them because they're currently your daily bread and you've got a handle on them. But take it from me who's standing on the other side of the bridge, that those disadvantages will dissipate when you move over. In exchange, there'll be some overhead management required, but the good outweighs the bad by several orders of magnitude.

Currently we are leaning towards making a repository that will include all the non-project-specific firmware code.

You've not really discussed how your projects are going to reference your libraries. I suspect you're thinking of having developers (and the build server) check out both the project and library repos, and link them directly.

An example to show why that's a bad idea: Project A uses library A v2.0, but project C still uses library A v1.0 (and cannot upgrade). So any developer working on both A and C is going to have to constantly check out the correct version of the library in order to keep their project working.

What you want instead is to have a versioned release system. Essentially, a "release collection" where you find all released versions of all of your libraries.
As a .NET developer, I use NuGet, which does precisely this: it's an online collection of published releases. But in its very essence, you could even just have a shared network drive that houses the DLLs.

In the end, the most important part is that this resource has all published versions of your library available. This way, projects A and C can each reference their own specific library versions, without the developer needing to constantly hop from one to the other.
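As a rough illustration of the idea, here is a minimal Python sketch of such a "release collection". It is not how NuGet or any real package feed works internally; the `ReleaseCollection` class, the artifact paths and the version numbers are all hypothetical, and the library name is just borrowed from the battery management module mentioned in the question.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseCollection:
    """In-memory stand-in for a package feed or shared drive of published builds."""

    # Maps library name -> {version string -> artifact location}.
    releases: dict = field(default_factory=dict)

    def publish(self, library, version, artifact):
        versions = self.releases.setdefault(library, {})
        if version in versions:
            # Published versions are immutable; a change needs a new version number.
            raise ValueError(f"{library} {version} is already published")
        versions[version] = artifact

    def resolve(self, pins):
        """Return the artifact for each pinned library, failing on unknown pins."""
        resolved = {}
        for library, version in pins.items():
            try:
                resolved[library] = self.releases[library][version]
            except KeyError:
                raise LookupError(f"{library} {version} has not been published") from None
        return resolved


collection = ReleaseCollection()
collection.publish("battery_management", "1.0.0", "releases/battery_management-1.0.0.tar.gz")
collection.publish("battery_management", "2.0.0", "releases/battery_management-2.0.0.tar.gz")

# Each project keeps its own pins under version control (hypothetical values).
project_a_pins = {"battery_management": "2.0.0"}
project_c_pins = {"battery_management": "1.0.0"}

print(collection.resolve(project_a_pins))
print(collection.resolve(project_c_pins))
```

Because every published version stays available, a developer can build project A and project C side by side without touching the library repository at all.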

One more thing to consider: should you do one repository per library or one repository for all of them?
Well, here is where we get to the part where we do consider the overhead cost. If these libraries are small, and the total repository size would not cause problems, then you can argue that the cost of having separate repositories for every tiny library is becoming silly.

However, each individually published library should have its own build pipeline, since it has its own life cycle and release schedule. You wouldn't want to have to rebuild and release your entire collection just because one of them had a minor change. That would lead to many "unchanged" new releases of the other libraries.

In the end, you should theoretically split up your published libraries into repositories of their own, but this can be ignored for sufficiently tiny libraries. In all cases, have a separate build/release pipeline for each library, and host all releases on a commonly available resource to minimize the hassle when consuming these libraries.
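If you do keep several small libraries in one repository, the per-library pipelines need some way to tell which library a change belongs to. A minimal Python sketch of one way to do that, assuming a hypothetical layout where each library lives under `libs/<library>/` (the layout, the function name and the file paths are all made up for illustration):

```python
from pathlib import PurePosixPath


def libraries_to_release(changed_files, libs_root="libs"):
    """Return the names of libraries that own at least one changed file."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect paths shaped like libs/<library>/...; ignore everything else.
        if len(parts) >= 3 and parts[0] == libs_root:
            touched.add(parts[1])
    return touched


# Hypothetical list of files changed by a commit.
changed = [
    "libs/battery_management/src/charger.c",
    "libs/battery_management/README.md",
    "docs/overview.md",
]

print(libraries_to_release(changed))
```

Only the libraries returned here would get a new build and release; everything else keeps its existing published version.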

Licensed under: CC-BY-SA with attribution
