Architecting project into multiple source control repositories

https://softwareengineering.stackexchange.com/questions/423726

24-03-2021

Question

I'm doing some work with my team to refactor/rearchitect some parts of our existing codebase which consists of two separate Django apps hosted in one common project repository. We're starting work on a new app and at this point want to break down the project into more useful components.

Currently, our presentation layers (Django templates and views, etc) are separated from their domain logic and DB models by residing in different paths in the code. Further, the shared code they both use exists in another distinct path.

We're considering further partitioning these parts into their own distinct source control repositories and having the CI system assemble the output project automatically. The suggestion is this:

Shared domain logic repository (core unit tests here)
App-specific domain logic repository (domain unit tests here)
App-specific presentation layer repository (functional tests here)
General app presentation layer repository
Host layer (out-of-the-box Django host, possibly its own repo)

It's been suggested to keep the 2 app-specific layers together. The pros of splitting them up are mainly separation of concerns and the idea that the core logic doesn't need to know anything about how it's presented, which makes it good for unit tests and even a CLI wrapper. The con is that, as separate repositories, the commit history would be split up, which could make tracing changes more difficult.

Will splitting up a large project along these lines result in code that is easier to maintain and reuse or will it add unnecessary complexity? Are there other considerations for how to divide things up?

UPDATE:

It seems like splitting up the app-specific layers into their own repositories does not make sense so we're not going to do that. Those two parts are tightly coupled and no value seems to come from managing them independently.

The shared domain logic is a core library with code common to all projects and it already exists as an isolated piece of code. The new project is for a different part of the institution and needs to be in its own repository (because the development may be taken over by another team in the future and they have no interest in the other two projects). The options are either to create a copy of the common code in the other repo or to migrate it to its own repo and have the build process pull it in per build. Duplicating the code would work but removes the value of reuse.

The general app presentation layer is the common authentication, templates, HTML, etc. used by all three projects. Again, we can duplicate it into the new repository. That is the easier thing to do, but it loses more code reuse: any change would then have to be made in (at least) two separate locations.

UPDATE 2:

The discussion here clarified a few things:

Minimizing the number of distinct source control repositories is preferred.
If a library truly is an independent entity only then does it belong in its own repo.
There are other approaches which can accomplish separation and reuse while keeping things together.

With this insight the distinct parts can be broken into just two repositories. The core library with the most generic common code, and which isn't "owned" by any single project, will go into its own repository. The rest of the project(s) can remain in their own repositories. A template can be set up to quickly clone the general structure for each new project.

Solution

You wrote

shared domain logic is a core library with code common to all projects and it already exists as an isolated piece of code

Ask yourself, how much isolation does this lib have:

does it have its own life cycle, apart from the projects / programs / layers using it?
is it developed, tested, versioned and deployed completely on its own, and the other projects use only those stable, tested and deployed versions, no intermediate ones (like a third-party lib)?
could there be a team on its own which is responsible for this library alone (like a third-party vendor), not directly involved in the development of the other projects?

If you can answer all three of these questions with "yes", then it makes sense to put the core library into a separate repo. Otherwise, do yourself a favor and keep everything in one.
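The rule above can be written down as a small check. This is only a sketch: the class and field names (`LibraryProfile`, `has_own_life_cycle`, and so on) are made up for illustration and are not taken from the asker's codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryProfile:
    """Answers to the three isolation questions (field names are illustrative)."""
    has_own_life_cycle: bool       # life cycle apart from the projects using it
    released_independently: bool   # consumers only use stable, released versions
    could_have_own_team: bool      # a separate team could own it, like a vendor


def belongs_in_own_repo(profile: LibraryProfile) -> bool:
    # Only a "yes" to all three questions justifies a separate repository.
    return all((
        profile.has_own_life_cycle,
        profile.released_independently,
        profile.could_have_own_team,
    ))


# Hypothetical case: consumers still pick up intermediate, unreleased changes,
# so the second answer is "no" and the library stays in the main repo.
core_lib = LibraryProfile(
    has_own_life_cycle=True,
    released_independently=False,
    could_have_own_team=True,
)
print(belongs_in_own_repo(core_lib))  # False
```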

OTHER TIPS

We're considering further partitioning these parts into their own distinct source control repositories and having the CI system assemble the output project automatically. The suggestion is this:

I would stick with a single repository in this case because it sounds like these separate repos must stay in sync with each other.

The pros of splitting them up are mainly separation of concerns and the idea that the core logic doesn't need to know anything about how it's presented

What does the repo structure have to do with separation of concerns? How will pulling code into different repos affect your current issues with separation of concerns? You don't need separate repos to ensure the business logic and UI logic are separated. The code can still be heavily coupled even in separate repositories.
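As a minimal sketch of that point, the layers below could live side by side in one repository and still be cleanly separated. All function names are hypothetical, and plain functions stand in for Django views and templates: the domain function knows nothing about HTML or the terminal, and each presentation function only calls into it.

```python
# --- domain layer: no knowledge of HTML, Django, or the terminal ---
def order_total_cents(prices_cents: list[int], tax_percent: int) -> int:
    """Return the total in cents, tax included (integer maths avoids float drift)."""
    subtotal = sum(prices_cents)
    return subtotal * (100 + tax_percent) // 100


def format_cents(amount_cents: int) -> str:
    """Format an amount in cents as a decimal string, e.g. 1860 -> '18.60'."""
    return f"{amount_cents // 100}.{amount_cents % 100:02d}"


# --- presentation layer (web): depends on the domain layer, not vice versa ---
def render_total_html(prices_cents: list[int], tax_percent: int) -> str:
    total = order_total_cents(prices_cents, tax_percent)
    return f"<p>Total: {format_cents(total)}</p>"


# --- presentation layer (CLI wrapper): reuses the same domain function ---
def render_total_cli(prices_cents: list[int], tax_percent: int) -> str:
    total = order_total_cents(prices_cents, tax_percent)
    return f"Total: {format_cents(total)}"


print(render_total_html([1000, 550], 20))  # <p>Total: 18.60</p>
print(render_total_cli([1000, 550], 20))   # Total: 18.60
```

The separation here comes from the direction of the dependencies, not from where the files are stored. Moving the domain functions to another repository would not change what they depend on.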

Will splitting up a large project along these lines result in code that is easier to maintain and reuse or will it add unnecessary complexity?

Again, how does the repo structure affect maintainability? If your code is already highly coupled and unmaintainable, how will pulling it into different repos change that?

Currently I see no advantages to separating the repo given your post. I see several disadvantages caused by the added complexity.

Are there other considerations for how to divide things up?

Yes. You should divide repos by what is expected to be out-of-sync. An easy way to tell is to look at which things have separate release schedules. For example, back-end and apps are almost always separate repos on separate release schedules.

You've used the words Shared domain logic, which suggests there is significant shared logic in your codebase. It sounds like this shared domain logic will affect the release schedule of client-specific code.
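The "divide by release schedule" rule of thumb can be sketched as a grouping step. The component names and schedule labels below are invented for illustration, and `mobile_app` is a hypothetical extra component that is not part of the asker's project.

```python
from collections import defaultdict


def group_by_release_schedule(components: dict[str, str]) -> dict[str, list[str]]:
    """Group component names by release schedule; each group is one candidate repo."""
    repos: dict[str, list[str]] = defaultdict(list)
    for name, schedule in components.items():
        repos[schedule].append(name)
    # Sort names so the result does not depend on insertion order.
    return {schedule: sorted(names) for schedule, names in repos.items()}


# Hypothetical mapping of component -> release schedule.
components = {
    "shared_domain_logic": "web-release",
    "app_domain_logic": "web-release",
    "app_presentation": "web-release",
    "mobile_app": "app-store-release",
}

for schedule, names in group_by_release_schedule(components).items():
    print(schedule, names)
```

Under these assumed labels, everything that ships with the web release lands in one group, shared domain logic included, and only the component with its own schedule is split out.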

EDIT in response to comments

so the idea is to pull those out into separate repos which the CI can build as a single output

If it will produce a single output then it needs to be one repo. A single output means the repos would certainly have to stay in sync.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange