One huge git repository: How to handle releases and branching?

https://softwareengineering.stackexchange.com/questions/341920

07-01-2021

Question

Imagine the following git repository:

/
/ProductA/product_a_main.py
/ProductB/product_b_main.py
/SharedLibrary/shared_library.py

SharedLibrary is used by both ProductA and ProductB.
ProductA and ProductB have different release schedules.

Any advice on how to handle branching in this scenario? I imagine one model would be to have two master branches: master_a and master_b.

Solution

Keeping the three projects in separate repositories will help release management (as you have noticed), and it will also be nice for development. A developer working on the shared library doesn't need to know a thing about ProductA or ProductB, which will help keep the library at a good level of abstraction (especially if you end up creating a ProductC). Developers working on one of the products don't need to know about the other product, and will be encouraged to avoid changes to the shared library as a quick hack to get something working. I've seen some really bad API design when a new method is added every time a client developer wants a change. Product developers can also more easily update the library without pulling in unrelated changes to their own product, or keep the library on an old version while debugging changes.

If you have a good test environment and use continuous integration, keeping the repos separate will also make it easier to run your tests. When you make a change to ProductA, you don't want to run all of the tests from ProductB and SharedLibrary.
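To make the test-selection point concrete: in a single repository, CI would need something like the path-based selection below to avoid running every suite on every change. This is only a minimal sketch. The directory names come from the question; `DEPENDENTS`, `component_of`, and `suites_to_run` are hypothetical names, and the list of changed paths is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Which components must be re-tested when a given component changes.
# Assumption: both products depend on SharedLibrary, as in the question.
DEPENDENTS = {
    "ProductA": {"ProductA"},
    "ProductB": {"ProductB"},
    "SharedLibrary": {"SharedLibrary", "ProductA", "ProductB"},
}


def component_of(path):
    """Return the top-level component of a repo-relative path, or None."""
    parts = PurePosixPath(path).parts
    # A file directly in the repo root belongs to no component.
    if len(parts) < 2:
        return None
    return parts[0] if parts[0] in DEPENDENTS else None


def suites_to_run(changed_paths):
    """Map changed file paths to the set of component test suites to run."""
    suites = set()
    for path in changed_paths:
        component = component_of(path)
        if component is None:
            # Unknown location: be conservative and run everything.
            return set(DEPENDENTS)
        suites |= DEPENDENTS[component]
    return suites
```

With separate repositories none of this bookkeeping is needed, since each repository simply runs its own suite.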

The biggest potential issue I can see is code breaking when the project and library versions are mismatched. That would be avoided by keeping them in one repository (so changes to both can be made in the same commit). If the projects and library are changing rapidly together, it might be worth it to keep them in one repo.
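With separate repositories, one way to catch such a mismatch early is for each product to declare the library versions it supports and check them at startup. The sketch below assumes plain dotted numeric version strings such as `1.4.2`; the original does not describe a versioning scheme, and `parse_version`, `is_compatible`, and `check_library` are hypothetical helpers.

```python
def parse_version(text):
    """Turn a string like '1.4.2' into a comparable tuple (1, 4, 2)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(library_version, minimum, below):
    """True if minimum <= library_version < below."""
    version = parse_version(library_version)
    return parse_version(minimum) <= version < parse_version(below)


def check_library(library_version, minimum, below):
    """Raise early instead of failing later with a confusing error."""
    if not is_compatible(library_version, minimum, below):
        raise RuntimeError(
            f"SharedLibrary {library_version} is outside the supported "
            f"range [{minimum}, {below})"
        )
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" would sort before "1.9.0".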

As you mentioned, Google and Facebook use the single-repository strategy, since they have lots of developers working across projects and libraries. However, they also invest in lots of powerful build and test servers, and need changes to propagate as quickly as possible. If a developer improves the efficiency of MapReduce, the costs of waiting for every other team to pull that change are pretty big.

If you want to keep everything in sync at all times, and are ok with managing build/test times, then one repository is all you need. If changes to the library will generally not break project code, changes to projects will not usually require changes to library code, and you want to keep things simple, then go for separate repositories. It's easier to merge repositories than split them apart, so as a rule of thumb I would recommend starting with separate repositories until you run into problems.

One thing to note is that git's submodule system can be problematic. I've never worked with it, but I have seen complaints about it, as well as more balanced (though still cautious) advice on deciding whether or not to use git submodules.

Other tips

Ideally, this would be split into three repos, which could each have their own branches and release process. You could even keep one "main" repo with submodule references to the other three, if you really wanted that.

Otherwise, I think you might want to have three master branches (for ProductA, ProductB, and SharedLibrary), one for each would-be repo, and create releases from them. You might also want a develop branch off of each of these masters, and then feature branches coming out of the develop branches. It would be much neater and cleaner with separate repositories - does it really need to be done this way?
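As a rough illustration of the single-repository layout just described, the sketch below spells out one possible naming scheme for the per-component branches. The `master_a` and `master_b` names follow the question; the `shared` suffix, the `develop_*` and `feature_*` names, and all of the function names are assumptions made for the example.

```python
# Hypothetical short suffixes; master_a and master_b follow the question.
SUFFIXES = {"ProductA": "a", "ProductB": "b", "SharedLibrary": "shared"}


def master_branch(component):
    """Branch that releases of this component are created from."""
    return f"master_{SUFFIXES[component]}"


def develop_branch(component):
    """Integration branch that sits below the component's master."""
    return f"develop_{SUFFIXES[component]}"


def feature_branch(component, name):
    """Short-lived branch cut from the component's develop branch."""
    return f"feature_{SUFFIXES[component]}/{name}"


def branch_chain(component, feature):
    """Branches a feature passes through on its way to a release, in merge order."""
    return [
        feature_branch(component, feature),
        develop_branch(component),
        master_branch(component),
    ]
```

Three components already mean at least six long-lived branches in one repository, which is part of why separate repositories look neater.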

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange