How do I keep two git projects in sync with each other?

https://softwareengineering.stackexchange.com/questions/384998

18-02-2021

Question

I'm developing a Python library, and I'm also developing some code that uses it. Currently they are in the same git repository, but I want to separate out the library part into a separate repo, in preparation for eventually* releasing it.

However, I'm unsure of the right way to work with two repositories. The library has a complex API that's likely to change a lot as I develop it, and at this stage of the project I'm usually developing new features simultaneously with code that uses them. If I need to restore things to a previous working state, I will need to roll both projects back to previous commits - not just the last working state for each project individually, but the last pair of states that worked together.

I am not a very advanced git user. Up to now I have used it only as an undo history, meaning that I don't use branches and merges, I just do some work, commit it, do some more work, commit that, and so on. I am wondering what's the minimal change to this workflow that will allow me to keep both projects in sync without worrying.

I'd like to keep things as simple as possible given that I'm a single developer, while bearing in mind that this project is likely to become quite complex over time. The idea is to make small changes in order to minimise disruption at each step.

* A note on what I meant here: the project is nowhere near a releasable state, and I'm not planning to release it in anything like its current state. ("Repository" in this context means "folder on my hard drive with a .git in it", not a public repository on GitHub.) I asked this question because I thought that putting it in a separate repository would be something I needed to do early on for the sake of my own management of my code, but the answer from Bernhard convinces me this is not the case.

Solution

"... but I want to separate out the library part into a separate repo, in preparation for eventually releasing it."

Separating the library has a few implications. One of them is that others will use your library as well. At that point, you cannot just roll back changes in your library, as that may have unexpected effects on other projects that use it. In the project using it, you can still roll back whatever you want, and in the worst case you can depend on an older released version of the library. If you are rolling both back at the same time, do you really have separate repositories, or are you just making life more complicated?

"The library has a complex API that's likely to change a lot as I develop it"

Changing the API continuously is going to be a huge pain in the neck in the long term. Interfaces are rigid. Changing an interface should not be taken lightly, because of the downstream impact on other projects. When releasing a new version of the library, you have to think about backwards compatibility too.

Considering that you are the sole developer at the moment, I recommend doing one of these:

Keep the library and the project in a single repository until you find that the interface is stable enough. Then separate it out when you can make a first released version.
Don't develop the library and the code using it at the same time. Write unit tests that test each method your interface offers in isolation. When everything works as desired, start working on the project that uses it.
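
As a rough illustration of the second option, here is a minimal sketch of testing one interface function in isolation with Python's standard unittest module. The function normalise_name is purely hypothetical and stands in for whatever your library actually exposes; run_tests is just a helper invented for this example.

```python
import unittest


# Hypothetical stand-in for one function of the library's public interface.
def normalise_name(name):
    """Strip surrounding whitespace and lower-case a name."""
    return name.strip().lower()


class NormaliseNameTest(unittest.TestCase):
    """Exercises normalise_name on its own, without the project that uses it."""

    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalise_name("  Alice "), "alice")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalise_name(""), "")


def run_tests():
    """Run the test case and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseNameTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_tests()
```

In a real project the tests would normally live in their own file next to the library and import it, rather than defining the function inline as done here.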

Other answers

The other answer got into the process, but didn't mention the mechanics of how to do what you want. There are a couple of easy ways to consume a library in git: as a submodule or as an external library (e.g., a NuGet package).

A submodule can be used when you have access to the repo the library sits in. The submodule is a pointer to a specific commit in the library repo, and it essentially copies the library's code into your project, which you then build as normal. If you ever want to update the library to a newer (or older) version, just update the pointer to a different commit.

As an external library, you can keep all the old 'releases' around, so you can roll back whenever you want. With, e.g., a private NuGet feed, you have access to any or all of those old versions, allowing you to install whichever version you want.

Submodules can be tricky, and not all git UI tools support them (or support them well). But they do provide quite a bit of flexibility, especially as things are changing frequently.

I would not separate those until I really need to do that - in your case, probably when you want to release the library and publish the source code. You might want to keep the library source in a different directory from the rest, to make that separation easy. When you release it, you can just copy the source of the library into a fresh git repository, and then, and only then, maintain it separately, making new releases when changes are necessary.

If you want to transfer the history of the library's source into that new git repository for some reason: it is possible to clone your original git repository into the library's git repository, and then delete all other directories from that new repository and its history. But that would be advanced git use - you might want help from someone who knows git well.

Once you release your library, the API is basically set in stone. From then on it should not change much.

One way to manage this is versioning. Use minor version numbers for adding features without changing the existing API, which preserves backward compatibility. Use major version numbers when the API changes.

That way you can always say: my program works with library xyz version 2.1.1, 2.1.2, 2.2.1 but not 3.0.1.
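
To make that rule concrete, here is a small sketch of a compatibility check. It assumes plain MAJOR.MINOR.PATCH version strings and one particular rule (same major version, and not older than the version the program was written against); the names parse_version and is_compatible are made up for this example and are not part of any packaging tool.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(installed, required):
    """Return True when `installed` can stand in for `required`.

    Assumed rule: same major version, and installed is not older than required.
    """
    installed_v = parse_version(installed)
    required_v = parse_version(required)
    # Tuples compare element by element, so (2, 2, 1) >= (2, 1, 1) is True.
    return installed_v[0] == required_v[0] and installed_v >= required_v


# The program was written against 2.1.1; check the versions named above.
for candidate in ["2.1.1", "2.1.2", "2.2.1", "3.0.1"]:
    print(candidate, is_compatible(candidate, "2.1.1"))
```

The first three versions come out as compatible and 3.0.1 does not, because its major number differs.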

As mentioned in other answers and acknowledged by yourself in comments, the library's API might not be stable enough to release at the moment. I think choosing the right moment to separate the two is important, or it might create a maintenance headache.

I would just add that I think you should look at the problem a bit differently: if you separate the library from the project using it, you shouldn't need to keep the git projects in sync; you need to keep the main project in sync with the library's package. This is what package managers are made for - synchronizing dependencies between different libraries with different versions.

Once you decide to separate out the library, I think you should stop using it as code from a GitHub repo or local source and instead use it as a package, by releasing it to a package repository and installing it with a package manager. I haven't worked with Python in a while, but I think you should look into pip.
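
Once the library is installed as a package, the main project can ask which version is present instead of looking at a git checkout. A minimal sketch, assuming Python 3.8 or newer (for the standard importlib.metadata module); the package name "mylib" is hypothetical and stands in for whatever the library ends up being called.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "mylib" is a hypothetical package name standing in for the library.
found = installed_version("mylib")
if found is None:
    print("mylib is not installed")
else:
    print("mylib", found)
```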

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange