Question

I am trying to merge several git repos into a new repo, with each old repo as a subdirectory in the new repo. git-stitch-repo appears to be the tool I want.

However, the documentation is less than clear. I was able to follow it (https://metacpan.org/pod/distribution/Git-FastExport/script/git-stitch-repo) until the part which says "It is now possible to create the master branch and have it point at the right commit, and delete the two master-A and master-B branches."

What is the "right commit", and how do I execute these steps?

Right now I have several branches labeled master-A, master-B, etc., which seem to correspond to the original repos. What I really wanted was just a master branch that contains everything. It sounds possible to get there, but I don't know how.

No correct solution

OTHER TIPS

I'm the author of git-stitch-repo. Given your description, it is indeed the tool you're looking for.

Here's how the program works:

  • it looks at all commits obtained from the output of git fast-export for each repository to merge and builds a new stream for git fast-import
  • in the new stream, a commit can be given a commit from any of the other repositories as a parent
  • the program makes sure that if you filtered out all the commits from the other repositories, you'd get back the original repository

Because all repositories can have references with the same name, the program adds -A, -B, etc. at the end of the reference names, so you can tell which repository each reference in the new repository comes from. This is why you get master-A and master-B. Picking the "right" master just means deciding which of master-A and master-B is your new master.
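A minimal sketch of that last step, assuming one of the two tips already contains the other (this is not guaranteed). The throwaway repository below only stands in for a stitched result, and the folder_A and folder_B names are illustrative. The script keeps the tip that contains the other, names it master, and deletes the suffixed branches.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway stand-in for the stitched repository. In real use the history
# would come from something like:
#   git-stitch-repo ../A:folder_A ../B:folder_B | git fast-import
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master-A
mkdir folder_A folder_B
echo a1 > folder_A/a1.txt && git add . && git commit -q -m "A: first commit"
echo b1 > folder_B/b1.txt && git add . && git commit -q -m "B: first commit"
git branch master-B            # pretend B's history ends here
echo a2 > folder_A/a2.txt && git add . && git commit -q -m "A: second commit"

# Pick whichever tip already contains the other one.
if git merge-base --is-ancestor master-B master-A; then
    tip=master-A
elif git merge-base --is-ancestor master-A master-B; then
    tip=master-B
else
    echo "master-A and master-B have diverged; merge them first" >&2
    exit 1
fi

# Create master at the chosen tip, switch to it, and drop the suffixed branches.
git branch master "$tip"
git checkout -q master
git branch -q -D master-A master-B
```

If the script reports that the tips have diverged, one option is to check out one of them and merge the other before creating master.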

The biggest issue with the current version (I'm currently working on a fix) is that for every branch point in repository A, the system must decide on which side to attach the next commit from repository B (if there is actually a choice between several valid commits that respect the constraints of the stitching algorithm). If all options are valid, the choice is basically random. As a result, master-A and master-B can end up on different branches, which is usually not what you want. (This is what RT #70695 is all about.)

As I said above, I'm working on a fix, and I have good hope that it will resolve your issue. I expect to be able to do a new release within a few days.

If you don't mind having multiple roots, you can do this (per sub-project):

  1. go into the sub-project - we'll call it project 'foo'
  2. make a 'foo' folder in there (we want everything contained by project name)
  3. git mv everything at the root into that root 'foo' folder
  4. commit the changes
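The four steps above can be sketched as follows. This is a sketch under assumptions: the sample repository and its file names are made up, and it assumes the project has no top-level entry already called 'foo' and no file names containing newlines or characters that git would quote.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sub-project with a few top-level entries, including a dotfile.
work=$(mktemp -d)
git init -q "$work/foo"
cd "$work/foo"
git symbolic-ref HEAD refs/heads/master
mkdir src
echo readme > README
echo code > src/main.c
echo '*.o' > .gitignore
git add . && git commit -q -m "foo: initial commit"

# Steps 2 to 4: create foo/, move every tracked top-level entry into it, commit.
mkdir foo
git ls-tree --name-only HEAD | while IFS= read -r entry; do
    git mv -- "$entry" "foo/$entry"
done
git commit -q -m "Move everything into foo/"
```

Listing entries with git ls-tree rather than a shell glob picks up tracked dotfiles such as .gitignore and skips untracked files.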

Now in the master project (presuming you want to merge both 'master' branches):

  1. git remote add foo /path/to/foo/project
  2. git fetch foo
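  3. from the master project's master branch, run git merge foo/master (Git 2.9 and later refuse to merge unrelated histories by default, so add --allow-unrelated-histories)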

This will create a merge commit, and from that point on you'll have everything from the foo sub-project in a 'foo' folder in the master project. Before that merge point you'll be on either of the two separate trees. It doesn't make sense for their histories to intermingle, so this is a fine way to set things up, wherein the two trees only connect at the point you decided to merge them.
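A minimal end-to-end sketch of the merge, using two throwaway repositories in place of the real projects. The paths and file names are hypothetical, and the sub-project is assumed to have already been restructured so that its files live under foo/. The --allow-unrelated-histories flag needs Git 2.9 or later.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Sub-project whose content already sits in a foo/ folder.
git init -q "$work/foo"
git -C "$work/foo" symbolic-ref HEAD refs/heads/master
mkdir "$work/foo/foo"
echo lib > "$work/foo/foo/lib.txt"
git -C "$work/foo" add .
git -C "$work/foo" commit -q -m "foo: initial commit"

# Master project with its own, unrelated history.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo main > README
git add . && git commit -q -m "main: initial commit"

# The three steps: add the remote, fetch it, merge its master branch.
git remote add foo "$work/foo"
git fetch -q foo
git merge -q --allow-unrelated-histories -m "Merge sub-project foo" foo/master
```

After the merge, git log --graph shows two root commits joined by a single merge commit, which is the "multiple roots" layout described above.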

If you want all of the history of each sub-project to merge in...

This is quite a bit more difficult. The basic idea is that you'd choose the latest commit from the master project and rebase the entire sub-project onto that point, so the whole sub-project - i.e. all of its commits - grows out from that point. Then you'd merge the head of the sub-project back into the master project, maybe with --no-ff (no fast-forward) so you could see the whole sub-project in a commit bubble. This requires that all of the history of the sub-project be in the project-name subfolder (i.e. 'foo' from the first section).

This would require a ton of manual work, or something with git filter-branch that I'd have to work on for a while - I can't come up with it off the top of my head. I'm guessing it would be a --tree-filter, and for each commit you'd have to move everything that wasn't in the 'foo' folder into it. I don't know how this handles branches and merges in the project, but I suspect it works. I'll refrain from trying to figure this out in case you're fine with the first suggestion with multiple roots (I think it's the better option of the two).
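One possible shape for that --tree-filter, as a sketch only: it is run here on a small throwaway repository with made-up files, assumes no commit has a top-level entry already named 'foo', and has not been checked against complicated branch and merge histories. filter-branch rewrites every commit, so it should only be tried on a fresh clone.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skip the interactive warning that newer Git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway sub-project with two commits at the repository root.
work=$(mktemp -d)
git init -q "$work/foo"
cd "$work/foo"
git symbolic-ref HEAD refs/heads/master
echo one > a.txt && git add . && git commit -q -m "foo: add a.txt"
mkdir src && echo two > src/b.txt && git add . && git commit -q -m "foo: add src/b.txt"

# For every commit on every ref: create foo/ and move all other
# top-level entries (dotfiles included) into it.
git filter-branch -f --tree-filter '
    mkdir -p foo
    find . -mindepth 1 -maxdepth 1 ! -name foo -exec mv {} foo/ \;
' -- --all >/dev/null
```

Once every commit keeps its files under foo/, the rewritten history can be merged into the master project like any other unrelated history. The original refs are kept under refs/original/ until you delete them.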

If you don't mind multiple roots, but want to be able to send work back upstream...

If this is it for the sub-projects (they're going into this repo forever and will never see the light of day again), either of the first two variations would be fine. If you occasionally want to push things back upstream, you should look into git submodules and subtrees.

Submodules essentially let you clone a repo into a particular folder in your project. Your project only tracks which commit the submodule should be on. When you clone your main repo, the submodule folder will be empty. Run git submodule init to register the submodule, then git submodule update to pull in all the files.
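A small sketch of that submodule round trip, using throwaway local repositories with hypothetical names. The -c protocol.file.allow=always setting is only there because recent Git versions block local-path submodule clones by default. It should not be needed when the submodule lives at a normal remote URL.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# The sub-project that will become a submodule.
git init -q "$work/foo"
git -C "$work/foo" symbolic-ref HEAD refs/heads/master
echo lib > "$work/foo/lib.txt"
git -C "$work/foo" add .
git -C "$work/foo" commit -q -m "foo: initial commit"

# The main project, which records foo as a submodule in the foo/ folder.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo main > README
git add . && git commit -q -m "main: initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/foo" foo
git commit -q -m "Add foo as a submodule"

# A fresh clone has an empty foo/ until the submodule is initialised and updated.
git clone -q "$work/main" "$work/clone"
cd "$work/clone"
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update
```

The main project's history only stores the .gitmodules file and the commit that foo/ should point at, not the sub-project's files.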

Subtrees go a step further and read the entire sub-project directly into a folder of your choosing in the main repo. From that point on, that folder exists just like any other in your project. Most of the magic here comes from subtree merging, which allows you to pull in changes from the sub-project without all the history, and push changes back out the other way.
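A sketch of a subtree merge done with core git commands, on throwaway repositories with hypothetical names. It reads the sub-project into foo/ and later pulls in a new upstream commit. Where the contributed git subtree command is installed, it wraps similar steps. Pushing changes back upstream is not shown here.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Upstream sub-project, with its files at its own repository root.
git init -q "$work/foo"
git -C "$work/foo" symbolic-ref HEAD refs/heads/master
echo lib > "$work/foo/lib.txt"
git -C "$work/foo" add .
git -C "$work/foo" commit -q -m "foo: initial commit"

# Main project.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo main > README
git add . && git commit -q -m "main: initial commit"

# Record a merge with foo's history, but place its tree under foo/.
git remote add foo "$work/foo"
git fetch -q foo
git merge -q -s ours --no-commit --allow-unrelated-histories foo/master
git read-tree --prefix=foo/ -u foo/master
git commit -q -m "Merge foo as subdirectory foo/"

# Later: upstream gains a commit, and the main project pulls it into foo/.
echo more > "$work/foo/more.txt"
git -C "$work/foo" add .
git -C "$work/foo" commit -q -m "foo: add more.txt"
git fetch -q foo
git merge -q -X subtree=foo -m "Pull foo updates into foo/" foo/master
```

The -X subtree=foo option tells the merge to shift the sub-project's tree into foo/ before merging, so upstream files land in the subfolder rather than at the root of the main project.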

Given the option of all 3, I'd prefer to use submodules or subtrees. They're the clean way to handle merging repos together.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow