Question

What is the best way to handle two separate but very similar codebases in Git and GitHub?

Background

I have a Git repository for a small shell script project. It only has 2 or 3 files of code in it, and I am often working in a single file. Although I originally created the project to serve a specific goal of mine, I write it to be more generally useful to others. I write the general-purpose version first and then modify it to suit my specific goal. In the specific version, I might modify variables, put in a password, switch around the order of some of the code, take out a for loop... whatever.

What I've tried

I have tried two different methods, and neither works as well as I think it could:

  1. Two separate repos
    • Problem: code modified in one can't easily and selectively be merged over to the other
  2. Two branches in one repo
    • Problem: Branches are meant to eventually merge back together. I never intend to fully merge these, only to selectively merge parts of the code.
    • Problem: When I tried to merge between the branches, it was very easy to get confused about which code from which branch was being merged. I somehow merged code between the two that was completely unintended, and there was no indication of the incorrect merge until I looked at the contents of the files in both branches.

I also saw How to merge two seperate - yet similar - codebases into one SVN rep?, which is about SVN. It is tough for me to follow since I do not know SVN. I think it is a different question, though, because the asker is not trying to make one version of the code public.

Use-cases I want solved

Specifically, the issue comes to light in these cases:

  • Comment sync - I am prepping my specialized version and notice that I could add an explanatory comment at the end of a line. I add it, but the comment is now missing from the generalized version.
  • Stuff I don't want shared - I am prepping my specialized version and I add in a password or change the order in which operations are done. I DO NOT want these changes going to the generalized version.
  • Same file - The above two changes will often be in the same file, which makes it hard to merge stuff together. There are interactive merges, but I do not know if the interaction can be done within a single file.
  • General -> Specialized - I or someone else might update the generalized version to have new content or comments that would be useful to also have in my specialized version. I want to bring these over from general -> specialized, without messing with any other code differences in the specialized version.

Git vs Github

Mostly I am asking how to do this within Git itself. However, it may also have implications for how I interact with GitHub. My generalized version is up on GitHub. The specialized version should NOT be up on GitHub. I think that my branch method above did not push both branches as long as I was careful... but I was never sure. Either way, the solution should allow for one version that is public and one version that is kept only locally... even if it is a little complex or requires care.

Solution

This can easily be done with two branches. I'm not sure why you say "code modified in one, cant easily and selectively be merged over to other," because merging is quite easy in Git.

The structure I would suggest is to have a branch for your general version and a branch for your personal version. Merges should only ever happen in one direction, from the general branch to the personal branch. This means that any changes you make to the general version get incorporated into the personal version.

In other words, this is OK...

git checkout personal
git merge general

This you should never do...

git checkout general
git merge personal

If you make a change in your personal version, and decide that it would be spiffy to have that same code in the general version, you should be able to handle this fairly easily with a cherry pick. It will just take a little forethought to organize the commits in the personal branch. You will need a commit in the personal branch that contains only those changes you want to bring over to the general version, then simply cherry pick it off the personal branch and drop it onto the general branch.
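As a rough sketch of that cherry-pick workflow, the following builds a throwaway repository and carries one shareable commit from personal over to general while the private commit stays behind. Only the branch names general and personal come from the answer above; the script contents, the HOST variable, and the commit messages are invented for illustration. The two changes sit a few lines apart in the same file, which is what lets Git apply one without the other.

```shell
set -e
repo=$(mktemp -d)                          # throwaway demo repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/general   # call the first branch "general"
git config user.name "Demo User"           # placeholder identity for the demo
git config user.email "demo@example.com"

# General version of a hypothetical script
cat > script.sh <<'EOF'
#!/bin/sh
HOST=example.com
PORT=22

# upload the logs
for f in *.log; do
  echo "uploading $f to $HOST:$PORT"
done
EOF
git add script.sh
git commit -q -m "General version"

git checkout -q -b personal

# Commit 1 on personal: a private change only
sed 's/^HOST=.*/HOST=internal.example/' script.sh > script.tmp && mv script.tmp script.sh
git commit -q -am "Private: use my own server"

# Commit 2 on personal: a shareable change only
sed 's/^# upload the logs$/# upload the logs, one file at a time/' script.sh > script.tmp && mv script.tmp script.sh
git commit -q -am "Explain the upload loop"
shareable=$(git rev-parse HEAD)            # remember the commit to share

# Bring only the shareable commit over to general
git checkout -q general
git cherry-pick "$shareable"

cat script.sh   # has the new comment, still the public HOST value
```

If a single edit session mixes private and shareable changes, `git add -p` can be used to stage them as separate commits before cherry-picking.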

Two repositories can accomplish the same thing. This would reduce the risk of accidentally uploading your personal version to GitHub, but it would make it more tedious to work with the two different versions.

Personally, I would go with two branches in the same repo.
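For comparison, here is a minimal sketch of the two-repository alternative mentioned above, assuming the private repository starts life as a clone of the general one. The directory names, file names, and commit messages are hypothetical. Changes still flow in one direction only: the private clone fetches from the general repository and merges.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both demo repos

# The general (public) repository
git init -q "$work/general"
cd "$work/general"
git symbolic-ref HEAD refs/heads/general   # call the first branch "general"
git config user.name "Demo User"           # placeholder identity for the demo
git config user.email "demo@example.com"
echo 'echo "general version"' > script.sh
git add script.sh
git commit -q -m "General version"

# The personal repository is a clone that is never published
git clone -q "$work/general" "$work/personal"
cd "$work/personal"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo 'PASSWORD=placeholder' > private.conf
git add private.conf
git commit -q -m "Private settings"

# A later improvement lands in the general repository
cd "$work/general"
echo '# usage: sh script.sh' >> script.sh
git commit -q -am "Document usage"

# Pull it into the personal repository: general -> personal only
cd "$work/personal"
git fetch -q origin
git merge -q --no-edit origin/general

cat script.sh   # now includes the usage comment
```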

OTHER TIPS

To avoid pushing your specialized version to GitHub, set the push.default config to tracking (or upstream for Git >= 1.7.4.2), so that a plain git push only pushes the current branch to the upstream branch it tracks. See http://longair.net/blog/2011/02/27/an-asymmetry-between-git-pull-and-git-push/ for the gory details.
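A small sketch of that setting in action, using a local bare repository as a stand-in for GitHub. The paths, file contents, and commit messages are assumptions for the demo. Because personal is never given an upstream, a bare `git push` from it is refused instead of publishing the branch. (Since Git 2.0 the default value is simple, which also refuses to push a branch that has no upstream.)

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/public.git"      # local stand-in for the GitHub remote
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/general   # call the first branch "general"
git config user.name "Demo User"           # placeholder identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/public.git"

# Only ever push the current branch to the upstream it tracks
git config push.default upstream

echo 'echo "general version"' > script.sh
git add script.sh
git commit -q -m "General version"
git push -q -u origin general              # publish general and set its upstream

git checkout -q -b personal                # local only, no upstream configured
echo 'PASSWORD=placeholder' >> script.sh
git commit -q -am "Private settings"

# A plain "git push" on personal has nowhere to go, so Git refuses
if git push; then
  echo "unexpected: personal was pushed"
else
  echo "push refused: personal has no upstream"
fi

git ls-remote --heads origin               # lists the branches on the remote
```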

Merges should work equally well whether you use a separate repo or just a branch. As a high-level answer, you're ultimately going to have to get really good at merging. A lot of that will come from really digging into how Git works at a low level regarding branching, merging, and rebasing. Unlike some other version control systems, I've found that Git requires a deep understanding of its internals to really use it properly.

I second the idea of using two branches: a public branch for the general version (which is pushed to GitHub) and a private branch for your specialized code (which is not published on GitHub). However, I would like to add that git stash is an essential tool that allows you to do what you want to do in the scenario that you outline (you're in the middle of working on the personal version, and you find a change to be done in the general version).

It is indeed good practice to always implement general changes in the general branch, and then do

git checkout personal
git merge general

In addition, you can make good use of git stash. Take the scenario where you are updating the specialized version and think of a general change:

  1. You save your current changes to the specialized version:

    git stash
    

    This stashes the changes compared to the last commit, without creating a new commit. This is useful for storing your uncommitted work in progress.

  2. You go to the general branch so as to make the general change that you were thinking about:

    git checkout general  # or master, or whatever name your general branch has
    

    You can then implement your general modification and commit it as usual.

  3. Before resuming work on your specialized version, you import the general change:

    git checkout personal
    git merge general
    

    Git is intelligent enough to do this nicely: only the new general change is brought into your specialized code.

  4. You resume your work on the specialized branch by importing your stashed work in progress:

    git stash pop
    

That's all! The key is to use git stash to save changes in the middle of your work, without creating a commit just for this, and then to reapply your changes with git stash pop.
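The four steps above can be tried end to end in a throwaway repository. This is only a sketch: the script contents, the HOST variable, and the commit messages are invented, and the three changes are placed a few lines apart so that neither the merge nor the stash pop hits a conflict. The `--no-edit` flag simply accepts the default merge message.

```shell
set -e
repo=$(mktemp -d)                          # throwaway demo repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/general   # call the first branch "general"
git config user.name "Demo User"           # placeholder identity for the demo
git config user.email "demo@example.com"

cat > script.sh <<'EOF'
#!/bin/sh
HOST=example.com
PORT=22

# upload the logs
for f in *.log; do
  echo "uploading $f to $HOST:$PORT"
done

echo "finished"
EOF
git add script.sh
git commit -q -m "General version"

# Personal branch with one committed private change
git checkout -q -b personal
sed 's/^HOST=.*/HOST=internal.example/' script.sh > script.tmp && mv script.tmp script.sh
git commit -q -am "Private: use my own server"

# Uncommitted work in progress on personal
echo 'echo "mail me the report"' >> script.sh

# 1. Save the work in progress
git stash

# 2. Make the general change on the general branch
git checkout -q general
sed 's/^# upload the logs$/# upload the logs, one file at a time/' script.sh > script.tmp && mv script.tmp script.sh
git commit -q -am "Explain the upload loop"

# 3. Import the general change into personal
git checkout -q personal
git merge -q --no-edit general

# 4. Resume the work in progress
git stash pop

cat script.sh   # private HOST, new general comment, and the unfinished line
```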

Use git submodules. If you want to keep a separate branch of the submodule, make a branch in that project's repo, keep all of your changes local to that branch, and use only that branch in the submodule checkout.

Merging between the different change sets is something you'll need to do half-manually. Git has great merge support, and if you don't let the two branches drift too far from each other, you should be able to get by with minimal manual intervention. GitHub (or any other hosting provider) really has nothing to do with any of this. If you want to keep a branch private, don't push it to the public repo. Simple as that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow