Question

Let's say a project is:

  • 1 product
  • built over Y years
  • comprising M modules
  • written in L [1..3] languages
  • developed by total of D developers

At what point does a project contain too many or too few live branches?

I know it is a hard question, and it is even harder to answer numerically. However, I am looking for quantified answers: if at all possible, please give a formula.

Background

If there are too few branches, code is never ready, and developers don't make large changes because doing so may make it impossible to meet the next deadline. Likewise, product managers never feel confident enough to name something a release. A feature freeze is often imposed, new ideas are delayed, and development slows down.

If there are too many branches, developers are unsure where their changes should go, where they ought to be propagated, and which branch will be merged into which trunk. Merging refactored code is very hard, and quality goes down. Moreover, each developer has to test their changes in several setups, so considerable effort is wasted and development slows down.

What is the optimal range for number of live branches?


Solution

What is the rule of thumb to determine that RCS (svn or git) contains too many branches?

How about the rule of 3:

  • One branch for stable code — main trunk;
  • One branch for unstable — upcoming release development;
  • And one more for maintenance — previous release's bug-fixes;

Many git-hosted projects use only two branches: master for the main trunk and vNext for the next release.

Use the tags feature to label milestones in your development.

Please allow your developers to create development branches locally and merge them into these remote branches, depending on the task they are performing.

Ask developers to give their local branches meaningful names and descriptions, and to label commits accordingly.

OTHER TIPS

There is no one answer to this question. It can only be "what works for your organization & workflow."

IMHO, once a branch has outlived its usefulness (for example, all its changes have been merged back to trunk, or it was a failed experiment or research project that has been abandoned or ended), it should be deleted. You can get it back if you need it (from history in SVN; in git, as long as its commits are merged or tagged), and deleting it keeps things a little tidier.
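A minimal sketch of picking the branches that are safe to delete, assuming the trunk is called main and that you feed in the text printed by `git branch --merged main`. The function name `deletable_branches` and the protected set are hypothetical; the parsing assumes git's usual listing format (a two-character indent, `* ` for the current branch, `+ ` for branches checked out in other worktrees).

```python
from typing import List, Set


def deletable_branches(merged_output: str, protected: Set[str]) -> List[str]:
    """Return fully merged branch names from `git branch --merged` output,
    skipping protected names and anything currently checked out."""
    names = []
    for line in merged_output.splitlines():
        if not line.strip() or line.startswith(("* ", "+ ")):
            continue  # blank line, or a branch that is checked out somewhere
        name = line[2:].strip()  # drop the two-character indent
        if name not in protected:
            names.append(name)
    return names


# Hypothetical sample of `git branch --merged main` output.
sample = "* main\n  feature/login\n  hotfix/42\n  release_1\n"
print(deletable_branches(sample, {"main", "release_1"}))
# ['feature/login', 'hotfix/42']
```

Each name can then be passed to `git branch -d`, which refuses to delete a branch that is not fully merged, so a mistake in the list fails safely.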

Based on your edit: If there are branches which are obsolete or "dead", why are they still around? By definition, there is no longer a use for them, so just delete them.

If you're using git, having multiple branches can be useful, even if they're dead. You can track which product version introduced a bug (bisecting by versions), you can organize work for many small teams, and you can see why some ideas didn't work out just by looking at the dead branches.

The key is consistency. Try to group your branches to fit your workflow. For example, you could have:

  • stable - CI builds production and/or staging from this one
  • staging - CI builds staging from this one
  • feature/* - branches for features
  • hotfix/* - started from staging/stable branch, used for hotfixes
  • experimental/* - used for R&D functionality that might not result in clean and maintainable code or may be abandoned halfway through
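Because the prefixes act like directories, grouping the branch list is easy to automate. A minimal sketch, assuming the names come from something like `git for-each-ref --format='%(refname:short)' refs/heads/`; the helper `group_by_prefix` and the sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_prefix(branches: List[str]) -> Dict[str, List[str]]:
    """Group names by the part before the first '/'; names without a
    slash go under the '' (top-level) key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in branches:
        prefix, sep, _ = name.partition("/")
        groups[prefix if sep else ""].append(name)
    return {key: sorted(names) for key, names in sorted(groups.items())}


branches = ["stable", "staging", "feature/login", "feature/cart",
            "hotfix/crash-on-start", "experimental/new-parser"]
for prefix, names in group_by_prefix(branches).items():
    print(prefix or "(top level)", len(names), names)
```

A per-prefix count like this also makes it obvious when one group (say, experimental/*) is growing out of proportion.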

Some basic tips here: http://nvie.com/posts/a-successful-git-branching-model/

Also, if you want your team to quickly start using a good branch structure, try git flow: http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flow/

About refactoring co-workers' bad code: you can use refactor/* branches, so that merges and rebases stay atomic on a separate branch and you can easily see what broke. Of course, having tests is EXTREMELY useful, but if you don't have them, a simple git bisect will show you who introduced a bug and when (and if you write a test for that bug for bisect to use, you now have a meaningful test to add to your test suite).
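git bisect works by binary search over history, so finding the culprit among n commits takes about log2(n) test runs. A simplified sketch of that idea on a linear list of commits (real bisect also copes with merges and skipped commits); `first_bad` and the sample history are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after a bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # in real life: build and run the test here
            hi = mid
        else:
            lo = mid
    return commits[hi]


history = [f"c{i}" for i in range(10)]
# Hypothetical regression introduced in c6; only 3 checks are needed.
print(first_bad(history, lambda c: int(c[1:]) >= 6))  # c6
```

In practice `git bisect run <script>` automates this loop: the script's exit code marks each commit good (0) or bad (most non-zero codes, with 125 meaning "skip this commit").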

Bottom line: don't be afraid of having many branches, just keep them organised. Most of the time, merges aren't as complex as people say they are, and you can always undo them (or postpone them until the next release if you aren't using a continuous delivery model).

EDIT: By something/* I mean having multiple branches with a common something/ prefix, thus mimicking a directory structure.

EDIT2: Things are different on SVN-likes, where branching and merging isn't that cheap. Proceed with caution ;)

EDIT3: Consider using different (sub)repositories for different modules. It might simplify development. Module developers then only care about the latest stable version of the main application, and each branching model only has to cover one module.

EDIT4: Q: "can you perhaps consider to put some numbers or formulae at the threshold below which messy branching is acceptable and above which branches better be organised?"

Sure!

The formula (in my humble opinion) is simple:

  • have as many 'dirty' branches locally as you want (just don't push them to other people or shared repositories)
  • try not to push dirty branches unless they are of some value to other developers. If they are already in your shared repository, keep them but rename them (legacy/* or plain dirty/* comes to mind).
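A minimal sketch of the renaming step, assuming the naming conventions suggested earlier in this answer (stable, staging, feature/*, hotfix/*, experimental/*) and legacy/ as the archive prefix; the function `archive_renames` and its defaults are hypothetical. It only builds `git branch -m` commands, so they can be reviewed before anything is run.

```python
from typing import List, Sequence


def archive_renames(branches: Sequence[str],
                    keep_names: Sequence[str] = ("stable", "staging"),
                    keep_prefixes: Sequence[str] = ("feature/", "hotfix/", "experimental/"),
                    archive_prefix: str = "legacy/") -> List[List[str]]:
    """Build `git branch -m` commands that move every branch outside the
    naming conventions under `archive_prefix`."""
    commands = []
    for name in branches:
        if (name in keep_names or name.startswith(tuple(keep_prefixes))
                or name.startswith(archive_prefix)):
            continue  # already follows a convention, or already archived
        commands.append(["git", "branch", "-m", name, archive_prefix + name])
    return commands


print(archive_renames(["stable", "feature/cart", "johns-stuff", "legacy/old-ui"]))
# [['git', 'branch', '-m', 'johns-stuff', 'legacy/johns-stuff']]
```

`git branch -m` renames only the local branch; on a shared repository you would then push the new name (`git push origin legacy/<name>`) and remove the old one (`git push origin --delete <name>`).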

Think of them as a file structure. You can have many legacy files no longer needed but if you keep them in an organised manner you can easily separate the archive from your working set.

Seeing as you like numbers, you would probably like a real-world use case for those branches.

Let me give you an example of a small-to-medium-sized Symfony2 PHP project I've been working on.

If you have a project running for 6-9 months, developed actively by 5 developers in an agile (Scrum) manner with a client demo every two weeks, you might want to have these branches:

  • per user story (on tightly integrated user stories this might be a bad idea), around 50 branches total, think of them as feature branches
  • per developer (on demand, if a developer needs to work on something for a while): they come and go, but in this kind of project developers usually have fewer than 3. Some of them don't use public developer branches at all and keep their dirty branches to themselves
  • experimental (unlimited number of branches for research purposes, for example different algorithm or library used in module), about 7 branches as far as I remember
  • per sprint (merged from user stories, helpful for demoing), around 10; these were our staging/stable branches during initial development. Why not tags? We used tags too, but branches make it easier to apply a hotfix.
  • hotfixes (usually short-lived, isolated for easy cherry-picking), 3 tops ;)
  • misc (usually on-going system-wide features and/or 2-3 people team branches), around 10

As you can see there isn't an exact number here either. I've done several projects of this size and most of them had about 70-80 branches (20-30 without user story branches). If they are organised in a logical, clean way the code repository is easy to browse.

With git, also consider rebasing instead of merging, so you won't get merge bubbles (check out this article: http://stevenharman.net/git-pull-with-automatic-rebase ).

For Git and SVN there are different ways to look at this. Most importantly, though, what does it mean to have too many of something: is it a performance problem or an organizational one?

So git performance:

In git, a branch is nothing more than a reference to a commit object in the git repository, so a branch carries no significant overhead (see http://git-scm.com/book/en/Git-Branching-What-a-Branch-Is for more information). If you delete a branch, every commit that is also reachable from another branch or tag stays in your history; commits that only that branch pointed to become unreachable and are often still recoverable (for example, via the HEAD reflog if you had them checked out) until garbage collection prunes them. In general, a branch takes no extra storage, so if you are using disposable branches, delete away. The overall performance of git is affected by the size of the repo, but it is commits, not branches, that make it bigger. You can repack or remove objects to clean up and speed up your git repo.
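A toy model (not git's actual implementation) of the point above: a branch is only a name pointing at a commit, so deleting it removes the name, not the commit objects. What garbage collection eventually prunes are commits that no remaining ref can reach. The dictionaries and the `reachable` helper are hypothetical illustrations.

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """All commits reachable from `tips` by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# Toy object store: commit id -> parent ids. x is an unmerged experiment.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"]}
# Branches are just names pointing at commit ids.
branches = {"master": "c", "experiment": "x"}

del branches["experiment"]   # deleting the branch removes only the name
print("x" in parents)        # True: the commit object is still stored
print(sorted(reachable(parents, branches.values())))  # ['a', 'b', 'c']
```

Commit x is now unreachable: it costs nothing while it lingers, and it is the kind of object a later `git gc` may remove.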

So SVN performance:

In SVN, branches are copies of your tree; however, these copies do not duplicate data. Again, they are references to the existing tree, as long as svn copy is used (see http://svnbook.red-bean.com/en/1.1/ch04s02.html#svn-ch-4-sect-2.1 for more information). SVN handles large files and repositories well, and again, the size of the repo is not significantly affected by branches; they are efficient and cheap. As a matter of fact, quoting the svnbook: "The main point here is that copies are cheap, both in time and space. Make branches as often as you want."

Organization of both SVN and git:

Well, as I said above, branching is cheap, so it can happen often. Disposable branches should be removed, but historical branches are cheap to keep around. Basically, you should have a naming convention for your branches, such as release_1, bugfix_200, temp_branch_for_fun, anything that makes the names self-organize when they are listed alphanumerically. Using tags is nice as well, both in SVN and git. People tend to feel more comfortable with branches; however, in git both are just named references to commits (a tag simply doesn't move when new commits are added), and in SVN, tags are useful as point-in-time references.
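One catch with listing names alphanumerically: plain string sorting puts release_10 before release_2. Zero-padding the numbers (release_02) avoids this; otherwise a "natural" sort key helps. A minimal sketch; `natural_key` is a hypothetical helper, not part of git or SVN.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Sort key that compares digit runs as numbers: release_2 < release_10."""
    # re.split with a capturing group alternates text, digits, text, ...,
    # so odd positions are always digit runs and types never get mixed.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


names = ["release_10", "release_2", "bugfix_200", "release_1", "temp_branch_for_fun"]
print(sorted(names))
# ['bugfix_200', 'release_1', 'release_10', 'release_2', 'temp_branch_for_fun']
print(sorted(names, key=natural_key))
# ['bugfix_200', 'release_1', 'release_2', 'release_10', 'temp_branch_for_fun']
```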

How many branches is too many is a business logic decision. I prefer to have many disposable branches for work in progress, while keeping only historical branches for releases. But that only works well in an iterative workflow. Recently I have been moving development towards a continuous delivery workflow and a distributed, forked git network. There, I don't care at all how many branches the developers' git forks have; instead, my origin (mainline) repository has only one permanent branch, master, plus disposable branches for high-priority critical hotfixes.

The methodology I mentioned above has worked out great, since each developer can now branch and forget as much as they like without annoying any other developer. Also, my master is kept free of branch clutter. (This also works for an iterative approach, where the mainline would have branches for releases only, with no temporary or disposable branches hanging around.)

Everyone is happy.

What is the rule of thumb to determine that RCS (svn or git) contains too many branches?

None. No such general rule exists, and any local rule you might define is heavily team- and workflow-dependent.

With "branch per task" you may have a lot of short-lived branches even for a small or medium-sized codebase (for any size of team and any number of languages).

Moved from comment:

1 to N open task branches per developer (N depends on a lot of factors), 1-2 of them active per developer, plus some common shared branches (stable-unstable-released-...)
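Written as a rough formula, with D developers, at most N open task branches each, and S shared long-lived branches, the comment above amounts to D + S ≤ open ≤ N·D + S, of which at most min(2, N)·D + S are active. A minimal sketch; S = 3 (stable, unstable, released) is an assumption taken from the example list, and `branch_estimate` is a hypothetical name.

```python
from typing import Dict, Tuple


def branch_estimate(developers: int, max_open_per_dev: int,
                    shared: int = 3) -> Dict[str, Tuple[int, int]]:
    """(low, high) branch counts: 1..N open task branches per developer,
    1..2 of them active, plus `shared` long-lived branches."""
    active_per_dev = min(2, max_open_per_dev)  # cannot exceed the open ones
    return {
        "open": (developers + shared, developers * max_open_per_dev + shared),
        "active": (developers + shared, developers * active_per_dev + shared),
    }


print(branch_estimate(developers=5, max_open_per_dev=4))
# {'open': (8, 23), 'active': (8, 13)}
```

The bounds are only as good as the guess for N; the point is that the count scales with the number of developers rather than with years, modules, or languages.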

Branches in git are considered disposable. If you don't need a branch and will not use it again, you drop it. However, in my case I do manage a lot of branches, but only 1% of them are in use. I have locked the rest and made sure the proper tags are applied to the release versions.

You may need to define whether all the branches in question are active at once, as you can have 100 branches of which only 1 is in use. But if you had 100 active branches for 1 project, then YES, I would say that's too much, and it just shows bad management.
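One way to make "active" concrete is to look at the date of each branch's last commit. A minimal sketch, assuming input lines as printed by `git for-each-ref --format='%(refname:short) %(committerdate:iso-strict)' refs/heads/` and a 30-day window; the window and the helper names are hypothetical choices, not a standard.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def parse_refs(output: str) -> Dict[str, datetime]:
    """Parse 'name date' lines; branch names cannot contain spaces."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, date = line.partition(" ")
        date = date.strip()
        if date.endswith("Z"):  # older Pythons' fromisoformat rejects 'Z'
            date = date[:-1] + "+00:00"
        result[name] = datetime.fromisoformat(date)
    return result


def split_active(last_commit: Dict[str, datetime], now: datetime,
                 window: timedelta = timedelta(days=30)) -> Tuple[List[str], List[str]]:
    """Split branches into active (committed to within `window`) and stale."""
    active = sorted(b for b, t in last_commit.items() if now - t <= window)
    stale = sorted(b for b, t in last_commit.items() if now - t > window)
    return active, stale


sample = ("main 2024-05-30T10:00:00+00:00\n"
          "old-idea 2023-01-01T00:00:00+00:00\n")
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(split_active(parse_refs(sample), now))  # (['main'], ['old-idea'])
```

The stale list is then a natural candidate for locking, tagging, or archiving, while the active count is the number worth comparing against any threshold.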

Hope this helps!

If you want to archive branches, I think the usual solution is to push them to an archive repo and then delete them; if you ever want them back, you know where to look.
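A minimal sketch of that archive-then-delete routine, assuming a second remote named archive has already been added with `git remote add archive <url>`; the remote name, the helper `archive_commands` and its parameters are hypothetical. It only builds the commands, so they can be reviewed (or run with subprocess) afterwards.

```python
from typing import List, Optional


def archive_commands(branch: str, archive_remote: str = "archive",
                     shared_remote: Optional[str] = None) -> List[List[str]]:
    """Build git commands that copy `branch` to the archive remote and then
    delete it locally (and optionally from the shared remote)."""
    refspec = f"refs/heads/{branch}:refs/heads/{branch}"
    commands = [["git", "push", archive_remote, refspec]]
    if shared_remote is not None:
        commands.append(["git", "push", shared_remote, "--delete", branch])
    # -D because archived branches are often unmerged experiments; only run
    # this after the push to the archive has succeeded.
    commands.append(["git", "branch", "-D", branch])
    return commands


for cmd in archive_commands("experimental/new-parser", shared_remote="origin"):
    print(" ".join(cmd))
```

To bring a branch back later, `git fetch archive <branch>:<branch>` recreates it locally.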

Licensed under: CC-BY-SA with attribution