Question

My question is about the way in which Git handles branches: whenever you branch from a commit, this branch won’t ever receive changes from the parent branch unless you force it with a merge.

But in other systems such as ClearCase or AccuRev, you can specify how branches get filled with some sort of inheritance mechanism: I mean, with ClearCase, using a config spec, you can say "get all the files modified on branch /main/issue001 and then continue with the ones on /main or with this specific baseline".

In AccuRev you also have a similar mechanism which lets streams (as they call branches) receive changes from upper streams without merging or creating a new commit on the branch.

Don’t you miss this while using Git? Can you enumerate scenarios where this inheritance is a must?

Thanks

Update: Please read VonC's answer below to see the actual focus of my question. Once we agree that "linear storage" and DAG-based SCMs have different capabilities, my question is: what are the real-life scenarios (especially for companies rather than OSS) where linear can do things not possible for DAG? Are they worth it?


Solution

To understand why Git does not offer the kind of "inheritance mechanism" you are referring to (one not involving a commit), you must first understand one of the core differences between those SCMs (Git vs. ClearCase, for instance):

  • ClearCase uses linear version storage: each version of an element (file or directory) is linked in a direct linear relationship with the previous version of the same element.

  • Git uses a DAG, a Directed Acyclic Graph: each "version" of a file is actually part of a global set of changes in a tree that is itself part of a commit. The previous version of that file must be found in a previous commit, accessible through a single directed acyclic graph path.

In a linear system, a config spec can specify several rules for achieving the "inheritance" you see (for a given file, first select a certain version, and if not present, then select another version, and if not present, then select a third, and so on).

The branch is a fork in the linear history at a given version for a given select rule (all the other select rules before that one still apply, hence the "inheritance" effect).

In a DAG, a commit represents all the "inheritance" you will ever get; there is no "cumulative" selection of versions. There is only one path in this graph to select all the files you will see at this exact point (commit).
A branch is just a new path in this graph.
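To see how minimal a branch is in this model, here is a small illustration (the branch name 'topic' and the abbreviated commit hash are placeholders, not from the original answer):

git branch topic 3f9a2bc          # a branch is just a named pointer to one node (commit) of the DAG
cat .git/refs/heads/topic         # the ref file contains nothing but the commit hash it points to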

To apply, in Git, some other versions, you must either:

  • merge the other branch into yours, or
  • rebase your branch onto another starting point.

But since Git is a DAG-based SCM, either operation will always result in a new commit.
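As a minimal sketch of those two options (the branch names myNewBranch and main are placeholders for illustration):

git checkout myNewBranch
git merge main       # option 1: records a new merge commit on myNewBranch

git checkout myNewBranch
git rebase main      # option 2: replays myNewBranch's commits on top of main, as new commits

Either way, the result is expressed as new commits in the DAG; there is no commit-less "selection" of other versions.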

What you are "losing" with Git is a kind of "composition" (selecting different versions with different successive select rules), but that would not be practical in a DVCS (as in "Distributed"): when you make a branch with Git, you need to do so with a starting point and content clearly defined and easily replicated to other repositories.

In a purely central VCS, you can define your workspace (in ClearCase, your "view", either snapshot or dynamic) with whatever rules you want.


unknown-google adds in the comment (and in his question above):

So, once we see the two models can achieve different things (linear vs DAG), my question is: which are the real life scenarios (especially for companies more than OSS) where linear can do things not possible for DAG? Are they worth it?

When it comes to "real-life scenarios" in terms of selection rules, what you can do in a linear model is to have several selection rules for the same set of files.

Consider this "config spec" (i.e. "configuration specification" for selection rules with ClearCase):

element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... aLabel2 -mkbranch myNewBranch

It selects all the files labelled 'aLabel2' (and branches from there), except for those labelled 'aLabel3', which branch from there instead (because that rule precedes the one mentioning 'aLabel2').

Is it worth it?

No.

Actually, the UCM flavor of ClearCase (the "Unified Configuration Management" methodology included with the ClearCase product, representing all the "best practices" deduced from base ClearCase usage) does not allow it, for the sake of simplicity. A set of files is called a "component", and if you want to branch from a given label (known as a "baseline"), that would be translated into the following config spec:

element /aPath/... .../myNewBranch
element /aPath/... aLabel3 -mkbranch myNewBranch
element /aPath/... /main/0 -mkbranch myNewBranch

You have to pick one starting point (here, 'aLabel3') and go from there. If you also want the files from 'aLabel2', you will make a merge from all the 'aLabel2' files to the ones in 'myNewBranch'.

That is a "simplification" you do not have to make with a DAG, where each node of the graph represents a uniquely defined "starting point" for a branch, whatever the set of files involved.

Merge and rebase are enough to combine that starting point with other versions of a given set of files, in order to achieve the desired "composition", while keeping that particular history in isolation in a branch.

The general goal is to reason in "coherent Version Control operations applied to a coherent component". A "coherent" set of files is one in a well-defined coherent state:

  • if labelled, all its files are labelled
  • if branched, all its files will branch from the same unique starting point

That is easily done in a DAG system; it can be more difficult in a linear system (especially with "Base ClearCase" where the "config spec" can be tricky), but it is enforced with the UCM methodology of that same linear-based tool.

Instead of achieving that "composition" through a "private selection rule trick" (with ClearCase, some select rule order), you achieve it only with VCS operations (rebase or merge), which leave a clear trace for everyone to follow (as opposed to a config spec private to a developer, or shared amongst some but not all developers). Again, it enforces a sense of coherency, as opposed to a "dynamic flexibility" that you may have a hard time reproducing later on.

That allows you to leave the realm of VCS (Version Control System) and enter the realm of SCM (Software Configuration Management), which is mainly concerned with "reproducibility". And that (SCM features) can be achieved with a linear-based or a DAG-based VCS.

OTHER TIPS

It sounds like what you're looking for might be git rebase. Rebasing a branch conceptually detaches it from its original branch point and reattaches it at some other point. (In reality, the rebase is implemented by applying each patch of the branch in sequence to the new branch point, creating a new set of patches.) In your example, you can rebase a branch to the current tip of an upper branch, which will essentially "inherit" all the changes made to the other branch.
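For example, the reattachment can be expressed like this (the names old_base, new_base and topic are placeholders, not from the original answer):

git rebase --onto new_base old_base topic   # detach topic from old_base and replay its commits onto new_base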

I'm not totally clear on what you're asking for, but it sounds like Git's tracking semantics are what you want. When you branch from an origin you can do something like:

git checkout -t -b my_branch origin/master

And then future "git pull"s will auto-merge origin/master into your working branch. You can then use "git cherry -v origin/master" to see what the difference is. You can use "git rebase" before you publish your changes to clean up the history, but you shouldn't use rebase once your history is public (i.e. other people are following that branch).
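Putting those pieces together, a sketch of the whole cycle (branch names are again just placeholders):

git checkout -t -b my_branch origin/master   # create a branch that tracks origin/master
git pull                                     # auto-merges new origin/master commits into my_branch
git cherry -v origin/master                  # list your local commits not yet in origin/master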

As to the inheritance scheme used by AccuRev: Git users will probably "get" the whole thing when they look at git-flow (see also: http://github.com/nvie/gitflow and http://jeffkreeftmeijer.com/2010/why-arent-you-using-git-flow/)

This Git branching model more or less does (manually / with the help of the git-flow tool) what AccuRev does out-of-the-box automatically and with great GUI support.
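For instance, with the git-flow tool from the links above installed, that model boils down to a few commands (the feature name is a placeholder):

git flow init                        # set up master/develop branches and naming prefixes
git flow feature start my_feature    # branch off develop
git flow feature finish my_feature   # merge back into develop and delete the feature branch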

So it appears Git can do what AccuRev does. Since I never actually used git/git-flow day-to-day I can't really say how it works out, but it does look promising. (Minus proper GUI support :-)

I'll try to answer your question. (I have to say here that I have not used Git, only read about it, so if something I mention below is wrong, please correct me.)

"Can you enumerate scenarios where this inheritance is a must?"

I won't say it is a must, because you can solve a problem with the tool you have, and that might be a valid solution for your environment. I guess it is more a matter of the processes than the tool itself. Making sure your process is coherent and also allows you to go back in time to reproduce any intermediate step/state is the goal; the plus is that the tool lets you run your process and SCMP as painlessly as possible.

The one scenario where I can see this 'inheritance' behavior being handy, using the power of the config spec, is when you want your set of changes "isolated" and mapped to a task (devtask, CR, SR, or whatever defines the purpose/scope of your change set).

Using this composition allows you to keep your development branch clean and still use different combinations of the rest of the code, with only what is relevant for the task isolated in a branch during the whole life cycle of the task, right up until the integration phase.

For a purist, having to commit/merge/rebase just to have a "defined starting point" would, I guess, 'pollute' your branch, and you will end up with your changes plus other people's changes in your branch/change set.

When/where is this isolation useful? The points below might only make sense in the context of companies pursuing CMM and some ISO certifications, and might be of no interest to other kinds of companies or to OSS:

  • Being really picky, you might want to accurately count the lines of code (added/modified/deleted) of the change set corresponding to a single developer, later used as one input for code and effort estimations (a Git equivalent is sketched after this list).

  • It can be easier to review the code at different stages, having just your code in a single branch (not glued to other changes).
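For the line-counting point above, a minimal Git sketch (the branch names main and task_branch are assumptions for illustration):

git diff --stat main...task_branch        # per-file added/deleted line counts since the merge base
git diff --shortstat main...task_branch   # one-line totals for the whole change set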

On big projects with several teams and 500+ developers actively working concurrently on the same code base (where the graphical version tree of an individual element looks like a messy tangled web, with several load lines, one for each big customer or one for each technology), large config specs using composition several degrees deep let this many people work seamlessly to adapt the same product/system (base code) to different purposes.

Using these config specs dynamically gave each team or sub-team a different view of what they needed and of where they needed to branch from (cascading in several cases), without the need to create intermediate integration branches, or to constantly merge and rebase all the bits you need to start with. Code for the same task/purpose branched off different labels, but it made sense. (You can argue here for the 'known baseline' as a principle of SCM, but simple labels contemplated in a written SCM Plan did the job.)

It must be possible to solve this with Git (I guess in a non-dynamic way), but I find it really hard to picture without this 'inheritance' behavior. I guess the point mentioned by VonC, "if branched, all its files will branch from the same unique starting point", was broken here, but besides being well documented in the SCMP, I remember there were strong business reasons to do it that way.

Yes, building the config specs I mentioned above was not free; in the beginning there were 4-5 well-paid people behind the SCM, but they were later replaced by automated scripts that asked you what you wanted in terms of labels/branches/features and wrote the config spec for you.

The reproducibility here was achieved by just saving the config spec along with the task in the devtask system, so each task mapped upstream to requirements, and downstream to a config spec and a set of changes (code files, design documents, test documents, etc.).

So one conclusion up to here might be: only if your project is big/complicated enough (and you can afford SC managers over the life of the project :)) will you even start thinking about whether you need this 'inheritance' behavior or a really versatile tool; otherwise you will go directly to a tool that is free and already takes care of the coherence of your SCM... but there could be other factors in the SCM tool that make you stick to one or the other... read on.

Some side notes, which might be off topic, but I guess in some cases like mine they need to be considered.

I have to add here that we use the good ol' base CC, not UCM. I totally agree with VonC that a good methodology allows you to "guide" the flexibility towards a more coherent configuration. The good thing is that CC is pretty flexible, and you can find (not without some effort) a good way to keep things coherent, while in other SCMs you might get that for free. But, for example, here (and at other places where I've worked with CC), for C/C++ projects we cannot afford the price of not having the winkin feature (reusing Derived Objects), which cuts compile time several times over. It can be argued that a better design, more decoupled code, and optimized Makefiles can reduce the need to compile the whole thing, but there are cases where you need to compile the whole beast many times a day, and sharing the DOs saves heaps of time/money. Where I am now we try to use as many free tools as we can, and I think we will get rid of CC if we can find a cheaper or free tool that implements the winkin feature.

I'll finish with something Paul mentioned: different tools are better than others for different purposes, but I will add that you can get around some limitations of a tool by having a coherent process, without sacrificing reproducibility, the key points of SCM. In the end, I guess the answer to "is it worth it?" depends on your "problem", the SDLC you are running, your SCM processes, and whether there is any extra feature (like winkin) that might be useful in your environment.

my 2 cents

Theory aside, here's a kind of obvious practical take on this, from my perspective using AccuRev in a commercial production environment for a number of years: The inheritance model works very well as long as child streams haven't diverged too much from ancestors that are still in development. It breaks down when the inheriting streams are too different.

Inheritance (later versions as children of earlier ones) allows changes in ancestor streams to be active in child streams without anyone doing anything (unless a merge is required, in which case it shows up as deep overlap, which is good to be able to see).

That sounds great, and in practice it is, when all streams involved are relatively similar. We use that model for hotfix and service-pack-level streams below a given production release. (It's actually a bit more complicated than that for us, but that's the general idea.)

Production releases are in parallel, no inheritance, with those hotfix and service pack children below each of them. Starting a new release means creating a new release-level stream, and manually pushing everything from the most recent maintenance stream for the prior release into it. After that, changes to earlier releases that apply to later ones have to be manually pushed into each of them, requiring more work, but allowing much greater control.

We originally used the inheritance model across all releases, where later ones were children of earlier ones. That worked well for a while, but got unmanageable over time. Major architectural differences across releases made unavoidably inheriting changes a Bad Idea. Yes, you can put a snapshot in between to block inheritance, but then all changes have to be pushed manually, and the only real difference between parent-snapshot-child and parallel non-inheriting streams is that the entire graphical stream view continually pushes down and to the right, which is a PITA.

One really nice thing about AccuRev is that you have this choice, all the time. It's not an inherent constraint of your SCM program's architecture.

Have you noticed that you can check out specific file versions with Git too?

Just use this:

git checkout [<tree-ish>] [--] <paths>

As with a config spec, any existing version of a file (paths) can be loaded into the worktree. Quote from the git-checkout docs:

The following sequence checks out the master branch, reverts the Makefile to two revisions back, deletes hello.c by mistake, and gets it back from the index:

$ git checkout master
$ git checkout master~2 Makefile
$ rm -f hello.c
$ git checkout hello.c

ClearCase, without MultiSite, is a single repository but Git is distributed. ClearCase commits at the file level but Git commits at the repository level. (This last difference means the original question is based on a misunderstanding, as pointed out in the other posts here.)

If these are the differences we're talking about, then I think 'linear' versus 'DAG' is a confusing way to distinguish these SCM systems. In ClearCase all the versions of a file are referred to as the file's version "tree", but really it is a directed acyclic graph! The real difference from Git is that ClearCase's DAGs exist per file. So I think it is misleading to refer to ClearCase as non-DAG and Git as DAG.

(BTW ClearCase versions its directories in a similar way to its files - but that's another story.)

I'm not sure if you are asking anything, but you are demonstrating that AccuRev streams are different tools from Git (or SVN) branches. (I don't know ClearCase.)

For example, with Accurev you are forced, as you say, to use certain workflows, which gives you an auditable history of changes that is not supported in Git. Accurev's inheritance makes certain workflows more efficient and others impossible.

With Git you can have exploratory coding segregated in local repos or in feature branches, which would not be supported very well by Accurev.

Different tools are good for different purposes; it's useful to ask what each one is good for.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow