What are the merge semantics of git fast-import streams?

Question

If I understand your question correctly, you're asking exactly which shortcuts fast-import lets you take when streaming the contents of a commit into it.

As far as I can tell from reading git/fast-import.c and the manual page, fast-import initializes the tree for a new commit from the tree of the commit named in the "from" command. "filemodify" and its sibling commands ("filedelete", "filecopy", "filerename") then start from that state to construct the new tree that will be committed at the end.

The fast-import command does not appear to change the tree at all when it encounters "merge" commands; as far as I can tell, they only record additional parents. If you want to include changes from parents other than the first, you need to specify exactly which files you want to bring in. You can, however, name files from the other branch by mark or object hash in "filemodify", so their contents need not be sent again.
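To make that concrete, here is a minimal sketch in Python that builds a small stream with a two-parent merge commit. The branch names, file names, marks, and committer identity are all invented for illustration; the command layout follows my reading of the manual page, so treat it as a sketch rather than a reference.

```python
# Sketch: build a fast-import stream containing a two-parent merge commit.
# Every name, path, mark, and the committer identity here is hypothetical.

def data(payload: bytes) -> bytes:
    """'data' command: exact byte count, then the raw bytes."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob(mark: int, content: bytes) -> bytes:
    """'blob' command, tagged with a mark so later commands can refer to it."""
    return b"blob\nmark :%d\n" % mark + data(content)


def commit(ref: bytes, mark: int, message: bytes,
           from_mark=None, merge_marks=(), file_cmds=()) -> bytes:
    """One 'commit' command; file_cmds are raw filemodify/filedelete lines."""
    out = b"commit " + ref + b"\n"
    out += b"mark :%d\n" % mark
    out += b"committer Example <example@example.invalid> 1700000000 +0000\n"
    out += data(message)
    if from_mark is not None:
        out += b"from :%d\n" % from_mark   # first parent; its tree is the starting point
    for m in merge_marks:
        out += b"merge :%d\n" % m          # extra parent; contributes no files by itself
    for line in file_cmds:
        out += line + b"\n"
    return out + b"\n"


stream = b"".join([
    blob(1, b"shared file\n"),
    commit(b"refs/heads/main", 2, b"base",
           file_cmds=[b"M 100644 :1 a.txt"]),
    blob(3, b"topic-only file\n"),
    commit(b"refs/heads/topic", 4, b"topic work", from_mark=2,
           file_cmds=[b"M 100644 :3 b.txt"]),
    # The merge starts from the tree of :2, so b.txt has to be named
    # explicitly. Blob :3 is reused by mark instead of being sent again.
    commit(b"refs/heads/main", 5, b"merge topic", from_mark=2,
           merge_marks=[4], file_cmds=[b"M 100644 :3 b.txt"]),
])
```

Writing `stream` to `sys.stdout.buffer` and piping it into `git fast-import` in a freshly initialized repository should let you inspect the result. If the reading above is right, dropping the last "M" line would still give a commit with two parents, but its tree would lack b.txt.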

Edit: Ah, let's go deeper into the git model.

In git, a commit points to a tree that represents the entire contents of the directory hierarchy being tracked, as it stood at the time of that commit. Commits do not carry any information about how they're different from their parents; the theory is that you can reconstruct the diff if you need it by comparing these trees.

A merge commit is distinguished from non-merges only by the fact that it has two or more parents. It still has a single tree, recording exactly what's in the version that resulted from performing the merge. It still does not record anything about how its author combined the parents into a merged version. The git "porcelain" commands like git log and git diff do magic to reconstruct a useful description of what happened.
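A toy model may help here. The sketch below is not git's implementation: it treats a tree as a flat Python dict from path to content (real trees are nested objects with modes and hashes) and shows that the diff against each parent is computed on demand from the snapshots alone. The function name and the sample files are made up.

```python
# Toy model: a "tree" is a dict mapping path -> content.
# This only illustrates the snapshot idea; it is not how git stores trees.

def diff_trees(old: dict, new: dict) -> dict:
    """Compare two snapshots and report which paths changed."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


# Two parents and the tree recorded by a merge commit (hypothetical data).
parent_main = {"a.txt": "shared\n"}
parent_topic = {"a.txt": "shared\n", "b.txt": "topic\n"}
merged = {"a.txt": "shared\n", "b.txt": "topic\n"}

# Nothing about these diffs is stored in the merge commit;
# each is derived by comparing its tree with one parent's tree.
vs_main = diff_trees(parent_main, merged)
vs_topic = diff_trees(parent_topic, merged)
```

In this example `vs_main` reports b.txt as added, while `vs_topic` is empty, even though the merge commit itself stores only the final snapshot.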

Conceptually, to create a new commit object, you need to describe the complete mapping of paths to file contents that goes in that commit. (Much cleverness goes into making that efficient and simple instead of awful.)

The git fast-import command provides a shortcut for the common case: usually the VCS you're exporting from can describe each commit as some kind of diff against the most recent commit on the same branch. In that case, you can effectively encode that diff into fast-import's stream format for a simpler and faster import.

But you have to remember it's only a shortcut for reconstructing the entire tree from scratch.
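The equivalence can be sketched with another toy model. Here `build_tree` is a hypothetical stand-in for how fast-import assembles a commit's tree as I understand it: start from the "from" tree, then apply file commands in order. "M", "D", and "deleteall" loosely mirror filemodify, filedelete, and filedeleteall; modes and blob references are left out.

```python
# Toy model of tree construction; not fast-import's actual code.

def build_tree(from_tree: dict, commands: list) -> dict:
    """Start from the first parent's tree and apply file commands in order."""
    tree = dict(from_tree)            # copy, so the parent snapshot stays intact
    for cmd in commands:
        op = cmd[0]
        if op == "M":                 # add or replace one path
            tree[cmd[1]] = cmd[2]
        elif op == "D":               # remove one path if present
            tree.pop(cmd[1], None)
        elif op == "deleteall":       # wipe the tree and start from scratch
            tree.clear()
        else:
            raise ValueError("unknown command: %r" % (op,))
    return tree


parent_tree = {"a.txt": "v1", "b.txt": "v1", "old.txt": "v1"}

# Shortcut: send only what changed relative to the parent.
as_delta = build_tree(parent_tree, [
    ("M", "a.txt", "v2"),
    ("D", "old.txt"),
])

# Long way: discard the inherited tree and list every file.
from_scratch = build_tree(parent_tree, [
    ("deleteall",),
    ("M", "a.txt", "v2"),
    ("M", "b.txt", "v1"),
])
```

Both calls end with the same tree, which is the point: the delta form is a convenience, and the commit that results records only the final snapshot.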