Question

I use and love Git, but I'm trying to really understand it.

I'm a bit stuck on remotes, tracking and such.

  • I create a Git repo called main
  • I clone main to create second
  • I make a change to second and commit it
  • I cd back into main and run

    git fetch ../second
    ->  * branch            HEAD       -> FETCH_HEAD
    

In other words, I'm fetching into the original repo, which doesn't know anything about its clones.

When I look with gitk --all, nothing seems to have been added.

  • If I do git pull instead, the changes I made in second show up.
  • If I register second as a remote, fetch works as I'd expect.

My questions:

  • So when I fetch from an unlinked/untracked/unanything repo, is it doing anything?
  • Why does pull work?

Solution

Note: using an explicit "remote" is the way to go these days (see below for why). Naming a url directly is a very old (and pretty much obsolete) method.

If you were to run gitk --all FETCH_HEAD, you'd see something different (try it and see). The reason is that --all adds only HEAD and the refs under refs/ (see below), and FETCH_HEAD is neither.
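A minimal sketch of the question's setup, assuming a POSIX shell, Git 1.8.5 or later (for git -C), a throwaway commit identity and a scratch directory. It uses git rev-list --all as a text-mode stand-in for gitk --all and checks whether the commit fetched from second is reachable from it, or only from FETCH_HEAD.

```shell
set -e
# Throwaway identity and scratch directory (assumptions, so commits work anywhere).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# main: the original repo (branch forced to "master" so names match the text).
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
# second: a clone with one commit that main does not have.
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")
new=$(git -C second rev-parse HEAD)

cd main
git fetch -q ../second            # no remote, no refspec: only FETCH_HEAD is written
fetched=$(git rev-parse FETCH_HEAD)
# --all (HEAD plus everything under refs/) does not reach the new commit...
in_all=$(git rev-list --all | grep -c "$new" || true)
# ...but naming FETCH_HEAD explicitly does, as gitk --all FETCH_HEAD would.
in_fetch_head=$(git rev-list FETCH_HEAD | grep -c "$new")
echo "reachable from --all: $in_all, from FETCH_HEAD: $in_fetch_head"
```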

Remotes and refspecs

What's a "remote"?

A remote is, in concrete terms, an entry in the git config file (usually .git/config within the repo itself). Or rather, a series of entries under a section, remote.name:

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = ssh://some.host.name/path/to/repo.git

or similar. The point of this is to record some common items so that you don't have to repeat them all the time. In particular, the url (and optionally pushurl) do not have to be spelled out after this. The fetch = line is also important, as noted below. (It's different for a "mirror" than for a "regular" repository.)
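As a sketch, assuming a scratch directory: git remote add writes exactly this kind of section, and git remote set-url --push adds the optional pushurl entry. The fetch URL is the placeholder from the example above, the push host is made up, and nothing is contacted.

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
# Writes [remote "origin"] with url = ... and the default fetch = refspec.
git remote add origin ssh://some.host.name/path/to/repo.git
# Optional separate push URL (hypothetical host), stored as remote.origin.pushurl.
git remote set-url --push origin ssh://push.host.name/path/to/repo.git
fetch_spec=$(git config --get remote.origin.fetch)
push_url=$(git config --get remote.origin.pushurl)
cat .git/config    # shows the [remote "origin"] section git wrote
```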

If you run git fetch with a repository argument instead of a remote ...

If you do git fetch ../second as shown above, you're naming a repository directly rather than a "remote". So you don't need a [remote "origin"] section and all its entries, but you may have to do more work (typing) instead. You can name the other repository by a full URL like ssh://... or https://... or whatever; for the special case of a repository already on your own machine, you can use a relative path name, as in your example.

I find it best to think about the repository argument (the URL or path) as identifying a "remote repository", probably on some other machine over a network. This helps keep clear in my mind who has access to what. The special case of a "remote repository" being on your own local machine is, well, a special case. Obviously if it's on your local machine, it's accessible at all times. Other remotes are often less accessible.

Consider cloning, say, the source code of Git itself from some web site (kernel.org or wherever) onto a laptop. At some point you unplug the laptop and take it with you, maybe onto a plane where you won't have network access. So "they" give you access to their repository, and you copy it to yours. Once you have everything, you don't need "theirs" except to occasionally re-synchronize with them.

git fetch can take more than two arguments

If you run git fetch repository refspec, fetch updates not only the objects in your local repository, but also some set of "ref-names" (references; see below). The last argument to fetch is the "refspec" part, which (to ignore some technicalities) is basically a pair of ref-names separated by a colon. For instance, you might write git fetch ssh://... master:refs/remotes/origin/master.
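A sketch of that command run against a local clone instead of an ssh:// URL. The main/second layout from the question, the throwaway identity and the destination refs/remotes/origin/master are assumptions; main has no remote called origin, the name is just chosen to match the text.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")

cd main
# source:destination. Their master is stored under our refs/remotes/origin/master.
git fetch -q ../second master:refs/remotes/origin/master
got=$(git rev-parse refs/remotes/origin/master)
want=$(git -C ../second rev-parse master)
git branch -r                      # now lists origin/master
```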

You need to specify which ref-names, if any, on the place you're fetching from should have their objects brought over, but also, just as important, what name(s) those should be given in "your" repository. Sure, "they" have branch master, but also branches maint (maintenance), next, and so on. Initially, you could give them the same branch names in your repository, but after you've been working and adding stuff, when you re-synchronize with them, their master and your master are different. So you need a different name under which to put "their branch" when you fetch their updates to their master.

Running git fetch with a remote name, like origin, provides a refspec for you, via that fetch line (in fact, there can be multiple fetch lines, for multiple refspecs). But when you're not using a remote, you have to provide your own refspecs. You didn't, so you got a default (more about this in a moment).
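For comparison, here is a sketch of the registered-remote variant from the question. It assumes the same main/second layout and a throwaway identity. git remote add writes the fetch = line, so a plain git fetch second knows where to put second's branches.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")

cd main
git remote add second ../second   # writes url and fetch = +refs/heads/*:refs/remotes/second/*
git fetch -q second               # the refspec comes from the config, not the command line
got=$(git rev-parse second/master)
want=$(git -C ../second rev-parse master)
git branch -r                     # lists second/master
```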

References

References include things like branch and tag names. However, they're much more general and flexible than that. In fact, HEAD is also a reference. References have a whole "name space" thing going on: they are almost all spelled starting with refs/, and particular kinds of refs live in different parts of this space. The four you will use all the time are HEAD (which is kind of special—it doesn't start with refs/ and git uses it internally all the time—but it is still a reference), branches (local branches), tags, and remote branches.

(In fact, the HEAD reference name is so special that if you remove it, git decides that you no longer have a repository after all.)

Git will usually automatically choose the "right kind" of ref and not make you spell it all out, but it helps to know all this stuff, especially when git gets confused and its "figure it out and do what I mean" code does something you did not actually mean.

Local branches

Local branches live in refs/heads/, so your local master branch is actually the full name refs/heads/master. When you create new branches, this just adds more refs/heads/ names. (Those wind up in files in your local repository. Creating a branch just needs to create a tiny 41-byte file. This is why branching is so fast and easy in git.)

Usually, you leave off the refs/heads/ part and just write your branch name. Git knows what to do.
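A quick check of the 41-byte claim, as a sketch. It assumes the default "files" ref storage and SHA-1 object IDs; a repository using reftable, SHA-256 or packed refs would look different. The 41 bytes are the 40 hex digits of the commit ID plus a newline, and feature is a hypothetical branch name.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q repo && cd repo
echo one > a.txt && git add a.txt && git commit -q -m "first"
git branch feature                             # just writes refs/heads/feature
full=$(git rev-parse --symbolic-full-name feature)
size=$(wc -c < .git/refs/heads/feature | tr -d ' ')
cat .git/refs/heads/feature                    # the commit ID, nothing more
```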

Tags

Tags live in refs/tags/: the tag v1.0 is just refs/tags/v1.0. Using --tags with git fetch just tells it to add refs/tags/*:refs/tags/* to the refspecs it will update. (In some versions of git this is a "replace" instead of "add".)

Usually, you leave off the refs/tags/ part and just write the tag name. When you run a command like git tag or git fetch --tags, git knows you mean a tag.
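A sketch of fetching tags by spelling out that refspec directly. The main/second layout, the throwaway identity and the tag name v1.0 are assumptions for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && git tag v1.0)       # a lightweight tag that only second has

cd main
git fetch -q ../second 'refs/tags/*:refs/tags/*'   # the refspec --tags adds
full=$(git rev-parse --symbolic-full-name v1.0)
git tag                                            # lists v1.0
```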

Remote branches

Despite the name, "remote branches" are actually a local thing, kept in "your" repo. In other words, they come with you when you take the laptop on the plane.

Remote branches live in refs/remotes/, followed by one more name part that is just the name of the remote. For the origin remote, for instance, you get refs/remotes/origin/master to keep track of what was in master on remote origin. If origin also has a branch named maint, you can keep track of "what was in maint over there" in your own, local, refs/remotes/origin/maint.

Again, usually you leave out the refs/remotes/ part, but this time you keep the remote name. So you write things like origin/master and origin/maint.

One big reason for the extra name-part is that you can have more than one remote. If you have remotes origin and fred, you keep your copy of master-on-origin in origin/master, and you keep your copy of master-on-fred in fred/master. The other big reason for the extra name-part is that when you write origin/master, git can tell that you mean the remote branch master, not your local master.
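A sketch with two remotes. It assumes a clone called work, which already has origin, plus a second clone standing in for fred. The remote names follow the text, but the layout and the throwaway identity are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main work            # work gets origin/master from the clone
git clone -q main fred
(cd fred && echo two > b.txt && git add b.txt && git commit -q -m "fred's change")

cd work
git remote add fred ../fred
git fetch -q fred                 # fills refs/remotes/fred/*
origin_ref=$(git rev-parse --symbolic-full-name origin/master)
fred_ref=$(git rev-parse --symbolic-full-name fred/master)
git branch -r                     # origin/master and fred/master side by side
```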

These "remote branches" are what git fetch needs to update. But, in order to update them automatically, it needs to know the name of the remote. That's why git fetch remote is "better": it just does all this automatically. You could write them out explicitly, with git fetch url "+refs/heads/*:refs/remotes/origin/*", but it sure is nicer to have it all saved away under remote "origin".
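A sketch of the spelled-out form, fetching every branch with a wildcard refspec and no configured remote. The main/second layout, the maint branch and the throwaway identity are assumptions; "origin" is only a name here, since main has no such remote.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && git branch maint)   # second now has master and maint

cd main
# What a [remote "origin"] fetch = line would otherwise supply for you.
git fetch -q ../second '+refs/heads/*:refs/remotes/origin/*'
refs=$(git for-each-ref --format='%(refname)' refs/remotes/origin)
echo "$refs"
```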

The obsolete way

Long ago, git did not have all this stuff. Instead, you ran git fetch url refname, e.g., git fetch ssh://... master.

To make this work, fetch had to avoid clobbering your master. So what it did (and still does) is go to the remote repository, bring over all the repository objects needed, drop them into your repository, and then write another "special" reference, FETCH_HEAD. (Like HEAD and MERGE_HEAD and a few more special names, FETCH_HEAD does not live under the refs/ space.)

This happens any time you write a refspec and leave out the colon. And, if you leave out the refspec entirely, that means the same as if you had written HEAD. Thus:

  • git fetch url master means git fetch url master:FETCH_HEAD
  • git fetch url maint means git fetch url maint:FETCH_HEAD
  • git fetch url means git fetch url HEAD:FETCH_HEAD
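A sketch that checks the colon-less forms. It assumes the main/second layout, a maint branch with an extra commit in second, and a throwaway identity. Each fetch lands in FETCH_HEAD, and no ref in main changes.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && git checkout -q -b maint && echo fix > fix.txt && git add fix.txt \
  && git commit -q -m "fix on maint" && git checkout -q master)

cd main
before=$(git for-each-ref)
git fetch -q ../second maint      # behaves like maint:FETCH_HEAD
maint_fh=$(git rev-parse FETCH_HEAD)
git fetch -q ../second            # no refspec: behaves like HEAD:FETCH_HEAD
head_fh=$(git rev-parse FETCH_HEAD)
after=$(git for-each-ref)
cat .git/FETCH_HEAD               # a plain file, outside refs/
```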

Note that the remote repository is a git repository ("well duh" :-) ). This means it has a HEAD. If it's a typical repository for fetching, its HEAD is the same as its master, so that the default you get is to fetch master and write that into FETCH_HEAD.

git pull

Git's pull command is basically just a convenience method. It "means" the same thing as git fetch followed by git merge (or, with git pull --rebase, git fetch followed by git rebase, but let's ignore that here).
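A sketch of that equivalence. It assumes two throwaway clones, a and b, of the question's main repo. One runs git pull, the other runs fetch and merge by hand, and both should end up on the same commit. Passing -c pull.rebase=false only silences a hint that newer Git versions may print.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")
git clone -q main a
git clone -q main b

# a: the convenience command.
(cd a && git -c pull.rebase=false pull -q ../second master)
# b: the two steps it stands for.
(cd b && git fetch -q ../second master && git merge -q FETCH_HEAD)
a_tip=$(git -C a rev-parse HEAD)
b_tip=$(git -C b rev-parse HEAD)
```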

It's a somewhat weird and (in my opinion) broken convenience method, though. (Much is to be fixed in git 1.9.) When you run:

git pull origin master

for instance, what git pull does is to invoke git fetch "the old way", so that this brings over origin's master but fails to update refs/remotes/origin/master. Instead, it just puts the stuff-brought-over reference into FETCH_HEAD. There, it's invisible to most commands, including gitk --all.

But the next thing git pull origin master does is to run (in effect):

git merge FETCH_HEAD

This merges the changes into your current branch, which makes them visible to most commands, including gitk --all.
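Back in the question's setting, here is a sketch of why pull appears to work. It assumes the main/second layout and a throwaway identity. Pulling by path creates no remote-tracking name, but master itself moves, and --all does see master.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")
new=$(git -C second rev-parse HEAD)

cd main
git -c pull.rebase=false pull -q ../second master   # fetch into FETCH_HEAD, then merge it
remote_refs=$(git for-each-ref refs/remotes)         # still empty: nothing like second/master
in_all=$(git rev-list --all | grep -c "$new")        # reachable now, through master
```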

In this particular case, it does not matter whether you run git pull remote branch or git pull url branch, or either of those without a branch argument, as the pull script prevents git fetch from updating the remote-branch names.

(In git 1.9, a git pull with a remote name, or a git pull with no arguments that is able to compute the remote name, will run git fetch such that it updates the remote-branch names.)

Other tips

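git fetch doesn't integrate anything into your branches: it downloads the new commits into your repository and records where they are (with no remote configured, only in FETCH_HEAD), but leaves your current branch alone. git pull, on the other hand, is described in its documentation like this: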

In its default mode, git pull is shorthand for git fetch followed by git merge FETCH_HEAD.

So pull actually integrates the changes from the remote repository into your current branch, rather than only fetching them into your repository. Fetching without merging is useful, for instance, when you want to check what changes have been made on the remote before merging them into your local branch. Here's a good article about the matter.
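A sketch of that fetch-then-review workflow. The question's main/second layout and a throwaway identity are assumptions. You look at the incoming commits and the diff first, and merge only afterwards.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")

cd main
git fetch -q ../second master
incoming=$(git log --oneline master..FETCH_HEAD | wc -l | tr -d ' ')
git log --oneline master..FETCH_HEAD   # commits you would be merging
git diff --stat master...FETCH_HEAD    # their changes since the common ancestor
git merge -q FETCH_HEAD                # only once you are happy with what you saw
```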

The following should illustrate what's going on:

> git init main
> git clone main other
> cd other
> touch file.txt
> git add file.txt
> git commit -m "Added file.txt"
[master (root-commit) 027216b] Added file.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file.txt

> cd ../main
> touch a.txt
> git add a.txt 
> git commit -m "Added a.txt"
[master (root-commit) 8ac8913] Added a.txt
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 a.txt

> git fetch ../other/ master
warning: no common commits
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From ../other
 * branch            master     -> FETCH_HEAD

> git diff master FETCH_HEAD
diff --git a/a.txt b/a.txt         <-- a.txt does not exist in FETCH_HEAD
deleted file mode 100644           <-- a.txt does not exist in FETCH_HEAD
index e69de29..0000000
diff --git a/file.txt b/file.txt   <-- file.txt does not exist in master
new file mode 100644               <-- file.txt does not exist in master
index 0000000..e69de29

So the changes from ../other have been fetched into FETCH_HEAD. You can look at the diff and merge the changes from there if you wish, instead of just doing a pull and risking the remote's changes breaking something in your current branch.

The other option is to always pull into a newly created branch and look at the diff there, but that's a little cumbersome.
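A sketch of that scratch-branch approach. It assumes a main/second layout that shares history (unlike the example above), a throwaway identity, and a hypothetical branch name, review.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q main
(cd main && echo one > a.txt && git add a.txt && git commit -q -m "first")
git clone -q main second
(cd second && echo two > b.txt && git add b.txt && git commit -q -m "second")

cd main
git checkout -q -b review                          # scratch branch for the pull
git -c pull.rebase=false pull -q ../second master  # lands on review, master untouched
changed=$(git diff --name-only master review)
git checkout -q master
# Afterwards: "git merge review" to keep the changes, or "git branch -D review" to drop them.
```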

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow