hg-git clone from Github gives "abort: repository is unrelated"

https://stackoverflow.com/questions/17240852

01-06-2022

Question

I have a project whose main (Mercurial) repository is on SourceForge, but there are clones on Bitbucket (Mercurial) and GitHub (Git).

I have been using hg-git to push the Mercurial repository to GitHub, and from what I understand of the procedure, some metadata is kept in the Mercurial repository in the process.

Now, when I clone the Bitbucket repository anew, clone the GitHub repository anew as well, and issue hg pull ../github-repo, I get:

pulling from ../github-repo
searching for changes
abort: repository is unrelated

Why is that and how can I convince Mercurial that indeed they are related? Or do I have to rely on the original repository from which I pushed to Github originally? I still have that, but suppose I lost it, what would be the options I have, short of manual changeset transplantation?

Note: the GitHub repo was changed (new changeset) due to a pull request. But the SourceForge and Bitbucket repos still recognize each other as related. The mission now is to pull the changeset from the GitHub Git repo into a local one and push it back up to SourceForge and Bitbucket respectively.

Solution

The related or non-related bit basically comes from whether two repositories share a common root, i.e. initial changeset.
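As a toy illustration of that rule (a simplified model, not Mercurial's actual discovery code), two histories can be treated as related when they share at least one root changeset. The `Repo` type and the short ids below are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy history: maps a changeset id to the ids of its parents."""
    parents: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def roots(self) -> set[str]:
        # A root is a changeset without parents (the initial changeset).
        return {node for node, ps in self.parents.items() if not ps}


def related(a: Repo, b: Repo) -> bool:
    # An empty repository has no root that could disagree, so in this
    # model it counts as related to everything.
    if not a.parents or not b.parents:
        return True
    return bool(a.roots() & b.roots())


original = Repo({"a1": (), "b2": ("a1",)})
clone = Repo({"a1": (), "b2": ("a1",), "c3": ("b2",)})
converted = Repo({"x9": (), "y8": ("x9",)})  # same content, different ids
empty = Repo()

print(related(original, clone))      # shared root "a1"
print(related(original, converted))  # no shared root
print(related(original, empty))      # empty repo accepts anything
```

In this model a converted history with different ids for the same content is unrelated to the original, even though a human would consider it the same project.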

To force the pull, you could do something evilish with the graft or transplant extensions, but this could have ripple effects, and you seem averse to such a solution – I would be too!

To understand why you're having a problem, you need to understand a little bit about how Hg-Git works.

How Hg-Git works

Tl;dr

The real problem is that Hg-Git basically creates a new repo dynamically. Thus, the two repositories aren't related for the same reason that the product of hg convert some-existing-hg-repo isn't related to the original repository. You hadn't noticed it thus far because Hg-Git does this in the other direction as well – when you start from a Mercurial repository, it creates the necessary Git repository. When you first started cloning to GitHub, you created a bare Git repository on their servers, which for all intents and purposes is related to every repository. Thus, when you pushed, the new Git repo created by Hg-Git was related and everything worked, no problem. Afterwards, you were pushing from that same repo, so again no problems – Hg-Git tracks the relationship between the local Git and Hg repositories and thus your relationship was maintained. But when you start afresh, you create a new Git and/or Hg repo (depending on which direction you're going in) and the correspondence is broken.

Slightly less oversimplified

Hg-Git works by creating a hidden Git repository and establishing a correspondence between commits of the Git and the Hg repositories. Hg-Git is a two-way bridge, that is, it is capable of taking Git commits and producing Hg commits and vice versa. Hg-Git achieves its bilingualism by using a Git library written in Python (dulwich) and linking into Mercurial as an extension. This means that Hg-Git reads and writes Git repositories without needing a git binary / the Git reference implementation installed. However, Hg-Git is a Mercurial extension and as such depends on the system Mercurial for the Mercurial end of the transaction as well as the user interface. This is why there are efforts to create the reverse interface (Git-Hg and the like) so that Git users can use Git to interact with Mercurial.

Now, whether the Git or the Hg repository is created depends on how the hybrid repository was created in the first place. Since you're coming from the canonical Mercurial side, we'll start there.

When you create a repository on GitHub or Bitbucket, it is initially bare and commitless and thus related to every repository – this is part of the motivation for the default of no initial commit on repository creation. (This is true for both Git and Mercurial.) Repository relatedness is based on the root node. Thus, any repository can be pushed to this new repository. When you run hg push ssh+git://git@github.com/user/some-git-repo for the first time, Hg-Git creates a new, hidden Git repository in your local folder, then uses the Git protocols to communicate and push the changes to the remote. From then on, you should have no problem communicating between the two repositories – from the initial conversion of the root node and parent-child relationships, it is possible to achieve a one-to-one mapping between the changesets of the two repositories. (This isn't quite 100% true, especially if you use more advanced, idiomatic features of Git or Mercurial, but it'll suffice for now.) Hg-Git tracks a tad more information than this, I'm pretty sure, if for no other reason than to speed things up with successive pushes and pulls. So, when you start from a Mercurial clone, your "proto-root" is the Mercurial root and the Git repository is created and maintained as need be.
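One way to picture that correspondence is a two-column table of Git and Mercurial hashes. The sketch below parses such a table into lookups for both directions. The line format (`<git sha> <hg sha>`, one pair per line), the function name `parse_map`, and the sample hashes are assumptions made for illustration, not a documented Hg-Git interface.

```python
def parse_map(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Build lookups in both directions from lines of '<git sha> <hg sha>'."""
    git_to_hg: dict[str, str] = {}
    hg_to_git: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        git_sha, hg_sha = line.split(" ", 1)
        git_to_hg[git_sha] = hg_sha
        hg_to_git[hg_sha] = git_sha
    return git_to_hg, hg_to_git


# Hypothetical sample data: two commit pairs.
sample = """\
1111111111111111111111111111111111111111 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2222222222222222222222222222222222222222 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
"""
git_to_hg, hg_to_git = parse_map(sample)
print(len(git_to_hg), "pairs loaded")
```

A table like this lives with the working copy that did the conversion, which is why a fresh clone elsewhere does not automatically know the correspondence.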

Now, if you aren't starting from a local Mercurial clone, but rather from a remote Git clone, then you do actually wind up creating a Mercurial clone from the Git one – the "proto-root" is the Git root. More precisely, when you run hg clone ssh+git://git@github.com/user/some-git-repo, Mercurial starts up, checks to make sure it can interface with the remote (which it can with Hg-Git's help), then creates the directory and calls the necessary extension(s), i.e. Hg-Git. Hg-Git then creates a hidden .git folder in your .hg folder, performs a Git clone, then converts the Git repo to a Mercurial one; once that clone is complete, it calls hg update, which operates directly on the Mercurial repo without ever knowing about the Git repo.

This, I suspect, is what went wrong in your case. When you did a new clone from GitHub, you in effect created a new Mercurial repository, which of course isn't related to your original repository – much in the same way that the product of hg convert isn't related to the original, even if the mutated commits don't include the initial one. (This is sort of like when you translate something to another language and back again: you don't always get the original form back.) For various reasons, I suspect that Hg-Git performs its conversions in a time-independent and deterministic way (almost definitely the latter, but it might add extra metadata about the conversion itself, which would mean not the former). If this is the case, then you should be able to start from a canonical Hg clone and recreate the connection to the Git repository. (Yes, it is somewhat problematic that the directionality of the initial conversion makes a difference, but the pros and cons of the design decisions leading up to that are best discussed with the developers themselves.)

Back to the structure of a hybrid Hg-Git repository. There are two interesting things here:

1. Mercurial is more or less completely oblivious to the extra translation going on when it communicates with Git remotes, and
2. There is a full-fledged Git repository hidden from view and occasionally synchronized to the Mercurial repository.

Importantly, you can actually operate directly on the hidden Git repository via the system Git. If you use Hg-Git, then the Git repository is synchronized only on pushes to and pulls from remote Git clones, which means that those local direct Git changes will get out of sync with the Mercurial repository – in the worst case, you commit a few times to Git, then commit to Mercurial without synchronizing and effectively create two separate branches because the Hg commit and the Git commits share a common ancestor but don't build upon each other. However, Hg-Git provides a mechanism for manually forcing a sync between the repos via hg gimport [git-repo-to-import-from-if-not-local-hidden] and hg gexport (exports by default to the local hidden copy, creating it if need be). Forcing this synchronization should also provide you with a way to deal with the issues you've noticed. Namely, you could use Git to pull (or in Git terminology, fetch – git pull is equivalent to hg pull --update; git fetch is hg pull, which makes the Mercurial fetch extension's name really unfortunate) the new changesets into the Git repository, then use hg gimport to import those changesets into the Mercurial repository.

Now, if you did something like editing history or the like, then all bets are off. I'm not sure how Hg-Git would handle this – I suspect it would wind up creating doubles. The new commits in the Mercurial clone would be added to Git, but the deleted changesets are still in the Git repo and would probably be imported back into the Mercurial repository. (This is a direct result of Hg-Git's method of offline synchronization of changesets.) In this case, I suggest picking a canonical repository, wiping all clones, and doing a new push with an apology to everybody whose clones were invalidated by this mess. (This is, incidentally, part of why the Mercurial community is so wary of editing history.)

Potential Solutions

1. @EmilSit suggested that you run hg pull git+ssh://github.com/you/githubrepo.git directly from the canonical (non-GitHub clone) Mercurial repository. This has a decent chance of working, assuming that Hg-Git's method of creating the initial Git clone is completely time-independent and deterministic. (The latter is almost assuredly true, but I'm not sure of the former; see the text above for more detail.)
2. You can do the local variant of this: use git clone ssh://github.com/you/githubrepo.git to get a local pure Git clone, then do hg pull ../githubrepo. (This requires that you have Git installed.) Hg-Git should automatically kick in and do the conversion. This approach also depends on Hg-Git doing the conversion in a deterministic, time-independent way.
3. You can operate directly on the hidden Git repository in the original hybrid repository. Use git fetch (you may first have to cd into the .git folder hidden in the .hg folder). Then run hg gimport && hg update to import the changes from the Git repository and update. (You may have to specify the path for gimport – either . or the path to the hidden Git repo. I suspect you could also specify the GitHub path.)
4. You can use various methods of dumb transplanting – exporting a patch series, etc. – and manually committing them. If you want to give another developer credit when doing manual commits, then you can use the -u option to set the user on a per-commit basis.
5. You can do smart transplanting with either the graft or transplant extensions. First, use Hg-Git to do a new Mercurial clone of the GitHub repository. Then use one of those extensions to bring the two Mercurial repositories together.
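The third option (fetch into the hidden Git repository, then gimport and update) can be sketched as a small dry-run script. This is only an outline under assumptions: the repository path `my-project`, the hidden Git directory `.hg/git`, and the helper names are hypothetical, and any extra arguments to `git fetch` (remote, refspec) are left to the caller because the answer does not specify them. By default nothing is executed; the commands are only printed.

```python
import subprocess
from typing import Optional


def plan_sync(hidden_git_dir: str,
              fetch_args: Optional[list[str]] = None) -> list[list[str]]:
    """Commands for: fetch into the hidden Git repo, import into hg, update."""
    return [
        ["git", f"--git-dir={hidden_git_dir}", "fetch", *(fetch_args or [])],
        ["hg", "gimport"],
        ["hg", "update"],
    ]


def run_sync(hybrid_repo: str, hidden_git_dir: str,
             fetch_args: Optional[list[str]] = None,
             dry_run: bool = True) -> list[str]:
    """Print each command; run it inside hybrid_repo only if dry_run is False.

    hidden_git_dir is interpreted relative to hybrid_repo, because that is
    the working directory the commands would run in.
    """
    shown = []
    for argv in plan_sync(hidden_git_dir, fetch_args):
        line = " ".join(argv)
        shown.append(line)
        print("$", line)
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, cwd=hybrid_repo, check=True)
    return shown


# Hypothetical paths; adjust them to your own layout before a real run.
commands = run_sync("my-project", ".hg/git")
```

Keeping the plan separate from its execution makes it easy to inspect the commands first, which seems prudent given how many of the steps above are stated as suspicions rather than certainties.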

At least one of the non-transplanting methods should work because, unless Hg-Git does its magic time-dependently, it should be possible to find a common root. Even if a common root is found, though, you may wind up with two basically duplicate (unnamed) branches, which you then have to merge back together.

Other tips

I'd add that you can even get a "repository unrelated" error when you push an hg repo to git, then clone the git repo from hg, and then try to pull in changes from the original hg repo. Since we now have an hg repo locally created from a git repo that was created from the original hg repo, I suppose the local and the original hg repo should be related, but sometimes they aren't.

You'll see this issue if, due to the differences in how hg and git handle author names and e-mails, your original hg repo had authors in any form other than the Name <mail@example.com> style. The reason is that hg-git tries to convert authors to the strict git style (which uses the mentioned name-email pairs) and will fill in the blanks if this is not the case (see the explanation in hg-git's Readme: https://bitbucket.org/durin42/hg-git).

Thus it can happen that the author of a changeset in the original hg repo is not exactly the same as in the git repo; as a result, authors in the hg repo created from the git repo won't match the ones in the original hg repo, e.g.:

1. Changeset A in the original hg repo has the author set as mail@example.com.
2. Since this doesn't conform to git's standard, hg-git will convert this to mail@example.com <mail@example.com> in the git repo.
3. Now when you clone the git repo to hg, the changeset will have the author as mail@example.com <mail@example.com>.

For two repos to be related, the initial commit has to match exactly, so you'll get the "repository unrelated" error even if the commit message and datetime match but the author is different: the author is part of the data the changeset hash is computed from, so a changed author means a changed hash. Quite a pain to experience (argh, now I'm punished because I forgot to set the author properly three years ago!) but completely reasonable.
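To see why a changed author string is enough, here is a toy model. It is not hg-git's real conversion code and not Mercurial's real hashing: `normalize_author` only mimics the single case from the example above, and `toy_changeset_id` is a made-up stand-in in which every metadata field feeds a SHA-1 digest.

```python
import hashlib


def normalize_author(author: str) -> str:
    """Toy version of the conversion in the example above: a bare value
    such as an e-mail address becomes 'value <value>', while the
    'Name <mail>' form is kept as it is."""
    author = author.strip()
    if "<" in author and author.endswith(">"):
        return author
    return f"{author} <{author}>"


def toy_changeset_id(parent: str, author: str, date: str, message: str) -> str:
    """Stand-in for a changeset hash: all fields go into the digest."""
    payload = "\n".join([parent, author, date, message]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


root_parent = "0" * 40  # placeholder for "no parent"
before = toy_changeset_id(root_parent, "mail@example.com",
                          "2010-01-01", "initial")
after = toy_changeset_id(root_parent, normalize_author("mail@example.com"),
                         "2010-01-01", "initial")
unchanged = normalize_author("Name <mail@example.com>")

print(before == after)  # the round-tripped root no longer has the same id
```

Because the root changeset itself gets a new id after the round trip, the two histories no longer share a root, which is exactly the condition for the "unrelated" error.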

License: CC-BY-SA with attribution

Not affiliated with StackOverflow