Workflow to synchronise Mercurial repositories via email with bundles

https://stackoverflow.com/questions/23122883

05-07-2023

Question

I have two directories on two different computers, machine A (Windows) and machine B (OSX), and I want to keep the two directories in sync via Mercurial. [*]

The restriction is that the two machines are not connected via LAN/WAN; the only way to move data between them is via email. So I thought emailing Mercurial bundles as deltas could do the trick.

My current workflow is roughly this (using a local tag lcb for the latest change bundle):

1. Say I work on machine A. At the end of the day I do:

hg commit -A -m "changes 123"
hg bundle --base lcb bundle_123.hg
hg tag --local -f lcb --rev tip

Finally, I email that bundle to machine B.

2. Then, sitting at machine B, I do:

hg unbundle bundle_123.hg
hg merge
hg commit -A -m "remote changes 123"
hg tag --local -f lcb --rev tip

Now I'm working on machine B, and at the end of the day I do what's listed under 1., but on machine B. And the cycle continues...

However, I'm worried this system is not robust enough:

In-between changes: What happens when, after creating a bundle (Step 1) and before applying it remotely (Step 2), a change occurs on the remote machine B? I had a case where it just overwrote the changes with the new bundle without a conflict warning or merge suggestion.
Double-applying of a bundle: What happens when a bundle is applied twice by accident? Would I need to record the applied bundles somehow with local tags?

Or is there another better workflow to transfer Mercurial deltas via email?

[*] From the answer to a superuser question I figured that Mercurial might be the most feasible way to do this.

Solution

In-between changes: What happens when, after creating a bundle (Step 1) and before applying it remotely (Step 2), a change occurs on the remote machine B? I had a case where it just overwrote the changes with the new bundle without a conflict warning or merge suggestion.

If a change is made on machine B, then this change will have been made in parallel with the changes you bundled from machine A. It doesn't really matter whether the changes are made before or after you create the bundle (time-wise); it only matters that the changes on machine B don't have the head from machine A as their ancestor.

In other words, the world looks like this when the two machines are in sync:

A: ... [a]

B: ... [a]

You then create some new commits on machine A:

A: ... [a] --- [b] --- [c]

B: ... [a]

You bundle using [a] as base, so you get a bundle with [b] and [c]. Let us now say that someone (perhaps yourself) makes a commit on machine B:

A: ... [a] --- [b] --- [c]
              (  bundled  )

B: ... [a] --- [x]

So far nothing has been exchanged between the two repositories, so this is just a normal case of people working in parallel. This is the norm in a distributed version control system: people working in parallel is what creates the need for merge commits.

The need for a merge is not evident in either repository at this point; they both have linear histories. However, when you unbundle on machine B, you see the divergence:

A: ... [a] --- [b] --- [c]
              (  bundled  )

B: ... [a] --- [x]
          \
           [b] --- [c]
          ( unbundled )

It is helpful to realize that hg unbundle is exactly like hg pull, except that it can be done offline. That is, the data stored in a bundle is really just the data that hg pull would have transferred if you had had an online connection between the two repositories.
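The divergence drawn above can be reproduced with a toy model. The following is only a sketch, not Mercurial's implementation: a repository is assumed to be a plain dict mapping each changeset to its parents, and the helper names `heads` and `unbundle` are hypothetical, chosen for illustration.

```python
# Toy model (an assumption for illustration): a repository is a dict
# mapping each changeset name to the list of its parent changesets.

def heads(repo):
    """Return the changesets that are not a parent of any other changeset."""
    parents = {p for ps in repo.values() for p in ps}
    return sorted(n for n in repo if n not in parents)

def unbundle(repo, bundle):
    """Return a copy of repo with the bundle's missing changesets added."""
    merged = dict(repo)
    for node, ps in bundle.items():
        merged.setdefault(node, ps)
    return merged

repo_b = {"a": [], "x": ["a"]}      # machine B committed [x] on top of [a]
bundle = {"b": ["a"], "c": ["b"]}   # machine A bundled [b] and [c], base [a]

before = heads(repo_b)
after = heads(unbundle(repo_b, bundle))
print(before, after)  # ['x'] ['c', 'x']
```

Before unbundling, B has a single head; afterwards it has two, which is the situation in which Mercurial suggests a merge.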

You would now proceed by merging the two heads [x] and [c] to create [y] on machine B:

A: ... [a] --- [b] --- [c]

B: ... [a] --- [x] --- [y]
          \           /
           [b] --- [c]

You now want to send the new changesets back to machine A. On machine B, your last bundle was created with [a] as a base. However, you also know that machine A has commit [c], so you can specify that as an additional base if you like:

$ hg bundle --base a --base c stuff-from-machine-b.hg

That will put [x] and [y] into the bundle:

bundle: (a) --- [x] --- [y]
                       /
                    (c)

Here I use (a) and (c) to denote the required bases of the bundle. You can only unbundle this bundle if you have both [a] and [c] in your repository. If you leave out the second base (only use [a]), you will also bundle [b] and [c]:

bundle: (a) --- [x] --- [y]
           \           /
            [b] --- [c]

Here you included everything except [a] in the bundle. Bundling too much is okay, as we will see next.
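The effect of the bases on the bundle contents can be sketched with the same kind of toy graph. This is a simplified model under stated assumptions, not Mercurial's code: the repository is a dict of changeset to parents, and `ancestors` and `bundle_contents` are hypothetical helpers. The model assumes that a bundle holds every changeset that is neither a base nor an ancestor of a base.

```python
def ancestors(repo, nodes):
    """Return the given nodes plus everything reachable through parent links."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(repo[node])
    return seen

def bundle_contents(repo, bases):
    """Changesets in repo that are neither a base nor an ancestor of a base."""
    return sorted(set(repo) - ancestors(repo, bases))

# Machine B after the merge: [y] has the two parents [x] and [c].
repo_b = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"], "y": ["x", "c"]}

two_bases = bundle_contents(repo_b, ["a", "c"])
one_base = bundle_contents(repo_b, ["a"])
print(two_bases)  # ['x', 'y']
print(one_base)   # ['b', 'c', 'x', 'y']
```

With both bases the bundle is minimal; with only [a] it also carries [b] and [c], which machine A already has.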

Double-applying of a bundle: What happens when a bundle is applied twice by accident? Would I need to record the applied bundles somehow with local tags?

Applying a bundle twice is exactly like running hg pull twice: nothing happens the second time. When unbundling, Mercurial looks in the bundle and imports the missing changesets. So if you unbundle twice, there is nothing to do the second time.
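A minimal sketch of why the second application is a no-op, again using a toy dict-based repository rather than Mercurial itself. The function name `unbundle` and the returned count are assumptions made for this illustration.

```python
def unbundle(repo, bundle):
    """Import the changesets repo lacks; return how many were added."""
    missing = [n for n in bundle if n not in repo]
    for node in missing:
        repo[node] = bundle[node]
    return len(missing)

repo_b = {"a": [], "x": ["a"]}
bundle = {"b": ["a"], "c": ["b"]}

first = unbundle(repo_b, bundle)    # imports [b] and [c]
second = unbundle(repo_b, bundle)   # everything is already present
print(first, second)  # 2 0
```

Because changesets are identified by their content hash in Mercurial, "already present" is a reliable test, so no extra bookkeeping with local tags is needed to guard against double application.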

OTHER TIPS

Initial state

A>hg log --template "{rev}:{node|short} \"{desc}\" - files: {files}\n"
2:415231dbafb8 "Added C" - files: C.txt
1:6d9709a42687 "Added B" - files: B.txt
0:e26d1e14507e "Initial data" - files: .hgignore A.txt

B>hg log --template "{rev}:{node|short} \"{desc}\" - files: {files}\n"
1:72ef13990d0d "Edited A" - files: A.txt
0:e26d1e14507e "Initial data" - files: .hgignore A.txt

That is, the initially identical repositories diverged at revision 1 on both sides, where independent changes appeared.

Test for case 1 - parallel changes

72ef13990d0d in B doesn't interfere with 6d9709a42687:415231dbafb8 in A

A>hg bundle --base e26d1e14507e ..\bundle1-2.hg
2 changesets found
B>hg pull ..\bundle1-2.hg
pulling from ..\bundle1-2.hg
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 2 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)

Because B had its own child of e26d1e14507e, pulling from the bundle added an additional head (and an anonymous branch for the changesets from A):

B>hg glog --template "{rev}:{node|short} \"{desc}\" - files: {files}\n"
o  3:415231dbafb8 "Added C" - files: C.txt
|
o  2:6d9709a42687 "Added B" - files: B.txt
|
| @  1:72ef13990d0d "Edited A" - files: A.txt
|/
o  0:e26d1e14507e "Initial data" - files: .hgignore A.txt

Test for case 2 - applying bundle twice

I know a priori that changesets already present in the repository will not be pulled again (and I prefer the unified style of hg pull from a bundle over hg unbundle), but here is the demonstration:

B>hg pull ..\bundle1-2.hg
pulling from ..\bundle1-2.hg
searching for changes
no changes found

An additional benefit of pull's behavior: you don't have to worry about moving the base changeset for the bundle and can always use a single one, the oldest point of divergence. This will (slightly) increase the size of the bundle (slightly, because by default a bundle is a bzip2-compressed archive), but it also guarantees that all child changesets are included in the bundle and that all missing (and only missing) changesets are pulled into the destination repository.
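The trade-off of a fixed base can be illustrated with a small sketch. It is a toy model, not Mercurial: repositories are dicts of changeset to parents, and `ancestors`, `make_bundle` and `pull` are hypothetical helpers. The changeset names a to d are made up for the example.

```python
def ancestors(repo, node):
    """Return node plus everything reachable through parent links."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(repo[n])
    return seen

def make_bundle(repo, base):
    """Bundle everything that is neither the base nor one of its ancestors."""
    skip = ancestors(repo, base)
    return {n: ps for n, ps in repo.items() if n not in skip}

def pull(repo, bundle):
    """Import only the changesets repo lacks; return their names."""
    missing = [n for n in bundle if n not in repo]
    repo.update({n: bundle[n] for n in missing})
    return missing

repo_a = {"a": [], "b": ["a"], "c": ["b"]}
repo_b = {"a": []}

round1 = make_bundle(repo_a, "a")   # first exchange, base [a]
added1 = pull(repo_b, round1)

repo_a["d"] = ["c"]                 # more work on machine A
round2 = make_bundle(repo_a, "a")   # same base [a] again
added2 = pull(repo_b, round2)

print(len(round1), added1)  # 2 ['b', 'c']
print(len(round2), added2)  # 3 ['d']
```

The second bundle is larger than strictly necessary, but the destination still imports only the one changeset it was missing.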

And in any case, even unbundling the same bundle twice has no ill effects.

Same bundle in the same repo B: an attempt to unbundle an already pulled bundle

B>hg unbundle ..\bundle1-2.hg
adding changesets
adding manifests
adding file changes
added 0 changesets with 0 changes to 2 files
(run 'hg update' to get a working copy)

Licensed under: CC-BY-SA with attribution
