Question

TL;DR: Is it possible to reorganize a Mercurial repository without breaking Kiln/FogBugz history, or do I have to start fresh?


I have a repository that is a real mess, in need of some serious cleanup, and am trying to figure out how best to do it. The goal is to remove a few files entirely -- they should not appear in any commits, ever -- move a few directories, and split one directory out into an entirely separate repository. I know, I know -- you're not supposed to be able to change history. In this case, however, it's either change history or start from scratch with new repositories.

The repository in question is managed in Mercurial, with the remote repository hosted in Kiln. Issues are tracked in FogBugz. Thanks to some commit link-processing rules, any reference in a commit message to an issue (case) number, such as Case 123, is converted into a link to that FogBugz case. In turn, the commit message is appended as a note to the case it mentions.
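Because these links are what any history rewrite puts at risk, one way to check them afterwards is to compare the case numbers mentioned in the old and new commit messages. This is a minimal sketch: the regular expression is only an assumed approximation of the "Case 123" rule (Kiln's actual pattern may differ), and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumed approximation of the link rule: "Case 123", any capitalization.
CASE_REF = re.compile(r"\bcase\s+(\d+)\b", re.IGNORECASE)


def case_refs(messages: Iterable[str]) -> set[int]:
    """Collect every case number mentioned in the given commit messages."""
    return {int(num) for msg in messages for num in CASE_REF.findall(msg)}


def missing_refs(old: Iterable[str], new: Iterable[str]) -> list[int]:
    """Case numbers referenced in the old history but not in the new one."""
    return sorted(case_refs(old) - case_refs(new))
```

An empty result from `missing_refs` means every case mentioned before the rewrite is still mentioned afterwards; how the messages are collected (for example from `hg log`) is left open.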

Current Structure

The project file structure is currently something like this:

- /
    +- includes/
    |   +- functions-related-to-abc.php
    |   +- functions-related-to-xyz.php
    |   +- class-something.php
    |   +- classes-several-things.php
    |   +- random-file.php
    |   ...
    |
    +- development/
    |   +- a-plugin-folder/
    |   |   +- some-file.php
    |   |   +- file-with-sensitive-and-non-sensitive-info.php
    |   |   ...
    |   |
    |   +- some-backend-functions-related-to-coding.php
    |   ...
    |
    +- index.php
    +- test-config-file.php
    ...

Target Structure

The structure I want is something like this:

- /
    +- build/
    +- doc/
    +- src/
    |   +- functions/
    |   |   +- abc.php  // renamed from includes/functions-related-to-abc.php
    |   |   +- xyz.php  // renamed from includes/functions-related-to-xyz.php
    |   |   ...
    |   |
    |   +- classes/
    |   |   +- something.php       // renamed from includes/class-something.php
    |   |   +- several-things.php  // renamed from includes/classes-several-things.php
    |   |   ...
    |   |
    |   +- view/
    |   |   +- random-file.php  // formerly includes/random-file.php
    |   ...
    |
    |   +- development/
    |   |   +- some-backend-functions-related-to-coding.php
    |   |   ...
    |   +- index.php
    |   ...
    |
    +- test/
    ...

a-plugin-folder would move to its own, separate repository. test-config-file.php would no longer be tracked in the repository at all. Ideally, I will also do some minor pruning and renaming of branches while I'm at it.
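In Mercurial, the usual tool for this kind of split is `hg convert` with a filemap, and branch renaming can be handled in the same run with a branchmap. The sketch below keeps only the plugin directory and moves it to the new repository's root (`rename <dir> .` is the filemap form for promoting a subdirectory); the branch names and file names are placeholders, not real branches from this repository.

```python
from __future__ import annotations

from pathlib import Path


def plugin_filemap(plugin_dir: str) -> str:
    """Keep only `plugin_dir` and move its contents to the repository root."""
    return f"include {plugin_dir}\nrename {plugin_dir} .\n"


def branchmap(renames: dict[str, str]) -> str:
    """One 'source-branch destination-branch' pair per line."""
    return "".join(f"{old} {new}\n" for old, new in renames.items())


def convert_command(source: str, dest: str,
                    filemap: Path, bmap: Path) -> list[str]:
    """Argument list for an hg convert run that uses both map files."""
    return ["hg", "convert", "--filemap", str(filemap),
            "--branchmap", str(bmap), source, dest]
```

For example, `plugin_filemap("development/a-plugin-folder")` would be written to a file and passed to `hg convert` together with a branchmap such as `branchmap({"old-experiment": "default"})` (hypothetical branch name).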

In my dream world, file-with-sensitive-and-non-sensitive-info.php would somehow be tracked consistently, but with the sensitive info (a couple of passwords) yanked out into a config file that is not under version control. I realize that's probably wishful thinking.

My Current Thinking

My current thinking is that my wish list is basically impossible: I can create new, properly structured repositories from this point forward, but I cannot both preserve my change history and make the radical structural changes I need. In this view, I should take the current code base, reorganize it exactly the way I want it, and commit it as the first changeset of two new repositories (the root repository and the plugin repository). I would then keep a backup copy of the old repository for reference. Major downsides: (1) I lose all my history, and (2) the Kiln and FogBugz cross-references for historical commits are all toast.

My Question

So, here's the question: is there any way to do what I want -- restructure, pull a few files out, and get everything looking pretty -- without losing all of my history?

I have considered using the hg convert extension, making heavy use of the filemap, splicemap, and branchmap options. The problems I see with that approach include: (1) breaking all prior builds, (2) not having file-with-sensitive-and-non-sensitive-info.php in prior builds at all (or leaving it in, which defeats the point), and (3) rendering many of the commit messages wildly incorrect to the extent they refer to file names or repo structure. In other words, I'm not sure this option gains me much as opposed to just starting clean, properly structured repositories.
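As a rough illustration of the filemap side, this Python sketch writes a filemap for the main repository using only the paths named in the two trees above. It uses the `exclude` and `rename` directives that `hg convert --filemap` understands; the helper name is hypothetical, and anything elided with "..." in the trees is deliberately left unmapped.

```python
"""Sketch: generate an hg convert filemap for the main repository.

Paths come from the current/target trees in the question; entries elided
with "..." there are deliberately left unmapped.
"""

# Dropped from every converted changeset.
EXCLUDES = [
    "test-config-file.php",         # no longer tracked at all
    "development/a-plugin-folder",  # goes to its own repository
]

# (source, destination) pairs. A directory rename such as "development"
# applies to every file beneath it; excluded paths are dropped before
# any rename is considered.
RENAMES = [
    ("includes/functions-related-to-abc.php", "src/functions/abc.php"),
    ("includes/functions-related-to-xyz.php", "src/functions/xyz.php"),
    ("includes/class-something.php", "src/classes/something.php"),
    ("includes/classes-several-things.php", "src/classes/several-things.php"),
    ("includes/random-file.php", "src/view/random-file.php"),
    ("development", "src/development"),
    ("index.php", "src/index.php"),
]


def build_filemap(excludes, renames):
    """Return filemap text with one directive per line.

    None of these paths contain spaces, so no quoting is attempted.
    """
    lines = ["# generated filemap for the main repository"]
    lines += [f"exclude {path}" for path in excludes]
    lines += [f"rename {src} {dst}" for src, dst in renames]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_filemap(EXCLUDES, RENAMES), end="")
```

The printed text could be saved as, say, `main.filemap` and used as `hg convert --filemap main.filemap old-repo new-repo` (file and repository names hypothetical). Note that this addresses only the restructuring, not the concerns listed above about old builds and commit messages.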

I have also considered the extreme option: writing a custom script of some sort to build a new repository by going through each existing commit, stripping sensitive information out of file-with-sensitive-and-non-sensitive-info.php, rewriting commit messages to the extent necessary, and committing the revised version of everything. This, theoretically, could solve all of my problems, but at the cost of reinventing the wheel and probably taking a ridiculous amount of time. I'm looking for something that isn't the equivalent of writing an entire hg extension.

EDIT: I am considering creating an empty repository, then writing a script that uses hg export and hg import to bring changesets over one at a time, making edits where necessary to strip sensitive information like passwords out of files. Is there a reason this wouldn't work?
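The "strip sensitive information" step could be a small text transformation run on the imported file before each commit. This sketch assumes, hypothetically, that the secrets are PHP string assignments to variables whose names contain "pass", and that the real values will live in a `$config` array defined in an untracked file; both the pattern and `$config` are placeholders to adapt to the actual file.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a PHP string assignment to a variable whose name
# contains "pass", e.g.  $db_password = 'hunter2';
SECRET_ASSIGNMENT = re.compile(
    r"^(?P<lhs>[ \t]*\$(?P<name>\w*pass\w*)[ \t]*=[ \t]*)"
    r"(?P<quote>['\"])(?P<value>.*?)(?P=quote)[ \t]*;",
    re.IGNORECASE | re.MULTILINE,
)


def redact(php_source: str) -> tuple[str, dict[str, str]]:
    """Swap literal secrets for lookups into a hypothetical $config array.

    Returns the rewritten source and the extracted values; the values would
    go into an untracked config file instead of being committed.
    """
    extracted: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        name = match.group("name")
        extracted[name] = match.group("value")
        return f"{match.group('lhs')}$config['{name}'];"

    return SECRET_ASSIGNMENT.sub(swap, php_source), extracted
```

Applied at every changeset, this would keep file-with-sensitive-and-non-sensitive-info.php tracked throughout history while the passwords never reach a commit.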


Solution 2

I was able to accomplish my goals. Here's what I ended up doing:

  • First, I "flattened out" (straightened) the repository by eliminating all branches and merges and turning the repo into a single line of commits. I had to do this because hg histedit -- the key to the whole cleanup -- doesn't work on history containing merges. This was okay with me, because there were no really meaningful branches or merges in this particular repository and there is only one author in the relevant history. I probably could have retained the branches and merged again as necessary later, but this was easier for my purposes. To do this I used hg rebase and the MQ extension. (Special thanks to @tghw for this extremely helpful answer, which helped me understand for the first time how MQ really works.)

  • Next, I used hg convert to create several repositories from the original repository -- one for each library/plugin that I needed to put into its own repository and one main repository for the rest of the code. In the process, I used --filemap and --branchmap to reorganize everything as necessary.

  • Third, I used hg histedit on each new repository to (1) clean up irrelevant commit messages as needed and (2) remove sensitive information.

  • Fourth, I pushed all of the new repositories to Kiln, which automatically linked them to FogBugz cases using the same rules I had in place for the original repository (e.g., Case 123 in the commit message creates a link to FogBugz case # 123).

  • Finally, I "deleted" the original repository in Kiln. At the time of writing, Kiln does not permanently delete repositories, though I have proposed a use case for making that possible. Instead, "deleting" a repository delinks its FogBugz cases and puts it into cold storage; an account administrator can restore it, but it is otherwise invisible.
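Parts of the flattening and histedit steps can be scripted. This Python sketch (all helper names hypothetical) first checks that a repository is linear, meaning no changesets match the `merge()` revset and `heads(all())` returns exactly one head, since histedit refuses merges. It then builds a histedit rule list that marks changesets touching a sensitive file as `edit` and those whose messages need rewriting as `mess`. It does not do the flattening itself, which was done with hg rebase and MQ, and it assumes rule lines of the form `action <full hash> <summary>`.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from typing import Callable


def parse_revs(output: str) -> list[int]:
    """Parse one revision number per line, ignoring blank lines."""
    return [int(line) for line in output.splitlines() if line.strip()]


def revs(repo: str, revset: str) -> list[int]:
    """Return the revision numbers that `revset` selects in `repo`."""
    out = subprocess.run(
        ["hg", "log", "-R", repo, "-r", revset, "--template", "{rev}\n"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_revs(out)


def is_linear(repo: str) -> bool:
    """True when there are no merges and exactly one head (histedit-safe)."""
    return not revs(repo, "merge()") and len(revs(repo, "heads(all())")) == 1


@dataclass
class Change:
    node: str         # full 40-character changeset hash
    files: list[str]  # paths touched by the changeset
    summary: str      # first line of the commit message


def histedit_plan(changes: list[Change], sensitive: set[str],
                  fix_message: Callable[[str], bool]) -> list[str]:
    """Build histedit rule lines, oldest changeset first."""
    rules = []
    for ch in changes:
        if any(path in sensitive for path in ch.files):
            action = "edit"   # stop here to scrub the file, then amend
        elif fix_message(ch.summary):
            action = "mess"   # rewrite only the commit message
        else:
            action = "pick"   # keep the changeset unchanged
        rules.append(f"{action} {ch.node} {ch.summary}")
    return rules
```

Only if `is_linear` returns True would the plan, saved to a file, be applied with something like `hg histedit --commands plan.txt <oldest-rev>`. How the `Change` records are gathered (for example from `hg log` templates) is left open.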

All told, it took about 10 hours to split the original repository into 6 pieces and clean up each one. Some of that was learning curve; if I had to do it again, I could probably do the whole thing in about 6 hours. A long day, but worth it for the dramatically improved repository structure and cleaned-up code.

Everything is now as it should be. Hopefully, this will help other users. Please feel free to post a comment if you have a similar issue and would like additional insight from my experience.

OTHER TIPS

Edit: I ended up taking a different approach from the one described below; my other answer explains what I actually did. That said, I am still very interested in a plugin like the one described below, so I am leaving this post up for reference in case I find time to build it or anyone else wants to take on the project.


I have determined that this is possible using hg export, hg import, and some manual patching at the appropriate points in the repository's history.

The Algorithm

The short version of the algorithm looks like this:

  1. Create a new repository
  2. Loop through the existing repository's changesets, doing the following:

    1. Export a changeset from the old repository
    2. Import the changeset into the new repository without committing it
    3. Make any necessary edits to the commit message and/or sensitive files
    4. Commit the changeset in the new repository, preserving the (possibly modified) commit message and other metadata
  3. Swap out the old and new repositories
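Here is a minimal Python sketch of step 2 for a single changeset, assuming a linear history, text-only patches, and `hg` on the PATH. It reads the author, date, and message through an `hg log` template, applies `hg export --git` output with `hg import --no-commit -` (the "-" reads the patch from stdin), lets an optional callback edit files and the message, and then commits with the original user and date. The function names and the callback are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# Author, date in Mercurial's internal "unixtime offset" form, then message.
META_TEMPLATE = "{author}\n{date|hgdate}\n{desc}"


def parse_meta(text: str) -> dict[str, str]:
    """Split `hg log --template META_TEMPLATE` output into its three parts."""
    user, date, message = text.split("\n", 2)
    return {"user": user, "date": date, "message": message}


def hg(repo: str, *args: str, stdin: str | None = None) -> str:
    """Run an hg command against `repo` and return its stdout."""
    return subprocess.run(["hg", "-R", repo, *args], input=stdin,
                          check=True, capture_output=True, text=True).stdout


def copy_changeset(old: str, new: str, rev: str,
                   edit: Callable[[str, str], str] | None = None) -> None:
    """Steps 2.1-2.4 for one revision (assumes text-only patches)."""
    meta = parse_meta(hg(old, "log", "-r", rev, "--template", META_TEMPLATE))
    patch = hg(old, "export", "--git", "-r", rev)       # 1. export
    hg(new, "import", "--no-commit", "-", stdin=patch)  # 2. apply, no commit
    if edit is not None:
        # 3. hypothetical hook: scrub files in `new`, return the new message
        meta["message"] = edit(new, meta["message"])
    # 4. commit, keeping the original author and date; hg aborts (and this
    #    raises CalledProcessError) if the edits left nothing to commit
    hg(new, "commit", "-u", meta["user"], "-d", meta["date"],
       "-m", meta["message"])
```

A driver would call `copy_changeset` once per revision of the old repository, oldest first, passing `edit` only for changesets that need changes.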

Caveats:

  • Obviously, as with all history edits, this only works for non-public repositories which haven't been pulled by third parties.
  • Step 2 can and should be heavily automated to batch-process the changesets that require no editing.
  • Execution will need to halt whenever a changeset does require changes.
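Batching the changesets that need no edits and halting on the ones that do can be expressed as a small planning step. This hypothetical helper groups revisions into unattended runs and manual stops; a driver would copy each "auto" run without pausing and wait for the user at every "manual" entry. How `needs_edit` decides is up to the caller.

```python
from __future__ import annotations

from itertools import groupby
from typing import Callable, Iterable


def plan_batches(revs: Iterable[int], needs_edit: Callable[[int], bool]
                 ) -> list[tuple[str, list[int]]]:
    """Split revisions into unattended runs and manual stopping points.

    Consecutive revisions that need no edits form one ("auto", [...]) run;
    every revision that does need edits becomes its own ("manual", [rev]).
    """
    plan: list[tuple[str, list[int]]] = []
    for manual, group in groupby(revs, key=needs_edit):
        run = list(group)
        if manual:
            plan.extend(("manual", [rev]) for rev in run)
        else:
            plan.append(("auto", run))
    return plan
```

For instance, if only revisions 3 and 4 touch the sensitive file, revisions 1-2 and 5 are copied automatically and the script stops twice.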

Making it Work

I have a very basic proof-of-concept batch file showing that this approach works.

I am working on a Mercurial plugin to make this as easy as possible. That said, I am still open to better suggestions if anyone has any.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow