Splitting a subdirectory with submodules into a separate git repository
25-09-2019
Question
This is a follow-up to the earlier detach-subdirectory question. Although many questions have been asked about splitting and merging git repositories, I couldn't find one that covers splitting when submodules are present.
So in the following scenario:
.git/
.gitmodules
folder/
    data/
    content/
        other_data/
        submoduleA/
        submoduleB/
I would like to get two repositories with the following structure:
.git/
data/
and
.git/
.gitmodules
content/
    other_data/
    submoduleA/
    submoduleB/
The first case is not a problem and can be solved easily with the method described in detach-subdirectory.
The second, not so much. Because .gitmodules contains the full paths folder/content/submoduleA and folder/content/submoduleB, part of the history becomes inconsistent: once filter-branch has been used, .gitmodules refers to a directory structure that no longer exists.
So I would like to know if there is a way to do this without causing inconsistent history.
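To make the inconsistency concrete, here is a minimal sketch of what such a .gitmodules file might contain and how to list the paths it records. The URLs are placeholders and the exact file content is an assumption, since the question does not show it.

```shell
# Hypothetical .gitmodules as it might look in the original repository.
# The URLs are placeholders, not taken from the question.
workdir=$(mktemp -d)
cat > "$workdir/.gitmodules" <<'EOF'
[submodule "folder/content/submoduleA"]
	path = folder/content/submoduleA
	url = https://example.com/submoduleA.git
[submodule "folder/content/submoduleB"]
	path = folder/content/submoduleB
	url = https://example.com/submoduleB.git
EOF

# List the recorded submodule paths. After a subdirectory filter on "folder",
# these paths would still start with "folder/", which no longer exists.
paths=$(git config -f "$workdir/.gitmodules" --get-regexp '\.path$' | awk '{print $2}')
echo "$paths"
```

Running the same git config command against a rewritten repository is a quick way to check whether any recorded path still carries the old prefix.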
Solution
I suspect (not tested) that a second git filter-branch pass would have the opportunity to modify the .gitmodules content for each commit of the new repo.
But actually, a git submodule split command was under discussion in early 2009.
Proposed usage:
git submodule split [--url submodule_repo_url] submodule_dir \
[alternate_dir...]
Replace submodule_dir with a newly-created submodule, keeping all the history of submodule_dir.
This command also rewrites each commit in the current repository's history to include the correct revision of submodule_dir and the appropriate .gitmodules entries.
However, I don't see it in the latest what's cooking.
The script in the proposed patch can give you an idea of the kind of tree rewriting necessary to update the .gitmodules file, though.
OTHER TIPS
I had the exact same problem as Unode and managed to resolve it with the following procedure:
git clone git@github.com:kdeldycke/kev-code.git
cd kev-code
git filter-branch --tree-filter "test -f ./.gitmodules && mv ./.gitmodules ./cool-cavemen/gitmodules || echo 'No .gitmodules file found'" -- --all
git filter-branch --force --prune-empty --subdirectory-filter cool-cavemen --tag-name-filter cat -- --all init..HEAD
git filter-branch --force --tree-filter "test -f ./gitmodules && mv ./gitmodules ./.gitmodules || echo 'No gitmodules file found'" -- --all
git filter-branch --force --tree-filter "test -f ./.gitmodules && sed -i 's/cool-cavemen\///g' ./.gitmodules || echo 'No .gitmodules file found'" -- --all
git remote rm origin
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
git remote add origin git@github.com:kdeldycke/cool-cavemen.git
git push -u origin master --force --tags
As you see, the trick is to temporarily rename the .gitmodules file and use sed to rewrite its content. You can get all the details and the context of this procedure on my blog.
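The sed step can be tried in isolation before running it over the whole history. The sketch below applies the same substitution to a hypothetical .gitmodules file (the submodule name and URL are made up for illustration) and writes to a temporary file instead of using sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical .gitmodules content; the submodule name and URL are placeholders.
tmp=$(mktemp -d)
cat > "$tmp/.gitmodules" <<'EOF'
[submodule "cool-cavemen/lib/foo"]
	path = cool-cavemen/lib/foo
	url = https://example.com/foo.git
EOF

# Same substitution as in the tree filter above, using '|' as the delimiter so
# the slash needs no escaping. The result goes to a new file, which then
# replaces the original.
sed 's|cool-cavemen/||g' "$tmp/.gitmodules" > "$tmp/.gitmodules.new"
mv "$tmp/.gitmodules.new" "$tmp/.gitmodules"

cat "$tmp/.gitmodules"
```

Note that the pattern is not anchored, so it would also strip cool-cavemen/ from a URL that happens to contain it; checking the output on a sample like this first is cheap.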
To elaborate on Kevin's answer: assuming that no submodules ever existed outside cool/cavemen, the folder being detached (otherwise more elaborate editing of .gitmodules will be needed to remove those extra sections), this can be achieved much faster and in one step using an index-filter:
$ git filter-branch --subdirectory-filter cool/cavemen --index-filter $'
hash=$(git rev-parse --verify $GIT_COMMIT:.gitmodules 2>/dev/null) &&
git update-index --add --cacheinfo 100644 $(git cat-file -p $hash |
sed \'s/cool\\/cavemen\\///g\' | git hash-object -w --stdin) .gitmodules ||
true' --tag-name-filter cat --prune-empty -- --all
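The index filter above is dense, so here is a sketch of its three building blocks run by hand in a scratch repository: write a blob, rewrite it through sed, and point the index entry for .gitmodules at the result. The .gitmodules content and URL are hypothetical, and nothing here touches a real repository.

```shell
# Scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical .gitmodules content, written straight into the object database.
old_blob=$(printf '[submodule "cool/cavemen/lib"]\n\tpath = cool/cavemen/lib\n\turl = https://example.com/lib.git\n' |
    git hash-object -w --stdin)

# Read the blob back, strip the prefix, and store the result as a new blob.
new_blob=$(git cat-file -p "$old_blob" | sed 's|cool/cavemen/||g' | git hash-object -w --stdin)

# Point the index entry for .gitmodules at the new blob; no working-tree file is needed.
git update-index --add --cacheinfo 100644 "$new_blob" .gitmodules

# Show what the index now holds.
git cat-file -p :.gitmodules
```

Because everything happens in the object database and the index, no tree has to be checked out for each commit, which is why an index filter tends to be much faster than a tree filter.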
As an added benefit, if cool/cavemen did not exist in every revision or branch, only those revisions or branches that did contain cool/cavemen will be looked at.
If this is the case, you may want to run the following to remove unchanged references:
$ git for-each-ref --format='%(refname)' |
grep -vF "$(git for-each-ref --format='%(refname)' refs/original |
sed 's/refs\/original\///g')" | xargs -n 1 git update-ref -d
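Since that pipeline deletes refs, it may be worth previewing the selection first. The following is a sketch of a dry run in a scratch repository: it simulates the backup ref that filter-branch leaves under refs/original and prints the refs that would be deleted instead of passing them to git update-ref -d. The branch names are hypothetical.

```shell
# Scratch repository with one rewritten and one untouched branch (names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init
git branch rewritten
git branch untouched
# Simulate the backup that filter-branch leaves behind for a rewritten ref.
git update-ref refs/original/refs/heads/rewritten HEAD

# Same selection as above, but only printed instead of piped to 'git update-ref -d'.
rewritten_refs=$(git for-each-ref --format='%(refname)' refs/original | sed 's|^refs/original/||')
stale=$(git for-each-ref --format='%(refname)' | grep -vF "$rewritten_refs")
echo "$stale"
```

The match is a fixed-string substring match, so a ref whose name merely contains the name of a rewritten ref is also kept; reviewing the printed list before deleting anything guards against surprises.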