Integrating a submodule into the parent repository


Git submodules can be very useful, and there are many guides on how to extract a directory of your repository into a submodule while preserving history. This post is about the opposite operation – re-integrating a submodule into the parent repository without losing any history. If you don’t care about your history, simply use one of the solutions proposed in this Stack Overflow question.

In said question, there actually is an answer that preserves history, but it has a major flaw: paths from the submodule aren’t correctly updated, thus making the history less useful. Let’s say you have the following repository structure:

.
├── Makefile
├── lib/ (submodule)
│   └── bar.c
└── src/
    └── foo.c

If you try to merge this submodule or use git subtree add without modifications to the submodule, you’ll end up with commits like this:

commit 9f28e08ec8e1ae15823ec7ecb24bdf442a2d3581
Author: Lucas
Date:   Thu Aug 15 10:35:20 2013 +0200

    Do awesome stuff in submodule

diff --git a/bar.c b/bar.c
index b7c5cb8..997e3bb 100644
[...]

As you can see, the paths haven’t been updated, and the commit still points to the path where bar.c resided in the submodule. This is problematic, because git log lib/bar.c will not show the commits made in the submodule. To fix this, we need to rewrite every commit made in the submodule so that its paths include the new directory. The good news: Git provides a tool for that, git filter-branch. The bad news: using filter-branch isn’t exactly straightforward.
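To make the rewrite a little less abstract, here is a minimal sketch of what happens to a single index entry. git ls-files -s prints one line per file in the form "mode, blob ID, stage, TAB, path", and the rewrite simply inserts the new directory right after the tab. The blob ID below is a made-up placeholder and lib is just the directory from the example above; nothing here touches a real repository.

```shell
# A literal TAB, because some sed implementations do not understand \t.
TAB="$(printf '\t')"

# Placeholder blob ID (assumption, for illustration only).
hash="0123456789abcdef0123456789abcdef01234567"

# One line in the format printed by `git ls-files -s`.
sample="100644 ${hash} 0${TAB}bar.c"

# Insert "lib/" directly after the tab, i.e. in front of the path.
rewritten="$(printf '%s\n' "$sample" | sed "s/${TAB}/${TAB}lib\//")"
printf '%s\n' "$rewritten"
```

The rewritten line ends in lib/bar.c instead of bar.c. filter-branch applies the same substitution to the index of every commit.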

But fear not! I’ve written a little bash script that’ll do all the heavy lifting for you, if you don’t want to get into the nitty-gritty details (download at the end of the post).

Warning: I suggest that you make a clean clone of both the parent repository and the submodule before you proceed.

I will assume that you’ve cloned both your repositories into the same directory, one as parent/, and one as sub/.

.
├── parent/
└── sub/

First, change to the directory of your submodule (sub/ in my example), and execute the git-rewrite-to-subfolder script attached below. It’ll lead you through the necessary steps.

After the rewrite is completed (which may take some time if you have many commits), change into your parent/ repository and delete your submodule:

$ rm -r path/to/submodule
$ vim .gitmodules
    # Here you need to remove the submodule in
    # question from your .gitmodules file.
$ git add -A . && git commit
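If you would rather not edit .gitmodules by hand, the same removal can probably be done with Git's own commands. The following is only a sketch: remove_submodule is a hypothetical helper name, and it assumes that you run it from the root of parent/, that .git is a regular directory there, and that the submodule's name is identical to its path (the default when it was added without --name).

```shell
# Hypothetical helper: remove_submodule <path relative to the repository root>
remove_submodule() {
    sub_path="$1"
    # Unregister the submodule from .git/config and clear its working directory.
    git submodule deinit -f -- "$sub_path" &&
    # Drop the gitlink from the index and the matching section from .gitmodules.
    git rm -f -- "$sub_path" &&
    # Discard the cached clone (assumes submodule name == path).
    rm -rf ".git/modules/$sub_path" &&
    git commit -m "Remove submodule $sub_path"
}

# Usage, from the root of parent/:
#   remove_submodule path/to/submodule
```

Either way, the goal is the same: the parent repository should no longer know about the submodule before its history is merged back in.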

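Now that we’ve deleted the submodule, we can integrate it into the repository again (still from the parent/ directory). Git 2.9 and later refuse to merge unrelated histories by default, so on those versions add --allow-unrelated-histories to the merge command: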

$ git remote add sub ../sub/
$ git fetch sub
$ git merge -s ours --no-commit sub/master
Automatic merge went well; stopped before committing as requested

We’re in the process of merging the history, but still need to add all the content from the submodule:

Warning: Do not clone from the submodule repository where you’ve just rewritten the paths. This will result in wrong paths in your parent repository. Clone from your original submodule repository.

$ git clone git@domain.com:sub/module.git path/to/submodule
$ rm -r path/to/submodule/.git
$ git add path/to/submodule && git commit

And you’re done. If you take a look at any commit that happened in the submodule, you’ll see that it now points to the correct path, and that git log lib/ will correctly display all commits made in the submodule:

commit e4d5e423d81540b801079b5f644edf730ca477e9
Author: Lucas
Date:   Thu Aug 15 10:35:20 2013 +0200

    Do awesome stuff in submodule

diff --git a/lib/bar.c b/lib/bar.c
index b7c5cb8..997e3bb 100644
[...]
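If you want to see the whole procedure once before touching real repositories, the following sketch replays it in a temporary directory. Everything in it is made up for the demonstration: the repositories, the Demo identity, the file contents and the lib prefix. It runs the filter directly instead of going through the interactive script, and it assumes Git 2.9 or later because of --allow-unrelated-histories.

```shell
#!/bin/bash
# Throwaway end-to-end run; all names and contents are illustrative.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git otherwise pauses before filter-branch
TAB="$(printf '\t')"
work="$(mktemp -d)"
cd "$work"

# Fixed identity so the demo does not depend on your global Git config.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

# 1. Stand-in for the original submodule repository.
git init -q sub
( cd sub
  echo 'int bar;' > bar.c
  g add bar.c
  g commit -q -m 'Do awesome stuff in submodule' )
branch="$(git -C sub symbolic-ref --short HEAD)"

# 2. Stand-in for the parent repository, with the submodule already removed.
git init -q parent
( cd parent
  mkdir src
  echo 'int foo;' > src/foo.c
  g add src
  g commit -q -m 'Add foo.c' )

# 3. Rewrite a separate clone so that every path starts with lib/.
git clone -q sub sub-rewritten
( cd sub-rewritten
  git filter-branch --index-filter \
    "git ls-files -s | sed \"s/${TAB}/${TAB}lib\//\" |
     GIT_INDEX_FILE=\"\$GIT_INDEX_FILE.new\" git update-index --index-info &&
     mv \"\$GIT_INDEX_FILE.new\" \"\$GIT_INDEX_FILE\"" HEAD >/dev/null )

# 4. Merge the rewritten history, then add the files from the ORIGINAL repository.
cd parent
git remote add sub ../sub-rewritten
git fetch -q sub
g merge -q -s ours --no-commit --allow-unrelated-histories "sub/$branch"
git clone -q ../sub lib
rm -rf lib/.git
g add lib
g commit -q -m 'Integrate submodule into lib/'

# 5. The submodule commit should now show up for the new path.
history="$(git log --format=%s -- lib/bar.c)"
printf '%s\n' "$history"
```

The last command should list the commit that was originally made inside the submodule, which is exactly what the unmodified merge fails to do.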

The code

Also available as a gist.

#!/bin/bash
# We need the TAB character for SED (Mac OS X sed does not understand \t)
TAB="$(printf '\t')"

function abort {
    echo "$(tput setaf 1)$1$(tput sgr0)"
    exit 1
}

function request_input {
    read -p "$(tput setaf 4)$1 $(tput sgr0)"
}

function request_confirmation {
    read -p "$(tput setaf 4)$1 (y/n) $(tput sgr0)"
    [ "$REPLY" == "y" ] || abort "Aborted!"
}


cat << "EOF"
This script rewrites your entire history, moving the current repository root
into a subdirectory. This can be useful if you want to merge a submodule into
its parent repository.

For example, your main repository might contain a submodule at the path src/lib/,
containing a file called "test.c".
If you merged the submodule into the parent repository without further
modification, all the commits to "test.c" would have the path "test.c", whereas
the file now actually lives in "src/lib/test.c".

If you rewrite your history using this script, adding "src/lib/" to the path
and then merging into the parent repository, all paths will be correct.

NOTE: This script might completely garble your repository, so PLEASE apply this
only to a clone of the repository where it does not matter if the repo is destroyed.

EOF

request_confirmation "Do you want to proceed?"

cat << "EOF"
Please provide the path which should be prepended to the current root. In the
above example, that would be "src/lib". Please note that the path MUST NOT contain
a trailing slash.

EOF

request_input "Please provide the desired path (e.g. 'src/lib'):"
# Escape input for SED, taken from http://stackoverflow.com/a/2705678/124257
TARGET_PATH=$(echo -n "$REPLY" | sed -e 's/[\/&]/\\&/g')


# Last confirmation
git ls-files -s | sed "s/${TAB}/${TAB}$TARGET_PATH\//"
request_confirmation "Please take a look at the printed file list. Does it look correct?"


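# The actual processing happens here. The index paths are quoted so that the
# filter also works when the repository lives in a directory containing spaces.
CMD="git ls-files -s | sed \"s/${TAB}/${TAB}$TARGET_PATH\//\" | GIT_INDEX_FILE=\"\${GIT_INDEX_FILE}.new\" git update-index --index-info && mv \"\${GIT_INDEX_FILE}.new\" \"\${GIT_INDEX_FILE}\""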

git filter-branch \
    --index-filter "$CMD" \
    HEAD
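One limitation worth knowing about: git ls-files -s wraps paths that contain unusual characters (non-ASCII bytes, for instance) in double quotes, and the substitution in the script would then put the prefix in front of the opening quote. The example in the git filter-branch manual avoids this by matching an optional quote after the tab. Here is a small sketch of that variant on two sample lines; prefix_paths is a hypothetical helper name, the blob ID is a placeholder, and the prefix must not contain "-" or "&" because of how the sed expression is built.

```shell
TAB="$(printf '\t')"
hash="0123456789abcdef0123456789abcdef01234567"   # placeholder blob ID

# Two lines as `git ls-files -s` might print them: a plain path, and a
# quoted path with octal-escaped bytes.
plain="100644 ${hash} 0${TAB}bar.c"
quoted="100644 ${hash} 0${TAB}\"b\\303\\244r.c\""

# Hypothetical helper: insert "<prefix>/" after the tab and an optional
# opening quote, so quoted paths stay well-formed.
prefix_paths() { sed "s-${TAB}\"*-&$1/-"; }

plain_out="$(printf '%s\n' "$plain" | prefix_paths lib)"
quoted_out="$(printf '%s\n' "$quoted" | prefix_paths lib)"
printf '%s\n' "$plain_out" "$quoted_out"
```

If your submodule contains such file names, it may be worth checking the printed file list extra carefully before confirming the rewrite.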