Thursday, June 11, 2009

I've been seeing many Subversion repositories being hastily imported to Git. This is unfortunate because not having a cleanly and correctly imported history can reduce the effectiveness of Git's powerful tools, such as git bisect or git blame. Having an accurate revision control history is very helpful for tracking down regressions. Here's my take on how to do this properly.

Subversion issues

There are a few typical problems in Subversion repositories that I've seen:

History tends to be crufty (svn ci -m "oops"). Some people consider cleaning such history a bad habit (since it's not what "actually" happened), but IMHO the reason to preserve history is so you can figure out the purpose or nature of a change.

Merge metadata is missing. Even with merges created using SVK or Subversion 1.5, git-svn doesn't import this information.

Tags aren't immutable. People sometimes adjust them to reflect what the release really ended up being, but at that point the tag has effectively become a branch. Again, there's no metadata.

When you make a checkout with git svn the results could often be significantly improved:

When I converted the Moose Subversion repository I wrote a small collection of scripts.

Preparing a git-svn checkout

If you have made any merges using SVK or Subversion 1.5 (Update: see comments) then you should probably use git-svn from Sam Vilain's svn-merge-attrs branch of Git to save a lot of time when restoring merge information. This version of git-svn will automatically add merge metadata into the imported repository for those commits.

Assuming you have a standard trunk, branches and tags layout, clone the repository like this:
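A minimal sketch of such a clone, wrapped in a hypothetical helper so the URL and target directory stay explicit. The --stdlayout option tells git-svn to expect trunk, branches and tags. The svn/ prefix is an assumption that keeps the imported refs under one namespace; check what your cleanup scripts expect before relying on it.

```shell
# Hypothetical helper: clone a Subversion repository that uses the
# standard trunk/branches/tags layout.
clone_svn_stdlayout() {
    svn_url=$1      # e.g. file:///path/to/repo or an http(s):// URL
    target_dir=$2   # directory for the new Git repository
    # --prefix=svn/ is an assumed namespace for the imported remote refs.
    git svn clone --stdlayout --prefix=svn/ "$svn_url" "$target_dir"
}

# Example (both arguments are placeholders):
#   clone_svn_stdlayout file:///path/to/repo project-git
```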

For large repositories I like to use svnadmin dump and svnadmin load to create a local copy. You can also just run the conversion on your Subversion server. For local repositories use a file:/// URI.
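One way this could look, assuming you can run svnadmin dump on the machine that hosts the repository and copy the dump file over. The helper names and paths are hypothetical.

```shell
# On the server (paths are placeholders):
#   svnadmin dump --quiet /srv/svn/project > project.dump

# Hypothetical helper: create an empty local repository and load a dump into it.
load_svn_dump() {
    dump_file=$1    # dump produced by svnadmin dump
    local_repo=$2   # path of the local repository to create
    svnadmin create "$local_repo" &&
        svnadmin load --quiet "$local_repo" < "$dump_file"
}

# Hypothetical helper: turn an absolute path into a file:/// URI for git svn.
local_repo_uri() {
    printf 'file://%s\n' "$1"
}

# Example:
#   load_svn_dump project.dump /tmp/project-svn
#   local_repo_uri /tmp/project-svn
```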

Cleaning up tags and branches

git svn-abandon-fix-refs

will run through all the imported refs, recreating properly dated annotated tags (but only if they haven't been modified since they were created), and making branches out of everything else. It'll also rename trunk to master.

The resulting layout is more like what a Git repository should look like, so git tag -l and git branch -l work as expected.

Restoring merge information

If some of the merges were made by hand or if you didn't use Sam's git-svn then you'll need to recreate merge metadata by hand. Fortunately this is easily done using the .git/info/grafts file.

The grafts file is a simple table of overridden lists of parents for specific commits. The first column is the commit whose parents you want to override, and the rest of the line is the list of new parents to use. For a regular commit there is only one parent, the previous commit. Merges are commits with more than one parent.

Suppose we have a Subversion repository where revision 1 creates a project, revision 2 creates a branch, revision 3 modifies the branch, and revision 4 merges it back into trunk. If imported to Git without the metadata, revision 4 will have a single parent, revision 1, but its parents should be 1 and 3.

If the IDs of the imported commits are:

Revision  Git Commit
1         e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e
2         7448d8798a4380162d4b56f9b452e2f6f9e24e7a
3         a3db5c13ff90a36963278c6a39e4ee3c22e2a436
4         9c6b057a2b9d96a4067a749ee3b3b0158d390cf1

The line in the .git/info/grafts file that fixes revision 4 would look like this:
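A small sketch that builds that line from the commit IDs listed for revisions 1, 3 and 4: the merge commit comes first, followed by its two parents, separated by spaces. The helper name is hypothetical.

```shell
# Hypothetical helper: print a graft line for a commit with two parents.
graft_line() {
    printf '%s %s %s\n' "$1" "$2" "$3"
}

rev1=e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e   # first parent (trunk)
rev3=a3db5c13ff90a36963278c6a39e4ee3c22e2a436   # second parent (branch tip)
rev4=9c6b057a2b9d96a4067a749ee3b3b0158d390cf1   # the merge commit

graft_line "$rev4" "$rev1" "$rev3"

# To apply it, append the line from the top of the repository:
#   graft_line "$rev4" "$rev1" "$rev3" >> .git/info/grafts
```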

If you view the history using GitX or gitk then you should now see revision 4 has become a proper merge.

Rewriting history

Most people can happily skip this step.

If you'd like to change the history you can now run git rebase --interactive and use the edit command and git commit --amend to clean up any commits, or squash to combine commits. This is probably a topic for another post, but it's worth mentioning.

However, make sure you keep other tags and branches synchronized when you rebase. This can be done using the grafts file.

Final cleanups

The last bit of conversion involves running

git svn-abandon-cleanup

to clean up SVK style merge commit messages (where the first line is useless with most Git log viewers), and remove git-svn-id strings.

The actual message filtering is done by the git-svn-abandon-msg-filter script. You can customize this to your liking.
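To give a feel for what such a filter does, here is a rough stand-in that only drops git-svn-id lines. It is not the git-svn-abandon-msg-filter script itself, and it does not handle SVK style messages. A message filter reads the original commit message on standard input and writes the new one to standard output.

```shell
# Hypothetical stand-in for a message filter: delete git-svn-id lines.
strip_git_svn_id() {
    sed -e '/^git-svn-id: /d'
}

# The same sed expression can be passed to filter-branch directly:
#   git filter-branch --msg-filter "sed -e '/^git-svn-id: /d'" -- --all
```

The blank line that preceded the git-svn-id line is left in place; Git normally strips trailing blank lines when it records the rewritten message.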

Another important side effect of the git filter-branch --all step in git svn-abandon-cleanup is that the grafts entries are incorporated into the filtered commits, so the extra merge metadata becomes clonable.

Finally, all merged branches are removed (using the safe -d option of git branch).
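If you want to do the same thing by hand, something along these lines should be roughly equivalent. This is a sketch and not the script's actual implementation, and it assumes the main branch is called master.

```shell
# Hypothetical helper: delete every local branch already merged into master.
delete_merged_branches() {
    git branch --merged master |
        sed -e 's/^[* ] //' |      # drop the two-character marker column
        grep -v -x master |        # never delete master itself
        while read -r branch; do
            # -d refuses to delete a branch that is not fully merged.
            git branch -d "$branch"
        done
}
```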

Publishing

The resulting Git repository should be ready to publish as if you created it locally.

Nontrivial grafting

You can still cleanly import a repository that does not follow the standard directory layout or has other complications (e.g. the repository was moved without importing). Use git-svn to import each directory of history separately and then use grafts to stitch the parts back together.
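Stitching two separately imported parts comes down to a graft with a single parent: the first commit of the later part is given the last commit of the earlier part as its parent. A sketch, with a hypothetical helper and placeholder ref names, assuming the later part has exactly one root commit:

```shell
# Hypothetical helper: print a graft line that attaches the root commit of
# a later import to the tip commit of an earlier import.
stitch_graft() {
    later_root=$1    # first commit of the later part of the history
    earlier_tip=$2   # last commit of the earlier part of the history
    printf '%s %s\n' "$later_root" "$earlier_tip"
}

# Example (old-part and new-part are placeholder ref names):
#   stitch_graft "$(git rev-list --max-parents=0 new-part)" \
#                "$(git rev-parse old-part)" >> .git/info/grafts
```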

This snippet may need reworking, tho: You can still cleanly import a repository does not follow the standard directory layout or has other complications (e.g. the repository was moved without importing) you can still.

Regarding grafting, what should be in the file if revision 1 is the original file, revision 2 is the first change on the branch, then there are several changes on both the branch and the trunk, and revision 10 is the merged revision?