Converting a Subversion repository to Git

(7 steps to migrate a complete mirror of svn in git)

When I first realized that I needed a version control system, the best system at the time was CVS. (No, really.) Subversion was nearing 1.0, so I waited for its release and then used it everywhere. Well, that was 2003. Time for a change.

This past year, it became obvious that there were many Git users within the Drupal community, so Drupal has decided to move to Git. Since then I've started learning and researching the best ways to convert all my development to a Git-based workflow. So far… it rocks.

When getting my toes wet in Git, I started using an extremely useful git command called git-svn, which is primarily used to check out a Subversion repository to a local Git repo and then push your changes back to the original Subversion repository. That worked great as a stop-gap measure, but now I’m ready to chuck all my svn repos and convert them to Git.

A complete guide to git-svn conversions

Our goal is to do a complete conversion of our Subversion repository and end up with a bare Git repository acceptable for sharing with others (privately or publicly). Bare repositories are ones without a local working checkout of the files available for modifications. They are the recommended format for shared repositories.

1. Retrieve a list of all Subversion committers

Subversion simply lists the username for each commit. Git’s commits have much richer data, but at its simplest, the commit author needs to have a name and email listed. By default the git-svn tool will just list the SVN username in both the author and email fields. But with a little bit of work, you can create a list of all SVN users and their corresponding Git names and email addresses. This list can be used by git-svn to transform plain svn usernames into proper Git committers.
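The command for this step is not reproduced here. A minimal sketch of one way to build the list, assuming the usual svn log -q output in which each revision line looks like "r42 | username | date"; the function name extract_svn_authors is made up for illustration:

```shell
# Sketch: turn `svn log -q` output into "user = user <user>" lines.
# Reads the log text on stdin, so it can be tried without a live repository.
extract_svn_authors() {
  awk -F '|' '/^r[0-9]+ / {
    user = $2
    sub(/^ +/, "", user)   # trim the padding around the username field
    sub(/ +$/, "", user)
    print user " = " user " <" user ">"
  }' | sort -u             # sort and drop duplicate usernames
}

# Typical use, from inside a Subversion working copy:
#   svn log -q | extract_svn_authors > authors-transform.txt
```

Splitting on the "|" separator rather than on whitespace keeps usernames that contain spaces intact.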

That will grab all the log messages, pluck out the usernames, eliminate any duplicate usernames, sort the usernames and place them into an “authors-transform.txt” file. Now edit each line in the file. For example, convert:
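As a purely hypothetical illustration (the username jdoe, the name Jane Doe, and the email address are placeholders, and the scratch file authors-example.txt is used so that a real authors-transform.txt is not overwritten), the edit might look like this:

```shell
# A generated line for a hypothetical SVN user "jdoe":
printf '%s\n' 'jdoe = jdoe <jdoe>' > authors-example.txt

# After editing, the right-hand side carries the Git name and email.
# Here sed stands in for the change you would normally make by hand.
sed 's/^jdoe = .*$/jdoe = Jane Doe <jane.doe@example.com>/' \
  authors-example.txt > authors-example.tmp &&
  mv authors-example.tmp authors-example.txt

cat authors-example.txt
```

The left-hand side must stay identical to the SVN username, since that is the key git-svn looks up.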

This step will take a bit of typing. :-) But, don’t worry; your Unix shell will provide a “>” secondary prompt for the extra-long command that starts with git for-each-ref.
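The command itself is not shown here. A sketch of what such a loop might look like, assuming the bare repository holds each Subversion tag as a branch under refs/heads/tags/ (the layout the comments below refer to); the function name convert_tag_branches is hypothetical:

```shell
# Sketch: convert every branch under refs/heads/tags/ into a real Git tag.
# Run inside the bare repository. Assumes tag names contain no slashes.
convert_tag_branches() {
  git for-each-ref --format='%(refname)' refs/heads/tags |
  cut -d / -f 4 |
  while read -r ref
  do
    git tag "$ref" "refs/heads/tags/$ref" &&   # tag the branch tip
    git branch -D "tags/$ref"                  # then drop the branch
  done
}
```

Typed interactively, each line that ends in a pipe or sits inside the while loop is what triggers the secondary prompt.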

7. Drink

If you’ve got just the one Subversion repo to convert…Congratulations! You’re done. Go party. Just take your “new-bare.git” folder and share it.

If, on the other hand, you’ve got a bunch of Subversion repositories to convert, you’ve got a long, long night in front of you if you want to convert them all by hand. You’re going to need a drink (or several).

Since I had 141 svn repositories that needed to be converted, I wrote a set of wrapper scripts to ease the work… which I’ll discuss in my next blog post.

Comments

Pro Tip for Windows users: Having been through this recently myself, don't bother with git-svn on Windows; instead, get yourself a Linux VM and VMware Player and do your conversion on that. The scraping from Subversion ran about 10 times faster for me than running it "natively" on Windows and I had none of the quirks that I was finding with git-svn on Windows.

Switch and Drop Legacy? Or, you could do as I do and drop your past SVN history.

Not the best solution, of course, but you can keep SVN running somewhere if you need to go back in time. However, I picked a good point where development was at a slowdown, scrapped SVN, and set everything up in a fresh Git repository :)

Oops. I just noticed the awk command in step 1 is slightly off. If you have a space character in your SVN username (for example, "(no author)"), it will only include the part of the username before the space. This is the proper awk command:
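The corrected command is not preserved here. The following sketch only illustrates the underlying issue, under the assumption that the problem comes from splitting the svn log -q line on whitespace instead of on its "|" separator; the sample log line is invented:

```shell
line='r7 | (no author) | 2010-01-01 12:00:00 -0500 (Fri, 01 Jan 2010)'

# Splitting on whitespace keeps only the first word of the username.
printf '%s\n' "$line" | awk '{ print $3 }'
# prints: (no

# Splitting on "|" and trimming the padding keeps the whole username.
printf '%s\n' "$line" | awk -F '|' '{ gsub(/^ +| +$/, "", $2); print $2 }'
# prints: (no author)
```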

Sometimes it helps to read the documentation. It is not recommended to use --no-metadata, even for one-way imports:

This gets rid of the git-svn-id: lines at the end of every commit.

This option can only be used for one-shot imports as git svn will not be able to fetch again without metadata. Additionally, if you lose your .git/svn/**/.rev_map.* files, git svn will not be able to rebuild them.

The git svn log command will not work on repositories using this, either. Using this conflicts with the useSvmProps option for (hopefully) obvious reasons.

This option is NOT recommended as it makes it difficult to track down old references to SVN revision numbers in existing documentation, bug reports and archives. If you plan to eventually migrate from SVN to git and are certain about dropping SVN history, consider git-filter-branch(1) instead. filter-branch also allows reformatting of metadata for ease-of-reading and rewriting authorship info for non-"svn.authorsFile" users.

The docs about --no-metadata you quoted directly say “This option can only be used for one-shot imports”. [emphasis mine] One-shot imports are precisely the point of this blog post. I fully expect you to toss the svn repo in the bin after doing this conversion to git.

This option is NOT recommended as it makes it difficult to track down old references to SVN revision numbers in existing documentation, bug reports and archives.

That is actually a good point. But svn commit numbers are not something that I personally needed to preserve. For those of you who do need to retain svn commit numbers, I recommend following Balu’s advice.

Wouldn't it be better to actually tag the second to last commit in the tag: "refs/heads/tags/$ref"^ ? Otherwise, we're tagging the commit that says "Tagging for X version". The difference being that in Git the tags themselves are not commits and we'd then be able to see the tags in tools like GitX when looking at the branch history from where the tag was made.
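A sketch of that variant, assuming the same refs/heads/tags/ layout and that each tag branch has exactly one extra "tagging" commit on top of the commit you actually want; the function name is hypothetical:

```shell
# Sketch: tag the parent of each tag branch's tip instead of the tip itself.
# The ^ suffix selects the first parent of the named commit.
convert_tag_branches_to_parent() {
  git for-each-ref --format='%(refname)' refs/heads/tags |
  cut -d / -f 4 |
  while read -r ref
  do
    git tag "$ref" "refs/heads/tags/$ref^" &&
    git branch -D "tags/$ref"
  done
}
```

If someone committed real changes on a tag branch after creating it, this variant would leave those changes out of the resulting tag, so it is worth checking the tag branches first.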

I converted 3 svn repos to git last night, to join the rest of the repos. I noticed that in step 3, I created the .gitignore file, however when I push to the -bare.git in step 4, this commit isn't pushed as well. You have any ideas what might be up?

(What I ended up doing, in lieu of drinking, was pulling from the bare repo, committing the .gitignore, pushing back to -bare.git, and then running push --mirror to the central server.)

svn2git probably works great for most repos, but it didn't for me. When it finished I noticed I was missing several recent commits (and the changes from them!). I didn't investigate to see how deeply the problem went, I just used John's git-svn-migrate, which worked like a charm.

I see in your git-svn-migrate.sh script you have added another line that pushes the .gitignore commit. I had the same problem as Terin until I found that I had to do this after the git push bare command
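The exact line is not shown here. A plausible form, assuming the remote is named bare, the .gitignore commit sits on the local master branch, and the main branch in the bare repository is called trunk (these branch names are assumptions about the setup); the function name is made up:

```shell
# Sketch: the refspec-based push only sends refs/remotes/*, so a commit
# made locally on master (such as the .gitignore one) needs its own push.
push_to_bare() {
  git push bare &&             # pushes refs/remotes/* via remote.bare.push
  git push bare master:trunk   # then the commit that exists only on master
}
```

The second push fast-forwards trunk in the bare repository, because the local master was created from the imported trunk and has only the extra commit on top.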

Could you perhaps explain why you need the bare repo step? I found another article which did the same, and they used yours as a reference … what difference does it make if I just add a remote to the "temporary" repo after conversion and cleanup? Why would I need an intermediary one?

Basically, the issue is that git-svn creates a lot of overhead in order to maintain the "svn-ness" of the repository. By pushing just the “refs/remotes/*:refs/heads/*” references to a bare repository, you end up purging all of the svn remnants and having a cleaner repository.
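A sketch of that push, run from the git-svn clone. The remote name bare matches the command quoted in the comments; the function name and its path argument are illustrative:

```shell
# Sketch: publish only the refs that git-svn stored under refs/remotes/
# into a fresh bare repository, leaving the git-svn bookkeeping behind.
push_refs_to_bare() {
  bare_path=$1
  git init --bare "$bare_path" &&
  git remote add bare "$bare_path" &&
  git config remote.bare.push 'refs/remotes/*:refs/heads/*' &&
  git push bare
}

# Example: push_refs_to_bare ~/new-bare.git
```

Only what the refspec names travels to the bare repository, so local branches and the .git/svn directory of the clone are not carried over.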

I followed your instructions and got the following error with "git push bare":
" No refs in common and none specified; doing nothing.
Perhaps you should specify a branch such as 'master'.
fatal: The remote end hung up unexpectedly
error: failed to push some refs to '/Users/bodirsky/new-bare.git' "

Thanks John for this great tutorial. It was a great help, and I only had to make some minor changes to this workflow, e.g. a new latest-svn tag.

But also I have problems with the .gitignore file which does not go into the repo (or into the correct branch).
Here you see what I did and that everything completed without error, but in the final ls there is just no .gitignore: http://pastebin.com/VhthD4VN

Hi John,
Great article, it was a massive help when converting my old svn repos into git. It ran without any modifications on Cygwin. I just thought I'd mention that I got the following error when converting one of my svn repos to git:
'fatal: refs/remotes/trunk: not a valid SHA1'.
It happened in step 2. The problem turned out to be that my SVN repo wasn't standard: the root directory wasn't trunk but had a custom name. This confused git, which didn't know where the master branch needed to be. I fixed the problem by passing the parameter --trunk=<directory> in step 2, where <directory> is the directory that is your primary branch.
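A sketch of such a clone. The function name and all three arguments are placeholders supplied by the caller, and any other options used in step 2 would be kept alongside --trunk:

```shell
# Sketch: clone a Subversion repository whose main line lives in a custom
# directory instead of trunk/.
svn_clone_custom_trunk() {
  svn_url=$1     # URL of the Subversion repository
  trunk_dir=$2   # directory that plays the role of trunk
  dest=$3        # where to create the temporary Git clone
  git svn clone "$svn_url" --trunk="$trunk_dir" "$dest"
}

# Example (hypothetical URL and directory name):
#   svn_clone_custom_trunk http://svn.example.com/project mainline ~/temp
```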