Tuesday, September 04, 2007

Merging two subversion repositories

Update: an anonymous commenter pointed out that yes, there is a (much!) better way to do this with svnadmin load --parent-dir, which is covered in the docs under "repository migration." All I can say in my defense is that it wasn't something google thought pertinent. So, for google's benefit: how to merge subversion repositories. Thanks for the pointer, anonymous!

I needed to merge team A's svn repository into team B's. I wanted to preserve the history of team A's commits as much as reasonably possible. I would have thought that someone had written a tool to do this, but I couldn't find one, so I wrote this. (Of course, now that I'm posting this, I fully expect someone to point me to a better, pre-existing implementation that I missed.)

The approach is to take a working copy of repository B, add a directory for A's code, and for each revision in A's repository, apply that to the working copy and commit it. This would be easy if svn merge would allow applying diffs from repository A into a working copy of repository B, but it does not. I can't think of a technical reason for this. (In fact, I seem to remember that early versions of the svn client did allow this, with dire warnings, but I could be mistaken and I don't have a 1.1 client around anymore.)

So I tried instead to use "svn diff |patch -p0", which worked great up until the first commit with a binary file. Oops. For the final version I ended up having to create a working copy for A, update to each revision there, then rsync to the right point in working copy B and call the "svnaddremove" script to mark files added or deleted. (This is suboptimal since we can get the exact changed paths from svn, and just copy those files over, but rsync is fast enough as long as your working copies stay in cache. The update and commit steps both consistently took longer than rsync in my timing.)

My script does not try to be intelligent about copies or moves that svn knows about. Team A did not use branches or tags much so I didn't put the effort in to deal with those the "right" way (which would be to also issue a cp/mv on B's working copy to preserve history). It also uses unix users to commit revisions with the same name as the original commit. Doing this obviously requires at least access to the repository server to add the right users. I used "svn log -q |grep ^r |awk '{print $3}' |sort |uniq |useradd."

Final note: the perl script in svnaddremove is a long way of writing "awk {print $2}", except that it preserves filenames with spaces in them. There is probably a much more clever way of doing this, too.

For instance 2 svn repository: 1:Commit to proj1 on nov 29 at 8:00am for revision 1 on repo1Commit to proj1 on nov 30 at 9:00am for revision 2 on repo1

2:Commit to proj2 on nov 30 at 8:00am for revision 1 on repo2Commit to proj2 on dec 1 at 11:00am for revision 2 on repo2

would look once merged :

Commit to proj1 on nov 29 at 8:00am for revision 1 Commit to proj2 on nov 30 at 8:00am for revision 2 Commit to proj1 on nov 30 at 9:00am for revision 3 Commit to proj2 on dec 1 at 11:00am for revision 4

Which would mean, "revision 2 on repo1/project1" became revision 3 because of the time and so became revision 2 from "revision 1 on repo2/project2", is it what you would expect?