Tag Archives: subversion

I’ve recently had the regular task of releasing projects from multiple branches, and then merging the branches together. Handling this and any conflicts hasn’t been a big hassle when the merges are done regularly, with one notable exception – Maven POM files. When they are merged, the version changes on both branches always conflict.

There are ways I’ve got around it to date:

Subversion’s --accept mine-conflict and similar options can help resolve them quickly, if you know ahead of time that they are the only conflicts and there are no additional changes

If you end up with versions from the wrong branch, follow up with mvn versions:set -DnewVersion=... versions:commit

Sometimes the right merge tool or IDE will select the right version, or at least make it less repetitive to choose it for each file
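Taken together, the first two workarounds look roughly like this (the branch name and version are assumptions for illustration):

```shell
# Merge, resolving any conflicts in favour of the current branch's files.
svn merge --accept mine-conflict ^/branches/release-1.1.x

# If some POMs now carry the wrong version, reset it across the reactor.
mvn versions:set -DnewVersion=1.2-SNAPSHOT versions:commit
```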

However, I was finding that in projects with a large number of POMs and occasional other conflicts, these were all a little too tedious. I’ve whipped up a basic script that can be used to eliminate the POM-based conflicts first, so that the remaining conflicts are only those you really need to deal with: Automatically resolve conflicts in Maven POMs after a merge — Gist. It doesn’t handle a lot of edge cases, but should work well enough for most uses.
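The Gist itself isn’t reproduced here, but a minimal sketch of the same idea for Subversion might look like the following (the status parsing is a simplified assumption, and non-POM conflicts are deliberately left for manual attention):

```shell
# For each conflicted POM, keep the working copy's content and mark the
# conflict resolved; other conflicts remain for you to deal with.
svn status | grep '^C.*pom\.xml$' | awk '{print $NF}' | while read -r pom; do
  svn resolve --accept mine-full "$pom"
done
```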

This works pretty well with Git, and can be used with Subversion (if you postpone all the resolve steps with --accept postpone, or use a non-CLI tool to merge).

This is not something I’d want in a standard release/branch workflow – the problem could be avoided by different branching practices so that release commits are not merged. However if someone finds it useful, I’m sure they can improve on the Gist above, or perhaps patch the Maven SCM or Versions plugins to provide the capability more directly.


As part of the process of migrating NPanday to the Apache Incubator, I had to find a way to extract the Subversion repository from Codeplex. The challenge here is that it isn’t actually a Subversion repository, but a TFS server running SVNBridge to appear like one. Its occasional quirks and timeouts had been part of the reason we decided to move. Here is what I’ve learned.

Available Tools

The aim was to get it as a dump of a Subversion repository so that it could be loaded into the ASF Subversion repository, retaining the full history. I tried everything to get it out of there – svnsync, cloning it as a Git repository, cloning it as a Mercurial repository, and other similar tools. All would time out or freak out at some point due to the nature of SVNBridge. I made some progress with tfs2svn (which seems to be what Codeplex is using to migrate repositories to Mercurial when needed), but I started to find that not only did it need regular manual intervention, the result wasn’t quite the same (e.g. Subversion properties came in as ..svnbridge hidden folders, and every commit message had the original TFS revision number appended).

I had tried rsvndump earlier without a lot of success, but eventually gave it another try. While it needed some help, it was making the most progress and was the one that ended up being successful.

Patching rsvndump

It wasn’t anywhere near smooth sailing, and rsvndump needed some modifications to handle the task. So I brushed off my dusty C skills and made the required changes, which can be found in my GitHub fork of the project (pull request pending).

The main problem was that rsvndump expected to be able to do the repository all in one hit. Given that Codeplex would time out on several requests, this made it impossible. It did allow selecting a subset of revisions, but even then it would do a full svn log (which wouldn’t succeed), and beyond that would traverse revisions to construct a path hash (I believe for detecting moves, etc.).

To assist with these, I added a --log-window-size option, similar to git-svn. This retrieves the logs in multiple requests, avoiding problems with timeouts. Next, I added a --first-rev argument, which would start retrieving logs and content from a later revision than 0. While it introduced some risk of crashing due to missing revisions, in many circumstances it allowed restarting a dump from a later revision in a much faster manner.

The next problem was that Codeplex shares a single TFS repository between several projects, so your own revision numbers are not sequential. NPanday started at revision 21102, and ended at 60509 with a lot of gaps, having only 1427 revisions of its own. This wasn’t too much of a problem – because rsvndump was designed to deal with subdirectories of a Subversion repository, it expected the gaps. The --first-rev argument helped deal with the big gap to the start. But another SVNBridge quirk was that svn copy operations copied from the (current revision - 1) – even when it didn’t exist! To correct this, I had to adjust the code to search backwards through the revision numbers until it found one that existed to make the copy operations correct.

Finally, rsvndump added padding revisions into the dump file when a revision number was missing. This is helpful if you want to maintain the same numbers, but due to my use of --first-rev they were already out and I was importing to an existing repository, so I decided to strip these out. For that, I added another flag --omit-padding-revnums.
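A command along these lines matches the description (the repository URL, window size and starting revision here are assumptions):

```shell
# Reconstruction of the dump invocation; bash process substitution keeps
# a copy of stderr in rsvndump.log while still showing it on the terminal.
rsvndump --log-window-size 100 --first-rev 21102 \
         --adjust-missing-revnums --omit-padding-revnums \
         --incremental --deltas \
         https://npanday.svn.codeplex.com/svn \
         > npanday.dump 2> >(tee rsvndump.log >&2)
```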

The first few arguments to the rsvndump invocation were the customisations described above (plus --adjust-missing-revnums, to make the dumped revisions sequential). The next were the traditional svnadmin dump arguments that rsvndump honours. Finally, I redirected the output so that I could channel stdout to the dump file and stderr to a log file that I could also tee.

Other Codeplex SVNBridge Issues

With these changes in place, I was getting moderately successful dumps – but a few frustrating issues remained.

Firstly, many svn copy operations (such as creating a tag) were tracked file by file by Codeplex instead of at the top level directory. This resulted in further timeouts that I couldn’t work around. We had already seen this manifest on the Codeplex repository itself, where we were unable to even list the /tags/ directory. I didn’t attempt to correct this, instead manually applying the copy operation again after the preceding dump, then continuing.

If that revision appeared in the dump file (either incomplete or not able to be applied), I’d delete it by searching for Revision-number: xyzxyz and deleting the lines up until the next revision.
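That edit is mechanical enough to script; a sketch with awk (the revision number is an assumption, and the dump content is inlined via a heredoc for illustration):

```shell
# Strip one revision's block from a dump: from its Revision-number header
# up to (but not including) the next revision's header.
awk '
  /^Revision-number: 60000$/ { skip = 1 }                                  # start of the block to drop
  skip && /^Revision-number:/ && !/^Revision-number: 60000$/ { skip = 0 }  # next revision: stop dropping
  !skip
' <<'EOF' > trimmed.dump
Revision-number: 59999
content of r59999
Revision-number: 60000
content to delete
Revision-number: 60001
content of r60001
EOF
```

Note that if the target revision is the last one in the file, everything to the end is removed.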

Between tags and a few other stubborn revisions that wouldn’t come across (including one where even svn log wouldn’t succeed), I manually reconstructed 100 revisions like that. The upside was that it provided an opportunity to clean out some botched releases (due to the SVNBridge /tags/ issues) and branches that had never been used.

So the process was to dump as many revisions as possible, then apply them to a test repository, check it out, make the required modifications, and repeat. I captured all of this in a shell script so that at any time I could recreate the work repository and reapply all of the dumps and modifications to date. This became useful a few times as I gradually identified inconsistencies against a checkout of the same revision from Codeplex, having missed something along the way.
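In outline, the replay script was something like this sketch (the file layout is assumed):

```shell
# Rebuild the work repository from scratch, then reapply each dump and
# the recorded manual modifications in order.
rm -rf work-repo
svnadmin create work-repo
for dump in dumps/*.dump; do
  svnadmin load work-repo < "$dump"
  # ...apply any hand edits recorded for this stage here...
done
```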

This still uses a lot of bandwidth – starting at a given revision will both reconstruct the path hash for the whole repository at that revision, and fetch the “base revision”, which is a checkout of an entire revision, tags and all. So the process took a few days running intermittently. I also had to start the --first-rev at least one revision earlier and sometimes more, to avoid getting a cryptic Subversion error message about the “editor drive”.

Properties were also quite quirky on SVNBridge, due to the way they are apparently stored in TFS as described earlier. Some could not be removed (eg, bogus svn:mime-type), and some were set oddly (svn:ignore on a file, svn:eol-style on a directory). I chose to leave these alone and correct them after the import.

Some properties went missing, which was part of a larger problem on SVNBridge with copying from an existing revision. If you copy in the working copy and then make a modification before committing, this doesn’t show up as A+ in the later svn log output, but simply as M. The dump knows the file was copied, but not that it was added, so it attempts to modify a non-existent file when being applied. What’s more, this step wipes out some properties that are set on directories.

In some cases I manually applied the revision; in others, I made an edit in the dump file from:

Node-action: change

to

Node-action: add
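This kind of edit can also be scripted; a sketch with sed, where the path and dump content are illustrative only, and which assumes the Node-action line immediately follows the Node-path line (as it did in the nodes I needed to edit):

```shell
# Flip a change record to an add for one affected path; other nodes
# pass through untouched.
sed '/^Node-path: trunk\/pom.xml$/{n
s/^Node-action: change$/Node-action: add/
}' <<'EOF' > fixed.dump
Node-path: trunk/pom.xml
Node-action: change

Node-path: trunk/other.xml
Node-action: change
EOF
```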

Deleting directories hit snags as well. I’m unsure if this was a problem in SVNBridge or rsvndump, but it would dump deletions for every path and file like so:
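In outline, with illustrative paths, a delete node for the directory followed by delete nodes for every file beneath it:

```
Node-path: tags/npanday-1.2-RC1
Node-action: delete

Node-path: tags/npanday-1.2-RC1/pom.xml
Node-action: delete
```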

When applying the dump, it would successfully delete the first, then fail on the others that were already deleted in the first step. I ended up manually removing the nodes for all the later entries in these instances, taking them out four lines at a time (including the two trailing blank lines):

Node-path: tags/npanday-1.2-RC1/pom.xml
Node-action: delete

Final manipulations

Aside from Codeplex, for NPanday we needed to make some more manipulations. First, changing the usernames to line up with their final accounts at the ASF, using repeated changes to the svn:author revision properties.
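For a single revision, that change can be made with svn propset (the revision number and usernames are assumed); note that revision property edits require a pre-revprop-change hook that permits them:

```shell
# Rewrite the recorded author of revision 1234 on the local work repository.
svn propset --revprop -r 1234 svn:author asf-username file:///path/to/work-repo
```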

The dump was also loaded onto another partial subversion repository that contained some intermediate history between leaving the incubator originally and arriving at Codeplex.

Loading to an existing Subversion repository and path

After all this was eventually done, and there was a repository that was matched with the history of the Codeplex one, it needed to be dumped to load into the ASF repository.

Normally, this would be a simple:

svnadmin dump --incremental --deltas work-repo >npanday.dump

However, the objective was to load this onto a path that already existed. This was because we sought continuity with the history from the point at which the project was originally forked from the incubator.

To achieve this, I identified the revision in the dump that matched the content in the ASF repository, which due to the initial creation of branches and tags, was revision 4. I then dumped it using:

svnadmin dump --incremental -r5:HEAD work-repo >npanday.dump

Originally, --deltas had been included to reduce the size, but we found that this caused checksum problems, possibly due to different line endings between r4 and the original in the ASF repository.
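For reference, loading a dump beneath an already-existing path is typically done with --parent-dir (the paths here are assumed):

```shell
# Load the dumped revisions under an existing directory in the target
# repository rather than at its root.
svnadmin load --parent-dir incubator/npanday target-repo < npanday.dump
```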

Conclusion

This took considerably more work than anticipated when we originally thought it would be a good idea to retain the history.

I found that there wasn’t a lot of information about these topics on the web, so I hope this post will help to expand that for those that might face this challenge in the future. I’ve also found that editing Subversion dump files (when not in delta-mode) is reasonably straightforward.

Interestingly, I’ve learned that Subversion 1.7 will include the ability to do a remote svnadmin dump; however, I don’t believe this will work when svnsync is not supported (as was the case here), or support sub-paths as rsvndump does.


Last week Dennis started things moving to have another release of the Maven Release Plugin. The release process should start very soon, so please join us on dev@maven.apache.org to help test it!

This is certainly a nice one to have out the door, not only because of the length of time since the last release but because it fixes some important bugs (Subversion 1.6 support for starters), and improves multi-module support.

Having been bitten by the latter category myself very recently, I took the opportunity to get a couple of changes in.

Support for flat directory multi-module projects

This highly requested support was actually added by Deng way back in May last year, but it was only recently that I started using the new version of the plugin and discovered a small corner case. I jumped in and made a couple of improvements and fixes.

While I would always recommend using a typical hierarchical Maven multi-module project, there are a number of existing projects using the flat structure, particularly in non-Java environments. It’s good that the release plugin can now support anything with a common trunk.

This means that projects like the following will now release correctly (run from the parent directory):
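A flat layout of the kind described, with assumed module names, looks like:

```
trunk/
  parent-project/pom.xml   <- aggregator; run the release from here
  module-a/pom.xml
  module-b/pom.xml
```

where parent-project/pom.xml lists the modules with relative paths such as <module>../module-a</module> and <module>../module-b</module>.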

Not requiring artifacts to be in the local repository before releasing

This controversial issue has popped up a number of times and proven to be a real nuisance in releases, where a multi-module project needed to be built locally before it could be released (including the preparation test, that makes 3 full builds!), or at best spouted a large number of warnings about missing dependencies on the artifacts it was yet to build.

In the end here we decided to revert to the original behaviour and accept the limitations that came with it, while making the typical release faster and easier. The release:prepare-with-pom goal has been added to cater to the use case for which the dependency resolution was put in place originally. With this intended to be the 2.0 release of the plugin, we can stick to this behaviour going forward.
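A quick illustration of the two goals as they now stand:

```shell
# New default: no resolution of reactor artifacts from the local repository.
mvn release:prepare

# Old behaviour, including generation of fully resolved release POMs.
mvn release:prepare-with-pom
```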

Looking ahead, Maven 3.0 has added capabilities for plugins to operate on their modules without building them first, which will allow a unified and enhanced release:prepare goal once more, but in the meantime we’ve opted to put in place the best solution for the majority of Maven users today.
