Whilst using an installation of Eclipse 3.7.2, I found that it silently fails to show the Javadoc provided by my ObMimic library.

The Eclipse project was pointing at the right location for this Javadoc, but flatly refused to show any of it in pop-ups or the Javadoc view.

A look at the Eclipse logs showed a pile of “StringIndexOutOfBoundsException” crashes from Eclipse’s attempt to parse the individual Javadoc files. This turns out to be a known Eclipse bug, 394382, which is marked as fixed in Eclipse 4.3 M4 onwards.

The problem arises from a change in the “Content-Type” header within Javadoc files produced by JDK 7. Prior to JDK 7, these specified the “charset” as part of the content-type string, but from JDK 7 onwards the “charset” is given by a separate attribute.
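For reference, the two forms of the header look roughly like this (reconstructed here; the exact case and attribute order produced by any given javadoc run may vary):

```html
<!-- JDK 6 and earlier: charset inside the content-type string -->
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

<!-- JDK 7 onwards: charset as a separate attribute -->
<meta http-equiv="Content-Type" content="text/html" charset="UTF-8">
```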

The Eclipse code that fails is trying to extract the charset’s value from this header (i.e. the “UTF-8”), but it crashes when given the newer form.

Obviously one solution would be to insist on Eclipse 4.3 M4 or higher, but ideally I’d like my ObMimic Javadoc to be usable on any reasonable version of Eclipse that any user might have. For the time being that ought to include 3.7.x versions. Most of all I don’t want people complaining about my Javadoc not working when it’s actually an Eclipse bug!

So as a work-around for this I’ve added a step into ObMimic’s build script to change this particular header back to its old format. As far as I know both formats of the header are valid, and there doesn’t seem to be any pressing need to use the newer format, so it seems harmless to use the older format for this header.

To achieve this, as soon as the relevant Ant script has generated the Javadoc it now uses the following “replace” task to change the relevant lines within all of the Javadoc’s HTML pages (where ${obmimic.javadoc.dir} is the root directory into which the Javadoc was generated):
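In sketch form, the task looks like the following (this assumes a UTF-8 charset and the exact meta-tag text emitted by JDK 7’s javadoc, so the token may need adjusting to match the header actually produced by your own javadoc run):

```xml
<replace dir="${obmimic.javadoc.dir}"
         includes="**/*.html"
         summary="true">
    <!-- The newer (JDK 7) form of the header, with charset as a separate attribute... -->
    <replacetoken><![CDATA[<meta http-equiv="Content-Type" content="text/html" charset="UTF-8">]]></replacetoken>
    <!-- ...replaced by the older form that Eclipse 3.7.x can parse. -->
    <replacevalue><![CDATA[<META http-equiv="Content-Type" content="text/html; charset=UTF-8">]]></replacevalue>
</replace>
```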

Note that the above is based on the Javadoc charset being UTF-8 (from the Ant Javadoc task specifying “docencoding” and “charset” attributes of “UTF-8”), and would obviously need adjusting for any different charset or if making it variable. Also, the “summary” attribute produces a message showing how many replacements were carried out, which at least confirms that the replacements have taken place (without this, the replacement is done silently).


Running the Java EE 5 Verifier can be a useful way of checking EAR files and other Java EE artifacts before deploying and running them.

However, once you start using third-party libraries there’s one set of rules in the verifier that are rather too idealistic: the requirement that all referenced classes need to be present in the application. If any classes are referenced but can’t be found, these are reported by the verifier as failures.

In theory, it’s perfectly reasonable that Java EE applications are basically supposed to be “self-contained”, and that all classes referenced within them need to be present within the application itself (obviously excluding those of the Java EE environment itself). Actually, Java’s “extension” mechanism is also supported as a way of using jars from outside of the application, but this has limitations and drawbacks of its own and doesn’t really change the overall picture. There’s a useful overview of this subject in the “Sun Developer Network” article Packaging Utility Classes or Library JAR Files in a Portable J2EE Application (this dates from J2EE 1.4, but is still broadly appropriate for Java EE 5).

Anyway, verifying that the application’s deliverable includes all referenced classes seems better than risking sudden “class not found” errors at run-time (possibly on a “live” system and possibly only in very specific situations). The trouble is that once you start using third-party libraries, you then also need to satisfy their own dependencies on further libraries, even where these are only needed by optional facilities that you never actually use. Then you also need all the libraries that those libraries reference, and so on. This can easily get out of hand, and require all sorts of libraries that aren’t ever actually used by your application.

As a simple example, take the UrlRewriteFilter library for rewriting URLs within Java EE web-applications. This is limited in scope and its normal use only involves a single jar, so you’d think it would be relatively self-contained.

However, one of its features is that you can configure its “logging” facilities to use any of a number of different logging APIs. In practice, I don’t use anything other than the default setting, which uses the normal servlet-context log. But its code includes references to log4j, commons-logging and SLF4J so that it can offer these as options. The documentation says that you need the relevant jar in your classpath if you’re using one of these APIs, but the Java EE Verifier tells you that they all need to be present – even if you’re not actually using them (on the perfectly reasonable basis that there’s code present that can call them).

That’s not the end of the story. The SLF4J API in turn uses “implementation” jars to talk to actual logging facilities, and includes references to classes that are only present in such implementation jars. So you also need at least one such SLF4J implementation jar. At this point you’re now looking at the SLF4J website and trying to figure out which of its many jars you need. What are they all? Does it matter which one you pick? Perhaps you need all of them? Do they have any further dependencies on yet more jars? Are there any configuration requirements? Are these safe to include in your application without learning more about SLF4J? Do they introduce any security risks?

So apart from anything else, you’re now having to find out more than you ever wanted to know about SLF4J, just because a third-party library you’re using has chosen to include it as an option. Ironically, a mechanism intended to give you a choice between several logging APIs has ended up requiring you to bundle all of them, even when you’re not actually using any of them!

Anyway, in addition to the log4j jar, the commons-logging jar, the SLF4J API jar, and an SLF4J implementation jar, the UrlRewriteFilter also needs a commons-httpclient jar (though again, nothing in my own particular use of UrlRewriteFilter appears to actually use this). That in turn also requires a commons-codec jar.

Fortunately, that’s the limit of it for UrlRewriteFilter. But it’s easy to see how a third-party jar could have a whole chain of dependencies due to “optional” facilities that you’re not actually using.

As a rather different example, another library that I’ve used recently appears to have an optional feature that allows the use of Python scripts for something or other. This is an optional feature in one particular corner of the library, and is something I have no need for. To support this feature, the code includes references to what I presume are Jython classes. As a result the verifier requires Jython to be present (and then presumably any other libraries that Jython might depend on in turn).

Now, bundling Jython into my Java EE application just to satisfy the verifier and avoid a purely-theoretical risk of a run-time “class not found” error seems plain crazy. If the code ever does unexpectedly try to use Jython, I’d much rather have it fail with a run-time exception than have it work successfully and silently do who-knows-what. To add insult to injury, Jython is presumably able to call Python libraries that might or might not be present but that the verifier will know nothing about – so bundling Jython in order to satisfy the verifier might actually make the application more vulnerable to code not being found at run-time.

With the mass of third-party libraries available these days, and the variety of dependencies these sometimes have, I suspect there must be cases that are far, far worse than this. (Anyone out there willing to put forward a “worst case”?)

So what’s the answer? Obviously you do need to bundle the jars for all classes that are actually used, but for jars whose classes are referenced but never actually used (and any further jars that they reference in turn) I can see a number of alternatives:

Work through all the dependencies and bundle all the jars so that the verifier is happy with everything. Often this is entirely appropriate or at least acceptable, but as we’ve seen above, this cure isn’t always very practical, and in some cases it can be worse than the disease.

A variation on the above is to leave the “unnecessary” jars out of the application but run the verifier on an adjusted copy of the application that does include them. That is, produce a “real” deliverable with just the jars that are actually needed, and a separate adjusted copy of it that also includes any other jars necessary to keep the verifier happy but that you know aren’t actually needed by the application. The verification is run on this adjusted copy, which is then discarded. The drawback is that you still have to work through the entire chain of dependencies and track down and get hold of all of the jars, even for those that aren’t really needed. There’s also the risk that you’ll treat a jar as unnecessary when it isn’t, which is exactly the mistake that the verifier is trying to protect you from.

Another alternative is to just give up and not use the verifier. But it seems a shame to miss out on the other verification rules just because one particular rule isn’t always practical.

Ideally, it’d be nice to be able to configure the verifier to allow particular exceptions (perhaps to specify that this particular rule should be ignored, or maybe to specify an application-specific list of packages or classes whose absence should be tolerated). But as far as I can see there’s no way to do this at present.

Another approach is to inspect the verifier’s results manually so that you can ignore these failures where you want to, but can still see any other problems reported by the verifier. However, it’s always cumbersome and error-prone to have to manually check things after each build, especially where you might have to wade through a long list of “acceptable” errors in order to pick out any unexpected problems.

Potentially you could script something to examine the verifier output, pick which warnings and failures should and shouldn’t be ignored, and produce a filtered report and overall outcome based on just the failures you’re interested in. In the absence of suitable options built into the verifier, you could use this approach to support appropriate options yourself. This is probably the most flexible approach (in that you could also use it for any other types of verifier-reported errors that you want to ignore). But it seems like more work than this deserves, and it’d be rather fragile if the messages produced by the verifier ever change.

As a last resort, if the library containing the troublesome reference is open-source you could always try building your own customised version with the dependency removed (e.g. find and remove the relevant “import” statements and replace any use of the relevant classes with a suitable run-time exception). Clearly, even where this is possible it will usually be more trouble than it’s worth and will usually be a bad idea, but it’s another option to keep up your sleeve for extreme cases (e.g. to remove a dependency on an unnecessary jar that you can no longer obtain).

The approach I’ve adopted for the time being is to run the verifier on “adjusted” copies of my applications, but only use this for jars that I’m very confident aren’t needed and aren’t wanted in the “real” application. The actual handling of this is built into my standard build script, which builds the “adjusted” application based on an application-specific list of which extra jars need to be added into it.
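The mechanics of this can be sketched in Ant terms roughly as follows (a sketch only: the property names, paths and the verifier invocation are my own assumptions, not a definitive recipe):

```xml
<!-- Build an "adjusted" copy of the real deliverable, verify it, discard it. -->
<target name="verify-adjusted">
    <!-- Take a copy of the real deliverable... -->
    <copy file="${dist.dir}/myapp.war"
          tofile="${build.dir}/myapp-adjusted.war"/>
    <!-- ...add the jars that are needed only to keep the verifier happy... -->
    <jar destfile="${build.dir}/myapp-adjusted.war" update="true">
        <zipfileset dir="${verifier.extra.jars.dir}" includes="*.jar"
                    prefix="WEB-INF/lib"/>
    </jar>
    <!-- ...run the verifier against the adjusted copy, then discard it. -->
    <exec executable="${glassfish.home}/bin/verifier" failonerror="true">
        <arg value="${build.dir}/myapp-adjusted.war"/>
    </exec>
    <delete file="${build.dir}/myapp-adjusted.war"/>
</target>
```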

In the longer term, I’m hoping that the entire approach to this might all change anyway… in a world of dynamic languages, OSGi bundles, and whatever eventually comes of Project Jigsaw and other such “modularization” efforts, the existing Java EE rules and packaging mechanisms just don’t seem very appropriate anymore. It all feels like part of the mess that has grown up around packaging, jar dependencies, classpaths, “extension” jars etc, together with the various quirks and work-arounds that have found their way into individual specifications, APIs and tools (often to handle corner-cases and real-world practicalities that weren’t obvious when the relevant specification was first written).

So I’m hoping that at some point we’ll have a cleaner and more general solution to packaging and modularization, and this little quirk and all the complications around it will simply go away.


If you auto-deploy a war archive on Glassfish V2, any changes to the deployed application’s JSP files are picked up automatically. However, if you make changes to the deployed application’s web.xml file or any other such configuration files, you need some way to make Glassfish “reload” the application using the updated files.

It isn’t immediately apparent how to trigger this. At any rate, it had me scratching my head yesterday when I found myself trying to install a third-party application. The installation instructions led me to auto-deploy its war archive and then edit the deployed files, but the changes didn’t take effect.

I couldn’t see anything in the Glassfish admin console to make it stop and re-load the application, and the command-line facilities that I found for this don’t seem to apply to auto-deployed applications.

The obvious solution was to shut-down and restart Glassfish, but even that seemed to leave the application still using its original configuration and ignoring the changes.

Apparently the trick is that you have to put a file named .reload into the root of the deployed application’s directory structure.

This file’s timestamp is then checked by Glassfish and used to trigger reloading of the application. So you can force a reload at any time by “touching” or otherwise updating this “.reload” file.

I can’t claim any detailed knowledge in this area, and have only had a quick look, but I get the impression that this “.reload” mechanism is used by Glassfish for the reloading of all “exploded” directory deployments. For applications that are explicitly deployed from a specified directory structure, you can use the deploydir command with a “--force=true” option to force re-deployment (there might be other ways to do this, but that’s the most obvious I’ve seen so far). But on Glassfish V2 that doesn’t appear possible for auto-deployed applications, so the answer for those is to manually maintain the “.reload” file yourself.

Manually touching/updating a “.reload” file also works for exploded archives that have been deployed via “deploydir” (i.e. as an alternative to using the “deploydir” command to force reloading).

The content of the “.reload” file doesn’t matter, and it can even be empty. It just has to be named “.reload” and must be in the root directory of the deployed application (that is, alongside the WEB-INF directory, not inside it).
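From an Ant script, forcing a reload is then just a one-line “touch” (the directory path here is an assumption; adjust it for your own domain and application, and note that Ant’s touch task will create the file if it doesn’t already exist):

```xml
<!-- Update the ".reload" marker's timestamp to make Glassfish reload the app. -->
<touch file="${glassfish.home}/domains/domain1/applications/j2ee-modules/myapp/.reload"/>
```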

Because the “.reload” file is in the root of the web-application and outside of its WEB-INF, it’s accessible to browsers just like a normal JSP, HTML or other such file would be. So it’s not something you’d want to have present in a live system (or you might want to take other steps to prevent it being accessible).

I haven’t looked in detail at whether Glassfish V3 has any improved mechanism for this, but one thing I have noticed:

Glassfish V3 also seems to have a new redeploy command for redeploying applications, which appears to be equivalent to “deploydir” with “--force=true” but doesn’t require a directory path, so can presumably be used on any application, including auto-deployed applications.

As a personal opinion, I’m quite happy with using auto-deployment for most purposes, but in general I’m very much against the idea of editing the resulting “deployed” files. It just doesn’t seem right to me, and I can see all sorts of potential problems.

So even where a third-party product is delivered as a war archive and requires customisation of its files, I prefer to make the necessary changes to an unzipped copy. I can then use my normal processes to build a finished, already-customized archive that can be deployed without needing any further changes.

But there are still times when it’s handy to auto-deploy a web-application or other component by just dropping its archive into Glassfish, and then be able to play around with it “in place” – for example, when first evaluating a third-party product, or when doing some quick experiments just to try something.

So being able to force reloading of an auto-deployed application remains useful.


FindBugs is terrific. I’ve been using it for several years now, and each new release seems to find some more mistakes in my code that were previously slipping through unnoticed.

I’d like to think I’m very careful and precise when writing code, and have the aptitude, experience and education to be reasonably good at it by now. I’m also a stickler for testing everything as comprehensively as seems feasible. So it’s rather humbling to have a tool like FindBugs pointing out silly mistakes, or reporting issues that I’d not been aware of. The first time I ran FindBugs against a large body of existing code the results were a bit of a shock!

In the early days of FindBugs, I found the genuine problems to be mixed with significant numbers of false-positives, and ended up “excluding” (i.e. turning off) lots of rules. Since then it has become progressively more precise and robust, as well as detecting more and more types of problem.

These days I run FindBugs with just a tiny number of specific “excludes”, and make sure all my code stays “clean” against that configuration. The “excludes” are mainly restricted to specific JDK or third-party interfaces and methods that I can’t do anything about.

Further new releases of FindBugs don’t usually find many new problems in the existing code, but do almost always throw up at least one thing worth looking into.

So last weekend I upgraded to FindBugs version 1.3.4, and sure enough it spotted a really silly little mistake in one particular piece of “test-case” code.

The actual problem it identified was an unnecessary “instanceof”. This turned out to be because the wrong object was being used in the “instanceof”. The code is intended to do “instanceof” checks on two different objects to see if both of them are of a particular type, but by mistake the same variable name had been used in both checks. Hence one of the objects was being examined twice (with the second examination being spotted by FindBugs as entirely superfluous), and the other not at all. If this had been in “real” code I’d have almost certainly caught it in testing, but buried away in a “helper” method within the tests themselves it has managed to survive for a couple of years without being noticed.
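In miniature, the mistake was of this shape (a reconstruction for illustration only; the real helper used different types and names):

```java
public class InstanceofSlip {

    // The buggy helper: "first" is tested twice, so the second "instanceof"
    // is superfluous (which is what FindBugs flags), and "second" is never
    // examined at all.
    static boolean bothAreStrings(Object first, Object second) {
        return first instanceof String && first instanceof String;
    }

    // The intended helper: each argument is tested once.
    static boolean bothAreStringsFixed(Object first, Object second) {
        return first instanceof String && second instanceof String;
    }

    public static void main(String[] args) {
        // The slip only shows up when the second argument is of the wrong type:
        System.out.println(bothAreStrings("text", 42));      // prints "true" (wrong)
        System.out.println(bothAreStringsFixed("text", 42)); // prints "false" (right)
    }
}
```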

I guess this raises the broader issue of whether (and how) test-case code should itself be tested, but that’s one for another day (…would you then also want to test your tests of your tests…?). Anyway, thanks to FindBugs, this particular mistake has been detected and fixed before causing any harm or confusion.

Every time I find something like this it makes me think how fantastic it is to have such tools. I use PMD and CheckStyle as well, and they’ve all helped me find and fix mistakes and improve my code and my coding. I’ve learnt lots of detailed stuff from them too. But FindBugs especially has proven to be very effective whilst also being easy to use – both in Ant scripts and via its Eclipse plug-in.


The Ant build script that I use for all of my projects includes, for web-applications, translating and compiling any JSP files. For my purposes this is just to validate the JSPs and report any syntax and compilation errors as part of the build, rather than to put pre-compiled class files into the finished web-app.

I’ve just quickly switched from using Tomcat’s JSP compiler to using Glassfish V2’s JSP compiler, and it seems worth documenting the changes involved and some of the similarities and differences.

Note that I was previously using the Tomcat 5 JSP compiler, and it didn’t seem worth upgrading this to Tomcat 6 just in order to ditch it for Glassfish, so this isn’t a like-for-like comparison – some of the fixes/changes noted might also be present in Tomcat 6.

The actual change-over was relatively painless. It’s basically the same JSP compiler – which I understand is known as “Apache Jasper 2” – so the general nature of it and the options available are essentially the same.

In Tomcat this is provided via a “JspC” Ant task, and needs to be supplied with a classpath that includes the relevant Tomcat libraries. In contrast, Glassfish provides a “jspc” script that supplies the appropriate classpath and invokes Glassfish’s JSP compiler, passing it any supplied command-line arguments.

So switching over basically just consisted of taking out the invocation of the Tomcat-supplied “JspC” Ant task (and the corresponding set-up of its classpath), and replacing it with an Ant “exec” of the Glassfish “jspc” script with equivalent command-line arguments.
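In sketch form, the Ant side of this looks something like the following (the property names and the exact argument list are my own assumptions, to be adapted as needed; on Windows the script would be “jspc.bat”):

```xml
<!-- Invoke the Glassfish V2 "jspc" script to translate and compile the JSPs. -->
<exec executable="${glassfish.home}/bin/jspc" failonerror="true">
    <arg value="-compile"/>
    <arg value="-webapp"/>
    <arg value="${webapp.src.dir}"/>
    <arg value="-d"/>
    <arg value="${jspc.output.dir}"/>
</exec>
```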

However, the Glassfish documentation for this seems a bit on the weak side. At least, I didn’t find it particularly easy to locate any definitive documentation on the command-line options for the Glassfish V2 “jspc” script. Maybe I just didn’t look in the right places. The program itself supports a “-help” option that lists its command-line options, but without much explanation. There’s a more detailed explanation of the options in the Sun Application Server 9.1 Update 2 reference manual at http://docs.sun.com/app/docs/doc/820-4046/jspc-1m, but this doesn’t entirely match the current Glassfish release (e.g. it doesn’t include the recently-added “ignoreJspFragmentErrors” option). Nevertheless, it’s the best documentation I’ve found so far. In any case, the options haven’t yet diverged much from those of Tomcat JspC, so much of the Tomcat documentation remains relevant.

I’m also a bit unsure of the exact relationship between the Tomcat and Glassfish code. They both appear to be “Apache Jasper 2”, but this doesn’t seem to exist as a product in its own right, only as a component within Tomcat. The Glassfish code is presumably a copy or fork of the Tomcat code, but with its own bug-fixes and new features, and maintained and developed as part of Glassfish. With Glassfish being the reference implementation for new JSP versions, I assume the Glassfish implementation is now the main branch going forward, even if some of the changes get incorporated into both.

To add to my uncertainty, I’m also rather confused as to whether Glassfish does or doesn’t also provide an Ant task for invoking its JSP compiler. There is an “asant” script that invokes Glassfish’s internal copy of Ant with a suitable classpath, with various targets and supporting Ant tasks. There’s also documentation for previous releases of the “Sun Application Server” that show a “sun-appserv-jspc” Ant task. But the current Glassfish V2 documentation doesn’t seem to list any such task amongst its “asant” targets, nor otherwise document a “jspc” or “sun-appserv-jspc” Ant task. Maybe I just didn’t find the right document. I guess I should just hunt around the Glassfish libraries for the relevant class, or try invoking it based on the previous release’s documentation. But for the moment, invoking the “jspc” script is perfectly adequate for my purposes, so I’m sticking with that unless and until I get a chance to look at this again.

A few other findings:

When given a complete web-application, the Tomcat 5 JspC compiler seems to process precisely those files that have a “.jsp” or “.jspx” extension. Maybe someone can enlighten me, but I can’t see anything in the Ant task’s attributes that allows it to be configured to process other file extensions. In contrast, Glassfish’s jspc script seems to automatically process all file types that are identified by the web.xml as being JSPs.

With the Tomcat JspC task, the JSP translation had to be followed by a separate run of “javac” to compile the resulting java source code. In contrast, the Glassfish jspc script supports a “-compile” option that carries out the compilation as part of its own processing. What’s more, I gather this uses the JSR 199 Java Compiler API for “in process” compilation if this is available (i.e. when running on JDK 6 or higher), and seems much faster as a result.

A slight limitation of the Glassfish jspc “-compile” option is that there doesn’t seem to be any control over where the resulting class files are written. Instead, they just get written into the same directory as the java source files. For my purposes this doesn’t matter, but if you wanted to put the class files into a specific location, or deploy them without the source code, you’d have to follow the jspc run with your own moving/copying/filtering of files as necessary.

I’m not particularly concerned with the exact performance of this, but subjectively the builds do seem noticeably faster since switching over to the Glassfish JspC and using its “built-in” compile instead of a separate “javac” run.

The Glassfish jspc script also supports a “-validate” option, which validates “.tld” and “web.xml” files against their schemas and DTDs. However, I don’t currently use this, and instead use a separate run of Glassfish’s verifier script to verify the finished web-application archive as whole.

I wonder if anyone can clarify the exact relationship between the Tomcat and Glassfish JspC implementations and the underlying “Jasper 2”? Or the exact status (and maybe classname, location, documentation etc) of any Glassfish “jspc” Ant task?

The ObMimic library for out-of-container servlet testing is now being made available to a small number of users as a private “beta” release, in advance of a more public beta.

We’re ready for a few more people to start trying it out, so if you’re interested just let me know – either via this blog’s “contact me” page or via my company e-mail address of mike-at-openbrace-dot-com.

In outline, ObMimic provides a comprehensive set of fully-configurable test doubles for the Servlet API, so that you can use normal “plain java” tools and techniques to test servlets, filters, listeners and any other code that depends on the Servlet API. We call these test doubles “mimics”, because they “mimic” the behaviour of the real object.

We see this as the ultimate set of “test doubles” for this specific API: a set of plain Java objects that completely and accurately mimic the behaviour of the “real” Servlet API objects, whilst being fully configurable and inspectable and with additional instrumentation to support both “state-based” and “interaction-based” testing.

If you find servlet code harder to test than plain Java, ObMimic might be just what you’re looking for.

With ObMimic, you can create instances of any Servlet API interface or abstract class using plain no-argument constructors; configure and inspect all relevant details of their internal state as necessary; and pass them into your code wherever Servlet API objects are needed. This makes it easy to do detailed testing of servlets, filters, listeners and other code that depends on the Servlet API, without needing a servlet container and without any of the complexities and overheads of packaging, deployment, restarts/reloads, networking etc.

ObMimic includes facilities for:

Setting values that are “read-only” in the Servlet API (including full programmatic control over “deployment descriptor” values and other values that are normally fixed during packaging/deployment, or that have fixed values in each servlet container).

Examining values that are normally “write-only” in the Servlet API (such as a response’s body content).

Optionally recording and retrieving details of the Servlet API calls made to each object (with ability to turn this on and off on individual objects).

Controlling which version of the Servlet API is simulated, with versions 2.3, 2.4 and 2.5 currently supported (for example, you can programmatically repeat a test using different Servlet API versions).

Controlling the simulation of container-specific behaviour (i.e. where the Servlet API allows variations or leaves this open).

Explicitly forcing Servlet API methods to throw a checked exception (e.g. so that you can test any code that handles such exceptions).

Handling JNDI look-ups using a built-in, in-memory JNDI simulation.

There are no dependencies on any particular testing framework or third-party libraries (other than Java SE 5 or higher and the Servlet API itself), so you can freely use ObMimic with JUnit, TestNG or any other testing framework or tool.

In contrast to traditional “mock” or “stub” objects, ObMimic provides complete, ready-made implementations of the Servlet API interfaces and abstract classes as defined by their Javadoc. As a result, your tests don’t have to depend on your own assumptions about the Servlet API’s behaviour, and both state-based and interaction-based tests can be supported. ObMimic can even handle complex sequences of Servlet API calls, such as for session-handling, request dispatching, incorporation of “POST” body content into request parameters, notification to listeners, and other such complex interactions between Servlet API objects. It can thus be used not only for testing individual components in isolation, but also for testing more complete paths through your code and third-party libraries.

With the appropriate configuration, it’s even possible to test code that uses other frameworks on top of the Servlet API. For example, we’ve been able to use ObMimic to test “Struts 1” code, and to run ZeroTurnaround’s JspWeaver on top of ObMimic to provide out-of-container testing of JSPs (as documented previously).

As a somewhat arbitrary example, the following code illustrates a very simple use of ObMimic to test a servlet (just to show the basics of how Servlet API objects can be created, configured and used):

ObMimic isn’t open-source, but it will have a zero-cost version (full API coverage but a few overall features disabled, such as the ability to configure the Servlet API version, control over how incorrect/ambiguous API calls are handled, and recording of API calls). There will also be a low-cost per-user “Professional” version with full functionality, and an “Enterprise” version that includes all of ObMimic’s source-code and internal tests (with an Ant build script) as well as a licence for up to 200 users.

At the moment there’s no web-site, discussion forums or bug-reporting mechanisms (all still being prepared), but ObMimic already comes with full documentation including both short and detailed “getting started” guides, “how to”s with example code, and extensive Javadoc – and for this private beta I’m providing direct support by e-mail.

Anyway, if you’d like to try out ObMimic, or have any questions or comments, or would like to be informed when there’s a more public release, just let me know via the “contact me” page or by e-mail.


I’m a source-control kind of guy. Anyone that knows me would assume that I’d always insist on a source-control tool of some kind, even for my own “solo” work.

But they’d be wrong – I’ve only just found one I’m happy with, and in the meantime I’ve gone several years without any source-control tool. And frankly, I’ve always been a bit perplexed at how everyone else seems to get along with these tools.

Sure, in the past I’ve worked on teams using PVCS or ClearCase, and before that PANVALET on mainframes (and some other mainframe tool whose name I can’t even remember). I’ve had the odd encounter with CVS, Subversion and Perforce. And when I started setting up my own development environment a few years back, source-control was one of the first things I looked at (together with overall directory structures, backup, and security).

But at that time I wasn’t happy with any of the tools I found. Everyone else seemed to be using CVS, but the more I learnt about it the more of a ridiculous nightmare it seemed. I looked at Subversion and Perforce and a few others, but at the time they all seemed far too awkward, limited and problematic to suit my needs – just far more trouble than it would be worth. The more expensive tools were beyond my budget (and in any case, given past experiences, I kind of expected them to be worse rather than better).

I think at least part of the problem was that these tools tend to address a broad but ill-defined set of loosely-related issues. It’s as if everybody knows what such source-control tools are supposed to do (unfortunately, often based on CVS, which just seems insane), but this isn’t based on any clear definition of exactly which needs such a tool should and shouldn’t be trying to address. Then each specific tool has its own particular flaws in conception, architecture and implementation. Throw non-standard services, storage mechanisms and networking protocols into the mix, and you end up having to deal with a huge pile of complications and restrictions just to get one or two key benefits.

As an aside, the Google “Tech Talk” video Linus Torvalds on git has plenty of scathing comments about these traditional source-control tools and why they aren’t the answer. If you want some more examples of people who aren’t enjoying their source-control tools, there are also some great comments on the “Coding Horror” article Software Branching and Parallel Universes.

In the end, it looked both simpler and safer for me to live without a source-control tool. That’s heresy in civilized software engineering circles, even for a one-man project. But it has worked fine for me up until now.

In the absence of a source-control tool, I’ve maintained separate and complete copies of each version of each project, and done any merging of code between them manually (or at least, using separate tools). This loses out on the locking, merging and history tracking/recreation that a source-control tool could provide, but to date that hasn’t been of any consequence (and can partly be addressed by other means, e.g. short-term history tracking by my IDE, use of “diff” tools against old backups etc). In return I’ve not had to deal with any of the overheads, complexity or risks of any of these tools, nor had to fit the rest of my environment and procedures around them.

Don’t get me wrong: on a larger team, or more complex projects, some kind of source-control tool would normally be absolutely essential, however problematic and burdensome. But I am not a larger team, and so far it hasn’t been worth my while to shoulder such burdens.

Anyway, I revisit this subject every now and then, to see if the tools have reached the point where any are good enough to meet my needs (and so that I have a rough idea of what to do if I suddenly do need a source control tool after all).

And this time around, at last, everything seems to have changed…

This time, the world suddenly seems full of “distributed” (or perhaps more accurately, “decentralized”) source-control tools. Despite initially fearing that things had just got a whole lot more complicated, these tools have actually turned out to be exactly what I’ve been looking for all this time.

I’m not going to try and explain distributed source-control tools here, but for some general background, see (for example):

Of the currently-available distributed source-control tools, a quick look round suggested that Mercurial might be best for me, and some brief exploration and experimentation with it completely won me over.

At last, a source-control tool that I’m happy with!

Mercurial gives me precisely the benefits I’m looking for from a source-control tool – in particular, history tracking/recreation and good support for branching and merging. It’s flexible enough to let me add these facilities into my existing development environment and directory structures without otherwise impacting them (even though this isn’t how most teams would normally use it), it doesn’t need any significant administration, and it seems simple and reliable.

It all seems simple and reasonably intuitive, and everything “just works”.

Branching and tagging, and more importantly merging, all look relatively simple, safe, and effective.

Its overall approach makes it very flexible. I especially like the way the internal Mercurial data is held in a single directory structure in the root of the relevant set of files. This keeps it together with the files themselves, with no separate central repository that everything depends on, whilst also not scattering lots of messy extra directories into the “real” directories. It was easy to see how this could be fitted into my existing directory structures, backup, working practices etc without any significant impact or risk, and without other tools and scripts needing to be aware of it. At the same time I don’t feel it ties me down to any one particular structure, and I can see how it could readily accommodate much larger teams or more complex situations.
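To illustrate (assuming Mercurial is installed, and using a made-up directory layout), turning an existing directory tree into a repository is a single in-place step:

```shell
# Set up a small example directory tree to work with.
mkdir -p myproject/src
echo 'hello' > myproject/src/app.txt

# Create a repository in-place: all of Mercurial's internal data
# goes into a single .hg directory at the root, with no central
# server and no per-subdirectory clutter (in contrast to the CVS/
# or .svn/ directories scattered through every real directory).
cd myproject
hg init
ls -a            # .hg now sits alongside the existing files
ls src           # subdirectories are untouched: just app.txt
```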

Although this is entirely subjective, it feels rock solid and safe. Retrieving old versions and moving backwards and forwards between versions works quickly and reliably, with no fuss or bother. The documentation’s coverage of its internal architecture and how this has been designed for safety (e.g. writing is “append only” and carried out in an order that ensures “atomic” operation, use of checksums for integrity checks etc) gives me good confidence that corruptions or irretrievable files should be very rare. For extra safety I can still keep my existing directories in place (holding the current “tip” of each version), so that at worst my existing backup regime still covers them even if anything in Mercurial ever gets corrupted.

The documentation provided by the Distributed revision control with Mercurial open-source book seems excellent. I found it clear and readable enough to act as an introduction, but extensive and detailed enough to work as a reference. I spent a couple of hours reading through the whole thing and felt like this had given me a real understanding of Mercurial and covered everything I might need to know.

Commits are atomic, and can optionally handle added and deleted files automatically. This means that I can pretty much just carry out the relevant work without regard for Mercurial, then simply commit the whole lot at the end of each task, without having to individually notify Mercurial of each new or deleted file. This removes a lot of the need for integration with IDEs, and a lot of the potential source-control implications of using IDE “refactoring” facilities.

Some of these are intrinsic benefits of distributed source control; some are due to Mercurial being a relatively new solution (and able to build on the best of earlier tools whilst avoiding their mistakes and being free of historical baggage); and some are just down to it being well designed and implemented.

For anyone coming from other tools, some conversion/migration tools are listed at Mercurial’s Repository Conversion page, but of course I haven’t tried any of these myself.

The only weaknesses I’ve encountered so far are:

Mercurial deals with individual files, and is therefore completely blind to empty directories. The argument seems to be that empty directories aren’t needed and aren’t significant, but I think this is more an artifact of the implementation than anything one would deliberately specify. I don’t think it’s such a tool’s place to decide that empty directories don’t matter. I have directories that exist just to maintain a consistent layout, or as already-named placeholders in readiness for future files. To work around this I’ve had to find all empty directories and give them each a dummy “placeholder” file.
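That workaround amounts to a one-liner in a Unix-style shell; something like the following (the “.keep” file name is just a convention, and the directory names are made up for illustration):

```shell
# Set up an example tree with some empty directories.
mkdir -p project/src project/lib project/docs/templates
echo 'code' > project/src/main.txt

# Give every empty directory a dummy placeholder file, so that
# Mercurial (which tracks only files) will preserve the directory.
find project -type d -empty -exec sh -c 'touch "$1/.keep"' _ {} \;

find project -name .keep    # lists the placeholders just created
```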

Although there’s at least one Eclipse plug-in, at least one NetBeans plug-in, and a TortoiseHg project for an MS-Windows shell extension, these seem to be at a very early stage. I’d expect this situation to improve over time, especially for NetBeans (given Sun’s use of Mercurial for OpenJDK). In the meantime this doesn’t have much impact on my own use of Mercurial, as the command-line commands are simple to use and powerful enough to be practical. During normal day-to-day work, my use of Mercurial has generally been limited to a commit of a complete set of changes when ready, plus explicit “rename”s of files where necessary.

On MS Windows you need to obtain a suitable diff/merge tool separately, as this isn’t built into the Mercurial distribution (but the documentation points you at several suitable tools, and shows how to integrate them into Mercurial – and anyway, I’d rather have the choice than be saddled with one I don’t like, or have a half-baked solution as part of the source-control tool itself).

I’ve now been using Mercurial for a couple of months. Despite my general dislike of all the source-control tools I’d looked at beforehand, I have been very pleased with Mercurial.

If you’re looking for a new source control tool, or have always disliked tools such as CVS, Subversion and Perforce, I’d certainly recommend Mercurial as worth taking a look at.