Mercurial wins me over

6/11/2007

I’m a source-control kind of guy. Anyone who knows me would assume that I’d always insist on a source-control tool of some kind, even for my own “solo” work.

But they’d be wrong – I’ve only just found one I’m happy with, and in the meantime I’ve gone several years without any source-control tool. And frankly, I’ve always been a bit perplexed at how everyone else seems to get along with these tools.

Sure, in the past I’ve worked on teams using PVCS or ClearCase, and before that PANVALET on mainframes (and some other mainframe tool whose name I can’t even remember). I’ve had the odd encounter with CVS, Subversion and Perforce. And when I started setting up my own development environment a few years back, source-control was one of the first things I looked at (together with overall directory structures, backup, and security).

But at that time I wasn’t happy with any of the tools I found. Everyone else seemed to be using CVS, but the more I learnt about it the more of a ridiculous nightmare it seemed. I looked at Subversion and Perforce and a few others, but at the time they all seemed far too awkward, limited and problematic to suit my needs – just far more trouble than they would be worth. The more expensive tools were beyond my budget (and in any case, given past experiences, I kind of expected them to be worse rather than better).

I think at least part of the problem was that these tools tend to address a broad but ill-defined set of loosely-related issues. It’s as if everybody knows what such source-control tools are supposed to do (unfortunately, often based on CVS, which just seems insane), but this isn’t based on any clear definition of exactly what needs such a tool should and shouldn’t be trying to address. Then each specific tool has its own particular flaws in conception, architecture and implementation. Throw non-standard services, storage mechanisms and networking protocols into the mix, and you end up having to deal with a huge pile of complications and restrictions just to get one or two key benefits.

As an aside, the Google “Tech Talk” video Linus Torvalds on git has plenty of scathing comments about these traditional source-control tools and why they aren’t the answer. If you want some more examples of people who aren’t enjoying their source-control tools, there are also some great comments on the “Coding Horror” article Software Branching and Parallel Universes.

In the end, it looked both simpler and safer for me to live without a source-control tool. That’s heresy in civilized software engineering circles, even for a one-man project. But it has worked fine for me up until now.

In the absence of a source-control tool, I’ve maintained separate and complete copies of each version of each project, and done any merging of code between them manually (or at least, using separate tools). This loses out on the locking, merging and history tracking/recreation that a source-control tool could provide, but to date that hasn’t been of any consequence (and can partly be addressed by other means, e.g. short-term history tracking by my IDE, use of “diff” tools against old backups etc). In return I’ve not had to deal with any of the overheads, complexity or risks of any of these tools, nor had to fit the rest of my environment and procedures around them.

Don’t get me wrong: on a larger team, or more complex projects, some kind of source-control tool would normally be absolutely essential, however problematic and burdensome. But I am not a larger team, and so far it hasn’t been worth my while to shoulder such burdens.

Anyway, I revisit this subject every now and then, to see if the tools have reached the point where any are good enough to meet my needs (and so that I have a rough idea of what to do if I suddenly do need a source control tool after all).

And this time around, at last, everything seems to have changed…

This time, the world suddenly seems full of “distributed” (or perhaps more accurately, “decentralized”) source-control tools. Despite initially fearing that things had just got a whole lot more complicated, these tools have actually turned out to be exactly what I’ve been looking for all this time.

I’m not going to try and explain distributed source-control tools here, but for some general background, see (for example):

Of the currently-available distributed source-control tools, a quick look round suggested that Mercurial might be best for me, and some brief exploration and experimentation with it completely won me over.

At last, a source-control tool that I’m happy with!

Mercurial gives me precisely the benefits I’m looking for from a source-control tool – in particular, history tracking/recreation and good support for branching and merging. It’s flexible enough to let me add these facilities into my existing development environment and directory structures without otherwise impacting them (even though this isn’t how most teams would normally use it), it doesn’t need any significant administration, and it seems simple and reliable.

It all seems reasonably intuitive, and everything “just works”.

Branching and tagging, and more importantly merging, all look relatively simple, safe, and effective.

Its overall approach makes it very flexible. I especially like the way the internal Mercurial data is held in a single directory structure in the root of the relevant set of files. This keeps it together with the files themselves, with no separate central repository that everything depends on, whilst also not scattering lots of messy extra directories into the “real” directories. It was easy to see how this could be fitted into my existing directory structures, backup, working practices etc without any significant impact or risk, and without other tools and scripts needing to be aware of it. At the same time I don’t feel it ties me down to any one particular structure, and I can see how it could readily accommodate much larger teams or more complex situations.

Although this is entirely subjective, it feels rock solid and safe. Retrieving old versions and moving backwards and forwards between versions works quickly and reliably, with no fuss or bother. The documentation’s coverage of its internal architecture and how this has been designed for safety (e.g. writing is “append only” and carried out in an order that ensures “atomic” operation, use of checksums for integrity checks etc) gives me good confidence that corruptions or irretrievable files should be very rare. For extra safety I can still keep my existing directories in place (holding the current “tip” of each version), so that at worst my existing backup regime still covers them even if anything in Mercurial ever gets corrupted.

The documentation provided by the Distributed revision control with Mercurial open-source book seems excellent. I found it clear and readable enough to act as an introduction, but extensive and detailed enough to work as a reference. I spent a couple of hours reading through the whole thing and felt like this had given me a real understanding of Mercurial and covered everything I might need to know.

Commits are atomic, and can optionally handle added and deleted files automatically. This means that I can pretty much just carry out the relevant work without regard for Mercurial, then simply commit the whole lot at the end of each task, without having to individually notify Mercurial of each new or deleted file. This removes a lot of the need for integration with IDEs, and a lot of the potential source-control implications of using IDE “refactoring” facilities.

Some of these are intrinsic benefits of distributed source control; some are due to Mercurial being a relatively new solution (and able to build on the best of earlier tools whilst avoiding their mistakes and being free of historical baggage); and some are just down to it being well designed and implemented.
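As a rough illustration of the append-only, checksummed storage mentioned above, here is a toy revision log in Python. It is only a sketch: the class and function names are hypothetical, and although it borrows the idea the Mercurial documentation describes (each revision identified by a SHA-1 hash of its contents together with its parents’ IDs), Mercurial’s real revlog format adds compression, deltas and separate index files that are all left out here.

```python
import hashlib
from dataclasses import dataclass, field

NULL_ID = b"\x00" * 20  # stands in for "no parent"; 20 bytes like a SHA-1 digest


def node_id(text: bytes, p1: bytes, p2: bytes) -> bytes:
    """Hash a revision's text together with its two parent IDs.

    Parents are sorted first so the ID does not depend on their order.
    """
    h = hashlib.sha1(min(p1, p2))
    h.update(max(p1, p2))
    h.update(text)
    return h.digest()


@dataclass
class AppendOnlyLog:
    """Toy revision log: new entries are appended, old ones never edited."""
    entries: list = field(default_factory=list)  # (node, p1, p2, text) tuples

    def append(self, text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
        node = node_id(text, p1, p2)
        self.entries.append((node, p1, p2, text))
        return node

    def verify(self) -> list:
        """Return the indices of entries whose stored ID no longer matches their data."""
        return [i for i, (node, p1, p2, text) in enumerate(self.entries)
                if node_id(text, p1, p2) != node]


if __name__ == "__main__":
    log = AppendOnlyLog()
    r0 = log.append(b"hello\n")
    log.append(b"hello world\n", p1=r0)
    print(log.verify())  # [] : everything checks out
    # Simulate on-disk corruption of the second revision's text.
    node, p1, p2, _ = log.entries[1]
    log.entries[1] = (node, p1, p2, b"garbage")
    print(log.verify())  # [1]
```

Because each ID depends on the stored content, any damage to that content shows up as a mismatch when the IDs are recomputed; and because entries are only ever appended, a write that is interrupted part-way leaves the earlier history untouched.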

For anyone coming from other tools, some conversion/migration tools are listed at Mercurial’s Repository Conversion page, but of course I haven’t tried any of these myself.

The only weaknesses I’ve encountered so far are:

Mercurial deals with individual files, and is therefore completely blind to empty directories. The argument seems to be that empty directories aren’t needed and aren’t significant, but I think this is more an artifact of the implementation than anything one would deliberately specify. I don’t think it’s such a tool’s place to decide that empty directories don’t matter. I have directories that exist just to maintain a consistent layout, or as already-named placeholders in readiness for future files. To work around this I’ve had to find all empty directories and give them each a dummy “placeholder” file.

Although there’s at least one Eclipse plug-in, at least one NetBeans plug-in, and a TortoiseHg project for an MS-Windows shell extension, these seem to be at a very early stage. I’d expect this situation to improve over time, especially for NetBeans (given Sun’s use of Mercurial for OpenJDK). In the meantime this doesn’t have much impact on my own use of Mercurial, as the command-line commands are simple to use and powerful enough to be practical. During normal day-to-day work, my use of Mercurial has generally been limited to a commit of a complete set of changes when ready, plus explicit “rename”s of files where necessary.

On MS Windows you need to obtain a suitable diff/merge tool separately, as this isn’t built into the Mercurial distribution (but the documentation points you at several suitable tools, and shows how to integrate them into Mercurial – and anyway, I’d rather have the choice than be saddled with one I don’t like, or have a half-baked solution as part of the source-control tool itself).
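The empty-directory workaround mentioned above is easy to script. The following is a minimal sketch in Python, assuming a placeholder file name of `.placeholder` (a hypothetical choice; any name would do) and that Mercurial’s own `.hg` directory should be left alone. It is not a Mercurial feature, just a walk over the working directory.

```python
import os
import sys

PLACEHOLDER = ".placeholder"  # hypothetical file name; any name would work


def add_placeholders(root: str, name: str = PLACEHOLDER) -> list:
    """Create an empty placeholder file in every empty directory under root.

    Mercurial's .hg directory is skipped. Returns the paths of the files created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a folder holding only .hg is not "empty".
        is_empty = not dirnames and not filenames
        if ".hg" in dirnames:
            dirnames.remove(".hg")  # pruning in place stops os.walk descending into it
        if is_empty:
            path = os.path.join(dirpath, name)
            open(path, "a").close()  # create (or touch) an empty file
            created.append(path)
    return created


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for p in add_placeholders(target):
        print("created", p)
```

A directory whose only content is an empty subdirectory gets no placeholder of its own, since the placeholder inside the subdirectory is enough to keep both. The new files still need to be committed like any other new files.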

I’ve now been using Mercurial for a couple of months. Despite my general dislike of all the source-control tools I’d looked at beforehand, I have been very pleased with Mercurial.

If you’re looking for a new source control tool, or have always disliked tools such as CVS, Subversion and Perforce, I’d certainly recommend Mercurial as worth taking a look at.

20 responses

6/11/2007

the j-dog (21:21:23):

Can we pleeeeeaaaase stop using those stupid website preview things when we hover over links? Don’t you guys have any idea how annoying those things are? I occasionally accidentally hover over one when I’m scrolling with the mouse wheel and a preview of some random website will jump out at me. Sometimes I like to “feel” the text while I read it with the mouse cursor, and some random website will jump out at me. Sometimes I like to hover over a link, just so I can get an idea where the link goes from the address, and BAM. This stuff is a blight on the web. It’s not quite as bad as the pages that look up words I double click in the dictionary, but still very very very very very very very very very very very very very very very very annoying.

I use Git both at work and for personal projects. Its superior branch merging and local branching capabilities are great advantages over Subversion. Git is very powerful, but as a result has lots and lots of commands, options and features that are confusing and unnecessary for everyday use. However, I found that once you learn how to do the basics, you can get by perfectly fine. My advice would be to seek out a friendly tutorial and avoid the official documentation as a learning resource; it’s just confusing!

See, the thing with the empty directories is that it massively simplifies the internal model of the tree: instead of file nodes and directory nodes, there are now only file nodes. This means some optimizations can be done that would otherwise be impossible. Seen in that light, I think it’s a good trade-off. Having a few .keep files around isn’t too much of a price to pay.

I want my version control tool to work how I intuitively expect it to work, and not have to find workarounds like creating files in empty directories.

That said, I have been using hg for the last month on single-developer projects and am quite happy. I haven’t had the need to use branching yet, or maybe I’ve had too much CVS branching history to attempt it.

Kudos to Mike for handling the link-preview thing in the sanest way I’ve seen on the net.

I looked at Git briefly, but was put off by: impression that it’s perhaps primarily a toolkit for higher-level tools and front-ends; doubts over using it on MS Windows (I use both Windows and Linux); and concerns over space requirements and possible need for regular “housekeeping”.

I might well be doing it an injustice, but my gut reaction was that it wasn’t going to offer anything decisive over Mercurial and/or Bazaar, and it was firing enough alarm bells to put me off spending time looking at it further.

I guess my main decision was that the “distributed” approach is what I’ve been looking for and already has tools that are good enough for serious use. Choosing between those tools seemed a much less critical issue. I suspect there’s little to choose between Git, Mercurial and Bazaar (pros/cons to each, balance likely to shift over time anyway).

Mercurial just “felt” most right to me (very subjective…), and it does meet my requirements very well, so I was quite happy to plump for it rather than spend more time choosing between them.

I tried Mercurial and Bazaar for doing a big merge (1500+ files) from a PVCS repo on Windows. I found Bazaar more user-friendly for merges than Mercurial (at least on Windows). Other than that, both are similar.
Mercurial insisted that I perform the merge interactively, whereas all I wanted was a report of conflicting files. Bazaar went ahead, performed the merge and gave me a list of files which had the merge markers in them.
I wish I could have used Mercurial, but I was very impressed with the merge in Bazaar.
Also, you need a C compiler (either MSVC 2003 or MinGW) to install Mercurial from source, whereas for Bazaar all dependencies are available as handy Windows installers.

Jonny, thanks for pointing out your markemptydirs tool. Looks like a nice simple answer for automatically creating and removing placeholder files. It doesn’t suit me personally at the moment due to being .Net or needing Mono (not worth introducing these into my build process and then administer/maintain/patch etc for something as minor as this), but handy to know about, and I can see it being a neat little answer for anyone who wants to automate this.