Friday, January 28, 2005

A while ago, I asked why GNU Arch was so popular among Lispers. Several people replied that Arch was clearly superior to CVS/Subversion in a number of ways, so I decided to give it a try. After a few months of relatively light use, here's my report. I won't rehash the general Arch vs. revision-control-X arguments; you can find plenty of that on the Arch Wiki and on the websites of all the revision-control-X's out there. Note that these are my own observations. If you're using Arch successfully and think it's the tip-top, fine; carry on.

On the pros side:

I like that Arch supports distributed development and doesn't require a special server for publishing an archive. A simple HTTP or FTP server will work just great.

I like that Arch supports branching relatively easily. Once a branch is created, you can stay in sync with upstream changes or publish your changes back to the upstream repository.

The Arch Emacs interface, xtla, is quite nice, and seems to be evolving quite rapidly now. Xtla simplifies a number of Arch processes, which as I'll point out below is very helpful.

On the cons side:

Arch is way more complicated than it needs to be. I printed up a little cheat-sheet for myself with all the various commands and a brief description. My cheat sheet was two pages of small print. The semi-official cheat-sheet that you can get from the Arch Wiki is six pages, but it includes options with explanations for each command.

There are many commands that seem to be the same but are slightly different (update vs. replay, a distinction I still haven't figured out). Adding to the confusion, in other cases the command set contains aliases where the commands really are the same (add vs. add-id, for instance). This is one area where xtla helps out. By following a standard xtla flow, you don't have to remember too many of the commands; they just become Emacs key chords.

The Arch command names are, in many common cases, non-intuitive. For instance, why star-merge and not just merge?

Arch uses very strange file names for everything, including some characters that require shell escaping. Further, Arch seems to love long names (files, repositories, everything), making some commands painful to type. In some cases, these actually are file names and you can use shell tab-completion to help out; in other cases they are internal Arch names which may map to file names in a directory structure, but shell completion can't be used (full repository names, for instance). In that case, it's very difficult to type out a 40-character string of semi-gibberish and get it right the first time. This is another area where xtla is a huge help. By simply selecting names from buffers, it's relatively easy to avoid making mistakes.

The Arch repository structure is painful. It's sort of hierarchical, with archives, categories, branches, versions, and revisions as the various components. It was never really clear to me how to use categories to good effect. Some of this seems like a throwback to a semi-centralized repository structure. That is, you can put all your projects into a single repository if you want, then have categories under the main repository for each project. At least that's the only thing I could figure out. But if that's the case, why call them categories?
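To make that hierarchy concrete, here is a minimal Python sketch of how a fully qualified Arch name breaks down into its components. It assumes the usual tla convention of "archive/category--branch--version--revision"; the archive and project names in the usage line are made up for illustration, and parse_arch_name is my own hypothetical helper, not part of Arch or xtla.

```python
from typing import NamedTuple, Optional


class ArchName(NamedTuple):
    """Components of a fully qualified Arch name (assumed layout)."""
    archive: str
    category: str
    branch: Optional[str]
    version: Optional[str]
    revision: Optional[str]


def parse_arch_name(name: str) -> ArchName:
    """Split 'archive/category--branch--version--revision' into its parts.

    Trailing components may be omitted, so 'archive/category' is accepted.
    This is an illustrative sketch, not a full validator of Arch's rules.
    """
    # The archive name may itself contain '--', so split on '/' first.
    archive, sep, rest = name.partition("/")
    if not sep or not archive or not rest:
        raise ValueError(f"expected 'archive/category...', got {name!r}")

    parts = rest.split("--")
    if len(parts) > 4 or not all(parts):
        raise ValueError(f"malformed name after the archive: {rest!r}")

    # Pad the missing trailing components with None.
    padded = parts + [None] * (4 - len(parts))
    return ArchName(archive, *padded)


if __name__ == "__main__":
    # Hypothetical name, invented for this example.
    print(parse_arch_name("jdoe@example.com--2005/hello--main--1.0--patch-3"))
```

Even in this toy form, the full name is five components and roughly 50 characters, which is the sort of thing I was complaining about typing by hand.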

Arch processes are more complex than they need to be. Setting up a repository and importing code takes several more steps than it should. Branching and merging are more complex than they need to be.

I spent some time reading comp.version-control.arch.user last night to get a feel for how other people are using the system. It turns out that many of these problems are known, though some are not recognized (internalized?) as problems. I saw a lot of Arch defenders dismissing things simply as "user interface problems," as if we were simply talking about a pixel being out of place somewhere. Frankly, whenever I see that, I get scared. What is the purpose of a program if not to interface to a user such that the user can accomplish tasks with it? If the program makes it so confusing that users can't get done what they need to, quickly and easily, then what good is it? Arch strikes me as a powerful program that has evolved without a lot of UI direction.

Tom Lord, Arch's creator, is probably a brilliant guy, but he needs to take a class on user interface design. No, I'm not talking about GUIs with pretty colors and dialog boxes, but just your basic "users need a simple mental model of the system and if you don't provide that, everything will be confusing and complex, even when it doesn't need to be" sort of thing.

Others will say that I'm confusing complexity with power. "When a tool is powerful," they'll say, "it's bound to be complex." And Arch is certainly powerful. Well, there is a case to be made that tools that do a lot of things naturally end up with a lot of commands. Yes, that's true. But those commands can be organized in simple ways. And simple processes can be given simple commands to execute them. As Einstein is reported to have said, "Everything should be made as simple as possible, but no simpler." In this case, I believe Arch could be much simpler.

When I first asked my question about Arch, one kind reader suggested I check out darcs. After getting irritated at Arch last night, I did. In short, darcs seems amazingly streamlined and intuitive compared to Arch, but provides basically all the same advantages of distributed version control. After quickly skimming the manual, I was able to set up repositories, create patches, branch, and merge. The command set is small and rationally organized. The basic processes take only a couple of commands to execute. There are no funky names for repositories, etc. Further, darcs seems to work quite well on Windows, which has been a limitation of Arch until now. While I mostly use Linux, there are times when I have to deal with Windows, and it's nice to know that I could use the same system in both environments.

From reading comp.version-control.darcs.user, I gather that darcs currently has a few limitations. Darcs is at version 1.0.1 as I write this, with 1.0.2 imminent. Arch has had more time than darcs to become feature-rich and optimized. Some current limitations of darcs seem to be:

It's sometimes slower than it should be. Darcs is written in Haskell. If you're a functional programming junkie, you should see this as a win. What it means in practice, however, is that, as with most programs written in a high-level language, there is always some performance tuning to be done. Darcs' author, David Roundy, is starting to work on some of the bigger problem areas as we speak. Given that I'm programming in Lisp now, I understand this completely. Over time, optimization will come, but for now Arch is probably faster for larger projects. Small projects won't see much of a difference, and darcs has been fine for everything I have used it on so far. Some operations are a bit sluggish, but nothing too problematic. As a test, David has the Linux source tree available via darcs.

Darcs doesn't have much support for signing patches. If you're looking for a very secure version control system, darcs is not (yet) for you. There is a capability to send patches to a repository or a maintainer via email, and those emails can be signed with GPG. But Arch has the capability to sign every patch stored in the repository, not just in transit. Arch supporters argue that if your repository is broken into, this provides better protection against tampering. Whether this is a huge concern or not probably depends on your deployment model. If your repository will be outside a firewall environment, the Arch signature methods may make more sense.

As a result of all this, I decided to "go over to the darcs side" for a while and give it a try. I'll let you know what I find out for my own personal projects.

I had the same difficulties with arch, so I switched to darcs a couple weeks ago.

I haven't used it too heavily, but I do like it a lot better. My main criticism is that patches in darcs don't seem to be numbered or designated by anything other than the patch name you supply. This seems like it could be problematic if you (accidentally) give two patches the same name.

I had exactly the same reaction you did to Arch. There is likely some good thinking about version control that went into the program, but I won't be likely to find out because the user interface is just *so* bad. It gives me some hope that some people have recognized this, like the Bazaar folks (http://bazaar.canonical.com/), but for now I've given up on Arch.

I ended up switching to Monotone, which I've been really impressed with. The current implementation is a bit on the immature side, but the developers are rapidly ironing out the remaining bugs. Strong integrated crypto, a simple, powerful approach to version control, and a nice UI just make it a pleasure to use. In the past Monotone hasn't supported per-file commands (like the ability to "diff" or "commit" a single file, not the whole tree), but that has recently been added in the development sources (and should be in 0.17). In short, check out MT!

I saw darcs some time ago but decided to give it a try after reading your article (and after I found out it's prepackaged on Gentoo :)). Generally, I find it much easier than Arch and more powerful than CVS (but what isn't these days?).

One thing I don't like is that the server needs darcs installed. In this regard it's more like CVS than Arch, since the latter can work on a very dumb server. E.g. I've got a machine with FTP access only, and that's enough for Arch but not for darcs.

Ok, I know I can mount the ftp directory on the local filesystem and it'll probably work, but that's beside the point.