2011-04-19

With Darcs being the foil to Git for the purposes of this discussion, I thought it would be useful if I cleared up a few points, particularly this first one:

consistency is a usability issue

When people say they like Darcs, they don't generally talk about it having a beautiful or elegant theory. Instead, they talk about how easy and simple it is to use, about how they never really had to grapple with a learning curve or feel stupid for doing something wrong.

What makes Darcs so simple to use? Did it hit the right notes by accident or through David Roundy's good taste? Or is usability merely in the eye of the beholder? Some of these explanations may be true, but I think what lies at the heart of Darcs' usability is that it supports a very simple way of understanding a repository:

a darcs repository is a set of patches

This mental model may not be suitable for everybody, and in the long run Darcs may need to improve its support for history tracking. But if you want to understand why, for all its current shortcomings, people continue to use and develop Darcs, you must appreciate how refreshingly simple the set-of-patches mental model can be. As a Darcs user you are freed from a lot of the artefacts of worrying about commit order. Collaborating with people is just question of shuffling patches around, with no merge states, no rebases, way fewer spurious dependencies to worry about.

But simplicity is hard. In order to make this simple world view possible, Darcs has to guarantee a property that any ordering of patches allowed by Darcs commutation rules is equivalent. If Darcs gives you the option of skipping a patch, it has to work hard to make sure that if you include the patch later on, that the repository you get is equivalent. That's what the patch theory fuss is about. While it's useful that Darcs tends to attract purists and math geeks, we're really not engaged in the pursuit of some sort of ivory tower theoretical elegance for its own sake. Ultimately what we're after is usability.

A good user interface minimises work for the user, be it cognitive, memory or physical work. The joy of Darcs is being able to focus cognitive work on our real jobs, and not on babysitting version control systems. So when Russell O'Connor says that merges ought to be associative, he's not saying this to tick some sort of mathematical box, what I think he's really saying is as a Darcs user, he doesn't want to worry about the difference between pushing patches one at a time vs all in one go. Consistency is a usability issue.

darcs is imperfect

Darcs is very much a work in progress. Some users have felt let down by Darcs: whenever performance grew to be unacceptable for their repositories, when they hit one exponential merge too many, or when Darcs just plain did something wrong. Even our much vaunted usability has cracks at the edges, a confirmation prompt too many, an inconsistent flag set, a non-reversible operation or two.

I particularly want to make sure I'm very clear about this point:

darcs patch theory is incomplete

We still don't know how to cope with complicated conflicts. Moreover the implementation of our first two theories is somewhat buggy. Darcs copes well enough with most every day conflicts, but if a conflict gets hairy enough, Darcs will crash and emit a nasty message. This is one of the reasons why we don't recommend Darcs for large repositories.

Our version of "don't do that" is not to maintain long term feature branches without merging back to the trunk on a regular basis. This is not acceptable for bigger projects, but for smaller projects like Darcs itself, the trade-off between a simple user interface in the general case, and the occasional hairy conflict can be worth it. In the long run, we have to fix this. We are revising our patch theory again, this time taking a much more rigorous and systematic approach to the problem.

In the interim, we will be gaining some powerful new tools to help work around the problem, namely a new "darcs rebase" feature that will allow users to smooth away conflicts rather than letting them get out of hand. This will be a crucial bridging tool while we continue to attack the patch theory problem.

patch theory is simple at heart

I am in the awkward position of being a non-expert maintainer, having to defer a lot of thinking about software engineering and patch theory to the rest of the Darcs team. In a way, this is healthy for Darcs, because we have long suffered from an excess concentration of expertise. Inverting the pie so that you basically have the number one Darcs Fan as the maintainer is useful because it forces everybody else to break things down into words an Eric can understand.

The good news is that basic patch theory is one of these things an Eric can understand: patches have inverses and may sometimes be commuted. Just learning the core theory teaches you how merging and cherry picking works, why you can trust the set-of-patches abstraction and most importantly, how simple Darcs is. So we're not after some kind of magical AI here, nor are we trying to guess user intention. The things we do with patches are much more mechanical, systematically adjusting patches to context, one at a time, click-clack on the abacus until the merge is complete.

patch vs snapshot is not so important

We think it's important to continue working on Darcs because we are exploring territory that no other version control system is looking at - patch-based version control. That said, patches and snapshots are duals of each other. We think that things that Darcs can do are possible in snapshot based version control and we would be very interested to see work in that direction.

The secret to Darcs merging is that it replaces guesswork (fuzz factor) with history. A darcs patch only exists in the context of its predecessors, and if we want to apply a patch to a different context, we mechanically transform the patch to fit. We think this sort of history-aware merging could be implemented in Git. In fact, we would be excited to see somebody taking up the challenge. Git fans! How about stealing history-aware merging from us?

exponential merges still exist but there are fewer of them

We have developed two versions of patch theory. The second version avoids a lot of the common causes of exponential merge blowups, but it is still possible to trigger them. Recent Darcs repositories are created using version 2 of the theory. For compatibility's sake, repositories created before Darcs 2 came along tend to still be using version 1 of the theory (we only recommend converting if conflicts become a problem).

The most well-known remaining cause of blowups in theory 2 is the problem of "conflict fights" where one side of the conflict resolves the conflict and gets on with their life without propagating the resolution back to the other side. What tends to happen there is that we not only encounter the conflict again in the future, but we also conflict with the resolution!

So life is definitely better with Darcs 2. We've given the exponential merge problem a good knock on the head, but it's still staggering around and we're working our way to the finishing blow.

performance is improving

I think that when people complain about Darcs being slow, they're not talking about the exponential merge problem. They're mostly referring to day-to-day issues like the time it takes to check out a repository. Our recent focus has been to solve a lot of these pedestrian performance issues. For example, the upcoming Darcs 2.8 is like to use a new "packs" feature which makes it possible to fetch a repository in the form of two larger tarballs rather than thousands of little patch files. This makes a big difference!

Another improvement we hope to bring to Darcs 2.8 is the performance of the darcs annotate command (cf. git blame). Annotate has neglected for a while, and to make things better, we've basically reimplemented the command from scratch with more readable output to boot. As an example of something fixed along the way, one misfeature of the old annotate is that would work by applying all the patches relevant to a given file, building it up from the very beginning. But if you think about it, annotating a file is really about annotating its current state; we don't care about ancient history! So one of the Darcs hackers had the sort of idea that’s obvious in hindsight: rather than applying patches forwards from the beginning of history, we simply unapply them from the end. Much faster.

We're not yet trying to compete with Git when working on these performance issues. We admire the performance that Git can deliver and we agree that getting speed right is a usability issue (too slow and your user loses their train of thought). But we've been picking a lot of low hanging fruit lately, solving problems that make Darcs faster with very little cost. We hope you'll like the results!

With Darcs being the foil to Git for the purposes of this discussion, I thought it would be useful if I cleared up a few points, particularly this first one:

consistency is a usability issue

When people say they like Darcs, they don't generally talk about it having a beautiful or elegant theory. Instead, they talk about how easy and simple it is to use, about how they never really had to grapple with a learning curve or feel stupid for doing something wrong.

What makes Darcs so simple to use? Did it hit the right notes by accident or through David Roundy's good taste? Or is usability merely in the eye of the beholder? Some of these explanations may be true, but I think what lies at the heart of Darcs' usability is that it supports a very simple way of understanding a repository:

a darcs repository is a set of patches

This mental model may not be suitable for everybody, and in the long run Darcs may need to improve its support for history tracking. But if you want to understand why, for all its current shortcomings, people continue to use and develop Darcs, you must appreciate how refreshingly simple the set-of-patches mental model can be. As a Darcs user you are freed from a lot of the artefacts of worrying about commit order. Collaborating with people is just question of shuffling patches around, with no merge states, no rebases, way fewer spurious dependencies to worry about.

But simplicity is hard. In order to make this simple world view possible, Darcs has to guarantee a property that any ordering of patches allowed by Darcs commutation rules is equivalent. If Darcs gives you the option of skipping a patch, it has to work hard to make sure that if you include the patch later on, that the repository you get is equivalent. That's what the patch theory fuss is about. While it's useful that Darcs tends to attract purists and math geeks, we're really not engaged in the pursuit of some sort of ivory tower theoretical elegance for its own sake. Ultimately what we're after is usability.

A good user interface minimises work for the user, be it cognitive, memory or physical work. The joy of Darcs is being able to focus cognitive work on our real jobs, and not on babysitting version control systems. So when Russell O'Connor says that merges ought to be associative, he's not saying this to tick some sort of mathematical box, what I think he's really saying is as a Darcs user, he doesn't want to worry about the difference between pushing patches one at a time vs all in one go. Consistency is a usability issue.

darcs is imperfect

Darcs is very much a work in progress. Some users have felt let down by Darcs: whenever performance grew to be unacceptable for their repositories, when they hit one exponential merge too many, or when Darcs just plain did something wrong. Even our much vaunted usability has cracks at the edges, a confirmation prompt too many, an inconsistent flag set, a non-reversible operation or two.

I particularly want to make sure I'm very clear about this point:

darcs patch theory is incomplete

We still don't know how to cope with complicated conflicts. Moreover the implementation of our first two theories is somewhat buggy. Darcs copes well enough with most every day conflicts, but if a conflict gets hairy enough, Darcs will crash and emit a nasty message. This is one of the reasons why we don't recommend Darcs for large repositories.

Our version of "don't do that" is not to maintain long term feature branches without merging back to the trunk on a regular basis. This is not acceptable for bigger projects, but for smaller projects like Darcs itself, the trade-off between a simple user interface in the general case, and the occasional hairy conflict can be worth it. In the long run, we have to fix this. We are revising our patch theory again, this time taking a much more rigorous and systematic approach to the problem.

In the interim, we will be gaining some powerful new tools to help work around the problem, namely a new "darcs rebase" feature that will allow users to smooth away conflicts rather than letting them get out of hand. This will be a crucial bridging tool while we continue to attack the patch theory problem.

patch theory is simple at heart

I am in the awkward position of being a non-expert maintainer, having to defer a lot of thinking about software engineering and patch theory to the rest of the Darcs team. In a way, this is healthy for Darcs, because we have long suffered from an excess concentration of expertise. Inverting the pie so that you basically have the number one Darcs Fan as the maintainer is useful because it forces everybody else to break things down into words an Eric can understand.

The good news is that basic patch theory is one of these things an Eric can understand: patches have inverses and may sometimes be commuted. Just learning the core theory teaches you how merging and cherry picking works, why you can trust the set-of-patches abstraction and most importantly, how simple Darcs is. So we're not after some kind of magical AI here, nor are we trying to guess user intention. The things we do with patches are much more mechanical, systematically adjusting patches to context, one at a time, click-clack on the abacus until the merge is complete.

patch vs snapshot is not so important

We think it's important to continue working on Darcs because we are exploring territory that no other version control system is looking at - patch-based version control. That said, patches and snapshots are duals of each other. We think that things that Darcs can do are possible in snapshot based version control and we would be very interested to see work in that direction.

The secret to Darcs merging is that it replaces guesswork (fuzz factor) with history. A darcs patch only exists in the context of its predecessors, and if we want to apply a patch to a different context, we mechanically transform the patch to fit. We think this sort of history-aware merging could be implemented in Git. In fact, we would be excited to see somebody taking up the challenge. Git fans! How about stealing history-aware merging from us?

exponential merges still exist but there are fewer of them

We have developed two versions of patch theory. The second version avoids a lot of the common causes of exponential merge blowups, but it is still possible to trigger them. Recent Darcs repositories are created using version 2 of the theory. For compatibility's sake, repositories created before Darcs 2 came along tend to still be using version 1 of the theory (we only recommend converting if conflicts become a problem).

The most well-known remaining cause of blowups in theory 2 is the problem of "conflict fights" where one side of the conflict resolves the conflict and gets on with their life without propagating the resolution back to the other side. What tends to happen there is that we not only encounter the conflict again in the future, but we also conflict with the resolution!

So life is definitely better with Darcs 2. We've given the exponential merge problem a good knock on the head, but it's still staggering around and we're working our way to the finishing blow.

performance is improving

I think that when people complain about Darcs being slow, they're not talking about the exponential merge problem. They're mostly referring to day-to-day issues like the time it takes to check out a repository. Our recent focus has been to solve a lot of these pedestrian performance issues. For example, the upcoming Darcs 2.8 is like to use a new "packs" feature which makes it possible to fetch a repository in the form of two larger tarballs rather than thousands of little patch files. This makes a big difference!

Another improvement we hope to bring to Darcs 2.8 is the performance of the darcs annotate command (cf. git blame). Annotate has neglected for a while, and to make things better, we've basically reimplemented the command from scratch with more readable output to boot. As an example of something fixed along the way, one misfeature of the old annotate is that would work by applying all the patches relevant to a given file, building it up from the very beginning. But if you think about it, annotating a file is really about annotating its current state; we don't care about ancient history! So one of the Darcs hackers had the sort of idea that’s obvious in hindsight: rather than applying patches forwards from the beginning of history, we simply unapply them from the end. Much faster.

We're not yet trying to compete with Git when working on these performance issues. We admire the performance that Git can deliver and we agree that getting speed right is a usability issue (too slow and your user loses their train of thought). But we've been picking a lot of low hanging fruit lately, solving problems that make Darcs faster with very little cost. We hope you'll like the results!

I've been following Darcs since 2004 and like to check in on the corner-case progress every once in a while. Having an article like this to read definitely beats my usual method of skimming the mailing list archives.