The Subversion Project: Building a Better CVS

This open-source project aims to produce a compelling replacement for the Concurrent Versions System (CVS).

If you work on any kind of open-source
project, you've probably worked with CVS. You probably remember the
first time you learned to do an anonymous checkout of a source tree
over the Net, your first commit or learning how to look at CVS
diffs. And then the fateful day came: you asked your friend how to
rename a file. “You can't,” was the reply.

“What? What do you mean?” you asked.

“Well, you can delete the file from the repository and then
re-add it under a new name.”

“Yes, but then nobody would know it had been
renamed.”

“Let's call the CVS administrator. She can hand-edit the
repository's RCS files for us and possibly make things
work.”

“What?”

“And by the way, don't try to delete a directory
either.”

You rolled your eyes and groaned. How could such simple tasks
be so difficult?

The Legacy of CVS

No doubt about it, CVS has evolved into the standard software
configuration management (SCM) system of the Open Source community,
and rightly so. CVS is free software and has a wonderful nonlocking
development model that allows hundreds of far-flung programmers to
collaborate. In fact, one might argue that, without CVS, it's
doubtful whether sites like Freshmeat or SourceForge ever would
have flourished as they do now. CVS and its semi-chaotic
development model have become an essential part of Open Source
culture.

So what's wrong with CVS? Because it uses the RCS storage
system under the hood, CVS can only track file contents, not tree
structures. As a result, the user has no way to copy, move or
rename items without losing history. Tree rearrangements are always
ugly server-side tweaks.

The RCS back end cannot store binary files efficiently, and
branching and tagging operations can become very slow. CVS also
uses the network inefficiently; many users are annoyed by long
waits, because file differences are sent in only one direction
(from server to client, but not from client to server), and binary
files are always transmitted in their entirety.

From a developer's standpoint, the CVS codebase is the result
of layers upon layers of historical “hacks”. (Remember that CVS
began life as a collection of shell scripts to drive RCS.) This
makes the code difficult to understand, maintain or extend. For
example, CVS's networking ability was essentially stapled on. It
was never designed to be a native client/server system.

Rectifying CVS's problems is a huge task, and we've only
listed a few of the many common complaints here.

Enter Subversion

In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a
company for commercially supporting and improving CVS. Cyclic made
the first public release of a network-enabled CVS (contributed by
Cygnus software). In 1999, Karl Fogel published a book about CVS
and the open-source development model it enables
(cvsbook.red-bean.com).
Karl and Jim had long talked about writing a replacement for CVS;
Jim even had drafted a new, theoretical repository design. Finally,
in February 2000, Brian Behlendorf of CollabNet
(www.collab.net) offered
Karl a full-time job to write a CVS replacement. Karl gathered a
team and work began in May.

The team settled on a few simple goals. Subversion would be
designed as a functional replacement for CVS: it would do
everything that CVS does, preserving the same development model
while fixing the flaws in CVS's (lack of) design. Existing CVS
users would be the target audience; any CVS user should be able to
start using Subversion with little effort. Any other bonus features
would be of secondary importance (at least before a 1.0
release).

At the time of this writing, the original team has been
coding for a little over a year, and we have a number of excellent
volunteer contributors. (Subversion, like CVS, is an open-source
project.)

Subversion's Features

Here's a quick rundown of some of the reasons you should be
excited about Subversion:

Real copies and renames: the Subversion repository
doesn't use RCS files at all; instead, it implements a virtual
versioned filesystem that tracks tree structures over time
(described below). Files and directories are
versioned. At last there are real client-side mv and cp commands
that behave just as you think.

Atomic commits: a commit either goes into the
repository completely or not at all.

Faster network access: a binary diffing algorithm
is used to store and transmit deltas in both
directions, regardless of whether a file is of text or binary
type.

Filesystem properties: each file or directory has
an invisible hash table attached. You can invent and store any
arbitrary key/value pairs you wish: owner, perms, icons, app-owner,
MIME type, personal notes, etc. This is a general-purpose feature
for users. Properties are versioned, just like file contents. And
some properties are auto-detected, like the MIME type of a file (no
more remembering to use the -kb switch).

Extensible and hackable: Subversion has no
historical baggage; it was designed and implemented as a collection
of shared C libraries with well-defined APIs. This makes Subversion
extremely maintainable and usable by other applications and
languages.

Easy migration: the Subversion command-line client
is very similar to CVS; the development model is the same, so CVS
users should have little trouble making the switch. Development of
a cvs2svn repository converter is in progress.

It's free: Subversion is released under an
Apache/BSD-style, open-source license.
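
To make two of these features concrete, here is a minimal sketch of how atomic commits and versioned properties might fit together. It is a toy in-memory model, not Subversion's actual design or API: the ToyRepository class, its commit method and the "put" and "move" actions are all hypothetical names invented for illustration.

```python
import copy


class ToyRepository:
    """Toy in-memory repository; a hypothetical model, not Subversion's API."""

    def __init__(self):
        # Revision 0 is an empty tree. Each tree maps path -> (contents, props).
        self.revisions = [{}]

    def commit(self, changes):
        """Apply every change as one new revision, or apply none of them.

        `changes` is a list of (action, path, payload) tuples.
        Returns the new revision number.
        """
        # Build the new tree on a private copy, so a failure part-way
        # through leaves the committed history untouched.
        tree = copy.deepcopy(self.revisions[-1])
        for action, path, payload in changes:
            if action == "put":
                contents, props = payload
                tree[path] = (contents, dict(props))
            elif action == "move":
                if path not in tree:
                    raise KeyError(f"cannot move missing path: {path}")
                # Contents and properties travel together to the new name.
                tree[payload] = tree.pop(path)
            else:
                raise ValueError(f"unknown action: {action}")
        self.revisions.append(tree)
        return len(self.revisions) - 1


repo = ToyRepository()
rev1 = repo.commit([("put", "README", ("hello", {"mime-type": "text/plain"}))])
rev2 = repo.commit([("move", "README", "README.txt")])

# This commit fails on its second change, so its first change is discarded too.
try:
    repo.commit([("put", "a.c", ("int x;", {})), ("move", "missing.c", "b.c")])
except KeyError:
    pass
```

Because each revision is a complete snapshot of the tree, a rename is simply a change between two snapshots, and properties are versioned along with contents. The sketch deliberately leaves out what a real system needs, such as recording that README.txt descends from README, storing deltas instead of full copies, and handling concurrent committers.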

I've been a systems administrator for 15 years. I've been a revision control evangelist for at least 10 of those years. I've used RCS, CVS, and Perforce. I've never actually used Subversion, though, and for good reason: it's impossible to build.

I'll admit, things have certainly improved over the past few years. At least it's actually possible to get a Linux distribution with Subversion installed. Of course, if you're already a user, then you're probably using an older version, and the one that comes with your distro of choice isn't compatible.

Then again, if you happen to actually work for a living with Unix(tm), then you probably aren't in a Linux-only environment. How about HP-UX? Solaris? AIX? The only way to actually build Subversion on those platforms is either to ignore several vendor-supplied packages completely and build them all from scratch, or else to dig back through Usenet and try to find the magic combination of Subversion, Neon, ... that will work with your installed version of Apache. Or Python. Or Swig. Or whatever.

While I understand that the features offered by Subversion are not only useful, but cover some extremely large gaps in CVS, I can build CVS on any platform, and the *only* dependency is RCS, which also builds cleanly on any platform. Frankly, at this point the only way Subversion is going to become interesting to a huge number of people out there is if they ship it in a pure Python/Java/Perl/Ruby/whatever form (i.e. write it in a high-level scripting language that stands a reasonable chance of running on multiple platforms).

Subversion is really a great system now. It's making strides against CVS in terms of market share.

From my customers' forums: "One of the features that sold me is the ease of access via http. Using CVS, you always have to struggle with firewalls and setting up ssh tunnels, or else put up with the weak security of pserver."

Feel free to check out our Subversion [wush.net] hosting service if you want to give Subversion a spin. We've got instant setup and a week-long free trial. You'll be able to start playing with your repository in 5 minutes :)

What does this term 'intelligent merging' mean? It sounds like 'intelligent merging' means if I check in a file that's been updated (i.e., changed by someone else) since I checked it out, the VCS *intelligently* merges the changes. Does this system currently support concurrent development or not?

Actually, "intelligent merging" means a solution to the classic "repeated merge" problem you see so often in CVS.

That is, you merge some changes from the trunk to a branch. Then later on, you want to merge more changes from trunk to branch, but end up getting *conflicts*, because some of the changes have already been ported over previously.

With intelligent merging, the system remembers which changesets have been ported to each branch. So it can automatically avoid repeated merges.
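
A rough sketch of that bookkeeping, assuming each changeset has a unique id. The Branch class and merge function are hypothetical names for illustration only, and this is not how any particular system implements it:

```python
class Branch:
    """Hypothetical branch record that remembers what was merged into it."""

    def __init__(self, name):
        self.name = name
        self.merged = set()  # ids of changesets already ported to this branch


def merge(branch, candidates):
    """Port only the changesets this branch has not seen before.

    Returns the list of changeset ids actually applied, in order.
    """
    pending = [c for c in candidates if c not in branch.merged]
    # A real system would apply each changeset's diff here; this sketch
    # only records that the changeset has been ported.
    branch.merged.update(pending)
    return pending


stable = Branch("stable")
first = merge(stable, [1, 2, 3])       # ports changesets 1, 2 and 3
second = merge(stable, [2, 3, 4, 5])   # skips 2 and 3, ports only 4 and 5
```

The second merge never re-applies changesets 2 and 3, which is exactly what avoids the spurious conflicts.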

It probably refers to the merging of a "sandbox" sub-project, where, say, new functionality is being added to a project, with the main project code tree. That way, new functionality can be added and tested separately without affecting the stability of the main project code. Once it is ready, this stream can be merged into the main project. Just a guess.

Was there some reason not to include the project URL in this article? Not that it took long to find with Google, but it does seem like an obvious oversight. The URL in question is http://subversion.tigris.org/.