Version-Control Systems

Code evolves. As a project moves from first-cut prototype to
deliverable, it goes through multiple cycles in which you explore
new ground, debug, and then stabilize what you've accomplished. And
this evolution doesn't stop when you first deliver for production.
Most projects will need to be maintained and enhanced past the 1.0
stage, and will be released multiple times. Tracking all that
detail is just the sort of thing computers are good at and
humans are not.

Why Version Control?

Code evolution raises several practical problems that can be
major sources of friction and drudgery — thus a serious drain on
productivity. Every moment spent on these problems is a moment not
spent on getting the design and function of your project right.

Perhaps the most important problem is
reversion. If you make a change, and discover
it's not viable, how can you revert to a code version that is known
good? If reversion is difficult or unreliable, it's hard to risk
making changes at all (you could trash the whole project, or make many
hours of painful work for yourself).

Almost as important is change tracking. You
know your code has changed; do you know why? It's easy to forget the
reasons for changes and step on them later. If you have collaborators
on a project, how do you know what they have changed while you weren't
looking, and who was responsible for each change?

Amazingly often, it is useful to ask what
you have changed since the last known-good
version, even if you have no collaborators. This often uncovers
unwanted changes, such as forgotten debugging code. I now do this
routinely before checking in a set of changes.

--Henry Spencer

Another issue is bug tracking. It's quite common
to get new bug reports for a particular version after the code has
mutated away from it considerably. Sometimes you can recognize
immediately that the bug has already been stomped, but often you
can't. Suppose it doesn't reproduce under the new version. How do you
get back the state of the code for the old version in order to
reproduce and understand it?

To address these problems, you need procedures for keeping a
history of your project, and annotating it with comments that
explain the history. If your project has more than one developer,
you also need mechanisms for making sure developers don't overwrite
each other's versions.

Version Control by Hand

The most primitive (but still very common) method is all
hand-hacking. You snapshot the project periodically by manually
copying everything in it to a backup. You include history comments
in source files. You make verbal or email arrangements with other
developers to keep their hands off certain files while you hack
them.

The hidden costs of this hand-hacking method are high,
especially when (as frequently happens) it breaks down. The
procedures take time and concentration; they're prone to error, and
tend to get slipped under pressure or when the project is in
trouble — that is, exactly when they are most needed.

As with most hand-hacking, this method does not scale well. It
restricts the granularity of change tracking, and tends to lose
metadata details such as the order of changes, who did them, and why.
Reverting just a part of a large change can be tedious and
time-consuming, and often developers are forced to back up farther than
they'd like after trying something that doesn't work.

Automated Version Control

To avoid these problems, you can use a version-control
system (VCS), a suite of programs that automates away most of
the drudgery involved in keeping an annotated history of your
project and avoiding modification conflicts.

Most VCSs share the same basic logic. To use one, you start by
registering a collection of source files —
that is, telling your VCS to start archive files describing their
change histories. Thereafter, when you want to edit one of these
files, you have to check out the file —
assert an exclusive lock on it. When you're done, you check
in the file, adding your changes to the archive, releasing
the lock, and entering a change comment explaining what you
did.
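
The register, check-out, check-in cycle just described can be
sketched as a toy in-memory model. This is only an illustration of the
locking logic, not the interface of any real VCS; the names
LockingArchive, LockError, register, check_out, and check_in are all
hypothetical.

```python
class LockError(Exception):
    """Raised when a check-out or check-in violates the locking rules."""


class LockingArchive:
    """Toy in-memory model of a lock-based VCS (hypothetical, not a real API)."""

    def __init__(self):
        self.history = {}  # file name -> list of (text, author, comment)
        self.locks = {}    # file name -> user currently holding the lock

    def register(self, name, text, author):
        """Start a change history for a file."""
        self.history[name] = [(text, author, "initial revision")]

    def check_out(self, name, user):
        """Take an exclusive lock on a file and return its latest text."""
        text = self.history[name][-1][0]  # KeyError if never registered
        holder = self.locks.get(name)
        if holder is not None:
            raise LockError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return text

    def check_in(self, name, user, text, comment):
        """Record a new revision with its change comment, then release the lock."""
        if self.locks.get(name) != user:
            raise LockError(f"{user} does not hold the lock on {name}")
        self.history[name].append((text, user, comment))
        del self.locks[name]
        return len(self.history[name])  # revision number, counting from 1


# Example session: one developer edits a registered file.
archive = LockingArchive()
archive.register("main.c", "int main(void) { return 0; }\n", "alice")
text = archive.check_out("main.c", "alice")
revision = archive.check_in("main.c", "alice", text + "/* tidy */\n",
                            "Add a trailing comment.")
```

While alice holds the lock, a check_out of the same file by anyone
else raises LockError; that is the exclusive lock at work.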

The history of the project is not necessarily linear. All VCSs
in common use actually allow you to maintain a tree of variant
versions (for ports to different machines, say) with tools for merging
branches back into the main “trunk” version. This
feature becomes important as the size and dispersion of the
development group increases. It needs to be used with care,
however; multiple active variants of the code base can be
very confusing (just associating bug reports with the right version
is not necessarily easy), and automated merging of branches does
not guarantee that the combined code works.

Most of the rest of what a VCS does is convenience: labeling
and reporting features surrounding these basic operations, and tools
that allow you to view differences between versions, or to group a
given set of versions of files as a named release
that can be examined or reverted to at any time without losing later
changes.
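
As a rough illustration of viewing differences between versions, the
sketch below compares a known-good text with a working copy using
Python's standard difflib module. The function name changes_since and
the sample file contents are assumptions made up for this example; a
real VCS would fetch the known-good text from its archive.

```python
import difflib


def changes_since(known_good, working, name="file"):
    """Return a unified diff from the known-good text to the working text."""
    diff = difflib.unified_diff(
        known_good.splitlines(keepends=True),
        working.splitlines(keepends=True),
        fromfile=f"{name} (known good)",
        tofile=f"{name} (working)",
    )
    return "".join(diff)


# Hypothetical contents: the working copy still has a debugging line in it.
known_good = "total = 0\nfor x in data:\n    total += x\n"
working = "total = 0\nfor x in data:\n    print(x)\n    total += x\n"

report = changes_since(known_good, working, name="sum.py")
```

Printing report before a check-in shows the added print(x) line marked
with a leading "+", which is the kind of forgotten debugging code such
a review tends to catch.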

VCSs have their problems. The biggest one is that using a VCS
involves extra steps every time you want to edit a file, steps
that developers in a hurry tend to want to skip if they have to be
done by hand. Near the end of this chapter we'll discuss a way to
solve this problem.

Another problem is that some kinds of natural operations tend to
confuse VCSs. Renaming files is a notorious trouble spot; it's not
easy to automatically ensure that a file's version history will be
carried along with it when it is renamed. Renaming problems are
particularly difficult to resolve when the VCS supports
branching.

Despite these difficulties, VCSs are a huge boon to productivity
and code quality in many ways, even for small single-developer
projects. They automate away many procedures that are just tedious
work. They help a lot in recovering from mistakes. Perhaps most
importantly, they free programmers to experiment by guaranteeing
that reversion to a known-good state will always be easy.

(VCSs, by the way, are not merely good for program code; the
manuscript of this book was maintained as a collection of files
under RCS while it was being written.)

Unix Tools for Version Control

Historically, three VCSs have been of major significance in the
Unix world, and we'll survey them here. For an extended introduction
and tutorial, consult Applying RCS and SCCS [Bolinger-Bronson].

Source Code Control System (SCCS)

The first was SCCS, the original
Source Code Control System developed by Bell Labs around 1980 and
featured in System III Unix. SCCS
seems to have been the first serious attempt at a unified source-code
management system; concepts that it pioneered are still found at some
level in all later ones, including commercial Unix and Windows
products such as ClearCase.

SCCS itself is, however, now
obsolete; it was proprietary Bell Labs software. Superior open-source
alternatives have since been developed, and most of the Unix world has
converted to those. SCCS is still in use to
manage old projects at some commercial vendors, but can no longer be
recommended for new projects.

No complete open-source implementation of
SCCS exists. A clone called CSSC
(Compatibly Stupid Source Control) is in development under the
sponsorship of the FSF.

Revision Control System (RCS)

The superior open-source alternatives began with RCS (Revision
Control System), born at Purdue University a few years after
SCCS and originally distributed with 4.3BSD
Unix. It is
logically similar to SCCS but has a cleaner
command interface, and good facilities for grouping together entire
project releases under symbolic names.

RCS is currently the most widely
used version-control system in the Unix world. Some other Unix
version-control systems use it as a back end or underlayer. It is well
suited for single-developer or small-group projects hosted at a single
development shop.

The RCS sources are maintained and
distributed by the FSF. Free ports are available for
Microsoft operating systems and VAX VMS.

Concurrent Versions System (CVS)

CVS (Concurrent Versions System) began life as a front end to
RCS developed in the early 1990s, but the
model of version control it uses was different enough that it
immediately qualified as a new design. Modern implementations don't
rely on RCS.

Unlike RCS and
SCCS, CVS
doesn't exclusively lock files when they're checked out. Instead, it
tries to reconcile nonconflicting changes mechanically when they're
checked back in, and requests human help on conflicts. The design
works because patch conflicts are much less common than one might
intuitively think.
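
The idea of reconciling nonconflicting changes mechanically can be
sketched as a line-by-line three-way merge against the common
ancestor. This is a deliberately simplified model, not the algorithm
CVS actually uses: it assumes all three versions have the same number
of lines, so that lines can be compared by position. The function name
merge_lines is hypothetical.

```python
def merge_lines(base, mine, theirs):
    """Line-by-line three-way merge of three lists of lines.

    Simplifying assumption: the three versions have equal length, so
    lines are compared by position. Real merge tools align the texts
    first so that insertions and deletions are handled too.
    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")
    merged, conflicts = [], []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        if m == t:        # both sides agree, whether or not they changed it
            merged.append(m)
        elif m == b:      # only the other developer changed this line
            merged.append(t)
        elif t == b:      # only I changed this line
            merged.append(m)
        else:             # both changed it differently: ask a human
            merged.append(b)
            conflicts.append(number)
    return merged, conflicts


# Two developers edit different lines of the same three-line file.
base = ["alpha", "beta", "gamma"]
mine = ["ALPHA", "beta", "gamma"]
theirs = ["alpha", "beta", "GAMMA"]
merged, conflicts = merge_lines(base, mine, theirs)
```

Here both edits survive in merged and conflicts is empty. Only when
both developers change the same line in different ways does a line
number land in conflicts for a person to resolve.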

The interface of CVS is significantly
more complex than that of RCS, and it needs
a lot more disk space. These properties make it a poor choice for small
projects. On the other hand, CVS is well
suited to large multideveloper efforts distributed across several
development sites connected by the
Internet. CVS tools on a client machine can
easily be told to direct their operations to a repository located on a
different host.

The open-source community makes heavy use of
CVS for projects such as GNOME and
Mozilla. Typically, such CVS repositories
allow anyone to check out sources remotely. Anyone can, therefore,
make a local copy of a project, modify it, and mail change patches to
the project maintainers. Actual write access to the repository is more
limited and has to be explicitly granted by the project maintainers. A
developer who has such access can perform a commit operation from his
modified local copy, which will cause the local changes to get made
directly to the remote repository.

You can see an example of a well-run
CVS repository, accessible over the
Internet, at the GNOME CVS
site. This site illustrates the use of
CVS-aware browsing tools such as Bonsai,
which are useful in helping a large and decentralized group of
developers coordinate their work.

The social machinery and philosophy accompanying the use of
CVS is as important as the details of the
tools. The assumption is that projects will be
open and decentralized, with code subject to peer review and
inspection even by developers who are not officially members of the
project group.

Just as importantly, CVS's
nonlocking philosophy means that projects can't be blocked by a lock
if a programmer disappears in the middle of making some changes.
CVS thus allows developers to avoid the
“single person point of failure” problem; in turn, this
means that project boundaries can be fluid, casual contributions are
relatively easy, and projects are not required to have an elaborate
hierarchy of control.

The CVS sources are maintained and
distributed by the FSF.

CVS has significant problems. Some
are merely implementation bugs, but one basic problem is that your
project's file namespace is not versioned in the same way changes to
files themselves are. Thus, CVS is easily
confused by file renamings, deletions, and additions. Also,
CVS records changes on a per-file basis,
rather than as sets of changes made to files. This
makes it harder to back out to specific versions, and harder to handle
partial check-ins. Fortunately, none of these problems are intrinsic
to the nonlocking style, and they have been successfully addressed by
newer version-control systems.
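
To make the contrast concrete, here is a minimal sketch of recording
changes as sets rather than per file, with the file namespace
versioned along with the contents. It is an illustrative model only
and does not describe how any particular system stores its data; the
names ChangeSet, Repository, commit, and tree_at are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """One logical change: a comment plus new contents for several files."""
    comment: str
    files: dict  # file name -> new text, or None to record a deletion


@dataclass
class Repository:
    """Toy repository whose history is a list of change sets."""
    changesets: list = field(default_factory=list)

    def commit(self, comment, files):
        """Record all the given file changes as one unit."""
        self.changesets.append(ChangeSet(comment, dict(files)))
        return len(self.changesets)  # change-set number, counting from 1

    def tree_at(self, number):
        """Rebuild the whole file tree as of the given change-set number."""
        tree = {}
        for change in self.changesets[:number]:
            for name, text in change.files.items():
                if text is None:
                    tree.pop(name, None)  # deletion removes the name
                else:
                    tree[name] = text
        return tree


# A rename is one change set: delete the old name, add the new one.
repo = Repository()
first = repo.commit("Initial import.", {"util.c": "/* helpers */\n"})
second = repo.commit("Rename util.c to helpers.c.",
                     {"util.c": None, "helpers.c": "/* helpers */\n"})
```

Because each change set covers every file it touches, tree_at(first)
and tree_at(second) each give back a consistent snapshot, including
which file names existed at that point.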

Other Version-Control Systems

CVS's design problems are sufficient
to have created demand for a better open-source VCS. Several such
efforts are under way as of 2003. The most notable of these are
Aegis and
Subversion.

Aegis
has the longest history of any of these alternatives, has hosted
its own development since 1991, and is a mature production system.
It features a heavy emphasis on regression-testing and
validation.

Subversion is
positioned as “CVS done right”, with the known design
problems fully addressed, and in 2003 probably has the best
near-term prospect of replacing CVS.

The BitKeeper project
explores some interesting design ideas related to change-sets and
multiple distributed code repositories. Linus Torvalds uses BitKeeper
for the Linux kernel sources. Its non-open-source license
is, however, controversial, and has significantly retarded the
acceptance of the product.