I started to use Subversion one year ago and liked the elegant file-system design a lot. Soon it became impossible for me to go back to
CVS. This means that I felt uncomfortable whenever I was working on projects
using CVS, and I wanted to see a tool to keep my Subversion repository
in sync with a CVS repository. This would not only save me time
importing snapshots into vendor branches, but it would also give me the
whole history when I'm not online.

I found Barrie's VCP and wrote a Subversion driver. Then I
understood why people said Subversion was slow. My driver invoked the
svn command for every operation, and it took something like 30 hours
to convert a CVS repository into 3000 revisions in the Subversion
repository.

Fortunately, the Subversion developers made the code easy to wrap for
other languages using SWIG. At that time, only the Python bindings were
implemented, so I had to do the Perl bindings myself. With the Perl
bindings rapidly implemented, VCP got much faster, and I also started
writing SVN::Mirror, a module that enables mirroring between Subversion
repositories. When I felt bored, I would add Subversion back-end
support to tools like Blosxom and Kwiki.

Then the season for traveling came. As I'm far more productive and
creative while disconnected from the Internet, I realized I needed a
distributed version control system, and decided to give myself a
year-long break to develop such a tool, so that I could be even more
productive in the future. svk was born soon after my birthday in
September 2003.

Why?

There are other distributed version control systems available: Arch,
monotone, darcs. The functionality they offer is more or less
equivalent.

svk, however, is written in Perl, and so might be more hackable by a
large community. svk also has a set of commands similar to those of cvs. On
top of this, svk plans to implement transparent interoperation between
different version-control systems.

As I don't see any strong argument suggesting one system over another,
it's really up to you to try and decide.

Design Decisions

Subversion has a layered design:

fs: Underlying versioned tree library using bdb.

repos: Higher-level support for the fs, like log messages.

ra: Repository access. Abstracted protocol handlers.

wc: Working copy handling.

client: Implements the commands for clients.

The first design decision was to drop the wc and ra layers. Extending
the Subversion design mentality that "bandwidth is expensive, disks are
cheap," we should really keep a local copy of every revision -- and
SVN::Mirror is already available for such purposes.

Having everything in a local repository, we don't need anything like
the bloated wc implementation at all. The
wc library not only has the .svn metadata directory
to confuse your favorite utilities like diff and grep, but also stores
a text-base that makes your checkout twice the size of the actual
content. XD (which is the character-wise increment of wc) was written
to maintain checkout copies in a lightweight manner.

Next, I found the most important component of Subversion is not on the
above list. It's the delta library that defines the API for
describing tree deltas; this is definitely the core thing in
tree-based version control systems. It's called "Delta Editor."

For example, running a delta between revision 1 and revision 3 will
generate a series of method calls (add_directory,
open_file, apply_textdelta,
close_file, close_directory, etc.), to the
targeted editor object. These method calls describe the changes made
from revision 1 to revision 3.
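
To make the idea concrete, here is a minimal sketch in Python. It is not
Subversion's actual delta library: the tree representation (nested
dicts), the `drive` function, and `RecordingEditor` are hypothetical, and
`apply_textdelta` receives the whole new text rather than a real text
delta. Only the method names are taken from the description above.

```python
# Hypothetical sketch: a tree is a nested dict (dict = directory, str = file text).
CALLS = ("add_directory", "open_directory", "close_directory",
         "add_file", "open_file", "apply_textdelta", "close_file",
         "delete_entry")


class RecordingEditor:
    """An editor that only records the delta calls it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. the delta calls.
        if name not in CALLS:
            raise AttributeError(name)
        return lambda *args: self.calls.append((name, *args))


def drive(old, new, editor, path=""):
    """Describe, as editor calls, how to turn tree `old` into tree `new`."""
    for name in sorted(set(old) | set(new)):
        child = f"{path}/{name}" if path else name
        before, after = old.get(name), new.get(name)
        if after is None:
            editor.delete_entry(child)
        elif isinstance(after, dict):
            if isinstance(before, dict):
                editor.open_directory(child)
                drive(before, after, editor, child)
            else:
                if before is not None:      # a file is replaced by a directory
                    editor.delete_entry(child)
                editor.add_directory(child)
                drive({}, after, editor, child)
            editor.close_directory(child)
        elif before != after:
            if isinstance(before, str):
                editor.open_file(child)
            else:
                if before is not None:      # a directory is replaced by a file
                    editor.delete_entry(child)
                editor.add_file(child)
            editor.apply_textdelta(child, after)
            editor.close_file(child)


rev1 = {"src": {"main.pl": "print 1;\n"}}
rev3 = {"src": {"main.pl": "print 3;\n"}, "doc": {"README": "hello\n"}}

editor = RecordingEditor()
drive(rev1, rev3, editor)
for call in editor.calls:
    print(call)
```

Any object that understands the same calls can stand in for the
recorder, which is what makes the interface a convenient place to plug
in different behavior.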

Once svk was self-hosting, within two months of development, I started to
refactor the existing code to center around this interface. With Perl,
I could easily stack the editors together, making each editor do its own
job, adding arbitrary callbacks as extensions to the API, and all of the fun
things you know you can do with Perl. Much of the functionality was
abstracted, which resulted in the following core components of svk:

An editor that receives delta calls to modify the checkout copy.

A function to generate delta calls for describing the modification
done to the checkout copy.

An editor that takes delta calls and merges them with a tree to generate
non-conflicting calls.

Together with these, the logic behind most of the commands became just
a question of gluing together a delta generator and the appropriate editors.
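
The gluing can be pictured with a small Python sketch. Everything in it
is hypothetical and simplified: the per-method editor API is collapsed
into a single `call` method for brevity, and the `Collector`, `Rename`,
and `Skip` classes are illustrations, not svk components. It shows a
delta generator feeding a stack of editors, each of which does one job
and forwards the rest.

```python
class Collector:
    """Innermost editor: records whatever finally reaches it."""

    def __init__(self):
        self.calls = []

    def call(self, method, path, *args):
        self.calls.append((method, path, *args))


class Rename:
    """Forwards every call, rewriting a leading path prefix."""

    def __init__(self, target, old, new):
        self.target, self.old, self.new = target, old, new

    def call(self, method, path, *args):
        if path.startswith(self.old):
            path = self.new + path[len(self.old):]
        self.target.call(method, path, *args)


class Skip:
    """Drops calls for paths with a given suffix, forwards the others."""

    def __init__(self, target, suffix):
        self.target, self.suffix = target, suffix

    def call(self, method, path, *args):
        if not path.endswith(self.suffix):
            self.target.call(method, path, *args)


def replay(delta, editor):
    """A stand-in delta generator: replays a recorded list of calls."""
    for method, path, *args in delta:
        editor.call(method, path, *args)


delta = [
    ("add_file", "trunk/a.txt"),
    ("apply_textdelta", "trunk/a.txt", "hi\n"),
    ("close_file", "trunk/a.txt"),
    ("add_file", "trunk/a.txt~"),
    ("close_file", "trunk/a.txt~"),
]

sink = Collector()
replay(delta, Skip(Rename(sink, "trunk/", "local/"), "~"))
print(sink.calls)
```

Because each editor only knows about the one it wraps, editors can be
reordered or swapped without touching the generator.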

Additionally, with Perl's flexible PerlIO layers system, keyword
expansion (like $Id$ in cvs) was done within one
hour. The reusable part of this was abstracted out to the
PerlIO::via::dynamic module on CPAN.
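
As an illustration of the text transformation involved, here is a
minimal Python sketch of keyword expansion and its inverse. It is not
the PerlIO::via::dynamic code; the function names and the format of the
expanded value are assumptions. In svk the equivalent transformation is
applied by an I/O layer as files are read and written, while the sketch
operates on plain strings.

```python
import re

# Matches both the bare form "$Id$" and an expanded form "$Id: ... $".
KEYWORD = re.compile(r"\$Id(?::[^$\n]*)?\$")


def expand(text, value):
    """Replace every $Id$ keyword with its expanded form."""
    return KEYWORD.sub(lambda match: f"$Id: {value} $", text)


def contract(text):
    """Collapse expanded keywords back to the bare form for storage."""
    return KEYWORD.sub("$Id$", text)


stored = "# $Id$\nprint 'hello';\n"
checked_out = expand(stored, "hello.pl 42 alice")   # hypothetical value format
print(checked_out)
print(contract(checked_out) == stored)
```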

Now let's see svk in action.

A First Look

I hate typing those long URLs when using Subversion. So mapping repositories
to shorter names is a must:

$ svk depotmap

This will help you create a default repository at ~/.svk/local, and
you can refer to it as // in the future. If you have a Subversion
repository on disk, you can add another line: test: '/path/to/repos'.
Then you have immediate access to the existing repository -- only with
the shorter name /test/ instead of
file:///long-path-plus-auto-complete-wont-work.

Now let's put something in it:

$ svk import //project/vendor /path/to/project-0.01

This will do what you think: import things into //project/vendor.
Repeat the command with a newer version of this project, say 0.02, and
you'll have a vendor branch tracked on that path.

Like Subversion, branches and tags are implemented as cheap file
system copies:

If you have experience with cvs or Subversion, you'll find it
familiar when trying to add, modify, remove, or commit files. svk log
will give a change history of files or directories.

Suppose you import project-0.02 after branching trunk, and want to
merge the changes from the vendor branch. You just need to:

$ svk smerge //project/vendor //project/trunk

svk remembers branch and merge history, so it does things
automatically for you. If there are conflicts, just replace
//project/trunk with a checkout path such as
~/project/trunk. You will be able to see the
conflicts. Resolve them and commit once done. Merging is no longer
painful.

Once merged, you could bring the checkout copy of your trunk to the latest
revision with svk update.

Working with Remote Repositories

As mentioned earlier, svk uses SVN::Mirror to handle
remote repository access. You need to mirror them before you can
use them:

Currently you need to set up a Subversion server (either using Apache2
or svnserve). See relevant articles or tutorials about it.

Now create a local branch, and prepare for traveling:

$ svk cp -m 'create a local branch' //project/trunk //project/local

You could now check out //project/local and work on it
just as above. Of course you could still create your own branch
with cp //project/local //project/new-feature.

Use svk sync to sync the latest trunk when connected. Merging the new
changes from trunk to your local branch works just like the previous
example of merging from a vendor branch. How about merging your local
changes back to the remote repository?

$ svk smerge //project/local //project/trunk

Transparent, isn't it?

You should use smerge -C in advance to check if there are conflicts.
Even if your local branch is not merged from the latest trunk, svk will
merge the changes for you and commit to the remote repository directly,
provided there are no conflicts. But be sure to sync the latest trunk first.

In fact, if you are online and about to commit a minor change, you
could forget about the process "modify on local branch, then merge
back." Just do:

$ svk switch //project/trunk
$ svk commit

The first line switches the checkout from the local branch to trunk,
which is the path containing the mirrored archive. The switch command
keeps your local changes and applies them to trunk as if those
modifications had been made to a checkout of trunk. Then svk commit on
the mirrored path commits the changes directly to the remote server
and syncs the path for you. If the server is temporarily unavailable,
just switch back to local and merge back later.

You could also merge individual changes. Find the change number you
want with svk log, and use:

$ svk cmerge -c 113,125-128,130 //project/trunk //project/stable
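
The argument to -c mixes single changes and ranges. As a rough
illustration of how such a list can be read, here is a Python sketch
that expands it into individual change numbers. The `parse_changes`
function is hypothetical and is not svk's own parser.

```python
def parse_changes(spec):
    """Expand a list such as '113,125-128,130' into change numbers."""
    changes = []
    for part in spec.split(","):
        if "-" in part:
            # A range such as 125-128 includes both of its ends.
            first, last = (int(n) for n in part.split("-", 1))
            changes.extend(range(first, last + 1))
        else:
            changes.append(int(part))
    return changes


print(parse_changes("113,125-128,130"))
```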

Now if you are working on projects where you don't have the permission to
commit, you could easily generate a diff and submit it to the author:

$ svk diff //project/trunk //project/local

Working with Multiple Repositories

Many people track development of several projects. Once you use svk
to mirror the projects, you can run svk sync -a to sync all of
them.

Now suppose another hacker uses svk and adds a feature to the project
and publishes his own branch, and you wish to experiment with or utilize
his feature:

You could also use the cmerge command described above to merge
specific changes only from that new-feature branch.

This is the minimum case of the distributed development model. The
idea is that everyone can create a private branch of the product,
which the maintainer then merges back. There have been arguments
against such a model, but I am not going into them here. Although
tools tend to promote certain models for solving problems, we change
the model or just use another tool when we have to.

There are several features planned in the near future:

Changeset signing and verification

Signing the modified file in a commit with gpg. This is
already done; it's just that the SVN::Mirror side
hasn't been able to propagate and verify the signatures.

VCP integration

This would enable mirroring (and thus branching) from alien version
control systems, like cvs or perforce. Imagine:

$ svk mirror //foo/fromcvs cvs://cvs.server/foo@trunk

This will make me (and perhaps other people) more comfortable
when working with projects which use other version control systems,
and also less confused when switching between different command sets
for working with different projects.

Patch manager

Non-committers can already easily generate the diff as shown
above. While it would be good to register a merge history with the
patch manager, large projects that need to merge many
developer-submitted patches would find it handy to
have a feature which allowed a review, then test, then click-to-apply
for a particular change.

The development of svk is rather rapid, so expect these features soon!

Conclusion

Five months after its birth, svk has become a fast, full-featured
distributed version control system. This is possible mainly because
of the flexibility of Perl and the spirit of Perl -- use something
that exists to create new things. Besides, the commands are designed
to DWIM!

If you find it interesting, get a copy from the home page and install it
just like any other Perl module. Hopefully I'll then receive your
complaints, make svk better, and make the open source world more
productive.