CVS homedir

I keep my life in a CVS repository. For
the past two years, every file I've created and worked on, every
e-mail I've sent or received and every config file I've tweaked
have all been checked into my CVS archive. When I tell people about
this, they invariably respond, “You're crazy!”

After all, CVS is meant for managing discrete bodies of code,
such as free software programs that are worked on and available to
a lot of people or in-house projects that are collaboratively
developed by several employees. CVS has a reputation of being a
pain to deal with, and it has a lot of crufty bits that regularly
drive users up the wall, like its mistreatment of directories. Why
inflict the pain of CVS on yourself if you don't have to? Why do it
on such a scale that it affects nearly everything you do with your
computer?

I get three major benefits from keeping my whole home
directory in CVS: home directory replication, history and
distributed backups. The first of these is what originally drove me
to CVS for my whole home directory. At the time, I had a home
desktop machine, two laptops and a desktop machine at work.
Rounding this out were perhaps 20 remote accounts on various
systems around the world and many systems around the workplace that
I might randomly find myself logging in to. I used all of these
accounts for working on the same projects and already was using CVS
for those projects.

I'm a conservative guy when it comes to my computing
environment (I've used the same wallpaper image for the past five
years), and at the same time I'm always making a lot of little
tweaks to improve things. Whenever I went to work and something
wasn't just as I had tweaked it the night before, I'd feel a
jarring disconnect and, annoyed, copy over whatever the change
was. When I sat down at some other system at work, to burn a CD
perhaps, and found a bare Bash shell instead of the heavily
customized environment I've built up over the past ten years, it
was even worse. The plethora of environments, each imperfectly
customized to my needs by varying degrees, was really getting on my
nerves. So one day I cracked and sat down and began to feed my
whole home directory into CVS.

It worked astonishingly well. After a few weeks of tweaking
and importing I had everything working and began developing some
new habits. Every morning (er, afternoon) when I came into work,
I'd cvs up while I read the morning mail. In the
evening, I'd cvs commit and then update my
laptop for the trip home. When I got home, I'd sync up again, dive
right back into whatever I'd been doing at work and keep on rolling
until late at night—when I committed, went to bed and began the
cycle all over again. As for the systems I used less frequently,
like the CD burner machine, I'd just update when I got annoyed at
them for being a trifle out of date.
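The routine boils down to a couple of commands run from $HOME (the flags shown are the generic ones; wrap them in aliases to taste):

```
cvs -q update -dP                 # morning: pull in yesterday's commits
cvs -q commit -m 'daily commit'   # evening: push the day's changes
```

-d picks up newly added directories, -P prunes empty ones and -q quiets the torrent of per-directory chatter.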

It only took a few more weeks before the advantage of having
a history of everything I'd done began to show up. It wasn't a real
surprise because having a history of past versions of a project is
one of the reasons to use CVS in the first place, but it's very
cool to have it suddenly apply to every file you own. When I broke
my .zshrc or .procmailrc, I could roll back to the previous day's
version, or look back and see when I made the change and why. It's very
handy to be able to run cvs diff on your kernel
config file and see how make xconfig changed it.
It's great to be able to recover files you deleted or delete files
because they're not relevant and still know you've not really lost
them. For those amateur historians among us, it's very cool to be
able to check out one's system as it looked one full year ago and
poke around and discover how everything has evolved over
time.
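All of this archaeology comes down to a handful of commands (the revision numbers and filenames here are examples):

```
cvs diff .config                      # how did make xconfig change it?
cvs log .procmailrc                   # when did I change this, and why?
cvs update -p -r 1.5 .zshrc > old     # fetch an old revision's contents
cvs update -D '1 year ago'            # the whole tree, one year back
cvs update -A                         # ...and back to the present
```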

The final major benefit took some time to become clear. Linus
Torvalds once said, “Only wimps use tape backup:
real men just upload their important stuff on
FTP and let the rest of the world mirror it.” I'm not a real
enough man to upload my confidential documents to
ftp.kernel.org though, so
I've been wimping along with backups to tape and CD and so on. But
then it hit me: take one crucial file, such as my .zshrc
or my sent-mail archive. I had a copy of that file on my work machine,
on my home machine, on my laptop and on several other
accounts. There was yet another copy encoded in my CVS
repository too.

I'm told that the best backups are done without effort—so
you actually do them—and are scattered widely among many machines
over a wide area, so that a local disaster doesn't knock them out.
They are tested on a regular basis to make sure they work. I
was doing all of these things as a mere side effect of keeping it
all in CVS. Then I sobered up and remembered that a dead CVS
repository would be a really, really bad thing and kept those wimpy
backups to CD going. But the automatic distributed backups are what
keep me sleeping quietly at night. Later, when I left that job, the
last thing I did on my work desktop machine was: cvs
commit ; sudo rm -rf /. And I didn't worry a bit; my life
was still there, secure in CVS.

A full checkout of my home directory with all the trimmings
often runs about 4GB in size. A lot of that will be temporary trees
in tmp/ and rsynced Ogg Vorbis files (so far, I have not found the
disk space to check all of them into CVS). My CVS repository
currently uses less than 1GB of space, though it is steadily
growing in size. I keep some 13,000 files in CVS, and so a full CVS
update of my home directory is a sight to see and takes a
while.

These days I'm often stuck behind a dial-up connection, and I
mostly just use one laptop, so I might go days between CVS updates.
Other better-connected systems have automatic CVS updates done via
cron each day. I cvs commit whenever I want to
make a backup of where I am in a file or when I am at the point of
releasing something. I still also do a full commit of my home
directory every day or so. I confess that some of my CVS commit
messages are less than informative—“foo” has been used far too
many times on some classes of files. I even do some automatic CVS
commits; for example, my mailbox archives are committed by a daily
cron job.
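The automated pieces are plain cron jobs; something along these lines (the times and paths are illustrative):

```
# m h  dom mon dow  command
42 4  *  *  *  cd $HOME/mail && cvs -q commit -m 'daily mail archive commit'
17 5  *  *  *  cd $HOME && cvs -q update -dP
```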

There are other benefits of course. I attend many tradeshows
and other events that require me to sit down at some computer out
of the box, use it for an hour or a day and never see it again. I
can check out the core of my CVS home directory in about five
minutes, and after that it is just as comfortable as if I'd SSH'd
home and was doing everything there. I even get my whole desktop
set up in that five minutes. In a chaotic tradeshow environment,
there is nothing more reassuring than having your familiar computer
setup at your fingertips as you demo things to the hordes of
visitors.

Keeping your home directory in CVS is not all fun though.
Anyone who's used CVS in a large project probably has had to
resolve conflicts engendered by two people modifying the same file.
At least you can curse the other guy who committed the changes
first while you deal with this annoying task. Most of you, however,
have probably not had to resolve conflicts between versions of a file
you modified at home and at work, cursing only at yourself.

Then there are CVS's famous problems: poor handling of
directories and binary files, and nearly nonexistent handling of
permissions, which is not a big deal in most projects but becomes
important when you have a home directory with some public and some
private files and directories in it. The protocol is slow and bloated,
hindered even more by the necessity of piping it all over SSH. The
pain of trying to move a file that is already in CVS, or much
worse, a whole directory tree, again hits you especially hard when
you're using CVS for your whole home directory. And those damn CVS
directories are always cluttering up everything. I've developed
means of coping with all of these to varying degrees, but like many
of us, I'm hoping for a better replacement one day (and dreading
the transition).

Perhaps it's time that I get down to the details of how I
organize my home directory in CVS. I've always managed my home
directory with an iron hand, and CVS has just exacerbated this
tendency. Let's look at the top level:
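(The hostname and exact contents here are illustrative; the directories are the ones that turn up throughout this article.)

```
joeyh@dragon:~>ls
CVS/  GNUstep/  bin/  debian/  html/  mail/  src/  tmp/
```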

Yes, that's it. Well, except for the 100-plus dot files. Most
people use their home directory as a scratch space for files
they're working on, but instead I have a dedicated scratch
directory, the tmp directory, which I clean out irregularly. In
general, when I start a new file or project, I will be checking it
into CVS soon, so I begin working on it in the appropriate
directory. This document, for example, is starting its life in the
html directory and will be checked into CVS soon to live there
forever. Of course, sometimes I goof up and then I have to resort
to the usual tricks to move files in CVS. And so the first rule of
CVS home directories is that it pays to think before starting, and
to get the right filename and location the first time. Don't be too
impatient to check in the file.

CVS is a great way to ensure that you have a nice, clean,
well-managed home directory. Every time I cvs
update, it helpfully complains to me about any files
it doesn't know about. Of course, I make heavy use of .cvsignore
files in some directories (like tmp/).
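A .cvsignore file is nothing fancy: a list of glob patterns, one per line, that CVS should not complain about. The one in tmp/ is a lone `*`; others list the usual suspects (these patterns are examples):

```
*~
*.o
core
.*.swp
```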

If I go to another machine, the home directory looks pretty
much the same, though various things might be missing:

joeyh@auric:~>ls
CVS/ GNUstep/ bin/ tmp/

I use this machine for occasional specific shell purposes. I
don't administer the system, so I don't want to put private files
there. The result is a much truncated version of my home directory.
It's perfectly usable for everything I normally do on that machine,
and if I want to, say, work on this document there at some point, I
can just type cvs co html and a password and be
on my way.

The way I make this partial-checkouts system work is by using
CVS modules and aliases. I have modules defined for each of the
top-level directories and for the home directory (dot files)
itself. For example, the entry in my CVSROOT/modules file for the
stripped-down version of my home directory looks like this:
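(In outline; the module names here are illustrative rather than a verbatim copy of my file.)

```
# one module per top-level directory
bin    -u cvsfix bin
html   -u cvsfix html
mail   -u cvsfix mail
.hide  -u cvsfix .hide

# alias module: the stripped-down home directory
home-lite -a bin GNUstep .hide
```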

Notice the .hide module. It results in a ~/.hide directory when I
check it out. This directory is where I put the occasional private
file that I don't want to appear in home directories on systems not
administered by me—like the one on auric. The files in
.hide get hard-linked to their proper locations if .hide is checked
out, so I can put confidential dot files in there and only check
those dot files out on trusted systems. I also have, for example,
my Mozilla cookies file in .hide.

It's important to distinguish between such files that I need
to put in .hide and the entire set of private directories, like my
mail directory. Yes, I keep my mail in CVS (except for just-arrived
spooled mail, which I keep synced up with a neat little program
called isync that is smarter about mail than CVS is). But it's all
in its own mail/ directory, so I can omit checking that directory
out to systems that I don't trust with my mail or that I don't want
to burden with hundreds of megabytes of mail archives.

While I'm discussing privacy issues, I should mention that I
make some bits of my home directory completely open to the public.
This includes a lot of free software in debian/ and src/, and some
handy little programs in bin/. This is accomplished with ordinary file permissions.
I have to make sure that most directories in the repository (or at
least the top-level directories like mail/) are mode 700, so only I
can access them. Other top-level directories, like bin/, are opened
up to mode 755. This allows anonymous CVS access and browsing at
cvs.kitenet.net/joey-cvs/bin/.
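Setting that up is a matter of running chmod on the module directories inside the repository itself (the repository path here is illustrative):

```
chmod 700 /cvs/joey/mail        # private: only I can read the ,v files
chmod 755 /cvs/joey/bin         # public: anonymous checkout allowed
```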

This leads to the second rule of CVS home directories: don't
import $HOME in one big chunk; break it up into multiple modules.
The structure of your repository need not mirror the structure of
your actual home directory. Modules can be checked out in different
locations to move things around and control access on a per-module
level. There's a layer of indirection there, and such layers always
make things more flexible and more complex.

Some of the projects I work on have their own CVS
repositories that are unconnected to my big home directory
repository. That's fine too; I simply check them out into logical
places in my home directory tree as needed. CVS can even be tweaked
to recurse into those directories when updating or
committing.
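Dropping an outside project into a logical spot is just a matter of CVS's two -d options (the repository location here is an example):

```
cd ~/src
cvs -d :ext:cvs.example.org:/cvs/proj checkout -d proj proj
```

The global -d names the foreign repository; the checkout -d names the directory to put it in.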

Another thing to notice in those lines from my modules file
is the use of -u cvsfix to make the cvsfix
program run after CVS updates. That program does a lot of little
things, including ensuring that permissions are correct, setting up
the hard links to files in .hide and so on.
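cvsfix is homegrown and does more than I can show here, but the .hide hard-linking step can be sketched in a few lines of portable shell (demonstrated in a scratch directory standing in for $HOME; the real script operates on the home directory itself):

```shell
#!/bin/sh
# Sketch of cvsfix's .hide step: hard-link each file in .hide to
# its proper place in the home directory, keeping it private.
demo=$(mktemp -d)           # scratch stand-in for $HOME
mkdir -p "$demo/.hide"
printf 'secret\n' > "$demo/.hide/.mozilla-cookies"

for f in "$demo"/.hide/.[!.]* "$demo"/.hide/*; do
    [ -f "$f" ] || continue                 # skip unmatched globs
    ln -f "$f" "$demo/$(basename "$f")"     # hard link into place
    chmod 600 "$f"                          # private files stay private
done

ls "$demo/.mozilla-cookies"
```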

One last thing to mention is the issue of heterogeneous
environments and CVS. Most of my accounts are on systems running
varying versions of Debian Linux on a host of different
architectures, but there are accounts on other distributions, on
Solaris and so forth. Trying to make the same dot files work on
everything can be interesting. My .zshrc file, for example, goes to
great pains to detect things like GNU ls, deals with varying zsh
versions, sets up aliases to the best available editor and other
commands and so on. Other scripts, like my .xinitrc, check the host
they're running on and behave slightly (or completely) differently.
At one point, I even had a .procmailrc that filtered mail
differently depending on hostname, though the trick to doing that
is lost somewhere in one of the innumerable versions stored in my
repository. I've even resorted in a few places to files with names
of the form filename.hostname—cvsfix finds one matching the
current host and links it to the filename. Branches are also a
possibility, of course, but despite my heavy use of CVS, I still
find some corners of it a black art.
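The style of feature detection involved can be sketched in portable shell (my real .zshrc is zsh-specific and much longer; the hostnames and editor list here are examples):

```shell
#!/bin/sh
# Portable dot-file tricks: detect GNU ls, pick the best available
# editor and vary behavior by hostname.

# Enable color only if this looks like GNU ls.
if ls --version >/dev/null 2>&1; then
    alias ls='ls --color=auto'
fi

# Use the best available editor, falling back down the list.
for e in vim vi nano ed; do
    if command -v "$e" >/dev/null 2>&1; then
        EDITOR=$e; export EDITOR
        break
    fi
done

# Behave differently on particular hosts, .xinitrc-style.
case "$(hostname)" in
    auric) MINIMAL=1 ;;   # the stripped-down account
    *)     MINIMAL=0 ;;
esac

echo "minimal=$MINIMAL editor=${EDITOR:-none}"
```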

Well, I guess that's it. I'd be happy to hear from anyone else
who keeps their home directory in CVS, especially if you have some
tricks to share. In the future I'd like to try checking /etc into
CVS too, and if you've successfully done this, I'd love to talk
with you. Now I'm off to commit this file.

Joey Hess
(joey@kitenet.net) is
a longtime Debian developer who lives on a farm in Virginia. He
enjoys finding new and unlikely places from which to commit code
wirelessly to CVS.

I've read a lot of comments saying to rsync this and try that, and that's all good... if you run everything on Linux. What isn't mentioned here is this: what happens when you plop down on a Winblows box? That disk with a basic command-line CVS (or flash drive, if you wanna be creative) can be a life saver. Or, if you've installed a web interface for your CVS, you can restore some file you've been working on and want to show to a friend, prospective employer or Windows user who wants to know what you can do with Linux (and you won't have to install Cygwin on their machine).

Also, don't forget that not many tools have the web usability of CVS. I've yet to see rsync offer an online tool that lets you edit that important "report" you have to turn in when you are on the road without your laptop (correct me if I'm wrong, and I could be). With CVS and a web interface installed, and some innovative cron scripts, you can update the file, and update an auto-email script, from any Internet kiosk. Your home/office machine can then do a cvs update (on the hour), and another cron job can check for your auto-email file and send off the e-mail via your home account, even rotating the CVS password if you are paranoid about kiosk security. All without making any direct connection to the machine doing the work.

Nothing says "Linux is King!" like Yet-Another-GPL-MS-Can't-FUD.

While I do agree with the right tool for the right job, using 50 tools for 50 slightly different jobs is both the great extensibility of Linux (God knows I couldn't live without rsync... and yes, it's best to tunnel it) and, to date, the downfall of Linux for the home user. Here's a tool that you can easily use on Linux, Windows or the web to synchronize everything in one fell swoop. I don't wanna start a Gimp vs. Photoshop debate here, but CVS behind a website certainly benefits from being able to import from Windows boxes and Linux boxes alike, depending on user preference. No doubt the same can be said of one's desktop, especially for those with 500 icons on their desktop who could use a little organization. ;)

I could really go on about the possible uses of this that I now realize, that all of those tools listed "could" do, but not in one package (excluding VNC that is... but come on, heh). But, I digress...

Great article. It's so awesome to see longtime Linux people get the "Ahhhh!" effect, like when they booted their first distro.

What about using a distributed filesystem (DFS) such as AFS or Coda? Coda is made for disconnected operation, though I'm not sure you can limit which files get replicated where (i.e., on machines you don't trust). Less hassle.

Well, you talk about CVS, and it was clever of you to have thought of it. I like minds that dare to launch unusual concepts and connections, for fun and out of curiosity.

But I've heard about rdiff-backup. It's a tool for people who want to back things up incrementally, based on librsync, and it's still under development. Why rdiff-backup? Because, as you said, CVS does not manage everything elegantly, such as directories or binary files, and the author of rdiff-backup is doing his best to address that problem. I suppose that, rationally, you won't try it, not because you're stupid but because you've put so much energy into learning CVS deeply that it would make no sense to trash what you've learned (we have no more than 24 hours in a day, have we?). But what do the other guys think about this alternative solution?

I've also been doing this for a few years now, and it works very well for me.

Rather than run my own CVS server, I am a key contributor to tigris.org (an open-source software engineering community). Key contributors can host personal private projects there. So, my life is organized in CVS, my own personal Bugzilla and several mailing lists with archives. I am not using Subversion yet, but I expect tigris.org to be the first public project-hosting site to offer Subversion as a standard feature when it is ready.

I've been doing this for years (ain't I cool?), including configs in /etc--though I don't add everything, just the replicable stuff and whatever I don't wanna lose. Yeah, CVS is farfrumperfekt, but it's better than copying things piecemeal. But in /etc especially, I kept tripping over files that were specific to a particular host.

So I developed a system of naming those files according to the specific hostname, as:

http.conf@www

Then I wrote a script that creates symlinks to files bearing the local hostname (http.conf --> http.conf@www). It will prompt for each new file, and if the file already exists (without the @www), it will even show a diff and let me choose which one to use.
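A minimal sketch of such a script (without the prompting and diffing the poster describes; demonstrated in a scratch directory rather than a real /etc):

```shell
#!/bin/sh
# Link name@host files to their plain names when the suffix
# matches the local hostname, per the naming scheme above.
host=$(hostname)
demo=$(mktemp -d)                 # scratch stand-in for /etc
printf 'local\n' > "$demo/http.conf@$host"
printf 'other\n' > "$demo/http.conf@somewhereelse"

for f in "$demo"/*"@$host"; do
    [ -e "$f" ] || continue       # glob matched nothing
    ln -sf "$(basename "$f")" "${f%@"$host"}"
done

ls -l "$demo"
```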

This seems better to me than trying to maintain separate branches in CVS.

The issue of symlinks is actually a general problem for storing directory trees in CVS. Sometimes the symlinks are important information that needs to be preserved just like regular files. Auto-creating them with a script is one option; another that I might suggest is using a small Apache Ant build file and the <symlink> task to persist the symlink information in a properties file (which CAN be version controlled). The only drag is that you need to run the build regularly to record the links or the info gets outdated, but you could cron that.

I currently use this task to maintain a website that makes heavy use of symlinks. It does of course require an additional tool, Apache Ant 1.6+, but you don't have to maintain the script or conform to conventions about how you name your links.

It's been a while since I read an equally silly article. If you need versioning for your files, use Plone. http://www.plone.org

It has that and a host of other features; it makes all your files available through a web browser, and it makes backups much easier. Why? Plone keeps versions of your files, but because it is a database system on top of the Zope application server, when it comes to doing a backup you only have to worry about backing up a single file: the Plone database that contains all your writing and other files.

I love when people overengineer their solutions such as you just did. It is so funny that I can hardly type this.

A troll, surely. Your suggested alternative is to install Python, then Zope, and then Plone? A web-based solution that will be considerably harder to automate than CVS? A solution that is targeted at content management rather than simple file versioning? Sorry, you must be using a different definition of "overengineer" from the rest of us.

No, I wasn't trolling. It takes 20 seconds to install it. Python is included with most distributions. After that, just install your Plone RPM and start the process.

It is much more powerful, and much easier to use. Of course, you would never know, because you would rather call me a troll.

And because it is all accessible through a web browser, you can leave a computer running at home and have access to all your files. Not to mention that you can annotate your files, provide meta keywords for them...

Even if you just want to synchronize, then run plone on two separate computers and rsync one single file, the Data.fs store.

Actually, it's a completely inappropriate tool for the job. Just because it has version control does not make it suited to versioning your home directory. A number of the "features" you cite are exactly the opposite of what the author uses and needs.

Ever hear of rsync? This situation is exactly what it's made for. CVS is overkill here.

rsync will mirror your directories to multiple machines, do incremental mirrors (only update new files or files that have changed), and can make the mirrors identical, i.e. delete files from the server that you have deleted locally (optional parameter).

Also, if you want to go on the server and look at the mirrored files, they exist exactly as they do on your laptop or workstation--not tied up in a CVS archive. Another thing is that you can use it to mirror between machines, such as your workstation and laptop, without having to go through a CVS checkout.

Now, you won't get histories and tagging, but so what? If I have something I want to track changes on (code, documents), then I put it into CVS. The right tool for the right job.
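Concretely, the kind of invocation being described (the host and paths are examples):

```
rsync -av --delete -e ssh ~/ workstation:backup/home/
```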

Rsync, particularly with the "--link-dest" feature of new versions, is great for backups.

Quick explanation: in addition to the source and destination locations, you can specify a third, link-dest, location. If a file in the destination would be identical to one in the link-dest, rsync makes a hard link to the link-dest copy instead of wasting disk space on a new copy of old data. The cool trick is to make one's previous backup the link-dest for the new backup. The result is that each incremental backup is complete, but unchanged files are shared between consecutive backups; only changed files need new disk space.
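A nightly run of that scheme looks something like this (the dates and paths are examples):

```
rsync -a --delete \
      --link-dest=/backups/2004-05-01 \
      /home/ /backups/2004-05-02/
```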

I also specify some files not to back up, and the result is that ever since I installed Red Hat 9, I have had complete, online, incremental backups of my entire machine. When I eventually upgrade to a new distribution, I will be able to revert to RH 9 if I need to, or simply grab the stuff I need.

Mix software raid 0 with rsync backups to a different (normally read-only) partition, and one can have impressive, convenient reliability in a single box. Take advantage of rsync's ability to operate efficiently and securely over a network, and off-site backups are easy too.

Also, the script to do the backups can be little and the result easy to use without any special software.

-kb, the Kent who is maintaining a computer on the other coast with the help of this tool.

Just so people aren't confused - raid 0 is strictly for speed, and does not protect your data from a hard drive failure. Raid 1, 5, and other configurations will offer less speed advantage, but will protect you from a single drive failure. I'm sure the previous poster just made a typo.

I'm disappointed that the article didn't discuss how the author overcame the directory, binary and symlink problems inherent in CVS.

I "solved" the problem of needing different dot files on different machines by creating a branch in CVS. So I can check out the "home" branch on my home machine, and the "work" branch on the work machine, etc.

You have to worry about merging changes between the different machines' branches now, but you don't need to worry about how to make the same config file work in a bunch of (slightly?) different environments.
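In CVS terms, that setup is just a branch per machine (using the branch names from above):

```
cvs tag -b work            # create the work branch
cvs update -r work         # on the work machine: follow it
cvs update -j HEAD         # occasionally merge in trunk changes
cvs commit -m 'merge from trunk'
```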

Excellent article! This was truly innovative thinking and really got me excited to try it out. Please keep publishing new and exciting articles like this in the future! I wonder if there are decent GUIs for Subversion or other CVS-like apps that might make doing things like this super easy? This might be a very nice way to do "hands-off" enterprise desktop installs. The uses of this are only limited by the mind. :)

On Qt/KDE, eSvn is a good solution, especially when used along with Kompare. Cervisia is also probably quite good. Finally, KDESVN is probably set to be the best option, but I don't think it's there yet. Meld will probably be part of any GNOME solution. For SVN, I just use the shell for commits, adds, removes, renames, but with kwrite set as my $EDITOR, and piping svn diff into kompare.

i agree, this is a great idea - though i don't do all that he does. i only put my bashrc, vimrc, et al in CVS. then, i do an update whenever and have new login files which work on most unixes and versions of the common utils. if i add a nice new alias, it'll eventually get propagated.

i do think throwing huge photoshop files into CVS might be a bad idea. and what about mac os x's "bundles", like .app? you'd have to recursively add/import each file of a program manually - would be weird.

I suppose I should mention that these days I keep most of my home directory in Subversion. I have not gotten around to writing a successor to this article yet, but it works even better than CVS, and that's probably the most common question people ask me about this article these days.

This is actually a really, really good idea. As an advocate and user of CVS, I don't know why I hadn't thought of this before... it's perfect for homedir stuff (esp. dot files like .Xresources and such).

Looks like I'll be implementing a CVS server on my network when I move into my new place. :)

When I tell people about this, they invariably respond, “You're crazy!”

Hey Joey, they're the ones that are crazy. I use CVS for all locally grown utilities, /etc/profile.d files, /etc/conf files, etc. (no pun intended). Building a new system is as easy as
o Install from scratch
o Restore cvs repository
o cvs checkout everything
o run MyInstall, a script that walks the sandbox looking for install.ini files and does copies and whatever else is needed. This is homegrown.
o Restart xinetd and off we go.
Keep up the good work.