Setting Up Django-Rcsfield

As I was continuing to work on a project of mine, I slowly came to the realization that I was going to need to version some of the data I was dealing with. After spending some time considering the possibilities, I remembered a tweet from Kevin a few weeks back about django-rcsfield supporting just that. (I can't find the specific tweet where he mentioned it, sorry.)

So I embarked on a short journey to get django-rcsfield installed and working correctly. The project's wiki has a quick start guide, but it wasn't quite enough to get me started, so I'll share my notes from the process.

(I think my favorite thing about Leopard is that the site-packages
directory is finally in an easy to remember location. Adding packages
to the site-packages dir used to be like looking for eggs on
Easter morning.)

I used the backend based on git-python, which requires that you have Git installed, as well as git-python.

The django-rcsfield project also supports Bazaar
and Subversion backends. Looking at the code,
adding support for Mercurial would probably be a 1-3
hour hack session depending on how many things went
horribly wrong. (Mostly it would take that long because
there doesn't seem to be much documentation for using
Mercurial from the command line, although I believe
the source itself is well documented.)

Now open up your project's settings.py file and add
these lines:

RCS_BACKEND = 'gitcore'  # name of backend module
GIT_REPO_PATH = os.path.join(ROOT_PATH, 'git')

Notice that the value for RCS_BACKEND is the
name of the module containing the backend you want
to use. Since it wasn't clear what to put there,
I initially specified 'git', which led to a
bewildering debug session where I tried to find
the error in rcsfield.backends.gitcore
when that file wasn't actually being loaded.
Instead it was loading the actual git-python module (git), but
they both have a .commit attribute, which
helped perpetuate the confusion in my mind.
(Of course, if I had carefully read the error
message to begin with, I would have realized
what was going wrong right away. Whoops.)
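Put together, the Git settings might look like the fragment below. Treat it as a sketch: ROOT_PATH is not something Django provides, so I'm assuming it is defined in settings.py as the directory containing that file.

```python
# settings.py (fragment)
import os

# ROOT_PATH is an assumption: the directory that contains settings.py.
ROOT_PATH = os.path.dirname(os.path.abspath(__file__))

# The name of the backend module inside rcsfield.backends, not 'git'.
RCS_BACKEND = 'gitcore'

# Where the Git repository holding the versioned fields will live.
GIT_REPO_PATH = os.path.join(ROOT_PATH, 'git')
```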

If you want to use the Bazaar backend, you would
specify:

RCS_BACKEND = 'bzr'
BZR_WC_PATH = 'some/path/here'

and if you wanted to use the Subversion backend:

RCS_BACKEND = 'svn'
SVN_WC_PATH = 'some/path/here'

Note, though, that it is not necessary to initialize the
repository yourself, at least for the Git backend (and
I assume for the others as well).

Now you will need to modify your model to include
an RcsTextField and a RevisionManager. (Note that the QuickStart Guide
provides an excellent snippet covering this part
of configuration.)
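A minimal sketch of such a model follows. The import paths (rcsfield.fields and rcsfield.manager) are my assumption and should be checked against the QuickStart Guide, and Entry with its fields is a hypothetical model used only for illustration.

```python
# models.py (fragment; needs a Django project with django-rcsfield installed)
from django.db import models

# Assumed import paths; verify them against the QuickStart Guide.
from rcsfield.fields import RcsTextField
from rcsfield.manager import RevisionManager


class Entry(models.Model):  # hypothetical model name
    title = models.CharField(max_length=200)

    # Stored like a TextField in the database table; the versioning
    # happens outside of the database, in the configured repository.
    text = RcsTextField()

    # Here the RevisionManager takes the place of the default manager.
    objects = RevisionManager()
```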

Anything you can do without RcsTextField, you can do
with RcsTextField. It really just slots in cleanly.
If you already have a custom manager, you can
simply choose a different name for the RevisionManager:

objects = MyCustomManager()
revisioned_objects = RevisionManager()

and just remember to use the appropriate one.

I can't strictly say that you must reset your app at this
point, but I didn't have any success otherwise. Fortunately,
you can dump your existing data out, and reload it afterward,
and the reset won't cause too many issues.

To be clear: I changed a model from using TextField to using
RcsTextField, dumped out the data, reset the app, and was able
to load the data back in successfully without editing the
data dump. This means that replacing a TextField with an
RcsTextField is quite painless (and it makes sense, since in the
database table they are both stored in the same way; RcsTextField's
magic happens outside of the database).

It is also necessary to run syncdb, even if you are
not adding any new structures to the database.
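The dump, reset, and reload cycle can be scripted. The sketch below only drives manage.py through subprocess; myapp and myapp.json are placeholder names, the helper functions are hypothetical, and running syncdb right after the reset is my assumption, since all I know is that it had to run at some point.

```python
import subprocess
import sys


def swap_commands(app_label, fixture_path):
    """Return (argv, stdout_path) pairs for the dump / reset / reload cycle."""
    manage = [sys.executable, "manage.py"]
    return [
        # dumpdata writes JSON to stdout, so it is redirected to a file.
        (manage + ["dumpdata", app_label], fixture_path),
        # reset drops and recreates the app's tables (it asks for confirmation).
        (manage + ["reset", app_label], None),
        (manage + ["syncdb"], None),
        (manage + ["loaddata", fixture_path], None),
    ]


def run_swap(app_label, fixture_path):
    """Run the cycle from the project directory, stopping at the first failure."""
    for argv, stdout_path in swap_commands(app_label, fixture_path):
        if stdout_path is None:
            subprocess.check_call(argv)
        else:
            with open(stdout_path, "w") as out:
                subprocess.check_call(argv, stdout=out)
```

Calling run_swap("myapp", "myapp.json") from the directory containing manage.py would run the four steps in order. Keep the dump file around until you have confirmed the data loaded back correctly.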

At this point everything should work, unless you're
using OS X. On OS X, Git wasn't able to properly locate
the files that it was creating in order to add them
to the repository. I worked around that by editing
line 71 in rcsfield/backends/gitcore.py.

I changed line 71 from:

repo.git.add(key)

to

repo.git.add(os.path.join(self.repo_path, key))

I was initially tempted to submit a patch, but it
turns out that the original code performs correctly
on Ubuntu, and instead I suspect that the issue should
be patched in either git-python or git itself.
Then again, it is equally possible that this is simply
a symptom of different versions of git being installed
from MacPorts and Aptitude.

So there is something screwy occurring, but it would
involve some investigation to figure out exactly
what/when/where/who/why and most importantly how to
fix it appropriately.
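I haven't confirmed the cause, but the difference between the two arguments is easy to see in isolation. In the sketch below, repo_path and key are hypothetical values, and I'm assuming the key is a path relative to the repository: the original call hands git that relative path, which only names the right file if git resolves it from the repository root, while the joined path does not depend on where git is run from.

```python
import os


def add_arguments(repo_path, key):
    """Return the argument passed to `git add` before and after the change."""
    original = key                          # relative to the repository
    patched = os.path.join(repo_path, key)  # anchored at the repository path
    return original, patched


# Hypothetical values for illustration only.
repo_path = os.path.join(os.sep, "srv", "myproject", "git")
key = os.path.join("myapp", "entry", "1", "text.txt")

original, patched = add_arguments(repo_path, key)
print(original, os.path.isabs(original))
print(patched, os.path.isabs(patched))
```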

Now when you edit the rcsfields, you should see changes
occur in the Git repository you specified in GIT_REPO_PATH.
Success is sweet. (Unless you ran into different issues
and are angry. Then my success is bitter. I swear.)

Performance Thoughts

My current feeling is that the performance here is slightly
worse than desirable. The commit to the RCS backend occurs
via a post_save signal, so consider a request whose view saves
an object and then returns an HttpResponse.

We not only save the object to the database; the post_save hook
is activated as well, before the HttpResponse is sent.
Because the database still maintains the current version of the
field, it isn't necessary for the update to the repository to
be finished before the response is returned to the user
(I'll concede that there may be some applications in which
this is not true, but I suspect for the vast majority it is).

This means that saving an object with an rcsfield is
substantially slower than saving an object without one.
Whether or not that is an issue depends heavily on your
user interface and overall application's design, but it
is worth keeping in mind.

As for me, if redesigning the UI doesn't sufficiently
relieve the feeling of slowness, then I have two thoughts
on how to modify the current solution to provide a faster
feel:

Throw the saving to repository into its own thread.
Threads are evil, yada yada yada, but this seems like
a situation where threading out the commit would
provide a substantial boost in responsiveness.
(This is the 'kind of hacky but might be great'
solution.)

Add the commit to a task queue whose tasks are handled
by a different service. (This is the "we're scalability
engineers, wee" solution.)
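A rough sketch of the threaded option, using only the standard library. commit_in_background and fake_commit are hypothetical names; in a real project the callable would be whatever performs the backend commit, and I'm not claiming this is how django-rcsfield itself is structured.

```python
import threading


def commit_in_background(commit, *args, **kwargs):
    """Start commit(*args, **kwargs) on a worker thread and return the thread."""
    worker = threading.Thread(target=commit, args=args, kwargs=kwargs)
    # Non-daemon, so the interpreter waits for the commit before exiting.
    worker.daemon = False
    worker.start()
    return worker


# Stand-in for the real repository commit, for illustration only.
committed = []


def fake_commit(key, text):
    committed.append((key, text))


# The caller can return its response immediately after this line.
worker = commit_in_background(fake_commit, "myapp/entry/1/text", "new text")
worker.join()  # only joined here so the example finishes deterministically
```

The obvious cost is that a failed commit no longer surfaces during the request, so the worker would need its own error handling and logging.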

Anyway, django-rcsfield is a great little project, and
I highly recommend playing around with it a bit and seeing if
you have any projects that might benefit from adding it to
the mix.

Hi folks. I'm Will, known as @lethain on Twitter.