thraxil.org:

Django Deployment with virtualenv and pip

A while back, I detailed how I went about deploying TurboGears apps with workingenv and supervisord. That's still the basic approach that I'm using for the TG apps that I still maintain and keep running. Lately though, I've been doing more of my work with Django and the landscape of Python deployment tools has changed significantly enough that I thought it was about time I explained how (and why) I'm deploying things these days.

Motivations

I still have the same basic needs. I write a lot of small applications that get deployed on one server and I don't want to spend all my time doing the upgrade dance when a new version of a library comes out. If I've got a bunch of apps written with Django 0.97 that are running and I want to deploy another app on the same server that's written against Django 1.0, I don't want to have to upgrade all my old ones (since there were some non backwards compatible changes introduced). Similarly, I don't want the presence of my old legacy apps preventing me from being able to use the latest versions of libraries for my new apps. Generally, this means that the approach of installing libraries into the global site-packages just does not work for me.

I should also mention that while the approach detailed in my old post uses workingenv's "--requirements" flag to give it a list of URLs of packages to install, I actually moved away from that quite a while ago instead opting to bundle all the needed eggs in a directory right alongside the application's code so the entire deployment process could be done without relying on the network. I just had too many problems with either PyPI or the TG website being unavailable at the wrong time and breaking my deployments. It also meant that I couldn't bootstrap a new environment if I was on a laptop without network access.

What's New?

My old post explained in some depth how I got around that problem for the TurboGears apps I was deploying by using workingenv to isolate each application's libraries in a way that I could just bundle the libraries needed for each along with the application and not have to worry about what else was on the system.

Things are slightly different now with Django for a couple reasons.

First, mod_wsgi has matured and, for my purposes, seems to be the best way to deploy Django applications. That means I'm back to Apache and don't need the whole setup with lighttpd proxying back to individual application web server processes each being monitored and controlled by supervisord.

Second, the Django community as a whole seems to have not bought into setuptools and eggs as the preferred distribution method and have stuck with distutils instead. My old approach relied on having all the required libraries bundled as eggs. With TurboGears, that was rarely an issue since TG was completely tied to setuptools. But with Django, I find that more often than not, the library I want to use isn't available as an egg, so I ended up having to download the source tarball and manually build an egg for each library. That got tedious after a while. Ideally, a new approach would allow me bundle both eggs and source tarballs. Failing that, source tarballs currently seem to be the path of least resistance in a Django ecosystem.

Finally, workingenv has been all but deprecated in favor of virtualenv (and/or the combination of virtualenv and pip depending on what set of workingenv features one is replacing). Virtualenv has a number of advantages over workingenv that I won't bother enumerating here. Pip is also an important new player in town. Conveniently, pip happens to use source tarballs instead of eggs.

The Current Approach

I'm just going to dive into it.

In each project, I have a 'requirements' directory. That, in turn, has a 'src' directory in which I've dumped source tarballs for every library that the project requires. That includes Django itself, database drivers, everything.

Also in 'requirements' are 'libs.txt' and 'apps.txt' text files. Those are just lists of the contents of the 'src' directory since pip currently isn't (yet) smart enough to figure out the order that it needs to install things when you just give it a directory full of tarballs. I separate them into 'libs' and 'apps' just for convenience. 'libs' are plain python libraries and 'apps' are full-fledged django apps.

At the top level of the project, there is a 'bootstrap.py' file with the following contents:

In other words, telling pip to install all the packages specified in requirements/apps.txt (which in turn has a reference to libs.txt) into a new virtualenv directory called 've'. The only real reason that's done in Python is to easily make it self-aware of it's location so the script can be run from any directory and it will figure out how to put the 've' directory in the right place. Bash makes that harder than it ought to be.

I also include a copy of pip.py in the top level project directory so pip doesn't even have to be globally installed on a system to bootstrap. (I think I can also drop virtualenv.py in there too but I already have that installed on every system I admin so I haven't bothered yet).

All together now

I actually have all of this set up in a Paster template (along with a whole bunch of other customizations) so instead of the usual

$ django-admin.py startproject foo

I run

$ paster create --template=mydjangotemplate foo

and I get a stock Django project directory plus the requirements directory full of source tarballs and the bootstrap.py. The copy of 'manage.py' that my paster template drops in there has the '#!/usr/bin/env python' line replaced with '#!ve/bin/python'. So the next couple steps for me to bring up a running dev server look like:

I'd like to stress that that happens on a system that has Paste and virtualenv installed but does not need to have Django, psycopg2, or any of the other libraries that are used installed. They are included in the template and get installed into the virtualenv for that one project.

I check everything except the 've' directory into version control (usually git these days). That includes all the source tarballs. It's a bit wasteful of disk space and bandwidth but I haven't found that it's that much overhead in practice.

When I deploy to production (via our automated system at work), the project is checked out of version control, rsync'ed to the production server then

$ ssh productionserver /path/to/myapp/bootstrap.py

is run and everything gets installed into a virtualenv on the production server. Other applications on the same server are unaffected. My mod_wsgi configs point to '/path/to/myapp/ve/lib/python2.5/site-packages' and so on (they are autogenerated by my paste template as well).

I plan on going into more detail on the paste template magic I use in the future, but that should give you a bit of a sense for why I like it so much.

A couple fine points

A few caveats:

I have this working on Ubuntu 9.04 but it requires a couple tweaks since Python 2.6 is the default on that system. Figuring out how to do that is left as an exercise for the reader.

I don't use that many libraries that have C extensions (and thus require time-consuming compilation steps). Pretty much just psycopg2 and PIL. psycopg2 is small enough that compiling it is very quick so I just put the source tarball in requirements like usual. PIL is big though and I don't want to have to wait for it to compile every time I deploy an app to production. That one's stable enough and is easily installed with Ubuntu's package manager that it's become my one exception and I just install it globally and it hasn't been a problem. (this is why pip gets the '--enable-site-packages' flag to pull that in)

pip doesn't yet do a very good job of figuring out dependencies. If you don't have the order just right in the requirements files, it will try to download dependencies off PyPI even if they're sitting right there. It's taken me a bit of trial and error to get it right.

Eg

comments

(Whoo, long comment. If I had my own blog set up I'd post this there, as a response .. but, uh, I don't .. sorry.)

I'll admit I was both confused and skeptical of this approach when I first started using it at work, but after a couple of months of developing a Django project under this system I'm really digging it. It's tremendously convenient, easy to get started with a project and to replicate a build, and fits my brain well. I think it's conceptually somewhat similar to the buildout approach, but IMO it's far more straightforward and comprehensible, and I like its better integration with virtualenv.

I've only had a couple of cases where this system hasn't been fully sufficient for my needs. In both cases, I think the solution is best done outside this system and then integrated with it.

h2. Installing external scripts / development tools

I've written "flunctests":http://pypi.python.org/pypi/flunc for my Django project. Flunctests are twill scripts, so they're executed through a simulated browser over HTTP against a running webserver. The tests are invoked by a command-line script that's provided when you install flunc. So I've declared flunc as a "libs.txt" dependency for my project, packaged the project's tests alongside the code, and .. have a bash script that invokes the tests: source ./ve/bin/activate; flunc the_test_suite -t http://localhost:8000 -w

That's annoying .. I don't really want to have to go inside the virtualenv directly or activate it. There's also an organizational problem with my approach: technically flunc and the application's tests are downstream of the Django project I'm developing, not a part of it. When we deploy the project to a production server, flunc is just an extraneous library to install; even if we decide to run the project's flunctests against a site in production, running them remotely against the site's external URL is a much more accurate test of the deployment than connecting to the production server and running them from the deployment's environment itself.

But the standard arguments apply against a global installation of flunc.

I think that solving this is outside the scope of the system you're describing. The idea I'm considering is something like the following .. it seems a bit weird with the nested virtualenvs, but it's the most logical approach I've come up with so far:

Create a separate Python package that contains the project's flunctests and declares a dependency on flunc in setup.py

Put together a "development installation" script that: creates a new virtualenv; installs my flunctest package within that new virtualenv, preferably through an SVN checkout and python setup.py develop; and installs my Django project, through an SVN checkout of the top-level project directory and bootstrap.py

h2. Installing shared development libraries

The distinction between project dependencies (declared in ./requirements, installed in the virtualenv's site-packages) and project code (checked into the project's source repository directly as code and not formally installed anywhere) makes it a bit hard to use shared development libraries that you want to hack on frequently without forking. These are the sorts of things that I always prefer to package as an egg, do a source checkout of, and python setup.py develop. But this is hard to do with no "outer" virtual environment.

When I put it that way it seems like the same solution is in order -- an external virtual environment that installs development eggs and then installs my project itself. These really are more like project dependencies, though, so the separation that seems natural for development tools feels artificial and awkward in this case.

"Fassembler":http://blog.ianbicking.org/2008/06/19/my-experience-writing-a-build-system/ has this nice feature for this use case where you can specify dependencies in your requirements file as "editable", with the flag -e. For editable dependencies, fassembler does a source checkout and python setup.py develop. (Pip may have this feature as well .. fassembler/poacheggs/pip/workingenv/virtualenv all have these very similar concepts of requirements files with slightly different featuresets and syntaxes, so I have a lot of trouble remembering their overlap.)

In my experience this hasn't been quite the right place for the feature, though; I think that whether to install a dependency as "editable" is more of a personal and case-by-case decision than a formal specification to declare in the project's requirements. In terms of workflow I think that the "plonenext":http://svn.plone.org/svn/plone/plonenext/3.2/README.txt tool "gets it right.":http://dev.plone.org/plone/browser/plonenext/3.3/README.txt#L24 I think a similar command-line tool could be installed into an external virtualenv.

formatting is
with Markdown syntax. Comments are not displayed until they are approved by a
moderator. Moderators will not approve unless the comment
contributes value to the discussion.