Python: If you have Docker, do you need virtualenv?

I’ve been working in Python a lot over the last year. I actually started learning it about three years ago, but back then my day gig was 100% .NET/C# on the back-end, and just when I decided to do something interesting with Python on Ubuntu, a series of time-constrained crises would erupt on the job and put me off the idea. Since that time my employer and my role have changed, the first totally, and the second at least incrementally. At KIP we use Python for daily development, and run all our software on CentOS on Amazon Web Services. As such I got to dive back in and really learn to use the language, and it’s been a lot of fun.

One of the de facto standard components of the Python development toolchain is virtualenv. We use it for Django development, and I have used it for my personal projects as well. What virtualenv does is pretty simple, but highly useful: it copies your system Python install along with supporting packages to a directory associated with your project, and then updates the Python paths to point to this copied environment. That way you can pip install to your heart’s content, without making any changes to your global Python install. It’s a great way of localizing dependencies, but only for Python packages and modules. If your project requires a redis-server install, or you need lxml and have to install libxml2-dev and three or four other dependencies first, virtualenv doesn’t capture changes to the system at that level. You’re still basically polluting your environment with project-specific dependencies.

Dependencies in general are a hassle. Back when I started messing with VMs a few years ago I thought that technology would solve a lot of these problems, and indeed it has in some applications. I still use VMs heavily, and in fact my daily development environment is an Ubuntu Saucy VirtualBox VM running on Windows 7. But VMs are a heavyweight solution. They consume a fixed amount of RAM, and for best performance a fixed amount of disk. They take time to spin up. They’re not easy to move from place to place, and it’s fairly complicated to automate their creation in a controllable way. Given all these factors I never quite saw myself having two or three VMs running with different projects at the same time. It’s just cumbersome.

And then along comes Docker. Docker is… I guess wrapper is too minimizing a term… let’s call it a management layer, over Linux containers. Linux containers are a technology I don’t know enough about yet. I plan to learn more soon, but for the moment it’s enough to know that they enable all sorts of awesome. For example, once you have installed Docker and have the daemon running, you can do this:
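Something along these lines (a sketch; the directory name `saucy` and image tag `myimages:saucy` follow the description below, and exact flags may vary by debootstrap and Docker version):

```shell
# Pull a minimal Ubuntu 13.10 (Saucy) root filesystem into ./saucy
sudo debootstrap saucy saucy

# Tar up the install and feed the tarball to Docker's import command,
# registering it locally as myimages:saucy
sudo tar -C saucy -c . | sudo docker import - myimages:saucy
```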

The first command uses debootstrap to download a minimal Ubuntu 13.10 image into a directory named saucy. It will take a little while to pull the files, but that’s as long as you’ll ever have to wait when building and using a Docker image.

The second command tars up the saucy install and feeds the tarball to docker’s import command, which causes it to create a new image and register it locally as myimages:saucy. Now that we have a base Ubuntu 13.10 image the following coolness becomes possible:
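The launch command would have been along these lines (a sketch using the pre-1.0 single-dash flags the post describes; the interactive flags `-i -t` are an assumption):

```shell
# Start an interactive container named (and hostnamed) "saucy" from our image,
# running bash as its command
sudo docker run -i -t -name saucy -h saucy myimages:saucy /bin/bash
```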

This command tells Docker to launch a container from the image we just created, with an interactive shell. The -name option will give the container the name “saucy,” which will be a convenient way to refer to it in future commands. The -h option causes Docker to assign the container the host name “saucy” as well. The bit at the end, ‘/bin/bash’, tells Docker what command to run when the container launches. Hit enter on this and we’re running as root in a shell in a self-contained minimal install of Ubuntu. The coolest thing of all is that once we got past the initial image building step, starting a container from that image was literally as fast as starting Sublime Text. Maybe faster.

So we have our base image running in a container. Now what? Now we install stuff. Add in a bunch of things the minimal install doesn’t include, like man, nano, wget, and screen. Install Python and pip. Create a project folder and install Git. Whatever we want in our base environment. Once that is done we can exit the container by typing ‘exit’ or hitting Ctrl-D. Once back at the host system shell prompt, the command:

sudo docker ps -a

…will show all the containers we’ve launched. Ordinarily the ps command only shows running containers, but the -a option causes it to show those that are stopped, like the one we just exited from. That container still has all of our changes. If we want to preserve those changes so that we can easily launch new containers with the same stuff inside, we can do this:

sudo docker commit saucy myimages:saucy-dev

This just tells Docker to commit the current state of the container saucy to a new image named myimages:saucy-dev. This image can now serve as our base development image, which can be launched in a couple of seconds anytime we want to start a new project or just try something out. I can’t overemphasize how much speed contributes to the usefulness of this tool. You can launch a new Docker container before mkvirtualenv can get through copying the base Python install for a new virtual environment. And it launches completely configured and ready to run commands in.
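The in-container setup described a couple of paragraphs back might have looked something like this (the package list is illustrative, matching the tools mentioned above):

```shell
# Inside the running container, as root: refresh the package index
apt-get update

# Add the conveniences the minimal install lacks, plus Python, pip, and Git
apt-get install -y man nano wget screen python python-pip git

# Create a workspace for projects
mkdir -p /projects
```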

Given that, I found myself wondering just what use virtualenv is to me at this point. Unlike virtualenv, Docker captures the complete state of the system. I can fully localize all dependencies on a “virtualization” platform that, in terms of speed and accessibility, is as easy to use as a text editor. Even better, I can create a Dockerfile that describes all of the steps to create my custom image from a known base, and then any copy of Docker, running anywhere, can recreate my container environment from a version-controlled script. That is way cool.
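A minimal Dockerfile along those lines might look like the following (a sketch; `myimages:saucy` is the base image built earlier, and the package list mirrors the manual setup described above):

```dockerfile
# Build on the minimal Ubuntu 13.10 image we imported earlier
FROM myimages:saucy

# Install the development tools we previously added by hand
RUN apt-get update && apt-get install -y \
    man nano wget screen python python-pip git

# Set up a project workspace
RUN mkdir -p /projects
WORKDIR /projects

# Drop into a shell by default
CMD ["/bin/bash"]
```

With this checked into version control, `docker build` reproduces the same environment anywhere, instead of relying on a hand-built, committed container.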

Now, the truth is, there are probably some valid reasons why the picture isn’t quite that rosy yet. The Docker people will be the first to tell you it is a very young tool, and in fact they have warnings about this splashed all over the website. There are issues with not running as root in the container. I haven’t been able to get .bashrc to run when launching a container for a non-root user. There are issues running screen due to the way the pseudo-tty is allocated. And if you’re doing a GUI app, then things may be more complicated still. All this stuff is being worked on and improved, and it’s very exciting to think of where this approachable container technology might take us.

23 thoughts on “Python: If you have Docker, do you need virtualenv?”

It remains an interesting question. Virtualenv by design can only isolate Python library dependencies. A Docker container isolates the entire file system. It really takes no more time to spin up a new Docker container and begin installing specific libraries into it than it does to create a new virtualenv. The command is a little bit more verbose, but that’s about it. Last week, for example, I wanted to experiment with a Java package and it took just a moment to fire up a new instance of a base JDK container and start loading stuff up on it. Similarly, I have a base Python development image that I can launch a container from in a moment or two. It’s worth noting that I haven’t done any development in Docker that supports a GUI interface, or even one using curses. But for server-side work I don’t see much reason to prefer virtualenv at this point.

Hi, Jan. I was still on the fence back when I wrote that post. As a relatively new python developer I don’t have a long history of virtualenv in my toolset, so there could certainly be things I’m missing, but having said that I feel like Docker is a better choice at the moment based on my own needs. If you think about that experimentation cycle you alluded to, Docker makes it a lot easier, and really no less fast, to begin with a base environment, add to it, drop back when things don’t work, and snapshot various stages when they do work. And since it includes the complete file system state, and not just the changes made to the python environment, the control you have is ultimately more deterministic, imo. Thanks for your comment.

I agree, Jeff, that keeping the global python package state to a minimum makes good sense. One of the issues I have with relying on virtualenv is that it doesn’t give you any boxing of os-level dependencies. Containers do, but as the article you linked notes, you can’t really turn your desktop into a container yet (although there are some interesting thoughts on running graphical apps in a container here: http://fabiorehm.com/blog/2014/09/11/running-gui-apps-with-docker/), so it makes sense to keep the global python install lightweight.

LXC images are the way to go, and I see little purpose for virtualenv at this point. LXC solves issues that venv solved, and also solves the ‘dev environment doesn’t exactly match prod’ problem.
It’s a no brainer.

Hi, just coming back to the virtualenv question. I’ve always liked to organize things with virtualenv, and I did so when deploying applications with Docker. But I’ve hit some issues. The problem is that my main application needs to be executed as a specific user, but there are other services I also have to start in the container to work with the rest of the system, like consul. That one I have to start as root, and it needs the Python environment to run some health-check commands. So when I start the container as some user, even if I hard-code the virtualenv activation into root’s .bashrc, running commands through sudo never adds my virtualenv to the PATH. So I gave up on virtualenv and am now installing everything in the global environment again.

Thanks for the comment, Andre. When I wrote the post I had in mind the initial development scenario, where we use virtualenv locally to manage our python environments. In that scenario a docker container with volume mounts can provide the same ability to isolate the changes to the system that are needed to support the application. It has the further advantages of also isolating the system from changes made by aptitude and other package management layers, and of being directly deployable in the production environment. I can’t think of any good reason to deploy a virtualenv inside a docker container in production. The docker image completely defines the state of the file system when the container is launched from it, and that state is completely destroyed when the container is shut down and removed. I don’t see what virtualenv brings to the party. Just my two cents worth. Thanks for stopping by.

Rapidly approaching 2017, and this blog post couldn’t be more relevant. As a developer who has recently learned Docker and is staring painfully at a messy global space, I only wish I had been exposed to this technology much, much sooner.

Excellent write-up and while the counter arguments are very compelling, I really see no application for virtualenv in today’s Python developer toolkit. Docker is far too advanced, stable, and portable at this point.

Hi, Bob. Thanks for stopping by. Wow, so much has changed in the three years since this post was created. I should probably write an update to it. One of the most fundamental changes is that I no longer think of Docker containers as creating an environment in which to do development. I didn’t understand them very well back then. Today I view a container as a runtime environment for a single process. I still think there is no need to activate a virtualenv inside a container; it’s redundant in that context. When doing Python, the way I work most often now is to develop locally using virtualenv and then package into a container (building off the same requirements.txt) for deployment. When working locally with virtualenv the issue of dependent system packages still exists, but I haven’t found Docker to be a good solution for that. I think a better solution is either to run dev in a VM or use Nix so you can easily move between configurations.

As someone who was installing to global space and relying upon Docker, the problem with relying strictly upon Docker is you still need to install packages when using IDE code helpers (completions and the like). The points made in Hynek’s article seem pretty valid to me, both for the development (IDE) environment as well as any staging type environments and probably even production. Nearly all of my projects stay in the dev and staging space simply because they are more about functionally testing other projects rather than offering direct services to the public. There hasn’t been an issue yet, but I’m starting to explore *env to hopefully avoid issues in the future.

So instead of creating a virtual *python* environment you create a virtual *os* environment, with all the overhead that entails. While we’re bloating everything, why not spin up a full blown VM while we’re at it?

It’s not at all a virtual OS environment. It’s a process, with cgroup limits on RAM and CPU, in a kernel namespace with a virtual ethernet adapter and a mounted portable file system, along with a little bit of scaffolding for logging, process lifecycle management, etc. The mounted filesystem is obviously the thing that competes with a virtualenv. Try spinning up a VirtualBox VM and then compare the start time to ‘docker run’ on a base Ubuntu image, as an example.


About the Author

Mark is a programmer and system architect with nearly two decades of experience. He typically works with small teams of talented developers on next-generation responsive web and mobile applications. Over the years he has built production systems in BASIC, Pascal, C, C++, C#, Python, Java, and javascript ... (more)