Running long processes in Django

My original issue was that a piece of code in my Django project was taking far too long to run, sometimes long enough to cause a page load timeout. The situation comes up very infrequently (at most once a week), so it was just a matter of getting the process to run successfully. Along the way, though, I learned a lot about spawning processes, distributed computing, and the different options we have in the Python community. All of the approaches below are, of course, just ways to get the processing done outside the Django process.

cron and queue

cron
I’ll start with the commonly recommended way of taking care of this issue. You can set up a cron job that runs every minute, checks whether there’s anything to process, and then runs the appropriate code. Cron jobs are pretty simple to set up and pretty effective. All you need to do is edit the crontab with the time intervals at which you want the job to run, and your code takes care of the rest.
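As a rough sketch of what the cron side could look like (not the code I actually ran), here is a small function that looks for pending work in a database table and processes it. The `jobs` table, its columns, and the `do_work` placeholder are all assumptions made up for illustration, and SQLite stands in for whatever database your Django project uses.

```python
import sqlite3


def ensure_schema(conn):
    """Create the hypothetical jobs table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )


def do_work(payload):
    """Placeholder for the slow code that used to run inside the request."""
    return payload.upper()


def process_pending(conn):
    """Run every pending job once, mark it done, and return how many ran."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        do_work(payload)
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    conn.commit()
    return len(rows)
```

A crontab entry along the lines of `* * * * * python /path/to/run_jobs.py` (the path is hypothetical) would then call `process_pending` once a minute from a small script that opens the database connection.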

django-queue-service
The cron part of this solution takes care of when the processing happens, but what handles why it happens? For that you’ll need some way to know when there is processing to be done. There are of course multiple ways to handle this: update a table in your database, update a file or a folder… One way is to use django-queue-service. This method requires you to run the queue service as another Django instance and then make requests to it. The sample code from the project’s page looks like this:

While this method makes the most use of Django of all the methods I’ll discuss, I really have problems with it. It’s heavy-handed, unnecessary, and I can’t even tell whether there are security concerns. Let’s say that someone set this method up improperly and exposed this Django instance to the outside world…

queue: Python module
There was a Python module I came across, which I can’t seem to find again. When I find it I’ll post a link. It was a file-based queue: it added files to folders, with five different folders based on the status of the queue item (ready, active, complete…). In my opinion this was a better way to handle the queue than django-queue-service, but it still seemed a bit uncomfortable.
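Since I can’t point to the module itself, here is a minimal sketch of the general idea of a folder-per-status queue. It is my own illustration, not that module’s API: the function names are made up, and it only uses the three statuses I remember (ready, active, complete).

```python
import os
import uuid

# Only the three statuses named above; the original module had five.
STATES = ("ready", "active", "complete")


def init_queue(root):
    """Create one folder per status under root."""
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)


def enqueue(root, payload):
    """Write payload to a new file in ready/ and return the item name."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(root, name + ".tmp")
    with open(tmp_path, "w") as handle:
        handle.write(payload)
    # Rename into place so a worker never sees a half-written file.
    os.rename(tmp_path, os.path.join(root, "ready", name))
    return name


def claim(root):
    """Move one item from ready/ to active/; return its name or None."""
    ready_dir = os.path.join(root, "ready")
    for name in sorted(os.listdir(ready_dir)):
        try:
            os.rename(os.path.join(ready_dir, name),
                      os.path.join(root, "active", name))
        except FileNotFoundError:
            continue  # another worker claimed this item first
        return name
    return None


def complete(root, name):
    """Move a finished item from active/ to complete/."""
    os.rename(os.path.join(root, "active", name),
              os.path.join(root, "complete", name))
```

The status of an item is simply which folder its file lives in, so moving the file is the state change. Items are claimed in file-name order here, not in the order they were added.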

Asynchronous Messaging in Python

The most natural approach, or at least what someone in my ADD generation wants, is to get things done on demand. Why wait a minute or ten minutes for the cron job to call my code? Why can’t my code run when I want it to? This approach makes more sense to me, and the reasons to stay away from it (namely complexity and processing power) go totally out the window (I think), since the other approaches are really not any easier than the approach I ended up with (which I’m actually really happy with).

When considering asynchronous communication the natural choices are the following:

Pyro: Very light, very simple to set up. In a word, perfect. Pyro fit the bill for what I was looking for. The thing I liked best about this library is that it’s native to Python, so my objects are sent and received as if they were real Python objects. There’s no analyzing or manipulating of data, which takes all the fuss out of this type of interaction. Very cool stuff!

XML-RPC: A very strong contender and something I will probably run into in the near future. XML-RPC is very welcome in Python and has a couple of different implementations. Seems like a very sane choice for this type of messaging.

Twisted: A very dependable project, whose only issue for this task was that it seemed too complicated to set up for the job at hand.

Corba: Corba’s been around for a while, and it can communicate with almost every language under the sun, but I don’t really need that kind of power. Also, since it’s not native to Python, there’d be a lot of translation going on.
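To give a feel for how little is involved in this kind of messaging, here is a small sketch of the XML-RPC option using only the Python standard library (`xmlrpc.server` and `xmlrpc.client`). It is not the Pyro code I ended up with. The `long_task` function is a hypothetical stand-in for the slow job, and the server runs in a background thread only to keep the example self-contained; in practice it would be a separate process.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def long_task(text):
    """Hypothetical stand-in for the slow job."""
    return text[::-1]


def start_server(host="127.0.0.1", port=0):
    """Start an XML-RPC server in a background thread.

    Port 0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(long_task, "long_task")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def call_long_task(server, text):
    """Call long_task on the server through an XML-RPC proxy."""
    host, port = server.server_address
    with ServerProxy("http://%s:%d/" % (host, port)) as proxy:
        return proxy.long_task(text)
```

Note that the caller still waits for the result here. The point is only that the work happens in another process, so the Django side can hand it off.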

Pyro
In the end, as mentioned above, I went with Pyro, and I am so pleased with the results. If you’ve never looked at Pyro code before, I think you’ll be very surprised by how little code I needed to get this all working.

Firstly, thank you for this interesting post. I would like to know whether it is possible to set up a Django project with RabbitMQ AND Pyro: using Django to build the entire project (with views, database…), RabbitMQ to encapsulate data in queues, and Pyro to synchronize communication, since it already uses the pickle protocol?

Lior, ya, when I wrote this post I don’t think Celery was out yet, and I hadn’t heard of greenlet or gevent… Celery looks pretty cool; I haven’t had a chance to play with it yet. Gevent seems a bit heavy-handed for these purposes…