17:53,
June 12, 2012

Recently, I’ve been playing with message queue-based interprocess communication
in Python. I’ve got an application idea which uses a client / worker-pool
distributed processing concept, and wanted to test out a few of the options
available.

Yes, I know that ZooKeeper isn’t really an MQ, but it’s pretty easy to implement
one with it.

0MQ

0MQ is exactly what they say: “sockets on steroids”. There’s no 0MQ “daemon”,
no extra processes, just sockets and a library to talk via them. This is both
good and bad.

Without a daemon, there's no extra point of failure to rely upon, and you get
full control over your queues and how they're used. But you also don't get
clustering unless you build it yourself, no administration tools unless you
write them yourself, no access control unless you implement it yourself, &c.

Working with 0MQ

It took me a while to figure out how many queues I needed for my concept and
how to route them. I designed separate client and worker processes, with a
single broker to manage the worker pool. The setup was a bit complex: the
client has an output to the broker's input queue and an input queue for
responses from workers; the broker has a single input queue, with an output to
each of the workers in the pool; and each worker has an input queue of its own.

Actual code for 0MQ is reasonably simple. I ended up making a few objects to
wrap the low-level stuff, but it's not like 0MQ is overly verbose. There are a
lot of different socket types, all of which behave slightly differently. I
ended up using PULL for the worker and client inputs, PUSH for client ->
broker, and PULL into the broker with PUSH out of the broker.
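
A minimal sketch of the PUSH/PULL pairing, assuming the pyzmq bindings; the
inproc address and message are illustrative, and a real deployment would use
TCP endpoints with the broker and workers in separate processes:

```python
import zmq

def push_pull_demo():
    """Send one message from a PUSH socket to a PULL socket in-process."""
    ctx = zmq.Context.instance()

    # Broker-style input: a PULL socket fair-queues incoming messages.
    pull = ctx.socket(zmq.PULL)
    pull.bind("inproc://jobs")  # inproc requires bind before connect

    # Client-style output: PUSH round-robins across connected PULL peers.
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://jobs")

    push.send(b"do-work")
    msg = pull.recv()

    push.close()
    pull.close()
    return msg
```

In the full topology, the broker would bind a PULL socket for client traffic
and a PUSH socket per worker, with each worker connecting its own PULL input.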

Results

Decent, but a bit limited. It's not very robust unless you want to do your own
heartbeat work, and it's a little hard to tell that you've actually connected,
that a message was sent, and that it was definitely received. I probably won't
use it for this.

Celery

Celery is a task/job queue built on message passing, which can use any of a
handful of MQ systems as its backend. I used RabbitMQ via AMQP, because I had
it installed already.

I do like that it uses a backend MQ system, and that there’s a method and
library for integrating it with Pylons (even though
I’ve moved on to Pyramid now).

Working with Celery

Celery is a very high-level library, which uses decorators to indicate task
procedures. You can do a lot with very minimal code.

Results

I ended up stopping work on Celery; I just didn't like how the system works.
For one, I'm not really a fan of decorators that do a lot behind the scenes,
because it's not obvious at a glance what is going on.

Also, there's no clear way to have a callback run when a task is complete; you
can only poll the status of jobs from the caller. You can chain tasks, but the
subtasks will run in the worker process, not the caller, which just isn't
going to be useful for me.

Apache ZooKeeper

As I said, ZooKeeper isn't really an MQ system; it's more of a shared storage
system for maintaining configuration, naming, synchronization, and the like,
but it's easy to build a message queue on it. You just make a node for the
queue, and under it make a sub-node for each message, letting ZooKeeper name
them sequentially. Your worker can then process them in order.
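
The sequential naming is the whole trick: ZooKeeper appends a monotonically
increasing, zero-padded counter to each child node, so lexicographic order is
arrival order. A toy in-memory stand-in of that scheme (no real ZooKeeper
calls; the node names mimic what the sequence flag produces):

```python
class ToyZQueue:
    """In-memory stand-in for a queue of sequential znodes under /queue."""

    def __init__(self):
        self._seq = 0
        self._nodes = {}  # name -> payload, like children of the queue node

    def put(self, payload):
        # Mimics creating "/queue/msg-" with the SEQUENCE flag set:
        # ZooKeeper appends a 10-digit, zero-padded counter to the name.
        name = "msg-%010d" % self._seq
        self._seq += 1
        self._nodes[name] = payload
        return name

    def get(self):
        # A worker lists the children, sorts them lexicographically
        # (arrival order, thanks to the zero padding), and takes the head.
        if not self._nodes:
            return None
        head = min(self._nodes)
        return self._nodes.pop(head)
```

With a real client, put() becomes a create() with the sequence flag and get()
a get-children/sort/delete cycle, with a watch to block until a child appears.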

ZooKeeper is designed from the start to be robust and distributed, which allows
the application developer to not really worry about these things.

Working with ZooKeeper

The Python library included in the ZooKeeper source is a bit low-level, so I
wrote a basic wrapper library to provide object-oriented access to ZNodes,
then built a "ZQueue" object on top of my ZNode class. I found that treating
the ZooKeeper tree as a tree of objects works very well when building higher
level applications.

My method of making a ZNode for the queue and then ZNodes for each message means
ZooKeeper acts as the broker, and there’s no need to write a middle layer.

Results

I like ZooKeeper.

It's not really an MQ, so I probably won't use it as such, but I'm glad I
tried it out; I can think of lots of other uses for it.

Pika and RabbitMQ

Pika is a pure-Python implementation of AMQP 0.9.1, the standard MQ protocol
used by various MQ brokers, including my choice here, RabbitMQ.

RabbitMQ is a robust message broker with a management interface and bindings for
many languages.

Pika is reasonably flexible, not too low-level but not too high. RabbitMQ is
quite full-featured, including a full management interface which lets you
inspect queues, and a full access-control system which is entirely optional.

Working with Pika 0.9.5

The Pika library is a little lower-level than, say, Celery, but it's still
reasonably easy to work with. Blocking work is rather easy, but anything
asynchronous or continuation-passing is a little more complex: you have to
define multiple callbacks. I just created objects for both my client and
worker.

With RabbitMQ acting as the broker, there’s no need to build a broker layer like
we did with 0MQ.

Results

Pika has lots of nice features. You can create named, permanent queues,
temporary unnamed queues, full message exchanges, tag messages with types and
filter on those types, require or discard ACKs, &c. I see lots of
possibilities there.

RabbitMQ servers can be clustered to form a single logical broker, and you can
mirror queues across machines in a cluster to provide high availability.

In fact, Pika/Rabbit seems to have far more features than I’ll actually need for
the project I have in mind - which is good, really.

Conclusion

So far, as should be obvious, I’m going to use Pika and RabbitMQ. I’ve only
been playing with the combo for a day, so expect more to come here.