BOSH Cloud

We all know that ejabberd is cool and Erlang is extremely scalable. Facebook even decided to use Erlang for their new chat system! Even with all of that, I want to write about an alternative. :) I am not going to compare numbers and benchmarks, so this post has no data or statistics. I am just going to describe a system that's implemented, in production, and works well. It is the story of the BOSH Cloud.

First, XMPP is a standard protocol for presence-based applications, namely instant messaging. It is growing in use, even as I type this blog entry. Web-based chat can be done using XMPP with some help. The help comes from an extension to the XMPP standard called BOSH. BOSH solves the problem of implementing something that requires state over a stateless protocol like HTTP. (ejabberd has a BOSH implementation.) Anyway, the question is, 'What is a BOSH Cloud?'
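To make "state over a stateless protocol" a bit more concrete, here is a minimal sketch of the XML bodies a BOSH client POSTs to a connection manager. The attribute names come from the BOSH spec (XEP-0124); the function names, the example domain, and the default values are my own assumptions for illustration. The session ID (sid) and the incrementing request ID (rid) are what tie separate HTTP requests together into one session.

```python
import random
import xml.etree.ElementTree as ET

BOSH_XMLNS = "http://jabber.org/protocol/httpbind"


def session_request(domain, rid=None, wait=60, hold=1):
    """Build the <body/> that asks a connection manager for a new session."""
    if rid is None:
        # The spec asks for a large random starting request ID.
        rid = random.randint(1, 2**32)
    body = ET.Element("body", {
        "xmlns": BOSH_XMLNS,
        "content": "text/xml; charset=utf-8",
        "to": domain,
        "rid": str(rid),
        "wait": str(wait),   # longest time (seconds) the server may hold a request
        "hold": str(hold),   # number of requests the server may keep open
        "ver": "1.6",
    })
    return ET.tostring(body, encoding="unicode")


def next_request(sid, previous_rid):
    """Build a follow-up <body/>: same session ID, request ID incremented by one."""
    body = ET.Element("body", {
        "xmlns": BOSH_XMLNS,
        "sid": sid,
        "rid": str(previous_rid + 1),
    })
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    print(session_request("example.com", rid=1000))
    print(next_request("some-session-id", 1000))
```

The connection manager answers the first request with a sid, and every later request has to carry it. That is the state that lives inside one BOSH process.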

The answer is that it is a set of Amazon EC2 instances used to provide a scalable BOSH connection manager. You have an HTTP load balancer up front, and on the back end you can create as many BOSH instances as your scaling needs require.

What we have here is a recipe to implement a BOSH connection manager and do simple round-robin scaling. Nginx provides the load balancing and Punjab provides the BOSH implementation. To scale, you run them on Amazon's Elastic Compute Cloud.

To start, you will need an Amazon Web Services developer account and to know a bit about EC2.

This implementation starts with a basic Debian image. Installed on this image are Python, Twisted, Nginx, and Punjab. There is plenty of good documentation on how to build an EC2 image on Amazon's web site. Once we have the applications and their dependencies built, we will need to configure them before we build a new image.

Let's start with what is up front, Nginx. This will be our simple round-robin load balancer. The configuration values will vary based on your needs.
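Here is a minimal sketch of what such a configuration could look like, rendered from Python so the list of back ends is easy to change. The upstream name punjab, the port 5280, the /http-bind path, and the timeout value are assumptions on my part, so adjust them to match how your Punjab is set up. The nginx directives themselves (upstream, ip_hash, proxy_pass, proxy_buffering, proxy_read_timeout) are standard.

```python
def render_nginx_conf(backends, server_name="www.yourdomain.com", port=5280):
    """Return an nginx.conf that balances BOSH requests across the given hosts."""
    # One "server host:port;" line per Punjab instance.
    servers = "\n".join(f"        server {host}:{port};" for host in backends)
    return f"""\
events {{
    worker_connections 1024;
}}

http {{
    upstream punjab {{
        ip_hash;
{servers}
    }}

    server {{
        listen 80;
        server_name {server_name};

        location /http-bind {{
            proxy_pass http://punjab;
            proxy_buffering off;
            proxy_read_timeout 120s;
        }}
    }}
}}
"""


if __name__ == "__main__":
    print(render_nginx_conf(["punjab0", "punjab1", "punjab2", "punjab3"]))
```

The read timeout is set longer than nginx's 60 second default because BOSH requests are long polls, and the proxy should not give up before the connection manager answers.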

Nginx could also serve static files or your chat client. Nginx has a very small footprint and is fast. The main thing about the configuration is the nginx load balancing. NOTE: you will NEED the ip_hash directive so that requests from the same IP always go to the same Punjab. Otherwise a client's requests can land on a Punjab that does not hold its session, and it will be disconnected.
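To see why plain round robin breaks BOSH sessions and sticky routing does not, here is a small Python illustration. It is not nginx's actual code: the crc32 hash is a stand-in I picked, and the backend names are made up. The one detail taken from the nginx documentation is that ip_hash keys on the first three octets of an IPv4 address.

```python
import itertools
import zlib

BACKENDS = ["punjab0", "punjab1", "punjab2", "punjab3"]


def round_robin(backends):
    """Plain round robin: every request goes to the next backend in turn."""
    return itertools.cycle(backends)


def sticky_backend(client_ip, backends):
    """Pick a backend from the client address, so one client keeps one backend."""
    # Key on the first three octets, as nginx documents for ip_hash on IPv4.
    key = ".".join(client_ip.split(".")[:3])
    return backends[zlib.crc32(key.encode("ascii")) % len(backends)]


if __name__ == "__main__":
    rr = round_robin(BACKENDS)
    # Two consecutive requests from one client hit two different Punjabs.
    print("round robin:", next(rr), next(rr))
    # With hashing, the same client lands on the same Punjab every time.
    print("sticky:", sticky_backend("203.0.113.7", BACKENDS),
          sticky_backend("203.0.113.7", BACKENDS))
```

The trade-off is that clients behind the same address block share a back end, so the load is not perfectly even.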

Next we configure Punjab with the basic .tac file that comes with Punjab.

Now that those are configured, you can build your image. Once you have the images, the magic can happen. :)

So first we start an instance from the image and run nginx. (AMI_ID here stands for the ID of the image you just built.)

ec2-run-instances "$AMI_ID" -d "hostname=www"

This will start up one instance. If a boot script on the image processes the user data, it will take the hostname given there and create www.yourdomain.com, which will load balance to the Punjab instances.
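As a rough sketch of the boot-time step, the following Python reads the user data and works out the full host name. EC2 serves user data at the metadata URL shown. The key=value parsing, the function names, and the domain are assumptions for illustration, and how the name actually gets registered (DNS update, hosts file, and so on) is left open here.

```python
from urllib.request import urlopen

# EC2 exposes the user data passed at launch on this link-local address.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url=USER_DATA_URL, timeout=2):
    """Fetch the raw user data string (only works from inside an instance)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_user_data(text):
    """Parse lines of key=value pairs, e.g. 'hostname=www', into a dict."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def fqdn(user_data, domain="yourdomain.com"):
    """Combine the hostname from the user data with the (assumed) domain."""
    return f"{parse_user_data(user_data)['hostname']}.{domain}"


if __name__ == "__main__":
    # On a real instance you would call fetch_user_data() instead.
    print(fqdn("hostname=www"))
```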

You can have nginx start up on boot with the above configuration and you are almost there.

Start up the Punjab instances.

# AMI_ID is the ID of the image you built
for i in 0 1 2 3; do
    ec2-run-instances "$AMI_ID" -d "hostname=punjab$i"
done

This will start up four Punjab instances on Amazon, and if your server start-up scripts run Punjab on boot, you will be up and running. You can scale as you like by starting up new instances or taking down old ones. You can have a script reconfigure nginx and then HUP it to add the new instances to the load-balancing pool.
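A reconfigure-and-HUP script could look roughly like this. It assumes the upstream block lives in its own file that the main nginx.conf pulls in with an include directive inside the http block. The file path, the pid file location, the upstream name, and the port are all assumptions. The part that is standard is that the nginx master process reloads its configuration when it receives SIGHUP.

```python
import os
import signal
import tempfile


def render_upstream(backends, name="punjab", port=5280):
    """Render only the upstream block for the current set of Punjab hosts."""
    lines = [f"upstream {name} {{", "    ip_hash;"]
    lines += [f"    server {host}:{port};" for host in backends]
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(path) or "."
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.write(text)
    os.replace(tmp.name, path)


def hup_nginx(pid_file="/var/run/nginx.pid"):
    """Send SIGHUP to the nginx master process so it rereads its config."""
    with open(pid_file) as fh:
        os.kill(int(fh.read().strip()), signal.SIGHUP)


def update_pool(backends, conf_path="/etc/nginx/conf.d/punjab_upstream.conf"):
    """Rewrite the pool definition and ask nginx to pick it up."""
    write_atomically(conf_path, render_upstream(backends))
    hup_nginx()


if __name__ == "__main__":
    print(render_upstream(["punjab0", "punjab1", "punjab2", "punjab3", "punjab4"]))
```

Calling update_pool with the new host list after each launch or shutdown keeps nginx in step with the running instances. Keep in mind that with ip_hash, changing the list can move some clients to a different Punjab.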

I will note that there are things left out, namely the JavaScript BOSH client. You are welcome to leave questions in the comments or leave them as exercises for your enjoyment. :) That is it for a basic BOSH setup using Amazon EC2 and Python, or a "BOSH Cloud". You can also do this with ejabberd and Erlang.