Capistrano & EC2 Sitting in a Tree, K I S S I N G

I am using EC2 to host my soon-to-launch Quizical.net application. It's great.

I use Capistrano to manage these EC2 instances, and with a handful of custom tasks I've automated many sequences of EC2 commands into simple rake tasks.

For instance, to launch an instance, I can type…

rake ec2:run id=ami-61a54008

…and a minute or two later I have a new instance running.
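Under the hood, a task like this can be little more than a wrapper around Amazon's command-line tools. Here is a minimal sketch; the ec2_run_command helper and the hard-coded keypair name are illustrative assumptions, and the real task would pull the keypair from aws.yml and actually execute the command with sh:

```ruby
require 'rake'
include Rake::DSL  # make rake's task/namespace DSL available in a plain script

# Hypothetical helper: build the ec2-run-instances command for a given AMI.
def ec2_run_command(ami, keypair)
  "ec2-run-instances #{ami} -k #{keypair}"
end

namespace :ec2 do
  task :run do
    # A real task would call: sh ec2_run_command(ENV['id'], AWS['ec2_keypair_name'])
    puts ec2_run_command(ENV['id'], 'rails-server')
  end
end
```

Invoked as `rake ec2:run id=ami-61a54008`, the `id` parameter arrives via ENV.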

Then to install my rails app, I type…

cap initial_install

…which:

patches this instance with things I need;
starts my LiteSpeed web server;
installs my app from Subversion;
creates my databases;
writes my database.yml;
runs my migrations;
imports my database from S3;
and restarts my server.

So in a few minutes, I’ve got my app running on a newly commissioned server! Awesome.

I used to dread bundling my instances (that is, saving an instance with all its changes so I can re-use it later). I'd have to look up how to do it in the API, paste in my secret keys, wait until bundling finished, then upload it, then register it. It took a while. Now I can bundle, upload, and register with one command:

rake ec2:complete_bundle
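The chaining itself is easy to sketch with rake prerequisites. In this hedged example the three sub-tasks only record that they ran; the real versions would shell out to ec2-bundle-vol, ec2-upload-bundle, and ec2-register:

```ruby
require 'rake'
include Rake::DSL  # make rake's task/namespace DSL available in a plain script

STEPS = []  # records execution order for this sketch

namespace :ec2 do
  task(:bundle)   { STEPS << :bundle }    # real task: sh "ec2-bundle-vol ..."
  task(:upload)   { STEPS << :upload }    # real task: sh "ec2-upload-bundle -b <image_bucket> ..."
  task(:register) { STEPS << :register }  # real task: sh "ec2-register <image_bucket>/image.manifest.xml"

  # Prerequisites run in the order listed, so one command does all three.
  task :complete_bundle => [:bundle, :upload, :register]
end

Rake::Task['ec2:complete_bundle'].invoke
```

Rake resolves the prerequisite names inside the ec2 namespace, so `rake ec2:complete_bundle` runs bundle, upload, and register in sequence.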

Here are the three files I use to do this.

aws.yml – this is where I store all of the data needed by Amazon's web services for EC2 and S3, such as my access and secret keys. This file goes in your config directory. Since I use this data in multiple places (ec2.rake, deploy.rb, and my s3_cache library), I keep it in this central location. It looks like:

aws_access_key: 'XXXXXXXXXXX'
aws_secret_access_key: 'x+XXXXXXXXXXXXXXXXXX'
aws_account: '84441XXXXXXX'
image_bucket: "steveodom_ec2_images"
ec2_id_rsa: '~/Documents/Projects/ec2/auth/id_rsa-rails-server'
ec2_keypair_name: "rails-server"
primary_instance_url: 'domU-12-31-34-00-00-6A.usma2.compute.amazonaws.com'
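Loading these settings is a one-liner with Ruby's YAML library. The sketch below writes a stand-in file to a temp directory so it is self-contained; in the app you would load File.join(RAILS_ROOT, 'config', 'aws.yml') directly:

```ruby
require 'yaml'
require 'tmpdir'

# Write a stand-in aws.yml so this sketch runs anywhere; in the app the
# file already exists under config/.
path = File.join(Dir.mktmpdir, 'aws.yml')
File.write(path, <<~YAML)
  aws_access_key: 'XXXXXXXXXXX'
  image_bucket: "steveodom_ec2_images"
  ec2_keypair_name: "rails-server"
YAML

# Every consumer (ec2.rake, deploy.rb, s3_cache) loads the same hash.
AWS = YAML.load_file(path)
puts AWS['image_bucket']  # => steveodom_ec2_images
```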

ec2.rake – this file goes in the lib/tasks directory of your application. It contains the EC2 rake tasks (run, bundle, upload, register, and so on).

deploy.rb – this is my Capistrano deploy file. It calls tasks from ec2.rake as well as tasks from Adam Green's s3.rake library, so you will need both s3.rake and ec2.rake in your lib/tasks folder. Some of my tasks here are:

patch_server – Any time you change something on an EC2 instance, unless you re-bundle and register that change at EC2, your changes are not saved the next time you run that instance. I put all my changes in this script, so the next time I run an instance I can call this task and it gets my server the way I want it. If this task starts getting too long, I'll re-bundle and register my image.

create_database – If I bundle an image that already has databases in it, I lose the flexibility to use that image for another web app. So I use this task to create my databases after the instance is already running.

write_database_yaml – writes the config/database.yml file on the new instance.

backup_db – uses Adam's s3.rake library.

import_db – uses Adam's s3.rake library.
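As an illustration of the write_database_yaml idea, here is a hedged sketch of a helper that builds the database.yml contents. The adapter, username, and host values are placeholder assumptions, and the real task would write the result to config/database.yml on the instance:

```ruby
require 'yaml'

# Hypothetical helper: build database.yml contents for a given app name.
# Adapter, username, and host are placeholder assumptions.
def database_yaml(app, password)
  { 'production' => {
      'adapter'  => 'mysql',
      'database' => "#{app}_production",
      'username' => 'root',
      'password' => password,
      'host'     => 'localhost' } }.to_yaml
end

puts database_yaml('quizical', 'secret')
```

A Capistrano task would then `put` this string to the remote config directory.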

Next I bundle tasks together, such as initial_install, which I run right after launching an instance. It patches the instance with the latest changes, starts my server, then sets up my Rails application (creates the databases, writes database.yml, runs all my migrations, imports the latest version of my database from S3, then restarts).
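The composition can be sketched in plain Ruby. The two-method stand-in below replaces Capistrano's recipe DSL so the example runs on its own; the step names follow the post, and each step here is a no-op that just records its name:

```ruby
# Minimal stand-in for Capistrano's recipe DSL; in a real deploy.rb,
# `task` comes from Capistrano and each step runs remote commands over SSH.
TASKS = {}
def task(name, &body); TASKS[name] = body; end
def invoke(name);      TASKS[name].call;   end

RAN = []
steps = %i[patch_server start_server install_app create_database
           write_database_yaml migrate import_db restart]
steps.each { |t| task(t) { RAN << t } }  # real tasks would do the actual work

# initial_install just chains every setup step, in order.
task :initial_install do
  steps.each { |t| invoke(t) }
end

invoke :initial_install
```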

I’ve found Capistrano with these tasks to be perfect for managing my EC2 instances. I hope that others can find them useful too and add to them.

EC2 is $72 a month plus modest bandwidth charges. It's more expensive than a VPS, sure. What I like about EC2 is the flexibility: I can commission and decommission servers at will. I can create a staging server with my app on it in just a few minutes, test my latest release on it, and if all is well, terminate the staging server having paid only $0.15 or so. I can add new servers when I need the capacity (though that reason is still a pipe dream).

And I think the impermanence fear around EC2 makes you think harder about backups and redundancy.

Since you do an S3 backup every hour, do you ever worry about your EC2 image going down somewhere in between backups? Am I missing something, or when you talk about redundancies are you running multiple images? Any light you can shed on the impermanence of EC2 would be greatly appreciated, as this is one reason I'm a bit reluctant to go full steam into EC2/S3 as a solution right now.

Yep, impermanence is the primary obstacle for many considering EC2. There's something about the 'virtual' aspect of it that puts the spotlight on impermanence more than dedicated solutions do. I did not think as much about backups and failures on a dedicated box as I do with EC2, though my dedicated box was probably just as likely to fail. I think that is actually a benefit of EC2: the emotional trigger of 'virtual' forces you to design for failure.

Quizical.net is not an ecommerce site where, if my instance goes down, I'll be losing orders – or medical records – or even photos. I can get away with hourly backups, which I will start moving closer together as traffic builds. If traffic builds, I'll probably set up a master/slave database on multiple instances. This project also holds some interest: http://www.openfount.com/blog/s3infidisk-for-ec2

There has also been speculation that Amazon is going to come out with a virtual database solution – the potential third leg of their infrastructure on demand solutions.

Nice.
That's $72 for EC2 without bandwidth and storage? I mean, that's an extra $0.15/GB/month?
I personally can't justify using EC2 as web servers. Testing I do at home or on development machines (maybe I'm lucky that I have access to some servers).
I thought about using S3 for storage as SmugMug does, but I concluded that it's expensive too. A private server with 4x500GB HDDs is $450/mo, and I don't pay for the bandwidth to access the data.

I think those of us who have chosen to use EC2 and S3 have thought about it pretty hard, Piku. We get the bills and we know what others charge. The flexibility of EC2 is just great, though, and I am willing to pay for that.

Steve, have you thought about creating and making publicly available an image geared specifically towards running Rails apps, stripping out some of the intermediate steps for people who want to get up and running with Rails as fast as possible?

If you use EC2 for hosting and S3 for storage, then bandwidth between the two services is free and really fast (since they are most likely in the same datacenter). You only pay for bandwidth to and from your users.

How do you handle multiple EC2 instances each having its own MySQL database? Won't sessions be lost if a user is bounced to a different instance between requests? It would seem to me that you would need a database server instance running, and each application server instance would access that same database server… or am I missing something?

Could you tell us a little bit about how you create and tweak your AMIs? I use Windows as my development machine and test my deployments on virtualized Ubuntu boxes running under VMware. I'm at the stage now where I'd like to try deploying to EC2 and play with clustering a bit, but I'm wondering: what's the best way to convert my virtualized Ubuntu machines into an EC2 AMI? Any tips?

Maybe I'm just thinking about this wrong. The Elastic Compute Cloud Walkthrough post was very informative, and similar to what I found in the EC2 Developers Guide (http://docs.amazonwebservices.com/AmazonEC2/dg/2006-06-26/). Both guides explain clearly how to build a new AMI, either based on an existing image or entirely from scratch, which is great. The next step according to these guides would be to upload, register, and run these instances on EC2. OK, also great – if you plan to use EC2 as your development and staging environment. I could do that, I suppose. The thing is, as someone pretty new to the *nix admin world, I've just gotten used to creating, cloning, and tweaking my *nix staging deployments in VMware. It seems so easy to create my own virtual networks of machines, with everything running on my notebook and my development server. Once I'm satisfied that my deployment is working as I want on my virtualized network, what I'd really like is a way to convert my VMware images into AMIs ready to be loaded and run on EC2.

I guess what I was asking originally was whether you and others who are using EC2 basically use EC2 as your development and staging platform. Do you fire up one of your customized AMIs on EC2, make some changes, maybe via Capistrano, run some tests, then persist it back to S3 when the development day is done? Or do you develop with most everything running locally, and only deploy to EC2 when you're ready to release your current iteration?

Good work on what you have built. We currently run our entire web site on EC2/S3. We have about a million downloads a month and growing. We had 16TB of data sent out from EC2 last month.

We do the same thing as you, except we do MySQL dumps to S3 every 2 hours. The main reason I am commenting is to warn you away from s3infinidisk. We purchased 2 copies of s3dfs (infinidisk) in the spring and the support was horrible. I gave the owner many bugs, and he always said fixes were coming, but then nothing. Eventually he just stopped responding to me. $2K down the drain.
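For reference, a dump-to-S3 backup like the one described might look like this as a cron entry (s3cmd, the database name, and the bucket are assumptions; the commenter's actual setup may differ):

```
# Every 2 hours: dump MySQL, compress, and push to S3.
# s3cmd, the database name, and the bucket are placeholders.
0 */2 * * * mysqldump -u root myapp_production | gzip > /tmp/db.sql.gz && s3cmd put /tmp/db.sql.gz s3://my-backup-bucket/db-`date +\%H`.sql.gz
```

Note that `%` must be escaped as `\%` inside a crontab line.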

We have done price comparisons on what we are saving not using a co-lo and I have to say we are doing great.

I have just started playing with EC2 recently, and the way I am handling the impermanence issue is by creating a persistent EBS volume. Storing data this way actually seems a bit cheaper than S3, at $0.10 per GB-month vs. $0.15 for S3.

I then attach the EBS volume to my instance, create an ext3 filesystem on it, and mount it at /vol.

I moved my Apache document root and the sites-enabled and sites-available directories to /vol/www, where they live alongside the site directories. All I needed to change in apache2.conf was the Include line for sites-enabled, from /etc/apache2/sites-enabled to /vol/www/sites-enabled.

I then moved my PostgreSQL data directory to /vol/data, updating postgresql.conf to reflect the new data directory location. I am almost certain that doing the same with MySQL is very similar.
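The steps above, using the old EC2 API tools, might look roughly like this (the volume ID, instance ID, size, zone, and device name are placeholders; yours will differ):

```
# Create a 10 GB volume in the same availability zone as the instance,
# then attach it as /dev/sdh:
ec2-create-volume -s 10 -z us-east-1a
ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdh

# On the instance: make an ext3 filesystem and mount it at /vol
mkfs.ext3 /dev/sdh
mkdir -p /vol
mount /dev/sdh /vol
```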

About Steve

Steve's hobby is creating Rails applications that no one visits – sites like smarkets.net, trivionomy.com, and quizical.net. He's also created a couple of Rails plugins: s3cache is one, and elasticrails.com is the other. One day he hopes to create something popular so he can quit his day job.
He can be reached at steve.odom at gmail.com