How to publish code to Amazon EC2?

We're looking to switch to Amazon EC2, and I'm racking my brain trying to understand how we'll get there. One of the tenets of EC2 is that you have many instances that can and will fail at any time, to be replaced by new ones. While I can see how this is beneficial, especially with regard to scalability, it also introduces some new challenges.

So my question is, how do you push code in such an environment?

Right now our code publish script does an hg archive (we use Mercurial rather than Git) to get a clean copy of the code out of the repository, and then rsyncs it to our single web server. We do the same for our "Scripts" codebase on our single database server. Additionally, we have a "shared" codebase which, whenever updated, gets published to both.
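
For context, here's a rough sketch of that publish flow in Python; the revision, archive directory, and server/path names are made up for illustration:

```python
#!/usr/bin/env python
"""Rough sketch of the publish flow described above. The revision,
archive directory, and server/path names are hypothetical."""
import shutil
import subprocess

REV = "stable"                   # branch or changeset to publish
ARCHIVE_DIR = "/tmp/deploy"      # clean export target (hypothetical)
WEB_SERVER = "web1.example.com"  # the single web server (hypothetical)

# hg archive writes an unversioned snapshot of the given revision
shutil.rmtree(ARCHIVE_DIR, ignore_errors=True)
subprocess.run(["hg", "archive", "-r", REV, ARCHIVE_DIR], check=True)

# rsync the snapshot into the web server's docroot (hypothetical path)
subprocess.run(
    ["rsync", "-az", "--delete", ARCHIVE_DIR + "/",
     WEB_SERVER + ":/var/www/app/"],
    check=True,
)
```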

This works great when you have one web server, but I'm trying to think of how to manage it when you have 2, 4, or 30 web servers. My first thought was to just put Mercurial on each EC2 instance and pull the latest changes whenever the server comes online (a sketch of that boot-time pull follows the list), but there are three problems with this:

1. We have several large files (such as 300MB binaries) on the web server which are not under version control. I've always been told it's a bad idea to put big files into the repo, so what other solution is there?

2. We also have several files uploaded directly to the server, such as graphs or pictures that go along with a particular blog post. These obviously do not belong in the repo regardless of size, so how do you handle that in an EC2 environment?

3. We don't necessarily want the servers to always be running the latest changeset on our stable branch. Sometimes we publish code to 'stable' in preparation for a disruptive upgrade that evening, and we wouldn't want that code out there early. More rarely, we may need to roll back to an older version if something goes wrong.
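
For reference, here's a minimal sketch of the boot-time pull I'm imagining, pinned to an explicit changeset rather than the tip of stable, which would at least mitigate problem 3 (the repo URL, checkout path, and revision source are all hypothetical):

```python
#!/usr/bin/env python
"""Sketch of a boot-time pull pinned to a known-good changeset. The repo
URL, checkout path, and revision source are all hypothetical."""
import os
import subprocess

REPO_URL = "https://hg.example.com/app"  # hypothetical
CHECKOUT = "/var/www/app"                # hypothetical
# Pinning to an explicit changeset (instead of the tip of stable) means a
# push to stable doesn't go live until this value is changed, and rolling
# back is just pointing it at an older revision.
PINNED_REV = os.environ.get("DEPLOY_REV", "stable")

if not os.path.isdir(os.path.join(CHECKOUT, ".hg")):
    # -U clones without updating the working copy; we update explicitly below
    subprocess.run(["hg", "clone", "-U", REPO_URL, CHECKOUT], check=True)

subprocess.run(["hg", "pull"], cwd=CHECKOUT, check=True)
subprocess.run(["hg", "update", "--clean", "-r", PINNED_REV],
               cwd=CHECKOUT, check=True)
```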

This is a situation where something like Puppet, Chef, or Ansible comes into play. With any of them, you declare the inventory of software that each node needs installed, and rely on the orchestration tool to manage the actual installs.

They're a bit confusing at first - but definitely worth the time. My personal favorite is Ansible, since it requires virtually no installs on the target servers in advance, and it's highly extensible using Python - but your mileage may vary.

re: files uploaded directly to the server - you can't do that if you're moving to an architecture like the one you describe. Either store them on S3 or on some other CDN of your choosing (in the minimal case this could be a "media" server you maintain yourself that just serves static files). But the point is that your static files come from someplace other than your application server.
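
As a rough illustration of the S3 route (not your exact setup), an upload handler that writes to S3 instead of local disk might look something like this with boto3; the bucket name and key scheme are made up:

```python
"""Sketch: store user uploads in S3 instead of on the web server's disk.
The bucket name and key layout are hypothetical; assumes boto3 and
AWS credentials are already configured."""
import mimetypes

import boto3

s3 = boto3.client("s3")
MEDIA_BUCKET = "example-media"  # hypothetical bucket serving static files

def save_upload(fileobj, filename):
    """Push an uploaded file to S3 and return the URL to embed in the post."""
    key = "uploads/" + filename
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    s3.upload_fileobj(fileobj, MEDIA_BUCKET, key,
                      ExtraArgs={"ContentType": content_type})
    return "https://%s.s3.amazonaws.com/%s" % (MEDIA_BUCKET, key)
```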

We solved this by using a "private" S3 bucket to hold our code, along with a simple startup script that copies the files to the VM.
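
A minimal sketch of what such a startup script could look like, assuming boto3 and a hypothetical bucket layout where a small releases/current pointer object names the tarball to deploy:

```python
"""Sketch of an instance startup script: fetch the pinned release from a
private S3 bucket and unpack it. Bucket and key names are hypothetical."""
import tarfile

import boto3

s3 = boto3.client("s3")
CODE_BUCKET = "example-deploys"  # hypothetical private bucket

# A tiny pointer object names the tarball to deploy; rolling back (or
# holding back staged code) is just a matter of rewriting the pointer.
current = s3.get_object(Bucket=CODE_BUCKET, Key="releases/current")
release_key = current["Body"].read().decode().strip()

s3.download_file(CODE_BUCKET, release_key, "/tmp/release.tar.gz")
with tarfile.open("/tmp/release.tar.gz") as tar:
    tar.extractall("/var/www/app")
```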

We had our CI server push new releases up to S3 after committing and testing.
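
The CI-side push is then just an archive plus an upload; a rough sketch under the same hypothetical naming scheme:

```python
"""Sketch of the CI push step: archive the tested changeset, upload it to
S3, then repoint releases/current at it. Names are hypothetical."""
import subprocess

import boto3

s3 = boto3.client("s3")
CODE_BUCKET = "example-deploys"  # hypothetical private bucket

rev = subprocess.check_output(["hg", "id", "-i"]).decode().strip()
tarball = "/tmp/app-%s.tar.gz" % rev
release_key = "releases/app-%s.tar.gz" % rev

# hg archive can emit a gzipped tarball of a revision directly
subprocess.run(["hg", "archive", "-t", "tgz", "-r", rev, tarball], check=True)
s3.upload_file(tarball, CODE_BUCKET, release_key)

# Switch "current" to the new release; instances launched from now on get it
s3.put_object(Bucket=CODE_BUCKET, Key="releases/current",
              Body=release_key.encode())
```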

Then we can launch a new instance, which copies over the files and runs the setup scripts. Once the new instance checks out, we switch the Elastic Load Balancer to use it and drop the old one.
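
And a sketch of that load balancer cutover using boto3's classic ELB client (the load balancer name and instance IDs are placeholders):

```python
"""Sketch of the ELB cutover: put the tested instance into rotation, then
drop the old one. Load balancer name and instance IDs are hypothetical."""
import boto3

elb = boto3.client("elb")  # classic Elastic Load Balancer API
LB_NAME = "example-web-lb"
NEW_INSTANCE = "i-0new1234"  # placeholder IDs
OLD_INSTANCE = "i-0old1234"

# Register the new instance first so serving capacity never drops...
elb.register_instances_with_load_balancer(
    LoadBalancerName=LB_NAME,
    Instances=[{"InstanceId": NEW_INSTANCE}],
)
# ...then take the old one out of rotation; it can be terminated afterwards.
elb.deregister_instances_from_load_balancer(
    LoadBalancerName=LB_NAME,
    Instances=[{"InstanceId": OLD_INSTANCE}],
)
```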

This actually looks really interesting to me. Having to install Ruby everywhere is extremely unfortunate (not to mention writing scripts in a strange Ruby DSL) and I'd much rather be able to just centralize my deployment stuff on one machine.

For me, the Ruby install was less annoying than having to set up a new daemon and service to manage multiple servers at once. I also appreciate Ansible's simple, sequentially executed playbook format; as much as I appreciate the automated dependency ordering that Puppet is capable of, it has resulted in odd (read: wrong) behavior more than once.