devops extravaganza

The only thing I like almost as much as building great web applications is devops automation. It's a discipline which is just coming into its own - new enough to still have a lot of innovation around it but mature enough that it already has a great ecosystem of quality tools and best practices.

Lack of updates the last few weeks hasn't been for lack of progress. Before I started updating the content again, I wanted to correct some issues with the storage and architecture of the site, and that was easier to do without having to worry about porting new content between different environments.

This blog is back on AWS entirely now except for the comment server which is still running out of my living room. I've decided the $60-70 per month charge for all the extra costs associated with the "free" tier on AWS is just the cost of learning this technology so I'm going to eat it. (I mean I already spent $1500 on a router at this point so whatever.)

So here's what's working now, and the general overview of what I want to accomplish as well as what's planned for the near future:

All application instances run in a private VPC that's not routable from the internet.

There's an IPsec VPN connecting my home network to the VPC. Access to application instances is possible from the internet only by connecting to my home network's VPN (via OpenVPN).

The private VPC can speak to the internet via two NAT instances in a public VPC subnet. Each instance has an elastic IP. The NAT instances are in different availability zones and monitor each other. If one goes down, the other takes over its route and tries to reboot the failed instance.

A load balancer exists as a front end to the web application instances.

A read-only domain controller is deployed to the private VPC. It's set up as its own subnet and site in Active Directory. This serves as the primary DNS server in the ec2 DHCP options set. This server is not built from scratch but rather launched from a private AMI that's already been joined to my domain and configured. I'm sure it's possible to automate creating the domain controller, but it's just not worth the effort: the image has no software installed other than Active Directory and DNS, which replicate automatically once the machine is launched.

There's a salt master instance that takes over building the rest of the infrastructure.

All of the above is created automatically via a cloudformation template. Subnets of the public and private VPC are parameterized.
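The NAT failover described above (take over the failed peer's route, then try to reboot it) could be sketched roughly like this in Python. This is not the actual monitor script, which isn't shown here. The `ec2` argument is assumed to be a boto3 EC2 client, and the route table ID, instance IDs, and health check are hypothetical placeholders. A small recording stand-in is included so the sketch runs without AWS credentials.

```python
from typing import Callable


def fail_over_nat(ec2, route_table_id: str, my_instance_id: str,
                  peer_instance_id: str,
                  peer_is_healthy: Callable[[], bool]) -> bool:
    """Take over the peer's default route and reboot it if it looks down.

    Returns True if a failover was performed, False if the peer was healthy.
    """
    if peer_is_healthy():
        return False
    # Point the default route of the peer's private route table at this instance.
    ec2.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        InstanceId=my_instance_id,
    )
    # Then try to bring the failed NAT instance back.
    ec2.reboot_instances(InstanceIds=[peer_instance_id])
    return True


class RecordingEC2:
    """Dry-run stand-in for the EC2 client: records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def replace_route(self, **kwargs):
        self.calls.append(("replace_route", kwargs))

    def reboot_instances(self, **kwargs):
        self.calls.append(("reboot_instances", kwargs))
```

In a real deployment each NAT instance would run something like this in a loop, with `peer_is_healthy` pinging the other instance and `ec2` being a real client created with `boto3.client("ec2")`.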

From here the rest of the architecture is created by salt master, via salt-cloud and salt states. (No intervention required, salt master starts building the instances as soon as everything is ready.) Currently there are 3 servers, 1 for the database and 2 web front ends.

The database runs PostgreSQL. Getting Ghost to run with Postgres instead of the default SQLite is very easy. SQLite isn't so bad to use but obviously it's not going to work well with multiple web front ends.

Once built, the web front ends are attached to the load balancer and the alias to the load balancer is updated in route53. I don't use route53 for my primary DNS but I delegate blog.emikek.com and *.blog.emikek.com to route53.
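For illustration, the route53 alias update might look something like the following. This is a sketch rather than the code the deployment actually uses: the function name is made up, and the load balancer DNS name and hosted zone ID in the usage note are placeholders. The dictionary follows the shape that boto3's `change_resource_record_sets` expects for its `ChangeBatch` argument.

```python
def alias_upsert_batch(record_name: str, elb_dns_name: str,
                       elb_hosted_zone_id: str) -> dict:
    """Build a route53 change batch that points an alias A record at a load balancer."""
    return {
        "Comment": "Point the blog alias at the current load balancer",
        "Changes": [{
            # UPSERT creates the record if it is missing and replaces it otherwise.
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                # Alias records carry an AliasTarget instead of a TTL and values.
                "AliasTarget": {
                    "HostedZoneId": elb_hosted_zone_id,
                    "DNSName": elb_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    }
```

With a real client this would be applied as `route53.change_resource_record_sets(HostedZoneId=my_zone_id, ChangeBatch=alias_upsert_batch("blog.emikek.com.", elb_dns_name, elb_zone_id))`, where `my_zone_id` is the delegated zone and `elb_zone_id` is the load balancer's own hosted zone ID, which AWS reports alongside its DNS name.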

I came up with the following strategy for managing secrets during the deployment process:

I don't know the public IPs of the NAT instances before they are built, but salt master needs to get a bunch of scripts to build itself. To solve this, I host a public read-only git+https repository from my home network that holds all the scripts, configuration files and salt templates that the salt master instance can get before the VPN is configured. Then, it vamps until the VPN is set up by trying to ping the private DNS name of a computer in my home network. It keeps trying until the ping is successful. Then, it pulls the secrets (AWS keys, database passwords, etc.) from a webserver in my home network that will only serve to the private IP of the salt master instance, which is statically configured in cloudformation. This way I can easily get the scripts to the cloud but without publicly hosting secret information facing the internet. I still need to be careful not to accidentally commit any secrets to git. Although, using the VPC takes a lot of risk out of the picture because there's no way to access a server in the VPC aside from within my home network. SSH is disabled entirely from the internet.
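The "vamp until the VPN is up, then pull secrets" step could be sketched like this. It is a minimal illustration and not the real bootstrap script: the host name and URL passed to `bootstrap` are hypothetical, and the `ping` flags assume a Linux instance.

```python
import subprocess
import time
import urllib.request
from typing import Callable


def wait_until(probe: Callable[[], bool], interval_seconds: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call probe() until it returns True; return how many attempts it took."""
    attempts = 1
    while not probe():
        sleep(interval_seconds)
        attempts += 1
    return attempts


def ping_once(host: str) -> bool:
    """Send a single ping (Linux flags) and report whether it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def bootstrap(probe_host: str, secrets_url: str) -> bytes:
    """Block until the VPN answers, then fetch secrets over the private link."""
    wait_until(lambda: ping_once(probe_host))
    with urllib.request.urlopen(secrets_url) as response:
        return response.read()
```

Passing `sleep` in as a parameter keeps the retry loop easy to test without actually waiting. There is deliberately no attempt limit, matching the "keeps trying until the ping is successful" behaviour described above.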

The parts of the infrastructure which must be pre-existing (not automated by cloudformation or salt) are:

the s3 bucket for images.

the route53 hosted zone and the record set (although updating the alias to the load balancer is automated).

the IAM role with permission to upload to s3 (used by Ghost in its configuration file).

creating the private AMI of the read-only domain controller.

updating the shared key and gateway on my router to connect the IPsec VPN to my home network (I don't think I'd really want this to be automatable!)

For the most part these seem pretty reasonable to manage outside of the automated processes. Perhaps the names of the s3 bucket and route53 zone could be added as parameters to cloudformation, and the IAM role could be created by cloudformation, but I'm fine with leaving them for now. I don't really want the s3 bucket created on the fly because the images should persist between deployments and copying stuff between buckets is pretty pointless.

Coming up next:

Memcached for nginx's cache. Maybe elasticache, haven't decided yet.

Need to create an autoscaling group of web front ends. I haven't decided how to accomplish this yet. It's easy enough to autoscale with the same process salt uses to create the initial web servers (using the salt ec2 reactor formula), but I really hate that it takes 10 minutes to build new servers. Under really high traffic that could be ten minutes of downtime. A much better plan, I think, is to create an AMI of the web front end once it's built and just spin up more of those when required. But the question is how exactly that interacts with salt master, since the minion is already joined on the existing instance and the replica will share all that minion state. I'm assuming there's an easy workaround; need to try it out and see what issues come up.

Comment server. Just to save some $$ I'm thinking of hosting this little application on the database server and sending requests to it from nginx on the front ends.

Multi-region deployment! Probably not till a much later iteration, this introduces a lot of options and complications. I'd like to accomplish the one below first. Multi AZ is good enough, probably long term even.

Some automated backups. These are manual at the moment.
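On the AMI question in the autoscaling item above: one possible workaround, untested here, is to strip the minion's identity before the image is taken, so each clone generates fresh keys and registers with salt master as a new minion. The sketch below assumes the default salt file locations (`/etc/salt/minion_id` and the key pair under `/etc/salt/pki/minion`) and that the minion ID is not pinned in the minion config. The function name is made up for illustration.

```python
from pathlib import Path
from typing import List

# Default salt-minion identity files, relative to the filesystem root.
IDENTITY_FILES = (
    "etc/salt/minion_id",
    "etc/salt/pki/minion/minion.pem",
    "etc/salt/pki/minion/minion.pub",
)


def scrub_minion_identity(root: Path = Path("/")) -> List[str]:
    """Delete the minion ID and key pair under root; return what was removed."""
    removed = []
    for relative in IDENTITY_FILES:
        target = root / relative
        if target.exists():
            target.unlink()
            removed.append(relative)
    return removed
```

The minion service would need to be stopped before running this and creating the image, and each new clone's key would still have to be accepted on salt master somehow.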

Once this "toy" blog is done, reliable and scalable, I need to deploy my old pet project using the same techniques. More on this but not for a month or two.

In the next post on this I'll publish all my cloudformation, salt, and bootstrapping code.