Duration: 11:50am - 12:40pm

Day of week: Wednesday

Level: Intermediate

Persona: Architect, CTO/CIO/Leadership, Developer

What You’ll Learn

Understand how to build systems that are designed to fail in graceful ways.

Hear stories of rapid growth in very short periods of time.

Learn techniques and practices to improve Site Reliability Engineering.

Abstract

Hillary for America was arguably one of 2016’s largest startups. It was in the news every day, raised billions of dollars, and grew at an incredibly fast rate. There was even a very splashy exit. But what isn’t often talked about is the technical infrastructure behind it. Over the course of 18 months, HFA tech’s SRE team built and ran an immutable infrastructure, supporting a tech org that started with one developer and grew to 80, letting people deploy hundreds of times a day, with little to no downtime. In this talk Michael will explore how the campaign systematically approached every design decision to stay true to immutable principles, leveraging AWS infrastructure along with open source technology like Packer, Ansible, Consul, and a healthy dose of Varnish.

Interview

Question:

QCon: Aside from supporting a website, people might ask why would a Presidential campaign need immutable infrastructure? What are some use cases that the team had to handle and how large was the team?

Answer:

Michael: I joined Hillary for America at the beginning of the campaign in June of 2015. At that point, we had just a few things that we were doing. These were things like collecting money online, trying to get people to sign up for emails, or keeping people engaged with the website.

From there, we built out some of the initial products the campaign had before moving on to creating a site reliability engineering team which handled build and deploy tooling. But, most importantly, we architected the more than 70 microservices that ran throughout the campaign.

All 70 of these microservices were built on immutable infrastructure, and they did lots of interesting stuff. They took money, signed people up for events, and powered call tools, sync tools, and voter protection tools. The list goes on and on.

By the end of the campaign, we were an SRE team of four and a tech team of 80 (including more than 50 software engineers pushing code every single day).

Question:

QCon: What are you going to discuss in your talk?

Answer:

Michael: Basically, my motivation is to answer the question: with 50 engineers pushing code as fast as they can and an SRE team of four, how do you do that?

How do you balance the needs of your developers against reliability? That's where immutable infrastructure came in. So I want to talk about how that's the handshake agreement between developers that are moving insanely fast and reliability concerns.

I want to talk about what our stack looked like, and how we did it. I will also talk about some of the tools we used, like Consul and Varnish.

Question:

QCon: Who is the primary audience you're talking to in your talk?

Answer:

Michael: I'm talking to somebody who is an architect, a reliability engineer, or a person who is in the position of making decisions, right? So they're not just implementing. They are making calls on how to prioritize what they're working on and the tradeoffs.

What I want them to come away with is an understanding that even with the best immutable infrastructure plan, you will fail. So the question is really about how you build an immutable infrastructure system that will scale but, more importantly, allows you to fail gracefully and in a way that your users don't notice.

Speaker: Michael Fisher

Site Reliability Engineering Manager @HFA

Michael Fisher led HFA’s Site Reliability Engineering team, joining the tech team as its 5th engineer in June 2015. Before that Michael worked on ad tech at DMP platform Krux and at Google. He currently lives in New York, NY.