When the nerds go marching in

By Alexis Madrigal

November 16, 2012

The Obama campaign's technologists were tense and tired. It was game day and everything was going wrong.

Josh Thayer, the lead engineer of Narwhal, had just been informed that they'd lost another one of the services powering their software. That was bad: Narwhal was the code name for the data platform that underpinned the campaign and let it track voters and volunteers. If it broke, so would everything else.

They were talking with people at Amazon Web Services, but all they knew was that they had packet loss. Earlier that day, they had lost their databases, their East Coast servers, and their memcache clusters. Thayer was ready to kill Nick Hatch, a DevOps engineer who was the official bearer of bad news. Another of their vendors, StallionDB, was fixing databases, but needed to rebuild the replicas. It was going to take time, Hatch said. They didn't have time.

They'd been working 14-hour days, six or seven days a week, trying to reelect the president, and now everything had been broken at just the wrong time. It was like someone had written a Murphy's Law algorithm and deployed it at scale.

And that was the point. "Game day" was October 21. The election was still 17 days away, and this was a live-action role-playing (LARPing!) exercise that the campaign's chief technology officer, Harper Reed, was inflicting on his team. "We worked through every possible disaster situation," Reed said. "We did three actual all-day sessions of destroying everything we had built."