How DevOps Saved Healthcare.gov

Recently, I read this article in TIME Magazine about the launch and resurrection of healthcare.gov. As I read it, all I could think was that the key problem could be summed up in one word: DevOps.

Before I talk about that article, I want to clarify the meaning of DevOps. I am going to use the definition by Mike Kavis, which reads:

“DevOps is a culture shift or a movement that encourages great communication and collaboration (aka teamwork) to foster building better-quality software more quickly with more reliability.”

The goal of DevOps is to gain greater and earlier visibility into the innards of software development, and to ease the communication of that information to the parties involved. With the launch of healthcare.gov, we saw a complete lack of communication and visibility. Then we saw how the problems were fixed by providing tremendous amounts of visibility.

In the TIME article, we can see that the problems began before the first line of code was written. The White House CTO, Todd Park, was not part of the conversation from the beginning. The code was written by a myriad of government contractors. There was a lack of ownership. Everyone bickered and passed the buck. The system had no private beta and no testing at scale.

So why was this a problem, and how were these problems fixed?

Reaching Scale

One of the criticisms leveled against healthcare.gov was: “What’s so difficult about making a website, when Facebook and Twitter handle significantly more traffic?” But I feel this argument overlooks the amount of engineering that went into building Twitter and Facebook, and even Google and Amazon. Each of these companies grew organically, developing scale as needed. Many of them had scaling problems of their own and would frequently go down just as they were getting popular.

Healthcare.gov did not have a private beta or a gradual roll-out. It needed to scale from the beginning, and that requires architecting for scale. The requirements are more stringent, and any bad piece of code can bring the whole system down. You need a continuous cycle of testing and deploying to make sure the code quality never degrades. While it’s not clear from the article whether the team set up Continuous Integration or Continuous Deployment, there were 28 deployments during the month of November. Small, incremental changes like these make it easier to track progress and catch problems early. Continuous integration also needs to test the scalability of the system, so the tests must include load testing. Load testing is doubly important when you don’t do a gradual roll-out and are expecting huge spikes on day one. It is the only way to understand the hidden inter-dependencies and side effects in the system.
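To make the idea of load testing inside a continuous integration cycle more concrete, here is a minimal sketch in Python. It is not what the healthcare.gov team used; the article does not describe their tooling. The names run_load_test and check_thresholds, and the error-rate and latency budgets, are assumptions made up for illustration. The sketch fires many concurrent calls at a request function you supply, then checks the results against a budget so a build could fail when performance degrades.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send_request, total_requests, concurrency):
    """Call send_request() total_requests times across `concurrency` threads.

    send_request is a hypothetical callable that performs one request and
    raises an exception on failure. Returns a summary of errors and latency.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    def timed_call(_):
        start = time.perf_counter()
        try:
            send_request()
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    # Simple nearest-rank style estimate of the 95th percentile latency.
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": len(results),
        "errors": errors,
        "error_rate": errors / len(results),
        "p95_seconds": latencies[p95_index],
        "max_seconds": latencies[-1],
    }


def check_thresholds(summary, max_error_rate, max_p95_seconds):
    """Return True when the run stays within both budgets (illustrative values)."""
    return (summary["error_rate"] <= max_error_rate
            and summary["p95_seconds"] <= max_p95_seconds)
```

In a real pipeline, send_request would wrap an HTTP call to a staging environment, and a failed check_thresholds would fail the build. A single-machine thread pool like this only hints at the traffic a national launch would see, so treat it as a starting point.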

Ownership of Software

With different contractors working in silos, a lack of ownership was inevitable. But this culture of passing the blame and marking territory is also present among teams in many large enterprises. What White House CTO Park noted early on was that the teams were interested in fixing the problems. More than anything, what Park’s team did was provide leadership that enabled the capable engineers to work together with an attitude of creating solutions, not more problems.

You Can’t Fix Something You’re Not Measuring

While the engineers involved in the project wanted to fix the problems, they had no sense of what was wrong and exactly how urgent it was.

So, to bring both a sense of urgency to the task at hand and visibility into the problems, one of the first steps the team took was to put together a dashboard showing the current state of the website. The team took this a step further by lining the walls of the office space with giant monitors displaying the various dashboards. This made the goal and its urgency clear to everyone. Daily scrums then helped the team identify immediate problems and begin working on them.
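As a rough illustration of the kind of numbers such a dashboard might show, here is a small Python sketch. The article does not say which metrics the team tracked, so the choice of error rate and average response time, along with the SiteHealth class and its methods, are assumptions for the sake of the example. It keeps a rolling window of recent requests and turns them into a one-line status that could be put on a wall monitor.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    response_seconds: float
    succeeded: bool


class SiteHealth:
    """Rolling summary of recent requests (hypothetical dashboard backend)."""

    def __init__(self, window_size=1000):
        # Only the most recent `window_size` samples are kept.
        self.samples = deque(maxlen=window_size)

    def record(self, response_seconds, succeeded):
        self.samples.append(Sample(response_seconds, succeeded))

    def snapshot(self):
        total = len(self.samples)
        if total == 0:
            return {"requests": 0, "error_rate": 0.0,
                    "avg_response_seconds": 0.0}
        failures = sum(1 for s in self.samples if not s.succeeded)
        average = sum(s.response_seconds for s in self.samples) / total
        return {"requests": total, "error_rate": failures / total,
                "avg_response_seconds": average}

    def render(self):
        snap = self.snapshot()
        return ("requests={requests} error_rate={error_rate:.1%} "
                "avg_response={avg_response_seconds:.2f}s").format(**snap)


health = SiteHealth(window_size=100)
for seconds, ok in [(1.0, True), (2.0, False), (3.0, True), (2.0, True)]:
    health.record(seconds, ok)
print(health.render())  # requests=4 error_rate=25.0% avg_response=2.00s
```

The point is less the code than the habit: once the numbers are measured and visible to everyone, the team can agree on what is wrong and how urgent it is.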

Let the Coders Do What They Love

To me, DevOps is not about the tools. It is about human relationships. All of us have an inherent desire to be a part of creating something great. We want to help our fellow human beings. We want recognition for our hard work. And we do not want to do things that distract us from what we love.

When developers have all of this, they are happier, more energetic, and more productive. DevOps is about solving this human problem. It gets everyone working together as a team with improved communication. Anything tedious is automated to avoid the frustration and consequent loss of productivity. This is what DevOps is all about.

By having leadership create a culture of “Yes, we can,” Park’s team was able to overhaul the website, fixing the bugs and making it scale. Engineers were given ownership to solve problems rather than simply taking directions from managers. As the issues on the prominent dashboards started to dwindle, the team began to see the results of their hard work. The effort ruined people’s Thanksgiving and Christmas celebrations. But with the emotional investment they had in the final success of the website, the coders kept on trucking through.

This joy of having it work, of the pieces falling into place, is something we are very familiar with at Flux7. At our core, we are tinkerers and creators. This is why we have chosen to focus on implementing DevOps solutions. By supporting and creating a culture of DevOps, we can fundamentally and effectively change an organization so that people are able to accomplish more and are happier doing it.

If you want to understand why DevOps is necessary and important for your organization, or how to implement DevOps in your organization, click the “Contact Us” button now and drop us a line today.