Why We Need DevOps Now

Since 1999, my passion has been studying high-performing IT organizations. This journey started when I began keeping a list called “Gene’s list of people with good kung fu.” The people on the list talked differently about IT Operations, acted differently, and most importantly, had profoundly different results than your typical IT organization.

On this journey, I studied these high performers and benchmarked over 1,500 organizations. Our goal was to understand what enabled them to do what most organizations could only dream of. Our findings went into a book that we published in 2004 called The Visible Ops Handbook, which described how these organizations made their “good to great” transformation.

What I couldn’t have predicted was how this journey would take me straight into the heart of the DevOps movement. As my friend John Willis told me after I dismissed DevOps as just another marketing fad, “DevOps is the best chance at relevance that IT Operations has had in thirty years.” I immediately realized that he was right.

By putting DevOps patterns into practice, organizations like Etsy, Netflix, Facebook, Amazon, Twitter and Google are achieving levels of performance that were unthinkable even five years ago: tens or even hundreds of code deploys per day, while delivering world-class stability, reliability and security.

DevOps typically refers to the emerging professional movement that advocates a collaborative working relationship between Development and IT Operations, resulting in the fast flow of planned work (i.e., high deploy rates), while simultaneously increasing the reliability, stability, and resilience of the production environment.

Why Everyone Needs DevOps

There is currently a core, chronic conflict that exists in almost every IT organization. It is so powerful that it practically pre-ordains horrible outcomes, if not abject failure. It happens in both large and small organizations, for-profit and non-profit, and across every type of industry.

In fact, this destructive pattern is the root cause of one of the biggest problems we face. But, if we can beat it, we’ll have the potential to generate more economic value than anything we’ve seen in the previous 30 years.

Act 1 begins with IT Operations, where we’re supporting a large, complex, revenue-generating application. The problem is that everyone knows that the application and its supporting infrastructure are... fragile.

[Image: IT Operations]

How do we know? Because every time anyone touches it, it breaks horrifically, and causes an epic amount of unplanned work for everyone.

The shameful part is how we find out about the outage. Instead of hearing it from an internal monitoring tool, we hear it when a salesperson calls and says, “Hey, Gene, something strange is happening. Our revenue pipeline stopped for two hours.” Or, “the banner ads in my market are being served upside down and in Spanish.”

There are so many moving parts that it takes way too long to figure out what caused the problem du jour, which means we’re spending more and more time on unplanned work and increasingly unable to get our planned work done.

Eventually, our ability to support the most important applications and business initiatives goes down. And when this happens, the organization members suddenly find themselves unable to achieve the commitments that they promised the outside world, whether it’s customers, analysts or Wall Street.

In Act 2, our life gets worse when the business starts making even bigger commitments to Wall Street, often dreamed up by art or creative writing majors. We all love these people (rest assured that some of my best friends are creative writing majors), but they’re often not the people who have the best grasp on what technology can and can’t do.

And yet, they start dreaming up new features that are sure to dazzle the marketplace, writing the business requirements, and making promises to the outside world.

[Image: Product Management]

Enter the Developers. They start seeing more and more urgent date-driven projects put in the queue, often requiring things that the organization has never done before. Because the date can’t be moved (because of all those external promises made), everyone has to start cutting corners.

Development must focus on getting the features done, so the corners that get cut are all the non-functional requirements (e.g., manageability, scalability, reliability, security, and so forth). This means that technical debt starts to increase. And that means increasingly fragile infrastructure in production.

It is called “technical debt” for a reason: technical debt, like financial debt, compounds. So, what begins like this:

[Image not preserved]

Eventually turns into this:

[Image not preserved]

When technical debt begins to accumulate, something very insidious starts happening. Our deployments start taking longer. What used to take an hour now takes three hours, then a day, then two days, which is okay, because we can still get it done in a weekend. But then it takes three days, and then a week, then two weeks!

Our deployments become so expensive and so difficult that the business says that we have to lengthen the deployment intervals, which goes against all our instincts and training. We know that we need to shrink the batch sizes, not make them bigger, because large changes make for larger failures.
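One way to see why larger batches fail more often is a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each change independently carries the same small chance of causing a failure. The 2% rate and the function name are hypothetical, not measurements from any organization.

```python
def deploy_failure_probability(changes: int, per_change_failure_rate: float) -> float:
    """Chance that at least one of `changes` independent changes breaks the deploy."""
    return 1.0 - (1.0 - per_change_failure_rate) ** changes


# Hypothetical per-change failure rate, chosen only for illustration.
RATE = 0.02

for batch_size in (1, 10, 50, 200):
    probability = deploy_failure_probability(batch_size, RATE)
    print(
        f"{batch_size:>3} changes per deploy: "
        f"{probability:.0%} chance of a failure, "
        f"up to {batch_size} suspect changes to investigate"
    )
```

Under these assumptions, a bigger batch is both more likely to break and harder to diagnose, because every change in the batch is a suspect.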

The flow of features slows to a trickle, the deployments take even longer, more things go wrong, and because of all the moving pieces, issues take even longer to diagnose. All our best Dev and Ops people are spending all their time firefighting, and blaming each other when things go wrong.

[Image: What it feels like when we're trapped in a system that pre-ordains failure.]

I’m guessing that most of you can relate to at least some portions of this story. In my fifteen years of research in this area, I've found almost all IT professionals have experienced this cycle. This was the low point in my journey.

Act 3: How DevOps Breaks Us Out Of Our Downward Spiral

We all know that there must be a better way, right? DevOps is the proof that it’s possible to break the core, chronic conflict, so we can deliver a fast flow of features without causing chaos and disruption to the production environment.

When John Allspaw and Paul Hammond gave their seminal “10+ Deploys Per Day: Dev and Ops Cooperation at Flickr” presentation at the 2009 Velocity Conference, people were shocked and amazed, if not outright fainting in the aisles at the audaciousness of their achievement.

It wasn’t a fluke. Other organizations such as Facebook, Amazon, Netflix and the ever-growing DevOps community have replicated their performance, doing hundreds, and even thousands, of deployments per day. But DevOps isn’t just for large companies. It’s for any company where the organizational goals have a high reliance on IT. These days, that means almost every company, large or small, for-profit or not-for-profit, etc.

As a friend once told me, “Before you can solve a complex problem, you must first have empathy for the other stakeholders. And story-telling is the most effective means of creating a shared understanding of the problem.”

Dr. Eliyahu Goldratt demonstrated the power of a novel as a teaching tool through his book, The Goal: A Process of Ongoing Improvement. It’s a novel written in the 1980s about a plant manager who has 90 days to fix his cost and due date issues or his plant will be shut down. When I read this book nearly 15 years ago, I knew that this story was important, and that there was much I needed to learn, even though I never managed or worked in a manufacturing plant.

It isn’t an overstatement to say that The Goal and Dr. Goldratt’s Theory of Constraints changed my life—in fact, it probably was one of the biggest influences on my professional thinking. For eight years, my co-authors and I have wanted to write The Phoenix Project, because we all believed that IT is not a department, but a strategic capability that every business must have.

As you can imagine, I was incredibly honored and thrilled when Jez Humble, author of the award-winning book Continuous Delivery, recently told me, “This book is a gripping tale that captures brilliantly the dilemmas that face companies which depend on IT. The Phoenix Project will have a profound effect on IT, just as The Goal did for manufacturing.”

Our book is only one part of what is undoubtedly going to be a revolution in IT. Throughout “DevOps December,” you’re going to see the writings of many of the most prominent thought-leaders in the DevOps movement, including Jez Humble, Patrick Debois and more. You’ll hear about how they characterize the problem, and more importantly, how we create solutions that work for us.

For those of you who are looking for some places to start your DevOps journey, here are my three favorite DevOps patterns:

Make sure we have environments available early in the Development process. Enforce a policy that the code and environment are tested together, even at the earliest stages of the project.

Mitchell Hashimoto will be talking about how to reduce inconsistencies between dev, test and production environments with Vagrant, and Max Martin will talk about how the Puppet Enterprise development team builds their infrastructure—topics that support this pattern.
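The policy of testing code and environment together can be made concrete with a small consistency check. The following is a minimal sketch, not Vagrant or Puppet functionality: it assumes each environment is described by a plain dictionary of package versions, and the names `find_drift`, `dev` and `prod` are hypothetical.

```python
from typing import Dict, Tuple

EnvSpec = Dict[str, str]  # package name -> version (hypothetical format)


def find_drift(reference: EnvSpec, candidate: EnvSpec) -> Dict[str, Tuple[str, str]]:
    """Return every package whose version differs between two environments.

    A package missing from one side is reported with the version "<absent>".
    """
    drift = {}
    for package in sorted(set(reference) | set(candidate)):
        ref_version = reference.get(package, "<absent>")
        cand_version = candidate.get(package, "<absent>")
        if ref_version != cand_version:
            drift[package] = (ref_version, cand_version)
    return drift


# Illustrative data only; real specs would come from your provisioning tool.
dev = {"python": "3.11", "postgres": "15", "redis": "7.0"}
prod = {"python": "3.11", "postgres": "14"}

for package, (dev_version, prod_version) in find_drift(dev, prod).items():
    print(f"{package}: dev={dev_version} prod={prod_version}")
```

Running a check like this from the first day of a project surfaces environment differences while they are still cheap to fix, rather than during a production deployment.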

Wake developers up at 2 a.m. when they break things. The goal is to shorten and amplify feedback loops, and to bring Development closer to the customer experience (which includes IT Operations and the end-users of the service being delivered).

In DevOps work streams, developers often deploy their own code, and also fix code issues when things go wrong. By doing this, developers can see the consequences of their decisions and actions. (Note the symmetry here: the previous pattern #1 about making environments available early is all about embedding IT Operations into Development, while this pattern is about putting Development into IT Operations.)

This is all about culture, which Patrick Debois and a group of Puppet Labs people will write about near the end of December.
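The feedback loop in this pattern, where the developer who shipped a change is the one paged when it breaks, can be sketched as a simple routing rule. This is an illustration under stated assumptions, not any paging product's API: the `Deploy` record, its fields and the `who_to_page` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deploy:
    service: str
    author: str
    deployed_at: int  # Unix timestamp in seconds


def who_to_page(service: str, alert_time: int, deploys: List[Deploy]) -> Optional[str]:
    """Pick the author of the latest deploy to `service` made before the alert."""
    candidates = [
        d for d in deploys
        if d.service == service and d.deployed_at <= alert_time
    ]
    if not candidates:
        return None  # no known deploy: fall back to the regular on-call rotation
    return max(candidates, key=lambda d: d.deployed_at).author


# Illustrative deploy history; names and timestamps are made up.
history = [
    Deploy("checkout", "alice", 1_000),
    Deploy("checkout", "bob", 2_000),
    Deploy("search", "carol", 2_500),
]

print(who_to_page("checkout", 2_200, history))
```

The design choice is the point: the alert goes to the person with the most context on the most recent change, which is what shortens the loop.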

Create reusable deployment procedures: When every deployment is done differently, every production environment can become different, like snowflakes. When this occurs, no mastery is ever built in the organization in procedures or configurations. As Luke Kanies said, “If your infrastructure is special, you’re doing it wrong.”

For example, we would build a reusable user story for IT Operations called “Deploy into high availability environment,” which then defines exactly the steps to build the environment, as well as how long it takes, what resources are required, etc.

Jez Humble will talk more about world-class continuous integration and deployment processes later in the month.
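A reusable deployment procedure like the “Deploy into high availability environment” story can be captured as data, so that its steps, duration and resources are defined once and reused. The sketch below is one possible shape; the class names, the individual steps and the minute estimates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    description: str
    minutes: int          # expected duration of this step
    resources: List[str]  # people or systems the step needs


@dataclass(frozen=True)
class DeploymentProcedure:
    name: str
    steps: List[Step]

    def total_minutes(self) -> int:
        """How long the whole procedure is expected to take."""
        return sum(step.minutes for step in self.steps)

    def required_resources(self) -> List[str]:
        """Every distinct resource needed, in alphabetical order."""
        return sorted({r for step in self.steps for r in step.resources})


# Hypothetical steps and estimates for the user story named in the text.
HA_DEPLOY = DeploymentProcedure(
    name="Deploy into high availability environment",
    steps=[
        Step("Provision two application servers", 30, ["ops engineer"]),
        Step("Configure load balancer", 20, ["network engineer", "ops engineer"]),
        Step("Deploy release and run smoke tests", 15, ["developer"]),
    ],
)

print(HA_DEPLOY.name)
print("Expected minutes:", HA_DEPLOY.total_minutes())
print("Resources:", ", ".join(HA_DEPLOY.required_resources()))
```

Because every deployment runs the same definition, environments stop drifting into snowflakes and the estimates improve each time the procedure is used.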

Thanks to Puppet Labs for assembling such a great cast of characters. Enjoy “DevOps December!”

About the author: Gene Kim is a multiple-award-winning CTO, researcher and author. He was founder and CTO of Tripwire for 13 years. He has written two books, including The Visible Ops Handbook, and is now writing The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win. Gene is a huge fan of IT operations, and how it can enable developers to maximize throughput of features from “code complete” to “in production,” without causing chaos and disruption to the IT environment. He has worked with some of the top Internet companies on improving deployment flow and increasing the rigor around IT operational processes. In 2007, ComputerWorld added Gene to the “40 Innovative IT People Under The Age Of 40” list, and he was given the Outstanding Alumnus Award by the Department of Computer Sciences at Purdue University for achievement and leadership in the profession.