Friday, December 30, 2011

How Netflix gets out of the way of innovation

#defrag 2011 presentation script.

I'm the cloud architect for Netflix, but rather than tell you about why we moved Netflix to a cloud architecture or how we built our cloud architecture, I'm going to tell you what we do differently at Netflix to create a culture that supports innovation.

What is it that lets us get things done very quickly. Sometimes a bit too qwikly…. but how did we keep making big strategic moves, from DVD to streaming, from Datacenter to Public Cloud, from USA only to International, all in very short timescales with a fairly small team of engineers.

My presentation slides are just box-shots of movies and TV shows that are available on Netflix streaming. This script is based on the notes I made to figure out what I was going to say for each box shot. If some of you see a show you didn't know we had and want to watch that would make me happy, you can click on the box shot to visit that movie at Netflix, they were all available for streaming in the USA at the time of writing.

I've attempted to match the box shots loosely as cues to what I'm saying, but I've also used a musical theme in places since this is for Defrag and Defrag rocks!

Netflix is now one of the largest sites that runs almost entirely on public cloud infrastructure. We have become a poster child for how to build an architecture that takes full advantage of the Amazon Web Services cloud. But when I talk to other large companies about what we have done, they seem to have a lot of reasons why they couldn't or didn't do what we did, even if they wanted to.

Why is that? Why are we heading in one direction while everyone else is going the other way? Are we crazy or are they zombies? Well, I've worked at other large companies so I have some perspective on the issues.

Before I joined Netflix I worked at eBay for a few years, and helped found eBay Research Labs. This was setup because eBay felt it wasn't innovating fast enough, and they were looking for the one missing ingredient that would drive more innovation into the company.

This is a fairly common approach. "You guys go and be innovative, then hopefully we will find ways to spread it around a bit." Unfortunately the end result of setting up a separate group to add innovation to a big company is more comical than useful.

The most interesting projects got tied in knots, they trod on too many toes or were scary. We visited Xerox Parc and IBM Santa Teresa Labs to discuss how they were setup, to try and learn what might work., and we went to an Innovation Forum in New York. That was weird, some of the primary examples they were talking about emulating were eBay and Paypal!

The projects that did get out were minor tweaks to existing ideas, they could be fun, but ultimately not very interesting.

So I had to break out of there and find something new to do, and in 2007 I joined Netflix just as they first launched streaming.

One of the key attractions for me was the Netflix culture I heard about in the interviews, I wanted to get inside their heads and figure out if what they were describing was real, and if so, was it sustainable as the company grew.

What I found out over the next few years is that the culture is what enables innovation, so that Netflix can get things done quickly that other companies are too scared or too slow to try. The rest of this talk is about the key things that we do differently at Netflix.

Before I get into them I want to warn you that even with a roadmap and a guide, you probably won't be able to follow this path if you are in a large established company. Your existing culture won't let you. However if you are creating a new company from scratch, I hope you can join me in what I hope is the future of cool places to work.

Here's the key insight. It's the things you don't do that make the difference. You don't add innovation to a company culture, you get out of its way.

I'm mostly going for SciFi at this point, because it's going to sound like I was beamed in from the future to some of you.

Let me repeat that. You have to setup a company that doesn't do many of the things you would consider business as usual. That's why it's so hard to retrofit.

How about some audience participation? Hands up everyone who hates answering questions by putting their hands up..

Who works at a company that has more than one product line? Do you get along? The problem is that the company loses focus and has trouble allocating resources where they are needed so there are big fights. Pick one big thing and do it well. For Netflix, our addressable market is everyone in the universe who likes watching movies and TV shows, that should keep us busy for a while.

Who has teams spread over multiple sites and countries? We don't. It adds communication and synchronization overhead that slows your organization down. For the geeks, think of Amdahl's law applied to people. We have as many people as possible in the same building on the same site. We are planning a new bigger building to make sure we can keep everyone close together. High bandwidth, low latency communication.

Who's worked for a place that bought another company, then run it into the ground, laid everyone off and wrote down the value. Over and over again. It's crazy. I don't think Netflix has ever bought another company. It's a huge disruption to the culture, if you see something you like just hire away their best people and out execute them in the market.

Who has junior engineers, graduate hires and interns writing code? We don't. We find that engineers who cost twice as much are far more than twice as productive, and need much less management overhead. Reducing management overhead is a key enabler for an innovative culture. Engineers who don't need to be managed are worth paying extra for. We are tiny compared to companies like Google, they take on raw talent and develop it, we sometimes take a chance on someone with only five years experience.

Who has an architecture review board and centralized coding standards? We don't have that either. What we do have is tooling that creates a path of least resistance, which combined with peer pressure keeps quality high. The engineers are free and responsible for figuring it out for themselves.

Who has an ITops team that owns keeping code running in production? We don't. The developers run what they wrote. Everyone's cell phone is in the pagerduty rota, the trick is making sure you don't need to get called. All the ops people here have horror stories of stupid developers, and vice versa, but it doesn't have to be that way. We have one dev organization that does everything and no IT ops org involvement in our AWS cloud deployment.

Who has to ask permission before deploying 100's or 1000's of servers? We don't. The developers use our portal directly, they have to file a Change Management ticket to record what they did if it's in production, that's all. We've trained our developers to operate their own code. We create and destroy up to 1000 servers a day, just pushing new code. AWS takes about 5 minutes to allocate 100 servers, it takes longer than that just to boot Linux on them.

Who has a centralized push cycle and has to wait for the next "train" before they can ship their code? We don't. Every team manages their own release schedule. New code updates frequently, and the pace slows for mature services. Teams are responsible for managing interface evolution and dependencies themselves. Freedom and responsibility again.

Who has project managers tracking deliverables? We don't. The line managers do it themselves. They own the resources and set the context for their teams. They have time to do this because we took the BS out of their role.Managers have to be great at hiring, technical and hands on enough to architect what their team does, and project manage to deliver results. Don't split this into three people. Reduce management overhead, minimize BS and time wasted. Teams are typically 3-7 people. Have a weekly team meeting and 1on1 with each engineer to maintain context.

Who has a single standard for development tools? We don't. We assume developers already know how to make themselves productive. We provide some common patterns to get new hires started, like Eclipse, IntelliJ, on Mac, Windows. Some people use Emacs on Linux. Hire experienced engineers who care, and they will take care of code quality and standards without being told how to.

Who has to work with people they don't respect? It's much too disruptive. The only way to get high talent density is to get rid of the people who are out of their depth or coasting.

That also applies to what you might call brilliant jerks. Even if they do great work, the culture can't tolerate prima donna anti-social behavior, so people who don't trust others or share what they know don't fit in.

So does that mean we value conformity? No but it's really important to be comfortable as part of a high performance team, so we look for people who naturally over-communicate and have a secure confident personality type.

If you haven't experienced a high performance culture, think about what it's like to drive flat out at a race track. Some people will be too scared to deal with it and drive around white knuckled at 40 mph, some will be overconfident and crash on the first corner, but for people who fit into the high performance culture it's exhilarating to push yourself to go faster each lap, and learn from your peers without a speed limit. When you take out the BS and friction, everyone gets so much more done that productivity, innovation and rapid agile development just happen. This is the key message, removing obstacles to a high performance culture is how innovation happens throughout an organization. Doing less to get more.

We don't pay bonuses. We don't have grades other than senior engineer, manager, director, VP. We don't count the hours or the vacation days, we say "take some". Once a year we revise everyones salary to their peers and current market rate - based on what we are paying now to hire the best people we can find.

We also have what sounds like a crazy stock option plan that grants options every month, vests the same day, and they last 10 years even if you leave Netflix. The net of this is less work for managers, they can concentrate on hiring top people, and almost everyone that leaves takes a pay cut. The test we make is "would you fight to keep your engineers if they tried to leave". If not, let them go now and get someone better. We don't make it hard to let people go.

Some of you may be thinking this sounds expensive, but what is the value of being incredibly productive and able to move faster than your competition? You can get out ahead and establish a leading position before anyone else realizes you are even in the game. Remember how a few years ago the "Analysts" said that Netflix the DVD company was going to get killed by other companies streaming, then all of a sudden people realized that we were streaming more bandwidth than anyone else?

So what could possibly go wrong? We had a near miss recently, we went too fast, partly because we could, got unlucky and screwed up. The good thing is that Netflix could re-plan and execute on the fixes we need very quickly as well, with no internal angst and finger-pointing. Also there was an Asteroid nearby earlier this week. By the way, my stepdaughter @raedioactive was the art director for this movie.

So, you are at a crossroads, you could be on stage with Eric Clapton, or in the audience watching and wondering why you can't do what they are doing. It's a radically different way to construct a corporate culture, it doesn't work for everyone, and we can't all be up on stage with Eric, but the talent is out there if you start by building a culture focused on talent density to find it and keep it.

Is it going to be the goats or the glory? I just told you all to stop doing things, what could be easier than that? It takes less process, fewer rules and simpler principles. Give people freedom, hold them responsible, replace the ones that can't or won't perform in that environment. Focus on talent density and conserving management attention span by removing the BS from their jobs.

This is your challenge, can you get a band together and go on a mission to save your company? Stop doing all the things that are slowing you down, and get rid of the unproductive BS that clogs up your management and engineers.

I will take questions in the comments or on twitter to @adrianco. Thank you.

20 comments:

I'd work at a place like this for half my current salary, but apparently that would hurt my chances of getting a job there. :)This place sounds so wonderful for an engineer, to me it's like something out of a Science Fiction novel.

Thanks for the comment ZEYEZ, I hope this is what the future looks like for more people... I like Alan Kay's quote: "Don't worry about what anybody else is going to do… The best way to predict the future is to invent it. Really smart people with reasonable funding can do just about anything that doesn't violate too many of Newton's Laws!"

It's a great read, and almost all of what you shared resonated with me. Thanks for taking the time to write it!

Here's where you lost me: "we sometimes take a chance on someone with only five years experience." I encourage you to stretch toward incompetency. There are tangible benefits to growing people while you continue to hire for experience. We had great success with it at Obtiva, and have now brought that with us to Groupon.

Adrian, you dont talk much about leadership. Is each team self-organizing and lead? what's your take on the importance of leadership? how do you find the leaders? who steers the direction of the product/stories?

Adrian excellent article that covers some of the things i ponder on a daily basis. I have a few questions on the same topic. how do you control the list of ideas that everyone has? how do you manage the prioritisation of the work you do? who decides what each team should do? how do you manage dependencies between teams? it would be great to understand how you manage the ideas/work before he teams do it by 'themselves' and deliver

Something about this statement reminded me of what a peer code review tool like CodeCollaborator does for development teams:

"Who has an architecture review board and centralized coding standards? We don't have that either. What we do have is tooling that creates a path of least resistance, which combined with peer pressure keeps quality high. The engineers are free and responsible for figuring it out for themselves."

Are you currently using a peer review tool to automate the code review process?

Nice job. Key message for me is that it's about people and culture, and not necessarily technology. The technology is sometimes a personal choice (IDE/editor), as well as the raw material for innovation.

Question: You mention that the dev teams are responsible for their own uptime. We want to implement something similar here but we're having trouble with monitoring tooling. Seems like Nagios and other software is designed for central administration rather than distributed usage. What do you use to manage individual monitors and notifications? Built in-house or customized OSS?