“You build it, you run it”

jrochkind

4 years ago

Advertisements

I have seen several different approaches to division of labor in developing, deploying, and maintaining web apps.

The one that seems to work best to me is when the same team responsible for developing an app is the team responsible for deploying it and keeping it up, as well as for maintaining it. The same team — and ideally the same individual people (at least at first; job roles and employment changes over time, of course).

If the people responsible for writing the app in the first place are also responsible for deploying it with good uptime stats, then they have incentive to create software that can be easily deployed and can stay up reliably. If it isn’t at first, then the people who receive the pain of this are the same people best placed to improve the software to deploy better, because they are most familiar with it’s structure and how it might be altered.

Software is always a living organism, it’s never simply “done”, it’s going to need modifications in response to what you learn from how it’s users use it, as well as changing contexts and environments. Software is always under development, the first time it becomes public is just one marker in it’s development lifecycle, and not a clear boundary between “development” and “deployment”.

Compare this to other divisions of labor, where maybe one team does “R&D” on a nice prototype, then hands their code over to another team to turn it into a production service, or to figure out how to get it deployed and keep it deployed reliably and respond to trouble tickets. Sometimes these teams may be in entirely different parts of the organization. If it doesn’t deploy as easily or reliably as the ‘operations’ people would like, do they need to convince the ‘development’ people that this is legit and something should be done? And when it needs additional enhancements or functional changes, maybe it’s the crack team of R&Ders who do it, even though they’re on to newer and shinier things; or maybe it’s the operations people expected to it, even though they’re not familiar with the code since they didn’t write it; or maybe there’s nobody to do it at all, because the organization is operating on the mistaken assumption that developing software is like constructing a building, when it’s done it’s done.[1]

I just don’t find that it works well to create robust, reliable software which can evolve to meet changing requirements.

There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.

When something breaks at night, the ops engineer can hope that enough documentation is in place for them to figure out the dial and knobs in the application to isolate and fix the problem. If it isn’t, tough luck.

Putting developers in charge of not just building an app, but also running it in production, benefits everyone in the company, and it benefits the developer too.

It fosters thinking about the environment your code runs in and how you can make sure that when something breaks, the right dials and knobs, metrics and logs, are in place so that you yourself can investigate an issue late at night.

The responsibility to maintaining your own code in production should encourage any developer to make sure that it breaks as little as possible, and that when it breaks you know what to do and where to look.

That’s a good thing.

None of this means you can’t have people who focus on ops other people who focus on dev; but I think it means they should be situated organizationally close to each other, on the same teams, and that the dev people have to have share some ops responsibilities, so they feel some pain from products that are hard to deploy, or hard to keep running reliably, or hard to maintain or change.

[1] Note some people think even constructing a building shouldn’t be “when it’s done it’s done”, but that buildings too should be constructed in such a way that allows continual modification by those who inhabit them, in response to changing needs or understandings of needs.