Developing and Maintaining Secure and Reliable Software in the Real World

Wednesday, February 13, 2013

Releasing more often drives better Dev and better Ops

One of the most important decisions that we made as a company was to release less software, more often. After we went live, we tried to deliver updates quarterly, because until then we had followed a staged delivery lifecycle to build the system, with analysis and architecture upfront, and design and development and testing done in 3-month phases.

But this approach didn't work once the system was running. Priorities kept changing as we got more feedback from more customers, too many things needed to be fixed or tuned right away, and we had to deal with urgent operational issues. We kept interrupting development to deploy interim releases and patches and then re-plan and re-plan again, wasting everyone’s time and making it harder to keep track of what we needed to do. Developers and ops were busy getting customers on board and firefighting, which meant we couldn't get changes out when we needed to. So we decided to shorten the release cycle from 3 months to 1 month, and then again to 3 weeks and then 2 weeks, making the releases smaller and more focused and easier to manage.

Smaller, more frequent releases change how Development is done

Delivering less but more often, whether you are doing it to reduce time-to-market and get fast feedback in a startup, or to contain risk and manage change in an enterprise, forces you to reconsider how you develop software. It changes how you plan and estimate and how you think about risks and how you manage risks. It changes how you do design, and how much design you need to do. It changes how you test. It changes what tools people need, and how much they need to rely on tools.

It changes your priorities. It changes the way that people work together and how they work with the customer, creating more opportunities and more reasons to talk to each other and learn from each other. It changes the way that people think and act – because they have to think and act differently in order to keep up and still do a good job.

Smaller, more frequent releases change how Development and Ops work together

Changing how often you release and deploy will also change how operations works and how developers and operations work together. There’s not enough time for heavyweight release management and change control with lots of meetings and paperwork. You need an approach that is easier and cheaper. But changing things more often also means more chances to make mistakes. So you need an approach that will reduce risk and catch problems early.

Development teams that release software once a year or so won’t spend a lot of time thinking about release and deployment and operations stuff in general because they don’t have to. But if they’re deploying every couple of weeks, if they’re constantly having to push software out, then it makes sense for them to take the time to understand what production actually looks like and make deployment (and roll-back) easier on them and easier on ops.

You don’t have to automate everything to start – and you probably shouldn't until you understand the problems well enough. We started with checklists and scripting and manual configuration and manual system tests. We put everything under source control (not just code), and then started standardizing and automating deployment and configuration and roll-back steps, replacing manual work and checklists with automated, audited commands and health checks. We've moved away from manual server setup and patching to managing infrastructure with Puppet. We’re still aligning test and production so that we can test more deployment steps more often with fewer production-specific changes. We still don’t have a one-button deploy and maybe never will, but release and deployment today are simpler and more standardized and safer and much less expensive.
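
To make the idea of replacing checklists with automated, audited steps, health checks and roll-back a little more concrete, here is a minimal Python sketch. It is an illustration only, not the tooling described in this post: the names Step, DeployResult and run_deploy are hypothetical, and a real deployment would call scripts or configuration management instead of the in-memory lambdas used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One checklist item turned into code: an action plus its matching undo."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class DeployResult:
    succeeded: bool
    audit_log: List[str] = field(default_factory=list)


def run_deploy(steps: List[Step], health_check: Callable[[], bool]) -> DeployResult:
    """Apply steps in order and record each one in an audit log.

    If a step raises, or the health check returns False, the steps that
    completed are undone in reverse order.
    """
    log: List[str] = []
    done: List[Step] = []
    try:
        for step in steps:
            log.append(f"apply {step.name}")
            step.apply()
            done.append(step)  # only completed steps are eligible for undo
        if not health_check():
            raise RuntimeError("health check failed")
        log.append("health check passed")
        return DeployResult(True, log)
    except Exception as exc:
        log.append(f"failed: {exc}")
        for step in reversed(done):
            log.append(f"undo {step.name}")
            step.undo()
        return DeployResult(False, log)


# Hypothetical usage: a dictionary stands in for a server's state.
state = {"version": "1.0", "running": True}
steps = [
    Step("stop-service",
         lambda: state.update(running=False),
         lambda: state.update(running=True)),
    Step("install-build",
         lambda: state.update(version="1.1"),
         lambda: state.update(version="1.0")),
]
result = run_deploy(steps, health_check=lambda: False)
print(result.succeeded, state)
```

The point of the sketch is that every step has a named, scripted undo and leaves a log entry, so a failed health check leads to a predictable roll-back rather than an improvised one.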

Deployment is just the start

Improving deployment is just the start of a dialogue that can extend to the rest of operations. Because they’re working together more often, developers and ops will learn more about each other and start to understand each other’s languages and priorities and problems.

To get this started, we encouraged people to read Visible Ops and sent ops and testers and some of the developers and even managers on ITIL Foundation training so that we all understood the differences between incident management and problem resolution, and how to do RCA, and the importance of proper change management – it was probably overkill but it made us all think about operations and take it seriously.

We get developers and testers and operations staff together to plan and review releases, to support production, and to do RCA whenever we have a serious problem, and we work together to figure out why things went wrong and what we can do to prevent them from happening again. Developers and ops pair up to investigate and solve operational problems and to improve how we design and roll out new architecture, how we secure our systems, and how we set up and manage development and test environments.

It sounds easy. It wasn't. It took a while, and there were disagreements and problems and backsliding, like any time you fundamentally change the way that people work. But if you do this right, people will start to create connections and build relationships and eventually trust and transparency across groups – which is what Devops is really about.

Once you start moving faster, from deploying once a year or every few months to deploying once a month, and as your organization’s pace accelerates, people will change the way that they work because they have to.

Today the way that we work, and the way that we think about development and operations, are very different and definitely healthier. We can respond to business changes and to problems faster, and at the same time our reliability record has improved dramatically. We didn't set out to “be Agile” – it wasn't until we were on our way to shorter release cycles that we looked more closely at Scrum and XP and later Kanban to see how these methods could help us develop software. And we weren't trying to “do Devops” either – we were already down the path to better dev and ops collaboration before people started talking about these ideas at Velocity and wherever else. All we did was agree as a company to change how often we pushed software into production. And that has made all the difference.

About Me

I am an experienced software development manager, project manager and CTO focused on hard problems in software development and maintenance, software quality and security. For the last 15 years I have managed teams building and operating high-performance financial systems.
My special interest is how small teams can be most effective in building real software: high-quality, secure systems at the extreme limits of reliability, performance, and adaptability. Software that has to work, that is built right, and built to last.
I use this blog to explore ideas and problems in software development that are important to me. To reflect and to find new answers.