The Road to Continuous Integration and Deployment

Six years ago, when I joined CloudShare, I was really impressed by the fact that we released a new version every two weeks. Back then, many companies were still practicing the waterfall methodology, and “continuous delivery” was still an unknown term for most of us.

Over time, we evolved. We migrated from SVN to Git and began practicing continuous integration using Jenkins. We even made two attempts at ‘continuous delivery’. Both were purely technical and addressed only the engineering aspect.

The first focused on our development pipeline and had a very positive impact: our CI process matured, we improved our infrastructure to allow much less downtime during deployment, and we upgraded our deployment scripts. However, the release cycle didn’t change.

The second effort focused on configuration management. We started an initiative (which is still ongoing) to keep all of our configuration in Git using Chef. As a result, the deployment and configuration management of our very complex service has become significantly more mature. However, again, the release cycle and our dev process in general did not change.

The cons of our old process (in short):

Looking at our development process as a whole, I identified a few problems and aspects that were outdated and needed improvement.

1. We had a two-week code freeze for each release. As a result, developers were working on a different version than the one the testers were testing. The actual time from finishing coding to release was generally 3-4 weeks and never less than 2 weeks.

2. For historical reasons, QA was not an integral part of the teams: a very ‘waterfall-like’ structure. The downsides of this are described in many places. In my view, as a former developer and a former team leader at CloudShare, the most painful disadvantage of this structure is that a team leader is not independent. Even when implementing a simple work item that was completely under her ‘jurisdiction’, the team leader had to depend on the QA team to finish her tasks, and QA teams usually have their own priorities and plans.

3. We had everything coupled together: the delivery schedule was coupled with user story definition, user story implementation, and the prioritization process.

The above put our quality at risk. Team leaders and developers were required to handle complex coordination tasks to ensure the quality of what they released. We had too many parallel items ‘in progress’. For a developer, team leader, or QA engineer, “not dropping the ball” had become a very non-trivial task.

Because of the above (and more), I had been thinking of changing our process for more than a year. But I always had excuses to postpone the change: “I have a new QA manager”; “We have a new product manager, this is not the time for a change”; “I am missing several QA engineers to support the new structure”; “We just need to finish our new build scripts / our new provisioning methods / our new something very technical”; and so on. You get the picture.

Change!

About a month ago, I was (again) challenged by our new product leader: “This is a nice process, but it is really not optimal and up-to-date. Why can’t we release every week?” I finally decided to bite the bullet and lead the change in our development process.

The change is happening right now, so it is much too early for a retrospective or conclusions. I’ll just mention the highlights of what we are doing and will elaborate on our choices and conclusions in future posts.

What we are doing:

1. We reorganized the teams. QA engineers are now an integral part of each team. The QA team remains very small and is focused mostly on automation tasks. The main advantage here (among many), from my perspective, is that every dev team is now able to independently deliver most of its items. Our team leaders will continue to act as product owners of their teams, but will now have an interdisciplinary and more capable team.

2. We are moving to a one-week delivery cycle. The deployment itself (which is pretty straightforward) will be performed by all members of the Dev group in turns.

3. We are implementing a Kanban method. This will allow us to decouple the delivery cadence from user story implementation. We are still trying to fit every task into no more than two weeks in order to keep our ‘delivery batches’ small, but we do not enforce a ‘strict time box’ for each work item.

It will also allow us to evolve easily toward continuous delivery once our deployment pipelines are automated and mature enough for us to deploy every day, or even several times a day. In other words, we are not waiting for the technology to lead us. We’re building a process that fits both current and future deployment capabilities. We decoupled the prioritization process as well: we will hold a ‘backlog grooming’ meeting once a month to start. Using Kanban, we will be able to enforce strict limits on the amount of our ‘work in progress’ and to identify our bottlenecks. We are still learning the limits.
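To make the idea of ‘work in progress’ limits a bit more concrete, here is a minimal sketch in Python of a board that refuses to accept an item into a column that is already at its limit, and that reports which columns are currently full (a rough signal of where a bottleneck may be forming). This is only an illustration: the class names, the column names, and the limit values are all hypothetical, and they are not our actual board or tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class WipLimitExceeded(Exception):
    """Raised when an item would push a column past its WIP limit."""


@dataclass
class Column:
    name: str
    wip_limit: Optional[int] = None  # None means the column is unlimited
    items: List[str] = field(default_factory=list)

    def is_full(self) -> bool:
        # An unlimited column is never considered full.
        return self.wip_limit is not None and len(self.items) >= self.wip_limit


class KanbanBoard:
    """Hypothetical board model; columns and limits are illustrative only."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns: Dict[str, Column] = {c.name: c for c in columns}

    def add(self, item: str, column: str) -> None:
        target = self.columns[column]
        if target.is_full():
            raise WipLimitExceeded(f"'{column}' is at its limit of {target.wip_limit}")
        target.items.append(item)

    def move(self, item: str, source: str, target: str) -> None:
        src = self.columns[source]
        if item not in src.items:
            raise ValueError(f"'{item}' is not in '{source}'")
        # Check the target first so a rejected move leaves the board unchanged.
        self.add(item, target)
        src.items.remove(item)

    def full_columns(self) -> List[str]:
        # Columns sitting at their limit are candidates for bottlenecks.
        return [c.name for c in self.columns.values() if c.is_full()]


if __name__ == "__main__":
    # Assumed example columns and limits, not our real ones.
    board = KanbanBoard([
        Column("backlog"),
        Column("in_development", wip_limit=2),
        Column("in_testing", wip_limit=1),
        Column("done"),
    ])
    for story in ("story-1", "story-2", "story-3"):
        board.add(story, "backlog")
    board.move("story-1", "backlog", "in_development")
    board.move("story-2", "backlog", "in_development")
    try:
        board.move("story-3", "backlog", "in_development")
    except WipLimitExceeded as err:
        print("Blocked:", err)
    print("Full columns:", board.full_columns())
```

The point of the sketch is the behavior, not the code: when a column is full, new work cannot be pulled in until something moves forward, which makes the stuck stage visible to the whole team.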

Summary:

That’s it. This is our plan. We started last week. I am sure we will hit a lot of bumps in the road, but I am very excited and have a very good feeling about this change. We’ll post updates on specifics (like our Kanban board columns and limits), lessons learned, and other outcomes in future posts.

VP Engineering @CloudShare.
Asaf leads R&D for CloudShare, including the development of all software solutions we offer. He started working at CloudShare in 2008 where he has been responsible for many different areas of Engineering. Prior to CloudShare, Asaf worked as a developer for SAP Labs. Asaf holds an MSc in computer science from the Weizmann Institute and a BSc in mathematics and computer science from Tel Aviv University.