August 6, 2011

Causes of failure in software deployments and solutions

Our goal is to deliver useful, high quality software to our customers. Software is not “done” until it is deployed and tested by our customers. But we encounter many problems during deployment. This post outlines common problems that lead to failure during deployment, along with common solutions.

Problem

Our work is not done until the product passes acceptance tests in the production environment or passes those tests in a target environment that is suitably like production so that we can have confidence in both the product and the process used to deploy it.

I have seen MANY projects struggle to deploy software that everyone thought was “done.” It passed tests in the development environment. But all of a sudden, it seems to be a struggle to get it to work in another environment. Here are the reasons I see and their well-known solutions. If you can offer additional advice, please post it for my benefit and the benefit of others.

Poor packaging

If you are going to deploy software, you have to bring all the components together into one place and package them. If it is a simple application, you may simply compile it and use a compression program like WinZip to put the compiled files into one package, the WinZip archive. If your application uses stored procedures and other SQL objects, the package must also include something that brings the databases in the target environment up to date with the latest version.

Given the obvious importance of this, you would think that teams would establish uniform standards for how packaging is done on each project. This would include clearly written and up-to-date instructions on how developers can build the latest package.

Since human actions are so error-prone, many developers wisely write batch scripts that will automate the packaging on the build of a release version or at the touch of a button. For those who believe in continuous integration (CI), it is best to build this automation into the CI processes so that the software can be deployed automatically with each build in any target environment as desired.
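To illustrate the kind of packaging automation described above, here is a minimal sketch in Python. It is an assumption on my part, not a standard: the function name build_package, the MANIFEST.txt file name, and the choice of SHA-256 checksums are all hypothetical. It zips every file under a build output folder and records a checksum for each one, so the person deploying can verify that nothing is missing or altered.

```python
import hashlib
import zipfile
from pathlib import Path


def build_package(source_dir, package_path):
    """Zip every file under source_dir; return {relative_path: sha256_hex}.

    Hypothetical helper for illustration. It assumes package_path lies
    outside source_dir and that source_dir has no MANIFEST.txt of its own.
    """
    source = Path(source_dir)
    # List the files first, in sorted order, so the package layout is stable.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    manifest = {}
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            relative = path.relative_to(source).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
            archive.write(path, arcname=relative)
        # Store the checksums inside the package for later verification.
        lines = [f"{digest}  {name}" for name, digest in sorted(manifest.items())]
        archive.writestr("MANIFEST.txt", "\n".join(lines) + "\n")
    return manifest
```

A CI job could call build_package("build/output", "release.zip") after each successful build (both paths are placeholders), so that every build produces a package made the same way.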

Poor deployment instructions

Now that we have our software package, what do we do with it? In some companies, the developer deploys the software. I have worked that way in several places over the years. As of this writing (2011) it is more common for companies to require developers to give it to someone to deploy. There are several reasons why this is done. One is motivated by common audit recommendations to follow the principle of separation of duties. This allegedly reduces the potential for fraud. That notion comes from audit practices that evolved in finance. In that context, it makes sense. In most places that follow it for software, its use to reduce fraud is a farce. After all, how many places review what the code does? Deployers simply deploy what is provided. If it has malicious code to scramble the database on my birthday, nobody will know until my birthday arrives. Personally, I would rather have a nice dinner at an Italian restaurant to celebrate my birthday. I work hard to write good code. It goes against my values to write malware. But I comply with this requirement because I am a team player. If you REALLY wanted to protect your company, you would employ pair programming, and review all source code.

Despite my disdain for the anti-fraud motivation, I do believe there is value in requiring separation of duties for software deployment. I have personally witnessed places where programmers were responsible for the entire chain of deployment from test environments through to production. Some programmers are very disciplined and meticulous. They document the steps required. If errors are found in the package or deployment instructions, they examine the root cause, then correct the software or documentation, so that future deployments do not experience that error. Others are careless and make fixes in the target environment without fixing the source of the problem. Consequently, the next person to deploy the software experiences the same problem again, which is needlessly wasteful. If a third party deploys the software and documents the failures, they can insist that the developer fix the problem and submit new packaging or instructions until it works.

I can imagine better approaches than either of these extremes for deployment. Can you?

Human error on deployment

I have done deployments. I have witnessed others do them. They are typically tedious and error-prone. I have seen myself make mistakes even when I am sober, awake, and trying very hard to get it right.

Our best solution is to simply automate deployments and avoid all this human error.

If you cannot, take great care in preparing your deployment document and put check boxes at critical control points. I recommend that deployers print the document and check off steps as they are completed. They often ignore me on this and often skip steps. If you have read this far, you are probably a talented and serious software developer. So please don’t laugh at my next line. But I think it is quite challenging to write installation documents that are clear and easy to follow.

Oh, and interruptions are the enemy of manual deployments. Doing them from home with small children and pets demanding attention is a recipe for disaster. Doing them at work with frantic coworkers interrupting you is equally disastrous. I would love to create a sealed room with one entrance, and a deployment manager guarding the entrance, in which deployers could retreat to do deployments in peace.

Working people around the clock is also an invitation to human error. It is sad that I should have to write that down.
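The automation recommended earlier in this section can start very small. The following is a minimal sketch in Python, under my own assumptions: the name run_deployment and the idea of passing steps as (name, action) pairs are hypothetical, not taken from any particular tool. It runs the steps in order, stops at the first failure so that nothing after it is attempted, and reports which steps completed. That list is the automated equivalent of the checked-off boxes on a printed document.

```python
def run_deployment(steps):
    """Run (name, action) pairs in order and stop at the first failure.

    Returns (completed, failure). completed is the list of step names that
    finished. failure is None on success, or a (step_name, exception) pair
    for the step that raised.
    """
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as error:
            # Do not attempt later steps once one has failed.
            return completed, (name, error)
        completed.append(name)
    return completed, None
```

Each action is any callable that takes no arguments and raises an exception on failure, for example a function that copies files or runs a database script. Because the steps live in code rather than in a document, a fix to a failing step is made once, at the source, and the next deployment benefits from it.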

Environmental mismatches

In an ideal development shop, developers would work with systems that resemble production as closely as possible. That way they can learn about the features of the technology and create products that work well with them. If you test against one web server, and deploy to another in production, there is certainly a chance it will not work in production, even though it worked in development. We want our experience in development and test environments to reliably predict our experience in production.

Once the software is unit tested, it would then be deployed to another environment that resembles production as closely as possible. It would be tested in that environment. If it passed acceptance tests in that environment, you would have confidence it would work just as well if the same package were deployed to production with the same instructions.

The reasons for environmental mismatches are many and worthy of another post. But it is enough in this context to emphasize that software must be developed in and deployed to environments that resemble production as closely as possible. Any variation from that is a likely source of unexpected failure during deployment and the subsequent testing.
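One way to make such variations visible before they cause a failure is to write down what each environment contains and compare the descriptions. Here is a minimal sketch in Python. It assumes, purely for illustration, that each environment can be described as a dictionary of component names and versions. The name find_mismatches and the MISSING marker are hypothetical.

```python
MISSING = "<missing>"


def find_mismatches(production, target):
    """Return {key: (production_value, target_value)} for every difference.

    A key present in only one environment is reported with MISSING on the
    other side.
    """
    mismatches = {}
    for key in sorted(set(production) | set(target)):
        expected = production.get(key, MISSING)
        actual = target.get(key, MISSING)
        if expected != actual:
            mismatches[key] = (expected, actual)
    return mismatches
```

For example, if production is described as {"web_server": "A", "database": "X"} and the test environment as {"web_server": "B", "database": "X"} (made-up values), the function reports only the web_server entry. An empty result does not prove the environments are identical. It only means that nothing you chose to record differs.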

It takes continuous integration a (big) step forward and talks about how to automate the build pipeline, i.e. the staging of a build from development to testing and finally production environments in an automated fashion.

It opened my mind to how this “separation of duties” between developers, testers and operations does more harm than good and does not solve the actual problem. To find bugs, mismatches and deployment errors early rather than too late, we need open and early collaboration between all these people. Look into the DevOps movement too.

I followed the link you provided and I am fascinated by their concept of DevOps. It is what I have been looking for. For those who have seen the power of cross-functional collaboration between programmers, testers, business analysts, and subject matter experts, it seems only logical to embrace our colleagues in the operations and deployment group. As a former manager of an operations and deployment group, and now a developer, I am acutely aware of the enormous potential to streamline the delivery of value to the business.