Applying Root Cause Analysis to Software Defects

25 June 2013

Root Cause Analysis (RCA) is about identifying the root causes of faults or problems and addressing them instead of treating the symptoms. It’s a process that grew out of accident investigations to become a standard feature of hardware engineering. If something is broken, instead of just fixing it at the point of discovery, investigate and fix the underlying cause at the point of origin. This principle applies so well to software development that it could have been devised to deal with software defects.

Careful application of RCA metrics can uncover serious inefficiencies in your software development processes. The cause of a defect can be traced to the original requirements, the design, the code implementation, the verification, the test planning, or even the final QA itself. By addressing the issue at its root, you can drastically improve the final software and save money at the same time. You can also fix fundamental problems with your processes, which will benefit not only the current project but all future projects as well.

Prevention is better than the cure

According to an IBM white paper [1], the cost of fixing a defect in the testing phase is up to 10 times more than if you catch it in the design stage right at the start. The cost to fix that same defect in a post-release product is up to 30 times more. The earlier you catch the defect, the more time and money you can save.
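To make those multipliers concrete, here is a minimal sketch of the arithmetic. It reads both figures as upper bounds relative to a design-stage fix, which is an assumption, and the base cost of 100 units and the defect counts are made-up numbers for illustration only.

```python
# Upper-bound cost multipliers relative to a fix made at design time,
# as quoted above: 10x in testing, 30x after release.
COST_MULTIPLIER = {"design": 1, "testing": 10, "post-release": 30}


def fix_cost(base_cost, found_in):
    """Return the upper-bound cost of fixing one defect found in a given phase."""
    return base_cost * COST_MULTIPLIER[found_in]


def total_cost(base_cost, defects_by_phase):
    """Sum the upper-bound fix cost over a {phase: defect count} mapping."""
    return sum(
        fix_cost(base_cost, phase) * count
        for phase, count in defects_by_phase.items()
    )


# Hypothetical project: ten defects, a design-stage fix costs 100 units.
late = total_cost(100, {"design": 2, "testing": 5, "post-release": 3})
early = total_cost(100, {"design": 8, "testing": 2, "post-release": 0})
print(late, early)  # 14200 2800
```

The same ten defects cost roughly five times as much in the first scenario, purely because of where they were caught.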

It’s also worth bearing in mind the potential damage you can cause by allowing software to hit the market with defects in it. Customer satisfaction is impaired and a bad experience can do irreparable harm to your reputation. You’ll also still end up having to fix the defect, which means another test cycle to ensure it is fixed and regression testing to ensure that you haven’t introduced any new defects.

Cast the net wide

As the champions of quality assurance, the QA department is best placed to drive forward the use of RCA. RCA is typically employed to investigate major defects and one-off disasters, but if you carry the methodology through to all defects then you can strengthen its accuracy and realize much greater benefits. A little extra effort in the short term will pay large dividends in the long term. Effective statistical analysis requires as large a data pool as possible, and a larger pool gives you the best chance of identifying the patterns that betray a real problem. Find those problems, dig up the roots, and address the causes, and you can drastically improve your processes.
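One simple way to look for such patterns is a Pareto-style tally of root causes across every logged defect. The sketch below is only an illustration: the `pareto` helper and the sample labels are hypothetical, and it assumes each defect has already been assigned a single origin.

```python
from collections import Counter


def pareto(origins):
    """Return (origin, count, cumulative share) rows, most frequent first."""
    counts = Counter(origins)
    total = sum(counts.values())
    rows, running = [], 0
    for origin, count in counts.most_common():
        running += count
        rows.append((origin, count, round(running / total, 2)))
    return rows


# Hypothetical root-cause labels, one per logged defect.
defect_origins = [
    "requirements", "code", "requirements", "design", "requirements",
    "code", "test planning", "requirements", "code", "requirements",
]

for row in pareto(defect_origins):
    print(row)
```

In this made-up sample, requirements and code together account for 80 percent of the defects, which is the kind of concentration that only becomes visible once every defect is included, not just the major ones.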

The ultimate aim is closely aligned with the remit of QA, which is to stop defects reaching the customer, but the side effects include improvements in your processes throughout the software development chain. These improvements can extend beyond individual projects and will result in more efficient use of your resources and a healthier bottom line.

Pulling everyone together

If you intend to employ RCA in software development then you must get everyone to buy in. You’ll need more than casual approval from the management, the software engineers, and the testers in order for this to work. Identifying the root causes of problems requires an objective analysis and may involve examining the original requirements, poring over design documents, double-checking how code was implemented, and putting both test plans and the execution cycles under a microscope.

You should expect some resistance at first because individuals will be concerned about taking the blame for mistakes or oversights. Make it clear from the outset that the intention is to uncover faults in the processes and not in the individuals enacting them. Be very wary of creating a divisive atmosphere, and avoid dwelling on the cause once it has been unearthed. Instead, move quickly on to collaborating on how it could have been prevented. It will be counterproductive if you get bogged down in arguments over who is to blame.

By documenting the RCA process and tracking the defects to their origins, you can begin to identify patterns. This can highlight weaknesses or blind spots in your development systems and it gives you a chance to introduce fail-safes and checking procedures to ensure that all the necessary boxes are ticked and signed off. It could be something as simple as a checklist, or something as in-depth as a formal review process involving staff from each department.
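Tracking defects to their origins can be as lightweight as recording two fields per defect: where it was introduced and where it was found. The sketch below assumes a hypothetical phase order and record layout (`PHASES`, `DefectRecord`, and the sample IDs are all illustrative, not part of any particular tool) and counts which phases let defects slip through.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical phase order; substitute the stages of your own process.
PHASES = ["requirements", "design", "implementation", "testing", "release"]


@dataclass(frozen=True)
class DefectRecord:
    defect_id: str
    introduced_in: str  # phase where the root cause originated
    found_in: str       # phase where the defect was detected


def escape_distance(record):
    """Number of phase boundaries the defect crossed before it was found."""
    return PHASES.index(record.found_in) - PHASES.index(record.introduced_in)


def escapes_by_origin(records):
    """Count defects per origin phase that were not caught where they arose."""
    return Counter(r.introduced_in for r in records if escape_distance(r) > 0)


# Made-up sample data.
records = [
    DefectRecord("D-1", "requirements", "testing"),
    DefectRecord("D-2", "design", "design"),
    DefectRecord("D-3", "requirements", "release"),
    DefectRecord("D-4", "implementation", "testing"),
]
print(escapes_by_origin(records))
```

A phase that keeps appearing in this count is a candidate for the checklist or review step described above, since its defects are routinely surviving into later stages.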

There is no denying that this requires an upfront investment of time and resources, but it will prove worthwhile. Treating symptoms is a short-term view. It was George Santayana who famously said, “Those who cannot remember the past are condemned to repeat it.”

By implementing RCA for defects, you are investigating the past and taking positive action to ensure that you don’t repeat history by committing the same mistakes over and over again.

Kaushal Amin is Chief Technology Officer for KMS Technology, a software development and IT services firm based in Atlanta, GA and Ho Chi Minh City, Vietnam. He was previously VP of Technology at LexisNexis and a software engineer at Intel and IBM. You may reach him at kaushalamin [at] kms-technology.com.

The purpose of Root Cause Analysis (RCA) is to analyze problems to identify the main causes that have led to them, and to initiate actions to prevent similar problems from occurring in the future, as you explained. Over the years, I’ve been asked many times for hands-on solutions to do Root Cause Analysis. I have described a process and a checklist to help organizations that want to start with it, and I’m providing a report with an example. They are available at my Root Cause Analysis Tools Page.