I was wondering what useful leading metrics people are using to predict failures/issues after release (I'm assuming software projects).

There are lots of lagging metrics that capture what has already happened after release (e.g. the weekly number of defects reported by customers), and those can themselves be used to build a predictive model. But I'm wondering if anyone else tracks things like "number of high-severity defects per total defects" and uses that to predict (with some level of accuracy :-) the number of defects that customers will see/report.

Reason/context: I'd like to have something (measurable) in my back pocket in 12 months (on the next big project) that says "sure, we can lower the quality bar and cut $$$ for quality improvements, but it'll raise the maintenance budget by about x%." That is, start measuring now, validate the data and results, then use them as a guide on the next project(s).

Great question, but I would hope that if there were predictive metrics, they would be applied immediately on the project and then it wouldn't fail. :) The challenge with trying to address your reason/context is the sheer number of factors that impact software development project success, many of which have nothing to do with cost.
– Dave White Jun 17 '11 at 14:45

I agree they could be applied on the project right now, but validating that they work (or how well they work) won't be "proven out" until the first data set arrives... So: record them now and validate the signal they produce for use on future projects. Better?
– Al Biglan Jun 17 '11 at 18:04

Virtual +1 to @MarkC.Wallace for putting a bounty on another person's interesting question, and for challenging the community.
– Todd A. Jacobs♦ Jan 18 '13 at 19:36

It's also worth pointing out that number of defects, whether total or per-customer, is not synonymous with problem severity or cost assessment of the defects. Is 1,000 total defects with minor impact on 2/5,000 customers better or worse than 3 moderate defects that impact 499/500 customers? What about if only 20% of your customer base reports exactly 1 severe defect? What does percentage of defects that customers report actually buy you in your scenario?
– Todd A. Jacobs♦ Jan 19 '13 at 20:05

Predictive Indicators in the Literature

If you poke around in IEEE Xplore and the ACM Digital Library, you'll find hundreds of papers on how this or that metric is a jolly good way to predict something else. But notice that in all such cases the researchers first gathered the data and then generated the models.

That is what we are all telling you to do: gather data. Generate models from the data.

So for example:

Nagappan and Ball found that the defects found by static analysis were a predictive metric for the defects found by testing.

Basili, Briand and Melo found that selected OO-related complexity metrics predicted the fault-proneness of different classes.
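To make the "gather data, then model" loop concrete, here is a minimal sketch (not taken from any of the cited papers; all numbers are invented) that fits a one-variable linear model from past releases and uses it to predict customer-reported defects:

    # Fit a simple model from past releases, then predict the next one.
    # All numbers are illustrative, not real project data.
    import numpy as np

    # Per past release: a leading measurement vs. the lagging outcome.
    static_analysis_defects = np.array([12, 30, 7, 22, 15])
    customer_reported_defects = np.array([5, 11, 3, 9, 6])

    # Ordinary least squares: customer ~ slope * static + intercept.
    slope, intercept = np.polyfit(static_analysis_defects,
                                  customer_reported_defects, 1)

    next_release_static = 18  # leading measurement on the current project
    predicted = slope * next_release_static + intercept
    print(f"predicted customer-reported defects: {predicted:.1f}")

Whether any particular leading variable earns its place in such a model is exactly what your own data has to show.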

But Not All Is Happy in Paradise

Except, how good is this literature? A lot of what is published on any topic in software engineering suffers from small sample sizes, weak statistical power, flawed experimental design, and so on. Software engineering is just one of those fields where the formidable apparatus of science is difficult and expensive to deploy. There are many things we will never know with any certainty.

Fenton and Neil wrote a brutal literature review in 1999 in which they pointed out all these problems with the literature on metrics as quality indicators.

Many organizations want to predict the number of defects (faults) in software systems, before they are deployed, to gauge the likely delivered quality and maintenance effort. To help in this numerous software metrics and statistical models have been developed, with a correspondingly large literature ... However, there are a number of serious theoretical and practical problems in many studies.

They go on to advocate Bayesian networks, which, if you are following closely, means that you will still need to ... collect data and generate models.
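Even the simplest Bayesian treatment illustrates the point. The sketch below is a toy beta-binomial update of a defect "escape rate" (nowhere near the full Bayesian networks Fenton and Neil advocate, and all numbers are invented), but it shows that the posterior is only as good as the data you feed it:

    # Toy beta-binomial update of a defect escape rate -- not a full
    # Bayesian network. All numbers are illustrative.

    # Prior belief: escape rate around 20%, i.e. Beta(2, 8).
    alpha, beta = 2.0, 8.0

    # Per release: (defects that escaped to customers, defects caught in-house).
    releases = [(3, 27), (1, 19), (4, 36)]
    for escaped, caught in releases:
        alpha += escaped
        beta += caught

    print(f"estimated escape rate: {alpha / (alpha + beta):.1%}")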

We Cannot Do Your Homework For You

There are no general equations of software quality. Only rules of thumb.

Software engineering is a complex process, in the sense that Bartosz Rakowski meant. There's lots of circular causality, and a lot of it is unobservable. The best you can do is form some loose models.

If you want to be able to say "Doing X has led to Y", you will need to collect those numbers yourself and derive your own figures and accept that they will be wrong anyhow. The second-best option is to use the industry statistics I linked to above.

+1 for a thorough recommendation for proper modeling. NB: Summarizing your citations was nice, because most visitors won't have access to the pay-walled research papers.
– Todd A. Jacobs♦ Jan 19 '13 at 20:24

It sounds like what you're really talking about is some way to predict risk.

The problem here is that 'metrics' are things that are measurable (have happened), and the predictive part isn't absolute, especially since every project is different. So you would in effect be saying "on project A we allowed for only X testing and found Y bugs", but that would not necessarily hold true for future projects.

I think your only real approach is risk management and assumption tracking on current projects, coupled with lessons learned, and then applied to future projects in the risk management/identification process.

Sorry, maybe my question isn't clear... I'm looking for an indicator on a project (e.g. "number of defects found") that is a good predictor of the number of customer tickets raised after release. The "for the next project" part is mostly because the signal may need to be tuned: that is, "per 1000 defects found, we get 7 tickets raised in the first 6 months after release." I'm looking for anything people might use other than just "defects found during the project".
– Al Biglan Jun 18 '11 at 2:55

Predictive Risk Calculations

There are a number of quality control frameworks like Six Sigma that attempt to calculate and control defect rates in a very math-oriented way. However, like many traditional project management practices, I personally believe these sorts of metrics work best in the manufacturing sector rather than in software development.

However, you might steal a page from the security sector and consider calculating the single-loss expectancy (SLE) and annualized loss expectancy (ALE) of your possible defects in order to estimate the potential cost of those defects over time. Because this calculation is focused on cost to the organization, rather than simply positing a rate of occurrence, the end result is probably more useful in making strategic decisions about what risks to control for within a project.

Predictive Metrics Involve Educated Guesses

Sadly, these metrics are still subject to the cone of uncertainty. For example, when calculating SLE as:

    SLE = asset value (AV) × exposure factor (EF)

the value of EF is still largely subjective. Likewise, the Annualized Rate of Occurrence (ARO) in the ALE calculation (ALE = SLE × ARO) is also an educated guess based on your threat model.
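As a worked illustration (all figures invented; nothing here is from the question's project), the arithmetic is just two multiplications:

    # SLE/ALE arithmetic with invented figures. EF and ARO are the
    # subjective inputs warned about above.
    asset_value = 250_000.00         # value of the affected system ($)
    exposure_factor = 0.10           # fraction of value lost per incident (a guess)
    annual_rate_of_occurrence = 2.5  # expected incidents per year (a guess)

    sle = asset_value * exposure_factor    # SLE = AV * EF
    ale = sle * annual_rate_of_occurrence  # ALE = SLE * ARO
    print(f"SLE: ${sle:,.0f}  ALE: ${ale:,.0f}")

The point stands either way: the outputs are only as trustworthy as the guessed EF and ARO.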

That doesn't mean they're worthless. It simply means such calculations are most useful when devising planning scenarios, and designing project controls based around your tolerance for particular risks.

Ultimately, you still need to track current trends and past performance to determine whether your risk models remain accurate over time. If you didn't need to do that, your results wouldn't be predictions, they would be prophecies.

First paragraph almost qualified this answer for the bounty; but SLE/ALE are not the kind of leading indicators that OP is looking for. In fact they rely on establishing the kind of leading indicator that OP wants. If we had a leading indicator, EF need not be subjective. Tony Cox in particular has some trenchant observations about ways to get beyond SLE/ALE/ARO, etc. FAIR (Jack Jones) also has some very good decomposition of risk factors that transforms subjectivity into a big data exercise.
– Mark C. Wallace Jan 24 '13 at 13:14

A complicated environment is one you can slice into non-interacting factors, analyse each factor, and find a proper, working cause-effect prediction model. Software development is instead a complex environment, consisting of mutually influencing factors capable of delivering different results from the same initial conditions; the only difference is the timing of activity and the related system state.

E.g.:

You can have two different results with the same project and the same people just because the interpersonal relations differ.

You can have two different results with the same project and the same people just because they experienced, and remember, a previous attempt.

You can have two different results with the same people just because they have different subject-matter expertise in a given project's domain.

Agile- and TDD-related leading and trailing indicators

In most cases the failure statistics provided by unit testing, continuous integration, release automation, and manual testing - although they are trailing indicators - are strongly correlated with the technical quality of the final product.

Agile practices like "iterations", "frequent delivery", or "continuous integration" (however you name them) allow for a live indicator of the defect ratio. The same indicator, accumulated over time, gives a rough prediction of technical debt. You can easily guess what a trend line tells you.
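For instance, a minimal sketch (all iteration numbers invented) of the per-iteration defect ratio and its trend:

    # Per-iteration defect ratio and its trend; a rising slope suggests
    # accruing technical debt. Numbers are invented.
    import numpy as np

    defects_per_iteration = np.array([4, 6, 5, 9, 8, 12])
    points_per_iteration = np.array([20, 25, 22, 24, 21, 23])

    ratio = defects_per_iteration / points_per_iteration  # defects per story point
    slope = np.polyfit(np.arange(len(ratio)), ratio, 1)[0]
    print(f"latest ratio: {ratio[-1]:.2f}, trend: {slope:+.3f} per iteration")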

Technical quality is not THE quality

Customers use a product and its functionality. They don't care about a defect in functionality they don't use. Moreover, they don't care that there are no technical defects if the functionality is poorly designed. For the customer, the most important part of quality is how well the product was designed and how many of their needs were fulfilled.

Be sure you're not measuring ripples on the high sea

Sometimes we believe we hold all the cards and rule all the data. It's important to remember that the volume of reported usage data is related to the intensity of usage.

In other words, the same number of users generates different amounts of product usage in different circumstances, e.g.:

Users may be on holiday or have other assignments pending

A particular release may trigger interest and boost usage

Different user groups can have different engagement ratios, as well as different tendencies to report issues

So, be sure that what you measure is a signal, and be aware of the biases involved.
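One rough way to correct for this (all numbers and the usage proxy are invented for illustration) is to normalize tickets by usage intensity rather than comparing raw counts week to week:

    # Normalize reported defects by usage intensity so the "signal" is
    # tickets per unit of use, not raw counts. Numbers are invented.
    weeks = [
        # (tickets reported, active users, avg sessions per user)
        (14, 800, 5.0),
        (9, 450, 3.2),   # holiday week: fewer users, less usage
        (21, 900, 6.1),  # release week: interest spike
    ]
    for tickets, users, sessions in weeks:
        usage = users * sessions  # crude usage-intensity proxy
        print(f"tickets per 1k sessions: {1000 * tickets / usage:.2f}")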

If you are having trouble defending spending dollars or time on quality, there is a wealth of literature on the impact of building quality into a process rather than trying to correct for defects later. This is the foundation of the whole field of Quality (with a capital Q).

If you are trying to predict potential failures, I'd look into the realm of systems engineering and serious risk analysis/management. This kind of work is often done on large projects where potential failures could cost lives or huge sums of money.

I don't currently have an opinion about the model you're talking about, but I think your answer could be dramatically improved by adding some additional content. Answers should be largely self-contained, and not just sign-posts to linked resources.
– Todd A. Jacobs♦ Jan 19 '13 at 19:31

I can't get to the papers; it sounds like you may be spot on for the bounty, but without access to the papers, I can't agree. If I had more confidence (e.g. a summary) that the papers were relevant, I might be motivated to try to surmount the obstacles to retrieving them.
– Mark C. Wallace Jan 24 '13 at 13:17

@Mark why can't you get the papers? The links are working fine for me.
– M.Sameer Jan 25 '13 at 12:10

A good leading indicator of a process's output is how well, or whether, the steps in the process are performed. If you require that the test script be written before the code, then measure the percentage of the time the test actually is written first.
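As a minimal sketch of that kind of compliance measurement (the record format is invented):

    # Measure how often the "test written before code" step actually
    # happened, from per-change records. Field names are invented.
    changes = [
        {"id": "C1", "test_written_first": True},
        {"id": "C2", "test_written_first": False},
        {"id": "C3", "test_written_first": True},
    ]
    compliance = sum(c["test_written_first"] for c in changes) / len(changes)
    print(f"test-first compliance: {compliance:.0%}")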

I see many answers explaining the benefits of tracking/reporting metrics, but I admit to disappointment at not seeing lists of things people actually track - that is, no answers to "what useful leading metrics people might be using." So I will attempt to answer my own question :-)

Defects per release (normalized) - we do weekly releases, sized by features in points. I've found (# of defects found by the dev team / points) to be a good predictive measure.

Daily and weekly on-time releases - we have daily builds at 4PM and weekly builds Friday at 3PM. If we miss two or more of these in a week, it seems to predict more customer-reported issues.

Number of test cases (normalized) - (number of new test cases / story points) in a release was nice to track/report because we found where our diminishing returns on writing test cases set in. Beyond a certain number, more tests didn't seem to affect customer-noticeable issues.

Contact hours with customers - this was a unique one... if we spent more than X hours in a week talking directly with a customer, quality went up. Not just review meetings, but any meeting.

"Mentor hours/week" - another oddball... when we asked for an external team member to consult on a project we noticed that above ~10 hours a week, the quality improved (they didn't do work directly, but was more "sitting with the team and being available)

These are (perhaps) obvious in retrospect - but it was (and remains) an interesting exercise to try to select metrics that can be tracked/reported easily and see if they are helpful or not. Also, doing this leads to some interesting experimentation which in turn leads to interesting process improvements.

Glad people found value in the question and really liked some of the answers! Hope mine adds to the conversation!