Greetings - I am new to this site but it looks like an excellent ITIL community forum.

We are trying to rationalize the creation of a Problem Management practice within our organization and are seeking ITSM industry data indicating, on average, the percentage of Incidents which are selected for Problem Management.

We have an existing Incident Management process and are trying to build the case that, of X number of Incidents, Y number are typically selected for PM. In other words, because we do not have a PM process, Y number of problems go undiagnosed and consequently unresolved.

You will most likely not come up with such tangible metrics for your business case. Problem Management (PM) is more about proactive quality improvement to avoid work and pain in the future.

In most enterprises, PM looks across all Incidents. Think of Incidents as fires. When you look at a landscape of groupings, patterns, types, etc. of Incidents, you see fire patterns... some are small and insignificant, some are raging forest fires, etc. You will also see "potential" fires. PM is about picking certain fire areas, across your landscape, and finding ways to improve those areas so as to eliminate, reduce, and/or prevent fires. Doing this effectively usually requires looking at every flame (i.e. Incident) in your universe.

The most effective way to build a business case is to show patterns, trends, and visualizations that go across your entire Incident landscape to indicate the clarity/transparency into your support operations. When other people see the "fire" patterns, they then start to see potential for proactive control of those fires. It will not be until the fires get close to them that they will react. So once you find the fire patterns, pick the ones that are closest to the decision makers to make them feel the proverbial "heat".

I like Frank's reply, but I am intrigued by your comment "We ... are trying to build the case that, of X number of Incidents, Y number are typically selected for PM."

One of the easiest places to start PM is with the reactive component. If you have an established incident process, you probably already have statistics on the cost to your company of incidents that are resolved at 1st point of contact vs those that are dispatched to on-site or technical support. Any incident can happen again unless the error in the infrastructure that caused it is removed. What about gathering the costs of, say the high severity incidents for a year, and projecting a modest reduction of that cost by deploying PM? (The downside is that in reality incidents still continue to occur - they're just from different causes!) Perhaps some variation of creative accounting will occur to you.

Side story: I've seen a help desk completely focussed on managing its call volume. They got a handle on that, then became focussed on first-call resolution. They are getting a handle on that, and have finally come to what I like to call 'the psychological moment', when they realize that it would be to their benefit if the underlying cause of the incidents was fixed. Almost no business case was needed, and it required a lot less tap-dancing from me.

thanks to you both for your replies... access to the experience of senior ITIL'rs such as yourselves is really a great benefit...

I guess that you are saying that I am going about it the wrong way. What I had hoped to show was that industry data indicates that in any organization, a certain % of incidents are actually recurring, and that by identifying and subsequently correcting the root cause (through a structured ITIL'ized PM process), we could expect to eliminate that % (ignoring for the moment eisbergsk's bang-on comment that incidents still continue to occur - they're just from different causes).

So, how about I try it this way: Are you aware of empirical industry data which shows the benefits, in either headcount savings, or increased productivity, or % of incidents eliminated during the 1st year of the implementation of PM?

I do not mean to confuse anybody, but, despite the benefits of Problem Management, if you are really seeking to reduce the number of incidents, I would seriously advise you to investigate exactly what you do in terms of Change Management...

It is quite common to consider that 80% of incidents come from changes.
Final resolution of incidents requires identifying and curing problems. Implementing a problem's solution is a change
==> you might create as many incidents as you solve without proper change management in place...
JP Gilles

thanks JP, and yes, we are aware of the Change to Incident relationship. When we started down the ITSM road several years ago, my organization chose to begin with Configuration, Incident and Change, so these processes are relatively mature. We are now embarking on the next steps and hope to implement Problem and Release, as long as our Business Case is solid, hence my search for industry data...

It sounds like you are in exactly the right spot. As much as I love Problem Mgmt (I have drunk the koolaid! I lick the pages and chew the staples of the manuals!), I accept that it is a 'shoulder' discipline that in fact cannot be implemented without Incident, Change and Config being there first.

I don't know about industry, but if it helps in any way, here is my experience. When I started with PM it was because our IT dept decided it was time, asked me to accept the position and told me to go forth and do. They were paying my salary anyway, I guess. There are no other tools...
The assumptions at that time were that most problems were reoccurring (which hints that our dept was incompetent), or as a result of changes. Focussing on the High Severity (Sev 1) incidents, what I found was that the majority of our Sev1 incidents were caused by business growth, and recurring incidents were mostly where that was known but it took us a LONG time to provision new hardware. In fact, the Severity 1s acted as live feedback on the success of products or promotions.
Over time, I could prove that once a problem was 'resolved' the incident rarely (less than 0.01%) reoccurred.
After an initial rise (almost 700 in 2003), our volume of Sev1 incidents has dropped to less than 500 in 2006, despite more applications and services being added to the mix. A culture that focusses on enabling cooperative problem solving rather than finding blame has improved the perception of our department in the corporation. However that does not translate as dollars... Plus, issues with our incident, change and CMDB tools make it almost impossible to do anything other than reactive PM, alas.
There is a list of Key Performance Indicators for the Critical Success Factors for PM. Perhaps there is something in there that might lead you to the data you need to get your case approved?
Best of luck to a fellow Canuck!
/Sharon
p.s. do you watch Corner Gas?

I don't think much rationalisation is required to justify problem management: it exists to reduce the number of incidents a service has. That seems pretty rational to me.

However, if you are seeking to build a business case to implement problem management, I agree with Frank: metrics are hard to find, and in any case they are dependent on how your services are provided.

So given you have no external metrics to use, and anyway they really depend on your environment, I'd suggest all you can do is create your own. In other words kick off a project to perform a once-off review of your existing incident database.

Look for common incidents over a period of time, see how long they take to fix, and the effort required to fix them in terms of man hours for support staff. Include the varying costs of 1st/2nd/3rd level support if known. Factor in the cost to the business of the incident in terms of man hours of lost productivity.
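As a rough illustration of that first review step, here is a minimal Python sketch. It assumes you can export your incidents as simple records; the field names (`category`, `hours_to_fix`) and the sample data are hypothetical and would need to be mapped to whatever your incident tool actually provides.

```python
from collections import defaultdict

# Hypothetical export from an incident database; field names are assumptions.
incidents = [
    {"category": "email", "hours_to_fix": 1.5},
    {"category": "vpn", "hours_to_fix": 4.0},
    {"category": "email", "hours_to_fix": 2.5},
    {"category": "printer", "hours_to_fix": 0.5},
    {"category": "vpn", "hours_to_fix": 2.0},
    {"category": "email", "hours_to_fix": 1.0},
]


def summarize_common_incidents(records, min_count=2):
    """Group incidents by category and keep those seen at least min_count times.

    Returns a dict ordered by total hours spent, largest first.
    """
    totals = defaultdict(lambda: {"count": 0, "hours": 0.0})
    for rec in records:
        bucket = totals[rec["category"]]
        bucket["count"] += 1
        bucket["hours"] += rec["hours_to_fix"]
    # Keep only categories that recur often enough to be called "common".
    common = {cat: v for cat, v in totals.items() if v["count"] >= min_count}
    # Sort so the most expensive recurring category comes first.
    return dict(sorted(common.items(), key=lambda kv: kv[1]["hours"], reverse=True))


if __name__ == "__main__":
    for category, stats in summarize_common_incidents(incidents).items():
        print(category, stats["count"], stats["hours"])
```

Sorting by total hours rather than by count is a choice made for this sketch; either ordering could serve as the starting list of candidates for a problem review.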

Then perform a mock problem management process, see how long it takes in terms of man hours. (I think it's reasonable to exclude the cost of actually fixing the problem because that would have to be done anyway at some point.) Finally make an estimate of the number of incidents you might still receive after the fix is implemented. This might be 0, but it is totally dependent on the nature of the fault, and you of course risk introducing other faults in the fix.

So if Ni = number of common incidents/period (e.g. 6 months)
H1 = Hours 1st level support spends on a common incident
H2 = Hours 2nd level support spends on a common incident
H3 = Hours 3rd level support spends on a common incident
Hu = Hours lost productivity for the user

If you can add a cost for each function above, all the better, as it will give a more meaningful result.

You could consider adding in a factor to deal with incidents that occur, but are not reported (especially if a fault is long standing, users will not bother calling it in 'cos it never gets fixed so why bother?).
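One way the quantities above might be combined is sketched below in Python. The post gives the symbols Ni, H1, H2, H3 and Hu; everything else here is an assumption added for illustration: the hourly rates, the `unreported_ratio` parameter (extra unreported occurrences per reported incident, assumed to cost only user time), and the way the mock problem management effort is subtracted. Treat it as a rough model, not a standard formula.

```python
def incident_cost(ni, h1, h2, h3, hu,
                  rate1, rate2, rate3, rate_user,
                  unreported_ratio=0.0):
    """Estimated cost of one common incident type over a period.

    ni               -- number of reported incidents in the period (Ni)
    h1, h2, h3       -- hours per incident at 1st/2nd/3rd level support
    hu               -- hours of lost user productivity per incident (Hu)
    rate*            -- assumed hourly cost of each function
    unreported_ratio -- assumed unreported occurrences per reported incident
    """
    support = h1 * rate1 + h2 * rate2 + h3 * rate3
    user = hu * rate_user
    reported = ni * (support + user)
    # Unreported occurrences never reach support, so only user time is counted.
    unreported = ni * unreported_ratio * user
    return reported + unreported


def pm_net_saving(cost_before, ni, ni_after, pm_hours, pm_rate):
    """Avoided incident cost minus the effort of the problem investigation."""
    if ni <= 0:
        raise ValueError("ni must be positive")
    avoided = cost_before * (ni - ni_after) / ni
    return avoided - pm_hours * pm_rate


if __name__ == "__main__":
    # Illustrative numbers only, not industry data.
    before = incident_cost(ni=40, h1=0.5, h2=1.0, h3=2.0, hu=3.0,
                           rate1=30, rate2=45, rate3=60, rate_user=50,
                           unreported_ratio=0.25)
    print(before)
    print(pm_net_saving(before, ni=40, ni_after=4, pm_hours=40, pm_rate=60))
```

A positive result from `pm_net_saving` suggests the investigation would pay for itself within the period for that incident type, under the stated assumptions.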

Now you can argue about how accurate the above modelling is, and it is very rough and ready. But even without numbers the problem review project is a useful exercise.

It will tell you what KPIs are useful for your incident management process. Perhaps it will show you that you have no observable pattern to your incidents, which might mean you perform problem management as a one-time review every 6 months with temporary team members. If you have many instances of common incidents you might conclude you need a more permanent problem review board. Immediately.

Let me know if you do decide to do this as I'd be very interested in the results. I did a similar exercise years ago but for risk management in a security environment, and the results were actually shocking to everyone!