My organization is revamping a number of its ITIL services (Incident, Problem, Capacity, and many more). With that said, how do you measure how effective your Problem Management processes are, and how do you set your Problem Management goals/expectations? Here are the key measurements we're focusing on and comparing against best practices for measuring the process's effectiveness:
-Average time to resolve problem (based on priority)
-Average time to identify root cause
-% of Problems meeting root cause target
-Average time to close record after resolution has been identified
-% of problems not solved
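The measurements above boil down to simple aggregations once the problem records carry the right timestamps. A minimal sketch in Python; the record layout and day counts are invented for illustration, not taken from any particular ITSM tool:

```python
from statistics import mean

# Hypothetical problem records; all day counts are relative to the day the
# record was opened. Field names are assumptions, not a real tool's schema.
problems = [
    {"priority": "P1", "root_cause_day": 2, "resolved_day": 5,
     "root_cause_target_day": 3, "solved": True},
    {"priority": "P1", "root_cause_day": 4, "resolved_day": 9,
     "root_cause_target_day": 3, "solved": True},
    {"priority": "P2", "root_cause_day": 7, "resolved_day": None,
     "root_cause_target_day": 10, "solved": False},
]

def avg_resolve_days_by_priority(records):
    """Average time to resolve, grouped by priority (resolved records only)."""
    by_prio = {}
    for r in records:
        if r["resolved_day"] is not None:
            by_prio.setdefault(r["priority"], []).append(r["resolved_day"])
    return {p: mean(days) for p, days in by_prio.items()}

def pct_meeting_root_cause_target(records):
    """Share of problems whose root cause was found within target."""
    met = [r["root_cause_day"] <= r["root_cause_target_day"] for r in records]
    return 100 * sum(met) / len(met)

def pct_unsolved(records):
    """Share of problems not solved."""
    return 100 * sum(not r["solved"] for r in records) / len(records)
```

The same pattern covers average time to identify root cause and time to close after resolution: add the corresponding timestamp fields and average their differences.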

My dilemma, though, is going one step further. How do you set PM goals (whether annual, 5-year, etc.) to show how effective the Problem Management service is? I get that PM can lead to lower incident volume, decreased MTTR for incidents with an effective KEDB, increased system availability, etc., but how have you identified "HOW MUCH" improvement you'd expect to see in a given year? Is there a best-practice expectation? 2%, 5%, 10%, etc.?

1) Do you have a goal for the Incident volume reduction that should occur with effective Problem Management? (i.e. "With effective PM, we expect to see an X% reduction in incidents this year.") If so, what percentage does your company use for X?

2) Do you have a goal for the Tier-1 Incident MTTR reduction based on effective PM? (i.e. Tier-1 incident MTTR will be reduced by X% based on effective PM processes).

Etc.
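For what it's worth, the year-over-year goals in questions 1 and 2 both reduce to a percent-change calculation. A minimal sketch; the figures and the 10% target are arbitrary placeholders, not best-practice benchmarks:

```python
def pct_reduction(previous, current):
    """Percent reduction from last period's count to this period's."""
    return 100 * (previous - current) / previous

# Placeholder figures -- not recommendations.
incidents_last_year, incidents_this_year = 205_000, 196_800
goal_pct = 10  # the "X%" from question 1, chosen arbitrarily here

achieved_pct = pct_reduction(incidents_last_year, incidents_this_year)
goal_met = achieved_pct >= goal_pct
```

The hard part, of course, is not the arithmetic but choosing a defensible value for the target.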

Just trying to understand what types of specific goals & percentage improvements you guys anticipate as a result of effective PM processes. Any help would be greatly appreciated.

Not an easy area. Problem analysis can be difficult or easy, the solutions protracted or simple to implement, and neither correlates well with priority. Then the world your incidents live in keeps changing, leading to new problems.

Another aspect to look at is how pro-active your problem management is. So something like number (or percentage) of problems identified/analysed/resolved before they cause incidents might be useful.
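One way to make that concrete is to count the problem records that had no incidents logged against them when they were raised. A hedged sketch; the field names and sample data are invented:

```python
# Sketch of a proactivity measure: share of problem records raised before any
# incident had been logged against them. Field names are assumptions.
def proactive_pct(records):
    proactive = sum(r["incidents_at_logging"] == 0 for r in records)
    return 100 * proactive / len(records)

sample = [
    {"id": "PRB001", "incidents_at_logging": 0},   # found via trend analysis
    {"id": "PRB002", "incidents_at_logging": 12},  # reactive
    {"id": "PRB003", "incidents_at_logging": 0},   # found via capacity review
    {"id": "PRB004", "incidents_at_logging": 3},   # reactive
]
```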

Improvement figures cannot easily be carried over from other environments. They depend too much on the level of maturity you start from, the volatility of the environment, how leading-edge the technologies in use are, and how much your organization is willing to invest in problem management.

In the end the performance of problem management has to be largely subjective, based on experience, unless you are dealing with very large numbers.

_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718

Makes complete sense. Our Problem Management processes, and Incident Management processes for that matter, are not mature at this point. That's part of the reason why we're revamping our RUN services across our IT organization. I agree that a lot of Problem Management is subjective, so I'm trying to find some good measures to make it more objective in order to show its effectiveness. I'm hesitant to set up a "goal" measuring the reduction in incident volume because other variables play a part in increasing/decreasing incident volume, so incident volume may not be a good indication of how effective the Problem Management processes are. Here's where I do think I could accurately measure effectiveness:

1) Number of Resolved Problem Records year-over-year
2) Number of incidents permanently prevented as a result of resolved Problems - I would track this by counting the reported incident volume that was initially related to the Problem record, and not try to predict what "could have been" had we left it unresolved, since there's really no way to predict this.
3) Percent of Problems logged as a result of proactive Problem management.
4) Decrease in the number of recurring incidents
5) Percent of Problems that have workarounds available.
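Measures 1, 2, and 4 above are straightforward counts once incident-to-problem links exist in the tool. A minimal sketch with invented records and placeholder figures:

```python
# Invented resolved-problem records; "linked_incidents" is assumed to be the
# number of incidents attributed to the problem before it was resolved.
resolved_this_year = [
    {"id": "PRB010", "linked_incidents": 40},
    {"id": "PRB011", "linked_incidents": 7},
]

resolved_count = len(resolved_this_year)                                # measure 1
incidents_prevented = sum(p["linked_incidents"] for p in resolved_this_year)  # measure 2

recurring_last_year, recurring_this_year = 1_800, 1_450  # placeholder counts
recurring_decrease = recurring_last_year - recurring_this_year          # measure 4
```

This follows the counting approach described in measure 2: only incidents actually attributed to the problem are tallied, with no forecasting of "could have been" volume.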

These stats only work if you have a large infrastructure with many applications and hence many problems. Even then they have to be treated with caution:

1) This has to be linked with unresolved problems and/or problems open for longer than anticipated. But you also have to distinguish problems where the analysis and proposal are complete but the solution is out of your hands.

2) This should be incident groups and has to take into account the severity and likelihood of the incidents.

3) The numbers have to be large to use percentages, and again, what about severity? It's more important to prevent three disasters than fifty inconveniences. You might also look at problems that could have been spotted proactively (to guide improvement actions).

4) In a big complex organization this may not happen much. Although it can if you have low maturity at the start.

5) Almost all problems have a work-around of some sort (even if it is just to restart the system or perform some task by other means). Unless you have a way to measure the quality of the work-around you are not measuring anything useful.

Also you sh/could measure cost against benefit.
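A back-of-the-envelope cost-benefit sketch; every figure below is a placeholder to show the shape of the calculation, not an estimate:

```python
pm_annual_cost = 120_000   # placeholder: analysts' time plus fix implementation
incidents_avoided = 800    # placeholder: incidents no longer recurring
cost_per_incident = 250    # placeholder: fully loaded handling cost

benefit = incidents_avoided * cost_per_incident
net_benefit = benefit - pm_annual_cost
```

Even a rough version of this makes the "how much should we invest in problem management" conversation easier to have.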

In a volatile environment, some years you will do well just to keep pace unless you have an infinite budget.

Thanks Diarmid. Our IT organization handles anywhere from 190,000-210,000 incidents a year, so I think the volume will give us plenty of opportunities for Problem Management improvements.

I don't disagree with your points, but perhaps I can add additional clarity:

1) Within our tool, we can identify the number of problems that have been resolved, as well as those that went through analysis but were not "resolved" because of factors like solution cost, solution requirements, or the likelihood/impact of the incident. So we will be able to distinguish fully resolved records from those that were analysed but left unresolved for other reasons.

2 & 3) Within each of those metrics, we would break the numbers down by incident severity, so this would be covered. I don't disagree that a disaster outweighs inconveniences, but over time the volume of inconveniences adds up and introduces inefficiency to our shop, because we end up firefighting repeat inconvenience incidents. Our company today spends way too much time firefighting repeat incidents and doesn't look for fire-prevention opportunities.

5) Almost all problems have a workaround of some sort, but our company is awful at documenting this knowledge, which means even "known" workarounds aren't available to the vast majority of our support teams because they're in somebody's head. The implementation of our KEDB should help with this, of course, and although I wouldn't expect the percentage of problems with known workarounds to vary significantly from year to year, I think it has value in showing where we're inefficient in communicating to the incident groups. I'd consider this metric more of a benchmark.

1) Reduced number of incidents. With the problem team in place, I do expect to see fewer incidents, or at least a downward trend; otherwise, why do I need a problem team?

2) Reduced number of downtime or performance degradation cases.

3) Improved service restoration time.

4) Reduced recurring issues.
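Point 3 (restoration time) can be tracked as MTTR before and after the KEDB is in use. A sketch with invented durations; the median is used because restoration times are easily skewed by a few outliers:

```python
from statistics import median

# Invented incident restoration times in hours -- sample data only.
mttr_before = median([4.0, 6.5, 3.0, 9.0])
mttr_after = median([2.5, 3.0, 1.5, 5.0])
improvement_pct = 100 * (mttr_before - mttr_after) / mttr_before
```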

I think problems are not always easy to address and sometimes need external vendor support, so it does not make sense to measure a fix time for the problem team the way we do for the incident team.

_________________
Luo, Tian-Hong (Ken)
Regional Operation Lead
