Hi!
I recently started working for a company that's starting to implement some of the ITIL processes. Part of my role includes problem management. I recently took the ITIL V3 Foundations test (still waiting on the results) and plan to attend the Support and Restore Practitioner level training course next month. In the meantime, my boss asked me to think about the problem management process that we plan to implement.

We recently moved our Incident Management system to Remedy in an effort to align better with ITIL. We are also in the process of evaluating tools that will contain our CMDB. The only thing close to problem management that was done in the past was RCAs on priority 1 outages. Before this role, I worked on implementing a change management process, so I consider myself prepared to design and write up a new process for problem management.

So, where do I start in defining this new Problem Management process? I'll take all the help I can get.

The easiest start to problem management is the major incident process where the problem manager is doing quality assessment.

Problems cause incidents, and the primary goal of problem management is to reduce the number of incidents. So a problem is something that, if analysed, will reduce incidents.

The service desk logs incidents, work requests and changes very well, but few problems are logged. You find the problems by sitting down with the IT teams and listing what they are currently working on. On this list you will be able to reference the incidents, work requests and changes. The rest of the stuff without references are the problems.
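As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the team's work list has been typed up as simple records; the `WorkItem` class, the `ticket_ref` field and the ticket ID formats are hypothetical, not taken from Remedy or any other tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkItem:
    title: str
    # Hypothetical reference to an incident, work request or change record.
    ticket_ref: Optional[str] = None


def unreferenced_items(items):
    """Return the work items that have no ticket reference (problem candidates)."""
    return [item for item in items if not item.ticket_ref]


# Example list gathered from an IT team (made-up data).
team_list = [
    WorkItem("Restore mailbox for user", ticket_ref="INC-1001"),
    WorkItem("Patch web servers", ticket_ref="CHG-2001"),
    WorkItem("Investigate slow mail delivery"),
]

candidates = unreferenced_items(team_list)
print([c.title for c in candidates])
```

Whatever is left in `candidates` still needs a human look before it is logged as a problem; the filter only narrows the list.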

Another source of problems is to ask the IT customers what issues they have with IT service delivery. The response will be something along the lines of "my mail is slow". That equals a problem.

The first target of problem management is to find a workaround that is able to restore the service. Then you work on finding a final solution.

That solution will need to go through change management to be implemented, which means some delay, whereas the workaround can be executed every time the incident occurs. So while you wait for the final solution to be implemented, you minimize the impact of recurring incidents.

The first goal would be to find a workaround, so that as soon as an incident is reported, a workaround can be applied to restore service to the customer. The fix, usually identified after an RCA, has to be analyzed for cost vs. benefit. Many times management will decide that a workaround is sufficient because of the cost involved.

Here are some of the tasks that I am currently working on:
- Define, standardize and socialize an RCA document
- Define criteria for Incident->Problem
  - Priority (criticality)
  - Impact
  - Recurrence
- Set up a tool (DB) for known errors (we already have an IMgmt & PMgmt tool)
- Develop and implement an RCA procedure

During regular support work, repeating incidents are taken to be candidates for a problem. A problem can also be identified from the initial query or request received from the customer by email. A monthly analysis of the production log, done by the team leads, could also result in the identification of a problem.
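To make the "repeating incidents" idea concrete, here is a minimal Python sketch that flags problem candidates from an incident log. The thresholds (`min_recurrence`, `always_flag_priority`) and the `(symptom, priority)` record shape are assumptions for illustration only; real criteria, including impact, would come from your own process definition.

```python
from collections import Counter


def problem_candidates(incidents, min_recurrence=3, always_flag_priority=1):
    """Flag symptoms that recur often or that caused a top-priority incident.

    incidents: list of (symptom, priority) tuples, where priority 1 is highest.
    Both thresholds are assumed values, not ITIL-defined ones.
    """
    counts = Counter(symptom for symptom, _ in incidents)
    flagged = set()
    for symptom, priority in incidents:
        if priority <= always_flag_priority or counts[symptom] >= min_recurrence:
            flagged.add(symptom)
    return sorted(flagged)


# Made-up monthly incident log.
log = [
    ("mail slow", 3),
    ("mail slow", 3),
    ("mail slow", 2),
    ("printer offline", 4),
    ("database outage", 1),
]

result = problem_candidates(log)
print(result)  # ['database outage', 'mail slow']
```

Here "mail slow" is flagged because it occurred three times, and "database outage" because it was a priority 1 incident; the single "printer offline" incident is not flagged.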

So I guess the scope of problem management should be restricted to the following steps:

1. Problem Classification

2. Problem Investigation and Resolution

3. Problem Closure & Review

4. Problem Tracking (which includes maintaining a central repository that stores each problem ID with the corresponding RCA report, so that in future it can be used to reduce redundant work).

The concept we have in mind is that a problem in IT problem management is always related to an incident or to a change.
As a best practice, create a crisis team to solve the most critical problems and a high-level team for the others.

I guess finding the immediate workaround, so that the service can be restored to normal ASAP, is part of Incident Management.
It is out of scope for the Problem Management process.

Add,

As I understand it, Incident Management is responsible for implementing a fix to a known error. But in the situation where there is no workaround or fix to resolve the incident, it is Problem Management's responsibility to develop a workaround or a solution for Incident Management to implement.

An incident is an incident.
A problem is a problem.
A known error is a problem whose cause is identified and for which a workaround is identified.
Problem Control is the first phase of problem management and focuses on finding a workaround.
Error Control is the second phase of problem management and looks for a final solution.

Once a workaround is identified, it can be applied by incident management to restore the service. However, identifying the workaround belongs to Problem Management.
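The definitions above can be sketched as a small Python record. This is only an illustration under assumptions: the `Problem` class, its field names, the status strings and the example IDs are hypothetical and not taken from any ITIL tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    problem_id: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    permanent_fix: Optional[str] = None  # assumed to arrive via change management

    def status(self):
        """Derive the status from which fields have been filled in."""
        if self.permanent_fix:
            return "resolved"
        # Known error: cause identified AND workaround identified.
        if self.root_cause and self.workaround:
            return "known error"
        return "problem"


p = Problem("PRB-1")
states = [p.status()]

# Problem Control: find the cause and a workaround.
p.root_cause = "mail queue disk fills up"
p.workaround = "purge the queue and restart the mail service"
states.append(p.status())

# Error Control: find the final solution.
p.permanent_fix = "enlarge the queue volume"
states.append(p.status())

print(states)  # ['problem', 'known error', 'resolved']
```

Once a record reaches "known error", its workaround is what incident management would apply each time the incident recurs.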

br
JP

I would have to slightly disagree with your statement that "identifying the workaround belongs to Problem management..."

I would say that the correct statement would be that "Problem management validates identified workarounds". Incident management may identify potential workarounds in the course of restoring service, but they cannot be entered as true workarounds until Problem management has validated them.