Hi guys, just have a scenario question here. My organization is moving to a Remedy-based Incident/Problem management solution, so we have the chance to change the way we operate to align more closely with ITIL standards (hooray!). I'm having trouble wrapping my brain around this particular scenario, given the information I found in the ITIL V3 definitions.

Known Error:
A Problem that has a documented Root Cause and a Workaround.

Workaround:
Reducing or eliminating the Impact of an Incident or Problem for which a full Resolution is not yet available.

The scenario goes like this:
Multiple users call in to report an application crashing during synchronization; the same error message is seen each time. For each user calling the Service Desk, a new Incident record is logged. After a multitude of calls, a parent Incident is created and all other Incidents are linked to it as children. After troubleshooting without any resolution, the Incident records are escalated to Tier II. Once a workaround has been found, the Incident triggers a 'Known Error'. The workaround is applied to all affected machines to restore service and the Incidents are closed. A Problem record is created to determine the root cause of the Incidents.

This would seem to make sense to me, but by the ITIL definition, I cannot create a "known error" until I have a workaround and a root cause. Maybe I am just confused by the terminology, but I'd like to add something like the error and the workaround to a known error database so that subsequent calls can be related to that known error database entry.

Last edited by SemiFrank on Sat Jun 27, 2009 1:19 am; edited 2 times in total

First of all, Known Errors are the responsibility of Problem Management, not Incident Management.

Quote:

but I'd like to add something like the error and the workaround to a known error database

I think what you are referring to is the KEDB (Known Error Database, actually part of the CMDB). It is used by Incident Management to look for any potential workarounds currently available for this type of incident.

As for the error, you can put that in the description.

That's what I understood from your question; if that's not the case, then please clarify.

My position is one that will involve me in Incident Management and Problem Management, so I'm not too concerned about where it lies.

I see the discovery of a workaround as a trigger to closure of incident records and as the beginning of the problem management process, with the "known error" and an entry into the KEDB as the catalyst.

So to make it simple (I'm having difficulty defining what I'm having a problem with here): I see "Known Error" being sandwiched between Incident Management and Problem Management, but according to the ITIL definition, I can't have a "known error" until *after* the problem management process (because I need a workaround *and* a root cause).

In terms of formal definitions, a ‘Problem’ is an unknown underlying cause of one or more Incidents, and a ‘Known error’ is a Problem that is successfully diagnosed and for which a Work-around has been identified.

So let's say in your scenario that you are relating all incidents to a problem. This is where Problem Management comes in, identifying the root cause of the problem. If Incidents are related to an unresolved Problem, then Problem Management will advise Incident Management staff on the best available workarounds (temporary workarounds) to restore service until a permanent fix has been found. The problem can then be closed along with all of its related incidents.

This question has come up a couple of times. You might find additional insight in the other threads.

I think this is a good example of when not to take ITIL too literally. Don't forget the premise behind ITIL is guidance towards best practice. In order to offer this, ITIL inevitably has to formulate definitions. But that does not mean that they are the only valid or best definitions for everyone.

As far as I am concerned, as soon as you have something consistently going wrong in a way that you can describe from its symptoms, then you have a known error. You also have a problem. The sooner you can relay this to your support staff the better. As soon as an incident from this error has been resolved you have a workaround of sorts (it may not be brilliant and you may want to find a better one, but it does get the user/s back on stream).

With your problem management hat on, you want to get this info recorded on whatever serves as your known error database right away, even if the workaround is couched in "maybe"s and "not sure"s. You may also flag warnings if you see risks involved.

Meanwhile, you will be triggering analysis of the incidents to determine root cause (a nicely opaque concept anyway). But it does not make practical sense to wait until you have that pinned down when you already know enough to get users back up and running.

SemiFrank wrote:

I see the discovery of a workaround as a trigger to closure of incident records and as the beginning of the problem management process

I would like to quibble with this.

The trigger for closing an incident is not the discovery of a workaround but the application/implementation of a workaround and thus the restoration of normal service (at least temporarily).

I would never delay the beginning of the Problem Management process for the lack of a workaround.

It is important to realise that there is no sequential link between Incident and Problem.

Back to my opening comment. You should no more allow the ITIL text to become your master than you would allow your other tools that position.
_________________
"Method goes far to prevent trouble in business: for it makes the task easy, hinders confusion, saves abundance of time, and instructs those that have business depending, both what to do and what to hope."
William Penn 1644-1718

Quote:

As far as I am concerned, as soon as you have something consistently going wrong in a way that you can describe from its symptoms, then you have a known error. You also have a problem. The sooner you can relay this to your support staff the better. As soon as an incident from this error has been resolved you have a workaround of sorts (it may not be brilliant and you may want to find a better one, but it does get the user/s back on stream).

As much as I agree with the statement that ITIL should not be taken too literally ("adopt and adapt"), I cannot agree with the suggestion made here about having a known error "as soon as you have something consistently going wrong in a way that you can describe from its symptoms".

In Problem Mgmt there are 2 fundamentally different things: problems and known errors. While a problem is the unknown underlying cause of one or more incidents, a known error is a problem for which the root cause is known and (according to ITIL) a workaround is available.

I'm willing to give up the workaround part of the known error definition (which we in fact did in our company), but that's it. As long as you do not know the root cause you are dealing with a problem, not a known error.
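To make the two readings concrete, here is a minimal Python sketch of the distinction being argued. It is only an illustration: the `ProblemRecord` class, the `is_known_error` function, the `require_workaround` switch and the sample values are all hypothetical names invented for this example, not anything taken from ITIL or from Remedy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    """Hypothetical record; field names are assumptions for illustration."""
    summary: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None


def is_known_error(record: ProblemRecord, require_workaround: bool = True) -> bool:
    """Classify a record as a known error.

    require_workaround=True follows the glossary wording (root cause AND
    workaround); False is the relaxed variant that drops the workaround part.
    In both variants, a record without a root cause stays a problem.
    """
    if record.root_cause is None:
        return False
    if require_workaround and record.workaround is None:
        return False
    return True


# Sample records; the workaround and root cause texts are placeholders.
sync_crash = ProblemRecord(
    summary="Application crashes during synchronization",
    workaround="placeholder workaround",
)
cause_only = ProblemRecord(
    summary="Application crashes during synchronization",
    root_cause="placeholder root cause",
)
diagnosed = ProblemRecord(
    summary="Application crashes during synchronization",
    root_cause="placeholder root cause",
    workaround="placeholder workaround",
)
```

Under either setting, `sync_crash` (workaround but no root cause) is still a problem, which is exactly the situation the topic opener describes. Only `cause_only` changes classification when the workaround requirement is dropped.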

Indeed a workaround may be all you need to resolve an incident. After all, Incident Mgmt is only concerned with restoring service as quickly as possible. As the topic opener will find out, Remedy allows a workaround to be documented within both the problem record as well as the known error record. There is also the option (which we don't use) to create a solution record to capture the workaround.

I noted that you said that you will be involved in both problem and incident management. There could be a conflict here as one restores functionality ASAP (incident) while the other looks for the root cause (problem).

Find out what works for you within the context of ITIL. You can be flexible as long as you clearly define what you do, by whom and why etc.
_________________
Mark O'Loughlin
ITSM / ITIL Consultant

This very question has often come up when I have been discussing the Workaround concept as it applies to Problem and Incident Management. What I usually say (and this is an application of the "adapt" principle) is that Incident Management comes up with many (in fact, probably the majority) of Potential Workarounds. If I'm trying to get a critical IT Service up and running, and I figure out that rebooting a server fixes it, then I log the resolution to the Incident as "Reboot server" and, if my tool supports it, flag it as a Potential Workaround. The issue may come up over and over, but I can always apply my Potential Workaround to get it going again.

It is just a Potential Workaround because, until Problem Management goes in and looks for root cause, it's just an IT cowboy throwing darts and one happened to hit. Problem Management, in their investigation, may do root cause analysis and find out that a hung service on the server is causing the outage and that a much less impacting thing to do would be to simply stop and restart the service. Now it is a true Workaround. So why the difference?

When I'm doing Incident Management, the first thing I search is the Known Error Database for Workarounds. They have at least had some research (maybe rebooting the server was the best solution) and have been vetted by someone investigating root cause. If I don't find anything, I search the Incident database for Potential Workarounds. If I get a hit, I can try it and see if it works in my situation.
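The search order described above can be sketched roughly as follows. This is a rough Python illustration under stated assumptions, not how Remedy or any particular tool does it: the `KnownError` and `Incident` classes, the `find_workaround` function, the naive substring matching and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class KnownError:
    """Hypothetical KEDB entry holding a vetted Workaround."""
    symptom: str
    workaround: str


@dataclass
class Incident:
    """Hypothetical Incident record with a Potential Workaround flag."""
    symptom: str
    resolution: Optional[str] = None
    potential_workaround: bool = False


def find_workaround(
    symptom: str, kedb: List[KnownError], incidents: List[Incident]
) -> Optional[Tuple[str, str]]:
    """Return (kind, text): KEDB first, then flagged Incident resolutions."""
    needle = symptom.lower()
    # 1. Vetted Workarounds from the Known Error Database take priority.
    for entry in kedb:
        if needle in entry.symptom.lower():
            return ("workaround", entry.workaround)
    # 2. Fall back to unvetted resolutions flagged on past Incidents.
    for inc in incidents:
        if inc.potential_workaround and inc.resolution and needle in inc.symptom.lower():
            return ("potential workaround", inc.resolution)
    return None


# Sample data, invented for illustration only.
kedb = [KnownError("Hung service on file server", "Stop and restart the service")]
incidents = [
    Incident("Mail server unresponsive", "Reboot server", potential_workaround=True),
    Incident("Mail server unresponsive", "User error", potential_workaround=False),
]

vetted = find_workaround("hung service", kedb, incidents)
unvetted = find_workaround("mail server", kedb, incidents)
nothing = find_workaround("printer", kedb, incidents)
```

Returning the kind alongside the text keeps the distinction visible to whoever applies it: a "potential workaround" is something to try, not something that has been investigated.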

When I'm working Problem Management and looking for Incidents that are good candidates for opening Problem records, I look for Incidents that are flagged with Potential Workarounds. These are cases where the issue isn't well understood, and the Potential Workaround may deserve some investigation to improve it into a true Workaround.
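One possible way to pull those candidates out of the Incident data is sketched below in Python. It is an assumption-laden illustration: the `Incident` class, the `problem_candidates` function and the `min_occurrences` threshold are hypothetical and not part of ITIL or of any tool mentioned in this thread. The recurrence threshold in particular is my own addition to keep the list short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Incident:
    """Hypothetical Incident record with a Potential Workaround flag."""
    symptom: str
    resolution: Optional[str] = None
    potential_workaround: bool = False


def problem_candidates(
    incidents: List[Incident], min_occurrences: int = 2
) -> List[Tuple[str, str, int]]:
    """List (symptom, resolution, count) for recurring flagged Incidents.

    Only Incidents flagged as Potential Workarounds are counted; results
    are ordered from most to least frequent.
    """
    counts = Counter(
        (inc.symptom, inc.resolution)
        for inc in incidents
        if inc.potential_workaround and inc.resolution
    )
    return [
        (symptom, resolution, n)
        for (symptom, resolution), n in counts.most_common()
        if n >= min_occurrences
    ]


# Sample data, invented for illustration only.
history = [
    Incident("Service outage", "Reboot server", potential_workaround=True),
    Incident("Service outage", "Reboot server", potential_workaround=True),
    Incident("Service outage", "Reboot server", potential_workaround=True),
    Incident("Disk full", "Deleted temp files", potential_workaround=True),
    Incident("Printer jam", "Cleared jam", potential_workaround=False),
]

candidates = problem_candidates(history)
```

With the sample data above, only the repeated "Reboot server" resolution meets the threshold, which matches the case where the same dart keeps hitting and somebody should go and find out why.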

By the way, in ITIL v3, a Known Error record can be opened at any point in a Problem's lifecycle regardless of whether or not you have Root Cause or a Workaround identified (or even after the fact during a post mortem analysis).