Poll: What is the more accepted definition of Known Error?

1. A Known Error should only be recorded when the "true" root cause has been identified and proven with a tested work-around that can be used to stabilise and restore service until a permanent solution has been implemented. (66%, 6 votes)

2. A Known Error can be recorded when the symptoms of a problem are known, the cause is known (not the root cause) and a work-around is available to be used. Once the true root cause is identified then it can be documented in the Known Error and Problem records. (33%, 3 votes)

Total votes: 9


Caperz (Sydney, Australia)

Posted: Fri Jul 24, 2009 5:19 pm. Post subject: Known Errors

I would like to raise a new post about Known Errors. There seem to be different opinions and views on when a Known Error should be recorded. Here are the two main arguments I have seen.

1. A Known Error should only be recorded when the "true" root cause has been identified and proven with a tested work-around that can be used to stabilise and restore service until a permanent solution has been implemented.

2. A Known Error can be recorded when the symptoms of a problem are known, the cause is known (not the root cause) and a work-around is available to be used. Once the true root cause is identified then it can be documented in the Known Error and Problem records.

The Known Error and Problem records can exist until the root cause has been confirmed as removed via an implemented solution.
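The two recording rules can be expressed as simple checks on a Problem record. This is a minimal sketch: the `ProblemRecord` class and its field names are hypothetical, chosen only to mirror the wording of the two options, and are not taken from ITIL or from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    # Hypothetical fields mirroring the wording of the two options.
    symptoms: Optional[str] = None
    cause: Optional[str] = None        # known cause, not necessarily the root cause
    root_cause: Optional[str] = None   # the "true" root cause, once proven
    workaround: Optional[str] = None
    workaround_tested: bool = False


def may_record_option_1(p: ProblemRecord) -> bool:
    """Option 1: proven root cause plus a tested workaround."""
    return (
        p.root_cause is not None
        and p.workaround is not None
        and p.workaround_tested
    )


def may_record_option_2(p: ProblemRecord) -> bool:
    """Option 2: symptoms and a cause are known and a workaround is available.

    The root cause is not required yet; it is added to the record later.
    """
    return (
        p.symptoms is not None
        and p.cause is not None
        and p.workaround is not None
    )


# Example: a workaround exists but the root cause is still unproven.
p = ProblemRecord(
    symptoms="slow logins",
    cause="authentication server overloaded",
    workaround="fail over to the secondary server",
)
print(may_record_option_1(p), may_record_option_2(p))  # False True
```

Under option 2, the same record later satisfies option 1 once `root_cause` is filled in and the workaround is tested, so the Known Error is updated rather than recreated.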

I have posted this because I am currently involved in introducing a Known Error database, after running Problem Management for almost 2 years (with a more concentrated focus over the last year).

Last edited by Caperz on Fri Jul 24, 2009 5:46 pm; edited 1 time in total

Quote:

I'm surprised that you didn't know this if you have been involved in Problem Management for a couple of years.

Boris - Thank you for your reply. I didn't say that I didn't know the answer. I wanted to ask the question to get people's opinions. Remember, ITIL is just a framework for best practice, not the bible (so to speak) of how things are and will be done in real life.

I want to understand people's interpretation and application of Known Errors. How do you work with Known Errors in your organisation?


Yes, I understand what ITIL is... I'm v1, v2 and v3 qualified. Given that you used a proprietary term, 'Known Error', I gave you the proprietary answer. I also gave you a further real-world interpretation as a free gift.


Although I'm not sure what purpose this poll truly serves, I would like to point out that none of the presented options provides a definition for a Known Error. The options only say when a Known Error should/can be recorded. That makes it impossible to answer the poll question.

That being said, there is a lot to discuss when it comes to what Problems and Known Errors are. There are probably various opinions and various implementations, and that in itself may be just fine. After all, the organizations we all work for typically are not interested in scientific or philosophical definitions: we have a business to run! In other words, whatever you choose, make sure it is practical for your organization.

Now, for the fun of it, let's look at the definitions as presented by ITIL.

ITIL V2 (Service Support book, page 95):

Problem: An unknown underlying cause of one or more Incidents.

Known Error: A Problem that is successfully diagnosed and for which a Work-around has been identified.

ITIL V3 (Service Operation book, pages 236 and 240):

Problem: A cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management Process is responsible for further investigation.

Known Error: A Problem that has a documented Root Cause and a Workaround. Known Errors are created and managed throughout their Lifecycle by Problem Management. Known Errors may also be identified by Development or Suppliers.

The wording has changed a bit from V2 to V3. Actually, V2 was a bit more straightforward. When looking at the description of the process in V3, things become a bit more murky. The very linear Problem Management process flow (Service Operation, page 60) is absolutely in line with the definitions. But it also exposes a problem (no pun intended): if you follow this process by the book you would not be able to work on the permanent resolution of a Problem until you have first identified a workaround. In my mind that is not very practical because in the real world there is not always a workaround. On page 64 of the same book, ITIL seems to agree with me:

Quote:

"In some cases it may be possible to find a workaround to the incidents caused by the problem ...".

This implies that there are other cases where finding a workaround is not possible. ITIL goes on to say that if a workaround is found, it should be documented in the Problem record. I totally agree. The book then goes on about raising a Known Error record:

Quote:

"As soon as the diagnosis is complete, and particularly where a workaround has been found (even though it may not yet be a permanent resolution), a Known Error record must be raised and placed in the Known Error Database - so that if further incidents or problems arise, they can be identified and the service restored more quickly.

However, in some cases it may be advantageous to raise a Known Error record even earlier in the overall process - just for information purposes, for example - even though the diagnosis may not be complete or a workaround found, so it is inadvisable to set a concrete procedural point exactly when a Known Error record must be raised. It should be done as soon as it becomes useful to do so!

The Known Error Database and the way it should be used are described in more detail in paragraph 4.4.7.2."

Lock that in: you can raise a Known Error record even if the Problem investigation is not complete (i.e. you don't have a root cause) and you also don't have a workaround! Yes, you don't have any of the components that comprise a Known Error according to the definitions, yet it might be advantageous to raise a Known Error record. Go figure! Needless to say, I truly wonder how that can be advantageous. In essence you would end up with two records (Problem and Known Error record) that basically say the same thing: I don't know what is going on and I don't have a band-aid to stop the bleeding. And yes, I have read paragraph 4.4.7.2 about the Known Error Database and it does not address this situation at all. It does say that:

Quote:

"The Known Error record should hold exact details of the fault and the symptoms that occurred, together with precise details of any workaround or resolution action that can be taken to restore the service and/or resolve the problem."

That is back in line with what the Known Error definition says. Can somebody shed some light on this apparent chaos and maybe explain what the added value is of having a Known Error record that does not contain any more information than the Problem record? I honestly fail to see it. (Are there maybe organizations out there that use Known Error Databases that are accessible for Incident Mgmt, while there is no shared access to the Problem Database?)

Now back to the real world. Problems can be complex. There are often multiple factors that result in a Problem. For that reason I don't really like the common terms "root cause" or "true cause": these terms suggest that there is only one cause. Unfortunately, ITIL does not address this complexity and also seems to assume a 1:1 relationship. With this in mind, I suggest that the investigation of a Problem can result in multiple Known Errors (one for each "root cause"). I don't want activities for the development of a permanent resolution to depend on the development of a workaround. For that reason, in our organization's process, a Known Error can exist without a workaround, but not without a "root cause". The Known Error record drives the development of a permanent resolution. For some Known Errors there may be no need or justification to develop a solution. Incident Management can find workarounds by matching Incidents against Problems and Known Errors. In many cases, matching against a Problem is easier because a Problem is described more in terms of symptoms, which resemble the symptoms observed in the Incident. A workaround at the Known Error level may not always be practical, especially in cases where a single Known Error plays a role in different Problems (yes, that too can happen). The workaround is more likely to focus on the Problem than on the individual Known Error.
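The model described above (a Known Error needs a root cause but not a workaround, a Problem can have several Known Errors, and one Known Error can play a role in several Problems) can be sketched as follows. The class and function names are hypothetical, and the symptom matching is deliberately naive (exact string match); it illustrates the relationships, not the tool used in Marcel's organization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnownError:
    # In this model a Known Error cannot exist without a root cause,
    # but it can exist without a workaround.
    root_cause: str
    workaround: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.root_cause:
            raise ValueError("a Known Error requires a root cause")


@dataclass
class Problem:
    # Problems are described in terms of symptoms, so incidents match them easily.
    symptoms: List[str]
    workaround: Optional[str] = None  # workaround usually kept at Problem level
    known_errors: List[KnownError] = field(default_factory=list)


def match_incident(symptom: str, problems: List[Problem]) -> List[Problem]:
    """Return the Problems whose recorded symptoms include the incident symptom."""
    return [p for p in problems if symptom in p.symptoms]


# One Known Error shared by two Problems (a many-to-many relationship).
firmware = KnownError(root_cause="storage firmware defect")
reports = Problem(
    symptoms=["slow reports"],
    workaround="run reports off-peak",
    known_errors=[firmware, KnownError(root_cause="missing database index")],
)
backups = Problem(symptoms=["backup timeouts"], known_errors=[firmware])

for p in match_incident("slow reports", [reports, backups]):
    print(p.workaround, len(p.known_errors))  # run reports off-peak 2
```

Keeping the workaround on the Problem rather than on each Known Error reflects the point that a workaround is more likely to address the Problem as a whole.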

Fun topic, isn't it? I'm curious to learn about other implementations of these concepts.

Manager of Problem Management
Fortune 100 Company
ITIL Certified

Marcel - Thank you very much for your reply and feedback. It is great to get some insight from other ITIL organisations. I have replied to your comments below:


Quote:

Although I'm not sure what purpose this poll truly serves, I would like to point out that none of the presented options provides a definition for a Known Error. The options only say when a Known Error should/can be recorded. That makes it impossible to answer the poll question.

You are right, and I apologise. I am still learning to use this tool to post situations that I am uncovering as I work more closely with Problem Management. I realised what you mentioned a day after I posted this, and found that I couldn't edit the poll once at least one vote had been cast. My intention was for the poll to ask:
When should a Known Error record be raised?

1. A Known Error should only be recorded when the "true" root cause has been identified and proven with a tested work-around that can be used to stabilise and restore service until a permanent solution has been implemented.

2. A Known Error can be recorded when the symptoms of a problem are known, the cause is known (not the root cause) and a work-around is available to be used. Once the true root cause is identified then it can be documented in the Known Error and Problem records.

Quote:

"As soon as the diagnosis is complete, and particularly where a workaround has been found (even though it may not yet be a permanent resolution), a Known Error record must be raised and placed in the Known Error Database - so that if further incidents or problems arise, they can be identified and the service restored more quickly.

However, in some cases it may be advantageous to raise a Known Error record even earlier in the overall process - just for information purposes, for example - even though the diagnosis may not be complete or a workaround found, so it is inadvisable to set a concrete procedural point exactly when a Known Error record must be raised. It should be done as soon as it becomes useful to do so!

The Known Error Database and the way it should be used are described in more detail in paragraph 4.4.7.2."

Lock that in: you can raise a Known Error record even if the Problem investigation is not complete (i.e. you don't have a root cause) and you also don't have a workaround! Yes, you don't have any of the components that comprise a Known Error according to the definitions yet it might be advantageous to raise a Known Error record.

This very paragraph is exactly why I raised this post. I completed my ITIL v3 Capability OSA course in April this year and raised many debates in our classes over this very thing. I share your thinking and your frustration.

I guess this means that a Known Error record can be raised when the problem is understood and when it is practical to do so, which is in line with option 2 of my poll... so that would be the most correct answer. The ITIL textbook definition of what a KE actually is, however, is in line with option 1 of my poll. Furthermore, I would expect Problem investigation to strive to identify the root cause and ensure that it is documented in the Known Error record.

The issue I see is that, in a lot of cases, technicians start implementing different solutions to address their suspected root cause (which they can't prove at the time), and in essence end up trying to find and prove the root cause by eliminating possibilities one by one. So it gets to a point where the problem gets resolved without a dedicated root cause analysis. The root cause is then identified from the bottom up (within the process): they formulate what they think the most probable root cause would have been, based on the success of their implemented solution.

Has anyone else seen or experienced this?

Quote:

Can somebody shed some light on this apparent chaos and maybe explain what the added value is of having a Known Error record that does not contain any more information than the Problem record? I honestly fail to see it. (Are there maybe organizations out there that use Known Error Databases that are accessible for Incident Mgmt, while there is no shared access to the Problem Database?)

I am actually part of the team implementing a Known Error DB in my organisation. I see the Problem record as containing a history and trail of all the work that was done: from the time a problem was reported, through formulating and documenting a clear problem statement and description (as well as the problem symptoms), classification (where urgency and impact were assessed), resource allocation and the investigative work to get to root cause, to the formulation of a tested and proven work-around to restore service. The Known Error record should be more high-level and precise, essentially a summary of the key elements that come out of the Problem record, such as:
- Problem Statement
- Problem Symptoms
- Problem Root Cause
- Problem Workaround/s

This information can then be used by the Service Desk to restore service efficiently while related incidents continue to be reported, until a solution is formulated and implemented (if it is feasible and plausible to do so).
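The relationship between the detailed Problem record and the summarised Known Error record can be sketched as a projection that copies only the four key elements listed above. The field names and example values are assumptions for illustration; a real Known Error database would define its own schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProblemRecord:
    # Full working record kept by Problem Management (hypothetical fields).
    statement: str
    symptoms: List[str]
    root_cause: str
    workarounds: List[str]
    urgency: str = ""
    impact: str = ""
    history: List[str] = field(default_factory=list)  # investigation trail


@dataclass(frozen=True)
class KnownErrorRecord:
    # High-level summary the Service Desk uses to restore service.
    statement: str
    symptoms: Tuple[str, ...]
    root_cause: str
    workarounds: Tuple[str, ...]


def summarise(problem: ProblemRecord) -> KnownErrorRecord:
    """Copy only the key elements; the history stays in the Problem record."""
    return KnownErrorRecord(
        statement=problem.statement,
        symptoms=tuple(problem.symptoms),
        root_cause=problem.root_cause,
        workarounds=tuple(problem.workarounds),
    )


# Illustrative values only.
prb = ProblemRecord(
    statement="ERP screens time out for finance users",
    symptoms=["session timeout", "error on month-end report"],
    root_cause="report query exceeds the session limit",
    workarounds=["run the report in batch mode"],
    history=["logged", "classified", "RCA workshop held"],
)
ke = summarise(prb)
print(ke.workarounds)  # ('run the report in batch mode',)
```

Making the Known Error record immutable here is just one design choice: it signals that the summary is regenerated from the Problem record rather than edited independently.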

Quote:

Now back to the real world. Problems can be complex. There are often multiple factors that result in a Problem. For that reason I don't really like the common terms "root cause" or "true cause". These terms suggest that there is only one cause. Unfortunately, ITIL does not address this complexity and also seems to assume a 1:1 relationship.

I agree with you here; I have also seen problems that have more than one root cause. A recent one was where a whole team, across multiple groups, were complaining that their ERP performance was poor. In the end the root causes were human error (users not using the tool correctly) and outdated PC hardware.

Quote:

Fun topic, isn't it? I'm curious to learn about other implementations of these concepts.

ABSOLUTELY. That's why I thought I'd raise it.

Looking forward to more discussion around this topic.

ITIL V3 Capability - Operational Support & Analysis Certified

er, I pick option 1, BUT...
To me, "Known Error" is a problem status in your Problem Log, not a record in a separate database. Does that make sense? Once a Problem has been identified, it moves (hopefully!) through various statuses (statii? whatever) from Investigating to either Known Error (and then, hopefully, Resolved), or to No Friggin Clue, or Unknown, or whatever is used in your company.
If you want a list of Known Errors, you take a 'snapshot' of the problems in your log with that status.
Knowing the quantity of Known Errors is only useful as a metric if you need to increase staffing in the groups responsible for getting the Problems resolved.
Thoughts?
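Treating Known Error as a status in the Problem Log, rather than as a separate record, reduces the "list of Known Errors" to a filtered snapshot. A minimal sketch, with status names taken loosely from the post and a hypothetical `Problem` class:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    # Illustrative statuses from the post; real tools define their own.
    INVESTIGATING = "Investigating"
    KNOWN_ERROR = "Known Error"
    RESOLVED = "Resolved"
    UNKNOWN = "Unknown"


@dataclass
class Problem:
    ref: str
    status: Status = Status.INVESTIGATING


def known_error_snapshot(log: List[Problem]) -> List[str]:
    """Return the references of all Problems currently in Known Error status."""
    return [p.ref for p in log if p.status is Status.KNOWN_ERROR]


log = [
    Problem("PRB-1", Status.KNOWN_ERROR),
    Problem("PRB-2"),
    Problem("PRB-3", Status.RESOLVED),
    Problem("PRB-4", Status.KNOWN_ERROR),
]
print(known_error_snapshot(log))  # ['PRB-1', 'PRB-4']
```

The trade-off of this design is that a single status field can represent at most one Known Error per Problem.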
/Sharon

In theory, there is no difference between theory and practice.
In practice, there is!

There are indeed ITSM systems where 'Known Error' is just a status on a Problem record. In my opinion, that is not a suitable approach. It does not properly support the many situations where a problem has multiple root causes (separate Known Errors) that need to or can be prioritized and resolved independently, possibly by different groups.

...situations where a problem has multiple root causes (separate Known Errors) that need to or can be prioritized and resolved independently, possibly by different groups.

Good point, since one or multiple incidents can spawn 1 or many problems. I think how this type of situation is managed may depend on the size of your organization, your tracking & reporting structure, and how you want to manage answering questions like 'is it fixed yet', & 'what does my group have to do'. I personally like having separate problem records for each actionable root cause, but that can be unwieldy for some.
/Sharon

I voted for #2. Reason: the Service Desk uses the KEDB when assessing/matching incidents. There is no need for them to know that there is a root cause at the molecular level on server x - they just need the match and the workaround, even if it's a temporary one. Maybe not 100% ITIL compliant, but very practical. The true and final root cause may or may not ever be found, so why hide this record in the "Problems>RCA in progress" drawer?

Of course, we could publish such a half-cooked KE as a "Known Problem" - which it is - but it would only add confusion.


Visibility is the key, at least with us, and at this point in time.

Visibility into workarounds is indeed very important, but the question is whether that necessitates recording a Known Error when you are actually clueless regarding the root cause. How are your Problem Mgmt process and tracking tool going to distinguish between 'real' Known Errors (i.e. identified root cause) and 'placeholder' Known Errors that are only recorded to provide workaround visibility? Your tracking tool should allow you to document a workaround as part of your Problem record with visibility to all who need it, and not force you to log a Known Error when you don't have one.

Where do you record work-around information that can be searched and used by users of the incident management tool, the kind of information that Bluesman stores in their KEDB?

The tool we use provides 2 options:
1) designated fields to document the workaround in Problem as well as Known Error records
2) separate Solution records that can be associated with Problems or Known Errors and offer more elaborate ways to document, manage, and publish (to end-users) workarounds

We currently use option 1.

Incidents can be mapped against any of these records.

We use 2 different views into a database that holds the Known Error/Problem. The SD view gives the KE/WA in a few fields; the second view (used by PM, CM, IM et al.) provides the full Problem history, RCA details/history, status change timeline, RFC progress, etc.

Call it "need to know" basis, if you want.

Works nicely for us, and protects SD staff from information overload.
(If someone at the SD has a pathological urge to see the full details, he/she can walk over to the IM, who has the full view. Never happened so far.)

Once a Problem is closed with a permanent resolution the KE part of it disappears from the SD view.
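The two-view arrangement can be sketched as filters over a single store: the Service Desk view exposes only the Known Error and workaround fields and hides entries once the Problem is closed with a permanent resolution, while the full view returns everything. All names are hypothetical; a real tool would more likely implement this through its own view or permission configuration than through application code like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemEntry:
    ref: str
    known_error: str
    workaround: str
    rca_history: List[str] = field(default_factory=list)
    status_timeline: List[str] = field(default_factory=list)
    closed_with_permanent_fix: bool = False


def service_desk_view(entries: List[ProblemEntry]) -> List[Dict[str, str]]:
    """KE and workaround only, and only while no permanent fix has closed the Problem."""
    return [
        {"ref": e.ref, "known_error": e.known_error, "workaround": e.workaround}
        for e in entries
        if not e.closed_with_permanent_fix
    ]


def full_view(entries: List[ProblemEntry]) -> List[ProblemEntry]:
    """Problem, Change and Incident Management see the complete entries."""
    return list(entries)


store = [
    ProblemEntry("PRB-7", "mail queue stalls", "restart the relay service",
                 rca_history=["RCA in progress"]),
    ProblemEntry("PRB-8", "VPN drops", "reconnect", closed_with_permanent_fix=True),
]
print([row["ref"] for row in service_desk_view(store)])  # ['PRB-7']
print(len(full_view(store)))  # 2
```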