Incident Management Prioritization

Understanding the Behavior Priority Drives

When it comes to service management, the two most critical processes to focus on are incident management and change management. Change management helps you introduce new services into the environment, while incident management ensures those services are properly supported. In this article, we are going to focus on incident management and on perhaps one of its most important aspects: how to establish priority. Having the proper methodology for establishing priority is critical to ensure your team is working on the right thing at the right time.

A workable Priority scheme for Incidents requires that you and your Customers understand, and agree on, the behavior that priority drives. Part of establishing an Incident Management process is deciding how to set priorities for Incidents. One commonly used approach is to derive Priority from an assessment of two independent variables, Impact and Urgency, resulting in a Priority number from 1 to 5, with 1 being the most significant and 5 the least.
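
To make the Impact/Urgency approach concrete, here is a minimal Python sketch. It assumes a three-level scale for each variable (1 = High, 2 = Medium, 3 = Low) and the widely used additive matrix, Priority = Impact + Urgency - 1, which produces exactly the 1-to-5 range described above. The scale, the `Level` names and the additive rule are illustrative assumptions rather than a prescribed method; the matrix you actually use is whatever you agree with your Customers.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed assessment scale for both Impact and Urgency (1 is the most severe)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def incident_priority(impact: Level, urgency: Level) -> int:
    """Return a Priority from 1 (most significant) to 5 (least).

    Assumes the common additive matrix: Priority = Impact + Urgency - 1,
    so High/High -> 1, High/Low or Medium/Medium -> 3, Low/Low -> 5.
    """
    return int(impact) + int(urgency) - 1


if __name__ == "__main__":
    # Print the whole (assumed) matrix: one row per Impact, one column per Urgency.
    for impact in Level:
        row = [incident_priority(impact, urgency) for urgency in Level]
        print(f"{impact.name:<6}", row)
```

Keeping the combination rule in one function means it can be swapped for an explicit lookup table if your agreed matrix is not additive.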

What I want to explore, however, is not the method for setting Incident Priority, but the implications of setting it. If the world were simple, setting a priority on something would do nothing more than tell a person choosing among several pieces of work which one to do first: obviously, the one with the highest priority.

As a Service Provider, you would want to prioritize all the things that people have to work on (Incidents, Service Requests, Problems, Project Tasks and normal operational Tasks), ensure that you have an adequate pool of trained, tool-enabled resources, and then get things into the hands of the right people as soon as possible. In the service world, however, if we define an Incident as “an unplanned interruption to or degradation in the quality of a service”, both the users of that service (the people who employ it to help achieve their objectives) and the Customers (the people with whom you have agreed on the service standards and who pay for the service) are stakeholders in the Incident.

To a great extent, they see Incidents as a completely separate pool of things to be addressed, virtually independent of everything else you work on (Service Requests, Project activity, etc.). What they would really like you to tell them is how long an Incident of a given priority level will take to resolve. But that, of course, is what you usually can’t tell them. Service Requests are highly predictable, so most of them can be pre-planned, modeled and given agreed resolution times; Incidents, by their very nature, are unpredictable.

Committing to stakeholders about how long it will take to resolve some unknown situation borders on the ludicrous. Even quoting “targets” is almost meaningless. You can easily show historical graphs of how long it took to resolve different types of Incidents, but using them to predict the future is misguided, except for Incidents whose workaround you have already packaged in a Known Error. What, then, can you agree on with your Customers about how you will handle Incidents? Primarily two things:

How you will marshal resources to deal with them;

How you will keep them (and their users) informed of progress.

So when working with your Customers to decide how you will prioritize Incidents (and please tell me that you have them involved as an integral part of figuring this out) you can’t really come up with a viable method for doing that without understanding what behavior the various levels will drive. Now, besides agreeing on the criteria for prioritizing incidents, you will also likely want to agree on criteria for prioritizing other things (like Problems), but let’s focus here on just Incidents.

In your discussions with your Customers, make sure that you are both on the same wavelength about the intent of Incident Management: to restore service to normal operating parameters as quickly as possible, or to provide a means of bypassing the issue, without in any way looking for the cause or figuring out how to prevent recurrence.

This is called providing a Workaround, and it constitutes the resolution of the Incident (with any follow-up work to be handled by Problem Management). And while there will be some discussion about what happens after the workaround has been provided, the bulk of your discussion should be about the expected behavior of the Service Provider up to that point. Once an Incident ticket has been recorded, the nature of the Incident has been established (identifying the faulty service(s) and the symptom, a step called Categorization) and the Incident has been prioritized, the resolution clock starts.

(Obviously, you want to minimize the time from when the Incident is detected or reported to when you prioritize it.) Together, the time elapsed on that resolution clock and the priority level drive the Service Provider’s behavior in marshaling resources and keeping Customer stakeholders informed.
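
The rule that the resolution clock starts only after the ticket is recorded, categorized and prioritized can be sketched as follows. The `Incident` class and its fields are hypothetical, chosen only to illustrate the sequence; a real service management tool will have its own data model. The sketch also measures the detection-to-prioritization delay that you want to minimize.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """Hypothetical Incident ticket; field names are illustrative only."""
    reported_at: datetime                 # when the Incident was detected/reported and recorded
    service: Optional[str] = None         # set during Categorization
    symptom: Optional[str] = None         # set during Categorization
    priority: Optional[int] = None        # 1 (most significant) to 5
    clock_started_at: Optional[datetime] = None

    def _try_start_clock(self, now: datetime) -> None:
        # The clock starts only once the ticket is both categorized and prioritized.
        if self.clock_started_at is None and self.service and self.symptom and self.priority:
            self.clock_started_at = now

    def categorize(self, service: str, symptom: str, now: datetime) -> None:
        self.service, self.symptom = service, symptom
        self._try_start_clock(now)

    def prioritize(self, priority: int, now: datetime) -> None:
        if not 1 <= priority <= 5:
            raise ValueError("priority must be between 1 and 5")
        self.priority = priority
        self._try_start_clock(now)

    def resolution_elapsed(self, now: datetime) -> timedelta:
        """Time on the resolution clock; zero until the clock has started."""
        if self.clock_started_at is None:
            return timedelta(0)
        return now - self.clock_started_at

    def triage_delay(self) -> Optional[timedelta]:
        """Report-to-prioritization interval, or None if the clock has not started."""
        if self.clock_started_at is None:
            return None
        return self.clock_started_at - self.reported_at
```

For example, an Incident reported at 09:00, categorized at 09:05 and prioritized at 09:10 has a ten-minute triage delay, and at 10:00 shows fifty minutes on its resolution clock.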

So let’s take a look at those behaviors. The first behavior to look at is how you will marshal resources. There are really two parts to this.

Ownership.

The Customer will want to ensure that every Incident has a driver: a person or a team who will steer it through to completion. Typically, it will be agreed that lower-priority Incidents can be owned by a team, with most Service Providers using a Service Desk team for this; on every shift, someone from that team is assigned to conduct regular follow-ups and drive activity. For higher-priority Incidents, the Incident Owner is named in the ticket and personally drives it until they hand ownership over (for example, at a shift change). Good tools will keep the Incident Owner aware of what progress activity is required and when.
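
A minimal sketch of the ownership rule, assuming (hypothetically) that Priorities 1 and 2 require a named Incident Owner and everything else is owned by the Service Desk team. The cut-off, the team name and the function names are placeholders for whatever you agree with your Customers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical agreement: Priorities 1 and 2 need a named Incident Owner;
# anything lower is owned by the Service Desk team as a whole.
NAMED_OWNER_MAX_PRIORITY = 2
OWNING_TEAM = "Service Desk"


@dataclass(frozen=True)
class Ownership:
    owner: str        # a team name or a named person
    individual: bool  # True when a named Incident Owner personally drives the ticket


def assign_ownership(priority: int, named_owner: Optional[str] = None) -> Ownership:
    """Decide who drives an Incident of the given priority (1 = highest)."""
    if priority <= NAMED_OWNER_MAX_PRIORITY:
        if not named_owner:
            raise ValueError("higher-priority Incidents need a named Incident Owner")
        return Ownership(named_owner, individual=True)
    # Team ownership: whoever is on shift does the regular follow-ups.
    return Ownership(OWNING_TEAM, individual=False)


def hand_over(current: Ownership, next_owner: str) -> Ownership:
    """Explicit handover (e.g. at a shift change) so a named Incident is never ownerless."""
    if not current.individual:
        return current  # team ownership carries across shifts unchanged
    return Ownership(next_owner, individual=True)
```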

The second part of marshaling resources is the mechanism for engaging those resources.

Escalation.

Escalation involves bringing additional resources to bear on an Incident if the currently assigned resources are not achieving results in a timely manner. There are two types of Escalation: Functional Escalation, where you need to bring narrower but deeper knowledge to bear; and Hierarchical Escalation, where you need to bring more vision or authority to bear. Typically, you will want to define the different levels of Functional Escalation (which may include third-party suppliers) and of Hierarchical Escalation generically, so that the actual people or groups involved are determined by the Categorization of the Incident.

What you will agree upon with your Customers is the pattern of escalation that becomes mandatory when the resolution clock reaches defined points for each priority level. Escalations are normally communicated to their targets through tool-executed messages for lower-priority Incidents and by direct Incident Owner communication for higher-priority ones. The Incident Owner can, of course, escalate before the required time. Usually, every escalation is also communicated automatically to the specified Customer stakeholders.
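
The escalation agreement can be captured as data: for each priority level, the resolution-clock points at which a Functional or Hierarchical escalation to a generic level becomes mandatory, plus a separate mapping from Categorization to the real people or groups. Everything below (the clock points, the levels, the "Email" category, the target names and the Priority 1-2 cut-off for Owner-driven communication) is a hypothetical placeholder; the sketch only shows the shape such an agreement might take.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Kind(Enum):
    FUNCTIONAL = "functional"      # narrower but deeper knowledge (may be a third party)
    HIERARCHICAL = "hierarchical"  # more vision or authority


@dataclass(frozen=True)
class Step:
    at: timedelta  # resolution-clock point at which this escalation becomes mandatory
    kind: Kind
    level: int     # generic level; the real target comes from Categorization


# Hypothetical agreement. Priorities without an entry have no mandatory escalations here.
SCHEDULE: dict[int, list[Step]] = {
    1: [Step(timedelta(minutes=15), Kind.FUNCTIONAL, 2),
        Step(timedelta(minutes=30), Kind.HIERARCHICAL, 1),
        Step(timedelta(hours=1), Kind.HIERARCHICAL, 2)],
    3: [Step(timedelta(hours=4), Kind.FUNCTIONAL, 2)],
}

# Hypothetical mapping from (category, kind, level) to actual people or groups.
TARGETS: dict[tuple[str, Kind, int], str] = {
    ("Email", Kind.FUNCTIONAL, 2): "Messaging engineering",
    ("Email", Kind.HIERARCHICAL, 1): "Messaging team lead",
    ("Email", Kind.HIERARCHICAL, 2): "Head of IT Operations",
}

OWNER_DRIVEN_MAX_PRIORITY = 2  # assumed cut-off for Owner-communicated escalations


def due_escalations(priority: int, category: str, elapsed: timedelta,
                    already_done: set[Step]) -> list[tuple[Step, str, str]]:
    """Return (step, target, channel) for each mandatory escalation now due.

    Steps already carried out (including early ones by the Incident Owner)
    are passed in `already_done` and skipped.
    """
    channel = ("Incident Owner" if priority <= OWNER_DRIVEN_MAX_PRIORITY
               else "tool message")
    due = []
    for step in SCHEDULE.get(priority, []):
        if step.at <= elapsed and step not in already_done:
            target = TARGETS.get((category, step.kind, step.level),
                                 "UNMAPPED: review Categorization")
            due.append((step, target, channel))
    return due
```

Separating the generic schedule from the category-to-target mapping means the agreement with Customers can stay stable while the named people and groups change.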

The second major behavior driven by priority is Customer notification. Very much as with Escalation, you will want to define Notification targets generically, so that the actual people or groups are determined by the Categorization (and possibly the reporting source) of the Incident. (Note that while we are focusing on Customer notifications, you may also be defining notification patterns for internal Service Provider roles. Clearly distinguish these from escalations: when a Service Provider role is informed about an Incident, it should never be in any doubt about whether resolution action on its part is expected.)

The typical agreement for priority-driven notification is that notification frequency increases with priority, and that for higher-priority Incidents the notifications are pushed to the stakeholders (verbally for the highest priorities), while for lower-priority Incidents the information can simply be recorded in the Incident ticket for stakeholders to pull.
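
Priority-driven notification can likewise be expressed as a small table. In this hypothetical agreement, the push interval shortens as priority rises, the highest priority is pushed by phone, and the two lowest priorities are pull-only through the ticket; the intervals and channels are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class NotificationRule:
    interval: Optional[timedelta]  # None means no scheduled push: stakeholders pull from the ticket
    channel: str                   # how updates reach Customer stakeholders


# Hypothetical agreement: pushes become more frequent as priority rises (1 = highest).
RULES = {
    1: NotificationRule(timedelta(minutes=30), "phone call"),
    2: NotificationRule(timedelta(hours=1), "email"),
    3: NotificationRule(timedelta(hours=4), "email"),
    4: NotificationRule(None, "Incident ticket (pull)"),
    5: NotificationRule(None, "Incident ticket (pull)"),
}


def pushes_due(priority: int, elapsed: timedelta) -> int:
    """Number of pushed updates owed by this point on the resolution clock."""
    rule = RULES[priority]
    if rule.interval is None:
        return 0
    return elapsed // rule.interval  # timedelta // timedelta yields an int


# Sanity check on the agreement itself: a higher priority never gets a longer interval.
_pushed = [RULES[p].interval for p in sorted(RULES) if RULES[p].interval is not None]
assert _pushed == sorted(_pushed), "push frequency must not fall as priority rises"
```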

Key to effective notification is the requirement that all progress, key action items, plans and projections be recorded in the Incident ticket. That record is the gold source of information, and no additional speculation should be communicated to Customer stakeholders verbally or by email. Far too much damage is done by Service Provider people answering “when do you think this will be resolved?” optimistically. There will certainly be significant pressure to give an answer, but the answer should never go beyond the documented next steps and projections.
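
To illustrate the gold-source rule, the sketch below answers the “when will it be resolved?” question using only what the ticket records, and says so plainly when no projection exists. The `LogEntry` structure and its entry kinds are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    at: datetime
    kind: str  # assumed kinds: "progress", "next_step", "projection"
    text: str


def answer_eta_question(log: List[LogEntry]) -> str:
    """Answer 'when will this be resolved?' strictly from the ticket record.

    Never adds a guess: if no projection is documented, it says so and
    quotes the latest documented next step instead.
    """
    def latest(kind: str) -> Optional[LogEntry]:
        entries = [e for e in log if e.kind == kind]
        return max(entries, key=lambda e: e.at) if entries else None

    projection = latest("projection")
    next_step = latest("next_step")
    parts = []
    if projection:
        parts.append(f"Documented projection: {projection.text}")
    else:
        parts.append("No resolution projection has been recorded yet.")
    if next_step:
        parts.append(f"Next step: {next_step.text}")
    return " ".join(parts)
```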

Wrap up

Formulating an effective prioritization mechanism for Incident Management involves agreeing not only on how the priority level for an Incident will be set, but also on what behavior each priority level will drive. It is in the interest of both the Service Provider and the Customer to develop a workable Incident Priority scheme that will further the overall relationship between the parties and allow Incidents to be dealt with effectively and efficiently. Some key guidelines are: