Incident SLA Management in Service Manager

This blog post describes how to build a custom SLA management solution in SCSM. If you are looking for more of a plug-and-play solution, take a look at the Service Level Management solution from our partner Cased Dimensions, and check out the Cased Dimensions demo video.

A question that comes up often on the forums for Service Manager 2010 is how to take action on incidents that breach their Service Level Agreement (SLA). In this post, Patrik Sundqvist and I will show you one way to do it. This blog post has three goals:

Explain how to configure incident SLAs in Service Manager 2010.

Explain how to use the plug and play solution that we built for managing SLAs.

Explain how we built the solution and in particular how to create custom Windows Workflow Foundation activities that use the Service Manager SDK.

How to Configure Incident SLAs in Service Manager

When an incident is registered in Service Manager, it gets a priority calculated from the urgency and impact of the incident. The priority and target resolution time are recalculated each time the impact and/or urgency changes. The calculation is based on a matrix that can be configured directly in the console at "Administration" – "Settings" – "Incident Settings".

In the same place where you configure the priority matrix you're able to define target resolution times per priority level.

As mentioned above, when an incident is registered in Service Manager it receives a priority based on the matrix. At the same time, it also gets a "Target Resolution Time", which is based on the priority and the resolution time configuration.

Notice here how the Priority is set to 1 because the Impact and Urgency are both High. The priority is determined by the Urgency/Impact matrix shown above.

Here, notice how the Resolve By (also called 'Target Resolution Time') is set to the time the incident was created plus 30 minutes per the configuration shown above.
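To make the priority and Resolve By calculation concrete, here is a minimal Python sketch of the lookup described above. It is an illustration only, not how Service Manager implements it internally. Apart from the High/High cell mapping to priority 1 and the 30-minute resolution time for priority 1 (both from the example above), the matrix entries and resolution times are hypothetical placeholders. The sketch also assumes the target is always measured from the incident's creation time.

```python
from datetime import datetime, timedelta

# Hypothetical priority matrix keyed by (impact, urgency). Only the
# ("High", "High") -> 1 cell comes from the example above.
PRIORITY_MATRIX = {
    ("High", "High"): 1, ("High", "Medium"): 2, ("High", "Low"): 3,
    ("Medium", "High"): 2, ("Medium", "Medium"): 3, ("Medium", "Low"): 4,
    ("Low", "High"): 3, ("Low", "Medium"): 4, ("Low", "Low"): 5,
}

# Hypothetical resolution times per priority. Only priority 1 -> 30 minutes
# matches the configuration described above.
RESOLUTION_TIMES = {
    1: timedelta(minutes=30),
    2: timedelta(hours=2),
    3: timedelta(hours=8),
    4: timedelta(days=1),
    5: timedelta(days=3),
}


def calculate_priority(impact: str, urgency: str) -> int:
    """Look up the priority for an impact/urgency pair in the matrix."""
    return PRIORITY_MATRIX[(impact, urgency)]


def target_resolution_time(created: datetime, impact: str, urgency: str) -> datetime:
    """Resolve By = creation time + resolution time configured for the priority."""
    return created + RESOLUTION_TIMES[calculate_priority(impact, urgency)]


created = datetime(2010, 5, 27, 9, 0)
print(calculate_priority("High", "High"))               # 1
print(target_resolution_time(created, "High", "High"))  # 2010-05-27 09:30:00
# Lowering the urgency and recalculating moves the target out (assumption:
# the target is still measured from the creation time).
print(target_resolution_time(created, "High", "Low"))   # 2010-05-27 17:00:00
```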

Out of the box you can manage incidents which are still active past their target resolution times by using the 'Overdue Incidents' view.

This is a pretty passive approach, though, because it requires someone to keep hitting refresh on the view instead of managing things by exception. You can also run an Incident KPI Trend report to see the number of incidents that didn't meet their SLA:

Wow! The Contoso service desk team is really doing a bad job of meeting their SLAs! :)

You can also run the Incident Resolution report:

You can slice either of these reports by queue, source, time range, and so on. Our upcoming dashboard release for Service Manager will also have some interesting views on this data.
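Conceptually, the count behind these reports is simple: an incident misses its SLA when it is resolved after its Target Resolution Time. The sketch below is a hypothetical illustration of that count, sliced by queue. The `ResolvedIncident` type, its fields, and the sample data are made up for the example, and this is not how the reports themselves are implemented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ResolvedIncident:
    queue: str                        # hypothetical field used to slice results
    target_resolution_time: datetime
    resolved_time: datetime


def sla_misses_by_queue(incidents):
    """Count incidents resolved after their Target Resolution Time, per queue."""
    return Counter(i.queue for i in incidents
                   if i.resolved_time > i.target_resolution_time)


t = datetime(2010, 5, 27, 10, 0)
sample = [
    ResolvedIncident("Tier 1", t, datetime(2010, 5, 27, 9, 45)),   # met
    ResolvedIncident("Tier 1", t, datetime(2010, 5, 27, 10, 20)),  # missed
    ResolvedIncident("Tier 2", t, datetime(2010, 5, 27, 11, 0)),   # missed
]
print(sla_misses_by_queue(sample))
```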

But again, these are also pretty passive approaches to managing incidents.

What we hear a lot from customers is that they want to take a more proactive approach to managing incident SLAs. After all, some people's jobs depend on having good incident SLA numbers!

Here are a few things people want to do that we don't provide out of the box but that can be configured with a little customization:

Have a view of incidents that are within X minutes of breaching the SLA – see this blog post, but instead of filtering on Last Modified, filter on Target Resolution Time is Less Than [Now] + 30m (or whatever your desired warning threshold is).

Send a notification to the assigned-to analyst when the incident is X minutes away from breaching the SLA.

Send a notification to a manager when the incident is X minutes away from breaching the SLA. Send another one when it has breached the SLA.

Escalate/route an incident automatically when the incident is X minutes away from breaching the SLA or when it has breached the SLA.
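The first item boils down to a simple filter. Here is a minimal Python sketch of the "Target Resolution Time is Less Than [Now] + 30m" criterion applied to active incidents. The `Incident` type and `near_breach` function are hypothetical stand-ins for the view criteria you would build in the console.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    id: str
    active: bool
    target_resolution_time: datetime


def near_breach(incidents: List[Incident], now: datetime,
                warning: timedelta = timedelta(minutes=30),
                include_breached: bool = True) -> List[Incident]:
    """Active incidents whose Target Resolution Time is less than now + warning."""
    cutoff = now + warning
    return [
        i for i in incidents
        if i.active
        and i.target_resolution_time < cutoff
        # Optionally drop incidents that are already past their target.
        and (include_breached or i.target_resolution_time >= now)
    ]


now = datetime(2010, 5, 27, 10, 0)
incidents = [
    Incident("IR1", True, now + timedelta(minutes=10)),   # inside the window
    Incident("IR2", True, now - timedelta(minutes=5)),    # already breached
    Incident("IR3", True, now + timedelta(hours=2)),      # not close yet
    Incident("IR4", False, now + timedelta(minutes=10)),  # no longer active
]
print([i.id for i in near_breach(incidents, now)])                          # ['IR1', 'IR2']
print([i.id for i in near_breach(incidents, now, include_breached=False)])  # ['IR1']
```

Note that the criterion as written also matches incidents that have already breached. If you want the view to show only incidents that are still inside the warning window, you need the extra "greater than or equal to [Now]" condition modeled by `include_breached=False`.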

To detect and act on incidents that are about to breach, or have already breached, their SLA (their Target Resolution Time), you can use the built-in workflow engine of Service Manager 2010. Here is how to use the solution we provide in this blog post.

Deploying the Solution

1. Copy the following DLLs to the C:\Program Files\Microsoft System Center\Service Manager 2010 directory:

The Microsoft.ServiceManager.WorkflowAuthoring.* DLLs come from the Service Manager Authoring Tool Beta 2. Be careful when overwriting the versions you already have there, or when replacing these with newer ones in the future. Always back up the existing files before you replace them!

2. Import the management pack Microsoft.Demo.IncidentSLAManagement.xml into Service Manager. Note – you can optionally configure how frequently the workflow that checks service levels runs. By default it runs every 15 minutes. Decide how often you want it to run before you import, and don't run it too frequently! Search for 'Minutes' in the XML and you'll see where it is set to 15; change it to another number before you import if you want a different interval.

3. Go to the Administration/Settings view in the console. Double-click Incident SLA Management Settings and configure the warning threshold. This is the threshold at which the incidents' SLA status will change to Warning. By default it is zero, meaning there is no warning interval.

Note: this solution starts running immediately after import. If you don't want that, set the Enabled attribute of the Rule element to "false" in the XML before importing, and then enable the rule later in the Administration/Workflows/Configuration view.

Now, what you will see is that any incidents which are still active past their target resolution time will be marked as Incident SLA Status = "Breached" and any incidents which are within X minutes (as defined by the Warning Threshold) of Target Resolution Time will be marked as SLA Status = "Warning". You can see this on the incident form in the Extensions tab.
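Here is a minimal Python sketch of the status rules just described, using hypothetical names (`sla_status`, `statuses_seen`). It also illustrates why the schedule from step 2 matters. The rule only evaluates incidents when it runs, so if the warning threshold is shorter than the run interval, an incident can pass through the whole warning window between two runs and go straight to Breached.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def sla_status(target: datetime, now: datetime,
               warning_threshold: timedelta) -> Optional[str]:
    """Return 'Breached', 'Warning', or None (blank) for an active incident."""
    if now > target:
        return "Breached"
    # A zero threshold (the default) means there is no warning interval.
    if warning_threshold > timedelta(0) and now >= target - warning_threshold:
        return "Warning"
    return None


def statuses_seen(target: datetime, first_run: datetime, interval: timedelta,
                  warning_threshold: timedelta, runs: int) -> List[Optional[str]]:
    """Status computed at each scheduled run of the rule."""
    return [sla_status(target, first_run + k * interval, warning_threshold)
            for k in range(runs)]


target = datetime(2010, 5, 27, 10, 0)
first_run = datetime(2010, 5, 27, 9, 35)
every_15 = timedelta(minutes=15)
# Runs at 9:35, 9:50 and 10:05.
print(statuses_seen(target, first_run, every_15, timedelta(minutes=5), 3))   # [None, None, 'Breached']
print(statuses_seen(target, first_run, every_15, timedelta(minutes=15), 3))  # [None, 'Warning', 'Breached']
```

With a 5-minute threshold and a 15-minute schedule, the incident is never seen in the Warning state. A threshold at least as long as the run interval avoids that.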

To make it easy to see the incidents that are in a Warning or Breached state we have provided a couple of new views in the management pack:

Now you can use this property as part of notification subscriptions or incident event workflows to escalate or do other classification/routing things.

1. First, go to the Library/Templates view and create a new incident template that will route/classify your incidents the way you want. For example, if you want to change the support group to 'Escalation Team' when an incident's SLA Status changes to Breached, set Support Group = 'Escalation Team' in the new incident template.

Provide a name for the workflow like 'Escalate SLA Breaching incidents to the Escalation Team Support Group'.

Select 'When an incident is updated'.

Select the Incident SLA Management MP.

9. Click Next.

10. On the criteria page, set it up so that the workflow is triggered "when the SLA Status changes to Breached", like this:

11. Click Next.

12. On the template screen, select the incident template you created in step #1. Click Next.

13. Optionally choose to notify people related to the incident. Click Next. Note: We have provided a couple of "out of the box" notification templates – one for 'Incident SLA Status – Warning' and one for 'Incident SLA Status – Breached'.

14. Click Create.

15. Click Close.
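The point of the criterion in step 10 is to fire on the transition, not on every update to an incident that happens to be breached already. Here is a minimal sketch of that distinction, with hypothetical function names and `None` standing in for a blank SLA Status. Depending on how you fill in the criteria page, a workflow might match on the new value alone. It's worth testing by editing an incident that is already marked Breached and checking that the escalation template isn't applied again.

```python
from typing import Optional


def changed_to(before: Optional[str], after: Optional[str],
               status: str = "Breached") -> bool:
    """True only for the update in which SLA Status moves to `status`."""
    return before != status and after == status


def matches_new_value(after: Optional[str], status: str = "Breached") -> bool:
    """Looser check that looks at the post-update value only."""
    return after == status


# The scheduled rule marks the incident Breached: both checks fire.
print(changed_to("Warning", "Breached"), matches_new_value("Breached"))   # True True
# An analyst later edits the same, already-breached incident: only the
# looser check fires, which would re-apply the escalation template.
print(changed_to("Breached", "Breached"), matches_new_value("Breached"))  # False True
```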

You can also set up notifications to other people, such as team leads and managers, by creating new notification subscriptions with the same criteria in the Administration/Notifications/Subscriptions view.

Now that you know how to use the solution, let's take a look at how we built it.

How We Built the Solution

Note: This part is intended more for developers!

The solution consists of the following parts:

Incident class extension to add a new enum property for SLA Status

Enum values for 'Breached' and 'Warning'

New class for capturing the Warning Threshold administration setting

Custom form for displaying the Warning Threshold

Custom task to display the Warning Threshold settings form when the user clicks 'Properties' in the Administration/Settings view

Two notification templates: one for Breached and one for Warning

Two views (one for Breached and one for Warning) and a new folder to put them in

A new custom Windows Workflow Foundation activity that queries the database for incidents that are in a warning or breached state and marks them accordingly

A rule that runs the custom Windows Workflow Foundation activity on a schedule

Let's take these one at a time. Most of these concepts have been described in earlier posts, so I'll just link to them here:

Now you need to make your custom Windows Workflow Foundation activity derive from a special base class we provide. This will allow your Windows Workflow Foundation activity to use the special property binding dialog in the Service Manager Authoring Tool that allows you to bind to trigger class properties.

In this particular solution, the activity makes three queries each time it runs.

The first one gets incidents which are currently breaching SLA and which have not already been marked as breaching.

The second one gets incidents which are within the Warning Threshold of breaching SLA and have not already been marked as warning.

The last one gets incidents that have been marked as Warning but are no longer in a warning state, because their target resolution time has since been adjusted (due to a change in the incident's urgency or impact).

The activity then marks incidents matching the first query as SLA Status = Breached, those matching the second query as SLA Status = Warning, and those matching the last query as SLA Status = <blank>.
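The logic of the three queries can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the actual activity, which runs its queries against the Service Manager database through the SDK. The `Incident` type and `process_incidents` function are hypothetical, `None` stands in for a blank SLA Status, and only active incidents are considered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    id: str
    active: bool
    target_resolution_time: datetime
    sla_status: Optional[str] = None  # None models a blank SLA Status


def process_incidents(incidents: List[Incident], now: datetime,
                      warning_threshold: timedelta) -> None:
    active = [i for i in incidents if i.active]
    window_end = now + warning_threshold

    # Query 1: breaching now and not already marked Breached.
    breached = [i for i in active
                if i.target_resolution_time < now and i.sla_status != "Breached"]
    # Query 2: inside the warning window and not already marked Warning.
    # A zero threshold means there is no warning window at all.
    warning = [i for i in active
               if warning_threshold > timedelta(0)
               and now <= i.target_resolution_time <= window_end
               and i.sla_status != "Warning"]
    # Query 3: marked Warning, but the target has since moved past the window.
    cleared = [i for i in active
               if i.sla_status == "Warning" and i.target_resolution_time > window_end]

    # All queries were evaluated before any update, so they see the same state.
    for i in breached:
        i.sla_status = "Breached"
    for i in warning:
        i.sla_status = "Warning"
    for i in cleared:
        i.sla_status = None


now = datetime(2010, 5, 27, 10, 0)
incidents = [
    Incident("IR1", True, now - timedelta(minutes=5)),           # past target
    Incident("IR2", True, now + timedelta(minutes=10)),          # within threshold
    Incident("IR3", True, now + timedelta(hours=2), "Warning"),  # target moved out
    Incident("IR4", False, now - timedelta(hours=1)),            # not active
]
process_incidents(incidents, now, timedelta(minutes=15))
print([(i.id, i.sla_status) for i in incidents])
```

The "not already marked" conditions mean a run leaves incidents alone when their status hasn't changed. Presumably that matters because every update to an incident can trigger the incident event workflows and notification subscriptions configured earlier.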

This solution was really missing from SCSM, so I'm happy that it is finally here.

1. Comment:

If you want to enable or disable the ProcessIncidents workflow, then you can't do it here:

Administration/Workflows/Configuration

Instead, you can do it here:

Administration/Workflows/Status

2. Question:

Is there any way to trigger the ProcessIncidents workflow manually?

The reason I want to do this: in a real-life setup the interval between runs is 15 minutes, but an administrator might want to run the workflow on demand to make sure it has processed all incidents that are about to breach the SLA.

Currently the solution doesn't support business hours. We've already added that as a work item on the CodePlex site, though. Feel free to suggest other improvements on the project site:

I've noticed that "Support for Business Hours" is already listed on the CodePlex site. What does that mean for me as a potential SCSM customer? What should I wait for to get this functionality in a future SCSM purchase: a Service Pack, a management pack download, or an interim update? Is there any timeline?

TargetResolutionTime_52562B0E_55A7_4FB1_2F9C_FDFDE976823E='27/05/2010 01:13:28' -- String was not recognized as a valid DateTime.

Looks like a problem with the date format, though it looks fine to me.

@inteluser - thanks for investigating this and letting me know. Someone else reported this issue to me as well. It has to do with date/time formatting when using a non-EN-US locale, and it's on my list of bug fixes to make.
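The error fits the locale explanation: '27/05/2010' is a day-first date, and parsing it with a month-first (EN-US) pattern fails because there is no month 27. Here is a small Python illustration of the same failure, plus the usual remedy of exchanging dates in a culture-invariant format such as ISO 8601. The solution itself is .NET code, so this shows the idea rather than the actual fix; `try_parse` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional

RAW = "27/05/2010 01:13:28"  # the value from the error message above


def try_parse(value: str, fmt: str) -> Optional[datetime]:
    """Parse with an explicit pattern; return None instead of raising."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


month_first = try_parse(RAW, "%m/%d/%Y %H:%M:%S")  # EN-US style: fails, no month 27
day_first = try_parse(RAW, "%d/%m/%Y %H:%M:%S")    # day-first style: succeeds
print(month_first, day_first)                      # None 2010-05-27 01:13:28

# Exchanging the value in ISO 8601 avoids the ambiguity entirely.
iso = day_first.isoformat()
print(iso)                                         # 2010-05-27T01:13:28
print(datetime.fromisoformat(iso) == day_first)    # True
```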