ITIL A guide to Event Management

An event can be defined as any detectable or discernable occurrence that has significance for the management of the IT infrastructure or the delivery of IT service, and for evaluation of the impact a deviation might cause to the services. Events are typically notifications created by an IT service, configuration item or monitoring tool.

Effective service operation is dependent on knowing the status of the infrastructure and detecting any deviation from normal or expected operation. This is provided by good monitoring and control systems, which are based on two types of tools:
- Active monitoring tools that poll key configuration items to determine their status and availability. Any exceptions will generate an alert that needs to be communicated to the appropriate tool or team for action.
- Passive monitoring tools that detect and correlate operational alerts or communications generated by configuration items.

The objectives of event management

Event management provides the entry point for the execution of many service operation processes and activities. In addition, it provides a way of comparing actual performance and behaviour against design standards and Service Level Agreements. Other objectives include:
- Provides the ability to detect, interpret and initiate appropriate action for events
- Forms the basis for operational monitoring and control and the entry point for many service operation activities
- Provides operational information, as well as warnings and exceptions, to aid automation
- Supports the continual service improvement activities of service assurance, reporting and service improvement

The scope of event management

Event management can be applied to any aspect of service management that needs to be controlled and which can be automated. These include:
- Configuration Items (CIs):
  o Some CIs will be included because they need to stay in a constant state
  o Some CIs will be included because their status needs to change frequently and event management can be used to automate this and update the CMS
- Environmental conditions (e.g. fire and smoke detection)
- Software license monitoring for usage to ensure optimum/legal license utilization and allocation
- Security (e.g. intrusion detection)
- Normal activity (e.g. tracking the use of an application or the performance of a server)

The difference between monitoring and event management

These two areas are very closely related, but slightly different in nature. Event management is focused on generating and detecting meaningful notifications about the status of the IT infrastructure and services. Whilst monitoring is required to detect and track these notifications, monitoring is broader than event management. For example, monitoring tools will check the status of a device to ensure that it is operating within acceptable limits, even if that device is not generating events.

Examples of events

Events that signify regular operation:
- Notification that a scheduled workload has completed
- A user has logged in to use an application

- An email has reached its intended recipient

Events that signify an exception:
- A user attempts to log on to an application with the incorrect password
- An unusual situation has occurred in a business process that may indicate an exception requiring further business investigation (e.g. a web page alert indicates that a payment authorisation site is unavailable, impacting financial approval of business transactions)
- A device's CPU is above the acceptable utilization rate
- A PC scan reveals the installation of unauthorized software

Events that signify unusual, but not exceptional, operation:
- A server's memory utilization reaches within 5% of its highest acceptable performance level
- The completion time of a transaction is 10% longer than normal

The value to the organization of event management

Event management's value to the organization is generally indirect; however, it is possible to determine the basis for its value as follows:
- Event management provides mechanisms for early detection of incidents. In many cases, it is possible for the incident to be detected and assigned to the appropriate group for action before any actual service outage occurs.
- Event management makes it possible for some types of automated activity to be monitored by exception, thus removing the need for expensive and resource-intensive real-time monitoring, while reducing downtime.
- When integrated into other service management processes (such as availability or capacity management), event management can signal status changes or exceptions that allow the appropriate person or team to perform early response, thus improving the performance of the process. This, in turn, will allow the business to benefit from more effective and more efficient service management overall.
- Event management provides a basis for automated operations, thus increasing efficiencies and allowing expensive human resources to be used for more innovative work, such as designing new or improved functionality or defining new ways in which the business can exploit technology for increased competitive advantage.
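
Monitoring by exception, as mentioned in the value statements above, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the CI names, the probe functions and the 85% limit are hypothetical and do not come from this guide. The poller checks every CI but produces an event only when a reading is outside its acceptable limit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    ci: str        # configuration item name (hypothetical)
    metric: str    # what was measured
    value: float   # the reading that broke the limit
    limit: float   # the acceptable limit for this CI


def poll_by_exception(
    probes: Dict[str, Callable[[], float]],
    limits: Dict[str, float],
    metric: str = "cpu_percent",
) -> List[Event]:
    """Poll each CI once and return events only for readings above the limit."""
    events = []
    for ci, probe in probes.items():
        value = probe()
        if value > limits[ci]:
            events.append(Event(ci, metric, value, limits[ci]))
    return events


# Stand-in probes: a real tool would query the device instead.
probes = {"web-01": lambda: 42.0, "db-01": lambda: 97.5}
limits = {"web-01": 85.0, "db-01": 85.0}  # assumed acceptable CPU limit
exceptions = poll_by_exception(probes, limits)
```

In this sketch only db-01 yields an event; web-01 is still polled but stays silent, which is the point of monitoring by exception.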

The activities of event management

The service design phase of the service lifecycle should define which events need to be generated and then specify how this can be done for each type of CI. During the service transition phase, the event generation options would be set and tested.

Event occurs

Events occur continuously, but not all of them are detected or registered. It is, therefore, important that everybody involved in designing, developing, managing and supporting IT services and the IT infrastructure that they run on understands what types of event need to be detected.

Event notification

Most CIs are designed to communicate certain information about themselves in one of two ways:
- A device is interrogated by a management tool, which collects certain targeted data. This is often referred to as polling.
- The CI generates a notification when certain conditions are met. The ability to produce these notifications has to be designed and built into the CI, for example, a programming hook inserted into an application.

Event detection

Once an event notification has been generated, it will be detected by an agent running on the same system, or transmitted directly to a management tool specifically designed to read and interpret the meaning of the event.

Event filtering

The purpose of filtering is to decide whether to communicate the event to a management tool or to ignore it. If ignored, the event will usually be recorded in a log file on the device, but no further action will be taken.

Significance of events

Every organization will have its own categorization of the significance of an event, but it is suggested that at least these three broad categories be represented:

- Informational: This refers to an event that does not require any action and does not represent an exception. Informational events are typically stored in the system or service log files and kept for a predetermined period. Examples of informational events include:
  o A device has come online
  o A transaction is completed successfully
- Warning: A warning is an event that is generated when a service or device is approaching a threshold. Warnings are intended to notify the appropriate person, process or tool so that the situation can be checked and appropriate action taken to avoid an exception. Examples of warning events are:
  o Memory utilization on a server is currently at 65% and increasing. If it reaches 75%, response times will be unacceptably long and the Operational Level Agreement for that department will be breached.
  o The collision rate on a network has increased by 15% in a short period of time (which is defined, i.e. an hour).
- Exception: An exception means that a service or device is currently operating abnormally. Typically this means that an Operational Level Agreement or Service Level Agreement has been breached and the business has been impacted. Exceptions could represent a total failure, impaired functionality or degraded performance. Examples of exception events include:
  o A server is down
  o Response time of a standard transaction across the network has slowed to more than 15 seconds

Event correlation

If an event is significant, a decision has to be made about exactly what the significance is and what actions need to be taken to deal with it. It is here that the meaning of the event is determined.
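
The three significance categories can be sketched in Python using the memory utilization example above. The 75% exception level is taken from that example; the 60% warning level is an assumption made for illustration, since the guide does not state where the warning threshold sits. The names `Significance` and `classify_memory` are hypothetical.

```python
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify_memory(
    utilization: float,
    warning_at: float = 60.0,    # assumed warning threshold, not from the guide
    exception_at: float = 75.0,  # level at which the OLA is breached in the example
) -> Significance:
    """Map a memory utilization percentage to a significance category."""
    if utilization >= exception_at:
        return Significance.EXCEPTION
    if utilization >= warning_at:
        return Significance.WARNING
    return Significance.INFORMATIONAL


current = classify_memory(65.0)   # the reading used in the warning example
breached = classify_memory(75.0)  # the level that breaches the OLA
```

Each organization would substitute its own categories and thresholds; the sketch only shows that the categorization is a simple ordered comparison once those values have been agreed.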

Trigger

If the correlation activity recognises an event, a response will be required. The mechanism used to initiate that response is also called a trigger. There are many different types of triggers, each designed specifically for the task it has to initiate. Some examples could include:
- Incident triggers that generate a record in the incident management system
- Change triggers that generate a request for change
- A trigger resulting from an approved request for change that has been implemented but caused the event, or from an unauthorized change that has been detected
- Scripts that execute specific actions
- Paging systems that will notify a person or team of an event
- Database triggers that restrict access of a user to specific records or fields, or that create or delete entries in the database

Response selection

At this point of the process, there are a number of response options available:
- Event logged: There will be a record of the event and any subsequent actions.
- Auto response: Some events are understood well enough that the appropriate response has already been defined and automated. This is normally a result of good design or previous experience (within problem management). The trigger will initiate the action and then evaluate whether it was completed successfully. If not, an incident or problem record will be created. Examples of auto responses include rebooting a device, restarting a service, and locking a device or application to protect it against unauthorized access.
- Alert and human intervention: If the event requires human intervention, it will need to be escalated. The purpose of the alert is to ensure that the person with the skills appropriate to deal with the event is notified. The alert will contain all the information necessary for the person to determine the appropriate action.

Incident, problem or change?

Some events will represent a situation where the appropriate response will need to be handled through the incident, problem or change management process:
- Open an RFC.
- Open an incident record: As with an RFC, an incident can be created as soon as an exception is detected, or when the correlation engine determines that a specific type or combination of events represents an incident.
- Open or link to a problem record: It is rare for a problem record to be opened without related incidents. In most cases this step refers to linking an incident to an existing problem record. This will assist the problem management teams to reassess the severity and impact of the problem, and may result in a changed priority for an outstanding problem.
- Special types of incident: In some cases an event will indicate an exception that does not directly impact any IT service, e.g. unauthorized entry to a data centre. In this case, the incident will be logged using an incident model that is appropriate for this type of exception, e.g. a security incident. The incident should be escalated to the group that manages that type of incident. As there is no outage, the incident model used should reflect that this was an operational issue rather than a service issue. These incidents should not be used to calculate downtime, and can, in fact, be used to demonstrate how proactive IT has been in making services available.
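
The response options above can be sketched as a small dispatcher in Python. This is one possible reading, not a prescribed design: the class name, the event fields and the routing policy (exceptions with no automated response become incidents, warnings become alerts) are assumptions made for illustration. What it does take from the text is that every event is logged and that a failed auto response results in an incident record.

```python
from typing import Callable, Dict, List

# An auto response returns True when its action completed successfully.
AutoResponse = Callable[[dict], bool]


class ResponseSelector:
    def __init__(self, auto_responses: Dict[str, AutoResponse]):
        self.auto_responses = auto_responses
        self.log: List[dict] = []
        self.alerts: List[dict] = []
        self.incidents: List[dict] = []

    def handle(self, event: dict) -> str:
        self.log.append(event)  # event logged: every event leaves a record
        if event["significance"] == "informational":
            return "logged"
        action = self.auto_responses.get(event["type"])
        if action is not None:
            if action(event):  # trigger the action, then evaluate the result
                return "auto_response"
            self.incidents.append(event)  # automated action failed
            return "incident"
        if event["significance"] == "exception":
            self.incidents.append(event)  # assumed policy: exception opens an incident
            return "incident"
        self.alerts.append(event)  # assumed policy: warning is escalated to a person
        return "alert"


selector = ResponseSelector({
    "service_stopped": lambda e: True,  # stand-in for a restart that succeeds
    "disk_full": lambda e: False,       # stand-in for a cleanup that fails
})
results = [
    selector.handle({"type": "login", "significance": "informational"}),
    selector.handle({"type": "service_stopped", "significance": "exception"}),
    selector.handle({"type": "disk_full", "significance": "exception"}),
    selector.handle({"type": "memory_high", "significance": "warning"}),
    selector.handle({"type": "server_down", "significance": "exception"}),
]
```

Problem records and RFCs are left out of the sketch; in practice they would be further branches chosen by the correlation rules.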

Review actions

As thousands of events are generated on a daily basis, it is not possible to review every one. However, it is important to check that any significant events or exceptions have been handled appropriately, and to track trends or counts of event types. In many cases, this can be done automatically.

Close event

Some events will remain open until a certain action takes place, for example an event that is linked to an open incident. However, most events are not opened or closed. Informational events are simply logged and then used as input to other processes, such as backup and storage management. Auto response events will typically be closed by the generation of a second event. For example, a device generates an event and is rebooted through auto response; as soon as that device is successfully back online, it generates an event that effectively closes the loop and clears the first event.

The terminology of event management

- Event: A change of state that has significance for the management of a configuration item or IT service.
- Trigger: An indication that some action or response to an event may be needed.
- Alert: A warning that a threshold has been reached or something has been changed (an event has occurred).
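
The "Close event" step, in which a second event clears the first, might look like the following Python sketch. The `OpenEvents` class, the `ci` and `state` fields and the "down"/"online" values are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional


class OpenEvents:
    """Tracks events that stay open until a clearing event arrives."""

    def __init__(self) -> None:
        self._open: Dict[str, dict] = {}

    def record(self, event: dict) -> Optional[dict]:
        """Store a 'down' event; an 'online' event returns the event it clears."""
        ci = event["ci"]
        if event["state"] == "down":
            self._open[ci] = event
            return None
        if event["state"] == "online":
            return self._open.pop(ci, None)  # None if nothing was open
        return None

    def is_open(self, ci: str) -> bool:
        return ci in self._open


tracker = OpenEvents()
tracker.record({"ci": "router-7", "state": "down"})    # first event opens
was_open = tracker.is_open("router-7")
cleared = tracker.record({"ci": "router-7", "state": "online"})  # second event closes the loop
```

An "online" event for a device with no open event simply returns nothing, which matches the point that most events are never opened or closed.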
