This module describes the concepts and tasks related to configuring alarm log correlation and monitoring alarm logs and correlated event records. Alarm log correlation extends system logging to include the ability to group and filter messages generated by various applications and system servers and to isolate root messages on the router.

This module describes the new and revised tasks you need to perform to implement logging correlation and monitor alarms on Cisco ASR 9000 Series Aggregation Services Routers.

Note For more information about system logging on Cisco IOS XR software and complete descriptions of the alarm management and logging correlation commands listed in this module, see the "Related Documents" section of this module. To locate documentation for other commands that might appear in the course of performing a configuration task, search online in the Cisco ASR 9000 Series Aggregation Services Router Commands Master List .

Feature History for Implementing and Monitoring Alarms and Alarm Log Correlation on Cisco ASR 9000 Series Router

Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation

You must be in a user group associated with a task group that includes the proper task IDs. The command reference guides include the task IDs required for each command. If you suspect user group assignment is preventing you from using a command, contact your AAA administrator for assistance.

Information About Implementing Alarms and Alarm Log Correlation

To implement alarms and alarm correlation, you should be familiar with the following concepts:

Alarm Logging and Debugging Event Management System

Cisco IOS XR software Alarm Logging and Debugging Event Management System (ALDEMS) is used to monitor and store alarm messages that are forwarded by system servers and applications. In addition, ALDEMS correlates alarm messages forwarded due to a single root cause.

ALDEMS enlarges on the basic logging and monitoring functionality of Cisco IOS XR software, providing the level of alarm and event management necessary for a highly distributed system

Cisco IOS XR software achieves this necessary level of alarm and event management by distributing logging applications across the nodes on the system.

Figure 1 illustrates the relationship between the components that constitute ALDEMS.

Figure 1 ALDEMS Component Communications

Correlator

The correlator receives messages from system logging (syslog) helper processes that are distributed across the nodes on the router and forwards syslog messages to the syslog process. If a logging correlation rule is configured, the correlator captures messages searching for a match with any message specified in the rule. If the correlator finds a match, it starts a timer that corresponds to the timeout interval specified in the rule. The correlator continues searching for a match to messages in the rule until the timer expires. If the root case message was received, then a correlation occurs; otherwise, all captured messages are forwarded to the syslog. When a correlation occurs, the correlated messages are stored in the logging correlation buffer. The correlator tags each set of correlated messages with a correlation ID.

System Logging Process

By default, routers are configured to send system logging messages to a system logging (syslog) process. Syslog messages are gathered by syslog helper processes that are distributed across the nodes on the system. The system logging process controls the distribution of logging messages to the various destinations, such as the system logging buffer, the console, terminal lines, or a syslog server, depending on the network device configuration.

Alarm Logger

The alarm logger is the final destination for system logging messages forwarded on the router. The alarm logger stores alarm messages in the logging events buffer. The logging events buffer is circular; that is, when full, it overwrites the oldest messages in the buffer.

Note Alarms are prioritized in the logging events buffer. When it is necessary to overwrite an alarm record, the logging events buffer overwrites messages in the following order: nonbistate alarms first, then bistate alarms in the CLEAR state, and, finally, bistate alarms in the SET state. For more information about bistate alarms, see the Bistate Alarms section.

When the table becomes full of messages caused by bistate alarms in the SET state, the earliest bistate message (based on the message time stamp, not arrival time) is reclaimed before others. The buffer size for the logging events buffer and the logging correlation buffer, thus, should be adjusted so that memory consumption is within your requirements.

A table-full alarm is generated each time the logging events buffer wraps around. A threshold crossing notification is generated each time the logging events buffer reaches the capacity threshold.

Messages stored in the logging events buffer can be queried by clients to locate records matching specific criteria. The alarm logging mechanism assigns a sequential, unique ID to each alarm message.

Logging Correlation

Logging correlation can be used to isolate the most significant root messages for events affecting system performance. For example, the original message describing a card online insertion and removal (OIR) of a card can be isolated so that only the root-cause message is displayed and all subsequent messages related to the same event are correlated. When correlation rules are configured, a common root event that is generating secondary (non-root-cause) messages can be isolated and sent to the syslog, while secondary messages are suppressed. An operator can retrieve all correlated messages from the logging correlator buffer to view correlation events that have occurred.

Correlation Rules

Correlation rules can be configured to isolate root messages that may generate system alarms. Correlation rules prevent unnecessary stress on ALDEMS caused by the accumulation of unnecessary messages. Each correlation rule hinges on a message identification, consisting of a message category, message group name, and message code. The correlator process scans messages for occurrences of the message.

If the correlator receives a root message, the correlator stores it in the logging correlator buffer and forwards it to the syslog process on the RP. From there, the syslog process forwards the root message to the alarm logger in which it is stored in the logging events buffer. From the syslog process, the root message may also be forwarded to destinations such as the console, remote terminals, remote servers, the fault management system, and the Simple Network Management Protocol (SNMP) agent, depending on the network device configuration. Subsequent messages meeting the same criteria (including another occurrence of the root message) are stored in the logging correlation buffer and are forwarded to the syslog process on the router.

If a message matches multiple correlation rules, all matching rules apply and the message becomes a part of all matching correlation queues in the logging correlator buffer.

The following message fields are used to define a message in a logging correlation rule:

•Message category

•Message group

•Message code

Wildcards can be used for any of the message fields to cover wider set of messages. Configure the appropriate set of messages in a logging correlation rule configuration to achieve correlation with a narrow or wide scope (depending on your objective).

Types of Correlation

There are two types of correlation that are configured in rules to isolate root-cause messages:

Nonstateful Correlation—This correlation is fixed after it has occurred, and non-root-cause alarms that are suppressed are never forwarded to the syslog process. All non-root-cause alarms remain buffered in correlation buffers.

Stateful Correlation—This correlation can change after it has occurred, if the bistate root-cause alarm clears. When the alarm clears, all the correlated non-root-cause alarms are sent to syslog and are removed from the correlation buffer. Stateful correlations are useful to detect non-root-cause conditions that continue to exist even if the suspected root cause no longer exists.

Application of Rules and Rule Sets

If a correlation rule is applied to the entire router, then correlation takes place only for those messages that match the configured cause values for the rule, regardless of the context or location setting of that message.

If a correlation rule is applied to a specific set of contexts or locations, then correlation takes place only for those messages that match the configured cause values for the rule and that match at least one of those contexts or locations.

In the case of a rule-set application, the behavior is the same; however, the apply configuration takes place for all rules that are part of the given rule set.

The show logging correlator rule command is used to display apply settings for a given rule, including those settings that have been configured with the logging correlator apply ruleset command.

Root Message and Correlated Messages

When a correlation rule is configured and applied, the correlator starts searching for a message match as specified in the rule. After a match is found, the correlator starts a timer corresponding to the timeout interval that is also specified in the rule. A message search for a match continues until the timer expires. Correlation occurs after the root-cause message is received.

The first message (with category, group, and code triplet) configured in a correlation rule defines the root-cause message. A root-cause message is always forwarded to the syslog process. See the Correlation Rules section to learn how the root-cause message is forwarded and stored.

Alarm Severity Level and Filtering

Filter settings can be used to display information based on severity level. The alarm filter display indicates the severity level settings used to report alarms, the number of records, and the current and maximum log size.

Alarms can be filtered according to the severity level shown in Table 1

Table 1 Alarm Severity Levels for Event Logging

Severity Level

System Condition

0

Emergencies

1

Alerts

2

Critical

3

Errors

4

Warnings

5

Notifications

6

Informational

Bistate Alarms

Bistate alarms are generated by state changes associated with system hardware, such as a change of interface state from active to inactive, the online insertion and removal (OIR) of a card, or a change in component temperature. Bistate alarm events are reported to the logging events buffer by default; informational and debug messages are not.

Cisco IOS XR software provides the ability to reset and clear alarms. Clients interested in monitoring alarms in the system can register with the alarm logging mechanism to receive asynchronous notifications when a monitored alarm changes state.

Bistate alarm notifications provide the following information:

•The alarm state, which may be in the set state or the clear state.

Capacity Threshold Setting for Alarms

The capacity threshold setting determines when the alarm system begins reporting threshold crossing alarms. The capacity threshold for generating warning alarms is generally set at 80 percent of buffer capacity, but individual configurations may require different settings.

Hierarchical Correlation

Hierarchical correlation takes effect when the following conditions are true:

•When a single alarm is both a root cause for one rule and a non-root cause for another rule.

•When alarms are generated that result in successful correlations associated with both rules.

The following example illustrates two hierarchical correlation rules:

Rule 1

Category

Group

Code

Root Cause 1

Cat 1

Group 1

Code 1

Non-root Cause 2

Cat 2

Group 2

Code 2

Rule 2

Root Cause 2

Cat 2

Group 2

Code 2

Non-root Cause 3

Cat 3

Group 3

Code 3

If three alarms are generated for Cause 1, 2, and 3, with all alarms arriving within their respective correlation timeout periods, then the hierarchical correlation appears like this:

Cause 1 -> Cause 2 -> Cause 3

The correlation buffers show two separate correlations: one for Cause 1 and Cause 2 and the second for Cause 2 and Cause 3. However, the hierarchical relationship is implicitly defined.

Note Stateful behavior, such as reparenting and reissuing of alarms, is supported for rules that are defined as stateful; that is, correlations that can change.

Context Correlation Flag

The context correlation flag allows correlations to take place on a "per context" basis or not.

This flag causes behavior change only if the rule is applied to one or more contexts. It does not go into effect if the rule is applied to the entire router or location nodes.

The following is a scenario of context correlation behavior:

•Rule 1 has a root cause A and an associated non-root cause.

•Context correlation flag is not set on Rule 1.

•Rule 1 is applied to contexts 1 and 2.

If the context correlation flag is not set on Rule 1, a scenario in which alarm A generated from context 1 and alarm B generated from context 2 results in the rule applying to both contexts regardless of the type of context.

If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated as they are from different contexts.

With the flag set, the correlator analyzes alarms against the rule only if alarms arrive from the same context. In other words, if alarm A is generated from context 1 and alarm B is generated from context 2, then a correlation does not occur.

Duration Timeout Flags

The root-cause timeout (if specified) is the alternative rule timeout to use in the situation in which a non-root-cause alarm arrives before a root-cause alarm in the given rule. It is typically used to give a shorter timeout in a situation under the assumption that it is less likely that the root-cause alarm arrives, and, therefore, releases the hold on the non-root-cause alarms sooner.

Reparent Flag

The reparent flag specifies what happens to non-root-cause alarms in a hierarchical correlation when their immediate root cause clears.

The following example illustrates context correlation behavior:

•Rule 1 has a root cause A and an associated non-root cause B

•Context correlation flag is not set on Rule 1

•Rule 1 is applied to contexts 1 and 2

In this scenario, if alarm A arrives generated from context 1 and alarm B generated from context 2, then a correlation occurs—regardless of context.

If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated, because they are from different contexts.

Reissue Nonbistate Flag

The reissue nonbistate flag controls whether nonbistate alarms (events) are forwarded from the correlator log if their parent bistate root-cause alarm clears. Active bistate non-root-causes are always forwarded in this situation, because the condition is still present.

The reissue-nonbistate flag allows you to control whether non-bistate alarms are forwarded.

Internal Rules

Internal rules are defined on Cisco IOS XR software and are used by protocols and processes within Cisco IOS XR software. These rules are not customer configurable, but you may view them by using the show logging correlator rule command. All internal rule names are prefixed with [INTERNAL].

Configuring Logging Correlation Rules

This task explains how to configure logging correlation rules.

The purpose of configuring logging correlation rules is to define the root cause and non-root-cause alarm messages (with message category, group, and code combinations) for logging correlation. The originating root-cause alarm message is forwarded to the syslog process, and all subsequent (non-root-cause) alarm messages are sent to the logging correlation buffer.

The fields inside a message that can be used for configuring correlation rules are as follows:

Message category (for example, PKT_INFRA, MGBL, OS)

Message group (for example, LINK, LINEPROTO, or OIR)

Message code (for example, UPDOWN or GO_ACTIVE).

The logging correlator mechanism, running on the active route processor, begins queueing messages matching the ones specified in the correlation rules for the time specified in the timeout interval of the correlation rule.

The timeout interval begins when the correlator captures any alarm message specified for a given rule.

Configuring Hierarchical Correlation Rule Flags

Hierarchical correlation is when a single alarm is both a root cause for one correlation rule and a non-root cause for another rule, and when alarms are generated resulting in a successful correlation associated with both rules. What happens to a non-root-cause alarm hinges on the behavior of its correlated root-cause alarm.

There are cases in which you want to control the stateful behavior associated with these hierarchies and to implement flags, such as reparenting and reissuing of nonbistate alarms. This task explains how to implement these flags.

What To Do Next

To activate a defined correlation rule and rule set, you must apply them by using the logging correlator apply ruleand logging correlator apply rulesetcommands.

Applying Logging Correlation Rules

This task explains how to apply logging correlation rules.

Applying a correlation rule activates it and gives a scope. A single correlation rule can be applied to multiple scopes on the router; that is, a rule can be applied to the entire router, to several locations, or to several contexts.

Note When a rule is applied or if a rule set that contains this rule is applied, then the rule definition cannot be modified through the configuration until the rule or rule set is once again unapplied.

Note It is possible to configure apply settings at the same time for both a rule and rule sets that contain the rule. In this case, the apply settings for the rule are the union of all these apply configurations.

Applying Logging Correlation Rule Sets

Applying a correlation rule set activates it and gives a scope. When applied, a single rule-set configuration immediately effects the rules that are part of that given rule set.

Note Rule definitions that were previously applied (singly or as part of another rule set) cannot be modified until that rule or rule set is unapplied. Use the no form of the command to negate usage and then try to reapply rule set.

Modifying Logging Events Buffer Settings

Logging events buffer settings can be adjusted to respond to changes in user activity, network events, or system configuration events that affect network performance, or in network monitoring requirements. The appropriate settings depend on the configuration and requirements of the system.

This task involves the following steps:

•Modifying logging events buffer size

•Setting threshold for generating alarms

•Setting the alarm filter (severity)

Caution Modifications to alarm settings that lower the severity level for reporting alarms and threshold for generating capacity-warning alarms may slow system performance.
Caution Modifying the logging events buffer size clears the buffer of all event records except for the bistate alarms in the set state.

SUMMARY STEPS

1. show logging events info

2. configure

3. logging events buffer-size bytes

4. logging events threshold percent

5. logging events level severity

6. endorcommit

7. show logging events info

DETAILED STEPS

Command or Action

Purpose

Step 1

show logging events info

Example:

RP/0/RSP0/CPU0:router# show logging events info

(Optional) Displays the size of the logging events buffer (in bytes), the percentage of the buffer that is occupied by alarm-event records, capacity threshold for reporting alarms, total number of records in the buffer, and severity filter, if any.

Step 2

configure

Example:

RP/0/RSP0/CPU0:router# configure

Enters global configuration mode.

Step 3

logging events buffer-sizebytes

Example:

RP/0/RSP0/CPU0:router(config)# logging events buffer-size 50000

Specifies the size of the alarm record buffer.

•In this example, the buffer size is set to 50000 bytes.

Step 4

logging events thresholdpercent

Example:

RP/0/RSP0/CPU0:router(config)# logging events threshold 85

Specifies the percentage of the logging events buffer that must be filled before the alarm logger generates a threshold-crossing alarm.

–Entering no exits the configuration session and returns the router to EXEC mode without committing the configuration changes.

–Entering cancel leaves the router in the current configuration session without exiting or committing the configuration changes.

•Use the commit command to save the configuration changes to the running configuration file and remain within the configuration session.

Step 7

show logging events info

Example:

RP/0/RSP0/CPU0:router# show logging events info

(Optional) Displays the size of the logging events buffer (in bytes), percentage of the buffer that is occupied by alarm-event records, capacity threshold for reporting alarms, total number of records in the buffer, and severity filter, if any.

•This command is used to verify that all settings have been modified and that the changes have been accepted by the system.

Modifying Logging Correlator Buffer Settings

This task explains how to modify the logging correlator buffer settings.

The size of the logging correlator buffer can be adjusted to accommodate the anticipated volume of incoming correlated messages. Records can be removed from the buffer by correlation ID, or the buffer can be cleared of all records.

•Use this step to verify that records for particular correlation IDs have been removed from the correlated event log.

Displaying Alarms by Severity and Severity Range

This task explains how to display alarms by severity and severity range.

Alarms can be displayed according to severity level or a range of severity levels. Severity levels and their respective system conditions are listed in Table 1 under the "Alarm Severity Level and Filtering" section.

(Optional) Displays logging events occurring after the specified time stamp and within a severity range. The month, day, and year arguments default to the current month, date, and year, if not specified.

•In this example, alarms with a severity of warnings (severity of 4), errors (severity of 3), and critical (severity of 2) that occur after 22:00:00 on May 7, 2004 are displayed. All other messages occurring before the time stamp are omitted.

Displaying Alarms According to a Time Stamp Range

Alarms can be displayed according to a time stamp range. Specifying a specific beginning and endpoint can be useful in isolating alarms occurring during a particular known system event.

This task explains how to display alarms according to a time stamp range.

Clearing Alarm Event Records and Resetting Bistate Alarms

This task explains how to clear alarm event records and bistate alarms.

Unnecessary and obsolete messages can be cleared to reduce the size of the event logging buffer and make it more searchable, and thus more navigable.

The filtering capabilities available for clearing events in the logging events buffer (with the clearloggingeventsdelete command) are also available for displaying events in the logging events buffer (with the showloggingeventsbuffer command).

The following configuration example shows how to set the capacity threshold to 90 percent, to reduce the size of the logging events buffer to 10,000 bytes from the default, and to increase the severity level to errors:

!

logging events threshold 90

logging events buffer-size 10000

logging events level errors

!

Increasing the severity level to errors reduces the number of alarms that are displayed in the logging events buffer, because only alarms with a severity of errors or higher are displayed. Increasing the threshold capacity to 90 percent reduces the time interval between the threshold crossing and wraparound events; the logging events buffer thus does not generate a threshold-crossing alarm until it reaches 90 percent capacity. Reducing the size of the logging events buffer to 10,000 bytes decreases the number of alarms that are displayed in the logging events buffer and reduces the memory requirements for the component.

PLATFORM-ALPHA_DISPLAY-6-CHANGE : Alpha display on node 0/1/CPU0 changed to IOX RUN in state default

These messages are similar. To see only one message appear in the logs, one of the messages is designated as the root cause message (the one that appears in the logs), and the other messages are considered non-root-cause messages.

The root-cause message is typically the one that arrives earliest, but that is not a requirement.

logging correlator rule node_status type nonstateful

timeout 4000

rootcause PLATFORM INVMGR NODE_STATE_CHANGE

nonrootcause

alarm PLATFORM SYSLDR LC_ENABLED

alarm PLATFORM ALPHA_DISPLAY CHANGE

!

!

In this example, the correlation rule named node_status is configured to correlate the PLATFORM INVMGR NODE_STATE_CHANGE alarm (the root-cause message) with the PLATFORM SYSLDR LC_ENABLED and PLATFORM ALPHA_DISPLAY CHANGE alarms. The updown correlation rule is applied to the entire router.

PLATFORM-ALPHA_DISPLAY-6-CHANGE : Alpha display on node 0/1/CPU0 changed to IOX RUN in state default

the correlator forwards the PLATFORM-INVMGR-6-NODE_STATE_CHANGE message to the syslog process, while the remaining two messages are held in the logging correlator buffer.

In this example, the show sample output from the show logging events buffer all-in-buffer command displays the alarms stored in the logging events buffer after the 4-second time period expires for the node_status correlation rule:

The show logging correlator buffer correlation ID command generates the following output after the one minute interval expires. The output displays the alarms assigned correlation ID 12 in the logging correlator buffer.

The following example shows how to configure a correlation rule for the LINK UPDOWN and SONET ALARM messages:

!

logging correlator rule updown type stateful

timeout 10000

rootcause PKT_INFRA LINK UPDOWN

nonrootcause

alarm L2 SONET ALARM

!

!

logging correlator apply rule updown

all-of-router

!

In this example, suppose that two routers are connected. When the correlator receives a root-cause message, the correlator sends it directly to the syslog process. Subsequent PKT_INFRA-LINK- UPDOWN or L2-SONET-ALARM messages matching the rule are considered leaf messages and are stored in the logging correlator buffer. If, for any reason, a leaf message (such as the L2-SONET-ALARM alarm in this example) is received first, the correlator does not send it to the logging events buffer immediately; the correlator, instead, waits until the timeout interval expires. After the timeout, if the root message is never received, all messages in the logging correlator buffer received during the timeout interval are forwarded to the syslog process.

In this example, the correlation rule named updown is configured to correlate the PKT_INFRA-LINK-UPDOWN alarm (the root message) and L2-SONET-ALARM alarms (leaf messages associated with PKT_INFRA-LINK-UPDOWN alarms).

logging correlator rule updown type stateful

timeout 10000

rootcause PKT_INFRA LINK UPDOWN

nonrootcause

alarm L2 SONET ALARM

In this example, the updown correlation rule is applied to the entire router:

logging correlator apply rule updown

all-of-router

The following example shows sample output from the show logging events buffer all-in-buffer command. The output displays the alarms stored in the logging events buffer after the one minute time period expires for the updown correlation rule configured:

Note Only the first LINK UPDOWN root message is forwarded to the syslog process during the timeout interval.

The following example shows output from the show logging correlator buffercorrelationID command generated after the one-minute interval expires. The output displays the alarms assigned correlation ID 46 in the logging correlator buffer. In the example, the PKT_INFRA-LINK-UPDOWN root-cause message and L2-SONET-ALARM leaf messages generated during the timeout interval assigned correlation ID 46 are displayed:

Note The subsequent PKT_INFRA-LINK-UPDOWN and L2-SONET-ALARM leaf messages generated during the timeout interval remain in the logging correlator buffer because they are leaf messages.

The following example shows output from the show logging correlator buffer correlationID command. The output displays the alarms assigned to correlation IDs 46 and 47, the correlation IDs associated with the PKT_INFRA-LINK-UPDOWN and L2-SONET-ALARM root-cause messages:

RFCs

RFCs

Title

No new or modified RFCs are supported by this feature, and support for existing RFCs has not been modified by this feature.

—

Technical Assistance

Description

Link

The Cisco Technical Support website contains thousands of pages of searchable technical content, including links to products, technologies, solutions, technical tips, and tools. Registered Cisco.com users can log in from this page to access even more content.