Effective alarm management requires planning, maintenance

ISA 18.2 specification offers a roadmap to successful implementation

Every year existing HMI/SCADA systems are upgraded. Engineers update control systems seeking new technology or better support. HMI screens are ported or redrawn, storage methods are configured, and security roles are created. With new technology HMI screens become more effective through consolidation of information and enhanced visualization. Storage becomes smarter by utilizing advanced algorithms to store data faster and make it more accessible for data mining.

More devices, more data, and more alarms. Often overlooked or ignored are the questions what and why. Why is this data important? What is the purpose of this HMI display? Why is this alarm triggered?

In 2009 the International Society of Automation sought to help guide engineers and operators in the management of alarm systems. Written to aid in alarm management, the ISA 18.2 specification defines a simple, iterative process to maintain operability and efficacy of alarms in automation systems.

Alarms

The ISA 18.2 specification defines an alarm as “an audible and/or visual means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a response.” Because an alarm by definition demands a response, the people designing the application must know the automation system well enough to decide which process values or device states qualify as valid alarms.

Whether an application is old or new, alarms are a depiction of the system itself. Too many alarms and the image becomes cluttered; too few and the image is blurry. The guidelines in the ISA 18.2 specification help eliminate poorly defined alarms from the initial design onward, keeping the alarm system manageable.

Alarm management lifecycle

In order to qualify all potential alarms with a consistent procedure, the ISA 18.2 specification outlines an Alarm Management Lifecycle. The lifecycle is an iterative process diagram that begins with an alarm philosophy, guides through the rationalization and implementation of new alarms, defines a maintenance cycle, and recommends periodic auditing of the alarm system.

The process identified by the ISA 18.2 specification begins with a discussion of the organization's own alarm philosophy, followed by a design, implementation, and change cycle. That cycle is fed by feedback from operation and maintenance and is checked periodically by an auditing process.

Philosophy

Alarm philosophy is the first step in the alarm management lifecycle defined in the ISA 18.2 specification and fundamental to the process. Without a good plan defining roles for implementation, operation, maintenance, and periodic auditing, it is easy for any organization to let an alarm system degrade. The philosophy should document objectives and be revisited during auditing to ensure that it evolves with the alarm system.

Design, implementation, and change cycle

For existing systems it is important that the process begins with the operational feedback stages of the alarm management cycle as information ascertained from that process is crucial to eliminating alarms that are not practical or effective. Designing new alarms should include operations and maintenance representatives to ensure viability. Following the documented philosophy, a new alarm must be identified as necessary before it is implemented and organized into the system.

Operational feedback

The operational feedback loop observed as part of the alarm management lifecycle in the ISA 18.2 specification is the basis of the process for continuously improving an alarm system. Once an alarm system is in place and operational, the alarm management lifecycle shrinks in scope to the steps of operation, maintenance, monitoring and assessment, and auditing.

Emphasis should be placed on gaining insight from the live system by running periodic reports on the key performance indicators of the alarm system. These indicators are essential for identifying nuisance alarms, such as stale or chattering alarms, and for gathering root cause analysis evidence. Presented as Pareto charts, they also give visibility into how the alarm management system is operating.
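As a rough illustration of the Pareto view mentioned above, the sketch below ranks alarms by activation count and stops once the listed alarms account for a chosen share of all activations. The function name, the 80% cutoff, and the alarm tags are assumptions made for this example; they do not come from ISA 18.2 or from any particular product.

```python
from collections import Counter

def pareto_top_offenders(alarm_tags, coverage=0.8):
    """Return the most frequent alarms that together reach `coverage` of all activations.

    `alarm_tags` holds one entry per alarm activation (hypothetical journal format).
    Each result row is (tag, count, cumulative percent of all activations).
    """
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    offenders, cumulative = [], 0
    for tag, count in counts.most_common():
        if cumulative / total >= coverage:
            break
        cumulative += count
        offenders.append((tag, count, round(100 * cumulative / total, 1)))
    return offenders

# Hypothetical journal: ten activations spread over four alarms.
journal = ["TANK1_HI"] * 6 + ["PUMP2_FAIL"] * 2 + ["VALVE3_STUCK", "TEMP4_LO"]
for tag, count, cumulative_pct in pareto_top_offenders(journal):
    print(f"{tag:<14}{count:>4}{cumulative_pct:>8}%")
```

In this made-up journal two alarms produce 80% of the activations, which is the kind of short list an audit would review first.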

Combining ISA 18.2 with the best practices in Engineering Equipment and Materials Users’ Association (EEMUA) Publication 191 can help applications reduce unnecessary stoppages and lead to a more effective alarm system.

Load balancing

For applications that have multiple operators per shift, balancing the load of alarms through alarm filtering can improve productivity and reduce the alarm load per operator station. The ISA 18.2 standard recommends the alarm performance metrics listed in Table 1, based on at least 30 days of data.

If all unnecessary alarms are eliminated, then the issue becomes routing the alarms proportionally to the appropriate level of staffing. Many alarming systems allow for alarm areas to be configured for any desired organization. Filtering on those alarm areas can allow for simple alarm load balancing.
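One way to think about routing alarm areas to operator stations is as a simple balancing problem. The sketch below is an assumption-laden example rather than a feature of any specific alarming product: it takes a hypothetical daily alarm count per area and assigns areas to stations greedily, heaviest area first, always to the station with the lightest load so far.

```python
import heapq

def balance_alarm_areas(area_loads, stations):
    """Greedily assign alarm areas to operator stations, heaviest area first.

    `area_loads` maps an area name to its alarm count (hypothetical data).
    Returns (areas per station, total alarm load per station).
    """
    # Heap of (current load, station name); the lightest station is popped first.
    heap = [(0, name) for name in stations]
    heapq.heapify(heap)
    assignment = {name: [] for name in stations}
    for area, load in sorted(area_loads.items(), key=lambda kv: kv[1], reverse=True):
        current, name = heapq.heappop(heap)
        assignment[name].append(area)
        heapq.heappush(heap, (current + load, name))
    totals = {name: sum(area_loads[a] for a in areas) for name, areas in assignment.items()}
    return assignment, totals

# Hypothetical alarms per day for four alarm areas, split across two stations.
daily_alarms_by_area = {"Boiler": 120, "Packaging": 90, "Utilities": 60, "Tank Farm": 30}
assignment, totals = balance_alarm_areas(daily_alarms_by_area, ["Station A", "Station B"])
print(assignment)
print(totals)
```

The resulting area lists could then be used as the filter configuration for each station. A greedy split is not guaranteed to be optimal, and in practice operator expertise and plant layout constrain which areas can be grouped.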

Reporting

Periodic reporting is an important tool, not only to monitor current operations but also to gain insight over time for use in larger-scale system audits. Reports can be scheduled to run continuously and customized to summarize any metric the data allows. In general, reports can be categorized as relating to individual alarms, concerning operations, or examining the system as a whole.

For successful monitoring and assessment of an alarm system, it is important to generate reports from all categories to include a variety of perspectives. Every application looking to integrate the ISA 18.2 specification and EEMUA 191 guidance into an alarm management system should consider using standard reports for the alarm metrics described below.

Individual alarm reports

Reports designed to visualize alarm distribution, alarm chattering, and alarm frequency can be plotted against any time period and often offer a way to dynamically change the time period if necessary. The example report below parameterizes the start time, end time, and interval when the report is run to allow users to execute the report desired.

This report is a good example of alarm distribution based on a single week of operations, organized by priority. Along with the detailed information, it shows a general increase in alarms over time. That trend is important, but the report alone cannot say what should change; it indicates where further investigation is needed.

Other individual reports, such as alarm chattering and alarm frequency, can allow insights into specific alarms that should be considered for modifications. Alarm chattering reports can expose specific alarms that tend to enter and exit alarm condition multiple times within a predefined time period. The configuration of these alarms may be too sensitive, require a deadband, or be redundant.

The first occurrence of the alarm may indeed require a response; subsequent occurrences are an operator distraction. To identify these chattering alarms, the simple report below reveals the average number of times the alarm triggered within the interval period of 60 seconds and the number of times this happened within the reporting period. These alarms can be changed or marked for review at the next scheduled audit.
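The chattering report described above can be approximated in a few lines. This is a minimal sketch under stated assumptions: activations arrive as (tag, timestamp in seconds) pairs, the reporting period is divided into fixed 60-second windows, and three or more activations in one window counts as chattering. The threshold of three and all names are choices made for this example.

```python
from collections import Counter, defaultdict

def chattering_report(activations, interval_s=60, min_count=3):
    """Summarize alarms that activate at least `min_count` times within one interval.

    `activations` is an iterable of (tag, timestamp_in_seconds) pairs.
    Returns {tag: (number of chattering intervals, average activations per interval)}.
    """
    # Count activations per alarm per fixed window of `interval_s` seconds.
    per_window = Counter((tag, int(ts // interval_s)) for tag, ts in activations)
    episodes = defaultdict(list)
    for (tag, _window), count in per_window.items():
        if count >= min_count:
            episodes[tag].append(count)
    return {tag: (len(c), sum(c) / len(c)) for tag, c in episodes.items()}

# Hypothetical activations: FLOW1_LO chatters twice, TEMP2_HI does not.
events = [("FLOW1_LO", t) for t in (0, 10, 20, 30, 130, 140, 150)]
events += [("TEMP2_HI", 5), ("TEMP2_HI", 400)]
print(chattering_report(events))
```

Fixed windows keep the example short, but they can miss a burst that straddles a window boundary; a sliding window would catch those at the cost of slightly more code.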

System-wide alarm reports

For alarm systems it is essential that the operations also be viewed as one entity. Calculating the average alarm rate per time period is the most basic of these reports but will allow the health of the system to be monitored over time and is an important metric for the auditing process.

The report above can be modified to detail the average alarm rate per 10 minutes, per hour, or per day to provide this information for auditing purposes. Other reporting templates can be used to detail metrics such as peak alarm distribution.
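The average alarm rate is simple arithmetic: the number of annunciated alarms divided by the number of periods in the reporting window. The sketch below computes it per 10 minutes, per hour, and per day for one operating position and compares the result with the thresholds in Table 1. The function names and the labels "acceptable", "manageable", and "overloaded" are assumptions for this example, not terms defined by ISA 18.2.

```python
from datetime import datetime, timedelta

# Per operating position, from Table 1: (very likely acceptable, maximum manageable).
LIMITS = {"10min": (1, 2), "hour": (6, 12), "day": (150, 300)}
PERIODS = {
    "10min": timedelta(minutes=10),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}

def average_alarm_rate(alarm_count, start, end, period):
    """Average annunciated alarms per `period` over the window from start to end."""
    n_periods = (end - start) / PERIODS[period]  # timedelta / timedelta gives a float
    return alarm_count / n_periods

def classify(rate, period):
    """Compare an average rate with the Table 1 thresholds (labels are illustrative)."""
    acceptable, maximum = LIMITS[period]
    if rate <= acceptable:
        return "acceptable"
    if rate <= maximum:
        return "manageable"
    return "overloaded"

# Hypothetical 30-day window with 5,400 annunciated alarms at one position.
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
for period in ("10min", "hour", "day"):
    rate = average_alarm_rate(5400, start, end, period)
    print(f"{period:>5}: {rate:7.2f} -> {classify(rate, period)}")
```

With these made-up numbers the position averages 180 alarms per day, which is above the "very likely to be acceptable" figure but within the maximum manageable one.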

Operator-based reports

The function of the operator response time report can be to reveal alarms that are ignored or become stale as well as to identify alarms that may be hidden by filters. Alarms that are left in the system the longest should also be considered during audit to identify their value in the system. Filters should be checked to make sure that they do not remove alarms unintentionally. The report below illustrates an example operator response time report and details the minimum, maximum, and average time to respond as well as the time to return.
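The statistics in an operator response time report can be derived from three timestamps per alarm occurrence. The sketch below assumes a hypothetical record layout of (tag, time raised, time acknowledged, time returned to normal), all in seconds, and measures both the time to respond and the time to return from the moment the alarm was raised.

```python
from statistics import mean

def response_time_report(records):
    """Per-alarm min/max/average seconds to acknowledge, plus average seconds to return.

    Each record is (tag, raised_s, acknowledged_s, returned_s); this layout is assumed.
    """
    by_tag = {}
    for tag, raised, acked, returned in records:
        entry = by_tag.setdefault(tag, {"respond": [], "return": []})
        entry["respond"].append(acked - raised)
        entry["return"].append(returned - raised)
    return {
        tag: {
            "min_respond": min(e["respond"]),
            "max_respond": max(e["respond"]),
            "avg_respond": mean(e["respond"]),
            "avg_return": mean(e["return"]),
        }
        for tag, e in by_tag.items()
    }

# Hypothetical occurrences of two alarms.
records = [
    ("PUMP2_FAIL", 0, 30, 300),
    ("PUMP2_FAIL", 1000, 1090, 1200),
    ("TANK1_HI", 50, 60, 80),
]
for tag, stats in response_time_report(records).items():
    print(tag, stats)
```

Sorting the output by maximum or average time to respond brings the alarms that sit unacknowledged the longest to the top, which are the ones the paragraph above suggests reviewing at audit.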

Audit

Auditing is a continuous and iterative process that, combined with reporting tools, can reveal vital information about operations. It consists of periodic evaluations defined by the philosophy, as outlined in the ISA 18.2 specification. The goal of an audit is to identify problems in the process and to continuously monitor, assess, and update the alarm system.

To reach the performance metrics mentioned in the ISA 18.2 specification and Table 1, the information must first be exposed. Through scheduled reporting and circulation of findings, contributors can identify the alarms and subsystems that represent the least effective processes to target first.

A critical tool to eliminate redundant alarms is to view the alarm system as a whole and to look at the interactivity between alarms. Alarm cross-correlation reports can show connections between alarm conditions as parent-child relationships indicating the percentage of times that an alarm triggers with another. By identifying child alarms through this type of root cause analysis, auditing can revisit the intention of the alarm to see if it is unnecessary or poorly defined. Typically the root condition is the only true alarm requiring a response.
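A basic form of the cross-correlation described above can be sketched as follows. For every ordered pair of alarms, it reports the percentage of the second alarm's activations that occurred within a short time after an activation of the first. The 30-second window, the (tag, timestamp) input format, and the alarm tags are assumptions for this example; commercial reports may define the relationship differently.

```python
from collections import defaultdict

def cross_correlation(activations, window_s=30):
    """Percentage of each alarm's activations that follow another alarm within `window_s`.

    `activations` is an iterable of (tag, timestamp_in_seconds) pairs.
    Returns {(parent, child): percent of child activations preceded by parent}.
    """
    times = defaultdict(list)
    for tag, ts in activations:
        times[tag].append(ts)
    result = {}
    for parent, parent_times in times.items():
        for child, child_times in times.items():
            if parent == child:
                continue
            # A child activation is a hit if any parent activation precedes it closely.
            hits = sum(
                any(0 <= c - p <= window_s for p in parent_times) for c in child_times
            )
            if hits:
                result[(parent, child)] = 100 * hits / len(child_times)
    return result

# Hypothetical data: FLOW1_LO usually follows PUMP2_FAIL within seconds.
events = [("PUMP2_FAIL", t) for t in (0, 100, 200)]
events += [("FLOW1_LO", t) for t in (5, 110, 205, 600)]
print(cross_correlation(events))
```

Here three of the four FLOW1_LO activations follow PUMP2_FAIL, which marks FLOW1_LO as a likely child alarm to revisit at audit. The pairwise comparison is quadratic in the number of activations, so a production report would index the timestamps first.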

Strategize, collect and compare

There are always barriers to beginning a project of this magnitude, but below are a few ideas to start the process. The ISA 18.2 specification is a list of guidelines. Some applications can benefit from full compliance, while others can benefit from even basic adherence.

For any running system, data will need to be gathered in order to begin. If reports are already being generated, consider adding some of the suggested reports to highlight problem areas.

Does the system meet the performance metrics? Are too many alarms coming in? Are operator response times acceptable? Are there unnecessary alarms? Use all available information to define a plan of action.

Identify, rationalize, design and implement changes. Document any changes. This will help clear the system and reduce the noise created by extra alarms.

Define an audit schedule

Plan a recurring audit to keep the system manageable. As processes change, so should the alarm management system. Through continuous system monitoring, assessment, and auditing, growing alarm management systems will evolve efficiently without degradation, in line with the application's alarm philosophy.

Table 1: ISA 18.2 Alarm Performance

| Annunciated alarms per time | Very likely to be acceptable | Maximum manageable |
| --- | --- | --- |
| Annunciated alarms per day per operating position | 150 alarms per day | 300 alarms per day |
| Annunciated alarms per hour per operating position | 6 (average) | 12 (average) |
| Annunciated alarms per 10 minutes per operating position | 1 (average) | 2 (average) |

Conclusion

An alarm management system is an integral piece of any HMI/SCADA system and is often configured for safety concerns. Allowing an alarm system to degrade can be a major factor in poor efficiency and unsafe conditions. Following the guidelines set forth in both the ISA 18.2 specification and EEMUA 191 can transform an alarm management system to one that helps increase efficiency.

An effective alarm system also requires a closed loop. It should have continuous and preferably automatic analysis to assure that it properly utilizes the attention of operators. Alarm management systems require planning, maintenance, and iterative auditing to remain effective, but with guidelines like the ISA 18.2 specification and products that comply with EEMUA 191, any application in any industry can benefit.

Pinkham is GENESIS64 Product Manager for ICONICS, Inc. He can be reached via the company website at www.iconics.com.