Modern hydrocarbon processing facilities have become increasingly
complex. Likewise, the risks in managing larger-capacity
refineries and petrochemical complexes have
increased. The importance of ensuring the safety of employees, the environment and physical plant
assets in the event of an unexpected process excursion cannot
be overstated. New techniques and
technologies designed to improve operational safety have evolved
to meet these challenges. Operating companies are increasing
efforts to reduce the risk of catastrophic events such as the
release of toxic, reactive or explosive chemicals that can
damage the environment or plant assets, as well as cause
injury or death to employees and the general public.

ROAD TO IMPROVING PLANT SAFETY

This journey begins with the development of the modern
process safety management (PSM) systems and requirements.
Efforts to improve plant safety were led by state-of-the-art
functional safety systems. These systems enable the orderly
shutdown of processing units when abnormal situations occur
that neither the regulatory control system nor the operators
can correct, and they thereby help to prevent a catastrophe.

While functional safety has proven successful in reducing
the probability of catastrophic events and recognizes the role
of human factors, it does not explicitly address the key roles
of management and business processes in maintaining operational
integrity and profitable performance of process plants. In this
context, what are the approaches that operating companies
should take to go beyond functional safety to proactively
measure, monitor and display a plants risk profile in
near real time so that proper actions can be taken in a more
timely manner to improve process safety performance?

Why invest time and resources to go beyond the
limitations of functional safety? To answer this
question, we must discuss the pivotal concepts of
safety-performance indicators and values (plant assets, the
environment, the public and employees) at risk from potential
catastrophic events. What are the best practices for
establishing a PSM culture along with designing, implementing
and maintaining a proactive PSM system to complement existing
functional safety systems?

HISTORY OF SAFETY MANAGEMENT SYSTEMS

As industrialization and technology progressed in the early
20th century, the pattern of intermittent catastrophes began.
In 1921, at the BASF plant in Oppau, Germany, explosions
destroyed the plant, killing at least 430 people and damaging
approximately 700 houses nearby. This explosion occurred as
blasting powder was used to break up the storage pile of a 50/50
mixture of ammonium sulfate and ammonium nitrate. This
procedure had previously been used 16,000 times without any
mishap. In 1947, a fire and explosion in Texas City, Texas, on
the Monsanto Chemical Co.s S.S. Grandcamp while
loading ammonium nitrate fertilizer killed over 430 people.
There was no specific legislative response to these
incidents.1

Interestingly, the US Center for Chemical Process Safety
(CCPS), which provides leadership and infrastructure to promote
and advance PSM, suggests process safety was born on the banks
of the Brandywine River in the early days of the 19th century
at E. I. du Ponts black powder works. Recognizing that
even a small incident could precipitate considerable damage and
loss of life, du Pont directed the works to be built and
operated under very specific safety conditions.2
Industry has a short memory; here is a brief list of several
recent major industrial disasters with dire consequences:

1984: Bhopal, India. A toxic material release
caused 2,500 immediate fatalities and many other offsite
injuries over time.

1988: Norco, Louisiana. A hydrocarbon-vapor-cloud
explosion resulted in seven onsite fatalities and 42
injuries, as well as over $400 million in damages.

1989: Pasadena, Texas. An ethylene/isobutene
explosion and fire caused 23 fatalities, 130 injuries and
more than $800 million in damages.

Such catastrophic safety incidents harmed the public and
damaged the environment. They also caused significant economic loss. In
response, governments continue to enact legislation and impose
fines focused on reducing the probability of future events.
Likewise, operating companies formed safety-related consortiums
that include suppliers of process automation technology. The goal is to identify
automation solutions that can enable operating companies to
avoid catastrophic safety events through early detection and
correction. As evidenced by recent safety-related catastrophes,
such solutions have not been entirely successful.

The present state-of-the-art safety management includes
safety studies (HAZID, HAZOP, risk analysis), safety
instrumented systems (SISs) for fire and gas detection and
emergency shutdown, abnormal situation management applications,
and operator guidance tools. As illustrated in Fig.
1, the first step in implementing a functional safety
system is the upfront analysis and conceptual design. It begins
with a meeting with all stakeholders to determine possible
hazards and hazard characteristics, and to establish the basic
scope of the project. Work then proceeds to
develop the detailed design for the SIS. The next steps
involve:

While these approaches to safety management have produced
positive results in reducing the probability of potentially
dangerous process upsets or failures, they are either static
(e.g., HAZOP studies) or reactive (e.g., emergency shutdown
systems) in nature. Their performance is also hampered by
complacency. Time passing without an incident is not
necessarily an indication that all is well. An incident is always
preceded by a succession of failings, as shown by
the Swiss-cheese model (Fig. 2). If unchecked,
all systems will deteriorate over time, and major incidents can
occur when defects cross a number of risk-control systems
concurrently. In effect, the holes in the
Swiss-cheese model become larger. Without setting leading and
lagging indicators for each risk-critical control system, it is
unlikely that failings in these barriers will be revealed as
they arise before all of the important barriers are
defeated.

Fig. 2. Swiss-cheese model of how a hazard
can propagate and become a harmful event.

Numerous recent high-profile incidents have heightened the
awareness that organizations need to pay more attention to
process safety. By definition, process safety is a blending of
engineering and management skills focused on preventing
catastrophic accidents and near hits, particularly
explosions, fires and damaging releases associated with the
loss of containment of energy or dangerous substances such as
chemicals and petroleum products.

These engineering and management skills exceed those
required for managing the workplace. As industrial
infrastructures continue to age, the cost of applying
process safety incorrectly increases, with escalating
consequences such as:

In some cases, even when executives and managers have
prioritized process safety, things still go wrong. Too often,
organizations or individuals make process-safety decisions
under pressure, or without proper context or sufficient
information. Whats missing is the ability to provide
plant personnel with real-time, proactive actionable
information about the plants risk profile via continuous
measurement, monitoring and visualization of key operating and
safety-related parameters. Result: Potentially
hazardous events can be averted without resorting to a plant
trip or an emergency shutdown. This is the goal of PSM; it
involves next-generation automation solutions aimed at making
step-change improvements in safety performance. Such systems
can provide a safety early warning and hazard avoidance
system. This should be an essential component of the
modern hydrocarbon enterprise.

By way of definition, PSM is the application of management
systems to identify, understand and control process hazards,
thus preventing process-related injuries and
incidents.3 The goal is to minimize process
incidents by evaluating the whole process. PSM came into
widespread use after the adoption of OSHA Standard 29 CFR
1910.119, "Process Safety Management of Highly Hazardous
Chemicals," in 1992. PSM covers:

Process safety information

Employee involvement

Process hazard analyses (PHAs)

Operating procedures

Training

Contractors

Pre-startup safety reviews

Mechanical integrity

Hot work

Management of change

Incident investigation

Emergency planning and response

Compliance audits

Trade secrets.

Another definition of PSM is the proactive and
systematic identification, evaluation and mitigation or
prevention of chemical releases that could occur as a result of
failures in processes, procedures or
equipment.4 PSM is intended to ensure freedom
from unacceptable risk due to:

Fire

Explosion

Suffocation

Poisoning.

Fig. 3 shows where PSM fits into the
overall context of operational integrity (i.e., keeping the
process in the pipe), and how functional safety is a key
element of PSM.

Fig. 3. Role of PSM in supporting operational
integrity.

Business case for PSM

A cost/benefit analysis is at the center of
decision-making on investments. To justify cost, it is
necessary to determine if the magnitude of the value delivered
justifies the cost in terms of time, effort and money.
Investments in safetyfunctional safety systems, abnormal
situation management applications, etc.have been made
largely to satisfy legislative requirements and to maintain the
license to operate. There is no legislation that directly
defines the requirements for a real-time PSM system or the
penalties for not implementing one. Thus, investments in a PSM
system may be made if it can be shown that it delivers a
significant, tangible reduction in the risk of a catastrophic
failure, and that it produces a measurable economic benefit for
the plant. Table 1 summarizes estimated annual
benefits associated with implementing a PSM system. For a
100,000-bpd petroleum refinery, operating for 330 days/yr
at an average refining margin of $5/bbl, the
estimated annual PSM benefit is $2.85 million. In addition to
the stated benefits from Table 1, the
incremental value-at-risk can provide ongoing
quantified measures of the economic impact from the PSM
system.
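
To put the $2.85 million figure in context, the short sketch below computes the example refinery's annual gross refining margin from the three numbers quoted above and expresses the estimated PSM benefit as a share of that margin and per barrel. The variable and function names are illustrative only, and the breakdown of the benefit by category in Table 1 is not reproduced here.

```python
# Figures quoted in the text for the example refinery.
CAPACITY_BPD = 100_000                  # barrels per day
OPERATING_DAYS_PER_YEAR = 330
REFINING_MARGIN_USD_PER_BBL = 5.0
ESTIMATED_PSM_BENEFIT_USD = 2_850_000   # annual benefit stated in the text


def annual_gross_margin(capacity_bpd: float, days: int, margin_per_bbl: float) -> float:
    """Annual gross refining margin in USD."""
    return capacity_bpd * days * margin_per_bbl


gross_margin = annual_gross_margin(
    CAPACITY_BPD, OPERATING_DAYS_PER_YEAR, REFINING_MARGIN_USD_PER_BBL
)
annual_barrels = CAPACITY_BPD * OPERATING_DAYS_PER_YEAR

# Express the stated benefit relative to the margin and to throughput.
benefit_share = ESTIMATED_PSM_BENEFIT_USD / gross_margin
benefit_per_bbl = ESTIMATED_PSM_BENEFIT_USD / annual_barrels

print(f"Annual gross margin: ${gross_margin:,.0f}")
print(f"PSM benefit as share of margin: {benefit_share:.2%}")
print(f"PSM benefit per barrel: ${benefit_per_bbl:.3f}")
```

On these figures, the annual gross margin is $165 million, so the estimated benefit is roughly 1.7% of margin, or about $0.09/bbl.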

DESIGN AND FRAMEWORK FOR A PSM SYSTEM

It is important to find the right level of balance among the
various possible safety indicators so that process-safety
decisions accurately reflect the companys desired
operational risk profile. Although risk can never be
eliminated, a variety of mechanisms can be put in place to
balance desired safety outcomes with day-to-day business
imperatives and pressures.

Too often, many organizations rely heavily on failure data
to monitor performance. Thus, improvements or changes are only
determined after something has gone wrong. Often, the
difference between whether a system failure results in a minor
or catastrophic outcome is purely down to chance. Discovering
weaknesses in the quality of managing the process and control
systems by having a major incident is too late and costly.
Early warning of dangerous deterioration within critical
systems provides an opportunity to avoid major incidents.

Knowing that process risks are successfully controlled has a
clear link with business efficiency, as several indicators can
be used to show plant availability and optimized operating
conditions. Effective management of major hazards requires a
proactive approach to risk management. Information to confirm
that critical systems are operating as intended is essential.
Leading indicators that can confirm that risk controls are
continuing to operate are an important step forward in the
management of major hazard risks.

Measuring performance

The main reason for measuring process safety performance is
to provide ongoing assurance that risks are being adequately
controlled. Directors and senior managers need to monitor the
effectiveness of internal controls against business risks. For
petroleum refineries and petrochemical manufacturers, process
safety risks are a significant aspect of business risk, asset
integrity and reputation. Many organizations lack good
information to show how well they are managing major hazard
risks. This is because the information gathered tends to be
limited to measuring failures, such as incidents or near
misses.

Those involved in managing process safety risks need to ask
fundamental questions about their systems, such as:

What can go wrong?

What controls are in place to prevent major
incidents?

What does each control deliver in terms of a safety
outcome?

How do we know that the controls continue to operate as
intended?

Measuring performance before a catastrophic failure

According to James Reason, (major) accidents result when a
series of failings within several critical risk-control systems
materialize concurrently.5 Each risk-control system
represents an important barrier or safeguard within the PSM
system. A significant failing in just one critical barrier may
be sufficient to give rise to a major accident. Continuously
measuring and monitoring the actual real-time performance of
these safety barriers ensures that operational integrity is not
compromised due to degradation of barriers.

Leading and lagging indicators are set in a structured and
systematic way for each critical risk-control system within the
whole PSM system. In tandem, they act as system guardians,
providing dual assurance to confirm that the risk-control
system is operating as intended or providing a warning that
problems are starting to develop.

Leading indicators are a form of active monitoring
focused on a few critical risk-control systems to ensure
continued effectiveness. Leading indicators require a routine
systematic check that key actions or activities are undertaken
as intended. They can be considered as measures of process or
inputs essential to deliver the desired safety outcome. The
leading indicators identify failings or holes in
vital aspects discovered during routine checks on the operation
of a critical activity within the risk-control system.

Lagging indicators are reactive monitoring
methods requiring the reporting and investigation of specific
incidents and events to discover weaknesses within that system.
These incidents or events do not have to result in major damage
or injury or even loss of containment, provided they represent
a failure of a significant control system that guards against
or limits the consequences of a major incident. Lagging
indicators show when a desired safety outcome has failed or has
not been achieved. The lagging indicator reveals failings or
holes in that barrier discovered following an
incident or adverse event. The incident does not necessarily
have to result in injury or environmental damage, and it can be
a near miss, a precursor event or an undesired outcome
attributable to a failing in that risk-control system.
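
The pairing of a leading and a lagging indicator for one risk-control system can be sketched as a small data structure. This is a minimal illustration, not a prescribed implementation: the class name, the field names, the 95% completion tolerance and the zero-failure tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RiskControlIndicators:
    """Leading and lagging indicators for one risk-control system (illustrative)."""

    name: str
    checks_scheduled: int             # leading: routine checks that were due
    checks_completed: int             # leading: routine checks actually carried out
    outcome_failures: int             # lagging: incidents or near misses traced to this system
    leading_tolerance: float = 0.95   # assumed minimum completion ratio
    lagging_tolerance: int = 0        # assumed maximum acceptable number of failures

    def leading_ok(self) -> bool:
        """True when the routine checks are being carried out as intended."""
        if self.checks_scheduled == 0:
            return True  # nothing was due in the period
        return self.checks_completed / self.checks_scheduled >= self.leading_tolerance

    def lagging_ok(self) -> bool:
        """True when the desired safety outcome has been achieved."""
        return self.outcome_failures <= self.lagging_tolerance

    def status(self) -> str:
        """Combine both views into a single assurance statement."""
        if self.leading_ok() and self.lagging_ok():
            return "operating as intended"
        if not self.leading_ok() and self.lagging_ok():
            return "early warning: checks are slipping, no failure yet"
        if self.leading_ok() and not self.lagging_ok():
            return "failure despite checks: review the checks themselves"
        return "barrier degraded: checks slipping and failures recorded"


# Hypothetical period: 34 of 40 scheduled proof tests completed, no failures on demand.
proof_testing = RiskControlIndicators("SIS proof testing", 40, 34, 0)
print(proof_testing.name, "->", proof_testing.status())
```

In this hypothetical case the lagging indicator alone would suggest that all is well, while the leading indicator shows the barrier starting to degrade.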

Several organizations and standards recommend applying
leading and lagging metrics to understand the quality of the
PSM system. Several examples are:

ISA 84.00.04Recommended Practices for
Guidelines for the Implementation of ANSI/ISA-84.00.01-2004
(IEC 61511 Mod)

CCPS

The Energy Institute (EI), formerly known as the
Petroleum Institute.

The common theme of these metrics is applying key
performance indicators (KPIs) generated from the management of
the process/functional safety equipment and the people and
processes that are used in terms of their competence,
leadership and risk-management capabilities.

For example, the EI has published a Process Safety
Management framework, developed by the energy industry,
for use by various industry sectors.6 The framework
is intended to be applicable worldwide, to all process
industries such as power, petroleum, chemicals, refining, etc. The framework
encapsulates learning from people with practical experience of
developing and implementing PSM as part of an integrated
management system. It clearly sets out what needs to be done to
ensure the integrity of the operation and defines what measures
should be in place and how they are performing.
Note: It is not intended to replace existing
process safety or health, safety and environmental (HSE) management
systems.

The EIs framework consists of three levels: focus
areas, elements and expectations. The focus areas set out the
high-level components of the PSM framework. Within each of the
focus areas are a number of elements. Each element contains
expectations defining what organizations need to do properly to
meet the intent of each element. Details for EIs PSM
elements set four key operating aspects that organizations
should do to ensure the integrity of the operations:

Fig. 4 shows the proposed PSM
framework, based on industry guidelines, and the
associated components of a well-designed PSM system to enable
real-time measurement and monitoring of a plant's risk
profile. It provides actionable information that can be used to
prevent catastrophic events. Where an organization has an
existing HSE or PSM system, it may be useful to benchmark
against the framework or to carry out a risk assessment vs. the
expectations of each element and identify any aspects of the
existing system that may need enhancing.

Fig. 4. PSM framework and components.

Implementing such a PSM system establishes the foundation of a
PSM control loop. Fig. 5
illustrates such a control loop to prevent complacency from
increasing the probability of a catastrophic event due to plant
personnel ignoring leading and lagging indicators about
degradation of protection levels provided by risk-control
loops.

Fig. 5. PSM control loop.

During plant operations, systems are modified to adapt to the
changing system needs. Systems and procedures can deteriorate
over time, and system failures discovered following a major
incident frequently surprise senior managers, who sincerely
believed that the controls were functioning as designed. Used
effectively, process safety KPIs can provide an early warning
that critical controls have deteriorated to an unacceptable
level.

Measuring performance to assess how effectively risks are
being controlled is an essential part of an HSE system. This
can be accomplished in two ways:

Active monitoring. It provides feedback
on performance before an accident or incident.

Reactive monitoring. It involves
identifying and reporting on incidents to check that the
controls in place are adequate, to identify weaknesses or
gaps in control systems and to learn from mistakes.

SPIs and incremental value-at-risk

After a set of KPIs has been adopted, the asset
owner's management is responsible for monitoring these
KPIs and responding to deviations from their baselines. At
higher management levels, the relevance of the KPIs associated
with managing plant equipment can be lost. Therefore, it
becomes necessary to translate the individual equipment-level
KPIs and their business impact into plant-level safety
performance indicators and their business impact. This concept
can be extended to any number of facilities enabling upper management
to understand the quality of PSM across the enterprise.

Using the individual equipment KPIs, a new approach allows
an asset owner to understand the overall safety state of the
plant and its economic impact on the business. In addition,
this approach is tied to the existing layer of protection analysis (LOPA) and financial impact
analysis.

KPI metrics are gathered based on the asset owner's
management of the plant equipment, capability of employees and
processes followed to manage process safety. Typically,
10 to 20 key metrics can be covered and include 1) management
of safety-related equipment (e.g., completion of periodic
field-device proof tests associated with a distillation column), 2) competence
of plant personnel (e.g., their level of training and skills
testing), 3) adherence to established procedures (e.g.,
near-miss investigations) and 4) leadership (e.g., involvement
of leadership in periodic, formal safety reviews). These
metrics can be defined at the level of an individual layer of
protection (LOP) associated with a line of
equipment (e.g., the SIS) or at the line-of-equipment
level (e.g., leadership).

The safety performance indicator (SPI) is an aggregation of
the individual KPIs into a single number. The SPI can be
calculated at the equipment level (equipment SPI) and at the
plant level. Fig. 6 illustrates the owner
safety model for an enterprises global assets. This model
can consist of plants distributed over different geographic
regions. A plant is decomposed into lines of equipment (LOE),
which have LOPs associated with the plant-safety model, as
shown in Fig. 7.

Fig. 6. Asset-owner safety model.

Fig. 7. Plant-safety model with KPIs and SPI.

Underlying the plant-safety model is a safety-related KPI
framework; it addresses the management of process safety
related to plant equipment, business processes, and procedures
used to manage the equipment and the capabilities of employees
applying these processes and procedures.

Calculating the weighted KPI for a protection layer

The KPI for a LOP can be calculated as:

where:
KPI_LOP = Weighted average KPI of a layer of protection
w = Weight of a KPI (see note 7)
KPI = Key performance indicator related to plant, process, people (as applicable)
K = Number of KPIs for an LOP
I = Index for counting number of KPIs
J = Index for counting number of LOPs.
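
The weighted average described by these definitions can be sketched in a few lines. The sketch assumes that the weights are normalized by their sum, so that KPI_LOP = sum(w_i × KPI_i) / sum(w_i) over the K KPIs of the layer, and that each KPI is scored in percent. The function name and the example values are hypothetical.

```python
from typing import Sequence


def weighted_kpi_for_lop(kpis: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the KPIs attached to one layer of protection.

    KPIs are assumed to be scored in percent (0 to 100). A weight of 0
    drops the corresponding KPI from the average (see note 7).
    """
    if len(kpis) != len(weights):
        raise ValueError("each KPI needs exactly one weight")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one KPI must have a non-zero weight")
    return sum(w * k for w, k in zip(weights, kpis)) / total_weight


# Hypothetical SIS layer: proof tests completed on schedule (98%),
# bypass hours within limit (90%), and a KPI not used for this layer (weight 0).
sis_kpi = weighted_kpi_for_lop(kpis=[98.0, 90.0, 60.0], weights=[2.0, 1.0, 0.0])
print(f"KPI_LOP = {sis_kpi:.1f}%")
```

Because the third KPI carries a weight of 0, its low score has no effect on the result.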

Calculating safety performance index for equipment

Consider that a piece of equipment has a number of LOPs.
From a safety perspective, the LOPs are of different importance
and risk levels. From the LOPA, each layer has an associated
risk-reduction factor. The weighted KPIs associated with the
equipment can be aggregated and weighted, using the
risk-reduction factor associated with the LOP:

where:
L = Number of layers of protection
w_lop = Weight of a layer of protection (= RRF for the layer of protection)
I = Index for counting LOP
J = Index for counting number of pieces of equipment.
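
A minimal sketch of this aggregation follows, assuming that the equipment SPI is the RRF-weighted average of the layer KPIs, i.e., SPI_EQ = sum(w_lop_i × KPI_LOP_i) / sum(w_lop_i) over the L layers. The class, the function and the RRF values are hypothetical and are not taken from an actual LOPA.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LayerOfProtection:
    name: str
    kpi_lop: float  # weighted KPI of the layer, in percent
    rrf: float      # risk-reduction factor credited to the layer in the LOPA


def equipment_spi(layers: Iterable[LayerOfProtection]) -> float:
    """RRF-weighted average of the layer KPIs for one piece of equipment."""
    layers = list(layers)
    total_rrf = sum(layer.rrf for layer in layers)
    if total_rrf <= 0:
        raise ValueError("the layers must carry a positive total RRF")
    return sum(layer.rrf * layer.kpi_lop for layer in layers) / total_rrf


# Hypothetical distillation column with three layers of protection.
column_layers = [
    LayerOfProtection("Basic process control", kpi_lop=92.0, rrf=10.0),
    LayerOfProtection("Alarm and operator response", kpi_lop=88.0, rrf=10.0),
    LayerOfProtection("SIS", kpi_lop=97.0, rrf=100.0),
]
spi_column = equipment_spi(column_layers)
print(f"Equipment SPI = {spi_column:.1f}%")
```

With this weighting, the layer credited with the largest risk reduction (here, the SIS) dominates the equipment SPI, so degradation of that layer shows up most strongly.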

Calculating safety performance index for a facility

Consider that a facility has a number of LOEs. From a safety
perspective, LOEs are of different importance/risk levels. From
the LOPA, each LOE has associated with it a total equipment
risk. The SPIs for the LOEs can be aggregated using the total
risk factor calculated from the LOPA:

where:
E = Number of pieces of equipment in a plant
I = Index used to count the pieces of equipment in the plant
EQ_RISK = Total mitigated risk for a piece of equipment (see note 8)
SPI_PLANT = SPI for the plant
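
The plant-level roll-up can be sketched in the same way, assuming that SPI_PLANT is the risk-weighted average of the equipment SPIs, i.e., SPI_PLANT = sum(EQ_RISK_i × SPI_i) / sum(EQ_RISK_i) over the E pieces of equipment. The equipment names, SPI values and risk values (taken here as events per year) are hypothetical.

```python
from typing import Mapping, Tuple


def plant_spi(equipment: Mapping[str, Tuple[float, float]]) -> float:
    """Risk-weighted average SPI for a plant.

    `equipment` maps an equipment name to (SPI in percent, EQ_RISK),
    where EQ_RISK is the total mitigated risk from the LOPA.
    """
    total_risk = sum(risk for _, risk in equipment.values())
    if total_risk <= 0:
        raise ValueError("total mitigated risk must be positive")
    return sum(spi * risk for spi, risk in equipment.values()) / total_risk


# Hypothetical plant with three lines of equipment.
refinery_equipment = {
    "Crude column": (95.8, 4.0e-4),
    "Hydrotreater reactor": (91.0, 1.0e-3),
    "LPG sphere": (98.0, 6.0e-4),
}
spi_plant = plant_spi(refinery_equipment)
print(f"SPI_PLANT = {spi_plant:.2f}%")
```

The equipment with the highest residual risk carries the most weight, so a mediocre SPI on a high-risk unit pulls the plant SPI down more than the same score on a low-risk unit.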

Estimated losses associated with LOE risk and plant

Based on the SPI, a safety performance state can be
calculated. For example, the SPI can have ranges such as good
(> 95%), warning (90% to 95%) and bad (< 90%). Associated
with each LOE is an asset impact. For example, the asset impact
may be defined as S0 to S5, as shown in Table
2. Incremental estimated asset value-at-risk is a
safety performance adjusted metric (expected value) that can be
calculated using the SPI, the safety performance state and the
asset impact.

For example, the incremental asset value-at-risk can be
estimated as follows: 100% of the asset loss value-at-risk if
the safety performance state is determined to be
bad; 50% of the asset loss value-at-risk if the
safety performance state is determined to be
warning; 0% of the asset loss value-at-risk if the
safety performance state is determined to be
good:

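LOE: Estimated incremental asset value-at-risk:

$$\text{Incremental asset VaR}_{LOE} = f \times \text{Asset loss VaR}_{LOE}, \qquad f = \begin{cases} 1.0 & \text{state is bad} \\ 0.5 & \text{state is warning} \\ 0 & \text{state is good} \end{cases}$$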
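
The example rule above can be sketched as follows. The state thresholds and the 100%/50%/0% factors are the example values given in the text. The function names are illustrative, and the asset loss value used in the usage lines is hypothetical because the S0 to S5 scale of Table 2 is not reproduced here.

```python
def safety_performance_state(spi_percent: float) -> str:
    """Map an SPI in percent to the example states used in the text."""
    if spi_percent > 95.0:
        return "good"
    if spi_percent >= 90.0:
        return "warning"
    return "bad"


# Share of the asset loss value-at-risk counted for each state (example values).
STATE_FACTOR = {"good": 0.0, "warning": 0.5, "bad": 1.0}


def incremental_asset_var(spi_percent: float, asset_loss_var_usd: float) -> float:
    """Estimated incremental asset value-at-risk for one line of equipment."""
    state = safety_performance_state(spi_percent)
    return STATE_FACTOR[state] * asset_loss_var_usd


# Hypothetical LOE: SPI of 93% and an assumed asset loss value-at-risk of $20 million.
loe_var = incremental_asset_var(spi_percent=93.0, asset_loss_var_usd=20_000_000.0)
print(f"State: {safety_performance_state(93.0)}, incremental asset VaR: ${loe_var:,.0f}")
```

A step function of this kind is simple to communicate, but it jumps at the 90% and 95% boundaries; a continuous factor would be an alternative design if smoother tracking is wanted.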

The plant-level incremental asset value-at-risk can be
estimated by adding the estimated incremental asset
values-at-risk for the LOEs within the facility. The plant-level
incremental production value-at-risk can be estimated by adding
the incremental production values-at-risk for the underlying
lines of equipment:

For a corporation with many plants, the incremental asset
values-at-risk and the production values-at-risk can be aggregated
as:
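
Both roll-ups are plain sums, as the text describes: LOE values are added to give plant totals, and plant totals are added to give corporate totals. A minimal sketch follows, assuming each LOE already has an incremental asset value-at-risk and an incremental production value-at-risk. The plant names and dollar figures are hypothetical.

```python
from typing import Dict, List, Tuple

# Each LOE entry is (incremental asset VaR, incremental production VaR) in USD.
LoeValues = Tuple[float, float]


def plant_totals(loes: List[LoeValues]) -> LoeValues:
    """Sum the LOE values-at-risk within one plant."""
    asset_total = sum(asset for asset, _ in loes)
    production_total = sum(production for _, production in loes)
    return asset_total, production_total


def corporate_totals(plants: Dict[str, List[LoeValues]]) -> LoeValues:
    """Sum the plant-level values-at-risk across the corporation."""
    per_plant = [plant_totals(loes) for loes in plants.values()]
    return plant_totals(per_plant)


# Hypothetical corporation with two plants.
corporation = {
    "Plant A": [(10_000_000.0, 1_500_000.0), (0.0, 0.0)],
    "Plant B": [(2_500_000.0, 400_000.0), (5_000_000.0, 750_000.0)],
}
for plant_name, loes in corporation.items():
    asset_var, production_var = plant_totals(loes)
    print(f"{plant_name}: asset ${asset_var:,.0f}, production ${production_var:,.0f}")

corp_asset_var, corp_production_var = corporate_totals(corporation)
print(f"Corporate: asset ${corp_asset_var:,.0f}, production ${corp_production_var:,.0f}")
```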

Dashboards

To display the SPI and related incremental asset
value-at-risk and incremental production loss, dashboards can
be used, as shown in Figs. 8 and
9. The plant-level dashboard could display the
plant safety-performance data and provide drill-down capability
to the underlying KPIs for analysis of the underlying causes of
identified risks. Once identified, corrective action plans can
be defined and implemented in a timely manner to avoid costly
catastrophic safety events.

Fig. 8. Example of a corporate dashboard.

Fig. 9. Example of a plant-level dashboard.

Best practices and lessons learned

As reflected in the name of the American Fuel and Petrochemical Manufacturers'
(AFPM's) safety conference, i.e., the National
Occupational and Process Safety Conference, the refining and petrochemical industries are clearly
focused on PSM as a key component of their operational
strategies. To support these operational strategies, there are
nine steps or best practices to use when implementing and
maintaining an effective process safety-performance management
system:

1 A Canadian Perspective of the History of Process Safety Management Legislation, 8th International Symposium: Programmable Electronic System in Safety-Related Applications, Sept. 23, 2008, Cologne, Germany.
2 Center for Chemical Process Safety website: http://www.aiche.org/CCPS/Students/GetSmart.aspx.
3 Center for Chemical Process Safety website: http://www.aiche.org/CCPS/Students/GetSmart.aspx.
4 H. J. Toups, LSU Department of Chemical Engineering, with significant material from SACHE 2003 Workshop.
5 Managing the Risks of Organizational Accidents, Ashgate Publishing Co., 1997.
6 Energy Institute, London, 1st Ed., December 2010.
7 A weight of 0 signifies that a KPI is not used.
8 This is equal to the sum of all the mitigated risks for an item of equipment.

The authors

Martin A. Turk, PhD, is the director of Global Industry
Solutions for the HPI for Invensys Operations
Management in Houston, Texas. For most of his 40+ years
of experience, Dr. Turk has been involved in
engineering, consulting, sales and marketing activities
related to process automation. These activities include
process simulation, advanced control and
information/automation system strategic planning. Dr.
Turk is responsible for definition of industry-specific
solutions for downstream petroleum refining and petrochemicals,
participation in industry conferences and working with
Invensys clients worldwide to identify and quantify
automation opportunities in their manufacturing facilities that will provide
them with significant returns on investments. He
received his BS degree in chemical engineering from the
University of Dayton and his PhD in chemical
engineering from the University of Notre Dame. Also, he
has published technical papers and made presentations
at domestic and international seminars on a variety of
subjects related to advanced automation solutions for
the process industries.

Ajay Mishra is the R&D program manager at
Invensys. He helps define the detailed features and technology roadmaps for the
Triconex branded safety & critical control
products. Mr. Mishra holds a BSEE degree from the
College of Engineering, Pune, India and an MBA from the
UCLA Anderson School of Management. He has over 20
years of experience in safety and critical control
systems for process control, SIS and railway systems,
including product development, project engineering, project management and
product management. Mr. Mishra is a TÜV certified
Functional Safety Engineer for hardware/software design
(IEC 61508) and Safety Instrumented Systems (IEC
61511).
