This paper reviews the literature on resilience engineering as a safety management approach. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to search for, select and evaluate 46 published works. The terms organisational resilience and resilience engineering are clarified, and functionalist and interpretive research frameworks are used to analyse the articles. This review suggests that there is no universally agreed definition of resilience engineering, but that it involves a collective aspect; is multifactorial, multilevel and multidimensional; and is associated with four key principles (anticipation, response, learning and monitoring) and with successful outcomes. The gap between work as imagined and work as performed is an important aspect. Studies on resilience engineering have predominantly been qualitative investigations, with data collected through site observations, safety audits and surveys. Eight research gaps were identified, and suggestions are made on how these gaps can be addressed through empirical research.

Effective management of safety is an integral part of organisations' risk management throughout the world. This is because each year more than 2.7 million workers die from work-related accidents and diseases, and over 374 million people suffer non-fatal work-related accidents and injuries [1]. These figures are expected to increase further as organisations are challenged by the risks posed by globalisation, advanced technologies and their own increasing complexity [2] [3]. A number of institutional, regulatory and structural arrangements have been implemented to address the impact of these developments. These include a strategy for sustainable prevention [4], visions of zero accidents [5] and of healthy, safe and productive working lives [6]. To support these, a number of safety management strategies and approaches have been developed, trialled and deployed. Many of these arose out of the findings of commissions of inquiry into major catastrophic events and organisational disasters such as Three Mile Island, Bhopal and Chernobyl [7], Piper Alpha [8], and Columbia and Challenger [9]. As a result, a wide range of safety management approaches exists, including:

1) safety laws, standards, procedures and rules,

2) human error and behavioural control initiatives,

3) designing for safety initiatives,

4) improvements in physical working conditions,

5) safety management systems,

6) safety culture,

7) organisational learning and high-reliability organisations.

It has been suggested that these approaches evolved over five ages [10] or three eras [11] [12] of safety. According to this classification, each era was marked by a better understanding of how accidents were predominantly caused and of how the lessons learned could be translated into improved safety management practices. However, because workers continue to die or suffer serious work-related injuries, there are serious concerns that many existing approaches to managing safety have failed to bring about sustainable improvements. Many of these approaches are over four decades old, are outdated and have not kept pace with developments in organisational theory; more innovative solutions are required to drive safety improvements beyond what has been achieved so far. The central tenet of this paper is that resilience engineering, a recent strategy for managing safety in high-risk complex organisations [13] [14], represents this innovation.

2. Resilience Engineering (RE)

RE was introduced in the safety domain in 2003 as an alternative way of explaining how organisational disasters such as Challenger and Columbia could recur. On February 1, 2003, Columbia disintegrated on re-entry into the Earth's atmosphere, killing all seven crew members. The first report on this disaster, released by the Columbia Accident Investigation Board in 2003, identified three main areas of findings: i) the physical failures which led directly to Columbia's destruction, ii) underlying weaknesses in the organisation and history of the National Aeronautics and Space Administration (NASA) that paved the way for catastrophic failure, and iii) other significant observations unrelated to the accident itself [15]. Two years later, Woods [16] [17] identified five general patterns of organisational behaviour predominant in disasters such as Columbia. These included:

1) a drift towards failure as defenses eroded under pressures of production,

2) organizations taking past success as grounds for confidence, instead of investing in anticipating the potential for future failure,

3) using a fragmented and distributed problem-solving process which clouded the bigger picture about risks and their effective management,

4) an inability to revise and manage risk assessments as new evidence emerged, and

5) breakdowns at the boundaries of organizational units, which impeded communication and coordination.

RE was developed as a solution for overcoming these and similar concerns in organisations operating in contexts similar to NASA's. Since its introduction, RE has gained momentum, with research published mostly on aviation, healthcare, nuclear power plants and petrochemical facilities, and some on electricity distribution, manufacturing, railways and construction [18].

However, RE has a number of fundamental problems, including the lack of a universally accepted definition of either organisational resilience (OR) or RE, two terms that have been used interchangeably in the literature [19] [20]. A multitude of factors, indicators and measures have been used in RE studies, so developing a nuanced understanding of what RE is (or is not), and of how it can be used to improve safety, continues to challenge academics, researchers and practitioners [19]. Moreover, no commonly accepted model of RE currently exists. The present paper, which is part of a larger PhD research project [18], seeks to address this gap by analysing different schools of thought, definitions, conceptual developments and empirical evidence on RE.

Research Aim

The aim of the present paper is to develop a research framework for future studies of RE as a safety management strategy. To achieve this aim, three research objectives were set:

1) Establish a common understanding of RE through an integrative review,

2) Explore how RE has been operationalized and researched, and

3) Develop a future research agenda for investigating RE.

3. Research Methods

The research method used was an integrative review. Such a review analyses and synthesises representative literature on a topic in an integrated way, so that new frameworks and perspectives on the topic are generated [21] [22] [23]. It can address both mature and emerging topics, allows for the inclusion of a wide range of methods and of theoretical and empirical literature, accommodates multiple perspectives, and is suited to topics of importance for social policy [23] [24]. It can be used to evaluate the strength of evidence, identify gaps in current research and the need for future research, build bridges between related areas of published work, generate research questions, identify a theoretical or conceptual framework, and explore which methods have been used successfully [25] [26]. In terms of scholarship, an integrative literature review stands alone as a form of research [22] [27], and has previously been used to synthesise topics such as risk management [28], resilience [29] [30] and safety culture [31].

The specific method used was an adaptation of the recommended guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [32]. The four key steps, illustrated in Figure 1, were identifying the relevant literature, screening the abstracts, reading the full texts to check eligibility, and applying the inclusion and exclusion criteria [19]. The PsycINFO, Social Science Index (SSCI) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases were searched through the EBSCOHost platform using “organisational resilience*” and “resilience engineering*” as keywords.

The initial search resulted in 103 articles after duplicates were removed. These included four books [13] [33] [34] [35], chapters of which were also picked up in the initial selection. An additional 20 articles were identified through Google Scholar. The abstracts of these 123 articles were read, and 66 were removed because they focused on topics other than safety management (resilience in communities, psychological resilience, children and youth, climate change, ecology, sustainability, state-space theory) [19]. The full texts of the remaining 57 articles were then reviewed to determine whether they should be included in or excluded from the final review. To be included, articles needed to be published in English, be peer-reviewed, and include one or more of the following: definitions, measures/dimensions/factors or key concepts. Nine articles were removed at this stage, resulting in a sample of 46 papers for final review. While this is not an exhaustive list of all articles, the sample provides a representative overview of what has been published on the topic.
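The identification, screening and eligibility steps described above can be read as a chain of filters. The following is a minimal, hypothetical sketch under stated assumptions, not the procedure actually run for this review: the `Record` fields, the case-insensitive title match used for de-duplication and the sample records are invented for illustration; only the off-topic themes and the inclusion criteria are taken from the text.

```python
from dataclasses import dataclass, field


# Hypothetical record structure; the fields are illustrative assumptions.
@dataclass
class Record:
    title: str
    topic: str                      # main focus, as judged from the abstract
    language: str = "English"
    peer_reviewed: bool = True
    content: set[str] = field(default_factory=set)  # e.g. {"definition"}


# Off-topic themes listed in the text as grounds for exclusion at screening.
OFF_TOPIC = {"communities", "psychological", "children and youth",
             "climate change", "ecology", "sustainability", "state-space theory"}

# Eligibility: at least one of these elements must be present in the full text.
REQUIRED_ANY = {"definition", "measure", "dimension", "factor", "key concept"}


def deduplicate(records):
    """Identification: drop records whose titles repeat (case-insensitive, assumed rule)."""
    seen, unique = set(), []
    for r in records:
        key = r.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def screen_abstract(r):
    """Screening: keep records whose focus is not one of the off-topic themes."""
    return r.topic not in OFF_TOPIC


def eligible(r):
    """Full-text check: English, peer-reviewed, and at least one required element."""
    return r.language == "English" and r.peer_reviewed and bool(r.content & REQUIRED_ANY)


def prisma_flow(records):
    """Return the number of records remaining after each stage."""
    identified = deduplicate(records)
    screened = [r for r in identified if screen_abstract(r)]
    included = [r for r in screened if eligible(r)]
    return {"identified": len(identified),
            "screened": len(screened),
            "included": len(included)}


# Invented sample records, for illustration only.
records = [
    Record("RE in aviation", "safety management", content={"definition"}),
    Record("RE in Aviation", "safety management", content={"definition"}),  # duplicate
    Record("Resilient communities", "communities"),
    Record("Audit study", "safety management", peer_reviewed=False, content={"measure"}),
]
print(prisma_flow(records))
```

Keeping each stage as a separate function mirrors the PRISMA flow diagram, so the number of records remaining at every stage can be reported alongside the final sample.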

4. Findings and Discussion

This section examines the definitions used to explain RE. Because the published works refer constantly to OR, no review of RE can be considered complete without a summary of those aspects of OR that have been suggested to play an important role [19]. These are therefore reviewed first.

4.1. Definitions of OR

There is no common understanding of OR; fifteen definitions extracted from the articles reviewed are summarised in Table 1.

The most common theme in these definitions of OR is an ability to cope with adverse circumstances, disasters and disturbances. This is reflected in key ideas such as recovery [39] [45], withstanding major disruptions [40] [48] [49], absorbing disturbances and change [50], and adjusting, adapting and/or maintaining function and structure [42]. Most of these definitions treat OR as a reactive capability, and many of these characteristics are also attributes of complex adaptive systems (CAS) [51].

However, there appear to be two things that set resilient organisations apart from CAS.

The first is their ability to continue performing well, without being significantly affected, during a catastrophe or disaster; the fact that they “operate smoothly even in difficult situations” [37], “recover quickly to a stable state” [41], and even “thrive during adversity” [47] illustrates this point. The reaction of any organisation or system depends on the severity of the catastrophic event, and organisations or systems that can thrive in the face of catastrophe are rare. While most organisations react to stress by shutting down completely or operating at reduced capacity, resilient organisations absorb such stresses and continue to operate normally. They are able to do this well because such stresses are part of what they are exposed to in their normal operations [41] [49]. These organisations can (and sometimes do) fail, but their repertoire of capabilities allows them to recover, recalibrate and continue operations without being significantly affected.

The second is their ability to deal effectively with more than normal, everyday threats and disturbances. This involves going beyond past experience and being prepared to deal effectively with unknown events, threats and/or “unexampled hazards” [39]. For Malakis and Kontogiannis [50], it was about effectively dealing with disturbances, disruptions and changes “beyond the textbook envelope”. Resilient organisations assumed that their models of safety management were incomplete, so they proactively tested their assumptions and approaches, in the belief that they would be able to perfect them over time by learning from events and near-events [43]. This created space for imagination, innovation and creativity, all of which provided opportunities for learning and bouncing forward.

As with OR, there is no single accepted definition of RE; ten example definitions are summarised in Table 2.

Three of the definitions suggest that RE is a new paradigm of safety management, which focuses specifically on how people cope with complexity under pressure [52] [54] [60]. A paradigm “is an overarching conceptual construct, a particular way in which scientists make sense of the world or segment of the world” [61]. As a construct, it is not something that can be seen, felt, heard, tasted or smelt.

A second definition suggests that RE involves adaptation to disturbances, changes and major mishaps [49], or to those stressors that impact an organisation's day-to-day operations. In this regard, RE involves some level of change, which can occur at any or all of a number of levels, including processes, practices or structures. Some aspects of the process, including behaviour and cognition, are captured in the seventh definition:

“(RE) looks for ways to enhance the ability of organizations to monitor and revise risk models…create processes that are robust yet flexible, and… use resources proactively in the face of disruptions or ongoing production and economic pressures” [11].

In conventional risk management, as described by Dekker, et al. [14], safety management aims to keep adverse outcomes as low as reasonably possible, and is therefore reactive. This is because it is based on responding to something that has either gone wrong (failed) or has been identified as a risk, and so is expected to go wrong (or fail) in the future. The alternative approach suggested by RE is to focus on success, i.e. what goes right [52], with the aim of ensuring that as many intended outcomes as possible are achieved. This shift in focus towards success also means that safety and safety management efforts are aimed more at understanding why and how things go right in the first place. This is a more proactive approach, because organisations can make adjustments well before something happens and be better prepared to respond to hazards, threats and/or challenges by moving from a state of normal operations to a state of high alert. In this alternative approach, safety is shown only by the events that do not happen [52], and hence represents a positive outcome.

The fifth definition suggests that RE is an approach for minimising damage to systems arising from uncertain events; this is similar to the seventh, which suggests it is a proactive approach to risk management through anticipating future risks and maintaining safety. Risk management itself involves a process [63]. In safety management, this includes identification of hazards, assessment of risks, and management and controls based on a hierarchy [64]. Identification does include an aspect of anticipation, but is usually limited to known hazards, based on what has been experienced in the past. In RE, however, there is also an emphasis on unexampled and irregular hazards [39] [45].
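The conventional process described above (identify, assess, control) can be sketched as a small hazard register. This is a hypothetical illustration: the 1 to 5 likelihood and severity scales, the multiplicative score and the example hazards are assumptions, and the ordering of controls follows the commonly cited hierarchy of controls (elimination, substitution, engineering controls, administrative controls, personal protective equipment) rather than any hierarchy specified in [64]. The last call shows the limitation noted in the text: a hazard that has never been identified has no entry, and so receives no assessment at all.

```python
# Commonly cited hierarchy of controls, most to least effective.
# Assumed here; the text only says controls are "based on a hierarchy".
HIERARCHY = ["elimination", "substitution", "engineering", "administrative", "ppe"]


def risk_score(likelihood: int, severity: int) -> int:
    """Assess risk as likelihood x severity on assumed 1..5 scales."""
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("likelihood and severity must be in 1..5")
    return likelihood * severity


def select_control(feasible: set[str]) -> str:
    """Return the highest-ranked control in the hierarchy that is feasible."""
    for control in HIERARCHY:
        if control in feasible:
            return control
    raise ValueError("no feasible control")


# Hypothetical register: it can only contain hazards that have already been
# identified, i.e. those known from past experience.
register = {
    "working at height": {"likelihood": 3, "severity": 5,
                          "feasible": {"engineering", "administrative", "ppe"}},
    "solvent exposure": {"likelihood": 4, "severity": 3,
                         "feasible": {"substitution", "ppe"}},
}


def assess(hazard: str):
    """Return (score, control) for a registered hazard, or None if unregistered."""
    entry = register.get(hazard)
    if entry is None:
        # An unexampled hazard has no entry, so the process says nothing about it.
        return None
    return risk_score(entry["likelihood"], entry["severity"]), select_control(entry["feasible"])


print(assess("working at height"))
print(assess("solvent exposure"))
print(assess("novel, never-experienced failure mode"))
```

The design deliberately returns `None` rather than a default score for unknown hazards, to make visible the gap that RE's emphasis on unexampled and irregular hazards is meant to address.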

The sixth definition suggests that RE is an ability of a system or organisation, and therefore equates it with OR.

The eighth definition suggests that RE is a developmental process that is associated with everyday operations, can be decomposed into cognitive and behavioural dimensions, and is present at all levels. This implies that RE is multidimensional and multifactorial, is associated with processes, and can be examined, explored, investigated, assessed and/or measured in normal, everyday work settings. This differs somewhat from the next definition, by Shirali, et al. [59], who suggested that the capacity lay mainly in dealing with unexpected events. The last definition captures three key capabilities (creating foresight, anticipating risks and coping with complexities), the conditions under which these become essential (under pressure) and a pathway (towards successful operations).

Towards a Unified Definition of RE

The above examination suggests that a clear formulation of RE is lacking, and that the field is still in the “midst of defining itself” [56] (p. 32). Sheridan [20] probably captured the concept best as “a family of ideas.” What is clear is that RE is a complex phenomenon, involves adjustments and/or adaptations, is associated with the proactive management of safety risks, and is a feature that extends beyond individuals. In other words, while an individual may exhibit capabilities or traits of resilience, this is not sufficient to drive RE. In this regard, RE represents a sophisticated way of managing organisational safety; the sophistication lies not in the technology, but in the way one thinks about safety, accidents and risks, and in how these can be better managed by using existing approaches and methods in more innovative ways [19]. The main shifts are from reactive to proactive approaches, towards a focus on successes (in addition to failures), and towards everyday operations (instead of only emergencies and/or crisis situations).

4.3. Theoretical Perspectives on RE

In seeking to address the relationship between knowledge, theory and research, Burrell and Morgan [65] claimed that knowledge was paradigmatic, encompassed a distinct worldview and governed the choice of research strategies and methods. The authors suggested that social and organisational research could be located within four paradigms: functionalism, interpretivism, radical structuralism and radical humanism. Mendonça [56] suggested that positivist and interpretive perspectives were two of the most commonly used frameworks in RE studies. Positivism holds that social phenomena can be scientifically investigated by decomposing them into objective constructs, and that any links and associations between these can be discovered using the methods of the natural sciences [61]; it is therefore similar to the functionalist perspective [65]. Interpretivism, on the other hand, holds that there is no objective truth about the social world waiting to be discovered, but that any linkages or connections that may exist can be constructed by examining and/or exploring different aspects of that social world. The next section reviews articles published from these two perspectives.

4.3.1. RE from a Functionalist Perspective

Thirteen papers were published from a functionalist perspective; these are summarised in Table 3. Seven papers were theoretical in nature, while six included empirical studies. The theoretical papers included those which proposed selected factors and/or indicators [11] [41] [66] [67] [68] [69] that could be used to assess and measure RE in industries such as healthcare [66] [67], chemical manufacturing [68] [69] [70] and mining [11]. Empirical papers included research published from industries such as aviation [71] [72] and chemical plants [59] [73]. The main methods used for collecting and analysing data in these studies were safety audits [54] [71], surveys [59] [72] [73] and lab experiments [74].

While these studies continue to add to the body of knowledge on RE, they have their limitations. In part, these stem from the assumptions inherent in the functionalist perspective, which assumes that society has a concrete existence and follows a certain order, and that an objective and value-free social science can produce true explanatory and predictive knowledge [61]. According to this perspective, RE is expected to be a concrete, objective “thing” which can be objectively measured using scientific methods, through “formal propositions, quantifiable measures of variable, hypothesis testing, and drawing inferences from a representative sample” [56]. However, as discussed previously, there is no uniform definition of RE, which is instead linked to a range of abstract social phenomena normally associated with organisations.

Many of the indicators used in some of these studies appear to be inconsistent with the basic tenets of RE, for example in assuming that resilience can be measured and/or quantified through tabulations of errors, behavioural and/or other factors [52]. It has been suggested that resilience is a dynamic and emergent ability which cannot be measured, at least not until after an impact [45].

There are also limitations with the two main methods used in the studies above: safety audits and surveys.

Safety audits are widely used in safety practice, and have been used in three studies. A common shortfall of audits is that they limit the inquiry to those elements identified in the audit tool [54] [73]. This is one reason their effectiveness has been criticised by authors such as Hopkins. In his analysis of the Esso Longford gas plant explosion, he argued that auditing provided only good news and failed to identify problems that later acted as precursors to the disaster [78]. Learning from failures is an important aspect of RE, in addition to learning from success, so audits may not be adequate for identifying the deep, latent sources of failure that are common to most incidents.

Surveys collect data through a series of standardised questions based on a number of psychometric indicators. Many of the aspects included in these studies have also been associated with safety culture, which itself has been suggested to be ill-defined and largely misunderstood [79]. As that author argued, “aggregated numbers, like frequencies or means, do not offer much insight into an organisational safety culture, much less understanding of it” [79]. Measuring RE, a similarly abstract concept, in the same way could be expected to generate a relatively superficial description of it. What these approaches cannot capture are the dynamic work practices and social interactions that occur every day within teams and groups, between managers and workers, and between those conducting the work and the work itself, because these cannot be described in the words of a survey question [80]. Recognising and understanding these interactions is clearly important, as they contribute to the ‘success’ aspects of safety and resilience [81].
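The point that aggregated numbers conceal more than they reveal can be illustrated with two hypothetical teams answering the same five-point survey item. The responses are invented for illustration, and the sketch uses only Python's standard `statistics` and `collections` modules.

```python
from collections import Counter
from statistics import mean, pstdev


def summarise(responses):
    """Return the mean, population standard deviation and response counts."""
    return mean(responses), pstdev(responses), dict(sorted(Counter(responses).items()))


# Hypothetical responses to one item (1 = strongly disagree ... 5 = strongly agree).
team_a = [3, 3, 3, 3, 3, 3, 3, 3]   # everyone neutral
team_b = [1, 1, 1, 1, 5, 5, 5, 5]   # polarised: half strongly disagree, half strongly agree

for name, responses in (("Team A", team_a), ("Team B", team_b)):
    avg, spread, counts = summarise(responses)
    print(f"{name}: mean={avg}, sd={spread:.1f}, counts={counts}")
```

Both teams return a mean of 3, yet one is uniformly neutral and the other is split between strong agreement and strong disagreement; a report that publishes only the mean would treat them as identical, and even the full distribution says nothing about why the team is divided.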

The quantitative results of surveys and audits are generally relied upon as measures of safety performance in many organisations. Organisations with high scores are generally treated as better performers, so the scientific merit of the data produced by surveys and audits has an important bearing on organisational decision-making. However, a systematic review of the content validity of audits suggests that there has been little research in this area [82]. Moreover, inter-auditor reliability was poorer than expected in integrated ISO 9001/14001 assessment instruments [83]. Published empirical research on the validity and reliability of survey instruments for measuring RE is also limited. This is a significant gap in the literature, and there is a clear need to develop, test and validate these instruments further.
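Inter-auditor reliability is commonly quantified with an agreement statistic. As a sketch, assume two auditors independently rate the same ten audit items as conforming ("C") or non-conforming ("N"). Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), corrects the observed agreement p_o for the agreement p_e expected by chance. The ratings are hypothetical, and the choice of kappa is an assumption: the study cited above [83] may have used a different measure.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    # Observed agreement: proportion of items rated identically.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_e == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten audit items by two auditors.
auditor_1 = ["C", "C", "N", "C", "N", "C", "C", "N", "C", "C"]
auditor_2 = ["C", "N", "N", "C", "C", "C", "C", "N", "C", "N"]
print(round(cohens_kappa(auditor_1, auditor_2), 2))
```

Here the auditors agree on 7 of the 10 items (70%), but once chance agreement (54%) is removed kappa is only about 0.35, which is why raw percentage agreement can overstate how reliable an audit instrument is.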

Both audits and surveys provide a snapshot of selected aspects of RE using a selected set of indicators and factors. This can be useful for diagnosing an organisation's potential for RE. However, for these methods to be more useful in safety practice, a clear understanding is needed of the key interactions among these factors, of the influence of each factor, and of whether they hinder or enhance RE for safety management.

4.3.2. RE from an Interpretive Perspective

In contrast to functionalism, the interpretive perspective seeks to understand the world as it exists [65]. This perspective views the social world as an emergent process, and so is consistent with RE, which holds that safety is an emergent property of a socio-technical system [14]. Thirty-three papers published from this perspective are summarised in Table 4. Again, these included a combination of conceptual and empirical works. Conceptual papers generally discussed theoretical backgrounds and developments in RE [16] [17] [57] [84], key factors and/or indicators [40] [58] [76] [85]-[90], and how these could be integrated into approaches such as the prescription and practice of work [75] [85], safety culture [17] [58] [91], safety management [20] [40] [76] [86] and the prevention of major disasters [92].

An analysis of the literature suggests that RE has been examined along a number of dimensions, the most common of which are cultural, behavioural and cognitive [19].

4.4.1. Cultural Dimensions

The first theoretical connection between RE and culture was proposed by Carthey, et al. [66], who suggested that commitment, competence and cognisance were among the main cultural drivers and proactive measures of OR.

Flin [76] associated managerial resilience with safety culture, arguing that this could be demonstrated through management commitment, reflected in “a belief that when safety and production goals conflict, managers will ensure that safety will predominate” (p. 229). Wreathall [41], drawing on his previous research on leading indicators of organisational health in aviation, suggested seven key indicators of RE, including:

1) Top-level commitment

2) Awareness

3) Learning culture

4) Just culture

5) Flexibility

6) Preparedness

The author argued for a “need to tie this approach to the concepts of resilience” [41] (p. 280). Three of these indicators (learning culture, just culture and flexibility) are also integral to safety culture [80], while some are also associated with managerial resilience [76].

Woods [17] argued that RE essentially refocused organisations that already had a culture of safety to become more proactive and adaptive, by challenging how data such as incidents were analysed and interpreted. Researchers such as Akselsson, et al. [71], Han, et al. [58] and Macchi, et al. [91] have also demonstrated how key principles of safety culture can be examined through RE in contexts such as construction and healthcare. Pillay, et al. [11] have also proposed that these factors can be used to inform the reality gradient of safety as a quantitative indicator of RE.

The central tenet of these papers is that safety culture and RE are linked. In essence, they posit that a culture of safety is necessary for developing RE [11] [66] [71], and so provide some support for the process view of RE. If this premise is accepted, it also lends some support to Höpfl's assertion that safety culture is an interpretive device [107], in this case one that could be used for interpreting RE.

Many of the papers discussed above, however, are conceptual and have not been empirically tested, and very few studies have sought to connect the two empirically. The link between safety culture and RE therefore still warrants further attention, in both research and practice. More empirical studies are needed to develop, understand and/or validate any association between safety culture and RE. If safety culture is an “interpretive device” [107], it could serve as a tool for examining, exploring and/or measuring RE. And because safety culture itself includes artefacts, basic assumptions, attitudes, beliefs, espoused values, practices and perceptions [108] [109], these need to be taken into account when examining the links between safety culture and RE.

4.4.2. Cognitive Dimensions

Vogus and Sutcliffe [43] appear to have been among the first to link resilience and cognition, suggesting that cognition was one of the resources organisations deployed when responding to current and emerging threats.

Back, et al. [74] also argued that the cognitive strategies people used to support resilient performance could explain their behavioural abilities to recognise, adapt to and absorb changes, disruptions and surprises. The authors linked cognitive resilience with decision-making in daily operations, and used this to demonstrate how individuals detected, recovered from and mitigated computer failures in a series of lab experiments. They argued that their research revealed a number of challenges in seeking to develop RE at the behavioural level, in terms of focusing on proactive safety management, and that these needed to be taken into account when new approaches and models were implemented in industry.

Patterson, et al. [42] made a similar observation: cognitive resilience enhanced sense-making during collaborative cross-checking, which was useful in reducing human error.

Back, et al. [88] posited that cognitive resilience could be decomposed into five levels of granularity: i) individual, ii) small team, iii) operational, iv) plant and v) industry. Other researchers, such as Bracco, et al. [44], suggested a three-level decomposition into skill (S), rule (R) and knowledge (K) levels. These authors argued that cognitive resilience helped to balance mindfulness and encouraged the adaptation necessary to maintain equilibrium in a system.

Malakis and Kontogiannis [50] concluded that cognitive resilience in air traffic control operations could be investigated by understanding the failure-sensitive strategies used by operators during safety-critical events.

Most of the studies above imply that cognition is intrinsically linked to the way people behave and/or act in specific contexts or simulated scenarios. Behavioural dimensions of RE are discussed in the next section.

4.4.3. Behavioural Dimensions

The idea that resilience is a behavioural characteristic originates with authors such as Vogus and Sutcliffe [43], who linked it with positive adjustment. As discussed previously, resilient behaviours have been suggested to be decomposable into individual, small team, operational, plant and industry levels [88]. The authors used a case study in a nuclear power plant to illustrate these ideas, and concluded that their framework was useful in identifying, exploring and describing resilient behaviours in a range of small work contexts. In essence, these levels represent the multiple levels of the socio-technical arrangement of organisational systems that need to be considered when addressing organisational risks [111]. Previous studies have also suggested that rehearsal (reflection-on-action) and the creation of personalised cues (reflection-in-action) are examples of cognitive strategies that can be used to support resilient behaviour [74]. These studies suggest that an examination of micro-level operations and activities can be used to explore and/or examine the behavioural dimensions of RE in normal, everyday work.

Outside the RE literature, Mallak [36] was one of the first to develop and test an instrument to assess behavioural resilience in the healthcare industry. This instrument was based on six factors of OR:

1) Goal-directed solution seeking

2) Risk avoidance

3) Critical understanding

4) Role dependence

5) Source reliance, and

6) Resource access.

These factors, however, have not informed any of the previously-discussed research in RE.

4.4.4. Granularity of RE

This aspect of RE is concerned with the way RE either impacts, or is expressed at, different levels of a system. Flin [76] suggested that resilience could be explored at the managerial level, while McDonald [38] decomposed it into three layers: operational, organisational and industrial. Woods [40] referred to cross-scale interactions of upward and downward resilience, suggesting that downward resilience included clear goals, structures and procedures, while upward resilience included decisions made at the micro-level, which were influenced by local adaptations. Woods's conceptualisation alludes to the ways in which workers at the sharp end of risk maintain a balance between the two conflicting goals of safety and production. Back, et al. [88] examined resilient behaviours at the “small team level” through an experimental case study in a nuclear power plant.

4.5. Mechanisms to Enhance RE for Safety Management

As noted earlier, resilient capabilities have been suggested to play a role in facilitating adaptation [53] [57], which is crucial in the RE process. Four of these, suggested to represent the basic foundations of RE, have been identified: anticipating, responding, learning and monitoring.

1) Anticipation is the ability to address the potential, and is characterized by knowing what to expect (in terms of threats and opportunities) in the future (potential changes, disruptions, pressures) and the consequences of these,

2) Responding is the ability to address the actual, and is characterized by knowing what to do (when faced with regular, irregular or unexampled threats), either through a prepared set of responses or by adjusting normal functioning,

3) Learning is the ability to address the factual, and is characterized by learning the right lessons from the right experience (both success and failure), and

4) Monitoring is the ability to address the critical, and is characterized by knowing what to look for, both in the environment and in the system [49] [111].

A number of empirical works have been published on some of these abilities, including anticipating [60] [89] [103] [104], responding [42] [60] [89] [97] [103] [105] [106], learning [60] [73] [97] [98] [105], and monitoring [60] [93] [105]. These studies have focused on, for example, challenges in adopting RE and building capacity [60], the management of micro-incidents [89], factors that can contribute positively or negatively to RE [42] [73] [97] [98] [103] [105], and the impact of non-technical training on RE [106].

4.6. Work-as-Imagined and Work-as-Performed

One important argument proposed in the RE literature is that work-as-imagined (WAI) and work-as-performed (WAP) will always differ [14]. This is because assumptions about how work is to be accomplished generally differ from the actual conduct of work. One reason is that there will always be holes in the design of work systems and processes: formal descriptions of work embodied in policies, regulations, procedures and automation are incomplete as models of expertise and success, and workers adapt them to suit the context of their actual work, whether to achieve safety or efficiency [81]. It is therefore inevitable that organisational practices change as part of daily work. Over time, these adaptations create gaps between WAI and WAP, and such gaps are an important marker of RE [10] [11]. Analyses of these gaps have revealed how workers create and sustain failure-sensitive strategies and balance their work to achieve incompatible goals (production and safety), and how organisational learning can be used to drive adaptations that create safety [52].

Abech, et al. [93] investigated how operators in an oil distribution company adapted when events challenged its model of how it should operate, focusing on how closely operators depended on written plans and procedures to deal with regular, irregular and unexampled threats. The research revealed gaps in communication between key stakeholders in the system, which contributed to inadequate risk control measures (such as risky tank-filling work practices and insufficient engineering controls for overflows). One shortfall of this study was that interviews were limited to operators, so the data collected were at the level of WAP. Extending the interviews to managers and supervisors would have provided data on WAI for the system as well, and a comparative analysis of the two could reveal gaps in the system and/or how these are narrowed in practice.

Antonsen, et al. [98] explored conditions that helped workers balance WAI and WAP in the Norwegian offshore supply base. Using a self-administered questionnaire and semi-structured interviews, the researchers focused on the implementation of, and compliance with, procedures. They found that simple and accessible procedures were more likely to be used, and that broad and direct participation of workers in implementing procedures was important in gaining commitment and compliance, and also acted as a catalyst for organisational learning. The authors acknowledged, however, that eliminating the gap completely was impossible because there was an indefinite number of local, situational variations in the work context they had investigated.

Da Mata, et al. [97] examined constraints in a Brazilian helicopter transportation system, the mental models of pilots, and the factors that played a role in their decision-making when coping with unexampled threats, multiple pressures and goal conflicts. The researchers interviewed co-pilots, operators, captains, a psychologist and a human resource analyst over six months. One shortfall of this study was its reliance on interview data alone, which provides “indirect information filtered through the views of interviewees” [114]. This could be addressed through data triangulation, by extending data collection to documents and/or observations of selected activities [115].

Borys [112] investigated the impact of risk awareness programs on workers' awareness of risks, on risk control practices, and on safety culture. The researcher used an ethnographic approach and triangulated data from documents, key participant observations and semi-structured interviews. The research revealed gaps between paperwork and practice that created an illusion of safety. In a later study, he investigated how managers and workers interpreted and used safe work method statements (SWMS) to explore whether there was a gap between WAI and WAP [113]. The study revealed that SWMS were important for safety, but informants felt they should be reserved for tasks that were out of the ordinary, and that a combination of formal and informal social interactions, as well as SWMS, was important for safety. The ethnographic approach used in these studies is a well-established research strategy for investigating culture. However, a drawback of ethnography is that it can be very time consuming [80]. Another shortfall of the later study was that it was limited to one organisation and concentrated on actors internal to that organisation; no contractors or subcontractors were interviewed. Much of the work in construction is done by subcontractors [116], who also bear the burden of responsibility for safety [117]. Extending data collection to this group would have provided greater insight into aspects of both WAI and WAP from these stakeholders.

Costella, et al. [54] conducted a safety management system audit of a Brazilian automobile exhaust systems manufacturer to examine how well four variables (top management commitment, awareness, learning and flexibility) were embedded in the organisation. The research identified a lack of management commitment to safety, but also some flexibility: decisions to cease operation of unsafe equipment were delegated to safety specialists, and the production manager was empowered to relocate workers to other tasks, away from machines that had been involved in an accident or that operators had reported as dangerous or faulty. The authors suggested that ambiguities about who was responsible for deciding to shut down dangerous plant signified that production management did not know how to trade off safety against production. They also found that differences between organisational-level policies and site practices were not monitored, a further symptom of the organisation's failure to learn.

Huber, et al. [73] also audited a European chemical manufacturing company to investigate organisational learning. The researchers found that daily trade-offs between production and safety made it difficult to place safety goals ahead of production, and that small accidents and incidents had become normalised and were neither reported nor investigated. In addition, there were more than 40 corporate-level procedures and an additional 30-40 local ones, which were used as guidelines or suggestions. Many of the procedures had not been updated, creating gaps between written guidance and actual practice.

The studies on WAI and WAP reviewed above point towards a number of practical strategies that can be useful in researching RE in general industry. The most important of these is that investigations into RE do not require sophisticated methods, instruments or tools. This is consistent with Hollnagel [111], who posited that using existing methods and techniques, but from another perspective, was a useful way of operationalizing RE. The studies reviewed above focused on:

1) work procedures, plans, safe work method statements, and risk control practices; many of which are part of an organisation’s safe systems of work;

2) safety management systems, which are used in many medium-sized and large organisations, irrespective of the industry they are in; and

3) normal, day-to-day operations and activities.

5. Limitations and Summary of Findings

5.1. Limitations

A number of limitations can be identified in this review.

The first is the choice of databases. The databases searched were predominantly those that capture studies from the social sciences. This is likely to have excluded articles indexed in engineering databases (such as Compendex or Inspec), Web of Science, Scopus or MEDLINE, including those published from a systems engineering perspective. This needs to be considered by future reviewers.

The second is the choice of keywords used in the search criteria. A deliberate choice was made not to include safety management in the keywords, as this would have generated thousands of articles. Future reviews should use a combination of keywords that ties together the key ideas from OR, RE and safety management.

The third is that most of the articles retrieved had been presented or published in conference proceedings, and later published as book chapters or journal articles, between 1998 and 2012. This is because the research was conducted when RE ideas were in the early stages of inception. In this regard, the author acknowledges that since this work was completed, two further reviews have been published on the topic [118] [119]. These also need to be taken into account in future reviews.

The fourth limitation is that the articles selected for final review were not appraised for quality as required by PRISMA. Again, this needs to be considered in future reviews. While no appraisal tools specific to safety management exist, the critical appraisal skills guidelines [120] used in recent reviews on organisational learning for safety [121] can provide a way forward.

The fifth limitation concerns the way in which the author chose to unpack the themes according to dimensions (Section 4.4). In the traditional sense, the notion of engineering is explicitly linked to something technical. However, this review did not investigate the technical dimension of RE. This needs to be considered in future reviews.

5.2. Summary of Findings

Despite these limitations this review has generated a number of important findings.

First, there is no universal definition of RE [11] [19] [20] [85], possibly because “it exists more as a conceptual framework than a tight knit knot” [101] (p. 1590). Most researchers suggest it is closely associated with OR, while some have even treated the two as the same thing. A closer examination, however, suggests there is a fundamental difference between them. OR is an ability, capability or characteristic [36] [37] [39] [42] [52] that entails a complex system of humans and their relations, and the ability to deal with technical aspects. Although the capabilities of OR may be relatively easy to recognise, moving from OR to RE requires a deeper level of clarity about what RE is and how it can be developed, maintained and enhanced. RE can be regarded as an approach [14] [52] [54] [57] [58], a process [11] [89], or a perspective [14]. In this regard, it involves a complex set of objects and organisations. For this reason, the author believes the two terms should not be used interchangeably, but as complementary ideas that support each other's development, implementation, maintenance and enhancement.

Second (and related to the first), there is no uniform way of assessing, examining, exploring, or measuring RE [19]. Most empirical works on RE in this sample were qualitative studies and are therefore largely descriptive [20] [56], although a number of quantitative studies have also been published. These studies used audits [54] [71] [73] and site observations [60] [89] [93] [96] [97] [98] [105], while surveys [59] and laboratory-based experiments [74] have also been used. These investigations, however, provide some clues about the potential for RE, not RE per se [56] [81].

Third, RE is multifactorial. While the published literature made reference to many factors, three have been suggested to be important for RE. These include culture [11] [17] [41] [58] [71] [76] [91] , cognition [42] [43] [44] [74] [88] [89] , and behaviours [16] [43] [88] .

Fourth, it is multi-dimensional, so it can be assessed, examined, measured or observed at different levels of granularity. Although individuals may exhibit many of the abilities that drive resilience, RE is best associated with collective units (groups/teams, operations, plant, industry) [88] [122] and the organisation level [38] [88].

Fifth, it is associated with anticipation, responding, learning and monitoring [49] [111] . These capabilities are central in driving the RE process, and all four capabilities are necessary [111] .

Sixth, it is linked with adaptation and/or adjustments [17] [36] [38] [42] [43] [53] [57], coping with threats [52] [54] [59] [81] and making trade-offs in favour of safety [14] [40]. Whether these are outcomes of RE or capabilities is not clear.

And seventh, the gap between WAI and WAP is an important facet of RE [41] [81] [85] . Understanding how this gap is narrowed in actual practice provides important clues regarding how RE can enhance safety management.

6. Gaps and Opportunities for Further Research

6.1. Literature Gaps

This review has identified eight main gaps in the published literature on RE.

First, while current research on RE has largely been carried out in a range of complex workplaces (such as healthcare, oil and gas, and nuclear power plants), there is a paucity of empirical studies from contemporary high-risk domains such as construction or mining, although propositions for such research have been made [11] [57] [123].

Second, while some factors associated with safety culture have also been suggested to be important for RE, the mechanisms and linkages between these are not clear. More empirical studies are necessary to develop, understand and/or validate any association between safety culture and RE [41] .

Third, while behaviour and cognition have been suggested as important dimensions of RE, the mechanisms and linkages between these, or with safety culture, are not clear. More empirical studies are also necessary to develop, understand and/or validate any association between behaviour, cognition and RE.

Fourth, while learning from success has been suggested to be important in RE, very few empirical studies have been published on this aspect. Future empirical RE studies need to take into account what has actually worked, i.e. successes, in order to gain a deeper understanding of how things actually happen. Conceptualising and operationalising success in RE needs to take into account the four main principles that have been suggested for RE, namely anticipation, response, learning and monitoring [49] [111].

Fifth, most research appears to have been limited to examining single units or levels of an organisation. The achievement of safety, at least at the level where work is done at the sharp end of risk, is likely to be influenced by other levels, such as managers, supervisors, associations and government [110]. It is therefore imperative that RE research takes into account the multiple levels of a system, the key interactions that occur across these levels, and how safety emerges out of such interactions.

Sixth, most of the empirical works reviewed in this article are qualitative studies, consistent with [56]. Missing from the literature are empirical quantitative studies investigating the utility of RE as a safety management strategy [20]. Again, it is imperative that future studies take into account how indicator factors can be translated into useful variables and measures.

Seventh, while the published papers provide a rich source of information on concepts, ideas and notions associated with RE, many of them have failed to build on each other's work, so there is very little in the way of a shared analytical framework [102]. Developing such a conceptual and theoretical framework would help set a boundary around research or investigation into RE [19].

However, before such a conceptual and theoretical framework can be developed and used, it is imperative to identify indicator(s) that can be useful in advancing research and application in RE. Although the gap between WAI and WAP has been suggested to be important, a framework that integrates the key principles, concepts and ideas of RE with WAI/WAP is missing. This represents the eighth gap identified in this review.

6.2. Opportunities for Further Research in RE

The research gaps identified above provide avenues for furthering research on RE for safety management. However, advancing such research requires framing a working definition in order to set some boundary and focus for RE. Based on the key concepts and ideas illustrated in this review, the following is proposed:

“Resilience engineering is a sophisticated approach for managing organisational safety through the development of cognitive, behavioural, and cultural abilities to enable organisational members at all levels to actively anticipate, respond, monitor and learn to operate close to the boundary of safe operations as part of normal work, by narrowing the gap between work as imagined and work as performed” [19] .

In proposing the above definition, it is not being suggested that it is in any way superior to definitions proposed by other authors, consistent with Westrum [39].

Framing RE in this way makes a number of things clear.

One, resilience engineering is about organisational safety, not individual safety. Two, it incorporates cognitive, behavioural and cultural aspects of an organisation, so research on RE can be directed at any or all of these aspects. Three, although an individual can have all these attributes, it is only when they are collectively distributed across all levels of the organisation that they play a role in RE. Four, these collective aspects enable the organisation to anticipate, respond, monitor and learn. Five, resilience engineering is about operating as close as possible to the boundaries of failure as part of normal work. This means research on RE should entail normal, everyday work, not simulations. And six, the gap between WAI and WAP is an important facet of RE.

Many industries utilise safe systems of work such as safety procedures, safety rules and permits [124]-[129]. Construction organisations also use safe work method statements and work health and safety management plans [113] [123] [130]. Any of these can be used to investigate the gap between WAI and WAP. Practical research questions in this area can centre on whether, or the extent to which, any of the above enhances or hinders RE as a safety management strategy. Safety culture is a common topic in safety management, and a practical research question could investigate, for example, the influence of safety culture factors (management commitment, awareness, learning, being just, flexibility and preparedness) on the four key principles of RE.

Behavioural approaches are also commonly used for managing safety in the industry, and a practical research question which could be asked here is whether emergent approaches such as psychological contracts of safety [131] [132] enhance or hinder behavioural aspects of RE.

In terms of learning from success, many organisations capture, analyse and manage data on near-misses and dangerous occurrences. Practical research in this area could focus, for example, on how, or the extent to which, these are managed, and the role that anticipation, awareness, learning and monitoring play in their effective management.

Advancing research in this area also requires a suitable conceptual framework. The findings of this review suggest that such a framework should be aimed at understanding:

1) multiple levels of the system being investigated,

2) the key interactions that occur across these levels, and

3) how safety emerges out of such interactions.

To address gaps in methodology, future research in RE also needs to include quantitative studies. A practical starting point could be the behavioural, cultural and cognitive dimensions of RE, which can be investigated at team, organisational or industry levels. Data collected from such research can be analysed at these levels of granularity, or used to examine their influence on the key principles of anticipation, response, learning and monitoring. Such research also needs to enable an exploration and/or measurement of the gap between WAI and WAP, and ideally integrate the multidimensional nature and construct of RE. Such a framework can be based on either the functionalist or the interpretive perspective of safety management.

Acknowledgements

M.P. thanks the Australian Federal Government for an Australian Postgraduate Award (2010-2013) and the University of Ballarat for a Higher Degree by Research Scholarship (2010-2013). Feedback and comments from anonymous reviewers and participants of the 6th Applied Human Factors and Ergonomics Conference are also appreciated. M.P. is also grateful for the comments provided by the anonymous reviewer, whose insights helped to enhance the findings of this review.

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

Pillay, M. (2017) Resilience Engineering: An Integrative Review of Fundamental Concepts and Directions for Future Research in Safety Management. Open Journal of Safety Science and Technology, 7, 129-160. doi: 10.4236/ojsst.2017.74012.

International Labour Organization (2014) Safety and Health at Work: A Vision for Sustainable Prevention. XX World Congress on Safety and Health at Work 2014, International Labour Organization, Frankfurt.

Pillay, M. (2013) Exploring Resilience Engineering through the Prescription and Practice of Safe Work Method Statements in the Victorian Construction Industry. Ph.D. Thesis, School of Health Sciences, University of Ballarat, Ballarat.

Yorks, L. (2008) Editorial: What We Know, What We Don’t Know, What We Need to Know: Integrative Literature Reviews Are Research. Human Resource Development Review, 7, 139-141. https://doi.org/10.1177/1534484308316395

Pillay, M. and Jefferies, M.C. (2015) A Revised Framework for Managing Construction Health and Safety Risks Based on ISO 31000. CIB W099 International Health and Safety Conference, Benefitting Workers & Society Through Safe(r) Construction, Belfast, 467-477. International Council for Research and Innovation in Building and Construction.