Evolving Concepts

Tag Archives: sociotechnical system

In Part 1, I argued that dependability as a concept might be applied to organisations as well as to technical systems. In this post I will argue that both the organisational and technical levels should be modelled together as an interconnected system, and that test environments for dependability should include the simulation of organisational problems as well as technical problems.

Socio-technical stack
Higher level organisational requirements cannot be considered in isolation from the underlying IT requirements. Organisational and IT system problems can interact in complex ways and such problems are common in real-world organisations. Therefore, these different levels need to be considered together. Such a multi-level system can be viewed as a socio-technical stack [Baxter & Sommerville 2011].

Specific IT requirements for the organisation (resources, networks etc.)

IT dependability requirements (availability, security etc.)

Dependability requirements (2 and 4) may be more generic than 1 and 3. For example, all organisations will want to reduce error, but they may have different measures of what is acceptable. Requirements 3 and 4 can usually be satisfied by off-the-shelf components (but would need to be configured).

We assume that the software to satisfy the first set of requirements (1) has multiple users with different services. Such software is often called “enterprise application software”. In a health care system, users can be patients, clinicians or administrators. They access their own services in the system and they have specific actions available to them at particular stages in their workflow. For example, a patient could review their details or access records following a consultation. A clinician could request a test or respond to a symptom update from a patient.

Need for a test environment with simulation
To improve organisational resilience and dependability, it is important to develop new methods for detection and correction of organisational problems. To test these problem detection and recovery methods, it is useful to run simulated scenarios where human mistakes and IT failures can occur together. “Simulations” might involve people participating (as in a kind of role-playing game) or simulated computational agents [Macal 2016].

Examples of failure that might be simulated:

mistakes (e.g. choosing the wrong test)

administration failure: patient receives no response to a request (which should have a time limit).

software failure: e.g. data interoperability issues.

malware

hardware failure

A test environment needs to be coupled with the iterative development of the system being tested. This would involve the development of increasingly complex problem-detection software in parallel with increasingly challenging scenarios. For example, the first version might involve simple errors that are easy to detect. Subsequent stages might involve increasingly more detailed work scenarios with more complex errors or failures. The more advanced stages might also involve real users in different roles (e.g. nursing students, medical students) and include time pressure.

Importance of agile and participatory design
In addition to developing safe systems, changing them safely is also important. So the development and test methodology needs to include change management. Agile software engineering is particularly important here, along with participatory design (co-design) methods. Ideally the system would be co-designed iteratively by the different users as they become aware of error-prone situations (such as cognitive overload) while participating in the evaluations. Design needs be informed by cognitive science as well as computer science.

In later posts, I plan to talk about the role of AI and decision support in organisational dependability.

Recently I’ve been looking at collaborative decision-making in mental health, with the aim of identifying the technology requirements to support shared decision-making. Details of this project are here). One conclusion is that the underlying IT infrastructure needs to be considered, and in particular its reliability.

In general, a collaborative IT system can be understood as a distributed system with a particular purpose, where users with different roles collaborate to achieve a common goal. Examples include university research collaboration, public transport and e-government. In the example of health IT, a medical practice might have an IT system where a patient makes an appointment, medical records are inspected and updated, treatment decisions are made and recorded, and the patient may be referred to a specialist.

IT resilience and dependability
The resilience of an IT system is its capability to satisfy service requirements if some of its components fail or are changed. If parts of the system fail due to faults, design errors or cyber-attack, the system continues to deliver the required services. Similarly, if a software update is made, the system services should not be adversely affected. Resilience is an important aspect of dependability, which is defined precisely in terms of availability, reliability, safety, security and maintainability [Avizienis et al. 2004]. Importantly, dependability is not just about resilience, but also about trust and integrity.

IT dependability is usually understood on a technical level (the network or the software) and does not consider the design of the organisation (for example, if an error occurs due to lack of training).

Organisational resilience and dependability
Just as an IT system can be resilient on a technical level, an organisation (such as a health provider) can also be resilient and dependable in meeting high-level organisational requirements. Organisational requirements are defined in terms of an organisation, and are independent of IT. For example, they may be defined in terms of business processes or workflows. I think the idea of dependability requirements for an organisation is also useful and these may be specified separately. In healthcare, they might include the following:

transparency – e.g. is there an audit trail of critical decisions and actions?

accountability – e.g. is it possible to challenge decisions?

Technology can help to ensure that these dependability requirements are satisfied. For example, excessive workload may be detectable by automated monitoring (e.g. one person doing too many tasks) in the same way that technical faults or security violations can be detected.

In Part 2, I will discuss the need for a test and simulation environment.