Resilience for Connected Objects

Attacks happen, especially on IoT devices. While it is very hard to avoid attacks altogether, we can minimize their consequences.
The first factor to consider is the impact of an attack. There are many ways to analyze that impact, for instance from a financial or a technical point of view. In a very simple analysis, we can consider three main categories of impact:

The device performs its tasks normally, but the attacker also runs other software on it, for instance to make it participate in DDoS attacks.

The device apparently works normally, but its behavior is somehow altered, for instance by providing wrong information.

The device doesn’t work or obviously malfunctions, for instance a car that won’t start.

A second factor in analyzing the consequences of an attack is its duration, or the delay until recovery. Here, we can consider four situations:

Until next reboot. I am confident that the firmware hasn’t been modified, so a reboot will remove the attack vector from RAM and address the problem.

Until next hard reset. I am confident that the device will not boot on modified firmware, so a hard reset will restore a “good” copy of the last known correct firmware.

Until next update. I am not sure about the full firmware, but I know that a firmware update will remain possible and will address the issue.

Until replacement. The firmware update mechanism is corrupted as well (or the device is permanently damaged), so the only solution is to replace the device.
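
The four situations above can be read as a small decision procedure: pick the lightest recovery step that is still expected to end the attack. Here is a minimal Python sketch of that reading. The three yes/no inputs and all names are hypothetical and only serve the illustration; they stand for what the operator is confident about, not for anything a real device reports.

```python
from enum import Enum


class Recovery(Enum):
    REBOOT = "until next reboot"
    HARD_RESET = "until next hard reset"
    UPDATE = "until next update"
    REPLACEMENT = "until replacement"


def recovery_needed(firmware_intact: bool,
                    boot_rejects_modified_firmware: bool,
                    update_still_possible: bool) -> Recovery:
    """Return the lightest recovery step expected to end the attack.

    The three parameters are hypothetical facts the operator is confident of.
    """
    if firmware_intact:
        # The attack lives only in RAM, so a reboot clears it.
        return Recovery.REBOOT
    if boot_rejects_modified_firmware:
        # A hard reset restores the last known correct firmware.
        return Recovery.HARD_RESET
    if update_still_possible:
        # The update path is trusted even if the rest of the firmware is not.
        return Recovery.UPDATE
    # The update mechanism is corrupted too: nothing left to rely on.
    return Recovery.REPLACEMENT
```

For instance, recovery_needed(False, False, True) returns Recovery.UPDATE, the third situation in the list.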

We could go further, but let’s stop here for now. Impact seems to be the essential factor, but the differences between the three categories aren’t that great. An obvious malfunction is bad, but a device whose behavior has been altered by an attacker can be just as bad, and a device that is controlled by a hacker can easily start malfunctioning. In all three situations, it is obvious that some fix is required.

That’s where duration and recovery are important. The first two options (reboot and hard reset) are only temporary fixes. Even if the attack is transient and can be removed, the vulnerabilities that made the attack possible are still present, and the attack can be reproduced easily. Ideally, a reboot or reset/restore must be accompanied by some operational restrictions (like a “safe mode”) until an update is performed.

What we are analyzing here with recovery methods is resilience, or according to Random House, “the power or ability to return to the original form, after being [attacked]”. It is a valuable property for systems that are subjected to high levels of stress. Resilience is about recovery, as opposed to resistance, which is about protection.

Resilience and resistance are complementary. Resilience is often considered easier to achieve than resistance, but maybe more importantly, resilience can still work when resistance has failed. Even if an attack on a system has been successful, recovery is possible if the system is resilient to that attack.

An analogy with resilience against natural disasters is useful here. Although a hurricane can cause massive destruction in coastal areas (resistance is limited), most U.S. coastal areas are resilient against hurricanes because hurricane evacuation routes have been defined, and shelters are ready to host evacuated people. Such resilience measures rarely fail, and we are shocked when they do, as in New Orleans after Katrina.

For an embedded device, resilience is mostly about:

Secure boot, to restart from a clean slate upon reboot, and to detect a potentially persistent attack.

Firmware update, to load a new version that is not vulnerable to the attack.
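
To make the secure boot item more concrete, here is a minimal Python sketch of the boot-time decision: boot the active image only if it is trusted, otherwise fall back to a known good copy. It rests on loud assumptions. Trust is modeled as a SHA-256 digest that appears on an allow-list held in memory the attacker cannot write, whereas a real secure boot typically verifies a vendor signature with a key anchored in hardware. The function names and the two-slot layout are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Tuple


def sha256_hex(image: bytes) -> str:
    """Hex SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def is_trusted(image: bytes, trusted_digests: Iterable[str]) -> bool:
    """True if the image digest is on the (assumed immutable) allow-list."""
    digest = sha256_hex(image)
    return any(hmac.compare_digest(digest, t) for t in trusted_digests)


def select_boot_image(active: bytes, known_good: bytes,
                      trusted_digests: Iterable[str]) -> Tuple[str, bytes]:
    """Choose which image to boot; hypothetical two-slot layout."""
    trusted = list(trusted_digests)
    if is_trusted(active, trusted):
        # Normal case: the active firmware has not been modified.
        return ("active", active)
    if is_trusted(known_good, trusted):
        # Persistent attack detected: restore the last known good copy.
        return ("known_good", known_good)
    # Neither slot can be trusted, so refuse to boot.
    raise RuntimeError("no trusted firmware image available")
```

Booting from the "known_good" slot is also a signal that a persistent attack was attempted, which is a natural point to restrict operation until an update has been installed.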

More generally, resilience can be achieved through simple infrastructure means like roads and shelters. In the case of connected devices, resilience can be achieved through low-level features, within or below the operating system, and highly independent of the use case.

However, although resilience mechanisms are simple, they also need to be very robust. A hurricane shelter must be built on high ground with strong material like reinforced concrete that offers strong guarantees of resistance against a hurricane. Similarly, the secure boot and firmware update mechanisms of a connected device must be designed and implemented to resist attacks. In both cases, this resistance is easier to build because it is applied on a smaller scale, and it is mutualized:

Smaller scale. A shelter is a single building, typically one used for other purposes (like a stadium), that is relatively easy to reinforce. Similarly, a secure boot is a small mechanism, and so is the security-critical subset of a firmware update mechanism. These mechanisms offer a small attack surface, and are therefore easier to protect than an entire software stack.

Mutualized. A shelter is shared by all members of a community, who will use it together in case of emergency. Similarly, secure boot and firmware update are independent of the device’s role and application. The development and reinforcement efforts can therefore be mutualized, reducing the cost for every device.

Resilience must be proactive

Resilience is the capacity to recover from an attack, and as such, appears to be a highly reactive technology that comes into play after the fact. It is of course true that resilience mechanisms are triggered after the fact, but in order to succeed, they need to be proactively prepared.

Just like evacuation routes and shelters need to be planned in advance and designed to resist a hurricane, resilience measures in IoT devices need to be prepared beforehand. A firmware update mechanism is crucial, and it needs to be carefully prepared in advance. In particular, this mechanism must be designed to be highly resistant against attacks.
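
As an illustration of what “resistant against attacks” can mean for the update mechanism, here is a minimal Python sketch of the acceptance check a device might run before installing an update. Everything in it is an assumption made for the example: the function names, the 4-byte version prefix, and above all the use of an HMAC with a per-device key, which stands in for the public-key signature a real update scheme would more likely use. The “strictly newer” rule is also an assumption, added so that an attacker cannot reinstall an older, vulnerable version.

```python
import hashlib
import hmac


def sign_update(image: bytes, version: int, device_key: bytes) -> bytes:
    """Vendor side (hypothetical): authenticate the version and the image."""
    message = version.to_bytes(4, "big") + image
    return hmac.new(device_key, message, hashlib.sha256).digest()


def accept_update(image: bytes, version: int, tag: bytes,
                  device_key: bytes, installed_version: int) -> bool:
    """Device side: accept only authentic, strictly newer firmware."""
    expected = sign_update(image, version, device_key)
    if not hmac.compare_digest(expected, tag):
        # The image or its version number was tampered with.
        return False
    # Assumed anti-rollback rule: never go back to an older version.
    return version > installed_version
```

The check is small and independent of what the device actually does, which is what makes it realistic to prepare and harden ahead of time.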

In the world of connected devices, we know that resistance to attacks will be difficult to achieve, because it requires a very large investment from vendors to fix all vulnerabilities and make their systems resistant to attacks. Resilience, on the other hand, can be achieved on a wide scale with a limited budget, and can provide a strong basis on which IoT security can gradually be built.