We were recently discussing a new IoT project with a client, and their most pressing questions and product requirements centered around OTA (over-the-air) device updates. Consequently, we thought it would be useful to share some best practices for supporting OTA updates, since they are an increasingly important factor in determining IoT system success.

What Is OTA and Why Is It Important?

You’ve probably heard about the numerous OTA-related horror stories in the past couple of years that have seen people locked outside rental properties by bricked smart locks, stuck watching a single channel on their smart TVs, and far more seriously, unwittingly driving cars that can be remotely controlled by hackers. We’re going to outline how we approach designing OTA update capabilities so that you can avoid building a system with a similar fate.

An over-the-air (OTA) update is a mechanism for remotely updating internet-connected hardware with new settings, software, and/or firmware. The OTA update mechanism is a core part of a system’s architecture, with the remote hardware device being responsible for identifying and applying updates to itself, and the cloud server responsible for distributing updates to its connected hardware clients.

At this point in the evolution of IoT, it is well established that a robust OTA update mechanism is an essential component for any successful IoT system design. It’s not feasible to update devices deployed in the field using the traditional manual method, which involves connecting each embedded device to a PC with a cable. The traditional method doesn’t scale beyond the development phase, and failure to update geographically dispersed devices means missing out on everything from critical security patches and bug fixes to the latest product features.

The Three OTA Architectures for IoT

There’s no one-size-fits-all approach to OTA. The right approach for a given IoT project depends on the nature of the hardware under consideration, the overall system architecture, the abilities of the team building the product, and the product itself. The most common OTA update scenarios are as follows:

Edge-to-cloud OTA updates: An internet-connected microcontroller is capable of receiving new firmware images from a remote server. These images can contain updates to both the microcontroller’s underlying hardware capabilities and the application running on top of them.

Gateway-to-cloud OTA updates: An internet-connected gateway, responsible for managing a fleet of local edge devices, is capable of receiving updates from a remote server that alter any or all of its software application, the software application’s host environment, and/or the gateway device’s firmware.

Edge-to-gateway-to-cloud OTA updates: An internet-connected gateway is responsible for managing a fleet of locally connected edge devices, which in turn are capable of receiving remote firmware updates via the gateway.

Going into the technical details of each OTA architecture is outside the scope of this overview. More important are the design considerations that you should be paying attention to, regardless of which OTA update approach applies to the IoT product or system that you’re building.

Important OTA Design Considerations for IoT Systems

As mentioned earlier, many IoT horror stories involve customers being left with bricked devices as a result of failed OTA updates. What these incidents have in common is not only that the new device image was buggy, a perpetual risk when software is involved, but also that the OTA mechanism was not implemented in a fail-safe manner, meaning that neither the user nor the manufacturer could easily roll back or overwrite the bad image.

Here are some key considerations to take into account when thinking about OTA update solutions for IoT systems:

Automatic recovery from corrupted or interrupted updates is a must. OTA updates should be atomic, either succeeding completely or failing gracefully in a recoverable manner. A failed update should be capable of rolling back to the previous stable version, and no update should be able to disable a device’s connection to the update server and thereby prevent further updates from being pushed.
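To make the idea of an atomic, recoverable update more concrete, here is a minimal sketch of one common pattern: keeping two firmware slots and only committing to the new one after it passes a health check. The `Device` class, the `apply_update` function, and the health check are hypothetical names invented for this illustration, and a real bootloader would do this at a much lower level.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Device:
    """Toy model of a device with two firmware slots (A/B layout)."""
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "v1.0", "B": ""})
    active: str = "A"

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"


def apply_update(device: Device, new_image: str,
                 health_check: Callable[[str], bool]) -> bool:
    """Write to the inactive slot, trial-boot it, and commit only if healthy."""
    target = device.inactive()
    previous = device.active

    device.slots[target] = new_image  # the running image is never overwritten
    device.active = target            # trial boot into the new image

    if health_check(new_image):       # e.g. "can I still reach the update server?"
        return True                   # commit: the new slot stays active

    device.active = previous          # roll back to the last known-good slot
    return False


# Assumption for the demo: an image is "healthy" unless its name says otherwise.
def demo_health_check(image: str) -> bool:
    return "broken" not in image


device = Device()
failed = apply_update(device, "v1.1-broken", demo_health_check)
print(failed, device.active, device.slots[device.active])  # prints: False A v1.0

succeeded = apply_update(device, "v1.1", demo_health_check)
print(succeeded, device.active, device.slots[device.active])  # prints: True B v1.1
```

The point of the design is that the previous image stays intact until the new one has proven it can run and, ideally, still reach the update server.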

Code provenance and integrity checks are essential. While a connected device’s ability to receive remote updates introduces many advantages, it also poses security concerns. Cryptographic code signing must be used to confirm that connected devices only accept code from verified authors, and that the code hasn’t been altered in transit.
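As a rough illustration of checking an image before accepting it, the sketch below verifies an authentication tag over the firmware bytes. Note the substitution: Python’s standard library has no asymmetric signature primitives, so this uses HMAC-SHA256 with a shared key as a stand-in. Real code signing would typically use an asymmetric scheme, with only the public key stored on the device. The key and function names are hypothetical.

```python
import hashlib
import hmac

# Assumption: a demo shared secret. With real code signing the device would
# hold only a public verification key, never the signing key.
DEMO_KEY = b"hypothetical-demo-key"


def sign_image(image: bytes, key: bytes) -> bytes:
    """Build-server side: compute a tag over the full firmware image."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(image: bytes, tag: bytes, key: bytes) -> bool:
    """Device side: recompute the tag and compare in constant time."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


firmware = b"\x00\x01firmware-bytes"
tag = sign_image(firmware, DEMO_KEY)

accepted = verify_image(firmware, tag, DEMO_KEY)
tampered = verify_image(firmware + b"\xff", tag, DEMO_KEY)
print(accepted, tampered)  # prints: True False
```

Whatever scheme is used, the check should run before the image is written anywhere it could be booted from.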

Code compatibility verification is advisable. If you’re supporting multiple MCU architectures in the field, you’ll be distributing multiple firmware images as part of your OTA update process. It is strongly advisable to first confirm that the image received by a given remote device is actually appropriate for the client’s MCU architecture before applying the update – e.g., check that you’re applying an image built for a TI CC3220 to an MCU of that type. An inadvertent mismatch of this nature can have consequences that are difficult, if not impossible, to recover from.
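One way to picture this check is a small manifest that travels with each image and is compared against what the device knows about itself. The sketch below assumes a hypothetical manifest layout; the field names and the hardware revision rule are made up for the example, and only the CC3220 name comes from the scenario above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageManifest:
    """Hypothetical metadata shipped alongside a firmware image."""
    target_mcu: str
    min_hw_revision: int
    version: str


def is_compatible(manifest: ImageManifest, device_mcu: str,
                  device_hw_revision: int) -> bool:
    """Refuse any image not built for this MCU type and board revision."""
    return (manifest.target_mcu == device_mcu
            and device_hw_revision >= manifest.min_hw_revision)


manifest = ImageManifest(target_mcu="CC3220", min_hw_revision=2, version="1.4.0")

print(is_compatible(manifest, "CC3220", 3))       # prints: True
print(is_compatible(manifest, "CC3220", 1))       # prints: False
print(is_compatible(manifest, "OTHER-MCU", 3))    # prints: False
```

In practice the manifest itself should be covered by the same signature as the image, so that the compatibility fields cannot be altered independently.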

Use secure communication channels by default. All OTA updates should be performed over encrypted communication channels. This should include not only the TLS connection between the cloud and internet-connected gateway or edge, but also the local connection between the gateway and its edge devices.
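For the cloud-facing side of that connection, a client written in Python might configure TLS along the following lines. This is only a sketch of the client setup: the function name is hypothetical, the TLS 1.2 floor is our assumption rather than a universal requirement, and no connection is actually opened here.

```python
import ssl


def make_update_client_context() -> ssl.SSLContext:
    """Build a TLS context for talking to the update server."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections by default.
    context = ssl.create_default_context()
    # Assumption: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_update_client_context()
print(context.check_hostname, context.verify_mode == ssl.CERT_REQUIRED)  # prints: True True
```

The important habit is never disabling certificate or hostname verification to work around a provisioning problem, since that silently removes the protection the channel is supposed to provide.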

Partial updates should be possible. To decrease both bandwidth consumption and on-device processing time, partial updates should be supported so that only the changes to a firmware image need to be transmitted and applied to a given device.
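To show the basic idea behind a partial update, here is a deliberately simplified sketch that compares two images block by block and transmits only the blocks that changed. Production systems generally use more sophisticated binary diff formats. The block size and function names here are assumptions chosen to keep the example small.

```python
from typing import Dict, Tuple

BLOCK_SIZE = 4  # unrealistically small, so the demo output is easy to read

Delta = Tuple[int, Dict[int, bytes]]  # (new image length, changed blocks)


def make_delta(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> Delta:
    """Server side: collect only the blocks of `new` that differ from `old`."""
    changed: Dict[int, bytes] = {}
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if old[offset:offset + block_size] != block:
            changed[offset // block_size] = block
    return len(new), changed


def apply_delta(old: bytes, delta: Delta, block_size: int = BLOCK_SIZE) -> bytes:
    """Device side: rebuild the new image from the old one plus the delta."""
    new_length, changed = delta
    # Start from the old image, truncated or zero-padded to the new length.
    image = bytearray(old[:new_length].ljust(new_length, b"\x00"))
    for index, block in changed.items():
        start = index * block_size
        image[start:start + len(block)] = block
    return bytes(image)


old_image = b"AAAABBBBCCCC"
new_image = b"AAAAXXXXCCCCDD"

delta = make_delta(old_image, new_image)
rebuilt = apply_delta(old_image, delta)
print(sorted(delta[1]), rebuilt == new_image)  # prints: [1, 3] True
```

A device applying a delta should still verify the signature or hash of the fully reconstructed image before booting it, because a delta applied to the wrong base image produces garbage.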

Five Questions When Considering OTA Updates for IoT

Before we wrap up, here are five questions to ask yourself when considering an OTA update technology in the IoT context:

Does the OTA update mechanism support automatic recovery from failed updates while preserving its connection to the update server?

Is security a first-class feature of the OTA mechanism and not something bolted on as an afterthought?

Can OTA updates be applied in an efficient manner, minimizing resources like network bandwidth, storage, and compute?

Can OTA updates be applied at various levels within a given application, e.g., application configuration updates versus device firmware updates?

Does the OTA update mechanism leverage container technology (or similar) to work seamlessly across diverse hardware and software environments?