Transfer Learning for the IoT

If you know anything about artificial intelligence and the Internet of Things (IoT), you probably can’t help but feel excited at the prospect of combining what these two fields have to offer. Thanks to major advances in computer hardware, theoretical knowledge that has been around for decades can finally be put to use in real-world, practical applications. And with an ever-growing IoT market, high-quality data has never been easier to obtain, enabling the development of better-performing models, in particular deep learning models.

Wait, Before We Start: What is Deep Learning, Exactly?

Deep learning is a family of artificial intelligence techniques responsible for countless technological advances in recent years. Even though deep learning (which is built on so-called “deep neural networks”) has its roots in the 1950s and saw its key algorithmic breakthroughs in the 1980s, only recently has computer hardware caught up with the theory, giving computer scientists access to the enormous computational power and data storage they needed for neural networks to reach their full potential.

Deep learning algorithms are characterized by a multi-layer architecture in which knowledge is progressively “abstracted away” from the raw input data. For instance, when a deep neural network is trained to pick out pictures of a particular type of animal from a pool of random images, it first considers the images at the pixel level, then converts this into “higher-level” knowledge such as the edges of the object of interest, then identifies specific features (such as the paws or the tail) before eventually forming a general representation of the object (or, to continue our example, of the animal) as a whole. The interesting point here is that the early layers, which capture the lower-level knowledge, could be reused on an entirely different type of image, such as satellite imagery, because edges remain edges regardless of what an image represents.
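To make the “edges remain edges” point concrete, here is a minimal pure-Python sketch. A hand-written horizontal-gradient kernel (the classic Prewitt kernel, which responds to vertical edges) stands in for the kind of filter that the early layers of a trained network tend to learn; the two tiny image patches and their pixel values are made up for illustration. The same, unchanged filter responds to the boundary in both patches, whatever they depict.

```python
# A fixed 3x3 horizontal-gradient kernel (the Prewitt kernel), standing in for
# the kind of low-level edge filter that early layers of a trained CNN learn.
KERNEL = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]


def convolve2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding), as computed by CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Made-up grayscale patches: the edge of an animal against a dark background...
pet_patch = [[0, 0, 9, 9],
             [0, 0, 9, 9],
             [0, 0, 9, 9],
             [0, 0, 9, 9]]
# ...and the boundary between two fields in a satellite image.
satellite_patch = [[2, 2, 2, 7, 7],
                   [2, 2, 2, 7, 7],
                   [2, 2, 2, 7, 7]]

# The same, unchanged filter highlights the edge in both patches.
print(convolve2d(pet_patch, KERNEL))        # [[27, 27], [27, 27]]
print(convolve2d(satellite_patch, KERNEL))  # [[0, 15, 15]]
```

In the satellite patch, the zero in the first column of the output marks the flat region, and the non-zero values mark the field boundary.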

Remarkably, no data scientist ever explicitly programmed a computer to perform such a task. Instead, the learning algorithm is fed hundreds of thousands of example images until it learns, on its own, how to recognize the desired objects.

Deep Learning on the Edge: Why It Matters

In parallel with the hardware improvements that helped advance deep learning research, tremendous progress has been made that has enabled the rapid development of IoT devices. In particular, the ubiquity of microcontroller units (MCUs) is creating a unique opportunity to make AI applications, especially deep learning-enabled ones, accessible to consumers at an unprecedented scale and speed.

MCUs offer remarkable advantages for deploying deep learning-based applications: running a model directly on the device reduces latency, conserves bandwidth and offers better privacy guarantees, since raw data does not need to leave the device. Installing such AI applications directly on an IoT device is what I refer to as deploying AI applications on the edge. Choosing whether an application is best deployed on a server, in the cloud, or on the edge of an IoT device comes down to a trade-off between speed (through lower latency) and accuracy (as more complex, and therefore larger, models can’t always fit in the limited memory of an IoT device).
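The memory side of this trade-off is simple arithmetic: a model’s weights take roughly the number of parameters times the bytes per parameter. The sketch below assumes 32-bit floating-point weights (4 bytes each) and a hypothetical device with 1 MiB available for weights; it ignores activation memory and runtime code, so real budgets are tighter.

```python
FLOAT32_BYTES = 4            # bytes per 32-bit floating-point weight
DEVICE_BUDGET = 1024 * 1024  # hypothetical 1 MiB available for model weights


def model_size_bytes(num_params, bytes_per_param=FLOAT32_BYTES):
    """Approximate storage for the weights alone (ignores activations, code)."""
    return num_params * bytes_per_param


def fits_on_device(num_params, budget=DEVICE_BUDGET, bytes_per_param=FLOAT32_BYTES):
    """True if the model's weights fit within the device's memory budget."""
    return model_size_bytes(num_params, bytes_per_param) <= budget


for name, params in [("small model", 200_000), ("large model", 2_000_000)]:
    size = model_size_bytes(params)
    print(f"{name}: {size:,} bytes, fits on device: {fits_on_device(params)}")
# small model: 800,000 bytes, fits on device: True
# large model: 8,000,000 bytes, fits on device: False
```

If the larger model is the more accurate one, it has to run on a server or in the cloud, paying the latency cost the paragraph describes.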

What is Transfer Learning?

In the context of psychology, transfer of learning is defined as the study of how human learning or performance depends on prior experience. Humans do not need to be explicitly taught every possible task in order to master it. Rather, when they encounter new situations, they solve problems in an ad-hoc manner by extrapolating old knowledge to new environments. For example, a child who learns to swing a tennis racket can more easily transfer that skill to ping pong or baseball. The same goes for conceptual understanding, such as applying statistics or other mathematics to budgeting as an adult.

In contrast to the way humans function, most machine learning algorithms, once trained, tend to be specific to a particular dataset or to a particular, discrete task. Machine learning researchers have recently paid increasing attention to enabling computers to reuse acquired knowledge and apply it to new tasks and new domains, attempting to abstract the “smarts” extracted from the data and transfer them across multiple, usually somewhat similar, applications.

While deep learning can be used in an unsupervised context, it generally powers supervised learning. This means the algorithm learns from examples made of input-output pairs, which it uses to identify patterns and extract relationships, with the goal of predicting an outcome for new, unseen data. Deep learning has applications in countless areas of research, but suffers from one major drawback: these algorithms are extremely data-hungry, requiring massive quantities of data to tune the thousands (and often millions) of parameters that come into play in a neural network architecture. Not only is a lot of data required to achieve good performance, but this data also needs to be labeled, which can be expensive and time-consuming. Worse, it is often not possible to obtain quality data, or to label it accurately enough, to train a model at all.
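To get a feel for how quickly parameters add up, the sketch below counts the weights and biases of a small fully connected network. The layer sizes are hypothetical (a 28×28 grayscale input flattened to 784 values, two hidden layers and 10 outputs); even this modest network already has over 100,000 parameters to tune.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases in a fully connected network.

    Each layer mapping n_in inputs to n_out outputs has an n_in x n_out
    weight matrix and n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical network: 784 inputs (28x28 pixels), hidden layers of 128 and 64, 10 outputs.
print(dense_param_count([784, 128, 64, 10]))  # 109386
```

Almost all of these parameters sit in the first layer (784 × 128 weights), which is why larger inputs and wider layers push the count into the millions.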

This is where transfer learning can really help. It allows a model developed for one task to be reused as the starting point for another. It is an exciting area for machine learning scientists because it mimics the way humans generalize knowledge from one specialized task to another. In fact, it is a key strategy for reducing the amount of data needed for deep neural networks to become a viable option at all.
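One common way to reuse a model as a starting point is to freeze its early layers and train only a new output layer (a “head”) on the target task. The pure-Python sketch below illustrates that idea under loudly stated assumptions: `PRETRAINED_WEIGHTS` is a made-up stand-in for weights learned on a source task, the target dataset is a four-point toy example, and the new head is a plain logistic regression trained by gradient descent. Only the head’s parameters change during training.

```python
import math

# Hypothetical stand-in for weights learned on a *source* task. In practice
# these would come from a pretrained network; here they are made-up numbers.
PRETRAINED_WEIGHTS = [[1.0, -1.0],
                      [0.5, 0.5]]


def extract_features(x):
    """Frozen layer: ReLU(W x) with the pretrained weights, never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.5, epochs=1000):
    """Train only a new logistic-regression head on top of the frozen features."""
    feats = [extract_features(x) for x in samples]
    n = len(feats)
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in zip(feats, labels):
            # Gradient of the mean cross-entropy loss w.r.t. the head's parameters.
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            for k, fk in enumerate(f):
                grad_w[k] += err * fk / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(x, w, b):
    f = extract_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# Tiny labeled target dataset: label 1 when the first coordinate is larger.
target_x = [[2, 0], [3, 1], [0, 2], [1, 3]]
target_y = [1, 1, 0, 0]

w, b = train_head(target_x, target_y)
print([predict(x, w, b) for x in target_x])  # [1, 1, 0, 0]
```

Because the frozen layer already produces a feature that separates the two classes, a handful of labeled target examples is enough to fit the new head, which is the data-saving effect the paragraph describes.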

Remember the example mentioned earlier about how the early layers of a deep learning network learn to identify object edges? That’s exactly the sort of knowledge that can be transferred to another computer vision task. In this way, a model that has learned to recognize everyday objects like cats and dogs can transfer that understanding to more specialized tasks, such as identifying nuclei in cancer imagery.

Figure 1: The different learning approaches when building AI applications (Source: Figure Eight)

Transfer Learning Strategies

Transfer learning certainly comes across as an elegant and natural way to solve the problem of data scarcity in the age of Big Data. However, two important practical questions need to be answered before it can be put to use:

When is it appropriate to use it?

What’s the best way to perform the transfer?

To answer those questions, it is worth digging deeper into the theory to understand the different transfer learning approaches that exist.

As we have seen, getting labeled data in the target domain is sometimes challenging, while labeled data exists in abundance in another, source domain. This is where transductive transfer learning becomes useful: the source and target tasks are the same, but the domains differ, and labeled data is available only in the source domain. In some cases, the source and target feature spaces are different; in others, they are the same, but the marginal probability distributions of the input data differ. The latter case of transductive transfer learning is closely related to domain adaptation.
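As a minimal illustration of the “same feature space, different marginal distribution” case, imagine two hypothetical IoT sensors that measure the same quantity but on different scales. A decision threshold learned on the labeled source sensor fails on the raw readings of the unlabeled target sensor. Standardizing each domain with its own mean and standard deviation, which needs no target labels, is one of the simplest ways to align the two distributions; it is a toy stand-in for real domain adaptation methods, and all values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def midpoint_threshold(xs, ys):
    """Threshold halfway between the largest class-0 and smallest class-1 value."""
    top_neg = max(x for x, y in zip(xs, ys) if y == 0)
    low_pos = min(x for x, y in zip(xs, ys) if y == 1)
    return (top_neg + low_pos) / 2


# Labeled readings from the (hypothetical) source sensor.
source_x = [10.0, 12.0, 14.0, 16.0]
source_y = [0, 0, 1, 1]
# Unlabeled readings from the target sensor, which reports on a different scale.
target_x = [20.0, 24.0, 28.0, 32.0]

# Naive transfer: apply the raw source threshold (13.0) to raw target readings.
raw_threshold = midpoint_threshold(source_x, source_y)
naive_pred = [1 if x > raw_threshold else 0 for x in target_x]

# Simple alignment: learn the threshold in standardized units, then standardize
# the target readings with the target's own (label-free) statistics.
z_threshold = midpoint_threshold(standardize(source_x), source_y)
adapted_pred = [1 if z > z_threshold else 0 for z in standardize(target_x)]

print(naive_pred)    # [1, 1, 1, 1]
print(adapted_pred)  # [0, 0, 1, 1]
```

The naive threshold labels every target reading as 1, while the standardized threshold recovers the expected split without using a single target label.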

In the setting that researchers refer to as inductive transfer learning, however, it is the target and source tasks that differ from each other, and sometimes the source and target domains are not the same either. In this case, some labeled data from the target domain is still necessary in order to induce an objective predictive model for the target domain.
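A common way to use that labeled target data is fine-tuning: start from parameters learned on the source task and keep training them on the few target examples. The sketch below assumes a one-dimensional logistic regression whose “pretrained” parameters (`SOURCE_W`, `SOURCE_B`) are hypothetical values for a source task (label 1 when x > 0), fine-tuned on four labeled points from a related but different target task (label 1 when x > 2).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(w, b, xs, ys, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the logistic loss, starting from (w, b)."""
    n = len(xs)
    for _ in range(epochs):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b


def predict(x, w, b):
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical parameters "pretrained" on the source task (label 1 when x > 0).
SOURCE_W, SOURCE_B = 4.0, 0.0

# A handful of labeled examples for the target task (label 1 when x > 2).
target_x = [0.0, 1.0, 3.0, 4.0]
target_y = [0, 0, 1, 1]

before = [predict(x, SOURCE_W, SOURCE_B) for x in target_x]
w, b = fine_tune(SOURCE_W, SOURCE_B, target_x, target_y)
after = [predict(x, w, b) for x in target_x]
print(before)  # [1, 1, 1, 1]
print(after)   # [0, 0, 1, 1]
```

Before fine-tuning, the source model labels every target point as 1; after fine-tuning on the labeled target examples, its decision boundary moves to between 1 and 3.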

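Finally, the unsupervised transfer learning setting makes it possible (remarkably!) to reuse models even when no labeled data is available in either the source or the target domain. As in the inductive setting, the target task is different from, but still related to, the source task; the difference is that the target task is itself unsupervised, such as clustering or dimensionality reduction.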
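A minimal sketch of the unsupervised case, assuming two hypothetical sensors whose unlabeled readings fall into two related groups (say, “idle” and “active”): k-means centroids learned on the source sensor are reused as the starting point for clustering the target sensor’s readings, so no labels are needed in either domain. Warm-starting k-means this way is just one simple illustration, not a specific published method.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain 1D k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


# Unlabeled source readings: an "idle" group and an "active" group.
source_readings = [0.5, 1.0, 1.5, 4.5, 5.0, 5.5]
# [0.0, 6.0] is an arbitrary initial guess for the source clustering.
source_centroids = kmeans_1d(source_readings, [0.0, 6.0])
print(source_centroids)  # [1.0, 5.0]

# Unlabeled readings from a related target sensor with a slight offset,
# clustered starting from the centroids learned on the source sensor.
target_readings = [1.5, 2.0, 2.5, 5.5, 6.0, 6.5]
target_centroids = kmeans_1d(target_readings, source_centroids)
print(target_centroids)  # [2.0, 6.0]
```

Cluster 0 still corresponds to the lower (“idle”) group on the target sensor, so the interpretation of the clusters carries over along with the centroids.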