Arm NN: Build and Run ML Apps Seamlessly on Mobile and Embedded Devices

Recently, we announced our neural network machine learning (ML) software, Arm NN, a key piece of technology that makes it much, much easier to build and run ML applications on power-efficient, Arm-based platforms.

In essence, the software provides a bridge between existing neural network frameworks – such as TensorFlow or Caffe – and the underlying processing hardware – such as CPUs, GPUs or the new Arm Machine Learning processor – running on embedded Linux platforms. This allows developers to continue to use their preferred frameworks and tools, with Arm NN seamlessly converting the result to run on the underlying platform.
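The bridge idea can be sketched in a few lines. This is a minimal illustration, not Arm NN's actual parser API. The `Layer` and `Graph` classes, the op-name tables and the two `parse_*_like` functions are all hypothetical, and real model files carry far more than an op name. The sketch shows two framework-specific vocabularies being converted into one common graph, so that everything downstream only has to understand a single format.

```python
from dataclasses import dataclass, field


# Hypothetical framework-neutral representation (not Arm NN's real classes).
@dataclass
class Layer:
    op: str    # common op name, e.g. "conv2d" or "relu"
    name: str  # layer name carried over from the source model


@dataclass
class Graph:
    layers: list = field(default_factory=list)


# Hypothetical tables mapping each framework's op names to the common names.
TF_OPS = {"Conv2D": "conv2d", "Relu": "relu"}
CAFFE_OPS = {"Convolution": "conv2d", "ReLU": "relu"}


def parse_tensorflow_like(nodes):
    """Convert (name, op) pairs written in a TensorFlow-style vocabulary."""
    return Graph([Layer(TF_OPS[op], name) for name, op in nodes])


def parse_caffe_like(layers):
    """Convert layer dicts written in a Caffe-style vocabulary."""
    return Graph([Layer(CAFFE_OPS[l["type"]], l["name"]) for l in layers])


tf_graph = parse_tensorflow_like([("conv1", "Conv2D"), ("act1", "Relu")])
caffe_graph = parse_caffe_like([
    {"name": "conv1", "type": "Convolution"},
    {"name": "act1", "type": "ReLU"},
])

# Both front ends produce the same common graph.
print(tf_graph == caffe_graph)  # True
```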

Machine learning requires a training phase, where the learning happens ("These are pictures of cats.") and an inference phase, where what was learnt is applied ("Is this picture a cat?"). Training today typically happens on servers or similar devices, but inference is increasingly moving to the edge, which is where this release of Arm NN is focused.

Object recognition is one of many machine learning workloads being run on embedded platforms

It’s all about the platform

ML workloads, characterized by large amounts of computation and high memory-bandwidth requirements, pose one of the biggest challenges yet for mobile and embedded devices. As the need to run ML grows, partitioning these workloads to make the best use of the available compute resources becomes increasingly important. For a software developer faced with a wide variety of potential platforms, this is a real problem: the CPU typically has multiple cores (and with Arm DynamIQ big.LITTLE, even multiple core types), but there’s also the GPU to consider, along with potentially many other types of specialized processor, including the Arm Machine Learning processor, that may form part of an overall solution.
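One simple way to picture the partitioning problem is a greedy per-layer assignment. The sketch below assumes a hypothetical table of estimated costs for each compute unit (a missing entry means that unit cannot run the operation) and assigns each layer to the cheapest unit that supports it. It illustrates the problem only; it is not how the Arm NN scheduler works.

```python
# Hypothetical per-operation cost estimates (arbitrary units) for each unit.
# A missing entry means that unit cannot run that operation at all.
COSTS = {
    "cpu_big":    {"conv2d": 8.0, "relu": 1.0, "softmax": 1.0},
    "cpu_little": {"conv2d": 20.0, "relu": 2.0, "softmax": 2.5},
    "gpu":        {"conv2d": 3.0, "relu": 0.5},
}


def partition(ops, costs):
    """Assign each op to the cheapest compute unit that supports it."""
    plan = []
    for op in ops:
        candidates = [(table[op], unit)
                      for unit, table in costs.items() if op in table]
        if not candidates:
            raise ValueError(f"no compute unit supports {op!r}")
        _, unit = min(candidates)  # lowest estimated cost wins
        plan.append((op, unit))
    return plan


print(partition(["conv2d", "relu", "softmax"], COSTS))
# [('conv2d', 'gpu'), ('relu', 'gpu'), ('softmax', 'cpu_big')]
```

A real scheduler would also have to weigh the cost of moving tensors between units, which this greedy sketch ignores; that is part of why partitioning is harder than it first looks.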

This is where Arm NN comes in.

As you can see from the chart below, Arm NN plays a pivotal role in hiding the complexities of the underlying hardware platform, whilst allowing developers to continue using their neural network framework of choice.

Application using ML: applications that require ML

TensorFlow, Caffe, etc.: continue using existing high-level ML frameworks and supporting tools

Arm NN: automatically convert the above formats to Arm NN, optimize the graph and use the functions in the Compute Library to target the hardware
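The middle step in the Arm NN layer, optimizing the graph before handing work to library functions, might look like the following sketch. Fusing an activation into the preceding convolution is used here only as a representative graph-level optimization. The kernel names in `KERNELS` are hypothetical placeholders, not Compute Library function names.

```python
def fuse_conv_relu(ops):
    """Replace every 'conv2d' immediately followed by 'relu' with one fused op."""
    fused, i = [], 0
    while i < len(ops):
        if ops[i] == "conv2d" and i + 1 < len(ops) and ops[i + 1] == "relu":
            fused.append("conv2d+relu")
            i += 2  # consume both ops
        else:
            fused.append(ops[i])
            i += 1
    return fused


# Hypothetical stand-ins for optimized library functions: each entry names
# the kernel that would be called for a given (possibly fused) operation.
KERNELS = {
    "conv2d": "conv_kernel",
    "conv2d+relu": "conv_with_fused_activation_kernel",
    "relu": "activation_kernel",
    "softmax": "softmax_kernel",
}


def lower(ops):
    """Optimize the op list, then map each op to the kernel that would run it."""
    return [KERNELS[op] for op in fuse_conv_relu(ops)]


print(lower(["conv2d", "relu", "conv2d", "softmax"]))
# ['conv_with_fused_activation_kernel', 'conv_kernel', 'softmax_kernel']
```

Fusion saves one pass over the intermediate tensor, which matters on platforms where memory bandwidth, not arithmetic, is often the limiting factor.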

You’ll have noticed that a key requirement for Arm NN is the Compute Library, a set of low-level machine learning and computer vision functions that target Arm Cortex-A CPUs and the Arm Mali GPU. We aim to make this library the home of best-in-class optimizations for these functions, and recent optimizations have shown significant performance improvements – 15x or more over equivalent OpenCV functions. If you’re a user of Cortex-M CPUs, there's now a library of machine learning primitives for you too – the recently announced CMSIS-NN.
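To give a sense of what one of these low-level functions computes, here is a naive reference for a single-channel 2D convolution (cross-correlation, as most ML frameworks define it) with stride 1 and no padding. It is only a semantic sketch in plain Python: an optimized library version would produce the same result using vectorized CPU code or GPU kernels, and `conv2d_valid` is a hypothetical name, not a Compute Library API.

```python
def conv2d_valid(image, kernel):
    """Naive single-channel 2D convolution (cross-correlation, as in most
    ML frameworks) with stride 1 and no padding."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            # One multiply-accumulate per kernel element.
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```

The nested loops make the cost visible: every output element needs kh * kw multiply-accumulates, repeated across channels and layers, which is why convolution-heavy networks are so compute-intensive and why these primitives reward careful optimization.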

Main advantages

With this stack in place, developers immediately have some key advantages:

Significantly easier to run TensorFlow and Caffe on embedded platforms

Best-in-class optimized functions within the Compute Library give easy access to the power of the underlying platform

The programming model is the same, regardless of the core type being targeted

Existing software can automatically leverage new hardware features

Like the Compute Library, Arm NN has been released as open source software, which means it can be extended relatively easily to target other core types from Arm’s partners.
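The benefit of a single programming model, and of being able to add new core types later, can be illustrated with a common interface. In this minimal sketch, `Backend`, `CpuBackend`, `PartnerAccelerator` and `execute` are all hypothetical names rather than Arm NN classes. The point is that the application-facing `execute` call stays the same whether or not a new target is present.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface that every compute target implements."""

    name = "abstract"

    @abstractmethod
    def supports(self, op: str) -> bool:
        """Return True if this target can execute the operation."""

    @abstractmethod
    def run(self, op: str, values: list) -> list:
        """Execute the operation on a flat list of values."""


class CpuBackend(Backend):
    name = "cpu"

    def supports(self, op):
        return op in ("relu", "double")

    def run(self, op, values):
        if op == "relu":
            return [max(v, 0) for v in values]
        return [2 * v for v in values]  # "double": multiply every value by two


class PartnerAccelerator(Backend):
    """Stand-in for a new core type that a partner adds later."""

    name = "partner_npu"

    def supports(self, op):
        return op == "relu"

    def run(self, op, values):
        return [v if v > 0 else 0 for v in values]


def execute(ops, values, backends):
    """Application-facing entry point; it does not change when targets are added.

    Each op runs on the first backend in `backends` that supports it.
    Returns the final values and the name of the backend used for each op.
    """
    trace = []
    for op in ops:
        backend = next((b for b in backends if b.supports(op)), None)
        if backend is None:
            raise ValueError(f"no backend supports {op!r}")
        values = backend.run(op, values)
        trace.append(backend.name)
    return values, trace


ops = ["relu", "double"]
print(execute(ops, [-1, 3], [CpuBackend()]))
# ([0, 6], ['cpu', 'cpu'])
print(execute(ops, [-1, 3], [PartnerAccelerator(), CpuBackend()]))
# ([0, 6], ['partner_npu', 'cpu'])
```

Adding `PartnerAccelerator` to the list changes where `relu` runs, but not the result or the calling code, which is the property described in the list of advantages above.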

Arm NN for Android

Back in May at Google I/O, Google announced TensorFlow Lite for Android, the first hint of a major new API, the Android Neural Networks API (NNAPI), supporting the deployment of neural networks on Arm-based platforms running Android. On the surface, this offers a very similar solution to the Arm NN SDK under Android. By default, NNAPI runs machine learning workloads on the CPU, but a Hardware Abstraction Layer (HAL) mechanism allows them to run on other types of processor or accelerator, too. At the time of Google’s announcement, our plans for Arm NN were well underway, so it was a relatively simple step to provide a HAL for the Mali GPU using Arm NN. We will be doing the same for the Arm Machine Learning processor later this year.
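The default-CPU-plus-HAL arrangement can be modeled roughly as follows. This is a hypothetical sketch, not the real NNAPI HAL interface: `mali_gpu_driver`, `place_operations` and `segments` are invented for illustration. Each registered driver reports which operations of a model it can run, anything no driver claims falls back to the CPU, and consecutive operations on the same device are grouped together.

```python
def mali_gpu_driver(model):
    """Hypothetical driver: report, per operation, whether it can run it."""
    return [op in {"conv2d", "relu", "add"} for op in model]


def place_operations(model, drivers):
    """Give each op to the first registered driver that supports it,
    falling back to the CPU when none does (the default behavior)."""
    support = {name: report(model) for name, report in drivers.items()}
    return [
        next((name for name in drivers if support[name][i]), "cpu")
        for i in range(len(model))
    ]


def segments(model, placement):
    """Group consecutive ops on the same device; each boundary between
    groups is a point where data passes from one processor to another."""
    groups = []
    for op, device in zip(model, placement):
        if groups and groups[-1][0] == device:
            groups[-1][1].append(op)
        else:
            groups.append((device, [op]))
    return groups


model = ["conv2d", "relu", "softmax", "add"]
print(place_operations(model, {}))
# ['cpu', 'cpu', 'cpu', 'cpu']
with_gpu = place_operations(model, {"mali_gpu": mali_gpu_driver})
print(segments(model, with_gpu))
# [('mali_gpu', ['conv2d', 'relu']), ('cpu', ['softmax']), ('mali_gpu', ['add'])]
```

With no driver registered, everything stays on the CPU; registering a GPU driver moves the operations it supports without any change to the model itself.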

CMSIS-NN

CMSIS-NN is a collection of efficient neural network kernels developed to maximize the performance and minimize the memory footprint of neural networks on Arm Cortex-M processor cores targeted at intelligent IoT edge devices. We developed this library with the goal of squeezing every bit of performance out of neural network inference on these resource-constrained Cortex-M CPUs. Neural network inference based on CMSIS-NN kernels achieves a ~5x improvement in both runtime/throughput and energy efficiency compared with a baseline implementation.
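CMSIS-NN kernels work on low-precision fixed-point data rather than floating point. The sketch below illustrates that general approach for a fully connected layer: 8-bit inputs and weights, a wide integer accumulator, a bias shifted to match the product scale, then a rounding right shift and saturation back to 8 bits. It is written in Python for readability and is not CMSIS-NN's C API. `fully_connected_q7` and its parameters are illustrative, and on a Cortex-M device the accumulator would be a 32-bit integer rather than Python's unbounded int.

```python
def saturate_q7(x):
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def fully_connected_q7(vec, weights, bias, bias_shift, out_shift):
    """Fixed-point fully connected layer on 8-bit integers (illustrative).

    vec:     input values, ints in [-128, 127]
    weights: one row of 8-bit ints per output
    bias:    one 8-bit int per output
    Products are accumulated in a wide integer, then shifted right by
    out_shift with round-half-up and saturated back to 8 bits.
    """
    out = []
    for row, b in zip(weights, bias):
        acc = b << bias_shift              # align bias with the product scale
        if out_shift > 0:
            acc += 1 << (out_shift - 1)    # rounding offset before the shift
        acc += sum(w * v for w, v in zip(row, vec))
        out.append(saturate_q7(acc >> out_shift))  # arithmetic right shift
    return out


print(fully_connected_q7([10, -20, 30], [[1, 2, 3], [127, 127, 127]],
                         [4, 0], bias_shift=1, out_shift=2))
# [17, 127]
```

The second output saturates at 127 because its accumulated value is far outside the 8-bit range after scaling; choosing shift values so this rarely happens is part of quantizing a network for such kernels.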

Arm NN going forward

This is just the start for Arm NN: we plan to add other high-level neural network frameworks as inputs, further graph-level optimizations to the Arm NN scheduler, and support for other types of processor or accelerator, so watch this space for further developments throughout the year!