Cartoonifying Images on Raspberry Pi with the Compute Library

Hi folks!

Here we are with the first hands-on guide for the new Computer Vision and Machine Learning software library developed at Arm: the Compute Library!

Compute Library is a rich collection of functions for image processing, computer vision and machine learning optimized through NEON on Arm Cortex CPUs and through OpenCL on Arm Mali GPUs. The library has been designed to target a wide variety of use-cases and it is completely free of charge under the MIT open-source license.

With this first blog post (and more to come!) you will learn how to use the Compute Library, along with the main steps needed to write a sample application that "cartoonifies" your images on a Raspberry Pi!

Introduction to Compute Library

The era of intelligent vision applications has been rapidly progressing over the last few years. Thanks to recent advances in mobile computing performance and developments in deep learning, smart vision applications are landing on our smartphones more and more frequently, with capabilities that were unthinkable just a few years ago.

Consider, for example, the evolution of text messaging into smart image messaging, or the incredible progress of intelligent personal assistants.

Deploying these applications, however, still presents challenges such as:

Code/performance portability: one of the main problems developers face, as most of the time an algorithm has to be rewritten from scratch to reach the desired performance on a new platform.

Code optimization on specific architectures: Does the architecture support SIMD acceleration? Does the architecture support FP16 acceleration? Is the architecture 32 or 64-bit? These are just a few of the questions to keep in mind when we want to considerably boost the performance of our algorithms.

Compute Library was born mainly to address these two challenges.

Developed over years of experience working closely with partners and developers in the sphere of imaging and vision products, the library aims to make the deployment of intelligent vision applications easy and performant on Arm-based platforms, reducing both the cost and the programming effort.

At the time of writing, the library provides roughly 60 functions, accelerated for both Arm Cortex-A CPUs (aarch32 and aarch64, with NEON support) and Arm Mali GPUs (both Midgard and Bifrost architectures).

The functions implemented so far mainly cover the areas of image processing, computer vision and machine learning needed to develop a smart vision application.

Although these are still early days and the library contains no hand-written assembly code (it currently uses just NEON intrinsics), Compute Library already shows a significant performance uplift compared to other well-known libraries, and offers FP16 and fixed-point acceleration for some key functions.

Enabling remote access on Raspberry Pi

Assuming Ubuntu MATE has been correctly installed on the Raspberry Pi (if not, you can follow the instructions), we need to enable SSH connections on the device, as the OpenSSH server is disabled by default on Ubuntu MATE 16.04.2. This part will be necessary when we are going to cross-compile the library.

For this purpose you can use raspi-config.

Open a terminal on your Raspberry Pi:

sudo raspi-config

Select Interfacing Options.

Navigate to and select SSH.

Choose Yes.

Select Ok.

Choose Finish and reboot your Raspberry Pi.

Now let's find the IP address assigned to the device and try to SSH into it from our host machine:

Plug your Raspberry Pi into your router with an Ethernet cable.

Open a terminal on your Raspberry Pi and type:

ifconfig eth0 | grep 'inet addr' | cut -d: -f2 | awk '{print $1}'

Note: If the above command returns "eth0: error fetching interface information: Device not found", it means the device name of the Ethernet port is not eth0. In that case, run ifconfig -a to list the available interfaces and replace eth0 in the command above with the correct name.

The above command should return the IP address assigned to your device.

Once we know the IP address of the Raspberry Pi, we can establish an SSH connection from the host machine:

ssh <username_raspberrypi>@<ip_addr_raspberrypi>

where:

<username_raspberrypi>: username used on your Raspberry Pi

<ip_addr_raspberrypi>: IP address of your Raspberry Pi

Getting the Compute Library source code

Before starting to see how to build the library, let's have a look at its structure.

The latest version of the Compute Library can be grabbed from the GitHub repository linked on Arm Developer.

Within the GitHub repository you should find the following structure:

The three main folders to take into consideration for this tutorial are:

arm_compute: contains all the library's header files

examples: contains a few examples to compile

src: contains the library's source files

In terms of building blocks, the library is essentially made up of two main parts:

The first is the core, which includes the kernels.

The kernels are the low-level algorithms, designed to be embedded in existing projects since they:

Do not allocate any memory, so memory allocation must be handled by the caller.

Do not perform any type of multi-threading, but provide the caller with the necessary information about how the workload could be split between threads.

The second is the runtime, which contains the functions: actual wrappers around the kernels.

The functions:

Can allocate memory for the tensors (for instance, a function can allocate the memory for any temporary tensors it needs).

Can perform multi-threading as they can use the information provided by the kernels.

Hint: To get a clear view of the distinction between the "core" and "runtime" blocks, you can take a look at the NEGaussian5x5 function. As you will notice, this function calls three kernels, allocates one temporary tensor and splits the workload between threads using the Arm Compute Scheduler.

The library can now be built on the Raspberry Pi with scons; for example, from the root of the source tree:

scons Werror=1 debug=0 asserts=0 neon=1 opencl=0 examples=1 build=native arch=armv7a

The scons command should return "scons: done building targets" once the library has been successfully compiled.

Before continuing, just a few comments about the arguments passed to the build command:

Werror=1: It enables the -Werror compilation flag

debug=0 & asserts=0: All optimizations are enabled and no validation is performed over the arguments passed to the functions. This means that if the application misuses the library it is likely to result in a crash.

neon=1 & opencl=0: it enables just the NEON acceleration. The Raspberry Pi has no Arm Mali GPU, so we cannot benefit from OpenCL acceleration.

build=native: it compiles the library natively

examples=1: It compiles the examples

All the binaries (library + examples) will be inside the build/ folder.

Once you have built the library, you should be able to run the examples from the root of the source tree; for instance (the example name here is illustrative):

LD_LIBRARY_PATH=build ./build/neon_convolution

Note: The build command above is valid for both the Raspberry Pi 2 and Raspberry Pi 3, as Ubuntu MATE 16.04.2 is built for aarch32. In case your operating system was built for aarch64, you should replace arch=armv7a with arch=arm64-v8a.

In order to run the examples, we only need to copy the example binaries and libarm_compute.so onto the Raspberry Pi.

Cartoon effect with the Compute Library

We will now create a sample code for applying a cartoon effect on our images.

The sample code will help us show how to use the Compute Library, and also how to process an image so as to make it look hand drawn.

When it comes to developing a cartoon effect, there are essentially just two main computation blocks:

Region smoothing (for instance with Gaussian Filter 5x5)

Edge detection (for instance with Canny Edge algorithm)

In order to achieve the basic cartoon effect, we need to apply the 5x5 Gaussian filter and the Canny edge detector to the input image. The region smoothing will reduce the colour palette, whilst the edge detection will produce the sketch effect. Combining the outputs of these two stages with an arithmetic subtraction, we will be able to achieve the desired result.
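To make the combination step concrete, here is a tiny, library-free illustration of what the final subtraction does per pixel: subtracting the (white-on-black) edge response from the smoothed image, saturating at zero, turns strong edges into dark "ink" strokes. The Compute Library performs the equivalent operation over whole images; this helper is purely illustrative.

```cpp
#include <cstdint>
#include <vector>
#include <algorithm>

// Per-pixel combination: subtract the edge response from the smoothed image,
// saturating at 0 so strong edges become dark sketch strokes.
std::vector<uint8_t> combine(const std::vector<uint8_t> &smoothed,
                             const std::vector<uint8_t> &edges)
{
    std::vector<uint8_t> out(smoothed.size());
    for(std::size_t i = 0; i < smoothed.size(); ++i)
    {
        const int diff = static_cast<int>(smoothed[i]) - static_cast<int>(edges[i]);
        out[i] = static_cast<uint8_t>(std::max(diff, 0));
    }
    return out;
}
```

A pixel sitting on a detected edge (edge value 255) is forced to black, while pixels in flat regions keep their smoothed value.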

A closer look at the code

Step 0: Header files

To implement the example correctly, we need just three header files:

// Contains the definitions of all the NEON functions
#include "arm_compute/runtime/NEON/NEFunctions.h"
// Contains the definition of all types used in the library
#include "arm_compute/core/Types.h"
// Contains the definition for the PPMLoader
#include "utils/Utils.h"

Why can't the memory be allocated during the initialization of the image?

The answer lies in the implementation of the kernels.

Most of the NEON and OpenCL kernels use vector load/store instructions to access the data in buffers. In order to avoid special cases for handling the borders (when, for instance, the image's width is not a multiple of the width of the SIMD instruction used), all images use padding bytes.

In this library the padding bytes are defined just for the first two dimensions of the image/tensor.

Since the configure methods will update the padding bytes requirements for each image, it is important to allocate the memory only when all the functions have been configured.

Step 4: Function configuration

Once all the images have been initialized, we can proceed with the configuration of the functions.

Summary

Congratulations! You have completed this hands-on guide, where we have started playing with the Compute Library.

In this first blog of the series, we have shown how to work with the Compute Library and illustrated the main steps to render our images as if they were hand drawn.

Good news! This is just the beginning of building awesome smart vision applications on Arm-based platforms such as the Raspberry Pi with the Compute Library. In upcoming blogs, we will see how to enrich our applications through a traditional computer vision pipeline (HOG/SVM + OpenCV) and through the revolutionary and powerful Convolutional Neural Networks.