How binarized networks work — and why they’ll be big for AI in 2020

Neural networks are great, but they're still being held back by cost and energy, but binarized networks might fix that.

52

Shares

Story by

Henk Muller

CTO, XMOS —
Henk is a computer architecture and chip/tool design guru, with over 20 years of academia at the universities of Bristol and Amsterdam.Henk is a computer architecture and chip/tool design guru, with over 20 years of academia at the universities of Bristol and Amsterdam.

The concept of neural networks first emerged more than 40 years ago when scientists experimented with mathematically modelling the functions of the brain. They worked out they could make a mechanical implementation of the neural network that could be trained to recognize patterns and classify data — for example recognizing whether a video contains a cat or a dog.

Over the past decade, the complexity and capacity of neural networks has increased sharply. Coinciding with the extraordinary growth of cheap and easily accessible heavy-duty supercomputers and graphics processing units (GPUs), they have come to the fore as the de facto network “brain” of choice for problem solving, pattern recognition, and anomaly detection. Today, organizations use them for sales forecasting, customer research, data validation, and risk management among many other applications.

Despite this adoption, there are drawbacks to neural networks — drawbacks that limit their potential. The holy grail is a neural network that can analyze information quickly without being expensive and energy greedy. Achieving networks that fulfill those criteria is a challenge, and one that must be overcome if we are to spread the benefits of neural networks closer to the edge of IT and communication networks, and to endpoint devices.

As a result, one alternative that many organizations are looking into is binarized networks. They are a fairly new technology, but they’re likely to make a significant impact in 2020. To understand why, we need to grasp how both types of networks work.

How neural networks work

Neural networks comprise multiple layers of weighted sums. Each weighted sum adds up to a number that either indicates that this data likely exhibits some feature, or likely that it does not exhibit a feature. These layers combine, for example, raw image data into features, and recombine those to eventually reach an answer.

To put that in simpler terms, let’s say you want a neural network to be able to detect faces in photographs. The system divides that image into small segments and then layers of the network will scan each segment of the image looking for the feature they have been trained to identify. The first layer may, for example, look for four basic features: black circles, white circles, white rectangles, and skin tone. These are very simple and easy-to-spot features.

The next layer may look for eyes (a black circle within a white circle), or mouths (a set of white rectangles near each other with skin around it), and the next layer may look for two eyes above a mouth, with skin extending around it. Each feature will score each segment of the image for how likely it thinks the desired feature is present in that part of the photo. These probabilities are then combined and if enough layers think the feature they are looking for is present the network will conclude that, yes, a face is present.

Figure 1

In figure 1 (which is a picture of Barack Obama) you can see how these layers of analysis and probabilities build up to enable the network, while still dealing in approximations, to provide a relatively accurate answer.

Note that the features like black circles, eyes, or mouth are not programmed by a human, but are discovered during the training phase. It may be that a different pattern (for example nose, ears, or hairline) is better at finding faces, and the beauty of a neural network is that it can be used to discover those patterns.

The drawbacks of traditional neural networks

The problem is, in seeking the highest possibly accuracy, the need to deal in fine-grained levels of probability mean that the mathematics employed is resource intensive. By applying a floating-point approach to the segment analysis, neural networks require a relatively large amount of compute power, memory, and time to work.

Although the cloud has plenty of compute power and memory, many edge applications cannot rely on the cloud. Autonomous vehicles, for example, which need to make instantaneous decisions based on the environment around them, can’t rely on bandwidth-limited connectivity for decision making.

It would be unsustainable to use neural networks based on floating point numbers on the edge. Many companies use integer arithmetic instead, saving a great deal of memory and power, but there’s a step that’s better — and that is where binarized networks come in.

How binarized networks work

If a neural network is a Picasso painting, a binarized network is a rough pencil sketch.

While neural networks give each ‘segment’ a fine-grained probability, binarized networks, as the name suggests, reduce the possible values to a black-and-white score of either –1 (if it thinks that the segment doesn’t include the feature it’s looking for) or +1 (if it does).

The weighted sums now weigh each feature either positively (multiply by +1) or negatively (multiply by –1), and rather than doing full multiplications we now only have to consider multiplying +1 and –1.

Figure 2

This approach sacrifices some accuracy, but we can reclaim that by making the network slightly larger. Binarized networks by their very nature are much simpler.

Compared to their floating-point counterparts they require 32 times less storage to store a number (a single bit versus 32 bits) and hundreds times less energy, making them far more applicable to ‘edge applications’ like autonomous vehicles where the devices themselves can process the information, rather than involving the cloud.

Typically, a binarized network still contains some non-binary layers, especially on the input side and sometimes on the output side. On the input side, the input image is probably full color, and this will have to be interpreted numerically before the binarized layers start. Similarly, on the output layer there is always a non-binary output.

The future of binarized networks

This simplicity opens a host of potential commercial applications where efficiency matters. An embedded single chip solution is much more likely to be able to store the coefficients for a binarized network than its floating point counterpart. It will need processor manufacturers to embrace the technology and provide support for binarized networks.

2020 is likely to be a big year for binarized networks. Companies are working hard to provide binarized technology, and the software needed to train binarized networks is maturing quickly. We’re likely to see the first real deployments of the technology very soon, enabling edge devices to contain a cheap and low-power device for classifying images or other data.

It’s the simpler future that will enable the next generation of technology.