Sony Develops World’s Fastest Deep Learning Framework [As Of 2018]

Sony built a framework that determines the appropriate batch size and the optimal number of GPUs for learning.

Training ImageNet/ResNet-50 took only 3 minutes and 44 seconds.

Deep learning is a subfield of machine learning that uses neural networks inspired by the human brain. Architectures such as deep belief networks, deep neural networks, and recurrent neural networks have been applied in many fields, including speech and image recognition, bioinformatics, machine translation, social network filtering, and material inspection.

In some cases, they have produced results comparable to, and occasionally better than, those of human experts, which is one reason deep learning has grown so quickly in recent years. Deep neural networks are generally interpreted in terms of the universal approximation theorem or probabilistic inference.

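In November 2018, Sony Corporation announced that it had achieved the world’s fastest deep learning speeds by combining its core library, Neural Network Libraries, with the AI Bridging Cloud Infrastructure (ABCI), a large-scale computing infrastructure operated by Japan’s National Institute of Advanced Industrial Science and Technology (AIST).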

To improve the accuracy of deep learning models, both the training data and the number of model parameters keep growing. This growth significantly increases computation time: a single learning session can take weeks or even months.

Since developing artificial intelligence requires a continuous process of trial and error, reducing this learning time is a top priority.

How Does It Work?

To solve this issue, researchers at Sony adopted a popular solution: distributed learning across many graphics processing units (GPUs). However, increasing the number of GPUs can in some cases make learning slower rather than faster.

One reason is that the batch size (the amount of data handled at one time) grows with the number of GPUs, and when it becomes too large, learning stops making progress. The second reason is the time spent transmitting data among GPUs. So adding more GPUs to a small task can have the opposite of the intended effect.
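Both effects can be illustrated with a back-of-the-envelope model. The sketch below is not taken from Sony's work. It assumes plain data-parallel training and a textbook ring all-reduce cost estimate, and every name and number in it (per-GPU batch, gradient size, latency, bandwidth, compute time) is a hypothetical value chosen only for illustration.

```python
def global_batch_size(num_gpus, per_gpu_batch):
    """With a fixed per-GPU batch, the global batch grows with the GPU count."""
    return num_gpus * per_gpu_batch


def ring_allreduce_time(num_gpus, grad_bytes, latency_s, bytes_per_s):
    """Textbook ring all-reduce estimate: 2*(N-1) steps, each sending 1/N of the data."""
    if num_gpus == 1:
        return 0.0  # a single GPU has nothing to synchronize
    steps = 2 * (num_gpus - 1)
    chunk_bytes = grad_bytes / num_gpus
    return steps * (latency_s + chunk_bytes / bytes_per_s)


def step_time(num_gpus, single_gpu_compute_s, grad_bytes, latency_s, bytes_per_s):
    """A fixed amount of work split evenly across GPUs, plus gradient synchronization."""
    compute = single_gpu_compute_s / num_gpus
    return compute + ring_allreduce_time(num_gpus, grad_bytes, latency_s, bytes_per_s)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print("global batch:", global_batch_size(num_gpus=1024, per_gpu_batch=32))
    for n in (1, 8, 64, 512, 2048):
        t = step_time(n, single_gpu_compute_s=1.0, grad_bytes=100e6,
                      latency_s=50e-6, bytes_per_s=10e9)
        print(f"{n:5d} GPUs: {t * 1000:8.2f} ms per step")
```

With these assumed numbers, the time per step first falls as GPUs are added and then rises again once the per-step latency of synchronization outweighs the shrinking compute share. Keeping the per-GPU batch fixed instead avoids starving each GPU of work, but it inflates the global batch, which is the other problem described above.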

To address the large-batch problem, the researchers developed a technique that analyzes the current state of the learning process and determines the appropriate batch size and the optimal number of GPUs accordingly. This makes learning possible even in massive-scale environments such as ABCI. To cut transmission time, they also developed a data synchronization scheme called 2D-Torus All-Reduce.

The data synchronization technology developed for ABCI successfully increased transmission speed among GPUs. It was implemented in Neural Network Libraries, and learning was performed using ABCI’s computing resources.
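To make the idea behind a 2D-torus all-reduce more concrete, here is a minimal single-process simulation. It is a sketch under stated assumptions, not Sony's implementation: it assumes GPUs arranged in a grid of rows and columns and the commonly described three-phase pattern (reduce-scatter along each row, all-reduce along each column, all-gather along each row). The function names are hypothetical, gradients are plain Python lists, and no real communication takes place.

```python
def split_chunks(vec, parts):
    """Split vec into `parts` contiguous chunks of equal length."""
    if len(vec) % parts != 0:
        raise ValueError("vector length must be divisible by the number of columns")
    size = len(vec) // parts
    return [vec[i * size:(i + 1) * size] for i in range(parts)]


def add_vectors(vectors):
    """Element-wise sum of equally sized lists."""
    return [sum(values) for values in zip(*vectors)]


def torus_2d_allreduce(grads):
    """Simulate a 2D-torus all-reduce; grads[y][x] is the gradient on GPU (row y, column x)."""
    rows, cols = len(grads), len(grads[0])

    # Phase 1: reduce-scatter along each row.
    # GPU (y, x) ends up holding chunk x, summed over the GPUs in row y.
    partial = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        chunks = [split_chunks(grads[y][x], cols) for x in range(cols)]
        for x in range(cols):
            partial[y][x] = add_vectors([chunks[src][x] for src in range(cols)])

    # Phase 2: all-reduce along each column.
    # GPU (y, x) now holds chunk x, summed over every GPU in the grid.
    for x in range(cols):
        total = add_vectors([partial[y][x] for y in range(rows)])
        for y in range(rows):
            partial[y][x] = list(total)

    # Phase 3: all-gather along each row.
    # Every GPU reassembles the full summed gradient from the chunks in its row.
    result = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        full = [value for x in range(cols) for value in partial[y][x]]
        for x in range(cols):
            result[y][x] = list(full)
    return result


if __name__ == "__main__":
    rows, cols, length = 2, 3, 6
    grads = [[[(y * cols + x) * 10 + i for i in range(length)]
              for x in range(cols)] for y in range(rows)]
    print(torus_2d_allreduce(grads)[0][0])
```

The appeal of this layout is that each phase runs over a short ring (one row or one column) rather than a single ring spanning every GPU, so the number of sequential communication steps grows with the grid's side lengths instead of the total GPU count.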

Results

The system set a new speed record for training ImageNet/ResNet-50 (a widely used benchmark for measuring distributed learning speeds): 3 minutes and 44 seconds (224 seconds) at 75% accuracy, using up to 2,176 NVIDIA Tesla V100 Tensor Core GPUs. At the time of the announcement, this was the fastest reported training time.

These results show that learning performed with Neural Network Libraries can reach very high speeds, and that the same framework can shorten the trial-and-error cycle in AI development. The researchers plan to continue this work and develop new techniques to advance AI technology.