Scale Up Deep Learning in Parallel and in the Cloud

Deep Learning on Multiple GPUs

Neural networks are inherently parallel algorithms. You can take advantage of this
parallelism by using Parallel Computing Toolbox™ to distribute training across
multicore CPUs, graphics processing units (GPUs), and clusters of computers with
multiple CPUs and GPUs.

Training deep networks is extremely computationally intensive, and you can usually
accelerate training by using a high-performance GPU. If you do not have a suitable
GPU, you can train on one or more CPU cores instead, or rent GPUs in the cloud. You
can train a convolutional neural network on a single GPU or CPU, on multiple GPUs
or CPU cores, or in parallel on a cluster. Using a GPU or any parallel option
requires Parallel Computing Toolbox.
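
For example, you can check for a supported GPU before training and fall back to CPU
cores if none is found. The following is only a minimal sketch; XTrain, YTrain, and
layers are placeholders for your own training data and network architecture:

% Pick an execution environment based on the GPUs detected on this machine.
if gpuDeviceCount > 0
    env = 'gpu';    % train on the local GPU
else
    env = 'cpu';    % fall back to the CPU cores
end

options = trainingOptions('sgdm', 'ExecutionEnvironment', env);
net = trainNetwork(XTrain, YTrain, layers, options);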

Tip

GPU support is automatic. By default, the trainNetwork function uses a GPU if
available.

If you have access to a machine with multiple GPUs, simply specify the
training option 'ExecutionEnvironment','multi-gpu'.
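
For example, a minimal sketch of multi-GPU training on one machine; XTrain, YTrain,
and layers are placeholders for your own data and network:

% Use all supported GPUs on the local machine for training.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu');
net = trainNetwork(XTrain, YTrain, layers, options);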

If you want to use more resources, you can scale up deep learning training to the
cloud.

Deep Learning Built-In Parallel Support

Training Resource: Single GPU on local machine
Settings: Automatic. By default, the trainNetwork function uses a GPU if available.
Deep Learning in the Cloud

If your deep learning training takes hours or days, you can rent high-performance
GPUs in the cloud to accelerate training. Working in the cloud requires some initial
setup, but after that setup, using the cloud can reduce training time or allow you
to train more networks in the same time. To try deep learning in the cloud, you can
follow example steps to set up your accounts, copy your data into the cloud, and
create a cluster. After this initial setup, you can run your training code with
minimal changes in the cloud. After setting up your default cluster, simply specify
the training option 'ExecutionEnvironment','parallel' to train networks on your
cloud cluster on multiple GPUs.
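
For example, a minimal sketch of training on a cloud cluster, assuming you have
already created and validated a cluster profile; the profile name MyCloudCluster is
a placeholder, as are XTrain, YTrain, and layers:

% Make the cloud cluster the default cluster, then train in parallel on it.
parallel.defaultClusterProfile('MyCloudCluster');

options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','parallel');
net = trainNetwork(XTrain, YTrain, layers, options);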

Advanced Support for Fast Multi-Node GPU Communication

If you are using a Linux compute cluster with fast interconnects between machines,
such as InfiniBand, or fast interconnects between GPUs on different machines, such
as GPUDirect RDMA, you might be able to take advantage of fast multi-node support in
MATLAB. Enable this support on all the workers in your pool by setting the
environment variable PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION to 1. Set this
environment variable in the Cluster Profile Manager.

This feature uses the NVIDIA NCCL library for GPU communication. To configure it,
you must set additional environment variables to define the network interface
protocol, in particular NCCL_SOCKET_IFNAME. For more information, see the NCCL
documentation, in particular the section on NCCL knobs.
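
To check that the variables are visible on the workers, one possible sketch is to
query them on an open parallel pool. The helper checkVar is defined here only for
illustration, and the interface name shown in the comment is just an example; the
value on your cluster depends on your network configuration:

% Query an environment variable on every worker of the current pool.
pool = gcp;
checkVar = @(name) fetchOutputs( ...
    parfevalOnAll(pool, @(n) string(getenv(n)), 1, name));

checkVar("PARALLEL_SERVER_FAST_MULTINODE_GPU_COMMUNICATION")   % expect "1" on each worker
checkVar("NCCL_SOCKET_IFNAME")                                 % for example, "eth0"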
