GPU-Accelerated TensorFlow

Get started today with this GPU-Ready Apps guide.

TensorFlow is a software library for designing and deploying numerical computations, with a key focus on applications in machine learning. The library allows algorithms to be described as a graph of connected operations that can be executed on various GPU-enabled platforms ranging from portable devices to desktops to high-end servers.

TensorFlow runs up to 50% faster on the latest Pascal GPUs and scales well across GPUs. You can now train models in hours instead of days.

Installation

System Requirements

The GPU-enabled version of TensorFlow has the following requirements:

64-bit Linux

Python 2.7

CUDA 7.5 (CUDA 8.0 required for Pascal GPUs)

cuDNN v5.1 (cuDNN v6 if on TF v1.3)

You will also need an NVIDIA GPU supporting compute capability 3.0 or higher.
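As a rough illustration of the compute capability requirement, the sketch below compares a capability string such as "6.1" against the 3.0 minimum. The function name and the string format are assumptions made for this example. It does not query the GPU itself, so the capability value has to be looked up separately.

```python
MIN_COMPUTE_CAPABILITY = (3, 0)  # minimum stated in the requirements above


def meets_min_compute_capability(capability):
    """Return True if a capability string like "6.1" is at least 3.0.

    Hypothetical helper: the caller looks up the capability of the
    installed GPU and passes it in as "major.minor".
    """
    major, minor = (int(part) for part in capability.split("."))
    # Tuples compare element by element, so (3, 5) >= (3, 0) is True
    return (major, minor) >= MIN_COMPUTE_CAPABILITY


print(meets_min_compute_capability("6.1"))  # True
print(meets_min_compute_capability("2.1"))  # False
```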

Download and Installation Instructions

TensorFlow is distributed under an Apache v2 open source license on GitHub. This guide walks through building and installing TensorFlow on an Ubuntu 16.04 machine with one or more NVIDIA GPUs.

The TensorFlow site is a great resource for installing the latest released versions with virtualenv, with Docker, or from source.

Once the CUDA Toolkit is installed, download the cuDNN v5.1 library (cuDNN v6 if on TF v1.3) for Linux and install it by following the official documentation. (Note: you will need to register for the Accelerated Computing Developer Program.) For quick reference, the steps for cuDNN v5.1 are as follows:

You should see “Hello, TensorFlow!”. Congratulations! You can also run “print(tf.__version__)” to see the installed TensorFlow version.
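A check along the following lines produces that greeting and the version string. It is a sketch, not the guide's original listing: the function name is hypothetical, and the fallback to `tf.compat.v1` is an assumption added so that the same code also runs on newer TensorFlow releases where `tf.Session` has moved.

```python
def hello_tensorflow():
    """Run a one-op graph and return (greeting, version), or None if
    TensorFlow cannot be imported. Hypothetical helper for illustration."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # TF 1.x exposes Session at the top level; later releases keep it
    # under tf.compat.v1 (assumption: one of the two is available).
    v1 = getattr(tf.compat, "v1", tf)

    with tf.Graph().as_default():
        hello = tf.constant("Hello, TensorFlow!")
        with v1.Session() as sess:
            message = sess.run(hello)

    if isinstance(message, bytes):
        message = message.decode("utf-8")
    return message, tf.__version__


result = hello_tensorflow()
if result is None:
    print("TensorFlow is not installed in this environment")
else:
    print(result[0])
    print(result[1])
```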

Training Models

TensorFlow can be used via Python or C++ APIs, while its core functionality is provided by a C++ backend. The API provides an interface for manipulating tensors (N-dimensional arrays) similar to NumPy, and includes automatic differentiation capabilities for computing gradients for use in optimization routines.

The library comes with a large number of built-in operations, including matrix multiplications, convolutions, pooling and activation functions, loss functions, optimizers, and many more. Once a graph of computations has been defined, TensorFlow enables it to be executed efficiently and portably on desktop, server, and mobile platforms.
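To make the ideas of a computation graph and automatic differentiation concrete, here is a deliberately tiny pure-Python sketch. It is not TensorFlow's API or implementation: the `Node` class and the `add`, `mul`, and `backward` helpers are hypothetical names used only to show how gradients can be propagated backwards through connected operations.

```python
class Node(object):
    """One value in a tiny computation graph (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, d(this node)/d(parent))
        self.parents = parents
        self.grad = 0.0


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):
        # Post-order walk: a node is appended after all of its parents
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Reverse order guarantees a node's gradient is complete before
    # it is pushed to its parents (chain rule).
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad


x, w, b = Node(3.0), Node(2.0), Node(1.0)
y = add(mul(w, x), b)  # y = w * x + b
backward(y)
# y = 7.0, dy/dw = 3.0, dy/dx = 2.0, dy/db = 1.0
print(y.value, w.grad, x.grad, b.grad)
```

An optimizer would use gradients like `w.grad` to update parameters. TensorFlow performs the equivalent bookkeeping over tensors and runs the operations in its C++ backend.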

To run the code examples below, first change to your TensorFlow directory:

Image recognition is one of the tasks at which deep learning excels. While the human brain makes recognizing images seem easy, it is a challenging task for a computer. Significant advances over the past few years have nonetheless brought models to the point of surpassing human abilities. What makes this possible is the convolutional neural network (CNN). Ongoing research has demonstrated steady progress in computer vision, validated against ImageNet, an academic benchmark for computer vision.

QUICK DEMO USING INCEPTION-V3

First, let’s run the following commands and see what computer vision can do:

classify_image.py downloads the trained Inception-v3 model from tensorflow.org the first time the program is run. You'll need about 200 MB of free disk space. The above command classifies a supplied image of a panda bear (found in /tmp/imagenet/cropped_panda.jpg); a successful run returns results that look like:

The CIFAR-10 model used here follows the architecture described by Alex Krizhevsky, with a few differences in the top few layers. It is a multi-layer architecture consisting of alternating convolutions and nonlinearities, followed by fully connected layers leading into a softmax classifier.
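The final softmax step turns the network's raw class scores into probabilities that sum to one. The sketch below shows that computation in plain Python. The function name and the example scores are illustrative assumptions, not values produced by the model.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large
    # scores and does not change the result.
    shift = max(logits)
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three classes
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # 0, the class with the highest score
```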

Following the training, you can evaluate how well the trained model performs by using the cifar10_eval.py script. It calculates the precision at 1: how often the top prediction matches the true label of the image.

$ python cifar10_eval.py

If successful, you will see something similar to what's listed below:

2017-03-06 15:34:27.604924: precision @ 1 = 0.499
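The precision @ 1 metric reported above can be sketched in a few lines of plain Python. This is an illustration of the definition rather than the code in cifar10_eval.py. The function name and the sample scores and labels are made up for the example.

```python
def precision_at_1(scores, labels):
    """Fraction of examples whose highest-scoring class is the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the top prediction
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / float(len(labels))


# Four made-up examples with three classes each
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 1, 1]
print(precision_at_1(scores, labels))  # 3 of 4 correct: 0.75
```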

USING A PRE-TRAINED INCEPTION V3 ON NEW DATASET

Next, let’s revisit Google’s Inception v3 and work through a deeper use case. Inception v3 is a cutting-edge convolutional network designed for image classification. Training this model from scratch is computationally intensive and can take from several days to weeks. An alternative approach is to download the pre-trained model and retrain it on another dataset. We will walk through how this is done using the flowers dataset.

For more details on using the retrained Inception v3 model, see the tutorial link.

Benchmarks

Each of the models described in the previous section outputs either an execution time per minibatch or an average speed in examples per second. The speed can be converted to time per minibatch by dividing the batch size by it. The graphs show expected performance on systems with NVIDIA GPUs.
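The conversion between the two reporting styles is simple arithmetic. The helpers below are a sketch with hypothetical names, and the numbers in the usage lines are illustrative values, not measurements.

```python
def seconds_per_minibatch(examples_per_second, batch_size):
    """Time for one minibatch = batch size / throughput."""
    if examples_per_second <= 0:
        raise ValueError("examples_per_second must be positive")
    return batch_size / float(examples_per_second)


def examples_per_second(seconds_per_batch, batch_size):
    """Throughput = batch size / time for one minibatch."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / float(seconds_per_batch)


# Illustrative numbers only: a batch of 128 at 640 examples/second
print(seconds_per_minibatch(640.0, 128))  # 0.2 seconds per minibatch
print(examples_per_second(0.5, 128))      # 256.0 examples per second
```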

TRAINING IMAGES PER SECOND FOR INCEPTION V3 ON MULTIPLE GPUS

The Inception v3 model also supports training on multiple GPUs. The graph below shows the expected performance on 1, 2, and 4 Tesla GPUs per node.
