Getting started with deep learning

AWS just announced the release of its new p2.16xlarge instances,
making this an especially great time to get started using a cloud service for
deep learning. If you’re a developer or engineer interested in learning what
all the fuss is about, the best way to find out is to spin up an instance
and try to build something.

We’ve written up a quick getting-started guide on the best options we found
for quickly creating a versatile development environment. This writeup will help
you set up a GPU-enabled EC2 instance capable of running TensorFlow, Keras,
Caffe, Theano, TensorBoard, and various other useful packages.

Choose an instance type

While we initially used the g2 instance family, we’re switching over to p2
instances after running a few quick benchmarks. We found that training
AlexNet
on a p2.xlarge took less than half as long per batch as on a g2.2xlarge,
presumably owing to the NVIDIA K80 in the p2.xlarge, with its 12 GiB of GPU
memory (compared to 8 GiB on the NVIDIA K520). In our case, it was a 2.2x
speedup for an instance that’s 1.4x the price on an hourly basis, which makes
the p2.xlarge a no-brainer.

Set up an instance

So, after creating an EC2 account, launch a p2.xlarge running Ubuntu 14.04.
While you could use Ubuntu 16.04, the newer long-term-support (LTS) Ubuntu release,
many of NVIDIA’s GPU drivers haven’t been released for 16.04 yet, making 14.04 the
smart choice for now.

Configure your instance to have at least 20 GB of storage and to use
a security group that allows inbound connections via SSH. Later you might want to
open ports for other services you’ll run on the machine, but for now, SSH is enough.
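If you prefer the command line to the console, the launch can be sketched with the AWS CLI. The AMI ID, key pair name, and security group ID below are placeholders, not values from this guide — substitute your own:

```shell
# Launch a p2.xlarge with a 20 GB root volume.
# The AMI, key pair, and security group IDs are placeholders.
aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type p2.xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":20}}]'
```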

Prepare the instance

The first time you SSH into your instance, run the following to update all
installed packages:

Next, install the NVIDIA drivers you need to take advantage of all of the GPU
power in your instance. Find your graphics card model with

lspci | grep -i nvidia

Visit the NVIDIA site to find the latest
drivers for your platform, but don’t download anything yet. There’s a cleaner
way to install the drivers, which is to use apt-get. Check this
PPA to see if the
drivers you need exist in the repository—generally, the more recent the
driver, the faster and more stable it is.
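As a sketch of the apt-get route — the driver version below is purely illustrative; use whichever version the PPA offers for your card:

```shell
# Add the graphics-drivers PPA and install a driver from it.
# The version number (367) is illustrative; pick the one you need.
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install -y nvidia-367
```

You may need to reboot the instance afterward for the driver to load.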

Set up Docker

We originally built a custom AMI as a deep learning environment but quickly
switched over to using Docker instead. Using Docker means you won’t need to
install anything on your Ubuntu 14.04 machine besides Docker itself, and then
you can use Docker to install everything else.

Docker saves you the trouble of compiling binaries from scratch, checking out source
code, downloading and reconciling dependencies, and other things that aren’t any fun.
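Installing Docker itself is quick. One common route (assuming you’re comfortable piping a remote script to a shell) is Docker’s convenience script:

```shell
# Install Docker via the official convenience script,
# then let the ubuntu user run docker without sudo.
curl -sSL https://get.docker.com/ | sh
sudo usermod -aG docker ubuntu
```

Log out and back in for the group change to take effect.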

Then, follow the instructions here to install
nvidia-docker, which you’ll need to take advantage of the NVIDIA drivers you just installed.
Usually, to install nvidia-docker, you can just use their install script:
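For nvidia-docker 1.x on Ubuntu, installation was distributed as a .deb package on the project’s GitHub releases page; the release URL and version below are assumptions you should check against the latest release:

```shell
# Download and install the nvidia-docker .deb from the project's
# GitHub releases page (version shown is illustrative).
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb
```

A quick smoke test is `sudo nvidia-docker run --rm nvidia/cuda nvidia-smi`, which should print the K80’s status from inside a container.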

Selecting useful images

We recommend one of the following two options for Docker images for your setup:

1. The all-in-one image. This popular Docker image is
an all-in-one environment. The image comes with TensorFlow, Theano, Torch,
Caffe, Keras, Jupyter/IPython, and the standard Python numerical computing
packages (matplotlib, scikit-learn, pandas, scipy, numpy). This is great if
you want to get started quickly in a way where things just work.

2. One container per framework. If you prefer a more modular setup, you can
run multiple Docker containers, each with one framework. If you prefer this,
use Kaixhin’s images
here

If you go with option 1 above, which is probably the simplest approach for now,
you’ll need to clone the repository containing the GPU dockerfile, build a docker container
using that Dockerfile, and run it on your machine.
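Assuming the image’s repository follows the usual Dockerfile-project layout — the repository URL, Dockerfile name, image tag, and volume paths below are illustrative, not taken from this guide — the clone-build-run sequence looks roughly like:

```shell
# Clone the repository containing the GPU Dockerfile
# (URL is a placeholder -- use the repository linked above).
git clone https://github.com/someuser/dl-docker.git
cd dl-docker

# Build the image from the GPU Dockerfile and tag it.
docker build -t dl-docker:gpu -f Dockerfile.gpu .

# Run it with nvidia-docker, sharing a host directory as a volume
# and exposing Jupyter's port.
nvidia-docker run -it -p 8888:8888 \
    -v /home/ubuntu/shared:/root/shared dl-docker:gpu bash
```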

Note the paths provided for creating a volume using -v in the docker run
command. Modify these to reflect whatever directory you prefer to use as a
shared volume in your Docker container.

With the Docker container running, you’re all set. If you don’t like writing
and running code in bash, use Jupyter instead by visiting
the-address-of-your-new-machine:8888 in your browser. You can always find the
public DNS address of your instance by viewing its details in
the EC2 pane of the AWS management console.

This gives you access to a browser-based environment where you can write and run
IPython notebook files in such a way that import tensorflow and import keras
just work.
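As a final sanity check before opening a notebook, you can confirm the imports from the host shell. The container name below is a placeholder — substitute whatever `docker ps` shows for your container:

```shell
# Run a quick import check inside the running container.
# Replace <container> with the name or ID from `docker ps`.
docker exec <container> python -c "import tensorflow; import keras; print('ok')"
```

If this prints ok, the notebook environment will have both frameworks available.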