TensorFlow Mechanics 101

Introduction

The goal of this tutorial is to show how to use TensorFlow to train and evaluate a simple feed-forward neural network for handwritten digit classification using the (classic) MNIST data set. The intended audience for this tutorial is experienced machine learning users interested in using TensorFlow.

These tutorials are not intended for teaching Machine Learning in general.

Prepare the Data

MNIST is a classic problem in machine learning. The problem is to look at greyscale 28x28 pixel images of handwritten digits and determine which digit the image represents, for all the digits from zero to nine.

Download

At the top of the run_training() function, the input_data$read_data_sets() function will ensure that the correct data has been downloaded to your local training folder and then unpack that data to return a named list of DataSet instances.

NOTE: The fake_data flag is used for unit-testing purposes and may be safely ignored by the reader.

Dataset                 Purpose
data_sets$train         55000 images and labels, for primary training.
data_sets$validation    5000 images and labels, for iterative validation of training accuracy.
data_sets$test          10000 images and labels, for final testing of trained accuracy.

Inputs and Placeholders

The placeholder_inputs() function creates two tf$placeholder ops that define the shape of the inputs, including the batch_size, for the rest of the graph. The actual training examples are later fed into these ops.

Further down, in the training loop, the full image and label datasets are sliced to fit the batch_size for each step, matched with these placeholder ops, and then passed into the sess$run() function using the feed_dict parameter.
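As a rough illustration of that slicing step, the base-R sketch below cuts one batch out of a toy data set and pairs it with placeholder names in a named list. It does not call TensorFlow: the helper next_batch() and the random toy data are hypothetical stand-ins, and the named list only mimics the role that the feed dictionary plays.

```r
# Toy stand-ins for the image and label data (hypothetical, not MNIST).
IMAGE_PIXELS <- 28L * 28L
num_examples <- 250L
set.seed(1)
images <- matrix(runif(num_examples * IMAGE_PIXELS), nrow = num_examples)
labels <- sample(0:9, num_examples, replace = TRUE)

# Hypothetical helper: return the rows that belong to batch number `step`.
# The last batch of a pass may be shorter than batch_size.
next_batch <- function(images, labels, batch_size, step) {
  start <- ((step - 1L) * batch_size) %% nrow(images) + 1L
  idx <- start:min(start + batch_size - 1L, nrow(images))
  list(images = images[idx, , drop = FALSE], labels = labels[idx])
}

batch_size <- 100L
batch <- next_batch(images, labels, batch_size, step = 1L)

# A named list plays the role of the feed dictionary: each entry matches
# one placeholder with the data for this step.
feed <- list(images_placeholder = batch$images,
             labels_placeholder = batch$labels)
```

Each entry of feed has batch_size rows (or elements), which is the shape the placeholders were declared with.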

Build the Graph

After creating placeholders for the data, the graph is built according to a 3-stage pattern: inference(), loss(), and training().

inference() - Builds the graph as far as is required for running
the network forward to make predictions.

loss() - Adds to the inference graph the ops required to generate
loss.

training() - Adds to the loss graph the ops required to compute
and apply gradients.

Inference

The inference() function builds the graph as far as needed to return the tensor that would contain the output predictions.

It takes the images placeholder as input and builds on top of it a pair of fully connected layers with ReLU activation, followed by a ten-node linear layer specifying the output logits.

Each layer is created beneath a unique tf$name_scope that acts as a prefix to the items created within that scope.

For instance, when the weights and biases are created under the hidden1 scope, the unique name given to the weights variable is “hidden1/weights”.

Each variable is given initializer ops as part of its construction.

In this most common case, the weights are initialized with tf$truncated_normal and given the shape of a 2-D tensor, with the first dimension representing the number of units in the layer from which the weights connect and the second dimension representing the number of units in the layer to which the weights connect. For the first layer, named hidden1, the dimensions are shape(IMAGE_PIXELS, hidden1_units) because the weights connect the image inputs to the hidden1 layer. The tf$truncated_normal initializer draws values from a normal distribution with a given mean and standard deviation, re-drawing any value that falls more than two standard deviations from the mean.

Then the biases are initialized with tf$zeros to ensure they start with all zero values, and their shape is simply the number of units in the layer to which they connect.
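The initialization described above can be imitated in base R without TensorFlow. The sketch below is an approximation under stated assumptions: truncated_normal() is a hypothetical helper that re-draws values beyond two standard deviations, hidden1_units = 128 is an illustrative size, and the standard deviation of 1 / sqrt(IMAGE_PIXELS) is an assumed choice.

```r
# Draw n values from N(mean, sd), re-drawing any value that falls more
# than two standard deviations from the mean.
truncated_normal <- function(n, mean = 0, sd = 1) {
  x <- rnorm(n, mean, sd)
  out_of_range <- abs(x - mean) > 2 * sd
  while (any(out_of_range)) {
    x[out_of_range] <- rnorm(sum(out_of_range), mean, sd)
    out_of_range <- abs(x - mean) > 2 * sd
  }
  x
}

IMAGE_PIXELS <- 28L * 28L
hidden1_units <- 128L   # assumed layer size, for illustration only

set.seed(42)
# Weights: one row per input unit, one column per unit in hidden1.
weights <- matrix(
  truncated_normal(IMAGE_PIXELS * hidden1_units, sd = 1 / sqrt(IMAGE_PIXELS)),
  nrow = IMAGE_PIXELS, ncol = hidden1_units
)

# Biases: one zero per unit in hidden1.
biases <- numeric(hidden1_units)
```

The weight matrix has IMAGE_PIXELS rows and hidden1_units columns, and the bias vector has one entry per unit in the layer it feeds.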

The graph’s three primary ops – two tf$nn$relu ops wrapping tf$matmul for the hidden layers and one extra tf$matmul for the logits – are then created, each in turn, with separate tf$Variable instances connected to each of the input placeholders or the output tensors of the previous layer.
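To make the arithmetic of those three ops concrete, here is a base-R sketch of the same forward pass on plain matrices. It is not TensorFlow code: the layer sizes, the params list, and the init() helper (which uses ordinary normal draws for brevity) are assumptions made for the example.

```r
relu <- function(x) pmax(x, 0)   # pmax keeps the matrix dimensions

# Add a bias vector to every row of a matrix.
add_bias <- function(m, b) sweep(m, 2, b, `+`)

# Two ReLU hidden layers followed by a linear layer that yields the logits.
inference <- function(images, params) {
  hidden1 <- relu(add_bias(images %*% params$w1, params$b1))
  hidden2 <- relu(add_bias(hidden1 %*% params$w2, params$b2))
  add_bias(hidden2 %*% params$w3, params$b3)
}

# Assumed sizes, for illustration only.
IMAGE_PIXELS <- 784L
hidden1_units <- 128L
hidden2_units <- 32L
NUM_CLASSES <- 10L

set.seed(7)
# Hypothetical initializer: small random weights for an n_in x n_out layer.
init <- function(n_in, n_out) {
  matrix(rnorm(n_in * n_out, sd = 1 / sqrt(n_in)), n_in, n_out)
}
params <- list(
  w1 = init(IMAGE_PIXELS, hidden1_units),  b1 = numeric(hidden1_units),
  w2 = init(hidden1_units, hidden2_units), b2 = numeric(hidden2_units),
  w3 = init(hidden2_units, NUM_CLASSES),   b3 = numeric(NUM_CLASSES)
)

# A toy batch of five "images" with random pixel values.
images <- matrix(runif(5 * IMAGE_PIXELS), nrow = 5)
logits <- inference(images, params)
```

The result has one row per image and one column per class, matching the ten-node output layer.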

Loss

The loss() function further builds the graph by adding the required loss ops.

First, the values from the labels_placeholder are converted to 64-bit integers. Then, a tf$nn$sparse_softmax_cross_entropy_with_logits op is added to automatically produce 1-hot labels from the labels_placeholder and compare the output logits from the inference() function with those 1-hot labels.
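What such an op computes can be sketched in base R. The function below is a hypothetical re-implementation for illustration, not the TensorFlow op itself; averaging the per-example values into a single scalar is an assumption about how the loss is reduced.

```r
# logits: batch_size x num_classes matrix; labels: integer class ids from 0.
sparse_softmax_cross_entropy <- function(logits, labels) {
  # Subtract each row's maximum before exponentiating, for numerical stability.
  shifted <- logits - apply(logits, 1, max)
  log_probs <- shifted - log(rowSums(exp(shifted)))
  # Pick the log-probability of the true class in every row. This has the
  # same effect as multiplying by a 1-hot label matrix and summing each row.
  # Labels are 0-based, R indexing is 1-based.
  picked <- log_probs[cbind(seq_len(nrow(logits)), labels + 1L)]
  -picked
}

logits <- rbind(c(2, 1, 0),
                c(0, 0, 0))
labels <- c(0L, 2L)

per_example <- sparse_softmax_cross_entropy(logits, labels)
loss <- mean(per_example)   # assumed reduction: average over the batch
```

The second row has uniform logits over three classes, so its cross-entropy is log(3) regardless of which label is the true one.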

Note: Cross-entropy is an idea from information theory that allows us to describe how bad it is to believe the predictions of the neural network, given what is actually true. For more information, read the blog post Visual Information Theory (http://colah.github.io/posts/2015-09-Visual-Information/).

Training

The training() function adds the ops required to compute and apply gradients. First, it takes the loss tensor from the loss() function and hands it to a tf$summary$scalar, an op for generating summary values into the events file when used with a tf$summary$FileWriter (see below). In this case, it will emit the snapshot value of the loss every time the summaries are written out.

We then create a single variable to hold a counter for the global training step, and the optimizer's minimize() op is used both to update the trainable weights in the system and to increment the global step. This op is, by convention, known as the train_op and is what must be run by a TensorFlow session in order to induce one full step of training (see below).
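The two effects of one training step, a weight update and a counter increment, can be mimicked in base R. The sketch below assumes plain gradient descent on a toy quadratic loss with a hand-written gradient; train_step(), the state list, and the learning rate of 0.1 are all hypothetical and only illustrate the idea.

```r
# Toy loss with a known gradient: loss(w) = sum((w - target)^2).
target <- c(1, -2, 3)
loss_fn <- function(w) sum((w - target)^2)
grad_fn <- function(w) 2 * (w - target)

# State touched by a training step: trainable weights and a step counter.
state <- list(weights = c(0, 0, 0), global_step = 0L)

# One training step: apply the gradient and increment the counter.
train_step <- function(state, learning_rate) {
  state$weights <- state$weights - learning_rate * grad_fn(state$weights)
  state$global_step <- state$global_step + 1L
  state
}

learning_rate <- 0.1   # assumed value
loss_before <- loss_fn(state$weights)
state <- train_step(state, learning_rate)
loss_after <- loss_fn(state$weights)
```

After one call the counter reads 1 and the loss has dropped, which is the behaviour a single run of the train_op is meant to produce.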

The sess$run() method will run the complete subset of the graph that corresponds to the op(s) passed as parameters. In this first call, the init op is a tf$group that contains only the initializers for the variables. None of the rest of the graph is run here; that happens in the training loop below.

Train Loop

After initializing the variables with the session, training may begin.

User code controls the training step by step; the simplest loop that can do useful training just runs the train_op once per step.

When both the train_op and the loss tensor are fetched in a single call, there are two values to fetch, so sess$run() returns a list with two items. Each Tensor in the list of values to fetch corresponds to an array in the returned list, filled with the value of that tensor during this step of training. Since train_op is an Operation with no output value, the corresponding element in the returned list is NULL and is therefore discarded. However, the value of the loss tensor may become NaN if the model diverges during training, so we capture this value for logging.

Assuming that the training runs fine without NaNs, the training loop also prints a simple status text every 100 steps to let the user know the state of training.
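The shape of such a loop, including the NaN check and the periodic status line, can be sketched in base R. Everything here is a stand-in: run_step() is a hypothetical substitute for a session call that fetches a training op and a loss, and the toy linear-regression data, learning rate, and step counts are assumptions.

```r
set.seed(3)
# Toy regression data (hypothetical, not MNIST).
x <- matrix(rnorm(200 * 3), ncol = 3)
true_w <- c(0.5, -1, 2)
y <- as.vector(x %*% true_w)

w <- c(0, 0, 0)
learning_rate <- 0.05
max_steps <- 300L
batch_size <- 50L

# Hypothetical stand-in for fetching a training op and a loss in one call:
# updates w in the enclosing environment and returns list(NULL, loss).
run_step <- function(batch_x, batch_y) {
  err <- as.vector(batch_x %*% w) - batch_y
  loss_value <- mean(err^2)
  grad <- 2 * as.vector(t(batch_x) %*% err) / length(err)
  w <<- w - learning_rate * grad
  list(NULL, loss_value)
}

for (step in seq_len(max_steps)) {
  idx <- sample(nrow(x), batch_size)
  values <- run_step(x[idx, , drop = FALSE], y[idx])
  loss_value <- values[[2]]   # the first element is NULL and is ignored

  # Stop early if the model has diverged.
  if (is.nan(loss_value)) stop("Model diverged with loss = NaN")

  # Print a short status line every 100 steps.
  if (step %% 100 == 0) {
    cat(sprintf("Step %d: loss = %.4f\n", step, loss_value))
  }
}
```

Only the second element of the returned list is kept; the first corresponds to the training step itself, which produces no value.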

Note that more complicated usage would usually sequester data_sets$test so that it is only checked after significant amounts of hyperparameter tuning. For the sake of a simple little MNIST problem, however, we evaluate against all of the data.

Build the Eval Graph

Before entering the training loop, the Eval op should have been built by calling the evaluation() function with the same logits/labels parameters as the loss() function.

The evaluation() function simply generates a tf$nn$in_top_k op that can automatically score each model output as correct if the true label can be found in the K most-likely predictions. In this case, we set the value of K to 1 to only consider a prediction correct if it is for the true label.

The true_count variable simply accumulates all of the predictions that the in_top_k op has determined to be correct. From there, the precision may be calculated by dividing by the total number of examples.
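The scoring and accumulation can be illustrated in base R. The in_top_k() function below is a hypothetical re-implementation on plain matrices, not the TensorFlow op, and the logits, labels, and batch size are made-up values.

```r
# TRUE for each row whose true label is among the k highest-scoring classes.
in_top_k <- function(logits, labels, k = 1L) {
  vapply(seq_len(nrow(logits)), function(i) {
    top <- order(logits[i, ], decreasing = TRUE)[seq_len(k)]
    (labels[i] + 1L) %in% top   # labels are 0-based, R indexing is 1-based
  }, logical(1))
}

# Made-up model outputs for four examples and three classes.
logits <- rbind(c(0.1, 2.0, 0.3),
                c(1.5, 0.2, 0.9),
                c(0.4, 0.1, 0.2),
                c(0.3, 0.8, 0.5))
labels <- c(1L, 2L, 0L, 1L)

# Accumulate the number of correct predictions batch by batch.
true_count <- 0L
batch_size <- 2L
for (start in seq(1L, nrow(logits), by = batch_size)) {
  idx <- start:(start + batch_size - 1L)
  correct <- in_top_k(logits[idx, , drop = FALSE], labels[idx], k = 1L)
  true_count <- true_count + sum(correct)
}

precision <- true_count / length(labels)
```

With K set to 1, only the second example is scored as wrong here, because its highest logit belongs to class 0 while its true label is 2.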