TensorFlow

Developers often say that if you want to get started with machine learning, you should first learn how the algorithms work. But my experience shows otherwise.

I say you should first be able to see the big picture: how the applications work. Once you understand this, it becomes much easier to dive in deep and explore the inner workings of the algorithms.

So how do you develop an intuition and achieve this big-picture understanding of machine learning? A good way to do this is by creating machine learning models.

Assuming you don’t yet know how to implement these algorithms from scratch, you’ll want to use a library that has them already implemented for you. One such library is TensorFlow.

In this article, we’ll create a machine learning model to classify texts into categories. We’ll cover the following topics:

How TensorFlow works

What is a machine learning model

What is a Neural Network

How the Neural Network learns

How to manipulate data and pass it to the Neural Network inputs

How to run the model and get the prediction results

You will probably learn a lot of new things, so let’s start!

TensorFlow is an open-source library for machine learning, first created by Google. The name of the library helps us understand how we work with it: tensors are multidimensional arrays that flow through the nodes of a graph.

tf.Graph

Every computation in TensorFlow is represented as a dataflow graph. This graph has two elements:

a set of tf.Operation objects, which represent units of computation

a set of tf.Tensor objects, which represent units of data

To see how all this works you will create this dataflow graph:

You’ll define x = [1,3,6] and y = [1,1,1]. As the graph works with tf.Tensor to represent units of data, you will create constant tensors:

This is how the TensorFlow workflow goes: you first create a graph, and only then can you perform the computations (that is, actually ‘run’ the graph nodes that hold operations). To run the graph you’ll need to create a tf.Session.

tf.Session

A tf.Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated (from the docs). To do that, we need to define which graph will be used in the Session:

To execute the operations, you’ll use the method tf.Session.run(). This method executes one ‘step’ of the TensorFlow computation by running the necessary graph fragment to execute every Operation object and evaluate every Tensor passed in the fetches argument. In your case you will run a step of the sum operation:
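To make the "build first, run later" idea concrete, here is a toy sketch in plain Python. It does not use TensorFlow: the Constant and Add classes and the run() function are hypothetical stand-ins for constant tensors, an operation, and a session's run step, and they only mimic the order in which things happen.

```python
# Toy illustration only: these names are hypothetical and are NOT TensorFlow's API.
class Constant:
    """A graph node that holds fixed data (the role of a constant tensor)."""

    def __init__(self, values):
        self.values = list(values)

    def evaluate(self):
        return self.values


class Add:
    """A graph node for an element-wise sum (the role of an operation)."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        a = self.left.evaluate()
        b = self.right.evaluate()
        return [u + v for u, v in zip(a, b)]


def run(fetch):
    """Stand-in for a session's run step: evaluate the requested node."""
    return fetch.evaluate()


# Building the graph computes nothing yet.
x = Constant([1, 3, 6])
y = Constant([1, 1, 1])
op = Add(x, y)

# Only running the graph produces a value.
result = run(op)
print(result)  # [2, 4, 7]
```

Creating x, y and op only describes the computation. The sum is computed when run(op) is called, which is the role tf.Session.run() plays in TensorFlow.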

Now that you know how TensorFlow works, you have to learn how to create a predictive model. In short,

Machine learning algorithm + data = predictive model

The process to construct a model is like this:

As you can see, the model consists of a machine learning algorithm ‘trained’ with data. When you have the model you will get results like this:

The goal of the model you will create is to classify texts into categories, so we define:

input: text, result: category

We have a training dataset with all the texts labeled (every text has a label indicating which category it belongs to). In machine learning this type of task is called supervised learning.

“We know the correct answers. The algorithm iteratively makes predictions on the training data and is corrected by the teacher.” — Jason Brownlee

You’ll classify data into categories, so it’s also a classification task.

To create the model, we’re going to use Neural Networks.

A neural network is a computational model (a way to describe a system using mathematical language and mathematical concepts). These systems are self-learning and trained, rather than explicitly programmed.

Neural networks are inspired by our central nervous system. They have connected nodes that are similar to our neurons.

The Perceptron was the first neural network algorithm. This article explains the inner workings of a perceptron really well (the “Inside an artificial neuron” animation is fantastic).

To understand how a neural network works we will actually build a neural network architecture with TensorFlow. This architecture was used by Aymeric Damien in this example.

Neural Network architecture

The neural network will have 2 hidden layers (you have to choose how many hidden layers the network will have; this choice is part of the architecture design). The job of each hidden layer is to transform the inputs into something that the output layer can use.

Hidden layer 1

You also need to define how many nodes the 1st hidden layer will have. These nodes are also called features or neurons, and in the image above they are represented by each circle.

In the input layer every node corresponds to a word of the dataset (we will see how this works later).

As explained here, each node (neuron) is multiplied by a weight. Every node has a weight value, and during the training phase the neural network adjusts these values in order to produce a correct output (wait, we will learn more about this in a minute).

In addition to multiplying each input node by a weight, the network also adds a bias (role of bias in neural networks).

In your architecture, after multiplying the inputs by the weights and adding the bias, the data also passes through an activation function. This activation function defines the final output of each node. An analogy: imagine that each node is a lamp; the activation function decides whether the lamp lights up or not.

There are many types of activation functions. You will use the rectified linear unit (ReLU). This function is defined this way:

f(x) = max(0, x) [the output is x or 0 (zero), whichever is larger]

Examples: if x = -1, then f(x) = 0 (zero); if x = 0.7, then f(x) = 0.7.
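The same definition can be written as a one-line function. This is a plain Python sketch for a single number, not the TensorFlow operation itself.

```python
def relu(x):
    """Rectified linear unit: return x when it is positive, otherwise 0."""
    return max(0.0, x)


print(relu(-1))   # 0.0
print(relu(0.7))  # 0.7
```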

Hidden layer 2

The 2nd hidden layer does exactly what the 1st hidden layer does, but now the input of the 2nd hidden layer is the output of the 1st one.

Output layer

And we finally get to the last layer, the output layer. You will use one-hot encoding to get the results of this layer. In this encoding only one bit has the value 1 and all the others are 0. For example, if we want to encode three categories (sports, space and computer graphics):

So the number of output nodes is the number of classes of the input dataset.
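As a small illustration of one-hot encoding, here is a plain Python sketch for the three categories. The order of the categories in the list is an arbitrary assumption; any fixed order works as long as it is used consistently.

```python
# Assumed (arbitrary) ordering of the three categories.
categories = ["sports", "space", "computer graphics"]


def one_hot(category, categories):
    """Return a list with 1 at the category's position and 0 elsewhere."""
    return [1 if c == category else 0 for c in categories]


for name in categories:
    print(name, one_hot(name, categories))
```

Each encoded vector has as many entries as there are categories, which is why the output layer needs one node per class.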

The output layer values are also multiplied by the weights and we also add the bias, but now the activation function is different.

You want to label each text with a category, and these categories are mutually exclusive (a text doesn’t belong to two categories at the same time). To account for this, instead of using the ReLU activation function you will use the Softmax function. This function transforms the output of each unit to a value between 0 and 1 and also makes sure that the sum of all units equals 1. This way the output will tell us the probability of each text for each category.

| 1.2 |                    | 0.457 |
| 0.9 |  -> [softmax] ->   | 0.338 |
| 0.4 |                    | 0.205 |
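To see where these numbers come from, here is a minimal softmax sketch in plain Python (not the TensorFlow implementation). Subtracting the largest value before exponentiating is a common numerical safeguard and does not change the result.

```python
import math


def softmax(values):
    """Map raw scores to probabilities that are positive and sum to 1."""
    shift = max(values)  # avoids overflow, leaves the result unchanged
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.2, 0.9, 0.4])
print([round(p, 3) for p in probs])  # [0.457, 0.338, 0.205]
```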

And now you have the data flow graph of your neural network. Translating everything we saw so far into code, the result is:

(We’ll talk about the code for the output layer activation function later.)
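To show the flow of data through the layers without any library, here is a plain Python sketch of the forward pass. It is an illustration, not the TensorFlow code of this article: the dense() and forward() helpers, the params dictionary, and the tiny layer sizes and weight values are all made up for the example.

```python
def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases):
    """Weighted sum plus bias for every node of one layer.

    weights[i][j] connects input i to node j of the layer.
    """
    outputs = []
    for j in range(len(biases)):
        total = biases[j]
        for i, value in enumerate(inputs):
            total += value * weights[i][j]
        outputs.append(total)
    return outputs


def forward(inputs, params):
    """Two ReLU hidden layers followed by an output layer."""
    h1 = [relu(v) for v in dense(inputs, params["w1"], params["b1"])]
    h2 = [relu(v) for v in dense(h1, params["w2"], params["b2"])]
    # No activation here: the softmax is applied later, together with the loss.
    return dense(h2, params["w_out"], params["b_out"])


# Hypothetical tiny network: 3 inputs, 2 nodes per hidden layer, 3 classes.
params = {
    "w1": [[0.5, -1.0], [0.25, 0.5], [-0.5, 1.0]],
    "b1": [0.0, 0.5],
    "w2": [[1.0, -1.0], [0.5, 0.5]],
    "b2": [0.0, 0.0],
    "w_out": [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5]],
    "b_out": [0.0, 0.0, 0.0],
}
logits = forward([1.0, 2.0, 0.0], params)
print(logits)  # [1.25, 0.0, -1.25]
```

The second hidden layer receives the output of the first one, and the output layer produces one raw score per class.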

As we saw earlier the weight values are updated while the network is trained. Now we will see how this happens in the TensorFlow environment.

tf.Variable

The weights and biases are stored in variables (tf.Variable). These variables maintain state in the graph across calls to run(). In machine learning we usually initialize the weight and bias values by sampling them from a normal distribution.
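As a rough picture of this initialization, the sketch below draws the weights and biases of one layer from a standard normal distribution using Python's random module. The init_layer() helper and the layer sizes are hypothetical, and in TensorFlow the values would live in tf.Variable objects rather than in plain lists.

```python
import random


def init_layer(n_inputs, n_nodes, rng):
    """Draw one layer's weights and biases from a standard normal distribution."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
               for _ in range(n_inputs)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_nodes)]
    return weights, biases


rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights, biases = init_layer(n_inputs=4, n_nodes=3, rng=rng)
print(len(weights), len(weights[0]), len(biases))  # 4 3 3
```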

When we run the network for the first time (that is, the weight values are the ones defined by the normal distribution):

input values: x

weights: w

bias: b

output values: z

expected values: expected

To know if the network is learning or not, you need to compare the output values (z) with the expected values (expected). And how do we compute this difference (loss)? There are many methods to do that. Because we are working with a classification task, the best measure for the loss is the cross-entropy error.

James D. McCaffrey wrote a brilliant explanation about why this is the best method for this kind of task.

With TensorFlow you will compute the cross-entropy error using the tf.nn.softmax_cross_entropy_with_logits() method (this is where the softmax activation function is applied) and calculate the mean error with tf.reduce_mean().
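To make the loss less abstract, here is a plain Python sketch of what "softmax cross-entropy on the raw outputs, then the mean over the batch" computes. It is not the TensorFlow implementation; the function names and the example values are made up, and the log-sum-exp form is used only to keep the arithmetic numerically safe.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(v - shift) for v in logits))
    log_probs = [v - log_sum for v in logits]  # log of the softmax output
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


def mean_loss(batch_logits, batch_labels):
    """Average the per-example losses over the batch."""
    losses = [softmax_cross_entropy(z, y)
              for z, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


# Hypothetical network outputs (z) and expected one-hot labels.
batch_logits = [[1.2, 0.9, 0.4], [0.1, 2.0, 0.3]]
batch_labels = [[1, 0, 0], [0, 1, 0]]
print(mean_loss(batch_logits, batch_labels))
```

The loss gets smaller as the score of the correct class grows relative to the others, which is the signal the training step uses.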

You want to find the best values for the weights and biases in order to minimize the output error (the difference between the value we got and the correct value). To do that you will use the gradient descent method. To be more specific, you will use stochastic gradient descent.

There are also many algorithms that perform gradient descent; you will use Adaptive Moment Estimation (Adam). To use this algorithm in TensorFlow you need to pass the learning_rate value, which determines the size of the incremental steps taken while searching for the best weight values.

The method tf.train.AdamOptimizer(learning_rate).minimize(loss) is syntactic sugar that does two things:

compute_gradients(loss, <list of variables>)

apply_gradients(<list of (gradient, variable) pairs>)

The method updates all the tf.Variables with the new values, so we don’t need to pass the list of variables. And now you have the code to train the network:
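To give a feel for these two steps, here is a sketch of the Adam update rule in plain Python, applied to a single weight and a toy loss (w - 3)^2. The adam_minimize() function, the toy loss, and the step count are made up for illustration; beta1, beta2 and epsilon are set to commonly used default values, and the learning rate of 0.1 is simply a convenient choice for this tiny problem.

```python
import math


def adam_minimize(grad_fn, w, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1000):
    """Minimize a one-variable loss with the Adam update rule."""
    m = 0.0  # running average of the gradients
    v = 0.0  # running average of the squared gradients
    for t in range(1, steps + 1):
        # Step 1: compute the gradient of the loss at the current weight.
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        # Step 2: apply the (rescaled) gradient to the weight.
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


# Toy loss (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_best = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_best)
```

The learning rate scales every update, so it controls how large each step toward the minimum is.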

Now comes the best part: getting the results from the model. First let’s take a closer look at the input dataset.

The dataset

You will use 20 Newsgroups, a dataset with about 18,000 posts on 20 topics. To load this dataset you will use the scikit-learn library. We will use only 3 categories: comp.graphics, sci.space and rec.sport.baseball. scikit-learn provides the dataset in two subsets: one for training and one for testing. The recommendation is that you should never look at the test data, because this can influence your choices while creating the model. You don’t want to create a model that predicts this specific test data; you want to create a model that generalizes well.

In the dataflow graph at the beginning of this article you ran the sum operation, but we can also pass a list of things to run. In this neural network run you will pass two things: the loss calculation and the optimization step.

The feed_dict parameter is where we pass the data for each run step. To pass this data we need to define tf.placeholders (to feed the feed_dict).

As the TensorFlow documentation says:

“A placeholder exists solely to serve as the target of feeds. It is not initialized and contains no data.” — Source

“If you use placeholders for feeding input, you can specify a variable batch dimension by creating the placeholder with tf.placeholder(…, shape=[None, …]). The None element of the shape corresponds to a variable-sized dimension.” — Source

We will feed the dict with a larger batch while testing the model; that’s why you need to define a variable batch dimension.

The get_batches() function returns as many texts as the batch size specifies, together with their labels. And now we can run the model:
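The body of the batching helper is not shown here, so the following is only a guess at what such a function could look like. It assumes that each text becomes a vector of word counts (one position per word of the dataset, matching the input layer) and that each label becomes a one-hot vector. The names build_word_index() and get_batch(), their parameters, and the sample texts are all hypothetical.

```python
def build_word_index(texts):
    """Assign every distinct lowercase word a position in the input layer."""
    vocab = set()
    for text in texts:
        vocab.update(text.lower().split())
    return {word: idx for idx, word in enumerate(sorted(vocab))}


def get_batch(texts, labels, i, batch_size, word_index, n_classes):
    """Return batch number i as (word-count vectors, one-hot labels)."""
    start = i * batch_size
    batch_x, batch_y = [], []
    for text, label in zip(texts[start:start + batch_size],
                           labels[start:start + batch_size]):
        counts = [0] * len(word_index)
        for word in text.lower().split():
            if word in word_index:
                counts[word_index[word]] += 1
        one_hot = [0] * n_classes
        one_hot[label] = 1
        batch_x.append(counts)
        batch_y.append(one_hot)
    return batch_x, batch_y


# Made-up sample data; labels are hypothetical class indices.
texts = ["the rocket reached orbit",
         "the pitcher threw the ball",
         "render the scene"]
labels = [1, 2, 0]
word_index = build_word_index(texts)
batch_x, batch_y = get_batch(texts, labels, 0, 2, word_index, 3)
print(len(batch_x), len(batch_x[0]), batch_y)
```

Each row of batch_x has one entry per word in the vocabulary, and each row of batch_y has one entry per category, which matches the shapes the input and output layers expect.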

Now you have a trained model. To test it, you’ll also need to create graph elements. We’ll measure the accuracy of the model, so you need to get the index of the predicted value and the index of the correct value (because we are using one-hot encoding), check if they are equal, and calculate the mean over the whole test dataset:
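The accuracy calculation described above can be sketched in plain Python as follows. This is an illustration outside the TensorFlow graph; the argmax() and accuracy() helpers and the sample predictions are made up.

```python
def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, one_hot_labels):
    """Fraction of examples whose predicted index equals the correct index."""
    hits = [1.0 if argmax(p) == argmax(y) else 0.0
            for p, y in zip(predictions, one_hot_labels)]
    return sum(hits) / len(hits)


# Hypothetical model outputs and one-hot labels for four texts.
predictions = [[0.46, 0.34, 0.20],
               [0.10, 0.70, 0.20],
               [0.30, 0.30, 0.40],
               [0.60, 0.30, 0.10]]
expected = [[1, 0, 0],
            [0, 1, 0],
            [1, 0, 0],
            [1, 0, 0]]
print(accuracy(predictions, expected))  # 0.75
```

Three of the four predictions point to the same index as the label, so the accuracy is 0.75.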