I recently started working with PyTorch, a Python framework for neural networks and machine learning. Since machine learning involves processing large amounts of data, it can sometimes be hard to interpret the results that come back from a network. Before getting into anything more complicated, let's replicate a really basic backpropagation calculation as a sanity check. To run the code in this article, you'll need to install NumPy and PyTorch.

In the neural networks primer, we saw how to manually calculate the forward and back propagation for a tiny network consisting of one input neuron, one hidden neuron, and one output neuron.

We ran an input of 0.8 through the network, then backpropagated using 1 as the target value, with a learning rate of 0.1. We used sigmoid as the activation function and the quadratic cost function to compare the actual output from the network with the desired output.

loss = criterion(output, target) calculates the cost, also known as the loss.
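For this network, the criterion is MSELoss, which computes the plain mean squared error. A small standalone check (the 0.7 output value here is just an illustrative number, not the network's actual output):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
output = torch.tensor([0.7])   # illustrative network output
target = torch.tensor([1.0])   # desired output

loss = criterion(output, target)        # (0.7 - 1.0)^2 = 0.09
same = ((output - target) ** 2).mean()  # the same thing by hand
```

Note there is no factor of 1/2, which matters when comparing against the hand-derived quadratic cost later on.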

Next we use net.zero_grad() to reset the gradient to zero (otherwise the backpropagation is cumulative). It isn't strictly necessary here, but it's good to keep this in mind when running backpropagation in a loop.
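The cumulative behavior is easy to see in isolation. This is a small standalone sketch, separate from the network above:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# first backward pass: d(w*w)/dw = 2w = 4
(w * w).backward()
print(w.grad)  # tensor(4.)

# without zeroing, the second pass adds to the existing gradient
(w * w).backward()
print(w.grad)  # tensor(8.)

# resetting the gradient first gives the correct value again
w.grad.zero_()
(w * w).backward()
print(w.grad)  # tensor(4.)
```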

loss.backward() computes the gradient, i.e. the derivative of the cost with respect to all of the weights and biases.

Finally, we use this gradient to update the weights and biases in the network using the SGD (stochastic gradient descent) optimizer, with a learning rate of 0.1.

We print out the network topology as well as the weights, biases, and output, both before and after the backpropagation step.
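Putting those steps together, a minimal version of the experiment looks something like this. The input, target, and learning rate match the values above; the 1-1-1 layer structure follows the description, but PyTorch's random initial weights will differ from the hand-calculated example:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# tiny network: one input neuron, one hidden neuron, one output neuron,
# with sigmoid activations to match the manual calculation
net = nn.Sequential(
    nn.Linear(1, 1), nn.Sigmoid(),
    nn.Linear(1, 1), nn.Sigmoid(),
)

criterion = nn.MSELoss()  # quadratic cost, without the division by 2
optimizer = optim.SGD(net.parameters(), lr=0.1)

x = torch.tensor([[0.8]])
target = torch.tensor([[1.0]])

print("before:", list(net.parameters()))

output = net(x)
loss = criterion(output, target)  # compute the cost

net.zero_grad()    # reset gradients (they accumulate otherwise)
loss.backward()    # backpropagate
optimizer.step()   # update weights and biases

print("after:", list(net.parameters()))
```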

Below, let's replicate this calculation with plain Python. This calculation is almost the same as the one we saw in the neural networks primer. The only difference is that PyTorch's MSELoss function doesn't have the extra division by 2, so in the code below, I've adjusted dc_da_l2 = 2 * (a_l2-1) to match what PyTorch does:
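A sketch of that plain-Python calculation is below. The starting weights and biases are placeholders (the real ones come from the primer article); the variable names and the adjusted dc_da_l2 line follow the text:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# placeholder starting weights and biases (assumed, not from the primer)
w_l1, b_l1 = 1.58, -0.14  # hidden layer
w_l2, b_l2 = 2.45, -0.11  # output layer

x, target, lr = 0.8, 1.0, 0.1

# forward pass
z_l1 = w_l1 * x + b_l1
a_l1 = sigmoid(z_l1)
z_l2 = w_l2 * a_l1 + b_l2
a_l2 = sigmoid(z_l2)

# backward pass; MSELoss has no division by 2, hence the factor of 2
dc_da_l2 = 2 * (a_l2 - target)
dc_dz_l2 = dc_da_l2 * a_l2 * (1 - a_l2)  # sigmoid derivative
dc_dw_l2 = dc_dz_l2 * a_l1
dc_db_l2 = dc_dz_l2

dc_da_l1 = dc_dz_l2 * w_l2
dc_dz_l1 = dc_da_l1 * a_l1 * (1 - a_l1)
dc_dw_l1 = dc_dz_l1 * x
dc_db_l1 = dc_dz_l1

# gradient descent update with learning rate 0.1
w_l2 -= lr * dc_dw_l2
b_l2 -= lr * dc_db_l2
w_l1 -= lr * dc_dw_l1
b_l1 -= lr * dc_db_l1
```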

This network achieves about 97% accuracy on the test dataset, which seems consistent with the results in the book (96.59%).

To train a convolutional network (as described in chapter 6 of Michael Nielsen's book), run:

python pytorch_mnist_convnet.py

This network achieves better than 99% accuracy on the test dataset, which is close to the results in the book (99.23%). However, in the book, a lambda value of 0.1 is used for L2 weight regularization. My results got a lot worse when using this value for weight decay. It's possible…