Building a Convolutional Neural Network (CNN) for handwritten digit recognition in Python using Keras

In this article we will explore one way to build a Convolutional Neural Network from scratch in Python, using the Keras API with a TensorFlow backend. We will work on the handwritten digits dataset from Kaggle (https://www.kaggle.com/c/digit-recognizer), using the training and testing datasets separately as given.

First, import the basic libraries needed for computation, such as numpy.

numpy is used primarily for mathematical calculations, but it matters more here because the neural network takes numpy arrays as its input.

Here, x_train refers to the input of the training set and y_train refers to the output, or ground truths, of the training set. The ground truths for the test set are what we need to find out, so for the test set we only have the input, i.e. x_test.
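As a minimal sketch of how these three arrays could be produced with pandas: the helper name load_digit_data is hypothetical, and the file names train.csv and test.csv and the column name label are assumptions about how the Kaggle files are laid out, so adjust them to match your download.

```python
import pandas as pd


def load_digit_data(train_path="train.csv", test_path="test.csv"):
    """Return x_train, y_train and x_test as numpy arrays.

    Assumes the training file has a "label" column followed by pixel
    columns, and the test file has pixel columns only.
    """
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)

    # Ground truths of the training set.
    y_train = train["label"].to_numpy()
    # Everything except the label column is the input.
    x_train = train.drop(columns=["label"]).to_numpy()
    # The test set has no labels, so the whole table is the input.
    x_test = test.to_numpy()
    return x_train, y_train, x_test
```

Calling x_train, y_train, x_test = load_digit_data() from the folder that holds the two CSV files would then give the arrays used in the rest of the article.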

Now, each row of our datasets holds one picture of a handwritten digit, with every pixel stored as a separate entry, i.e. 784 pixel values. We need to convert each row into a 28 * 28 2D numpy array.
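The reshaping step can be sketched with numpy alone. The helper name to_images is hypothetical, and the trailing channel axis of size 1 is an assumption: Keras convolutional layers expect a channel dimension, so each 28 * 28 image is stored as 28 * 28 * 1.

```python
import numpy as np


def to_images(flat_pixels):
    """Reshape rows of 784 pixel values into 28 x 28 single-channel images."""
    flat_pixels = np.asarray(flat_pixels)
    # -1 lets numpy infer the number of images from the data.
    return flat_pixels.reshape(-1, 28, 28, 1)


# Two fake "images" whose pixel values are simply 0, 1, 2, ...
demo = np.arange(2 * 784).reshape(2, 784)
images = to_images(demo)
print(images.shape)  # (2, 28, 28, 1)
```

The same call would be applied to both x_train and x_test so that the two sets have the same layout.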

Now, let us encode our ground truths, i.e. convert the labels into a form that is easier to work with. For example, we will convert 2 into [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]. In other words, the position for 2 is switched on and the positions for all the other digits from 0 to 9 are switched off.
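This encoding is usually called one-hot encoding. Below is a numpy-only sketch of the idea; the helper name one_hot is hypothetical and is not necessarily what the original code used. Keras also ships a to_categorical utility in keras.utils that produces the same kind of array.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for digit i.
    return np.eye(num_classes, dtype="float32")[labels]


print(one_hot([2])[0])  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```

Applying it to y_train gives an array with one row per training image and ten columns, one per digit.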

With all of our data preprocessed, we are ready to build the convolutional neural network. The CNN will be trained on the training set, i.e. take x_train as the input and compare the output with y_train. Then we will predict the output for x_test.
So first we import the necessary libraries.

Notice that Keras already has templates for the layers that we will need in a CNN, including the convolutional layer (Conv2D), the max pooling layer (MaxPool2D), and the flattening layer (Flatten).
We will build the CNN using the Sequential model, which stacks the layers one at a time, in order.

So here we first initialise the classifier as a Sequential classifier and then add the convolutional and max pooling layers. Depending on the results, we can add more convolutional layers to try to improve accuracy.
After max pooling, we flatten the processed input and feed it into an artificial neural network (ANN) with only one hidden layer.
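The architecture described above could be sketched as follows. Every hyperparameter here is an assumption chosen for illustration, not the article's original setting: 32 filters of size 3 * 3, a 2 * 2 pooling window, 128 hidden units, the adam optimizer and categorical cross-entropy loss. The function name build_classifier is also hypothetical.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense


def build_classifier():
    """Build a small CNN: conv -> max pool -> flatten -> one hidden layer."""
    classifier = Sequential()
    # Each input is a 28 x 28 image with a single channel.
    classifier.add(Input(shape=(28, 28, 1)))
    # Convolutional layer (filter count and size are assumed values).
    classifier.add(Conv2D(32, (3, 3), activation="relu"))
    # Max pooling halves the height and width of the feature maps.
    classifier.add(MaxPool2D(pool_size=(2, 2)))
    # Flatten the feature maps into one long vector for the ANN.
    classifier.add(Flatten())
    # The single hidden layer (size is an assumed value).
    classifier.add(Dense(128, activation="relu"))
    # One output per digit; softmax turns the outputs into probabilities.
    classifier.add(Dense(10, activation="softmax"))
    classifier.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return classifier
```

Training would then look something like history = build_classifier().fit(x_train, y_train, epochs=10), where the number of epochs is again an assumed value; the returned history object records the loss and accuracy for each epoch.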

After our model has trained for the number of epochs that we specified, we can plot the model loss as well as the accuracy. In general, we expect the loss to decrease and the accuracy to increase as the number of epochs grows. We will plot both using matplotlib.
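A minimal plotting sketch, assuming the per-epoch values are available as a dictionary with the keys loss and accuracy (this is what the history attribute of the object returned by a Keras fit call contains when the model is compiled with the accuracy metric). The helper name plot_history is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot loss and accuracy per epoch side by side and return the figure."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(history["loss"])
    ax_loss.set_title("Model loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")

    ax_acc.plot(history["accuracy"])
    ax_acc.set_title("Model accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")

    fig.tight_layout()
    return fig
```

With a trained model, plot_history(history.history) followed by plt.show() would display the two curves.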

Now we must remember that we encoded our ground truths and only switched on the digit that each ground truth held. So the prediction will be an array of arrays, where each sub-array contains 10 elements representing the digits from 0 to 9, and the value of each element denotes the probability of the output being that digit. We therefore need to convert this so that the output contains the digit with the maximum probability.

predict = np.argmax(predict, axis = 1)
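To see what this line does on a small scale, here is a toy example with two made-up probability rows; the numbers are invented for illustration and are not real model output.

```python
import numpy as np

# Each row holds the probabilities for the digits 0 to 9 for one image.
probabilities = np.array([
    [0.05, 0.05, 0.80, 0.10, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02, 0.0, 0.0, 0.0, 0.0],
])

# axis=1 picks, for every row, the index of the largest probability.
digits = np.argmax(probabilities, axis=1)
print(digits)  # [2 0]
```

Because the index of each element is the digit it stands for, the index of the maximum is the predicted digit itself.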

Thus, the digit that had the maximum probability for each image is now stored in predict. If we print predict, we get array([2, 0, 9, ..., 3, 9, 2]), so the first three digits were predicted as 2, 0 and 9. Let us check this by plotting the x_test values.
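One way to make this check is to draw the first few test images with their predicted digits as titles. This is a sketch with a hypothetical helper name, show_digits, and it assumes each image holds 784 values in any layout that can be reshaped to 28 * 28.

```python
import numpy as np
import matplotlib.pyplot as plt


def show_digits(images, predicted, count=3):
    """Draw the first `count` images, each titled with its predicted digit."""
    fig, axes = plt.subplots(1, count, figsize=(2 * count, 2))
    # np.atleast_1d keeps the loop working when count is 1.
    for ax, image, label in zip(np.atleast_1d(axes), images, predicted):
        # Drop any channel axis so imshow receives a plain 28 x 28 array.
        ax.imshow(np.asarray(image).reshape(28, 28), cmap="gray")
        ax.set_title(f"Predicted: {label}")
        ax.axis("off")
    return fig
```

Calling show_digits(x_test, predict) followed by plt.show() would display the first three test images, which can then be compared by eye with the predicted digits.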