II. Filters:

Filter weights are initialized randomly (or from some empirical scheme) and are learned during training to become feature detectors. Filters in the first convolutional layer tend to learn low-level features (edges, color blobs, etc.), while filters in subsequent layers learn progressively higher-level features.
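To make "feature detector" concrete, here is a minimal sketch: a hand-crafted vertical-edge filter (the values are illustrative, not learned weights) cross-correlated over a toy image. First-layer filters often end up looking like this on their own.

```python
import numpy as np

# Hand-crafted vertical-edge detector: responds where dark meets bright.
edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=float)

# Toy 5x5 image: dark on the left half, bright on the right half.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

def correlate2d(img, kernel):
    """Valid cross-correlation (what conv layers actually compute)."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

response = correlate2d(image, edge_filter)
# The response is strongest at the dark-to-bright boundary.
```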

III. Conv Layer Output:

Unlike FC layers, each neuron in a conv layer connects only to a local region of the input (its receptive field), not to the whole image.

We can control the output dimensions by adjusting the zero-padding and stride.
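The standard formula ties these together: output size = (W − F + 2P) / S + 1, where W is the input size, F the filter size, P the zero-padding, and S the stride. A small sketch (the helper name is my own):

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    assert (w - f + 2 * p) % s == 0, "filter does not tile the input evenly"
    return (w - f + 2 * p) // s + 1

# AlexNet's first layer: 227x227 input, 11x11 filters, no padding, stride 4.
print(conv_output_size(227, 11, 0, 4))  # -> 55

# "Same" padding for a 28x28 MNIST image with 5x5 filters, stride 1.
print(conv_output_size(28, 5, 2, 1))    # -> 28
```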

IV. Parameter Sharing:

Each filter convolves over the input image and produces a 55 × 55 2D activation map. The K filters together form the output of the first conv layer, which is 55 × 55 × K. All 55 × 55 neurons within a given depth slice of that output share the same weights (11 × 11 × 3). The intuition is that each filter represents one feature, and it makes sense to look for that feature at every location in the image using the same weights.
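A quick back-of-the-envelope count shows why sharing matters. Using the layer sizes above (K = 96 is the AlexNet value and is an assumption here, not stated in the text):

```python
# With parameter sharing: each of the K filters has 11*11*3 weights + 1 bias,
# reused at every spatial position.
K = 96
shared = K * (11 * 11 * 3 + 1)

# Without sharing: every one of the 55*55 positions in each depth slice
# would need its own 11*11*3 weights (+ bias).
unshared = K * 55 * 55 * (11 * 11 * 3 + 1)

print(shared)    # 34,944 parameters
print(unshared)  # 105,705,600 parameters -- ~3000x more
```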

V. ReLU Units:

ReLU is applied elementwise to each depth slice (the 2D output produced by the input and filter i). ReLU performs better than sigmoid and tanh units because it suffers less from vanishing gradients, but it still has pitfalls of its own (e.g., dying ReLUs). For a more comprehensive look at different non-linear units and the advantages and disadvantages of each, check out this post.
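The operation itself is a one-liner; a minimal numpy sketch:

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x): negatives are clipped, positives pass through."""
    return np.maximum(0, x)

activations = np.array([[-2.0, 0.5],
                        [ 3.0, -0.1]])
out = relu(activations)
# [[0.  0.5]
#  [3.  0. ]]
```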

VI. Pooling:

Pooling downsamples each depth slice independently. Here we see max-pooling, but other methods can be used as well (average pooling, L2-norm pooling, etc.).

The intuition is to shrink the size of what we have to process: pooling still captures enough of the information we need while cutting the spatial dimensions.
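A minimal sketch of 2×2 max-pooling with stride 2 on a single depth slice (using a numpy reshape trick rather than explicit loops):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling, stride 2: keep the max of each non-overlapping block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

slice_ = np.array([[1, 3, 2, 1],
                   [4, 2, 0, 5],
                   [6, 1, 1, 2],
                   [0, 7, 3, 4]], dtype=float)
pooled = max_pool_2x2(slice_)
# [[4. 5.]
#  [7. 4.]]
```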

Many current models are leaning towards reducing or completely removing pooling, since it leads to a loss of information. You will start to see clever architectures based entirely on convolutional operations with smaller filter sizes.

VII. FC Layers:

The final layers of the CNN are fully-connected (FC) layers: the convolutional output is flattened into a vector and fed through them. The final layer is a classifier that solves our desired task.
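A sketch of that flatten-then-classify step, with random placeholder weights and hypothetical shapes (a 7 × 7 × 64 conv output and 10 classes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Conv output flattened into a single vector for the FC layer.
conv_out = rng.standard_normal((7, 7, 64))
flat = conv_out.reshape(-1)                  # shape (3136,)

# One FC layer as a matrix multiply; weights here are random placeholders.
W = rng.standard_normal((3136, 10)) * 0.01
b = np.zeros(10)
logits = flat @ W + b                        # 10 class scores

# Softmax turns the scores into class probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```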

LeNet5

VIII. Backpropagation:

Backpropagation through the convolution and pooling layers is slightly convoluted, but take a look at a naive Python implementation while reading through the math for a better understanding.
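As one concrete piece, backprop through a 2×2 max-pool simply routes each upstream gradient to the position that achieved the max; everything else gets zero. This is my own naive sketch, not the implementation the text refers to:

```python
import numpy as np

def max_pool_backward(x, grad_out):
    """Route each upstream gradient to the argmax of its 2x2 window."""
    grad_in = np.zeros_like(x)
    oh, ow = grad_out.shape
    for i in range(oh):
        for j in range(ow):
            window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            # Position of the max inside this window.
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_in[2 * i + r, 2 * j + c] = grad_out[i, j]
    return grad_in

x = np.array([[1.0, 3.0],
              [4.0, 2.0]])
g = max_pool_backward(x, np.array([[1.0]]))
# Only the max entry (the 4) receives the gradient:
# [[0. 0.]
#  [1. 0.]]
```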

IX. Code Analysis:

Unlike the previous examples, we will not be building the CNN from scratch with just basic numpy; instead, we will use TensorFlow abstractions. I will share my numpy-only CNN with complete backpropagation in a later post.

We will be using input_data.py to load the data, but this time keep in mind that X is of shape [N × 28 × 28 × 1]. With this shape, we can apply our filters and convolve over the images. We will also keep our model at ckpt_dir = "CNN_ckpt_dir" and restore an old model if one is available.

Once again, we will be splitting our data into batches for processing.
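A sketch of that batching step (the generator name and sizes are my own, not the post's exact code):

```python
import numpy as np

def iterate_batches(X, y, batch_size):
    """Yield successive (inputs, labels) batches; the last may be smaller."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Hypothetical dataset: 100 MNIST-shaped images with one-hot labels.
X = np.zeros((100, 28, 28, 1))
y = np.zeros((100, 10))
batches = list(iterate_batches(X, y, 32))
# 4 batches: sizes 32, 32, 32, and a final partial batch of 4.
```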

And of course, training proceeds as usual, with results reported for training and validation. Note that we now save our model at the end of each training epoch; the next time we call create_model(), we will start from the restored checkpoint.