Geoffrey Hinton is widely regarded as one of the godfathers of deep learning. In the 80's he popularized the backpropagation algorithm, which is the main reason deep learning works so well today, and he kept believing in neural networks during an AI winter in the 80's, when most others thought they wouldn't work. Hinton recently published a paper on the idea of a capsule network, which he had been hinting at for a long time. The paper reports state-of-the-art performance on MNIST, the handwritten digit dataset, and it beat convolutional networks at this. That matters because convolutional neural networks are the current state of the art.

Below is an image of a convolutional network:

A standard multi-layer perceptron, with all the layers fully connected to each other, becomes computationally intractable on images. Images are very high dimensional, consisting of a lot of pixels, and applying these operations to every single pixel in every layer takes far too long. The solution was the convolutional network, popularized by Yann LeCun. A convolutional network looks like the one depicted in the image above. It takes an input image that has an associated label, and this is repeated for an entire dataset. The network learns the mapping between the input data and the output label, so that after training you can give it a picture and it will tell you what it is. The process goes as follows:

First, an input image is fed to the network.

Filters of a given size scan the image and perform convolutions.

The resulting features go through an activation function. The output then passes through a succession of pooling and further convolution operations.

Features are reduced in dimension as the network goes on.

At the end, high-level features are then flattened and fed to fully connected layers, which will eventually yield class probabilities through a softmax layer.

During training time, the network learns how to recognize the features that make a sample belong to a given class through backpropagation.
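The forward pass described in the steps above can be sketched in plain Python. This is only a toy illustration, not a trained network: the 8x8 input, the two 3x3 filters, the 10 output classes, and all helper names (`conv2d`, `max_pool`, `dense`, and so on) are assumptions made up for this example, and the weights are random. Backpropagation is left out.

```python
import math
import random

def conv2d(image, kernel):
    """Slide the kernel over the image with no padding (cross-correlation, as CNNs do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Activation function: keep positive values, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Reduce each non-overlapping size x size window to its maximum."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def flatten(fmaps):
    """Turn a list of 2D feature maps into one flat vector."""
    return [v for fmap in fmaps for row in fmap for v in row]

def dense(vector, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
# Hypothetical 8x8 grayscale input and two random 3x3 filters.
image = [[random.random() for _ in range(8)] for _ in range(8)]
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(2)]

# Convolution -> activation -> pooling: each map goes 8x8 -> 6x6 -> 3x3.
features = [max_pool(relu(conv2d(image, k))) for k in kernels]
flat = flatten(features)  # 2 maps * 3 * 3 = 18 values

num_classes = 10  # assumed, e.g. the ten MNIST digits
weights = [[random.uniform(-0.1, 0.1) for _ in flat] for _ in range(num_classes)]
biases = [0.0] * num_classes
probs = softmax(dense(flat, weights, biases))
print(probs)
```

With random weights the probabilities are meaningless. Training would adjust the filters and the dense weights so that the right class gets the highest probability.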

ConvNets are thus a way to learn features that we would otherwise have had to handcraft. Today anyone can implement a very powerful convolutional network in just a few lines of code, where each line corresponds to a layer in the network.

Moving on to the improvements to CNNs:

One of the major improvements to CNNs was AlexNet. Its key improvements are listed below:

Krizhevsky introduced a better non-linearity in the network with the ReLU activation. This proved to be efficient for gradient propagation.

Used dropout as regularization. From a representation point of view, you force the network to forget things at random during training, so that it cannot rely on any single unit and has to learn more robust features.

Introduced data augmentation. When fed to the network, images are shown with random translations and rotations. That way, the network is forced to be more aware of the attributes of the images, rather than the images themselves.

Went deeper. They stacked more convolutional layers before pooling operations. The representation consequently captures finer features that prove useful for classification.

This network largely outperformed the state of the art back in 2012, with a 15.3% top-5 error on the ImageNet dataset.
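Three of the improvements listed above (ReLU, dropout, and data augmentation) are easy to sketch in plain Python. The snippet below is a rough illustration under stated assumptions, not AlexNet's actual code: the function names are made up for this example, the gradient comparison ignores the weights, the dropout shown is the "inverted" variant that rescales the kept units at training time (a common formulation, not necessarily the one used in the paper), and the augmentation covers only random translation on a tiny 3x3 "image".

```python
import math
import random

def sigmoid_grad(x):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero units at random, rescale the survivors."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

def random_translate(image, max_shift, rng, fill=0):
    """Shift the image by a random offset, filling uncovered pixels."""
    h, w = len(image), len(image[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    shifted = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                shifted[y][x] = image[src_y][src_x]
    return shifted

rng = random.Random(0)

# Gradient propagation: multiply the local derivative across 10 layers,
# assuming the same positive pre-activation (2.0) at every layer.
depth = 10
sigmoid_chain = sigmoid_grad(2.0) ** depth  # shrinks toward zero
relu_chain = relu_grad(2.0) ** depth        # stays at 1.0
print("sigmoid chain:", sigmoid_chain, "relu chain:", relu_chain)

# Dropout: about half of the units are zeroed, the rest are doubled.
dropped = dropout([1.0] * 8, 0.5, rng)
print("after dropout:", dropped)

# Data augmentation: the same image, shifted by at most one pixel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
augmented = random_translate(image, 1, rng)
print("augmented:", augmented)
```

At test time dropout is switched off (`training=False`), and augmentation is normally applied only to training images.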

Moving on to why these Convolutional Networks are doomed, there are mainly two reasons:

They cannot extrapolate their understanding of geometric relationships to radically new viewpoints.

This video goes over how convolutional networks work and some of the improvements made to them over the years, and then moves on to capsule networks in both theory and code. The links are given below, along with the link to the code: