Overlapping pooling reduces the top-1 and top-5 error rates by 0.4% and 0.3%, respectively, compared with non-overlapping pooling.

The paper observed during training that models with overlapping pooling are slightly more difficult to overfit.
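
As a minimal illustration (not the paper's code), the difference is just the relation between the pooling kernel size z and the stride s: overlapping pooling uses s < z (the paper's z = 3, s = 2), while traditional pooling uses s = z. A PyTorch sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)  # dummy single-channel feature map

# Non-overlapping pooling: stride s equals kernel size z (s = z = 2)
non_overlap = nn.MaxPool2d(kernel_size=2, stride=2)

# Overlapping pooling as in the paper: s = 2 < z = 3,
# so neighboring pooling windows overlap by one pixel
overlap = nn.MaxPool2d(kernel_size=3, stride=2)

print(non_overlap(x).shape)  # torch.Size([1, 1, 4, 4])
print(overlap(x).shape)      # torch.Size([1, 1, 3, 3])
```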

Overall Architecture:

Eight learned layers: five convolutional and three fully-connected.

Five convolutional layers, some of which are followed by max-pooling layers.

Three fully-connected layers with a final 1000-way softmax.

Maximizes the multinomial logistic regression objective, i.e., the average across training cases of the log-probability of the correct label under the prediction distribution.

Has 60 million parameters and 650,000 neurons.
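
A hedged, single-GPU sketch of this architecture in PyTorch: it omits the paper's local response normalization and two-GPU layer split, and uses 227x227 inputs so the convolution arithmetic works out (the paper states 224x224).

```python
import torch
import torch.nn as nn

# Single-GPU approximation: the original splits most layers across two GPUs
# and applies local response normalization after conv1 and conv2 (omitted here).
model = nn.Sequential(
    # Five convolutional layers; overlapping max pooling (3x3, stride 2)
    # follows conv1, conv2, and conv5
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # Three fully-connected layers, the last feeding a 1000-way softmax
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

# Maximizing the multinomial logistic regression objective is equivalent to
# minimizing cross-entropy between the softmax output and the true label.
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(2, 3, 227, 227)           # dummy batch of 227x227 crops
loss = loss_fn(model(x), torch.randint(0, 1000, (2,)))

print(sum(p.numel() for p in model.parameters()))  # roughly 62 million here
```

Without the paper's two-GPU channel grouping this version has slightly more parameters (about 62 million) than the quoted 60 million.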

Reducing Overfitting:

Data Augmentation:

The easiest and most common method to reduce overfitting on image data is to artificially enlarge the dataset using label-preserving transformations (e.g., [25, 4, 5]).

The transformation criteria:

Require very little computation to produce from the original images,

so the transformed images do not need to be stored on disk.

Implementation in the paper: the transformed images are generated in Python code on the CPU while the GPU is training on the previous batch of images.
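
Modern frameworks offer this producer/consumer overlap out of the box. A sketch of the equivalent pattern with PyTorch's DataLoader, using a toy stand-in dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # guard needed when workers are spawned as processes
    # Toy stand-in for the training set (images and labels)
    dataset = TensorDataset(torch.randn(512, 3, 32, 32),
                            torch.randint(0, 1000, (512,)))

    # num_workers > 0 spawns CPU worker processes that load and augment the
    # next batches while the GPU trains on the current one -- the same
    # producer/consumer overlap the paper implemented by hand in Python.
    loader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)

    for images, labels in loader:
        pass  # the GPU training step on (images, labels) would go here
```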

Two distinct data augmentation techniques (both sketched below):

Generating image translations and horizontal reflections.

Altering the intensities of the RGB channels in training images.
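
A minimal NumPy sketch of both techniques. The PCA statistics here are computed from a single toy image for illustration only; in the paper the eigenvectors and eigenvalues come from a PCA over the RGB values of all ImageNet training pixels, and 224x224 patches are cropped from 256x256 images.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_and_flip(img, crop=224):
    """Technique 1: a random crop plus a coin-flip horizontal reflection.
    img is an HxWx3 array (the paper crops 224x224 patches from 256x256)."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def pca_color_augment(img, eigvecs, eigvals, sigma=0.1):
    """Technique 2: shift RGB intensities along the principal components of
    RGB pixel values; alphas are drawn from N(0, sigma) as in the paper.
    eigvecs (3x3) and eigvals (3,) would come from a PCA over the training set."""
    alphas = rng.normal(0.0, sigma, size=3)
    shift = eigvecs @ (alphas * eigvals)   # [p1 p2 p3][a1*l1, a2*l2, a3*l3]^T
    return img + shift                     # same shift added to every pixel

# Toy usage with made-up PCA statistics (real ones come from ImageNet pixels)
img = rng.random((256, 256, 3))
eigvals, eigvecs = np.linalg.eigh(np.cov(img.reshape(-1, 3).T))
out = pca_color_augment(random_crop_and_flip(img), eigvecs, eigvals)
print(out.shape)  # (224, 224, 3)
```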

Dropout:

Dropout [10] sets to zero the output of each hidden neuron with probability 0.5.

The dropped-out neurons contribute neither to the forward pass nor to backpropagation.

So every time an input is presented, the neural network samples a different architecture, but all these architectures share weights.

Advantages:

Reduces complex co-adaptations of neurons.

Forces each neuron to learn more robust features that are useful in conjunction with many different random subsets of the other neurons.

At test time, all neurons are used but their outputs are multiplied by 0.5, which is a reasonable approximation to taking the geometric mean of the predictive distributions produced by the exponentially many dropout networks.

Dropout roughly doubles the number of iterations required to converge.
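
A minimal NumPy sketch of dropout as the paper describes it, with train-time masking at p = 0.5 and test-time scaling by 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, p=0.5, train=True):
    """Dropout as described in the paper: at training time each hidden unit's
    output is zeroed with probability p; at test time all units are kept but
    their outputs are multiplied by (1 - p), approximating the geometric mean
    over the exponentially many thinned networks."""
    if train:
        mask = rng.random(activations.shape) >= p  # sample a sub-architecture
        return activations * mask                  # dropped units output 0
    return activations * (1.0 - p)                 # test-time scaling

h = rng.random((4, 8))                  # a batch of hidden-layer activations
print(dropout_forward(h))               # a random half of the units silenced
print(dropout_forward(h, train=False))  # all units, scaled by 0.5
```

Modern implementations usually use "inverted" dropout, scaling by 1/(1 - p) at training time instead so the test-time network needs no change; the effect is equivalent.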

Results:

In the ILSVRC-2012 competition, a variant of this model won with a top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

On the ILSVRC-2010 test data, the proposed method achieved top-1 and top-5 error rates of 37.5% and 17.0%, which are considerably better than the previous state of the art.

My Comments:

This paper clarifies the need for a big dataset to obtain good results on image classification. The bigger the dataset, the more parameters are needed to capture its variation, and a system with more parameters is more prone to overfitting. The paper presents techniques to address this overfitting problem, such as dropout and data augmentation.

In our own attempt to apply deep CNNs to our study, we found that choosing appropriate hyperparameters, such as the learning rate, the number of epochs, and the batch size, is not easy. I hope a dedicated paper will appear that states some guidelines for quickly finding these parameters.