This paper, written by the same authors two years later, is a much better introduction to Dropout than
[Improving neural networks by preventing co-adaptation of feature detectors](http://www.shortscience.org/paper?bibtexKey=journals/corr/1207.0580).
## General idea of Dropout
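Dropout is a layer type. It has a parameter $\alpha \in (0, 1)$, the probability of dropping a unit (the paper itself works with the retention probability $p = 1 - \alpha$). The output dimensionality of a dropout layer is equal to its input dimensionality. During training, each neuron's output is set to 0 independently with probability $\alpha$. At test time no output is set to 0; instead, the output of every neuron is multiplied by $1 - \alpha$, so that its value matches the expected output seen during training.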
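
The behaviour described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function name `dropout_forward`, the `training` flag and the optional `rng` argument are assumptions made for this example, and $\alpha$ is taken to be the drop probability.

```python
import random


def dropout_forward(x, alpha, training, rng=None):
    """Apply dropout to a list of activations x.

    alpha is the probability of dropping (zeroing) a unit.
    The name and signature are hypothetical, chosen for this sketch.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if training:
        rng = rng or random.Random()
        # Each unit is zeroed independently with probability alpha.
        return [0.0 if rng.random() < alpha else v for v in x]
    # Test time: keep every unit and scale by the retention probability.
    return [(1.0 - alpha) * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_forward(x, alpha=0.5, training=True, rng=random.Random(0))
test_out = dropout_forward(x, alpha=0.5, training=False)
```

Many libraries instead divide the kept activations by $1 - \alpha$ during training ("inverted dropout") and leave the test-time pass unscaled; the two variants agree in expectation.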
## Interpretations
Dropout can be interpreted as training an ensemble of many "thinned" networks that share weights: a network with $n$ droppable units contains $2^n$ such sub-networks.
It can also be seen as a regularizer.
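
The ensemble view can be checked by brute force on a toy case. The sketch below enumerates all $2^n$ dropout masks for a single linear unit and compares the probability-weighted average of the thinned outputs with the output of the full unit scaled by $1 - \alpha$. The helper names, inputs and weights are made up for illustration. The agreement is exact only for a linear unit; for deep non-linear networks, test-time scaling is an approximation to the ensemble average.

```python
from itertools import product


def ensemble_average(x, w, alpha):
    """Probability-weighted mean output over all 2^n thinned linear units."""
    n = len(x)
    total = 0.0
    for mask in product([0, 1], repeat=n):
        kept = sum(mask)
        # Each unit is kept with probability 1 - alpha, dropped with alpha.
        prob = (1.0 - alpha) ** kept * alpha ** (n - kept)
        out = sum(m * xi * wi for m, xi, wi in zip(mask, x, w))
        total += prob * out
    return total


def scaled_output(x, w, alpha):
    """Output of the single full unit, scaled by 1 - alpha."""
    return (1.0 - alpha) * sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical toy values, chosen only for this illustration.
x = [1.0, -2.0, 0.5]
w = [0.3, 0.1, -0.7]
alpha = 0.2
avg = ensemble_average(x, w, alpha)
scaled = scaled_output(x, w, alpha)
```

Here `avg` and `scaled` agree up to floating-point rounding, which is why a single scaled forward pass can stand in for averaging over all sub-networks in the linear case.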