Reference: "To Recognize Shapes, First Learn to Generate Images"

@article{hinton-2007,
abstract = {The uniformity of the cortical architecture and the ability of functions to move to different areas of cortex following early damage strongly suggest that there is a single basic learning algorithm for extracting underlying structure from richly structured, high-dimensional sensory data. There have been many attempts to design such an algorithm, but until recently they all suffered from serious computational weaknesses. This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.},
author = {Hinton, Geoffrey E.},
doi = {10.1016/s0079-6123(06)65034-6},
issn = {0079-6123},
journal = {Progress in Brain Research},
keywords = {deep-learning},
pages = {535--547},
pmid = {17925269},
title = {To Recognize Shapes, First Learn to Generate Images},
volume = {165},
year = {2007}
}

Selfridge's Pandemonium is (at least) one progenitor of all hierarchical cognitive architectures.
It comprises a hierarchy of layers, each of which detects patterns in the activity of the more primitive layer below it.

Unsupervised learning extracts regularities in the input.
Detected regularities can then be used for actual discrimination.
Alternatively, unsupervised learning can be applied again to detect regularities in these regularities.
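A minimal sketch of this two-step idea (unsupervised extraction of regularities, then discrimination on top of them) — the data, dimensions, and use of PCA here are illustrative assumptions, not from the chapter:

```python
# Hypothetical sketch: unsupervised feature extraction (PCA via SVD)
# followed by a simple discriminative step in the learned feature space.
import numpy as np

rng = np.random.default_rng(0)

# Two toy classes in 10-D, separated along one hidden direction.
X0 = rng.normal(0.0, 1.0, (50, 10)); X0[:, 0] -= 3.0
X1 = rng.normal(0.0, 1.0, (50, 10)); X1[:, 0] += 3.0
X = np.vstack([X0, X1])

# Unsupervised step: extract the dominant regularities (top components).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:2].T          # project onto the top-2 components

# Discriminative step: in the learned feature space a trivial
# nearest-class-mean rule along component 0 separates the classes.
m0 = features[:50, 0].mean()
m1 = features[50:, 0].mean()
pred = (np.abs(features[:, 0] - m1) < np.abs(features[:, 0] - m0)).astype(int)
labels = np.array([0] * 50 + [1] * 50)
print("accuracy:", (pred == labels).mean())
```

The unsupervised step never sees the labels; the structure it finds nevertheless makes the discrimination step almost trivial.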

If we know which output we want and each neuron's output is a smooth function of its inputs, then the weight changes needed to produce the right output can be computed using calculus.
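A minimal sketch of that calculus at work, assuming a single sigmoid neuron and a squared-error objective (the input, target, and learning rate are made up for illustration):

```python
# Minimal sketch: one sigmoid neuron trained by gradient descent,
# using the chain rule to compute the weight update.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 0.25])   # input
w = np.array([0.1, 0.1, 0.1])     # small initial weights
target, lr = 1.0, 0.5

for step in range(200):
    y = sigmoid(w @ x)                    # smooth output
    # dE/dw = (y - target) * y * (1 - y) * x   (chain rule)
    w -= lr * (y - target) * y * (1 - y) * x

print("final output:", sigmoid(w @ x))    # approaches the target of 1.0
```

Because every step in the forward computation is differentiable, the same chain-rule bookkeeping extends layer by layer — which is all backpropagation is.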

One problem with backpropagation is that one usually starts with small random weights, which tend to be far from the optimal weights.
Because the space of possible weight configurations is so large, learning can therefore take a long time.

In the wake-sleep algorithm, (at least) two layers of neurons are fully connected to each other.

In the wake phase, the lower layer drives the upper layer through the bottom-up recognition weights.
The top-down generative weights are trained to reproduce the current activity of the lower layer given the current activity of the upper layer.

In the sleep phase, the upper layer drives activity in the lower layer through the generative weights, and the recognition weights are trained to induce the activity in the upper layer given the activity in the lower layer.
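The two phases can be sketched as follows. This is a simplified, hedged version: the layer sizes and data are toy, biases are omitted, and the fantasies are drawn from a uniform hidden distribution rather than a learned generative prior.

```python
# Hedged sketch of the wake-sleep updates for two fully connected layers
# of stochastic binary units; all names and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_vis, n_hid, lr = 6, 4, 0.1

R = rng.normal(0, 0.1, (n_hid, n_vis))   # bottom-up recognition weights
G = rng.normal(0, 0.1, (n_vis, n_hid))   # top-down generative weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

data = sample(np.full((100, n_vis), 0.5))  # toy binary "sensory" data

for v in data:
    # Wake phase: recognition weights drive the upper layer; the
    # generative weights learn to reproduce the lower layer's activity.
    h = sample(sigmoid(R @ v))
    G += lr * np.outer(v - sigmoid(G @ h), h)

    # Sleep phase: the upper layer "dreams" a fantasy (here simply
    # uniform); the recognition weights learn to recover the hidden
    # cause from the generated activity.
    h_f = sample(np.full(n_hid, 0.5))
    v_f = sample(sigmoid(G @ h_f))
    R += lr * np.outer(h_f - sigmoid(R @ v_f), v_f)
```

Each phase trains the weights it does *not* use for driving activity, which is what lets the two sets of weights bootstrap each other.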

The restricted Boltzmann machine is an unsupervised learning algorithm similar to the wake-sleep algorithm.
It uses stochastic learning, i.e. neural activations are stochastic binary events whose probabilities are continuous functions of the weighted inputs.
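A toy sketch of RBM training with one step of contrastive divergence (CD-1); the layer sizes and data are assumptions for illustration, and biases are omitted to keep the sketch short:

```python
# Illustrative RBM trained with CD-1; biases are omitted and the
# data is a toy pattern, so this is a sketch, not a faithful recipe.
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_vis, n_hid))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample(p):
    return (rng.random(p.shape) < p).astype(float)

# Toy data: visible units tend to switch on or off together.
data = np.array([[1] * 6, [0] * 6] * 50, dtype=float)

for v0 in data:
    ph0 = sigmoid(v0 @ W)          # stochastic hidden activation probabilities
    h0 = sample(ph0)
    v1 = sample(sigmoid(W @ h0))   # one reconstruction step
    ph1 = sigmoid(v1 @ W)
    # CD-1 update: positive-phase minus negative-phase correlations.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
```

The "restricted" part is visible in the structure: there are no visible-visible or hidden-hidden connections, so each layer can be sampled in one parallel step.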

Hinton proposes building deep belief networks by stacking RBMs and training them unsupervised, in ascending order.
After that, the network is run in feed-forward mode, and backpropagation can be used to learn the actual task.
Thus some of the problems of backpropagation are mitigated by initializing the weights via unsupervised learning.
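The greedy layer-wise scheme can be sketched as below (assumed toy setup: sizes, data, and the bias-free CD-1 trainer are illustrative): each RBM is trained on the hidden-unit probabilities of the layer beneath it, and the stacked weights then initialize an ordinary feed-forward network.

```python
# Sketch of greedy layer-wise pretraining for a two-layer stack of RBMs.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, lr=0.1, epochs=5):
    """Bias-free CD-1 training of one RBM layer (illustrative only)."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.1, (n_vis, n_hid))
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hid) < ph0).astype(float)
            v1 = (rng.random(n_vis) < sigmoid(W @ h0)).astype(float)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, sigmoid(v1 @ W)))
    return W

X = (rng.random((200, 8)) < 0.5).astype(float)   # toy binary data

# Unsupervised, in ascending order: train RBM 1 on the data,
# then RBM 2 on RBM 1's hidden-unit probabilities.
W1 = train_rbm(X, 6)
H1 = sigmoid(X @ W1)
W2 = train_rbm(H1, 4)

# Feed-forward mode: the pretrained weights define the initial network;
# backprop on a labelled task would start from here instead of from
# small random weights.
top = sigmoid(sigmoid(X @ W1) @ W2)
print(top.shape)   # (200, 4)
```

The point of the ordering is that each RBM only ever sees a fixed input distribution — the already-trained representation below it — so the layers can be trained one at a time.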