The Generalized Delta Rule

A generalized form of the delta rule, developed by D.E. Rumelhart, G.E.
Hinton, and R.J. Williams, is needed for networks with hidden layers. They
showed that this method works for the class of semilinear activation functions
(non-decreasing and differentiable).
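The logistic sigmoid is the standard example of such a semilinear activation function; a minimal sketch of it and its derivative (the derivative is what the generalized delta rule requires):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: semilinear (non-decreasing and differentiable)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid, conveniently expressed via its output."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Note that the derivative can be computed from the unit's output alone, which is why this activation is convenient in the weight-update equations below.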

Generalizing the ideas of the delta rule, consider a hierarchical network
with an input layer, an output layer and a number of hidden layers. We will
consider only the case of a single hidden layer. The network is presented
with an input vector; the signals from the input layer act as input to the
middle layer. Output signals from the middle layer in turn act as input to
the output layer, which produces the final output vector. This vector is compared
to the desired output vector. Since both the output and the desired output
vectors are known, the delta rule can be used to adjust the weights in the
output layer. Can the delta rule be applied to the middle layer? Both the
input signal to each unit of the middle layer and its output signal are
known. What is not known is the error at the output of the middle layer,
since the desired outputs of the hidden units are not given. To obtain this
error, the errors computed at the output layer are propagated backward to
the middle-layer units responsible for generating that output. The error
backpropagated to the middle layer can then be used with the delta rule to
adjust its weights.
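The procedure above can be sketched for a one-hidden-layer network with sigmoid units. The layer sizes, learning rate, and training pair here are illustrative assumptions, not values from the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions for this sketch)
n_in, n_hid, n_out = 3, 4, 2
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))  # hidden -> output weights
eta = 0.5                                        # learning rate

x = rng.normal(size=n_in)    # input vector
t = np.array([0.1, 0.9])     # desired output vector

def step(W1, W2):
    # Forward pass: input -> middle layer -> output layer
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Output layer: desired output is known, so the delta rule applies directly
    delta_out = (t - y) * y * (1.0 - y)
    # Middle layer: backpropagate the output-layer errors through W2
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    # Weight updates by the generalized delta rule
    W2 = W2 + eta * np.outer(delta_out, h)
    W1 = W1 + eta * np.outer(delta_hid, x)
    return W1, W2, np.sum((t - y) ** 2)

errors = []
for _ in range(200):
    W1, W2, e = step(W1, W2)
    errors.append(e)
```

Repeating the step drives the squared error between the actual and desired output vectors downward, illustrating that the backpropagated hidden-layer errors carry usable gradient information.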