Error Backpropagation
Sargur Srihari
Machine Learning

Topics
- Neural network learning problem
- Need for computing derivatives of the error function
- Forward propagation of activations
- Backward propagation of errors
- Statement of the backprop algorithm
- Use of backprop in computing the Jacobian matrix

Neural Network Learning Problem
The goal is to learn the weights w from a labelled set of training samples. The learning procedure has two stages:
1. Evaluate the derivatives of the error function with respect to the weights w_1, ..., w_T
2. Use the derivatives to compute adjustments to the weights:
   w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E_n(w^{(\tau)})
For a two-layer network with D inputs, M hidden units and K outputs, the number of weights (including biases) is T = (D+1)M + (M+1)K = M(D+K+1) + K.

Back-propagation Terminology
Goal: an efficient technique for evaluating the gradient of an error function E(w) for a feed-forward neural network.
- Backpropagation is the term used for the derivative computation only.
- In a subsequent stage, the derivatives are used to make adjustments to the weights.
- This is achieved using a local message-passing scheme in which information is sent forwards and backwards alternately.

Overview of the Backprop Algorithm
1. Choose random weights for the network.
2. Feed in an example and obtain a result.
3. Calculate the error for each node (starting from the last stage and propagating the error backwards).
4. Update the weights.
5. Repeat with other examples until the network converges on the target output.
How to divide up the errors needs a little calculus.

Wide Use of Backpropagation
- Can be applied to error functions other than the sum of squared errors.
- Used to evaluate other matrices such as the Jacobian and Hessian matrices.
- The second stage, in which the calculated derivatives are used to adjust the weights, can be tackled with a variety of optimization schemes substantially more powerful than gradient descent.

Evaluation of Error Function Derivatives
We derive the back-propagation algorithm for an arbitrary feed-forward topology, an arbitrary
differentiable nonlinear activation function, and a broad class of error functions. Error functions of practical interest are sums of errors associated with each training data point:

E(w) = \sum_{n=1}^{N} E_n(w)

We consider the problem of evaluating \nabla E_n(w) for the n-th term in the error function. The derivatives are with respect to the weights w_1, ..., w_T. They can be used directly for sequential optimization, or accumulated over the training set for batch optimization.
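The two-stage procedure described above (forward propagation of activations, backward propagation of errors, then a gradient-descent weight update on one training point) can be sketched for a small two-layer network. This is an illustrative sketch, not the lecture's code: the tanh hidden units, the sum-of-squares error E_n = 0.5 ||y - t||^2, and all layer sizes are assumptions.

```python
import numpy as np

# Sketch of one sequential backprop step (illustrative assumptions:
# D inputs, M tanh hidden units, K linear outputs, sum-of-squares error).
rng = np.random.default_rng(0)
D, M, K = 3, 4, 2
W1 = rng.standard_normal((M, D + 1)) * 0.1   # first layer, bias in last column
W2 = rng.standard_normal((K, M + 1)) * 0.1   # second layer, bias in last column

def forward(x, W1, W2):
    """Forward propagation of activations."""
    a1 = W1 @ np.append(x, 1.0)   # hidden pre-activations
    z = np.tanh(a1)               # hidden unit activations
    y = W2 @ np.append(z, 1.0)    # linear output units
    return a1, z, y

def backprop(x, t, W1, W2):
    """Backward propagation of errors: returns dEn/dW1 and dEn/dW2."""
    a1, z, y = forward(x, W1, W2)
    delta_out = y - t                          # output-unit errors
    dW2 = np.outer(delta_out, np.append(z, 1.0))
    # Propagate errors back through tanh: h'(a) = 1 - tanh(a)^2
    delta_hid = (1.0 - np.tanh(a1) ** 2) * (W2[:, :M].T @ delta_out)
    dW1 = np.outer(delta_hid, np.append(x, 1.0))
    return dW1, dW2

# One sequential update w <- w - eta * grad(En) on a random example
x, t = rng.standard_normal(D), rng.standard_normal(K)
eta = 0.1
dW1, dW2 = backprop(x, t, W1, W2)
W1 -= eta * dW1
W2 -= eta * dW2
```

A batch scheme would instead accumulate the per-point gradients dW1, dW2 over all N training points before applying the update, matching the decomposition E(w) = Σ E_n(w).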
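The weight-count identity T = (D+1)M + (M+1)K = M(D+K+1) + K from the learning-problem slide can be checked numerically; the sizes below are arbitrary example values, not from the slides.

```python
# Arbitrary example sizes (assumptions): D inputs, M hidden units,
# K outputs, with a bias weight for every hidden and output unit.
D, M, K = 3, 4, 2
T = (D + 1) * M + (M + 1) * K    # weights into hidden layer + into outputs
assert T == M * (D + K + 1) + K  # the rearranged form quoted on the slide
print(T)  # -> 26
```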