Learning in TLUs

Introduction

In this tutorial you will learn about:

Linear separability and inseparability

The Perceptron learning rule

The Least Mean Squares (LMS) learning rule

Linear separability

The simplest case of a linearly separable decision problem is one
consisting of two sets of points (patterns) belonging to different
classes in a 2-d vector space, where the two classes can be separated
by a straight line. In 3 dimensions the equivalent problem is one
where the points are separable by a plane.

We can extend this idea to any number of dimensions, though 4 or 5
dimensions are harder to visualise! Straight lines and planes are just
hyperplanes of dimension 1 and 2; there are hyperplanes of every
higher dimension. In general, two classes of points in an
N-dimensional space are termed "linearly separable" if they are
separable by a hyperplane of dimension N-1.

More precisely, a linearly separable decision problem is one for which
there exists a solution of the form

\sum_{i=1}^{N} w_i x_i = \theta    (3)

i.e. a hyperplane that places every pattern of one class on one side
and every pattern of the other class on the other.

The solution is the combination of weights and threshold that
minimises whatever error function has been defined, and the error
function can be defined in a number of ways. In the simplest case the
error could be the difference between the actual activation a produced
for each pattern x and the desired activation. In a TLU a simple
alternative is to threshold a and take the output of the threshold
function as the classification of the input pattern, so that our
single TLU distinguishes between two classes. The error measure can
then simply be the number of incorrectly classified patterns.
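As a concrete illustration, here is a minimal Python sketch of a TLU
with fixed weights and threshold classifying 2-d patterns, using the
count of misclassified patterns as the error; the weights, threshold,
and patterns are arbitrary values invented for this example:

    # A minimal TLU: output +1 if the activation w.x reaches the
    # threshold theta, otherwise -1.
    def tlu_output(weights, theta, x):
        activation = sum(w * xi for w, xi in zip(weights, x))
        return 1 if activation >= theta else -1

    # Illustrative 2-d patterns with desired outputs (+1 or -1).
    patterns = [((0.5, 2.0), 1), ((1.5, 2.5), 1),
                ((2.0, 0.5), -1), ((2.5, 1.0), -1)]

    weights, theta = (-1.0, 1.0), 0.0   # one candidate decision boundary

    # Error measure: the number of incorrectly classified patterns.
    errors = sum(1 for x, d in patterns
                 if tlu_output(weights, theta, x) != d)
    print(errors)   # 0 here: this boundary separates the two classes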

So we can see that a TLU implements a linear decision boundary. In the
previous tutorial we saw that by changing the weights and the
threshold of the TLU we can move the decision boundary. For the TLU to
solve a decision problem all the patterns from one class should be on
one side of the decision boundary, and all the patterns from the other
class should be on the other side.

Learning in a TLU is concerned with using an automatic procedure for
adjusting the weights and threshold so that the decision boundary
minimises the error function.

Linear inseparability

Clearly not all decision problems are linearly separable: for some
problems no linear decision boundary can place the two classes on
opposite sides. Problems like these are termed linearly inseparable.
XOR is the classic example: no straight line separates the patterns
(0,1) and (1,0) from (0,0) and (1,1).
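To convince yourself of this, the brute-force sketch below searches a
coarse grid of weights and thresholds for a single-TLU solution to XOR
(encoded here with inputs in {0,1} and outputs in {-1,+1}) and finds
none; the grid resolution and encoding are just one illustrative
choice:

    # XOR patterns with desired outputs in {-1, +1}.
    xor_patterns = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

    def classify(w1, w2, theta, x):
        return 1 if w1 * x[0] + w2 * x[1] >= theta else -1

    # Try every line on a coarse grid of weights and thresholds.
    grid = [i / 4 for i in range(-8, 9)]
    solutions = [(w1, w2, t)
                 for w1 in grid for w2 in grid for t in grid
                 if all(classify(w1, w2, t, x) == d
                        for x, d in xor_patterns)]
    print(solutions)   # [] -- no line classifies all four patterns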

If you don't understand the concepts of linear separability and
inseparability, try plotting points in the input space in the
demonstration below. Then try to separate the two classes by altering
the position of the decision boundary by hand.

The Perceptron learning rule

Now that we know what sort of problem can be solved using a TLU,
let's think about how a TLU can learn to solve such a problem. A
simple learning procedure for a TLU is called the Perceptron learning
rule. First of all we need to define an error function precisely. The
error function used for the Perceptron learning rule is based on the
number of incorrectly classified points in the training set:

E = -\sum_{x \in M} d_x (w \cdot x - \theta)    (4)

where M is the set of misclassified input patterns and d_x is the
desired output of the TLU for pattern x. This is the perceptron error
criterion. If all input vectors are correctly classified the error
E = 0. The contribution of a misclassified input pattern to the error
is proportional to its absolute distance from the decision boundary.
The perceptron learning procedure minimises this error function. The
learning procedure is as follows:

Loop until the error on the training set E = 0:

    Take the next training example

    Calculate the output (classification) y of the TLU

    Compare it to the desired output d

    If d = 1 and y = -1 then adjust each weight w_i using Eq. (5)

    If d = -1 and y = 1 then adjust each weight w_i using Eq. (6)

where Eqs. (5) and (6) are

w_i \leftarrow w_i + \alpha x_i    (5)

w_i \leftarrow w_i - \alpha x_i    (6)

The training set is simply the set of input patterns that the TLU is
learning to classify correctly. The value of alpha is in the interval
(0,1], and is often set to 1. The algorithm will converge to a correct
solution for any positive value of alpha if the problem is linearly
separable. This procedure can be used to adjust the threshold as well,
by turning the threshold into another weight with an input value of
-1. This is the way it is implemented in the demonstration below and
in the sketch that follows.
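The following Python sketch implements the procedure above, with the
threshold folded in as an extra weight on a constant -1 input; the
training patterns are the illustrative ones used earlier, not data
from the demonstration:

    def perceptron_train(patterns, alpha=1.0, max_epochs=100):
        # Fold the threshold into the weights as an extra weight on a
        # constant input of -1, as described above.
        data = [((x[0], x[1], -1.0), d) for x, d in patterns]
        w = [0.0, 0.0, 0.0]
        for _ in range(max_epochs):
            errors = 0
            for x, d in data:
                a = sum(wi * xi for wi, xi in zip(w, x))
                y = 1 if a >= 0 else -1
                if y != d:
                    # Eqs. (5) and (6) combined: add alpha*x_i when
                    # d = 1 and y = -1; subtract when d = -1 and y = 1.
                    w = [wi + alpha * d * xi for wi, xi in zip(w, x)]
                    errors += 1
            if errors == 0:          # perceptron error E = 0: stop
                break
        return w

    patterns = [((0.5, 2.0), 1), ((1.5, 2.5), 1),
                ((2.0, 0.5), -1), ((2.5, 1.0), -1)]
    w1, w2, theta = perceptron_train(patterns)
    print(w1, w2, theta)   # learned boundary: w1*x1 + w2*x2 = theta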

To see how the Perceptron learning rule works, use the demonstration
below. Set the training regime to be incremental. This
means that the weights and threshold will be updated after each input
pattern. The batch training regime means that weight changes are saved
up and only made at the end of an epoch. An epoch is a complete
presentation of all the input patterns in the training set. The error
signals for a sequence of patterns will be different for the two
training regimes. Can you explain why?
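The difference is easy to see in code: in incremental mode each update
is computed from weights already changed by earlier patterns in the
epoch, whereas in batch mode every error signal in an epoch is
computed from the same weights. Here is a sketch of one batch epoch,
reusing the update from the perceptron sketch above:

    def perceptron_epoch_batch(w, data, alpha=1.0):
        # Accumulate the weight changes over the whole epoch and apply
        # them in one step, so every error signal uses the same weights.
        delta = [0.0] * len(w)
        for x, d in data:
            a = sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if a >= 0 else -1
            if y != d:
                delta = [dw + alpha * d * xi
                         for dw, xi in zip(delta, x)]
        return [wi + dw for wi, dw in zip(w, delta)]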

The LMS learning rule

The Least Mean Squares (LMS) rule minimises a different error
function: the sum of the squared errors over the training set,

E = \frac{1}{2} \sum_{x} e_x^2    (7)

where e_x is the error for the input pattern x. In the TLU
implementation of this rule the error for each pattern is calculated
by finding the difference between the desired and actual activation,
i.e. the error is calculated using the activation of the TLU before
it is passed through the threshold function:

e_x = d_x - a_x    (8)

where d_x is the desired output (+1 or -1) and a_x is the actual
activation. The update procedure is simple: the weights are updated
for every input pattern using the rule

w_i \leftarrow w_i + \alpha e_x x_i    (9)
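A minimal incremental LMS (delta-rule) sketch following Eqs. (8) and
(9), again treating the threshold as a weight on a constant -1 input;
the learning rate and epoch count are arbitrary illustrative values:

    def lms_train(patterns, alpha=0.05, epochs=50):
        data = [((x[0], x[1], -1.0), d) for x, d in patterns]
        w = [0.0, 0.0, 0.0]
        for _ in range(epochs):
            for x, d in data:
                # Activation, before thresholding.
                a = sum(wi * xi for wi, xi in zip(w, x))
                e = d - a                      # Eq. (8)
                w = [wi + alpha * e * xi       # Eq. (9)
                     for wi, xi in zip(w, x)]
        return w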

If the weights are updated after every input pattern then the decision
boundary will keep moving around the solution with the lowest error
rather than settling on it. If batch training is used then the
procedure will settle on the solution giving the minimum error.
Strictly speaking, both the Perceptron learning rule and the LMS rule
were designed to operate incrementally; in this tutorial we've added
batch versions to help you understand the way in which the original
(incremental) rules work.