Pattern Classification And Single Layer Networks: Chapter 2

Intro

We have just seen how a network can be trained to perform linear regression. That is, given a set
of inputs (x) and output/target values (y), the network finds the best linear mapping from x to y.

Given an x value that we have not seen, our trained network can predict what the most
likely y value will be. The ability to (correctly) predict the output for an input the network has not seen is
called generalization.

This style of learning is referred to as supervised learning (or learning with
a teacher) because we are given the target values. Later we will see examples of unsupervisedlearning
which is used for finding patterns in the data rather than modeling input/output mappings.

We now step away from linear regression for a moment and look at another
type of supervised learning problem called pattern classification. We start by considering only single layer networks.

Pattern classification

A classic example of pattern classifiction is letter recognition. We
are given, for example, a set of pixel values associated with an image of a letter. We want the computer to determine
what letter it is. The pixel values are refered to as the inputs or the decision variables, and the
letter categories are referred to as classes.

Now, a given letter such as "A" can look quite different depending on the
type of font that is used or, in the case of handwritten letters, different people's handwriting. Thus, there will
be a range of values for the decision variables that map to the same class. That is, if we plot the values of the
decision variables, different regions will correspond to different classes.

Example 2:

Example 3:

Single layer Networks for Pattern Classification

We can apply a similar approach as in linear regression where the targets
are now the classes. Note that the outputs are no longer continuous but rather take on discrete values.

Two Classes:

What does the network look like? If there are just 2 classes we only need 1 output node. The target
is 1 if the example is in, say, class 1, and the target is 0 (or -1) if the target is in class 0. It seems reasonable
that we use a binary step function to guarantee an appropriate output value.

Training Methods:

We will discuss two kinds of methods for training single-layer networks that do pattern classification:

Single Layer with a Binary Step Function

The net output of the network is a linear function of the weights and the inputs

net = W X = x1 w1 + x2 w2
y = f(net)

x1 w1 + x2 w2 = 0 defines a straight line through the input space.

x2 = - w1/w2 x1 <- this is line through the origin with slope
-w1/w2

Bias

What if the line dividing the 2 classes does not go through the origin?

Other interesting geometric points to note:

The weight vector (w1, w2) is normal to the decision boundary.
Proof: Suppose z1 and z2 are points on the decision boundary.

Linear Separability

Classification problems for which there is a line that exactly separates
the classes are called linearly separable. Single layer networks are only able to solve linearly separable problems.
Most real world are not linearly separable.