Introduction

The perceptron is the simplest type of feed-forward neural network. It was designed by Frank Rosenblatt as a dichotomic classifier, that is, a classifier that separates patterns into two classes. It can only solve problems in which the two classes are linearly separable. A basic perceptron consists of 3 layers:

Sensor layer

Associative layer

Output neuron

The sensor layer holds a number of inputs (x1 to xn), each input has a weight (w1 to wn), and the network has a single output. The weight w0 is sometimes called the bias, and its input x0 is fixed at +1 or -1 (in this case, x0 = -1).

For every input of the perceptron (including the bias), there is a corresponding weight. To calculate the output of the perceptron, every input is multiplied by its corresponding weight. The weighted sum of all inputs is then computed and fed through a limiter function that produces the final output of the perceptron.

The output of the network is the activation of the output neuron, which is a function of the weighted inputs:

$$y = F\left(\sum_{i=0}^{n} w_i x_i\right) \tag{1}$$

The activation function F can be linear, which gives a linear network, or nonlinear. In this example, I decided to use the threshold (signum) function:

$$F(s) = \begin{cases} +1 & \text{if } s > 0 \\ -1 & \text{otherwise} \end{cases} \tag{2}$$

The output of the network in this case is either +1 or -1, depending on the input. If the total input (the weighted sum of all inputs) is positive, the pattern belongs to class +1; otherwise it belongs to class -1. Because of this behavior, we can use the perceptron for classification tasks.
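The output computation described above can be sketched in a few lines. This is a minimal Python illustration, not the article's own code; the names `signum` and `perceptron_output` and the example weights are assumptions chosen for the example.

```python
def signum(total):
    """Threshold function: +1 for a positive total input, -1 otherwise."""
    return 1 if total > 0 else -1


def perceptron_output(weights, inputs):
    """Weighted sum of all inputs passed through the threshold function.

    weights[0] is the bias weight w0 and inputs[0] is the fixed input x0 = -1.
    """
    total = sum(w * x for w, x in zip(weights, inputs))
    return signum(total)


# Illustrative values only: w0 = 0.5 (bias), w1 = 1.0, w2 = 1.0
weights = [0.5, 1.0, 1.0]

# Total input is -0.5 + 2.0 + 1.0 = 2.5, so the pattern falls in class +1
print(perceptron_output(weights, [-1, 2.0, 1.0]))

# Total input is about -0.5 + 0.1 + 0.2 = -0.2, so the pattern falls in class -1
print(perceptron_output(weights, [-1, 0.1, 0.2]))
```

Note that a total input of exactly zero is assigned to class -1 here, matching the "positive, otherwise" wording above.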

Suppose we have a perceptron with 2 inputs and we want to separate input patterns into 2 classes. In this case, the boundary between the classes is a straight line, given by the equation:

$$w_1 x_1 + w_2 x_2 - \theta = 0 \tag{3}$$

When we set x0 = -1 and denote w0 = θ, we can rewrite equation (3) in the form:

$$w_0 x_0 + w_1 x_1 + w_2 x_2 = 0 \tag{4}$$
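To draw the separation line, the boundary condition w0·x0 + w1·x1 + w2·x2 = 0 can be solved for x2, which gives x2 = -(w1/w2)·x1 - (w0/w2)·x0. The Python sketch below evaluates that expression at two x1 values to get the endpoints of a line segment. The function name `boundary_x2`, the example weights, and the endpoints -10 and 10 are assumptions for illustration, not the article's code.

```python
def boundary_x2(weights, x1, x0=-1.0):
    """Return the x2 coordinate of the separation line at the given x1.

    weights is (w0, w1, w2); x0 is the fixed bias input (-1 in this article).
    """
    w0, w1, w2 = weights
    if w2 == 0:
        # The line is vertical (or undefined), so x2 cannot be expressed from x1
        raise ValueError("w2 is zero: cannot solve the boundary for x2")
    return -(w1 / w2) * x1 - (w0 / w2) * x0


# Illustrative weights only
weights = (1.0, 1.0, 2.0)

# Two points are enough to draw a straight line
left = (-10.0, boundary_x2(weights, -10.0))
right = (10.0, boundary_x2(weights, 10.0))
print(left, right)

# Both points satisfy w0*x0 + w1*x1 + w2*x2 = 0 (up to rounding)
for x1, x2 in (left, right):
    residual = weights[0] * -1.0 + weights[1] * x1 + weights[2] * x2
    print(abs(residual) < 1e-9)
```

The guard for w2 = 0 matters in practice: before learning starts, the weights may all be zero, and the division would otherwise fail.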

Here I will describe the learning method for the perceptron. It is an iterative procedure that adjusts the weights. A learning sample is presented to the network. For each weight, the new value is computed by adding a correction to the old value. The threshold is updated in the same way:

$$w_i(t+1) = w_i(t) + \eta \, (d - y) \, x_i \tag{5}$$

where y is the output of the perceptron, d is the desired output, and η is the learning parameter (learning rate).
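A single correction step can be sketched as follows. This Python illustration assumes the standard perceptron rule, in which each weight changes by learning rate × (desired output - actual output) × its input. The function name `update_weights` and the numbers are hypothetical. The reader comments below report that the article's implementation additionally halves the correction, which this sketch does not do.

```python
def update_weights(weights, inputs, desired, output, learning_rate):
    """Return new weights after one correction step.

    The threshold is weights[0] with fixed input inputs[0] = -1,
    so it is updated by the same expression as the other weights.
    """
    error = desired - output  # 0 when the sample is classified correctly
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]


# Illustrative values: the sample belongs to class +1 but the output was -1
old_weights = [0.0, 0.0, 0.0]
inputs = [-1.0, 2.0, 1.0]  # x0 = -1, x1 = 2, x2 = 1
new_weights = update_weights(old_weights, inputs, desired=1, output=-1,
                             learning_rate=0.1)
print(new_weights)

# A correctly classified sample leaves the weights unchanged
print(update_weights(new_weights, inputs, desired=1, output=1,
                     learning_rate=0.1) == new_weights)
```

With the classes coded as -1 and +1, the error term is -2, 0, or +2, so each weight moves by twice the learning rate times its input whenever a sample is misclassified.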

Using the Program

When you run the program, you see an area where you can input samples. Clicking the left mouse button in this area adds a first-class sample (blue cross). Clicking the right mouse button adds a second-class sample (red cross). Samples are added to the samples list. You can also set the learning rate and the number of iterations. When you have set all these values, click the Learn button to start learning.

Using the Code

All samples are stored in the generic list samples, which holds only Sample class objects.
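As a rough picture of that data structure, here is a Python sketch of a sample type and the list that holds it. The article's own listing is not reproduced here, so the field names `x1`, `x2`, and `label` are assumptions; `label` plays the role of the `Class` member mentioned later in the text.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One training pattern: two input coordinates and the desired class."""
    x1: float
    x2: float
    label: int  # +1 for the first class, -1 for the second (assumed coding)


# The list that collects every sample added by the user
samples = []
samples.append(Sample(2.0, 1.0, 1))     # e.g. a left-click sample
samples.append(Sample(-1.0, -3.0, -1))  # e.g. a right-click sample

for s in samples:
    print(s.x1, s.x2, s.label)
```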

Before running the learning procedure, it is important to set the learning rate and the number of iterations. The perceptron has one great property: if a solution exists, the perceptron always finds it. A problem occurs when no solution exists. In this case, the perceptron would keep searching in an infinite loop, so it is better to set a maximum number of iterations.

Y is the output of the perceptron and samples[i].Class is the desired output. The last 2 steps (looping through the samples and computing new weights) must be repeated while the error variable is nonzero and the current number of iterations (iterations) is less than maxIterations.
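The loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's code: the `Sample` fields, the zero initial weights, and the way `error` is accumulated (sum of absolute differences over one pass) are hypothetical choices. The correction follows the plain rule of learning rate × error × input.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    x1: float
    x2: float
    label: int  # desired output: +1 or -1


def output(weights, s):
    """Perceptron output for one sample, with bias input x0 = -1."""
    total = weights[0] * -1.0 + weights[1] * s.x1 + weights[2] * s.x2
    return 1 if total > 0 else -1


def train(samples, learning_rate=0.1, max_iterations=100):
    """Repeat passes over the samples until no error remains
    or the iteration limit is reached."""
    weights = [0.0, 0.0, 0.0]  # assumed starting point
    iterations = 0
    error = 1  # any nonzero value, so the loop runs at least once
    while error != 0 and iterations < max_iterations:
        error = 0
        for s in samples:
            y = output(weights, s)
            delta = s.label - y
            if delta != 0:
                inputs = (-1.0, s.x1, s.x2)
                weights = [w + learning_rate * delta * x
                           for w, x in zip(weights, inputs)]
                error += abs(delta)
        iterations += 1
    return weights, iterations, error


# Linearly separable data: learning stops with error == 0
separable = [Sample(2, 1, 1), Sample(3, 2, 1),
             Sample(-1, -2, -1), Sample(-2, -1, -1)]
print(train(separable))

# Not linearly separable (XOR-like): only the iteration limit stops the loop
xor_like = [Sample(1, 1, 1), Sample(-1, -1, 1),
            Sample(1, -1, -1), Sample(-1, 1, -1)]
print(train(xor_like, max_iterations=50))
```

The second call shows why the iteration limit is needed: without it, the loop would never terminate on data that no straight line can separate.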

Comments and Discussions

Hi, I've just begun to study the perceptron and found this article. I'm a little confused about the algorithm you used to draw the separation line. Why do you assign x1 as -10 and 10? And then why do you use x2 = y for y = -(x1 * w1 / w2) - (x0 * w0 / w2)?
Also, why do we use *10 and /10 at some points, like:
double posX = (panel.Width/2) + sample.X1*10 and new Point(0, (int)(shift - y * 10))?
Anyone care to share with me?
Thank you.

According to equation 5, you should update the weight by adding the learning rate * error. But in the implementation, you then divide this number by 2. Although halving the learning rate will surely work, I don't understand why the code is different from the equation.

Edit: saw this response in another answer
"Hello,

I'm so sorry that I reply so late, but I had no time.
In this equation you must divide by 2 because I decided to use the values -1 and 1 to distinguish between the 2 classes.
If you set 0 for the first class and 1 for the second, then you don't need to divide by 2.

Regards
Robert"

Edit: Now I have a different question on the same topic
The explanation suggests that we need to normalize the learning rate by the size of the space. This could be fine for two classes, but what if we introduce a third class, with a value such as -10? How would we know the normalization value?

Second question, why not use 0 and 1? When I train the network with 0 and 1, the line ends up sticking to some of the points, but when -1 and 1 are used, the line ends up between the points. Obviously -1 and 1 are better, but a book on pattern recognition I'm reading had us use 0 and 1, and my results were not good. That's why I searched the Internet and found this tutorial. Can you explain why -1 and 1 are better than 0 and 1?