Introduction

This article is part 3 of a three-part series. The series content is as follows:

Part 1: An introduction to Perceptron networks (single layer neural networks).

Part 2: About multi-layer neural networks, and the back propagation training method used to solve a non-linear classification problem such as the logic of an XOR logic gate. This is something that a Perceptron can't do, which is explained further within this article.

Part 3: About how to use a genetic algorithm (GA) to train a multi-layer neural network to solve some logic problem. If you have never come across genetic algorithms, perhaps my other article located here may be a good place to start to learn the basics.

Summary

This article will show how to use a Microbial Genetic Algorithm to train a multi-layer neural network to solve the XOR logic problem.

A Brief Recap (From Parts 1 and 2)

Before we commence with the nitty gritty of this new article, which deals with multi-layer neural networks, let's just revisit a few key concepts. If you haven't read Part 1 or Part 2, perhaps you should start there.

Part 1: Perceptron Configuration (Single Layer Network)

The inputs (x1,x2,x3..xm) and connection weights (w1,w2,w3..wm) in figure 4 are typically real values, both positive (+) and negative (-). If the feature of some xi tends to cause the perceptron to fire, the weight wi will be positive; if the feature xi inhibits the perceptron, the weight wi will be negative.

The perceptron itself consists of the weights, the summation processor, an activation function, and an adjustable threshold processor (called the bias hereafter).

For convenience, the normal practice is to treat the bias as just another input. The following diagram illustrates the revised configuration:

The bias can be thought of as the propensity (a tendency towards a particular way of behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration network shown in Figure 5 fires if the weighted sum is greater than 0; or, if you are into math type explanations, if the sum of wi * xi over all inputs (with the bias treated as just another input) is > 0.
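As a concrete illustration of that firing rule (the article's demo code is C#; this is a minimal Python sketch, with the bias folded in as just another input as described above, and all weight values made up for illustration):

```python
def perceptron_fires(inputs, weights):
    # Weighted sum of w_i * x_i over all inputs; the first input is the
    # bias, fixed at 1.0, with its own weight (an illustrative convention).
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # The perceptron fires if the weighted sum > 0.
    return 1 if weighted_sum > 0 else 0

# Inputs: [bias=1.0, x1, x2]; weights: [bias weight, w1, w2].
print(perceptron_fires([1.0, 1.0, 0.0], [-0.5, 1.0, 1.0]))  # -0.5 + 1.0 > 0 -> 1
print(perceptron_fires([1.0, 0.0, 0.0], [-0.5, 1.0, 1.0]))  # -0.5 <= 0 -> 0
```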

Part 2: Multi-Layer Configuration

The multi-layer network that will solve the XOR problem will look similar to a single layer network. We are still dealing with inputs / weights / outputs. What is new is the addition of the hidden layer.

As already explained above, there is one input layer, one hidden layer, and one output layer.

It is by using the inputs and weights that we are able to work out the activation for a given node. This is easily achieved for the hidden layer as it has direct links to the actual input layer.

The output layer, however, knows nothing about the input layer as it is not directly connected to it. So to work out the activation for an output node, we need to make use of the output from the hidden layer nodes, which are used as inputs to the output layer nodes.

This entire process described above can be thought of as a pass forward from one layer to the next.

This still works like it did with a single layer network; the activation for any given node is still worked out as follows:

where wi is the weight(i), and Ii is the input(i) value. You see, it's the same old stuff; no demons, smoke, or magic here. It's stuff we've already covered.
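To make the pass forward concrete (the article's code is C#; this is an illustrative Python sketch, and the layer sizes and weight values shown are assumptions): the hidden layer is computed from the real inputs, and its outputs then serve as the inputs to the output layer.

```python
import math

def sigmoid(x):
    # The sigmoid activation function used throughout this series.
    return 1.0 / (1.0 + math.exp(-x))

def layer_activations(inputs, weights):
    # For each node j of the layer, activation = sigmoid(sum of w_i * I_i),
    # exactly the formula above; weights[j] is node j's incoming weights.
    return [sigmoid(sum(w * i for w, i in zip(node_weights, inputs)))
            for node_weights in weights]

def feed_forward(inputs, input_to_hidden, hidden_to_output):
    # The hidden layer sees the real inputs; the output layer never does.
    # It only ever sees the hidden layer's outputs.
    hidden = layer_activations(inputs, input_to_hidden)
    return layer_activations(hidden, hidden_to_output)

# Illustrative 2-2-1 network with made-up weights.
out = feed_forward([1.0, 0.0],
                   input_to_hidden=[[0.5, -0.5], [0.3, 0.3]],
                   hidden_to_output=[[1.0, -1.0]])
print(out)  # a single activation between 0 and 1
```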

So that's how the network looks. Now I guess you want to know how to go about training it.

Learning

There are essentially two types of learning that may be applied to a neural network, which are "Reinforcement" and "Supervised".

Reinforcement

In Reinforcement learning, during training, a set of inputs is presented to the neural network. Suppose the output is 0.75 when the target was expecting 1.0. The error (1.0 - 0.75) is used for training ("wrong by 0.25"). What if there are two outputs? Then the total error is summed to give a single number (typically the sum of squared errors), e.g. "your total error on all outputs is 1.76". Note that this just tells you how wrong you were, not in which direction you were wrong. Using this method, we may never get a result; it can be like playing "hunt the needle" blindfolded.
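A sketch of that error calculation (Python here for brevity; the sum-of-squares form is an assumption in line with the description above):

```python
def total_error(targets, actuals):
    # Sum of squared errors over all outputs: a single number that says
    # how wrong the network was, but not in which direction.
    return sum((t - a) ** 2 for t, a in zip(targets, actuals))

# "Wrong by 0.25" on a single output gives a squared error of 0.0625.
print(total_error([1.0], [0.75]))
```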

Using a genetic algorithm to train a multi-layer neural network offers a Reinforcement type training arrangement, where the mutation is responsible for "jiggling the weights a bit". This is what this article is all about.

Supervised

In Supervised learning, the neural network is given more information. Not just "how wrong" it was, but "in what direction it was wrong", like "Hunt the needle", but where you are told "North a bit" "West a bit". So you get, and use, far more information in Supervised learning, and this is the normal form of neural network learning algorithm.

This training method is normally conducted using a Back Propagation training method, which I covered in Part 2, so if this is your first article of these three parts, and the back propagation method is of particular interest, then you should look there.

So Now the New Stuff

From this point on, anything that is being discussed relates directly to this article's code.

What is the problem we are trying to solve? Well, it's the same as it was for Part 2: the simple XOR logic problem. In fact, this article's content is really just an incremental build on knowledge that was covered in Part 1 and Part 2, so let's march on.

For the benefit of those that may have only read this one article, the XOR logic problem looks like the following truth table:

X1  X2  Output
0   0   0
0   1   1
1   0   1
1   1   0

Remember with a single layer (perceptron), we can't actually achieve the XOR functionality as it's not linearly separable. But with a multi-layer network, this is achievable.

So with this in mind, how are we going to achieve this? Well, we are going to use a Genetic Algorithm (GA from this point on) to breed a population of neural networks that will hopefully evolve to provide a solution to the XOR logic problem; that's the basic idea anyway.

So what does this all look like?

As can be seen from the figure above, what we are going to do is have a GA which will actually contain a population of neural networks. The idea being that the GA will jiggle the weights of the neural networks, within the population, in the hope that the jiggling of the weights will push the neural network population towards a solution to the XOR problem.

So How Does This Translate Into an Algorithm?

The basic operation of the Microbial GA training is as follows:

Pick two genotypes at random

Compare scores (fitness) to come up with a winner and a loser

Go along the loser's genotype, and at each locus (point):

With some probability, copy from the winner to the loser (overwrite)

With some probability, mutate that locus of the loser

So only the loser gets changed, which gives a version of Elitism for free; this ensures the best in breed remains in the population.

That's it. That is the complete algorithm.
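The steps above can be sketched in a few lines. The article's implementation is C#; this is a hypothetical Python rendering, and the probability values and the "higher score is fitter" convention are assumptions:

```python
import random

def microbial_tournament(population, fitness, cross_prob=0.5, mut_prob=0.1, mut_size=0.5):
    """One Microbial GA tournament, following the steps above.

    population: list of genotypes (flat lists of weights), modified in place.
    fitness:    maps a genotype to a score (higher is assumed fitter here).
    The parameter names and default values are illustrative assumptions.
    """
    # 1. Pick two genotypes at random.
    a, b = random.sample(range(len(population)), 2)
    # 2. Compare scores (fitness) to come up with a winner and a loser.
    winner, loser = (a, b) if fitness(population[a]) >= fitness(population[b]) else (b, a)
    # 3. Go along the loser's genotype, locus by locus.
    for locus in range(len(population[loser])):
        # 4. With some probability, copy from winner to loser (overwrite).
        if random.random() < cross_prob:
            population[loser][locus] = population[winner][locus]
        # 5. With some probability, mutate that locus of the loser.
        if random.random() < mut_prob:
            population[loser][locus] += random.uniform(-mut_size, mut_size)
    # Only the loser was changed: the winner survives untouched,
    # which is the free elitism mentioned above.
```

Note that the winner's genotype is never written to; repeating this tournament many times is the whole training loop.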

But there are some essential issues to be aware of when playing with GAs:

The genotype will be different for a different problem domain

The fitness function will be different for a different problem domain

These two items must be developed again whenever a new problem is specified. For example, if we wanted to find a person's favourite pizza toppings, the genotype and fitness would be different from that which is used for this article's problem domain.

These two essential elements of a GA (for this article problem domain) are specified below.

1. The Genotype

For this article, the problem domain states that we have a population of neural networks. So I created a single-dimension array of NeuralNetwork objects. This can be seen from the constructor code within the GA_Trainer_XOR object:
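The actual constructor code is in the demo project download. As a rough sketch of the idea (Python, with an assumed population size and an assumed 2-2-1 topology; the real GA_Trainer_XOR holds NeuralNetwork objects rather than raw weight lists):

```python
import random

POPULATION_SIZE = 15  # assumed value, not taken from the article

def random_genotype(n_weights):
    # One network's genotype: a flat list of connection weights,
    # initialised to small random values.
    return [random.uniform(-1.0, 1.0) for _ in range(n_weights)]

# Assumed 2-2-1 topology with the bias treated as an extra input:
# (2 inputs + bias) * 2 hidden nodes + (2 hidden + bias) * 1 output = 9 weights.
population = [random_genotype(9) for _ in range(POPULATION_SIZE)]
```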

2. The Fitness Function

Remembering the problem domain description stated, the following truth table is what we are trying to achieve:

So how can we tell how fit (how close) the neural network is to this? It is fairly simple really. What we do is present the entire set of inputs to the neural network one at a time and keep an accumulated error value, which is worked out as follows:

Within the NeuralNetwork class, there is a getError(..) method like this:

Then in the NN_Trainer_XOR class, there is an Evaluate method that accepts an int value which represents the member of the population to fetch and evaluate (get fitness for). This overall fitness is then returned to the GA training method to see which neural network should be the winner and which neural network should be the loser.
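Putting getError(..) and Evaluate together, the idea is roughly this (a Python sketch; the real code lives in the NeuralNetwork and NN_Trainer_XOR classes, and here the lower the accumulated error, the fitter the network):

```python
# The XOR truth table the network is being trained against.
XOR_TRUTH_TABLE = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def evaluate(network):
    # Present the entire input set one at a time and keep an accumulated
    # error value; 'network' is assumed to be a callable mapping an input
    # pair to a single output. A perfect XOR network scores 0.
    error = 0.0
    for inputs, target in XOR_TRUTH_TABLE:
        error += (target - network(inputs)) ** 2
    return error

print(evaluate(lambda i: i[0] ^ i[1]))  # a perfect XOR "network" scores 0.0
```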

So how do we know when we have a trained neural network? In this article's code, what I have done is provide a fixed limit value within the NN_Trainer_XOR class that, when reached, indicates that the training has yielded a best configured neural network.

If, however, the entire training loop is done and there is still no well-configured neural network, I simply return the value of the winner (of the last training epoch) as the overall best configured neural network.

This is shown in the code snippet below; this should be read in conjunction with the evaluate(..) method shown above:

//check to see if there was a best configuration found, may not have done
//enough training to find a good NeuralNetwork configuration, so will simply
//have to return the WINNER
if (bestConfiguration == -1)
{
    bestConfiguration = WINNER;
}

//return the best Neural network
return networks[bestConfiguration];

So Finally the Code

Well, the code for this article looks like the following class diagram (it's Visual Studio 2005, C#, .NET v2.0):

The main classes that people should take the time to look at would be:

GA_Trainer_XOR: Trains a neural network to solve the XOR problem using a Microbial GA.

TrainerEventArgs: Training event args, for use with a GUI.

NeuralNetwork: A configurable neural network.

NeuralNetworkEventArgs: Training event args, for use with a GUI.

SigmoidActivationFunction: A static method to provide the sigmoid activation function.

The rest are the GUI I constructed simply to show how it all fits together.

Note: The demo project contains all code, so I won't list it here. Also note that most of these classes are quite similar to those included with the Part 2 article code. I wanted to keep the code similar so people who have already looked at Part 2 would recognize the common pattern.

Code Demos

The demo application attached has three main areas which are described below:

Live Results Tab

It can be seen that this has very nearly solved the XOR problem; it did, however, take nearly 45000 iterations (epochs) of a training loop. Remember that we also have to present the entire training set to the network, and do this twice, once to find a winner and once to find a loser. That is quite a lot of work, I am sure you would all agree. This is why neural networks are not normally trained by GAs; this article is really about how to apply a GA to a problem domain. The fact that the GA training took 45000 epochs to yield an acceptable result does not mean that GAs are useless. Far from it; GAs have their place, and can be used for many problems, such as:

Sudoku solver (the popular game)

Knapsack problem (trying to optimize the use of a backpack of limited size, to get as many items in as will fit)

To name but a few. Basically, if you can come up with the genotype and a fitness function, you should be able to get a GA to work out a solution. GAs have also been used to grow entire syntax trees of grammar, in order to predict which grammar is more optimal. There is more research being done in this area as I write this article; in fact, there is a nice article on this topic (Gene Expression Programming) by Andrew Kirillov, right here at CodeProject, if anyone wants to read further.

Training Results Tab

Viewing the target/outputs together:

Viewing the errors:

Trained Results Tab

Viewing the target/outputs together:

It is also possible to view the neural network's final configuration using the "View Neural Network Config" button.

What Do You Think?

That is it; I would just like to ask, if you liked the article, please vote for it.

Points of Interest

I think AI is fairly interesting, that's why I am taking the time to publish these articles. So I hope someone else finds it interesting, and that it might help further someone's knowledge, as it has my own.

Anyone that wants to look further into AI type stuff, and finds the content of this article a bit basic, should check out Andrew Kirillov's articles at his CodeProject page, as his are more advanced, and very good.

History

v1.1: 27/12/06: Modified the GA_Trainer_XOR class to have a random number seed of 5.

Comments and Discussions

I'm trying to calculate the R-squared value after training is complete, and I need the coefficient values, which would be the final weights. I see that there are 3 properties under the neural network: Hidden, InputToHiddenWeights, and HiddenToOutputWeights. Which one of these is what I would want?

That is what I thought, but I must be doing something wrong, because I have 3 inputs and 1 output, so I initialize the neural network with 3 hidden layers and 3 input layers and 1 output layer. I'm getting 4 double arrays in HiddenToOutputWeights, and it seems that the more training iterations I do, the more inaccurate the weights seem to get. I'm trying to run code that compares multiple regression with the neural network in your code, and then I create an R-squared value for each method and choose the closest one. All I have changed in your code is the inputs and outputs. Is there anything that I need to change to make sure it isn't still trying to solve the XOR code?

All I'm trying to do is perform a multiple regression using 3 inputs and 1 output. I feed in the 3 input arrays, and I'm just trying to find the weight for each that best matches the output data. I don't know how I should change the fitness function, if at all. I read that section on it, but I guess I'm not seeing what to do.

getError() currently takes the error, gets the square of it and then the square root, so the result is the modulus of the original error value. Instead it should return the squared error, so Math.Sqrt() should be removed.

As a suggestion, to get better results with fewer training loops, initialiseNetwork() should be called when creating a new NeuralNetwork.

There seems to be a problem with this program. I edited the program, and it looks like most seeds would make it get stuck on a local minimum. I think the cause of this is that a weak genetic algorithm was used: making the winner spread his genes over the loser's genes.

Do you know how to create a neural network application that reads data from an Excel file, then shows the output in a GUI diagram? Maybe you will understand more when you refer to this link (http://www.jpier.org/PIER/pier116/13.11022601.pdf); this project was previously done in MATLAB. Do you know how to convert it to C#?

I used AForge to implement a back propagation network, with sigmoid as the activation function; it has 282 inputs and 6 outputs, and the second output is greater than 0.5 in the training sample. We trained the network until the error was 0, but the second output is then less than 0.5. It is very strange; why does it report the error as 0?

Great article! Question: what is the optimal layout for specific inputs/outputs to choose? E.g. why, for the XOR function with 2 inputs and 1 output, is it 1 hidden layer with 2 nodes? What will happen if we increase the number of hidden layers and nodes? If I have more inputs and outputs, should there be an equal number of nodes in the hidden layer? Does the layout of a NN also depend on the purpose of the network?

Hello Mr. Barber. At first, I would like to say that I think your 3-step tutorial is a great base for AI; it helped me a lot. But when I tried to port the XOR problem to C++, I encountered unexpected behavior of the ANNs: the learning rate is too slow! I think the problem is the random number generation...

Hi Mr. Sacha, I want to train a neural network to recognize the characters [A-Z] U [a-z] U [0-9] from a user-drawn image in a panel (Java) or picture box (VB6). The picture is divided into a 10 by 10 matrix and each intersection in the matrix represents 1 input neuron. Based on what I have read in your discussion above, genetic algorithms tend to be slow relative to BP when used to train neural networks. But you also said that it could be remedied if only you could come up with a good fitness function as well as a genotype... (Did I get it right?) Can you recommend a good link so that I will have something to read regarding my project? I want to use GAs in evolving weights.

By the way, your article is quite a lot easier to understand than the article on Generation5. But the code presented there, when run, achieves (in only 1000 cycles) the performance of your code (in 45000 cycles). Why is that? I am a beginner in this field and I hope you could somehow help me. Also, could you recommend a good data structure to use in storing the weights (100 input neurons, 10-15 hidden neurons, and 26 output neurons, fully connected)? Or maybe a good architecture for the network itself.

You know, you say my code took 45000 cycles, and it does, but it is training a neural network using a GA; this takes a long time. I bet the one at Generation5 is just using BP. If you look at the 2nd article in this series, it only took 2500 cycles to solve the XOR using BP.

As far as how to represent the network, you could use my code directly or use Andrew Kirillov's neural network stuff, also here at CodeProject.

Thanks, haven't read the stuff you mentioned yet but I will after finishing this reply. Uhm, actually, the one I mentioned in my first post in this article is a Neural Network trained by GA. You can follow this link if you want proof of what I am saying. http://www.generation5.org/content/2000/nn_ga.asp

Hi, I think rather than constant repetition of the same back-propagated, sigmoid-driven, multilayer networks that we see practically everywhere, it would be more interesting if you were to present an algorithm (with measured stability metrics) regarding the implementation of mutation and crossover in neural networks.

For example, what do you mutate? A neuron? (Value? Activation Function? Type of Synapse?) A neuron layer? (Number of neurons? Arrangement? Bias values?)

And how do you implement network crossover? What defines a slice of the network, and how do you insert it into a different one?

But as a newbie to AI, while I agree there was other stuff on the net that possibly says the same thing, I did actually feel that Sacha's article was at the "right" level and, more importantly, in the "right" style (for me at least). I have looked at the other articles I had seen on the net again after reading Sacha's stuff, and they made more sense to me the second time. Yes, they probably were saying the same thing in the end, but their writing style just lost me too quickly.

While I am nowhere near being able to say I fully understand ANNs, I can at least now understand the terms you mention.

<small>"if you were to present an algorithm (with measured stability metrics) regarding the implementation of mutation and crossover in neural networks.
For example, what do you mutate? A neuron? (Value? Activation Function? Type of Synapse?) A neuron layer? (Number of neurons? Arrangement? Bias values?)
And how do you implement network crossover? What defines a slice of the network, and how do you insert it into a different one?
"</small>

You use terms like "mutate activation functions" and "cross over networks", which I am only now at a point where I realize I need help with next (and that is what they are called).

I liked Sacha's writing style... it worked for me (at this moment in time), and I agree there is loads of other stuff out there, but I for one am glad he did write what he did.

Now could someone write the follow-up please; I need to learn how to "mutate my crossed over function network activation" thingybobby whatsit, and quickly.


This really made me laugh; yeah, ANNs are a bit like that. Those books I refer to are good, and you should also check out Andrew Kirillov's articles. I mentioned those in my article.

I am a big fan of his actually.

I'm glad you like my writing style. I try to make it easy to grasp. These were always meant to be beginners' articles. At uni we did spiking, DRNN, and all sorts of others, but the style described by my articles is by far the easiest to follow. So if I only helped one person, that's really cool.

I'm glad I helped you in particular; I like what you said, it made me smile.

Hmm, maybe I will. It seems that neural network articles are universally popular on CodeProject. But, on the other hand, I'm not sure that anyone is really interested in the more advanced stuff (GA, committee machines, parallelization, etc.)... but I will sit on the idea and maybe come up with something.

I have been reading all your articles and they are very good. I am new to the AI world but I have a problem that needs it now.

I understand (sort of) the coding side of things, but my questions are higher level than that: how do you define your inputs, nodes, biases, etc.?

For example, how do you define the number of nodes you use/need, how many layers deep should it be, and when do you need back propagation? (Is back propagation only needed if you have an XOR situation you want to test for?)

Sorry if these are very general questions.

Just to give you a very basic description of my problem, which is basically document categorization... I have a situation where I have to recognise a variable number of documents when I see them. The catch is that the documents, while having "roughly" the same content each time I see them, will rarely be identical. There will always be extra stuff that could be considered "noise", and the location of the "functional" content within a document can vary... I really don't care about the content of the document; my issue is to recognise that the document is one I've seen before, or a new one, and categorize it appropriately.

A simple example would be, say... um... I know!!! A cooking web site... I am interested in seeing whenever, and how often, different recipes are visited. I don't care about the actual recipes, but I want to know that I have seen this one before, or that it's a new one, so I know what pile to put it under (Baking, Cakes, Apple pies).

(For the purpose of this example, assume I only have access to the page data, nothing else.) Each time a recipe is displayed there will be loads of different ads wrapped around the page, and at random there might be news-type articles relating to the history of the dish... and to top it off, say it is a supermarket web site and they modify the recipe text to include product brands that are on special that week, so the actual recipe text will differ slightly. This is not my actual situation, but it sums it up pretty well.

How would you recommend defining the sort of network I would need?

In my situation, I think I can find about 5-6 "parameters" that have a fairly high correlation to the document "category". Each parameter on its own would not be unique, but as a set they would be a "reasonably" good fingerprint (but rarely 100%).

Do I just have one neuron per parameter I want to match against? Do I need to have multiple layers of neurons? Do I need to set up any special back propagation rules? (In my case, the parameters I would be looking at would all be additive.)

Is it possible to avoid having to supply a test file, as I would never know all the possible documents I might see beforehand?

There is a certain amount of structure, and a few "general" key words I can rely on. Would you suggest some form of GA to get it to "learn" each new page by itself as it comes across it, or would I have to build a test set of all the pages I am interested in first?

Sorry if this is getting too long; please tell me to go somewhere else if this is not the right place to ask. :(

For an XOR with 4 inputs, I would need 4 input nodes, if you see what I mean. Then I would look at what state I am trying to get out of the neural network for the problem domain. For an XOR, I simply want a 0 or a 1, so the network only needs 1 output. The bit in between inputs and outputs (hidden node layers) is kind of your choice.

As far as recognising entire strings goes, that may be a bit over-ambitious. I would try and get your head around single characters first (this is hard enough, and takes lots of training, just to do this).

There are some very good articles, and possibly you should pose your questions to those fellows.

Oh, I do think you should have gotten the prize. As an almost complete beginner, I found your articles to be among the "friendliest" introductions to AI neural networks I have seen so far. As long as you are still enjoying doing them, please keep it up.

I will look at your references and do some more head scratching.

In the meantime maybe one of the other readers may have some suggestions...

Maybe a good article for someone to write is "problem analysis" wrt using neural nets... Who knows, if I learn enough I might have a go at writing something. I don't know if other people are as slow as me at picking this AI "thinking" up.

Can you recommend any good references that show real-world examples (even simple ones) and the neural nets chosen, and WHY they were chosen? I feel that with neural net libraries I have just been given my first "Swiss Army pocket knife" with 101 "gadgets" on it, and I don't know where/how to choose which one I need.

Yes, it is really sad. I totally agree, and I personally think that there is some sort of cheating. That guy was not so close to being the winner, so he just did something mean to become the winner. If it is so, then it is really low of him.

Yes, you are right. We just need to write good articles. Personally I am not a great fan of the prizes I got, and think they could be much better. But I understand why they are as they are. These prizes are just a way of advertisement and nothing more.

I am really glad that you agree with me; it just doesn't seem possible. I mean, for something like 11 days there were 2 clear leaders, then in 1 day one article triples its votes.

Mmmmm I think not.

I am totally annoyed by this, but as I say, I am not going to bother with that stuff any more; I'll just do my own thing and let things happen.

You know, I generally write stuff I am interested in. Sometimes this means lots of interest, other times not so much, but I like what I write, and that's the main thing for me. So I will continue on this route.

As for this competition, I am over it. There was some cheating, methinks. But sod it.