A warning before we begin: I am currently moving on to neural networks, so I only created these base codes, based on the books and C++ source codes, for later use in my Unity project. The code therefore has some problems at the moment, but the samples can still serve as examples for you to modify and use however best fits your needs.

Niching techniques

“Can be great for retaining population diversity, and are particularly useful where the fitness landscape might contain multiple peaks or where it is essential to protect new innovation within a population.” (Rabin, 2003)

Explicit fitness sharing

“Is a mechanism where individuals with similar genetic properties are grouped together. Each individual’s score is then divided by the number of genomes within its group.

NewFitness = OldFitness / NumberOfNeighbors

This punishes the fitness scores of individuals who have many similar neighbors, thereby preventing any one group from growing too large and taking over the population.” (Rabin, 2003)
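The sharing rule above can be sketched in a few lines. This is plain Python rather than the Unity C# used later in this post, and the function names and the distance threshold are my own illustration, not from the book:

```python
# A minimal sketch of explicit fitness sharing: each individual's score
# is divided by the number of genomes similar to it.

def share_fitness(genomes, fitnesses, distance, threshold):
    """Divide each genome's fitness by the number of similar neighbors."""
    shared = []
    for i, g in enumerate(genomes):
        # a genome is always similar to itself, so the divisor is at least 1
        neighbors = sum(1 for other in genomes if distance(g, other) < threshold)
        shared.append(fitnesses[i] / neighbors)
    return shared

# toy example: genomes are plain numbers, distance is absolute difference
genomes = [1.0, 1.1, 1.2, 9.0]
fitnesses = [10.0, 10.0, 10.0, 10.0]
shared = share_fitness(genomes, fitnesses, lambda a, b: abs(a - b), 0.5)
# the three clustered genomes are punished; the lone genome keeps its score
```

Notice how the group of three similar genomes each end up with a third of their raw score, while the isolated genome is untouched, which is exactly the pressure that stops one group from taking over the population.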

Speciation

“Goes one step further by separating the genomes into groups in the same way as explicit fitness sharing, but this time individuals are only allowed to breed with members of their own species. Typically, a species is killed when either its size decreases to zero or its fitness has not increased within a user-defined number of generations. This means that the individuals that would normally have died out early in the evolution of a population remain active for much longer, protected among their species members. Because of the protection you can experiment with much larger mutation rates than normal.” (Rabin, 2003)

The compatibility function

“To determine if one genome should belong in the same species as another, you must use a function that compares two gene strings and returns a compatibility distance. If the distance is below a user-defined threshold, then the genomes are considered to be in the same species. This compatibility function varies considerably depending on the encoding representation.

Dist = Σ |xi − yi| / n

n = the number of genes in each chromosome

x and y = represent two different individuals in the population

Each iteration of the genetic algorithm, the compatibility function is used to test every individual against the first member in each species. If the distance is within a user-defined threshold, then the individual is added to the appropriate species. If the individual is incompatible with all the current species, then a new species is created and the individual is added to that.” (Rabin, 2003)

/// <summary>
/// Separate the population into species. This separates the individuals into species of similar genomes.
/// </summary>
public void Speciate(ref List<Host> population)
{
    //first clear the existing members and kill off any non developing
    //species
    this.Species.Clear();

    //now separate the population into species
    for (int gen = 0; gen < population.Count; ++gen)
    {
        bool bAdded = false;

        foreach (Species curSpecies in this.Species)
        {
            //calculate the compatibility score
            double cs = Compatibility(population[gen], curSpecies.Sample());

            //if the compatibility score is less than our tolerance then
            //this genome is added to the species
            if (cs < CompatibilityTolerance)
            {
                curSpecies.AddGenome(population[gen]);
                bAdded = true;
                break;
            }
        }//next species

        if (!bAdded)
        {
            //not compatible with any current species so create a new
            //species
            Species.Add(new Species(population[gen], NextSpeciesID++));
        }
    }//next genome

    //update all the species to make sure their sample member is set
    //to the best genome found so far. Kill off any empty species and any
    //species that has gone too long without improvement
    for (int x = this.Species.Count - 1; x >= 0; --x)
    {
        var curSpecies = this.Species[x];
        curSpecies.UpdateSampleGenome();

        if ((curSpecies.Empty() ||
            (curSpecies.GenerationsNoImprovement() > GenerationsAllowedWithoutImprovement)) &&
            (Species.Count > 1))
        {
            Species.RemoveAt(x);
        }
    }
}
/// <summary>
/// this allocates a compatibility score between two genomes. If the
/// score is below a certain threshold then the two genomes are
/// considered to be of the same species
/// </summary>
/// <param name="g1"></param>
/// <param name="g2"></param>
/// <returns></returns>
public float Compatibility(Host g1, Host g2)
{
    //genomes of different lengths cannot be compared gene by gene, so
    //treat them as maximally incompatible rather than identical
    //(returning 0 here would wrongly place them in the same species)
    if (g1.DNA.Genes.Count != g2.DNA.Genes.Count) return float.MaxValue;

    float RunningTotal = 0.0F;

    for (int gene = 0; gene < g1.DNA.Genes.Count; ++gene)
    {
        RunningTotal += Vector2.Distance(g1.DNA.Genes[gene], g2.DNA.Genes[gene]);
    }

    return RunningTotal / g1.DNA.Genes.Count;
}
/// <summary>
/// this method calculates the amount of offspring each species
/// should produce.
/// </summary>
/// <param name="AmountNeeded"></param>
public void CalculateExpectedOffspring(int AmountNeeded)
{
    //first calculate the total fitness of all active genomes
    float TotalFitness = 0.0F;

    foreach (Species curSpecies in this.Species)
    {
        //apply fitness sharing first
        curSpecies.FitnessShare();
        TotalFitness += curSpecies.TotalFitness();
    }

    //now it is necessary to calculate the expected amount of offspring
    //from each species
    double expec = 0.0;

    foreach (Species curSpecies in this.Species)
    {
        curSpecies.SetExpectedOffspring(TotalFitness, AmountNeeded);
        expec += curSpecies.ExpectedOffspring();
    }
}
/// <summary>
/// this sorts the species and assigns a color to each one.
/// The color is just cosmetic to be used as a visual aid.
/// </summary>
public void SortAndAssignVisualAid()
{
    if (this.Species.Count == 0) return;

    //OrderByDescending returns a new sequence, so the result must be
    //assigned back for the sort to take effect
    this.Species = this.Species.OrderByDescending(o => o.BestEverFitness()).ToList();
}

Co-evolution

“If you have different species competing with each other, they have to work harder at solving the problem. If the fitness of one is somehow inversely proportional to the fitness of the other, you have competition. This can greatly increase the speed and quality of the evolution process.” (Rabin, 2003)

The Michalewicz method

“Contains several mutation and crossover operators that are used in combination to fully exploit the characteristics of real-value encoded GAs.

Mutation operators

Boundary mutation: With probability p, this operator changes the value of a gene to either its minimum possible value, Gmin, or its maximum possible value Gmax.

Replace mutation: With probability p, this operator resets a gene to a uniform random between Gmin and Gmax.

Non-Uniform mutation: With probability p, this operator adjusts a gene’s size by a small random amount, the limits of which decrease over time.

Crossover operators

Arithmetical crossover: This simply averages the values of the genes at each locus.

Simple crossover: This operator is the same as a single-point crossover.

Heuristic crossover: Given the parents G1 and G2, this operator creates an offspring according to the equation: Child = r(G2 – G1) + G2

Variable G2 is the fitter of the two parents, and r is a random number between 0 and 1.

Alone, these operators produce poor results, but combined they tend to produce fitter offspring.

Tip: Using a fixed probability distribution for the operators, instead of an adaptive one, can be speedier while losing little in performance. This means that each operator is simply applied with the same, equal probability.” (Rabin, 2003)
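As a rough illustration, here are the operators above sketched in plain Python for a real-valued genome. The names, the GMIN/GMAX limits, and the shrinking schedule in the non-uniform mutation are my own assumptions, not Michalewicz's exact formulations, and the per-gene "with probability p" gate is left out for brevity:

```python
import random

# Assumed gene limits for this sketch.
GMIN, GMAX = 0.0, 1.0

def boundary_mutate(gene):
    # snap the gene to either its minimum or its maximum possible value
    return GMIN if random.random() < 0.5 else GMAX

def replace_mutate(gene):
    # reset the gene to a uniform random value between GMIN and GMAX
    return random.uniform(GMIN, GMAX)

def non_uniform_mutate(gene, progress):
    # adjust the gene by a small random amount whose limits shrink
    # as progress goes from 0 (start of the run) towards 1 (end)
    span = (GMAX - GMIN) * (1.0 - progress)
    return min(GMAX, max(GMIN, gene + random.uniform(-span, span)))

def arithmetical_crossover(g1, g2):
    # average the values of the genes at each locus
    return [(a + b) / 2.0 for a, b in zip(g1, g2)]

def heuristic_crossover(g1, g2):
    # Child = r * (G2 - G1) + G2, where g2 is the fitter parent
    r = random.random()
    return [r * (b - a) + b for a, b in zip(g1, g2)]
```

Note that the heuristic crossover can step outside the gene limits; a real implementation would typically clamp the child or retry with a new r.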

Genetic programming

“Genetic programming is a powerful extension to evolutionary computing. It allows considerable flexibility, letting the application programmer develop solutions of more diverse structure than those provided by genetic algorithms. Grammatical evolution is one way to implement genetic programming while minimizing the computational expense related to the creation and subsequent elimination of nonsense organisms.” (Rabin, 2003)

Growth in Genetic Algorithms

Idea: Enhancements through flexibility

Growth

Neutral networks

Variable-length genome

Speciation

Co-evolution.

Growth

Look at nature for examples of how to give more “life” to GAs, so they do not look static, as if created by mechanical code. In nature, things grow rather than being strictly parametrized. Look at growth methods from biology, chemistry, etc. (Rabin, 2003)

Environment

One example would be to look at the environment something lives in and how the two affect one another. What factors, things, and properties of both the environment and the entity play a role in environment-based growth? When does growth stop for a living organism? Take into consideration how the growth is going to be terminated. Look again to nature. (Rabin, 2003)

Neutral network

A concept that increases evolvability. Have some junk DNA switched on inside an important host. This junk DNA is then subject to the same rules of evolution. It might not produce great results at first, but every now and then it might prove to be an advantage. It increases the probability of finding a better solution.

Implementation ideas:

Scanning through the genome with logic that determines interaction values from a single mutation, for example.

Variable-length genomes: Have unequal crossover points. Both parents have their own crossover points, so a child can have a genome much bigger or smaller than its two parents, which allows evolution to control the complexity of the solutions.

Another example would be to duplicate a gene once it is identified, or just to make random duplications during crossover.

This would mean that the system cannot be a simple parametrization system.
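A minimal sketch of such unequal crossover, in plain Python (the function name is my own): each parent gets its own cut point, so the child's genome can be larger or smaller than either parent's.

```python
import random

# Unequal crossover: each parent has its own crossover point, so the
# head of one parent is joined to the tail of the other, and the child's
# length can differ from both parents.

def unequal_crossover(parent_a, parent_b):
    cut_a = random.randint(0, len(parent_a))
    cut_b = random.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]

child = unequal_crossover(list("AAAAAA"), list("BBB"))
# the child's length can be anywhere from 0 to 9 genes
```

Repeated over generations this lets selection, not the programmer, decide how long a genome needs to be.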

Introduction

OK, in the last post I wrote some basics on Genetic Algorithms and I showed an example which solved a text “puzzle”. In this post I will go a bit more in-depth into Genetic Algorithms and show you an example of a Genetic Algorithm solving a “best” possible path to different targets while avoiding obstacles.

I will start by showing the target seeking Genetic Algorithm in action. Then I will go through the code explaining the important parts. After that I will spend some time on general GA related topics and give some explanations and sample codes.

This post is more about gathering as much detail as possible into one location. It works as notes for me, and hopefully, if someone else is looking at this post, he or she might find something useful. This is a work in progress, so I apologize for any possible errors in this post. I’ll fix them when I become aware of them and when I can.

Updated 16.9.2015: Made some additions to the descriptions to existing operators and scaling techniques.

Sample application

The sample application is rather simple. It has the following tasks to perform:

The object which is the focus of our attention has to find targets

Arrange the targets by how close they are to the initial starting position

Then choose the closest target

Next calculate the best possible path to the target. Things to take into consideration:

Avoiding obstacles

Finding the fastest route with the least amount of steps

Favoring paths whose last location is closest to a target

Favoring paths which travel the least amount of space

When the best possible path has been found to the target, use the path

Go to the next target and repeat steps 4-6 until no more targets are left

Here is the link to the source codes for those who are interested in poking around :). I warn though that the code and the solution are raw and unpolished. It does the job while I am learning GAs at the moment:

The fitness score calculation

For now, though, I will only show you the calculations which define the fitness values and play the major role in determining a valid path to a target.

In this sample there are two classes which hold the actual data for the paths. One is named Genome, which holds the travel data. The other is named Host, which holds the genome data and operates on it.

public float CalculateFitnessAndDistance(Vector2 targetLocation)
{
    if (this.EndLocation != this.StartLocation)
    {
        this.DistanceToTarget = Vector2.Distance(targetLocation, this.EndLocation);
        float distanceToTargetFromStartLocation = Vector2.Distance(targetLocation, this.StartLocation);

        // The target has been reached; make sure we are not dividing by zero.
        if (this.DistanceToTarget == 0)
            this.DistanceToTarget = 1;

        // This means only the target was hit; make sure that we do not end up dividing by zero.
        if (this.obstaclesHit == 0)
            this.obstaclesHit = 1;

        /* The fitness score is based on four things:
           1. The time it takes a path to reach its target. If time is not taken into consideration, the slowest and safest path will win.
           2. The distance from the path's end location to the target. If distance is not taken into consideration, the result would be a long path.
           3. How many obstacles were hit along the path. If obstacles are not taken into consideration, the GA will not know whether a path that hits obstacles is good or bad, or how bad it is. The more obstacles a path hits, the less likely it is to be reproduced.
           4. The travel distance. The smaller the distance, the better the fitness score will be.
           For each of these values, the lower the time, the distance, or the obstacles hit, the better the fitness score. That is, the score will be closer to 1 or above it for the best possible path.
           The higher any value is, the less likely the path is to be optimal. The worst path is one that hits obstacles.
        */
        var calculation = this.finnishTime * this.DistanceToTarget * this.obstaclesHit * this.DistanceTraveled;
        this.DNA.Fitness = 1 / calculation;
    }

    // Hitting an obstacle means the fitness must be reduced. We want to penalize the path for any obstacles. This is not what we want in the population of possible solutions.
    if (this.HitObstacle)
    {
        this.DNA.Fitness *= 0.1F;
    }

    // Reward the path for hitting the target.
    if (this.HitTarget)
    {
        this.DNA.Fitness += 1;
    }

    return this.DNA.Fitness;
}

The genome code is rather simple. All it does is store the data and initialize the genome with random vectors pointing in the direction to go next.

The host data is the actual place where the main fitness calculation takes place based on the genome data.

Now take a look at the function CalcualteEndLocationOfHost. This function is called before the fitness calculation function, CalculateFitnessAndDistance. In the CalcualteEndLocationOfHost function, pre-fitness-calculation operations are performed, such as determining whether the object has hit obstacles, how many it has hit, and whether it has hit the target; it also calculates the path for the object.

The fitness score is then calculated in the CalculateFitnessAndDistance function. The fitness score is based on four things:

The time it takes a path to reach its target. If time is not taken into consideration, the slowest and safest path will win.

The distance from the path's end location to the target. If distance is not taken into consideration, the result would be a long path.

How many obstacles were hit along the path. If obstacles are not taken into consideration, the GA will not know whether a path that hits obstacles is good or bad, or how bad it is. The more obstacles a path hits, the less likely it is to be reproduced.

The travel distance. The smaller the distance, the better the fitness score will be.

For each of these values, the lower the time, the distance, or the obstacles hit, the better the fitness score. That is, the score will be closer to 1 or above it for the best possible path.

The higher any value is, the less likely the path is to be optimal. The worst path is one that hits obstacles. This is, by the way, a crude fitness score calculation method. There is definitely room for improvement, but currently it does the job.

This is basically it. There are of course more details such as the mutation operators, crossover and selection operators but I will cover these later since these are common to Genetic Algorithms and not specific to this path finding GA.

Also, as with the previous GA example solving the text puzzle, the same GA-related parameters need to be defined and tuned for optimal performance. These are:

The Population size

Crossover rate (this is a new one; it simply determines, via a random floating point value between 0 and 1, whether a crossover should be performed.)

Mutation rate

Genome gene amount (in other words, how many steps in a path)
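To show how these parameters might be wired together, here is a small plain-Python sketch; the concrete values are placeholders rather than tuned settings, and the names are my own:

```python
import random

# Placeholder parameter values, not tuned settings.
POPULATION_SIZE = 100
CROSSOVER_RATE = 0.7   # probability that two parents are recombined
MUTATION_RATE = 0.01   # per-gene probability of mutation
GENOME_LENGTH = 30     # number of steps in a path

def maybe_crossover(parent_a, parent_b):
    # crossover is only performed when a random roll passes the rate;
    # otherwise the offspring are plain copies of the parents
    # (both parents are assumed to be GENOME_LENGTH genes long)
    if random.random() < CROSSOVER_RATE:
        point = random.randint(1, GENOME_LENGTH - 1)
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])
    return parent_a[:], parent_b[:]
```

The crossover rate is exactly the "random floating point value between 0 and 1" gate mentioned in the list above.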

Images on the path solving application

Now it’s time to look at some cool visual images on how the path finding operates.

In this first image you can see how the very first generation has no specific knowledge where to go yet. It goes all over the place in a circular path. The yellow lines represent path steps taken which do not collide with obstacles while red lines represent the path hitting an obstacle.

The next image shows you close to the very last generation in the path solving GA. Here you can see that the algorithm has converged towards the target and is just about to find the proper path based on the fitness score.

In this last image you can see the algorithm later in play, searching for the possible best path (the green lines), while the red lines represent paths traveled by accepted paths.

Genetic algorithms explained

Here I have gathered some explanations from other sources to help you grasp and get a better idea of GAs. These also work for me as notes for later usage.

What is a Genetic Algorithm

“Genetic Algorithms (GAs) are adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics. As such they represent an intelligent exploitation of a random search used to solve optimization problems. Although randomised, GAs are by no means random, instead they exploit historical information to direct the search into the region of better performance within the search space. The basic techniques of the GAs are designed to simulate processes in natural systems necessary for evolution, specially those follow the principles first laid down by Charles Darwin of “survival of the fittest.”. Since in nature, competition among individuals for scanty resources results in the fittest individuals dominating over the weaker ones.” http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html

“Concisely stated, a genetic algorithm (or GA for short) is a programming technique that mimics biological evolution as a problem-solving strategy. Given a specific problem to solve, the input to the GA is a set of potential solutions to that problem, encoded in some fashion, and a metric called a fitness function that allows each candidate to be quantitatively evaluated. These candidates may be solutions already known to work, with the aim of the GA being to improve them, but more often they are generated at random.

The GA then evaluates each candidate according to the fitness function. In a pool of randomly generated candidates, of course, most will not work at all, and these will be deleted. However, purely by chance, a few may hold promise – they may show activity, even if only weak and imperfect activity, toward solving the problem.

These promising candidates are kept and allowed to reproduce. Multiple copies are made of them, but the copies are not perfect; random changes are introduced during the copying process. These digital offspring then go on to the next generation, forming a new pool of candidate solutions, and are subjected to a second round of fitness evaluation. Those candidate solutions which were worsened, or made no better, by the changes to their code are again deleted; but again, purely by chance, the random variations introduced into the population may have improved some individuals, making them into better, more complete or more efficient solutions to the problem at hand. Again these winning individuals are selected and copied over into the next generation with random changes, and the process repeats. The expectation is that the average fitness of the population will increase each round, and so by repeating this process for hundreds or thousands of rounds, very good solutions to the problem can be discovered.

As astonishing and counterintuitive as it may seem to some, genetic algorithms have proven to be an enormously powerful and successful problem-solving strategy, dramatically demonstrating the power of evolutionary principles. Genetic algorithms have been used in a wide variety of fields to evolve solutions to problems as difficult as or more difficult than those faced by human designers. Moreover, the solutions they come up with are often more efficient, more elegant, or more complex than anything comparable a human engineer would produce. In some cases, genetic algorithms have come up with solutions that baffle the programmers who wrote the algorithms in the first place!” http://www.talkorigins.org/faqs/genalg/genalg.html

Genetic Algorithms Overview

“GAs simulate the survival of the fittest among individuals over consecutive generation for solving a problem. Each generation consists of a population of character strings that are analogous to the chromosome that we see in our DNA. Each individual represents a point in a search space and a possible solution. The individuals in the population are then made to go through a process of evolution.

GAs are based on an analogy with the genetic structure and behavior of chromosomes within a population of individuals using the following foundations:

Individuals in a population compete for resources and mates.

Those individuals most successful in each ‘competition’ will produce more offspring than those individuals that perform poorly.

Genes from `good’ individuals propagate throughout the population so that two good parents will sometimes produce offspring that are better than either parent.

Search Space

“A population of individuals are maintained within search space for a GA, each representing a possible solution to a given problem. Each individual is coded as a finite length vector of components, or variables, in terms of some alphabet, usually the binary alphabet {0,1}. To continue the genetic analogy these individuals are likened to chromosomes and the variables are analogous to genes. Thus a chromosome (solution) is composed of several genes (variables). A fitness score is assigned to each solution representing the abilities of an individual to `compete’. The individual with the optimal (or generally near optimal) fitness score is sought. The GA aims to use selective `breeding’ of the solutions to produce `offspring’ better than the parents by combining information from the chromosomes.

The GA maintains a population of n chromosomes (solutions) with associated fitness values. Parents are selected to mate, on the basis of their fitness, producing offspring via a reproductive plan. Consequently highly fit solutions are given more opportunities to reproduce, so that offspring inherit characteristics from each parent. As parents mate and produce offspring, room must be made for the new arrivals since the population is kept at a static size. Individuals in the population die and are replaced by the new solutions, eventually creating a new generation once all mating opportunities in the old population have been exhausted. In this way it is hoped that over successive generations better solutions will thrive while the least fit solutions die out.

New generations of solutions are produced containing, on average, better genes than a typical solution in a previous generation. Each successive generation will contain more good `partial solutions’ than previous generations. Eventually, once the population has converged and is not producing offspring noticeably different from those in previous generations, the algorithm itself is said to have converged to a set of solutions to the problem at hand.” http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/hmw/article1.html
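The phrase "parents are selected to mate, on the basis of their fitness" is classically implemented as fitness-proportionate (roulette-wheel) selection. Here is a minimal plain-Python sketch; the naming is my own, not from the quoted article:

```python
import random

# Roulette-wheel selection: each individual is chosen with probability
# proportional to its share of the population's total fitness.

def roulette_select(population, fitnesses):
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point rounding
```

With this scheme, an individual with three times the fitness of another is selected roughly three times as often, which is what gives highly fit solutions more opportunities to reproduce.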

Implementation Details

“

Based on Natural Selection

After an initial population is randomly generated, the algorithm evolves through three operators:

Selection which equates to survival of the fittest;

Crossover which represents mating between individuals;

Mutation which introduces random modifications.

Selection Operator

key idea: give preference to better individuals, allowing them to pass on their genes to the next generation.

The goodness of each individual depends on its fitness.

Fitness may be determined by an objective function or by a subjective judgement.

Crossover Operator

Prime distinguished factor of GA from other optimization techniques

Two individuals are chosen from the population using the selection operator

A crossover site along the bit strings is randomly chosen

The values of the two strings are exchanged up to this point

The two new offspring created from this mating are put into the next generation of the population

By recombining portions of good individuals, this process is likely to create even better individuals

Mutation Operator

With some low probability, a portion of the new individuals will have some of their bits flipped.

Its purpose is to maintain diversity within the population and inhibit premature convergence.
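The crossover and mutation operators described above can be sketched on plain bit strings in a few lines of Python (the names are mine):

```python
import random

def single_point_crossover(a, b):
    # a crossover site along the bit strings is randomly chosen and
    # the values of the two strings are exchanged up to this point
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate=0.01):
    # with some low probability, flip each bit
    return [1 - bit if random.random() < rate else bit for bit in bits]

child1, child2 = single_point_crossover([0] * 8, [1] * 8)
# each child is 8 bits long; between them they still carry eight 1s
```

Because crossover only exchanges material, the two offspring together contain exactly the same bits as their parents; only mutation introduces genuinely new values.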

Applications of GA

I think this quote is rather good at explaining uses for Genetic Algorithms:

“Genetic algorithms has been used for difficult problems (such as NP-hard problems), for machine learning and also for evolving simple programs. They have been also used for some art, for evolving pictures and music.

Advantage of GAs is in their parallelism. GA is travelling in a search space with more individuals (and with genotype rather than phenotype) so they are less likely to get stuck in a local extreme like some other methods.

They are also easy to implement. Once you have some GA, you just have to write new chromosome (just one object) to solve another problem. With the same encoding you just change the fitness function and it is all. On the other hand, choosing encoding and fitness function can be difficult.

Disadvantage of GAs is in their computational time. They can be slower than some other methods. But with todays computers it is not so big problem.

To get an idea about problems solved by GA, here is a short list of some applications:

Strengths of GAs

“The first and most important point is that genetic algorithms are intrinsically parallel. Most other algorithms are serial and can only explore the solution space to a problem in one direction at a time, and if the solution they discover turns out to be suboptimal, there is nothing to do but abandon all work previously completed and start over. However, since GAs have multiple offspring, they can explore the solution space in multiple directions at once. If one path turns out to be a dead end, they can easily eliminate it and continue work on more promising avenues, giving them a greater chance each run of finding the optimal solution.

However, the advantage of parallelism goes beyond this. Consider the following: All the 8-digit binary strings (strings of 0’s and 1’s) form a search space, which can be represented as ******** (where the * stands for “either 0 or 1”). The string 01101010 is one member of this space. However, it is also a member of the space 0*******, the space 01******, the space 0******0, the space 0*1*1*1*, the space 01*01**0, and so on. By evaluating the fitness of this one particular string, a genetic algorithm would be sampling each of these many spaces to which it belongs. Over many such evaluations, it would build up an increasingly accurate value for the average fitness of each of these spaces, each of which has many members. Therefore, a GA that explicitly evaluates a small number of individuals is implicitly evaluating a much larger group of individuals – just as a pollster who asks questions of a certain member of an ethnic, religious or social group hopes to learn something about the opinions of all members of that group, and therefore can reliably predict national opinion while sampling only a small percentage of the population. In the same way, the GA can “home in” on the space with the highest-fitness individuals and find the overall best one from that group.

In the context of evolutionary algorithms, this is known as the Schema Theorem, and is the “central advantage” of a GA over other problem-solving methods (Holland 1992, p. 68; Mitchell 1996, p. 28-29; Goldberg 1989, p. 20).
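The schema examples in the quote can be checked mechanically: a schema matches a string when every non-* position agrees. A tiny plain-Python matcher (my own, for illustration):

```python
# A schema matches a string when every non-'*' position agrees.

def matches(schema, string):
    return all(s == '*' or s == c for s, c in zip(schema, string))

string = "01101010"
for schema in ["********", "0*******", "01******", "0******0",
               "0*1*1*1*", "01*01**0"]:
    assert matches(schema, string)
# each position of the string can be either fixed or '*', so an
# 8-bit string is a member of 2**8 = 256 schemata in total
```

This is the implicit parallelism in miniature: one fitness evaluation of 01101010 contributes a sample to all 256 schemata it belongs to.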

Due to the parallelism that allows them to implicitly evaluate many schema at once, genetic algorithms are particularly well-suited to solving problems where the space of all potential solutions is truly huge – too vast to search exhaustively in any reasonable amount of time. Most problems that fall into this category are known as “nonlinear”. In a linear problem, the fitness of each component is independent, so any improvement to any one part will result in an improvement of the system as a whole. Needless to say, few real-world problems are like this. Nonlinearity is the norm, where changing one component may have ripple effects on the entire system, and where multiple changes that individually are detrimental may lead to much greater improvements in fitness when combined. Nonlinearity results in a combinatorial explosion: the space of 1,000-digit binary strings can be exhaustively searched by evaluating only 2,000 possibilities if the problem is linear, whereas if it is nonlinear, an exhaustive search requires evaluating 2^1000 possibilities – a number that would take over 300 digits to write out in full.

Fortunately, the implicit parallelism of a GA allows it to surmount even this enormous number of possibilities, successfully finding optimal or very good results in a short period of time after directly sampling only small regions of the vast fitness landscape (Forrest 1993, p. 877). For example, a genetic algorithm developed jointly by engineers from General Electric and Rensselaer Polytechnic Institute produced a high-performance jet engine turbine design that was three times better than a human-designed configuration and 50% better than a configuration designed by an expert system by successfully navigating a solution space containing more than 10^387 possibilities. Conventional methods for designing such turbines are a central part of engineering projects that can take up to five years and cost over $2 billion; the genetic algorithm discovered this solution after two days on a typical engineering desktop workstation (Holland 1992, p. 72).

Another notable strength of genetic algorithms is that they perform well in problems for which the fitness landscape is complex – ones where the fitness function is discontinuous, noisy, changes over time, or has many local optima. Most practical problems have a vast solution space, impossible to search exhaustively; the challenge then becomes how to avoid the local optima – solutions that are better than all the others that are similar to them, but that are not as good as different ones elsewhere in the solution space. Many search algorithms can become trapped by local optima: if they reach the top of a hill on the fitness landscape, they will discover that no better solutions exist nearby and conclude that they have reached the best one, even though higher peaks exist elsewhere on the map.

Evolutionary algorithms, on the other hand, have proven to be effective at escaping local optima and discovering the global optimum in even a very rugged and complex fitness landscape. (It should be noted that, in reality, there is usually no way to tell whether a given solution to a problem is the one global optimum or just a very high local optimum. However, even if a GA does not always deliver a provably perfect solution to a problem, it can almost always deliver at least a very good solution.) All four of a GA’s major components – parallelism, selection, mutation, and crossover – work together to accomplish this. In the beginning, the GA generates a diverse initial population, casting a “net” over the fitness landscape. (Koza (2003, p. 506) compares this to an army of parachutists dropping onto the landscape of a problem’s search space, with each one being given orders to find the highest peak.) Small mutations enable each individual to explore its immediate neighborhood, while selection focuses progress, guiding the algorithm’s offspring uphill to more promising parts of the solution space (Holland 1992, p. 68).

However, crossover is the key element that distinguishes genetic algorithms from other methods such as hill-climbers and simulated annealing. Without crossover, each individual solution is on its own, exploring the search space in its immediate vicinity without reference to what other individuals may have discovered. However, with crossover in place, there is a transfer of information between successful candidates – individuals can benefit from what others have learned, and schemata can be mixed and combined, with the potential to produce an offspring that has the strengths of both its parents and the weaknesses of neither. This point is illustrated in Koza et al. 1999, p.486, where the authors discuss a problem of synthesizing a lowpass filter using genetic programming. In one generation, two parent circuits were selected to undergo crossover; one parent had good topology (components such as inductors and capacitors in the right places) but bad sizing (values of inductance and capacitance for its components that were far too low). The other parent had bad topology, but good sizing. The result of mating the two through crossover was an offspring with the good topology of one parent and the good sizing of the other, resulting in a substantial improvement in fitness over both its parents.

The problem of finding the global optimum in a space with many local optima is also known as the dilemma of exploration vs. exploitation, “a classic problem for all systems that can adapt and learn” (Holland 1992, p. 69). Once an algorithm (or a human designer) has found a problem-solving strategy that seems to work satisfactorily, should it concentrate on making the best use of that strategy, or should it search for others? Abandoning a proven strategy to look for new ones is almost guaranteed to involve losses and degradation of performance, at least in the short term.
But if one sticks with a particular strategy to the exclusion of all others, one runs the risk of not discovering better strategies that exist but have not yet been found. Again, genetic algorithms have shown themselves to be very good at striking this balance and discovering good solutions with a reasonable amount of time and computational effort.

Another area in which genetic algorithms excel is their ability to manipulate many parameters simultaneously (Forrest 1993, p. 874). Many real-world problems cannot be stated in terms of a single value to be minimized or maximized, but must be expressed in terms of multiple objectives, usually with tradeoffs involved: one can only be improved at the expense of another. GAs are very good at solving such problems: in particular, their use of parallelism enables them to produce multiple equally good solutions to the same problem, possibly with one candidate solution optimizing one parameter and another candidate optimizing a different one (Haupt and Haupt 1998, p.17), and a human overseer can then select one of these candidates to use. If a particular solution to a multiobjective problem optimizes one parameter to a degree such that that parameter cannot be further improved without causing a corresponding decrease in the quality of some other parameter, that solution is called Pareto optimal or non-dominated (Coello 2000, p. 112).

Finally, one of the qualities of genetic algorithms which might at first appear to be a liability turns out to be one of their strengths: namely, GAs know nothing about the problems they are deployed to solve. Instead of using previously known domain-specific information to guide each step and making changes with a specific eye towards improvement, as human designers do, they are “blind watchmakers” (Dawkins 1996); they make random changes to their candidate solutions and then use the fitness function to determine whether those changes produce an improvement.

The virtue of this technique is that it allows genetic algorithms to start out with an open mind, so to speak. Since its decisions are based on randomness, all possible search pathways are theoretically open to a GA; by contrast, any problem-solving strategy that relies on prior knowledge must inevitably begin by ruling out many pathways a priori, therefore missing any novel solutions that may exist there (Koza et al. 1999, p. 547). Lacking preconceptions based on established beliefs of “how things should be done” or what “couldn’t possibly work”, GAs do not have this problem. Similarly, any technique that relies on prior knowledge will break down when such knowledge is not available, but again, GAs are not adversely affected by ignorance (Goldberg 1989, p. 23). Through their components of parallelism, crossover and mutation, they can range widely over the fitness landscape, exploring regions which intelligently produced algorithms might have overlooked, and potentially uncovering solutions of startling and unexpected creativity that might never have occurred to human designers. One vivid illustration of this is the rediscovery, by genetic programming, of the concept of negative feedback – a principle crucial to many important electronic components today, but one that, when it was first discovered, was denied a patent for nine years because the concept was so contrary to established beliefs (Koza et al. 2003, p. 413). Evolutionary algorithms, of course, are neither aware nor concerned whether a solution runs counter to established beliefs – only whether it works.

Limitations of GAs

The first, and most important, consideration in creating a genetic algorithm is defining a representation for the problem. The language used to specify candidate solutions must be robust; i.e., it must be able to tolerate random changes such that fatal errors or nonsense do not consistently result.

There are two main ways of achieving this. The first, which is used by most genetic algorithms, is to define individuals as lists of numbers – binary-valued, integer-valued, or real-valued – where each number represents some aspect of a candidate solution. If the individuals are binary strings, 0 or 1 could stand for the absence or presence of a given feature. If they are lists of numbers, these numbers could represent many different things: the weights of the links in a neural network, the order of the cities visited in a given tour, the spatial placement of electronic components, the values fed into a controller, the torsion angles of peptide bonds in a protein, and so on. Mutation then entails changing these numbers, flipping bits or adding or subtracting random values. In this case, the actual program code does not change; the code is what manages the simulation and keeps track of the individuals, evaluating their fitness and perhaps ensuring that only values realistic and possible for the given problem result.

In another method, genetic programming, the actual program code does change. As discussed in the section Methods of representation, GP represents individuals as executable trees of code that can be mutated by changing or swapping subtrees.
Both of these methods produce representations that are robust against mutation and can represent many different kinds of problems, and as discussed in the section Some specific examples, both have had considerable success.

This issue of representing candidate solutions in a robust way does not arise in nature, because the method of representation used by evolution, namely the genetic code, is inherently robust: with only a very few exceptions, such as a string of stop codons, there is no such thing as a sequence of DNA bases that cannot be translated into a protein. Therefore, virtually any change to an individual’s genes will still produce an intelligible result, and so mutations in evolution have a higher chance of producing an improvement. This is in contrast to human-created languages such as English, where the number of meaningful words is small compared to the total number of ways one can combine letters of the alphabet, and therefore random changes to an English sentence are likely to produce nonsense.

The problem of how to write the fitness function must be carefully considered so that higher fitness is attainable and actually does equate to a better solution for the given problem. If the fitness function is chosen poorly or defined imprecisely, the genetic algorithm may be unable to find a solution to the problem, or may end up solving the wrong problem. (This latter situation is sometimes described as the tendency of a GA to “cheat”, although in reality all that is happening is that the GA is doing what it was told to do, not what its creators intended it to do.) An example of this can be found in Graham-Rowe 2002, in which researchers used an evolutionary algorithm in conjunction with a reprogrammable hardware array, setting up the fitness function to reward the evolving circuit for outputting an oscillating signal. At the end of the experiment, an oscillating signal was indeed being produced – but instead of the circuit itself acting as an oscillator, as the researchers had intended, they discovered that it had become a radio receiver that was picking up and relaying an oscillating signal from a nearby piece of electronic equipment!

This is not a problem in nature, however. In the laboratory of biological evolution there is only one fitness function, which is the same for all living things – the drive to survive and reproduce, no matter what adaptations make this possible. Those organisms which reproduce more abundantly compared to their competitors are more fit; those which fail to reproduce are unfit.

In addition to making a good choice of fitness function, the other parameters of a GA – the size of the population, the rate of mutation and crossover, the type and strength of selection – must be also chosen with care. If the population size is too small, the genetic algorithm may not explore enough of the solution space to consistently find good solutions. If the rate of genetic change is too high or the selection scheme is chosen poorly, beneficial schema may be disrupted and the population may enter error catastrophe, changing too fast for selection to ever bring about convergence.

Living things do face similar difficulties, and evolution has dealt with them. It is true that if a population size falls too low, mutation rates are too high, or the selection pressure is too strong (such a situation might be caused by drastic environmental change), then the species may go extinct. The solution has been “the evolution of evolvability” – adaptations that alter a species’ ability to adapt. For example, most living things have evolved elaborate molecular machinery that checks for and corrects errors during the process of DNA replication, keeping their mutation rate down to acceptably low levels; conversely, in times of severe environmental stress, some bacterial species enter a state of hypermutation where the rate of DNA replication errors rises sharply, increasing the chance that a compensating mutation will be discovered. Of course, not all catastrophes can be evaded, but the enormous diversity and highly complex adaptations of living things today show that, in general, evolution is a successful strategy. Likewise, the diverse applications of and impressive results produced by genetic algorithms show them to be a powerful and worthwhile field of study.

One type of problem that genetic algorithms have difficulty dealing with are problems with “deceptive” fitness functions (Mitchell 1996, p.125), those where the locations of improved points give misleading information about where the global optimum is likely to be found. For example, imagine a problem where the search space consisted of all eight-character binary strings, and the fitness of an individual was directly proportional to the number of 1s in it – i.e., 00000001 would be less fit than 00000011, which would be less fit than 00000111, and so on – with two exceptions: the string 11111111 turned out to have very low fitness, and the string 00000000 turned out to have very high fitness. In such a problem, a GA (as well as most other algorithms) would be no more likely to find the global optimum than random search.

The resolution to this problem is the same for both genetic algorithms and biological evolution: evolution is not a process that has to find the single global optimum every time. It can do almost as well by reaching the top of a high local optimum, and for most situations, this will suffice, even if the global optimum cannot easily be reached from that point. Evolution is very much a “satisficer” – an algorithm that delivers a “good enough” solution, though not necessarily the best possible solution, given a reasonable amount of time and effort invested in the search. The Evidence for Jury-Rigged Design in Nature FAQ gives examples of this very outcome appearing in nature. (It is also worth noting that few, if any, real-world problems are as fully deceptive as the somewhat contrived example given above. Usually, the location of local improvements gives at least some information about the location of the global optimum.)

One well-known problem that can occur with a GA is known as premature convergence. If an individual that is more fit than most of its competitors emerges early on in the course of the run, it may reproduce so abundantly that it drives down the population’s diversity too soon, leading the algorithm to converge on the local optimum that that individual represents rather than searching the fitness landscape thoroughly enough to find the global optimum (Forrest 1993, p. 876; Mitchell 1996, p. 167). This is an especially common problem in small populations, where even chance variations in reproduction rate may cause one genotype to become dominant over others.

The most common methods implemented by GA researchers to deal with this problem all involve controlling the strength of selection, so as not to give excessively fit individuals too great of an advantage. Rank, scaling and tournament selection, discussed earlier, are three major means for accomplishing this; some methods of scaling selection include sigma scaling, in which reproduction is based on a statistical comparison to the population’s average fitness, and Boltzmann selection, in which the strength of selection increases over the course of a run in a manner similar to the “temperature” variable in simulated annealing (Mitchell 1996, p. 168).

Premature convergence does occur in nature (where it is called genetic drift by biologists). This should not be surprising; as discussed above, evolution as a problem-solving strategy is under no obligation to find the single best solution, merely one that is good enough. However, premature convergence in nature is less common since most beneficial mutations in living things produce only small, incremental fitness improvements; mutations that produce such a large fitness gain as to give their possessors dramatic reproductive advantage are rare.

Finally, several researchers (Holland 1992, p.72; Forrest 1993, p.875; Haupt and Haupt 1998, p.18) advise against using genetic algorithms on analytically solvable problems. It is not that genetic algorithms cannot find good solutions to such problems; it is merely that traditional analytic methods take much less time and computational effort than GAs and, unlike GAs, are usually mathematically guaranteed to deliver the one exact solution. Of course, since there is no such thing as a mathematically perfect solution to any problem of biological adaptation, this issue does not arise in nature.

Genetic algorithm operators, selectors, ranking and encoding

Here you will find a brief explanation of some of the main operators, selectors, ranking and encoding techniques in Genetic Algorithms. You will also find code samples where I have them to provide. These code samples are in C#.

Selection

Here is a good quote from the book (Buckland, 2002):

“Selection is how you choose individuals from the population to provide a gene base from which the next generation of individuals is created. This might mean individuals are selected and placed into the new generation without modification ala elitism, as we discussed in the last chapter, but usually it means the chosen genomes are selected to be parents of offspring which are created through the processes of mutation and recombination. How you go about choosing the parents can play a very important role in how efficient your genetic algorithm is. Unlike choosing a soccer team, if you choose the fittest individuals all the time, the population may converge too rapidly at a local minima and get stuck there. But, if you select individuals at random, then your genetic algorithm will probably take a while to converge (if it ever does at all). So, the art of selection is choosing a strategy which gives you the best of both worlds—something that converges fairly quickly yet enables the population to retain its diversity.”

Steady State Selection

“Steady state selection works a little like elitism, except that instead of choosing a small amount of the best individuals to go through to the new generation, steady state selection retains all but a few of the worst performers from the current population. The remainder are then selected using mutation and crossover in the usual way. Steady state selection can prove useful when tackling some problems, but most of the time it’s inadvisable to use it.” (Buckland, 2002)

Sample code
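Here is a minimal sketch of the survivor step of steady-state selection. Note that the method name and the use of a bare fitness list (instead of my `Host`/`DNA` classes from later samples) are my own simplifications, not Buckland's code: all but the worst few performers survive unchanged, and the dropped slots would then be refilled by breeding the survivors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Steady-state selection sketch: everything except the `numToReplace`
// worst individuals goes through to the new generation unchanged; the
// freed slots are later refilled with offspring bred from the survivors.
List<double> SteadyStateSurvivors(List<double> fitnesses, int numToReplace)
{
    return fitnesses
        .OrderByDescending(f => f)            // best first
        .Take(fitnesses.Count - numToReplace) // drop the worst performers
        .ToList();
}

var pop = new List<double> { 3.0, 9.0, 1.0, 7.0, 5.0 };
var survivors = SteadyStateSurvivors(pop, 2); // drops 1.0 and 3.0
Console.WriteLine(string.Join(", ", survivors)); // 9, 7, 5
```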

Fitness Proportionate Selection

“Selection techniques of this type choose offspring using methods which give individuals a better chance of being selected the better their fitness score. Another way of describing it is that each individual has an expected number of times it will be chosen to reproduce. This expected value equates to the individual’s fitness divided by the average fitness of the entire population. So, if you have an individual with a fitness of 6 and the average fitness of the overall population is 4, then the expected number of times the individual should be chosen is 1.5.” (Buckland, 2002)

Sample code
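Here is a sketch of fitness proportionate selection implemented as a roulette wheel (the wheel variant is discussed further down). The method name and the bare fitness list are my own simplifications. The expected-value arithmetic from the quote (fitness 6, average 4, so 1.5 expected selections) falls out of this scheme automatically, because each individual owns a slice of the wheel proportional to its fitness.

```csharp
using System;
using System.Collections.Generic;

// Roulette wheel (fitness proportionate) selection sketch: each
// individual's slice of the wheel is proportional to its fitness.
// Returns the index of the selected individual.
int RouletteSelect(IList<double> fitnesses, Random rng)
{
    double total = 0;
    foreach (double f in fitnesses) total += f;

    double spin = rng.NextDouble() * total; // where the "ball" lands
    double cumulative = 0;
    for (int i = 0; i < fitnesses.Count; i++)
    {
        cumulative += fitnesses[i];
        if (spin <= cumulative) return i;
    }
    return fitnesses.Count - 1; // guard against floating point rounding
}

var wheelFitnesses = new List<double> { 1.0, 2.0, 6.0, 1.0 };
int picked = RouletteSelect(wheelFitnesses, new Random());
Console.WriteLine($"Selected individual {picked}");
```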

Elitism

“Elitism is a way of guaranteeing that the fittest members of a population are retained for the next generation. … select n copies of the top m individuals of the population to be retained. I often find that retaining about 2-5% of the population size gives me good results. … you will discover that using elitism works well with just about every other technique described in this chapter (except stochastic universal sampling).” (Buckland, 2002)

“Elitism helps the population to converge on a solution faster than if it is not present. It is easy to code. A typical figure for N best to be added to the population is around 1 – 10 % of the population size. Can be as high as 20 %. Too much elitism can cause the population to converge too quickly.” AI Game Programming Wisdom 2

Roulette Wheel Selection

“A common way of implementing fitness proportionate selection is roulette wheel selection, as I have already discussed. This technique does have its drawbacks, however. Because roulette wheel selection is based on using random numbers and because the population sizes of genetic algorithms are typically small (sizes between 50 and 200 are common), the number of children allocated to each individual can be far from its expected value. Even worse, it’s probable that roulette wheel selection could miss the best individuals altogether! This is one of the reasons elitism is a good idea when utilizing roulette wheel selection—it ensures you never lose the best individuals to chance.” (Buckland, 2002)

Stochastic Universal Sampling

“Stochastic Universal Sampling (SUS for short) is an attempt to minimize the problems of using fitness proportionate selection on small populations. Basically, instead of having one wheel which is spun several times to obtain the new population, SUS uses n evenly spaced hands, which are only spun once.

If you use SUS in your own genetic algorithms, it is inadvisable to use elitism with it because this tends to mess up the algorithm.” (Buckland, 2002)

“Stochastic universal sampling (SUS) is a technique used in genetic algorithms for selecting potentially useful solutions for recombination. It was introduced by James Baker.

SUS is a development of fitness proportionate selection (FPS) which exhibits no bias and minimal spread. Where FPS chooses several solutions from the population by repeated random sampling, SUS uses a single random value to sample all of the solutions by choosing them at evenly spaced intervals. This gives weaker members of the population (according to their fitness) a chance to be chosen and thus reduces the unfair nature of fitness-proportional selection methods.

Other methods like roulette wheel can have bad performance when a member of the population has a really large fitness in comparison with other members. Using a comb-like ruler, SUS starts from a small random number, and chooses the next candidates from the rest of population remaining, not allowing the fittest members to saturate the candidate space.

Described as an algorithm, pseudocode for SUS looks like:

SUS(Population, N)
    F := total fitness of Population
    N := number of offspring to keep
    P := distance between the pointers (F/N)
    Start := random number between 0 and P
    Pointers := [Start + i*P | i in [0..(N-1)]]
    return RWS(Population, Pointers)

Where Population[0..i] is the set of individuals with array-index 0 to (and including) i.

Here RWS() describes the bulk of fitness proportionate selection (also known as “roulette wheel selection”) – in true fitness proportional selection the parameter Points is always a (sorted) list of random numbers from 0 to F. The algorithm above is intended to be illustrative rather than canonical.”
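The pseudocode can be turned into C# fairly directly. This sketch is my own translation (again using a bare fitness list instead of my `Host`/`DNA` classes): one random spin fixes the first pointer, and the remaining pointers are spaced evenly around the wheel.

```csharp
using System;
using System.Collections.Generic;

// Stochastic Universal Sampling sketch: one spin, n evenly spaced
// pointers. Returns the indices of the selected individuals.
List<int> SusSelect(IList<double> fitnesses, int n, Random rng)
{
    double total = 0;
    foreach (double f in fitnesses) total += f;

    double pointerDistance = total / n;               // P = F/N
    double start = rng.NextDouble() * pointerDistance; // random in [0, P)

    var selected = new List<int>();
    double cumulative = fitnesses[0];
    int index = 0;
    for (int i = 0; i < n; i++)
    {
        double pointer = start + i * pointerDistance;
        // Walk forward until the pointer falls inside an individual's slice
        while (cumulative < pointer)
        {
            index++;
            cumulative += fitnesses[index];
        }
        selected.Add(index);
    }
    return selected;
}

var susChosen = SusSelect(new List<double> { 1.0, 2.0, 6.0, 1.0 }, 4, new Random());
Console.WriteLine(string.Join(", ", susChosen));
```

Because the pointers are evenly spaced, an individual holding 60% of the total fitness gets two or three of the four selections here, never zero and never all four – the "minimal spread" property the quote describes.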

Tournament Selection

“This technique is very efficient to implement because it doesn’t require any of the preprocessing or fitness scaling sometimes required for roulette wheel selection and other fitness proportionate techniques (discussed later in the chapter). Because of this, and because it’s a darn good technique anyway, you should always try this method of selection with your own genetic algorithms. The only drawback I’ve found is that tournament selection can lead to too quick convergence with some types of problems.” (Buckland, 2002)

“Tournament selection is a good alternative to fitness proportionate selection with or without scaling. This technique is very fast due to lack of complex calculations.

Choose a number of individuals at random and then pick the fittest among them. This is repeated until the next generation of individuals is generated. The higher the number of selected individuals, the higher the selection pressure; the lower it is, the more diverse the population will be. Typical selection number is between 2-10%.” AI Game Programming Wisdom 2
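A sketch of the technique described above, with names of my own choosing: sample `tournamentSize` contestants at random (with replacement, which is a common simplification) and return the fittest of them. Notice there is no preprocessing or scaling at all.

```csharp
using System;
using System.Collections.Generic;

// Tournament selection sketch: pick `tournamentSize` individuals at
// random and return the index of the fittest among them. A larger
// tournament means higher selection pressure.
int TournamentSelect(IList<double> fitnesses, int tournamentSize, Random rng)
{
    int best = rng.Next(fitnesses.Count);
    for (int i = 1; i < tournamentSize; i++)
    {
        int challenger = rng.Next(fitnesses.Count);
        if (fitnesses[challenger] > fitnesses[best]) best = challenger;
    }
    return best;
}

var tourFits = new List<double> { 4.0, 1.0, 8.0, 3.0 };
Console.WriteLine($"Winner: {TournamentSelect(tourFits, 2, new Random())}");
```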

Crossover

Crossover involves creating a child out of the genetic code of two parents.

Partially-Mapped Crossover

“PMX Crossover is a genetic algorithm operator. For some problems it offers better performance than most other crossover techniques. Basically, parent 1 donates a swath of genetic material and the corresponding swath from the other parent is sprinkled about in the child. Once that is done, the remaining alleles are copied directly from parent 2.

1. Randomly select a swath of alleles from parent 1 and copy them directly to the child. Note the indexes of the segment.

2. Looking in the same segment positions in parent 2, select each value that hasn’t already been copied to the child.

3. For each of these values:

a. Note the index of this value in Parent 2. Locate the value, V, from parent 1 in this same position.

b. Locate this same value in parent 2.

c. If the index of this value in Parent 2 is part of the original swath, go to step a. using this value.

d. If the position isn’t part of the original swath, insert the value V from step a. into the child in this position.

Order-Based Crossover

“To perform order-based crossover, several elements are chosen at random from one parent and then the order of those elements is imposed on the respective elements in the other parent.” (Buckland, 2002) (With slight text alteration from the book example to a more general description.)

“Order 1 Crossover
Order 1 Crossover is a fairly simple permutation crossover. Basically, a swath of consecutive alleles from parent 1 drops down, and remaining values are placed in the child in the order which they appear in parent 2.

Step 2: Drop the swath down to Child 1 and mark out these alleles in Parent 2.

Step 3: Starting on the right side of the swath, grab alleles from parent 2 and insert them in Child 1 at the right edge of the swath. Since 8 is in that position in Parent 2, it is inserted into Child 1 first at the right edge of the swath. Notice that alleles 1, 2 and 3 are skipped because they are marked out and 4 is inserted into the 2nd spot in Child 1.

Step 4: If you desire a second child from the two parents, flip Parent 1 and Parent 2 and go back to Step 1.

Order 1 Performance

Order 1 crossover is perhaps the fastest of all crossover operators because it requires virtually no overhead operations. On a generation by generation basis, edge recombination typically outperforms Order 1, but the fact that Order 1 runs between 100 and 1000 times faster usually allows the processing of more generations in a given time period.

Single-Point Crossover

“It simply cuts the genome at some random point and then switches the ends between parents. It is very easy and quick to implement and is generally effective to some degree with most types of problems.” (Buckland, 2002)
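A sketch of single-point crossover on integer genomes (names are my own; the cut point is chosen strictly inside the genome so both parents always contribute something):

```csharp
using System;

// Single-point crossover sketch: cut both parents at the same random
// point and swap the tails, producing two complementary children.
(int[], int[]) SinglePointCrossover(int[] mum, int[] dad, Random rng)
{
    int cut = rng.Next(1, mum.Length); // cut somewhere inside the genome
    var childA = new int[mum.Length];
    var childB = new int[mum.Length];
    for (int i = 0; i < mum.Length; i++)
    {
        childA[i] = i < cut ? mum[i] : dad[i]; // head from mum, tail from dad
        childB[i] = i < cut ? dad[i] : mum[i]; // head from dad, tail from mum
    }
    return (childA, childB);
}

var (spA, spB) = SinglePointCrossover(new[] { 0, 0, 0, 0 }, new[] { 1, 1, 1, 1 }, new Random());
Console.WriteLine(string.Join("", spA) + " " + string.Join("", spB));
```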

Two-Point Crossover

“Instead of cutting the genome at just one point, two-point crossover (you guessed it) cuts the genome at two random points and then swaps the block of genes between those two points. … Two-point crossover is sometimes beneficial because it can create combinations of genes that single-point crossover simply cannot provide. With single point, the end genes are always swapped over and this may not be favorable for the problem at hand. Two-point crossover eliminates this problem. ” (Buckland, 2002)

Multi-Point Crossover (parameterized uniform crossover)

“There’s no need to limit the amount of crossover points you can have. Indeed, for some types of encoding, your genetic algorithm may perform better if you use multiple crossover points. The easiest way of achieving this is to move down the length of the parents, and for each position in the chromosome, randomly swap the genes based on your crossover rate. For some types of problems, multi-point crossover works very well, but on others it can jumble up the genes too much and act more like an over enthusiastic mutation operator. Common values for the crossover rate using this type of crossover operator are between 0.5 and 0.8.” (Buckland, 2002)
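The walk-down-and-swap procedure described above can be sketched like this (names my own; with the 0/1 parents in the demo, every position simply holds either mum's or dad's gene):

```csharp
using System;

// Parameterized uniform crossover sketch: walk down the chromosome and
// swap each gene between the parents with probability `crossoverRate`
// (the text above quotes 0.5-0.8 as common values).
(int[], int[]) UniformCrossover(int[] mum, int[] dad, double crossoverRate, Random rng)
{
    var childA = (int[])mum.Clone();
    var childB = (int[])dad.Clone();
    for (int i = 0; i < mum.Length; i++)
    {
        if (rng.NextDouble() < crossoverRate)
        {
            childA[i] = dad[i]; // swap this gene between the children
            childB[i] = mum[i];
        }
    }
    return (childA, childB);
}

var (ucA, ucB) = UniformCrossover(new[] { 0, 0, 0, 0, 0, 0 }, new[] { 1, 1, 1, 1, 1, 1 }, 0.7, new Random());
Console.WriteLine(string.Join("", ucA) + " " + string.Join("", ucB));
```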

Insertion Mutation

This is a very effective mutation and is almost the same as the DM (displacement mutation) operator, except here only one gene is selected to be displaced and inserted back into the chromosome. In tests, this mutation operator has been shown to be consistently better than any of the alternatives mentioned here.
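A sketch of insertion mutation on a permutation genome (the method name is my invention). Because the gene is removed and reinserted, the result is always still a valid permutation, which is why this family of operators suits encodings like city tours:

```csharp
using System;
using System.Collections.Generic;

// Insertion mutation sketch: remove one randomly chosen gene and
// reinsert it at another randomly chosen position. No allele is ever
// duplicated or lost, so permutation encodings stay valid.
List<int> InsertionMutate(List<int> genome, Random rng)
{
    var mutated = new List<int>(genome);
    int from = rng.Next(mutated.Count);
    int gene = mutated[from];
    mutated.RemoveAt(from);
    mutated.Insert(rng.Next(mutated.Count + 1), gene);
    return mutated;
}

var imResult = InsertionMutate(new List<int> { 1, 2, 3, 4, 5 }, new Random());
Console.WriteLine(string.Join(", ", imResult));
```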

Displaced Inversion Mutation

Select two random points, reverse the element order between the two points, and then displace them somewhere along the length of the original chromosome. This is similar to performing IVM (inversion mutation) and then DM (displacement mutation) using the same start and end points.
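The two-step description above could be sketched like this (names and the exact way the random points are drawn are my own choices):

```csharp
using System;
using System.Collections.Generic;

// Displaced inversion mutation sketch: cut out a random slice of the
// chromosome, reverse it, then splice it back in at a random position.
List<int> DisplacedInversionMutate(List<int> genome, Random rng)
{
    int start = rng.Next(genome.Count);
    int end = rng.Next(start + 1, genome.Count + 1); // slice holds at least one gene
    var slice = genome.GetRange(start, end - start);
    slice.Reverse(); // the inversion step (IVM)

    var remainder = new List<int>(genome);
    remainder.RemoveRange(start, end - start);
    remainder.InsertRange(rng.Next(remainder.Count + 1), slice); // the displacement step (DM)
    return remainder;
}

var dimResult = DisplacedInversionMutate(new List<int> { 1, 2, 3, 4, 5, 6 }, new Random());
Console.WriteLine(string.Join(", ", dimResult));
```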

Encoding

Encoding is simply how a problem and its candidate solutions are represented in a form the computer can manipulate, so that the algorithm can work toward the solution someone desires.

Scaling Techniques

“Although using selection on the raw (unprocessed) fitness scores can give you a genetic algorithm that works (it solves the task you’ve designed it for), often your genetic algorithm can be made to perform better if the fitness scores are scaled in some way before any selection takes place. “ (Buckland, 2002)

Rank Scaling

“Rank scaling can be a great way to prevent too quick convergence, particularly at the start of a run when it’s common to see a very small percentage of individuals outperforming all the rest. The individuals in the population are simply ranked according to fitness, and then a new fitness score is assigned based on their rank. … Once the new ranked fitness scores have been applied, you select individuals for the next generation using roulette wheel selection or a similar fitness proportionate selection method. … This technique avoids the possibility that a large percentage of each new generation is being produced from a very small number of highly fit individuals, which can quickly lead to premature convergence. In effect, rank scaling ensures your population remains diverse. The other side of the coin is that the population may take a lot longer to converge, but often you will find that the greater diversity provided by this technique leads to a more successful result for your genetic algorithm. “ (Buckland, 2002)

“A cheap and easy method of retaining population diversity, while slowing down convergence. The drawback due to low variance is that it might take a long type to converge upon a solution. Used with elitism is a good approach.” AI Game Programming Wisdom 2

Sample code

public void FitnessScaleRanking(ref List<Host> pop)
{
    // Arrange the population from the highest fitness score to the lowest
    var population = pop.OrderByDescending(o => o.DNA.Fitness).ToList();
    // The highest ranking value equals the population size, while the least fit member gets a rank of one
    int rank = pop.Count;
    foreach (Host host in population)
    {
        // Apply a new fitness score based on the ranking value
        host.DNA.Fitness = rank;
        // Go to the next ranking value for the next host
        rank--;
    }
}

Another example of ranking with the ranking value converted into a float value ranging from 0 to 1.

public void FitnessScaleRankingToFloatRangeZeroToOne(ref List<Host> pop)
{
    // Arrange the population from the highest fitness score to the lowest
    var population = pop.OrderByDescending(o => o.DNA.Fitness).ToList();
    // The fittest member gets a score of 1.0 and the least fit gets 1/populationSize
    int populationSize = pop.Count;
    int rank = populationSize;
    foreach (Host host in population)
    {
        // Apply a new fitness score based on the ranking value, scaled into the (0, 1] range
        host.DNA.Fitness = rank / (float)populationSize;
        // Go to the next ranking value for the next host
        rank--;
    }
}

Sigma Scaling

“If you use raw fitness scores as a basis for selection, the population may converge too quickly, and if they are scaled as in rank selection, the population may converge too slowly. Sigma scaling is an attempt to keep the selection pressure constant over many generations. At the beginning of the genetic algorithm, when fitness scores can vary wildly, the fitter individuals will be allocated less expected offspring. Toward the end of the algorithm, when the fitness scores are becoming similar, the fitter individuals will be allocated more expected offspring.” (Buckland, 2002)
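One concrete form of sigma scaling (following Mitchell, 1996) assigns each individual a scaled score of 1 + (f − mean) / (2σ). My sketch below is an assumption about the details Buckland leaves out; in particular, the 0.1 floor for negative results is a convention I have borrowed so that no individual loses all chance of selection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sigma scaling sketch: when the spread (sigma) of fitness scores is
// large early in the run, outliers get only a modest advantage; when
// scores bunch together late in the run, small differences are
// amplified, keeping selection pressure roughly constant.
List<double> SigmaScale(IList<double> fitnesses)
{
    double mean = fitnesses.Average();
    double sigma = Math.Sqrt(fitnesses.Average(f => (f - mean) * (f - mean)));

    return fitnesses
        .Select(f => sigma == 0 ? 1.0 : 1.0 + (f - mean) / (2.0 * sigma))
        .Select(s => Math.Max(s, 0.1)) // floor so nobody drops to zero
        .ToList();
}

var sigmaScaled = SigmaScale(new List<double> { 2.0, 4.0, 6.0 });
Console.WriteLine(string.Join(", ", sigmaScaled));
```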

Boltzmann Scaling

”… sometimes you may want the selection pressure to vary. A common scenario is one in which you require the selection pressure to be low at the beginning so that diversity is retained, but as the genetic algorithm converges closer toward a solution, you want mainly the fitter individuals to produce offspring.

One way of achieving this is by using Boltzmann scaling. This method of scaling uses a continuously varying temperature to control the rate of selection. …

Each generation, the temperature is decreased by a small value, which has the effect of increasing the selection pressure toward the fitter individuals.” (Buckland, 2002)

“Sometimes, it’s preferable for selection pressure to be low at the beginning and high toward the end. This ensures that your population remains diverse at the commencement of the algorithm. As the algorithm converges toward a solution, the fitter individuals are given preference.” (AI Game Programming Wisdom 2)
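A minimal sketch of Boltzmann scaling, using the textbook e^(fitness/temperature) weighting (not necessarily Buckland's exact implementation). The temperature is passed in by the caller, which would lower it by a small amount each generation; as it drops, the scaled scores of the fitter members pull further ahead, increasing selection pressure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BoltzmannScaling
{
    // Weights each score by e^(fitness / temperature), then normalizes
    // by the average weight so the scores stay comparable between
    // generations. High temperature => weights are nearly equal (low
    // pressure); low temperature => the fittest dominate (high pressure).
    public static List<float> Scale(List<float> fitnesses, float temperature)
    {
        var weights = fitnesses.Select(f => (float)Math.Exp(f / temperature)).ToList();
        float average = weights.Average();
        return weights.Select(w => w / average).ToList();
    }
}
```

The caller would do something like `temperature = Math.Max(minTemperature, temperature - delta);` once per generation before calling `Scale` (the constant names here are my own).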

This is my first post in a series of Artificial Intelligence posts on AI learning. I got interested in the subject a few weeks ago when I started to work on some ideas for a game prototype I had in mind. I was reading a book on game AI and turned the pages to a section about AI learning. It was fascinating :)! Anyway, I remembered one of my older projects where I played around with physics and nature, so I decided to go back to it and to the book that inspired me to look at nature and how it works.

So in this first post I’ll show you some sample C# code for a simple Genetic Algorithm that solves a text puzzle. Or in other words, I am using a GA to find, as quickly as possible, an output from the algorithm that matches the text given to it as input.

To start with, if you are not familiar with Genetic Algorithms, this source is a good place to start. My sample code is currently based on this page, and it is also where I will continue from in the future parts of these upcoming blog posts on learning AI: http://natureofcode.com/book/chapter-9-the-evolution-of-code/

Logic and code

So let’s start. Oh, and remember, I am just starting to learn this, so double check the info and test it for yourself :):

To put it in simple words, a GA is about inheriting data from generation to generation, where the candidates closest to the desired goal are chosen to reproduce, until the desired outcome is achieved some generations later. This is a simplification of what is going on, and there are more variables to take into consideration. I will keep things as simple as I can and hope that you refer to the link above, or to more technical sources, for more in-depth knowledge.

To start with, a GA needs two things:

genotype =&gt; digital information that is passed down from generation to generation.

phenotype =&gt; the expression of that data, i.e. how the data is going to be represented. This could be data that tells a system where an object is on the screen.

In this first part, where the code sample is trying to solve a text problem, the genotype and phenotype are one and the same, called: DNA.

What this class will do is generate a random sequence of characters within a certain ASCII range. It will also evaluate a fitness score for a DNA object, to be used when deciding whether the DNA will reproduce. To keep variation and to increase the chances of finding a solution faster, a crossover between two DNA objects is made. As a last measure, to help ensure that the best possible solution is reached, mutation is introduced: a random character, at random times, is replaced in a DNA sequence of characters.
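To make the description above concrete, here is a rough outline of what such a DNA class could look like. This is a sketch following Shiffman's chapter, not the author's actual class; the ASCII range and member names are my own assumptions.

```csharp
using System;
using System.Linq;

// A sketch of the combined genotype/phenotype: a character sequence plus
// fitness, crossover, and mutation. Names are illustrative only.
class DNA
{
    public char[] Genes;
    public float Fitness;
    static readonly Random Rng = new Random();

    public DNA(int length)
    {
        // Fill the sequence with random characters from an assumed ASCII
        // range (space through 'z' here).
        Genes = Enumerable.Range(0, length)
                          .Select(_ => (char)Rng.Next(32, 123))
                          .ToArray();
    }

    // Fitness = fraction of characters that already match the target.
    public void CalculateFitness(string target)
    {
        int score = Genes.Where((g, i) => g == target[i]).Count();
        Fitness = score / (float)target.Length;
    }

    // Crossover: genes up to a random midpoint come from this parent,
    // the rest from the partner.
    public DNA Crossover(DNA partner)
    {
        var child = new DNA(Genes.Length);
        int midpoint = Rng.Next(Genes.Length);
        for (int i = 0; i < Genes.Length; i++)
            child.Genes[i] = i < midpoint ? Genes[i] : partner.Genes[i];
        return child;
    }

    // Mutation: occasionally replace a character with a random one.
    public void Mutate(float mutationRate)
    {
        for (int i = 0; i < Genes.Length; i++)
            if (Rng.NextDouble() < mutationRate)
                Genes[i] = (char)Rng.Next(32, 123);
    }
}
```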

The Population class’s main function is to hold the main population of DNA sequences and the mating pool used to generate the next generation of the population (or rather, to replace the current generation with the next one). Some statistical data can also be retrieved from the class itself, along with information on when the problem has been solved; in this case, when the output is the same as the input.
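One common way to build such a mating pool is Shiffman's “wheel of fortune” idea: each member is added to the pool a number of times proportional to its fitness, and parents are then picked from the pool at random. This sketch is my own and not necessarily the author's implementation; it assumes fitness scores normalized into the 0..1 range and returns member indices.

```csharp
using System;
using System.Collections.Generic;

static class MatingPool
{
    // Fitness-proportional selection: members with higher fitness get
    // more "tickets" in the pool and are therefore picked more often
    // when parents are drawn at random.
    public static List<int> Build(List<float> fitnesses)
    {
        var pool = new List<int>();
        for (int i = 0; i < fitnesses.Count; i++)
        {
            // Up to 100 tickets per member, scaled by its fitness
            // (the factor of 100 is an arbitrary resolution choice).
            int tickets = (int)(fitnesses[i] * 100);
            for (int t = 0; t < tickets; t++)
                pool.Add(i);
        }
        return pool;
    }
}
```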

Next is the actual main program, which will run the population to solve the problem.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using AIEngine.DataStructures;
using AIEngine;

namespace GeneticAlgoritmTextTest
{
    class Program
    {
        /// &lt;summary&gt;
        /// Change this value to alter how fast a problem is solved
        /// &lt;/summary&gt;
        public static int PopulationCount = 1000;

        public static String target = "TO BE OR NOT TO BE";

        /// &lt;summary&gt;
        /// Change this value to alter how fast a problem is solved
        /// &lt;/summary&gt;
        public static float mutationRate = 0.01F;

        static void Main(string[] args)
        {
            bool exit = false;
            // Create the population which is responsible for solving the problem.
            Population population = new Population(target, mutationRate, PopulationCount);
            while (!exit)
            {
                // In each iteration we calculate the fitness of each DNA sequence, to be used later in the algorithm logic.
                population.CalculateFitness();
                // Here the algorithm implements a selection method for choosing the best DNA sequences from the population.
                population.NaturalSelection();
                // Next we generate a new population through crossover between pairs of random DNA sequences, adding some mutation into it.
                population.Generate();

                Console.WriteLine();
                Console.WriteLine(population.AllPhrases());
                Console.WriteLine("Cycle average fitness: " + population.GetAverageFitness());
                Console.WriteLine("Total generations: " + population.GetGenerations());
                Console.WriteLine("Best fitness in cycle: " + population.GetBest());

                // And before we go to the next iteration we check to see if the text puzzle has been solved.
                exit = population.Finished();
            }
            Console.ReadLine();
        }
    }
}

The code above simply creates a population which will try to solve the text puzzle. A fitness is calculated for each member of the population, a selection method is applied to the population, and mutation is introduced to random population members.

The end result should be something like this:

There are two images above. The first one shows the very first population generations in the algorithm. The last image shows the very last population generations, up until the one which solved the problem.

Optimization

There are two things to do if you want to optimize the algorithm in this example:

Changing the mutation rate and population count variables.

By playing around with these values you can find the most “optimal” values for solving your desired problem. You can do this manually, or you can write a piece of code which will do it for you based on telemetry from the algorithm.

Notice that having too large a mutation rate will make the problem unsolvable, and likewise having a population count that is too small, or way too large, will cause your algorithm to take longer, or a very long time, to finish.
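The automated version of this parameter search could be as simple as the sketch below. It is hypothetical: the `runToCompletion` callback stands in for a full GA run that returns the number of generations needed to solve the puzzle for a given mutation rate, which is not something shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ParameterSweep
{
    // Tries each candidate mutation rate, runs the solver once for each,
    // and returns the rate that solved the puzzle in the fewest
    // generations. 'runToCompletion' is a hypothetical stand-in for a
    // complete GA run (rate in, generations-to-solve out).
    public static float BestMutationRate(
        IEnumerable<float> candidateRates,
        Func<float, int> runToCompletion)
    {
        return candidateRates.OrderBy(rate => runToCompletion(rate)).First();
    }
}
```

In practice you would want several runs per rate and an average, since a GA run is stochastic; a single run per candidate can easily pick a lucky outlier.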

The fitness function logic

Here I will simply quote Daniel Shiffman from The Nature of Code: “If you cannot define your problem’s goals and evaluate numerically how well those goals have been achieved, then you will not have successful evolution in your simulation.” So defining a good fitness function will go a long way towards creating a better result. Also, each problem will most likely have a very unique fitness function, though similarities can occur (still learning 🙂).
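For this post's text puzzle, the fitness function can simply be the fraction of characters that already match the target. A common refinement, which Shiffman also discusses, is to raise that score to a power so near-complete solutions pull ahead much more strongly in selection. A minimal sketch (the exponent of 4 is an arbitrary choice of mine):

```csharp
using System;
using System.Linq;

static class TextFitness
{
    // Plain fitness: fraction of characters matching the target, in 0..1.
    public static float Linear(string candidate, string target)
    {
        int matches = candidate.Where((c, i) => c == target[i]).Count();
        return matches / (float)target.Length;
    }

    // Refinement: raising the score to a power makes the fitness
    // landscape steeper near the solution, rewarding near-matches more.
    public static float Exponential(string candidate, string target)
    {
        return (float)Math.Pow(Linear(candidate, target), 4);
    }
}
```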