Genetic algorithms pop up all over computer science and applied computing. They are simple, easy to apply and easy to understand. What mystery remains is why they work at all? How can something seemingly so random home in on a one in a million plus solution?

The idea behind the “Genetic Algorithm” approach to improving an existing solution to a problem is to allow them to show how good modifications to the solution are by making them compete.

The good news is that the basic ideas and methods aren’t difficult to understand.

The bad news is that even after you understand them you will probably still be wondering why they work at all!

Evidence that the genetic algorithm proper does work is all around you. Just look at how excellently animals fit their ecological niches. Over millions of years the process of genetic selection produces organisms that there exploit their environment almost to perfection.

It is the desire to turn this natural design principle to our own projects that motivates the sometime strange seeming procedures of the artificial genetic algorithm.

In short the artificial genetic algorithm tries to implement what you see around in nature as gene pools change under evolutionary pressure by sexual reproduction and the action of the survival of the fittest.

Genes

The basic idea of evolution, and hence the genetic algorithm, is very simple.

First you start off with a population of “units”. The units could be almost anything you care to think of – humans, programs, computers, electronic circuits, solutions to maths problems, etc..

The next requirement is some measure of “quality” of the units.

It has to be possible for you to rate each member of the population according to its “fitness”, i.e. quality. The quality function usually isn’t a problem because, given you are looking for a procedure to make things better, you usually have some idea of what “better” means. You might have difficulty in expressing this "better" in a simple numerical way but you can generally find something that at least correlates with "better". Given you are going to have to evaluate a complete population at each generation you also would like the measure of fitness to be quick to calculate.

The next requirement is usually a much bigger problem.

You have to give the units something that corresponds to a gene. The artificial gene has to have a coding that controls the make up of the unit in a natural way. This sounds easy enough because anything that specifies the unit could be considered to be a gene.

For example, if you have a population of electronic circuits then the circuit diagram specifies a unit precisely and naturally. Unfortunately in this case the word “naturally” is the problem.

We need a gene representation that can be used for reproduction and this is more difficult and in many ways often seems less “natural” than the obvious representation. To make sense of this we need to look at what reproduction is all about.

Reproduction

To implement the genetic algorithm we need the units in our population to breed.

Obviously if we simply allow them to reproduce we end up with a growing population with no novel units - i.e. they are all the same.

The solution to this is to make the units reproduce sexually. That is two units are selected and their genetic material is mixed in some way to create a new unit which isn’t an identical copy of either parent.

Breeding is used to create a new population and in most cases the units in the original population are discarded – i.e. they die after breeding.

If you design an algorithm based on the previously described steps then the result is just a population which changes randomly at each generation.

To introduce the benefits of evolution to the process you have to apply a “survival of the fittest” law. In the natural environment organisms that are better at utilising it tend to survive and organisms that survive tend to breed more successfully.

This means that genes that code for good survival tend to be passed on and as a result the population gets better at utilising the environment.

In the case of the genetic algorithm all we have to do is assign a probability of breeding that is proportional to the measure of fitness we have. With this modification the population tends to improve on average with each generation and if we wait long enough eventually we will succeed in breeding a better circuit, program or whatever.

To summarise:

To implement a genetic algorithm we need

A population of units with a suitable fitness function.

A breeding mechanism that involves the exchange of genes.

The probability of breeding has to be proportional to fitness of the unit.

Mutation?

Some accounts of the genetic algorithm also include “mutation” but this is of dubious advantage in most practical problems.

Mutation is simply the occasional random change to a gene so that the process of improvement has a chance to get “unstuck”. The idea is that a steady improvement in some aspect of the system may miss a completely new and novel approach.

A bit like fishes getting better and better at breathing water and then suddenly a mutation occurs by chance alone which produces a fish that breathes air.

The problem with this idea is that the vast majority of mutations are completely useless and are more likely to produce a fish that can’t breath at all than something better.

In the case of nature, where there are huge resources to be wasted and a huge amount of time to devote to the problem, mutation is occasionally useful. In practical applications of the genetic algorithm, where time is always short, mutation tends to just be a nuisance factor.

On the other hand mutation can help a population that is stuck in a suboptimal local maximum escape and go on to find a better local maximum or even the global maximum fitness. In the case of the fish the genetic algorithm would tend to carry on refining its solution produces better and better fishes but without mutation the chance of leaving the sea and finding even bigger rewards on the land would be low.

One possible solution to the disruptive effects of mutation is to start the simulation off with a high mutation rate and gradually reduce it as the population converges on a single optimal solution. This is essentially the application of simulated annealing to the genetic algorithm

A simple example

All this sounds very good, perhaps even very exciting, but there is still an air of mystery about it all.

In particular what is a gene in the case of a general genetic algorithm?

At first you might think that creating a genetic specification is going to be hard but it actually isn’t quite as difficult as it sounds.

To see how it all works let’s take a look at a simple but unlikely example and apply the genetic algorithm to the problem of finding the maximum of x squared in the integer range 0 to 64.

There are lots of more direct ways of solving this problem than the genetic algorithm and no doubt readers can easily work out that the maximum is at x=64 without any help from calculus.

To solve this simple problem using the genetic algorithm is very instructive, however, and demystifies a lot of the procedures.

Fitness of a population

First we need a population of units. In this case these are just integer values in the range 0 to 64. That is each unit carries with it a single integer.

We start off with a population of units with values chosen at random.

The fitness of each unit is simply the value of x squared that it produces i.e. its value squared.

The probability that any particular x value will be selected for breeding is given by its fitness divided by the total fitness of the population. In genetic algorithm applications it is common that the survival rate it scaled by the population's total fitness.

For example, suppose we have an initial population of just three units with values {3,5,9}

The raw fitness of each unit is simply the square of their value:

{3*3,5*5,9*9}

or

{9,25,81}

Then the total fitness is

3*3+5*5+9*9=9+25+81=115

and hence the probability of each unit being selected for breeding is 9/155=0.08, 25/115=0.23, 81/115=0.70 respectively i.e.

{0.08,0.23,0.70}

You can already see that the larger numerical values have a higher probability of breeding which seems to correspond to the common sense idea that offspring should inherit from units that are closer to the maximum.

Genes and crossover

The next problem is what to use as the genes and how to mix them to create new units.

The simplest idea for a gene in this case is a binary representation of the x value itself.

That is the three genes corresponding to the units in {3,5,9} are {[00011], [00101] ,[01001]}.

In the genetic algorithm, genes often take the form of binary strings. One reason for this is that it makes breeding easier. For example, in this case breeding can be implemented by simply selecting two units at random according to their fitness. Next given two gene strings we select a “cross over” point in each string at random and swap the corresponding portions of the strings.

You can think of this as cutting each gene into two fragments and joining the fragments, one from each gene, back together.

Gene crossover – a natural biological phenomena?

For example if the two last two units are chosen then the genes are [00101] and [01001]. If the random cross over point is 3 (from the right) then the new genes passed on to the pair to the offspring are:

You can see that the 00 and 01 at the start of each gene swap places to give the new gene. In this case the new members of the population correspond to x=13 and x=1 i.e. one much fitter offspring and one spectacularly unfit offspring.

This is how genetic algorithms work in general and not every offspring of “fit” units is necessarily “fit”.