This is a follow-up to the discussion Be a monkey! about getting
monkeys to write a Shakespeare novel by randomly typing on a typewriter.
I have translated this into the following programming challenge:

Write a Perl program that given a limited number of statements,
creates a number that comes as close as possible to some target number.
The program starts with

There are 6^30 = 2.2 * 10^23 ways to combine the 5 statements into a
program of at most 30 such statements (each of the 30 positions holds
one of the 5 statements or stays empty), which makes
it quite impossible to just search all possible combinations for an answer.
Randomly generating such programs is about as likely to find
an answer as a monkey writing one or two lines of Shakespeare by randomly
typing.
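
To put that number in perspective, here is a small back-of-the-envelope sketch. The rate of one million candidate programs per second is a made-up figure for illustration, and the 6 choices per position assume that a position may also stay empty.

```perl
use strict;
use warnings;

# Assumption: each of the 30 positions holds one of the 5 statements
# or nothing, giving 6 choices per position.
my $choices  = 6;
my $slots    = 30;
my $programs = $choices**$slots;

# Hypothetical rate: one million candidate programs evaluated per second.
my $per_second = 1e6;
my $years      = $programs / $per_second / (365.25 * 24 * 3600);

printf "%.1e programs, about %.1e years at %.0e per second\n",
    $programs, $years, $per_second;
```

Even at that optimistic rate an exhaustive search would take roughly seven billion years.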

However, the following program does find an exact solution
or a close approximation to the target number. It uses the technique of
genetic programming, which is based on natural selection.

It works with a population of individuals.
Each individual has 30 genes. Each gene is a Perl statement.
To evaluate the fitness of an individual these statements
are strung together into a Perl program. This program is
then evaluated using eval(). The better this program is
at creating the target number, the fitter the individual.
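
A minimal sketch of that evaluation step, not the code from the post. The starting values of $x and $y and the sub names run_genes and distance_to_target are assumptions made for the example; a fitness value can then be derived from the distance (smaller distance, fitter individual).

```perl
use strict;
use warnings;

# Run the genes as one Perl program and return the value left in $x.
# Assumption: the program works on $x and $y, both starting at 1;
# the real starting statement is not shown in the post.
sub run_genes {
    my @genes  = @_;
    my $code   = 'my ($x, $y) = (1, 1); ' . join(' ', @genes) . ' $x';
    my $result = eval $code;
    return defined $result ? $result : 0;    # treat a failed eval as 0
}

# Smaller is better: the distance between the result and the target.
sub distance_to_target {
    my ($target, @genes) = @_;
    return abs($target - run_genes(@genes));
}

my @genes = ('$x+=1 ;', '$y=$x ;', '$x+=$y ;', '$x|=$y ;');
print distance_to_target(10, @genes), "\n";    # program yields 6, so prints 4
```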

The population cycles through a number of generations,
in which a new population is generated (bred) from the old population.
Pairs of individuals are formed and each pair has two children.
Each child is either an exact copy of one
of its parents or a recombination of its parents.
Which individuals are to become parents is based on their
fitness (how well they did at reaching the target number).
The fitter the individual the more offspring it has.

After this all parents die and the cycle repeats.
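
The breeding step described above could be sketched like this. It is an illustration under assumptions, not the original code: the sub names are invented, fitness values are assumed to be positive with higher meaning fitter, and parents are chosen by roulette-wheel selection, which is one common way to give fitter individuals more offspring.

```perl
use strict;
use warnings;

# Pick an index with probability proportional to its fitness
# (roulette-wheel selection). Assumes all fitness values are positive.
sub choose_parent {
    my @fitnesses = @_;
    my $total = 0;
    $total += $_ for @fitnesses;
    my $spin = rand($total);
    for my $i (0 .. $#fitnesses) {
        $spin -= $fitnesses[$i];
        return $i if $spin < 0;
    }
    return $#fitnesses;    # guard against floating-point round-off
}

# Single-point crossover: each child takes the head of one parent and
# the tail of the other. Both parents must have the same length.
sub recombine {
    my ($mum, $dad) = @_;
    my $cut = 1 + int(rand(@$mum - 1));    # cut point in 1 .. length-1
    my @child1 = (@{$mum}[0 .. $cut - 1], @{$dad}[$cut .. $#$dad]);
    my @child2 = (@{$dad}[0 .. $cut - 1], @{$mum}[$cut .. $#$mum]);
    return (\@child1, \@child2);
}

# Two children per pair: recombined half of the time, copies otherwise.
sub breed_pair {
    my ($mum, $dad) = @_;
    return recombine($mum, $dad) if rand(1.0) > 0.5;
    return ([@$mum], [@$dad]);
}

my ($kid1, $kid2) = breed_pair([('a') x 5], [('b') x 5]);
print "@$kid1\n@$kid2\n";
```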

Generation after generation the population will become better
at creating the target number. The nice quality of this technique
is that no knowledge about the search space is needed.
The only thing you need to define is a function that tells how
good a particular solution is.
Try changing the Gene base (possible Perl statements) or the target
number. Note that there is no intelligence behind the process, but
it can come up with surprising solutions. It would be a good technique
to create obfuscated code. :)

It is of course not a solution to all problems. For instance, it
does not work if there is only one good solution in the
entire search space. There must be intermediate solutions too.
However, it can be used to solve very hard problems.
For instance I use a modified version of this program to solve
the problem of how to pack 256 connections of varying capacity into
32 links. This has a search space of 2*10^385. Searching this would
require more time than the lifetime of the current universe and many
that come after it :)

This is my first attempt at using OO in Perl. So please
point out any improvements. (I used the cookbook and the Perl FAQ as
information sources).

Another interesting thread is Tom Ray's
Tierra. Realizing that computer programs are just too brittle to evolve well (as one bad character in a program can render it all useless, vs. the laxness in say DNA coding and codons), Ray proposed an interesting computer architecture loosely based on biology that supports evolving programs.
www.hip.atr.co.jp/~ray/tierra/

Another interesting idea: Danny Hillis (of Connection Machine fame) suggested in a 1990 paper that programs evolve better when confronted with "parasites". Hillis claimed he could evolve better sorting programs when they were pitted against an evolving landscape of "hard" sort sequences, which in turn were generated by a different program evolving hard sequences with the goal of stumping the sorters. See
citeseer.nj.nec.com/context/15365/9503

This field hasn't generated much of practical significance yet, but it is cool.

Well, I believe that there are a number of chip fabricators that are using GA to minimize the effect of parasitic transistors, and also to minimize labour in layout.

Danny Hillis's idea does work quite nicely if you can get it right.

Another idea is to implement a sex difference. The genome translates into two different phenomes (for instance, in a tracker implementation, sex X is built by reading the genome from the left and sex Y by reading it from the right), and each phenome/genome is forced to mate with the opposite sex. You can have as many sexes as you want: I played around with 3 sexes, where two going together produce a child of the third type. (Got the idea from a Piers Anthony book, from the Tarot series.) This minimizes the chance of getting stuck on a local minimum or maximum, partly because the best solution is forced to breed with the other sex, which is evaled differently, thus ensuring that 'good' genes get mixed with 'bad' genes (from either sex's POV). This means the randomness (I'm sure there is a more appropriate word) in the gene pool stays higher.

Another idea is to implement chromosomes, i.e. split the genome up into smaller packets that can be mutated/bred individually. That way random insertions don't fubar the whole genome, just the chromosome they are in.

I found the challenging aspect, and perhaps the limiting aspect, is coming up with an appropriate fitness function. If you can score your creatures then you can solve the problem they are trying to solve, so what's the point? I mean, not really, but you get the idea.

For instance, if you do a tracker implementation, by very subtly changing the fitness function you basically kill any possibility of solving the problem (eating all of the dots).

From what I know there is really no way to know that your fitness function will enable the population to improve.

I would just like to say that I, for one, am seriously impressed. I'm going to have to start playing around with this and see what I can come up with (though I can tell that this will be a steep learning curve).

Well done! I tried to do this several months ago and failed
because I didn't hit on the idea of using a sequence of
individual statements. (I tried to define a small language
I could eval and had trouble making it both
expressive and easy to evaluate.)

I notice you left out division---presumably you don't
want to have to deal with errors from the eval
statement?

"I notice you left out division---presumably you
don't want to have to deal with errors
from the eval statement?"

Yup, that was one of the reasons. It would have made
the program longer than it already is.

Programs with errors in them are not a problem
as long as they have different fitness values,
for instance as long as a program with one error in it has a
higher fitness than a program with two errors in it.
If all programs that do not evaluate correctly
result in the same fitness value, there is no way
for the algorithm to gradually move to a better solution.
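
One way to get such graded values, sketched under assumptions: run each statement in its own eval and count the failures, so a program with one failing statement scores better than one with two. The division gene, the starting values, the sub names and the penalty weight of 1000 are all invented for this example.

```perl
use strict;
use warnings;

# Run each statement on its own so one failure does not hide the others.
# Returns the final value of $x and the number of statements that died.
sub run_counting_errors {
    my @genes = @_;
    my ($x, $y) = (1, 0);    # assumed starting values
    my $errors = 0;
    for my $gene (@genes) {
        # String eval sees the lexicals $x and $y declared above.
        # The trailing 1 makes a successful statement return true.
        $errors++ unless eval "$gene 1";
    }
    return ($x, $errors);
}

# Smaller is better; every failed statement adds a fixed penalty.
sub penalized_distance {
    my ($target, @genes) = @_;
    my ($value, $errors) = run_counting_errors(@genes);
    return abs($target - $value) + 1000 * $errors;
}

my @genes = ('$x+=1 ;', '$x/=$y ;', '$y=$x ;', '$x/=$y ;');
my ($value, $errors) = run_counting_errors(@genes);
print "value=$value errors=$errors\n";    # the first division hits $y == 0
```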

Yup. I thought it would make it easier to
understand the problem (the programming challenge).
It is also to limit the number
of possible solutions. If there are many building blocks
the search space is large but also full of good solutions.
Then even a random search works. With this I hoped to
demonstrate that even with limited building blocks
the algorithm can work to a good solution.
(It would be interesting to create a Perl program that
can determine the solution density, say using Monte Carlo
sampling or so).
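
A rough sketch of such a Monte Carlo estimate: generate random programs, run them, and count how many land within some tolerance of the target. The gene list is the one quoted elsewhere in this thread; the starting values, the tolerance and the sub names are assumptions.

```perl
use strict;
use warnings;

my @GENES = ('$x+=1 ;', '$x=$y ;', '$y=$x ;', '$x|=$y ;', '$x+=$y ;', ' ;');

# Build a random program of the given number of statements.
sub random_program {
    my ($length) = @_;
    return join ' ', map { $GENES[rand @GENES] } 1 .. $length;
}

# Run a program and return the value it leaves in $x.
sub run_program {
    my ($code) = @_;
    my ($x, $y) = (1, 1);    # assumed starting values
    eval $code;              # string eval sees the lexicals $x and $y
    return $x;
}

# Fraction of random programs that end within $tolerance of $target.
sub solution_density {
    my ($target, $tolerance, $length, $samples) = @_;
    my $hits = 0;
    for (1 .. $samples) {
        my $value = run_program(random_program($length));
        $hits++ if abs($value - $target) <= $tolerance;
    }
    return $hits / $samples;
}

printf "estimated density: %g\n", solution_density(10512, 100, 30, 10_000);
```

With a tolerance of 0 this estimates the density of exact solutions, which may need far more samples before a single hit shows up.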

I was a bit surprised at |= myself.

:) I added that to show that the algorithm can come up
with solutions that are not easily visible to humans.
You can even add things like

I think I can pretty safely say that no number of monkeys,
under any circumstances, will ever reproduce a single
Shakespeare novel... since he was a playwright, and not a
novelist. Sorry to nitpick. :-)

Ay, mistress, and Petruchio is the master;
That teacheth tricks eleven and twenty long - Shakespeare

I believe your statement is false. Shakespeare was a human. Humans are primates. Primates are monkeys. Thus, a monkey did in fact write all Shakespearean novels! Imagine what any number of monkeys, under any circumstance could do!
Now who's nitpicking? :-)
cheers,
Thomax
"What find I here? Fair Portia's counterfeit! What demi-god Hath come so near creation?" -- Shakespeare (Bassanio in The Merchant of Venice)

As tilly kindly pointed out, only some primates are
monkeys, and humans are not amongst these.

Some people might still say that, "a monkey did in fact
write all Shakespearean novels". Predicating a quality
(written by a monkey) to a non-existent object
(a Shakespearean novel) is logically
problematic. And these problems are very interesting,
though I'm sorry to say that I no longer remember them
well enough to speak knowledgeably about them.

In any case, it doesn't matter. Let's say a monkey did
write all Shakespearean novels (in which case so did each
of the squirrels in my yard... and indeed, they wrote
themselves, too. The novels that is, not the squirrels).
Still, I said a monkey would never reproduce a
single Shakespeare novel, and that holds true.

It interests me, however, that in the same post you claim
both to be a monkey and to be picking nits... at least you
seem well-groomed. ;-)

The primates include lemurs, monkeys, and apes. The apes
are the ones without tails, and we are apes. Among the
great apes the chimpanzee and bonobo are closest, then we
join in, then gorillas, and the orangutan is more distant.
This is measuring by percentage of genetic material that is
the same.

Yes, you heard it right. We are biologically more similar
to chimps than either we or chimps are to gorillas.

cool concept and the implementation looks well done.
i need to study it more in order to make any detailed
or deep comments.
extremely minor nitpicks and some questions.
1) someone already covered the map in void context.
2) my $min = ${$fitnesses}[0];
   for ($i = 0; $i < $size; ++$i) {
       # set $i = 1 since you used 0th index to load $min.
       # same thing for the $max value in a diff routine later.
       $min = ${$fitnesses}[$i] if (${$fitnesses}[$i] < $min);
   }
3) sub random_gene {
       my $self = shift;
       return ${$self->{GENES}}[rand(@{$self->{GENES}})];
   why do you allow perl to truncate the above, which provides
   a fair distribution for the 0th index, but you handle
   explicitly below, which never includes the 0th index? there
   may be a good reason, but i couldn't figure out what it was.
   if (rand(1.0) < 0.005) {
       my $mutate = 1 + int(rand(@{$self->{NEW_GENES}} - 1));
4) if (rand(1.0) > 0.5) {
       my $cut = 1 + int(rand(@genes1 - 1));
       # i understand it here - since replacing the entire gene
       # from beginning might not make sense
5) i factored out the following from the code so I could stick
   everything on the top and play:
   my $Genes = ['$x+=1 ;', '$x=$y ;', '$y=$x ;', '$x|=$y ;', '$x+=$y ;', ' ;'];
   my $IndivGeneLen = 32;
   my $PopSize = 1999;
   my $Target = 10512;
   my $NumGenerations = 100;
next thought - this can be generalized into a module.
very cool, thanks!

i figured out the answer to #3. the declaration of the lexical
variables occurs in index 0, which makes it position dependent.
therefore index 0 has to be skipped during mutations and splicing.
something like this might be worthy of a comment in the
code.
___cliff rayman___cliff_AT_rayman.com___
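
A sketch of a mutation step that protects index 0, with the kind of comment suggested above. The gene list is the one quoted in this thread; the sub name mutate and its return value are made up for the example, and the contents of gene 0 are an assumption.

```perl
use strict;
use warnings;

my @GENE_BASE = ('$x+=1 ;', '$x=$y ;', '$y=$x ;', '$x|=$y ;', '$x+=$y ;', ' ;');

# With probability $rate, replace one randomly chosen gene.
# Index 0 is never touched: it is assumed to hold the declaration of
# the lexical variables that every later gene depends on.
# Returns the mutated position, or nothing if no mutation happened.
sub mutate {
    my ($genes, $rate) = @_;
    return if rand(1.0) >= $rate;
    my $position = 1 + int(rand(@$genes - 1));    # 1 .. last index
    $genes->[$position] = $GENE_BASE[rand @GENE_BASE];
    return $position;
}

my @individual = ('my ($x, $y) = (1, 1);', ('$x+=1 ;') x 9);
my $where = mutate(\@individual, 0.005);
print defined $where ? "mutated gene $where\n" : "no mutation\n";
```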

Hhhmm, I've written my own version, with somewhat different operators (genes) to help with the speed of convergence. Actually, I changed the goal too: it now looks for an individual number, repetitively; it eventually prints out JUST ANOTHER PERL HACKER.
But I'm having a problem: quite often the population is overrun with exactly the same genes. I really need help figuring out how this is happening.
The big differences are the choose and breed routines, which are called select and mate in this code. The mate routine selects some of the most fit organisms, breeds them, and replaces some of the least fit organisms with the offspring.
Here's the code:

"But I'm having a problem: quite often the population is overrun with exactly the same genes. I really need help figuring out how this is happening."

There can be a number of reasons for that.
(1) Your population size is too small (GP works best with large populations),
(2) Your mutation rate is too low (but in your program it looks fine), or
(3) Individuals with a low fitness have too low a probability of reproducing. If only the fittest individuals are allowed to reproduce they will take over the whole population. So weaker individuals have to have a chance to reproduce too. The probability for this depends on the population size. For a large population it can be low, for a small population it has to be high.
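
To see when such a takeover happens, it can help to track how many distinct genomes are left in each generation. A minimal sketch, assuming each individual is an array reference of gene strings (the sub name diversity is invented):

```perl
use strict;
use warnings;

# Fraction of distinct genomes in the population: 1 means every
# individual is unique; a value near 0 means one genome has taken over.
sub diversity {
    my @population = @_;    # each individual is an array ref of gene strings
    my %seen;
    $seen{ join "\0", @$_ }++ for @population;
    my $distinct = keys %seen;
    return $distinct / scalar @population;
}

my @population = (
    [ '$x+=1 ;', '$y=$x ;' ],
    [ '$x+=1 ;', '$y=$x ;' ],
    [ '$x+=1 ;', '$x+=$y ;' ],
);
printf "diversity: %.2f\n", diversity(@population);    # 2 distinct of 3
```

If this value collapses within a few generations, selection is probably too strict for the population size.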

Whoa -- someone else hit upon the same idea that I did!
Two years ago, whilst a Sophomore (yeah, in HS -- I'm only now a Senior) I needed a science project for biology. Having previously fallen in love with Perl, and being interested in GP, I also wrote my own GP system in Perl.
Your system is considerably different from mine -- probably mostly because your implementation isn't quite true to the definition of GP as defined by Koza. Normally, GP individuals are actual program trees, with branching constructs and multiple layers. This makes crossover harder than just string manipulation -- you have to keep track of the inherent structure of the individual.
Many people writing GP in C or Java use pointers to construct the tree. Perl being what it is, I wrote a tokenizer that tokenized the syntactically correct Perl individuals, and munged them as strings. Not as "elegant," nor as fast, but muchly fun. ;>
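
For readers who have not seen tree-based crossover, here is a small sketch using nested array references instead of pointers. It is neither the tokenizer approach described here nor Koza's exact procedure, only an illustration of swapping two randomly chosen subtrees; the tree layout and all sub names are assumptions.

```perl
use strict;
use warnings;

# A tree is either a leaf (plain scalar) or [operator, child, child, ...].

# Deep copy, so that crossover never modifies the parents.
sub clone {
    my ($node) = @_;
    return ref($node) eq 'ARRAY' ? [ map { clone($_) } @$node ] : $node;
}

# Return a reference to every slot that holds a node, root included,
# so that any subtree can be replaced in place.
sub node_slots {
    my ($slot) = @_;
    my $node = $$slot;
    return $slot unless ref($node) eq 'ARRAY';
    return ($slot, map { node_slots(\$node->[$_]) } 1 .. $#$node);
}

# Pick one random node in each copied parent and swap the subtrees.
sub subtree_crossover {
    my ($mum, $dad) = @_;
    my ($child1, $child2) = (clone($mum), clone($dad));
    my @slots1 = node_slots(\$child1);
    my @slots2 = node_slots(\$child2);
    my $slot1  = $slots1[rand @slots1];
    my $slot2  = $slots2[rand @slots2];
    ($$slot1, $$slot2) = ($$slot2, $$slot1);
    return ($child1, $child2);
}

# Render a tree as an infix expression for printing.
sub to_string {
    my ($node) = @_;
    return $node unless ref($node) eq 'ARRAY';
    my ($op, @children) = @$node;
    return '(' . join(" $op ", map { to_string($_) } @children) . ')';
}

my $mum = ['+', ['*', 'x', 2], 1];
my $dad = ['-', 'y', ['+', 3, 'x']];
my ($kid1, $kid2) = subtree_crossover($mum, $dad);
print to_string($kid1), "\n", to_string($kid2), "\n";
```

Because whole subtrees are exchanged, both children stay syntactically valid, which is the bookkeeping that plain string splicing does not give you.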
Anyways, I was thinking, sooner or later, of throwing the code up on CPAN -- but later is the key word. I'm using my Perl GP implementation to do some research for the Westinghouse competition, which demands a 20-page paper. Which is due October 2nd. So I'm a little busy right now.. ;> Incidentally, the paper is about making distributed GP more efficient -- and I of course wrote my own client/server GP implementation in Perl. Gotta love threaded Perl. Anyways, I'll wander through here again once I have Free Time again, and post again -- hopefully with a little more clarity and content.

WOW...
this was really fun and interesting code...
I have a question...isn't there a patent on
Genetic Programming algorithms? I remember
hearing about this once before...but maybe
it was in relation to something else...
Anyone have any information about this...?

I know that the general idea is not new... if I remember correctly, these sort of ideas run at least back to the '60s, when they were known as "branch and bound algorithms". Thus my hunch is that while a particular implementation might well be under active patent, loads of prior art could be found for anything more general. So there's $0.02 worth of insight from someone with only $0.01 worth of knowledge on the topic. :-)

I realize that this thread may not be followed anymore, but if it is (and especially if gumpu is out there listening), it seems to me that each Individual ought to know its own fitness. In other words, FITNESS should be a field in each Individual, and FITNESSES should not be a field in Population.

You are right. FITNESS should be a field in each
Individual. Fitness is something associated with an
individual. Population should only keep track of
the statistics of all the individuals in the population.
Bad design choice; probably had a case of 'premature
optimization' :)
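
A minimal sketch of that layout, with FITNESS stored in each Individual and Population only deriving statistics from its members. Apart from the GENES and FITNESS field names mentioned in this thread, the method names and the INDIVIDUALS field are invented, and higher fitness is assumed to be better.

```perl
use strict;
use warnings;

{
    package Individual;

    sub new {
        my ($class, @genes) = @_;
        return bless { GENES => [@genes], FITNESS => undef }, $class;
    }

    # Getter/setter: each individual remembers its own fitness.
    sub fitness {
        my $self = shift;
        $self->{FITNESS} = shift if @_;
        return $self->{FITNESS};
    }
}

{
    package Population;

    sub new {
        my ($class, @individuals) = @_;
        return bless { INDIVIDUALS => [@individuals] }, $class;
    }

    # The population only computes statistics over its members.
    sub average_fitness {
        my $self  = shift;
        my $total = 0;
        $total += $_->fitness for @{ $self->{INDIVIDUALS} };
        return $total / scalar @{ $self->{INDIVIDUALS} };
    }

    sub fittest {
        my $self = shift;
        my ($best) = sort { $b->fitness <=> $a->fitness } @{ $self->{INDIVIDUALS} };
        return $best;
    }
}

my @members;
for my $score (1 .. 3) {
    my $individual = Individual->new('$x+=1 ;');
    $individual->fitness($score);
    push @members, $individual;
}
my $population = Population->new(@members);
print $population->average_fitness, "\n";    # (1 + 2 + 3) / 3 = 2
```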

I disagree. Fitness is a combination of the individual's abilities and the constraints of the environment. If you put an individual in a different environment its fitness for that environment will change.

which I consider an improvement, not only due to the fact that there's (a little) less code, but the concept of choosing an individual and not a fitness is emphasized.

Finally, I must admit that I have a bias in relation to the idea you presented. Yes, environment does determine fitness. However, what if you're trying to evolve generalized behavior, i.e., a program that will perform well in any environment, and not simply the ones it was trained in? The little bit of work I've done with GP has been focused in the direction of trying to avoid such "over-training" or specialization.