Evolving Core Warriors
Dave Hillis
Class Project for Evolutionary Computation, 1998
Introduction
------------
This paper describes the results of a project that applied genetic programming
(GP) techniques in an attempt to generate core wars programs (called warriors) that
would be competitive with those generated by experienced human programmers. Core
wars is an attractive application; the fitness of a program involves complicated interactions
with multiple competing agents and there is a public domain simulator and debugger
(pmars) and a large number of published hand-written programs to compare against.
Although the project did not produce a warrior strong enough to challenge current
tournament winners, it did produce warriors whose strength and sophistication surpassed
those previously reported in the literature for evolved warriors.
Core Wars
---------
Core wars is a game in which 2 virus-like computer programs battle each other for
control of a virtual core memory. The game first became widely known in the 1980's with
the publication of 2 articles by Dewdney[1]. The programs are written in "redcode" which
is similar to an assembly language.
In a battle, a simulator program loads 2 warriors into random locations in memory.
The remainder of the memory contains "DAT" instructions - any process that tries to
execute a "DAT" instruction is terminated. The simulator alternates virtual machine cycles
between the warriors. Each warrior can read or write into any memory location; allowing
it to copy parts of itself and to modify it's own code or its opponent's. A warrior may
spawn several independent processes but does not increase its share of the machine cycles
by doing so. A warrior loses when all of its processes have terminated. If both warriors
survive past a fixed number of cycles (usually 80,000) a tie is declared.
Players compete by writing warriors and entering them into tournaments. In a
tournament, each warrior battles all the others (usually 200 times each with random
starting locations). Each tie earns 1 point, each win earns 3, and each loss earns nothing.
The warrior's score is the total over all of its battles.
There are a number of king of the hill tournaments that run continuously via email
submission. In a king of the hill tournament, 25 or so warriors occupy a virtual hill and
other warriors try to displace them. Each warrior on the hill has a hill-score consisting of
its average score against all of the warriors (including itself) on the hill. A warrior
challenges the hill by playing against all of the warriors on the hill (200 times each) thus
obtaining its own hill-score. If the challenging warrior has a higher hill-score than the
lowest scored member of the hill, it replaces that warrior on the hill. (The new warrior on
the hill need not necessarily have won any of its battles with the displaced warrior: only
the average score matters.)
Core wars players tend to group strategies into 3 broad categories. Stone
strategies involve blindly changing (or "bombing") as much memory as possible, usually by
copying DAT commands into different addresses in hopes of disrupting the other warrior's
program. Paper strategies try to fill up the memory with independently running, self
replicating, copies of themselves. Scissors strategies try to find where the other program is
and then destroy it.
As the names imply: scissors tend to beat papers which tend to beat stones, which
tend to beat scissors. A warrior may use one or a combination of strategies, but a
tournament can always be expected to include a mix of strategies. A warrior must do
reasonably well against all of these strategies to do well in a tournament.
Beginning core wars players commonly test their programs against the Wilkies
benchmark: a group of 12 warriors. The benchmark score is the average obtained by
competing a warrior 100 times against each of the 12 warriors and dividing by 12. A
warrior that won all of its battles would score 300. A warrior that won exactly half of its
battles and lost half of its battles would score 150. A warrior that tied in every battle
would score 100. A "decent" human written warrior should score well over 100.
A full description of core wars is beyond the scope of this paper. The KOTH
website [2] contains redcode reference manuals, tutorials, free software, access to on-line
tournaments, and many related links.
Core Wars and Evolutionary Computation
--------------------------------------
Previous Work
-------------
Core wars has inspired a number of Alife studies involving evolutionary
computation. Some like TIERRA[3] and Venus[4] involve a core wars-like environment
in which programs compete in a virtual memory filled with "food" and other resources.
Others, which I will call pure core wars studies, use the standard core wars rules: allowing
their results to be directly compared to a large body of human generated programs. (Such
a comparison is not necessarily fair since these were intended to be Alife experiments.) To
date, I have been unable to find a published paper on evolved pure core warriors. The
following examples come from unpublished papers posted on the world wide web.
Perry[5] describes a study in which he used a GA with mutation and multi-point
crossover and a binary representation. He experimented both with an exogenous fitness
function (competing the individuals against a suite of test warriors) and an endogenous
one (competing the individuals against each other). Although the compute power he had
available severely limited the experiments performed, he did a good job of describing the
attractiveness of the problem and of the technical hurdles involved. His paper has a link
from the KOTH web site and the other pure core wars studies (below) all cite it.
Boer[6] describes a study in which he used an evolutionary strategy and an
endogenous fitness function to evolve warriors. His program, ga_war.c, uses the redcode
source files as the genetic representation, and a mutation operator, but no crossover. The
file based approach is slow but allows one to use the publicly available "pmars"
simulator[2] for the objective function. He makes the source code for ga_war.c freely
available.
ga_war.c, forms an initial population by randomly generating one line warriors,
testing them against a particular (very weak) warrior and only accepting those that win or
tie against it. Once an initial population is formed, 2 warriors are selected at random and
conduct some number of battles (nominally 5) against each other. If 1 warrior outscores
the other, the looser is replaced by an exact copy of the winner. Otherwise both warriors
are replaced by mutated copies of themselves. The program generates new warriors in this
way for some fixed number of iterations.
Mutation can add lines, delete lines, or modify any fields in any lines. If a mutated
warrior cannot win or tie against the weak fixed warrior, the program tries again until an
acceptable warrior is generated. A warrior that has been replaced may, depending on how
many battles it has won before, be sent to "Valhalla" from whence it may return to the
population in some later generation.
Coleman[7] uses Boer's program and obtains similar results. Newton[8] uses
Boer's program, a program called "Sys 4" written by George Lebl, and a program of his
own called "RedMaker.
Each of the pure core wars studies described above included some comparison
with human derived warriors. None of them produced a warrior that was competitive with
good human written ones or reported a Wilkies benchmark score over 85.
Technical Challenges
--------------------
The most serious challenge is that the quality of human written programs is high.
Many people have been competing against each other for years and the resulting programs
are highly optimized.
Fitness function
----------------
A pure core wars fitness function makes the problem a reinforcement learning one.
In the pure core wars studies, the fitness function is based entirely on the number of wins,
loses and ties of the individual over some number of battles. There is no placing of "food"
in memory or providing additional mating opportunities to programs that split as ways to
lead the population into desired behaviors like self-replication and exploration.
Placing the warriors at random starting positions for each battle adds a strong
stochastic element to the outcome of a battle. This is a problem in that many (very time
consuming) battles between 2 individual warriors are needed to measure how well they do
against each other. It is also a blessing in that a relatively weak warrior will still win
against a stronger opponent in the battles where its starting position is sufficiently
favorable. This stochastic element smoothes out the fitness function. It is much more
useful to be able to say that warrior A wins 10% of its battles with warrior B than to say
only that it is weaker.
For a tournament, 100 to 200 battles are fought between individual warriors. For
most of the studies reviewed, only 5 or 10 battles were used - relying on the GA's ability
to work with noisy evaluation functions.
For an endogenous fitness function, the fitness is based on battles against other
warriors in the population. But to be competitive against human coded warriors a warrior
must be able compete against a range of strategies. Unless the population contains a mix
of strategies (scissors, paper, stone), the individuals need not evolve to deal with a mix of
strategies. None of the pure core wars studies reviewed reported the generation of paper
or scissors strategies.
An alternative is to use an exogenous fitness function by training the warriors
against a static (unchanging) hill of human coded warriors. In practice, it is difficult to
select a hill that is just hard enough and there is always the danger that the population will
not evolve generally useful strategies.
Evolvability
------------
The search space of solutions is large. For a single line in a warrior, there are 17
possible command types each of which has 7 possible modifiers and 2 fields. Each field
contains a number between 0 and 7999 as well as its own addressing mode. There are 8
different addressing modes. There are (17*7*8*8000*8*8000) unique possible lines. A
warrior may contain up to 100 lines.
In the pure core wars studies, the population always converged toward stone-type
solutions that were very short (not counting introns), efficient, and ultimately not very
good. Since a core Wars battle is a race to kill before you are killed, any change that slows
a warrior down is strongly selected against. Any independent process that a warrior splits
off will be parasitic (wasting precious cycles) unless it is immediately useful. Evolution
pushes toward warriors with short functioning code lengths both to increase efficiency and
in response to disruption from crossover or mutation. The operators themselves also resist
the development of complex programs.
One way to allow a GP to escape this evolutionary dead end may be to force it to
evolve warriors with multiple semi-independent parts. A suitable crossover operator may
be able to encourage more complex individuals that may ultimately be of better quality.
Description of the Approach
---------------------------
In this work, I used ga_war.c [6] as a starting point and experimented with some
modifications to try to achieve better performance. Specifically, I replaced the evolution
strategy with a steady state GA using tournament selection, I imposed a block based
organization on the individuals, added 2 crossover operators, and added an endogenous
fitness function based on the king of the hill concept.
Figure 1 shows a typical individual. The representation is nearly the same as that of
Boer: an individual is represented as an ascii file, complete with header, and in a format
that the pmars simulator can work with. One major difference is that each individual is of
fixed length (5 blocks of 10 lines each) plus 4 "spl" commands that force it to start by
splitting new processes off to 4 locations in the program. These locations correspond to
the start points of the second, third, fourth, and fifth blocks in the program. The result is
that every warrior will, at least briefly, have 5 independent processes running on different
parts of the program. (In practice, this is a mild constraint as some of the processes
quickly die.) The rational for using the block based organization was that it could reduce
brittleness by forcing some redundancy and encouraging modularity.
The block based crossover operator selected a cut point on a randomly chosen
block boundary, placed the second "half" of the first parent into the first half of the child,
then placed the first half of the second parent into the second half of the child. The child
then underwent a mutation operation where every command, modifier, address mode and
field had a 2% chance of a random change. The idea was to produce minimal disruption
within each block while producing significant changes in relative block location. The
program also included an ordinary 1-point crossover operator with a completely random
cut point.
Other departures from Boer's work were that warriors were not sent to or from
Valhalla (reintroduced to the population) and new children created by crossover and
mutation were not screened against the weak warrior prior to acceptance into the
population. (I had originally intended to use the screening to test the effects of lethality of
the crossover operators, but, as the observed lethality rates were low, I did not pursue this
issue.)
At the end of each generation, the program takes the individual with the highest
fitness and calculates its Wilkies benchmark score. This computationally expensive
diagnostic gives some measure of the population's strength independent of its fitness
metric.
Results
-------
My original intention was to conduct a quick pilot study to establish a reasonable
level of performance for the program and then to conduct a parametric study to evaluate
the usefulness of the block based approach described above. Not surprisingly, perhaps, the
pilot study expanded to take up the whole time for the project leaving no time for the
parametric study. Although I provide quantitative data below, there is not a sufficient
statistical basis to establish which techniques were helpful and which were not.
It can take a population of 50 individuals close to an hour of CPU time on a
SPARC ULTRA-2 to process 1 generation. (The run-time varies: battles that result in a tie
require 80,000 cycles in the pmars simulator while battles that don't are shorter.)
Static Exogenous Hill
---------------------
The warriors with the highest Wilkies benchmark scores evolved in populations
evaluated against human coded warriors. The best of these, shown in figure 2, had a
Wilkies benchmark score of 94 which is significantly higher than the highest score that I
found in the literature for an evolved warrior. (This figure was calculated as the average of
10 benchmark measurements and is fairly precise.)
This warrior combined a stone strategy with a strategy of blindly splitting off new
processes to different locations in memory. When these new processes landed on another
warrior's instructions, the new process would execute its opponent's code, making it
harder to kill. The process splitting strategy worked best against papers, which spread
their code throughout memory, producing more ties.
Dynamic Endogenous Hill
-----------------------
The most interesting results came from populations that evolved with fitness
functions based on a dynamic endogenous hill. The program started by creating a random
population of warriors and a hill with a fixed number (nominally 12) of randomly formed
warriors. Each warrior on the hill competed against the whole hill (100 battles each) to
receive a hill-score. Each individual in the population competed against all of the warriors
on the hill (10 battles each for a quicker but noisy score) and received a fitness value based
on its hill-score. At the end of each generation, the best individual in the population
challenged the hill (100 battles each for a more accurate measure). If the challenger's hill-
score was higher than that of the lowest scored individual on the hill, a copy of the
challenger replaced the loser on the hill.
Using a dynamic endogenous hill is not qualitatively much different from simply
competing each warrior against a random subset of the warriors in the population .(Boer's
program uses a random subset of size 1.) It does have the advantages of smoothing out
the changes to the fitness landscape (only one warrior on the hill can change per
generation), it is biased toward stronger warriors than average, it makes it easier to
identify the best and worst individuals in the population, and examining the interactions
between warriors on the hill provides insight into the dynamics of the population.
Interestingly, the challenging warriors almost always made it onto the hill. In fact,
the new warrior on the hill frequently had the highest hill-score. This phenomena did not
appear to be the result of noisy evaluations but rather the absence of any Nash equilibria.
Turn-over on the hill was also high. When a warrior made it onto the hill, the GP started
selecting for the ability to defeat it. Individuals that out matched it were rewarded and
soon some of them made it onto the hill. Its hill-score dropped and eventually it was
replaced.
Figure 3 shows the Wilkies benchmark scores of the best individuals for a test
using a population size of 50 a dynamic endogenous hill of size 8 and probabilities of using
block based and ordinary crossover of 95% and 5% respectively. The population evolved
for several hundred generations, but only the last 500 or so are shown. This experiment
ran for many days and was subjected to occasional tinkering such as killing off the hill and
introducing individuals that had evolved against different fitness functions.
Figure 4 shows the Wilkies benchmark scores of the best individuals for a test
using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of
using block based and ordinary crossover of 95% and 5% respectively, with no tinkering.
The population rose for the first 170 generations then declined to very low strength.
Figure 1 shows a warrior that occurred near the peak of this run. It is quite
significant in that it has a genuine paper strategy, it makes copies of itself and those copies
make copies in turn. It also has a stone function as most human written papers do.
Although self-replicating programs have been evolved before [3,4], this may be the first
time that it has been done using a pure core wars strategy.
For a time, the population held a mixture of paper and stone strategies but then it
declined into a group of low scoring, uncharismatic stones. In a related test, the paper
warrior from figure 1 was reintroduced into the population and quickly took over the hill
and the population. This suggests that the decline could have been averted had the
population been protected from forgetting previous good solutions. The Valhalla function
that Boer used may be a practical way of achieving this.
Figure 5 shows the Wilkies benchmark scores of the best individuals for a test
using a population size of 50 a dynamic endogenous hill of size 12 and probabilities of
using block based and ordinary crossover of 0% and 100% respectively, with no tinkering.
Many more generations and many more runs will be needed to determine whether the
block based crossover operator is more useful than ordinary crossover.
Future Work
-----------
This is one of the most addictive technical problems that I have encountered. The
next step is to add a Valhalla function that will periodically reintroduce old hill members
back into the population.
Conclusion
----------
The ultimate measure of success for evolved core warriors will come when they
start to win tournaments against human coded programs. This work fell far short of that
goal but did show real progress toward it by evolving warriors that showed more
sophisticated behavior and higher benchmark scores than those that I was able to find in
the literature.
The fact that a paper strategy evolved in less than 200 generations is encouraging
evidence that competitive core warriors may indeed evolve using purely endogenous
fitness functions.
References
----------
1. Dewdney, A.K. (1984) "Computer Recreations: In the Game called Core War
Hostile Programs Engage in a Battle of Bits", Sci. Amer. 250(5), 14-22.
2. http://www.koth.org/
3. T. Ray, "Tierra.doc. Documentation for the Tierra Simulator V4.0, 9-9-92".
anonomous ftp: tierra.slhs.udel.edu
4. "Emergent Computation. Self-Organizing, Collective, and Cooperative
Phenomena in Natural and Artificial Computing Networks", [FORREST90]:, Cambridge,
MA: MIT Press. (Special issue of Physica D.)
5. John Perry, "Core Wars Genetics: The Evolution of Predation",
http://www.KOTH.org/evolving_warriors.html
6. Jason Boer, http://www.avalon.net/~jboer/projects/corewar/corewar.html
7. Ryan Coleman, "Learning By Simulating Evolution Using Corewars",
http://www.geocities.com/CapeCanaveral/Launchpad/6898/evolving.html
8. Terry Newton, "Evolved Warriors",
http://www.nc5.infi.net/~wtnewton/corewar/evol/index.html
---------------------------- figure 1 ------------------------------------------
>From - Wed Dec 09 22:16:08 1998
;redcode-94
;assert CORESIZE == 8000
;name mypaper.red
;strat 12/09/98. Paper with core clear. Generated using a GA with a 12
;strat warrior endogenous hill. (Wilkies score of 57).
org S;
S;
spl 14;
spl 23;
spl 32;
spl 41;
add.f $ 7034, } -49
mov.i { 6405, } 0
mod.b { 6561, { 1
dat.ab $ 2367, > 31
seq.f } 35, > -18
mov.f < 665, > 2127
dat.b } 7, @ -14
djn.ba } 28, # 5382
djn.i } 2904, { -48
slt.i # 433, > -16
add.b $ 25, } 35
mov.i { 7003, } 40
mod.a # 6561, { 1
dat.ab $ 2367, > 2431
div.f } 35, > -27
mov.f } -42, $ 5432
dat.b @ 3534, } -14
djn.b $ 28, # 5382
djn.i { 6290, { -48
slt.i # 433, < -45
mul.b < 3509, { 59
seq.i * 4122, @ 38
sub.f > 23, } 23
spl.ab * 2509, < 2134
jmn.x * 27, { -36
spl.b # 0, # 24
mov.i } -33, } -8
djn.f > 6006, { 3287
djn.i $ 3528, $ 41
sub.ab $ -21, > -10
mul.a * 25, < -20
mov.ab $ 5, } 47
jmz.ab > 4941, $ 4584
mul.b { 2367, > 5
jmp.i } -11, > -27
jmz.b $ 48, } 7475
dat.b @ 20, > -14
div.i @ 28, # 359
jmn.x { 3553, { 7109
slt.i } 5169, < 49
dat.a # -9, > 4138
sub.ba @ 14, < 5845
sne.a # 7555, $ 24
jmp.a } -11, } -2
seq.ba $ 1110, @ -40
add.x } -17, } 3803
div.ab $ 46, > 48
seq.ab } 5, $ 21
nop.ba < -19, > 2860
mov.i # 511, $ 5405
end
;0
------------------------------------------ figure 1 ------------------------------------------
------------------------------------------------- figure 2
;redcode-b verbose
;assert CORESIZE == 8000
;name HAL_00.red
;author Dave Hillis
;strat submitted @date@ Genetic Programming
...
...
spl 8;
spl 12;
spl 16;
jmn.b >A7 ,A16
djn.ab #A16,A12,{A15
sub.ab *A12,>A16
spl.b *A13,A13,$A4
djn.b