Like both ES and GAs, EP is a useful method of optimization when other techniques such as gradient descent or direct, analytical discovery are not possible. Combinatorial and real-valued FUNCTION OPTIMIZATION problems, in which the optimization surface or FITNESS landscape is "rugged", possessing many locally optimal solutions, are well suited for evolutionary programming.

History

The 1966 book, "Artificial Intelligence Through Simulated Evolution" by Fogel, Owens and Walsh
is the landmark publication for EP applications, although many other papers appear earlier in the
literature. In the book, finite state automata were evolved to predict symbol strings generated
from Markov processes and non-stationary time series. Such evolutionary prediction was motivated by
a recognition that prediction is a keystone to intelligent behavior (defined in terms of adaptive
behavior, in that the intelligent organism must anticipate events in order to adapt behavior in
light of a goal).

In 1992, the First Annual Conference on Evolutionary Programming was held in La Jolla, CA.
Further conferences have been held annually (See Q12). The conferences attract a diverse group of
academic, commercial and military researchers engaged in both developing the theory of the EP
technique and in applying EP to a wide range of optimization problems, both in engineering and
biology.

Rather than list and analyze the sources in detail, several fundamental sources are listed below
which should serve as good pointers to the entire body of work in the field.

The Process

For EP, like GAs, there is an underlying assumption that a fitness landscape can be
characterized in terms of variables, and that there is an optimum solution (or multiple such optima)
in terms of those variables. For example, if one were trying to find the shortest path in a
Traveling Salesman Problem, each solution would be a path. The length of the path could be
expressed as a number, which would serve as the solution's fitness. The fitness landscape for this
problem could be characterized as a hypersurface proportional to the path lengths in a space of
possible paths. The goal would be to find the globally shortest path in that space, or more
practically, to find very short tours very quickly.
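To make the fitness-landscape idea concrete, here is a minimal sketch of a tour's length serving as its fitness; the city coordinates are made up for illustration:

```python
import math

# Hypothetical 4-city instance: each solution is a tour (an ordering of
# city indices), and its fitness is the total closed-path length
# (shorter = fitter).
cities = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]

def tour_length(tour, cities):
    """Total length of the closed path visiting cities in 'tour' order."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

print(tour_length([0, 1, 2, 3], cities))  # rectangle perimeter: 14.0
print(tour_length([0, 2, 1, 3], cities))  # a crossing (worse) tour
```

Each point in the space of possible orderings has such a length; the collection of those lengths is the landscape the search moves over.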

The basic EP method involves three steps, repeated until a threshold for iteration is exceeded or an adequate solution is obtained:

(1) Choose an initial POPULATION of trial solutions at random. The number of solutions in a population is highly relevant to the speed of optimization, but no definite answers are available as to how many solutions are appropriate (other than >1) and how many solutions are just wasteful.

(2) Each solution is replicated into a new population. Each of these offspring solutions is mutated according to a distribution of MUTATION types, ranging from minor to extreme, with a continuum of mutation types between. The severity of mutation is judged on the basis of the functional change imposed on the parents.

(3) Each offspring solution is assessed by computing its fitness. Typically, a stochastic
tournament is held to determine N solutions to be retained for the population of solutions, although
this is occasionally performed deterministically. There is no requirement that the population size
be held constant, however, nor that only a single offspring be generated from each parent.

It should be pointed out that EP typically does not use any CROSSOVER as a genetic operator.
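The three steps above can be sketched as a loop. The objective function, population size, mutation scale, and tournament size below are illustrative assumptions, not part of the EP definition:

```python
import math
import random

def fitness(x):
    # Rugged 1-D objective with many local optima (lower is better);
    # an arbitrary illustrative choice, not prescribed by EP.
    return x * x + 10.0 * math.sin(x) ** 2

def evolve(pop_size=20, generations=100, sigma=0.5, opponents=5, seed=0):
    rng = random.Random(seed)
    # (1) Choose an initial population of trial solutions at random.
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # (2) Replicate each solution and mutate the copy, here with a
        #     zero-mean Gaussian perturbation.
        offspring = [x + rng.gauss(0.0, sigma) for x in pop]
        # (3) Stochastic tournament: each candidate earns a "win" for
        #     every randomly drawn opponent it is at least as good as;
        #     the pop_size candidates with the most wins are retained.
        candidates = pop + offspring
        scored = [(sum(fitness(c) <= fitness(rng.choice(candidates))
                       for _ in range(opponents)), c)
                  for c in candidates]
        scored.sort(key=lambda t: (-t[0], fitness(t[1])))
        pop = [c for _, c in scored[:pop_size]]
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

Note there is no crossover anywhere in the loop: variation comes solely from mutation, and selection is a stochastic competition rather than a deterministic cut.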

EP and GAs

There are two important ways in which EP differs from GAs.

First, there is no constraint on the representation. The typical GA approach involves encoding the problem solutions as a string of representative tokens, the GENOME. In EP, the representation follows from the problem. A neural network can be represented in the same manner as it is implemented, for example, because the mutation operation does not demand a linear encoding. (In this case, for a fixed topology, real-valued weights could be coded directly as their real values and mutation operates by perturbing a weight vector with a zero mean multivariate Gaussian perturbation. For variable topologies, the architecture is also perturbed, often using Poisson distributed additions and deletions.)
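As a minimal sketch of this representation freedom, a fixed-topology weight vector can be mutated directly as real values; the vector length and mutation scale here are illustrative assumptions:

```python
import random

# Hypothetical fixed-topology case: the network's weights are held
# directly as a list of reals, and mutation perturbs the whole vector
# with zero-mean Gaussian noise -- no linear token encoding is needed.

def mutate_weights(weights, sigma, rng):
    """Return a mutated copy: each weight gets zero-mean Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

rng = random.Random(1)
parent = [rng.uniform(-1.0, 1.0) for _ in range(6)]  # illustrative size
child = mutate_weights(parent, sigma=0.1, rng=rng)
print(parent)
print(child)
```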

Second, the mutation operation simply changes aspects of the solution according to a statistical
distribution which weights minor variations in the behavior of the offspring as highly probable and
substantial variations as increasingly unlikely. Further, the severity of mutations is often
reduced as the global optimum is approached. There is a certain tautology here: if the global
optimum is not already known, how can the spread of the mutation operation be damped as the
solutions approach it? Several techniques have been proposed and implemented which address this
difficulty, the most widely studied being the "Meta-Evolutionary" technique in which the variance of
the mutation distribution is subject to mutation by a fixed variance mutation operator and evolves
along with the solution.
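The meta-evolutionary idea can be sketched as follows, assuming each individual carries its own mutation standard deviation, which is itself perturbed by a fixed-variance Gaussian operator; all constants are illustrative:

```python
import random

# Sketch of "Meta-Evolutionary" self-adaptation: an individual is a
# (solution, sigma) pair. Sigma is mutated by a fixed-variance Gaussian
# operator and then used to perturb the solution, so the mutation
# severity evolves along with the solution itself.

def meta_mutate(individual, rng, meta_sigma=0.1, floor=1e-4):
    x, sigma = individual
    # The strategy parameter evolves under a fixed-variance operator;
    # a small floor keeps the deviation positive.
    new_sigma = max(sigma + rng.gauss(0.0, meta_sigma), floor)
    # The mutated deviation then controls the solution's perturbation.
    return (x + rng.gauss(0.0, new_sigma), new_sigma)

rng = random.Random(2)
ind = (5.0, 1.0)  # illustrative starting point and deviation
for _ in range(3):
    ind = meta_mutate(ind, rng)
    print(ind)
```

Selection acting on the pair then indirectly favors deviations that tend to produce good offspring, so the mutation spread can shrink near an optimum without the optimum being known in advance.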

EP and ES

The first communication between the evolutionary programming and EVOLUTION STRATEGY groups occurred in early 1992, just prior to the first annual EP conference. Despite their independent development over 30 years, they share many similarities. When implemented to solve real-valued function optimization problems, both typically operate on the real values themselves (rather than any coding of the real values as is often done in GAs). Multivariate zero mean Gaussian mutations are applied to each parent in a population and a SELECTION mechanism is applied to determine which solutions to remove (i.e., "cull") from the population. The similarities extend to the use of self-adaptive methods for determining the appropriate mutations to use -- methods in which each parent carries not only a potential solution to the problem at hand, but also information on how it will distribute new trials (offspring). Most of the theoretical results on convergence (both asymptotic and velocity) developed for ES or EP also apply directly to the other.

The main differences between ES and EP are:

1. Selection: EP typically uses stochastic selection via a tournament. Each trial solution in the
population faces competition against a preselected number of opponents and receives a "win" if it is
at least as good as its opponent in each encounter. Selection then eliminates those solutions with
the least wins. In contrast, ES typically uses deterministic selection in which the worst solutions
are purged from the population based directly on their function evaluation.

2. RECOMBINATION: EP is an abstraction of EVOLUTION at the level of reproductive populations (i.e., SPECIES) and thus no recombination mechanisms are typically used, because recombination does not occur between species (by definition: see Mayr's biological species concept). In contrast, ES is an abstraction of evolution at the level of INDIVIDUAL behavior. When self-adaptive information is incorporated this is purely genetic information (as opposed to phenotypic) and thus some forms of recombination are reasonable, and many forms of recombination have been implemented within ES. Again, the effectiveness of such operators depends on the problem at hand.
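The selection contrast in point 1 can be sketched on a plain list of fitness scores (lower is better); population size, tournament size, and the number kept are illustrative:

```python
import random

# EP-style stochastic tournament vs. ES-style deterministic truncation,
# both reducing a candidate pool to 'keep' survivors.

def ep_tournament_select(scores, keep, opponents, rng):
    """EP-style: retain the solutions with the most tournament wins."""
    wins = []
    for s in scores:
        rivals = [rng.choice(scores) for _ in range(opponents)]
        wins.append(sum(s <= r for r in rivals))  # "win" = at least as good
    ranked = sorted(range(len(scores)), key=lambda i: -wins[i])
    return [scores[i] for i in ranked[:keep]]

def es_truncation_select(scores, keep):
    """ES-style: deterministically purge the worst function evaluations."""
    return sorted(scores)[:keep]

rng = random.Random(3)
scores = [round(rng.uniform(0.0, 10.0), 2) for _ in range(10)]
print(es_truncation_select(scores, keep=5))
print(ep_tournament_select(scores, keep=5, opponents=7, rng=rng))
```

Because the EP tournament draws opponents at random, a mediocre solution can occasionally survive a round, whereas truncation's survivor set is fully determined by the scores.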

References

Some references which provide an excellent introduction (by no means extensive) to the field include: