Abstract

Many physical systems of interest to scientists and engineers can be modeled using a partial
differential equation extended along the dimensions of time and space. These equations are
typically nonlinear with real-valued parameters that control the classes of behaviors that the
model is able to produce. Unfortunately, these control parameters are often difficult to measure
in the physical system. Consequently, the first task in developing a model is usually to search
for appropriate parameter values. In a high dimensional system, this task potentially requires a
prohibitive number of evaluations and it may be impossible or inappropriate to select a unique
solution.

We have applied evolutionary algorithms (EAs) to the problem of parameter selection in models
of biologically realistic neurons. Our objective was not to find the "best" solution, but rather
we sought to produce the manifold of high fitness solutions that best accounts for biological
variability. The search space was high dimensional (> 100) and each function evaluation
required from one minute to several hours of CPU time on high performance computers. Using this
model and our goals as an example, we will: 1) review the problem from the neuroscience
perspective; 2) discuss high performance computing aspects of the problem; 3) examine the
suitability of EAs for the efficient optimization of this class of problems; and 4) describe and
justify the specific EA implementation used to solve this problem.

1. The Problem in Neuroscience

Neurons are the fundamental units responsible for the transmittal and transduction of
information in the brain. Information is transmitted between neurons via diffusible chemicals. The
chemical signals from many convergent presynaptic neurons are spatiotemporally integrated and
transduced into an electrical response by the postsynaptic neuron (Figure 1a). If the electrical
response is sufficiently large, then the postsynaptic neuron will propagate a momentary event and
release its own chemical transmitters, carrying the signal to the next set of neurons in the
network. These suprathreshold electrical events are referred to as action potentials, bursts, or
spikes.

How information is encoded in the rate and pattern of electrical signaling events in single
neurons and populations of neurons is currently the subject of hot debate among neuroscientists
(Ferster and Spruston 1995). Interpretations of the neural code have been proposed for many
systems, for example: the representation of faces in monkeys (Abbott et al 1996); the force
exerted in a voluntary motor behavior (Monster and Chan 1977); sensory information in the cricket
cercal system (Theunissen et al 1996); the representation of odors (Laurent 1996); and
cognitive encoding of direction in the motor cortex (Georgopoulos et al 1993). However, it
is fair to say that at this time, we don't really know how neurons encode information and
the answer to this question is at the heart of understanding brain function.

Spike production in neurons is governed by a class of membrane-spanning proteins known as
voltage-gated and/or calcium-dependent (VGCD) channels (Figure 1b). These channels contribute in a
nonlinear manner to the electrical response of the neuron by varying their ionic conductances in
response to the voltage and/or calcium concentration. A large variety of VGCD channels exist, and
they contribute substantially to the electrical properties of the neuron (Johnston et al
1996, Spruston et al 1994, Hille 1992). The contribution of VGCD channel subtypes to
information encoding varies adaptively, however. Intracellular signaling compounds, whose
concentrations are elevated in response to activity, functionally modulate the kinetics and
maximum conductances of VGCD channels (Perez-Reyes and Schneider 1994, Kallen et al 1993, Li
et al 1993, Toro and Stefani 1991, Numann et al 1991, Chetkovich et al 1991).
A significant body of work in Aplysia suggests that some forms of animal learning can be
attributed to the activity-dependent modulations of VGCD channels (Klein and Kandel 1978,
Fitzgerald et al 1990, Edmonds et al 1990). Therefore, there are many combinations
of VGCD channel dynamics that are congruent with the "normal" functioning of neurons. However,
specific classes of combinations may be required for or achieved during information processing
associated with certain cognitive or behavioral states.

Figure 1
(A) CA3 hippocampal neuron morphology digitized from an experimental preparation.
Presynaptic neurons transmit chemical signals onto receptor sites in the basal and apical
dendrites. If the spatiotemporally propagated electrical signal becomes sufficiently depolarized
in the region where the axon extends away from the soma (the axon hillock), action potentials are
generated and transmitted via the axon to the next set of neurons in the network. S, soma; BD,
basal dendrites; AD, apical dendrites; AX, axon.
(B) VGCD channels are distributed throughout the lipid bilayer membrane of the neuron.
Electrochemical conditions induce conformational changes in the three dimensional structure of the
protein channel. Some conformations are more conductive to the flow of ions by, for example,
releasing an inactivation gate blocking the aqueous pore. A region of charged amino acid residues
at the mouth of the channel serves as a filter for ions of a particular charge or size. AP,
aqueous pore; SF, selectivity filter; IG, inactivation gate.

To address the extent of VGCD channel influence on information processing, we need a
substantial amount of information about the types and numbers of VGCD channels present in a given
class of neurons. We need to know the kinetics of the channel conductances with respect to their
voltage and calcium sensitivities. We need to know their distribution along the spatial extent of
the neuron and how these distributions may change over time. Finally, we need tools to manipulate
the channel distributions, conductances and/or kinetics so that we can observe how channel
modulations affect the behavior of the entire neuron.

Substantial advances have been made over the past decade in the tools and techniques available
for the measurement and manipulation of VGCD channels and currents. These methods include single
channel recording (Sakmann and Neher 1983, Wonderlin et al 1990), voltage-sensitive dyes
(Ebner and Chen 1995), optical imaging of calcium ion concentrations (Denk et al 1990), and
localization of channels via immunohistochemistry or autoradiography (Rhodes et al 1996,
Maletic-Savatic et al 1995). However, we still don't have the ability to experimentally
identify and control VGCD channels in a precise manner. Spatial limitations prevent the
measurement of individual channels on the smaller dendritic processes. Channel localization
methods are limited by the specificity of the marker compounds and the resolution of the optical
signal (Kuhar and Unnerstall 1985). At this point in time, it is not possible to manipulate the
spatial expression of channels nor is it possible to impose a specific localized change in the
conductance of a single channel type.

As an alternative approach, biophysically realistic computational models of neurons (Segev
et al. 1989, De Schutter and Bower 1994) allow observation of the state variables, such as
voltage and calcium concentration, and control of the scaling parameters, such as the kinetics of
channels and their spatial distributions, with a precision unattainable in experimental
preparations. The past decade has seen the development of several public domain neural simulators
(De Schutter 1992). The power of modeling and ease of model implementation are persuasive
arguments for studying the influence that VGCD channels can have on the bioelectrical behaviors of
neurons within a computational framework.

The outstanding problem in the development of these models has been to determine parameter sets
that represent the spatial distributions of the VGCD channels. Models of neurons typically have
from tens to thousands of unconstrained parameters. Because these models exist in high dimensional
spaces, it is not possible to evaluate every combination of plausible parameters. For example, to
enumeratively search through each combination of 100 binary parameters {0,1}, a total of
2¹⁰⁰, or approximately 10³⁰, simulations must be performed. If each simulation requires
one second to be run and evaluated, 4 × 10²⁰ centuries would be required to complete
this evaluation [1].
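
This arithmetic is easy to verify; a minimal Python sketch:

    evaluations = 2 ** 100                                 # ~1.27e30 simulations
    seconds_per_century = 3600 * 24 * 365.25 * 100         # ~3.16e9 s
    print(f"{evaluations:.2e} evaluations")                # 1.27e+30
    print(f"{evaluations / seconds_per_century:.1e} centuries")   # 4.0e+20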

We would like to know ALL the parameter sets that produce behaviors appropriate to a given
electrical signaling behavior. Such an encompassing manifold, or boundary enclosing a volume of
solutions, captures the reality of biology-like diversity. We would like to examine the
relative ranges of each parameter manifold as a measure of sensitivity of the model along each
parameter axis. Finally, we would like an efficient method of producing these solutions.

2. High Performance Computing Aspects of the Problem

Because this IMA volume and associated workshop were intended to focus on the high performance
computing (HPC) aspects of EA implementations, it seems reasonable to briefly review the
computational structure of the model. This will additionally help the reader to better understand
the source and the significance of the parameters we would like to optimize. Our model is
structurally similar to systems in other HPC fields such as fluid dynamics and geophysics. In
many ways, the process and goals of model building are shared across HPC applications. We have a
computational model of a physical process. We have many unconstrained parameters in our
description. We want the behaviors, e.g. the time series, from our simulated model to correspond
well with data collected from the physical system. We would like to interpret parameter sets that
yield good correspondences to validate the appropriateness of the model or to make predictions
about the physical system. Our approach should therefore be generalizable to a number of HPC
endeavors.

The physics of electrical behaviors in biophysically realistic neurons is described by the
cable equation (Rall 1989). The cable equation is a second order partial differential equation
(PDE) of the parabolic type. It is analytically similar to the heat or diffusion equations.
In dimensionless form, the equation describes voltage as a function of space and time:

    ∂²V/∂X² = ∂V/∂T + V

The voltage term, V, represents the transmembrane voltage difference as a deviation from
the resting value. X and T represent the dimensions of distance and time over a
specified domain. The Method of Lines replaces the PDE problem by a mathematically
equivalent system of ordinary differential equations (ODE) by discretizing the spatial
domain. Within each local domain, or spatial compartment, kinetic equations such as those
describing VGCD conductances can be incorporated. This resulting ODE system can be
subsequently solved using numerical integration. The derivation follows.

Figure 2
(A) A length of a neuronal process is discretized into three cables that are spatially
adjacent to each other and coupled through a linear resistance term.
(B) The electric circuit diagram for three electrical compartments has an axial resistance
term coupling the compartments. Each region of membrane in this figure has two sources of local
current: membrane (passive leak) and capacitive.

The following substitutions,

    X = x/λ,    T = t/τ,

produce a form of the cable equation that incorporates biological parameters:

    λ² ∂²V/∂x² = τ ∂V/∂t + V

x is the distance along the axis of a membrane cylinder (cm) and t is time (ms).
λ (cm) and τ (ms) are scaling metrics for the electrical properties of the system.
Known as the length constant and the time constant, respectively, they are metrics for the
distance or time required for the voltage to attenuate to 1/e of the initial value when a
deviation from the resting value is applied. They are defined in terms of the resistance and
capacitance per unit length or area:

    λ = √(r·Rm / (2·R1)),    τ = Rm·Cm

Rm is the specific membrane resistance (Ωcm²), Cm is the specific membrane capacitance
(μF/cm²), R1 is the axial resistance (Ωcm), and r is the radius of the cylinder (cm).

The PDE is replaced by a vector equation of N ODEs through the approximation of
the second order spatial derivative with a central differences expression. The number of
compartments[2], N, depends on the number of discretized
elements required for the system to numerically converge. The number of adjacent elements is
determined by the specific morphological representation of a given compartment
[3]. Equation 7 describes the Laplacian for an element with two neighboring compartments:

    ∂²V/∂x² ≈ (V_{j-1} − 2V_j + V_{j+1}) / (Δx)²    (7)

To clarify the notation, consider the simplifying assumption that all compartments have uniform
radius and length. Rearrangement and substitution yields:

    τ dV_j/dt = (λ/Δx)² (V_{j-1} − 2V_j + V_{j+1}) − V_j

The axial (neighbor) and membrane currents are defined by application of Ohm's Law:

    I_axial = (V_k − V_j) / R_a,    I_mem = V_j / R_m

Thus by substitution, the single compartment ODE can be described by three current sources:
capacitive, neighboring (axial), and membrane.

    C_m dV_j/dt = Σ_k (V_k − V_j) / R_a − I_mem,j    (11)

It is generally assumed that Cm and R1 are constant for all compartments
in the model, while Rm is often varied (Holmes and Rall 1992). The capacitance of the
membrane is typically estimated to be a constant value of 1.0 μF/cm² (Hodgkin and
Rushton 1946).
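
To make the Method of Lines concrete, the following minimal sketch integrates the passive form
of the discretized cable equation; the compartment count, coupling strength, stimulus, and the
use of Python with scipy (rather than a production neural simulator) are illustrative
assumptions:

    import numpy as np
    from scipy.integrate import solve_ivp

    N = 19          # number of compartments (matches the Traub model discussed later)
    tau = 30.0      # membrane time constant, ms
    g_c = 2.0       # coupling term (lambda/dx)^2, hypothetical value
    stim = np.zeros(N); stim[0] = 10.0   # constant drive into the "soma" compartment

    def dVdt(t, V):
        # Central-difference Laplacian with sealed (no-flux) ends.
        lap = np.empty(N)
        lap[1:-1] = V[:-2] - 2.0 * V[1:-1] + V[2:]
        lap[0], lap[-1] = V[1] - V[0], V[-2] - V[-1]
        # tau dV/dt = (lambda/dx)^2 (V_{j-1} - 2 V_j + V_{j+1}) - V + stimulus
        return (g_c * lap - V + stim) / tau

    sol = solve_ivp(dVdt, (0.0, 200.0), np.zeros(N), max_step=0.5)
    print(sol.y[0, -1])   # voltage in the stimulated compartment at t = 200 ms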

The voltages representing spatially adjacent compartments in the axial current term,
Σ_k (V_k − V_j)/R_a, couple the system of N ODEs. This co-dependency requires that the entire
system be solved simultaneously as a
vector-matrix equation. The matrix is structurally symmetric, although not necessarily numerically
symmetric when compartments of nonuniform physical dimensions are allowed. The matrix is sparse
and constant (with respect to the location of zeros) for a given discretized morphology (Hines
1984, Mascagni 1989, Eichler West and Wilcox 1996). Numerical methods optimized for sparse
matrices, such as the minimal ordering method, eliminate unnecessary operations on zero-valued
matrix elements (Press et al 1992).
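
A sketch of how the coupling matrix might be assembled and factored with a sparse solver,
assuming an unbranched cable so that the matrix is tridiagonal (conductance values are
hypothetical; branch points would add off-diagonal entries without changing the fixed zero
pattern):

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import splu

    N = 19
    g_a, g_m = 2.0, 1.0                    # axial and membrane conductances (hypothetical)
    main = np.full(N, g_m + 2.0 * g_a)     # diagonal entries
    main[[0, -1]] -= g_a                   # end compartments have only one neighbor
    off = -g_a * np.ones(N - 1)            # coupling to spatially adjacent compartments
    A = diags([off, main, off], offsets=[-1, 0, 1], format="csc")

    lu = splu(A)               # factor once; the zero pattern is fixed for a given morphology
    V = lu.solve(np.ones(N))   # e.g., the linear solve inside one implicit time step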

The membrane current, I_mem, is composed of several current sources: injected, passive leak,
and VGCD channel currents.

Any carrier, pore, or mechanism for transferring ions across the membrane is considered a
subset of this current type. Injected current, I_inject, is a driving term typically applied to
the soma compartment. This term models microelectrode current injections during biological
experiments. The membrane leak current, I_leak, stabilizes the voltage near the resting
potential:

    I_leak = g_leak (V − E_leak)

g_leak is the conductance (μS) and E_leak is the reversal potential (mV) determined by the
Nernst potential of the permeable ion species. g_leak and E_leak are typically constant-valued.
I_VGCD is the sum of the currents produced by multiple VGCD channel types. Using the classic
Hodgkin and Huxley model (1952) as an example, we would include currents representing the fast
sodium (I_Na) and potassium delayed rectifier (I_K(DR)) channels in the I_VGCD expression.

The conductances of the VGCD channel subtypes are functions of voltage, calcium concentration,
and/or time. Ion channels are assumed to exist in either an open (conductive) or closed state.
Using the potassium delayed rectifier parameter as an example, n represents the normalized
fraction of the population that is in an open state, αₙ is the rate at which the channels
open, and βₙ is the rate at which the channels close. The rate constants
αₙ and βₙ are functions of voltage and/or calcium concentration
[4].

From this relationship, the first order kinetic equation for the open state is:

    dn/dt = αₙ(1 − n) − βₙn    (16)

To produce the current term for a given channel subtype, n is multiplied by a conductance
scaling factor (Gₙ in μS) and the driving potential for the ionic species permeable
to that channel (V − Eₙ in mV). The driving potential, determined in part by the
concentration gradient of the major ion species, favors inward currents through sodium and calcium
permeable channels and outward currents through the potassium permeable channels.

The conductance scaling factors, Gₙ, are typically the unknown parameters in
compartmental models. As described in the next sections, EA strategies can be used to determine
appropriate values for these parameters.

Many channel types have more complicated kinetics due to the existence of multiple
subconductance states (m, eqn. 18; n, eqn. 19) and/or inactivation gates (h,
eqn. 18). For example, the Hodgkin and Huxley (1952) sodium and potassium delayed rectifier
currents are defined as:

    I_Na = G_Na m³h (V − E_Na)    (18)

    I_K(DR) = G_K(DR) n⁴ (V − E_K(DR))    (19)

It should be noted that the classic biophysical kinetic description of Hodgkin-Huxley is only
an approximation of the ensemble behavior of channels (Hille 1992, Keynes 1992). Further, this
model may be inappropriate to describe the behavior of some channel types (Goldman 1943).

The state parameter ODEs (eqn. 16) must be solved concurrently with the voltage ODEs
(eqn 11); however, the state parameter equations depend only on the local values of voltage and/or
calcium. Inclusion of these current sources potentially increases the problem size substantially.
For the Hodgkin-Huxley (1952) model with N compartments, 4N equations (m,h,n,V) must be solved
simultaneously. The Traub (1991) CA3 hippocampal neuron model, which we have used in the modeling
work presented in this paper, contains 11N equations. This model uses a simplified 19 compartment
representation of a neuron for a total of 209 ODEs and 114 unknown conductance scaling factors.
The De Schutter-Bower (1994) cerebellar Purkinje cell model contains many more channel types and
an experimentally-derived morphology for a total of 32,000 ODEs and 19,200 unknown conductance
scaling factors. The trend in neuroscientific modeling is to build larger models that include more
experimental detail. Thus, the need to determine appropriate areas of parameter space will only
continue to grow.

The Traub (1991) model offered many advantages for a proof-of-concept EA experiment; each
simulation required minimal computer resources and the model expressed a range of characteristic
hippocampal neuron behaviors. A depolarizing afterpotential (DAP) was produced after a stimulus
subthreshold for burst elicitation, and suprathreshold bursts were followed by
afterhyperpolarizations (AHPs). The model demonstrated a transition between single spiking and
bursting regimes that corresponded well with experimentally-observed frequency-intensity responses.
Finally, a network of these neurons was able to reproduce picrotoxin-induced synchronized multiple
bursts and has been since refined as a model of gamma-frequency EEG activity (Traub et al
1996).

Traub's model reduced the morphology of a CA3 hippocampal neuron to a 19 compartment equivalent
cylinder (Rall 1962). The passive membrane properties, Rm = 10,000 Ωcm² and Cm =
3.0 μF/cm², yield a time constant [5] of 30 ms
consistent with experimental observations (Turner and Schwartzkroin 1980, Turner 1984). The test
stimulus was a 0.3 nA depolarizing current injection into the soma compartment because Traub
described physiologically interesting behaviors produced by this stimulus. The model defined
kinetic equations for six channel conductances within each compartment: a sodium channel, a hybrid
high threshold calcium channel, a potassium delayed rectifier channel, a potassium A channel, a
potassium AHP channel, and a calcium-dependent potassium channel.

The EA in this study searched the 114 dimensional space containing the six channel conductances
in each of the 19 compartments. Traub selected his parameters by trial-and-error. One consequence
of this approach was the predominant use of arbitrary zeros and the report of a single "best"
solution of the search rather than a manifold. We wanted to demonstrate the ability of an EA to
automatically select a volume of solutions that either included the parameter set of Traub or
demonstrated an improved correspondence to experimental observations in comparison to Traub's
parameter choice.

We defined two low dimensional (single value) metrics to characterize the parameter space for
the purposes of post-analysis and initialization of the random seed parameters. The total
conductance is the sum of all conductances over all compartments and the
excitatory/inhibitory conductance ratio (EIR) is the sum of all excitatory (inward)
conductances over all compartments divided by the sum of all inhibitory (outward) conductances
over all compartments.
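
Both metrics are simple reductions of a parameter string; a minimal sketch, assuming a boolean
mask marking the excitatory conductances:

    import numpy as np

    def low_dimensional_metrics(G, excitatory):
        # G: conductance scaling factors over all channel types and compartments.
        # excitatory: boolean mask, True where a conductance carries inward current.
        total = G.sum()                                      # total conductance
        eir = G[excitatory].sum() / G[~excitatory].sum()     # excitatory/inhibitory ratio
        return total, eir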

3. Choosing an Optimization Method

EAs have not previously been applied to the field of compartmental neuronal modeling, although
reduced dimensional parameter searches using systematic (Bhalla and Bower 1993) or stochastic
(Foster et al 1993) search methods have been reported. Would these methods, or others, perform
better than an EA? Descriptions of many optimization methods are available both conceptually in
textbooks and algorithmically in computer executable codes obtainable from public domain sites on
the Internet. These methods include hill climbing, simulated annealing, neural networks, genetic
algorithms, local search, and heuristic search. With the large number of methods and their
variations available, how does one choose the best for one's application?

Wolpert and Macready (1995) propose that there are "No Free Lunches" (NFL) for effective
optimization. Using theoretical arguments, they claim that all search algorithms perform exactly
the same when averaged over all cost functions. In other words, the set of functions for which
algorithm A outperforms algorithm B is complementary to the set for which B outperforms A.
Furthermore, observing how well an algorithm has done so far in a search will tell us little about
how well it will continue to do. Consequently, it is a flawed strategy to choose a method by
comparing initial algorithmic performance and then to commit the remainder of the search to
these preliminary results. Wolpert and Macready warn that if one does not incorporate any
information about the function into the algorithm, then success must rely on a fortuitous
matching between the features of the function and the search path determined by the algorithm.

The NFL paper produced a great deal of discussion because the implication that choosing a
successful method for a given application may be akin to the luck of a coin flip stands in
contrast to empirical experience. Culberson (1996) extended the NFL theorem with a corollary
proposing that all algorithms perform the same as a random enumerative search when the method
is blind. These assertions suggest that we will not likely find a canned method to solve our
optimization problem. A hand-tuned method is required, taking into account as much information
about the function and solution space as we know.

The need to incorporate domain knowledge into a search is not newly revealed by the NFL-related
reports. Based on empirical studies, Davis (1991) suggested that the difficulty of search is not a
property of a given problem, but rather depends on the relationship between the problem and the
algorithm. Using formal language theory, Hart and Belew (1991) also conclude that analyses of
optimization methods (in particular, a GA), which do not specify the class of functions being
analyzed, can make few claims regarding the efficiency for an arbitrary function. Specifically,
they state:

"These results suggest that future analyses of the Genetic Algorithm must pay close attention
to the relationship between the algorithmic parameters of the GA and the function space from which
the fitness function is selected. Since any fixed set of algorithmic parameters cannot enable the
GA to efficiently optimize an arbitrary function in a broad class like the one we have considered,
we must consider smaller classes or distributions of functions for which those parameters are most
appropriate. Alternatively, if we have a function we wish to optimize, we must carefully select
our algorithmic parameters if we wish to optimize the function efficiently with the Genetic
Algorithm."

This statement implies that there is a specific implementation of an algorithm that will
be effective for each class [6] of functions. For an algorithm
to be effective, the biases in how search space is explored must correlate well with salient
features in the cost function. The problem is, of course, to know what is salient about the
function when the whole point of the search is that little is known! (Otherwise, why would one be
searching over it?)

To optimize the parameters for the compartmental neuronal model, one must first choose a method
that sounds reasonable, gain some intuition into how the optimization algorithm works, and attempt
to modify it with domain-based knowledge. Evolutionary Algorithms (EAs) are a popular and
successful parameter optimization technique inspired by biological principles. Armed only with
intuition and a predisposition towards this technique because of the familiarity of the biological
metaphor, our initial optimization efforts focused on understanding the applicability of EAs for
the compartmental neuron channel distribution problem.

4. Designing an EA Strategy

The specific mechanisms of selection and recombination applied to a particular problem space
determine the success of an EA. Selection identifies and retains favorable parameter
combinations, while crossover explores new combinations by breaking up and recombining existing
ones; an effective EA balances retention against exploration. Thus, the representation of the
problem along the string and the type of crossover operator applied are important factors in
the rate of success of a search, because probability favors the breakup of parameter
combinations that span long stretches of the string. In the following sections, we will discuss
and justify our implementation with
respect to: 1) representation; 2) recombination; 3) selection; 4) initialization; and 5)
termination criteria.

Representation

The representation of a problem is the mapping between problem space and the data space
operated on by the algorithm. Three aspects of representation must be considered for an EA
application: 1) embedding information onto string values; 2) ordering values onto the linear array
of the string; and 3) choosing the cardinality of the alphabet (binary vs. real-valued). These
aspects should be examined with respect to the choice of recombination operators one intends to
apply.

We considered two embedding spaces for our problem: 1) a string of values representing each of
the individual parameters; and 2) a string of scaled functions which could be applied over the
normalized distance from the soma. We use the soma compartment as a spatial reference point in our
system because most experimental recordings are made in the soma. However, other PDE systems might
consider the utility of applying scaled functions of the normalized length of the spatial domain
along one or more dimensions.

Figure 5
Only three parameters are needed to encode channel distributions as functions: function type (e.g.,
linear, exponential, sinusoidal); a maximum function value; and a minimum function value. In this
example, the function was applied over a bidirectional normalization of distance. However, this
choice was arbitrary and other pertinent system metrics might be more appropriate.

For the string of individual values, two orderings appear to be natural for the problem:
parameter ordering and compartment ordering. For parameter ordering, like-parameter types are
ordered adjacently along the string. For example (using two channel conductance parameters):

Na_1 Na_2 ... Na_N ... K(DR)_1 K(DR)_2 ... K(DR)_N    (20)

where Na and K(DR) represent the sodium and potassium delayed rectifier conductance parameters,
and the subscripts refer to compartment number (N total compartments). In compartment ordering,
all parameters from the same compartment maintain adjacency along the string. For example:

Na_1 K(DR)_1 ... Na_2 K(DR)_2 ... Na_N K(DR)_N    (21)

Such an encoding would yield a string of length MN, where M is the number of parameter
types and N is the number of compartments. Thus, this representation scales with the number
of compartments. For the Traub model, we have a string length of 114. The De Schutter-Bower model
would be searched rather inefficiently with this representation, as the string would contain a
total of 19,200 parameters.
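
The two orderings are permutations of the same values, as the following sketch illustrates:

    import numpy as np

    M, N = 6, 19                     # parameter types and compartments (Traub model)
    G = np.arange(M * N)             # one string of MN = 114 values

    param_ordered = G                          # Na_1..Na_N, K(DR)_1..K(DR)_N, ...
    comp_ordered = G.reshape(M, N).T.ravel()   # all M parameters of compartment 1, then 2, ...
    # Both strings hold the same values; only adjacency changes, and with it the
    # probability that a crossover point separates a given pair of parameters.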

A scaled function embedding requires assumptions about the parameter distributions that we
expect to see in nature. It is reasonable to assume that the actual distributions, with respect to
the linear distance from the soma, may be described as noisy instances of a constant, linear,
exponential, or sinusoidal function (Stuart and Sakmann 1994, Masukawa et al 1991).
Additionally, structure-specific functions may be used to account for the observations that some
channels may be present only at, for example, branch points (Catterall et al 1991). While
this approach has the advantage of incorporating domain-based observations, we are limiting the EA
to exploring combinations of these plausible distributions. It would be somehow more satisfying to
start from random conditions and have the EA converge on a set of parameters that could be fit by
scaled functions.

In this embedding space (Figure 5), each parameter distribution is defined by three string
values: a value encoding function type; a maximum function value; and a minimum function value. An
additional term could be optionally added to include a degree of noise to the function
distributions.

Such an encoding would yield a string of length 4M, where M is the number of parameter
types. This representation is independent of any assumptions regarding the number of
compartments appropriate for the model. The function would be scaled over the range determined by
the maximum and minimum values. The parameter value for each compartment would be the evaluation
of the function relative to the position of that compartment over the normalized length of the
neuron.
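
A decoding function for this embedding might be sketched as follows; the particular function
shapes are illustrative assumptions:

    import numpy as np

    def decode(func_type, f_min, f_max, n_compartments):
        # Expand a (type, min, max) triple into one value per compartment over
        # the normalized distance from the soma.
        x = np.linspace(0.0, 1.0, n_compartments)
        shapes = {
            "constant":    np.ones_like(x),
            "linear":      x,
            "exponential": (np.exp(x) - 1.0) / (np.e - 1.0),
            "sinusoidal":  0.5 * (1.0 - np.cos(2.0 * np.pi * x)),
        }
        return f_min + (f_max - f_min) * shapes[func_type]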

The ordering of parameters along the string affects the defining length, i.e., the distance
along the string between those parameters that contribute most to achieving high fitness.
Strings with longer defining lengths have a higher probability of being disrupted during
crossover. Consequently, different orderings of the same problem will be differentially affected
by the same crossover operator. On one hand, it is useful for crossover to be disruptive so that a
thorough exploration of parameter combinations can take place. However, too much disruption may
destroy any high order combinations of optimal parameters that the EA has discovered, rendering
the overall search inefficient.

It is difficult to know a priori which of the three embedding descriptions would be most
effective. If we chose a parameter ordering, the first parameter type and the last parameter type
along the length of the string would be disrupted with a high probability (with respect to 1-pt
crossover analysis). If we chose a compartment ordering, compartments numbered furthest apart
would be disrupted most often. Which is more important to maintain - the longitudinal compartment
distribution of parameters or the intra-compartment ratio of parameter types to one another?
Without any empirical or theoretical basis for the decision, one approach is to develop more than
one crossover operator so that both orderings will be operated upon. Estimating the disruptive
effect of crossover on the scaled function embedding is not quite as intuitive because each
parameter value for the model is embedded in three (four) dimensions instead of one in the
string parameter space. Previous work suggests that EAs are most effective when each string
parameter can be optimized independently (Davis 1991). Consequently, a function-based search may
be difficult for the EA to optimize. Determining the pros and cons of each ordering will require
empirical exploration.

Genetic algorithms have traditionally used binary representations, primarily because of the
relative simplicity in analysis offered by low cardinality alphabets. The parameters in neuron
models are real-valued, however. Transformation of real values into a binary representation would
have required a trade-off between the precision and efficiency of search. For example, if the
resolution we desire for a given parameter is 0.1 arbitrary units, and if we assume a search range
of 0.0-100.0, then each binary representation of a parameter would require 10 bits. The 114
parameters of the Traub model would be encoded in a string length of 1140. This is a relatively
long string over which we would expect recombination to be inefficient. Increasing the number of
compartments would exacerbate this problem. The EA would spend too much time searching the least
significant digits of the string early in the search. Once the significant digits had converged,
they would no longer require searching. However, genetic drift would have caused a certain degree
of convergence on the least significant digits such that there would no longer be a random
distribution in the population to sufficiently explore that area of parameter space. Real
encodings offer many advantages over binary encodings, including the convenience and psychological
comfort that comes with the direct correspondence between the mapping spaces. Technical benefits
of real codes include avoidance of Hamming Cliffs, elimination of crossover disruption
within a parameter, and a reduced susceptibility to deceptive path traversal (Goldberg
1991).

For reasons of convenience, we decided to implement a real-valued string. To enable a more
direct comparison to Traub's parameters, we chose to represent the space as a string of 114 values.
Since we lacked sufficient information to choose between parameter ordering or compartment
ordering, we decided to develop crossover operators that transposed and exploited both
orderings.

Recombination Operators

Historically, crossover has been associated with genetic algorithms and genetic programming,
whereas mutation had been predominantly studied in evolutionary programming and evolution
strategies. Exclusive use of either operator is becoming less prevalent, especially in
applications. Theoretical and empirical studies have compared the conditions under which one
operator is more successful than the other. The results do not provide us with much guidance,
however. The main conclusion is that the success of a recombination strategy is specific to the
problem being optimized and cannot be predicted apriori.

Commonly studied and applied forms of crossover include the 2-point and uniform operators.
Early research suggested that the number of crossover points should be low to minimize disruption
(Holland 1975, De Jong 1975). In contrast, many recent applications have demonstrated superior
performance with more disruptive recombination operators (Syswerda 1989). It has been theoretically
demonstrated that 2-pt crossover [7] is the least disruptive
crossover operator. In contrast, uniform crossover [8] is the
most disruptive but is without a positional bias (Spears and De Jong 1990). The question remains,
which crossover operator should we apply to which ordering of our problem?

Mutation is both a mechanism for local search and a mechanism to prevent solution components
lost during crossover from being eliminated for all successive generations. Mutation is applied in
binary representations by flipping the value of the bit in the randomly selected position. In
real-valued strings, more sophisticated scaling or replacement methods are required. Genetic
algorithms typically apply mutation as a background operator at low rates, for example 0.1%
(Goldberg 1989). In contrast, it has been shown that a high mutation rate can be beneficial in
some applications (Bramlette 1991). In particular, a rate of 20% is recommended in some studies
(Bagchi et al 1991).

Because we were left with questions about which rate and kind of real-valued mutation would be
successful for our problem, we decided to let the EA tell us which strategy worked best. This was
implemented as a self-adaptive strategy for choosing the rate of application of all the
operators of potential interest to us. This approach has been used to select mutation rates in
evolution strategies (Bäck et al 1991, Saravanan et al 1995) and crossover
rates in genetic algorithms (Spears 1995). Further, Parsons et al (1995) report a
synergistic effect gained by retaining many types of operator, suggesting additional benefits
gained from this strategy.

We have custom-designed recombination operators [9] to
explore the real-valued parameter space potentially suitable for the compartmental neuron problem.
The set of recombination operators includes: crossover 1 (2-point, compartment ordering),
crossover 2 (2-point, parameter ordering), single parameter mutate, parameter-type scale, and
scale all (Figure 6). Two versions of the 2-point crossover operator were applied to exploit
both compartment ordering and parameter ordering. This allowed us to reduce
positional bias while maintaining low disruption. The single parameter mutation explored
solutions adjacent in a single dimension to current solutions. We assumed that the ratios of the
conductances of each of the channel types were important determinants of high fitness solutions
and subsequently developed scaling operators, scale all and parameter-type scale, to
explore this assumption.

All operators had the same initial probability of being chosen. Subsequent probabilities were
based on the success of a given operator relative to that of the others. The minimum probability
of operator selection was fixed at 1%. A success was awarded when a parameter set produced by a
given operator scored a fitness value above the average fitness of the population.
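
A minimal sketch of this self-adaptive operator selection; the operator names, initial weights,
and credit increment are illustrative assumptions:

    import numpy as np
    rng = np.random.default_rng()

    OPS = ["xover_compartment", "xover_parameter", "mutate_single",
           "scale_parameter_type", "scale_all"]
    success = dict.fromkeys(OPS, 1.0)      # equal initial weights (assumption)

    def pick_operator():
        # Selection probability proportional to relative success, floored at 1%.
        p = np.array([success[op] for op in OPS])
        p = np.maximum(p / p.sum(), 0.01)
        return rng.choice(OPS, p=p / p.sum())

    def credit(op, child_fitness, population_mean):
        # A "success" is an offspring scoring above the population average.
        if child_fitness > population_mean:
            success[op] += 1.0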

Figure 6
The crossover and mutation operators for the EA parameter search. Two parent parameter strings
(represented with red and blue borders) were randomly selected from the mating pool for crossover
operations. One parameter string was randomly selected from the mating pool for mutation
operations. a, Crossover 1 (2-pt, compartment ordering). Two points along the string are
randomly selected. The parent strings contribute alternate regions to the offspring string. b,
Crossover 2 (2-pt, parameter ordering). The procedure is the same as in (a), except the
string was first reordered so that like-parameter types were adjacent. c, Single
Parameter Mutate. Two random numbers were selected. The first number determined which of the
parameters would be operated on. The second number was scaled to the range appropriate to that
parameter and replaced the original value of the selected parameter. d, Parameter-type Scale
Mutation. Two random numbers were selected. The first number determined which of the parameter
types would be operated on. The second number was a scaling factor that ranged from 0.0 to 2.0.
All conductances of the selected channel type were multiplied by the scaling factor. e, Scale
All Mutation. Random number selection (a coin flip with probability 0.5) determined whether
all values in a string were scaled by a constant multiplicative factor of 0.9 or 1.1.

Selection Mechanisms

Selection identifies and retains high fitness strings. Selection mechanisms are characterized
based on strength. Strong selection concentrates quickly on exploiting the best individuals,
favoring fast convergence at the expense of minimally sampling the search space. Such a strategy
could potentially lead to premature convergence on a less than optimal solution, but the answer is
obtained quickly. Weak selection maintains a highly diverse population, but at the expense of
increased search time. Very weak selection results in a random walk search. Since we want to
evolve a manifold and not just a "best" single solution, we will avoid the stronger methods.
This would potentially cost us an increased number of function evaluations, but we hoped that we
would find better solutions.

We developed the following selection mechanism. An extinction threshold, a variable fitness
criterion, was automatically adjusted to maintain a population size between 500 and 1000
population members. Population members from both current and previous generations were admitted to
the mating pool if their individual fitness values exceeded the extinction threshold. This
threshold-based "elite selection" mechanism allowed a highly fit individual to contribute to the
mating pool for multiple generations if its fitness was sufficiently high, thus preventing the
loss of good strings (Bäck and Hoffmeister 1991). Many EA implementations allow multiple
copies of high fitness strings to contribute to the mating pool in proportion to the absolute or
scaled fitness value of that string relative to the other strings (Forrest 1993, Goldberg 1989).
Our implementation limited membership in the mating pool to a single copy of each suprathreshold
string. This strategy reduced the possibility of premature convergence caused by the
overexpression of proportionately higher fitness strings.
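
In outline, the selection mechanism might be sketched as follows; the specific threshold
adjustment rule is an assumption, as the text specifies only that the threshold was adjusted
automatically to hold the pool between 500 and 1000 members:

    def mating_pool(population, fitness, threshold):
        # Admit each suprathreshold string exactly once; members from previous
        # generations persist as long as they remain above the threshold.
        return [s for s, f in zip(population, fitness) if f > threshold]

    def adjust_threshold(threshold, pool_size, lo=500, hi=1000, step=0.01):
        # Hypothetical adjustment rule: tighten the criterion when the pool
        # grows too large, relax it when the pool shrinks too small.
        if pool_size > hi:
            return threshold + step
        if pool_size < lo:
            return max(0.0, threshold - step)
        return threshold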

The initial population size was 1000 parameter sets. Preliminary results suggested that this
population size provided a sufficient compromise between diversity and efficiency
[10]. The population size must be large enough to maintain a
diverse gene pool. Otherwise, the EA must primarily rely on mutations to explore the space. In
such a case, the rate of improvement is reduced to that of a random walk search. However, an
extremely large population could increase the amount of computer time needed to solve the problem
and might not improve the rate at which the problem is solved (Macready et al 1996).

Initialization

The initial parameters were randomly selected and scaled as follows: 1) 114 real-valued random
numbers were drawn between 0 and 1; 2) a random number was selected between 0.5 and 2.0 to scale
the excitatory-to-inhibitory conductance ratio; and 3) a random number between 0 and 2,500
mS/cm² was selected to scale the total conductance. Preliminary results (data
not shown) suggested that parameter sets with values outside of these bounds did not yield
fitnesses greater than 0.1. The recombination operators allowed subsequent generations to explore
beyond these initial bounds.
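
A sketch of this initialization, under one plausible reading of the scaling steps (the order in
which the two rescalings are applied is an assumption):

    import numpy as np
    rng = np.random.default_rng()

    def random_individual(excitatory, n_params=114):
        g = rng.random(n_params)              # 1) uniform values in [0, 1)
        eir = rng.uniform(0.5, 2.0)           # 2) target excitatory/inhibitory ratio
        exc, inh = g[excitatory].sum(), g[~excitatory].sum()
        g[excitatory] *= eir * inh / exc      # rescale so the E/I ratio equals eir
        total = rng.uniform(0.0, 2500.0)      # 3) target total conductance (mS/cm²)
        return g * (total / g.sum())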

Termination

Termination criteria in the EA literature are somewhat arbitrary because the statistical nature
of the search prohibits knowing when the population will evolve to a given fitness value.
Typically, EA searches are terminated after: 1) a predetermined number of generations or
individual population members have been evaluated; 2) a goal fitness (such as 1.0) has been
achieved; or, 3) the diversity of the string values in the population has been significantly
reduced. Our arbitrary termination criterion was satisfied after 200 CPU hours had elapsed on each
of eight dedicated Silicon Graphics Power Challenge R8000 90 MHz processors (approximately 125
Tera floating point operations total). At most HPC research centers, researchers are allocated a
certain number of CPU cycles on various machines during six month to one year grant periods. We
find that a CPU-based termination criterion gives us a much more informative measure of what we can
potentially solve with the CPU time we have available.

5. Designing a Fitness Measure

It was not clear how much information, and what kind of information, should be included in our
fitness criteria. It would certainly be possible to manipulate the algorithm to produce
whatever results we wanted, and any claims regarding the power of the EA parameter optimizing
approach would be compromised if we tweaked the fitness measure until we obtained good
performance. Therefore, we attempted to objectively design and adhere to a fitness measure
philosophy before we began the EA search. While the specific quantitative aspects of our
criteria were extracted from the biological literature, our approach to fitness measure design
was intended to be generally applicable to any time series.

We partitioned the information into three categories: 1) mathematical requirements; 2)
etiological satisfaction; and 3) waveform decomposition. The decision to award 20% of the fitness
points to each of the mathematical requirements and etiological satisfaction categories and 60%
to the waveform decomposition was arbitrary. Given the variability in real physiological responses, we
distributed the assignment of scores using a gradient rather than a step function. In other words,
a range of values received the full score for satisfaction of each individual criterion and
those values that were close to the optimal range received partial credit. By coaxing the
evolution along through the assignment of partial credit, we provide the EA with more information
to guide the search given a constant number of fitness criteria.
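
A graded scoring function of this kind might be sketched as follows; the linear decay and the
width of the partial-credit margin are illustrative assumptions:

    def graded_score(value, lo, hi, slack):
        # Full credit inside [lo, hi]; credit decays linearly to zero over an
        # additional margin of width `slack` on either side.
        if lo <= value <= hi:
            return 1.0
        d = (lo - value) if value < lo else (value - hi)
        return max(0.0, 1.0 - d / slack)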

Mathematical requirements: Time series exhibit damped, periodic, chaotic, or
random characteristics. The stimulus we applied to the model should generate approximately
periodic bursting. Therefore, the first goal was to partition the space of all solutions to
eliminate parameter spaces that yield damped or steady state time series
[11]. Credit was assigned for fulfilling the following
successive criteria. Does the solution demonstrate spiking at all? Does it spike beyond the
initial transient? Does it spike during the last 20% of the time series?

Etiological criteria: Does the parameter set produce periodic solutions for
the wrong reasons? The somatic time series should yield spiking in response to the depolarizing
current injection. Dendritic compartments with sustained, elevated voltages could provide a
source of depolarizing axial current that would inappropriately (in the absence of synaptic
currents in this model) lead to somatic spiking. Thinking like an experimental biologist, we
considered these solutions equivalent to neurons which had their dendritic membranes damaged
during experimental preparation and rejected them.

Waveform decomposition: The fitness filter is not a temporal point-by-point
curve fitter. Instead, we decomposed the evaluation of each spatiotemporal dataset into a set of
characteristics extended along multiple scales (figure 7). The largest scale feature was the
somatic burst rate. We then zoomed into the characteristics of the individual bursts to
evaluate such features as the peak-to-burst ratio, burst width, and the least squares fit to an
experimental CA3 burst waveform. At the finest scale, we evaluated the postburst
afterhyperpotential amplitude as well as quantitative aspects of individual spikes such as the
width and height. Lastly, spatial behaviors such as the relative calcium influx and peak height
were examined and scored.
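
As an illustration of the largest scale feature, the somatic burst rate might be extracted as
in the following sketch; the spike threshold and burst-gap criterion are hypothetical
placeholders:

    import numpy as np

    def burst_rate(v, dt_ms, spike_thresh=0.0, gap_ms=50.0):
        # Detect upward threshold crossings, then group spikes separated by
        # less than gap_ms into bursts; return bursts per second (Hz).
        crossings = np.flatnonzero((v[:-1] < spike_thresh) & (v[1:] >= spike_thresh))
        if crossings.size == 0:
            return 0.0
        spike_t = crossings * dt_ms
        n_bursts = 1 + np.sum(np.diff(spike_t) > gap_ms)
        return n_bursts / (len(v) * dt_ms / 1000.0)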

Figure 7: Waveform decomposition for the somatic time series. (A) Global behavior,
such as the burst rate, was evaluated and scored. Time series R and S received full credit for
satisfaction of this criterion because the burst rates, approximately 0.8 and 1.2 Hz respectively,
are within the experimentally observed range. (B) Qualities of the individual bursts, such
as peak-to-burst ratio and burst width, were evaluated and scored. The left burst received maximal
credit for a peak-to-burst ratio and burst width within the acceptable ranges. (C) The fine
detail characteristics of the spikes within a burst, such as peak height and width and the
afterhyperpotential (AHP) voltage, were the final behaviors to be evaluated.

Finally, we specifically included observations noted by Traub into our evaluation. It does not
appear that Traub proposed a list of criteria and then optimized the parameters to meet these
criteria. Rather, it is simply reported that the model is able to replicate experimental
observations such as the following: 1) afterhyperpotentials fell 5 to 10 mV below resting
potential; 2) somatic bursts were produced at approximately 1 Hz in response to the application
of 0.3 nA depolarizing current in the soma; 3) apical and basal dendrites yielded spikes; 4)
maximum calcium influx was observed in the midapical dendrites; and 5) spike width at half
amplitude in the soma was approximately 0.8 ms.

6. HPC and Parallelism

The simulation executed for five seconds of neuron-time so that sufficient data were available
for evaluation by the fitness filter. Each simulation required 49 seconds on a Silicon Graphics
R8000 90 MHz processor. For a model such as the Purkinje cell (De Schutter and Bower 1994), each
simulation could require hours. The evaluation of each parameter set required a significant amount
of computer time. Because each parameter set can be evaluated independently, EA applications have
a natural parallel decomposition. Our EA searches have been implemented in parallel in the
following four ways: 1) fork/exec functions; 2) rcp/rsh scripts; 3) Loadleveler/DQS application
management; and 4) MPI/PVM libraries. We used combinations of these approaches to maximize our
access to heterogeneous computing environments. This environment included a 9 processor Cray
C916/10512, a 12 processor Silicon Graphics Power Challenge (R8000 75 MHz), an 8 processor Silicon
Graphics Power Onyx (R8000 90 MHz), 8 IBM RS/6000 Model 590 workstations, and a 14 processor IBM
SP Supercomputer.

In the simplest case, parallelism can be implemented at the process level using standard ANSI C
system calls to fork/exec functions. No special software or libraries are required. Load
balancing is handled by the operating system. However, the implementation is limited to shared
memory architectures such as machines offered by Cray and Silicon Graphics. Our EA program
executed each simulation independently as a child process. Each simulation was assigned an
identification number using the putenv function. The simulator used the identification
number to open the correct parameter file and to name the output file containing the fitness
measure. The wait function monitored the completion status of the child processes. We found
this method sufficient for most of our small to medium sized applications.
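
The implementation described above used the C system calls directly; the following Python
analogue is illustrative only, with run_simulation as a hypothetical stand-in for launching and
scoring the simulator:

    from multiprocessing import Pool

    def run_simulation(params):
        # Hypothetical stand-in: launch the compiled simulator on one parameter
        # set and return the fitness score read from its output file.
        return sum(params) / len(params)

    def evaluate(task):
        sim_id, params = task                 # sim_id plays the role of the putenv id
        return sim_id, run_simulation(params)

    if __name__ == "__main__":
        population = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # toy parameter sets
        with Pool() as pool:                              # one worker process per core
            fitness = dict(pool.map(evaluate, list(enumerate(population))))
        print(fitness)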

To implement parallelism on distributed memory systems and/or heterogeneous environments, we
wrote shell scripts which used rcp/rsh to distribute the parameter files, execute the
simulations on remote machines, and collect the results. This method required that an executable
copy of the simulation program was installed on every machine. In the interest of security, many
system administrators disable remote commands to prevent password-free access to their machines.
While we were able to access many departmental workstations, we could not access the HPC machines
in this fashion. To maximize efficiency, we wrote load balancing routines to determine the CPU
load on each machine before submitting jobs. Unfortunately, if other users submitted jobs after we
started a simulation, we had to wait for simulations on the loaded machines to complete. If the
wait became too long, we started another version of the delayed simulation on the fastest machine
that was currently idle. We found this method of obtaining parallelism the least effective with
respect to the machines to which we had access. However, this is a great way for researchers
without access to machine time at an HPC center to steal extra CPU cycles from idle workstations
overnight!

LoadLeveler is a software product developed by IBM for managing parallel applications. Versions
of this product are available for IBM RS/6000 architectures, Sun, HP and Silicon Graphics
workstations. This is reliable and easy to use software featuring machine specific configurations,
load balancing, checkpointing, and a graphical interface for developing applications and
monitoring the status of jobs currently executing. It schedules either serial or parallel
(including MPI or PVM) jobs written as LoadLeveler or NQS scripts. The good news and bad news is
that this is a commercial product; while it is technically well-supported by IBM, it is not free.
As a low budget alternative, there are several shareware products available via anonymous
ftp that claim similar functionality and features. We did not evaluate the claims of the shareware
products, but we found LoadLeveler to be a useful and stable tool.

MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) are libraries of routines
for passing messages between processors that potentially represent a range of architectures. These
libraries have quickly become the standard for the development of parallel codes. The libraries
are available via anonymous ftp, but must be installed on every machine participating in the
virtual network. They offer a wide variety of communication classes including gather,
scatter, and all-to-all using communication modes such as standard,
synchronous, ready, or buffered in blocking and non-blocking
forms (Snir et al 1995, Geist et al 1994). These libraries have limited error
handling routines, do not support dynamic task spawning, and require the user to incorporate load
balancing algorithms. However, the widespread interest in these libraries by developers and the
adoption of these libraries by multiple vendors have resulted in a number of third party
applications now available to manage CPU resources when the libraries themselves are
insufficient.
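
A minimal master/worker sketch using scatter and gather (shown with the mpi4py bindings for
consistency with the other sketches; the per-parameter-set work is a stand-in for a simulation
run):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    if rank == 0:
        population = [[i, i + 1] for i in range(100)]        # toy parameter sets
        chunks = [population[r::size] for r in range(size)]  # one chunk per process
    else:
        chunks = None

    local = comm.scatter(chunks, root=0)          # distribute parameter sets
    local_results = [sum(p) for p in local]       # stand-in for running the simulator
    results = comm.gather(local_results, root=0)  # collect fitness values on rank 0
    if rank == 0:
        print(sum(results, []))                   # flatten per-process result lists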

For many applications, parallelism implemented with file transfers is inefficient compared with
the message passing model. This is because the communication time required for the data transfer
is large relative to the computation time performed by each distributed CPU (Kumar et al
1994). Most parallel architectures (e.g., Thinking Machines Corporation's CM-5, Cray's T3E) have
dedicated hardware and parallel versions of standard programming languages (e.g., Fortran 90, C*) to
reduce communication overhead, in addition to their support of MPI/PVM. Our programs yielded
approximately linear speedup [12] with both the message
passing and file transfer implementations because of the relatively large amount of CPU time
required per simulation. However, it has been our observation from the EA literature that most
fitness functions do not require as much CPU time as ours did. For maximum portability, longevity,
and efficiency, we recommend parallelization using a message passing library such as MPI/PVM in
conjunction with third party CPU management software. However, depending on the application and
the available architectures, process/file-based parallelism is easy to implement and may be just
as efficient as other methods.

7. A Success Story

We judged our application to be a success by two criteria. First, the average fitness of the
population quickly exceeded the fitness score achieved by Traub's parameter set. Second, the high
fitness parameter sets produced time series that resembled those observed experimentally. Similar
results were obtained in three separate experiments, each using different random initial
conditions. As a bonus, we identified manifolds for our low dimensional descriptors (the total
conductance and the excitatory/inhibitory conductance ratio) of the evolved high dimensional
parameter space which will allow us to reduce the parameter space in future refining searches.
(See Eichler West 1996 for additional results.)

The EA evolved a population of high fitness parameter strings from random initial conditions.
The initial generation yielded an average fitness of 0.04. This was consistent with our
observations that parameter sets composed of random numbers showed a 96% probability of scoring a
fitness less than 0.20 (n = 11,600). The EA improved the overall fitness of the population,
rapidly at first but more slowly at higher fitness. The average fitness per generation exceeded
the fitness of Traub's parameter set (0.85) by generation 62. The best fitness overall (0.92) was
obtained in generation 101.

Figure 8
The average fitness per generation demonstrates rapid improvement. Because the population size is
dynamic, the x-axis is only approximate.

The solution with the highest fitness demonstrates fast sodium-dependent bursting in the soma,
low amplitude spiking in the basal dendrites, and slow calcium-dependent responses in the apical
dendrites. Neither the EA-produced parameter sets nor Traub's parameter set satisfied all the
fitness criteria, suggesting that additional channel kinetic equations may be necessary to achieve
a higher correspondence to experimental results. A quantitative description is available elsewhere
(Eichler West 1996).

Figure 9
The solution with the highest fitness yields spatiotemporal behaviors consistent with those observed
experimentally. The y-axis range is -85 mV to +40 mV. The length of the x-axis is approximately 80
ms.

We examined the manifolds for the total conductance and the excitatory/inhibitory conductance
ratio. The total conductances of high fitness solutions (> 0.9) lay between 442 and 652
mS/cm² (mean = 513, s.d. = 34, n = 47,484), consistent with the total conductance of
527 mS/cm² proposed by Traub. The excitatory/inhibitory conductance ratio (EIR) for high
fitness solutions (> 0.9) occurred in the range 0.78 to 1.31 (mean = 0.96, s.d. = 0.12,
n = 47,484), consistent with the EIR of 0.97 proposed by Traub. Many low fitness solutions also fell
within the ranges of these solutions. Thus, while selecting a parameter set within these ranges
does not guarantee a high fitness solution, selecting a parameter set outside of these ranges
yielded low fitness solutions.

Figure 10
Solution manifolds for the low dimensional metrics as a function of fitness. a, Total conductance
manifold as a function of fitness score for the search space explored by the EA. b, The
excitatory/inhibitory conductance ratio manifold as a function of fitness score. Note that the
range with respect to the y-axis decreases with increasing fitness.

8. Conclusions

This study has applied a new method for parameter optimization that promises to dramatically
improve the robustness of neuronal simulations. Further, the automated nature of the search
process promises to improve the efficiency of model development. By robustness, we mean that the
manifold of high fitness solutions produced by the EA application demonstrates that the model
produces appropriate behaviors over a range of parameter values, reflecting a variability
analogous to that observed in nature. The results of our proof-of-concept experiment yielded
improved correspondence between simulated and experimental behaviors. The method should find
general applicability in a broad range of HPC simulation endeavors outside of neurophysiological
modeling.

The manifold approach suggests methods of analysis that would not have been possible with
previous approaches to parameter fitting.

1. The EA-generated manifold can be used to determine an area of parameter space to
examine in future refining searches. While we carefully designed our recombination operators
to explore outside of the boundaries of our initial conditions, high fitness solutions confined
themselves to a more restricted area of parameter space.

2. The interpretation and significance of the individual channel parameter manifolds require
a much greater description of the biological system and are thus not within the scope of this
chapter [13]. However, all the channel distribution
manifolds resembled noisy instances of simple functions: linear (potassium A,
calcium-dependent potassium); exponential (sodium, potassium delayed rectifier); or, sinusoidal
(calcium, potassium AHP). This encouraged us to re-evaluate the potential of the function
representation described in section 4.

3. The manifolds were subsequently analyzed to reveal covariance relationships between
parameters, thereby providing an indication of the sensitivity of parameters with respect to
modulations in the other parameters. The manifolds identified varying degrees of parameter
sensitivity for the potassium A and the calcium-dependent potassium channels and a strong
positive covariance between the sodium and potassium delayed rectifier channel distributions.
The complementary colocalization of the calcium and potassium AHP distributions is suggestive of
an experimentally testable role for calcium-dependent conductances in the control of
spatiotemporal plasticity (Eichler West 1996).

One future goal made evident by these preliminary studies is the need to reduce the number of
function evaluations required to sufficiently search parameter space by applying better EA
selection strategies. The results in figure 8 suggest that a reexamination of our termination
criteria might be the first place to start. The search strategy we used required access to
significant amounts of CPU time on supercomputer-class machines. We believe that our EA-based
approach to neuronal modeling offers a powerful new set of methods and interpretive paradigms, but
it will not be generally useful to all neural modelers until it is implementable by those who do
not have access to high performance computers.

How do we know when we are including too much or too little information in the fitness measure?
The concern is that additional criteria will increase the computational time required to evaluate
the response of each simulation with little or no gain for parameter manifold refinement. One
approach may be to divide criteria into "training and testing" sets. The set of high fitness
solutions created from the "training" criteria can be evaluated for criteria for which there is no
explicit selection (the "testing set"). Those testing criteria which fail must necessarily become
members of the new training set.

We are first and foremost neuroscientists with collective intuition into our system gained by
reviewing the experimental literature, performing experiments in the laboratory, and simulating
models of neurons. In this first attempt at applying an EA, we tried to incorporate much
domain-based intuition into the algorithm design. The application was quite successful. However,
with respect to the NFL theorem, we are still left with two nagging questions. Would another
method perform better, given a similar attentiveness to the incorporation of domain-based
information? Did we get lucky? Answers to these questions will require significantly more
empirical research.

Acknowledgements

This work was supported by grants from Cray Research/Minnesota Supercomputer Institute and NIMH
R01-MH52903. The authors gratefully acknowledge generous access to supercomputing facilities
provided by the Minnesota Supercomputer Institute, the IBM Shared University Research Project, and
the Laboratory for Computational Sciences and Engineering at the University of Minnesota. RMEW
would like to thank the IMA and program committee for their invitation to present this work. RMEW
would also like to thank Ihab Awad and Joe Haberman for their technical support with
parallelization issues, and Jon Gottesman and David Yuen for thought-provoking discussions.

References

Klein, M., and Kandel, E. R. Presynaptic modulation of voltage-dependent Ca²⁺ current:
mechanism for behavioral sensitization in Aplysia californica. Proceedings of the National
Academy of Sciences of the USA 75: 3512-3516, 1978.