The following is a summary of Warren Ewens' arguments regarding the cost
of natural selection from his book "Mathematical Population Genetics" (Ewens
1979). I have made a strong effort to summarize Ewens' work here, and while
I hope to improve this page in the future, you can currently get a far
clearer explanation for his work in his original papers. These are fully
referenced in the bibliography.

Ewens summarized his arguments regarding the cost of natural selection
in his book "Mathematical Population Genetics." In section 2.10 (Genetic
Loads), he first addresses the substitution load, which he, along with
Kimura, notes is sometimes called the "evolutionary load" or the "cost of
natural selection" (Ewens, 1979, pg. 68). He shows that when
h = 0.5 (the coefficient of dominance), if the frequency of an
allele A1 having selection coefficient s is x, the mean fitness
of the population will be 1 + sx, and l (the load, or cost contribution
of selection, for a single generation) will be approximately s(1 - x). Hence,
the load for the entire substitution process (for a single substitution)
will be $L = \sum s(1 - x)$, summed over every generation in
the course of the substitution. Note that this is identical to Haldane's
formula for the cost of natural selection. He notes that the summation
is approximated by $\int_{t_1}^{t_2} s(1 - x)\,dt$. Because
$dx/dt \approx \tfrac{1}{2}sx(1 - x)$ when h = 0.5, this is the same as
$\int_{x_1}^{x_2} s(1 - x)(dx/dt)^{-1}\,dx = 2\int_{x_1}^{x_2} x^{-1}\,dx = 2\log(x_2/x_1)$.
Ewens notes that this differs only trivially from $-2\log(x_1)$ (at least
for situations where $x_1$ is near 0 and $x_2$ is near 1). He follows
Haldane and Kimura in using starting and ending frequencies of 0.0001 and
0.9999 for the substituting allele to come up with a cost of 18.4 when
h is 0.5 and notes that the load is generally higher when the coefficient
of dominance is not equal to 0.5. Following Haldane, Ewens uses a "typical"
value of 30 for the substitution load / cost.
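
As a rough numerical check of the figure of 18.4, the following Python sketch
sums the per-generation load s(1 - x) along a deterministic trajectory and
compares it with the closed form 2 log(x2/x1). The one-generation recursion
(genotype fitnesses 1 + s, 1 + s/2 and 1 in an infinite population) and the
value s = 0.01 are assumptions made for illustration; they are not taken
from Ewens' text.

```python
import math

def substitution_load(s=0.01, x1=0.0001, x2=0.9999):
    """Sum the per-generation load s(1 - x) while x climbs from x1 to x2.

    Assumes h = 0.5, i.e. genotype fitnesses 1 + s, 1 + s/2 and 1, under
    deterministic (infinite-population) selection. Illustrative only.
    """
    x, total, generations = x1, 0.0, 0
    while x < x2:
        total += s * (1.0 - x)
        # Exact one-generation update for additive (h = 0.5) selection.
        x = x * (1.0 + 0.5 * s * (1.0 + x)) / (1.0 + s * x)
        generations += 1
    return total, generations

closed_form = 2.0 * math.log(0.9999 / 0.0001)   # 2 log(x2/x1)
summed, generations = substitution_load()
print(f"closed form : {closed_form:.2f}")
print(f"summed load : {summed:.2f} over {generations} generations")
```

The two numbers should agree closely; the small difference comes from the
1 + sx term in the recursion, which the continuous approximation ignores.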

Ewens then describes the meaning of the load as follows:

What does this calculation really mean? Suppose all selection
is through viability differences and the number of reproducing adults in
each generation remains constant at N. A considerable portion of
the depletion in population numbers between birth and the age of reproduction
is non-genetic. Taking only the genetic component and supposing there is
no depletion through genic deaths of the optimal genotype A1A1,
a straightforward calculation shows that when the frequency of A1
is x there must be N(1 + s)/(1 + sx) individuals at birth
so that after differential viabilities operate, there are N individuals
at maturity. Thus the average individual is required to leave approximately
1 + s(1 - x) offspring after non-genetic deaths are taken account of,
so that there will be Ns(1 - x) "genetic deaths" in each generation
associated with the evolutionary process. Summed over the entire process
this gives NL individuals in all or an average of NL/T each
generation if the substitutional process takes T generations.
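
The approximation in the quoted passage can be checked directly. The sketch
below compares the exact number of genetic deaths, N(1 + s)/(1 + sx) - N,
with the approximation Ns(1 - x) at a few allele frequencies. The values
N = 100,000 and s = 0.01 and the function name are hypothetical, chosen only
for illustration.

```python
def births_required(N, s, x):
    """Newborns needed so that N adults remain after selection (quoted formula)."""
    return N * (1.0 + s) / (1.0 + s * x)

N, s = 100_000, 0.01   # illustrative values, not from the source
for x in (0.0001, 0.25, 0.5, 0.75, 0.9999):
    exact = births_required(N, s, x) - N      # genetic deaths, exact
    approx = N * s * (1.0 - x)                # first-order approximation
    print(f"x = {x:<6}  exact = {exact:8.1f}  approx = {approx:8.1f}")
```

For small s the two columns differ by well under one percent of the
approximate value, which is why the simpler expression is used in the text.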

Ewens then considers a series of such substitutions at different loci but
with the same fitness parameters, each substitution starting regularly
n generations apart. In this scenario, if each substitution takes T
generations to complete, there will be T/n substitutions going on at any
given time. This will lead to a total of (NL/T)(T/n) = NL/n "selective
deaths" per generation. He then notes that if one sets an upper limit of
0.1N on this number (as per Haldane), then using the representative value
of L = 30, one calculates a lower limit of n = 300, so that successive
substitutions cannot start more frequently than once every 300 generations
or the number of selective deaths will be too large for the population
to carry.
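
The limit of 300 generations follows from requiring NL/n <= 0.1N, so that
n >= L/0.1. A minimal sketch of that arithmetic, with a hypothetical
function name and the 0.1 ceiling exposed as a parameter:

```python
def min_spacing(L=30.0, max_death_fraction=0.1):
    """Smallest n (generations between substitution starts) with N*L/n <= fraction*N."""
    return L / max_death_fraction

print(f"L = 30   -> n >= {min_spacing():.0f} generations")
print(f"L = 18.4 -> n >= {min_spacing(L=18.4):.0f} generations")
```

The second line uses the h = 0.5 load of 18.4 in place of the "typical"
value of 30, purely to show how sensitive the limit is to the assumed load.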

Ewens then mentions that some have argued that selection through fertility
differences may escape this load or cost problem. However, he shows that
if one looks at the offspring requirements of the most fit individual required
to drive a series of substitutions as described above, a similar argument
can be made for selection driven by fertility differences as was just made
for substitutions driven by viability differences. The offspring requirement
of each individual of the most fit genotype (i.e., the individuals that have
only the fitter allele at each locus of all of the ongoing substitutions)
will be 1 + s(1 - x) for each locus currently undergoing substitution.
Thus, averaged over the T generations of a substitution, the most fit
individual will be required to produce 1 + L/T offspring
for each locus currently undergoing substitution. Using a simple multiplicative
model of fitness, this indicates that if T/n substitutions are going on
simultaneously, individuals with the most fit genotype will be required
to produce $(1 + L/T)^{T/n}$ offspring in all each generation.
This is approximately exp(L/n), which is exp(30/n) using the
typical value of 30 for the substitution load. (Ewens notes that this is
approximately 1.1 offspring per parent for the most fit genotype when n
= 300 generations, as suggested by Haldane.) Ewens shows that the offspring
requirement per parent of the most fit genotype rises rapidly as n (the
number of generations between the starts of successive substitutions)
decreases. If n is small (as suggested by Kimura), the offspring requirement
per parent of the most fit genotype will be high. Ewens gives an example of
the kind of numbers
required in a later section (9.2) of the book where he shows that these
offspring requirements for either viability or fertility selection are
not really the problem envisioned by Kimura.
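
To see how quickly the requirement grows as n shrinks, the following sketch
evaluates exp(L/n) for a few spacings, and optionally the finite-T form
(1 + L/T)^(T/n). The function name and the particular values of n are
assumptions for illustration; only L = 30 and n = 300 come from the text.

```python
import math

def offspring_requirement(n, L=30.0, T=None):
    """Offspring required of the fittest genotype under multiplicative fitness.

    With T given, uses (1 + L/T)**(T/n); otherwise the limiting form exp(L/n).
    """
    if T is None:
        return math.exp(L / n)
    return (1.0 + L / T) ** (T / n)

for n in (300, 100, 30, 10, 3):   # illustrative spacings between substitutions
    print(f"n = {n:>3}: exp(L/n) = {offspring_requirement(n):.3g}")
```

At n = 300 the result is about 1.1, matching the figure attributed to
Haldane's limit; each further reduction in n multiplies the exponent.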

In section 9.2 (Arguments Leading to the Neutral Theory: Loads),
Ewens gives an example of the offspring requirements for a series
of substitutions as described above. He recaps Kimura's then-recent (as of
1979) estimate of the substitution rate as six substitutions per generation,
which puts n = 1/6. Plugging this value into the load equation, exp(30/n),
gives an offspring requirement of exp(30/(1/6)) = exp(180), or approximately
$10^{78}$ offspring. Ewens then quotes Kimura to show agreement on
this point:

"to carry out mutant substitution at the above rate, each
parent must leave $e^{180} \approx 10^{78}$ offspring for only one of the
offspring to survive. This
was the main reason why random fixation of selectively neutral mutants
was first proposed by one of us as the main factor in molecular evolution."

Ewens mentions that this huge offspring requirement only applies to
parents of the "most fit genotype" and does not apply to the average
individual. He refers back to his derivation of the offspring requirement,
exp(30/n), that I have described above. Ewens then rederives the same
equation using a different line of argument:

First, he assumes a sequence of loci that are substituting because of
selective differences at each locus, with h (the coefficient of dominance)
= 1/2 and a selection coefficient of s. The contribution of a single locus
undergoing substitution to the average fitness ($w_{avg}$) of the
population is expected to be 1 + sx. (A proof is outlined at the end of
this page.) Considering multiple loci and multiplicative fitnesses,
$w_{avg} = \prod_i (1 + s x_i)$; that is, the average fitness will be the
product of $1 + s x_i$, where $x_i$ is the frequency of the favored allele
at the ith locus undergoing substitution. If there are J loci undergoing
substitution at any one time, the average fitness will be approximated by
$w_{avg} = (1 + \tfrac{1}{2}s)^J$ (taking 1/2 as the average frequency of
the favored alleles). If each substitution takes T generations and a new
substitution starts every n generations, then J = T/n and
$w_{avg} = (1 + \tfrac{1}{2}s)^{T/n} \approx \exp(\tfrac{1}{2}sT/n)$. The
fitness of the individual having the optimal genotype (homozygous for each
of the favorable alleles undergoing substitution) will be given by
$w_{max} = (1 + s)^{T/n}$, which is approximately equal to exp(sT/n).
If the fitnesses are rescaled so that $w_{avg} = 1$, then the
fitness requirement for the optimal genotype will be
$\exp(sT/n)/\exp(\tfrac{1}{2}sT/n) = \exp(\tfrac{1}{2}sT/n)$.

To determine T (so that we will know how many generations are required for
a substitution), Ewens uses the usual starting and ending values for favorable
allele frequencies (0.0001 and 0.9999 respectively) and the formula
$T = \int_{x_1}^{x_2} \{\tfrac{1}{2}sx(1 - x)\}^{-1}\,dx$, where $x_1 = 0.0001$
and $x_2 = 0.9999$. This yields T = 36.8/s, meaning that a substitution under
these conditions where s = 0.01 will require around 3,680 generations. Plugging
this value back into the equation for the offspring requirement of the
optimal genotype (Ewens refers to this as l, the substitution load, with
l = exp((1/2)sT/n)) gives l = exp((1/2)(0.01)(3680)/n) = exp(18.4/n).
Using the substitution rate estimated by Kimura of 6 substitutions per
generation puts the offspring requirement of the most fit genotype at
exp(18.4/(1/6)) = exp(110.4) = $9 \times 10^{47}$, a ridiculous number of
offspring for any living creature. Furthermore, using the "representative
value" of 30 for the substitution cost (to account for increases to the cost
due to dominance effects) recovers Kimura's estimate of exp(30/(1/6)) =
exp(180) = $1 \times 10^{78}$, another impossible offspring requirement.
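
The figures in this derivation can be reproduced with a few lines of Python.
The sketch below assumes the deterministic rate dx/dt = (1/2)sx(1 - x), which
is the rate consistent with T = 36.8/s, and uses the closed form of the
resulting integral. The function names are hypothetical, and the results are
kept as powers of ten for readability.

```python
import math

def substitution_time(s, x1=0.0001, x2=0.9999):
    """Closed form of T = integral of dx / ((1/2) s x (1 - x)) from x1 to x2."""
    return (2.0 / s) * math.log(x2 * (1.0 - x1) / (x1 * (1.0 - x2)))

def log10_offspring_requirement(s, n):
    """log10 of exp((1/2) s T / n), the requirement for the optimal genotype."""
    return 0.5 * s * substitution_time(s) / n / math.log(10.0)

s, n = 0.01, 1.0 / 6.0   # values used in the text
print(f"T = {substitution_time(s):.0f} generations")
print(f"substitutions under way, T/n = {substitution_time(s) / n:.0f}")
print(f"h = 0.5 load : about 10^{log10_offspring_requirement(s, n):.1f}")
print(f"load of 30   : about 10^{30.0 / n / math.log(10.0):.1f}")
```

Because the text rounds T to 3,680 before exponentiating, the unrounded
values printed here can differ slightly in the last digit of the exponent.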

After a qualitative discussion of some factors (i.e., frequency dependence
and non-multiplicative epistasis among the various substituting loci) that
can be expected to reduce the substitution load and hence the offspring
requirement of the optimal genotype, Ewens moves on to the most critical
factor that reduces the substitution load. Ewens notes that if a series of
substitutions has an initial frequency of 0.0001, a final frequency of
0.9999, a coefficient of dominance (h) of 0.5, and a selection coefficient
(s) of 0.01, and if 6 substitutions start each generation (as suggested by
Kimura, so that n = 1/6), then there will be T/n = 3,680 * 6 = 22,080
substitutions going on at any given time. That means
that there will be 22,080 genes in the process of going from a frequency
of very nearly 0 in the population to fixation. Many of these genes, having
begun the substitution process relatively recently, will have quite low
frequencies in the population, making individuals carrying the optimal
genotype quite rare. Under these conditions, Ewens calculated the probability
of any one individual having the optimal genotype (i.e., having all 22,080
beneficial alleles simultaneously) as $10^{-23{,}200}$. Needless to
say, such an individual is never going to exist in a finite population!

Ewens then addressed the problem of determining what the optimal genotype
would be that was likely to actually exist in a finite population. Using
the statistics of extreme values in a population of finite size, Ewens
shows that if the mean and variance of the number of preferred (fitter)
alleles is known for a population, the fittest genotype that will be likely
to actually exist can be determined. He refers to an earlier paper (Ewens
1970) for the derivation that the variance in preferred alleles in the
series of substitutions described above is given by s/n. Using s
= 0.01 and n = 1/6, the variance will be 0.06 which leads to a standard
deviation of 0.245 (recalling that the standard deviation is given by the
square root of the variance). Using the statistics of extreme values, Ewens
stated that for a population of size $10^5$, if s is small
(less than 0.1), the population fitness distribution should be approximately
normal and the most fit individual in that population would be expected
to have a fitness that is no more than 4 standard deviations above the
mean. (He references Pearson and Hartley, 1958, Table 28 for this.) For
our example, the standard deviation of fitness is 0.245 which leads to
an expected optimal fitness of 1 + 4(0.245) = 1.98.
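
The arithmetic of this step is short enough to check directly. The sketch
below takes the variance s/n and the 4-standard-deviation bound from the text
as given; the function name and the parameter k are hypothetical.

```python
import math

def fittest_expected(s, n, k=4.0):
    """Mean fitness 1 plus k standard deviations, with variance taken as s/n."""
    return 1.0 + k * math.sqrt(s / n)

s, n = 0.01, 1.0 / 6.0   # values used in the text
print(f"standard deviation = {math.sqrt(s / n):.3f}")
print(f"fittest expected   = {fittest_expected(s, n):.2f}")
```

Changing k shows how strongly the result depends on the assumed upper bound
for the fittest individual in a population of this size.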

Ewens' calculations indicate that a population maintained at around
100,000 individuals is capable of driving six substitutions per generation
(the highest rate ever claimed for amino acid substitutions among a variety
of mammal lineages) with a reproductive excess of 1.98 - 1 = 0.98 offspring
per parent. Although this offspring requirement is high compared to Haldane's
claim that the intensity of natural selection rarely exceeds 0.1, it is
well within the reproductive capabilities of humans and apes where a family
size of 4 children will meet the requirement. Families having more than
four children will have "extra" offspring available to "pay the cost" of
deleterious mutations, random death, and other non-substitutional causes.
Nonetheless, despite the questionable significance of Haldane's limit of
10% for the selection intensity, we can easily turn the equation around
to see how many substitutions can occur without exceeding a 10% reproductive
excess to pay the substitution cost:

$1 + 4(s/n)^{0.5} = 1.1$
$4(s/n)^{0.5} = 0.1$
$16s/n = 0.01$
$n = 16s/0.01$

For s = 0.01,

$n = 16 \times 0.01/0.01$
$n = 16$, i.e., 1 substitution every 16 generations.

For the 500,000 generations in the combined human / chimp lines, this would
allow 31,250 substitutions.

It's also worth noting that this number is dependent upon the selection
coefficient. If the bulk of selection coefficients for substitutions are
closer to 0.001 rather than 0.01, then Ewens' formula would allow a substitution
rate of 1 every 1.6 generations, permitting around 300,000 substitutions
in the combined 500,000 generations separating chimps and humans from their
common ancestor.
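
The rearranged equation and the resulting substitution counts can be sketched
as follows. The 10% excess, the factor of 4 standard deviations and the
500,000 generations are the figures used above; the function name is
hypothetical.

```python
def spacing_for_excess(s, excess=0.1, k=4.0):
    """Solve 1 + k*sqrt(s/n) = 1 + excess for n (generations per substitution)."""
    return k * k * s / (excess * excess)

generations = 500_000   # combined human/chimp figure used in the text
for s in (0.01, 0.001):
    n = spacing_for_excess(s)
    print(f"s = {s}: n = {n:.1f}, substitutions = {generations / n:,.0f}")
```

Since n scales linearly with s, a tenfold smaller selection coefficient
permits a tenfold higher substitution rate under the same 10% ceiling.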

Individuals of diploid species have two copies of each gene. We can
designate the favored allele as A and the non-favored allele as a. Therefore,
if there are two alleles (versions) of this particular gene, then there
are three kinds of individuals (genotypes, actually) that may exist:

AA - Has 2 copies of the favored allele. This individual would be homozygous
(has 2 copies of the same allele) for the favored allele.
Aa - Has 1 copy of the favored allele and 1 copy of the non-favored
allele. Such an individual is heterozygous - it has 2 different alleles.
aa - Has 2 copies of the non-favored allele. Individuals with
the aa genotype are said to be homozygous for the non-favored allele.

Fitnesses for each of the three kinds of individuals can be calculated
from which alleles they have (their genotypes). Fitness contributions are
calculated from the selection coefficient s (the fitness advantage of an
individual that is homozygous for the favored allele) and the coefficient of
dominance h (the degree of dominance for the favored allele, ranging from 0
(completely recessive) to 1.0 (fully dominant)). The fitness for each
genotype is given as follows:

AA - 1 + s
Aa - 1 + sh
aa - 1

The average fitness of a population is $w_{avg}$ and is given by
the sum of the fitness of each possible genotype multiplied by each
genotype's frequency in the population: