What follows is a proof that the substitution of a beneficial mutation
for a haploid DOES NOT require the death of 10 times the population size
(or more), as Haldane believed.

In "The Cost of Natural Selection", Haldane described a scenario where
a formerly disadvantageous allele becomes advantageous due to a change
in the environment. In this situation, the rare beneficial allele is said
to have a fitness of 1.0, while the common allele is said to have a fitness
of 1 - s, where s is some small value. We will use 1% (0.01) as an example.

The classic view of gene fixation involves assuming a starting frequency
for two alleles of a single gene, calculating the frequency of each genotype
in the next generation, and then applying a fitness factor to each genotype
to simulate the effects of selection.

Haldane's Calculations for the Replacement of Gene "a" by Gene "A"
Symbols:
p - frequency of gene A at any given time.
q - frequency of gene a at any given time. Note that p + q always equals 1.
s - the selection coefficient of gene A.
w - the fitness of gene a, w = 1 - s.
N - number of individuals in the population.
F - the average number of offspring that each parent produces in a generation.
Assumptions:
Gene A is a beneficial mutation that occurs in exactly one individual at the start of our calculations.
The population size will remain constant through time.

w * p^2 = frequency of AA individuals in the next generation
w * 2pq = frequency of Aa individuals in the next generation
q^2 = frequency of aa individuals in the next generation.

As a simplifying step, it is
conventional to adjust the fitnesses such that
the highest fitness is 1 and the less fit individuals have a fitness less
than 1.

Therefore,
p^2 = frequency of AA individuals in the next generation
2pq = frequency of Aa individuals in the next generation
q^2 / w = frequency of aa individuals in the next generation.

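The number of interest to Haldane was
(q^2 / w) * N = (0.999995)^2 / 1.01 * 100,000 = 99,009 (approx.).
Here N = 100,000, q = 0.999995 (one copy of A among the 2N = 200,000
genes in the population), and the divisor is 1.01 for s = 0.01.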
Notice that 991 aa individuals have been lost due to selection. This
is what Haldane called the cost of substitution for one generation. He
continued iterating this process, summing the number of aa individuals
lost each generation until gene A became nearly fixed. That sum, divided
by the population size, was what Haldane called the cost of substitution.

But does this process reflect reality very well? Why should the addition
of one individual having a slightly beneficial gene cause the death of
991 individuals who otherwise would have gotten along fine? What is especially
telling is that if the population size is raised to 1 million, now 9901
individuals have to die, even though the fitness hasn't changed at all! This
just doesn't make sense. Haldane continued this process until the new allele
was fixed, and concluded that the cost was typically 30, which implied the
death of 30 times the population size. Hence, if the population size was
100,000, a cost of 30 would require the death of 3,000,000 individuals.
As I will show below, this meaning applied to the cost of substitution
is completely incorrect.
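
As a rough check on the arithmetic above, here is a minimal Python sketch of
the first-generation bookkeeping. The function name and its parameters are my
own, not Haldane's. It assumes q = 1 - 1/(2N), which is what the quoted value
0.999995 implies for N = 100,000, and it counts the loss the way the text
does, as N minus the surviving aa individuals. It should give roughly the
figures quoted above, up to rounding.

```python
def first_generation_loss(population_size, divisor=1.01):
    """aa individuals lost in the first generation under the classic bookkeeping."""
    # Assumption: one copy of A among 2N gene copies, so q = 1 - 1/(2N).
    q = 1.0 - 1.0 / (2 * population_size)
    # Surviving aa individuals: (q^2 / w) * N, with the divisor 1.01 used in the text.
    surviving_aa = (q ** 2 / divisor) * population_size
    # The text counts the loss as N minus the surviving aa individuals.
    return population_size - surviving_aa


for n in (100_000, 1_000_000):
    print(n, round(first_generation_loss(n), 1))
```

Note that the loss scales with N even though the divisor (the fitness
difference) is the same in both runs, which is the oddity discussed above.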

The problem lies in the normalization of the fitness constant and the
assumption of constant fitness. As Bruce Wallace puts it, although Haldane
killed 991 individuals with his selection coefficient, most of those individuals
were "resurrected" in the next generation with the assumption of a constant
population! What we need is a slightly different fitness "constant" that
allows for the fact that one slightly fitter individual is not going
to kill off nearly 1000 less fit individuals. And that is just what I provide
in the next section.

Haldane's Calculations for the Replacement of Gene "a" by Gene "A" With
Frequency Dependent Selection

First, let's consider a population of genetically identical individuals
(except for X and Y chromosomes!). Using the symbols and assumptions above,
we know that each individual will produce F offspring of which only 1 will
survive to reproductive adulthood. This maintains the population at N, which
was a condition we imposed above. This implies a fitness of 1/F for each
individual.

Now let's propose a mutation in an individual such that the mutant is a
small fraction s more fit than the non-mutant (i.e. s = 0.01 would imply
an increased fitness of 1% for the new A allele over the old a allele).
Just to be thorough, I am going to consider the possibility of incomplete
dominance of our new, beneficial allele (as Haldane did in "The Cost of
Natural Selection"). Let X be the fitness of an individual homozygous for
the beneficial mutation, Y the fitness of a heterozygote, and Z the fitness
of an individual homozygous for the old, less fit allele. X, Y, and Z can
be related by

X = (1 + s) * Z
Y = (1 + sh) * Z

Here s is a positive selection coefficient as described above, and h is a
factor between 0.0 and 1.0 that indicates the relative dominance of the two
alleles. Thus h = 1.0 indicates complete dominance for the new allele, and
h = 0.0 indicates complete dominance for the old allele. Notice that Z is
slightly less than X as expected for a less fit individual, with Y somewhere
in between, tending toward X or Z depending on the value of h.

Now we are going to apply a round of reproduction to our population and
solve for Z and ultimately Y and X in terms of p, q, s, h, and F. The
variables p, q, and N have not changed in meaning from above. Variable s
has a somewhat modified meaning as described in this section (but I think
it is still quite analogous to the classic coefficient of selection).
Variables F, h, X, Y, and Z were just introduced.

Reproduction:
F*N*p^2 + F*N*2pq + F*N*q^2 = F*N

Note that if for example F is 4 (a pair of parents produce 8 offspring
in their lifetimes) and N (the parent population size) is 100,000; we have
400,000 offspring before selection. Let's now apply selection:

X*F*N*p^2 + Y*F*N*2pq + Z*F*N*q^2 = N

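Note that the subsequent generation's population has been brought back
to exactly N by selection, thus meeting our requirement of a fixed population
size. Now our goal is to solve for Z. Since X = (1 + s) * Z and
Y = (1 + sh) * Z, dividing the selection equation by F*N gives:

Z * [(1 + s)*p^2 + (1 + sh)*2pq + q^2] = 1/F

Because p^2 + 2pq + q^2 = 1, the bracket equals 1 + s*p^2 + sh*2pq, so:

Z = 1 / [F*(1 + s*p^2 + sh*2pq)]
Y = (1 + sh) / [F*(1 + s*p^2 + sh*2pq)]
X = (1 + s) / [F*(1 + s*p^2 + sh*2pq)]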

We now have fitness equations for the three genotypes in terms of F, s, h,
p, and q! Now, let's take time out for a reality check. How do the fitness
terms vary for a new, beneficial mutation just starting out versus what
happens when it reaches fixation (let's just consider h = 1, i.e. complete
dominance for the new allele)? Well, when the new A allele exists in only
a few individuals, q is essentially 1, p is essentially 0, and X and Y
approach (1 + s) / F. This is slightly above 1/F, a small but positive
advantage that will lead to increased numbers of A individuals. Meanwhile,
the old aa individuals have a fitness of very nearly 1/F because q is close
to 1 and p to 0. That means these individuals will hardly feel the
competition with the Aa and AA individuals because the latter are so rare.
Only as the frequency of the new mutant gene is raised (and q reduced) do
the aa individuals begin to be seriously outcompeted. As q approaches 0
(and p approaches 1), the fitness of the aa individuals approaches
1 / [F * (1 + s)] while the AA and Aa individuals' fitness approaches 1/F.
When the A gene becomes fixed, its fitness is 1/F, just like every other
fixed gene in the population. This makes sense because it no longer has
anyone to compete with - there are no longer any non-A individuals.
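
The reality check above can be reproduced numerically. The following is only
a sketch: the function name genotype_fitnesses is my own, and it uses the
closed form Z = 1/[F*(1 + s*p^2 + sh*2pq)] together with X = (1 + s) * Z and
Y = (1 + sh) * Z. The values s = 0.01, h = 1 and F = 4 are the example values
from the text; the three frequencies p are arbitrary picks for "rare",
"halfway" and "nearly fixed".

```python
def genotype_fitnesses(p, s, h, F):
    """Return (X, Y, Z): fitnesses of AA, Aa and aa when allele A has frequency p."""
    q = 1.0 - p
    denom = F * (1.0 + s * p**2 + s * h * 2 * p * q)
    Z = 1.0 / denom            # aa, the old homozygote
    Y = (1.0 + s * h) / denom  # Aa, the heterozygote
    X = (1.0 + s) / denom      # AA, the new homozygote
    return X, Y, Z


s, h, F = 0.01, 1.0, 4
for p in (0.000005, 0.5, 0.999995):
    X, Y, Z = genotype_fitnesses(p, s, h, F)
    q = 1.0 - p
    mean = X * p**2 + Y * 2 * p * q + Z * q**2
    # Multiply by F so the values read relative to the baseline 1/F.
    print(f"p={p}: X*F={X*F:.6f} Y*F={Y*F:.6f} Z*F={Z*F:.6f} mean*F={mean*F:.6f}")
```

When A is rare, Z*F is very nearly 1 and X*F is very nearly 1 + s; when A is
nearly fixed, X*F is very nearly 1 and Z*F is very nearly 1/(1 + s). The mean
stays at 1/F throughout.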

Now, my question is (to anyone who has made it this far): What is the
substitution cost? How have the patterns of death and birth differed while
gene fixation was going on from when the animals were simply reproducing
without substitution? To me, it looks like the exact same number of individuals
have lived and died in each generation as would have lived and died if
substitution were not occurring. To me, the cost of substitution appears
to be an artifact of an old, simple equation to determine the survivors
from one generation to the next under natural selection. This was exactly
the conclusion Bruce Wallace reached in "Fifty Years of Genetic Load: An
Odyssey" [6] and it seems perfectly obvious to me.
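
To make the question concrete, here is a small deterministic sketch that
iterates the selection step and records the deaths in each generation. It is
an illustration under my own assumptions, not a reconstruction of anyone's
published method: the name simulate is hypothetical, mating is assumed to be
random so that genotype frequencies are p^2, 2pq and q^2 each generation,
chance effects are ignored, and the starting frequency is one copy of A among
2N genes.

```python
def simulate(N=100_000, F=4, s=0.01, h=1.0, generations=500):
    """Return a list of (p, deaths) pairs, one per generation."""
    p = 1.0 / (2 * N)  # assumption: one copy of A among 2N gene copies
    history = []
    for _ in range(generations):
        q = 1.0 - p
        denom = F * (1.0 + s * p**2 + s * h * 2 * p * q)
        X, Y, Z = (1.0 + s) / denom, (1.0 + s * h) / denom, 1.0 / denom
        offspring = F * N
        # Survivors of each genotype after selection.
        AA = X * offspring * p**2
        Aa = Y * offspring * 2 * p * q
        aa = Z * offspring * q**2
        survivors = AA + Aa + aa
        history.append((p, offspring - survivors))
        # Frequency of A among the survivors, who become the next parents.
        p = (AA + 0.5 * Aa) / survivors
    return history


history = simulate()
print("first generation:", history[0])
print("last generation: ", history[-1])
```

In this sketch the deaths per generation stay at F*N - N (up to floating-point
rounding) while p rises, which is the same number that would die with no
substitution under way.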

I have found further insight into these questions by breaking down the
fitness equations for X, Y, and Z using the method of partial fractions.
If Z = 1/[F*(1 + s*p^2 + sh*2pq)], then let

Z = 1/F + A/(1 + s*p^2 + sh*2pq)

where A is an unknown value that must be determined so as to preserve the
equality. A can be solved for by noting that

1/F + A/(1 + s*p^2 + sh*2pq) = 1/[F*(1 + s*p^2 + sh*2pq)]

Multiplying both sides by F*(1 + s*p^2 + sh*2pq) gives

1 + s*p^2 + sh*2pq + A*F = 1

A*F = -(s*p^2 + sh*2pq)

A = -(s*p^2 + sh*2pq) / F

This leads to:

Z = 1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)
Similarly, for Y,

Y = (1 + sh) / [F*(1 + s*p^2 + sh*2pq)]

Let Y = 1/F + B/(1 + s*p^2 + sh*2pq), where B must be determined.

1/F + B/(1 + s*p^2 + sh*2pq) = (1 + sh) / [F*(1 + s*p^2 + sh*2pq)]

1 + s*p^2 + sh*2pq + B*F = 1 + sh

B*F = sh - s*p^2 - sh*2pq

B = 1/F * (sh - s*p^2 - sh*2pq)

Therefore,

Y = 1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)

Lastly, applying the same treatment to X:

X = (1 + s) / [F*(1 + s*p^2 + sh*2pq)]

Let X = 1/F + C/(1 + s*p^2 + sh*2pq), where C must be determined.

1/F + C/(1 + s*p^2 + sh*2pq) = (1 + s) / [F*(1 + s*p^2 + sh*2pq)]

1 + s*p^2 + sh*2pq + C*F = 1 + s

C*F = s - s*p^2 - sh*2pq

C = 1/F * (s - s*p^2 - sh*2pq)

In which case:

X = 1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)
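
The algebra above is easy to get wrong by a sign, so here is a short numerical
sanity check, offered as a sketch only. The two helper functions are my own
naming: closed_forms evaluates X, Y and Z directly as (1 + s), (1 + sh) and 1
over F*(1 + s*p^2 + sh*2pq), and decomposed_forms evaluates the "1/F plus a
remainder" versions. The grid of p, s, h and F values is arbitrary.

```python
import math


def closed_forms(p, s, h, F):
    """X, Y, Z written as a single fraction each."""
    q = 1.0 - p
    D = 1.0 + s * p**2 + s * h * 2 * p * q
    return (1 + s) / (F * D), (1 + s * h) / (F * D), 1 / (F * D)


def decomposed_forms(p, s, h, F):
    """X, Y, Z written as 1/F plus a term proportional to s/F."""
    q = 1.0 - p
    D = 1.0 + s * p**2 + s * h * 2 * p * q
    X = 1 / F + s / F * (1 - p**2 - h * 2 * p * q) / D
    Y = 1 / F + s / F * (h - p**2 - h * 2 * p * q) / D
    Z = 1 / F - s / F * (p**2 + h * 2 * p * q) / D
    return X, Y, Z


# Compare the two forms over an arbitrary grid of parameter values.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    for s in (0.001, 0.01, 0.1):
        for h in (0.0, 0.5, 1.0):
            for F in (2, 4, 10):
                a = closed_forms(p, s, h, F)
                b = decomposed_forms(p, s, h, F)
                assert all(math.isclose(x, y, rel_tol=1e-12) for x, y in zip(a, b))
print("decomposed forms agree with the closed forms on the grid")
```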

A round of selection using these forms of the fitness equations looks
like this:

[1/F + s/F * (1 - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * p^2 +

[1/F + s/F * (h - p^2 - h*2pq) / (1 + s*p^2 + sh*2pq)] * 2pq +

[1/F - s/F * (p^2 + h*2pq) / (1 + s*p^2 + sh*2pq)] * q^2 =

1/F

What does it all mean? Well, first of all, notice that the fitness of
each genotype (AA, Aa, and aa) is the sum of 1/F and a term proportional
to s/F that depends on the selection coefficient, dominance, and allele
frequencies. Notice that if the selection coefficient is zero, the fitness
of every genotype, and so the average fitness, will be 1/F, just enough to
maintain the species population. Also note that for all values of p and q
such that p + q = 1 (I will be adding this derivation later), the s/F terms
for the three genotypes, each weighted by its genotype frequency, will
always sum to zero, so the average fitness will always be 1/F.
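
One way the zero-sum property might be shown is sketched below in LaTeX. The
shorthand T and D is mine, introduced only to keep the lines short; the rest
uses the s/F terms exactly as they appear in the decomposed forms of X, Y,
and Z.

```latex
% Shorthand (not in the original text):
%   T = p^2 + 2hpq,   D = 1 + s p^2 + 2shpq = 1 + sT
% The s/F terms of X, Y, Z are (s/F)(1 - T)/D, (s/F)(h - T)/D and -(s/F)T/D.
\begin{align*}
&\frac{s}{F D}\Big[(1 - T)\,p^2 + (h - T)\,2pq - T\,q^2\Big] \\
&\quad= \frac{s}{F D}\Big[p^2 + 2hpq - T\,(p^2 + 2pq + q^2)\Big] \\
&\quad= \frac{s}{F D}\Big[T - T\,(p + q)^2\Big] \\
&\quad= 0 \qquad \text{since } p + q = 1 .
\end{align*}
```

With the weighted s/F terms cancelling, what remains of the average fitness is
(1/F) * (p^2 + 2pq + q^2) = 1/F.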