† Institute of Cell, Animal and Population Biology, University of Edinburgh, Scotland EH9 3JT, United Kingdom
‡ Department of Zoology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada

Abstract

In 1991, Barton and Turelli developed recursions to describe the evolution of multilocus systems under arbitrary forms of selection. This article generalizes their approach to allow for arbitrary modes of inheritance, including diploidy, polyploidy, sex linkage, cytoplasmic inheritance, and genomic imprinting. The framework is also extended to allow for other deterministic evolutionary forces, including migration and mutation. Exact recursions that fully describe the state of the population are presented; these are implemented in a computer algebra package (available on the Web at http://helios.bto.ed.ac.uk/evolgen). Despite the generality of our framework, it can describe evolutionary dynamics exactly with just two equations. These recursions can be further simplified using a “quasi-linkage equilibrium” (QLE) approximation. We illustrate the methods by finding the effect of natural selection, sexual selection, mutation, and migration on the genetic composition of a population.

EVOLUTION involves simultaneous changes at many genetic loci. Modeling these changes is difficult because associations between alleles at different loci (“linkage disequilibria”) cause the effects of selection acting on one locus to spill over onto other loci, generating indirect selection (see Ewens 1979, p. 195). These indirect effects influence the evolution of genes that are themselves under direct selection, altering the course of adaptation (e.g., Hill and Robertson 1966; Barton 1983). Moreover, indirect selection determines the fate of modifier genes that have important effects even if they themselves are free of direct selection. Examples of such modifiers include female mating preference genes (Fisher 1952), modifiers of recombination (Otto and Michalakis 1998), and modifiers of the mutation rate (Dawson 1999; Sniegowski et al. 2000).

The most obvious approach to modeling multilocus systems is simply to follow the frequencies of all possible genotypes. There are three basic drawbacks here. First, the number of genotypes grows exponentially with the number of loci, rapidly overwhelming both analytical and simulation approaches when there are even a modest number of genes. Second, the quantities that are often of most interest, such as allele frequencies and mean phenotypes, are obscured by working with genotypes. Third, approximations for the dynamic equations appear more naturally when we work with quantities other than genotype frequencies.

Several approximate approaches have been developed to deal with these problems.

The infinitesimal model assumes that very many genes influence the phenotype, such that each allele has an infinitesimal effect (Fisher 1918; Bulmer 1980; Turelli and Barton 1994). In this limit, the genetic variance contributed by allelic variation at each locus is constant, and evolutionary change is due solely to changes in associations among loci. This is an accurate and general approximation for short-term change under strong selection, but cannot describe changes in allele frequencies over the longer term.

Another method, introduced by Fisher (1953), models a population by following the inheritance of “junctions” between chromosome regions with different ancestries. This approach is well suited to describing the ancestry of samples of neutral genomes (Hudson 1990) and to modeling hybridization, in which selection can be approximated as acting on the proportions of genetic material derived from different source populations (Baird 1995). It is again intractable over long timescales, however, since the number of junctions increases geometrically.

Price (1970) gave an exact and completely general equation, in which the average change in a trait is precisely equal to its covariance with relative fitness plus the change due to transmission. This takes the classical approach of quantitative genetics, by following only the phenotype and disregarding the (usually unknown) genetics that underlies it.

Several independent developments extend Price's approach by following the mean, variance, and higher moments of the phenotypic distribution (e.g., Barton and Turelli 1987; Bürger 1991; Shapiro et al. 1994). Each moment depends on higher moments, however, and so approximations are required to give a closed set of dynamical equations. Such approximations are accurate under restricted circumstances only. In contrast to thermodynamics, where molecular motions can be averaged out over macroscopic scales, genetic details do influence phenotypic evolution.

Barton and Turelli (1991, hereafter BT91) developed the quantitative genetic approach to provide a complete description of multilocus systems, with no restrictions on the relation between genotype and phenotype. This developed from work by Barton (1983, 1986), Barton and Turelli (1987), and Turelli and Barton (1990); it was paralleled by independent work of Christiansen (1987) and Bürger (1991). The BT91 approach contains three key elements. First, it gives a general representation of populations with multiple alleles and multiple loci. Second, it derives exact recursions for the effects of selection and recombination on allele frequencies and the associations between alleles at different loci. Third, it finds a “quasi-linkage equilibrium” (QLE) approximation that allows the recursion equations to be greatly simplified under some conditions.

Although the notation and framework of the BT91 approach are general in many respects, the methods it develops are restricted to certain forms of inheritance. Their notation can describe autosomal genes in randomly mating diploids and in nonrandomly mating haploids. It cannot, however, accommodate such complications as nonrandom mating in diploids, polyploidy, sex linkage, genomic imprinting, and cytoplasmic inheritance. The main aim of this article is to show how the BT91 approach can be generalized to include all forms of inheritance. We also show how migration and mutation can be described in the same framework.

This article begins by presenting a notation that is sufficiently flexible to accommodate a variety of evolutionary forces and modes of inheritance. Next, we derive general recursion equations that describe how the genetic state of a population changes over the course of a generation. The following section shows how the selection and transmission coefficients that appear in the recursions are calculated for any particular situation. We then present the QLE approximation for the recursions, which greatly simplifies the equations. The QLE approach is illustrated with examples in the following section.

The notation and recursions set out in this article have been implemented in a set of Mathematica packages (Wolfram 1999) that are available on the Web at http://helios.bto.ed.ac.uk/evolgen. These packages use the general notation, in essentially the same form as in this article, and apply this notation to define functions appropriate for selection and recombination in diploids. Appendix D gives examples that show how the recursions can be computed automatically to give algebraic expressions for genetic changes in an arbitrary set of loci.

A GENERAL NOTATION FOR MULTILOCUS EVOLUTION

Here we lay out a notation that is sufficiently flexible to account for the different modes of inheritance and evolutionary forces that motivate the model. The section starts by introducing the concepts on which the notation relies, shows how the notation can describe genotypes and populations, and then describes the relation between phenotypes and genotypes.

Contexts and Positions, Selection, and Transmission: It is useful to begin by defining the word “gene” to mean a particular copy of a nonrecombining sequence at some locus in some individual. Thus two different genes may or may not reside at the same locus, and if they do they may or may not be in the same allelic state. A gene at a given locus can be found in any of a number of situations: It might be carried by a male or a female, it might have been inherited from a mother or a father, or it might reside in one deme or another. We refer to this collection of qualities as the gene's context. Context is a key concept in our notation and it is important in two ways. First, it determines how evolutionary forces act on the gene. A selection coefficient, for example, may depend on whether the gene is carried by a male or a female. If there is genomic imprinting, then the selection coefficient will also depend on the sex of the individual from which the gene was inherited. Second, a gene's context affects how it is transmitted. Consider two autosomal loci in a diploid individual. If there is no recombination between the loci during meiosis, the resulting gamete will carry copies of genes that both descended from the individual's mother or both from its father; with recombination, the gamete will carry one gene from the individual's mother and one from the father.

The information we need to specify a gene's context varies between models. In a model of a spatially structured population, the context will include geographical information. Likewise, in a life history model the context specifies the life stage of the individual carrying the gene. The context will include the sex of the individual carrying a gene in a model with two sexes, but not in a model of a hermaphroditic population.

Loci are referred to by lowercase italic letters. The context is written in a series of subscripts whose elements carry the relevant information. In this article we use the convention that for diploid populations with two sexes, the first subscript of the context gives the sex of the individual carrying the gene, which we call its “sex of carrier.” The second subscript gives the sex of the parent from which it was inherited, its “sex of origin.” Sexes are denoted by “m” for male and “f” for female. For example, genes at a diploid locus i that are carried by females and that descended from a male (the female's father) are referred to as $i_{fm}$. There are then four possible contexts for genes at this locus. Contexts in hermaphrodites would not include the sex of carrier (since all individuals are the same sex), but would include the sex of origin to denote whether the gene was transmitted through an egg or a sperm. One would account for more than two sexes (as when modeling a plant population with tristyly) by simply allowing the subscripts to take more than two possible values. Subscripts can be added to denote other information, such as the deme or the family in which a gene resides.

We use the term position to refer to a particular locus in a particular context. An example of a position is the place in the genomes of females where genes inherited from males (their fathers) at locus i reside. Positions, like loci, are defined independently of the allelic states of the genes that reside there. With n diploid loci in a dioecious population, there are 4n positions that genes occupy (n loci × two sexes of carrier × two sexes of origin). Open-faced lowercase letters refer to single positions, for example 𝕚 = $i_{fm}$. Open-faced uppercase letters refer to sets of positions, e.g., 𝔸 = {𝕚, 𝕛}. A genome, denoted 𝔾, is the set of all positions in an individual. With a single diploid locus i, for example, the genome for a male is 𝔾 = {$i_{mm}$, $i_{mf}$}, and for a female it is 𝔾 = {$i_{fm}$, $i_{ff}$}. The notation is summarized in Table 1.

Two fundamental kinds of events that occur during the course of a generation are selection and transmission. “Selection” accounts for variation in the contribution of different genotypes to the next stage in the life cycle. Fitnesses are assigned to either individuals or groups, depending on the form of selection. The simplest case is viability selection, in which fitnesses are assigned to individuals. For sexual selection and assortative mating, we account for the relative contributions of mated pairs of male and female genotypes; here fitnesses are assigned to all possible kinds of pairs (BT91). Thus according to our use of the term, with nonrandom mating there can be selection even when all individual genotypes have equal survival, mating success, and fecundity. Group selection is described by assigning fitnesses to all possible combinations of genotypes that could comprise a group.

By “transmission” we mean an event that changes the context of a gene. A simple example is meiosis followed by syngamy: A gene that was carried by a female becomes a gene that was inherited from a female. The rules of transmission depend both on the gene's context and on the mode of inheritance obeyed by its locus (autosomal, Y-linked, cytoplasmic, etc.).

Describing genotypes and populations: The genotype of an individual at position i is represented by the indicator variable $X_i$. With just two alleles per locus, $X_i$ can take two values, which it is convenient to set at 0 or 1; for this special case, the frequency of allele 1 at position i is written $p_i$ and the frequency of allele 0 as $q_i = 1 - p_i$. A fact that is useful later is that under these conventions, the expected value of $X_i$ (averaging over all individuals in the population) is equal to $p_i$. When there are more than two alleles, we can choose any distinct values to distinguish the alleles. If we are considering alleles that have additive effects on a quantitative trait, it is convenient to set the values equal to their effects so that the expectation of $X_i$ equals the position's contribution to the mean value of the trait.

These conventions can be generalized if the situation demands it. When a position has pleiotropic effects on a set of k traits, the variable Xi becomes a vector of length k. In multiallelic models, vectors can also be used as an alternative to the scalar-value convention described in the last paragraph. We can define the indicator to be a vector of length equal to the number of alleles, all entries of which are zero except for the one corresponding to the allelic state. (This approach might be used in a model of genes with additive effects if, for example, alleles with the same effect mutate to other alleles at different rates.) Other conventions are also possible: For example, each position could be represented by a vector of length two, with the first giving the allelic effect and the second a label for the allele.

In general, an individual is represented by a vector X containing the values of his/her indicator variables for every position in the genome. It is useful below to extend this vector to include positions from more than one individual.

The genetic state of a population can be completely described by a set of statistical moments that we call associations. These include associations among genes within a haploid genome, which are conventionally referred to as linkage disequilibria. We use the more general term, however, since we are concerned with associations among arbitrary sets of positions, which may or may not be linked and may or may not be in a population that is at equilibrium. Indeed, we need to consider associations between positions that are in different individuals. The relation between different measures of linkage disequilibrium is summarized in the Discussion.

Our notation allows multilocus moments to be defined in a variety of ways. The key quantities that determine the moments are a set of reference values. There is one reference value for each position, and the reference value for position i is denoted $\wp_i$. (Note the distinction between an italic $p_i$, which denotes an allele frequency, and a curly $\wp_i$, which denotes a reference value.) To describe a population, we first make a change of variables, such that the allelic state of an individual gene at position i is measured relative to the reference value for that position, $\wp_i$:
$$\zeta_i = X_i - \wp_i \tag{1}$$
Choice of the reference values is up to the investigator. Typically it is useful to define $\wp_i$ as the expected value of $X_i$ among zygotes. The reference value then has a particularly simple meaning for biallelic models: if there are no differences in allele frequencies between positions at a locus (e.g., between males and females), the reference value for each position is equal to the frequency of allele 1 at that locus ($\wp_i = p_i$). The centered variable $\zeta_i$ then takes the values $1 - p_i$ and $-p_i$.

Next we define the product of all the ζ's in the set of positions A:
$$\zeta_A = \prod_{i \in A} \zeta_i \tag{2}$$
The symbol ∈ indicates that the product includes one term for each element in the set A. If we choose an individual at random from the population, then $\zeta_A$ is a random variable. The association between the alleles at the positions in set A is defined as the expectation of $\zeta_A$ taken over the whole population,
$$D_A = E_X[\zeta_A] \tag{3}$$
where $E_X[\cdot]$ denotes an expectation over the distribution of genotype frequencies. The $D_A$ are therefore moments, that is, measures of statistical association. As an example of the notation, the association between alleles at loci i and j in a diploid male, one inherited from his female parent (mother) and the other from his male parent (father), is written $D_{i_{mf} j_{mm}}$. Products over empty sets are defined to be 1, so that $D_\varnothing = 1$. The D's are the same as BT91's C's. Figure 1 illustrates the notation. [We assume here and below that the indicators $X_i$ have scalar values (0 or 1, say). If they are vectors, then the pairwise $D_{\{i,j\}}$ are matrices, and nth-order associations are tensors of rank n.]

Figure 1: A model of a dioecious population with four autosomal loci. Open circles are genes inherited from a female; solid circles are genes from a male. Three of the associations between the 16 positions are shown.
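Equations 1, 2, and 3 translate directly into code. The following is a minimal Python sketch, assuming genotypes are stored as tuples of 0/1 indicators (one entry per position) in a dictionary of frequencies; the function name `association` and the example frequencies are hypothetical. The reference values are deliberately not the allele frequencies, to show that Equation 3 does not require centering.

```python
from math import prod

def association(freqs, ref, A):
    """D_A = E_X[zeta_A], with zeta_i = X_i - ref_i (Equations 1-3).

    freqs: dict mapping genotype tuples (one 0/1 entry per position) to frequencies
    ref:   reference values, one per position
    A:     tuple of position indices; the empty tuple gives D_empty = 1
    """
    return sum(f * prod(x[i] - ref[i] for i in A) for x, f in freqs.items())

# Two positions i (index 0) and j (index 1) in a hypothetical population.
freqs = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}
ref = [0.5, 0.5]                       # arbitrary reference values
print(association(freqs, ref, ()))     # D_empty: 1, up to rounding
print(association(freqs, ref, (1,)))   # D_j = p_j - ref_j = 0.6 - 0.5
```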

The associations have particularly simple interpretations when the reference points are chosen to be the current allele frequencies ($\wp_i = p_i$). Moments for single positions vanish: $D_i = 0$. Associations between pairs of positions are equal to the covariance in the allelic state of genes at those positions. Departures from Hardy-Weinberg proportions are measured by D's involving pairs of positions at the same locus that have the same sex of carrier but different sexes of origin; for example, $D_{i_{mf} i_{mm}}$ for locus i in males. When there are no sex differences in allele frequencies, $D_{i_{mf} j_{mf}}$ is equal to the conventional measure of pairwise linkage disequilibrium (also called gametic phase disequilibrium) between loci i and j. Moments involving more than two loci are measures of higher-order associations in the population. Following the standard terminology for statistical moments, we say that the associations are “centered” when the reference values are set equal to the current allele frequencies.
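As a numerical illustration of these interpretations, the sketch below computes centered associations for the two positions $i_{mf}$ and $i_{mm}$ at one locus in males, stored as (allele from mother, allele from father) tuples. Both frequency tables are hypothetical: one is built by random union of maternal and paternal genes, the other has an excess of homozygotes. The pairwise association equals the covariance $x_{11} - p_{mf}\,p_{mm}$ and vanishes only in the first case.

```python
from math import prod

def association(freqs, ref, A):
    """D_A = E[prod_{k in A} (X_k - ref_k)] over a genotype-frequency table."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

def centered(freqs):
    """Reference values set to the current allele frequency at each position."""
    n = len(next(iter(freqs)))
    return [sum(f * x[k] for x, f in freqs.items()) for k in range(n)]

# Hypothetical male genotypes at locus i: (allele from mother, allele from father).
hw = {(1, 1): 0.3 * 0.6, (1, 0): 0.3 * 0.4, (0, 1): 0.7 * 0.6, (0, 0): 0.7 * 0.4}
excess_homozygotes = {(1, 1): 0.3, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.5}

results = {}
for name, freqs in (("random union", hw), ("excess homozygotes", excess_homozygotes)):
    p = centered(freqs)
    results[name] = (
        association(freqs, p, (0,)),         # D_{i_mf}: zero when centered
        association(freqs, p, (0, 1)),       # D_{i_mf i_mm}
        freqs[(1, 1)] - p[0] * p[1],         # covariance computed the classical way
    )
    print(name, results[name])
```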

Equations 1, 2, 3 provide a recipe for translating genotypic frequencies into a set of reference values ℘ and associations D that completely describe the genetic state of a population. The reverse translation is of course also possible. For example, with biallelic loci and the reference values defined to be equal to the current allele frequencies (℘i=pi), the frequency of genotype X is
$$f(X) = \prod_{i \in G}\left[X_i p_i + (1 - X_i) q_i\right] + \sum_{U \subseteq G}^{*} \left\{ D_U\,(-1)^{|U| - \sum_{i \in U} X_i} \prod_{i \in G \setminus U}\left[X_i p_i + (1 - X_i) q_i\right] \right\} \tag{4}$$
where $|U|$ means the number of positions in set U, and $G \setminus U$ stands for the positions in set G that are left after those in set U are taken away. The first term in Equation 4, a product that includes one term for each position in the genome, gives the genotype frequency that would be found in the absence of any associations. The second term accounts for the effects of the associations. Here G is the set of all positions in a genome whose sex of carrier is the same as that of X. Expressions more general than Equation 4 that allow for multiple alleles and arbitrary definitions for the reference values can be derived using results that are developed below.

Summations of the kind seen in the second term of Equation 4 make frequent appearances in this article. The sum includes one term for each possible subset U of positions in the set G, including G itself. When an asterisk appears, as in Equation 4, the sum does not include a term in which U equals the empty set, Ø. Thus, if G consists of the two positions i and j, the sum in Equation 4 will have three terms as U takes on the values {i}, {j}, and {i,j}. When the summation symbol is not followed by an asterisk, the sum does include a term in which U=∅.
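Equation 4, including the starred sum over nonempty subsets, can be checked by a round trip: compute the allele frequencies and centered associations from a genotype table, then rebuild every genotype frequency. This is a minimal Python sketch assuming three biallelic positions in one genome and arbitrary hypothetical frequencies; the function names are illustrative only.

```python
from itertools import combinations, product
from math import prod

def association(freqs, p, A):
    """Centered association D_A (Equations 1-3 with reference values p)."""
    return sum(f * prod(x[i] - p[i] for i in A) for x, f in freqs.items())

def genotype_frequency(x, p, D):
    """Equation 4: f(X) from allele frequencies p and centered associations D.

    D maps each nonempty tuple U of position indices to D_U.
    """
    n = len(x)
    # X_i p_i + (1 - X_i) q_i: frequency of the allele carried at position i
    base = [p[i] if x[i] == 1 else 1 - p[i] for i in range(n)]
    total = prod(base)                        # term with no associations
    for k in range(1, n + 1):                 # starred sum: nonempty U only
        for U in combinations(range(n), k):
            sign = (-1) ** (k - sum(x[i] for i in U))
            rest = prod(base[i] for i in range(n) if i not in U)  # positions in G \ U
            total += D[U] * sign * rest
    return total

# Hypothetical frequencies for all 8 genotypes at three positions.
genotypes = list(product((0, 1), repeat=3))
freqs = {x: w / 36 for x, w in zip(genotypes, range(1, 9))}
p = [sum(f * x[i] for x, f in freqs.items()) for i in range(3)]
D = {U: association(freqs, p, U) for k in (1, 2, 3) for U in combinations(range(3), k)}
rebuilt = {x: genotype_frequency(x, p, D) for x in genotypes}
print(max(abs(rebuilt[x] - freqs[x]) for x in genotypes))  # ~0, up to rounding
```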

The notation allows for more than two alleles per locus. It does become more complicated in that event, however, because the extra degrees of freedom require us to account for associations with repeated positions. With three alleles, for example, the allele frequencies at a position are described by the two variables Di and Dii. However, when there are only two alleles per locus, associations containing repeated positions can be expressed in terms of associations with no repeated positions. (This article focuses mainly on biallelic loci, which is perhaps not a severe restriction as loci can be defined as single-nucleotide sites. Readers who are interested in loci with multiple alleles should consult BT91, or the documentation with the Mathematica packages, for more details about those models.)

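Two basic equations for simplifying associations for biallelic loci are useful later. From Equations 1, 2, and 3, with the reference values equal to the allele frequencies ($\wp_i = p_i$),
$$D_{Uii} = p_i q_i D_U + (1 - 2p_i) D_{Ui}, \qquad \zeta_i^2 = p_i q_i + \zeta_i (1 - 2p_i) \tag{5}$$
(see BT91, Equation 5). Here and throughout, expressions of the form $AB$ stand for $A \cup B$, the union of sets A and B; thus $D_{Ui} = D_{U \cup i}$, etc.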
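A quick numerical check of Equation 5, assuming centered reference values and hypothetical two-position frequencies (allele frequency at position i is 0.6, so the factor $1 - 2p_i$ does not vanish). The helper `association` evaluates products with repeated indices directly, which is what $D_{Uii}$ denotes before simplification.

```python
from math import prod

def association(freqs, p, A):
    """D_A for a tuple A of position indices; repeated indices are allowed."""
    return sum(f * prod(x[k] - p[k] for k in A) for x, f in freqs.items())

# Hypothetical two-position population; reference values are the allele frequencies.
freqs = {(1, 1): 0.5, (1, 0): 0.1, (0, 1): 0.15, (0, 0): 0.25}
p = [sum(f * x[k] for x, f in freqs.items()) for k in range(2)]
i, j = 0, 1
pq_i = p[i] * (1 - p[i])

# zeta_i^2 = p_i q_i + zeta_i (1 - 2 p_i) holds for both allelic states.
for allele in (0, 1):
    z = allele - p[i]
    assert abs(z * z - (pq_i + z * (1 - 2 * p[i]))) < 1e-12

# D_{Uii} = p_i q_i D_U + (1 - 2 p_i) D_{Ui}, here with U = {j}.
lhs = association(freqs, p, (j, i, i))
rhs = pq_i * association(freqs, p, (j,)) + (1 - 2 * p[i]) * association(freqs, p, (j, i))
print(lhs, rhs)
```

With U empty, the same identity gives $D_{ii} = p_i q_i$, the variance of the allelic state at position i.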

The relation between phenotypes and genotypes: This notation is sufficiently flexible to allow for any relation between phenotypes and genotypes. Let Z be the value of a phenotypic character in an individual. This value can be written in general as a function of the individual's genotype,
$$Z(X) = \bar{Z} + \sum_{A \subseteq G} b_A (\zeta_A - D_A) + e_Z \tag{6}$$
where $\bar{Z}$ is the trait mean in the population and $e_Z$ is a random environmental component that is independent of genotype and that has mean 0. The $\zeta_A$ that appear on the right-hand side are calculated from the genotype vector X on the left using Equations 1 and 2. (Note that the term in the sum corresponding to the null set $A = \varnothing$ makes no contribution because $\zeta_\varnothing = D_\varnothing = 1$.)
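Equation 6 can be evaluated directly for each genotype. This Python sketch omits the environmental term $e_Z$ (so it computes genotypic values) and uses hypothetical b coefficients for two positions, including an epistatic term $b_{ij}$; the function name `genotypic_value` is illustrative. Because every term $\zeta_A - D_A$ has mean zero, the population mean of the computed values recovers $\bar{Z}$.

```python
from math import prod

def association(freqs, ref, A):
    """D_A = E[zeta_A] over a genotype-frequency table."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

def genotypic_value(x, zbar, b, ref, D):
    """Equation 6 without the environmental term e_Z."""
    return zbar + sum(bA * (prod(x[k] - ref[k] for k in A) - D[A]) for A, bA in b.items())

# Hypothetical two-position example.
freqs = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}
ref = [sum(f * x[k] for x, f in freqs.items()) for k in range(2)]   # centered
b = {(0,): 1.0, (1,): 0.5, (0, 1): 2.0}                             # b_i, b_j, b_ij
D = {A: association(freqs, ref, A) for A in b}
zbar = 10.0

values = {x: genotypic_value(x, zbar, b, ref, D) for x in freqs}
mean = sum(f * values[x] for x, f in freqs.items())
print(values)
print(mean)   # equals zbar up to rounding
```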

Equation 6 can describe any kind of genetic dominance, epistasis, sex differences in expression, genomic imprinting, etc. The relation between genotype and phenotype is determined by the choice of the coefficients $b_A$. If there is gene-by-environment interaction, the b's become functions of the state of the environment; it may be convenient to include that environmental state as a component of the context. Equation 6 also applies with multiple alleles, provided that the set G includes the appropriate number of repeated elements. For example, suppose that there are three alleles at locus i and two alleles at locus j. For haploid genotypes, the set G is then defined as {i, i, j}. The coefficients $b_i$, $b_j$, $b_{ii}$, $b_{ij}$, and $b_{iij}$ are then required to account for the 5 d.f., and the set A in the summation of Equation 6 ranges over all five distinct nonempty subsets of G.

The b coefficients can become quite numerous. For example, with just two biallelic loci each with four possible contexts (say, two sexes of carrier and two sexes of origin), there are $2^8 - 1 = 255$ possible b coefficients: 8 corresponding to single positions, 28 corresponding to pairs of positions, etc. The number of coefficients drops dramatically, however, in many cases. With no epistasis, genotype-phenotype relations can be fully described with only 8 distinct coefficients, while with a completely additive model (no dominance and no epistasis), only 4 distinct coefficients are needed.
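The counts quoted above follow from elementary combinatorics: there is one coefficient for each nonempty subset of the eight positions, and $\binom{8}{k}$ subsets of size k. A short Python check:

```python
from math import comb

n_positions = 2 * 4                        # two loci x four contexts
n_coefficients = 2 ** n_positions - 1      # one b per nonempty subset of positions
per_order = [comb(n_positions, k) for k in range(1, n_positions + 1)]
print(n_coefficients)    # 255
print(per_order[:2])     # [8, 28]: single positions, then pairs
```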

EVOLUTION BY SELECTION AND TRANSMISSION

Here we use the notation proposed above and results from BT91 to find how the genetic composition of a population changes over the course of a generation. We first show how selection and transmission change a population and then end with some statistical bookkeeping.

Selection: Fitness is just another phenotypic character. Consequently, Equation 6 applies when the trait under consideration is fitness. BT91 showed that this insight is useful because the b coefficients then take on special significance: They can be used to calculate how the genetic state of the population changes.

We noted earlier that fitness (that is, relative reproductive output) can depend on just the genotype of the individual (as with viability selection), on the genotype of a mated pair (as with any form of nonrandom mating), or on the genotypes of a larger group of individuals (as with kin or group selection). To describe the effects of selection, we need to consider together the genomes of the selection group, by which we mean the set of individuals that interact to determine their mutual fitness. With simple viability selection and random mating, the selection group is a single individual. Often it is useful to define the selection group to be a male and female mated pair, which allows for nonrandom mating as well as viability selection. The selection group can be expanded to more than two individuals to accommodate group selection.

We denote the set of all positions in a selection group as W. For example, take a selection group consisting of a male and female in a mated pair. With a single biallelic diploid locus i, the positions in the selection group are $W = \{i_{mm}, i_{mf}, i_{fm}, i_{ff}\}$. With three alleles, the set of positions is the same, but with each element appearing twice.

The absolute fitness of a selection group is defined to be the ratio of its frequency after selection to its frequency before. When a selection group consists of more than one individual, its “frequency” before selection is equal to the product of the frequencies of the genotypes of those individuals. Take, for example, a selection group consisting of a mated pair of male and female genotypes. The frequency before selection can often be taken as the product of the frequencies of the respective male and female genotypes, since the premating “groups” are equivalent to randomly chosen pairs of males and females. The frequency of the selection group after selection is the frequency with which those genotypes are found together among all mated pairs (weighting the pairs by their relative fecundities, if they differ). This representation of selection can account for viability selection within each sex as well as nonrandom mating and fecundity selection.

The genotype of the selection group is described by the vector X, which includes the allelic state for every position in every individual in the group. Denoting the frequencies of the group's genotype before and after selection as f(X) and f′(X), the group's absolute fitness as W(X), and the population's mean fitness as W̄, we see from Equation 6 that we can always write the expected relative fitness of the genotype of a selection group in the form
$$w(X) = \frac{W(X)}{\bar{W}} = \frac{f'(X)}{f(X)} = 1 + \sum_{U \subseteq W} a_U (\zeta_U - D_U) \tag{7}$$
(see BT91, Equation 6).

The coefficients $a_A$ defined by Equation 7 are called selection coefficients. The coefficient $a_A$ represents the force of selection acting on the positions in set A. These coefficients can account for any form of selection within individuals (including dominance, epistasis, and genomic imprinting) and any form of nonrandom mating. Note that selection coefficients defined this way typically depend on allele frequencies, associations, and reference values, even if the fitnesses of genotypes are constant (BT91). If phenotypes include environmental (nongenetic) components, then the frequencies $f(\cdot)$ and $f'(\cdot)$ represent expectations averaging over those components (see applications). Note that the selection coefficients defined by Equation 7 differ from those defined in BT91. Although similar in form, the fitness functions are not the same, and so selection coefficients from our system and that of BT91 cannot be interchanged.
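One way to obtain the $a_U$ numerically, sketched here for a hypothetical haploid population with two positions, is to write Equation 7 once per genotype and solve the resulting linear system. Because the frequency-weighted mean of $w(X)$ is 1, one genotype's equation is redundant and is dropped; this assumes every genotype has positive frequency, which makes the remaining square system nonsingular. The fitness values and helper names are illustrative assumptions, not the authors' procedure. Exact `Fraction` arithmetic lets the result be checked without rounding.

```python
from fractions import Fraction as F
from itertools import combinations
from math import prod

def solve(M, v):
    """Solve the square system M a = v by Gauss-Jordan elimination (exact Fractions)."""
    n = len(v)
    aug = [list(row) + [v[r]] for r, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        lead = aug[c][c]
        aug[c] = [entry / lead for entry in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [aug[r][n] for r in range(n)]

def zeta(x, U, p):
    """zeta_U for genotype x with reference values p (Equations 1 and 2)."""
    return prod((x[k] - p[k] for k in U), start=F(1))

# Hypothetical genotype frequencies and absolute fitnesses.
freqs = {(1, 1): F(4, 10), (1, 0): F(1, 10), (0, 1): F(2, 10), (0, 0): F(3, 10)}
W = {(1, 1): F(12, 10), (1, 0): F(1), (0, 1): F(1), (0, 0): F(9, 10)}

n = 2
p = [sum(f * x[k] for x, f in freqs.items()) for k in range(n)]     # centered
subsets = [U for k in range(1, n + 1) for U in combinations(range(n), k)]
D = {U: sum(f * zeta(x, U, p) for x, f in freqs.items()) for U in subsets}
Wbar = sum(f * W[x] for x, f in freqs.items())

rows = list(freqs)[:-1]                     # drop one redundant equation
M = [[zeta(x, U, p) - D[U] for U in subsets] for x in rows]
v = [W[x] / Wbar - 1 for x in rows]
a = dict(zip(subsets, solve(M, v)))
print(a)                                    # a_i, a_j and the epistatic a_ij

# Rebuilding Equation 7 from the a's reproduces W(X)/Wbar for every genotype.
w = {x: 1 + sum(a[U] * (zeta(x, U, p) - D[U]) for U in subsets) for x in freqs}
```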

Selection coefficients have simple interpretations. With biallelic loci, the coefficient $a_i$ measures the force of direct selection acting on position i to increase the frequency of allele 1. Selection coefficients with multiple subscripts indicate that those positions have nonadditive effects on fitness. For example, dominance at a locus i in diploid males is measured by $a_{i_{mf} i_{mm}}$. This coefficient measures the force of selection favoring allele 1 at locus i when it appears in two copies, one inherited from a female (the individual's mother) and the other from a male (the father). Nonadditive fitness interactions between loci are represented by selection coefficients that have multiple positions with the same sex of carrier. The selection coefficient $a_{i_{ff} j_{ff}}$, for example, measures the departure from additivity for the alleles at loci i and j that are carried by females and were inherited from females. The effects of nonrandom mating appear in selection coefficients that include both male and female sexes of carrier. When there are more than two alleles per locus, there are selection coefficients that have the same position repeated. [The notation can accommodate a continuum-of-alleles model in which there is an infinite number of alleles per locus, provided that fitness can be approximated by a polynomial function (see BT91). It may not be possible, however, to obtain a good approximation to a continuum-of-alleles model using a finite set of moments.]

Two points about Equation 7 are worth keeping in mind. First, if the phenotype contains an environmental component, the relative fitness w(X) is understood to mean the relative fitness averaged over that environmental variation. Second, the selection coefficients depend on how the selection group is defined. For example, with random mating the selection group can be defined as a single individual, and no selection coefficients that include both sexes of carrier appear. But if the selection group is defined as a mated pair, the fitness function of Equation 7 generates selection coefficients with both sexes of carrier even under random mating. This discrepancy is not a problem, though, since the alternative definitions of the selection group produce the same results provided that the chosen definition is used consistently throughout the calculations.

Given any set of assumptions about how genotypes (or phenotypes) affect lifetime fitness, Equation 7 can be used to calculate the corresponding selection coefficients. Appendix A presents a simple example of two loci under epistatic viability selection. When several selection events occur over the course of a generation, the job is made easier by calculating coefficients for each event in isolation and then combining them. For example, suppose that fitness is the product of viability through two stages of the life cycle, each represented by Equation 7 but with coefficients $b_U$ and $c_U$. For biallelic loci, the overall selection coefficient is
$$a_U = \sum_{V \subseteq U} b_V\, c_{U \setminus V} \tag{8}$$
[To prove Equation 8, write $w_a(X) = w_b(X)\,w_c(X)$, expand each of the w's using Equation 7, use Equation 5 to eliminate products of ζ's, and finally match the corresponding coefficients of the ζ's on the right and left sides.] In the event of weak selection, the situation can be simplified further by approximation: If the selection coefficients b and c are of order s, then $a_U = b_U + c_U$ to leading order in s.

Given the selection coefficients, we can determine the state of a population following the selection event. BT91 showed that the new allele frequencies and associations are given by
$$D'_A = D_A + \sum_{U \subseteq W} a_U (D_{AU} - D_U D_A) \tag{9}$$
This is our main result for the effects of selection. We see that the change in the associations for positions in set A, represented here by the second term on the right, is equal to a sum of all the selection coefficients acting on sets of positions in the population, weighted by the association between those positions and the ones in set A.
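Equation 9 can be checked against a direct calculation: reweight each genotype by its relative fitness, $f'(X) = w(X)\,f(X)$, and recompute the associations while keeping the old reference values. This Python sketch assumes a hypothetical haploid population with two positions and hypothetical selection coefficients plugged into Equation 7 (small enough that every w(X) is positive). Repeated positions in $D_{AU}$ are evaluated directly as products of ζ's rather than reduced with Equation 5.

```python
from math import prod

def association(freqs, ref, A):
    """D_A with ref held fixed; A may repeat indices (products of zetas)."""
    return sum(f * prod(x[k] - ref[k] for k in A) for x, f in freqs.items())

# Hypothetical haploid population with positions 0 and 1.
freqs = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}
ref = [sum(f * x[k] for x, f in freqs.items()) for k in range(2)]   # old reference values
a = {(0,): 0.05, (1,): -0.02, (0, 1): 0.1}   # hypothetical selection coefficients

def relative_fitness(x):
    """Equation 7 with the coefficients a."""
    return 1 + sum(aU * (prod(x[k] - ref[k] for k in U) - association(freqs, ref, U))
                   for U, aU in a.items())

# Direct route: f'(X) = w(X) f(X).
after = {x: f * relative_fitness(x) for x, f in freqs.items()}

def equation_9(A):
    """D'_A = D_A + sum_U a_U (D_AU - D_U D_A)."""
    def D(B):
        return association(freqs, ref, B)
    return D(A) + sum(aU * (D(A + U) - D(U) * D(A)) for U, aU in a.items())

for A in [(), (0,), (1,), (0, 1)]:
    print(A, association(after, ref, A), equation_9(A))
```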

Equation 9, which gives the new moments in terms of the old reference values, can be used to calculate changes in allele frequencies caused by selection. If we choose the reference values to be the allele frequencies before selection ($\wp_i = p_i$), then the change in allele frequency at position i is equal to $D'_i$. With two alleles per locus, Equation 9 gives
$$\Delta p_i = D'_i = \sum_{U \subseteq W} a_U D_{Ui} = a_i\, pq_i + \sum_{U \subseteq W,\; U \neq i} a_U D_{Ui}, \tag{10}$$
where
$$pq_U \equiv \prod_{i \in U} p_i q_i. \tag{11}$$
On the right side of Equation 10, the first term represents selection acting directly on alleles at position i. The second term represents the effects of indirect selection: the force of selection acting on other positions that is transmitted to position i through the associations.
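For two biallelic positions, Equation 10 can be written out by hand. Using the reduction $\zeta_1^2 = pq_1 + (q_1 - p_1)\zeta_1$ (valid when the allelic states are 0 and 1), the set U = {2} contributes $a_2 D_{12}$ and U = {1, 2} contributes $a_{12}(q_1 - p_1)D_{12}$. The Python sketch below checks that closed form against brute-force reweighting of genotype frequencies; the numbers are hypothetical, and the fitness form $w = 1 + a_1\zeta_1 + a_2\zeta_2 + a_{12}(\zeta_1\zeta_2 - D_{12})$ is an assumption consistent with Equation 9.

```python
# Hypothetical haploid genotype frequencies (X1, X2) and selection coefficients.
freqs = {(0, 0): 0.40, (0, 1): 0.15, (1, 0): 0.10, (1, 1): 0.35}
a1, a2, a12 = 0.02, -0.01, 0.03

p1 = freqs[(1, 0)] + freqs[(1, 1)]
p2 = freqs[(0, 1)] + freqs[(1, 1)]
q1 = 1 - p1
D12 = sum(f * (g[0] - p1) * (g[1] - p2) for g, f in freqs.items())

def w(g):
    """Assumed relative fitness, centered so that its mean is 1."""
    z1, z2 = g[0] - p1, g[1] - p2
    return 1 + a1 * z1 + a2 * z2 + a12 * (z1 * z2 - D12)

# Brute force: allele frequency after selection minus before.
dp1_direct = sum(f * w(g) * g[0] for g, f in freqs.items()) - p1
# Equation 10: the direct term a1*pq1 plus indirect terms carried by D12.
dp1_eq10 = a1 * p1 * q1 + a2 * D12 + a12 * (q1 - p1) * D12
print(dp1_direct, dp1_eq10)
```

Setting `D12` to zero (linkage equilibrium) removes both indirect terms, leaving only direct selection on position 1.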

Equation 10 gives an exact expression for the change in allele frequency at position i caused by selection. If all positions at locus i are equivalent, then this is equal to the change in allele frequency at that locus. If not, the overall change at locus i is found by averaging Δpi over all the positions at that locus. (Note, however, that the average allele frequency is not sufficient to fully describe the population.)

Transmission: “Transmission” refers to an event that changes the contexts of genes. Obvious examples are meiosis, where a gene carried by a diploid individual becomes a gene in a haploid individual (the gamete), and fertilization, where the reverse transition happens. Migration can also be considered as a form of transmission, since genes change their context as they move from one location to another. The effects of transmission on the state of the population are determined by the transmission coefficients. The transmission coefficient tA←B is defined simply as the probability that the positions in set A were inherited from positions in set B. (Note that this is generally not the same as the probability that the positions in set B are transmitted to set A.)

To clarify the meaning of these coefficients, consider autosomal loci in a haploid population with two sexes. The transmission coefficient tim←if is the probability that a gene at locus i in a male was inherited from a gene at locus i in a female and is therefore ½. The transmission coefficient t{im,jm}←{if,jf} is the probability that the genes at loci i and j in a male were both inherited from a female (the mother), which is (1 – rij)/2, where rij is the recombination rate between loci i and j.

There are three constraints on transmission coefficients. First, transmission coefficients are zero unless each position in set A has a corresponding position in set B from which it descended. This implies that sets A and B must be equal when the context information is stripped from all of their positions; that is, $t_{A \leftarrow B} = 0$ unless A and B refer to the same loci. (For example, $t_{i_{ff} \leftarrow j_{fm}} = 0$ because $i \neq j$; a gene at locus i cannot be descended from a gene at locus j.) Second, the coefficients representing transmission to any given set A must sum to 1: $\sum_{B:\,\underline{B}=\underline{A}} t_{A \leftarrow B} = 1$, where the sum runs over all sets B that refer to the same loci as A. A third constraint applies when transmission represents recombination, segregation, and/or syngamy: the sex of origin for each position in set A must equal the sex of carrier for the corresponding position in set B, since that is the sex of the parent from which a gene in set A descended.

Transmission coefficients often involve recombination between groups of more than two loci. We use rA to denote the probability that recombination occurs somewhere in the set of loci A; that is, the alleles at those loci passed to a gamete are a mixture of those inherited from the individual's mother and father. Table 2 gives the transmission coefficients for several cases of interest, including autosomes, X-linked loci, and cytoplasmic factors.

Many models assume that there is no genetic variation for the rules of transmission. In that case, the effect of transmission on the moments (the allele frequencies and associations) that describe a population is straightforward. Equation 3 implies that the moments after transmission, $D''_A$, are then just a linear combination of the moments before, $D'_A$. When the reference values are chosen to be equal for positions at each locus, the effect of transmission is particularly simple:
$$D''_A = \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, D'_U. \tag{12}$$
This is our main result for the effects of transmission. The summation is over all sets of positions U that could become set A following transmission. The notation “$U:\underline{U}=\underline{A}$” means that U and A must be equal when the context information is stripped from them. (Taking the example of dioecious diploids, with $A = i_{fm}$, the sum in Equation 12 has four terms as U takes the values $i_{ff}$, $i_{fm}$, $i_{mf}$, and $i_{mm}$.) This requirement follows from the first constraint on transmission described above. Equation 12 needs modification if different positions at the same locus have different reference values, as discussed in Changing reference values below. The two-locus example presented in Appendix A shows how transmission coefficients are used in calculating changes in allele frequencies.
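The sketch below illustrates Equation 12 for the haploid, two-sex example used above to introduce transmission coefficients: two autosomal loci i and j, with $t = (1 - r_{ij})/2$ for each same-sex source set and $r_{ij}/2$ for each mixed-sex source set. It assumes random mating (so cross-sex moments factor) and a single reference value per locus shared by both sexes; the haplotype frequencies, recombination rate, and reference values are hypothetical. A brute-force enumeration of matings and meiotic products gives the same offspring association.

```python
from itertools import product

# Hypothetical haplotype frequencies (X_i, X_j) in adult females and males.
female = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}
male = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.30, (1, 1): 0.20}
r = 0.1               # recombination rate r_ij
ref = (0.5, 0.5)      # one reference value per locus, shared by all positions

def moment(dist, loci):
    """E[prod_{k in loci} (X_k - ref_k)] over a haplotype distribution."""
    total = 0.0
    for h, f in dist.items():
        term = f
        for k in loci:
            term *= h[k] - ref[k]
        total += term
    return total

# Brute force: random union of a female and a male gamete, then meiosis.
offspring = {}
for (hf, ff), (hm, fm) in product(female.items(), male.items()):
    meiotic_products = [(hf, (1 - r) / 2), (hm, (1 - r) / 2),
                        ((hf[0], hm[1]), r / 2), ((hm[0], hf[1]), r / 2)]
    for child, prob in meiotic_products:
        offspring[child] = offspring.get(child, 0.0) + ff * fm * prob

brute = moment(offspring, (0, 1))

# Equation 12: sum of t_{A<-U} D'_U over the four possible source sets U.
# Under random mating the cross-sex moments are products of single-sex moments.
eq12 = ((1 - r) / 2 * moment(female, (0, 1))
        + (1 - r) / 2 * moment(male, (0, 1))
        + r / 2 * moment(female, (0,)) * moment(male, (1,))
        + r / 2 * moment(male, (0,)) * moment(female, (1,)))
print(brute, eq12)
```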

Equation 12 can be easily generalized to allow different genotypes to follow different transmission rules. Examples include cases where there is meiotic drive or genetic variation in recombination rates. As in Equation 6, we write the transmission coefficients as a polynomial function of genotype,
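$$t_{A \leftarrow U}(X) = \bar{t}_{A \leftarrow U} + \sum_{V \subseteq T} \delta t_{A \leftarrow U \mid V} \left(\zeta_V - D'_V\right), \tag{13}$$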
where T is the set of all positions that influence transmission. The transmission coefficient $\bar{t}_{A \leftarrow U}$ is the mean probability that genes at positions A were inherited from positions U, averaged over all genotypes. The coefficient $\delta t_{A \leftarrow U \mid V}$ represents the effect of the set of positions V on the transmission coefficient $t_{A \leftarrow U}$, in the same way that the selection coefficient $a_V$ represents the effects of the set V on fitness. To find the effects of transmission on the associations, substitute (13) into (12) and then average over all genotypes:
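$$D''_A = E_X\!\left[\sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}(X)\, \zeta_U\right] = \sum_{U:\,\underline{U}=\underline{A}} \bar{t}_{A \leftarrow U}\, D'_U + \sum_{U:\,\underline{U}=\underline{A}} \sum_{V \subseteq T} \delta t_{A \leftarrow U \mid V} \left(D'_{UV} - D'_U D'_V\right) \tag{14}$$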
(cf. Barton 1995, Equation 3). This equation can be used to study the evolution of alleles that modify the transmission system, for example, by altering recombination rates or the breeding system. Equation 14 closely parallels Equation 9, which describes the effect of selection. One can think of selection as a special form of genotype-dependent transmission, where the transmission is between corresponding positions at consecutive stages in the life cycle and where the transmission coefficients are just the relative fitnesses.

Changing reference values: As described earlier, our system of describing a population is defined relative to a set of reference values. The investigator is free to leave these fixed or to change them as often as desired. It is often convenient, however, to change the reference values once per generation. By updating the reference values to the current allele frequencies, the associations have simple interpretations, and we can calculate the per-generation changes in allele frequencies. Moreover, updating only once per generation avoids a proliferation of algebra involving reference values at intermediate stages that eventually cancel. Sometimes it is convenient to update the reference values at the zygote stage. Alternatively, it may be easiest to update them before transmission, since under normal meiosis (no meiotic drive, etc.) allele frequencies are unchanged and the change in associations caused by transmission often takes a simple form when the associations have already been centered. If we are interested only in finding the evolutionary equilibrium, changing reference values from one generation to the next is not an issue.

Changing the reference values changes the associations D, because the latter are defined in terms of the former. Denote the associations before and after the change as DA″ and DA″′, respectively, and the reference values before and after as ℘i″ and ℘i″′. (If the reference values have not been changed since the start of the generation, then ℘i″=℘i.) The associations after the reference values change are found using Equations 1, 2, 3:
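$$\begin{aligned}
D'''_A &= E_X\!\left[\prod_{i \in A} \left(X_i - \wp'''_i\right)\right] = E_X\!\left[\prod_{i \in A} \left\{\left(X_i - \wp''_i\right) + \left(\wp''_i - \wp'''_i\right)\right\}\right] \\
&= \sum_{U \subseteq A} \prod_{i \in U} \left(\wp''_i - \wp'''_i\right) E_X\!\left[\prod_{j \in A \setminus U} \left(X_j - \wp''_j\right)\right] = \sum_{U \subseteq A} \left[D''_{A \setminus U} \prod_{i \in U} \left(\wp''_i - \wp'''_i\right)\right].
\end{aligned} \tag{15}$$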
Because the sum in (15) is not asterisked, it includes the term corresponding to U=∅, which is DA″. This last expression is the main result for changing reference values.
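A minimal Python check of Equation 15 with three hypothetical loci: each association is computed directly against the new reference values and again by re-expanding the associations measured against the old ones. The genotype frequencies and both sets of reference values are arbitrary illustrative numbers.

```python
from itertools import combinations, product

# Hypothetical three-locus haploid genotype frequencies.
genotypes = list(product((0, 1), repeat=3))
freqs = dict(zip(genotypes, [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.05, 0.30]))

old_ref = (0.5, 0.4, 0.6)    # hypothetical reference values before the change
new_ref = (0.45, 0.5, 0.55)  # hypothetical reference values after the change

def D(A, ref):
    """Association E[prod_{i in A} (X_i - ref_i)]; the empty set gives 1."""
    total = 0.0
    for g, f in freqs.items():
        term = f
        for i in A:
            term *= g[i] - ref[i]
        total += term
    return total

def D_eq15(A):
    """Equation 15: sum over U in A of D''_{A minus U} * prod_{i in U} (old_i - new_i)."""
    total = 0.0
    for k in range(len(A) + 1):
        for U in combinations(A, k):
            rest = tuple(i for i in A if i not in U)
            shift = 1.0
            for i in U:
                shift *= old_ref[i] - new_ref[i]
            total += D(rest, old_ref) * shift
    return total

all_sets = [A for k in range(4) for A in combinations(range(3), k)]
for A in all_sets:
    print(A, round(D(A, new_ref), 12), round(D_eq15(A), 12))
```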

The previous section notes that the transmission equation (12) does not hold when different positions at the same locus have different reference values (that is, $\wp_i \neq \wp_j$ for two positions i and j at the same locus). In that case, the associations between positions before transmission must be adjusted to the reference values for the positions that those genes will occupy after transmission:
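$$D''_A = \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U} \sum_{V \subseteq U} \left[D'_{U \setminus V} \prod_{i \in V} \left(\wp_i - \wp^{*}_i\right)\right], \tag{16}$$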
where ℘i∗ is the reference value for the position in set A that corresponds to i in set U.

With the exception of random drift, all of genetic evolution can be concisely represented by two equations: Equation 9 for the effects of selection and other deterministic forces, and Equation 12 or 16 for the effects of transmission. These can be supplemented by Equation 15, which does the bookkeeping needed to ensure that the associations have a simple interpretation. Other deterministic forces, like mutation and migration, can also be described by these equations: mutation can be represented as a form of frequency-dependent selection, and migration as a form of transmission (since genes change their contexts when they move). It is easier, though, to find the effects of mutation directly, which we do below. First, however, we develop an approximation that greatly simplifies the equations for selection and transmission.

THE QLE APPROXIMATION

The recursions derived above can be used to calculate the exact dynamics for a wide range of multilocus population genetic models. Although this approach may give more insight than directly following genotype frequencies, it will not necessarily be any more tractable. That is because exact results require following the dynamics of the same number of variables, regardless of whether they are genotype frequencies or moments (that is, allele frequencies and associations). One of the great appeals of the moment-based approach introduced by BT91 is that in some situations expressions for the associations can be greatly simplified by approximation. In this section, we derive approximate expressions for the associations and changes in allele frequencies when the population is in a state of QLE. The concept was introduced by Kimura (1965) and greatly generalized by Nagylaki (1993) and Nagylaki et al. (1999); a concise summary of those results is given in Bürger (2000, p. 82).

The first fundamental assumption we must make is that all the associations D are of order a, by which we mean that they are not larger than a constant factor times the largest of the a's. BT91 shows that this condition is met when the forces that generate associations within a sex (epistasis, migration, etc.) are weak relative to recombination and when nonrandom mating is not strong. An intuitive justification is that the associations are produced by evolutionary forces that are of order a (see Equation 9) and will not accumulate to values that are much larger than that if the forces breaking them down (recombination, segregation, and mutation) are sufficiently strong. The second assumption needed for the QLE approximation is that all the selection coefficients a are ≪1. BT91 shows that when these two conditions hold, a population rapidly settles into a state where the allele frequencies are changing slowly, and the associations are close to the equilibrium values they would reach if the allele frequencies were in fact stationary (see also Nagylaki 1993). We can then neglect terms involving higher powers of the a's and also higher powers of the D's (because they are of order a). Furthermore, the effects of a series of events of selection, migration, and mutation can be added together, provided they are each of order a (Kirkpatrick and Servedio 1999).

Approximations for the associations: We assume that there are two alleles at each locus, which simplifies the analysis. The approach can be extended to multiple alleles following the lead of BT91. The main results developed below are illustrated with a simple two-locus example in Appendix A.

Consider a life cycle in which we define the reference values to be the allele frequencies at the zygote stage. A series of selection events occurs during the course of the generation. The generation ends with transmission, creating the zygotes for the next generation. We seek to derive an approximation for the dynamics of allele frequencies that is accurate up to (and including) terms of order $a^2$, which we denote $O(a^2)$. From Equation 10, we see that this in turn requires an approximation for the associations D that is accurate to order a. To get one, we find the values that give an equilibrium for the recursion equations for the D that are accurate to order a; those solutions are our QLE approximations for the associations. The results apply not just to selection but also to other deterministic forces that generate associations (such as migration), so long as they are weak relative to recombination and segregation. To simplify the derivations, we assume that there is no genetic variation in the transmission coefficients, an assumption that could be relaxed (see Barton 1995). However, we must assume that the transmission coefficients are sufficiently large that forces of order a do not eventually generate strong associations. (This requires that the largest of the absolute values of the selection coefficients a is much smaller than the smallest of the absolute values of the eigenvalues of the matrix of transmission coefficients $t_{A \leftarrow U}$.)

To begin deriving an approximate recursion for the D, Equation 9 gives the cumulative effect of selection and other deterministic forces on the associations between positions in set A,
$$D'_A = D_A + \sum^{*}_{U \subseteq W} a_U D_{UA} + O(a^2) = D_A + a_A\, pq_A + O(a^2), \tag{17}$$
where A is a set of distinct positions and pqA is defined by Equation 11. The asterisked Σ* indicates that the sum does not include the term with U=∅; it has been separated out to give the first term, DA.

The first step of Equation 17 follows from Equation 9 because the D are of order a, so the term $a_U D_U D_A$ in Equation 9 is of order $a^3$ and can be neglected. The second step follows because the term $a_U D_{UA}$ in the first line is of order $a^2$ except when $U = A$, in which case the reduction formula (Equation 5) gives $D_{UA} = D_{AA} = pq_A + O(a)$.

The effect of changing reference values can also be simplified. Equation 10 shows that the change in the allele frequencies is of order a. If we define the reference values to be the allele frequencies, then the quantities $(\wp''_i - \wp'''_i)$ that appear in the product in Equation 15 are of order a. That equation therefore reduces simply to $D'''_A = D''_A + O(a^2)$, meaning that the effect of updating the reference values can be neglected. Assume that differences between positions at the same locus are O(a), which holds under normal sexual inheritance. With help from Equations 12 and 17, we then get the full recursion for the associations over an entire generation:
$$D'''_A = \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U} \left(D_U + a_U\, pq_U\right). \tag{18}$$
On setting $D'''_A = D_A \equiv \tilde{D}_A$, we get a QLE approximation for the associations that is accurate to $O(a)$:
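$$\tilde{D}_A = \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, \tilde{D}_U + \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, a_U\, pq_U. \tag{19}$$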
The first sum on the right is zero if the positions in A include more than one sex of origin. That is because if A includes more than one sex of origin, then U in the first sum would have to include more than one sex of carrier. But $D_U$ is of order $a^2$ if the positions in U include both sexes of carrier, since it represents associations between alleles in two randomly chosen zygotes.

Equation 19 is the main result of this section. It gives the solutions for the associations implicitly: the QLE value $\tilde{D}_A$ on the left side depends on the QLE values for the other associations, which appear as $\tilde{D}_U$ on the right side. Because the transmission Equation 12 is linear, so is this relationship, and the $\tilde{D}$ can always be found with standard matrix algebra, given a set of transmission rules that specify the t's, a set of allele frequencies from which to calculate $pq_U$, and a set of selection coefficients a. We have implicitly assumed here that the selection coefficients are constant in time, but the approach can be generalized to changing environments (see BT91, Appendix B; Barton 1995, Appendix 4). Briefly, the associations are then determined by a time average of the selection coefficients that are discounted by terms like $\exp(-r_A t)$. If, however, the environment changes on a time scale that is slow relative to the rate at which associations are changed by transmission, then the results given above apply.
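Because Equation 19 is linear in the $\tilde{D}$, stacking the unknowns into a vector gives $(I - T)\tilde{D} = b$, where T holds the coefficients $t_{A \leftarrow U}$ that multiply other QLE associations and b holds the second sum. The Python sketch below (using NumPy's `numpy.linalg.solve`) solves this system for a hypothetical haploid example with two sexes, random mating, and no sex differences, where only the within-sex sources $A_f$ and $A_m$ contribute. In this special case the solution should reduce to $(1 - r_A)\,a\,pq_A / r_A$ for both sexes.

```python
import numpy as np

def solve_qle(T, b):
    """Solve the linear system D = T @ D + b, i.e. (I - T) D = b."""
    return np.linalg.solve(np.eye(T.shape[0]) - T, b)

# Hypothetical haploid, two-sex example with random mating and no sex differences:
# the unknowns are the QLE associations for set A in female and in male zygotes.
r = 0.2                               # recombination rate r_A
a = 0.01                              # within-sex selection coefficient a_U (hypothetical)
pq = (0.5 * 0.5) * (0.3 * 0.7)        # pq_A for allele frequencies 0.5 and 0.3
t = (1 - r) / 2                       # t_{A <- A_f} = t_{A <- A_m}
T = np.array([[t, t],
              [t, t]])
b = np.full(2, 2 * t * a * pq)        # second sum of Equation 19 over U = A_f and U = A_m
D_tilde = solve_qle(T, b)
print(D_tilde)
```

For larger systems (diploids, several sexes of origin and carrier) the same call applies; only the bookkeeping that fills T and b grows.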

The next two sections illustrate how to do this by carrying out the calculations for autosomal genes in dioecious haploids and for autosomal, sex-linked, and cytoplasmic genes in diploids.

Autosomal genes in haploids: The QLE approximation for autosomal inheritance in a haploid population with two sexes was found by BT91. This section rederives their result to illustrate the new notation and how to use Equation 19 to find a QLE approximation.

The context for each gene now contains only its sex of carrier. That is because an individual carries only one gene at each locus, rather than the two that must be distinguished in the case of diploids. The comments following Equation 19 imply that for this case the first sum on its right side reduces to $(t_{A \leftarrow A_f}\tilde{D}_{A_f} + t_{A \leftarrow A_m}\tilde{D}_{A_m})$, where $A_f$ stands for set A with the sexes of carrier for all its positions converted to f, and similarly for $A_m$. The associations among a set of autosomal loci are equal in male and female zygotes, so $\tilde{D}_{A_f} = \tilde{D}_{A_m}$, and further those quantities must be equal to $\tilde{D}_A$ on the left side of Equation 19 because all the positions in A must have the same sex of carrier. With no sex differences in recombination we have $t_{A \leftarrow A_f} = t_{A \leftarrow A_m} = (1 - r_A)/2$, where $(1 - r_A)$ is the probability that the loci in set A are not broken apart by recombination, and the factor of ½ accounts for the probability that genes in set A were inherited from a given parent.

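Putting those facts together, Equation 19 becomes $\tilde{D}_A = (1 - r_A)\tilde{D}_A + \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, a_U\, pq_U$; solving for $\tilde{D}_A$ gives the QLE approximation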
$$\tilde{D}_A = \frac{1}{r_A} \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, a_U\, pq_U + O(a^2). \tag{20}$$
This is equivalent to BT91's Equation 25. The superficial differences between the two stem from three changes in notation. First, their result is expressed in terms of recombination rates rather than transmission rates. Second, BT91 separately defined within-male, within-female, and between-sex (nonrandom mating) selection coefficients; all of these are included in the sum on the right side of Equation 20. Last, they counted separately selection coefficients with different permutations of the same set of positions, which generates the combinatorial terms in their expression.
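To see the QLE approximation at work, the sketch below iterates the exact haplotype-frequency recursion for two loci in a haploid population in which selection is identical in the two sexes and mating is random, and compares the linkage disequilibrium at the zygote stage with Equation 20. Under those assumptions the sum in Equation 20 reduces to $(1 - r)\,a_{ij}\,pq_i\,pq_j$. The fitness function $W = 1 + eX_iX_j$ is a hypothetical choice; for it, expanding $X_iX_j$ in centered variables (and assuming the Equation 7 form $w = 1 + \sum^{*} a_U(\zeta_U - D_U)$) gives $a_{ij} = e/\bar{W}$. Agreement is only approximate, with relative errors of the order of the selection coefficients.

```python
import numpy as np

e, r = 0.02, 0.5     # hypothetical epistasis and recombination rate

def ld(x):
    """Linkage disequilibrium D = x11*x00 - x01*x10, with x = [x00, x01, x10, x11]."""
    return x[3] * x[0] - x[1] * x[2]

def generation(x):
    W = np.array([1.0, 1.0, 1.0, 1.0 + e])      # W(X) = 1 + e*X_i*X_j
    xs = x * W / np.dot(x, W)                    # selection (identical in both sexes)
    # Random union of gametes followed by meiosis: D is reduced by the factor (1 - r).
    return xs + r * ld(xs) * np.array([-1.0, 1.0, 1.0, -1.0])

def qle_D(x):
    """Equation 20 for this model: (1 - r) a_ij pq_i pq_j / r, with a_ij = e / W-bar."""
    p1, p2 = x[2] + x[3], x[1] + x[3]
    a12 = e / (1.0 + e * x[3])                   # W-bar = 1 + e * x11
    return (1 - r) * a12 * p1 * (1 - p1) * p2 * (1 - p2) / r

x = np.full(4, 0.25)                             # start at p = 1/2 with D = 0
for _ in range(30):
    x = generation(x)
rel_err = abs(ld(x) - qle_D(x)) / qle_D(x)
print(ld(x), qle_D(x), rel_err)
```

Associations converge to their QLE values at rate (1 - r) per generation, which is why a few tens of generations suffice here even though allele frequencies are still changing.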

Autosomal, sex-linked, and cytoplasmic genes in diploids: Now consider autosomal genes in a diploid population with two sexes. The context for a gene now includes both its sex of carrier and sex of origin. We allow for nonrandom mating and sex differences in selection and recombination. To simplify the calculation, however, we assume that there is no genetic variation in recombination rates and no genomic imprinting (that is, an allele's sex of origin does not affect its expression). The approach outlined here can be directly extended to allow for more than two sexes, as might be appropriate to describe a population with partial selfing.

Careful consideration of Equation 19 shows that the associations fall into three cases. Case 1 comprises associations among a set of positions A that includes both sexes of carrier. The QLE approximation for these associations is simply $\tilde{D}_A = 0$, because they represent associations between genes in two or more randomly chosen zygotes.

Case 2 comprises the associations among a set of positions that all have the same sex of carrier, but some have a male and others a female sex of origin. This kind of association exists for some sets of positions (for example, autosomal loci) but not for others (for example, sets containing only cytoplasmic loci). Where such associations exist, Equation 19 gives the QLE approximation
$$\tilde{D}_A = \sum_{U:\,\underline{U}=\underline{A}} t_{A \leftarrow U}\, a_U\, pq_U. \tag{21}$$
These associations come from nonrandom mating in the previous generation: associations between genes with different sexes of origin within an individual appear when there are correlations between the genotypes of mating males and females in the previous generation. These associations, which include Hardy-Weinberg disequilibria (an excess or deficit of heterozygotes), are zero under random mating because then the selection coefficients for positions with both sexes of carrier are of order $a^2$. The transmission coefficient $t_{A \leftarrow U}$ can be translated into recombination rates according to the way that the genes in set A are inherited, as discussed above in the section on transmission.

Case 3, the last category of association, is when all positions in A have the same sex of carrier and all have the same sex of origin. Here $D_A$ represents an association among genes within a single individual that were inherited from the same parent. The QLE approximations for this case depend on how the genes in set A are inherited. They can be calculated by first writing out Equation 19 for whichever of the four possible associations $\tilde{D}_{A_{ff}}$, $\tilde{D}_{A_{fm}}$, $\tilde{D}_{A_{mf}}$, and $\tilde{D}_{A_{mm}}$ exist under the given mode of inheritance, where, for example, $A_{fm}$ means that all positions in set A have a female sex of carrier and a male sex of origin. Inspection of the transmission coefficients reveals that these expressions do not depend on any associations that do not exist (e.g., $\tilde{D}_{A_{fm}}$ does not depend on $\tilde{D}_{A_{mm}}$ when all genes in set A are X-linked, because $t_{A_{fm} \leftarrow A_{mm}} = 0$). Last, solve the resulting equations. That procedure leads to the following results for autosomal, X-linked, Y-linked, and cytoplasmic genes. The transmission coefficients used in the calculations are shown in Table 3.

When all the genes in set A are autosomal, all four of the possible case 3 associations exist. Solving Equation 19 then gives
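$$\tilde{D}_{A_{fx}} = \tilde{D}_{A_{mx}} = \frac{pq_A}{r_{A_f} + r_{A_m}} \Big\{ \left[a_{A_x} + F(A_{xx})\right]\left(1 + r_{A_{\sim x}}\right) + \left[a_{A_{\sim x}} + F(A_{x\sim x})\right]\left(1 - r_{A_x}\right) \Big\}, \tag{22}$$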
where
$$F(A_{xy}) = \sum^{*}_{S+T=A} t_{A_{xy} \leftarrow \{S_{yf},\, T_{ym}\}}\; a_{S_f T_m}. \tag{23}$$
Here $r_{A_x}$ is the recombination rate for set A in sex x, where x can take the values m and f, and $\sim x$ stands for the opposite sex of x (for example, if x = f then $\sim x$ = m). The term F(·) results from nonrandom mating, which creates associations between alleles inherited from different parents in the next generation. These alleles are brought together in single gametes by recombination, producing associations within the same gametic genome (i.e., linkage disequilibria) two generations later. The summation in (23) is over all the different ways that the set of loci A can be partitioned into two nonempty sets. With A = {i, j, k}, for example, the sum includes six terms: S = {i} and T = {j, k}, S = {j, k} and T = {i}, S = {j} and T = {i, k}, etc. F(·), which appears in results below, vanishes under random mating. The selection coefficients $a_{A_f}$ and $a_{A_m}$ reflect epistatic selection within each sex and vanish when there is no epistasis.
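The ordered partitions in Equation 23 can be enumerated mechanically. The short Python helper below (a hypothetical name, not part of the original software) lists every ordered pair (S, T) of nonempty sets whose union is A; for a set of n loci there are $2^n - 2$ such pairs.

```python
from itertools import combinations

def ordered_bipartitions(A):
    """Ordered pairs (S, T) of nonempty, disjoint sets with S + T = A (hypothetical helper)."""
    loci = sorted(A)
    pairs = []
    for k in range(1, len(loci)):
        for S in combinations(loci, k):
            T = tuple(x for x in loci if x not in S)
            pairs.append((S, T))
    return pairs

pairs = ordered_bipartitions({"i", "j", "k"})
for S, T in pairs:
    print("S =", S, " T =", T)
print(len(pairs))   # 6
```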

When A is a set of either all X-linked genes or a mixture of X-linked and autosomal genes, D̃Amm does not exist. Solving Equation 19 for the remaining three kinds of associations in case 3 gives
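$$\begin{aligned}
\tilde{D}_{A_{ff}} = \tilde{D}_{A_{mf}} &= \frac{pq_A}{r_{A_f}(2 - r_{A_m}) + r_{A_m}} \Big\{ 2\left[a_{A_f} + F(A_{ff})\right] + \left[a_{A_m} + F(A_{fm})\right]\left(1 - r_{A_f}\right) \Big\}, \\
\tilde{D}_{A_{fm}} &= \frac{pq_A}{r_{A_f}(2 - r_{A_m}) + r_{A_m}} \Big\{ 2\left[a_{A_f} + F(A_{ff})\right]\left(1 - r_{A_m}\right) + \left[a_{A_m} + F(A_{fm})\right]\left(1 + r_{A_f}\right) \Big\}.
\end{aligned} \tag{24}$$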

When A is a mixture of cytoplasmic and nuclear genes or only a set of cytoplasmic genes, then D∼Afm and D∼Amm do not exist. Using the transmission coefficients given in Table 3, we find the remaining two kinds of case 3 associations are
$$\tilde{D}_{A_{ff}} = \tilde{D}_{A_{mf}} = \frac{pq_A}{r_{A_f}} \left[a_{A_f} + F(A_{ff})\right]. \tag{25}$$
If A has only cytoplasmically inherited genes, then rAf=0 and that expression becomes undefined. In this situation, there is no recombination to break down associations generated by selection, so they become large and the QLE approximation fails.

A similar situation occurs with Y-linkage. When A is a set of Y-linked genes or a mixture of Y-linked and autosomal genes, the only kind of case 3 association that exists is
$$\tilde{D}_{A_{mm}} = \frac{pq_A}{r_{A_m}} \left[a_{A_m} + F(A_{mm})\right]. \tag{26}$$
If A includes only Y-linked loci, then rAm=0 and again there is no QLE approximation for D∼Amm.

We stop our inventory of the case 3 associations at this point. There are modes of transmission not discussed above, as, for example, when male, female, and hermaphroditic individuals occur in the population. Associations for those cases can be calculated, however, from Equation 19 using the same method.

Changes in allele frequencies at QLE: When a population is in quasi-linkage equilibrium, changes in allele frequencies can be approximated by simple expressions. The exact expression for allele frequency change is given by Equation 10. The QLE approximation for Δpi is found by substituting into that equation the QLE approximations for the associations. We saw in the previous section that the approximations for the D depend on the specifics of the genetic system that is being modeled. When the model consists of diploid autosomal loci, for example, the associations are given by Equations 21 and 22.

MUTATION AND MIGRATION

Mutation and migration are two other deterministic forces that change the genetic composition of a population. This section shows how they change allele frequencies and the associations between loci.

Mutation: While the effects of mutation on allele frequencies have been understood since Haldane (1927), its effects on associations among loci have not been fully worked out. Indeed, it seems to us that the general case, in which there is an arbitrary matrix of mutation rates between alternative alleles and an arbitrary representation of allelic state, does not lead to simple expressions. Bürger (2000, p. 190) gives expressions for the effects of “random walk” mutation, in which the change in allelic effect $X_i$ has a distribution independent of its current value, and for “house of cards” mutation, in which the absolute value of the new mutation has a constant distribution. With multiple alleles, it is natural to represent the allelic state by a vector of length equal to the number of alleles (e.g., $X_i = \{0, 0, 1, 0\}$ represents a position i carrying the third of four alleles). Then, the change in moments due to mutation at position i is just given by multiplying the moments by the matrix of mutation rates. Baake (2001) uses this representation to derive an explicit expression for the change in multilocus associations due to mutation and weak recombination.

Here, we give the general result for two alleles at each locus, with allelic state taking the values 0 and 1; see Appendix B for details. Denote the mutation rate at position i from allele 0 to allele 1 as $u_i$ and the reverse mutation rate from 1 to 0 as $v_i$. The change in frequency of allele 1 at position i caused by mutation is
$$\Delta p_i = u_i q_i - v_i p_i \tag{27}$$
(Haldane 1927). The associations after mutation are
$$D''_A = D_A \prod_{i \in A} \left(1 - u_i - v_i\right), \tag{28}$$
where the reference values are equal to the current (postmutation) allele frequencies. This result shows that mutation erodes the associations. A striking fact is that the rate at which they decline depends only on the mutation rates at the positions involved and not on the allele frequencies or other properties of the genetic state of the population.
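A brute-force check of Equation 28 for two hypothetical loci, assuming mutation acts independently at each position with the rates defined above ($u_i$ from 0 to 1, $v_i$ from 1 to 0). Under those definitions allele 1 gains $u_i q_i$ and loses $v_i p_i$ per generation, which the sketch also checks. The genotype frequencies and mutation rates are made up.

```python
from itertools import product

# Hypothetical two-locus haploid genotype frequencies and mutation rates.
genotypes = list(product((0, 1), repeat=2))
freqs = dict(zip(genotypes, [0.45, 0.05, 0.10, 0.40]))
u = (1e-3, 2e-3)   # 0 -> 1 rates u_i
v = (5e-4, 1e-3)   # 1 -> 0 rates v_i

def transition(x_old, x_new, i):
    """Probability that position i mutates from state x_old to state x_new."""
    if x_old == 0:
        return u[i] if x_new == 1 else 1 - u[i]
    return v[i] if x_new == 0 else 1 - v[i]

# Mutation acts independently at each position.
after = {g: 0.0 for g in genotypes}
for g_old, f in freqs.items():
    for g_new in genotypes:
        after[g_new] += f * transition(g_old[0], g_new[0], 0) * transition(g_old[1], g_new[1], 1)

def allele_freqs(dist):
    return [sum(f for g, f in dist.items() if g[i] == 1) for i in range(2)]

def assoc(dist):
    """D_12 centered on the distribution's own allele frequencies."""
    p = allele_freqs(dist)
    return sum(f * (g[0] - p[0]) * (g[1] - p[1]) for g, f in dist.items())

p_before, p_after = allele_freqs(freqs), allele_freqs(after)
predicted_D = assoc(freqs) * (1 - u[0] - v[0]) * (1 - u[1] - v[1])   # Equation 28
predicted_dp = [u[i] * (1 - p_before[i]) - v[i] * p_before[i] for i in range(2)]
print(assoc(after), predicted_D)
```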

Migration: The effects of migration on single loci were found by Wright (1931), and later workers have understood that migration can generate associations between loci. General results for the effects of migration on associations, however, have apparently not been worked out previously.

The change in the frequency of allele 1 at position i caused by migration is
$$\Delta p_i = m\left(p_i^M - p_i^R\right), \tag{29}$$
where $p_i^M$ and $p_i^R$ are the allele frequencies among the migrants and residents, respectively (Wright 1931). The exact values for the centered associations after migration are derived in Appendix C as
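$$D''_A = \sum_{U \subseteq A} (-m)^{|U|}\, d_U \left\{ (1-m)\, D^R_{A \setminus U} + m \sum_{V \subseteq A \setminus U} d_V\, D^M_{(A \setminus U) \setminus V} \right\}, \tag{30}$$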
where
$$d_U = \prod_{i \in U} \left(p_i^M - p_i^R\right) \tag{31}$$
and $d_\emptyset \equiv 1$. The $D^M_A$ are the centered associations in the migrant population and the $D^R_A$ are the centered associations among the residents before migration. (That is, the reference values for $D^M_A$ are the allele frequencies among the migrants, and the reference values for $D^R_A$ are the allele frequencies in the residents before migration.) The leading terms of Equation 30 come from $U = \emptyset$: together with the $V = A$ term of the inner sum, which is $m\,d_A$, they give $(1-m)D^R_A + m\,d_A$.
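The sketch below implements Equation 30 directly, summing over subsets U and V, and compares it with a brute-force calculation in which hypothetical resident and migrant genotype distributions for three loci are mixed in proportions $1 - m$ and $m$ and then centered on the new allele frequencies. The two should agree for every set of positions, including single positions (where both give zero) and the empty set (where both give 1).

```python
from itertools import combinations, product

# Hypothetical three-locus genotype frequencies among residents and migrants.
genotypes = list(product((0, 1), repeat=3))
residents = dict(zip(genotypes, [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.15, 0.20]))
migrants = dict(zip(genotypes, [0.05, 0.10, 0.05, 0.10, 0.20, 0.10, 0.10, 0.30]))
m = 0.1   # fraction of the population that are migrants after migration

def allele_freqs(dist):
    return [sum(f for g, f in dist.items() if g[i] == 1) for i in range(3)]

def centered(dist, A, ref):
    """E[prod_{i in A} (X_i - ref_i)] under dist; the empty set gives 1."""
    total = 0.0
    for g, f in dist.items():
        term = f
        for i in A:
            term *= g[i] - ref[i]
        total += term
    return total

def subsets(A):
    for k in range(len(A) + 1):
        yield from combinations(A, k)

def minus(A, U):
    return tuple(i for i in A if i not in U)

pR, pM = allele_freqs(residents), allele_freqs(migrants)

def d(U):
    """Equation 31: product of allele-frequency differences; d of the empty set is 1."""
    out = 1.0
    for i in U:
        out *= pM[i] - pR[i]
    return out

def D_eq30(A):
    total = 0.0
    for U in subsets(A):
        rest = minus(A, U)
        from_migrants = sum(d(V) * centered(migrants, minus(rest, V), pM) for V in subsets(rest))
        total += (-m) ** len(U) * d(U) * ((1 - m) * centered(residents, rest, pR) + m * from_migrants)
    return total

# Brute force: mix the two populations and center on the new allele frequencies.
mixed = {g: (1 - m) * residents[g] + m * migrants[g] for g in genotypes}
p_new = allele_freqs(mixed)
all_sets = list(subsets((0, 1, 2)))
for A in all_sets:
    print(A, round(centered(mixed, A, p_new), 12), round(D_eq30(A), 12))
```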

We illustrate the use of this result with two special cases that may be of general interest. The first is the association between pairs of positions generated by migration. The association between positions i and j after migration is in general
$$D''_{ij} = (1-m)\, D^R_{ij} + m\, D^M_{ij} + m(1-m)\left(p_i^M - p_i^R\right)\left(p_j^M - p_j^R\right). \tag{32}$$
The associations following migration are a weighted average of the associations in the contributing populations (the first two terms) and a component caused by differences in allele frequencies in the two populations (the third term).

A second situation that may also be of general interest is when the associations among the residents and migrants are initially zero. Then Equation 30 gives
$$D''_A = d_A \left\{ (-m)^{|A|}(1-m) + m(1-m)^{|A|} \right\}. \tag{33}$$
Thus the associations generated in this case are simply proportional to $d_A$, the product of the differences in allele frequency between the contributing populations at the loci involved. More generally, Equation 30 shows that if a pair of demes is initially at linkage equilibrium, associations will evolve such that $D_A$ remains proportional to the product of allele frequency differences at the loci involved, $d_A$ (Barton 2000).
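Equation 33 can be checked the same way when both contributing populations start at linkage equilibrium. In the hypothetical example below, the association after mixing scales with $d_A$ and with the factor $(-m)^{|A|}(1 - m) + m(1 - m)^{|A|}$, which vanishes for single positions.

```python
from itertools import combinations, product
from math import prod

def le_dist(p):
    """Genotype distribution at linkage equilibrium with allele frequencies p."""
    return {g: prod(pi if x == 1 else 1 - pi for x, pi in zip(g, p))
            for g in product((0, 1), repeat=len(p))}

pR = (0.2, 0.5, 0.7)   # hypothetical resident allele frequencies
pM = (0.6, 0.3, 0.9)   # hypothetical migrant allele frequencies
m = 0.05

R, M = le_dist(pR), le_dist(pM)
mixed = {g: (1 - m) * R[g] + m * M[g] for g in R}
p_new = [(1 - m) * x + m * y for x, y in zip(pR, pM)]

def centered(dist, A, ref):
    return sum(f * prod(g[i] - ref[i] for i in A) for g, f in dist.items())

def eq33(A):
    """Equation 33: d_A {(-m)^|A| (1 - m) + m (1 - m)^|A|}."""
    dA = prod(pM[i] - pR[i] for i in A)
    k = len(A)
    return dA * ((-m) ** k * (1 - m) + m * (1 - m) ** k)

for k in (1, 2, 3):
    for A in combinations(range(3), k):
        print(A, round(centered(mixed, A, p_new), 12), round(eq33(A), 12))
```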

The exact results can be used to find simple approximations for the effects of migration on the associations. Equation 30 shows that the change caused by migration is
$$\Delta_m D_A = m\, d_A + O(mD, m^2). \tag{34}$$
To find an approximation for $D_A$ at a migration-recombination equilibrium, we need the change caused by transmission. Taking the example of a set of autosomal positions with the same sex of origin in a random mating population, that change is $\Delta_t D_A = -r_A D_A + O(D^2)$. Setting the net change $\Delta_m D_A + \Delta_t D_A$ to zero gives a leading-order approximation for the associations at a migration-recombination balance:
$$\hat{D}_A = \frac{m\, d_A}{r_A} + O(mD, m^2, D^2). \tag{35}$$
That result was derived by Kirkpatrick and Servedio (1999) by treating migration as a form of frequency-dependent selection. Our new approach extends that earlier result in two ways. It can be applied to genes other than autosomes by using the appropriate transmission probabilities. Further, the approximation can be expanded to include higher-order terms, for example, those involving m2 and mD.
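A minimal numerical illustration of Equation 35 for one pair of loci, under hypothetical assumptions: a haploid, randomly mating population, migrants at linkage equilibrium, and allele-frequency differences held fixed (for example, by local selection) so that $d_A = d_i d_j$ is constant. Each generation applies transmission ($D \to (1 - r)D$) and then migration via Equation 32. The exact fixed point of this loop, $m(1 - m)d_A/[1 - (1 - m)(1 - r)]$, is close to $m\,d_A/r$ when $m \ll r$.

```python
def migration_recombination_D(m, r, dA, generations=500):
    """Iterate transmission then migration for one pair of loci (census after migration).

    Hypothetical setting: haploid, random mating, migrants at linkage equilibrium,
    and allele-frequency differences held fixed so that d_A is constant.
    """
    D = 0.0
    for _ in range(generations):
        D = (1 - r) * D                       # transmission: Delta_t D = -r D
        D = (1 - m) * D + m * (1 - m) * dA    # migration: Equation 32 with D^M = 0
    return D

m, r = 0.001, 0.2
dA = 0.3 * 0.4                                         # hypothetical d_i * d_j
D_exact = migration_recombination_D(m, r, dA)
D_fixed = m * (1 - m) * dA / (1 - (1 - m) * (1 - r))   # exact fixed point of the loop
D_approx = m * dA / r                                  # Equation 35
print(D_exact, D_fixed, D_approx)
```

The relative gap between `D_fixed` and `D_approx` is of order m/r, consistent with the neglected terms in Equation 35.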

APPLICATIONS

This section uses the machinery described above to develop results for the effects of natural and sexual selection. The aim is both to illustrate how these methods work and to develop some results that are biologically interesting in their own right.

In the first application, we find the selection coefficients generated by natural selection acting on an additive polygenic trait and use those results to study how it evolves under autosomal inheritance. Next, we find approximations for the genetic correlation between a female mating preference and a male display trait produced by sexual selection. Then, we see how the mode of inheritance affects this correlation by deriving results for haploid autosomal, diploid autosomal, and diploid X-linked genes.

Quadratic stabilizing selection on an additive polygenic trait: Many problems in evolutionary biology involve evolution of traits controlled by multiple genes of approximately additive effect. In this section we derive exact expressions for the selection coefficients on single positions and sets of positions that result from quadratic stabilizing selection. These can be used to calculate the evolutionary changes in the mean, variance, and higher moments of the trait. The same methods can be used to calculate selection coefficients more generally, for an arbitrary form of selection acting on any number of additive genes.

Consider a trait controlled by a set of genes with additive effects under stabilizing selection. The calculations illustrate a general strategy for calculating selection coefficients: Write an explicit model for the phenotype, write the fitness function as a polynomial, substitute the expression for the phenotype into that fitness function, equate the result with Equation 7, and finally pick out the coefficients of the fitness function that correspond to the a's. This example is very similar to one in BT91 (p. 244). It introduces readers who are not familiar with that article to the approach and shows those who are how the new notation works.

The model for the phenotype of an individual comes from Equation 6, which simplifies under our assumption that genes have additive effects,
$$Z = \bar Z + \sum_{i \in W} b_i \zeta_i + e_Z, \tag{36}$$
where $e_Z$, the random environmental component, has variance $V_e$. The sum is over all positions $i$ that affect expression of the trait and hence fitness; for example, with autosomal genes in a diploid male, the set $W$ includes positions inherited from both males and females: $\{i_{mm}, i_{mf}, \ldots\}$.

Our model for fitness is the quadratic function
$$W(Z) = 1 - \frac{Z^2}{2\omega^2}, \tag{37}$$
where $\omega^2$ is the width of the fitness function and is inversely related to the strength of stabilizing selection. The trait has been scaled so that the fitness optimum is at $Z = 0$. Clearly, selection must be weak enough that fitness remains positive [$\mathrm{Var}(Z) \ll \omega^2$]. This quadratic model is a weak-selection approximation to other forms of stabilizing selection, such as the Gaussian $W(Z) \propto \exp(-Z^2/2\omega^2)$ (see BT91).

Substituting Equation 36 into Equation 37 and averaging over the environmental variation gives
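$$\begin{aligned} E_e[W(X)] &= 1 - \frac{V_e}{2\omega^2} - \frac{1}{2\omega^2}\Big(\sum_i b_i\zeta_i + \bar Z\Big)^2 \\ &= 1 - \frac{V_e}{2\omega^2} - \frac{\bar Z^2}{2\omega^2} - \frac{\bar Z}{\omega^2}\sum_{i\in W} b_i\zeta_i - \frac{1}{2\omega^2}\sum_{i,j\in W} b_i b_j\zeta_i\zeta_j, \end{aligned} \tag{38}$$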
where $E_e$ is the expectation over the distribution of $e_Z$. To find the selection coefficients $a_U$, the fitness function must be put in the form of Equation 7. We do that by rewriting the last sum in Equation 38 to separate out the terms with repeated positions and then using Equation 5 to reduce those terms:
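$$E_e[W(X)] = 1 - \frac{V_e}{2\omega^2} - \frac{\bar Z^2}{2\omega^2} - \frac{\bar Z}{\omega^2}\sum_{i\in W} b_i\zeta_i - \frac{1}{2\omega^2}\sum_{i\in W} b_i^2\big[pq_i - \zeta_i(p_i - q_i)\big] - \frac{1}{\omega^2}\sum_{i<j} b_i b_j\zeta_i\zeta_j. \tag{39}$$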
The selection coefficients are found from Equation 39 by dividing through by the mean fitness $\bar W$ and identifying the coefficients of $\zeta_i$ and $\zeta_i\zeta_j$. All coefficients $a_U$ involving more than two positions ($|U| > 2$) are zero. The selection acting on the single position $i$ is
$$a_i = -\frac{b_i\bar Z}{\bar W\omega^2} + \frac{b_i^2}{2\bar W\omega^2}(p_i - q_i). \tag{40}$$
The first term represents the effect of directional selection, which pushes the mean of the trait toward the optimum at $Z = 0$. It is proportional to $b_i$, which is the effect that allele 1 at position $i$ has on the phenotype. The second term represents the effect of stabilizing selection, which favors fixation by driving the allele frequency toward 0 if $p_i < \tfrac{1}{2}$ and toward 1 if $p_i > \tfrac{1}{2}$ (Wright 1935). This effect will be weak when the effect of the locus on the phenotype is small relative to the width of the fitness function ($b_i \ll \omega$). [Note that if allele frequencies differ in males and females ($p_{i_{mx}} \neq p_{i_{fx}}$) or in genes inherited from males and females ($p_{i_{xm}} \neq p_{i_{xf}}$), then the different positions at a single locus can have different selection coefficients (e.g., $a_{i_{xm}} \neq a_{i_{xf}}$) even if selection acts identically on males and females.]

The selection coefficient acting jointly on positions $i$ and $j$ is given by the coefficient of $\zeta_i\zeta_j$ in the fitness function Equation 39, again divided by $\bar W$:
$$a_{ij} = -\frac{b_i b_j}{\bar W\omega^2} \quad \text{for } i \neq j. \tag{41}$$
To see the meaning of this coefficient, adopt the convention of naming the alleles such that allele 1 produces a larger phenotype than allele 0 at each locus; then $b_i$ and $b_j$ are positive. With stabilizing selection, $a_{ij}$ is negative, indicating that selection favors the combination of allele 0 at one locus with allele 1 at the other. This is simply the well-known fact that stabilizing selection produces negative associations. Disruptive selection can be modeled by taking $\omega^2$ as negative. In that event, $a_{ij}$ is positive: Selection then favors positive associations, both between alleles that increase the trait and between alleles that decrease it.
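Equations 40 and 41 can be checked numerically. For every allelic state of a few positions, the environmentally averaged fitness of Equation 38 should equal the constant term of Equation 39 plus $\bar W$ times $\sum_i a_i\zeta_i + \sum_{i<j} a_{ij}\zeta_i\zeta_j$. The Python sketch below is a minimal illustration under that reading; the function names, effect sizes, reference frequencies, and parameter values are hypothetical choices, and $\bar W$ is treated as a free parameter because it cancels from the identity.

```python
import itertools


def selection_coefficients(b, p, z_bar, omega2, w_bar):
    """a_i (Eq. 40) and a_ij for i < j (Eq. 41) under W(Z) = 1 - Z^2 / (2 omega^2)."""
    n = len(b)
    a1 = [-b[i] * z_bar / (w_bar * omega2)
          + b[i] ** 2 * (p[i] - (1 - p[i])) / (2 * w_bar * omega2)
          for i in range(n)]
    a2 = {(i, j): -b[i] * b[j] / (w_bar * omega2)
          for i in range(n) for j in range(i + 1, n)}
    return a1, a2


def max_discrepancy(b, p, z_bar, omega2, v_e, w_bar):
    """Largest gap, over all states X in {0,1}^n, between Eq. 38 and its polynomial form."""
    n = len(b)
    a1, a2 = selection_coefficients(b, p, z_bar, omega2, w_bar)
    # Constant term of Eq. 39, including the pq_i pieces produced by Eq. 5.
    const = (1 - v_e / (2 * omega2) - z_bar ** 2 / (2 * omega2)
             - sum(b[i] ** 2 * p[i] * (1 - p[i]) for i in range(n)) / (2 * omega2))
    worst = 0.0
    for x in itertools.product((0, 1), repeat=n):
        zeta = [x[i] - p[i] for i in range(n)]
        g = sum(b[i] * zeta[i] for i in range(n))
        direct = 1 - v_e / (2 * omega2) - (z_bar + g) ** 2 / (2 * omega2)  # Eq. 38
        poly = const + w_bar * (
            sum(a1[i] * zeta[i] for i in range(n))
            + sum(a * zeta[i] * zeta[j] for (i, j), a in a2.items()))
        worst = max(worst, abs(direct - poly))
    return worst


gap = max_discrepancy(b=[0.1, 0.05, 0.2], p=[0.3, 0.6, 0.5],
                      z_bar=0.4, omega2=25.0, v_e=1.0, w_bar=0.97)
print(f"largest discrepancy: {gap:.2e}")  # zero up to rounding error
```

Because the identity holds for any reference values $p_i$, the check exercises only the algebra of Equations 38 to 41, not any particular population state.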

To complete the analysis of this model we find how the population evolves. To determine the state of the population in the following generation, we need to make some assumptions about inheritance, that is, the rules of transmission. To keep things simple, we assume that i and j are autosomally inherited diploid genes, that mating is random, and that there is no meiotic drive. We define the reference values for each position to be the frequency of allele 1 there.

The change in allele frequencies caused by selection is found using Equation 10 with some help from Equation 5,
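$$\begin{aligned} \Delta p_i &= \sum_{U\subseteq W} a_U D_{iU} = a_i pq_i + \sum_{j\neq i} a_j D_{ij} + \sum_{j\neq i} a_{ij} D_{iij} + \sum_{\substack{j<k\\ j,k\neq i}} a_{jk} D_{ijk} \\ &= a_i pq_i + \sum_{j\neq i}\big[a_j - a_{ij}(p_i - q_i)\big] D_{ij} + \sum_{\substack{j<k\\ j,k\neq i}} a_{jk} D_{ijk}, \end{aligned} \tag{42}$$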
where the special cases such as i=j have been separated out. Note that the sums are over all positions and so include contributions from genes on both maternal and paternal genomes.

Since transmission does not change allele frequencies, the overall change in the mean of the trait from the start of one generation to the next can be found by summing $b_i\,\Delta p_i$ over positions and substituting for the $a_i$ and $a_{ij}$ from Equations 40 and 41. This leads to the simple expression
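$$\Delta\bar Z = \sum_i b_i\,\Delta p_i = -\frac{\bar Z}{\bar W\omega^2}\sum_{i,j} b_i b_j D_{ij} - \frac{1}{2\bar W\omega^2}\sum_{i,j,k} b_i b_j b_k D_{ijk}, \tag{43}$$
where both sums run over all positions in $W$, repeated indices included (so that, for example, $D_{ii} = pq_i$).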
The result is a sum of two terms. The first is the standard equation for the selection response of a quantitative trait under directional selection: It is the product of the directional selection gradient, which in this case is $-\bar Z/\omega^2$, and the additive genetic variance for the trait, which appears as the first summation on the right side. The second term is the result of stabilizing selection acting on the genetic variance. That form of selection also changes allele frequencies, and hence the trait mean, if the trait distribution is skewed: The second sum is just the skew of the distribution of breeding values of $Z$ (Bürger 1991; Turelli and Barton 1994).
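Equation 43 can also be confirmed exactly on any small genotype distribution, since selection changes $p_i$ by $E[W\zeta_i]/\bar W$ and the breeding-value deviation is $G = \sum_i b_i\zeta_i$. The Python sketch below computes $\sum_i b_i\,\Delta p_i$ directly and compares it with the right side of Equation 43, with both sums taken over all positions including repeats. It is a minimal sketch under stated assumptions: the function name, the effect sizes, and the nonuniform genotype frequencies are hypothetical.

```python
import itertools
import math


def delta_mean_two_ways(freq, b, z_bar, omega2, v_e):
    """Change in the trait mean caused by selection W(Z) = 1 - Z^2 / (2 omega^2).

    freq maps each allelic state (tuple of 0/1, one entry per position) to its
    frequency; reference values are the allele frequencies. Returns
    (sum_i b_i * Delta p_i computed directly, right-hand side of Eq. 43).
    """
    n = len(b)
    states = list(freq)
    p = [sum(freq[x] * x[i] for x in states) for i in range(n)]
    zeta = {x: [x[i] - p[i] for i in range(n)] for x in states}

    def fitness(x):
        # Eq. 38: fitness averaged over the environmental deviation e_Z
        g = sum(b[i] * zeta[x][i] for i in range(n))
        return 1 - v_e / (2 * omega2) - (z_bar + g) ** 2 / (2 * omega2)

    w_bar = sum(freq[x] * fitness(x) for x in states)
    # Selection changes p_i by E[W zeta_i] / W-bar
    dp = [sum(freq[x] * fitness(x) * zeta[x][i] for x in states) / w_bar
          for i in range(n)]
    direct = sum(b[i] * dp[i] for i in range(n))

    def D(*idx):
        # Central moment D_{ij...}; repeated indices are allowed, e.g. D(i, i) = pq_i
        return sum(freq[x] * math.prod(zeta[x][i] for i in idx) for x in states)

    var_g = sum(b[i] * b[j] * D(i, j) for i in range(n) for j in range(n))
    third = sum(b[i] * b[j] * b[k] * D(i, j, k)
                for i in range(n) for j in range(n) for k in range(n))
    eq43 = -z_bar * var_g / (w_bar * omega2) - third / (2 * w_bar * omega2)
    return direct, eq43


states = list(itertools.product((0, 1), repeat=3))
weights = [3, 1, 2, 5, 1, 4, 2, 2]  # arbitrary, nonuniform genotype frequencies
freq = {x: w / sum(weights) for x, w in zip(states, weights)}
direct, eq43 = delta_mean_two_ways(freq, b=[0.3, -0.2, 0.5], z_bar=0.8, omega2=20.0, v_e=1.0)
print(direct, eq43)  # the two agree to rounding error
```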

Now consider the change in the association Dij. Using Equations 5 and 9, we find that the (uncentered) value after selection is
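$$\begin{aligned} D'_{ij} = \sum_{U\subseteq W} a_U (D_{ijU} - D_{ij}D_U) ={}& D_{ij} + (a_i D_{iij} + a_j D_{ijj}) + \sum_{k\neq i,j} a_k D_{ijk} + a_{ij}(D_{ijij} - D_{ij}^2) \\ &+ \sum_{k\neq i,j} a_{ik}(D_{iijk} - D_{ij}D_{ik}) + \sum_{k\neq i,j} a_{jk}(D_{ijjk} - D_{ij}D_{jk}) \\ &+ \sum_{\substack{k<l\\ k,l\neq i,j}} a_{kl}(D_{ijkl} - D_{ij}D_{kl}), \end{aligned} \tag{44}$$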
where again, the various special cases have been separated out. The expression can be further simplified using Equation 5 to reduce associations with repeated indices, as above. It is convenient to change the reference values at this point from the old allele frequencies to the new. Using Equation 15,
$$D''_{ij} = D'_{ij} - \Delta p_i\Delta p_j - \Delta p_i\Delta p_j + \Delta p_i\Delta p_j = D'_{ij} - \Delta p_i\Delta p_j. \tag{45}$$
Finally, the association at the start of the next generation is given by $D'''_{ij} = (1 - r_{ij})D''_{ij}$ for positions $i, j$ inherited from the same parent and 0 for positions inherited from different parents. Just as for the trait mean, this procedure again leads to a relatively simple expression for the genetic variance at the start of the next generation (cf. Equations 53 and 54 of BT91). However, this expression involves third- and fourth-order associations, and so approximations are required to obtain a closed set of equations.

Sexual selection by female choice: We mentioned earlier that the multilocus machinery developed here can be used to study the genetic consequences of nonrandom mating. This section shows how to calculate a QLE approximation for the genetic correlation (or covariance) between a female mating preference and male display trait that is generated by sexual selection. This quantity is important to many theories about sexual selection (Kirkpatrick and Ryan 1991). Previous work has calculated the covariance expected under autosomal inheritance in diploids (Lande 1981; Barton and Turelli 1991) and haploids (Kirkpatrick 1982; Kirkpatrick and Barton 1997). Here we extend the earlier results by finding the covariance when some genes are sex linked. In addition to the biological interest in the result, the derivations illustrate how nonrandom mating and nonautosomal inheritance are modeled in our framework.

Consider a pair of characters, one expressed in females and the other in males, that together affect the probability that a male and a female will mate. We refer to the first as a "preference" and the second as the "male trait." In fact, the preference need not be a behavioral phenotype: It can be any character that affects mating probabilities. The value of a female's preference phenotype is denoted $P$ and that of the male trait is $T$. A set of genes $P$ affects the preference and a set $T$ affects the male trait. We assume that these sets are disjoint (that is, no loci have pleiotropic effects on the preference and male trait) and that both sets act additively, so that the preference and male trait phenotype of an individual can be written in the form of Equation 36. We further assume that the parent from which a gene was inherited (its sex of origin) does not affect the gene's expression. Carrying allele 1 at position $i$ rather than allele 0 increases the preference phenotype by $b_i^P$; the corresponding effect of allele 1 at position $j$ on the male trait is $b_j^T$.

We begin by calculating the selection coefficients, which are independent of the inheritance rules. We then use them to find QLE approximations for the preference-trait covariance under three types of inheritance: haploid autosomal, diploid autosomal, and diploid X-linked.

Selection coefficients: Derivation of the selection coefficients follows Kirkpatrick and Barton (1997). The phenotypic distribution of the preference among females at birth, $f_P(\cdot)$, has mean $\bar P$ and variance $\sigma_P^2$; the corresponding distribution $f_T(\cdot)$ for the trait among males at birth has mean $\bar T$ and variance $\sigma_T^2$. The frequency of matings between a female with preference phenotype $P$ and a male with trait phenotype $T$, denoted $M(P, T)$, has means $\bar P^*$ and $\bar T^*$, variances $\sigma_{P^*}^2$ and $\sigma_{T^*}^2$, and correlation $\rho_{PT}$. We define the selection group as a mated pair of individuals. The relative fitness of a selection group is defined as its frequency after selection divided by its frequency before. The frequency after selection is given by $M$, while the "frequency" of a mated pair before pairing occurs is simply the frequency with which it would occur under random mating.

We now make two kinds of approximations. Assume that the preference and trait are not evolving rapidly, so that $\bar P^* \approx \bar P$, $\bar T^* \approx \bar T$, $\sigma_{P^*}^2 \approx \sigma_P^2$, and $\sigma_{T^*}^2 \approx \sigma_T^2$. Next, approximate $f_P$, $f_T$, and $M$ with Gaussian densities. The relative fitness of a mated pair in which the female has preference $P$ and the male has trait $T$ is
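$$w(P,T) = \frac{M(P,T)}{f_P(P)\,f_T(T)} \approx \frac{1}{\sqrt{1-\rho_{PT}^2}}\exp\left\{\frac{-\rho_{PT}\left[(P-\bar P)^2\sigma_T^2\rho_{PT} - 2(P-\bar P)(T-\bar T)\sigma_P\sigma_T + (T-\bar T)^2\sigma_P^2\rho_{PT}\right]}{2(1-\rho_{PT}^2)\,\sigma_P^2\sigma_T^2}\right\}. \tag{46}$$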
To calculate the selection coefficients, the fitness function must be expressed as a polynomial. Taking the first-order Taylor series of $w(P, T)$ around $(\bar P, \bar T)$ gives
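$$w(P,T) \approx \frac{1}{\sqrt{1-\rho_{PT}^2}} + \frac{\rho_{PT}(P-\bar P)(T-\bar T)}{(1-\rho_{PT}^2)^{3/2}\,\sigma_P\sigma_T} \approx 1 + \frac{\rho_{PT}(P-\bar P)(T-\bar T)}{\sigma_P\sigma_T}. \tag{47}$$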
The last step, which is not required for the QLE approximation, linearizes the fitness function in $\rho_{PT}$. The result is quite accurate for $\rho_{PT} < 0.4$.
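Equation 46 follows from dividing a bivariate Gaussian density by the product of its two marginal densities. The standard-library Python sketch below checks that closed form against the direct ratio and compares it with the linearized end of Equation 47 near the means for a small correlation. The function names and parameter values are illustrative assumptions; $x$ and $y$ stand for the deviations $P - \bar P$ and $T - \bar T$.

```python
import math


def bivariate_normal_pdf(x, y, s_p, s_t, rho):
    """Bivariate Gaussian density at deviations x = P - P-bar, y = T - T-bar."""
    q = (x**2 / s_p**2 - 2 * rho * x * y / (s_p * s_t) + y**2 / s_t**2) / (1 - rho**2)
    return math.exp(-q / 2) / (2 * math.pi * s_p * s_t * math.sqrt(1 - rho**2))


def normal_pdf(x, s):
    return math.exp(-x**2 / (2 * s**2)) / (math.sqrt(2 * math.pi) * s)


def w_ratio(x, y, s_p, s_t, rho):
    """Relative fitness as M(P, T) / (f_P(P) f_T(T)) with Gaussian M, f_P, f_T."""
    return bivariate_normal_pdf(x, y, s_p, s_t, rho) / (normal_pdf(x, s_p) * normal_pdf(y, s_t))


def w_closed_form(x, y, s_p, s_t, rho):
    """Right-hand side of Eq. 46."""
    num = -rho * (x**2 * s_t**2 * rho - 2 * x * y * s_p * s_t + y**2 * s_p**2 * rho)
    return math.exp(num / (2 * (1 - rho**2) * s_p**2 * s_t**2)) / math.sqrt(1 - rho**2)


def w_linear(x, y, s_p, s_t, rho):
    """Final, linearized form of Eq. 47."""
    return 1 + rho * x * y / (s_p * s_t)


for x, y in [(0.5, 0.5), (-1.0, 0.8)]:
    print(x, y, w_closed_form(x, y, 1.0, 1.0, 0.1), w_linear(x, y, 1.0, 1.0, 0.1))
```

Only the bilinear term of the linear form contributes to the selection coefficients, which is why it is enough for the calculation that follows.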

Now substitute the expressions for P and T written in the form of Equation 36 into Equation 47. The selection coefficient for a set of loci A is then again given by the coefficient of ζA in the fitness function. We find that the selection coefficient for a preference position i in a female and a trait position j in a male is
$$a_{i_f j_m} \approx \rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}. \tag{48}$$
The context for each locus has only the sex of carrier (female for the preference locus i, male for the trait locus j). That is because of our assumption that the parent from which a gene was inherited (its sex of origin) does not affect its expression.

This result shows that the force of sexual selection that unites a female preference gene with a male trait gene is simply proportional to ρPT, the phenotypic correlation between the preference and trait among mated pairs. It is also proportional to the size of each gene's effect on the phenotype relative to the character's phenotypic standard deviation. Selection coefficients for all other sets of positions are 0.

These selection coefficients are valid regardless of how the genes affecting the preference and the male trait are inherited. In the following three sections we find the genetic correlation generated by this type of selection when the loci are haploid autosomal, diploid autosomal, and diploid X-linked. The examples show how the QLE approximation accommodates different modes of inheritance.

Haploid autosomal inheritance: We begin by calculating the genetic correlation in males between the female preference and male trait. The definition of the additive genetic correlation in zygotes is
$$r_{PT} = \frac{G_{PT}}{\sqrt{G_P G_T}}, \tag{49}$$
where $G_{PT}$ is the genetic covariance between the preference and male trait, $G_P$ is the additive genetic variance of the preference in females, and $G_T$ is the genetic variance of the trait in males. The genetic variances are
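$$\begin{aligned} G_P &= \sum_{i,j\in P} b_i^P b_j^P D_{ij} = \sum_{i\in P}(b_i^P)^2 pq_i + O(a), \\ G_T &= \sum_{i,j\in T} b_i^T b_j^T D_{ij} = \sum_{i\in T}(b_i^T)^2 pq_i + O(a). \end{aligned} \tag{50}$$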
In haploids, a context needs to keep track of a gene's sex of carrier only, so the covariance is
$$G_{PT} = \sum_{i\in P}\sum_{j\in T} b_{i_f}^P\, b_{j_m}^T\, D_{i_m j_m}. \tag{51}$$
When the preference and trait loci are autosomal, the covariance is the same in male and female zygotes. Now substitute the selection coefficients we just calculated (Equation 48) into the expression for the associations at QLE (Equation 20), which gives
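$$\tilde D_{i_m j_m} = \left(\frac{t_{i_m j_m \leftarrow i_f j_m}}{r_{ij}}\right) a_{i_f j_m}\, pq_{ij} = \frac{1}{2}\rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}\, pq_{ij}. \tag{52}$$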
Assembling these results shows that the genetic covariance and genetic correlation between a female mating preference and a male display trait in haploids are
$$G_{PT} \approx \tfrac{1}{2}\rho_{PT}\, h_P h_T\sqrt{G_P G_T}, \qquad r_{PT} \approx \tfrac{1}{2}\rho_{PT}\, h_P h_T, \tag{53}$$
where hP and hT are the square roots of the heritabilities of the preference in females and the trait in males. This result agrees with Kirkpatrick and Barton (1997), who used the BT91 framework for their calculation.

Diploid autosomal inheritance: In diploids, there are two positions at the preference locus and two at the trait locus, corresponding to copies of those genes inherited from mothers and fathers. The genetic covariance between preference and trait in males is therefore
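$$G_{PT}^m = \sum_{i\in P}\sum_{j\in T} b_{i_f}^P\, b_{j_m}^T\left(D_{i_{mf} j_{mf}} + D_{i_{mf} j_{mm}} + D_{i_{mm} j_{mf}} + D_{i_{mm} j_{mm}}\right). \tag{54}$$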
The genetic variances are twice the values given by Equation 50 because diploid zygotes carry two haploid genomes. The covariance in female zygotes has the same value because the loci are autosomal.

To find those associations, we use Equations 21 and 22. The function F(·) that appears there is calculated using (23):
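$$\begin{aligned} F(ij_{ff}) = F(ij_{mf}) &= t_{i_{ff} j_{ff} \leftarrow i_{ff} j_{fm}}\, a_{i_f j_m} = \tfrac{1}{2} r_{ij}^f\, a_{i_f j_m}, \\ F(ij_{fm}) = F(ij_{mm}) &= t_{i_{fm} j_{fm} \leftarrow i_{mf} j_{mm}}\, a_{i_f j_m} = \tfrac{1}{2} r_{ij}^m\, a_{i_f j_m}. \end{aligned} \tag{55}$$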
The QLE approximations for the associations are found by substituting the selection coefficient from Equation 48 into those expressions and then those results into (21) and (22), giving
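$$\tilde D_{i_{mf} j_{mf}} = \tilde D_{i_{mm} j_{mm}} = \tfrac{1}{2}\rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}\, pq_{ij}, \qquad \tilde D_{i_{mf} j_{mm}} = \rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}\, pq_{ij}, \qquad \tilde D_{i_{mm} j_{mf}} = 0. \tag{56}$$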
Substituting these into (54) shows that the genetic covariance and genetic correlation are the same as those we found for the haploid case, Equations 53. This is a useful result, as it shows that several basic conclusions based on haploid models regarding the evolution of mating preferences (Kirkpatrick and Barton 1997) and reinforcement (Kirkpatrick and Servedio 1999) carry over to diploids. An interesting conclusion is that the genetic correlation is independent of the recombination rates, even when they differ between the sexes.

X-linked inheritance: To illustrate how our methods extend to other forms of inheritance, consider next a case in which genetic variation in a female mating preference is X-linked while variation in the male trait is autosomal. We will see that the genetic covariance and correlation in males differ from those found when both characters are autosomally inherited.

The preference-trait covariance in males is
$$G_{PT}^m = \sum_{i\in P}\sum_{j\in T} b_{i_f}^P\, b_{j_m}^T\left(D_{i_{mf} j_{mf}} + D_{i_{mf} j_{mm}}\right). \tag{57}$$
Because males do not inherit an X chromosome from their fathers, the terms $D_{i_{mm} j_{mf}}$ and $D_{i_{mm} j_{mm}}$ that appear in the diploid case (see Equation 54) do not exist. The genetic variance for the trait is given by twice the expression in Equation 50 for haploids, because those loci are diploid and autosomal, while the variance for the preference in males is given by (50), because males have only one haploid genome for X-linked loci.

The associations that appear in Equation 57 are given by Equations 21 and 24. The values for the function F(·) that appears in (24) are now
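$$\begin{aligned} F(ij_{ff}) = F(ij_{mf}) &= t_{i_{ff} j_{ff} \leftarrow i_{ff} j_{fm}}\, a_{i_f j_m} = \tfrac{1}{4} a_{i_f j_m}, \\ F(ij_{fm}) &= t_{i_{fm} j_{fm} \leftarrow i_{mf} j_{mm}}\, a_{i_f j_m} = \tfrac{1}{2} a_{i_f j_m}, \\ F(ij_{mm}) &= t_{i_{mm} j_{mm} \leftarrow i_{mf} j_{mm}}\, a_{i_f j_m} = 0. \end{aligned} \tag{58}$$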
The QLE approximations for the associations in males are therefore
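$$\tilde D_{i_{mf} j_{mf}} = \tfrac{3}{5} a_{i_f j_m}\, pq_{ij} = \tfrac{3}{5}\rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}\, pq_{ij}, \qquad \tilde D_{i_{mf} j_{mm}} = a_{i_f j_m}\, pq_{ij} = \rho_{PT}\,\frac{b_{i_f}^P\, b_{j_m}^T}{\sigma_P\sigma_T}\, pq_{ij}. \tag{59}$$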

Putting these facts together shows that in male zygotes the genetic covariance and correlation between the mating preference and male trait are
$$G_{PT}^m \approx \tfrac{4}{5}\rho_{PT}\, h_P h_T\sqrt{G_P G_T}, \qquad r_{PT}^m \approx \tfrac{4}{5}\rho_{PT}\, h_P h_T. \tag{60}$$
An interesting conclusion is that the genetic covariance and correlation are 60% larger here than they are when the preference and male trait are both autosomally inherited (given by Equation 53). Thus the impact of indirect selection on a female mating preference depends on how the preference and trait are inherited.
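The factors of 1/2 and 4/5 in Equations 53 and 60 can be reproduced by summing the QLE associations (Equations 52, 56, and 59) over pairs of positions. In the Python sketch below, each mode of inheritance is described by the multiples of the common factor $\rho_{PT}\, b^P b^T pq_i pq_j/(\sigma_P\sigma_T)$ that enter the male covariance, plus the number of copies of each locus a male carries. The mode labels, function name, and parameter values are hypothetical, and we assume that for the X-linked case $G_P$ is the male (hemizygous) variance and that $h = \sqrt{G}/\sigma$, which is one reading of how Equation 60 is normalized.

```python
import math

# Multiples of rho * bP_i * bT_j * pq_i * pq_j / (sigma_P * sigma_T) for each
# association that enters the male covariance (Eqs. 52, 56, 59).
ASSOCIATION_WEIGHTS = {
    "haploid autosomal": [1 / 2],               # D_{i_m j_m}
    "diploid autosomal": [1 / 2, 1, 0, 1 / 2],  # D_{mf,mf}, D_{mf,mm}, D_{mm,mf}, D_{mm,mm}
    "X-linked preference": [3 / 5, 1],          # D_{mf,mf}, D_{mf,mm}
}
# Copies of the (preference, trait) loci carried by a male zygote.
COPIES_IN_MALES = {
    "haploid autosomal": (1, 1),
    "diploid autosomal": (2, 2),
    "X-linked preference": (1, 2),
}


def male_genetic_correlation(mode, bP, bT, pP, pT, sigma_P, sigma_T, rho):
    """Return (r_PT in male zygotes at QLE, rho * h_P * h_T), to leading order."""
    def pq(p):
        return p * (1 - p)

    n_p, n_t = COPIES_IN_MALES[mode]
    G_P = n_p * sum(b**2 * pq(p) for b, p in zip(bP, pP))
    G_T = n_t * sum(b**2 * pq(p) for b, p in zip(bT, pT))
    weight = sum(ASSOCIATION_WEIGHTS[mode])
    G_PT = 0.0
    for bi, pi in zip(bP, pP):
        for bj, pj in zip(bT, pT):
            common = rho * bi * bj * pq(pi) * pq(pj) / (sigma_P * sigma_T)
            G_PT += bi * bj * weight * common
    r_PT = G_PT / math.sqrt(G_P * G_T)
    h_P, h_T = math.sqrt(G_P) / sigma_P, math.sqrt(G_T) / sigma_T
    return r_PT, rho * h_P * h_T


args = dict(bP=[0.3, 0.2], bT=[0.4, 0.1, 0.25], pP=[0.4, 0.7],
            pT=[0.5, 0.2, 0.35], sigma_P=0.9, sigma_T=1.1, rho=0.2)
for mode in ASSOCIATION_WEIGHTS:
    r, ref = male_genetic_correlation(mode, **args)
    print(f"{mode}: r_PT / (rho h_P h_T) = {r / ref:.3f}")
```

The ratio does not depend on the effect sizes or allele frequencies chosen, which mirrors the statement that the correlation depends only on $\rho_{PT}$, the heritabilities, and the mode of inheritance.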

DISCUSSION

We have set out a general notation that describes arbitrary modes of selection and genetic transmission. The key components are the representation of genotype frequencies in terms of means and higher moments of the distribution of allelic states, of selection as a polynomial function of genotype, and of genetic transmission as the movement of genes between different contexts. The first two components are already well developed, particularly in models of additive quantitative traits. The main contribution of this article is to combine them with a generalized representation of transmission. The calculations can be automated, as described in appendix D.

How does a general multilocus notation help us to better understand evolution? A set of equations for changes in genotype frequencies can be derived automatically for arbitrary models, but will in all but the simplest cases be impenetrably complicated. The value of an algebraic expression written in terms of multilocus moments or cumulants is that it allows one to identify and interpret the key processes responsible for evolutionary change. For example, Equations 30–35 show that migration builds up associations among loci in proportion to the product of the allele frequency differences. A second example comes from the analysis of sexual selection. The analysis shows that the genetic correlation between a female preference and a male trait is directly proportional to the phenotypic correlation between the preference and trait in mating pairs. Further, the genetic correlation depends on the way in which the preference and trait genes are inherited (Equations 53 and 60).

Defining a model in a standard notation can reveal similarities among apparently different mechanisms. For example, if the indirect selection on a modifier of recombination is expressed in terms of selection coefficients (aU), it can be shown to depend on the effects of recombination on the mean and variance of log(fitness), regardless of the causes of fitness variation (Barton 1995). An unambiguous notation may also clarify conceptual issues. We believe that models of group selection using our notation may clarify definitions of fitness and of “levels of selection.”

Models of multiple loci are most fruitful when combined with appropriate approximations. The best developed is the QLE approximation, which assumes that processes such as epistasis and migration that generate associations among loci are weak, relative to those that break them down, such as recombination, segregation, and mutation. This leads to simple expressions for associations of all orders and is likely to be accurate for most sexually reproducing populations. Several workers have explored this approach, using different measures of association (linkage disequilibrium). Table 4 summarizes these measures and their corresponding versions of the QLE approximation. (For a more detailed treatment, see Bürger 2000, pp. 82 and 183–190.)

The different measures can be divided into two classes. Most measures are defined for each genotype, usually as a difference between its actual frequency and the frequency expected at linkage equilibrium. In contrast, we define associations as moments of the distribution of allelic states; these moments are defined for each set of positions ($D_i$, $D_{ijk}$, etc.), and correspond in a natural way to a polynomial representation of selection. Moreover, they lead to a simple representation of transmission (Equation 12). When selection acts on an additive quantitative trait, the effects of selection are more elegantly represented in terms of the cumulants of the distribution of allelic effects, rather than the moments (Bürger 1991, 2000; Turelli and Barton 1994). (This is because the cumulants of the trait distribution are then simply sums over multilocus cumulants of the same order.) Since we do not deal only with additive traits in this article, we have used moments throughout. Moments and cumulants are equivalent to leading order when the moments are small, which holds under QLE. In any event, it is simple to transform between the two representations as necessary (e.g., using the Mathematica packages).

The multilocus notations used by Christiansen (1999) and Bürger (2000) are closest to that used here. The main difference is that we deal with sets of genes in context, or positions, which allows us to avoid restrictive assumptions such as autosomal diploid inheritance, random mating, and equal transmission rates in males and females. The relation between the notations can be illustrated by comparing expressions for associations among loci at QLE. Christiansen's (1999) Equation 7.19 for the associations among a set of loci M in a gamete is
$$\tilde D_M \approx \frac{[\varepsilon\hat C_M]\,[\hat\pi_M^R(\emptyset)\,\hat\pi_M^R(M)]}{1 - 2R_M(\emptyset)}. \tag{61}$$
Here, $[\varepsilon\hat C_M]$ is a measure of epistasis among the loci in set $M$; under his assumption of no sex differences in selection, it is equivalent to our selection coefficients $a_{M_{ff}}$, $a_{M_{fm}}$, $a_{M_{mf}}$, and $a_{M_{mm}}$. The quantity $[\hat\pi_M^R(\emptyset)\,\hat\pi_M^R(M)]$ is the frequency of $M$ gametes at linkage equilibrium, which is equal to our product of allele frequencies $pq_M$. Last, $R_M(\emptyset)$ is the chance that at meiosis, a gamete derives all the genes in the set $M$ from the maternal genome, which is $(1 - r_M)/2$ in our notation. Under Christiansen's assumptions of autosomal inheritance, random mating, equal selection in males and females, and equal recombination rates in males and females, our Equation 22 gives
$$\tilde D_{M_{fx}} = \tilde D_{M_{mx}} = \frac{a_{M_{xy}}\, pq_M}{r_M}. \tag{62}$$
Thus the two approaches are consistent.

Bürger's (2000, p. 188) expression for cumulant associations at QLE in generation t is
$$c_{\mathbf m}(t) = \frac{\tilde h_{\mathbf m, LE}(C, t)}{r_{\mathbf m}} + O(s^2). \tag{63}$$
Here, $\mathbf m$ is a vector containing the number of times that each allele is included in the association; for example, with positions $\{i, j, k\}$, Bürger's $c_{\{1,0,2\}}$ corresponds to our $D_{ikk}$. (Note that Christiansen's 1999 notation is restricted to two alleles per locus, whereas Bürger's 2000 and ours allow multiple alleles.) The term $\tilde h_{\mathbf m, LE}(C, t)$ is a composite measure of epistasis and allele frequencies. It is equal to the change due to selection at linkage equilibrium, which is $a_M pq_M$ in our notation. Because the corresponding cumulants and moments are equal to leading order when QLE applies, Equations 62 and 63 agree. Thus Bürger's and our approaches are consistent.

Although the QLE results developed in this article are consistent internally and with independent derivations, we have not rigorously shown that this quasi-equilibrium is unique or that the population will always converge to it when the assumptions are met. Nagylaki (1993; see also Nagylaki et al. 1999) showed that autosomal loci in a random mating diploid population under weak selection converge to a QLE. In this article we have relaxed his assumptions to allow for nonrandom mating, other forms of inheritance, and other evolutionary forces (migration and mutation). Since the effects of migration and mutation are equivalent to forms of frequency-dependent selection, Nagylaki's results should also apply when they act so long as the effective selection coefficients that they generate are small. The consequences of nonrandom mating and variations in inheritance are more difficult to account for. We expect convergence to our QLE values whenever all the a's are sufficiently small and all eigenvalues of the matrix of transmission coefficients $t_{A\leftarrow U}$ are of order 1, but that conjecture remains to be proven.

The QLE approximation developed here can be used to study a variety of interesting models. Selection on the genetic system (recombination, selfing, and mutation rate, for example) can be studied by assuming a modifier allele of small effect. Even if the system as a whole is under strong selection, associations involving the modifier will be weak and can therefore be modeled by a set of linear equations (Barton 1995). The infinitesimal model is based on a different kind of approximation, in which the limit of a large number of loci, each with infinitesimal effect, is taken. Turelli and Barton (1994) give a heuristic argument that extends this model to allow for linkage and epistasis. However, as Bürger (2000, p. 189) points out, this extension remains to be proven rigorously.

The biggest lacuna in the framework presented here is the lack of a model for random genetic drift. Its effects can be included by accounting for the variation in allele frequencies and associations caused by random sampling. The effects of drift on allele frequencies and pairwise associations have been well studied (Ewens 1979), but exact results and approximations for the joint probability distribution of the higher-order associations remain to be worked out. A basic implementation of random drift is included in the Mathematica packages.

For the future, there is considerable scope for applying the methods set out in this article to bring together analyses of particular models and to explore better ways to approximate general multilocus systems.

Acknowledgments

We thank Ophélie Ronce, Maria Servedio, Stuart Thomas, and two careful reviewers for their comments on the manuscript. We are grateful for support from the National Science Foundation (grant DEB-9973221), the Biotechnology and Biological Sciences Research Council (postgraduate studentship no. 97/B1/G/03163 to NB/TJ), the Wellcome Trust (International Prize Travelling Research Fellowship no. 061530 to T.J.), the Scottish International Education Trust (travel grant to T.J.), and the Darwin Trust for funding.

APPENDIX A: A TWO-LOCUS EXAMPLE

The goal is to develop a simple example that illustrates the notation and calculations involved in selection, transmission, and the QLE approximation. We look at the effects of viability selection acting on two diploid autosomal loci in a random mating hermaphroditic population.

Consider a simple selection scheme in which allele 1 at locus i changes relative fitness by si and allele 1 at locus j by sj. Interactions between the loci also affect fitness. More specifically, assume that each interaction between an allele 1 at locus i and an allele 1 at locus j changes relative fitness by eij. These assumptions lead to the fitnesses for diploid genotypes shown in Table A1.

Because mating is random, we can define the selection group as a single individual. Since the population is hermaphroditic, the contexts for the two loci need only give the sex of origin for each gene. For example, $i_m$ stands for a gene that was inherited via a sperm.

To calculate the selection coefficients (the a's), first write the fitnesses that appear in Table A1 in the form of a polynomial in the X's:
$$W(X_{i_f}, X_{j_f}, X_{i_m}, X_{j_m}) = 1 + s_i(X_{i_f} + X_{i_m}) + s_j(X_{j_f} + X_{j_m}) + e_{ij}(X_{i_f}X_{j_f} + X_{i_f}X_{j_m} + X_{i_m}X_{j_f} + X_{i_m}X_{j_m}). \tag{A1}$$
Next, write out the fitness function that defines the selection coefficients, Equation 7:
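$$\begin{aligned} W(X_{i_f},X_{j_f},X_{i_m},X_{j_m}) = \bar W\Big\{1 &+ a_{i_f}[(X_{i_f}-p_i) - D_{i_f}] + a_{i_m}[(X_{i_m}-p_i) - D_{i_m}] + a_{j_f}[(X_{j_f}-p_j) - D_{j_f}] + a_{j_m}[(X_{j_m}-p_j) - D_{j_m}] \\ &+ a_{i_f j_f}[(X_{i_f}-p_i)(X_{j_f}-p_j) - D_{i_f j_f}] + a_{i_f j_m}[(X_{i_f}-p_i)(X_{j_m}-p_j) - D_{i_f j_m}] \\ &+ a_{i_m j_f}[(X_{i_m}-p_i)(X_{j_f}-p_j) - D_{i_m j_f}] + a_{i_m j_m}[(X_{i_m}-p_i)(X_{j_m}-p_j) - D_{i_m j_m}] \\ &+ a_{i_f i_m}[(X_{i_f}-p_i)(X_{i_m}-p_i) - D_{i_f i_m}] + a_{j_f j_m}[(X_{j_f}-p_j)(X_{j_m}-p_j) - D_{j_f j_m}] \\ &+ a_{i_f i_m j_f}[(X_{i_f}-p_i)(X_{i_m}-p_i)(X_{j_f}-p_j) - D_{i_f i_m j_f}] + a_{i_f i_m j_m}[(X_{i_f}-p_i)(X_{i_m}-p_i)(X_{j_m}-p_j) - D_{i_f i_m j_m}] \\ &+ a_{i_f j_f j_m}[(X_{i_f}-p_i)(X_{j_f}-p_j)(X_{j_m}-p_j) - D_{i_f j_f j_m}] + a_{i_m j_f j_m}[(X_{i_m}-p_i)(X_{j_f}-p_j)(X_{j_m}-p_j) - D_{i_m j_f j_m}] \\ &+ a_{i_f i_m j_f j_m}[(X_{i_f}-p_i)(X_{i_m}-p_i)(X_{j_f}-p_j)(X_{j_m}-p_j) - D_{i_f i_m j_f j_m}]\Big\}. \end{aligned} \tag{A2}$$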
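There are 15 selection coefficients, as required with four positions with two alleles each ($15 = 2^4 - 1$). Equating (A1) and (A2) and identifying coefficients with the same combinations of X's gives the selection coefficients. Because each product term in (A2) also contributes to lower powers of the X's when multiplied out, the coefficients must be matched from the highest order down: the pairwise coefficients come directly from the $e_{ij}$ terms, and each single-position coefficient then absorbs a contribution $2e_{ij}p$ from the two pairwise terms in which it appears:
$$a_{i_f} = a_{i_m} = \frac{s_i + 2e_{ij}p_j}{\bar W}, \qquad a_{j_f} = a_{j_m} = \frac{s_j + 2e_{ij}p_i}{\bar W}, \qquad a_{i_f j_f} = a_{i_m j_m} = a_{i_f j_m} = a_{i_m j_f} = \frac{e_{ij}}{\bar W}. \tag{A3}$$
The other 7 selection coefficients that appear in (A2) are 0.

The change in the frequency of allele 1 at locus $i$ is calculated using Equation 10. Because the population is hermaphroditic, allele frequencies and associations in genes inherited via sperm will be equal to those inherited via eggs: $p_{i_f} = p_{i_m} \equiv p_i$ and $D_{i_f j_f} = D_{i_m j_m} \equiv D_{ij}$. After one generation of random mating, associations involving more than one sex of origin (e.g., $D_{i_f j_m}$) are zero. With $t_{i_f\leftarrow i_f} = t_{i_f\leftarrow i_m} = \tfrac{1}{2}$, the exact change in allele frequency will then be
$$\begin{aligned} \Delta p_i = \Delta p_{i_f} &= t_{i_f\leftarrow i_f}\{a_{i_f}pq_i + a_{j_f}D_{ij} + a_{i_f j_f}D_{iij}\} + t_{i_f\leftarrow i_m}\{a_{i_m}pq_i + a_{j_m}D_{ij} + a_{i_m j_m}D_{iij}\} \\ &= \frac{1}{\bar W}\Big[(s_i + 2e_{ij}p_j)pq_i + (s_j + 2e_{ij}p_i)D_{ij} + e_{ij}(1 - 2p_i)D_{ij}\Big] \\ &= \frac{1}{\bar W}\Big[(s_i + 2e_{ij}p_j)pq_i + (s_j + e_{ij})D_{ij}\Big]. \end{aligned} \tag{A4}$$

That result can be written entirely in terms of the selection coefficients and allele frequencies once the population reaches QLE. At that point, Equation 22 shows that the association between the loci is
$$D_{ij} = \frac{pq_{ij}}{2r_{ij}}\Big[(1 - r_{ij})(a_{i_f j_f} + a_{i_m j_m}) + r_{ij}(a_{i_f j_m} + a_{i_m j_f})\Big] + O(a^2) = \frac{e_{ij}\,pq_{ij}}{r_{ij}} + O(a^2), \tag{A5}$$
where by $O(a^2)$ we mean terms that are no larger than a constant of order 1 times the square of the largest of the a's given above in Equation A3. At QLE, the mean fitness is, to $O(a)$, equal to the mean fitness of a population with no associations (at linkage equilibrium):
$$\bar W = 1 + 2s_ip_i + 2s_jp_j + 4e_{ij}p_ip_j + O(a^2). \tag{A6}$$
Putting together these last three results shows that the change in frequency of allele 1 at locus $i$ in a population at QLE is
$$\begin{aligned} \Delta p_i &= \frac{1}{\bar W}\Big[(s_i + 2e_{ij}p_j)pq_i + (s_j + e_{ij})\frac{e_{ij}\,pq_{ij}}{r_{ij}}\Big] + O(a^3) \\ &= (s_i + 2e_{ij}p_j)\,pq_i\,(1 - 2s_ip_i - 2s_jp_j - 4e_{ij}p_ip_j) + \frac{e_{ij}\,pq_{ij}}{r_{ij}}(s_j + e_{ij}) + O(a^3). \end{aligned} \tag{A7}$$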
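A brute-force check of the QLE association in Equation A5: iterate the exact two-locus recursion for this model (random union of gametes, viability selection with the fitnesses of Equation A1, then recombination at rate $r_{ij}$) and compare the association among genes in gametes with $e_{ij}pq_{ij}/r_{ij}$. This Python sketch is a minimal illustration, not the authors' code; the function names, the parameter values, and the 30-generation burn-in are hypothetical choices.

```python
import itertools


def next_generation(x, s_i, s_j, e_ij, r):
    """Gamete frequencies one generation later (hypothetical helper).

    x maps a haplotype (allele at i, allele at j) to its frequency among gametes.
    Gametes unite at random, zygotes undergo viability selection with the fitness
    of Eq. A1, W = 1 + s_i*n_i + s_j*n_j + e_ij*n_i*n_j (n = copies of allele 1),
    and adults produce gametes with recombination rate r.
    """
    new = {h: 0.0 for h in x}
    w_bar = 0.0
    for h1, h2 in itertools.product(x, repeat=2):  # (egg, sperm) haplotypes
        n_i, n_j = h1[0] + h2[0], h1[1] + h2[1]
        weight = x[h1] * x[h2] * (1 + s_i * n_i + s_j * n_j + e_ij * n_i * n_j)
        w_bar += weight
        new[h1] += weight * (1 - r) / 2          # parental gametes
        new[h2] += weight * (1 - r) / 2
        new[(h1[0], h2[1])] += weight * r / 2    # recombinant gametes
        new[(h2[0], h1[1])] += weight * r / 2
    return {h: f / w_bar for h, f in new.items()}


def allele_freqs_and_D(x):
    p_i = x[(1, 0)] + x[(1, 1)]
    p_j = x[(0, 1)] + x[(1, 1)]
    return p_i, p_j, x[(1, 1)] - p_i * p_j


x = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}  # linkage equilibrium, p = 1/2
s_i, s_j, e_ij, r = 0.0, 0.0, 0.005, 0.5
for _ in range(30):
    x = next_generation(x, s_i, s_j, e_ij, r)
p_i, p_j, D = allele_freqs_and_D(x)
D_qle = e_ij * p_i * (1 - p_i) * p_j * (1 - p_j) / r
print(f"simulated D = {D:.3e}, QLE prediction = {D_qle:.3e}")
```

With epistasis weak relative to recombination, the simulated association should settle close to the QLE value; the small remaining gap is of the order of the neglected $O(a^2)$ terms.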

APPENDIX B: MUTATION

The centered associations following mutation can be written as
$$D''_A = E_{X,M}\Big[\prod_{i\in A}\big((X_i + \delta X_i) - (p_i + \Delta_\mu p_i)\big)\Big], \tag{B1}$$
where $\delta X_i$ is the random change to allelic state at position $i$ caused by mutation. The expectation $E_{X,M}[\,]$ is over random mutational events, $M$, as well as over allelic states, $X$. $\Delta_\mu p_i$ is the change in the frequency of allele 1 at position $i$. Denoting the mutation rate at position $i$ from allele 0 to allele 1 as $u_i$ and the reverse mutation rate from 1 to 0 as $v_i$, that change is
$$\Delta_\mu p_i = u_i q_i - v_i p_i. \tag{B2}$$
Equation B1 can be simplified by defining the random variable $R[x]$ as having value 1 with probability $x$ and zero otherwise. Let $\hat R[x] \equiv R[x] - x$. If $X_i = 0$, then $\delta X_i = +R[u_i]$, and if $X_i = 1$, then $\delta X_i = -R[v_i]$. Hence, $\delta X_i = (1 - X_i)R[u_i] - X_i R[v_i]$. Rewriting this in terms of $\hat R$ and $\zeta_i = X_i - p_i$,
$$\delta X_i = (q_i - \zeta_i)(u_i + \hat R[u_i]) - (p_i + \zeta_i)(v_i + \hat R[v_i]). \tag{B3}$$
Substituting into Equation B1,
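$$\begin{aligned} D''_A &= E_{X,M}\Big[\prod_{i\in A}\big(p_i + \zeta_i + (q_i - \zeta_i)(u_i + \hat R[u_i]) - (p_i + \zeta_i)(v_i + \hat R[v_i]) - p_i - \Delta_\mu p_i\big)\Big] \\ &= E_{X,M}\Big[\prod_{i\in A}\big(q_i\hat R[u_i] - p_i\hat R[v_i] + \zeta_i(1 - u_i - \hat R[u_i] - v_i - \hat R[v_i])\big)\Big]. \end{aligned} \tag{B4}$$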
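Now, the $\hat R$ have zero mean, are independent across positions, and are independent of allelic state, so every term containing an $\hat R$ vanishes when the expectation over mutational events is taken. What remains is $D''_A = D_A\prod_{i\in A}(1 - u_i - v_i)$. This leads immediately to Equation 28.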
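For two positions the result of this appendix can be checked exactly: apply independent mutation to a 2 × 2 joint distribution of allelic states and compare the new central association with $(1 - u_i - v_i)(1 - u_j - v_j)$ times the old one. The Python sketch below does this; the function names, the joint frequencies, and the mutation rates are hypothetical illustrations.

```python
def mutate_pair(joint, u_i, v_i, u_j, v_j):
    """Apply independent mutation at positions i and j (hypothetical helper).

    joint[x_i][x_j] is the frequency of the allelic states (x_i, x_j). At each
    position allele 0 mutates to 1 with probability u and 1 mutates to 0 with
    probability v, as in Eq. B2.
    """
    def kernel(u, v):  # kernel[old][new] transition probabilities
        return [[1 - u, u], [v, 1 - v]]

    k_i, k_j = kernel(u_i, v_i), kernel(u_j, v_j)
    new = [[0.0, 0.0], [0.0, 0.0]]
    for a in (0, 1):
        for b in (0, 1):
            for a_new in (0, 1):
                for b_new in (0, 1):
                    new[a_new][b_new] += joint[a][b] * k_i[a][a_new] * k_j[b][b_new]
    return new


def central_association(joint):
    """D_ij measured relative to the current allele frequencies."""
    p_i = joint[1][0] + joint[1][1]
    p_j = joint[0][1] + joint[1][1]
    return joint[1][1] - p_i * p_j


joint = [[0.40, 0.10], [0.15, 0.35]]
after = mutate_pair(joint, u_i=0.01, v_i=0.02, u_j=0.03, v_j=0.005)
ratio = central_association(after) / central_association(joint)
print(ratio, (1 - 0.01 - 0.02) * (1 - 0.03 - 0.005))
```

Because the new association is measured about the post-mutation allele frequencies, this matches the centering on $p_i + \Delta_\mu p_i$ used in Equation B1.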

APPENDIX C: MIGRATION

We calculate the centered associations following a migration event. The result is easiest to interpret when it is expressed in terms of the centered associations among the residents and migrants before the event. We therefore define the reference values in the resident population, $\wp_i^R$, to equal the allele frequencies in that population before migration, $p_i^R$.

Just after migration, the (uncentered) associations are given by Equation 3,
$$D'_A = E_X\Big[\prod_{i\in A}(X'_i - p_i^R)\Big], \tag{C1}$$
where $X'_i$ is the allelic state of an individual in the population after migration. That expression can be rewritten in terms of $X_i^M$, the allelic states of the migrants, and $X_i^R$, the allelic states of the residents, as
$$D'_A = (1-m)\,E_X\Big[\prod_{i\in A}(X_i^R - p_i^R)\Big] + m\,E_X\Big[\prod_{i\in A}(X_i^M - p_i^R)\Big] = (1-m)D_A^R + m D_A^{MR}, \tag{C2}$$
where $m$ is the migration rate. The $D_A^R$ are the (centered) associations among the residents before migration, while $D_A^{MR}$ are the uncentered associations of the migrants measured relative to the resident reference values, $\wp_i^R = p_i^R$.

The next step is to express the $D_A^{MR}$ in terms of the centered associations in the migrant population, $D_A^M$. To do that we need to rescale those associations in terms of the reference values for the migrant population, $\wp_i^M$. Using Equation 15 for changing reference values and some algebra gives
DAMR=ΣU⊆AdUDA∖UM,(C3)
where dU was defined earlier in Equation 31. The sum includes the term in which U equals the empty set Ø, which generates the term DAM. Substituting this expression for DAMR in Equation C2 produces
DA′=(1−m)DAR+mΣU⊆AdUDA∖UM.(C4)
These associations DA′ are based on the reference values for the residents before migration, ℘iR=piR.

To find the central moments after migration, we set the new reference points to the allele frequencies after migration: $\wp_i'' = (1-m)p_i^R + m p_i^M$. Using Equation 15 as before gives Equation 30, above.
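A minimal Python sketch can confirm Equation C4 and the final recentring step by brute force. It assumes that d_U = ∏_{i∈U}(p_i^M − p_i^R), which is what expanding each factor X_i^M − p_i^R as (X_i^M − p_i^M) + (p_i^M − p_i^R) produces. It implements the change of reference values as that same binomial expansion. Both choices are assumptions made for illustration, not copies of Equations 15 and 31. The frequencies and migration rate are hypothetical.

```python
from itertools import chain, combinations, product
from math import isclose, prod


def subsets(loci):
    """All subsets U of the tuple `loci`, including the empty set."""
    return chain.from_iterable(combinations(loci, k) for k in range(len(loci) + 1))


def allele_freqs(freqs, n):
    return [sum(f for h, f in freqs.items() if h[i] == 1) for i in range(n)]


def association(freqs, loci, ref):
    """E[prod_{i in A} (X_i - ref_i)]; equal to 1 when A is empty."""
    return sum(f * prod(h[i] - ref[i] for i in loci) for h, f in freqs.items())


def after_migration_c4(res, mig, m, loci, n):
    """D'_A from Equation C4, measured relative to the resident frequencies p^R.
    Assumes d_U = prod_{i in U} (p_i^M - p_i^R)."""
    pR, pM = allele_freqs(res, n), allele_freqs(mig, n)
    migrant_part = sum(
        prod(pM[i] - pR[i] for i in U)
        * association(mig, tuple(i for i in loci if i not in U), pM)
        for U in subsets(loci))
    return (1 - m) * association(res, loci, pR) + m * migrant_part


def recentre(D_old, loci, old_ref, new_ref):
    """Change reference values by expanding prod((X_i - old_i) + (old_i - new_i)).
    D_old(B) must return the association of the subset B relative to old_ref."""
    return sum(prod(old_ref[i] - new_ref[i] for i in U)
               * D_old(tuple(i for i in loci if i not in U))
               for U in subsets(loci))


# Hypothetical three-locus resident and migrant haplotype frequencies.
n, m = 3, 0.1
keys = list(product((0, 1), repeat=n))
res = dict(zip(keys, [0.30, 0.05, 0.05, 0.10, 0.10, 0.05, 0.05, 0.30]))
mig = dict(zip(keys, [0.05, 0.10, 0.05, 0.20, 0.05, 0.15, 0.10, 0.30]))
mixed = {h: (1 - m) * res[h] + m * mig[h] for h in keys}

A = (0, 1, 2)
pR, p_new = allele_freqs(res, n), allele_freqs(mixed, n)
direct = association(mixed, A, pR)              # Equation C1 by brute force
via_c4 = after_migration_c4(res, mig, m, A, n)
central = recentre(lambda B: after_migration_c4(res, mig, m, B, n), A, pR, p_new)
print(isclose(direct, via_c4, abs_tol=1e-12),
      isclose(association(mixed, A, p_new), central, abs_tol=1e-12))
```

Because D^M for the empty set equals 1, the U = A term contributes m·d_A even when all of the migrants' own centred associations are zero. This is how mixing populations with different allele frequencies generates associations.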

APPENDIX D: MATHEMATICA EXAMPLES

Here, we give some examples that outline how Mathematica (Wolfram 1999) can be used to find algebraic expressions for the changes in associations due to various evolutionary processes. Software that extends Mathematica to do the calculations described in this article is available on the Web at http://helios.bto.ed.ac.uk/evolgen. The notation used by that software is essentially the same as that in the text, but there are a few differences that need to be explained. The examples below show the user's input to Mathematica in boldface type and the program's output in regular type.

Stages in the life cycle: Different stages in the life cycle are denoted by the first element in the context, rather than by primes. Thus, the association between two positions in male gametes would be written D_{i{G,m},j{G,m}}, where G denotes the gamete stage. (Special German font is used to denote contexts to avoid clashes with other symbols.) Similarly, the association between two positions in a male diploid zygote, one inherited from the mother and one from the father, would be written D_{i{Z,m,m},j{Z,m,f}}.

The first step is to define the contexts for each stage. For example, this defines gamete (G), zygote (Z), adult (A), and new gamete (G∗) stages:
DefineContext[G,{Sex}];DefineContext[Z,{Sex, SexOfOrigin}];DefineContext[A,{SexOfOrigin}];DefineContext[G∗,{Sex}];

Recombination: Expressions for the mean and for the associations at one stage are expressed in terms of variables at the previous stage by applying various rules. For example, the mean contribution of a gene in the new gamete pool is
In: m_{j{G∗,m}} //. Recombination[]
Out: m_{j{A,m,m}} r_{{},{j}} + m_{j{A,m,f}} r_{{j},{}}
Here, //. denotes the repeated application of a set of transformation rules (Mathematica's ReplaceRepeated), and r_{{},{j}} is a generalized recombination rate, which gives the chance that gene j was inherited from the father at meiosis. Further rules apply the usual assumptions of Mendelian inheritance,
In: m_{j{G∗,m}} //. Recombination[] ∨ SymmetricRecombination[]
Out: (1/2) m_{j{A,m,f}} + (1/2) m_{j{A,m,m}}
where rule1 ∨ rule2 denotes a collection of rules. When applied to an association, the result depends on the change in the reference values, ℘, defined for each stage:
In: D_{i{G∗,m},j{G∗,m}} //. Recombination[] ∨ SymmetricRecombination[] // Simplify
Out: r_{{i,j},{}} (D_{i{A,m,f},j{A,m,f}} + D_{j{A,m,f}} (−℘_{i{G∗,m}} + ℘_{i{A,m,f}}) − (D_{i{A,m,f}} − ℘_{i{G∗,m}} + ℘_{i{A,m,f}}) (℘_{j{G∗,m}} − ℘_{j{A,m,f}})) + r_{{j},{i}} (D_{i{A,m,m},j{A,m,f}} + D_{j{A,m,f}} (−℘_{i{G∗,m}} + ℘_{i{A,m,m}}) − (D_{i{A,m,m}} − ℘_{i{G∗,m}} + ℘_{i{A,m,m}}) (℘_{j{G∗,m}} − ℘_{j{A,m,f}})) + r_{{i},{j}} (D_{i{A,m,f},j{A,m,m}} + D_{j{A,m,m}} (−℘_{i{G∗,m}} + ℘_{i{A,m,f}}) − (D_{i{A,m,f}} − ℘_{i{G∗,m}} + ℘_{i{A,m,f}}) (℘_{j{G∗,m}} − ℘_{j{A,m,m}})) + r_{{},{i,j}} (D_{i{A,m,m},j{A,m,m}} + D_{j{A,m,m}} (−℘_{i{G∗,m}} + ℘_{i{A,m,m}}) − (D_{i{A,m,m}} − ℘_{i{G∗,m}} + ℘_{i{A,m,m}}) (℘_{j{G∗,m}} − ℘_{j{A,m,m}}))
Such expressions are simplified by defining the reference values (see below).
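The four-term expansion for the association between positions i and j in new male gametes can be checked by building the gamete pool explicitly. In this hypothetical Python sketch, each term corresponds to one way of drawing genes i and j from a male adult's maternal ("f") and paternal ("m") copies, weighted by a generalized recombination rate. The keys ("f","f"), ("m","f"), ("f","m") and ("m","m") stand for r_{{i,j},{}}, r_{{j},{i}}, r_{{i},{j}} and r_{{},{i,j}}, respectively. The first set lists genes inherited from the mother. The adult reference values ℘ are deliberately not the allele frequencies, to show that the expansion holds for arbitrary reference points. All numbers are invented.

```python
from itertools import product
from math import isclose

# Positions in a male adult: (locus, origin); "f" = inherited from the mother,
# "m" = inherited from the father, mirroring contexts {A,m,f} and {A,m,m}.
POSITIONS = [("i", "f"), ("j", "f"), ("i", "m"), ("j", "m")]


def gamete_distribution(adult, r):
    """Gamete haplotype (X_i, X_j) frequencies produced by meiosis.

    adult maps 4-tuples of allelic states (ordered as POSITIONS) to frequencies;
    r maps (origin of gene i, origin of gene j) to the generalized recombination rate.
    """
    gametes = {}
    for genotype, freq in adult.items():
        state = dict(zip(POSITIONS, genotype))
        for (oi, oj), rate in r.items():
            key = (state[("i", oi)], state[("j", oj)])
            gametes[key] = gametes.get(key, 0.0) + freq * rate
    return gametes


def moment(adult, positions, ref):
    """E[prod (X - ref)] over the given adult positions (an uncentered association)."""
    total = 0.0
    for genotype, freq in adult.items():
        state = dict(zip(POSITIONS, genotype))
        term = freq
        for pos in positions:
            term *= state[pos] - ref[pos]
        total += term
    return total


def printed_expansion(adult, r, ref_adult, ref_gamete):
    """The four-term output, one term per generalized recombination rate."""
    Pi, Pj = ref_gamete
    total = 0.0
    for (oi, oj), rate in r.items():
        pi, pj = ("i", oi), ("j", oj)
        Dij = moment(adult, [pi, pj], ref_adult)
        Di, Dj = moment(adult, [pi], ref_adult), moment(adult, [pj], ref_adult)
        total += rate * (Dij + Dj * (-Pi + ref_adult[pi])
                         - (Di - Pi + ref_adult[pi]) * (Pj - ref_adult[pj]))
    return total


# Hypothetical genotype frequencies, a recombination fraction of 0.2,
# and arbitrary reference values at the adult and gamete stages.
adult = {g: (k + 1) / 136 for k, g in enumerate(product((0, 1), repeat=4))}
r = {("f", "f"): 0.4, ("m", "f"): 0.1, ("f", "m"): 0.1, ("m", "m"): 0.4}
ref_adult = {("i", "f"): 0.3, ("j", "f"): 0.6, ("i", "m"): 0.35, ("j", "m"): 0.55}
ref_gamete = (0.4, 0.5)

gametes = gamete_distribution(adult, r)
direct = sum(f * (x - ref_gamete[0]) * (y - ref_gamete[1]) for (x, y), f in gametes.items())
print(isclose(direct, printed_expansion(adult, r, ref_adult, ref_gamete), abs_tol=1e-12))
```

Each bracketed term is the binomial change of reference from the adult values ℘_{A} to the gamete values ℘_{G∗}. Setting the reference values equal to allele frequencies removes the single-position D's that appear in it.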

Stabilizing selection: Stabilizing selection is represented by defining fitness as a function of genotype. It is convenient to define the trait and fitness separately for each sex, even though they are in fact the same. The expressions can be simplified later:
In: z_m = Σ_{i∈{j,k}} b_i (X_{i{Z,m,m}} + X_{i{Z,m,f}} − 1); z_f = Σ_{i∈{j,k}} b_i (X_{i{Z,f,m}} + X_{i{Z,f,f}} − 1); W_m = 1 − (s/2) z_m^2; W_f = 1 − (s/2) z_f^2;
Fitness depends on two loci, j and k. This defines the set of positions that influence fitness of males and females:
In: W_m = {j{Z,m,m}, j{Z,m,f}, k{Z,m,m}, k{Z,m,f}}; W_f = {j{Z,f,m}, j{Z,f,f}, k{Z,f,m}, k{Z,f,f}};
The fitness is now an explicit polynomial function of the states of genes in zygotes:
W_m = 1 − (1/2) s ((−1 + X_{j{Z,m,f}} + X_{j{Z,m,m}}) b_j + (−1 + X_{k{Z,m,f}} + X_{k{Z,m,m}}) b_k)^2.
Since we assume two alleles per locus, each X is either 0 or 1, so X^2 = X. The fitness must therefore be simplified to remove superfluous higher powers of the X's:
In: W2_m = Expand[W_m] //. Biallelic[]; W2_f = Expand[W_f] //. Biallelic[]; W2_m // Simplify
Out: (1/2) (2 + s (−1 + X_{j{Z,m,f}} (1 − 2 X_{j{Z,m,m}}) + X_{j{Z,m,m}}) b_j^2 − 2 s (−1 + X_{j{Z,m,f}} + X_{j{Z,m,m}}) (−1 + X_{k{Z,m,f}} + X_{k{Z,m,m}}) b_j b_k + s (−1 + X_{k{Z,m,f}} (1 − 2 X_{k{Z,m,m}}) + X_{k{Z,m,m}}) b_k^2)
The rule Biallelic[] implements the reduction formula that led to Equation 5.
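The reduction performed by Biallelic[] can be checked by enumeration. Because each X is 0 or 1, the full fitness W_m and the simplified W2_m must agree at all 16 allelic states of the four zygote positions. The Python sketch below is an illustration with hypothetical parameter values, not part of the authors' package.

```python
from itertools import product
from math import isclose


def fitness_full(x, bj, bk, s):
    """W_m = 1 - (s/2) z_m^2, with x = (X_j{Z,m,f}, X_j{Z,m,m}, X_k{Z,m,f}, X_k{Z,m,m})."""
    xjf, xjm, xkf, xkm = x
    z = bj * (xjm + xjf - 1) + bk * (xkm + xkf - 1)
    return 1 - s / 2 * z ** 2


def fitness_biallelic(x, bj, bk, s):
    """The simplified W2_m, in which every X^2 has been replaced by X."""
    xjf, xjm, xkf, xkm = x
    return 0.5 * (2
                  + s * (-1 + xjf * (1 - 2 * xjm) + xjm) * bj ** 2
                  - 2 * s * (-1 + xjf + xjm) * (-1 + xkf + xkm) * bj * bk
                  + s * (-1 + xkf * (1 - 2 * xkm) + xkm) * bk ** 2)


bj, bk, s = 0.7, -0.4, 0.3   # hypothetical allelic effects and selection strength
agree = all(isclose(fitness_full(x, bj, bk, s), fitness_biallelic(x, bj, bk, s))
            for x in product((0, 1), repeat=4))
print(agree)
# Off the lattice {0, 1} the two polynomials differ:
print(fitness_full((0.5,) * 4, 1, 1, 1), fitness_biallelic((0.5,) * 4, 1, 1, 1))  # 1.0 0.5
```

The second line shows why the reduction is legitimate only for indicator variables. The two expressions are different polynomials that happen to coincide on every biallelic genotype.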

The effect of selection on the mean contribution of a position is
In: m_{i{A,m,f}} /. Selection[{Sex}] /. {a_{sex_,U_} :> (MakeSelectionCoefficient[U, W2_sex]/wb)} /. ℘_{i_{Z,_,_}} :> ρ_i /. ReferencePointZero[Z] // Simplify
Out: ρ_i − (s/(2 wb)) (((−1 + 2ρ_j) D_{i{Z,m,f},j{Z,m,f}} + (−1 + 2ρ_j) D_{i{Z,m,f},j{Z,m,m}} + 2 D_{i{Z,m,f},j{Z,m,f},j{Z,m,m}}) b_j^2 + 2 ((−1 + 2ρ_k) D_{i{Z,m,f},j{Z,m,f}} + (−1 + 2ρ_k) D_{i{Z,m,f},j{Z,m,m}} − D_{i{Z,m,f},k{Z,m,f}} + 2ρ_j D_{i{Z,m,f},k{Z,m,f}} − D_{i{Z,m,f},k{Z,m,m}} + 2ρ_j D_{i{Z,m,f},k{Z,m,m}} + D_{i{Z,m,f},j{Z,m,f},k{Z,m,f}} + D_{i{Z,m,f},j{Z,m,f},k{Z,m,m}} + D_{i{Z,m,f},j{Z,m,m},k{Z,m,f}} + D_{i{Z,m,f},j{Z,m,m},k{Z,m,m}}) b_j b_k + ((−1 + 2ρ_k) D_{i{Z,m,f},k{Z,m,f}} + (−1 + 2ρ_k) D_{i{Z,m,f},k{Z,m,m}} + 2 D_{i{Z,m,f},k{Z,m,f},k{Z,m,m}}) b_k^2)
MakeSelectionCoefficient[U, W2_sex] generates the coefficient a_U from the fitness function. ReferencePointZero[Z] sets the reference values in zygotes equal to the current allele frequencies, which are assumed equal across the sexes. The mean fitness in the denominator is represented by wb and is best substituted later.
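The printed expression for the mean after selection can be checked against a direct calculation of E[X_i W]/w̄. The Python sketch below uses an invented joint distribution over six positions in a male zygote: loci i, j and k on the maternal and paternal copies. The distribution is symmetrised so that the two copies of each locus share one allele frequency, mirroring the assumption that reference values are equal across the sexes. D denotes central moments, so the reference values equal the allele frequencies, as ReferencePointZero[Z] specifies. Random union is not assumed, so associations between the two copies are generally nonzero.

```python
from itertools import product
from math import isclose, prod

# Zygote positions (locus, origin): the first three are maternal copies ("f"),
# the last three paternal copies ("m").
POS = [(locus, origin) for origin in "fm" for locus in "ijk"]


def symmetric_zygotes():
    """Hypothetical zygote distribution over the six positions, symmetrised so that
    maternal and paternal copies of each locus share one allele frequency."""
    raw = {g: 1 + (7 * n) % 11 for n, g in enumerate(product((0, 1), repeat=6))}
    sym = {g: raw[g] + raw[g[3:] + g[:3]] for g in raw}   # add the f/m-swapped state
    total = sum(sym.values())
    return [(dict(zip(POS, g)), w / total) for g, w in sym.items()]


def fitness(x, bj, bk, s):
    z = bj * (x["j", "f"] + x["j", "m"] - 1) + bk * (x["k", "f"] + x["k", "m"] - 1)
    return 1 - s / 2 * z ** 2


def exact_mean(zygotes, bj, bk, s):
    """Mean of the maternal copy of locus i after selection, and the mean fitness wb."""
    wb = sum(f * fitness(x, bj, bk, s) for x, f in zygotes)
    num = sum(f * x["i", "f"] * fitness(x, bj, bk, s) for x, f in zygotes)
    return num / wb, wb


def printed_mean(zygotes, bj, bk, s, wb):
    """The printed output, evaluated with central moments D."""
    rho = {p: sum(f * x[p] for x, f in zygotes) for p in POS}

    def D(*positions):
        return sum(f * prod(x[p] - rho[p] for p in positions) for x, f in zygotes)

    i, jf, jm, kf, km = ("i", "f"), ("j", "f"), ("j", "m"), ("k", "f"), ("k", "m")
    rj, rk = rho[jf], rho[kf]
    tj = ((-1 + 2 * rj) * (D(i, jf) + D(i, jm)) + 2 * D(i, jf, jm)) * bj ** 2
    tjk = 2 * ((-1 + 2 * rk) * (D(i, jf) + D(i, jm))
               + (-1 + 2 * rj) * (D(i, kf) + D(i, km))
               + D(i, jf, kf) + D(i, jf, km) + D(i, jm, kf) + D(i, jm, km)) * bj * bk
    tk = ((-1 + 2 * rk) * (D(i, kf) + D(i, km)) + 2 * D(i, kf, km)) * bk ** 2
    return rho[i] - s / (2 * wb) * (tj + tjk + tk)


zygotes = symmetric_zygotes()
bj, bk, s = 0.8, 0.5, 0.4     # hypothetical parameters
new_mean, wb = exact_mean(zygotes, bj, bk, s)
print(new_mean, printed_mean(zygotes, bj, bk, s, wb))
```

The agreement relies on two ingredients of the text: the biallelic reduction X^2 = X and equal allele frequencies on the two copies. If the symmetrisation is dropped, the (−1 + 2ρ) factors would need separate maternal and paternal frequencies.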

The complete life cycle: This set of rules defines the change over the whole life cycle:
rules=(Recombination[]∨SymmetricRecombination[]∨ReferencePointSame[G,{A,Z}]∨ReferencePointZero[{G∗,G}]∨Selection[{Sex}]∨UnionOfGametes[]∨RandomUnion[G]∨Symmetrize[Disequilibrium]∨SymmetricSexes[]);
The mean contributions in the two sexes are assumed equal, as are all the linkage disequilibria; thus, SymmetricSexes[] is applied. Symmetrize[Disequilibrium] simplifies D_{k,j} to D_{j,k}. ReferencePointSame[G,{A,Z}] sets the reference points in diploid stages to be inherited from the gamete stage, where they are equal to the current allele frequencies. UnionOfGametes[] derives genes at the zygote stage from those in the gamete stage, and RandomUnion[G] makes the further assumption that gametes unite at random.

The new mean depends on selection coefficients up to fourth order:
In: nm = m_{j{G∗,m}} //. rules // Simplify
Out: (1/2) (2 m_{j{G,f}} + D_{j{G,f},k{G,f}} a_{m,{k{Z,m,f}}} + D_{j{G,f},k{G,f}} a_{m,{k{Z,m,m}}} + D_{j{G,f},j{G,f},k{G,f}} a_{m,{j{Z,m,f},k{Z,m,f}}} + D_{j{G,f},j{G,f},k{G,f}} a_{m,{j{Z,m,m},k{Z,m,m}}} + D_{j{G,f},j{G,f}} (a_{m,{j{Z,m,f}}} + a_{m,{j{Z,m,m}}} + D_{j{G,f},k{G,f}} (a_{m,{j{Z,m,f},j{Z,m,m},k{Z,m,f}}} + a_{m,{j{Z,m,f},j{Z,m,m},k{Z,m,m}}})) + D_{j{G,f},k{G,f}}^2 a_{m,{j{Z,m,f},k{Z,m,f},k{Z,m,m}}} + D_{j{G,f},k{G,f}}^2 a_{m,{j{Z,m,m},k{Z,m,f},k{Z,m,m}}} + 2 D_{j{G,f},k{G,f}} D_{j{G,f},j{G,f},k{G,f}} a_{m,{j{Z,m,f},j{Z,m,m},k{Z,m,f},k{Z,m,m}}})
We now substitute in the selection coefficients for this specific scheme:
In: nms = nm //. {a_{sex_,U_} :> (MakeSelectionCoefficient[U, W2_sex]/wb)} ∨ Biallelic[] ∨ rules /. {m_{i_} → ρ_i} // Simplify
Out: ρ_{j{G,f}} + (s/wb) (b_j (1 − ρ_{j{G,f}}) ρ_{j{G,f}} ((1/2) (1 − 2ρ_{j{G,f}}) b_j + (1 − 2ρ_{k{G,f}}) b_k) + D_{j{G,f},k{G,f}} b_k ((1/2) (1 − 2ρ_{k{G,f}}) b_k))
This expression is the same as Equation 42 with two loci.
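The two-locus expression can be compared with a brute-force calculation. This hypothetical Python sketch unites gametes at random and weights each zygote by W = 1 − (s/2)z². It then computes the frequency of allele 1 at locus j among the gametes produced by the survivors. Recombination does not change allele frequencies, so it can be ignored here. The result is compared with the printed output evaluated at the current allele frequencies and the current D_{j,k}. The haplotype frequencies and parameters are invented.

```python
from math import isclose


def exact_new_frequency(hap, bj, bk, s):
    """Exact frequency of allele 1 at locus j in the next gamete pool.

    hap maps gamete haplotypes (X_j, X_k) to frequencies; gametes unite at random,
    zygotes survive in proportion to 1 - (s/2) z^2, and Mendelian segregation
    transmits each parental copy with probability 1/2.
    """
    wb = num = 0.0
    for (jf, kf), ff in hap.items():
        for (jm, km), fm in hap.items():
            z = bj * (jf + jm - 1) + bk * (kf + km - 1)
            w = ff * fm * (1 - s / 2 * z ** 2)
            wb += w
            num += w * (jf + jm) / 2
    return num / wb, wb


def printed_new_frequency(hap, bj, bk, s, wb):
    """The two-locus output, with rho set to the current allele frequencies."""
    pj = hap[1, 0] + hap[1, 1]
    pk = hap[0, 1] + hap[1, 1]
    D = hap[1, 1] - pj * pk
    return pj + s / wb * (bj * (1 - pj) * pj * (0.5 * (1 - 2 * pj) * bj + (1 - 2 * pk) * bk)
                          + D * bk * (0.5 * (1 - 2 * pk) * bk))


hap = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.30}   # hypothetical
bj, bk, s = 0.6, 0.4, 0.2
p_next, wb = exact_new_frequency(hap, bj, bk, s)
print(p_next, printed_new_frequency(hap, bj, bk, s, wb))
```

The first term is direct selection on locus j through its effect on the trait. The term proportional to D_{j,k} is indirect selection transmitted from locus k through the linkage disequilibrium.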

Other processes: Non-Mendelian inheritance is handled by interpreting recombination rates appropriately. For example, a cytoplasmically inherited gene is certain to be inherited from the mother:
In: {r_{{i,j},{k}}, r_{{j},{i,k}}} /. NonMendelian[CytoplasmicLoci → {i}]
Out: {r_{{j},{k}}, 0}
Here r_{{i,j},{k}} is the chance that genes i and j are inherited from the mother and k from the father; if i is cytoplasmically inherited, this equals the recombination rate for the autosomal loci, r_{{j},{k}}. The converse, r_{{j},{i,k}}, requires i to be inherited from the father and is therefore zero when i is cytoplasmically inherited.
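The NonMendelian[] rule can be mimicked in a few lines. In this hypothetical Python sketch, cytoplasmic_rate and unlinked are illustrative names, not functions from the package. Any rate that would require a cytoplasmic gene to come from the father is set to zero. Otherwise the cytoplasmic loci are dropped from the maternal set, and the rate for the remaining autosomal loci is used. Free recombination among autosomal loci is assumed only to make the example concrete.

```python
from itertools import product


def cytoplasmic_rate(maternal, paternal, autosomal_rate, cytoplasmic):
    """Generalized rate r_{maternal, paternal} when the loci in `cytoplasmic`
    are always inherited from the mother.

    autosomal_rate(M, F) is a hypothetical callable returning r_{M,F} for the
    remaining (Mendelian) loci.
    """
    if set(paternal) & set(cytoplasmic):
        return 0.0      # a cytoplasmic gene never comes from the father
    return autosomal_rate(frozenset(maternal) - set(cytoplasmic), frozenset(paternal))


def unlinked(maternal, paternal):
    """Hypothetical autosomal model: freely recombining loci, each gene taken
    from the mother or the father with probability 1/2."""
    return 0.5 ** (len(maternal) + len(paternal))


cyto = {"i"}
print(cytoplasmic_rate({"i", "j"}, {"k"}, unlinked, cyto))   # 0.25, i.e. r_{{j},{k}}
print(cytoplasmic_rate({"j"}, {"i", "k"}, unlinked, cyto))   # 0.0, impossible

# The rates over all assignments of i, j, k to the two parents still sum to 1.
total = 0.0
for origins in product("MF", repeat=3):
    M = {locus for locus, o in zip("ijk", origins) if o == "M"}
    F = {locus for locus, o in zip("ijk", origins) if o == "F"}
    total += cytoplasmic_rate(M, F, unlinked, cyto)
print(total)   # 1.0
```

Under this autosomal model, the first two printed values reproduce the pattern of the output {r_{{j},{k}}, 0}. The final check confirms that reassigning probability in this way still leaves a proper distribution over parental origins.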

Migration is implemented in a similar way to recombination, as a set of transmission rules. For example, this is the cross-genome association between loci j and k among juveniles, J, after migration:
In: D_{j{J,1,m,m},k{J,1,m,f}} /. Migration[{1,2}&]
Out: M_{1,1} (D_{j{Z,1,m,m},k{Z,1,m,f}} + D_{k{Z,1,m,f}} (−℘_{j{J,1,m,m}} + ℘_{j{Z,1,m,m}}) + D_{j{Z,1,m,m}} (−℘_{k{J,1,m,f}} + ℘_{k{Z,1,m,f}}) + (−℘_{j{J,1,m,m}} + ℘_{j{Z,1,m,m}}) (−℘_{k{J,1,m,f}} + ℘_{k{Z,1,m,f}})) + M_{1,2} (D_{j{Z,2,m,m},k{Z,2,m,f}} + D_{k{Z,2,m,f}} (−℘_{j{J,1,m,m}} + ℘_{j{Z,2,m,m}}) + D_{j{Z,2,m,m}} (−℘_{k{J,1,m,f}} + ℘_{k{Z,2,m,f}}) + (−℘_{j{J,1,m,m}} + ℘_{j{Z,2,m,m}}) (−℘_{k{J,1,m,f}} + ℘_{k{Z,2,m,f}}))
These expressions are linear sums weighted by the migration rates M_{i,j}. The pure function {1,2}& defines the set of possible source demes: every deme can receive immigrants from deme 1 or deme 2. The deme in which a gene is found is indicated by an extra element in the context, which takes the value 1 or 2 in this example.

