Abstract

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. Related designs include heterogeneous stock and the diversity outcross population. In this article, I derive the two-locus haplotype probabilities on the autosome and X chromosome with these designs. These haplotype probabilities provide the key quantities for developing hidden Markov models for the treatment of missing genotype information. I further derive the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome.

Advanced intercross populations, in which multiple inbred strains are mated at random for many generations, have the advantage of greater precision of genetic mapping because of the accumulation of recombination events across the multiple generations. The most commonly used form, which begins with two inbred strains, was formally introduced by Darvasi and Soller (1995) and called advanced intercross lines (AIL). A closely related design is that of heterogeneous stock (HS; see Mott et al. 2000), in which eight inbred strains are randomly mated for many generations. Svenson et al. (2012) developed the diversity outcross population (DO), which was formed with progenitors that were partially inbred individuals drawn from intermediate generations in the development of the Collaborative Cross (so-called pre-CC mice; see Aylor et al. 2011).

The mapping of quantitative trait loci in such populations, whether by interval mapping (Lander and Botstein 1989) or Haley-Knott regression (Haley and Knott 1992), generally requires conditional genotype probabilities at putative quantitative trait loci, given the available marker genotype data. Such probabilities are often calculated using a hidden Markov model (HMM; see Broman and Sen 2009, App. D). An HMM for this purpose formally requires the calculation of two-locus diplotype probabilities, although if the populations are formed with a large number of mating pairs, the two haplotypes within an individual are independent, and so it is sufficient to calculate two-locus haplotype probabilities.

Darvasi and Soller (1995) derived the two-locus haplotype probabilities for the autosome in AIL. I am not aware of any work considering the X-chromosome. In this article, I derive the two-locus haplotype probabilities for the autosome and X-chromosome in AIL, HS, and the DO. The calculations for the DO rely on recent results on haplotype probabilities in pre-CC mice (Broman 2012). Throughout, I assume an effectively infinite set of mating pairs at each generation, no sex difference in recombination, and no selection or mutation.

Let us first revisit the two-locus autosomal haplotype probabilities in AIL, as they serve as a simple example of the technique used in these calculations (see also Bulmer 1980, Ch. 3). Let $p_s$ denote the frequency of the AA haplotype at generation $F_s$. Then $p_2 = (1 - r)/2$ and, for $s \ge 2$, we have the recurrence relation
$$p_{s+1} = (1 - r)\,p_s + \frac{r}{4} \qquad (1)$$
where $r$ is the recombination fraction (in one meiosis) between the two loci. Equation (1) is derived by noting that an AA haplotype drawn from generation $F_{s+1}$ is either an intact AA haplotype at generation $F_s$, transmitted without recombination, or it is a recombinant haplotype bringing two independent A alleles together. Note that the frequency of the A allele is 1/2 at every generation. The solution is, for $s \ge 2$,
$$p_s = \frac{1}{4}\left[1 + (1 - 2r)(1 - r)^{s-2}\right] \qquad (2)$$
and the frequency of recombinant haplotypes is $1 - 2p_s$.
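The autosomal calculation can be checked numerically. The following is a minimal sketch, assuming the recurrence $p_{s+1} = (1 - r)p_s + r/4$ with starting value $p_2 = (1 - r)/2$; the function names are illustrative and not part of any existing software.

```python
def ail_autosome_aa(r, s):
    """Frequency of the AA haplotype at generation F_s (s >= 2), by iteration."""
    if s < 2:
        raise ValueError("s must be at least 2")
    p = (1 - r) / 2  # F2: an AA haplotype transmitted intact by an F1 parent
    for _ in range(s - 2):
        # intact AA haplotype, or a recombinant joining two independent A alleles
        p = (1 - r) * p + r / 4
    return p


def ail_autosome_aa_closed(r, s):
    """Closed-form solution of the same recurrence (assumed form, s >= 2)."""
    return (1 + (1 - 2 * r) * (1 - r) ** (s - 2)) / 4


if __name__ == "__main__":
    r, s = 0.01, 10
    p = ail_autosome_aa(r, s)
    print("AA frequency:", p)
    print("closed form: ", ail_autosome_aa_closed(r, s))
    print("recombinant frequency:", 1 - 2 * p)
```

The iterated and closed-form values should agree to within floating-point error, and the recombinant frequency $1 - 2p_s$ approaches 1/2 as $s$ grows.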

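For the X-chromosome in AIL, I will first consider a balanced case, begun with equal proportions of F1 individuals from reciprocal crosses, A × B and B × A, so that the F1 males are equally likely to be hemizygous A or B. Let $m_s$ and $f_s$ denote the frequency of the AA haplotype in males and females, respectively, at generation $F_s$. Then $m_2 = (1 - r)/2$ and $f_2 = (2 - r)/4$, and for $s \ge 2$ we have
$$m_{s+1} = (1 - r)\,f_s + \frac{r}{4}, \qquad f_{s+1} = \frac{1}{2}\,m_s + \frac{1}{2}\left[(1 - r)\,f_s + \frac{r}{4}\right] \qquad (3)$$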

This recurrence relation is derived in a similar way to that for the autosome, noting that the male haplotype was drawn from his mother, with a chance for recombination, and a random female haplotype is equally likely to have been drawn from her father, without recombination, or from her mother, with the potential for recombination. I again make use of the fact that the frequency of the A allele is 1/2 in both males and females at every generation. The solution to this relation is, for $s \ge 2$,
$$m_s = \frac{1}{4} + \frac{1}{8z}\left\{\left[(1-2r)z + (1-r)(3-2r)\right] w^{s-2} + \left[(1-2r)z - (1-r)(3-2r)\right] y^{s-2}\right\}$$
$$f_s = \frac{1}{4} + \frac{1}{8z}\left\{\left[(1-r)z + (3-6r+r^2)\right] w^{s-2} + \left[(1-r)z - (3-6r+r^2)\right] y^{s-2}\right\} \qquad (4)$$
where $z = \sqrt{(1-r)(9-r)}$, $w = (1 - r + z)/4$, and $y = (1 - r - z)/4$. Note that the frequencies of recombinant haplotypes in males and females are $1 - 2m_s$ and $1 - 2f_s$, respectively, and that the overall frequency is $1 - (2m_s + 4f_s)/3$.
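The balanced X-chromosome case can be sketched in the same way. The code below assumes the recurrence $m_{s+1} = (1-r)f_s + r/4$, $f_{s+1} = m_s/2 + [(1-r)f_s + r/4]/2$ with starting values $m_2 = (1-r)/2$ and $f_2 = (2-r)/4$, and compares it with a closed form written in terms of $w$, $y$, and $z = \sqrt{(1-r)(9-r)}$. The closed form here is my own algebraic solution of that linear recurrence, and the function names are illustrative.

```python
import math


def ail_x_balanced(r, s):
    """(m_s, f_s): AA haplotype frequency in males and females at F_s, s >= 2."""
    if s < 2:
        raise ValueError("s must be at least 2")
    m, f = (1 - r) / 2, (2 - r) / 4  # assumed F2 starting values
    for _ in range(s - 2):
        from_mother = (1 - r) * f + r / 4  # haplotype transmitted by a female
        # males receive their X from the mother; females receive one X from
        # each parent, the paternal one without recombination
        m, f = from_mother, (m + from_mother) / 2
    return m, f


def ail_x_balanced_closed(r, s):
    """Closed form of the same recurrence (valid for 0 <= r < 1, s >= 2)."""
    z = math.sqrt((1 - r) * (9 - r))
    w, y = (1 - r + z) / 4, (1 - r - z) / 4
    k = s - 2
    a, b = (1 - 2 * r) * z, (1 - r) * (3 - 2 * r)
    c, d = (1 - r) * z, 3 - 6 * r + r * r
    m = 0.25 + ((a + b) * w**k + (a - b) * y**k) / (8 * z)
    f = 0.25 + ((c + d) * w**k + (c - d) * y**k) / (8 * z)
    return m, f


if __name__ == "__main__":
    r, s = 0.05, 12
    m, f = ail_x_balanced(r, s)
    print("iterated:   ", m, f)
    print("closed form:", *ail_x_balanced_closed(r, s))
    print("overall recombinant frequency:", 1 - (2 * m + 4 * f) / 3)
```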

Now I turn to the unbalanced case for the X-chromosome, in which all F1 individuals are derived from the cross female A × male B, so that all F1 males are hemizygous A. This appears to be widely used in practice (e.g., Norgard et al. 2008; Kelly et al. 2010). The calculations are more difficult, because the allele frequencies are different in males and females and across generations.

I first calculate the single-locus allele frequencies. Let $q_s$ be the frequency of the A allele in females at generation $F_s$. Note that the frequency in males at $F_s$ is $q_{s-1}$. The initial values are $q_0 = 1$ and $q_1 = 1/2$, and we have the recurrence relation $q_{s+1} = (q_s + q_{s-1})/2$, which comes from the fact that a random allele drawn from the female at generation $F_{s+1}$ is equally likely to be an allele from the female or male at generation $F_s$, and the allele in the male at $F_s$ is a random allele from the female at $F_{s-1}$. The solution of the recurrence relation is $q_s = \frac{2}{3} + \frac{1}{3}\left(-\frac{1}{2}\right)^s$, for $s \ge 0$.
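As a quick check of the allele-frequency calculation, the sketch below iterates $q_{s+1} = (q_s + q_{s-1})/2$ from $q_0 = 1$ and $q_1 = 1/2$ and compares the result with $q_s = 2/3 + (1/3)(-1/2)^s$. The function names are illustrative.

```python
def female_a_freq_iter(s):
    """Frequency of the A allele in females at F_s, by iteration (s >= 0)."""
    q_prev, q = 1.0, 0.5  # q_0 (all-A females) and q_1 (heterozygous females)
    if s == 0:
        return q_prev
    for _ in range(s - 1):
        # a female's allele comes from a female or a male of the previous
        # generation with equal probability
        q_prev, q = q, (q + q_prev) / 2
    return q


def female_a_freq(s):
    """Closed-form solution of the same recurrence."""
    return 2 / 3 + (-0.5) ** s / 3


if __name__ == "__main__":
    for s in range(8):
        print(s, female_a_freq_iter(s), female_a_freq(s))
```

The frequency oscillates around, and converges to, 2/3, which is the share of X chromosomes contributed by strain A in the initial cross.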

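I now turn to the two-locus haplotype probabilities. Let $\tilde{m}_s$ and $\tilde{f}_s$ denote the frequencies of the AA haplotype on the X chromosome in males and females at generation $F_s$ in an unbalanced AIL, and note that $\tilde{m}_2 = (1 - r)/2$ and $\tilde{f}_2 = (3 - r)/4$. The haplotype probabilities satisfy a recurrence relation similar to that in equation (3): for $s \ge 2$,
$$\tilde{m}_{s+1} = (1 - r)\,\tilde{f}_s + r\,q_{s-1}q_{s-2}, \qquad \tilde{f}_{s+1} = \frac{1}{2}\,\tilde{m}_s + \frac{1}{2}\left[(1 - r)\,\tilde{f}_s + r\,q_{s-1}q_{s-2}\right] \qquad (5)$$

Note the distinction between equations (3) and (5): if a recombinant haplotype is transmitted from the $F_s$ female, the chance that it brings two A alleles together depends on the frequency of the A allele in males and females in the $F_{s-1}$ generation. In the balanced case, these are each 1/2; in the unbalanced case, they are different from each other and vary across generations.

I have been unable to obtain closed-form solutions for $\tilde{m}_s$ and $\tilde{f}_s$. However, the values can be quickly calculated numerically, using equation (5). Note that the frequencies of recombinant haplotypes in males and females are $2(q_{s-1} - \tilde{m}_s)$ and $2(q_s - \tilde{f}_s)$, respectively.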
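The numerical calculation for the unbalanced case might look like the following sketch. It assumes that a recombinant haplotype transmitted by an $F_t$ female brings two A alleles together with probability $q_{t-1}q_{t-2}$ (the A allele frequencies in females and males at $F_{t-1}$), and that the F2 starting values are $(1-r)/2$ in males and $(3-r)/4$ in females. The function names are illustrative.

```python
def female_a_freq(t):
    """Frequency of the A allele in females at F_t (closed form, t >= 0)."""
    return 2 / 3 + (-0.5) ** t / 3


def ail_x_unbalanced(r, s):
    """AA haplotype frequency in (males, females) at F_s, s >= 2,
    for an AIL begun with the single cross female A x male B."""
    if s < 2:
        raise ValueError("s must be at least 2")
    m, f = (1 - r) / 2, (3 - r) / 4  # assumed F2 starting values
    for t in range(2, s):
        # haplotype transmitted by an F_t female: intact AA, or a recombinant
        # joining an A from her mother's side and an A from her father's side
        from_mother = (1 - r) * f + r * female_a_freq(t - 1) * female_a_freq(t - 2)
        m, f = from_mother, (m + from_mother) / 2
    return m, f


if __name__ == "__main__":
    r = 0.05
    for s in (2, 4, 8, 12):
        m, f = ail_x_unbalanced(r, s)
        # recombinant frequency = 2 * (A allele frequency - AA frequency)
        rec_m = 2 * (female_a_freq(s - 1) - m)
        rec_f = 2 * (female_a_freq(s) - f)
        print(s, m, f, rec_m, rec_f)
```

Two sanity checks are available: with $r = 0$ the AA haplotype frequency must equal the A allele frequency, and with $r = 1/2$ the male value at F3 must equal $q_2^2$.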

Haplotype probabilities in the DO are calculated similarly. The progenitors for the DO were pre-CC mice. I assume a large number of progenitors, that they were drawn from independent lines, and that the order of the crosses that generated the different lines was random, giving complete balance across the eight alleles.

In a potential abuse of notation, I will redefine the q, p, m, and f variables used previously. Let qk denote the frequency of the AA haplotype at generation G2:Fk in the pre-CC; this is times the haplotype probability in Table 4 of Broman (2012). Let ps be the probability of the AA haplotype at generation s of the diversity outcross.

The pre-CC progenitors of the DO were drawn from independent lines at a variety of different generations along the course to inbreeding. Let $\alpha_k$ denote the proportion of the pre-CC progenitors that were at generation G2:F$_k$, and note that a pre-CC progenitor at generation G2:F$_k$ will transmit the AA haplotype with frequency $q_{k+1}$ (that is, the frequency of the AA haplotype at generation G2:F$_{k+1}$). Thus, the frequency of the AA haplotype at the first generation of the DO is $p_1 = \sum_k \alpha_k\, q_{k+1}$.

The recurrence relation for the $p_s$ is like that in equation (1): $p_{s+1} = (1 - r)\,p_s + r/64$. The solution is
$$p_s = \frac{1}{64} + \left(p_1 - \frac{1}{64}\right)(1 - r)^{s-1} \qquad (6)$$

Note that the recombinant haplotypes are all equally likely, due to the random order of the initial crosses, and so each has probability (1 − 8ps)/56.
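The autosomal DO calculation can be sketched as follows. The progenitor proportions and pre-CC haplotype frequencies used in the example are hypothetical placeholders, not values from Svenson et al. (2012) or Broman (2012); the function names are likewise illustrative.

```python
def do_first_generation(alpha, q_next):
    """p_1 = sum_k alpha_k * q_{k+1}: AA frequency at DO generation 1.

    alpha[i] is the proportion of progenitors at some pre-CC generation k,
    and q_next[i] is the matching AA haplotype frequency q_{k+1}.
    """
    if abs(sum(alpha) - 1) > 1e-9:
        raise ValueError("progenitor proportions must sum to 1")
    return sum(a * q for a, q in zip(alpha, q_next))


def do_autosome_aa(r, s, p1):
    """AA frequency at DO generation s >= 1, iterating p' = (1-r)p + r/64."""
    p = p1
    for _ in range(s - 1):
        p = (1 - r) * p + r / 64
    return p


def do_autosome_aa_closed(r, s, p1):
    """Closed-form solution of the same recurrence."""
    return 1 / 64 + (p1 - 1 / 64) * (1 - r) ** (s - 1)


if __name__ == "__main__":
    r = 0.01
    alpha = [0.25, 0.50, 0.25]     # hypothetical progenitor proportions
    q_next = [0.118, 0.116, 0.115]  # hypothetical q_{k+1} values
    p1 = do_first_generation(alpha, q_next)
    p5 = do_autosome_aa(r, 5, p1)
    print("p_1 =", p1, " p_5 =", p5)
    print("each recombinant haplotype:", (1 - 8 * p5) / 56)
```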

I now turn to the X-chromosome. Let $m_s$ and $f_s$ denote the frequency of the AA haplotype on the X chromosome in males and females in the DO at generation $s$. Assuming random orders of crosses to generate the pre-CC progenitors, the initial value $f_1$ (equation 7) is obtained from the frequencies of the AA and CC haplotypes on the X-chromosome in females at generation G1:F$_{k+1}$ in the construction of four-way RIL by sibling mating (see Broman 2012, Table 4); $m_1$ is calculated in the same way. The recurrence relations are much like equation (3):
$$m_{s+1} = (1 - r)\,f_s + \frac{r}{64}, \qquad f_{s+1} = \frac{1}{2}\,m_s + \frac{1}{2}\left[(1 - r)\,f_s + \frac{r}{64}\right] \qquad (8)$$

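The solutions are the following:
$$m_s = \frac{1}{64} + \frac{2}{z}\left\{\left[(1-r)\left(f_1 - \tfrac{1}{64}\right) - y\left(m_1 - \tfrac{1}{64}\right)\right] w^{s-1} + \left[w\left(m_1 - \tfrac{1}{64}\right) - (1-r)\left(f_1 - \tfrac{1}{64}\right)\right] y^{s-1}\right\}$$
$$f_s = \frac{1}{64} + \frac{1}{z}\left\{\left[\left(m_1 - \tfrac{1}{64}\right) + 2w\left(f_1 - \tfrac{1}{64}\right)\right] w^{s-1} - \left[\left(m_1 - \tfrac{1}{64}\right) + 2y\left(f_1 - \tfrac{1}{64}\right)\right] y^{s-1}\right\} \qquad (9)$$
where $w$, $y$, and $z$ are as in equation (4).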
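The X-chromosome calculation for the DO can be sketched in the same manner. The code assumes the recurrence $m_{s+1} = (1-r)f_s + r/64$, $f_{s+1} = m_s/2 + [(1-r)f_s + r/64]/2$, takes the starting values $m_1$ and $f_1$ as inputs (the values in the example are hypothetical placeholders), and compares the iteration with my own algebraic solution of that linear recurrence. The function names are illustrative.

```python
import math


def do_x(r, s, m1, f1):
    """(m_s, f_s) at DO generation s >= 1, by iterating the recurrence."""
    m, f = m1, f1
    for _ in range(s - 1):
        from_mother = (1 - r) * f + r / 64  # haplotype transmitted by a female
        m, f = from_mother, (m + from_mother) / 2
    return m, f


def do_x_closed(r, s, m1, f1):
    """Closed form of the same recurrence (valid for 0 <= r < 1)."""
    z = math.sqrt((1 - r) * (9 - r))
    w, y = (1 - r + z) / 4, (1 - r - z) / 4
    u, v = m1 - 1 / 64, f1 - 1 / 64  # deviations from the equilibrium 1/64
    k = s - 1
    m = 1 / 64 + 2 * (((1 - r) * v - y * u) * w**k
                      + (w * u - (1 - r) * v) * y**k) / z
    f = 1 / 64 + ((u + 2 * w * v) * w**k - (u + 2 * y * v) * y**k) / z
    return m, f


if __name__ == "__main__":
    r, m1, f1 = 0.02, 0.11, 0.12  # m1, f1 are hypothetical starting values
    for s in (1, 2, 5, 10):
        print(s, do_x(r, s, m1, f1), do_x_closed(r, s, m1, f1))
```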

In Figure 1, the probabilities of recombinant two-locus haplotypes are displayed for the different populations. For the DO, I used the distribution of k as in Figure 1 of Svenson et al. (2012) and s = 5. For HS and AIL, I used s = 10 and 12, respectively, to match the total number of generations with recombination—the average k in Svenson et al. (2012) was six. Recombinant haplotypes are more frequent on the autosome, and are more frequent in HS than in the DO; inbreeding in the pre-CC progenitors of the DO is accompanied by a loss of recombinants.

Figure 1. Frequency of a two-locus haplotype being recombinant, as a function of the recombination fraction at meiosis, for the diversity outcross population at s = 5 (solid curves), heterogeneous stock at s = 10 (dashed curves), and balanced AIL at s = 12 (dotted curves), for the autosome (black), male X (blue), and female X (red). The green dashed curve is the recombinant frequency for HS at s = 10 assumed in Mott et al. (2000).

It is particularly interesting to consider the map expansion in these populations, which is the frequency of recombination breakpoints on a random chromosome. Let $R$ denote the probability of a recombinant haplotype; then the map expansion is $dR/dr\,|_{r=0}$ (see Teuscher and Broman 2007). The map expansion on an autosome in AIL is $s/2$. For the DO, on an autosome, the map expansion satisfies $M_{s+1} = M_s + 7/8$, where $M_1$ is the weighted average (with weights $\alpha_k$) of the map expansion in the pre-CC at generation G2:F$_{k+1}$ (see Broman 2012, Table 4). For the particular progenitors detailed in Svenson et al. (2012, Figure 1), this is approximately $(7s + 37)/8$. For HS, we have $M_1 = 3$ and $M_s = (7s + 17)/8$.
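The map expansion can be checked by differentiating the recombinant frequency numerically at $r = 0$. The sketch below does this for the AIL autosome and for HS. For HS it assumes, consistent with $M_1 = 3$, that the AA haplotype frequency at the first generation of random mating is $(1-r)^3/8$, that is, three meioses in fully heterozygous individuals; that starting value and the function names are my assumptions.

```python
def recomb_ail(r, s):
    """Recombinant haplotype frequency on an AIL autosome at F_s (s >= 2)."""
    return (1 - (1 - 2 * r) * (1 - r) ** (s - 2)) / 2


def recomb_hs(r, s):
    """Recombinant haplotype frequency in HS at generation s (s >= 1)."""
    p1 = (1 - r) ** 3 / 8  # assumed starting AA haplotype frequency
    ps = 1 / 64 + (p1 - 1 / 64) * (1 - r) ** (s - 1)
    return 1 - 8 * ps


def map_expansion(recomb, s, h=1e-6):
    """Forward-difference estimate of dR/dr at r = 0."""
    return (recomb(h, s) - recomb(0.0, s)) / h


if __name__ == "__main__":
    for s in (2, 5, 12):
        print("AIL", s, map_expansion(recomb_ail, s), "expected", s / 2)
    for s in (1, 5, 10):
        print("HS ", s, map_expansion(recomb_hs, s), "expected", (7 * s + 17) / 8)
```

A forward difference is used because the recombinant frequency is only meaningful for $r \ge 0$; the estimate is accurate to roughly the step size $h$.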

For the X-chromosome in balanced AIL, HS and DO, the map expansion is 2/3 that of the autosome. For the case of the X-chromosome in unbalanced AIL, in which all F1 males are hemizygous A, I cannot derive a closed-form solution, but taking the derivatives of the recurrence relations in equation (5), I can derive a simple recurrence relation for the map expansion. (Note that the overall map expansion on the X-chromosome can be obtained as the average of the sex-specific map expansions, with weight 2/3 given to the female, since two-thirds of the X-chromosomes are in females.) Let $M^m_s$ and $M^f_s$ denote the map expansion in males and females at $F_s$, and again let $q_s$ be the frequency of the A allele in females at $F_s$. Then we have, for $s \ge 2$,
$$M^m_{s+1} = M^f_s + 2\left(q_s - q_{s-1}q_{s-2}\right), \qquad M^f_{s+1} = \frac{1}{2}\left(M^m_s + M^f_s\right) + q_s - q_{s-1}q_{s-2} \qquad (10)$$
with the initial conditions $M^m_2 = 1$ and $M^f_2 = 1/2$. Although I have not been able to derive a closed-form solution for $M^m_s$ and $M^f_s$, they are easily calculated numerically.
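The numerical calculation of the map expansion on the X-chromosome in unbalanced AIL can be sketched as below. The recurrence used is my own differentiation of the unbalanced haplotype recurrence at $r = 0$, taking the recombinant frequencies to be twice the difference between the A allele frequency and the AA haplotype frequency; it assumes starting values of 1 (males) and 1/2 (females) at F2. The function names are illustrative.

```python
def female_a_freq(t):
    """Frequency of the A allele in females at F_t (closed form, t >= 0)."""
    return 2 / 3 + (-0.5) ** t / 3


def unbalanced_x_map_expansion(s):
    """(male, female) map expansion at F_s, s >= 2, in unbalanced AIL."""
    if s < 2:
        raise ValueError("s must be at least 2")
    mm, mf = 1.0, 0.5  # assumed F2 starting values
    for t in range(2, s):
        # chance that a crossover in an F_t female joins unlike alleles is
        # 2 * d, with d = q_t - q_{t-1} * q_{t-2}
        d = female_a_freq(t) - female_a_freq(t - 1) * female_a_freq(t - 2)
        mm, mf = mf + 2 * d, (mm + mf) / 2 + d
    return mm, mf


def overall_x_map_expansion(s):
    """Average over X chromosomes: weight 1/3 to males, 2/3 to females."""
    mm, mf = unbalanced_x_map_expansion(s)
    return (mm + 2 * mf) / 3


if __name__ == "__main__":
    for s in (2, 4, 8, 12):
        print(s, unbalanced_x_map_expansion(s), overall_x_map_expansion(s))
```

Under these assumptions the overall map expansion grows by about 8/27 per generation in later generations, slightly less than the 1/3 per generation of the balanced case.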

The aforementioned haplotype probabilities provide the key quantities for developing HMMs for advanced intercross populations. However, it should be noted that there are other approaches to handling such data. For example, Besnier et al. (2011) used a variance components model to analyze outbred chicken AIL data, with identity-by-descent probabilities calculated using a modified version of the method of Pong-Wong et al. (2001), for general pedigree data.

The aforementioned result for HS differs from that in Mott et al. (2000) and incorporated into the HAPPY software. They had assumed that the map expansion in HS was $7(s + 2)/8$, whereas I show it to be $(7s + 17)/8$. In the first three generations with recombination, individuals are fully heterozygous, and so all recombination events can be seen; in the subsequent $s - 1$ generations, there is a 1/8 chance of homozygosity and so only 7/8 of recombination events can be seen.

Mott et al. (2000) further assumed that the transition probabilities along an HS chromosome are a function of genetic distance, but that requires knowledge of the map function. It is more direct to express the transition probabilities in terms of the recombination fraction at meiosis.

The green curve in Figure 1 displays the probability of a recombinant haplotype assumed in Mott et al. (2000) for HS with s = 10 when the map function corresponding to the gamma model with the level of crossover interference estimated for the mouse in Broman et al. (2002) is used. The probability is slightly smaller than that from my calculations; at r = 0.01, the equation in Mott et al. (2000) gives 0.099, whereas I obtain 0.103.
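The value 0.103 quoted above can be reproduced with a short calculation. This sketch assumes that the HS haplotype frequencies follow the same recurrence as the DO autosome, $p_{s+1} = (1-r)p_s + r/64$, started from $p_1 = (1-r)^3/8$ (three meioses in fully heterozygous individuals); that starting value is my assumption, chosen to match $M_1 = 3$, and the function name is illustrative.

```python
def hs_recombinant_freq(r, s):
    """Probability that a two-locus haplotype is recombinant in HS at
    generation s >= 1, under the assumptions stated above."""
    p = (1 - r) ** 3 / 8  # assumed AA haplotype frequency at generation 1
    for _ in range(s - 1):
        p = (1 - r) * p + r / 64
    return 1 - 8 * p  # eight non-recombinant haplotypes, equally frequent


if __name__ == "__main__":
    r, s = 0.01, 10
    exact = hs_recombinant_freq(r, s)
    linear = (7 * s + 17) / 8 * r  # first-order (map expansion) approximation
    print("recombinant frequency:", round(exact, 3))
    print("linear approximation: ", round(linear, 3))
```

The first-order approximation slightly overstates the recombinant frequency, because some haplotypes experience more than one exchange between the two loci over the generations.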

I have assumed an effectively infinite number of mating pairs at each generation. In practice, with a finite number of mating pairs, there will be some inbreeding and so an increased frequency of homozygosity and a decreased frequency of recombination. In addition, the individuals at the final generation will include siblings, and the relationships among individuals might be used to improve the genotype reconstruction. In practice, for computational efficiency, both the inbreeding and the relationships among individuals would probably be ignored in the genotype reconstruction, and with dense genotype data, there will be little loss of information.

Acknowledgments

James Crow generously provided comments for improvement of the manuscript. This work was supported in part by National Institutes of Health grant GM074244.

This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
