Positive directional selection, in which a new mutant rises in frequency and quickly fixes in a population (i.e., a selective sweep), can be rapid on an evolutionary timescale and/or population specific (for reviews see Andolfatto 2001; Aquadro et al. 2001; Schlötterer 2002a; Bamshad and Wooding 2003). One predicted effect of this type of selection on a sample of DNA sequences is an increase in LD in regions flanking the site undergoing selection (Kim and Stephan 2002; Przeworski 2002), but a reduction of LD across the site of selection (Kim and Nielsen 2004). This dual pattern may not be intuitive at first. Consider that shortening the genealogical history of a sequence reduces the number of recombination events in that history, which generally increases LD in the region. However, for linked genetic variation to be present immediately after a selective sweep, it must either be a new mutation, and therefore be rare and contribute little to overall LD, or have recombined with the target of selection during the sweep. This suggests that haplotypes with LD between polymorphic alleles that span the target of selection will not persist beyond the fixation of the selected allele. Here we do not address gene conversion, which could preserve LD over the site of selection. This pattern of two regions of high LD, separated by a region of low LD, is similar to the pattern of LD expected with a recombinant hotspot. Furthermore, the speed and species specificity of selective sweeps may also mimic the species-specific distribution of recombinant hotspots.

To explore the possible effect of selective sweeps on the inference of recombinant hotspots, we simulated positive directional selection of varying intensities (using SelSim 2.1, Spencer and Coop 2004) and applied hotspot detection software (Hotspotter 1.0, Li and Stephens 2003) to the resulting simulated sequences. The parameter values for the simulations were picked to approximate a 10-kbp sequence of DNA from a human population sample. We found that for strong positive selection (σ ≈ 100, where σ ≡ 2Ns, N is the diploid population size, and s is the relative strength of selection), a locally elevated recombination rate can falsely be inferred in the region of selection with statistical significance 22% of the time, corresponding to an excess of 16 percentage points over the false positive rate (FPR) from neutral simulations (Figure 1). This elevation over the neutral FPR under selection is highly significant (P = 1 × 10⁻⁷; see Figure 1 legend). However, as selection becomes even stronger (σ ≥ 300), there is a rapid drop in the FPR, probably due to a loss in power associated with the paucity of genetic variation remaining immediately after a strong selective sweep. The patterns of LD resulting from positive selection can produce locally elevated estimated rates of recombination that are similar to the relative rates reported in the literature (e.g., Figure 2; cf. Jeffreys and Neumann 2002; Wall et al. 2003; McVean et al. 2004; Ptak et al. 2004; Verrelli and Tishkoff 2004; Winckler et al. 2005). The geometric mean of estimated recombination rates at the site of selection, from 100 replicates at σ = 100, is 13.25 times higher than the background rate. Furthermore, this elevated FPR can persist for up to N generations (0.5 × 2N generations) after the selective sweep has ended (Figure 3). Assuming an effective human population size of 10,000 and an average generation time of 25 years, this corresponds to a maximum persistence time of ∼250,000 years.
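The scaling arithmetic in the preceding paragraph can be checked with a short Python sketch. It converts the scaled selection parameter σ ≡ 2Ns into a per-generation selection coefficient s, and converts a time measured in units of 2N generations into years. The function names are illustrative only and are not part of SelSim or Hotspotter. Using N = 10,000 when converting σ to s is an assumption made here for illustration; the text states that value only for the persistence-time calculation.

```python
def selection_coefficient(sigma, n_diploid):
    """Per-generation selection coefficient s, from sigma = 2 * N * s."""
    return sigma / (2 * n_diploid)


def persistence_years(time_in_2n_units, n_diploid, generation_years):
    """Convert a time in units of 2N generations into years."""
    generations = time_in_2n_units * 2 * n_diploid
    return generations * generation_years


# Values taken from the text: effective size 10,000 and 25-year generations.
N = 10_000
GENERATION_YEARS = 25

# Assumption for illustration: the same N is used to unscale sigma = 100.
s = selection_coefficient(100, N)

# The elevated FPR persists for up to 0.5 x 2N = N generations.
years = persistence_years(0.5, N, GENERATION_YEARS)

print(f"s = {s}")
print(f"maximum persistence = {years:,.0f} years")
```

Under these assumptions, σ = 100 corresponds to a selection coefficient of 0.5%, and N generations corresponds to the ∼250,000 years quoted above.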

Figure 1. A plot of the relative FPR of inferring significantly elevated local recombination rates along a simulated sequence of recombining DNA. The increase (in percentage) under a selective sweep scenario is plotted relative to the FPR from neutral simulations. The position of the site under positive selection was fixed at 0.45. Each population sample consists of 100 sequences. The recombination parameter ρ ≡ 4Nr (for a diploid, where r is the per-generation recombination rate) was uniform and set to a value of 10 over the total region. The mutation parameter θ ≡ 4Nμ (where μ is the per-generation mutation rate) was also set to 10 for the region. A stochastic model of positive selection was used and the fixation of the selected allele was modeled as just completing in the sampled generation. One hundred replicates at each value of selection, σ, were generated and analyzed assuming a single hotspot 0.1 units wide of fixed location (0–0.1, 0.1–0.2,…, 0.9–1). A significantly elevated recombination rate was called when the lower bound of the 95% confidence interval of the local estimated recombination rate was higher than the background recombination rate estimate (this includes both hotspots and “warm” spots). In general, FPR excesses higher than 4 (counts out of 100) are significantly elevated (P < 0.05), assuming a Poisson distribution of false positives with a mean equal to the neutral rate, which is an average of six false positives.
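The Poisson significance rule described in the Figure 1 legend can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes an inclusive upper tail, P(X ≥ k), for a Poisson variable whose mean is the neutral average of six false positives per 100 replicates, and the function names are hypothetical. The legend does not state whether the tail was inclusive or strict, so exact P-values from this sketch may differ from those reported.

```python
import math


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean), by summing the lower tail."""
    lower = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - lower


def smallest_significant_count(mean, alpha=0.05):
    """Smallest count k whose upper-tail probability falls below alpha."""
    k = 0
    while poisson_upper_tail(k, mean) >= alpha:
        k += 1
    return k


# Neutral simulations: an average of six false positives out of 100 replicates.
NEUTRAL_MEAN = 6

threshold = smallest_significant_count(NEUTRAL_MEAN)
excess_needed = threshold - NEUTRAL_MEAN

print(f"smallest significant count: {threshold}")
print(f"excess over the neutral mean: {excess_needed}")

# Tail probability for the 22 false positives observed at sigma = 100.
print(f"P(X >= 22) = {poisson_upper_tail(22, NEUTRAL_MEAN):.2e}")
```

With a mean of six, the first count to reach P < 0.05 is 11, an excess of 5, which agrees with the legend's statement that excesses higher than 4 are significant.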

Figure 2. An example of the inferred relative recombination rate from a sample simulated under the conditions described in Figure 1 with σ = 100. The relative recombination rate between each window, with a width 1/10th of the total sequence, and the remaining region was estimated. This example illustrates how a false hotspot of recombination may be inferred. In this case, the “hotspot” is at position 0.45 and has a local recombination rate estimate 49 times higher than that of the surrounding sequence. The upper and lower bounds of the 95% confidence interval are also plotted, and a significantly elevated point estimate is represented by a solid circle.

Figure 3. A plot of the excess, over neutrality, of the FPR of inferring a significantly elevated recombination rate at the position of selection vs. time in units of 2N generations. Solid circles denote a significantly elevated FPR relative to neutral simulations (see Figure 1 legend).

We do not mean to imply that true recombinant hotspots do not exist in humans; they have certainly been verified by experimental means (e.g., Hubert et al. 1994; Cullen et al. 1995; Smith et al. 1998; Yip et al. 1999; Jeffreys et al. 2001). But we do suggest caution when inferring the existence of hotspots solely on the basis of patterns of LD. The transient nature of positive selection, both over time and between populations, may easily mimic the rapidly evolving nature of recombination in primates. When a hotspot is inferred, it may be useful to also address the relative levels of genetic variation compared to levels of divergence (Ptak et al. 2004) to help rule out past positive selection, particularly since recombination may be associated with a mutagenic process (Rattray et al. 2002; Hellmann et al. 2003) and selective sweeps can quickly remove genetic variation. However, recombinant hotspots and selective sweeps may be linked at a more basic level. There is evidence that hotspot crossover asymmetry can result in a form of meiotic drive (Jeffreys and Neumann 2002), which itself is a “selfish” form of positive selection (for review see Reed et al. 2005 and references therein). This crossover asymmetry predicts that a derived recombination-suppressing allele will eventually fix in the population (Jeffreys and Neumann 2002), resulting in the co-occurrence of both a recombinant hotspot and a progressing selective sweep.

The possibility exists that inferred recombinant hotspots in gene regions that also appear to have undergone positive selection (e.g., Verrelli and Tishkoff 2004) are not due to nonuniform densities of meiotic recombination, but may simply be a by-product of positive selection. In the same vein, estimates of the rate of hotspot sharing between species, based on LD analysis (Ptak et al. 2005), may be underestimated, and short-scale LD may be lower than expected (e.g., Pritchard and Przeworski 2001) if recent positive selection plays a significant role. If selective sweeps do make a significant contribution to the patterns of LD in the human genome, then a better understanding of the effects of positive selection may have important implications for projects that characterize LD for association studies, particularly to the extent that selective pressures may have varied among human populations (e.g., Hamblin and Di Rienzo 2000; Schlötterer 2002b; Akey et al. 2004; Storz et al. 2004).

Acknowledgments

We thank Yuseob Kim and Michael Li for helpful suggestions. We also thank two anonymous reviewers for their feedback on a previous version of this manuscript. This work was supported by a Burroughs Wellcome Fund and David and Lucile Packard Career Awards to S.A.T. F.A.R. was partially supported by the Center for Bioinformatics and Computational Biology, University of Maryland.
