Abstract

Speciation is a fundamental process responsible for the diversity of life. Progress has been made in detecting individual ‘speciation genes’ that cause reproductive isolation. In contrast, until recently, less attention has been given to genome-wide patterns of divergence during speciation. Thus, major questions remain concerning how individual speciation genes are arrayed within the genome, and how this affects speciation. This theme issue is dedicated to exploring this genomic perspective of speciation. Given recent sequencing and computational advances that now allow genomic analyses in most organisms, the goal is to help move the field towards a more integrative approach. This issue draws upon empirical studies in plants and animals, and theoretical work, to review and further document patterns of genomic divergence. In turn, these studies begin to disentangle the role that different processes, such as natural selection, gene flow and recombination rate, play in generating observed patterns. These factors are considered in the context of how genomes diverge as speciation unfolds, from beginning to end. The collective results indicate that experimental work is now required, in conjunction with theory and sequencing studies, to move the field from descriptive studies of patterns of divergence towards a predictive framework that tackles the causes and consequences of genome-wide patterns.

1. Introduction

Speciation is a fundamental process responsible for creating the diversity of life on the Earth. In general, speciation involves the splitting of one reproductive community (or genotypic cluster) of organisms into two [1–3]. Conceptualizing speciation in this manner leads to a clear research programme: to understand speciation, one must understand how genetically based barriers to gene flow (i.e. reproductive isolation) evolve between populations. Much progress has been made in discerning the importance of different factors and traits (including ecological adaptation) in generating reproductive isolation [4–8]. In addition, individual ‘speciation genes’ contributing to reproductive isolation, and, in particular, genes causing intrinsic post-zygotic inviability and sterility in hybrids, have been identified [1,9–13].

In contrast, we lack a good understanding of how these speciation genes are embedded and arrayed within the genome, and thus of how genomes evolve during population divergence. Major questions therefore remain about the genomic architecture of speciation and about how this architecture facilitates or impedes further divergence. Generally speaking, our empirical and theoretical understanding of speciation is still largely dominated by what Ernst Mayr described as ‘beanbag thinking’ focused on one or a few individual genes [3,14]. There are exceptions where interactions among multiple loci have been considered, e.g. the ‘snow-ball’ effect for the accumulation of post-zygotic incompatibilities through time [15–17]. However, in general, detailed studies of the genetics of speciation have been restricted to examining one or a few loci [13]. Such gene-centred approaches were adequate so long as empirical studies could examine only a handful of loci, as was the case until recently. But as represented by the articles in this issue, we are now capable of rapidly scanning large portions of the genomes of both model and non-model organisms for differentiation [18–22]. Consequently, the field of evolutionary genetics is beginning to move away from studies focused on a few genes to those that tackle genome-wide patterns. This theme issue is dedicated to exploring this genomic perspective of speciation. Given recent high-throughput sequencing and computational advances that allow genome-wide analyses, our goal is for this theme issue to be timely in helping move the field towards a more integrative approach for understanding speciation.

Here, in this introductory article, we discuss the major elements, questions and challenges leading us to a more integrative understanding of the genomics of speciation. Our narrative highlights how the 12 papers in this theme issue contribute to building and achieving this genomic synthesis. We begin by laying out the metaphoric foundation on which the study of genomic architecture is currently based, introducing the concept of ‘genomic islands of divergence’ and the processes generating such divergence (see table 1 for definitions). In this section, we articulate the importance of gene flow and the theoretical principle of ‘selection/recombination antagonism’ that has greatly shaped considerations of which types of genetic architectures are needed for reproductive isolation to evolve in the face of gene flow [23]. These questions, their counterpoints and their resolution form the basis for the papers of Feder et al. [24] and Via [25] in this issue.

Table 1. Definitions of some newer terms used to describe patterns and processes of genomic divergence.

We next describe broad-scale patterns of genomic divergence observed in nature. These patterns, quantified by ‘genome scans’ of numerous loci, provide the empirical observations on which our current understanding of genomic divergence is based. Numerous studies in this issue present new empirical data or summarize the literature on how genetic differentiation is distributed throughout the genome [25–33]. The authors relate these patterns to the island metaphor to try to draw inferences concerning the significance of genome architecture and the processes facilitating speciation.

Although the description of patterns of genomic divergence seems straightforward, the papers in this issue highlight the difficulties we still face in accurately resolving genome-wide patterns of genetic differentiation and interpreting their meaning. For example, although highly divergent regions tend to be interpreted as being affected by divergent selection or harbouring genes causing reproductive isolation, other factors such as retention of ancestral polymorphism or recombination rate variation could contribute to generating similar patterns [34,35]. There has been little explicit study to date of the degree to which divergence and reproductive isolation are actually related. The study in this theme issue on genomic isolation in hybrids begins to tackle this issue [36].

Additionally, studies tend to be done on different scales and take different analytical approaches to identify exceptionally differentiated ‘outlier loci’. Thus, the statistical and methodological power to distinguish outlier loci from neutral background differentiation varies among studies, and without critical details concerning the natural history and biology of population divergence (e.g. geographical context, time of divergence, rate of migration, strength of divergent selection) it is difficult to place individual studies within a broader comparative framework. Tests for general relationships between genome structure and speciation will therefore require more accurate information on where particular systems reside along the continuum from freely interbreeding populations, to newly formed and partially isolated populations or ecotypes, to largely reproductively isolated species. Another consideration is whether speciation was initiated in the face of gene flow or whether there was an initial period of allopatry in which differences accumulated in the absence of gene flow [37–40]. In the latter case, genome architectures maintained following secondary contact might differ somewhat from those created de novo in the face of gene flow. Consequently, the processes of divergence hitchhiking and genome hitchhiking that we discuss below may differ in their relative importance between primary and secondary modes of speciation with gene flow.

After considering studies of empirical patterns, we then turn to theoretical expectations for how processes such as divergence and genome hitchhiking promote speciation. Much of our conceptual understanding of how genome structure should facilitate speciation has been dominated by verbal arguments and metaphors based on limited theory. Thus, while basic principles may be known, our understanding of the relative importance of different processes in generating genomic divergence, and of how this architecture may feed back to influence further divergence during the speciation process, is surprisingly unresolved. The theoretical papers by Feder et al. [24], Guerrero et al. [41] and Gompert et al. [38] work to fill this void, by examining the processes affecting the establishment of new mutations during genomic divergence, expected patterns of genetic divergence in chromosomal inversions, and the patterns of genomic isolation created by different genetic architectures, respectively. Development of such theoretical expectations allows clearer interpretations of specific empirical patterns.

Finally, we examine some of the new challenges associated with analysing and interpreting the great influx of raw genomic data we are now capable of generating. We conclude by discussing how all these aspects of genomic study might be tied together in future work, emphasizing the importance of now integrating manipulative experiments with observational studies to correctly interpret empirical patterns of genomic divergence.

2. The metaphor of genomic islands of divergence

To aid thinking about divergence in the genome, evolutionary biologists have developed the metaphor of ‘genomic islands of divergence’ [42], where a genomic island is any gene region, be it a single nucleotide or an entire chromosome, which exhibits significantly greater differentiation than expected under neutrality [18]. The metaphor thus draws parallels between genetic differentiation observed along a chromosome and the topography of oceanic islands and the contiguous sea floor to which they are connected. Following this metaphor, sea level represents the threshold above which observed differentiation is significantly greater than expected by neutral evolution alone. Thus, an island is composed of both directly selected and tightly linked (potentially neutral) loci. Factors such as physical proximity between selected and other loci, rates of recombination, and strength of selection then each affect the height and the size of genomic islands. Under this metaphor, the few genes under or physically linked to loci experiencing strong divergent selection can diverge, whereas gene flow will homogenize the remainder of the genome (or insufficient time for genetic drift will preclude divergence of regions that are not divergently selected), resulting in isolated genomic islands (but see [34,35]).
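
To make the ‘sea level’ idea concrete, the minimal Python sketch below computes a per-locus F_ST for biallelic loci from allele frequencies in two populations, as (H_T - H_S)/H_T, and flags loci that rise above a chosen threshold. The estimator (a simple G_ST-style ratio without sample-size correction), the function names and the threshold value are illustrative assumptions, not the method of any study cited here; in practice the threshold would come from a neutral null distribution, for example one simulated under the inferred demographic history.

```python
def fst_two_pops(p1, p2):
    """Per-locus F_ST = (H_T - H_S) / H_T for a biallelic locus, given the
    allele frequencies p1 and p2 in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population heterozygosity
    if h_t == 0.0:
        return 0.0  # locus fixed for the same allele in both populations
    return (h_t - h_s) / h_t


def above_sea_level(allele_freqs, sea_level):
    """Indices of loci whose F_ST exceeds the 'sea level' threshold.
    The threshold is an assumed input, e.g. an upper quantile of a
    neutral F_ST distribution obtained by simulation."""
    return [i for i, (p1, p2) in enumerate(allele_freqs)
            if fst_two_pops(p1, p2) > sea_level]


# Hypothetical allele frequencies (population 1, population 2) at four loci.
loci = [(0.5, 0.5), (0.9, 0.1), (0.45, 0.55), (1.0, 0.0)]
print(above_sea_level(loci, sea_level=0.2))  # -> [1, 3]
```

In this toy example only the second and fourth loci rise above sea level; where that level is drawn is exactly the methodological choice that varies among genome scans.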

An important consideration for expected patterns of divergence is the geographical context of speciation, specifically, whether and when gene flow accompanied the divergence process. Gene flow is important, because strictly allopatric divergence, be it via selection or genetic drift, proceeds unfettered by the homogenizing effects of migration [17]. Thus, the extent of genetic linkage and recombination among genes relative to the strength of selection is not a major constraint on divergence in allopatry. Although much has been learned about specific reproductive barriers and speciation genes from studying allopatric taxa [1], it is difficult to ascribe any special significance to a particular genetic change or genetic architecture in such systems. Genomic architecture is less relevant to allopatric speciation, as divergence across the genome is inevitable. In contrast, physical linkage relationships and recombination rates among genes, along with levels of gene flow and the strength of selection, are critical considerations with respect to speciation with gene flow, where gene flow constantly introduces the wrong combination of genes into a local population and recombination breaks up associations between genes under selection and those causing reproductive isolation [17,43,44]. As described in the classic work by Felsenstein [23], there is an antagonism between selection and recombination during divergence with gene flow. Selection must overcome this antagonism for reproductive isolation and widespread genomic divergence to evolve in the face of gene flow.
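
How quickly recombination erodes associations can be seen from the textbook two-locus result for a large, randomly mating population with no selection or migration: linkage disequilibrium decays as D_t = D_0(1 - r)^t. The sketch below (function name hypothetical) compares a loosely and a tightly linked pair of loci; under gene flow, selection has to rebuild such associations faster than recombination destroys them.

```python
def ld_remaining(d0, r, generations):
    """Linkage disequilibrium left after `generations` of random mating in an
    infinite population without selection or migration: D_t = D_0 (1 - r)^t."""
    return d0 * (1.0 - r) ** generations


# D_0 = 0.25 is the maximum possible value, e.g. a 50:50 mixture of two
# fully differentiated haplotypes AB and ab.
for r in (0.5, 0.01):
    print(r, round(ld_remaining(0.25, r, generations=10), 4))
# prints:
# 0.5 0.0002
# 0.01 0.2261
```

After 10 generations the association between unlinked loci has essentially vanished, whereas a tightly linked pair retains about 90% of its initial disequilibrium.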

A key take-home message is that details concerning the natural history of speciation, which are often lacking for many systems, are important. For example, current rates of migration between taxa can affect the course of change to come, but may differ from those that existed when present patterns of genomic differentiation were generated. To correctly interpret empirical patterns, it is thus critical not only to conduct exhaustive genetic surveys, but also to resolve the history of population divergence. For example, the clustering of divergently selected genes in the genome might not have been created owing to new mutations sequentially establishing around already diverged sites in the face of gene flow. Instead, it could be the result of the enhanced retention of such an architecture following secondary contact and introgression. The empirical papers in the current issue underscore the importance of resolving this potential difference.

3. Divergence hitchhiking and genome hitchhiking

How might genomic islands form in the face of gene flow, and then grow in size? The verbal theory of divergence hitchhiking posits that physical linkage to divergently selected loci generates a mechanism by which genomic islands form and can be (or grow to be) of relatively large size, and by which speciation in the face of gene flow may be easier than previously thought [45,46]. As outlined in the contribution to this issue by Via [25], the premise is that divergent selection reduces interbreeding between populations in different habitats [45–47]. This reduces inter-population recombination, and even if recombination occurs, selection reduces the frequency of immigrant alleles in advanced generation hybrids [17]. This localized reduction in effective gene flow at or near genes subject to divergent selection might allow large regions of genetic differentiation to build up in the genome around the few loci subject to divergent selection. The idea rests on the assumption that a site under divergent selection will create a relatively large window of reduced gene flow around it, enhancing the potential to accumulate differentiation (both neutral and selected) at linked sites.
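
One common way to formalize divergence hitchhiking is through an effective migration rate at a neutral site linked to a selected locus. The sketch below assumes the heuristic single-locus approximation m_e ≈ m r/(r + s), where m is the migration rate, s the selection against immigrant alleles at the selected locus and r the recombination rate between the two sites, and feeds m_e into Wright's island-model equilibrium F_ST ≈ 1/(1 + 4N m_e). Both formulas are simplifying assumptions for illustration, not the models used in the papers discussed here, and the parameter values are hypothetical.

```python
def effective_migration(m, s, r):
    """Assumed heuristic: effective migration rate at a neutral site linked
    (recombination rate r) to one locus where immigrant alleles suffer
    selection s, approximated as m_e = m * r / (r + s)."""
    return m * r / (r + s)


def island_fst(n_e, m_e):
    """Wright's island-model equilibrium F_ST ~ 1 / (1 + 4 N m)."""
    return 1.0 / (1.0 + 4.0 * n_e * m_e)


# Hypothetical values: N = 1000, m = 0.01, s = 0.1.
for r in (0.5, 0.05, 0.001):
    m_e = effective_migration(0.01, 0.1, r)
    print(f"r={r}: m_e={m_e:.2e}, F_ST={island_fst(1000, m_e):.3f}")
```

Under these assumptions F_ST rises from about 0.03 at an effectively unlinked site (r = 0.5) to about 0.7 at a tightly linked one (r = 0.001), which is the intuition behind a ‘window’ of reduced gene flow around a selected site.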

As an alternative to divergence hitchhiking on a few loci, selection acting on many loci distributed throughout the genome could reduce gene flow to drive speciation [48–50]. This process also produces variable patterns of genomic divergence, owing to differences among loci in selection intensities, linkage relationships and recombination rates. However, in the case of selection on many loci, genomic regions displaying weaker differentiation may not all be neutrally evolving, but rather represent regions more weakly affected by selection. In this case, many loci are diverged beyond neutral, ‘sea-level’ expectations such that genomes differ by many ‘archipelagoes’ or even ‘continents’ of divergence (or at the very least, numerous small and somewhat interconnected islands). We stress that the island versus continent views represent ends of a continuum, rather than mutually exclusive hypotheses (figure 1). For example, continents can be conceptualized as large islands with variable topography (e.g. mountain tops and lowland continental plains all above neutral sea level). In our own contribution to this issue [24], we define the term ‘genome hitchhiking’ to describe the process by which genetic divergence across the genome is facilitated, even for loci unlinked to those under selection, by the reductions in average genome-wide gene flow that selection causes. This process, unlike divergence hitchhiking, does not invoke a role for physical linkage, and can facilitate divergence across the genome. These considerations lay the foundation for thinking about patterns of genomic divergence, and the processes causing them.
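
The genome-hitchhiking argument can be illustrated with a deliberately crude calculation. The sketch below assumes that each selected locus acts as an independent barrier that scales gene flow at an unlinked neutral site (r = 0.5) by r/(r + s_i), so barriers combine multiplicatively. This multiplicative rule is an assumption made for illustration, not the analysis of Feder et al. [24]; it ignores linkage disequilibrium among the selected loci, which can make the combined barrier stronger. The function name and values are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def genome_wide_migration(m, selection_coefficients, r=0.5):
    """Hypothetical multiplicative barrier for a neutral site unlinked
    (r = 0.5) to every selected locus: each selected locus i scales gene
    flow by r / (r + s_i). Ignores linkage disequilibrium among the
    selected loci, which can strengthen the barrier further."""
    return m * math.prod(r / (r + s) for s in selection_coefficients)


# Hypothetical: m = 0.01 and loci each under selection s = 0.05.
for n_loci in (0, 1, 10, 50):
    print(n_loci, f"{genome_wide_migration(0.01, [0.05] * n_loci):.2e}")
```

No single locus does much on its own here (each cuts gene flow by under 10%), yet with 50 such loci effective gene flow at an unlinked site falls roughly a hundredfold, which is the qualitative sense in which many weakly selected loci can lower the whole genomic ‘sea floor’.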

Figure 1. Schematic of the (a) island versus (b) continent views of genomic divergence. These views represent ends of a continuum, rather than mutually exclusive hypotheses. For example, ‘continents’ of divergence can be conceptualized as very large islands with variable topography. Outlier status refers to whether a locus would exhibit statistical evidence for unusually high levels of genetic differentiation in an observational genome scan. See text for details. Reproduced with permission from Michel et al. [49], National Academy of Sciences USA.

4. Patterns of genomic divergence

Numerous questions remain about the empirical patterns of genomic divergence during speciation. For example, how numerous, large and genomically clustered are regions of genomic differentiation? As highlighted in this issue by Renaut et al. [33] and Via [25], the answer to this question will depend, in part, on how regions of differentiation are delimited. Are a few adjacent regions of differentiation along a chromosome considered one large or several small and isolated ‘islands’ of divergence? Below, we address these questions in as standardized a manner as possible, but the contributions in the theme issue clearly define a need for further work on how patterns of genomic differentiation are best quantified, and on the degree to which they represent reproductive isolation (see earlier works [36,38,51–53]).
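
How much the answer depends on delimitation is easy to demonstrate. The sketch below uses one simple, assumed rule (outliers within a maximum gap of the previous island's end are merged into it) on hypothetical outlier positions; real studies use a variety of sliding-window and hidden-Markov approaches, so this is an illustration of the sensitivity, not a recommended method.

```python
def delimit_islands(outlier_positions, max_gap):
    """Group outlier positions (e.g. in kb along one chromosome) into islands:
    an outlier within `max_gap` of the previous island's end joins it.
    Returns a list of (start, end) tuples."""
    islands = []
    for pos in sorted(outlier_positions):
        if islands and pos - islands[-1][1] <= max_gap:
            islands[-1] = (islands[-1][0], pos)  # extend the current island
        else:
            islands.append((pos, pos))  # start a new island
    return islands


# Hypothetical outlier positions (kb); only the merging rule changes.
outliers = [100, 120, 400, 430, 460, 2000]
for gap in (10, 50, 300):
    print(gap, len(delimit_islands(outliers, gap)))
# prints:
# 10 6
# 50 3
# 300 2
```

The same six outliers yield six, three or two ‘islands’ depending solely on the merging distance.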

(a) How numerous?

How many genomic regions differentiate during speciation? The available evidence suggests that the answer to this question is variable, but that multiple regions tend to be differentiated, and each to a different degree. For example, Strasburg et al. [26] reviewed numerous studies of the genomic basis of plant speciation and concluded that multiple regions tend to differentiate both closely and distantly related plant populations. This result is consistent with an older review in which mostly animal taxa were represented [18]. Likewise, as discussed in the article by Hahn et al. [29], initial work on molecular forms of Anopheles mosquitoes detected multiple, yet few (i.e. three), regions of differentiation [42], but subsequent finer scale sequencing detected numerous other regions of differentiation [54].

Although genome scans sometimes report only very few regions of differentiation, genome scans that have poor genomic coverage and that are conducted without complementary selection experiments can be biased towards supporting a view that divergence occurs in only a few regions. This is because, inevitably, only the most diverged regions will be identified as statistical outliers. Other loci affected by selection, but more weakly, will go unnoticed and be considered part of the mostly ‘undifferentiated’ and neutral genome. In short, although empirical genome scans have usefully identified candidate regions strongly affected by divergent selection and are a good starting point to characterize patterns of genomic architecture, they cannot readily detect selection on less-differentiated regions. Thus, direct experimental measurements of selection on the genome are required to detect both weak and strong selection, and determine the fraction of the overall genome that differentiates during speciation.
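
The detection bias can be sketched with a small simulation. Purely for illustration, the code below models per-locus differentiation as normally distributed with the same spread for neutral and selected loci, with selection shifting the mean by a hypothetical `effect`; the outlier threshold is the 99th percentile of the simulated neutral loci. None of these distributions or values come from the studies discussed here.

```python
import random


def detection_rate(effect, n_neutral=10_000, n_selected=1_000,
                   neutral_mean=0.05, sd=0.02, quantile=0.99, seed=1):
    """Fraction of selected loci flagged as outliers when the threshold is the
    `quantile` of a neutral distribution. Differentiation is modelled, purely
    for illustration, as normally distributed with the same spread for
    neutral and selected loci; selection shifts the mean by `effect`."""
    rng = random.Random(seed)
    neutral = sorted(rng.gauss(neutral_mean, sd) for _ in range(n_neutral))
    threshold = neutral[int(quantile * n_neutral)]  # empirical 'sea level'
    selected = [rng.gauss(neutral_mean + effect, sd) for _ in range(n_selected)]
    return sum(x > threshold for x in selected) / n_selected


for effect in (0.0, 0.02, 0.15):
    print(effect, detection_rate(effect))
```

Under these assumptions a strongly selected locus is almost always detected, whereas a locus with a modest shift is flagged only roughly one time in ten, not far above the 1% rate for neutral loci; most weakly selected loci would be counted as part of the ‘undifferentiated’ background.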

Such an experimental test of the genomic island scenario was conducted by Michel et al. [49] in the apple and hawthorn host races of Rhagoletis pomonella, a model for sympatric ecological speciation initiated with gene flow. Contrary to expectations, they reported numerous lines of evidence for widespread divergence and selection throughout the Rhagoletis genome, with the majority of loci displaying latitudinal clines, associations with an ecologically important trait (adult eclosion time), within-generation responses to selection in a manipulative over-wintering experiment and host differences in nature despite substantial gene flow (4–6% per generation).

The results, coupled with linkage disequilibrium (LD) analyses, provide field-based and experimental evidence that divergence was driven by selection on numerous independent genomic regions, suggesting that ‘continents’ of multiple differentiated loci, rather than isolated islands of divergence, can characterize even the early stages of speciation. Their results also illustrate continental topography. The divergence observed throughout the Rhagoletis genome was clearly more accentuated in some regions, such as those harbouring chromosomal inversions. A final point is that standard outlier analyses in this same study were consistent with the genomic island hypothesis: only two independent gene regions were detected as statistical outliers. Thus, experimental data and biological information on gene flow in nature were critical for detecting weaker yet widespread divergence across the genome. Until further such studies emerge, it will be impossible to know if genomic continents are the exception, or the norm.

The contribution to this theme issue by Nadeau et al. [32] on genomic divergence among hybridizing Heliconius butterflies further illustrates the points exemplified by the Rhagoletis study. Nadeau et al. [32] applied a cutting-edge genomic capture methodology to a non-model organism for the first time to sequence through two genomic regions known to harbour genes affecting divergent wing-colour patterns. These colours also play a role in speciation by contributing to selection against immigrants and hybrids and to divergent mating preferences. Even within just these two genomic regions, they find multiple peaks of differentiation, but with some regions nonetheless more differentiated than others.

(b) How large?

How large are regions of divergence in the genome? Relatedly, how does genetic divergence decay away from a selected site, and does it decay to zero? There is much evidence that independent regions of differentiation in the genome can be small. This claim is supported by general reviews of genomic divergence in animals [18] and plants [26]. Likewise, the genomic capture study in Heliconius mentioned above reports that regions of differentiation tend to span only a few hundred kilobases [32]. However, there are some strong exceptions to this trend, where regions of differentiation, measured for example by the distance between an outlier locus and the nearest quantitative trait locus (QTL) for an ecologically relevant trait, appear large. Examples from this theme issue stem from pea aphid host races [25] and lake whitefish ecotypes [33]. Likewise, in the first quantification of genome-wide levels of LD in the stickleback genome, Hohenlohe et al. [30] report relatively large LD blocks in the genomes of both freshwater and oceanic populations. We consider some of the causes of this variability in the subsequent section on processes generating genomic divergence.
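
Genome-wide quantification of LD, such as that reported for stickleback [30], rests on pairwise statistics such as r². The sketch below computes r² from phased haplotypes coded 0/1 and pairs it with physical distance, so that the extent of LD blocks can be read off a plot of r² against distance. It is a generic textbook calculation, not the pipeline of Hohenlohe et al.; phasing and missing data are assumed away, and the haplotypes are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """r^2 between biallelic sites i and j, from phased haplotypes coded 0/1."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else 0.0  # undefined for monomorphic sites


def ld_by_distance(haplotypes, positions):
    """(distance, r^2) for every pair of sites; plotting r^2 against distance
    shows how far LD extends, i.e. the size of LD blocks."""
    pairs = []
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            pairs.append((abs(positions[b] - positions[a]),
                          r_squared(haplotypes, a, b)))
    return pairs


# Four hypothetical phased haplotypes at three sites.
haps = [[1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 0, 1]]
print(ld_by_distance(haps, positions=[0, 5, 80]))
# -> [(5, 1.0), (80, 0.0), (75, 0.0)]
```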

(c) How dispersed?

Are regions of divergence concentrated on just a few genomic regions, or widely spread across chromosomes? Reviews of genome scans in both plants and animals indicate that regions of differentiation often map to different chromosomes [18,26]. Although regions of differentiation might sometimes cluster [25,27,31], for example in regions of low recombination such as within chromosomal inversions, divergence in other (e.g. collinear) regions nonetheless occurs. This is exemplified by the contribution of McGaugh & Noor [31] to this theme issue, where it was documented that regions of particularly strong differentiation between Drosophila species often lie within inversions, but collinear regions are nonetheless differentiated. Similar results are reported in mice and rabbits in the contribution by Nachman & Payseur [27], but in the context of recombination rate variation among different collinear genomic regions.
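
Whether differentiation is elevated inside inversions relative to collinear regions can be framed as a simple permutation test. The sketch below compares mean F_ST inside versus outside inversions for hypothetical values and shuffles the inversion labels to obtain a one-sided p-value. It treats loci as independent, which linkage violates; a real analysis would need to permute genomic blocks or otherwise account for LD.

```python
import random


def inversion_enrichment(fst, in_inversion, n_perm=10_000, seed=1):
    """Difference in mean F_ST inside vs outside inversions, with a one-sided
    permutation p-value obtained by shuffling the inversion labels.
    Assumes loci are independent (an approximation; see text)."""
    def mean_diff(labels):
        inside = [f for f, lab in zip(fst, labels) if lab]
        outside = [f for f, lab in zip(fst, labels) if not lab]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_diff(in_inversion)
    rng = random.Random(seed)
    labels = list(in_inversion)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_diff(labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # add-one to avoid p = 0


# Hypothetical F_ST values: four loci inside an inversion, eight collinear.
fst_values = [0.40, 0.35, 0.50, 0.45, 0.10, 0.05, 0.12, 0.08, 0.20, 0.15, 0.10, 0.09]
in_inv = [True] * 4 + [False] * 8
print(inversion_enrichment(fst_values, in_inv))
```

A significant difference in such a test says only that differentiation is higher on average inside inversions; as the studies above show, it does not imply that collinear regions are undifferentiated.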

(d) What types of genomic regions?

What types of genes or gene regions tend to be differentiated? In particular, to what extent does genomic divergence involve coding versus regulatory regions [55]? Much more data are needed, but it is clear that both can be involved [56]. In their review of the genomics of cichlid speciation, Fan et al. [28] also present new transcriptomic data suggesting that selection on coding regions contributes to genomic differentiation. The whitefish study by Renaut et al. [33] reports that mapped expression differences between dwarf and normal ecotypes localize to a few ‘expression QTL hotspots’, but that these hotspots are not associated with elevated genetic divergence between natural populations (whereas, in contrast, regular ‘phenotypic’ QTLs are). Much more work is needed to determine which types of genomic regions differentiate during speciation. Specifically, further studies explicitly comparing coding and non-coding regions [32] are needed.

5. Causes of patterns of genomic divergence

The results above demonstrate that patterns of genomic divergence can be highly variable, both among gene regions within study systems and among different study systems. This variability raises the obvious question of which processes are generating the observed patterns. For example, what are the roles of selection and recombination in generating genomic divergence? In terms of selection itself, how important is the strength of selection acting directly on gene regions, relative to the effects of selection causing hitchhiking of linked regions and overall reductions in gene flow? Other questions concern the importance of reduced recombination. To what extent can reduced recombination facilitate genetic divergence, in a manner analogous to that played by strong selection? In addition, there are issues concerning variation in the evolutionary process itself. For example, even in genomic regions containing divergently selected loci, there can be pronounced differences in the levels of associated neutral differentiation, as probably seen by Nadeau et al. [32] in Heliconius. This can be owing to differences in the age of segregating neutral variants, to the stochastic nature of the drift process itself, and to vagaries in the recombination history of neutral sites with the specific targets of selection. Further work is required on how this variation should be summarized when attempting to describe and compare the topography of genomic divergence.
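
The stochastic contribution of drift alone can be illustrated with a standard Wright-Fisher simulation. The sketch below follows independent neutral loci in two isolated populations that start from the same allele frequency and reports the spread of F_ST across loci; the parameter values (100 gene copies per population, 20 generations, no gene flow) are hypothetical, and no selection, migration or mutation is modelled.

```python
import random
import statistics


def fst_two_pops(p1, p2):
    """(H_T - H_S) / H_T for one biallelic locus in two populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = p1 * (1.0 - p1) + p2 * (1.0 - p2)  # mean of 2p(1-p) over the two populations
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def neutral_fst_after_drift(n_loci=200, copies=100, generations=20,
                            p0=0.5, seed=1):
    """Wright-Fisher drift in two isolated populations that start from the
    same allele frequency p0; returns one F_ST value per independent
    neutral locus. No selection, migration or mutation is modelled."""
    rng = random.Random(seed)

    def next_freq(p):
        # Each of the `copies` gene copies in the next generation is drawn
        # independently from the current allele frequency (binomial sampling).
        return sum(rng.random() < p for _ in range(copies)) / copies

    values = []
    for _ in range(n_loci):
        p1 = p2 = p0
        for _ in range(generations):
            p1, p2 = next_freq(p1), next_freq(p2)
        values.append(fst_two_pops(p1, p2))
    return values


v = neutral_fst_after_drift()
print(f"mean={statistics.mean(v):.3f}  min={min(v):.3f}  max={max(v):.3f}")
```

Even though every locus experiences identical conditions, F_ST ranges from near zero to high values, which is one reason a single elevated neutral locus near a selected site is hard to interpret on its own.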

An important message here is that we have a good metaphorical conception of the major population genetic processes causing patterns of genomic differentiation. However, we lack theoretical details of the relative importance of different processes as speciation unfolds over time, particularly when gene flow levels vary temporally owing to biogeography (but see [49]). Metaphors may be insufficient, and vagaries in experimental methodology and statistical analysis too great, to currently distinguish among competing factors shaping genome structure during speciation. Caution is therefore urged in attaching too much significance at this time to verbal associations made between observed empirical patterns and underlying process. These issues raise the need to strengthen the linkage between data and theory to further our understanding of the genomics of speciation, as exemplified by the theme issue contribution by Gompert et al. [38].

(a) Selection

The theoretical contribution to this theme issue by Feder et al. [24] illustrates how the relative importance of the different selective processes for generating genomic divergence can begin to be disentangled. Specifically, Feder et al. [24] report new analytical and simulation results which estimate the probability that new beneficial mutations will establish and differentiate populations diverging in the face of gene flow when the new mutation: (i) is the first mutation to arise in a completely undifferentiated genome, (ii) arises in physical linkage to a locus already diverged via selection, and (iii) arises unlinked to any selected loci, but within a genome that has some diverged loci. This sequential approach allows the partitioning of how various mechanisms aid the establishment of new mutations. For example, the effect on a new mutation of arising in linkage to a diverged region represents the effect of ‘divergence hitchhiking’. In contrast, the effect of arising unlinked to selected regions but in an already diverged genome represents the effects of genome-wide average gene flow reductions caused by selection (i.e. ‘genome hitchhiking’).

Feder et al. [24] find that the strength of selection acting directly on a new mutation is an important predictor of establishment, with both forms of hitchhiking having smaller effects in comparison. This result is consistent with past theoretical results focused on the maintenance, rather than the origin, of differentiation [17,57,58] and population-genetic theory considering single populations [59,60]. Nonetheless, divergence hitchhiking aided mutation establishment under certain conditions, in particular, when selection coefficients favouring the new mutation were less than the migration rate (see also [25]). Genome hitchhiking also sometimes promoted mutation establishment, particularly if multiple loci have already diverged prior to the emergence of the new mutation. A key message here is that a rapid transition (in terms of the number of differentiated genes) may occur from early phases of divergence where the selective benefits of new mutations themselves are the primary factor affecting their establishment to a stage where multiple (but not necessarily numerous) loci generate the potential for widespread genomic divergence via genome hitchhiking [24]. Divergence hitchhiking can fortuitously aid this transition, but may not necessarily be vital for it.
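
The role of direct selection relative to migration can be illustrated with a classic branching-process calculation. The toy sketch below is not the model of Feder et al. [24]: it assumes Poisson offspring, treats copies that emigrate as lost, and ignores linkage entirely, so it captures only the direct-selection component. Function names and parameter values are hypothetical.

```python
import math


def survival_probability(mean_offspring, iterations=10_000):
    """Probability that a single-copy lineage never goes extinct in a
    Galton-Watson branching process with Poisson offspring. Iterating
    q <- exp(mean * (q - 1)) from q = 0 converges to the extinction
    probability (the smallest fixed point)."""
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return 1.0 - q


def establishment_probability(s, m):
    """Toy model (assumption): a new mutation with advantage s in the local
    habitat; each generation a fraction m of its copies emigrates and is
    treated as lost. Mean local offspring per copy is (1 + s)(1 - m)."""
    return survival_probability((1.0 + s) * (1.0 - m))


for s, m in [(0.05, 0.0), (0.05, 0.02), (0.02, 0.05)]:
    print(s, m, round(establishment_probability(s, m), 4))
```

Without migration the result is close to the familiar 2s approximation; migration lowers it, and establishment becomes impossible once (1 + s)(1 - m) falls below 1, roughly when s < m. That is the regime in which, according to Feder et al. [24], divergence hitchhiking (absent from this toy model) helped new mutations establish.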

Empirical studies have only just begun to home in on the relative importance of these different processes for generating genomic divergence. Elucidating the specific targets of divergent selection in the first place, and distinguishing them from neutral regions that simply hitchhike to divergence, is a difficult task. The focused sequencing of gene regions implicated in colour-pattern diversification of Heliconius butterflies illustrates how progress towards identifying the specific targets of selection can be made [32]. Once the specific targets of selection are determined, work can then focus on the consequences for neutral regions linked and unlinked to them.

(b) Recombination rate variation

As described above, the antagonism between selection and recombination can impede genomic divergence. It therefore follows that factors that reduce recombination, such as chromosomal inversions, can facilitate genomic divergence. Inversions may therefore help create more elevated oceanic islands and broader, more mountainous continents of divergence between taxa. Similar arguments can be made for other features of the genome that result in reduced recombination, such as proximity of genes under selection to centromeres.

The basic premise for a role of inversions in speciation is that they reduce introgression across the regions of the genome they encompass and protect favourable genotypic combinations within them from being broken up by recombination [61–63]. Essentially, the favourable genes within the inversions are more favourable together in their natal habitat than they would be individually and less favourable in the habitats of other populations than they would be alone. Hence, gene flow is reduced. Moreover, in addition to preserving blocks of adaptively diverged genes, inversions also provide larger targets in the genome for divergence hitchhiking to work on; by suppressing recombination, inversions enlarge the area of the genome in which new favourable mutations might arise linked to already diverged genes.

A number of articles in this theme issue focus on reduced recombination and genomic divergence. For example, the articles by McGaugh & Noor [31] and Nachman & Payseur [27] describe elevated divergence in regions of reduced recombination (inversions and proximity to centromeres, respectively), relative to more freely recombining areas. Nonetheless, both studies report divergence in collinear regions as well. These results are qualitatively consistent with some past work, such as the study by Strasburg et al. [64] of divergence between hybridizing sunflower species, which reported that genetic divergence is not accentuated within inversions, except perhaps near chromosomal breakpoints, where recombination is particularly reduced. Moreover, widespread adaptive divergence in collinear regions is being increasingly documented [18,26,49,54,64]. Indeed, theoretical work by Feder & Nosil [65] showed that there is no reason for the vast majority of loci contributing to reproductive isolation to reside in inversions; they should also be commonly found in collinear regions. Thus, it may be that factors that reduce recombination are not essential for genomic divergence, but they can facilitate it, particularly if recombination is strongly reduced.

A key, and difficult, remaining task is thus to separate the relative roles of selection strength, gene flow and recombination rate in generating a particular pattern of genomic divergence. This point is exemplified by the empirical patterns described in this issue by Hahn et al. [29] and by the new theoretical results of Guerrero et al. [41] on expected coalescent patterns for neutral loci within chromosomal inversions. Rather than being cause for despair, this simply highlights a clear avenue for further work.
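A minimal sketch of why these factors are hard to disentangle, assuming a commonly used heuristic from two-locus migration-selection theory: a neutral allele entering a population on a migrant chromosome escapes onto the local background, before the linked immigrant allele at a selected locus is eliminated, with probability of roughly r/(r + s). The effective migration rate at the neutral locus is then approximately m_e = m·r/(r + s), where m is the migration rate, s the selection against the immigrant allele and r the recombination fraction between the two loci. This approximation (single selected locus, weak migration, constant selection) and every parameter value below are illustrative assumptions, not the model used by any study in this issue.

```python
"""Heuristic sketch: effective migration at a neutral locus linked to a
locus under divergent selection, using m_e ~= m * r / (r + s).

Assumptions (hypothetical, for illustration): one selected locus, weak
migration, selection coefficient s against the immigrant allele, and
recombination fraction r between the neutral and selected loci.
"""


def effective_migration(m: float, s: float, r: float) -> float:
    """Approximate effective migration rate at a linked neutral locus."""
    if m < 0 or s < 0 or not 0.0 <= r <= 0.5:
        raise ValueError("need m >= 0, s >= 0 and 0 <= r <= 0.5")
    if r + s == 0.0:
        # No selection (s = 0) means no barrier, so gene flow is unchanged.
        return m
    return m * r / (r + s)


m, s = 0.01, 0.1  # hypothetical migration rate and selection coefficient

# Gene flow is reduced most strongly close to the selected site (small r).
for r in (0.0, 0.001, 0.01, 0.1, 0.5):
    m_e = effective_migration(m, s, r)
    print(f"r = {r:<5}  m_e/m = {m_e / m:.3f}")

# Confounding: doubling both s and r leaves m_e unchanged, so the same
# reduction in gene flow can arise from different combinations of the two.
a = effective_migration(m, s=0.1, r=0.01)
b = effective_migration(m, s=0.2, r=0.02)
print(f"s = 0.1, r = 0.01 -> m_e = {a:.6f}")
print(f"s = 0.2, r = 0.02 -> m_e = {b:.6f}")
```

Under these assumed values, m_e/m is 0 at complete linkage, about 0.09 at r = 0.01 and 0.5 at r = 0.1. Because m_e depends on s and r only through r/(r + s), an observed reduction in gene flow at a locus does not by itself identify whether selection is strong or the locus is tightly linked.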

(c) Other remaining questions

A number of other questions concerning genomic divergence during speciation remain unanswered. For example, what are the relative roles of new mutations versus pre-existing standing genetic variation in speciation [66]? Likewise, which geographical arrangement of populations most strongly affects genomic divergence? As noted above, genomic divergence in allopatry can proceed relatively easily, allowing many regions of the genome to diverge. Interestingly, a pattern of multiple, relatively small islands of divergence is reported in this theme issue for two systems that appear to differ widely in their levels of gene flow [29,32]. This highlights the need for further work on how geography and gene flow affect patterns of genomic divergence.

Indeed, recent theory demonstrates that in some instances gene flow might facilitate, rather than constrain, certain types of genomic change. For example, Kirkpatrick & Barton [67] demonstrated that a new inversion arising in sympatric populations that exchange genes can be selectively favoured over collinear arrangements if it fortuitously traps a combination of genes all adapted to one habitat rather than another, because the fit genotypic combinations within the inversion are not broken up by recombination. This process can result in inversions rising to high frequency and differentiating populations.

Feder et al. [37] extended this work by showing that such adaptive spread of inversions is facilitated if the inversions arise in allopatry, because they are then most likely to contain the perfect complement of locally adapted alleles. Such inversions might be maintained at low frequency in allopatric populations, but subsequently rise to high frequency when gene flow ensues upon secondary contact. Such a ‘mixed mode’ of geographical divergence allows inversions to become established under a wider range of conditions than purely sympatric divergence, where, for example, ongoing gene flow and recombination make it difficult for new inversions to capture the perfect complement of locally adapted alleles. A mixed mode of divergence could potentially explain the results reported in this issue by McGaugh & Noor [31], where chromosomal inversions appear to have originated prior to speciation between two closely related Drosophila species, and may then have spread to high frequency upon secondary contact and gene flow. Additionally, effective population size can have important effects on patterns of genomic divergence [41,58], and this topic warrants more focused empirical study. Finally, traditional topics such as gene duplication and polyploidization also require attention in future work.

6. Consequences of genomic divergence

Even once patterns of genomic divergence are well described, and the processes causing the observed patterns inferred, a major question remains: what are the consequences of genomic divergence for speciation? If selection and divergence are concentrated on just a few genomic regions, will this more effectively overcome gene flow to drive speciation? Is selection on many genomic regions more likely to incidentally cause reproductive isolation (i.e. divergence in mate preference, hybrid dysfunction, reductions in neutral gene flow) than selection on a few regions? Such questions are central to our understanding of speciation, but have only begun to be addressed [38]. In some cases, it is known that certain divergent genomic regions harbour genes affecting reproductive isolation. For example, the inverted chromosomal regions described by McGaugh & Noor [31] contain genes causing intrinsic dysfunctions in hybrids between species of Drosophila. In addition, regions containing loci involved in divergent adaptation, which are thus likely to cause extrinsic selection against migrants and hybrids, have been described in several systems, including those in this issue [25,32,33]. Nonetheless, even in such examples, it remains unknown whether the specific patterns of divergence and genetic architecture observed actually promoted or constrained the evolution of reproductive isolation. Further work more directly connecting patterns of genomic divergence to reproductive isolation is required [38]. Establishing causality on this front will probably require the integration of ecological and functional genomic studies.

7. ‘Growing pains’: technical and computational challenges

The recent increases in our understanding of genomic divergence during speciation have not come without challenges. For example, although sequencing costs have decreased rapidly, and will continue to do so, the amount of data that can be obtained, and then analysed, is nonetheless finite. Thus, initial explorations of genomic divergence using cutting-edge methods have (necessarily) relied on small sample sizes [31,32]. These studies have been important in paving the way for larger and more integrative future work, for example, studies based on larger sampling schemes that consider changes in both gene expression and nucleotide divergence [56]. Additionally, more flexible and powerful analytical approaches are required. The contribution by Gompert et al. [38] to this theme issue provides a step in this direction, describing a method for examining the genomic architecture of isolation in species hybrids. Finally, most speciation theory to date has focused on one or a few loci. Further theoretical work examining the causes and consequences of genome-wide divergence is also needed, as represented by other contributions in this issue [24,41]. As our ability to handle genomic data continues to improve, it will be important to further develop conceptual and theoretical frameworks for understanding such data.

8. Conclusions: towards a natural history of the genome

… these forms may still be only … varieties; but we have only to suppose the steps of modification to be more numerous or greater in amount, to convert these forms into species … thus species are multiplied. ([68], p. 120)

Speciation is often an extended and quantitative process, during which reproductive isolation and genomic divergence build up [1,17,44,50,69]. Ultimately, we would like to know how these processes unfold, and thus how speciation proceeds from beginning to end. Indeed, different points in this continuum of divergence could involve very different processes [24]. For example, speciation initiated in the face of gene flow may often begin with divergence in the few specific gene regions directly subject to divergent selection. This period may then transition to a second phase in which gene flow is reduced in localized regions of the genome surrounding selected sites, and in which, under certain circumstances, divergence hitchhiking may facilitate differentiation of regions physically linked to those under selection. As further loci diverge, and perhaps new mutations come to differentiate populations, effective gene flow is further reduced across the genome. As this proceeds, more loci diverge, and a transition to widespread genomic divergence, facilitated by genome hitchhiking, occurs. The end result is strong reproductive isolation and widespread genetic divergence. In addition, these different ‘phases’ of the speciation continuum might differ depending on whether divergence evolves between populations in sympatry, in allopatry or in a mixture of these modes [37]. Thus, it may be critical to discern which aspects of genome structure originated in allopatry versus in the face of gene flow.

We can see glimpses of these phases now, by comparing the results from different taxa lying at different points in this speciation continuum. For example, this theme issue describes results from some systems that have proceeded very far in the speciation process (or even completed it), such as the molecular forms of Anopheles mosquitoes [29], the host races of pea aphids [25] and Drosophila species pairs [31]. In other systems considered in the issue, gene flow appears higher, with speciation having proceeded less far, such as in races of Heliconius butterflies [32] and stickleback and whitefish [30,33]. Although making comparisons among the results of these disparate systems is a starting point, it is somewhat like comparing apples and oranges. What is required now are detailed studies of genomic divergence between very closely related taxa (e.g. population pairs within species that vary strongly in their degree of reproductive isolation or different ecotype and species pairs within a single genus) that span the speciation continuum and that have well-characterized natural and biogeographic histories. Such work is increasing at the phenotypic level [50,69–74], but has yet to be fully applied to the genomic level. Nonetheless, the few cases that have examined genomic divergence at different points in the speciation continuum, such as the study of Heliconius in this theme issue, support the model above where direct selection, divergence hitchhiking and genome hitchhiking are differentially important during different phases of the speciation process [32].

Ideally, genomic studies will be increasingly conducted within an experimental framework that directly tests the processes driving and constraining genomic divergence. Some cutting-edge experiments of this type have now been conducted in the laboratory using microbes [75–77], but these do not address reproductive isolation or speciation in natural populations. Future experimental work in natural populations will probably allow us to measure more directly how selection acts on the genome (note that selection within a generation might be measured even in long-lived organisms where evolution between generations cannot be readily addressed). In turn, this will allow us to reconstruct how genomic divergence unfolds across the speciation continuum in single study systems. We can then compare study systems to search for generalities. If such work is conducted using an integration of ecological and molecular genomic approaches, it will probably yield a new understanding of the ‘natural history of the genome’, and of its relevance for understanding the origins of diversity (figure 2). Although much progress remains to be made, it is often clear what needs to be done, and the tools and expertise to make progress exist.

Figure 2. A diagrammatic depiction of how integrative approaches might yield an understanding of the ‘natural history of the genome’, and of how speciation and genomic divergence unfold over time. RI, reproductive isolation (note that further work on the relationship between divergence and RI is sorely needed) [38].

Acknowledgements

We thank all the authors for their involvement in the Theme Issue and reviewers of the articles for improving the quality of the contributions.

2010 Islands of speciation or mirages in the desert? Examining the role of restricted recombination in maintaining species (Correction to Heredity 2009, vol. 103, p. 434). Heredity 104, 418. (doi:10.1038/hdy.2010.13)

2009 Chromosomal inversions and species differences: when are genes affecting adaptive divergence and reproductive isolation expected to reside within inversions? Evolution 63, 3061–3075. (doi:10.1111/j.1558-5646.2009.00786.x)