Affiliations
Department of Plant Sciences, University of California Davis, Davis, California, United States of America
Genome Center and Center for Population Biology, University of California Davis, Davis, California, United States of America

Abstract

The evolutionary significance of hybridization and subsequent introgression has long been appreciated, but evaluation of the genome-wide effects of these phenomena has only recently become possible. Crop-wild study systems represent ideal opportunities to examine evolution through hybridization. For example, maize and the conspecific wild teosinte Zea mays ssp. mexicana (hereafter, mexicana) are known to hybridize in the fields of highland Mexico. Despite widespread evidence of gene flow, maize and mexicana maintain distinct morphologies and have done so in sympatry for thousands of years. Neither the genomic extent nor the evolutionary importance of introgression between these taxa is understood. In this study we assessed patterns of genome-wide introgression based on 39,029 single nucleotide polymorphisms genotyped in 189 individuals from nine sympatric maize-mexicana populations and reference allopatric populations. While portions of the maize and mexicana genomes appeared resistant to introgression (notably near known cross-incompatibility and domestication loci), we detected widespread evidence for introgression in both directions of gene flow. Through further characterization of these genomic regions and preliminary growth chamber experiments, we found evidence suggestive of the incorporation of adaptive mexicana alleles into maize during its expansion to the highlands of central Mexico. In contrast, very little evidence was found for adaptive introgression from maize to mexicana. The methods we have applied here can be replicated widely, and such analyses have the potential to greatly inform our understanding of evolution through introgressive hybridization. Crop species, due to their exceptional genomic resources and frequent histories of spread into sympatry with relatives, should be particularly influential in these studies.

Author Summary

Hybridization and introgression have been shown to play a critical role in the evolution of species. These processes can generate the diversity necessary for novel adaptations and continued diversification of taxa. Previous research has suggested that not all regions of a genome are equally permeable to introgression. We have conducted one of the first genome-wide assessments of patterns of reciprocal introgression in plant populations. We found evidence that suggests domesticated maize received adaptation to highland conditions from a wild relative, teosinte, during its spread to the high elevations of central Mexico. Gene flow appeared asymmetric, favoring teosinte introgression into maize, and was widespread across populations at putatively adaptive loci. In contrast, genomic regions near known domestication and cross-incompatibility loci appeared particularly resistant to introgression in both directions of gene flow. Crop-wild study systems should play an important role in future studies of introgression due to their well-developed genomic resources and histories of reciprocal gene flow during crop expansion.

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: PL and NCE acknowledge support from UC MEXUS. TP received support from the Academy of Finland. This work was supported by US–NSF grant IOS-0922703 and USDA–National Institute of Food and Agriculture grant 2009-01864. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Hybridization and subsequent introgression have long been appreciated as agents of evolution. Adaptations can be transferred through these processes upon secondary contact of uniquely adapted populations or species, in many instances producing the variation necessary for further diversification [1]. Early considerations of adaptive introgression discussed its importance in the context of both domesticated and wild species [2], [3], viewing both anthropogenic disturbance and naturally heterogeneous environments as ideal settings for hybridization. More recently, studies of adaptation through introgression have focused primarily on wild species ([4], [5] but see [6], [7]). Well-studied examples include increased hybrid fitness of Darwin's finches following environmental changes that favor beak morphology intermediate to that found in extant species [8], [9] and the introgression of traits related to herbivore resistance [10] and drought escape [11] in species of wild sunflower [12], [13]. Molecular and population genetic analyses have also clearly identified instances of adaptive introgression across species at individual loci, including examples such as the RAY locus controlling floral morphology and outcrossing rate in groundsels [14] and the optix gene controlling wing color in mimetic butterflies [15], [16]. Despite long-standing interest in introgression, however, genome-wide analyses are rare and have been primarily conducted in model systems [17]–[22].

Studies of natural introgression in cultivated species have been limited in genomic scope and have largely ignored the issue of historical adaptive introgression, focusing instead on contemporary transgene escape and/or the evolution of weediness [23]–[27]. One notable exception is recent work documenting introgression between different groups of cultivated rice in genomic regions containing loci involved in domestication [19], [28]–[30]. Few studies, however, have investigated the potential for introgression to transfer adaptations between crops and natural populations of their wild relatives post-domestication. Subsequent to domestication, most crops spread from centers of origin into new habitats, often encountering locally adapted populations of their wild progenitors and closely related species (e.g., [31]–[33]). These crop expansions provide compelling opportunities to study evolution through introgressive hybridization.

Here, we use a SNP genotyping array to investigate the genomic signature of gene flow between cultivated maize and its wild relative Zea mays ssp. mexicana (hereafter, mexicana) and examine evidence for adaptive introgression. Maize was domesticated approximately 9,000 BP in southwest Mexico from the lowland teosinte taxon Zea mays ssp. parviglumis (hereafter, parviglumis; [34]–[36]). Following domestication, maize spread to the highlands of central Mexico [34], [37], a migration that involved adaptation to thousands of meters of changing elevation and brought maize to substantially cooler (∼7°C change in annual temperature) and drier (∼300 mm change in annual precipitation) climes [38]. During this migration maize came into sympatry with mexicana, a highland teosinte that diverged from parviglumis ∼60,000 BP [39].

Convincing morphological evidence for introgression between maize and mexicana has been reported [40], [41], and traits putatively involved in adaptation to the cooler highland environment such as dark-red and highly-pilose leaf sheaths [42], [43] are shared between mexicana and highland maize landraces [40], [44]. These shared morphological features could suggest adaptive introgression [45] but could also reflect parallel or convergent adaptation to highland climate or retention of ancestral traits [46]. Though hybrids are frequently observed, phenological isolation due to flowering time differences [40], [47] and cross-incompatibility loci [48]–[50] are thought to limit the extent of introgression, particularly acting as barriers to maize pollination of mexicana. Experimental estimates of maize-mexicana pollination success (i.e., production of hybrid seed) are quite low, ranging from <1–2% depending on the direction of the cross [51], [52]. Nevertheless, theory suggests that alleles received through hybridization can persist and spread despite such barriers to gene exchange, particularly when they prove adaptive [53], [54].

Molecular analyses over the last few decades have provided increasingly strong evidence for reciprocal introgression between mexicana and highland maize landraces. Early work identified multiple allozyme alleles common in highland Mexican maize and mexicana but rare in closely related taxa or maize outside of the highlands [55]. Likewise, sequencing of the putative domestication locus barren stalk1 (ba1) revealed a haplotype unique to mexicana and highland Mexican maize [56]. Multiple studies have found further support for bidirectional gene flow and have estimated that ∼2–10% of the genome of highland maize is derived from mexicana [34], [57] and 4–8% of the mexicana genome is derived from maize [58]. A more recent study including several hundred markers revealed that admixture with mexicana may approach 20% in highland Mexican maize [36].

Similar to introgression studies in many other plant species (e.g., [31], [59]–[62]), morphological and molecular studies have only provided rough estimates of the extent of introgression between mexicana and maize. Little is known regarding genome-wide patterns in the extent and directionality of gene flow. A genomic picture of introgression could greatly expand our understanding of evolution through hybridization, revealing how particular alleles, genes and genomic regions are disproportionately shaped by and/or resistant to these processes [63], [64]. Additionally, assessment of introgression in crop species during post-domestication expansion can provide insight into the genetic architecture of adaptation to newly encountered abiotic and biotic conditions. Here, we provide the most in-depth analysis to date of the genomic extent and directionality of introgression in sympatric collections of maize and its wild relative, mexicana, based on genome-wide single nucleotide polymorphism (SNP) data. We find evidence for pervasive yet asymmetric gene flow in sympatric populations. Across the genome, several regions introgressed from mexicana into maize are shared across most populations, while little consistency in introgression is observed in gene flow in the opposite direction. These data, combined with analysis of environmental associations and a growth chamber experiment, suggest that maize colonization of highland environments in Mexico may have been facilitated by adaptive introgression from local mexicana populations.

Results

Polymorphism and Differentiation

To assess the extent of hybridization and introgression we collected nine sympatric population pairs of maize and mexicana and one allopatric mexicana population from across the highlands of Mexico (Table S1; Figure 1) and genotyped 189 individuals for 39,029 SNPs (see Materials and Methods). Genotype data at the same loci were obtained from Chia et al. [65] for a reference allopatric maize population. Average expected heterozygosity (HE), percent polymorphic loci (%P), and the proportion of privately segregating sites were higher in maize than mexicana (t-test, p≤0.012 for all comparisons, Table S2), likely influenced by the absence of mexicana from the discovery panel used to develop the genotyping platform [66]. However, substantial variation in diversity was observed across populations within taxa (e.g., %P ranged from 52–88% in maize and from 44–79% in mexicana; Table S2) and meaningful comparisons can be made at this level. Our analysis of diversity identified the Ixtlan maize population as an extreme outlier, containing 31% fewer polymorphic markers than any other maize population. Discussion with farmers during our collection revealed that Ixtlan maize was initially a commercial variety whose seed had been replanted for a number of generations. Excluding this population, diversity in mexicana populations varied much more substantially than in maize (e.g., variance in %P across mexicana populations was 7-fold higher; Table S2).
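As an illustration of the two diversity summaries reported above, the following is a minimal sketch of how HE and %P could be computed for one population from biallelic SNP genotypes. It is not the pipeline used in this study: the function names, the 0/1/2 genotype coding with `None` for missing calls, and the use of the small-sample-corrected estimator of HE are all assumptions made for the example.

```python
def expected_heterozygosity(genotypes):
    """Expected heterozygosity at one biallelic SNP.

    genotypes: per-individual counts of the alternate allele (0, 1 or 2),
    with None for a missing call (hypothetical coding for this sketch).
    """
    called = [g for g in genotypes if g is not None]
    n = 2 * len(called)              # number of sampled chromosomes
    if n < 2:
        return None                  # not enough data at this SNP
    p = sum(called) / n              # alternate-allele frequency
    # Small-sample correction n/(n-1) applied to 2p(1-p)
    return n / (n - 1) * 2 * p * (1 - p)


def diversity_summary(snp_matrix):
    """Return (mean HE, percent polymorphic loci) for one population.

    snp_matrix: list of SNPs, each a list of genotypes as above.
    """
    values = [expected_heterozygosity(snp) for snp in snp_matrix]
    values = [h for h in values if h is not None]
    mean_he = sum(values) / len(values)
    pct_polymorphic = 100 * sum(1 for h in values if h > 0) / len(values)
    return mean_he, pct_polymorphic


# Toy population: four individuals typed at four SNPs
toy = [
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [2, 2, None, 2],
    [0, 0, 1, 0],
]
mean_he, pct_p = diversity_summary(toy)
print(round(mean_he, 4), pct_p)
```

Because the markers were ascertained in maize, values computed this way are best compared among populations within a taxon rather than between taxa.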

At the population level, summary statistics of diversity and differentiation were consistent with sympatric gene flow (i.e., local gene flow based on current plant distributions) between maize and mexicana (Figure 2). First, %P was positively correlated between sympatric population pairs (R² = 0.65; p = 0.016; Figure 2A), though this trend could reflect local conditions affecting diversity in both taxa rather than gene flow. Second, in a subset of populations, the proportion of shared polymorphisms was higher (Figure 2B) and pairwise differentiation (FST) was lower (Figure 2C) between sympatric population pairs than in allopatric comparisons. Finally, an individual-based STRUCTURE analysis assuming two groups (K = 2) revealed strong membership of reference allopatric individuals of maize and mexicana in their appropriate groups (96% and 99% respectively), yet appreciable admixture in sympatric populations (Figure 2D). Four recent hybrids were identified (3 mexicana and 1 maize) with <60% membership in their respective groups. STRUCTURE analysis also indicated that gene flow was asymmetric, with more highland maize germplasm derived from mexicana (19% versus 12% of mexicana germplasm from maize). Assignment at higher K values continued to indicate admixture in mexicana populations but not in maize, suggesting that gene flow from mexicana into maize may have been more ancient (Figure S1). Consistent with this interpretation, median values of the f3 statistic [67] for SNPs genome-wide were negative or zero for 8 of 9 sympatric maize populations (Figure S2); only the Ixtlan maize population showed a positive median f3 signifying a lack of admixture. Collectively, these population-level summaries are suggestive of historical gene flow from mexicana into maize and, in a subset of populations, of ongoing sympatric gene flow from maize into mexicana.
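To make the logic of the f3 test concrete, the sketch below computes a per-SNP f3 value for a target population C relative to two reference populations A and B, and then takes the genome-wide median as described above. It is a simplified illustration only: it uses the naive form (c − a)(c − b) on population allele frequencies and omits the sample-size bias correction of the published estimator [67]. All names and the toy frequencies are hypothetical.

```python
from statistics import median


def f3_per_snp(target, ref_a, ref_b):
    """Naive per-SNP f3(C; A, B) = (c - a) * (c - b).

    Each argument is a list of allele frequencies, one value per SNP.
    Negative values are expected where the target is intermediate between
    the two references, as under admixture. No bias correction is applied.
    """
    return [(c - a) * (c - b) for c, a, b in zip(target, ref_a, ref_b)]


# Toy allele frequencies at three SNPs (hypothetical values)
ref_maize = [0.1, 0.9, 0.2]
ref_mexicana = [0.9, 0.1, 0.8]

admixed_target = [0.5, 0.5, 0.5]      # intermediate at every SNP
drifted_target = [0.0, 1.0, 0.0]      # outside the range of both references

print(median(f3_per_snp(admixed_target, ref_maize, ref_mexicana)) < 0)   # True
print(median(f3_per_snp(drifted_target, ref_maize, ref_mexicana)) > 0)   # True
```

A positive median, as in the second case, is the pattern reported for the Ixtlan maize population and is read as a lack of evidence for admixture rather than proof of its absence.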

Figure 2. (A) Correlation of percent polymorphic loci in sympatric populations of mexicana and maize. (B) Proportion of shared and privately segregating polymorphisms in mexicana and maize and fixed differences between taxa. Letters above bars indicate sympatric maize/mexicana comparisons (S), maize from a given population versus allopatric mexicana (Ax) and mexicana from a given population versus allopatric maize (Az). (C) Pairwise differentiation (FST) in sympatric and allopatric comparisons of mexicana and maize. (D) Bar plot of assignment proportions from STRUCTURE analysis at K = 2 for mexicana (maroon) and maize (gold) individuals. The Ixtlan maize population was excluded from this figure and the STRUCTURE analysis.

Variation in Introgression Levels across the Genome

Meaningful information regarding the evolutionary significance of introgression can often be obscured in population-level summaries. However, the large number of SNPs in our data set allowed us to assess variation in the extent of introgression across the genome. We made use of two complementary methods. First, we employed the hidden Markov model of HAPMIX [68] to infer ancestry of chromosomal segments along the genomes of individuals from maize and mexicana populations through comparison to reference allopatric populations. Subsampling of the reference allopatric populations (see Materials and Methods) revealed considerable signal of introgression in the maize reference panel, particularly in low recombination regions of the genome near centromeres (correction for this signal is illustrated in Figure 3 and Figure S3). While this signal could represent genuine introgression predating allopatry, it could also indicate potential false positives in genomic regions with high linkage disequilibrium or less data. We therefore added a complementary analysis using the linkage model of STRUCTURE [69], [70] to conduct site-by-site assignment across the genomes of mexicana and maize. Because STRUCTURE takes allele frequencies across all populations into account during assignment, the approach is robust to potential deviations of individual reference populations from ancestral frequencies.
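The idea of inferring ancestry along a chromosome with a hidden Markov model can be illustrated with a deliberately small example. The sketch below is not HAPMIX, which models haplotype copying from reference panels; it is a two-state Viterbi decoder that assumes known reference allele frequencies, independent SNPs given ancestry, and a single fixed switch probability between adjacent SNPs. The function name, parameters, and toy data are all hypothetical.

```python
import math


def viterbi_ancestry(alleles, freq_a, freq_b, switch=0.01):
    """Most likely ancestry path ('A' or 'B' per SNP) for one haplotype.

    alleles: 0/1 allele at each SNP along the haplotype.
    freq_a, freq_b: frequency of allele 1 in reference panels A and B.
    switch: assumed probability of changing ancestry between adjacent SNPs.
    """
    def emit(allele, f):
        f = min(max(f, 1e-3), 1 - 1e-3)          # keep the logarithm finite
        return math.log(f if allele == 1 else 1 - f)

    stay, move = math.log(1 - switch), math.log(switch)
    freqs = (freq_a, freq_b)

    # score[s]: best log-probability of any path ending in state s
    score = [math.log(0.5) + emit(alleles[0], freqs[s][0]) for s in (0, 1)]
    back = []                                    # back-pointers per SNP
    for i in range(1, len(alleles)):
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[t] + (stay if t == s else move) for t in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + emit(alleles[i], freqs[s][i]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join("AB"[s] for s in path)


# Toy haplotype with a four-SNP block carrying the allele common in panel B
n = 12
hap = [0] * 4 + [1] * 4 + [0] * 4
print(viterbi_ancestry(hap, [0.05] * n, [0.95] * n))   # AAAABBBBAAAA
```

A single discordant SNP is not enough to trigger a switch under these settings, which mirrors why sparse markers bias such scans toward longer, more recent introgressed tracts.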

Figure 3. (A) Stacked bar plots of the HAPMIX introgression scan across sympatric populations. Population labels are indicated between plots for maize (gold) and mexicana (maroon). Lighter colors indicate introgression initially detected in each population and darker colors show these values after subtracting introgression proportions from jackknife samples of the allopatric reference populations that may be due to false positives. A dotted black line indicates the position of the centromere. (B) Stacked bar plots of the STRUCTURE introgression scan across sympatric populations. The mexicana group is indicated by maroon and the maize group is indicated by gold. The y-axis for each population in (A) and (B) indicates the average admixture proportion across individuals. (C) Genomic regions in maize populations showing greater than 50% membership in mexicana.

Both methods allowed quantification of introgression along the genome for individual samples. Rather than investigate every putative introgression, however, we focused further analyses on genomic regions with a high frequency of introgression, requiring an average of one chromosome or 50% assignment to the opposite taxon per individual in a given population (Figure 3; Figure S3; referred to as “introgressed regions” hereafter). Approximately 19.1% and 9.8% of the genome met this criterion in the HAPMIX and STRUCTURE scans respectively for mexicana introgression into maize. In the opposite direction, we observed lower proportions at this threshold (11.4% in the case of HAPMIX and 9.2% using STRUCTURE), corroborating asymmetric gene flow favoring mexicana introgression into maize. Both scans showed a disproportionate number of introgressed regions shared across populations in mexicana-to-maize gene flow. Roughly 50% of regions introgressed from mexicana into maize were shared across seven or more populations in the HAPMIX scan, whereas only 4% of introgressed regions had this level of sharing from maize into mexicana; similar asymmetry was observed using STRUCTURE (12% versus <1%).
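The thresholding and sharing criteria described above can be sketched as follows. This is an illustrative reimplementation under simple assumptions, not the code used for the scans: it takes, for each population, the per-site average assignment to the opposite taxon, flags sites at or above 50%, and then lists sites flagged in at least a given number of populations. Function names, the input layout, and the toy values are hypothetical.

```python
def introgressed_sites(admixture, threshold=0.5):
    """Flag sites meeting the introgression threshold in each population.

    admixture: {population: [mean opposite-taxon assignment per site]}.
    Returns {population: set of site indices with assignment >= threshold}.
    """
    return {
        pop: {i for i, value in enumerate(values) if value >= threshold}
        for pop, values in admixture.items()
    }


def shared_sites(flagged, min_pops=7):
    """Site indices flagged in at least min_pops populations."""
    counts = {}
    for sites in flagged.values():
        for i in sites:
            counts[i] = counts.get(i, 0) + 1
    return sorted(i for i, c in counts.items() if c >= min_pops)


# Toy scan: three populations, four sites (the study used nine populations)
scan = {
    "pop1": [0.1, 0.60, 0.7, 0.2],
    "pop2": [0.0, 0.55, 0.4, 0.1],
    "pop3": [0.5, 0.90, 0.1, 0.0],
}
flagged = introgressed_sites(scan)
print(shared_sites(flagged, min_pops=2))   # [1]
```

With nine populations, calling `shared_sites(flagged, min_pops=7)` would correspond to the "seven or more populations" criterion used in the text.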

By comparing composite likelihood scores from HAPMIX across individuals within each population, we were able to characterize relative times since admixture (see Materials and Methods). We observed qualitative differences between maize and mexicana. The likelihood of the admixture time parameter began to decrease markedly after an average of 83 generations in mexicana populations, whereas the decrease in maize was much more gradual and did not occur until after an average of 174 generations (Figure S4; averages exclude Ixtlan), suggesting older introgression from mexicana into maize. A notable exception to this trend was observed in the Ixtlan sympatric population pair, where the maize population was likely derived in the recent past from a commercial variety and introgression appeared to be more recent from mexicana into maize (Figure S4).

For further population genetic characterization, we focused on the subset of introgressed regions identified in both the HAPMIX and STRUCTURE scans, an approach that should be robust to the individual assumptions of the two methods. These regions spanned an average of 3.6% of the genome in the case of mexicana-to-maize introgression and 3.2% for maize-to-mexicana introgression (Figure 3C; Figure S3). As expected, differentiation between sympatric maize and mexicana was reduced in these introgressed regions in both directions of gene flow (mean 25% reduction of FST mexicana-to-maize, 33% reduction maize-to-mexicana, t-test, p<0.001 for all population-level comparisons of introgressed vs. non-introgressed regions in both directions of gene flow). Introgressed regions also showed more shared and fewer fixed and private SNPs (Table S3), as well as longer tracts of identity by state (IBS) between maize and mexicana (t-test, p<<0.001). Consistent with these results, diversity in introgressed regions was generally different from non-introgressed regions in the recipient taxon and instead comparable to diversity in non-introgressed regions in the taxon of origin (Table S3).
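The comparison of differentiation inside and outside introgressed regions can be illustrated with a short sketch. The estimator shown is an assumption made for the example: a ratio-of-averages FST in the Hudson form, computed directly from population allele frequencies without sample-size correction, which may differ from the estimator used for the values reported here. The function name, the boolean mask, and the toy frequencies are hypothetical.

```python
def hudson_fst(p1, p2):
    """Ratio-of-averages FST between two populations across SNPs.

    p1, p2: allele frequencies per SNP in each population.
    Uses sum((p1 - p2)^2) / sum(p1(1 - p2) + p2(1 - p1)), with no
    correction for sample size (a simplification for illustration).
    """
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den if den > 0 else float("nan")


def fst_by_region(p1, p2, introgressed):
    """FST computed separately for introgressed and other SNPs."""
    inside = [(a, b) for a, b, flag in zip(p1, p2, introgressed) if flag]
    outside = [(a, b) for a, b, flag in zip(p1, p2, introgressed) if not flag]
    fst_in = hudson_fst(*zip(*inside)) if inside else float("nan")
    fst_out = hudson_fst(*zip(*outside)) if outside else float("nan")
    return fst_in, fst_out


# Toy frequencies at four SNPs; SNPs 2 and 4 lie in an introgressed region
maize = [0.9, 0.5, 0.2, 0.6]
mexicana = [0.1, 0.5, 0.8, 0.6]
mask = [False, True, False, True]

fst_in, fst_out = fst_by_region(maize, mexicana, mask)
print(round(fst_in, 3), round(fst_out, 3))   # 0.0 0.667
```

Lower FST inside the flagged regions than outside, as in this toy case, is the direction of the reduction described above.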

In total, we identified nine regions of introgression from mexicana to maize found by both methods and present in ≥7 sympatric population pairs (Table S4). Three of these shared regions of introgression span the centromeres of chromosomes 5, 6, and 10 (Figure S3), suggesting that maize from the highlands of Mexico may in fact harbor mexicana centromeric or pericentromeric sequence. No such shared introgressions were found in the opposite direction of gene flow (maize into mexicana).

Finally, we characterized regions of the genome notably lacking evidence of introgression. We refer to regions with ≤5% probability of introgression confirmed by both scans in ≥7 populations as being resistant to introgression (Figure S5). In both directions of gene flow, we found these genomic regions to have elevated differentiation, decreased diversity, fewer shared variants, more fixed differences, and a higher number of privately segregating SNPs in the opposite taxon (Table S3).

Evaluating Evidence for Adaptive Introgression

Two non-mutually exclusive hypotheses of adaptive introgression can be readily discerned for gene flow between mexicana and maize: 1) as its natural habitat was transformed, mexicana received maize alleles conferring adaptation to the agronomic setting and 2) as it diffused to the highlands of central Mexico from the lowlands of southwest Mexico, maize received alleles conferring highland adaptation from mexicana, which was already adapted to these conditions. To evaluate evidence for the first hypothesis we gauged enrichment of 484 candidate domestication genes [71] in regions of introgression. We hypothesized that if maize donated alleles adaptive for the agronomic setting to mexicana, we would detect enrichment of domestication loci in regions introgressed from maize into mexicana. However, compared to the rest of the genome, introgressed regions in both directions of gene flow harbored significantly fewer domestication candidates (permutation test, p≤0.001), while regions resistant to introgression showed an excess of domestication candidates (permutation test, p = 0.121 maize to mexicana, p = 0.008 mexicana to maize; Figure S5). For example, two well-characterized domestication genes affecting branching architecture, grassy tillers1 (gt1; [72], [73]) and teosinte branched1 (tb1; [74]) showed very little evidence of introgression (Figure S5). Introgression also appeared to be rare from maize into mexicana across much of the short arm of chromosome 4, a span that includes the domestication loci teosinte glume architecture1 (tga1; [75]), sugary1 (su1; [76]) and brittle endosperm2 (bt2; [76]) and the well characterized pollen-pistil incompatibility locus teosinte crossing barrier1 (tcb1; [48]) that serves as a hybridization barrier between maize and mexicana (Figure S5). These results suggest selection against introgression at loci that contribute to domestication and reproductive isolation.
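A permutation test of this kind can be sketched in a few lines. The version below is a simplified stand-in for the test reported above and rests on stated assumptions: the genome is divided into equal windows, candidate genes are represented by the windows that contain them, and each permutation redraws the same number of "introgressed" windows uniformly at random, which ignores the physical clustering of real introgressed regions. All names and the toy data are hypothetical.

```python
import random


def enrichment_permutation_test(n_windows, region_windows, candidate_windows,
                                n_perm=999, seed=0):
    """Permutation test for overlap between regions and candidate loci.

    n_windows: total number of genomic windows.
    region_windows: indices of windows classed as introgressed (or resistant).
    candidate_windows: indices of windows containing candidate genes.
    Returns (observed overlap, p for enrichment, p for depletion).
    """
    rng = random.Random(seed)
    region = set(region_windows)
    candidates = set(candidate_windows)
    observed = len(region & candidates)

    at_least, at_most = 0, 0
    for _ in range(n_perm):
        # Redraw the same number of windows at random positions
        permuted = set(rng.sample(range(n_windows), len(region)))
        overlap = len(permuted & candidates)
        at_least += overlap >= observed
        at_most += overlap <= observed

    # Add-one correction so the p-value is never exactly zero
    p_enriched = (at_least + 1) / (n_perm + 1)
    p_depleted = (at_most + 1) / (n_perm + 1)
    return observed, p_enriched, p_depleted


# Toy genome of 100 windows in which all 10 candidate windows fall
# inside the 10 windows of interest
obs, p_hi, p_lo = enrichment_permutation_test(100, range(10), range(10))
print(obs, p_hi, p_lo)
```

In this framing, the deficit of domestication candidates in introgressed regions corresponds to a small depletion p-value, and the excess in resistant regions to a small enrichment p-value.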

Several lines of evidence support the hypothesis that maize received introgression conferring highland adaptation from mexicana. Across the nine shared introgressed regions, five contained long stretches (>300 kb) of zero diversity across seven populations, implying a common introgressed haplotype (Figure S6). Given that these regions only have 5–15 SNPs, however, higher-density genotyping might resolve additional haplotypes. Additionally, we used the method of Coop et al. [77] to detect associations of population allele frequencies with 76 environmental variables (see Materials and Methods). Environmental variables were reduced in dimensionality to four principal components that captured 95% of environmental variation. We found that loci associated with the second principal component (loaded primarily by temperature seasonality) were significantly enriched (permutation test, p = 0.017) in genomic regions introgressed from mexicana into maize, but no significant enrichment was observed in regions introgressed from maize into mexicana. We then compared the nine regions of introgression found in ≥7 populations of maize to QTL for anthocyanin content and leaf macrohairs (putatively adaptive traits under highland conditions) identified in a previous study from a cross between parviglumis (lowland teosinte) and mexicana (highland teosinte) [42]. Six of the introgressed regions overlapped with five of the six genomic regions with QTL detected for these traits.
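The search for long stretches of zero diversity can be illustrated as follows. This sketch assumes per-population allele frequencies at ordered SNP positions and treats a SNP as showing zero diversity when every population is fixed for the same allele; it then reports the longest physical span of consecutive such SNPs. It is an illustration under these assumptions rather than the procedure used here, and the function name and toy coordinates are hypothetical.

```python
def longest_monomorphic_span(positions, freqs_by_pop):
    """Longest run of SNPs at which all populations are fixed for one allele.

    positions: sorted SNP positions in base pairs.
    freqs_by_pop: list of per-population allele-frequency lists.
    Returns (start_bp, end_bp, n_snps), or (None, None, 0) if no SNP qualifies.
    """
    best = (None, None, 0)
    best_span = -1
    run_start = None                     # index where the current run began
    for i, pos in enumerate(positions):
        site = [freqs[i] for freqs in freqs_by_pop]
        fixed = all(f == 0.0 for f in site) or all(f == 1.0 for f in site)
        if not fixed:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        span = pos - positions[run_start]
        if span > best_span:
            best_span = span
            best = (positions[run_start], pos, i - run_start + 1)
    return best


# Toy region: two populations, six SNPs
pos = [0, 100_000, 250_000, 450_000, 500_000, 900_000]
pops = [
    [1.0, 1.0, 1.0, 1.0, 0.5, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
]
start, end, n_snps = longest_monomorphic_span(pos, pops)
print(start, end, n_snps, (end - start) > 300_000)   # 0 450000 4 True
```

As noted above, a run supported by only a handful of SNPs should be read cautiously, since denser genotyping could split it into several haplotypes.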

Two of the shared introgressions that overlapped with QTL are of particular interest due to their previous characterization. One of these, on chromosome 4, overlaps with QTL for both pigment intensity and macrohairs [42], and maps to the same position as a recently identified putative inversion polymorphism showing significant differentiation between parviglumis and mexicana ([78]; Figure 4A). The second region, on chromosome 9, overlaps with a QTL for macrohairs [42] and includes the macrohairless1 (mhl1) locus [79] that promotes macrohair formation on the leaf blade and sheath of maize (Figure 4B). The two lowest elevation maize populations in our study (Puruandiro and Ixtlan) showed a conspicuous lack of introgression in these two genomic regions (Figure 4A and 4B). Analysis of pairwise differentiation (FST) between these populations and two populations showing fixed introgression in the two genomic regions (Opopeo and San Pedro; Figure 4A and 4B) revealed substantial differentiation: the region on chromosome 4 contained the only fixed SNP differences genome-wide (Puruandiro/Ixtlan versus Opopeo/San Pedro) and a SNP in the region on chromosome 9 was an extreme FST outlier. To explore the potential phenotypic effects of these genomic regions we conducted growth chamber experiments including ten maize plants from each of these four populations. Under temperature and day-length conditions typical of the highlands of Mexico (see Materials and Methods), the leaf sheaths of plants from populations where introgression was detected in the two genomic regions had 21-fold more macrohairs (t-test, p = 0.0002; Figure 4C and 4D), and showed greater pigmentation (t-test, p = 6E−06; Figure 4C and 4D). Introgressed plants were also ∼25 cm taller (t-test, p = 6E−06; Figure 4D), a finding consistent with adaptation to highland conditions and potentially associated with increased fitness. No significant difference in plant height was observed in a separate experiment under lowland conditions (t-test, p = 0.51), and a significant interaction was observed between introgression status and environmental treatment (ANOVA, F = 4.151, p = 0.045), with a disproportionate increase in plant height under lowland conditions in populations lacking introgression (Figure S7).

Contribution of mexicana to Modern Maize Lines

While our scans for introgression clearly indicated that mexicana has made genomic contributions to maize landraces in the highlands of Mexico, the broader contribution of mexicana to modern maize lines remained unclear. Our HAPMIX and STRUCTURE analyses had low power to detect introgression distributed broadly in maize (see Discussion). Therefore, to assess potential ancestral contribution of mexicana to modern maize, we evaluated patterns of IBS between mexicana, parviglumis [78] and a global diversity panel of 279 modern maize lines [80], [81] using the program GERMLINE ([82]; Figure 5, Figure S8 and S9). Substantial IBS was found between mexicana and modern lines at a number of genomic locations. To assess whether this IBS merely reflected shared ancestral haplotypes, we compared IBS between modern maize and parviglumis to IBS between modern maize and mexicana on a site-by-site basis, identifying regions in which various maize groups distinguished by Flint-Garcia et al. [81] showed stronger IBS with mexicana relative to parviglumis (see Materials and Methods; Figure 5A; Figure S8). As each of the groups identified by Flint-Garcia et al. [81] has a distinct evolutionary history, it is possible that mexicana contributed differentially to the founders of each group. For example, the tropical-subtropical, non-stiff-stalk, and mixed groups showed more genomic regions with stronger IBS with mexicana (versus parviglumis) than found in the stiff-stalk, popcorn, and sweetcorn groups (∼31% of sites with greater IBS with mexicana in the first group versus ∼23% in the latter group; Figure 5B and 5C).
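The contrast between IBS with mexicana and IBS with parviglumis can be sketched with a small windowed comparison. This example does not reproduce GERMLINE, which detects shared segments by seed-and-extend matching; it assumes phased haplotypes encoded as strings of 0/1 alleles and simply scores, for fixed non-overlapping windows, the fraction of panel haplotypes identical to the focal maize line. The function names, window size, and toy haplotypes are hypothetical.

```python
def ibs_proportion(line, panel, start, end):
    """Fraction of panel haplotypes identical to `line` over SNPs [start, end)."""
    segment = line[start:end]
    return sum(1 for hap in panel if hap[start:end] == segment) / len(panel)


def ibs_difference_by_window(line, mex_panel, parv_panel, window=3):
    """Per-window IBS with mexicana minus IBS with parviglumis.

    Positive values indicate greater identity by state with mexicana.
    """
    diffs = []
    for start in range(0, len(line) - window + 1, window):
        end = start + window
        diffs.append(ibs_proportion(line, mex_panel, start, end)
                     - ibs_proportion(line, parv_panel, start, end))
    return diffs


# Toy data: one maize line and two haplotypes from each teosinte panel
maize_line = "000111"
mexicana_panel = ["000000", "000111"]
parviglumis_panel = ["111111", "110111"]

diffs = ibs_difference_by_window(maize_line, mexicana_panel, parviglumis_panel)
share_mexicana = sum(1 for d in diffs if d > 0) / len(diffs)
print(diffs, share_mexicana)   # [1.0, -0.5] 0.5
```

The final proportion plays the role of the "sites with greater IBS with mexicana" summary compared across maize groups above.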

Figure 5. (A) The difference between the average IBS proportion with mexicana individuals and the average IBS proportion with parviglumis individuals calculated across six groups of modern maize lines identified by Flint-Garcia et al. [81] in the maize association population. Positive values indicate greater IBS with mexicana. (B) Average IBS across chromosome 3 in each line in the maize association population compared to both mexicana and parviglumis. The one-to-one line is indicated by the dashed line. Colors are as in Figure 5A. (C) The proportion of sites across the genome showing greater IBS with mexicana than with parviglumis for each of the six maize association population groups.

Discussion

Despite known pre-zygotic and phenological barriers to hybridization between maize and mexicana [47]–[50], we have found evidence consistent with substantial reciprocal introgression. Based on our population genetic analyses, several observations regarding the nature of this gene flow can be made: 1) Gene flow appears to be ongoing and asymmetric, favoring mexicana introgression into maize. 2) Gene flow from mexicana into maize is generally older than gene flow in the opposite direction. 3) Haplotype diversity in nine genomic regions of mexicana-into-maize introgression shared across ≥7 populations suggests single, ancient introgressions followed by spread across the Mexican highlands. 4) Introgression from mexicana into maize is restricted at domestication loci but enriched at loci putatively involved in highland adaptation. 5) Genomic regions of mexicana/maize IBS within a global diversity panel of maize hint at a possible broader contribution of mexicana to modern improved maize.

Several of these observations are in line with previous research. For example, the asymmetric gene flow we detect from mexicana to maize is consistent with findings of substantially higher pollination success in this direction [51]. Asymmetric gene flow would also be expected based on phenology: in Mexico, maize typically flowers earlier than mexicana [47] and pollen shed in both taxa precedes silking (female flowering). Therefore, when maize silks are receptive, mexicana could potentially be shedding pollen, whereas when mexicana silks are receptive, maize tassels are more likely to be senescent. Under these conditions, F1 progeny would be more likely to have a maize seed parent and a teosinte pollen parent, and subsequent inadvertent planting of F1s in maize fields would bias the direction of gene flow.

Our data also provide support for previous assertions that shared morphological features between mexicana and maize represent adaptations derived from mexicana [45] rather than from maize [41]. For example, we have found significant environmental correlations in genomic regions of mexicana-to-maize introgression. We have also observed that overlap with QTL and fine-mapped loci for highland Zea traits (e.g., leaf sheath macrohairs and pigmentation) is predominantly found in the direction of mexicana-to-maize gene flow. Two such regions, on chromosomes 4 and 9, showed particularly strong evidence of introgression. Moreover, these genomic regions of introgression were more common in higher elevation maize populations in our sample, and maize populations with and without introgression in these regions showed differential morphology and greater plant height (a proxy for fitness) when grown under highland conditions. In contrast, we found little evidence of adaptive introgression in the opposite direction of gene flow. For example, domestication loci appeared resistant to gene flow from maize into mexicana, contradicting previous suggestions that gene flow from maize may have been required for mexicana to adapt to an agronomic setting [41]. Instead it appears likely that mexicana, like other wild teosintes [83], was a ruderal species adapted to open and disturbed environments even before the transformation of its natural habitat by maize cultivation.

Our detection of haplotype sharing between mexicana and a diverse panel of modern maize is consistent with previous findings suggesting the spread of introgressed mexicana haplotypes in maize outside of the highlands of Mexico [71]. Both the STRUCTURE and HAPMIX methods we used to identify regions of introgression would likely not detect introgression found ubiquitously in modern maize. Widespread mexicana introgression into maize would result in poor resolution between reference populations of these taxa in the HAPMIX analysis, and extensive haplotype sharing across maize and mexicana would result in a weak signature of introgression in STRUCTURE. Further analysis of representative panels of mexicana, parviglumis and maize haplotypes at greater marker density should help clearly distinguish mexicana from parviglumis haplotypes and determine whether mexicana haplotypes are indeed widespread in maize.

While our results are consistent with previous research and the historical spread of maize, our power to detect introgression may be limited for a number of reasons. First, our analysis conservatively focused on regions of introgression identified by two independent methods and shared across individuals within populations, undoubtedly missing a number of genuine instances of more limited gene flow. Second, our markers were ascertained in a panel consisting entirely of maize. In addition to inflating the diversity of maize relative to mexicana, this ascertainment scheme likely limited our ability to distinguish among mexicana haplotypes and thus to detect local introgression from mexicana into maize. Third, the resolution of our data was on average one SNP per 80 kb, which could result in a bias toward detection of more recent introgression and introgression in low recombination regions of the genome. Finally, mexicana only rarely occurs allopatric from maize [40], and most populations have likely experienced gene flow at some point in time, thus complicating estimation of ancestral mexicana haplotypes and allele frequencies.

Many aspects of mexicana's contribution to highland adaptation in maize remain to be resolved. While our growth chamber experiment was suggestive of adaptive introgression, the loci conferring these traits are still ambiguous. Repetition of these experiments with mexicana/lowland maize near-isogenic introgression lines will be necessary to bolster the case for adaptive introgression. Additionally, a particularly interesting comparison can be made between highland maize in central Mexico, a geographic region sympatric with mexicana, and highland maize in the Andes of South America where no inter-fertile wild Zea species can be found. Future research should address whether highland adaptation in South American maize occurred in parallel to maize from Mexico [37] or whether pre-adapted highland maize was transported through Central America as some have suggested [84].

The potential for adaptive introgression during crop expansion is of course not limited to maize. Data from several crops (e.g., rice [19], [85], barley [86], [87], common bean [88], and wheat [32], [89]) suggest defined centers of origin within a broader distribution of wild relatives. The distributions of these crop-wild pairs span continents and a wide range of environments, and many are known to hybridize (for a review, see [24]). The methods we have applied here to maize and mexicana can therefore be replicated widely, perhaps revealing unexpected aspects of crop evolution and providing insight regarding the genetic architecture of local adaptation based on conserved regions of introgression.

Crops and related wild taxa can also be seen more broadly as models for the study of evolution through hybridization. If crops are viewed as human-facilitated invasive species, clear connections can be made to theoretical work on introgression during invasion and range expansion. For example, our finding of asymmetric gene flow from mexicana into maize is consistent with simulations showing that invaders should receive much higher levels of introgression from local species than occurs in the opposite direction due to differences in population density at the time of invasion [90], [91]. Theoretical research has also explored the divergence threshold for successful hybridization and introgression [53], [92]. Crop expansions are ideal systems to test such predictions because, as ancient agriculturalists moved crops away from their centers of origin, these domesticates came into sympatry with relatives spanning a range of divergence times. For example, parviglumis, the progenitor of maize, has a divergence time from mexicana estimated at 60,000 years, from other members of the genus on the order of 100,000–300,000 years, and from the outgroup Tripsacum dactyloides of approximately 1 million years [39]. While parviglumis is currently physically isolated from these taxa and likely was at the time of domestication [38], maize has subsequently come into sympatry with virtually all of its close relatives, providing extensive opportunities for hybridization. These newly-formed hybrid zones can be seen as testing grounds of the fitness of hybrids across a range of divergence and opportunities to study the evolution of barriers to hybridization.

Materials and Methods

Sample Collection and Genotyping

Samples were collected from nine sympatric population pairs of mexicana and maize that spanned the known distribution of mexicana in Mexico, as well as a single allopatric population of mexicana (Table S1; Figure 1). Seed samples from 12 maternal individuals per mexicana population (N = 120) were selected for genotyping. A single kernel was also sampled from each of 6–8 maize ears collected from sympatric maize fields (N = 69). The tenth kernel down from the tip of each ear was chosen to help control for potential variation in outcrossing rate along the ear. Seeds were treated with fungicide, germinated on filter paper and grown in standard potting mix to the five-leaf stage. Freshly harvested leaf tips were stored at −80°C overnight and lyophilized for 48 hours. Tissue was then homogenized with a Mini-Beadbeater-8 (BioSpec Products, Inc., Bartlesville, OK, USA) and DNA was isolated using a modified CTAB protocol [93]. Purity of DNA isolations was determined with a NanoDrop spectrophotometer (NanoDrop Technologies, Inc., Wilmington, DE, USA). Samples with 260∶280 ratios ≥1.8 were deemed acceptable for genotyping. Concentrations of DNA isolations were determined with a Wallac VICTOR2 fluorescence plate reader (Perkin-Elmer Life and Analytical Sciences, Torrance, CA, USA) using the Quant-iT Picogreen dsDNA Assay Kit (Invitrogen, Grand Island, NY, USA). Single nucleotide polymorphism genotypes were generated using the Illumina MaizeSNP50 Genotyping BeadChip platform and were clustered separately for the two taxa based on the default algorithm of the GenomeStudio Genotyping Module v1.0 (Illumina Inc., San Diego, CA, USA). Clustering for each SNP in each taxon was subsequently inspected and manually adjusted. Of the total of 56,110 markers contained on the chip, 39,029 SNPs that were polymorphic within the entire sample of maize and mexicana and contained less than 10% missing data in both taxa were used for further analysis.
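The two marker-retention criteria described above (polymorphic within the combined sample, and less than 10% missing data in each taxon) can be illustrated with a short sketch. This is not the pipeline used in the study; the genotype encoding (0, 1 or 2 copies of one allele, None for a missing call) and the function name passes_filter are assumptions made only for illustration.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1 or 2 copies of one allele; None marks a missing call.
Genotype = Optional[int]


def passes_filter(maize: Sequence[Genotype],
                  mexicana: Sequence[Genotype],
                  max_missing: float = 0.10) -> bool:
    """Return True if a SNP meets both retention criteria (hypothetical helper)."""
    # Criterion 1: less than 10% missing data in each taxon separately.
    for calls in (maize, mexicana):
        missing = sum(g is None for g in calls) / len(calls)
        if missing >= max_missing:
            return False
    # Criterion 2: polymorphic within the combined maize + mexicana sample,
    # i.e. both alleles are observed at least once among the called genotypes.
    called = [g for g in list(maize) + list(mexicana) if g is not None]
    if not called:
        return False
    allele_copies = sum(called)
    return 0 < allele_copies < 2 * len(called)
```

Under this sketch, a SNP that is monomorphic within maize is still retained as long as mexicana carries the alternative allele, because polymorphism is evaluated in the pooled sample.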

Diversity Analyses

Observed (HO) and expected (HE) heterozygosities were summarized for each taxon in each sympatric population pair using the “genetics” package in R [94]. Polymorphisms were further characterized as shared, fixed, or segregating privately within one of each pair of sympatric populations using the sharedPoly program of the libsequence C++ library [95]. Pairwise differentiation between populations (FST) was calculated based on the method of Weir and Cockerham [96] using custom R scripts and the “hierfstat” package of R [97]. The f3 statistic for identification of admixture [67] was calculated using a custom R script.
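As an illustration of two of the summaries named above, the following sketch computes HO and HE for a single biallelic SNP and an uncorrected f3 statistic from population allele frequencies. It is a simplified stand-in rather than the R code used in the study: the function names are hypothetical, HE is computed as 2p(1 − p) without a small-sample correction, and the f3 estimator omits the bias-correction term that published estimators typically include.

```python
from typing import Sequence, Tuple


def heterozygosity(genotypes: Sequence[int]) -> Tuple[float, float]:
    """Return (HO, HE) for one biallelic SNP; genotypes are 0/1/2 allele counts."""
    n = len(genotypes)
    h_obs = sum(g == 1 for g in genotypes) / n   # fraction of heterozygotes
    p = sum(genotypes) / (2 * n)                 # frequency of the counted allele
    h_exp = 2 * p * (1 - p)                      # no small-sample correction
    return h_obs, h_exp


def f3(target: Sequence[float],
       source_a: Sequence[float],
       source_b: Sequence[float]) -> float:
    """Uncorrected f3(target; A, B): mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)
```

A negative f3 value for a target population is the signal of admixture between the two source populations; in practice its significance would be assessed with a resampling-based standard error, which this sketch does not compute.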

Detecting Introgression

To characterize patterns of introgression across the genome in each population we used two complementary methods: 1) Identification of ancestry across chromosomal segments with the hidden Markov model approach of HAPMIX [68]; and 2) A site-by-site analysis of assignment probabilities using the Bayesian linkage model in the program STRUCTURE [69], [70]. For both HAPMIX and STRUCTURE analyses, we used a subset of 38,262 SNPs anchored in a genetic map based on the Intermated B73×Mo17 (IBM) population of maize ([66]; J.P. Gerke et al., unpublished data). The IBM population has been widely used for genetic map development and for determining the genetic architecture of complex traits in maize [98].

Patterns of introgression were assessed using the program HAPMIX by comparing unphased data from putatively admixed individuals from our sympatric populations to phased data from reference ancestral populations. To represent ancestral mexicana haplotypes, we chose a population near the town of Amatlán, Morelos state, Mexico that is currently allopatric to maize. An Americas-wide sample of maize landraces collected largely outside the distribution of mexicana was chosen as the maize reference population [65]. In order to assess putative introgression and/or false positives in these reference populations, we removed each individual and evaluated introgression through comparison to remaining reference samples using a jackknife approach. Evidence for introgression was assessed in both putatively admixed and reference individuals using HAPMIX as described below.

Initial estimates of ancestry proportions for HAPMIX models were based on a previous admixture analysis of mexicana and highland Mexican maize (∼20% introgression of mexicana into maize and ∼10% introgression of maize into mexicana; [36]). The number of generations since the time of admixture was varied from 1–5000 and the maximum likelihood across individuals in a population was used to compare relative time since admixture on a population-by-population basis (Figure S4). Subsequent analyses of HAPMIX output were based on introgression estimates from the highest likelihood run.

Prior to analysis in STRUCTURE, SNP data were phased using the program fastPHASE (version 1.4.0; [99]). Because STRUCTURE does not account for linkage disequilibrium (LD) due to physical linkage, SNPs were grouped into haplotypes separated by at least 5 kb. After grouping, our data set consisted of 20,035 loci with an average of 3.92 alleles per locus across all sympatric and reference allopatric individuals. We ran the linkage model in STRUCTURE with 5,000 steps of admixture burn-in, a total burn-in of 10,000 steps, and 100,000 subsequent steps retained for analysis. Convergence along the chain and consistency across replicate runs were assessed to ensure an adequate number of steps were included in the analysis. Assignment was carried out for K = 2 groups (i.e., maize and mexicana) for each chromosome separately. Probability of assignment was summarized locus by locus across individuals from each population for each taxon.
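The grouping step can be sketched as follows under one plausible reading of the text: consecutive SNPs closer than 5 kb are merged into a single multi-SNP locus, so that adjacent loci end up at least 5 kb apart. The exact rule used in the study is not spelled out here, so the function name group_snps and the merging rule are assumptions.

```python
from typing import List, Sequence


def group_snps(positions: Sequence[int], min_gap: int = 5_000) -> List[List[int]]:
    """Group sorted base-pair positions into loci.

    A SNP joins the current locus when it lies closer than min_gap to the
    previous SNP; otherwise it starts a new locus. The last SNP of one locus
    and the first SNP of the next are therefore at least min_gap apart.
    """
    groups: List[List[int]] = []
    for pos in positions:
        if groups and pos - groups[-1][-1] < min_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups
```

Each resulting group would then be treated as one multi-allelic locus, with the phased alleles at its SNPs concatenated into a haplotype.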

Local Adaptation at Introgressed Loci

To identify SNPs associated with environmental variables, we employed the association method of BAYENV [77], using a covariance matrix of allele frequencies estimated with 10,000 random SNPs to control for population structure. Seventy-six climatic and soil variables were summarized as four principal components that captured 95% of the variance among mexicana populations. BAYENV was run five times with 1,000,000 iterations for each SNP. A given SNP was considered a candidate if its Bayes factor was consistently in the 95th percentile across all five independent runs and its average Bayes factor was in the 99th percentile. Enrichment of significant SNPs in introgressed regions was determined based on bootstrap resampling for each environmental PC.
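The candidate rule described above can be sketched as follows. The sketch assumes that "in the 95th percentile" means a Bayes factor above the 95th-percentile cutoff of its run (the top 5%), and that the cutoff for the average is taken from the distribution of per-SNP average Bayes factors; a nearest-rank percentile is used. The function names and input layout (one list of Bayes factors per run, in the same SNP order) are hypothetical and do not reflect BAYENV output formats.

```python
import math
from typing import List, Sequence


def percentile_cutoff(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def candidate_snps(runs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of SNPs above the 95th-percentile cutoff in every run
    and above the 99th-percentile cutoff for the across-run average."""
    n_snps = len(runs[0])
    run_cutoffs = [percentile_cutoff(run, 95) for run in runs]
    means = [sum(run[i] for run in runs) / len(runs) for i in range(n_snps)]
    mean_cutoff = percentile_cutoff(means, 99)
    return [
        i for i in range(n_snps)
        if all(run[i] > cut for run, cut in zip(runs, run_cutoffs))
        and means[i] > mean_cutoff
    ]
```

Requiring consistency across runs guards against SNPs whose high Bayes factor arises from sampling noise in a single MCMC run.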

Haplotype Sharing

Analyses of haplotype sharing/identity by state between mexicana, parviglumis, and modern maize lines were conducted using the program GERMLINE [82] with haplotypes generated by the program fastPHASE [99] from samples of parviglumis [78] and modern maize [80]. Shared haplotypes were identified with a seed of identical genotypes at five SNPs that was extended until mismatch. Analyses were then based on segments with a minimum size of 3 cM.
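The seed-and-extend logic can be illustrated with a simplified sketch for a single pair of phased haplotypes. It is not GERMLINE's algorithm, which hashes haplotype windows to find seeds efficiently across many samples; here, maximal runs of identical alleles are found directly and retained only if they span at least five SNPs and at least 3 cM. The function name shared_segments and its inputs are assumptions.

```python
from typing import List, Sequence, Tuple


def shared_segments(hap_a: Sequence[str],
                    hap_b: Sequence[str],
                    cm_positions: Sequence[float],
                    seed: int = 5,
                    min_cm: float = 3.0) -> List[Tuple[int, int]]:
    """Return (start, end) SNP indices of shared segments between two haplotypes.

    A segment is a maximal run of identical alleles (extension stops at the
    first mismatch) that covers at least `seed` SNPs and `min_cm` centimorgans.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    n = len(hap_a)
    for i in range(n + 1):
        match = i < n and hap_a[i] == hap_b[i]
        if match and start is None:
            start = i                      # a new run of identity begins
        elif not match and start is not None:
            end = i - 1                    # run ended at the previous SNP
            long_enough = (end - start + 1) >= seed
            if long_enough and cm_positions[end] - cm_positions[start] >= min_cm:
                segments.append((start, end))
            start = None
    return segments
```

The genetic-length threshold matters because a run of five matching SNPs in a region of sparse recombination can be short in centimorgans and is then more likely to reflect chance identity by state than recent shared ancestry.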

Growth Chamber Experiment

Ten seeds were germinated from each of four maize populations showing little evidence of introgression (Ixtlan and Puruandiro) or fixed introgression (Opopeo and San Pedro) at two loci (one on chromosome 4 and one on chromosome 9; Table S4) putatively linked to highland adaptation [42], [78] and showing little evidence of false positives in our reference populations. Plants were grown under highland conditions with 12.5 hours of light at an intensity of 680 µmol·m⁻²·s⁻¹, a daytime temperature of 23°C and a nighttime temperature of 11°C. Daytime relative humidity was set at 60% and nighttime relative humidity at 80%. Height measurements were taken at 15, 30, and 50 days. Pigment extent was measured on the second leaf sheath from the top of the plant as the proportion of the total sheath showing pigment. Macrohairs were also measured on this leaf sheath as the total count one third of the way down from the leaf blade within the field of a dissecting microscope at 2× magnification. In order to contrast plant height from our highland treatment to those under conditions more comparable to the lowlands of western Mexico, we conducted a separate growth chamber experiment with a daytime temperature of 32°C and a nighttime temperature of 25°C and measured plant height at 30 days. All other conditions were identical to those of the highland treatment.

The difference between IBS modern maize/mexicana and IBS modern maize/parviglumis across each chromosome. All plots are as in Figure 5A. Dashed lines indicate genomic regions of mexicana introgression into highland Mexican maize conserved across ≥7 populations.

Population genetic summaries from introgressed regions and regions resisting introgression. Parameters are as in Table S2. Significant differences (permutation or t-test, p<0.05) between introgressed and non-introgressed regions are indicated as bold values.

Acknowledgments

We thank Lauren Sagara and Pui Yan Ho for assistance with genotyping and the growth chamber experiments and Elena Alvarez-Buylla for assistance during sample collection. John Doebley, Sofiane Mezmouk, and Shohei Takuno provided comments on a previous version of the manuscript. Graham Coop, Peter Morrell, and John Novembre offered helpful discussion.

44. Collins GN (1921) Teosinte in Mexico - The closest wild relative of maize is teosinte - a forage plant hitherto known only as an annual. A perennial form discovered in southern Mexico should prove of value to the breeder. J Hered 12: 339–350.