Abstract

The plant Silene latifolia has separate sexes and sex chromosomes, and is of interest for studying the early stages of sex chromosome evolution, especially the evolution of non-recombining regions on the Y chromosome. Hitch-hiking processes associated with ongoing genetic degeneration of the non-recombining Y chromosome are predicted to reduce Y-linked genes' effective population sizes, and S. latifolia Y-linked genes indeed have lower diversity than X-linked ones. We tested whether this represents a true reduction in Y diversity or, alternatively, elevated diversity at X-linked genes, by collecting new nucleotide diversity data for autosomal genes, which had previously been little studied. We find clear evidence that Y-linked genes have reduced diversity. However, another alternative explanation for a low Y effective size is a high variance in male reproductive success. Autosomal genes should then also have lower diversity than expected, relative to the X, but this is not found in our loci. Taking into account the higher mutation rate of Y-linked genes, their low sequence diversity indicates a strong effect of within-population hitch-hiking on the Y chromosome.

In evaluating the X–Y diversity difference as evidence for selection affecting Y chromosomes, other processes that affect diversity must be considered. The data collected so far cannot distinguish whether the difference is due to the predicted reduction in Y Ne or to some process increasing species-wide diversity at X-linked genes. Possibilities include a higher X than Y mutation rate (although this is the opposite of the expected difference between the mutation rates of the two chromosomes, and the data available so far support that expectation; see Filatov & Charlesworth 2002; Filatov 2005), sexual selection causing a high variance in male reproductive success (Laporte & Charlesworth 2002), or introgression of sequences from another species: hybridization is well known to occur between S. latifolia and S. dioica (Baker 1948; Minder & Widmer 2007), and appears to have introduced sequences from each species into X-linked genes of the other by recombination (Atanassov et al. 2001; Laporte et al. 2005). Population size changes also affect the relative diversity levels of genome regions with different Ne (Fay & Wu 1999; Pool & Nielsen 2007).

One way to test whether diversity is reduced for Y-linked genes, or elevated for X-linked ones, is to compare it with diversity at autosomal loci. A previous study of a single autosomal gene (CCLS37.1) found higher diversity than for a Y-linked gene (SlY1), but considerably lower diversity than for an X-linked gene studied in the same plants, leaving open the possibility that X-linked genes have unusually high diversity; however, CCLS37.1 may have experienced a recent selective sweep (Filatov et al. 2001). Sequences are available for only one other S. latifolia autosomal gene, Slop, whose silent site diversity is 0.65 per cent (Muir & Filatov 2007). A third estimate, from an Ap3 gene (Matsunaga et al. 2003), is no longer relevant, because the gene is probably X-linked (Ishii et al. 2008).

We therefore obtained new data on DNA sequence diversity at S. latifolia autosomal loci, and also increased the sample of sex-linked genes by adding diversity data for three further gene pairs with homologues on the X and Y chromosomes: SlCyp and SlXY7 (Bergero et al. 2007), and a newly discovered gene, SlXY9 (Kaiser et al. in press). Our new data, covering more loci and more S. latifolia populations than previous studies, show that Y diversity is considerably reduced, supporting the ongoing genetic degeneration hypothesis.

2. Material and methods

(a) Genes studied

To obtain autosomal genes, a large number of ESTs were obtained by 454 sequencing of S. latifolia cDNAs (R. Bergero, S. Qiu, H. Borthwick, A. Forrest & D. Charlesworth 2008–2010, unpublished data). The Arabidopsis thaliana genome was used to identify likely orthologues, and genes belonging to large gene families in A. thaliana were discarded, as these are unlikely to yield simple genetic segregation data. Translated cDNA sequences were aligned with A. thaliana coding sequences (Swarbreck et al. 2008) using FASTA (Pearson 1990). Single genes with e-values < 10⁻⁵ were retained and annotated in Sequencher v.4.5 (GeneCodes, Ann Arbor, MI, USA), using information from the A. thaliana orthologues to suggest intron positions and functions. The genes studied and the polymerase chain reaction (PCR) primers used for each gene are given in table 1.

Table 1. Genes studied in this paper and the primers used for PCR amplification.

(b) Silene latifolia families and population samples

To identify autosomal (and pseudoautosomal) genes, we studied the segregation of length or sequence variants in families (following the approach previously used in this species, see Bergero et al. 2007). To study diversity, we sequenced eight autosomal genes, using a set of S. latifolia male plants (electronic supplementary material, table S2) from across Europe, including much of the likely native distribution of this species. These plants were grown from seeds collected from naturally pollinated females, and one male was sampled per locality, to provide a sample suitable for estimating effective population sizes from nucleotide diversity, and in which variant frequencies should follow the standard neutral coalescent unless selection occurs (whereas samples with multiple individuals per population are expected to have a different variant frequency spectrum, see Wakeley & Aliacar 2001). We also estimated diversity for two recently characterized sex-linked genes, SlCyp and SlXY7, which have been shown to be single-copy genes (Bergero et al. 2007), using two males per locality so as to obtain numbers of X- and Y-linked alleles similar to the number of alleles in the autosomal sample (each male carries one X- and one Y-linked allele, but two autosomal alleles). Because our samples from individual populations are small, our analyses of subdivision (see below) divided the S. latifolia populations into four geographical regions (electronic supplementary material, table S2), roughly corresponding to previously suggested races (e.g. Mastenbroek & Van Brederode 1986).

The orthologous genes were also sequenced from some plants from the closely related dioecious species, S. dioica (electronic supplementary material, table S2), and from S. vulgaris, a related species that does not have sex chromosomes, but whose populations are gynodioecious (with females and hermaphrodites, see Marsden-Jones & Turrill 1957; Baker 1966; Charlesworth & Laporte 1998).

(c) DNA extraction, PCR reaction and cloning

For the segregation study, seeds were germinated on Murashige and Skoog (Murashige & Skoog 1962) medium. DNA was extracted from 20-day-old seedlings using the FastDNA kit (Q-biogene) following the manufacturer's instructions. Seedlings' sexes were determined using a 293-bp indel fixed in intron 2 of all Y-linked copies of the SlCyp gene (Bergero et al. 2007). For the sequence diversity study, DNA was extracted from leaves collected from mature plants grown in greenhouse conditions, using the same method.

Primers for the genes newly sequenced in our study were designed using the S. latifolia EST sequences (table 1). The primers for SlXY7 were SlX7f3 (GAATGGCAGAAATGTGGCAATGCAAC) and SlX7r2 (GGAAAGCTCCTTTTCAACGAAGTC), and those for SlCypXY were cypXf3 (AATTTTCTGTTACATCGCCTGATCGCAG) and cypR1 (AGTGGATCTCCTGTCTGTATCATGAAACC). Amplifications were performed in a Finnzymes Piko Thermal Cycler with Phire hot-start DNA polymerase (Finnzymes), under the following PCR conditions: initial denaturation at 98°C for 30 s; 10 cycles of 98°C for 5 s, 65°C for 5 s and 72°C for 50 s; then 25 cycles of 98°C for 5 s, 60°C for 5 s and 72°C for 1 min; and finally one cycle at 72°C for 5 min.

The PCR amplicons were cleaned with ExoSAP-IT (Amersham Biosciences, Tokyo) and sequenced on an ABI 3730 capillary sequencer (Applied Biosystems). When no indel variants were present, single nucleotide polymorphisms (SNPs) were verified by direct sequencing of both strands. Most such variants were found in multiple individuals. Amplicons containing intron regions were sometimes heterozygous for indels. For such individuals, and for plants with singleton variants, the DNA was re-amplified with a proofreading DNA polymerase (Phusion, Finnzymes) using the PCR conditions given above, and cloned into a T-tailed pBSKS+ vector (Stratagene) before sequencing.

As described below, two autosomal genes have high diversity. Our genetic results suggest that they are single-copy genes. However, these data apply only to the parents of the families used, and furthermore the genetic results for E393 are based on a single SNP, while for E391 we used an intron region, not the region whose diversity was estimated. To test for the existence of paralogous copies of these genes among the plants used in the diversity survey, we sequenced plants in family H2005-1 (electronic supplementary material, table S1), with the primers used in the survey. For E393, both the F1 parental plants are heterozygous for 13 SNPs, and 10 progeny sequenced include heterozygotes and both homozygotes, as expected for segregation of a single gene. For E391, the parents are heterozygous for a single SNP, which also segregated in the progeny. Therefore, in this family, these genes do not appear to be duplicated.

(d) Sequence analyses

The newly obtained sequences of each gene were first aligned with Sequencher v.4.5 and manually adjusted using the sequence alignment editor Se-Al v.2.0a11 (Se-Al: Sequence Alignment Editor, http://evolve.zoo.ox.ac.uk/). The alignments were submitted to GenBank under accession numbers HM188567–HM189181. Polymorphism and haplotype analyses (including estimates of nucleotide diversity and population subdivision) were done using DnaSP v.5.00.06 (Librado & Rozas 2009) and MEGA 4 (Tamura et al. 2007). We estimated diversity for silent sites, because silent site diversity relates closely to Ne values (Loewe et al. 2006); for neutral autosomal sites at equilibrium, π ≈ 4Neμ, where μ is the mutation rate per site per generation.
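Silent site π is the mean number of pairwise differences per silent site. The sketch below is not DnaSP's implementation; it assumes plain aligned strings and uses a hypothetical helper, `nucleotide_diversity`, which computes Nei's π with complete deletion of any column containing a gap or ambiguous base. Identifying silent sites (fourfold degenerate and non-coding positions) is assumed to have been done beforehand, with their column indices passed in as `sites`.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def nucleotide_diversity(seqs, sites=None):
    """Nei's pi: mean number of pairwise differences per site.

    seqs  -- aligned sequences (equal-length strings)
    sites -- optional column indices to analyse (e.g. pre-identified silent sites)
    Columns with a gap or ambiguous base in any sequence are dropped
    (complete deletion), which is one of several possible conventions.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    columns = range(len(seqs[0])) if sites is None else sites
    usable = [c for c in columns if all(s[c] in VALID_BASES for s in seqs)]
    if not usable:
        raise ValueError("no usable sites")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a[c] != b[c] for c in usable) for a, b in pairs)
    return differences / (len(pairs) * len(usable))


# Toy alignment (not data from this study)
alignment = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACGTACGTAC"]
print(nucleotide_diversity(alignment))                # 0.1
print(nucleotide_diversity(alignment, sites=[3, 9]))  # 0.5
```

Restricting `sites` to the variable columns inflates π, as the second call shows; the denominator must be the full set of silent sites analysed, not only the polymorphic ones.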

Hudson–Kreitman–Aguadé (HKA) tests for differences in diversity between sets of genes (see §3) were done with a maximum-likelihood implementation of the test, multi-locus HKA (mlHKA; Wright & Charlesworth 2004). These tests used the S. vulgaris sequences mentioned above, and divergence from the S. latifolia orthologues was estimated using DnaSP. Silent site divergence values between these two species range from about 9 to 22 per cent. We also performed Tajima's (Tajima 1989a) and Fay and Wu's tests for selection, which depend on variant frequencies (see §3), and the DH test, which is designed to be less sensitive to recent changes in demographic history (Zeng et al. 2006). For these tests, we used variants of all site types, with S. vulgaris as the outgroup.
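Tajima's D compares two estimates of θ: the mean number of pairwise differences (k) and Watterson's estimate based on the number of segregating sites (S). A minimal sketch of the standard statistic follows, assuming aligned sequences without missing data; `tajimas_d` is a hypothetical helper, and its significance would still have to be assessed by coalescent simulation (as DnaSP does), which this sketch omits.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences (no gaps or missing data assumed)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("this sketch requires at least four sequences")
    # S: number of segregating sites
    seg_sites = sum(len({s[c] for s in seqs}) > 1 for c in range(len(seqs[0])))
    if seg_sites == 0:
        return float("nan")
    # k: mean number of pairwise differences between sequences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / math.sqrt(variance)


star_like = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]  # every variant is a singleton
two_groups = ["AA", "AA", "TT", "TT"]                  # intermediate-frequency variants
print(tajimas_d(star_like) < 0, tajimas_d(two_groups) > 0)  # True True
```

The toy cases show the direction of the statistic: an excess of rare variants, as expected after population expansion, gives negative D, whereas an excess of intermediate-frequency variants gives positive D.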

We tested for mutation rate differences in four new X–Y gene pairs: three used in our diversity analyses (SlXY7, SlCypXY and SlXY9) and SlXY6a, which was not included in the diversity survey because a duplicate gene exists (Bergero et al. 2007). We extracted non-coding and fourfold degenerate sites using DnaSP v.5.00.06 (Librado & Rozas 2009), and used the baseml program of the PAML v.3e package (Yang 2001) to obtain maximum-likelihood estimates of the relative substitution rates and to perform likelihood ratio tests of the significance of silent substitution rate differences. We estimated the likelihood under two models: (i) with three evolutionary rate parameters, one for each of the X-linked, Y-linked and non-sex-linked (S. vulgaris) branches; and (ii) with the same rate for the X- and Y-linked branches. The significance of differences between the X- and Y-linked genes was tested by comparing twice the difference in log-likelihood between the two models with a χ² distribution with 1 d.f.
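The comparison of models (i) and (ii) is a standard likelihood ratio test between nested models differing by one parameter. A minimal sketch follows, assuming the two maximized log-likelihoods have already been read from the baseml output; the helper name and the log-likelihood values are hypothetical. For 1 d.f., the χ² upper-tail probability equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one free parameter.

    lnl_null -- log-likelihood of the constrained model (X rate = Y rate)
    lnl_alt  -- log-likelihood of the model with separate X and Y rates
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against small negative values from optimizer noise
    # Upper tail of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, not results from this study
stat, p = lrt_one_df(lnl_null=-1234.56, lnl_alt=-1232.10)
print(round(stat, 2), round(p, 4))
```

A test statistic of 3.84 corresponds to p ≈ 0.05, the usual critical value for 1 d.f.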

3. Results

(a) Discovery of autosomal genes

Segregation ratios in families showed that the eight autosomal genes in our diversity survey (table 1) are single-copy and not sex-linked. Electronic supplementary material, table S1 shows that these genes almost always segregate in the expected ratios, with no heterogeneity in ratios between the two sexes of progeny; the only exception is locus E163, which deviated significantly from the expected ratio in the female progeny, although without significant heterogeneity between the sexes. Two further genes, E284 and E241, appear to be pseudoautosomal, and will be described in detail elsewhere.
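Segregation tests of this kind compare genotype counts with Mendelian expectations (for example 1 : 1 for a backcross-type marker or 1 : 2 : 1 when both parents are heterozygous) and test for heterogeneity between female and male progeny. A minimal sketch of the two χ² tests, assuming simple count vectors; the helper names and counts are hypothetical, no continuity correction is applied, and only the 1 or 2 d.f. cases these designs need are handled.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for 1 or 2 d.f. only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("only 1 or 2 degrees of freedom are handled")


def segregation_gof(counts, ratio):
    """Goodness of fit of genotype counts to a Mendelian ratio, e.g. [1, 2, 1]."""
    n = sum(counts)
    expected = [n * r / sum(ratio) for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, chi2_sf(stat, len(counts) - 1)


def heterogeneity(females, males):
    """2 x k contingency chi-square: do the two sexes segregate alike?"""
    col_totals = [f + m for f, m in zip(females, males)]
    grand = sum(col_totals)
    stat = 0.0
    for row in (females, males):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(females) - 1)


# Hypothetical counts for one locus in one family (not data from this study)
females, males = [12, 25, 11], [10, 22, 14]
print(segregation_gof([f + m for f, m in zip(females, males)], [1, 2, 1]))
print(heterogeneity(females, males))
```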

Six of the eight autosomal genes also map to autosomal linkage groups with multiple other genes (R. Bergero, A. Forrest, H. Borthwick, S. Qiu & D. Charlesworth, unpublished data). For two genes, E163 and E265, pseudoautosomal locations cannot be ruled out from these data. However, the genotypes in family G2008-3 (electronic supplementary material, table S1) allowed us to test E265 for linkage to the pseudoautosomal gene E241; no linkage was detected.

The haplotypes of males sampled from natural populations further support autosomal inheritance for all eight genes. If a locus were sex-linked, we would expect variants distinguishing the Y-linked alleles from all X-linked sequences, so that males should always be heterozygous (unless the Y chromosome has lost the gene, in which case all males would appear homozygous). Finding both heterozygous and homozygous males, as we did for all loci, including E163 and E265 (table 2), indicates that each gene is in a recombining region of the genome. It is unlikely that the apparently homozygous males carry a null allele that fails to amplify with our primers, because no null homozygotes were seen for any locus in our study.

(b) Autosomal locus diversity results and comparisons with X and Y

Figure 1 summarizes the currently available nucleotide diversity estimates based on silent sites (πsilent) for all genes studied so far, including previously published results and the three newly sequenced sex-linked genes, SlXY7, SlXY9 and SlCypXY. The Y-linked genes have much lower diversity than X-linked or autosomal genes. The X–Y difference is significant by a multi-locus HKA test using divergence from the homologous S. vulgaris sequences (p = 1.06 × 10⁻⁵), excluding SlSsX, for which no Y sequences are available, and either including or excluding SlAp3 (whose X-linkage is not firmly established, see Ishii et al. 2008); the Y–autosome difference is also significant (p = 2.35 × 10⁻⁴). Overall, πsilent is 2.88 per cent for the eight autosomal loci, versus 2.15 per cent for the seven X-linked genes and 0.14 per cent for their Y-linked homologues. Although diversity varies widely among loci, the X–autosome difference is significant (p = 0.0017) when all the X-linked genes are included, but not if we exclude SlSsX (p = 0.92), which may have undergone a selective sweep (Filatov 2008) and differs significantly from the other loci.

Figure 1. Nucleotide diversity estimates using silent sites for X-linked (X), autosomal (A) and pseudoautosomal (PAR) genes of S. latifolia (grey bars) and for the Y-linked homologues of the X-linked loci (black bars). The numbers of sites analysed and references for the data sources are given in the electronic supplementary material, table S3.

The overall X/A diversity ratio is 0.747, very close to the value of 0.75 expected with a 1 : 1 sex ratio and the same variance in male and female reproductive success, under which the effective size for X-linked genes is 0.75 that of autosomal genes (e.g. Laporte & Charlesworth 2002; Keinan et al. 2009). These assumptions predict an effective size for Y-linked genes one-fourth that of autosomal genes, but the Y/A silent site diversity ratio is 0.049 (about one-twentieth). Together, these results suggest that hitch-hiking processes caused by selection on the non-recombining Y chromosome lower the effective size for Y-linked genes below the expected value (see §1).
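The expected ratios follow from standard effective-size formulas for autosomal, X-linked and Y-linked genes with Nm breeding males and Nf breeding females: Ne(A) = 4NmNf/(Nm + Nf), Ne(X) = 9NmNf/(4Nm + 2Nf), and, on the same diversity scale (π = 4Neμ), Ne(Y) = Nm/2. A minimal sketch, assuming equal mutation rates on all chromosomes and no selection; `expected_ratios` is a hypothetical helper, and a high variance in male reproductive success is represented only crudely, as a smaller effective number of males.

```python
def expected_ratios(n_males, n_females):
    """Neutral expected X/A and Y/A diversity ratios (equal mutation rates)."""
    ne_auto = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    ne_y = n_males / 2  # haploid, male-limited; scaled so that pi_Y = 4 * ne_y * mu
    return ne_x / ne_auto, ne_y / ne_auto


print(expected_ratios(500, 500))  # (0.75, 0.25) with equal numbers of the sexes
print(expected_ratios(100, 500))  # fewer effective males: X/A above 0.75, Y/A below 0.25

# Observed silent-site means from the text (per cent)
pi_auto, pi_x, pi_y = 2.88, 2.15, 0.14
print(round(pi_x / pi_auto, 3), round(pi_y / pi_auto, 3))  # 0.747 0.049
```

Under these formulas Y/A equals (Nm + Nf)/(8Nf), which cannot fall below 1/8 however few males breed, while X/A rises as the effective number of males falls; within this simplified model, a reduced male effective number alone cannot produce the observed Y/A of about 0.05 without also raising X/A.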

Two autosomal genes (E393 and E391) have very high diversity values, but mlHKA tests within the set of autosomal genes detect no statistically significant differences for these genes (see below). Our genetic results suggest that the E393 and E391 genes do not have duplicate copies, and neighbour-joining trees of the sequences contain no separate clusters of S. latifolia sequences (electronic supplementary material, figure S1). The high diversity values are not due to high mutation rates for these genes, because divergence estimates from S. vulgaris and S. dioica sequences are similar to the values for other loci. Finally, we tested for introgression in our S. latifolia populations, using sequences from S. dioica sampled from Southern Finland, Scotland, England, France and Italy (electronic supplementary material, table S2 and figure 1; both species co-occur in all regions except Finland). We found no fixed differences between the species (precluding a formal analysis of introgression), and, even with our non-exhaustive sampling, many polymorphisms are shared (24 for E393 and 7 for E391, versus 18 and 12 polymorphisms, respectively, exclusive to S. latifolia). Silene dioica sequences are also diverse (silent site π > 5% for both loci), but the haplotypes are mostly distinct from those of S. latifolia plants. The few clusters with sequences from both species have very low bootstrap support, and the sequences are mostly not extremely similar (electronic supplementary material, figure S1); for E391, no S. latifolia sequences are among the main S. dioica cluster. For E393, excluding two S. latifolia alleles that cluster with S. dioica sequences reduces diversity only slightly (from 0.077 to 0.075). Introgression from S. dioica thus seems unlikely to be the sole reason for high diversity of these genes, but we cannot exclude past introgression from other species not currently sympatric with S. latifolia.
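The counts of shared and species-exclusive polymorphisms, and the absence of fixed differences, come from a site-by-site comparison of the two species' alignments. A minimal sketch, assuming both samples are aligned to the same columns; `classify_sites` is a hypothetical helper, and its simplified rules (a site is "shared" when both species are polymorphic for at least two common alleles) may differ in detail from DnaSP's conventions.

```python
VALID_BASES = set("ACGT")


def classify_sites(species1, species2):
    """Count shared, exclusive and fixed sites between two aligned samples."""
    counts = {"shared": 0, "exclusive_1": 0, "exclusive_2": 0, "fixed": 0, "other": 0}
    for col in range(len(species1[0])):
        a = {s[col] for s in species1 if s[col] in VALID_BASES}
        b = {s[col] for s in species2 if s[col] in VALID_BASES}
        if not a or not b:
            continue  # no usable data in one sample
        if len(a) > 1 and len(b) > 1:
            # polymorphic in both: shared only if the same alleles segregate
            counts["shared" if len(a & b) >= 2 else "other"] += 1
        elif len(a) > 1:
            counts["exclusive_1"] += 1  # polymorphic only in species 1
        elif len(b) > 1:
            counts["exclusive_2"] += 1  # polymorphic only in species 2
        elif a != b:
            counts["fixed"] += 1        # monomorphic in both, different bases
    return counts


# Toy example (not data from this study)
latifolia = ["AAAC", "TAGC"]
dioica = ["AAAG", "TCAG"]
print(classify_sites(latifolia, dioica))
```

With small samples, sites classified as exclusive may simply reflect variants not yet sampled in the other species, which is one reason the text treats its sampling as non-exhaustive.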

We also tested for a higher Y than X mutation rate in the four new X–Y gene pairs. The ratio of Y- to X-linked substitution rates, RY/X, exceeds 1 for all four genes, significantly so for two of them (SlCyp, with p = 0.039 and RY/X = 2.03, and SlXY6a, with p < 0.001 and RY/X = 2.80), whereas the differences for the two genes with short analysed sequences (approx. 100 bp) were not significant (SlXY7, with RY/X = 1.08, and SlXY9, with RY/X = 1.42).

(c) Tests for selection

The overall X/A ratio of diversity estimates is close to the expected value of 0.75. However, diversity varies among both the X-linked and the autosomal genes. The X/A ratio would be affected if some genes in either type of location had been subject to recent selective sweeps, or had high diversity owing to long-term balancing selection, so the agreement could be coincidental. In addition to the two autosomal genes with high silent site diversity discussed above, two (E106 and E265) have diversity values even lower than that of CCLS37.1 (which, as mentioned above, may have undergone a selective sweep). The recent selective sweep at the X-linked SlSsX gene (Filatov 2008) is supported by mlHKA tests among the X-linked loci (see above). Only the two low-diversity autosomal genes (E106 and E265) differ significantly from the other autosomal loci in mlHKA tests. These genes may therefore have undergone recent selective sweeps, although the test is highly significant only for E265 (p = 2 × 10⁻⁵; for E106, p = 0.026).

We also detect an unexpectedly high proportion of low-frequency variants, compared with the expectation at equilibrium under neutrality. Tajima's D-values for autosomal genes (using either silent sites or all site types) are mostly negative, often strongly so, but only E304 has an individually significant negative value (table 3). This overall pattern could be caused by an extreme bottleneck (Tajima 1989a,b), but is more likely to be due to population expansion (Tajima 1993; Innan & Stephan 2000; Ray et al. 2003), which is plausible because this species probably expanded after the last ice age. Subdivision may also contribute, since sampling few individuals from many localities can produce negative D-values (Ptak & Przeworski 2002). However, D becomes even more negative when we use a single sequence per population, which produces a standard coalescent process if geographical structure is unimportant (Wakeley & Aliacar 2001; De & Durrett 2007). Geographical structure often produces positive Tajima's D-values, owing to local adaptation, but sampling mostly from one subpopulation or race, with few individuals from a different race, could produce an excess of rare variants. However, the clustering program STRUCTURE 2 (Pritchard et al. 2000) identifies no geographical structure in S. latifolia, using six X-linked genes (G. Muir, R. Bergero, D. Charlesworth & D. A. Filatov 2007–2010, unpublished data).

Table 3. Tests for selection for autosomal and X-linked genes, using all site types. Statistically significant values (p < 0.05) are shown in bold.

We also applied the DH test for selection, which is less sensitive than other tests to changes in populations' recent demographic history (Zeng et al. 2006). The E304 gene again yielded a significant result, and two other genes gave marginally significant Fay and Wu's tests (table 3). None of the tests was significant for any Y-linked gene (results not shown). Of the five X-linked genes tested, selection at the SlSsX locus is confirmed, and is also inferred for SlX7 (table 3). Overall, consistent with the HKA test results described above, there is no convincing evidence for selection in or near most of the loci studied, and thus no strong reason to exclude loci when estimating the X/A ratio; without the low-diversity SlSsX gene, this ratio increases to 0.866, but if E265 is also excluded, it becomes 0.688.
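Fay and Wu's H uses the outgroup to identify derived alleles and contrasts θπ with θH, an estimator that weights high-frequency derived variants heavily; a deficit of high-frequency derived variants gives positive H, and an excess (as expected near a recent sweep) gives negative H. A minimal sketch of the unnormalized statistic, assuming derived-allele counts have already been obtained by polarizing each site with S. vulgaris; `fay_wu_h` is a hypothetical helper, and significance (for H or the combined DH test) would require coalescent simulations not shown here.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H for one locus.

    derived_counts -- number of sampled copies carrying the derived allele at
                      each site (polarized with an outgroup)
    n              -- number of sequences sampled
    """
    segregating = [i for i in derived_counts if 0 < i < n]  # drop fixed/absent sites
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in segregating) / denom
    theta_h = sum(2 * i * i for i in segregating) / denom
    return theta_pi - theta_h


# Toy site frequency spectra for n = 10 sequences (not data from this study)
print(fay_wu_h([9, 9], 10))  # high-frequency derived variants: strongly negative
print(fay_wu_h([1, 1], 10))  # rare derived variants: positive
```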

Finally, we applied haplotype tests to look for balancing selection (for the two loci with high diversity, E391 and E393, such tests might also detect evidence for paralogous copies). We used DnaSP to run neutral coalescent simulations, using the estimated recombination rate for each locus, to obtain 95% confidence intervals for the expected numbers of haplotypes. The observed values for all loci were within these intervals.
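The logic of the haplotype test can be illustrated with a much simpler neutral model than the one used in DnaSP. Under the infinite-sites model without recombination, every mutation creates a new haplotype, so the number of haplotypes in a sample of n sequences has the same distribution as the number of alleles under the infinite-alleles model: a sum of independent Bernoulli variables with success probabilities θ/(θ + i), i = 1, …, n − 1, plus one for the first lineage. The sketch below assumes no recombination and a fixed θ per locus (both simplifications relative to the simulations described above); the helper names and parameter values are hypothetical.

```python
import random


def simulate_haplotype_counts(n, theta, reps=10000, seed=1):
    """Neutral haplotype numbers via the infinite-alleles (Hoppe urn) shortcut.

    Assumes no recombination and a fixed theta (4 * Ne * mu for the whole locus).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(reps):
        k = 1  # the first sampled lineage always founds a haplotype
        for i in range(1, n):
            if rng.random() < theta / (theta + i):
                k += 1  # lineage i+1 carries a new haplotype
        counts.append(k)
    return counts


def interval_95(values):
    """Approximate central 95% interval from simulated values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(0.025 * last)], ordered[int(0.975 * last)]


# Hypothetical locus: 20 sequences, theta = 5
low, high = interval_95(simulate_haplotype_counts(20, 5.0))
print(low, high)
```

An observed haplotype number outside the simulated interval would be the kind of signal sought; ignoring recombination in this sketch tends to make the expected haplotype numbers lower than in the recombining case.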

(d) Population subdivision and comparisons between the X and Y

Another possible explanation for high silent site diversity values is population subdivision, but this does not seem to be the cause. We analysed diversity separately for the samples from individual populations or from four geographical regions (see §2 and electronic supplementary material, table S2). Despite the large distances between the locations where the plants were sampled, KST values for the autosomal genes (electronic supplementary material, table S4) are not high (the average is 0.125 for silent sites, similar to the value for human populations), with similar values for the different loci. For the sex-linked genes, KST is higher (the X and Y averages are 0.177 and 0.245, respectively, reflecting the lower Ne of both chromosomes). Estimates using individual localities are higher (electronic supplementary material, table S4), but our sample size of two alleles per locality must considerably overestimate subdivision, because such samples often contain only the commonest allele, and monomorphism elevates KST values (Charlesworth 1998).
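KST measures subdivision from pairwise sequence differences as KST = 1 − KS/KT, where KS is the mean number of pairwise differences within samples and KT the mean in the pooled sample. A minimal sketch, assuming aligned sequences without missing data and weighting samples by their sizes (an assumption; other weightings exist); `mean_pairwise` and `kst` are hypothetical helpers.

```python
from itertools import combinations


def mean_pairwise(seqs):
    """Mean number of pairwise differences among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("each sample needs at least two sequences")
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def kst(populations):
    """K_ST = 1 - K_S / K_T, with K_S weighted by sample size (assumed weighting)."""
    n_total = sum(len(p) for p in populations)
    k_s = sum(len(p) / n_total * mean_pairwise(p) for p in populations)
    k_t = mean_pairwise([s for p in populations for s in p])
    if k_t == 0:
        raise ValueError("no variation in the pooled sample")
    return 1.0 - k_s / k_t


# Two alleles per locality, each locality monomorphic: maximal K_ST
print(kst([["AA", "AA"], ["TT", "TT"]]))  # 1.0
# Same variation in both localities: K_ST at or below zero
print(kst([["AA", "AT"], ["AA", "AT"]]))
```

The first toy case illustrates the point made above: with two alleles per locality, samples that happen to be monomorphic drive KS to zero and KST towards its maximum.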

4. Conclusions and discussion

Our results establish that diversity is low for Y-linked genes, rather than unusually high for X-linked ones: the mean X diversity is lower than that of the autosomal loci, and the X/A ratio is close to the value expected with a 1 : 1 sex ratio and the same variance in male and female reproductive success. Hitch-hiking processes on the non-recombining Y chromosome are predicted to lower Y diversity, but a possible alternative reason for low diversity of Y-linked genes is a high variance in male reproductive success. The autosomes should then also have lower diversity than expected, relative to the X, i.e. the X/A ratio should be increased above 0.75 (Laporte & Charlesworth 2002; Ellegren 2009). Because our overall X/A ratio (0.747) is not elevated, we conclude that there is no evidence that sexual selection is important in populations of this plant.

Another possibility, however, is a recent change in the demographic history of the population. Changes in population size affect the X and Y chromosomes differently from the autosomes, and can increase or reduce X/A and Y/A diversity ratios for many generations (Fay & Wu 1999; Pool & Nielsen 2007). Rapid growth in population size leads to increased X/A ratios, lasting for considerable times after the size change, whereas a reduction in population size decreases the ratio (the decrease can be severe, but not for long, see Pool & Nielsen 2007). A temporary bottleneck generally lowers X/A ratios, but a very severe bottleneck, followed by growth as the population recovers, can have the opposite effect. The Y/A diversity ratio (and the mitochondrial/autosomal ratio) is affected in similar ways, but the low effective size of Y-linked genes causes their diversity to recover much faster than that of X-linked genes, so that prolonged effects of a bottleneck are less likely to be found.

We therefore compared diversity estimates from two geographical regions, since the size change models predict different ratios in populations whose sizes differ. Although the regions are not strongly isolated (see above), the samples from northern populations (as defined in §2), when analysed separately, tend to have slightly higher diversity than southern ones (14 of 17 genes for which data are currently available), with a south/north diversity ratio for silent sites of 0.78 for autosomal genes (but no difference for the Y-linked sequences). The difference is more pronounced for the X-linked genes: southern populations' silent site diversity is only 56 per cent of that in the northern sample, so that their X/A ratio is much lower (0.35) than the overall value, consistent with a greatly reduced southern population in the recent past. However, Tajima's D-values are not more negative in the southern sample.

A low population size in the recent past, from which the population has not yet recovered, could temporarily greatly reduce diversity for Y-linked genes, relative to X-linked ones, and is thus a possible alternative to hitch-hiking to explain the observed low Y sequence diversity. Figure 2 illustrates this. A larger decrease in size could give an even larger X/Y ratio. On the other hand, any such difference will be lessened if Y genes have a higher mutation rate than X ones (as seems to be the case in S. latifolia, see above). It therefore seems unlikely that a population size change can explain the ratios observed in S. latifolia.

Figure 2. X/Y diversity ratios predicted using equation (4) of Pool & Nielsen (2007). The calculations assumed the same mutation rate for all chromosomes, and a fivefold decrease (black symbols and line) or increase (grey symbols and line) in population size from an initial value of 10 000.
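The qualitative behaviour behind these predictions can be reproduced with a simple deterministic recursion, which is not Pool & Nielsen's equation (4) but is expected to show the same direction of effects. A minimal sketch, assuming infinite sites, Wright–Fisher populations of 1.5N X-linked and 0.5N Y-linked gene copies (N individuals, 1 : 1 sex ratio) and equal X and Y mutation rates; expected pairwise diversity then follows π′ = (1 − 1/c)π + 2μ for c gene copies, with equilibrium 2μc. The helper name is hypothetical; the population sizes follow the figure caption.

```python
def xy_ratio_trajectory(n_start, n_end, generations, mu=1e-8):
    """Expected X/Y pairwise-diversity ratio after an instantaneous size change.

    Each chromosome type is a Wright-Fisher population of c gene copies,
    c = 1.5 * N for the X and 0.5 * N for the Y, and expected pairwise
    differences follow pi' = (1 - 1/c) * pi + 2 * mu (equilibrium 2 * mu * c).
    Equal X and Y mutation rates are assumed, so mu cancels from the ratio.
    """
    cx_end, cy_end = 1.5 * n_end, 0.5 * n_end
    # start both chromosomes at equilibrium for the initial size
    pi_x, pi_y = 2 * mu * 1.5 * n_start, 2 * mu * 0.5 * n_start
    ratios = [pi_x / pi_y]
    for _ in range(generations):
        pi_x = (1 - 1 / cx_end) * pi_x + 2 * mu
        pi_y = (1 - 1 / cy_end) * pi_y + 2 * mu
        ratios.append(pi_x / pi_y)
    return ratios


decrease = xy_ratio_trajectory(10_000, 2_000, 20_000)   # fivefold decrease
increase = xy_ratio_trajectory(10_000, 50_000, 20_000)  # fivefold increase
print(round(decrease[0], 6))                            # 3.0, the equilibrium X/Y ratio
print(round(max(decrease), 2), round(min(increase), 2))
```

Because Y-linked diversity approaches its new equilibrium about three times faster than X-linked diversity, a decrease transiently raises the X/Y ratio above 3 and an increase transiently lowers it, before both return to 3.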

The higher KST values we observe for Y-linked genes than for other loci, regardless of the spatial scale of the samples used, suggest that the reduction in Y diversity is mainly caused by within-population processes, such as elimination of deleterious mutations or local adaptations involving Y-linked genes, and is not principally an effect of a species-wide selective sweep caused by an advantageous mutation fixing in a Y-linked gene (which should lower KST for the Y until equilibrium is re-established). The highest within-population X/Y diversity ratios average 31 for silent sites (electronic supplementary material, table S4). Overall, we therefore estimate that S. latifolia Y-linked sequences have a 20–30-fold reduction in effective population size, consistent with previous estimates using slightly larger within-population sample sizes, but fewer localities and loci (Laporte et al. 2005). Taking into account the higher mutation rate of Y-linked genes, the low diversity of Y-linked genes is quantitatively consistent with strong within-population hitch-hiking processes associated with the ongoing genetic degeneration of the Y chromosome (Kaiser & Charlesworth 2009). However, the different kinds of processes are not mutually exclusive, and a species-wide selective sweep could also have contributed to reducing diversity.

Acknowledgements

We are grateful to NERC for 454 sequencing, to BBSRC for funding, to Kai Zeng for help with DH tests and to A. Betancourt for help with PAML analyses. S.Q. was supported by the 973 Programme of China (2007CB815701) and the Chang Hung-Ta Science Foundation of Sun Yat-sen University.