Affiliation: Institute for Plant Protection, Department of Biology, Agricultural and Food Sciences, The National Research Council of Italy (CNR), Sesto Fiorentino, Italy.

ABSTRACT: Quercus pubescens Willd., a species distributed from Spain to southwest Asia, ranks high for drought tolerance among European oaks. Q. pubescens plays a key role in most Mediterranean forest ecosystems, but few mechanistic studies have explored its response to environmental constraints, owing to the lack of genomic resources. In our study, we performed deep transcriptome sequencing of Q. pubescens leaves, including de novo assembly, functional annotation and the identification of new molecular markers. Our results are a prerequisite for undertaking molecular functional studies and may support population and association genetic studies. A total of 254,265,700 clean reads, with an average length of 98 bp, were generated on the Illumina HiSeq 2000 platform. De novo assembly with CLC Genomics produced 96,006 contigs with a mean length of 618 bp. Sequence similarity searches against seven public databases (UniProt; NR, RefSeq and KOGs at NCBI; Pfam; InterPro; and KEGG) annotated 83,065 transcripts with gene descriptions, conserved protein domains or Gene Ontology terms. These annotations, together with local BLAST searches, allowed us to identify genes specifically associated with drought-avoidance mechanisms. Finally, 14,202 microsatellite markers and 18,425 single nucleotide polymorphisms (SNPs) were discovered in silico in the assembled and annotated sequences. We completed a successful global analysis of the Q. pubescens leaf transcriptome using RNA-seq. The assembled and annotated sequences, together with the newly discovered molecular markers, provide genomic information for functional genomic studies in Q. pubescens, with special emphasis on mechanisms of response to the severe constraints of the Mediterranean climate. Our tools enable comparative genomic studies on other Quercus species, taking advantage of large intra-specific ecophysiological differences.
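The assembly and annotation figures above are summary statistics over the contig set. The following is a minimal Python sketch of how such numbers can be derived, assuming the contigs are available in FASTA format; the function names and toy sequences are hypothetical, and this is not how CLC Genomics computes its reports.

```python
from io import StringIO
from typing import Dict, List, TextIO


def read_fasta_lengths(handle: TextIO) -> List[int]:
    """Return the length of every sequence record in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Number of contigs, total assembled bases and mean contig length."""
    n = len(lengths)
    total = sum(lengths)
    return {"contigs": n, "total_bp": total, "mean_bp": total / n if n else 0.0}


# Toy FASTA with three contigs (illustrative only, not data from this study).
toy_fasta = StringIO(">c1\nACGTACGTAC\n>c2\nACGT\nACGTAC\n>c3\nAC\n")
stats = assembly_summary(read_fasta_lengths(toy_fasta))
print(stats)  # {'contigs': 3, 'total_bp': 22, 'mean_bp': 7.333333333333333}

# Arithmetic on the headline figures quoted in the abstract.
reads, mean_read_len = 254_265_700, 98
print(f"approx. yield: {reads * mean_read_len / 1e9:.1f} Gb")  # approx. yield: 24.9 Gb
print(f"annotated: {83_065 / 96_006:.1%}")  # annotated: 86.5%
```

The yield is only approximate because 98 bp is an average read length; the annotation rate is simply the 83,065 annotated transcripts divided by the 96,006 assembled contigs.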

pone-0112487-g006 (Figure 6): Distribution of SSRs in the leaf transcriptome of Q. pubescens.

Mentions:
Simple sequence repeat (SSR) markers, also known as microsatellites, consist of short DNA motifs of 2–6 base pairs repeated in tandem; they are important for research on population genetic structure, demography, relatedness and the genetic basis of adaptive traits [58], [59]. In this study, a total of 14,202 putative SSRs were identified in the 96,006 assembled transcripts, of which 3,366 were compound SSRs. The SSRs included 7,011 (49.3%) dinucleotide, 3,478 (24.5%) trinucleotide, 212 (1.5%) tetranucleotide, 65 (0.5%) pentanucleotide and 72 (0.5%) hexanucleotide motifs (Figure 6). The most abundant repeat types were AG/CT among dinucleotide SSRs and GAA/TTC among trinucleotide SSRs. The observed SSR frequency (14.8%) was lower than that observed in related Quercus species (18.6% and 23.7% in [60] and [28], respectively). Surprisingly, dinucleotide repeats were the most common SSRs in our transcriptome (49.3%), with tri- and tetranucleotide repeats present at much lower frequencies. This contrasts with Q. robur and Q. petraea, in which trinucleotide SSRs were the most frequent [28], [60], but agrees with the pattern reported in P. contorta [24]. From the 14,202 SSRs, 10,864 primer pairs were successfully designed using Primer3; the contig identifier (ID), marker ID, repeat motif, repeat length, primer sequences, positions of the forward and reverse primers and expected fragment length are listed in Table S3. Twenty microsatellites (15 dinucleotide and 5 trinucleotide SSRs) were randomly selected for PCR amplification in two individuals: 17 (85%) amplified fragments of the expected size, supporting the quality of the assembly and the utility of the SSRs identified here (validated primer pairs are highlighted in Table S3).
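The SSR search and motif grouping described above can be sketched as a greedy scan for perfect tandem repeats. This is a minimal Python sketch, not the pipeline used in the study: the minimum repeat numbers in `MIN_REPEATS` are illustrative assumptions, compound SSRs are not handled, and each motif is pooled with its rotations and reverse complement and named by the alphabetically smallest member, so the class reported as GAA/TTC in the text is labelled AAG/CTT here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Minimum number of tandem copies per motif length (illustrative assumption,
# not the settings used in the study).
MIN_REPEATS: Dict[int, int] = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass
class SSR:
    start: int    # 0-based start position in the transcript
    motif: str    # repeat unit as it appears in the sequence
    repeats: int  # number of complete tandem copies


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. ATAT = AT x 2)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> List[SSR]:
    """Greedy left-to-right scan for perfect SSRs with 2-6 bp motifs."""
    seq = seq.upper()
    found: List[SSR] = []
    i = 0
    while i < len(seq):
        hit: Optional[SSR] = None
        for k in sorted(MIN_REPEATS):  # try the shortest motif first
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= MIN_REPEATS[k]:
                hit = SSR(i, motif, n)
                break
        if hit is not None:
            found.append(hit)
            i += hit.repeats * len(hit.motif)  # jump past the whole repeat
        else:
            i += 1
    return found


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def motif_class(motif: str) -> str:
    """Pool a motif with its rotations and reverse complement (GA -> AG/CT)."""
    variants = [m[j:] + m[:j] for m in (motif, revcomp(motif)) for j in range(len(motif))]
    rep = min(variants)
    return f"{rep}/{revcomp(rep)}"


# Toy transcript (hypothetical): an (AG)6 repeat and a (GAA)5 repeat.
transcript = "CC" + "AG" * 6 + "TTT" + "GAA" * 5 + "CC"
ssrs = find_ssrs(transcript)
print([(s.start, s.motif, s.repeats, motif_class(s.motif)) for s in ssrs])
# [(2, 'AG', 6, 'AG/CT'), (17, 'GAA', 5, 'AAG/CTT')]
print(sorted(Counter(len(s.motif) for s in ssrs).items()))  # [(2, 1), (3, 1)]

# SSR frequency quoted in the text: SSRs found relative to assembled transcripts.
print(f"{14_202 / 96_006:.1%}")  # 14.8%
```

Trying the shortest motif first, together with the `is_primitive` check, keeps an (AG)n stretch from also being reported as an (AGAG)n tetranucleotide. The last line shows that the 14.8% figure is consistent with 14,202 SSRs divided by 96,006 transcripts.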
To confirm marker usability and characterize the variation of the 17 successfully amplified SSR markers, a total of eight individuals from two Italian populations (four from Spello and four from Volterra) were analysed. All selected SSRs displayed consistent patterns; eleven loci were polymorphic and six were monomorphic (the absence of polymorphism might be due to the small sample size). Primer sequences, repeat motifs and detected alleles are shown in Table S4. A similar study using Illumina sequencing in sesame reported that about 90% of primer pairs successfully amplified DNA fragments [39]. High-throughput transcriptome sequencing has proven to be a superior resource for developing such markers, not only because of the enormous amount of sequence data in which markers can be identified, but also because the discovered markers are gene-based. Gene-based markers facilitate the detection of functional variation and of signatures of selection in genomic scans or association genetic studies [61], [62]. Transcript-based SSRs also show higher amplification rates and cross-species transferability than SSRs in non-transcribed regions [63]. Although many SSR markers have been identified in the Fagaceae family, only a few have been reported for Q. pubescens [64]. The SSRs predicted from the assembled Q. pubescens transcriptome will likely be valuable for genetic analyses of Q. pubescens and other related non-model plants.
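Classifying loci as polymorphic or monomorphic, and judging whether a small sample could simply miss variation, comes down to simple counting. The following is a minimal Python sketch under stated assumptions: the locus names and allele sizes are hypothetical (not the data in Table S4), individuals are treated as diploid, and the missed-allele probability assumes that the 2n gene copies in a sample of n individuals are drawn independently from the population.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # diploid genotype as two allele sizes in bp


def allele_counts(genotypes: List[Genotype]) -> Dict[int, int]:
    """Count how often each allele size occurs across all sampled individuals."""
    counts: Dict[int, int] = {}
    for genotype in genotypes:
        for allele in genotype:
            counts[allele] = counts.get(allele, 0) + 1
    return counts


def classify_locus(genotypes: List[Genotype]) -> str:
    """A locus is polymorphic in the sample if more than one allele is observed."""
    return "polymorphic" if len(allele_counts(genotypes)) > 1 else "monomorphic"


def prob_allele_missed(freq: float, n_individuals: int) -> float:
    """Probability that an allele at population frequency `freq` is absent from a
    sample of diploid individuals, assuming 2n independently drawn gene copies."""
    return (1 - freq) ** (2 * n_individuals)


# Hypothetical allele sizes for two loci in four individuals (not Table S4 data).
genotypes_by_locus = {
    "locus_A": [(150, 152), (150, 150), (152, 154), (150, 152)],
    "locus_B": [(201, 201)] * 4,
}
for name, gts in genotypes_by_locus.items():
    print(name, classify_locus(gts), sorted(allele_counts(gts)))
# locus_A polymorphic [150, 152, 154]
# locus_B monomorphic [201]

print(f"amplified: {17 / 20:.0%}")  # amplified: 85%
print(f"{prob_allele_missed(0.05, 8):.2f}")  # 0.44
```

Under these assumptions, an allele at an illustrative 5% frequency would be absent from a sample of eight diploid trees (16 gene copies) about 44% of the time, which is consistent with the caveat that some monomorphic loci may reflect the small sample size.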

