Genome sequence of Gelsemium sempervirens

We have sequenced and annotated the genome of Gelsemium sempervirens. The analysis of the genome is described in the paper "Gene discovery in Gelsemium highlights conserved gene clusters in monoterpene indole alkaloid biosynthesis" (Franke et al., 2018). The genome and annotation can be downloaded from the Dryad Digital Repository (https://doi.org/10.5061/dryad.08vv50n).

Genome sequence of Calotropis gigantea

We have generated the genome sequence of Calotropis gigantea, a producer of anti-cancer and anti-malarial cardenolides, using whole genome shotgun sequencing. The analysis of the genome is described in Hoopes et al. 2017. We generated an assembly of 157 Mb with an N50 scaffold size of 806 kbp and annotated a total of 18,197 high-confidence genes. The genome can be downloaded from the Dryad Digital Repository, and analysis tools including a genome browser and BLAST search site have been constructed.
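For readers unfamiliar with the N50 statistic quoted for these assemblies, the following is a minimal sketch of how it is commonly computed. It is an illustration only, not the code used in the cited papers; the function name `n50` and the toy scaffold lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not taken from any assembly above).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

With real data, the input would be the list of scaffold lengths read from the assembly FASTA file.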

Genome sequence of Camptotheca acuminata

We have generated the genome sequence of Camptotheca acuminata, also known as the "Happy Tree", using whole genome shotgun sequencing. The analysis of the genome is described in Zhao et al. 2017. We generated an assembly of 403 Mb with an N50 scaffold size of 1.75 Mb and annotated a total of 31,825 genes. The genome can be downloaded from the Dryad Digital Repository, and analysis tools including a genome browser and BLAST search site have been constructed.

Genome sequence of Catharanthus roseus - Update Version 2 available

We have generated the genome sequence of Catharanthus roseus variety Sunstorm Apricot using whole genome shotgun sequencing. The analysis of the genome is described in Kellner et al. 2015. We generated an assembly of 523 Mb with an N50 scaffold size of 26.2 kbp and annotated a total of 33,829 genes. The genome can be downloaded from the Dryad Digital Repository, and analysis tools including a genome browser and BLAST search site have been constructed.

NIH-Funded Transcriptome and Metabolome Study

Release of the first Metabolomic Dataset

The Medicinal Plant Metabolomics Resource website, the metabolomics partner of the Medicinal Plant Genomics Resource, is available with metabolomics data from Atropa belladonna and Digitalis purpurea. Metabolite data from the remaining 12 medicinal plant species covered by this project will be made available soon. A press release is available with complete information about this metabolite resource.

Release of the Final Version of the Transcriptome Data

Methods for transcriptome assembly as performed in the MPGR project are described in Gongora-Castillo et al. 2012. Analyses of the Camptotheca acuminata, Catharanthus roseus, and Rauvolfia serpentina transcriptomes are described in PLoS ONE, and analysis of the Valeriana officinalis transcriptome assembly is described in Yeo et al. 2013. For more information about the release of the Final Version of the transcriptome data, please see the press release.

Release of Version 1 of the Transcriptome Data

For more information about the release of Version 1 of the transcriptome data, please see the press release.

About Medicinal Plant Genomics Resource

Natural products from plants serve as rich resources for drug development, with almost 100 plant-derived compounds in clinical trials in 2007. Plant-derived natural products have had a profound and lasting impact on human health and include compounds successfully used for decades such as digitalis, Taxol, vincristine, and morphine, isolated from foxglove, yew, periwinkle, and opium poppy, respectively. The enormous structural diversity and biological activities of plant-derived compounds suggest that additional, medicinally relevant compounds remain to be discovered in plants.

While plant natural products continue to be a prime target for drug development, as evidenced by the number of ongoing clinical trials, the clinical potential of these compounds is often curtailed by low production levels in plant species. For example, use of the blockbuster drug Taxol almost stopped in the early 1990s because the primary source, yew tree bark, could not be harvested sustainably. In this particular instance, a Taxol precursor happened to be more readily available in a renewable part of the tree, and a semi-synthetic protocol could be developed to convert it into the drug. That outcome was fortuitous; more generalized solutions, such as metabolic engineering of effective plant and microbial production platforms, are urgently needed to ensure that the wealth of bioactive compounds found in plants enters the clinical pipeline and finds widespread use in medicine.

High-throughput transcriptome sequencing approaches provide a straightforward means for accessing the gene content of organisms with large genomes (i.e., > 100 Mb). Essentially any tissue (independent of genome size and availability of genetic or molecular tools in the organism) can be used to generate cDNAs from mRNA populations. These cDNAs are sequenced to generate Expressed Sequence Tags (ESTs), which are assembled into a non-redundant set of sequences (contigs and singleton ESTs) that represents the transcriptome. The transcriptome sequences are then annotated for putative function using a suite of bioinformatic approaches such as sequence searches of protein databases, motif/domain identification, biochemical pathway mapping, and subcellular localization predictions. Transcript abundance data can also be used to provide in-depth expression profiles of individual genes on a per tissue/treatment basis. The deduced function, coupled with expression frequency, can facilitate identification of candidate genes pertinent to the pathway of interest as well as non-pathway targets (e.g., primary/intermediary metabolism) whose expression is consistent with synthesis of compounds.

November 26, 2013

December 5, 2011

Expression data released for 14 medicinal plants. Expression levels for the representative transcript (the longest transcript isoform) are provided for an array of tissues that were profiled by RNA-seq. Expression levels are provided as FPKM values (fragments per kilobase of transcript per million mapped reads). The information can be downloaded from the MPC Download site.
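As an illustration of what an FPKM value expresses, here is a minimal sketch of the standard formula: fragment count scaled by transcript length in kilobases and by library size in millions of mapped fragments. It is not the project's pipeline; the function name `fpkm` and the example numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments assigned to this transcript
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
print(fpkm(500, 2000, 10_000_000))
```

Because the value is normalized for both transcript length and sequencing depth, it allows rough comparison of a transcript's abundance across tissues sequenced to different depths.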

November 1, 2011

Metabolomics data released for Atropa belladonna and Digitalis purpurea. The information is available via Metabolomics.

October 31, 2011

We have released a set of functional annotation search tools for the assembled transcriptomes of each of the 14 medicinal plant species; searches can be made using keywords, Pfam domains, and sequence identifiers. The transcript and predicted protein sequences for the transcript assemblies are provided. Note that the assembly process using Velvet and Oases generates isoforms and each isoform has been annotated. Functional annotation includes alignments to UniRef, identification of InterPro domains, alignments to Arabidopsis thaliana genes, and alignments to ESTs and peptides from existing public sequences for these 14 medicinal plants. Expression levels for the representative transcript (the longest transcript isoform) are provided for an array of tissues that were profiled by RNA-seq. Expression levels are provided in the form of FPKM values (fragments per kilobase of transcript per million mapped reads). Note that alternative isoforms generated by Oases are not annotated for expression abundances.
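The choice of the longest isoform as the representative transcript can be sketched as follows. This is an assumption-laden illustration, not the project's code: the function name `representative_transcripts`, the locus and transcript identifiers, and the tie-breaking rule (keep the first isoform seen) are all hypothetical.

```python
def representative_transcripts(isoforms):
    """Pick the longest isoform per locus.

    isoforms: iterable of (locus_id, transcript_id, sequence) tuples.
    Returns a dict mapping locus_id -> transcript_id of the longest isoform.
    Ties keep the first isoform encountered (an arbitrary choice here).
    """
    best = {}
    for locus_id, transcript_id, sequence in isoforms:
        current = best.get(locus_id)
        if current is None or len(sequence) > len(current[1]):
            best[locus_id] = (transcript_id, sequence)
    return {locus: tid for locus, (tid, _) in best.items()}


# Hypothetical identifiers and sequences for illustration only.
toy_isoforms = [
    ("locus_1", "locus_1_t1", "ATGC"),
    ("locus_1", "locus_1_t2", "ATGCATGC"),
    ("locus_2", "locus_2_t1", "ATG"),
]
print(representative_transcripts(toy_isoforms))
```

Under this scheme only the selected transcript per locus would carry expression values, which matches the note that alternative isoforms are not annotated for expression abundances.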