Legumes play vital roles in the nitrogen cycle of the biosphere and in agriculture because of their unique ability to carry out symbiotic nitrogen fixation (SNF) through endosymbiotic interactions with bacteria in root nodules. This feature cannot be studied in the model plant Arabidopsis thaliana. Beyond root nodulation and nitrogen-fixing symbiosis with rhizobia, legumes possess many features not found in A. thaliana, such as mycorrhization, compound leaf development, protein-rich physiology, extensive secondary metabolism, glandular trichome development, and border cells in roots.

We present LegumeIP, an integrative database platform for comparative genomics and transcriptomics of model legumes to facilitate the study of gene function and genome evolution in legumes, and ultimately to generate molecular-based breeding tools to improve crop legumes.

LegumeIP also supports reconstruction of gene families and gene family-wide phylogenetic analysis across the five hosted species. It features comprehensive search and visualization tools that enable flexible queries on gene annotation, gene families, synteny, and the relative abundance of gene expression.
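Gene family reconstruction typically begins by grouping genes that share significant sequence similarity. The sketch below is not LegumeIP's pipeline, whose clustering method is not described here. It assumes precomputed all-against-all BLAST hits supplied as (query, subject, percent identity) tuples and an illustrative identity cutoff (`min_identity`, a hypothetical parameter), and it groups genes into families as connected components using a union-find structure.

```python
from collections import defaultdict


def cluster_gene_families(genes, hits, min_identity=50.0):
    """Group genes into families as connected components of a similarity graph.

    genes: iterable of gene IDs (hypothetical IDs in the example below).
    hits: iterable of (query, subject, percent_identity) tuples, e.g. parsed
          from tabular BLAST output (assumed input format).
    min_identity: illustrative cutoff; hits below it do not link genes.
    """
    parent = {g: g for g in genes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, identity in hits:
        # Ignore weak hits and hits to genes outside the input set.
        if identity >= min_identity and query in parent and subject in parent:
            union(query, subject)

    families = defaultdict(set)
    for g in parent:
        families[find(g)].add(g)
    # Largest families first; ties broken by the alphabetically first member.
    return sorted((sorted(f) for f in families.values()),
                  key=lambda f: (-len(f), f[0]))


if __name__ == "__main__":
    genes = ["Mt1", "Mt2", "Lj1", "Gm1", "Gm2"]
    hits = [("Mt1", "Lj1", 80.0), ("Lj1", "Gm1", 65.0), ("Mt2", "Gm2", 30.0)]
    print(cluster_gene_families(genes, hits))
```

Single-linkage grouping like this can chain distantly related genes into one large family; dedicated tools such as OrthoMCL apply Markov clustering to the similarity graph instead, and each resulting family would then be aligned for the family-wide phylogenetic analysis.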

TrichOME is an integrated genomic database of genes expressed and metabolites accumulated in plant trichomes. The data hosted in TrichOME were mainly generated through an NSF-funded project (Award #0605033) and were supplemented with data collected from various public resources, e.g. NCBI's sequence repositories and the ArrayExpress database. TrichOME hosts the following integrated information:

EST sequences: TrichOME hosts 564,666 ESTs from 107 libraries representing 9 species. These ESTs were sequenced from trichomes and from non-trichome control tissues; the latter were included for Unigene assembly and comparative genomics analysis. The ESTs were assembled, species by species, into 34,406 trichome-related Unigenes, which were annotated against UniProtKB/TrEMBL, the Gene Ontology database, the KEGG pathway database, the TCDB transporter database and a transcription factor database. We also implemented an in silico gene expression analysis tool for identifying trichome-specific genes.
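The text does not describe how the in silico expression tool identifies trichome-specific genes. One established way to compare EST counts across libraries is the R statistic of Stekel et al. (2000), which measures how far a Unigene's per-library counts deviate from what its pooled frequency would predict. The minimal sketch below is offered as an illustration of that technique, not as TrichOME's tool; it assumes per-Unigene EST counts and per-library EST totals as inputs, and the helper names and the boolean `trichome_flags` argument are hypothetical.

```python
import math


def r_statistic(counts, library_sizes):
    """R statistic (Stekel et al., 2000) for one Unigene.

    counts[i]: ESTs from library i assigned to this Unigene.
    library_sizes[i]: total ESTs sequenced in library i.
    R = sum_i x_i * ln(x_i / (N_i * f)), with f the pooled frequency and
    0 * ln(0) taken as 0. R is 0 when counts are exactly proportional
    to library sizes and grows as expression becomes more uneven.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("one count per library is required")
    total_counts = sum(counts)
    if total_counts == 0:
        return 0.0
    f = total_counts / sum(library_sizes)  # pooled frequency across libraries
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:
            r += x * math.log(x / (n * f))
    return r


def is_trichome_enriched(counts, library_sizes, trichome_flags):
    """Hypothetical helper: True if trichome libraries hold more ESTs for this
    Unigene than expected from the pooled frequency."""
    total_counts = sum(counts)
    if total_counts == 0:
        return False
    f = total_counts / sum(library_sizes)
    observed = sum(x for x, t in zip(counts, trichome_flags) if t)
    expected = sum(n * f for n, t in zip(library_sizes, trichome_flags) if t)
    return observed > expected


if __name__ == "__main__":
    sizes = [100, 200]          # library 0: trichome, library 1: control
    flags = [True, False]
    print(r_statistic([5, 0], sizes), is_trichome_enriched([5, 0], sizes, flags))
```

A large R only says that the counts are uneven across libraries, so the direction check is needed to keep Unigenes that are over-represented in trichome libraries rather than in the controls.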

Microarray hybridizations: TrichOME hosts 47 hybridizations from Medicago truncatula, an important model legume, and Medicago sativa (alfalfa). These hybridizations were performed on glandular trichomes, non-glandular trichomes and control tissues using the Affymetrix Medicago GeneChip, which carries 61,278 probe sets. Both raw hybridization signals and pre-normalized expression signals are available for batch download and individual search. A set of tools was also developed to facilitate the analysis and mining of trichome-related genes.
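The description does not state which method produced the pre-normalized signals. As an illustration only, the sketch below implements quantile normalization, the between-array normalization step used in RMA, which forces every hybridization to share the same signal distribution. It assumes each array is a list of probe-set signals in a common probe order; ties are broken by probe order rather than averaged, which is a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of probe-set signals each).

    After normalization every array has identical sorted values: the mean,
    across arrays, of the values found at each rank.
    """
    if not arrays:
        return []
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probe sets")

    # Mean signal at each rank across all arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered by ascending signal (stable sort breaks ties by index).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy hybridizations with three probe sets each (made-up values).
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

In practice this step is usually applied to background-corrected, log-transformed signals, and raw data would be downloaded from the database for any reanalysis of this kind.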

Metabolome profiling: TrichOME currently hosts 24 metabolite profiles (~720 tentative peaks) from Medicago sativa trichomes. Three biological replicates of each trichome extract were analyzed to profile the metabolites accumulated in trichomes, using UPLC/MS analysis of secondary metabolites, GC/MS analysis of polar metabolites, GC/MS analysis of non-polar metabolites (without hydroxylation), and GC/MS analysis of non-polar metabolites (with hydroxylation). We are adding more metabolite profiles as the project progresses.

Literature mining and curation: We mined and curated over 1,000 publications to identify trichome-related genes. To date, we have analyzed ~200 trichome-related genes reported in the literature.

Medicago truncatula is a model, or reference, species for legume genetics, genomics, and breeding. To leverage the genome sequencing currently under way for this species, we have developed a compendium, or "atlas", of gene expression profiles for the majority of M. truncatula genes, covering all its major organ systems (roots, nodules, stems, petioles, leaves, vegetative buds, flowers, seeds and seed pods), with detailed developmental time series for nodules and seeds, using the Affymetrix Medicago GeneChip®. In the future, these data will be supplemented with transcriptome data from plants subjected to various abiotic and biotic stresses and with data from specific cell and tissue types. We anticipate that these data will aid gene function determination, biological discovery, and molecular breeding efforts.

The study of transcription factors is central to understanding the molecular mechanisms of gene regulation in plants. While several transcription factor databases have been actively developed for Arabidopsis, none has been reported for the model legume Medicago truncatula, whose gene-space genome sequencing will be completed in the near future.

We developed a pipeline and a relational database for predicting Medicago truncatula transcription factors. The prediction was based on transcription factor binding sites and Hidden Markov Models (HMMs). The models were built mainly from documented transcription factors and their family information in Arabidopsis, plus a small number of known transcription factors from legumes (soybean, alfalfa and Medicago truncatula). Predictions were made on the putative genes released by the International Medicago Genome Annotation Group (IMGAG) and were further grouped into families and subfamilies according to their functional classification. We also annotated a number of transcription factors related to nodulation. The pipeline is run periodically to stay synchronized with the monthly releases of IMGAG gene calls and the daily updates of the Medicago truncatula genome assembly.
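HMM-based transcription factor prediction usually ends with a rule step that maps the domains detected in each protein to a family. The sketch below assumes domain hits have already been produced (for example, by an HMMER search against Pfam profiles) and parsed into (protein ID, domain name, E-value) tuples. The family rules, the E-value cutoff and the protein IDs are illustrative assumptions, not the rules used by this database.

```python
from collections import defaultdict

# Illustrative rules only: family -> set of Pfam domain names that must all be present.
FAMILY_RULES = {
    "WRKY": {"WRKY"},
    "NAC": {"NAM"},
    "bZIP": {"bZIP_1"},
    "bHLH": {"HLH"},
    "MYB": {"Myb_DNA-binding"},
}


def assign_families(domain_hits, rules=FAMILY_RULES, evalue_cutoff=1e-5):
    """Assign proteins to transcription factor families from HMM domain hits.

    domain_hits: iterable of (protein_id, domain_name, evalue) tuples.
    Returns {protein_id: sorted list of matching families}; proteins with no
    matching family are omitted.
    """
    domains = defaultdict(set)
    for protein_id, domain_name, evalue in domain_hits:
        if evalue <= evalue_cutoff:  # keep only significant hits
            domains[protein_id].add(domain_name)

    assignments = {}
    for protein_id, found in domains.items():
        families = sorted(f for f, required in rules.items() if required <= found)
        if families:
            assignments[protein_id] = families
    return assignments


if __name__ == "__main__":
    hits = [
        ("prot1", "WRKY", 1e-20),
        ("prot2", "NAM", 1e-3),             # not significant at the default cutoff
        ("prot3", "HLH", 1e-10),
        ("prot3", "Myb_DNA-binding", 1e-8),
    ]
    print(assign_families(hits))
```

Published classification schemes also rely on domain counts and forbidden domains (for example, separating AP2 from ERF proteins by the number of AP2 domains), which this set-based sketch does not model; a protein matching several rules, like `prot3` above, would need such extra rules to resolve.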

Medicago truncatula is a model species for legume genetics, genomics, and breeding. To enable efficient forward- and reverse-genetics studies in this species, large-scale mutant populations are being generated at the Noble Foundation. The Medicago truncatula mutant database currently hosts two mutant populations: (i) a fast neutron bombardment (FNB) deletion mutant population of approximately 80,000 M1 lines; and (ii) a Tnt1 retrotransposon insertion mutant population of over 9,000 lines containing an estimated 200,000 random insertions within the genome. Phenotypic information for many of these mutants, as well as Tnt1 flanking sequence tags (FSTs), is available. In addition, FSTs associated with specific Tnt1 lines are contained in a BLAST-able dataset. The distribution of FSTs, gene models, tentative consensus sequences (TCs), and Medicago Affymetrix probe sets across the M. truncatula genome is searchable and visualized in the genome viewer GBrowse. Gene expression data from the Medicago Gene Expression Atlas project have also been integrated into the database, allowing users to examine expression profiles of genes tagged by FSTs.
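Linking Tnt1 FSTs to gene models amounts to asking which annotated intervals contain each insertion site. The minimal sketch below assumes each FST has already been mapped to a single chromosome position and that gene models are given as 1-based inclusive intervals; all IDs are hypothetical. The linear scan per chromosome is chosen for clarity rather than speed; an interval tree or a sorted index would scale better to a genome-wide FST collection.

```python
from collections import defaultdict


def tag_genes(gene_models, fsts):
    """Report which mutant lines carry an FST inside each gene model.

    gene_models: iterable of (gene_id, chrom, start, end), 1-based inclusive.
    fsts: iterable of (line_id, chrom, position) for mapped insertion sites.
    Returns {gene_id: sorted list of line_ids}; untagged genes are omitted.
    """
    genes_by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in gene_models:
        genes_by_chrom[chrom].append((start, end, gene_id))

    tagged = defaultdict(set)
    for line_id, chrom, pos in fsts:
        # Overlapping gene models can all be tagged by the same insertion.
        for start, end, gene_id in genes_by_chrom.get(chrom, []):
            if start <= pos <= end:
                tagged[gene_id].add(line_id)
    return {gene_id: sorted(lines) for gene_id, lines in tagged.items()}


if __name__ == "__main__":
    genes = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 450, 900),
             ("geneC", "chr2", 1, 300)]
    fsts = [("line1", "chr1", 470), ("line2", "chr1", 50), ("line3", "chr2", 300)]
    print(tag_genes(genes, fsts))
```

The resulting gene-to-line mapping is the kind of link that lets a user move from an FST-tagged gene to its expression profile in the atlas data.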