Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA.

Abstract

Metabolomics is the methodology that identifies and measures the global pools of small molecules (less than about 1,000 Da) in a biological sample; these molecules are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and the limitations of the individual analytical platforms used to determine it, it is currently difficult to capture the complete metabolome of an organism or tissue, in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium of five analytical laboratories, bioinformaticists, and biostatisticians that aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks grown in a standardized, controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data are being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to give the scientific community access to the data, along with tools for their interactive analysis. Exemplary data sets are discussed to validate the approach; they illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data and integrated with prior knowledge to provide testable hypotheses concerning the functions of GUFs.

Log-ratio plot of the metabolome of the oxp1 (SALK_078745) mutant. The y-axis lists individual metabolites. The x-axis plots the log-transformed ratio of the abundance of each metabolite in the mutant sample to the abundance of that metabolite in the wild-type control sample. The calculation of SE is described in the Section
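
The quantity plotted on the x-axis can be sketched as follows. This is only a minimal illustration: the base-2 logarithm is an assumption (the caption does not state the base), the function name `log_ratio` and the abundance values are hypothetical, and the SE calculation is not reproduced here.

```python
import math


def log_ratio(mutant_abundance, wildtype_abundance):
    """Log2 of the mutant / wild-type abundance ratio for one metabolite.

    Base 2 is an assumption made for this sketch; the caption only says
    "log-transformed".
    """
    if mutant_abundance <= 0 or wildtype_abundance <= 0:
        raise ValueError("abundances must be positive to take a log ratio")
    return math.log2(mutant_abundance / wildtype_abundance)


# Hypothetical abundances in arbitrary units: (mutant, wild type).
# These are illustrative values, not consortium data.
example_abundances = {
    "metabolite_A": (40.0, 10.0),  # higher in the mutant
    "metabolite_B": (5.0, 20.0),   # lower in the mutant
    "metabolite_C": (7.0, 7.0),    # unchanged
}

log_ratios = {
    name: log_ratio(mutant, wildtype)
    for name, (mutant, wildtype) in example_abundances.items()
}

for name, value in log_ratios.items():
    print(f"{name}: {value:+.2f}")
```

With this convention, a positive value indicates a metabolite that is more abundant in the mutant than in the wild type, a negative value one that is less abundant, and zero no change.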

Hierarchical cluster diagram of mutant alleles from Experiment 3 (E3). The dissimilarity between a pair of genes was computed from the mutant metabolomes using a variance-weighted Manhattan distance measure described in the Section and this distance measurement was used to generate the cluster diagram. The specific mutant allele used to characterize each gene locus and the GO Molecular Function term that annotates each locus are identified.
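
One possible form of a variance-weighted Manhattan distance is sketched below. The exact weighting used for the figure is not given in the caption, so the form here is an assumption: each absolute difference between two mutants' log ratios is divided by the square root of the summed variances of the two estimates. The function names and the input values are hypothetical.

```python
import math


def variance_weighted_manhattan(x, y, var_x, var_y):
    """Sum over metabolites of |x_i - y_i| / sqrt(var_x_i + var_y_i).

    This weighting is an assumed form chosen for illustration, not
    necessarily the measure used to build the cluster diagram.
    """
    if not (len(x) == len(y) == len(var_x) == len(var_y)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for xi, yi, vxi, vyi in zip(x, y, var_x, var_y):
        pooled = vxi + vyi
        if pooled <= 0:
            raise ValueError("summed variance must be positive")
        total += abs(xi - yi) / math.sqrt(pooled)
    return total


def distance_matrix(profiles, variances):
    """Pairwise distances between mutants, as a dict keyed by name pairs."""
    names = sorted(profiles)
    return {
        (a, b): variance_weighted_manhattan(
            profiles[a], profiles[b], variances[a], variances[b]
        )
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical log-ratio profiles over two metabolites, with their variances.
profiles = {"mutant_1": [1.0, -2.0], "mutant_2": [0.0, 2.0]}
variances = {"mutant_1": [0.5, 0.5], "mutant_2": [0.5, 0.5]}

distances = distance_matrix(profiles, variances)
print(distances)
```

A pairwise distance table of this kind is the usual input to a hierarchical clustering routine, which repeatedly merges the closest mutants or groups to produce a cluster diagram.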