NIH/NCI 341: Development of Metabolomics Data Integration Methods and Software

Fast-Track proposals will be accepted.

Direct to Phase II will not be accepted.

Number of anticipated awards: 2 – 3

Budget (total costs, per award): Phase I: up to $225,000 for up to 9 months

Phase II: up to $1,500,000 for up to 2 years

PROPOSALS THAT EXCEED THE BUDGET OR PROJECT DURATION LISTED ABOVE MAY NOT BE FUNDED.

Summary

Metabolomics is the study of small molecules participating in cellular metabolism. Advances in metabolic profiling technologies have made it possible to simultaneously assay hundreds of metabolites, providing insight into an organism’s metabolic status. Several studies suggest that metabolomics may identify novel biomarkers for a diverse range of diseases, including cancer. Furthermore, metabolites may play important regulatory roles in disease pathways and even serve as effectors of disease processes. Metabolomics has only recently been applied to epidemiologic studies, some of which are attempting to leverage existing metabolomics data by establishing consortia such as the COnsortium of METabolomics Studies (COMETS).

There is considerable field-wide interest in the development of algorithms and methods to integrate metabolite data across laboratory platforms and analytical technologies, as is currently done for genetic variation by genome-wide association studies and next-generation sequencing. Advances in this area will help lay the foundation to support the application of metabolomics to epidemiology cohorts and consortia by facilitating replication across cohorts, enabling pooled metabolomics analyses across multiple cohorts, and rapidly scaling up sample sizes for metabolomics studies. This topic will help researchers leverage existing resources to easily compare and combine datasets to detect more subtle and complex associations among variables, thereby promoting greater collaboration, efficiency, and return on investment. In turn, it will enhance our opportunities to identify novel cancer biomarkers.

Several analytical technologies are used in metabolomics, including different separation methods [e.g., gas chromatography (GC), liquid chromatography (LC), and capillary electrophoresis (CE)] and multiple detection methods [e.g., mass spectrometry (MS) and nuclear magnetic resonance (NMR)]. Although MS and NMR are the most widely used detection methods, other methods such as ion-mobility spectrometry and electrochemical detection have also been used. These detection methods differ in specificity and sensitivity, so each technology measures its own set of metabolites. Additionally, laboratories may use the same analytical technology but different sample preparation methods, so the metabolites measured also depend on the sample preparation. Therefore, distinctly different metabolites can be measured across laboratory platforms even when they use the same analytical technology. Together, the differing analytical technologies and laboratory platforms create a complex pool of data that is challenging to integrate and harmonize without valid and reliable methods that are accessible to the research community. This, in turn, limits the ability to pool and leverage existing data for biomarker discovery.

This topic is intended to develop new and innovative bioinformatic methods to integrate metabolite data across laboratory platforms and analytical technologies and ultimately design scalable software tool(s) that apply these methods to automate the integration of metabolite data.

Project Goals

The purpose of this topic is to support the development of new and innovative methods to integrate metabolite data across analytical technologies and laboratory platforms, and in turn, design software tool(s) applying these methods for data integration.

In the short term, this topic aims to 1) develop bioinformatic methods to integrate metabolite data across various laboratory platforms and analytical technologies, including liquid chromatography-mass spectrometry (LC-MS), gas chromatography-mass spectrometry (GC-MS), and NMR; and 2) develop scalable software tool(s) to automate these methods for use by the cancer and overall public health research communities. Valid and reliable harmonization of metabolomics data also builds a critical foundation for the longer-term goal of integrating metabolomics data with other -omics data (e.g., genomics, proteomics, transcriptomics, and epigenomics). The development of methods to integrate a wide range of -omics data will position the research community to better leverage existing data for the discovery of novel cancer biomarkers of etiology, diagnosis, and prognosis.

Responses to this topic are expected to address the development of efficient bioinformatic methods to:

Integrate metabolite data across different laboratory platforms and analytical technologies with high accuracy;

Store metabolite data from the different data sources in databases that can be easily used for data integration and quality control protocols;

Implement valid quality control (QC) checks; and

Appropriately secure data at each stage of transfer and storage.
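As an illustration only, the integration and QC items above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method prescribed by this topic: it assumes metabolites are already identified by a shared name, and it swaps in a simple within-platform log transform and z-score as the harmonization step. The function names (standardize, pool), the platform labels, and the toy values are all hypothetical.

```python
import math
from statistics import mean, stdev


def standardize(platform_data):
    """Log-transform and z-score each metabolite within one platform.

    platform_data maps metabolite ID -> {sample ID: raw abundance}.
    Assumption: within-platform z-scoring stands in for a real
    harmonization method; it is only a placeholder technique.
    """
    out = {}
    for metabolite, samples in platform_data.items():
        logs = {s: math.log(v) for s, v in samples.items() if v > 0}
        if len(logs) < 2:
            continue  # QC: too few usable values to estimate spread
        mu, sd = mean(logs.values()), stdev(logs.values())
        if sd == 0:
            continue  # QC: no variation, so the values cannot be scaled
        out[metabolite] = {s: (x - mu) / sd for s, x in logs.items()}
    return out


def pool(platforms):
    """Combine platforms on the metabolites that every platform measured.

    platforms maps platform name -> raw platform_data.
    Returns metabolite ID -> {(platform, sample ID): z-score}.
    """
    scaled = {name: standardize(data) for name, data in platforms.items()}
    shared = set.intersection(*(set(d) for d in scaled.values()))
    return {
        m: {(name, s): z for name, d in scaled.items() for s, z in d[m].items()}
        for m in sorted(shared)
    }


# Hypothetical toy data: two technologies reporting on different scales.
lcms = {
    "glucose": {"a1": 100.0, "a2": 200.0, "a3": 400.0},
    "lactate": {"a1": 10.0, "a2": 20.0, "a3": 15.0},
}
nmr = {
    "glucose": {"b1": 1.0, "b2": 2.0, "b3": 4.0},
    "alanine": {"b1": 0.5, "b2": 0.7, "b3": 0.6},
}
pooled = pool({"LC-MS": lcms, "NMR": nmr})
```

In this toy case only glucose is measured on both platforms, so it is the only metabolite that survives pooling. Real data would also require resolving metabolite identities across platforms, which this sketch assumes has already been done.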

An essential task for each proposal is the development of bioinformatic tools in the form of scalable software that can be used by the research community at large to automate complex data integration tasks for metabolomics data sources.

Phase I activities should provide evidence that metabolite data integration bioinformatic methods, using identified metabolite data, have been effectively developed, can be implemented across data inputs from diverse laboratory platforms and at least two analytical technologies, and demonstrate readiness to proceed to Phase II. Additionally, Phase I will be used to demonstrate the framework for scalable software tool(s) that apply the bioinformatic methods to automate the integration of metabolite data.

Phase I Activities and Deliverables

Establish a project team including proven expertise in metabolomics analytical technologies, epidemiology, biostatistics/bioinformatics, and computer technology. Additionally, a team including expertise in biochemistry/clinical chemistry is preferred.

Participate in the development of a collaboration agreement between the offeror, NCI, and NCI-identified third party sources to access relevant input data types for the proposed project. NCI staff will work with established cohort studies and consortia to provide metabolomics data (identified metabolite data) to successful offerors.

Develop database formats that support the import and export of individual datasets and “combined” datasets, store structured data from different sources of metabolite data, and are readily used for data integration and QC protocols.
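One possible shape for such a database format is sketched below, purely as an illustration. It assumes a two-table layout in SQLite (one row per source dataset, one row per sample and metabolite measurement) with CSV as the interchange format. The table names, column names, and the helper functions import_csv and export_combined are hypothetical and are not specified by this topic.

```python
import csv
import io
import sqlite3

# Hypothetical schema: each measurement is tagged with its source dataset
# so that individual and combined datasets can both be exported.
SCHEMA = """
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    technology TEXT NOT NULL          -- e.g. 'LC-MS', 'GC-MS', 'NMR'
);
CREATE TABLE measurement (
    dataset_id INTEGER NOT NULL REFERENCES dataset(dataset_id),
    sample_id  TEXT NOT NULL,
    metabolite TEXT NOT NULL,
    value      REAL NOT NULL,
    PRIMARY KEY (dataset_id, sample_id, metabolite)
);
"""


def import_csv(conn, name, technology, text):
    """Load one dataset from CSV text with columns sample_id, metabolite, value."""
    cur = conn.execute(
        "INSERT INTO dataset (name, technology) VALUES (?, ?)", (name, technology)
    )
    dataset_id = cur.lastrowid
    rows = [
        (dataset_id, r["sample_id"], r["metabolite"], float(r["value"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?)", rows)
    return dataset_id


def export_combined(conn):
    """Export every stored dataset as one CSV string, tagged with its source."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["dataset", "technology", "sample_id", "metabolite", "value"])
    writer.writerows(conn.execute(
        "SELECT d.name, d.technology, m.sample_id, m.metabolite, m.value "
        "FROM measurement m JOIN dataset d USING (dataset_id) "
        "ORDER BY d.name, m.sample_id, m.metabolite"
    ))
    return out.getvalue()


# Toy usage with an in-memory database and made-up values.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
import_csv(conn, "cohort_a", "LC-MS", "sample_id,metabolite,value\ns1,glucose,5.1\n")
import_csv(conn, "cohort_b", "NMR", "sample_id,metabolite,value\ns9,glucose,4.8\n")
combined = export_combined(conn)
```

Keeping the source dataset and technology on every exported row is a design choice that lets downstream integration and QC steps filter or stratify by origin; an individual dataset export would be the same query restricted to one dataset name.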

Participate in the development of a collaboration agreement between the offeror, NCI, and NCI-identified third party sources to access relevant input data types for the proposed project. NCI staff will work with established cohort studies and consortia to provide metabolomics data (identified metabolites and unidentified peak data) to successful offerors that would serve to: 1) train and validate the expanded bioinformatic methods; and 2) demonstrate the application of these methods through scalable software to automate complex data integration tasks for metabolomics data sources.