Core B will provide database infrastructure, coordinate data deposition to public resources such as INSDC (the International Nucleotide Sequence Database Collaboration), and coordinate data across the project overall to facilitate comparison with other datasets and deployment of advanced algorithms. It will build on the Knight lab's extensive experience with meta-analysis, sequence databases, and data visualization to provide these methods to Project 1, Project 2, and Project 3, and will work closely with Core A to mirror metabolomics data and integrate metabolomics datasets with the rest of the multi-omics data to be collected. Core B has three Aims. Aim 1: organize the data and metadata collected in Project 1 from mice and Project 2 from humans, curate these datasets, and ensure that analyses are reproducible in an automated fashion using virtual machines. Aim 2: deposit the data and metadata in standards-compliant form to INSDC, the Gene Expression Omnibus, and other resources (e.g., metabolomics repositories) as they emerge. Aim 3: through analyses of existing microbiome datasets, provide best-practices recommendations to investigators in Project 1 and Project 2 to optimize experimental design. Core B will build on an extensive multi-omics data repository, funded by multiple sources, that can accommodate the types of data to be collected in the project overall. These include links between human subjects with defined family relationships (e.g., dizygotic twins), humanized gnotobiotic mice colonized with strains derived from these human subjects, time-series study designs in both humans and mice, combinations of data at multiple levels including 16S rRNA gene sequencing, RNA-Seq, and metabolomics, and other advanced features of this complex project.
A key component of our approach is to enable investigators in the laboratory collecting the datasets to perform their own first-pass analyses. At the same time, the data will be made available more broadly within the project so that additional advanced analyses, such as those being developed in Project 3, can be applied, and to the public in a relatively user-friendly form that supplements deposition in permanent government-backed sequence data repositories.

Public Health Relevance

Efforts to characterize the human gut microbiome in health and disease are producing vast amounts of data about its organismal and gene content and variations. Storing this complex data in a database and depositing it for public use is critical for making the information generally available. We will organize the data and deposit it in major sequence repositories in order to support efforts to understand the role of microbes in obesity and to guide preclinical tests for microbiome-directed therapeutics.