Ph.D. and Postdoc positions, Antibiotics Discovery by Metabolomics and Metagenomics

Project description

Microbiomes are the communities of microorganisms living on or in animals, plants, soil, oceans and the atmosphere. Recently, the microbiome has been linked to various diseases, including Crohn's disease, diabetes, asthma and obesity.

Two key techniques for analyzing the microbiome are metagenomics and metabolomics. Metagenomics reveals the types of microbes living in a microbial community, and metabolomics reveals the chemical compounds these microbes produce and their functions. The goal of this project is to develop statistical and computational approaches for analyzing large-scale metabolomics and metagenomics datasets to answer fundamental questions in microbiology, especially questions surrounding the interaction between microbes and the human host and the fight against antibiotic resistance.

It is now clear that microbes produce a much more diverse set of antibiotics than was anticipated, making them a potential gold mine for discovering new antibiotics. Many antibiotics are the products of biosynthetic gene clusters (BGCs). Machine learning techniques have been developed for predicting BGCs in bacterial genomes. While these techniques provide a partial prediction of the structure of a BGC's product from its DNA sequence, antibiotic discovery requires connecting these error-prone predictions to actual molecular structures using metabolomics data, such as mass spectrometry data.
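As a toy illustration of checking error-prone predictions against mass spectrometry data, the sketch below assumes the BGC product is a linear peptide, that the predictor outputs a set of candidate amino acids for each position, and that the observed precursor mass is known. The helper names `peptide_mass` and `match_candidates`, the 10 ppm tolerance and the small residue-mass table are illustrative assumptions, not part of any specific tool; real products are often cyclic or chemically modified, which this sketch ignores.

```python
from itertools import product

# Monoisotopic residue masses (Da) for a few standard amino acids; toy subset only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "F": 147.06841,
}
WATER = 18.01056  # added once for a linear (uncyclized) peptide


def peptide_mass(seq):
    """Monoisotopic mass of a linear peptide given as a string of residue codes."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def match_candidates(candidates_per_position, observed_mass, tol_ppm=10.0):
    """Hypothetical helper: enumerate every sequence consistent with the
    per-position predictions and keep those whose mass matches the observed
    precursor mass within tol_ppm parts per million."""
    hits = []
    for combo in product(*candidates_per_position):
        seq = "".join(combo)
        error_ppm = abs(peptide_mass(seq) - observed_mass) / observed_mass * 1e6
        if error_ppm <= tol_ppm:
            hits.append(seq)
    return hits


# The predictor is unsure at positions 1 and 3; the observed mass resolves it.
print(match_candidates([("A", "S"), ("V",), ("L", "F")], 317.19506))  # ['SVL']
```

Brute-force enumeration grows exponentially with the number of ambiguous positions, so this only makes sense for short products with few uncertain positions.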

Analyzing metagenomics and metabolomics datasets involves solving computational puzzles from error-prone pieces. The metagenomic puzzle amounts to assembling metagenomes from billions of overlapping DNA substrings (called short reads). In the error-free case, this problem is equivalent to finding an Eulerian path in a giant de Bruijn graph. The metabolomic puzzle amounts to breaking the complex antibiotic molecule (represented as a graph) into overlapping subgraphs and measuring the masses of these subgraphs using mass spectrometry (MS). When the subgraphs are paths, this is equivalent to the turnpike problem in computer science: each chemical bond broken by MS corresponds to the location of a highway exit, and the mass of the substructure obtained by breaking each pair of chemical bonds corresponds to the distance between the two exits. The turnpike problem asks to reconstruct the locations of the highway exits from their pairwise distances, and it has a pseudo-polynomial solution. For both the metagenomic and the metabolomic puzzles, it is crucial to make sure that these theoretical solutions still hold for error-prone biological data.
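The error-free metagenomic puzzle can be sketched in a few lines. This assumes idealized input: every k-mer of a single linear genome is observed exactly once and without errors, and reads have already been broken into k-mers. Each k-mer becomes an edge from its prefix (k-1)-mer to its suffix (k-1)-mer, and Hierholzer's algorithm finds an Eulerian path. The function names are illustrative; real assemblers must also handle sequencing errors, uneven coverage and repeats.

```python
from collections import defaultdict


def kmers_of(text, k):
    """All overlapping substrings of length k (an idealized, error-free read set)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def de_bruijn(kmers):
    """Each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one (or anywhere, for a cycle).
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # edges not yet used
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: emit node
    return path[::-1]


def assemble(kmers):
    """Spell the string along the Eulerian path."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


# The order of the input k-mers does not matter.
print(assemble(sorted(kmers_of("ACGTTGCA", 3))))  # ACGTTGCA
```

When the genome contains repeated (k-1)-mers, several Eulerian paths may exist and the one returned can differ from the true genome while still using every k-mer exactly once.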
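To make the turnpike analogy concrete, the sketch below uses the classic backtracking approach: place the two endpoints at 0 and at the largest distance, then repeatedly take the largest unexplained distance and try placing a new point at that distance from either end. This is a swapped-in illustration, not the pseudo-polynomial algorithm mentioned in the text; backtracking is usually fast in practice but can take exponential time in the worst case. It also assumes error-free distances, so every pairwise value must be present exactly. The function name `turnpike` is hypothetical.

```python
from collections import Counter


def turnpike(distances):
    """Return sorted point positions (smallest at 0) whose pairwise distances
    equal the multiset `distances`, or None if no such placement exists.
    Assumes error-free input; a solution and its mirror image are equivalent."""
    remaining = Counter(distances)
    width = max(remaining)  # distance between the two outermost exits
    remaining[width] -= 1
    points = [0, width]
    solution = _place(points, remaining, width)
    return sorted(solution) if solution else None


def _place(points, remaining, width):
    if not +remaining:  # unary + drops zero counts; empty means all explained
        return list(points)
    y = max(d for d, c in remaining.items() if c > 0)
    # The largest unexplained distance must be measured from one of the endpoints.
    for candidate in dict.fromkeys((y, width - y)):  # dedupe when y == width - y
        deltas = Counter(abs(candidate - p) for p in points)
        if all(remaining[d] >= c for d, c in deltas.items()):
            remaining.subtract(deltas)
            points.append(candidate)
            found = _place(points, remaining, width)
            if found:
                return found
            points.pop()  # backtrack
            remaining.update(deltas)
    return None


# Exits at 0, 2, 4, 7, 10 produce these ten pairwise distances.
print(turnpike([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))  # [0, 3, 6, 8, 10]
```

The result is the mirror image of the original exits: pairwise distances alone cannot distinguish a point set from its reflection.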

The puzzle we are interested in is developing error-tolerant approaches that connect metagenomic and metabolomic datasets to discover novel antibiotics. Machine learning predictions of antibiotic structure from metagenomes are error-prone, especially for less-studied microbes, for which only small training datasets are available. This problem resembles that of demodulating digital data from an analog signal corrupted by noise in telecommunications. Since Viterbi decoding provides the optimal (maximum-likelihood) solution to the demodulation problem, it can be used to connect metabolomics data to metagenomic data while allowing for variations.
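The Viterbi algorithm itself is short. The sketch below finds the most likely hidden-state sequence of a hidden Markov model, working in log space. The toy model (hidden states A and B, observed labels a and b, and all probabilities) is made up purely for illustration: one can read the hidden states as true building blocks and the observations as noisy predictions, but this is not a model of real BGC products. It assumes all probabilities are strictly positive.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, its log-probability).
    All probabilities must be > 0, since log(0) is undefined."""
    # score[s]: best log-probability of any path ending in state s so far
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], score[last]


states = ("A", "B")
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"a": 0.8, "b": 0.2}, "B": {"a": 0.2, "b": 0.8}}

path, _ = viterbi(list("aabaa"), states, start_p, trans_p, emit_p)
print(path)  # ['A', 'A', 'A', 'A', 'A']
```

With these toy settings the decoder treats the lone "b" as noise and returns all A's, because switching states twice costs more than one mismatched emission. This kind of trade-off between individual noisy observations and the model's overall structure is the sort of error tolerance the decoding analogy points to.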