Research

Overview

We study stem cell-based developmental biology with original computational methods that build predictive models from high-throughput experiments. We design these experiments with our collaborators in other laboratories to reveal key events during development, including dysfunctions that can lead to human disease. We are also interested in the genetic foundations of human disease, and study the broad question of how an individual's genotype influences their phenotype.

Current focus areas of our laboratory

Motor Neuron Development and Disease (www.stemcell.mit.edu)
Our laboratory leads an interdisciplinary project that seeks to build computational models of the transcriptional regulatory networks that control the differentiation of neural cells. Elucidating these regulatory networks will enable us to define the regulatory processes that determine a cell's progress to its terminally differentiated state, and to understand developmental defects that cause debilitating human diseases such as Spinal Muscular Atrophy. We develop new computational methods for elucidating transcriptional regulatory networks based on the integration of diverse high-throughput experimental data (genome sequence, chromatin structure, transcription factor-DNA binding, gene expression). These methods provide a powerful foundation for discovering the regulatory networks that control cell differentiation during development.

Pancreatic Development (www.syscode.org)
We have developed engineered mouse stem cell lines and computational models of pancreatic development to gain insight into potential therapeutics for diabetes. Our stem cell work is identifying in vitro differentiation protocols to create pancreatic progenitors, and we are experimentally elucidating the molecular events that occur during the development of these progenitors using a variety of high-throughput technologies (RNA-Seq, ChIP-Seq, mass spectrometry). Data from these experiments are processed with computational methods developed by our laboratory to reveal biological mechanisms for further exploration.

The Genotype to Phenotype Problem
Working with other laboratories, we have discovered that different individuals of the same species can require different sets of genes for survival. Genes that are differentially required for survival are called conditional essential genes. Our work uses a yeast model system in which we can identify the genetic suppressors that allow one strain to survive without a gene that is necessary for the survival of another strain. Ultimately, we aim to produce a computational description of the genetic variants that give rise to a common phenotype, using new approaches that reveal complex genetic interactions.

Projects

Transcription factor organization during cellular reprogramming
The ability to reprogram cells from one type to another offers a powerful tool for diverse areas of research and medicine. We study the interplay between genomic sequence, transcription factor binding, and chromatin architecture during cell state change, with the goal of composing simple mechanistic models that explain transcription factor binding dynamics and that can be used in reprogramming systems. We characterize this interplay using DNase-seq, ChIP-Seq, ChIA-PET, and RNA-seq data, focusing on developmental and stem cell differentiation systems along the pancreatic lineage.

Detecting high resolution chromatin interactions from high throughput sequencing data
The primary aim of this project is to better understand the regulation of gene expression through the application of novel computational methods to high throughput sequencing data. In particular, recent work has focused on improving the fidelity and resolution of chromatin interactions learned from ChIA-PET data. We are currently working in collaboration with experimental biologists to characterize the dynamics of chromatin interactions during cellular differentiation.

High resolution analysis of regulatory genome grammars: discovery, modeling and testing
The goal of this project is to develop computational methods that discover human genome regulatory elements at high spatial resolution from high throughput sequencing data such as ChIP-Seq, DNase-Seq and RNA-Seq, to learn models of the regulatory genome grammars, and to test these grammars experimentally using massively parallel reporter assays (MPRA) in order to further improve the grammar models. A deeper understanding of regulatory genome grammars is important for elucidating the mechanisms of gene regulation and for interpreting the functional role of regulatory genetic variation in health and disease.

Computational genetics for model organisms
This project focuses on machine learning and statistical approaches to problems in genetics (model organism and human) and molecular biology. One application is a collaborative project investigating the genetic sources of phenotypic variability in yeast. This involves developing models of genetic complexity and designing and analyzing high-throughput sequencing experiments.

Computational detection of somatic variation
Studies have shown that the somatic cells of an individual do not all share the same genotype. One possible explanation for this somatic mosaicism is that it is caused by genomic changes occurring over the course of development. We are using high-throughput sequencing data to test this hypothesis and to identify particular developmentally programmed variants. More generally, we are interested in computational methods for understanding regulatory genomics.

Computational prediction of chromatin controlling factors
We are working with lineage-structured DNase-seq data to analyze transcription factor binding patterns across a variety of cell types. We discovered a new class of transcription factors that increase chromatin accessibility in a local region, which gives us a way to predict changes to chromatin over time.

Statistical correction for high-throughput sequencing
We are developing methods to take advantage of correlations within and between high-throughput sequencing experiments. This work formalizes the notion that the Poisson distribution, commonly used in sequencing data analysis, does not fit real-world sequencing data well. Rather than asking users to adopt a more complicated distribution, we developed a method that preprocesses and re-weights the data so that existing Poisson-based pipelines work correctly.
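
As a rough illustration of the mismatch described above, and not the laboratory's correction method, the sketch below computes the index of dispersion (sample variance divided by sample mean) for a set of read counts. Under a Poisson model this ratio should be close to 1, so a much larger value suggests a poor fit. The function name dispersion_index and both count lists are hypothetical and were made up for this example.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return sample variance / sample mean for a list of counts.

    For Poisson-distributed data the variance equals the mean, so the
    ratio is expected to be near 1. Values well above 1 indicate
    overdispersion relative to a Poisson model.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; dispersion index is undefined")
    return variance(counts) / m


# Hypothetical per-bin read counts, invented for illustration only.
# Both lists have the same mean (5 reads per bin).
evenly_spread = [4, 6, 5, 3, 7, 5, 4, 6]
bursty = [0, 1, 0, 22, 2, 0, 15, 0]

print("evenly spread:", round(dispersion_index(evenly_spread), 2))
print("bursty:", round(dispersion_index(bursty), 2))
```

The two lists share the same mean, yet the bursty one has a far larger variance, which is the kind of departure from Poisson behavior that a re-weighting step would need to absorb before a Poisson-based pipeline is applied.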