Description of Research Expertise

Dr. Stoeckert directs the Computational Biology and Informatics Laboratory. The goal of our work is to help make sense of the enormous amount of biomedical data generated by high-throughput genomic approaches and to synthesize those data into something more than the sum of their parts. To that end, we are developing tools that enable researchers to mine and integrate data from a variety of sources and types of experiments.

The first step in that process is the development of data warehouses that collect and store information in a usable fashion. In one such project, we have been working with David S. Roos, Ph.D., E. Otis Kendall Professor of Biology at Penn, and Jessica Kissinger, Ph.D., at the University of Georgia, to develop a bioinformatics resource center for eukaryotic pathogens, funded by the National Institute of Allergy and Infectious Diseases. Within the resource center, we have built databases that serve research communities interested in specific pathogens. For example, PlasmoDB houses information on the parasites that cause malaria.

To maximize the utility of data warehouses, we must have ways to represent and store data that enable researchers to make connections between experiments and between data from different types of experiments. Therefore, part of my group is involved in knowledge representation and in developing ontologies, which standardize data through the use of controlled vocabularies and relationships. Our goal is to provide the tools, including ontologies, that allow people to annotate their experiments or mark up their papers so that another researcher can efficiently search for and combine particular kinds of results from a variety of sources.
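To illustrate the idea of annotation with a controlled vocabulary, here is a minimal Python sketch. It is not a description of our actual tools: the term identifiers (EX:0001, EX:0002), the synonym lists, the dataset names, and the helper functions are all hypothetical and chosen only for illustration. The sketch maps free-text assay descriptions from different labs to a shared term, so that a single query retrieves every matching dataset.

```python
# Hypothetical controlled vocabulary: term ID -> preferred label and synonyms.
# The IDs and synonyms are invented for this example and come from no real ontology.
VOCABULARY = {
    "EX:0001": {
        "label": "microarray assay",
        "synonyms": {"microarray", "expression array"},
    },
    "EX:0002": {
        "label": "RNA-seq assay",
        "synonyms": {"rna-seq", "rnaseq", "transcriptome sequencing"},
    },
}


def normalize(free_text):
    """Return the term ID matching a free-text description, or None if unknown."""
    text = free_text.strip().lower()
    for term_id, entry in VOCABULARY.items():
        if text == entry["label"].lower() or text in entry["synonyms"]:
            return term_id
    return None


def annotate(datasets):
    """Attach a controlled-vocabulary term to each dataset record."""
    return [{**d, "assay_term": normalize(d["assay"])} for d in datasets]


def find_by_term(annotated, term_id):
    """Return the names of all datasets annotated with the given term."""
    return [d["name"] for d in annotated if d["assay_term"] == term_id]


# Toy records: three labs describe their assays in three different ways.
datasets = [
    {"name": "lab1_study", "assay": "RNAseq"},
    {"name": "lab2_study", "assay": "transcriptome sequencing"},
    {"name": "lab3_study", "assay": "Microarray"},
]

annotated = annotate(datasets)
rna_seq_studies = find_by_term(annotated, "EX:0002")
print(rna_seq_studies)
```

Because the query runs against the shared term rather than each lab's own wording, the two sequencing studies are found together even though they were described differently. A real ontology also encodes relationships between terms, which this sketch leaves out.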

We work with a number of groups on ontology projects, including the Ontology for Biomedical Investigations Consortium, which is a member of the Open Biological and Biomedical Ontologies (OBO) Foundry. I have also been involved in a number of standards projects over the years, and am currently on the board of the FGED Society, which promotes data sharing and standardized representation of data, particularly from genomic experiments.

In addition to building systems that help other researchers maximize the value of their data, my team is involved in model building and network analysis with the aim of discovering new insights into biology. One area we focus on is type 1 diabetes. As members of the Beta Cell Biology Consortium, we have established a data warehouse that houses datasets from consortium members. Our role has also been to help the consortium integrate information from those datasets, as well as from key datasets from outside the consortium, and to put those data into the context of beta cell development and diabetes.

For example, while many researchers look at the list of genes produced in a microarray experiment, we try to go beyond list making and use computational methods to uncover connections between genes. To do that, we are developing networks of genes based on expression data and information from a variety of other sources, including published information on known interactions and computational analyses that predict interactions between genes. Once we have those data, we can start to visualize interacting partners and show where and when they are important in beta cell function and development.
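The general idea of moving from a gene list to a gene network can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our actual pipeline: the gene names, expression values, correlation threshold, and function names are all hypothetical. Here Pearson correlation across samples stands in for the expression-based evidence, and a list of gene pairs stands in for published interactions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sd_x * sd_y)


def build_network(expression, known_interactions, threshold=0.9):
    """Combine co-expression edges with known interactions.

    Returns a dict keyed by sorted gene pairs; each value records the
    correlation (if it passed the threshold) and whether the pair was
    also supplied as a known interaction.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = {"coexpression": round(r, 3), "known": False}
    for a, b in known_interactions:
        key = tuple(sorted((a, b)))
        edges.setdefault(key, {"coexpression": None, "known": False})["known"] = True
    return edges


# Toy data: hypothetical genes measured in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
known_interactions = [("geneC", "geneA")]  # hypothetical published interaction

network = build_network(expression, known_interactions)
for pair, evidence in sorted(network.items()):
    print(pair, evidence)
```

In this toy case geneA and geneB are linked by expression alone, while geneA and geneC are linked only by the supplied interaction. Keeping the evidence type on each edge makes it possible to see which connections are supported by more than one source.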

The approaches and tools we develop in one research arena can often be applied to another one. For instance, we are applying our data integration and analysis approaches to high-throughput sequencing data, including RNA-seq and ChIP-seq data.