Context-sensitive Methods for Learning from Genomic Data (thesis)

Abstract:

Recent developments in biotechnology have enabled high-throughput measurement of several cellular phenomena, including gene expression, protein-protein interactions, protein localization, and DNA sequences. The wealth of data generated by these technologies promises to support computational prediction of biological network models, but so far, few approaches have succeeded in translating these data into accurate, experimentally testable hypotheses. This dissertation focuses on machine learning and signal processing approaches that use contextual clues often inherent in genomic data to extract useful information and make precise predictions.

First, we describe methods for using microarray technology to detect chromosomal aberrations. Amplification and deletion of portions of chromosomes often serve as a mechanism of rapid adaptation and have been associated with numerous cancers. Accurate and precise identification of when and where these changes occur will help us understand this adaptive mechanism and is an important step towards effective cancer treatment.

Second, we address the more general problem of integrating diverse types of functional genomic data to understand gene function and predict biological networks. We demonstrate that Bayesian methods can leverage the unique noise characteristics of genomic data to predict accurate network models. We illustrate the practical use of these methods in a web-based system that supports intelligent exploration of large repositories of noisy genomic data. We have used this system to generate specific hypotheses about previously uncharacterized genes, many of which have been confirmed through experimental validation.

Finally, this dissertation addresses the question of how to use machine learning methods to direct genome-scale experiments. Until now, most bioinformatics methods have been used exclusively downstream of data-generating experiments. Here, we discuss approaches for using computational predictions to direct further large-scale experiments. We demonstrate that such approaches can dramatically improve the efficiency with which we use high-throughput genomic technology and, ultimately, help us make more novel biological discoveries.