Thursday, December 5, 2013

Abstract for Talk at University of Connecticut Health Center (i0ccam)

Human Genome Analysis

Plummeting sequencing costs have led to a great increase in the number
of personal genomes. Interpreting the large number of variants in
them, particularly in non-coding regions, is a central challenge for
genomics. We investigate patterns of selection in DNA elements from
the ENCODE project using the full spectrum of sequence variants from
1,092 individuals in the 1000 Genomes Project Phase 1, including
single-nucleotide variants (SNVs), short insertions and deletions
(indels) and structural variants (SVs). We analyze both coding and
non-coding regions, with the former corroborating the latter. We
identify a specific sub-group of non-coding categories that exhibit
very strong selective constraint, comparable to that of coding genes:
"ultra-sensitive" regions. We also find variants that are disruptive
due to mechanistic effects on transcription-factor binding (i.e.
"motif-breakers").

We make extensive use of networks -- contrasting them with linear
annotation -- and describe how we construct a practical instantiation
of the human regulatory network. Using connectivity information
between elements from protein-protein interaction and regulatory
networks, we find that variants in regions with higher network
centrality tend to be deleterious. Indels and SVs follow a pattern
similar to that of SNVs, with some notable exceptions (e.g. certain
deletions and enhancers).
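
As a rough illustration of what "network centrality" can mean here,
the sketch below computes degree centrality (the fraction of other
nodes a node is directly connected to) from an undirected edge list.
The abstract does not say which centrality measure is used, so degree
centrality is an assumption, and the node names and edges are
hypothetical placeholders rather than real interactions.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical toy network; "A".."D" are placeholders, not real genes.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
centrality = degree_centrality(toy_edges)

# Nodes ordered from most to least connected.
hubs_first = sorted(centrality, key=centrality.get, reverse=True)
```

In this toy network "A" touches every other node and would be treated
as the hub; under the pattern described above, variants affecting such
a node would be the ones more likely to be deleterious.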

Using these results, we develop a scheme and a practical tool to
prioritize non-coding variants based on their potential deleterious
impact. As a proof of principle, we experimentally validate and
characterize a small number of candidate variants prioritized by the
tool. Application of the tool to ~90 cancer genomes (breast, prostate
and medulloblastoma) reveals ~100 candidate non-coding cancer drivers.
This approach can be readily used in precision medicine to prioritize
variants.
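
To make the idea of a prioritization scheme concrete, here is a
minimal sketch of one way such a ranking could be set up: each variant
gets a score that adds up the kinds of evidence mentioned above
(ultra-sensitive region, motif-breaking, network centrality), and
variants are sorted by that score. The feature names, the equal
weights and the example variants are all assumptions made for
illustration; this is not the scoring used by the actual tool.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    in_ultra_sensitive_region: bool  # hypothetical annotation flag
    breaks_tf_motif: bool            # hypothetical "motif-breaker" flag
    target_centrality: float         # assumed to be scaled to 0..1


def priority_score(v):
    """Additive score with illustrative, equal weights (an assumption)."""
    score = 0.0
    if v.in_ultra_sensitive_region:
        score += 1.0
    if v.breaks_tf_motif:
        score += 1.0
    score += v.target_centrality
    return score


def prioritize(variants):
    """Return variants ordered from highest to lowest score."""
    return sorted(variants, key=priority_score, reverse=True)


# Made-up example variants, not real data.
candidates = [
    Variant("v3", False, False, 0.1),
    Variant("v1", True, True, 0.8),
    Variant("v2", False, True, 0.2),
]
ranked = [v.name for v in prioritize(candidates)]
```

A simple additive score keeps the ranking easy to interpret: one can
read off which lines of evidence pushed a variant toward the top.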