Abstract

Background

Copy number variants (CNVs), including deletions, amplifications, and other rearrangements,
are common in human and cancer genomes. Copy number data from array comparative genomic
hybridization (aCGH) and next-generation DNA sequencing is widely used to measure
copy number variants. Comparison of copy number data from multiple individuals reveals
recurrent variants. Typically, the interior of a recurrent CNV is examined for genes
or other loci associated with a phenotype. However, in some cases, such as gene truncations
and fusion genes, the target of the variant lies at the boundary of the variant.

Results

We introduce Neighborhood Breakpoint Conservation (NBC), an algorithm for identifying
rearrangement breakpoints that are highly conserved at the same locus in multiple
individuals. NBC detects recurrent breakpoints at varying levels of resolution, including
breakpoints whose location is exactly conserved and breakpoints whose location varies
within a gene. NBC also identifies pairs of recurrent breakpoints such as those that
result from fusion genes. We apply NBC to aCGH data from 36 primary prostate tumors
and identify 12 novel rearrangements, one of which is the well-known TMPRSS2-ERG fusion
gene. We also apply NBC to 227 glioblastoma tumors and predict 93 novel rearrangements
which we further classify as gene truncations, germline structural variants, and fusion
genes. A number of these variants involve the protein phosphatase PTPN12, suggesting
that deregulation of PTPN12, via a variety of rearrangements, is common in glioblastoma.

Conclusions

We demonstrate that NBC is useful for detection of recurrent breakpoints resulting
from copy number variants or other structural variants, and in particular identifies
recurrent breakpoints that result in gene truncations or fusion genes. Software is
available at http://cs.brown.edu/people/braphael/software.html.