Technical Abstract:
Copy number variation (CNV) is abundant in livestock and differs from single nucleotide polymorphisms (SNPs) in extent, origin and functional impact. Despite progress in CNV discovery, the nucleotide-resolution architecture of most CNVs remains elusive. Using modified versions of open-source variant detection software packages, we have created a high-throughput pipeline that efficiently detects structural variants (SVs), including CNVs, from paired-end next-generation sequencing (NGS) data. Our pipeline uses previously reported SV detection strategies that were evaluated in the 1000 Genomes Project pilot: read depth (RD), read pair (RP), split read (SR) and local assembly (AS). We also use an algorithm called "Precision Aware Merger" (PAM) to combine rare overlapping events and resolve conflicts in the output. As a demonstration of the power of this combined approach to variant discovery, we present preliminary data obtained from a survey of 14 water buffalo individuals. We show that our SV detection methods are complementary and that their results do not overlap significantly. Extensive experimental validation using CGH arrays, qPCR experiments and SNP arrays is ongoing. Our analytical framework and the results it produces will serve as resources for future genome resequencing and animal selection studies.
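
Illustrative sketch (not the PAM implementation): the abstract does not describe how PAM merges calls, so the following Python example only shows one plausible way to combine overlapping SV calls from the RD, RP, SR and AS strategies. The precision ranking, the `Call` record, the closed-interval coordinates and the function names are all assumptions made for illustration. Overlapping calls on the same chromosome are clustered, and each cluster is reported with the coordinates of the call from the method assumed to have the most precise breakpoints.

```python
from dataclasses import dataclass

# Assumed ranking (hypothetical): lower value = more precise breakpoints.
# Assembly and split reads can reach base-pair resolution; read pair and
# read depth are coarser.
PRECISION_RANK = {"AS": 0, "SR": 1, "RP": 2, "RD": 3}


@dataclass(frozen=True)
class Call:
    """One SV call; coordinates are treated as closed intervals (assumption)."""
    chrom: str
    start: int
    end: int
    method: str  # one of "RD", "RP", "SR", "AS"


def _resolve(cluster):
    """Keep the coordinates of the most precise call and list all supporting methods."""
    best = min(cluster, key=lambda c: PRECISION_RANK[c.method])
    methods = sorted({c.method for c in cluster})
    return (best.chrom, best.start, best.end, methods)


def merge_calls(calls):
    """Cluster overlapping calls per chromosome and resolve each cluster."""
    merged = []
    cluster = []
    cluster_end = None
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        same_chrom = bool(cluster) and call.chrom == cluster[0].chrom
        if same_chrom and call.start <= cluster_end:
            # Overlaps (or touches) the current cluster: extend it.
            cluster.append(call)
            cluster_end = max(cluster_end, call.end)
        else:
            # No overlap: close the previous cluster and start a new one.
            if cluster:
                merged.append(_resolve(cluster))
            cluster = [call]
            cluster_end = call.end
    if cluster:
        merged.append(_resolve(cluster))
    return merged


if __name__ == "__main__":
    example = [
        Call("chr1", 100, 500, "RD"),
        Call("chr1", 150, 480, "SR"),
        Call("chr1", 1000, 1200, "RP"),
    ]
    for record in merge_calls(example):
        print(record)
```

In this sketch the RD and SR calls on chr1 collapse into one event that takes the SR coordinates, while the isolated RP call is passed through unchanged. A real merger would likely also consider SV type, reciprocal overlap thresholds and per-call confidence, none of which are specified in the abstract.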