Abstract

Background

Copy number variants (CNVs) account for a large proportion of genetic variation in
the genome. The initial discoveries of long (> 100 kb) CNVs in normal healthy individuals
were made on BAC arrays and low resolution oligonucleotide arrays. Subsequent studies
that used higher resolution microarrays and SNP genotyping arrays detected the presence
of large numbers of CNVs that are < 100 kb, with median lengths of approximately 10
kb. More recently, whole genome sequencing of individuals has revealed an abundance
of shorter CNVs with lengths < 1 kb.

Results

We used custom high density oligonucleotide arrays in whole-genome scans at approximately
200-bp resolution, and followed up with a localized CNV typing array at resolutions
as fine as 10 bp, both to confirm regions from the initial genome scans and to detect
the occurrence of sample-level events at shorter CNV regions identified in recent
whole-genome sequencing studies. We surveyed 90 Yoruba Nigerians from the HapMap Project
and uncovered approximately 2,700 potentially novel CNVs, not previously reported in
the literature, with a median length of approximately 3 kb. We generated sample-level
event calls in the 90 Yoruba at nearly 9,000 regions, including approximately 2,500
regions with a median length of only about 200 bp that represent the union
of CNVs independently discovered through whole-genome sequencing of two individuals
of Western European descent. Event frequencies were noticeably higher at shorter regions
(< 1 kb) than at longer CNVs (> 1 kb).
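
To illustrate what taking the union of CNVs from two sequenced individuals could look like, the following is a minimal Python sketch. It assumes each call is a (chromosome, start, end) tuple in half-open coordinates and that overlapping or touching calls are merged into one region. The function name union_regions and all coordinates are hypothetical and are not taken from the study; the merging rule is one plausible reading of "union", not necessarily the procedure used here.

```python
from statistics import median


def union_regions(*callsets):
    """Merge overlapping (chrom, start, end) calls from several call sets.

    Assumption: coordinates are half-open, so calls that merely touch
    (start == previous end) are also merged into a single region.
    """
    intervals = sorted(iv for calls in callsets for iv in calls)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend that region.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical CNV calls from two individuals (illustrative values only).
individual_a = [("chr1", 1000, 1200), ("chr1", 5000, 5300)]
individual_b = [("chr1", 1100, 1350), ("chr2", 700, 850)]

regions = union_regions(individual_a, individual_b)
lengths = [end - start for _, start, end in regions]
median_length = median(lengths)

print(regions)
print(median_length)
```

With these toy inputs, the two overlapping chr1 calls collapse into one region, so four calls yield three regions. The resulting region list is the set at which sample-level events would then be typed.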

Conclusions

As new shorter CNVs are discovered through whole-genome sequencing, high resolution
microarrays offer a cost-effective means to detect the occurrence of events at these
regions in large numbers of individuals in order to gain biological insights beyond
the initial discovery.