Abstract

The large volume of single nucleotide polymorphism (SNP) data available from high-throughput sources such as HapMap makes data mining a reasonable approach for identifying SNPs that are informative for genetic ancestry. We investigated the distribution and density of SNPs across the genomes of the African and European populations in the HapMap database in order to prioritize candidate SNPs for ancestry mapping in an admixed population. About 4 million SNPs were compared between Africans and Europeans using five measures of ancestry informativeness in current use: absolute allele frequency difference (Δ), Fisher Information Content (FIC), Shannon Information Content (SIC), informativeness for assignment (I), and the fixation index (FST). Each measure yields a different set of candidate ancestry informative markers (AIMs). The selected SNPs are a valuable resource both for controlling population structure and for gene mapping studies. We discuss the overlap among the AIMs selected by the different measures, data filtering strategies, the accuracy of each measure in classifying ancestral populations, and applications in cancer genomics.
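
To make three of the measures named above concrete, the following is a minimal sketch for a single biallelic SNP, given the frequency of one allele in each of two populations. It rests on assumptions not stated in the abstract: the two populations are weighted equally, FST is computed as the simple Wright-style ratio of between-population variance to total heterozygosity, and I is taken to be the informativeness for assignment in the form commonly written I_n (with 0 ln 0 treated as 0). The function names are hypothetical, and FIC and SIC are omitted because their exact definitions are not given here.

```python
import math


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Two-population FST with equal weights (assumed form):
    variance of allele frequency among populations / (p_bar * (1 - p_bar))."""
    p_bar = (p1 + p2) / 2
    het_total = p_bar * (1 - p_bar)
    if het_total == 0:
        # SNP is monomorphic across both populations: no differentiation.
        return 0.0
    return (p1 - p2) ** 2 / (4 * het_total)


def _xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness(p1, p2):
    """Informativeness for assignment (I_n, assumed form) for a biallelic SNP
    and two equally weighted populations."""
    total = 0.0
    # Loop over the two alleles: frequency p and its complement 1 - p.
    for a1, a2 in ((p1, p2), (1 - p1, 1 - p2)):
        mean = (a1 + a2) / 2
        total += -_xlogx(mean) + (_xlogx(a1) + _xlogx(a2)) / 2
    return total


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    for p_afr, p_eur in ((1.0, 0.0), (0.9, 0.1), (0.5, 0.5)):
        print(p_afr, p_eur,
              round(delta(p_afr, p_eur), 3),
              round(fst(p_afr, p_eur), 3),
              round(informativeness(p_afr, p_eur), 3))
```

Under these assumptions, a SNP fixed for different alleles in the two populations gives Δ = 1, FST = 1 and I_n = ln 2, while a SNP with identical frequencies gives 0 for all three. Between those extremes the measures scale differently, which is one reason they can rank candidate AIMs differently.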