Effective selection of informative SNPs and classification on the HapMap genotype data.

Author:

Zhou, Nina; Wang, Lipo

Copyright year:

2007

Abstract:

Background: Single nucleotide polymorphisms (SNPs) are genetic variations that determine the
differences between any two unrelated individuals, so they can be used to identify the source
population of an individual. For efficient population identification with the HapMap genotype
data, as few informative SNPs as possible should be selected from the original 4 million SNPs.
Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify three
populations: Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in
Ibadan, Nigeria, in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo
(CHB+JPT). In that study, 100,736 SNPs were obtained, of which the top 82 could completely
classify the three populations.