Bottom Line:
Single nucleotide polymorphisms (SNPs) provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes.

ABSTRACT: Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
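
As an illustration of what "few distinct common haplotypes" can mean in practice, the following minimal Python sketch tests whether a run of consecutive SNPs qualifies as a block. The abstract does not state the paper's block criterion, so the rule used here is an assumption: an interval counts as a block when haplotype patterns seen at least `min_count` times account for at least `threshold` of all sampled haplotypes. The function names, the default values, and the toy data are hypothetical.

```python
from collections import Counter

def common_haplotype_coverage(haplotypes, start, end, min_count=2):
    """Fraction of sampled haplotypes whose pattern over SNPs [start, end)
    occurs at least `min_count` times in the sample."""
    patterns = Counter(h[start:end] for h in haplotypes)
    common = sum(count for count in patterns.values() if count >= min_count)
    return common / len(haplotypes)

def is_block(haplotypes, start, end, threshold=0.8, min_count=2):
    """Assumed criterion: common patterns must cover `threshold` of the sample."""
    return common_haplotype_coverage(haplotypes, start, end, min_count) >= threshold

# Toy sample (not HapMap data): 6 haplotypes over 5 biallelic SNPs.
haplotypes = [
    "00110",
    "00110",
    "00111",
    "11001",
    "11001",
    "11000",
]

print(is_block(haplotypes, 0, 4))  # only two patterns occur over the first 4 SNPs
print(is_block(haplotypes, 0, 5))  # the 5th SNP splits them into rarer patterns
```

In this toy sample the first four SNPs form a block under the assumed rule, while extending the interval to the fifth SNP does not, because two of the six haplotypes then carry patterns seen only once.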

ijms-16-01096-f002: TagSNPs required to cover 10% increments of the chromosomal region: (a) number of tagSNPs and (b) percentage of tagSNPs.

Mentions:
Figure 1a relates the number of blocks to the percentage of the chromosomal region (common SNPs) that those blocks cover in total. Note that a wide region of the chromosome is covered by only a few blocks. More specifically, in all cases, approximately 40% of the blocks (see Figure 1b) cover 70% of the chromosomal region. Figure 2a shows the number of tagSNPs required for the blocks to cover a certain percentage of the chromosomal region. According to this figure, 8000 tagSNPs are sufficient for 70% coverage of the genome (less than 50% of all tagSNPs, as shown in Figure 2b). This coverage captures most of the haplotype information, confirming that our method embodies most of the regional chromosome information in just a few tagSNPs. Figure 3a shows the average percentage of common SNPs covered by each tagSNP versus the percentage of the chromosomal region covered by the blocks. Note that as more of the chromosomal region is covered by the blocks, fewer common SNPs are covered by each tagSNP (on average). Figure 3b shows the number of SNPs covered per tagSNP for each 10% coverage of the chromosomal region. Interestingly, the marginal utility of tagSNPs decreases with increasing genome coverage. Figure 3c relates the percentage coverage of the chromosomal region to the number of tagSNPs required for each coverage.
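
The coverage curves described above can be reproduced from a list of blocks with a few lines of Python. The sketch below is only illustrative: it assumes blocks are ranked from largest to smallest, and every block size and tagSNP count in it is made up rather than taken from the paper or from HapMap. The names `blocks_needed` and `snps_per_tag` are hypothetical.

```python
from itertools import accumulate

def blocks_needed(block_sizes, target=0.7):
    """Number of largest blocks needed to cover `target` of all SNPs in blocks."""
    sizes = sorted(block_sizes, reverse=True)
    total = sum(sizes)
    for n, covered in enumerate(accumulate(sizes), start=1):
        if covered >= target * total:
            return n
    return len(sizes)

def snps_per_tag(blocks):
    """Average number of SNPs covered per tagSNP over (n_snps, n_tags) pairs."""
    return sum(s for s, _ in blocks) / sum(t for _, t in blocks)

# Hypothetical blocks as (common SNPs in block, tagSNPs for block),
# already sorted from largest to smallest block.
blocks = [(30, 3), (20, 3), (12, 2), (10, 2), (8, 2),
          (6, 2), (5, 2), (4, 2), (3, 2), (2, 1)]

n = blocks_needed([size for size, _ in blocks], target=0.7)
print(n, "of", len(blocks), "blocks reach 70% coverage")
print("SNPs per tagSNP, largest blocks only:", snps_per_tag(blocks[:n]))
print("SNPs per tagSNP, all blocks:", snps_per_tag(blocks))
```

With these made-up numbers, the average number of SNPs per tagSNP is higher for the largest blocks than for the full set, which is the kind of diminishing return per tagSNP that the text attributes to Figure 3.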
