When case-control studies are analyzed, in addition to the columns with genotyping data, extra columns for cases and controls are necessary in order to distinguish the groups.

No extra columns are needed to distinguish groups when Linkage Disequilibrium Analysis or Haplotype Inference is conducted. However, if group information is provided, each group can be analyzed separately.

It is possible to open multiple datasheets in SNPAlyze.

Automatic selection of polymorphic markers

In Ver.5.0 or later, SNPAlyze can select the appropriate polymorphic markers automatically.

SNPAlyze can filter polymorphic markers by three criteria: HWE, MAF and marker type. The filtering can also be applied to registered groups. For example, when the case group does not satisfy HWE due to genetic bias but the control group must satisfy HWE, the filter can be applied to the control group only.

Case-Control Study

Tabulation method of genotype data

Genotype data can be modified for easy evaluation according to your preference. Four methods are available to tabulate genotype data. The first is termed "Automatically" because it automatically defines two types to perform statistical calculations and creates a contingency table as follows:

Genotype model

Allele model

Recessive model

Dominant model
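As an illustration of the four models listed above, the following minimal sketch collapses case and control genotype counts into one contingency table per model. The function name `tabulate`, the (AA, Aa, aa) ordering, and the choice of `a` as the allele of interest are assumptions of this example, not SNPAlyze's internal definitions.

```python
def tabulate(case, control):
    """Build contingency tables from genotype counts.

    case / control: (AA, Aa, aa) counts, where 'a' is the allele of interest.
    Which allele plays that role is an assumption of this sketch.
    """
    def allele(g):      # each person carries two alleles: [A count, a count]
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]

    def dominant(g):    # carriers of 'a' (Aa + aa) vs. AA
        return [g[1] + g[2], g[0]]

    def recessive(g):   # aa vs. the rest (AA + Aa)
        return [g[2], g[0] + g[1]]

    return {
        "genotype": [list(case), list(control)],          # 2 x 3 table
        "allele": [allele(case), allele(control)],        # 2 x 2 table
        "dominant": [dominant(case), dominant(control)],  # 2 x 2 table
        "recessive": [recessive(case), recessive(control)],
    }


# Made-up counts, for illustration only.
tables = tabulate(case=(30, 50, 20), control=(45, 45, 10))
```

Each table has the case group in the first row and the control group in the second row.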

The second is termed "User customize." It lets you define the contingency table manually, so that polymorphic markers can be selected at will.

—————————————————–

Use of Chi-square test
SNPAlyze can apply the Chi-square test and Fisher’s exact test to the constructed contingency table. For a 2×2 contingency table, the odds ratio can also be calculated.
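The three quantities for a 2×2 table can be sketched as follows. This is a textbook version in plain Python (Pearson chi-square without continuity correction, two-sided Fisher's exact test), not necessarily the exact procedure SNPAlyze uses; the function names are hypothetical.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def odds_ratio(a, b, c, d):
    """Odds ratio ad / bc; undefined when b or c is zero."""
    return (a * d) / (b * c)
```

For example, `chi_square_2x2(70, 30, 55, 45)` evaluates a table with 70 and 30 in the first row and 55 and 45 in the second; these counts are made up for illustration.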

—————————————————–

Use of AIC
This software evaluates the relationship between individual SNPs and diseases by the chi-square test and AIC. With AIC, this evaluation can be performed with higher accuracy than with the chi-square test.

The contingency table created as described in the previous section is analyzed under two models, an independent model (AIC (IM)) and a dependent model (AIC (DM)), and the AIC values of the two models are compared. Since the model with the smaller AIC value is the better one,

AIC (IM) > AIC (DM) indicates that the SNP and the disease are dependent,

while

AIC (IM) < AIC (DM) indicates that the SNP and the disease are independent.
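A minimal sketch of this comparison, assuming the usual multinomial form of AIC for contingency tables (AIC = -2 × maximum log-likelihood + 2 × number of free parameters). The source does not state the exact formula SNPAlyze uses, so the parameter counts below are an assumption, and `aic_contingency` is a hypothetical name.

```python
from math import log


def aic_contingency(table):
    """Return (AIC_IM, AIC_DM) for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    r, c = len(rows), len(cols)

    # Dependent model: one free probability per cell, fitted as n_ij / n.
    ll_dm = sum(x * log(x / n) for row in table for x in row if x > 0)

    # Independent model: cell probability = row share * column share.
    ll_im = (sum(x * log(x / n) for x in rows if x > 0)
             + sum(x * log(x / n) for x in cols if x > 0))

    aic_dm = -2 * ll_dm + 2 * (r * c - 1)   # r*c - 1 free parameters
    aic_im = -2 * ll_im + 2 * (r + c - 2)   # (r - 1) + (c - 1) free parameters
    return aic_im, aic_dm


# Made-up counts: rows are case / control, columns are two genotype classes.
aic_im, aic_dm = aic_contingency([[70, 30], [55, 45]])
dependent = aic_im > aic_dm
```

Under this form, AIC (IM) - AIC (DM) equals the likelihood-ratio statistic minus twice the degrees of freedom, so the dependent model wins only when the association outweighs the penalty for its extra parameters.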

—————————————————–

Use of FDR

In case-control studies, SNPAlyze performs multiple testing correction using FDR.
FDR control limits the expected proportion of errors among the tests in which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)
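For the BH option, q-values can be sketched as Benjamini-Hochberg adjusted p-values, as below. This is the standard step-up adjustment written in plain Python; the Bootstrap method is not shown, and `bh_qvalues` is a hypothetical name rather than a SNPAlyze function.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Made-up p-values for four SNPs.
qvals = bh_qvalues([0.01, 0.04, 0.03, 0.20])
```

SNPs whose q-value falls below the chosen FDR level are reported as significant.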

Detailed information containing analysis results and settings is available in text format.

Linkage Disequilibrium Analysis

In this method, the linkage disequilibrium (LD) coefficient is calculated from the difference between the haplotype frequency and the product of the allele frequencies at two arbitrary gene loci. SNPAlyze can output LD coefficients such as the D-value, D’-value, and r^2. In addition, the software can output chi-square and AIC values, and it can display the result of the Linkage Disequilibrium Analysis graphically.
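The coefficients can be sketched from their standard definitions. Here `p_ab` is the frequency of the haplotype carrying allele A at the first locus and allele B at the second, and `p_a`, `p_b` are the corresponding allele frequencies. Returning a signed D' is an assumption of this example; many tools report its absolute value.

```python
def ld_coefficients(p_ab, p_a, p_b):
    """D, D' and r^2 from the A-B haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b
    # D' scales D by the largest magnitude it could take for these frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

With `p_ab = p_a = p_b = 0.5` the two loci are in complete LD, and with `p_ab = p_a * p_b` all three coefficients are zero.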

Display of the haplotype frequency, LD coefficients, and statistics.

—————————————————–

Graphical analysis result of LD analysis.

The LD coefficients and statistics between multiple SNPs can be seen at a glance, and areas with strong LD can be easily identified. The following three display settings are available in SNPAlyze:

Comparative display of the analysis result for two different groups. -Figure 1

Comparative display of the analysis result for two different LD coefficients and statistics. -Figure 2

Superimposed display of the analysis result for two different groups (LD map type of BMP only). -Figure 3

For the comparative display of results between two different groups or two different LD coefficients, the following grid type is also available. The LD coefficient or statistic is displayed in each cell, and each cell can be color-coded according to a preset threshold value.

Haplotype Inference

Estimate haplotype frequency & tagSNP selection

Haplotype candidates in a group and their frequencies are calculated. In addition, you can obtain the diplotype of each individual sample, determined as the maximum-likelihood configuration by the EM algorithm.
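The idea can be sketched with a small EM loop over unphased genotypes. This is a generic textbook version that enumerates every possible haplotype, so it is only practical for a handful of SNPs; it assumes Hardy-Weinberg proportions and a uniform starting point, and it is not SNPAlyze's implementation. Genotypes are coded as the number of copies (0, 1 or 2) of one allele at each SNP, which is also an assumption of this example.

```python
from itertools import combinations_with_replacement, product


def em_haplotypes(genotypes, n_iter=100):
    """Estimate haplotype frequencies from unphased genotypes with EM.

    Returns (frequencies, most likely diplotype for each sample).
    """
    n_snps = len(genotypes[0])
    haps = list(product((0, 1), repeat=n_snps))

    def compatible(g):
        # Unordered haplotype pairs whose allele counts add up to genotype g.
        return [(h1, h2) for h1, h2 in combinations_with_replacement(haps, 2)
                if all(a + b == x for a, b, x in zip(h1, h2, g))]

    options = [compatible(g) for g in genotypes]
    freq = {h: 1 / len(haps) for h in haps}  # uniform start

    def weights(pairs):
        # Posterior probability of each diplotype given current frequencies.
        w = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in pairs]
        total = sum(w)
        return [x / total for x in w]

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for pairs in options:                # E-step: expected haplotype counts
            for (h1, h2), w in zip(pairs, weights(pairs)):
                counts[h1] += w
                counts[h2] += w
        # M-step: re-estimate frequencies from the expected counts.
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}

    best = [max(zip(weights(pairs), pairs))[1] for pairs in options]
    return freq, best


# Made-up data: two SNPs, the last two samples are double heterozygotes.
samples = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freq, diplotypes = em_haplotypes(samples)
```

Only the double heterozygotes are ambiguous here; EM resolves them toward the haplotypes that the unambiguous samples support.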

—————————————————–

Estimation of diplotype distribution

SNPAlyze shows the diplotype distribution calculated during the process of haplotype frequency estimation by using the EM algorithm.

—————————————————–

Output of detailed information

htSNP (tagSNP) combinations are displayed in a "Haplotype detail information" window. This window shows haplotype frequencies and diplotype information as well as htSNP combinations.

Hardy-Weinberg Equilibrium Test

The differences between the counts actually observed and the counts expected under Hardy-Weinberg equilibrium at an SNP site are evaluated by the chi-square test. In addition, SNPAlyze provides the Exact test and the Exact test (Monte Carlo simulation) for cases where the chi-square test is unsuitable.
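The chi-square version of the test can be sketched as follows for one biallelic SNP, comparing observed genotype counts with those expected from the estimated allele frequency. This is the textbook form with one degree of freedom; the exact tests are not shown, and `hwe_chi_square` is a hypothetical name.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium at one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2))    # upper tail, 1 degree of freedom
    return chi2, p_value
```

For example, `hwe_chi_square(25, 50, 25)` matches the expected counts exactly, while a sample with no heterozygotes at all, such as `hwe_chi_square(50, 0, 50)`, deviates strongly.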

—————————————————–

Output of detailed information

Detailed information containing analysis results and settings is available in text format.

—————————————————–

Use of FDR

In case-control studies, SNPAlyze performs multiple testing correction using FDR.
FDR control limits the expected proportion of errors among the tests in which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)

The Cochran-Armitage Trend Test investigates whether genes are associated with a disease by comparing two groups, a patient group and a non-patient group. The analysis assesses whether there is a linear trend between case-control category and allele counts.

—————————————————–

Definition of Contingency table

The distribution of case-control and genotype counts can be put in a 2 × 3 contingency table.
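A minimal sketch of the trend statistic on such a 2 × 3 table, using the standard formula with additive weights (0, 1, 2) for the three genotypes. The weights and the function name are assumptions of this example; the source does not say which scores SNPAlyze uses.

```python
from math import erfc, sqrt


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for genotype counts in cases and controls.

    cases / controls: counts per genotype, ordered by number of risk alleles.
    Returns (chi-square with 1 degree of freedom, p-value).
    """
    n_col = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    k = len(weights)
    # Trend statistic T and its variance under the null hypothesis.
    t = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    var = (R * S / N) * (
        sum(w * w * n * (N - n) for w, n in zip(weights, n_col))
        - 2 * sum(weights[i] * weights[j] * n_col[i] * n_col[j]
                  for i in range(k) for j in range(i + 1, k))
    )
    chi2 = t * t / var
    return chi2, erfc(sqrt(chi2 / 2))


# Made-up genotype counts (0, 1 or 2 copies of the allele of interest).
chi2, p = cochran_armitage(cases=(30, 50, 20), controls=(45, 45, 10))
```

When cases and controls have identical genotype distributions the statistic is zero and the p-value is 1.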

—————————————————–

Display of the results

The overall results and the statistical values of each group are shown in the [Statistics] window and the [Detail information] window. SNPAlyze outputs the test statistics (Chi-square, p-value, FDR q-value, and others) together with information on the loci.

—————————————————–

Use of FDR

In case-control studies, SNPAlyze performs multiple testing correction using FDR.
FDR control limits the expected proportion of errors among the tests in which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)

Differences among several groups in the haplotype frequencies estimated at arbitrary SNP sites are determined, and their significance is evaluated by permutation tests.
This analysis provides each haplotype frequency estimated by the EM algorithm together with the permutation test result.

Moreover, graphs that show the frequency distribution of statistics and the frequency distribution acquired from permutation tests are output.
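The permutation step can be sketched as below. To keep the example short it assumes that the number of copies of one haplotype (0, 1 or 2) is already known for every sample and uses the absolute frequency difference as the statistic; a full analysis would re-estimate haplotype frequencies by the EM algorithm in each permutation. The function name and the statistic are assumptions of this example.

```python
import random


def permutation_test(case_copies, control_copies, n_perm=1000, seed=0):
    """Permutation p-value for a difference in haplotype frequency.

    *_copies: per-sample number of copies (0, 1 or 2) of one haplotype.
    Returns (observed difference, p-value).
    """
    def freq_diff(a, b):
        return abs(sum(a) / (2 * len(a)) - sum(b) / (2 * len(b)))

    observed = freq_diff(case_copies, control_copies)
    pooled = list(case_copies) + list(control_copies)
    n_case = len(case_copies)
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)    # reassign group labels at random
        if freq_diff(pooled[:n_case], pooled[n_case:]) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

The list of permuted statistics is also what a frequency-distribution graph of the permutation results would be drawn from.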

The output items are as follows:
(1) Haplotype block candidate and frequency
(2) htSNP
(3) Frequency of appearance between two haplotypes
(4) LD coefficient between two haplotype blocks (D’-value)
(5) p-value from chi-square test

Furthermore, it is also possible to display the identified blocks visually.

Cooperate with HealthSketch

HealthSketch is a multivariate analysis tool for clinical and/or lifestyle data. The following functions are available when SNPAlyze is used together with HealthSketch.

* The following functions require the purchase of the appropriate version of HealthSketch.

Data passing between SNPAlyze and HealthSketch

Combinational analysis of DNA polymorphism and clinical and/or lifestyle data.
SNPAlyze passes the diplotype configuration of each sample (judged to be the maximum likelihood by the EM algorithm) to HealthSketch. HealthSketch can then perform analyses such as logistic regression using the diplotype configuration.

Use of classification results by clustering using clinical information.
The Clustering function of HealthSketch classifies sample data by clinical and/or lifestyle data. Grouping samples that have similar clinical and/or lifestyle data makes DNA polymorphism analysis more effective.

SNPAlyze performs logistic regression analysis for each SNP. For each SNP, you can calculate the Odds Ratio (OR), the 95% Confidence Interval of the OR, and the p-value of the likelihood ratio test under the Dominant, Recessive and Genotype models.
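For the Dominant or Recessive model, where the only predictor is a single binary variable, logistic regression has a closed-form solution on the 2×2 table, which the following sketch uses instead of an iterative fit. The Wald-type confidence interval and the function name are assumptions of this example; the source does not state how SNPAlyze computes the interval, and the Genotype model (two indicator variables) is not covered here.

```python
from math import erfc, exp, log, sqrt


def logistic_2x2(a, b, c, d):
    """Logistic regression with one binary predictor, via the 2x2 table.

    a, b: cases / controls that carry the genotype class of interest
    c, d: cases / controls that do not
    All four counts must be greater than zero.
    Returns (odds ratio, 95% confidence interval, likelihood-ratio p-value).
    """
    odds = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    ci = (exp(log(odds) - 1.96 * se), exp(log(odds) + 1.96 * se))

    n = a + b + c + d

    def term(obs, row_total, col_total):
        return obs * log(obs * n / (row_total * col_total))

    # Likelihood-ratio statistic against the model without the predictor.
    g2 = 2 * (term(a, a + b, a + c) + term(b, a + b, b + d)
              + term(c, c + d, a + c) + term(d, c + d, b + d))
    p_value = erfc(sqrt(max(g2, 0.0) / 2))     # 1 degree of freedom
    return odds, ci, p_value


# Made-up Dominant-model counts.
odds, ci, p_value = logistic_2x2(70, 55, 30, 45)
```

An interval that excludes 1 and a small p-value both point to an association between the genotype class and case status.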

A SNPAlyze Data file stores the genotyping data and all analysis data together. When you open a file saved in this format, the genotyping data and all analysis data are restored.

You can continue your analysis, or share the genotyping data and all analysis data by distributing this file to other SNPAlyze users. (Please note that this file includes the genotyping data.)

Principal Component Analysis

The scatter plot is drawn from the eigenvectors. The horizontal axis is the first principal component and the vertical axis is the second principal component.
You can use it to check for outlier samples.

Manhattan plot

The p-value of each SNP is calculated from a case-control study using NGS data and is shown on the Manhattan plot. The lower part of the display shows the statistical values of the SNP selected on the Manhattan plot.

These values (p-value, chi-square, degrees of freedom and effect size) are calculated under the Genotype, Allele, Recessive and Dominant models.