Outline

Background: Rare causal variants are believed to account for a substantial part of the gap between heritability estimates for common diseases or quantitative traits and the variance explained by known common genetic variants. Modern high-throughput sequencing methods make it possible to identify rare variants in a reasonable number of individuals. These data can then be screened for disease variants or genetic modifiers of traits. Appropriate statistical methods to detect such variants are therefore required.

Method: The publicly available GAW17 dataset is based on a preliminary 1000 Genomes dataset of 697 individuals, combined with a simulated complex disease model that includes intermediate quantitative phenotypes. We used this dataset to compare collapsing and scoring methods for detecting causal genes, with or without rare variants, for three different phenotypes, evaluating each method by the top-gene lists it produced. We analyzed the characteristics of the causal genes in detail (e.g. number of independent causal markers, effect size) and from this derived the conditions under which individual methods perform well. These analyses are complemented by additional simulation studies.
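The collapse-then-rank workflow can be illustrated with a minimal sketch. The abstract does not state which collapsing rule or gene score was used, so everything here is an assumption for illustration. It uses a CAST-style carrier indicator: an individual scores 1 if they carry a minor allele at any marker whose MAF falls below a hypothetical 1% threshold. Genes are ranked by the absolute correlation between that indicator and a quantitative phenotype, and the top `cutoff` genes are kept. The names `collapse_gene`, `top_gene_list`, `maf_threshold` and `cutoff` are hypothetical.

```python
def collapse_gene(genotypes, mafs, maf_threshold=0.01):
    """Collapse the rare markers of one gene into a carrier indicator per individual.

    genotypes: one list per individual of minor-allele counts (0/1/2), one entry per marker.
    mafs: minor-allele frequency of each marker. The 1% threshold is an assumed default.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [1 if any(person[j] > 0 for j in rare) else 0 for person in genotypes]


def _abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_gene_list(genes, phenotype, cutoff, maf_threshold=0.01):
    """Rank genes by |corr(collapsed score, phenotype)| and keep the top `cutoff` names.

    genes: dict mapping gene name -> (genotypes, mafs). The scoring rule is hypothetical.
    """
    scores = {
        name: _abs_corr(collapse_gene(g, m, maf_threshold), phenotype)
        for name, (g, m) in genes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:cutoff]


genes = {
    "A": ([[1, 0], [0, 1], [0, 0], [0, 0]], [0.005, 0.008]),  # rare carriers have high trait values
    "B": ([[0], [1], [1], [0]], [0.004]),                      # rare carriers unrelated to the trait
    "C": ([[1], [2], [0], [1]], [0.30]),                       # common marker only: nothing to collapse
}
phenotype = [3.0, 2.5, 0.1, 0.2]
print(top_gene_list(genes, phenotype, cutoff=1))
```

A smaller `cutoff` corresponds to the more stringent gene-list cut-off discussed in the results. Gene "C" scores zero because it has no rare markers to collapse.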

Results: The collapsing methods performed equally well on this dataset, but the marker scoring methods differed clearly in performance. The minimum statistic was superior when a gene contained a single causal marker with a dominating effect and a relatively liberal gene-list cut-off was used. In contrast, the Hotelling test was superior when a gene contained several independent causal markers and a stringent cut-off was used. Other methods, such as multivariate analysis or LASSO, performed worse than these statistics in most situations. This pattern was broadly the same for all phenotypes and was confirmed by the single-gene analyses and the simulation studies. Analysis of rare variants was useful only for the quantitative phenotypes. Results were highly similar whether the analysis used the original complete GAW17 dataset or a dataset filtered by strict quality control.
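The contrast between the two scoring strategies can be illustrated with a short sketch. The abstract does not define either statistic precisely, so this uses common textbook forms and should be read as an assumption.

- The minimum statistic is taken as the largest absolute pooled two-sample t statistic across a gene's markers. Under equal degrees of freedom, this corresponds to the smallest single-marker p-value.
- The Hotelling test is taken as the two-sample Hotelling T² on the vector of minor-allele counts, comparing cases with controls.

The function name `gene_statistics` is hypothetical. The sketch also assumes every marker is polymorphic, so that the pooled covariance is invertible.

```python
from statistics import mean


def _scatter(rows):
    """Unnormalised covariance (sum of outer products of centred rows) and column means."""
    k = len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(k)]
    w = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) for j in range(k)]
         for i in range(k)]
    return w, mu


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small square a)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gene_statistics(cases, controls):
    """Return (minimum statistic as max |t|, Hotelling T^2) for one gene.

    cases, controls: one genotype vector (minor-allele counts per marker) per individual.
    """
    n1, n2 = len(cases), len(controls)
    w1, mu1 = _scatter(cases)
    w2, mu2 = _scatter(controls)
    k = len(mu1)
    pooled = [[(w1[i][j] + w2[i][j]) / (n1 + n2 - 2) for j in range(k)] for i in range(k)]
    d = [mu1[j] - mu2[j] for j in range(k)]
    scale = 1 / n1 + 1 / n2
    # Single-marker pooled t statistics; the largest |t| is the "best single marker".
    max_abs_t = max(abs(d[j]) / (pooled[j][j] * scale) ** 0.5 for j in range(k))
    # Joint test over all markers: T^2 = d' S^-1 d / (1/n1 + 1/n2).
    t2 = sum(dj * xj for dj, xj in zip(d, _solve(pooled, d))) / scale
    return max_abs_t, t2


cases = [[1, 0], [2, 1], [1, 1], [0, 0], [1, 2]]
controls = [[0, 0], [0, 1], [1, 0], [0, 0], [0, 0]]
print(gene_statistics(cases, controls))
```

With a single marker, T² equals t², so the two approaches coincide. With several markers, T² is always at least the largest single-marker t², because it maximises the squared standardised difference over all linear combinations of markers. It also pays for this with more degrees of freedom. That trade-off fits the pattern reported above: the minimum statistic favours one dominant marker, and the Hotelling test favours several independent ones.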

Discussion: We conclude that the search for causal rare variants is best supported by improved scoring techniques. However, no clear general recommendation can be given, since the performance of each method depends on the structure of the genetic effect.