Abstract

Recently, there has been great interest in identifying rare variants associated with
common diseases. We apply several collapsing-based and kernel-based single-gene association
tests to the Genetic Analysis Workshop 17 (GAW17) rare variant association data for unrelated
individuals, without knowledge of the simulation model. We also implement modified
versions of these methods that use additional information, such as minor allele frequency
(MAF) and functional annotation. For each of the four traits provided in GAW17,
we use a Bayesian mixed-effects model to estimate the phenotypic variance explained
by the given environmental and genotypic data and to infer an individual-specific
genetic effect for direct use in single-gene association tests. After obtaining information
on the GAW17 simulation model, we compare the performance of all methods and examine
the top genes identified by those methods. We find that collapsing-based methods with
weights based on MAFs are sensitive to the “lower MAF, larger effect size” assumption,
whereas kernel-based methods are more robust when this assumption is violated. In
addition, many of the false-positive genes identified by multiple methods contain variants
with exactly the same genotype distribution as the causal variants used in the simulation
model. When the sample size is much smaller than the number of rare variants, causal
and noncausal variants are more likely to share the same or a similar genotype
distribution. This sharing likely contributes to the low power and large number of
false-positive results of all methods in detecting disease-associated causal variants
in the GAW17 data set.
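The claim that a small sample makes identical genotype patterns more likely can be checked with a toy simulation. The sketch below is hypothetical and is not the GAW17 simulation model: it assumes Hardy-Weinberg genotypes, MAFs drawn uniformly from an arbitrary rare range, and illustrative sample sizes. It counts the fraction of polymorphic variants whose genotype vector across individuals exactly matches that of at least one other variant. Any single-variant test statistic is identical for such a pair, so a noncausal variant that matches a causal one cannot be distinguished from it.

```python
from collections import Counter

import numpy as np


def duplicate_fraction(n_individuals, n_variants, maf_low=0.0005, maf_high=0.01, seed=0):
    """Return (fraction of polymorphic variants sharing an exact genotype
    pattern with another variant, number of polymorphic variants).

    Assumptions (hypothetical, for illustration only): MAFs are uniform on
    [maf_low, maf_high] and genotypes follow Hardy-Weinberg proportions.
    """
    rng = np.random.default_rng(seed)
    mafs = rng.uniform(maf_low, maf_high, size=n_variants)
    # Genotype = number of minor alleles (0/1/2); the MAF vector broadcasts over individuals.
    geno = rng.binomial(2, mafs, size=(n_individuals, n_variants))
    # Drop monomorphic variants, which were never observed in this sample.
    geno = geno[:, geno.sum(axis=0) > 0]
    n_poly = geno.shape[1]
    if n_poly == 0:
        return 0.0, 0
    # Key each variant by the raw bytes of its genotype column across individuals.
    keys = [col.tobytes() for col in geno.T]
    counts = Counter(keys)
    n_shared = sum(counts[k] > 1 for k in keys)
    return n_shared / n_poly, n_poly


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        frac, m_poly = duplicate_fraction(n, 2000)
        print(f"n={n}: {m_poly} polymorphic variants, {frac:.1%} share an exact pattern")
```

With few individuals, most of these rare variants are singletons. Singletons carried by the same person have identical genotype columns, so collisions are common. As the sample grows, each variant's carrier set becomes more distinctive and the shared fraction drops toward zero.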