Division of Statistical Genomics and Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA.

Abstract

Interest is increasing in epistasis as a possible source of the unexplained variance missed by genome-wide association studies. The Genetic Analysis Workshop 16 Group 9 participants evaluated a wide variety of classical and novel analytical methods for detecting epistasis, in both the statistical and machine learning paradigms, applied to both real and simulated data. Because the magnitude of epistasis depends on the scale on which penetrance is measured, and therefore, to some extent, on the choice of model framework, it is not surprising that interactions that are strong under one model may be reduced or disappear entirely under a different modeling framework.

Dependence of the existence of epistasis on scale of response and model choice (link function)

Figures 2a and 2b show the interaction between genes G and F, in which the probability of disease (penetrance) for the baseline genotype group is P[disease | G=aa, F=bb] = 0.001, each A allele dose for gene G increases the risk six-fold, and each B allele dose for gene F increases the risk five-fold. On the multiplicative probability scale (Figure 2a), both genes show no dominance and no interaction (all three genotype lines are linear and parallel), which would be the conclusion from a log-linear model. On the log-odds scale (Figure 2b), these same data show strong dominance (non-linear response by genotype) as well as strong gene-gene (G×G) interaction (non-parallel lines), which would be the conclusion from logistic regression. By contrast, Figures 2c and 2d show two other genes, J and K, in which the baseline genotype group penetrance is P[disease | J=cc, K=dd] = 0.5 (which corresponds to odds(P) = 1), each C allele dose for gene J increases the odds four-fold, and each D allele dose for gene K increases the odds two-fold. Here, both genes show strong dominance as well as G×G interaction on the multiplicative probability scale (Figure 2c), which would be the conclusion from a log-linear model, but these same data show no dominance and no interaction on the log-odds scale used by logistic regression (Figure 2d). Hence, the presence or absence of epistasis and/or dominance depends on the scale of the response and therefore also on the choice of model (link function).
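
The scale dependence described above can be checked numerically. The following is a minimal sketch, not the analysis code used by any workshop participant: it rebuilds the two 3×3 penetrance tables from the stated baseline values and per-allele effects, then measures departure from additivity on the log (log-linear) and logit (logistic) scales. The function names and the two contrast measures (`max_interaction`, `max_dominance`) are illustrative assumptions, chosen as simple stand-ins for "non-parallel" and "non-linear" genotype lines.

```python
import math

# Hypothetical helper names; parameter values are taken from the text.
def penetrance_risk_model(g, f, base=0.001, rr_g=6.0, rr_f=5.0):
    """Genes G and F: each allele dose multiplies the disease probability."""
    return base * rr_g ** g * rr_f ** f

def penetrance_odds_model(j, k, base_odds=1.0, or_j=4.0, or_k=2.0):
    """Genes J and K: each allele dose multiplies the disease odds."""
    odds = base_odds * or_j ** j * or_k ** k
    return odds / (1.0 + odds)

def log_link(p):
    return math.log(p)

def logit_link(p):
    return math.log(p / (1.0 - p))

def table(pen, link):
    """3x3 table of transformed penetrances, indexed by allele dose (0, 1, 2)."""
    return [[link(pen(a, b)) for b in range(3)] for a in range(3)]

def max_interaction(pen, link):
    """Largest absolute interaction contrast relative to the baseline cell.
    Zero means the genotype lines are parallel on this scale."""
    t = table(pen, link)
    return max(abs(t[a][b] - t[a][0] - t[0][b] + t[0][0])
               for a in range(3) for b in range(3))

def max_dominance(pen, link):
    """Largest absolute deviation of a heterozygote from the midpoint of the
    two homozygotes. Zero means the response is linear in allele dose."""
    t = table(pen, link)
    first = max(abs(t[0][b] - 2 * t[1][b] + t[2][b]) for b in range(3))
    second = max(abs(t[a][0] - 2 * t[a][1] + t[a][2]) for a in range(3))
    return max(first, second)

if __name__ == "__main__":
    models = [("G, F (risk model)", penetrance_risk_model),
              ("J, K (odds model)", penetrance_odds_model)]
    links = [("log", log_link), ("logit", logit_link)]
    for model_name, pen in models:
        for link_name, link in links:
            print(f"{model_name:18s} {link_name:5s} "
                  f"interaction={max_interaction(pen, link):.4f} "
                  f"dominance={max_dominance(pen, link):.4f}")
```

Under these assumptions, the contrasts for genes G and F vanish (up to rounding error) on the log scale but not on the logit scale, and the reverse holds for genes J and K, which matches the pattern described for Figures 2a through 2d.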