Abstract

In genetic mapping of complex traits, scored haplotypes are likely to represent only a subset of all causal polymorphisms. At the extreme of this scenario, observed polymorphisms are not themselves functional, and only linked to causal ones via linkage disequilibrium (LD). We will demonstrate that due to such incomplete knowledge regarding the underlying genetic mechanism, the variance of a trait may become different between the scored haplotypes. Thus, unequal variances between haplotypes may be indicative of additional functional polymorphisms affecting the trait. Methods accounting for such haplotype-specific variance may also provide an increased power to detect complex associations. We suggest ways to estimate and test these haplotypic variance contrasts, and incorporate them into the haplotypic tests for association. We further extend this approach to data with unknown gametic phase via likelihood-based simultaneous estimation of haplotypic effects and their frequencies. We find our approach to provide additional power, especially under the following types of models: (a) where scored and unobserved variants are epistatically interacting with each other; and (b) under heterogeneity models, where multiple unobserved mutations are linked to nonfunctional observed polymorphisms via LD. An illustrative example of usefulness of the method is discussed, utilizing analysis of multilocus effects within the catechol-O-methyl transferase (COMT) gene.

Introduction

There is a substantial body of work on characterizing haplotype-trait associations using unrelated individuals. Simple methods of dealing with the unknown haplotype phase in association tests have been proposed [Schaid et al., 2002; Zaykin et al., 2002; Luo et al., 2006]. Xie and Stram [2005] showed asymptotic equivalence of two common types of these approaches. These methods have been found to provide adequate inference concerning both hypothesis testing, and association parameter estimation, and have been recommended for usage [Stram et al., 2003; Kraft et al., 2005; Xie and Stram, 2005; Kraft and Stram, 2007]. Tzeng et al. [2006] extended these approaches to incorporate evolutionary clustering of haplotypes. Unbiased estimation of association parameters may require a simultaneous estimation of haplotype frequencies and association parameters. The maximum-likelihood methods are theoretically preferable [Lin and Huang, 2007; Allen and Satten, 2008]. A variety of such methods have been proposed [Tregouet et al., 2002; Epstein and Satten, 2003; Stram et al., 2003; Zhao et al., 2003; Shibata et al., 2004; Lin et al., 2005; Lin and Zeng, 2006]. In this article, we incorporate haplotype and diplotype-specific variances into the likelihood for unphased data. The motivation for this extension comes from a scenario where haplotypes under investigation are either linked via LD to causal variants, or represent only a part of all causal variation. In both cases, the effect of an observed variant (i.e. haplotype) on the trait (Y) is a weighted average of the effects that correspond to all relevant polymorphisms considered jointly. The weights are given by the frequencies of these unobserved joint polymorphisms. To simplify the exposition, we first assume that the observed variant is an SNP “A” with alleles A1, A2. We denote haplotypes that include A1 by hA1, and assume no dominance effects. Then the effect of A1, i.e. the expected value E(Y|A1) is

μA1=∑j∈hA1μjpj∑j∈hA1pj

(1)

where j is indexing over all haplotypes hA1 that carry the allele A1, μj denotes E(Y|hj), and pj is the population frequency of the j-th haplotype (hj). Unless A is the only polymorphism that is relevant with regard to the trait under study, there is uncertainty associated with the frequencies pj, because the haplotypes are partially typed. This leads to unpredictability of the effect size, μA1. As the frequencies vary, the observed effect changes in value, and might even change its sign [Lin et al., 2007; Zaykin and Shibata, 2008], thus the power to detect associations can be greatly reduced. This is a consequence of “partial knowledge”: if the causative haplotypes are completely typed, the effects are just μj, independent of the frequencies.

Using a similar notation, the variance of a trait among individuals carrying A1 is

VA1=∑j∈hA1pjVj∑j∈hA1pj+∑j∈hA1pj(μj−μA1)2∑j∈hA1pj

(2)

where VA1 Var(Y|A1), and Vj Var(Y|hj) - the variance among individuals carrying a haplotype hj. We can assume that all Vj are equal to a common value, σ2, and even then the allele-specific variance remains elevated compared to σ2, due to differences between the underlying effects:

VA1=σ2+∑j∈hA1pj(μj−μA1)2∑j∈hA1pj

(3)

The contrast between the variances, VA1vs. VA2 can also be unequal, and this can be incorporated into association methods. Models where A is either a functional locus, or a variant in LD with nearby causative polymorphisms can produce situations where VA1 ≠ VA2, even when the population frequencies pj are such that there is no effect at the observed locus, μA1 = μA2 [Zaykin and Shibata, 2008].

The main idea of the approach that we develop here is to estimate and test the variance contrast, VA1 vs. VA2, in addition to, or simultaneously with the usual inference about equality of the genetic effects, μA1 vs. μA2. We extend this approach to the case where A is represented by multi-SNP haplotypes, estimated from unphased, single-locus genotype data.

Our approach is based on random population samples, and we assume that there is no statistical confounding, such as confounding due to population stratification. In the absence of confounding, both mean and the variance contrasts are interpreted in a similar manner: μA1 ≠ μA2, or VA1 ≠ VA2 indicate either (a) a functional involvement of the observed polymorphisms, or (b) association via LD with nearby, unobserved functional variants [Zaykin and Shibata, 2008]. In the first case, the unobserved factor, B (e.g. a factor with two levels, B1,B2) can be non-genetic, such as an environmental factor that interacts with the observed variation.

Methods

We are concerned with haplotype and diplotype association methods that are capable of dealing with unphased data. Nevertheless, we start with a single di-allelic (A1, A2) SNP, because this allows us to succinctly describe the essence of the methods. We define the following three null hypotheses (H0) of interest:

H0μ,V:(μA1=μA2;VA1=VA2)H0μ:(μA1=μA2)H0V:(VA1=VA2)

(4)

with the corresponding likelihood ratio test statistics. A rejection of the more general hypothesis,
H0μ,V, indicates that there are differences in either means or the variances of the trait between the alleles A1 and A2. The other two hypotheses are formed specifically regarding the allelic means or the variances. Genotypic rather than allelic-based hypotheses and tests are similarly defined. For example, the H0 regarding the genotypic means is stated as
H0μ: (μA1A1 = μA1A2 = μA2A2). The likelihoods are constructed assuming the normal distribution for the trait. Log-likelihood contributions are added over individuals in the sample, which are assumed to be independent. L0 in the following subsections refers to the log-likelihood under the common mean and the variance for all alleles or genotypes; L1 allows for allele- or genotype-specific means and variances, L2 allows for allele- or genotype-specific means and common variance, and L3 allows for allele-or genotype-specific variances. The estimates , used in the likelihoods are specific to alleles or genotypes at the scored locus, A: it appears difficult to take advantage of incorporating mixture proportions pj given in equations (1, 2, 3), since these depend on the unknown factor, B.

Allelic-based tests

The model that corresponds to allelic tests and estimates is also referred to as the additive model. To construct allelic likelihood ratio test statistics (LRTs) that correspond to our hypotheses,
H0μ,V,H0μ,H0V, we define the following log-likelihoods:

Tests for the means as well as tests for the variances can be constructed while allowing for the other parameter to be different between alleles. There is a concern that the test for the mean with unequal variances (which we denote by Tμ*) might be less robust than the test based on the common variance, Tμ. Thus, we investigate both versions of the LRTs. The LRT for the variance is denoted by TV. The combined test for the both parameters, Tμ,V, is obtained in a straightforward way as a difference of the likelihoods 2(L1 − L0). This difference is equal to the sum, Tμ + TV. We also consider a different combined test, Tμ*,V, based on the sum, Tμ* + TV. The LRTs are:

The notation indicates that, for example, Tμ,V, has an asymptotic chi-square distribution with two degrees of freedom. The degrees of freedom are given by the difference in the number of parameters between the likelihoods forming the LRT.

Genotypic-based tests

Genotypic rather than allelic-based tests are constructed similarly. The mean and the variance estimators for the homozygote A1A1 are obtained as

μ^A1A1=∑i∈A1A1nA1A1YinA1A1;V^A1A1=∑i∈A1A1nA1A1(Yi−μ^A1A1)2nA1A1

(8)

These are similarly defined for the genotypes A1A2 and A2A2. The log-likelihoods are

The dominant and the recessive models are defined in the same way as the full genotypic, but without making distinctions between certain genotype classes. For example, in the dominant model we estimate A1A1 +A1A2, A1A1+A1A2, A2A2, A2A2, and form the likelihoods according to these four, rather than six parameters.

Haplotype and diplotype tests for unphased data

The LRT statistics for the situation with multiple SNPs and the unknown haplotype phase have a form similar to the statistics just described. However, because of the phase uncertainty, the form of a likelihood function is now

L∝∏in∑haplotypepairsPr(hs,hr∣H)φ(Yi;μ^sr,V^sr∣hr,hs)

(11)

where H are sample haplotype frequencies and Pr(hshr|H) is a per-subject probability of a haplotype pair hs, hr, under the assumption of Hardy-Weinberg equilibrium (HWE). Haplotype and diplotype means, variances as well as sample frequencies of haplotypes are estimated simultaneously via an EM algorithm. An extension to multiple haplotypes tested all at once in an overall test is straightforward. The likelihoods described above simply incorporate more parameters, specific to means and variances of multiple haplotypes or diplotypes. When the haplotype phase is known, the overall test statistics for the k haplotypic means can be shown to take the form

Tμ=∑i=1kni(μ^i−μ^)2V^

(12)

Tμ∗=∑i=1kni(μ^i−μ^)2V^i,

(13)

where ni are haplotype counts. Under the normal model, the haplotypic overall tests for the means are analogous to the “2N” ANOVA described in Zaykin et al. [2002], where asymptotic comparisons were made with a corresponding trend test. The variance test has the form

TV=ln(V^)∑i=1kni−∑i=1kniln(V^i),

(14)

This statistic is equivalent to the Bartlett test [Bartlett, 1937]. Under
H0μ,V and normality, it is independent of the statistic that is based on the ratio of the mean difference and the pooled variance. Thus, asymptotically equivalent forms of the LRTs for testing the means and the LRTs for the variances are asymptotically independent. Frequencies of rare classes become critical with regard to the properties of an overall test. In our comparisons, we used an alternative overall test, the Simes test, based on p-values obtained from tests on individual haplotypes. Simes [1986] described this combination test that allows for dependencies among p-values. The Simes test rejects the overall H0 at the level α if p(i) ≤ iα/k for at least one i, where k is the number of tested haplotypes, and p(i) are ordered p-values. Equivalently, the overall p-value is given by min{kp(i)/i}. Thus, this approach is to take one haplotype (hi) at a time, contrasted against the group of all other haplotypes, hī. As we will verify by simulations, this test can be more powerful than a multi-parameter overall test when only a small proportion of all haplotypes carry pronounced effects. Although the Simes test is an overall test with regard to the family-wise error rate, it has the same form as the false discovery rate (FDR) controlling procedure by Benjamini and Hochberg [1995] that can be used to make statements for a specific set of haplotypes that clear the FDR threshold.

A note should be made regarding the interpretation of results for the variance-specific tests. When a haplotype hi is compared against a combined group that consists of all remaining haplotypes, hī, a possible heterogeneity of effects combined within hī may lead to an increase of the variance associated with that group (Appendix 2). In addition, the variance-specific haplotypic tests are sensitive to a variance contrast that can be induced by strong dominance deviations. Thus, diplotypic, rather than haplotypic tests, especially when the diplotypes hi/hi, hi/hī (compared to hī/hī) are found to be associated with an increased variance, provide a more clear evidence for an additional unknown functional variation outside of the studied set of SNPs.

The normality assumption can be relaxed in our implementation by utilizing a resampling distribution of the test statistics for computing p-values of the tests. Asymptotic tests appear to offer a reasonable performance, unless the expected count of a tested variant is low. Nevertheless, it is a good strategy to verify low p-values with a permutational test. Trait transformations to normality, such as the common Box-Cox transformation [Box and Cox, 1964] can also be considered. The common-variance test for the mean, Tμ appears to be robust to deviations from normality (Results section). Similar tests contrasting means for the dominant, recessive, and diplotype models were described in Shibata et al. [2004].

The two combined statistics considered here can be factored as sums, e.g. Tμ,V = Tμ+TV. That is, these statistics are sums of two chi-square contributions, which are independent under the hypothesis that all means and variances are the same, and assuming normality. At a reviewer’s request we will describe the relation of the p-value for this test with p-values obtained via combination methods, from p-values of the two other tests. Denote p-value for testing the means by p1, and p-value for testing the variances by p2. Zaykin et al. [2007] described properties of a method by which L p-values are combined as
pc=1−GλL[∑LGa−1(1−pi)], where Gx is a cumulative Gamma distribution function with the shape parameter equal to x, and an arbitrary rate parameter. Setting λ = 1 gives Fisher’s combined p-value, which in the case of just two p-values simplifies to pc = p1p2 [1 − ln(p1p2)]. This p-value corresponds exactly to the p-value for the genotypic/diplotypic test, Tμ,V, or Tμ*,V, because Fisher’s test is based on a sum of two degree of freedom chi-squares. The allelic/haplotypic p-value for testing
H0μ,V is equivalent to pc obtained with λ = 1/2, because in this case, single degree of freedom chi-squares are being added. With only two p-values, the combined values are expected to be similar for different values of λ. Thus, the above simplification for Fisher’s pc gives a straightforward way to examine the expected behavior of the test for the combined hypothesis. In particular, the test is expected to provide an additional power only when both of the individual null hypotheses are false. Of note, the solution of p = p2[1 − ln(p2)] shows that when p1 = p2< ≈ 0.285, the combined value is smaller than the individual values; otherwise, it is larger. Elston [1991] showed that as L becomes large, this “critical value” approaches 1/e.

Cohort description

To illustrate an application of the proposed association tests, we performed an analysis of the data collected within a larger prospective study, concerned with the pain sensitivity as a risk factor for the development of facial pain [Diatchenko et al., 2005].

Subject recruitment

One hundred ninety six healthy European American pain-free females with an age range of 18 to 34 years were genotyped and phenotyped. Phenotypic procedures and demographic characteristics of the cohort at the time of recruitment were described previously [Diatchenko et al., 2005].

Pain phenotyping procedures

Subjects were phenotyped with respect to their sensitivity to pressure pain, thermal pain, and ischemic pain. A summary measure of thermal threshold was used in this study as it has been shown to be the most sensitive measure with respect to association with catechol-O-methyltransferase (COMT) genotypes, and has the lowest measurement error [Diatchenko et al., 2006, 2005]. A summary measure was calculated by summing centered and scaled individual measures of thermal pain threshold, collected as subjective estimates of rating thermal stimuli that were applied to the skin overlaying the right masseter muscle, right forearm, and dorsal surface of the right foot. Positive values imply higher pain thresholds, and negative values imply lower pain thresholds.

Results

We will first present results of theoretical models, followed by the analysis of association of pain characteristics with the COMT polymorphisms. In the theoretical part of the section, we will refer to a model where the observed and the unobserved loci are denoted by A and B, correspondingly. We will study situations where A and B represent loci with multiple haplotypes. When both A and B are either di-allelic, or when A2, for example, can be considered as the “not-A1” group of haplotypes, we will denote the vector of four effects, μAiBj = E(Y|Ai, Bj), by M, and the four population frequencies by P:

Haplotype

P

M

A1B1

pA1B1

μA1B1

A1B2

pA1B2

μA1B2

A2B1

pA2B1

μA2B1

A2B2

pA2B2

μA2B2

Non-interactive models

First, we consider models where both observed (A) and unobserved (B) loci are functional, contributing additively, with no interaction between the loci. In the di-allelic case, the array of effects takes the form M = {κA1 + κB1, κA1 + κB2, κA2 + κB1, κA2 + κB2}, where κ denote additive contributions of respective alleles. Figure 1 shows logs of theoretical mean and variance contrasts for this model: ln(μA1/μA2), and ln(VA1/VA2), as functions of the frequency of A2B1. The frequency pA1 was set to the frequency of a common COMT haplotype, 0.3034 (Table 1). We used pA1B1 = 0.3034; pA1B2 = 0 in the left graph, and pA1B1 = pA1B2 = 0.3034/2 in the right graph. We assumed κA1 = 2, κA2 =3, κB1 =4, κB2 =1. The graphs show an extended region of VA1 ≠ VA2. To evaluate the proposed statistical tests, we took random samples of n = 500 from diploid, randomly mating populations, that represent various points along the A2B1 frequency axis of the left graph of Figure 1. The phenotype for the subject i with the haplotype pair AsBt/AuBv in these and the subsequent simulations was generated as Yi = μAsBt + μAuBv + ξi; where ξi ~ N(0, σ2). The locus B was assumed to be unobserved. Unless stated otherwise, all simulations used at least 10,000 samples of 500 individuals and the nominal 5% level for the tests.

COMT Haplotype frequencies and parameters used in simulation experiments

The left three columns of Table 2 show resulting power values for a single SNP. Because the goal was to evaluate the relative performance of the tests, the σ2 values, as shown in the table, were taken to be different for each setting, so that medium to high power values could be obtained. The corresponding results for unphased data, with population haplotype frequencies, modeled after the COMT, are given in the right columns of the table. In the case of multiple unphased haplotypes, the haplotype AGCG corresponds to the allele A1, while the group A2 is composed of the seven remaining haplotypes, with their respective frequencies.

The results show a good correspondence of power values for the di-allelic case (Table 2, left three columns), with the case of unphased haplotypes (right columns of Table 2), with some loss of power due to phase ambiguity. There is also a cost of considering diplotype, rather than haplotype associations in this case, because the underlying model consists of phenotypic contributions of haplotypes, and lacks dominance deviations due to entire diplotypes. These findings will be confirmed in the subsequent simulations as well. In these simulations, the unequal variance test for the mean is slightly, but consistently more powerful than the test based on the common variance. This power advantage appears to be confined to cases when the minor allele (or a tested haplotype with frequency less than 0.5) is associated with a decreased variance. There is a power advantage associated with the inclusion of the variance parameter in the range of pA1B1 from 0.25 to 0.65. On the other hand, the variance contrast in the right graph of Figure 1 gives a substantial power boost only at the left part of the graph (results not shown). The difference between the two graphs is that the frequency of A1 in the left graph is confined to one of the haplotypes. This results in the complete standardized LD, D′ = 1. With interactive models, high LD is no longer a requirement for the variance contrast to provide a power advantage, as will be illustrated next.

Interactive models

The previous model was modified to include an epistatic component ε, as M = {ε + κA1 + κB1, κA1 + κB2, κA2 +κB1, κA2 + κB2}. Figure 2(a) is a such a modification of the model that was used to produce Figure 1(b), now with ε = 2. The effect of it is that the variance contrast curve is pushed upward, compared to the previous picture. This is prominently reffected in the power values for the test that includes the haplotype-specific variance (Table 3). The advantage of including the variance parameter now extends throughout the entire range of the haplotype frequency. Moreover, this finding holds under linkage equilibrium (LE) as well (Figure 2(b), Table 4). Under LE, the behavior of the ratio of the haplotypic means and variances depends only on allele frequencies at the unknown locus [Zaykin and Shibata, 2008], thus the x-axis of Figure 2(b) tracks the frequency of B1. Power of the the unequal variance test for the mean is slightly worse than that for the test based on the common variance, however, its size is closer to the nominal 5% level in the presence of variance heterogeneity.

Models of association via LD

Models of association via LD (proxy models) assume an unobserved functional locus B, while the observed locus A associated via LD with the B has no functional involvement. In the di-allelic case, the array of effects is M = {x, y, x, y}. Graphs for this model (Figure 3) bear similarity to the graphs for the non-interactive model (Figure 1). The distinction is that the behavior in this case is completely governed by LD: both the mean and the variance contrast curves cross zero at the same point. Figure 3(a) shows the LD correlation instead of the D′, because D′ is equal to one throughout this graph. The power gained from the inclusion of the haplotype-specific variance extends through only a limited part of the graph (Table 5). Nevertheless, in the remainder of the table, the association is so pronounced that it is easily detected with high power by either of the tests (
H0μ,V or
H0μ), even though very large values of σ2 were assumed. For Figure 3(b), where the LD is not complete, there is no advantage in including the haplotype-specific variance into the test (results not shown) - unlike with interactive models, high LD is required for proxy models to induce a substantial variance contrast.

The LD proxy model can be extended to multiple mutations, B1, …, BK. In this case, the resulting mixture distribution at the observed haplotypes linked via LD with the B may provide considerable power advantages for the test that includes the variance contrast. The trait distribution at the scored haplotypes is a mixture with weights that depend on pBiA (Appendix 1). Table 6 lists power values for a model with ten mutations B1, …, B10. The type-I error entries in this case correspond to a situation when we are examining a “wrong” SNP, i.e. an SNP in LE with the mutations Bi. The distribution of the trait in this case is still a mixture, although there is no contrast between the distributions at the scored haplotypes. The inclusion of the variance parameter yields a moderate gain in power. The gain is highest with the allelic/haplotypic test, because of the lack of dominance deviations attributable to diplotypes. The test Tμ* is found to be slightly more powerful than Tμ in these simulations.

LRT power values under the heterogeneity model (10 mutations, B1,…,B10).

Association of COMT polymorphisms with the pain threshold

Table 7 shows an application of the LRT tests to data on five polymorphisms in the COMT gene. COMT is an example of a gene where multiple interacting variants have been suggested to influence variation in several complex phenotypes. COMT codes for an enzyme that metabolizes catecholamines, such as dopamine (DA), norepinephrine (NE) and epinephrine (Epi), and thus affects an array of cognitive-affective traits, including pain perception [Diatchenko et al., 2005; Nackley et al., 2007; Egan et al., 2001; Enoch et al., 2003; Oroszi and Goldman, 2004; Meyer-Lindenberg et al., 2006]. Common nonsynonymous variation val158met (rs4680) has been shown to influence thermostability of the enzyme [Lotta et al., 1995], however, the associations between the low-activity Met158 allele and numerous complex phenotypes [Egan et al., 2001; Enoch et al., 2003; Karayiorgou et al., 1999; Zubieta et al., 2003; Oroszi and Goldman, 2004] have been often modest, and occasionally inconsistent. Repeated retesting of the val158met has been partially driven by numerous molecular and biochemical studies confirming that Met at position 158 does lead to a 3–4 times lower enzymatic activity of COMT at both cell culture and organismal levels [Lotta et al., 1995; Mannisto and Kaakkola, 1999]. There is evidence that additional functional SNPs in the COMT gene locus can modulate COMT activity, as supported by a number of recent positive association studies and molecular work using cell-based assays [Oroszi and Goldman, 2004; Meyer-Lindenberg et al., 2006; Mannisto and Kaakkola, 1999]. The list of potential functional sites includes the following SNPs: rs2097603 in the promoter region of brain-expressed, membrane-bound (MB-) COMT form [Palmatier et al., 2004; Zhu et al., 2004; Chen et al., 2004], rs737865 upstream in the intron 1, and rs165599 in the 3′ untranslated region [Shifman et al., 2002; Bray et al., 2003]. Furthermore, in studying the association between COMT genotypes and variability in human pain perception, it has been found that the val158met polymorphism alone was not significantly associated with a derived measure of global pain sensitivity [Diatchenko et al., 2005]. Instead, three common haplotypes of the COMT gene, consisting of two synonymous (rs4633 and rs4818) and one nonsynonymous val158met SNPs are coding for different levels of enzymatic activity. Corresponding differences in pain sensitivity are associated with regulation of the translation efficiency through haplotype-dependent secondary mRNA structure [Nackley et al., 2006]. Thus, COMT contains at least five functional polymorphisms that potentially impact the index of pain sensitivity. Interactions of functional alleles at COMT imply that the genetic effects may not be easily inferred from the information on one SNP at a time, and that the SNP-specific effects may in fact be misleading. In our application, we tested four SNPs which have been previously independently associated with COMT-dependent phenotypes: rs2097603, rs737865, rs4680 and rs165599. An additional SNP rs4818 was chosen as a major contributor to functional COMT haplotypes that together with SNP rs4680 defines three functional haplotypes and influences pain sensitivity [Diatchenko et al., 2005]. All SNPs were found to conform to Hardy-Weinberg expectations. As a functional measure of COMT activity we used sensitivity to noxious stimuli. A summary measure of thermal threshold has been chosen for this study as it has been shown to be the most sensitive measure with respect to association with COMT genotypes, and has a small associated measurement error [Diatchenko et al., 2005, 2006, 2007].

Two SNPs, rs737865 and rs4818 with the corresponding LD correlation of r = 0.5, show a significant mean effect as well as an indication that variability might be different between individuals carrying different alleles. There is a good correspondence between the asymptotic and the resampled p-values for the mean and the variance tests (based on 50,000 trait value permutations). There is less correspondence for the combined multilocus tests, Tμ,V, which could be attributed to insufficient closeness to normality, small sample size, and accumulation of errors from Tμ and TV that constitute the combined statistic. The major alleles, A of rs737865, and G of rs4818 are associated with the same direction of the effect, as well as with the variance increase. Haplotypes and diplotypes that carry either AG or GC combinations of alleles show significance associated with the variance parameter. The genotypic/diplotypic tests were included in those cases where there was an indication of a contrast in variance at the allelic/haplotypic test, to rule out a possible variance increase due to dominance effects. Consistent with previous publications [Diatchenko et al., 2005, 2006, 2007], homozygotes for AG haplotype display the highest pain thresholds and the lowest sensitivity to pain, while homozygotes for GC haplotype display the lowest pain thresholds and the highest sensitivity to pain. Among the four haplotypes with the frequency of at least 0.05 that include all five markers and carry the opposite effect alleles at rs737865 and rs4818, one shows a significant variance effect (haplotype AAGGA). Results found to be significant at the 5% level by the Tμ, were also found to be significant by the test Tμ* (data not shown). However, there was less correspondence between the asymptotic and the resampled values for the Tμ*. There is an issue of multiple testing that should be accounted for in the interpretation of these results. Nevertheless, prior findings and the significance of the overall linear model that includes additive effects of all five SNPs (p=0.013) suggest the existence of an association in the region. The haplotype AGCGA with the significant mean effect corresponds to a high pain sensitivity (“HPS”) haplotype of COMT [Diatchenko et al., 2005] that codes for a low translation efficiency of COMT. This leads to a 25-fold lower enzymatic activity when compared with the low pain sensitivity haplotype, LPS [Nackley et al., 2006]. The LPS corresponds to the haplotype AAGGA of the present study. A significant variance contrast at rs737865 and an increased variance associated with diplotype classes that include the haplotype GC point to a possibility of additional unobserved functional variants that have not been typed in this study. In those cases where a haplotype (or a diplotype) of interest shows an associated decreased variance relative to the group that includes several other haplotypes, a possibility should be examined that the variance in the combined group is increased due to a mixture of effects associated with heterogeneous haplotypes in that group.

Parameter estimates, type-I error, and overall tests

When testing one haplotype at a time, a given haplotype hi is contrasted against the rest of the haplotypes, collapsed into a combined group, hī. Population means and variances for additive, recessive, dominant, and diplotypic models of analysis can be written in terms of the population frequencies and effects of haplotypes. These theoretical values are given in Appendix 2, and can be compared to the output of our EM algorithm for unphased data. The hypothesis tests assume a normal distribution, however the test Tμ for comparison of haplotypic means is robust to deviations from normality. The parameter estimates appear to be precise for non-normally distributed traits. In this section, we assumed haplotype contributions to the phenotype to follow either a normal or a Gamma distribution. For a haplotype hi in the Gamma model, its contribution is represented by a random variable Xi ~ Gamma(αi, βi), with parameters given in Table 1. Two haplotype contributions combine in an individual as Xi + Xj. A trait histogram for a sample under this model would show a decidedly skewed distribution. The results (Table 8) using samples of 300 individuals show good correspondence with theoretical values, calculated according to Appendix 2.

Frequencies of the first five COMT haplotypes, as given in Table 1, cover the range from 0.064 to 0.3. These haplotypes were used to evaluate the type-I error rates as a function of frequency. Results of the simulations, for different models of analysis, are shown in Table 9. Tests for the haplotypic and the dominant models of analysis maintain the nominal 5% error rate even for the rare haplotype (the last row in the table), and the error rate for the recessive model is the worst, for the tests that incorporate the variance. The Tμ* was evaluated but not included into the table for the dominant and the recessive models. This test maintained type-I error rate for the dominant model. For the recessive model, it was approximately as liberal as that shown for the diplotypic model. These results may be explained based on the expected number of categories used in the comparisons. With N = 500, and the haplotype frequency p = 0.064, the expected number of haplotypes is 2Np = 64. However, the diplotype and the recessive models are both estimating the frequency of the homozygote, with the expected number (Np2) that is equal to only about two. On the other hand, p = 0.3 gives the expected number of 45 individuals, which appears to be sufficient. Based on these results, a permutational test is recommended to verify small asymptotic p-values. Performance of the overall tests is summarized in Table 10. The additive (haplotypic) model of analysis was used. We included results for the Haplotype Trend Regression (HTR) method [Zaykin et al., 2002] for comparison. The first, “HTR”, column gives power for the overall HTR test as described in Zaykin et al. [2002]. The second column gives power for the Simes-based HTR, and these values are very similar to those for the Simes-based Tμ (fifth column). Before the application of the Simes test, both of these approaches compare the effect of each haplotype in turn with the weighted average for the rest of the haplotypes. We note that HTR tests for individual haplotypes in these simulations gave power that was nearly identical to the power of the haplotypic test Tμ (data not shown). For non-normal data, such values are given in Table 11, again showing similarity between the two tests. It is also expected that a multi-parameter haplotypic LRT (Tμ) would yield power values that are similar to those given in the first column of Table 10, as based on the form of the test (12) and on results from Appendix 1 of Zaykin et al. [2002]. Columns 2–7 of Table 10 give values for the Simes-based tests, where p-values that correspond to the first five most frequent haplotypes, listed in Table 1, are combined into the overall p-value. One exception is the last row (Setting 8). For this row, we used the FUSION data frequencies of 5-SNP haplotypes, as given in Table 1 of Lin [2004], and originally described in Valle et al. [1998]. Epstein and Satten [2003] estimated susceptibility effects for these haplotypes and found two of the haplotypes to be significantly associated (the first with an increased, the second with a decreased susceptibility). Therefore, for these two haplotypes we allocated effects that are in opposite directions from the baseline value.

The results can be summarized as follows. Predictably, the Simes-HTR gives power values that are close to the values for the overall (Simes) Tμ. The Simes used as an overall test is more powerful than a multi-parameter test when there are one or two haplotypes with effects that are distinct from a baseline value. When all haplotypes have distinct mean effects, the multi-parameter overall test is more powerful than the Simes, however power values for the two methods are more similar when frequencies of haplotypes included into analysis are similar to each other. The type-I error for the Simes test is well maintained. In the presence of multiple haplotypic effects with the common variance (Settings 6,7,8), some increase in the type-I error for the TV may be expected because of the mixture of effects associated with the combined group (hī, Appendix 2). Since the combined group is expected to show an increased rather than decreased variance, a possible modification of the test is to consider a one-tailed hypothesis Vi > Vī.

To investigate the effect of normality violation on the performance of the tests, we performed additional simulations under the Gamma model, using tests for a single haplotype (h1 with the frequency 0.3034). The power simulations (Table 11) indicate that only the tests Tμ and HTR retain a proper type-I error. The other tests are sensitive to non-normality, however the Box-Cox transformation remedies this problem. When the values αiβi are the same for all haplotypes, the tests incorporating the haplotypic variance show a substantial gain in power, as expected, based on the fact that
E(Xi)=αiβi;V(Xi)=αiβi2. The relative performance of Tμ and Tμ* is switched in settings 3 and 4: As we have noted, Tμ* tends to provide an increased power when a tested haplotype (contrasted against the combined group) is associated with a smaller variance, and when its frequency is smaller than the frequency of the combined group.

Caution is needed regarding the interpretation of results obtained for the transformed data. Even under normality, the mean and the variance-specific tests are independent only under the complete null. There might be some inflation of the type-I error for these tests when the second parameter is heterogeneous between the haplotypes, and problem is likely to be exaggerated by both non-normality as well as by the transformation. Summarizing results across different simulations, we conclude that when variances are unequal, size of the test Tμ* is closer to the nominal level than that of the Tμ. However, Tμ* is more sensitive to small expected counts, and we recommend that its asymptotic p-values need to be verified by a follow-up with a permutational test. Size of the variance-specific test in the presence of heterogeneity between the means is close to the nominal level (e.g. pB1 = 0 and pB1 = 1 rows of Table 4). Interpretation of results from testing the combined hypothesis (
H0μ,V) is most straightforward. We note that the HTR method [Zaykin et al., 2002] and a similar diplotype-based method (DTR) [Luo et al., 2006] can be recommended for testing hypotheses concerning specifically the haplotypic or diplotypic means. These methods are robust against deviations from normality, and they retain proper size regardless of whether the haplotypic variances are unequal. In these simulations, the HTR showed power values that are almost identical to the haplotypic Tμ (Table 11).

Discussion

The association models that we considered here assume that some relevant variation is unknown to investigators. Although this incomplete knowledge impairs power to detect haplotypic effects, it also induces differences in trait variances between the scored haplotypes. We exploit this phenomenon in the proposed approach. Inclusion of the variance parameters into haplotype association methods for unphased data is useful for two reasons. Firstly, this adds power to detect associations under certain models, especially under models where known and unknown variants interact with each other. Secondly, a variance increase at a haplotype serves as an indication that the variant under study is either correlated, or epistatically interacts with additional unobserved polymorphisms. We conclude that tests that account for the haplotype- or diplotype-specific variance are useful for discovering complex associations with quantitative traits. Power can be gained while examining only a subset of SNPs that are either directly involved in joint multilocus genetic effects, or linked with functional variants via LD. Our approach is not a replacement for the conventional way of comparing haplotypic means. The number of tests is an issue, therefore at a hypothesis-generating stage, for example in a whole genome association analysis, a routine utilization of the conventional approach is more suitable. Testing additional hypotheses that involve haplotypic and diplotypic variances is more relevant in a smaller, or a follow-up study, concerned with a number of preselected loci.

The unknown factor influencing the trait can be non-genetic. Both, the usual type of an associations test for comparison of the haplotypic effects, as well as a test that incorporates haplotype-specific variances are sensitive to confounding. However, in the absence of confounding, differences between either the haplotypic means, or the variances in this scenario would indicate a partial functional involvement of the scored polymorphisms in the association with the trait, as well as an interaction with the unknown factor.

The approach outlined here is applicable to random samples from homogeneous populations, and thus, it is subject to the usual concerns about possibilities of confounding and stratification. Recently, Epstein et al. [2007] described a simple and effective method to correct for population stratification that can be used in conjunction with our approach. Further extensions are possible that will incorporate family-based samples.

A priori, untyped polymorphisms that reside within the same gene might be considered to be the first candidates that may account for a variance contrast between haplotypes. This is because models without the LD are “less general”, in that they require a particular form of epistasis (or, equivalently, a particular form of interaction with an environmental factor) in order to induce a haplotypic variance contrast [Zaykin and Shibata, 2008]. A gain in power for the proposed method is higher under models that involve some degree of interaction with or without LD. Considering that compensatory changes that follow functional mutations are likely to be a ubiquitous force in the genome evolution [Kern and Kondrashov, 2004; Kirby et al., 1995; Kondrashov et al., 2002], multiple interacting functional SNPs within a gene locus could be relatively common in the haplotypic organization of human genome. How easily these interactions and the corresponding genetic variants might be identified remains an open question, given statistical difficulties related to detection and characterization of complex multilocus effects.

Software availability

Software implementing statistical approaches described here is available from the authors upon request. Simulation scripts used in this study are available upon request from DVZ.

Acknowledgments

This research was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences. L.D. was supported by NIH grants PO1NS045685, DE16558 and DE017018. We thank Shyamal Peddada, Norman Kaplan, Rongheng Lin, Clare Weinberg, and David Umbach for stimulating discussion. We thank two reviewers for insightful suggestions.

Appendix 1. Effects at the scored locus under the heterogeneity model

The second equality holds because A is now only a marker for nearby functional variants. The probability Pr(Bi | A1) is a function of LD, i.e. Pr(B | A1) = pB +D/pA1, where D is the LD coefficient, pA1Bi − pA1pBi. Consider the effect difference

μA1−μA2=∑iμBi[Pr(Bi∣A1)−Pr(Bi∣A2)]=∑iμBi[pBiA1pA1−pBiA21−pA1]

The difference is zero under linkage equilibrium. Otherwise, the sign of the effect difference depends on the haplotype frequencies, pBiA1, pBiA2. The association effects and the LD coefficients can effectively cancel each other at the observed “proxy” locus, yet the variance contrast may remain substantial.

In the simulation experiments, the joint frequencies pBiAj were modeled assuming a Dirichlet distribution. This distribution of population haplotype frequencies is justified under certain population genetic models, such as models of drift-mutation equilibrium, as well as under some non-equilibrium models [Wright, 1951; Balding, 2003]. Ten randomly chosen Dirichlet parameters were set to 50, while the remaining 10 parameters were set to 1/20. By rejection sampling, the frequency pA1 was ensured to be in the interval [0.27, 0.33], to approximate the frequency of the common COMT haplotype. This model allocates an expected frequency of around 0.09 to ten of the haplotypes, while generating a tail of rare haplotypes. The average LD correlation across simulations was rBiA1 = 0.2, and the 95% quantile for the correlation was 0.55. Common variance of the trait was assumed for all functional variants, σ2 = 1. Assuming the underlying normal distribution and additivity of allelic effects, the effect means were taken to be 1/2, 1,…,5. To conclude a single simulation experiment, a random sample of 500 diploid individuals was taken from the population constructed as just described. The advantage of generating new population haplotype frequencies for each simulation run is in that it allows us to examine the expected behavior of statistical tests over a range of different haplotype frequency configurations.

Appendix 2. Population parameter values

Here we consider theoretical parameter values for the mean and the standard deviation given additive, diplotype, dominant and recessive models of analysis. We assume the population HWE. For the purpose of evaluating the estimation techniques, we assume that the two haplotypic contributions, represented by random variables Xi,.Xj, combine in a diplotype as Xi + Xj, discarding dominance deviations. We denote the mean and the variance of Xi by θi and
σi2, correspondingly. Thus, such model assumes additivity of haplotypic effects θi, θj in a diplotype, θij = θi + θj. This allows to simplify the formulas, however more general expressions that keep notation in terms of the diplotype effects and their frequencies are obtained similarly. The population mean value and the trait variance in this model are

Additive (haplotypic) model

Additive effect of the haplotype hi is defined as the effect of a diplotype that contains the hi, while the second haplotype is randomly drawn from the same population. The variance is defined similarly. From this definition, the additive mean and the variance for the haplotype hi are

μi=θi+μ¯/2Vi=σi2+∑jkpjσj2+(θj−μ¯/2)2

For the rest of the haplotypes, collapsed into a composite group, hī, the values are

μi¯=μa+μ¯/2Vi¯=∑jkqjσj2+(θj−μa)2+∑ikpiσi2+(θi−μ¯/2)2

where

qi=pi∑j≠ikpjμa=∑j≠ikqjθj

Diplotype model

The diplotype model distinguishes between the three diplotype classes, hi/hi, hi/hī, hī/hī. The means and the variances are