Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of <math>Y | X, \tau, B</math> is fully specified by a single real number, the precision <math>\tau</math>.


Revision as of 02:23, 22 November 2012

Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).

Data: let's assume that we obtained data from N individuals. We denote by <math>Y</math> the (quantitative) phenotypes (e.g. expression levels at a given gene), and by <math>g</math> the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).

Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.

Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there are no hidden confounding factors in the phenotypes.

Likelihood: we start by writing the usual linear regression for one individual
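Following Servin & Stephens (2007), this regression presumably reads (with <math>g_i</math> the allele dose of individual <math>i</math> and <math>\tau</math> the error precision):

```latex
y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i = 1} + \epsilon_i,
\qquad \epsilon_i \sim \mathcal{N}(0, \tau^{-1})
```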

where β1 is in fact the additive effect of the SNP, denoted a from now on, and β2 is the dominance effect of the SNP, d = ak.
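In matrix form, with <math>B = (\mu, a, d)^T</math> and <math>X</math> the <math>N \times 3</math> matrix whose i-th row is <math>(1, g_i, \mathbf{1}_{g_i=1})</math>, the likelihood and the usual conjugate priors of Servin & Stephens would be (the hyper-parameter names <math>\kappa</math>, <math>\lambda</math> and the fixed prior covariance scale <math>\Sigma_B</math> are assumed notation, not from the original page):

```latex
Y \,|\, X, \tau, B \sim \mathcal{N}_N(XB, \tau^{-1} I_N), \qquad
\tau \sim \Gamma\!\left(\tfrac{\kappa}{2}, \tfrac{\lambda}{2}\right), \qquad
B \,|\, \tau \sim \mathcal{N}_3(0, \tau^{-1} \Sigma_B)
```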

Conditional posterior of B: here and in the following, we neglect all constants (e.g. normalization constant, <math>Y^TY</math>, etc.):

We use the prior and likelihood and keep only the terms in B:
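Assuming a Normal prior on B given τ with fixed covariance scale <math>\Sigma_B</math> (as in Servin & Stephens), this step presumably gives:

```latex
p(B \,|\, Y, X, \tau) \;\propto\;
\exp\!\left(-\frac{\tau}{2}(Y - XB)^T(Y - XB)\right)
\exp\!\left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B\right)
```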

We expand:
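Expanding <math>(Y - XB)^T(Y - XB)</math> would give:

```latex
p(B \,|\, Y, X, \tau) \;\propto\;
\exp\!\left(-\frac{\tau}{2}\left(Y^TY - 2B^TX^TY + B^TX^TXB + B^T\Sigma_B^{-1}B\right)\right)
```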

We factorize some terms:
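Grouping the two quadratic terms in B would give:

```latex
p(B \,|\, Y, X, \tau) \;\propto\;
\exp\!\left(-\frac{\tau}{2}\left(Y^TY - 2B^TX^TY + B^T\left(X^TX + \Sigma_B^{-1}\right)B\right)\right)
```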

Let's define <math>\Omega = (X^TX + \Sigma_B^{-1})^{-1}</math>. We can see that <math>\Omega^T = \Omega</math>, which means that <math>\Omega</math> is a symmetric matrix.
This is particularly useful here because we can use the following equality: <math>\Omega^{-1}\Omega^T = I</math>.

It now becomes easy to factorize completely:
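Completing the square with <math>\Omega = (X^TX + \Sigma_B^{-1})^{-1}</math>, the exponent presumably becomes:

```latex
p(B \,|\, Y, X, \tau) \;\propto\;
\exp\!\left(-\frac{\tau}{2}\left(
\left(B - \Omega X^TY\right)^T \Omega^{-1} \left(B - \Omega X^TY\right)
+ Y^TY - Y^TX\Omega X^TY\right)\right)
```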

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
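That is, with <math>\Omega = (X^TX + \Sigma_B^{-1})^{-1}</math>:

```latex
B \,|\, Y, X, \tau \;\sim\; \mathcal{N}_3\!\left(\Omega X^TY,\; \tau^{-1}\Omega\right)
```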

Posterior of τ:

Similarly to the equations above:
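By Bayes' theorem, dropping the constant <math>p(Y|X)</math>:

```latex
p(\tau \,|\, Y, X) \;\propto\; p(\tau)\, p(Y \,|\, X, \tau)
```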

But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
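The second term is the marginal likelihood:

```latex
p(Y \,|\, X, \tau) \;=\; \int p(Y \,|\, X, \tau, B)\, p(B \,|\, \tau)\, \mathrm{d}B
```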

Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
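Assuming the conjugate priors <math>\tau \sim \Gamma(\kappa/2, \lambda/2)</math> and <math>B \,|\, \tau \sim \mathcal{N}_3(0, \tau^{-1}\Sigma_B)</math> (hyper-parameter names assumed), and neglecting all multiplicative constants, this would read:

```latex
p(\tau \,|\, Y, X) \;\propto\;
\tau^{\kappa/2 - 1}\, e^{-\lambda\tau/2}
\int \tau^{N/2}\, e^{-\frac{\tau}{2}(Y - XB)^T(Y - XB)}\;
\tau^{3/2}\, e^{-\frac{\tau}{2} B^T\Sigma_B^{-1}B}\, \mathrm{d}B
```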

As we used a conjugate prior for τ, we expect a Gamma distribution for the posterior.
Therefore, we can take <math>\tau^{N/2}</math> out of the integral and start guessing what looks like a Gamma distribution.
We also factorize inside the exponential:
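With <math>\Omega = (X^TX + \Sigma_B^{-1})^{-1}</math> and the Gamma prior hyper-parameters κ and λ (assumed names), the factorized expression would be:

```latex
p(\tau \,|\, Y, X) \;\propto\;
\tau^{\frac{N+\kappa}{2} - 1}\,
e^{-\frac{\tau}{2}\left(\lambda + Y^TY - Y^TX\Omega X^TY\right)}
\int \tau^{3/2}\,
e^{-\frac{\tau}{2}\left(B - \Omega X^TY\right)^T\Omega^{-1}\left(B - \Omega X^TY\right)}\,
\mathrm{d}B
```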

We recognize the conditional posterior of B.
This allows us to use the fact that the pdf of the Normal distribution integrates to one:
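The integral, together with the factor <math>\tau^{3/2}</math>, is then constant in τ, which would leave:

```latex
p(\tau \,|\, Y, X) \;\propto\;
\tau^{\frac{N+\kappa}{2} - 1}\,
\exp\!\left(-\frac{\tau}{2}\left(\lambda + Y^TY - Y^TX\Omega X^TY\right)\right)
```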

We finally recognize a Gamma distribution, allowing us to write the posterior as:
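With <math>\Omega = (X^TX + \Sigma_B^{-1})^{-1}</math> and Gamma prior hyper-parameters κ and λ (assumed names):

```latex
\tau \,|\, Y, X \;\sim\; \Gamma\!\left(\frac{N+\kappa}{2},\;
\frac{1}{2}\left(\lambda + Y^TY - Y^TX\Omega X^TY\right)\right)
```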

Joint posterior (2): sometimes it is said that the joint posterior follows a Normal Inverse Gamma distribution:
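In the <math>\sigma^2 = \tau^{-1}</math> parameterization, and reusing the quantities above, this would be (an assumed but standard parameterization of the Normal-Inverse-Gamma):

```latex
B, \sigma^2 \,|\, Y, X \;\sim\;
\mathcal{NIG}\!\left(\Omega X^TY,\; \Omega,\;
\frac{N+\kappa}{2},\;
\frac{1}{2}\left(\lambda + Y^TY - Y^TX\Omega X^TY\right)\right)
```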
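As a numerical sanity check, the closed-form posteriors derived above can be computed directly. The sketch below simulates data under the model; all data, hyper-parameter values and variable names are illustrative assumptions, not from the original page:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: N individuals, genotypes coded as allele dose in {0, 1, 2}.
N = 200
g = rng.integers(0, 3, size=N)
# Design matrix X with rows (1, g_i, 1_{g_i = 1}).
X = np.column_stack([np.ones(N), g, (g == 1).astype(float)])
B_true = np.array([1.0, 0.5, 0.2])  # mu, a, d (arbitrary values)
y = X @ B_true + rng.normal(0.0, 1.0, size=N)

# Prior hyper-parameters (assumed values, for illustration only):
# tau ~ Gamma(kappa/2, lambda/2), B | tau ~ N_3(0, tau^{-1} Sigma_B).
Sigma_B = np.diag([10.0, 1.0, 1.0])
kappa, lam = 1.0, 1.0

# Omega = (X^T X + Sigma_B^{-1})^{-1}
Omega = np.linalg.inv(X.T @ X + np.linalg.inv(Sigma_B))

# Conditional posterior of B: N_3(Omega X^T y, tau^{-1} Omega).
B_post_mean = Omega @ X.T @ y

# Posterior of tau: Gamma((N + kappa)/2, (lambda + y^T y - y^T X Omega X^T y)/2).
shape = (N + kappa) / 2
rate = (lam + y @ y - y @ X @ Omega @ X.T @ y) / 2
tau_post_mean = shape / rate  # posterior mean of the error precision
```

With a reasonable sample size, `B_post_mean` should land close to `B_true`, and `tau_post_mean` close to the simulated precision of 1.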