Coming out of the (Bayesian) closet: multivariate version

This week I’m facing my—and many other lecturers’—least favorite part of teaching: grading exams. In a supreme act of procrastination I will continue the previous post, and the antepenultimate one, showing the code for a bivariate analysis of a randomized complete block design.

Just to recap, the results from the REML multivariate analysis (which used ASReml-R) were as follows:

The corresponding MCMCglmm code is not that different from the ASReml-R code, after which it is modeled anyway. Following the recommendations of the MCMCglmm Course Notes (included with the package), the priors have been expanded to diagonal matrices with degree of belief equal to the number of traits. The general intercept is dropped (-1) so the trait keyword represents trait means. We are fitting unstructured (us(trait)) covariance matrices for both Block and Family, as well as an unstructured covariance matrix for the residuals. Finally, both traits follow a Gaussian distribution:

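library(MCMCglmm)

# Priors: one diagonal matrix V per covariance structure, with degree of
# belief n equal to the number of traits (newer MCMCglmm versions appear
# to call this element nu rather than n)
bp = list(R = list(V = diag(c(0.007, 260)), n = 2),
          G = list(G1 = list(V = diag(c(0.007, 260)), n = 2),
                   G2 = list(V = diag(c(0.007, 260)), n = 2)))

# Bivariate model: trait means as fixed effects, unstructured covariance
# matrices for Block, Family and the residuals (units)
bmod = MCMCglmm(cbind(veloc, bden) ~ trait - 1,
                random = ~ us(trait):Block + us(trait):Family,
                rcov = ~ us(trait):units,
                family = c('gaussian', 'gaussian'),
                data = a,
                prior = bp,
                verbose = FALSE,
                pr = TRUE,
                burnin = 10000,
                nitt = 20000,
                thin = 10)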
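As a quick sanity check on the chain settings, here is a small sketch of how many samples end up stored; as far as I know the sampler keeps (nitt - burnin)/thin draws. The helper name kept_samples and the longer-run numbers are made up for illustration:

```r
# Hypothetical helper: number of posterior samples kept after burn-in and thinning
kept_samples <- function(nitt, burnin, thin) (nitt - burnin) / thin

# Settings used in the model call above
short_run <- kept_samples(nitt = 20000, burnin = 10000, thin = 10)

# An illustrative longer chain with the same burn-in and thinning
long_run <- kept_samples(nitt = 110000, burnin = 10000, thin = 10)

print(c(short = short_run, long = long_run))
```

So the run above stores a fairly modest number of draws per parameter, which is one reason a longer chain would give a smoother posterior.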

Further manipulation of the posterior distributions requires having an idea of the names used to store the results. Following that, we can build an estimate of the genetic correlation between the traits (Family covariance between traits divided by the square root of the product of the Family variances). Incidentally, it wouldn’t be a bad idea to run a much longer chain for this model, so the plot of the posterior for the correlation looks better, but I’m short of time:
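A minimal sketch of that calculation, using simulated draws in place of bmod$VCV so that it runs on its own. The column names are my assumption about how the (co)variance samples are labelled, and they differ between MCMCglmm versions, so check colnames(bmod$VCV) first:

```r
# Simulated stand-in for bmod$VCV: one row per stored sample, one column
# per (co)variance component. All values here are fake.
set.seed(42)
n_draws <- 1000
v1 <- rgamma(n_draws, shape = 20, rate = 2000)  # fake Family variance, trait 1
v2 <- rgamma(n_draws, shape = 20, rate = 0.1)   # fake Family variance, trait 2
r  <- runif(n_draws, 0.2, 0.8)                  # fake correlation used to build the covariance

vcv <- cbind(v1, r * sqrt(v1 * v2), v2)
# Assumed labels; the real ones may look different in your fitted model
colnames(vcv) <- c("veloc:veloc.Family", "veloc:bden.Family", "bden:bden.Family")

# Genetic correlation, computed sample by sample:
# Family covariance / sqrt(product of the two Family variances)
rg <- vcv[, "veloc:bden.Family"] /
  sqrt(vcv[, "veloc:veloc.Family"] * vcv[, "bden:bden.Family"])

summary(rg)
quantile(rg, c(0.025, 0.975))  # rough 95% interval from the draws
```

With the fitted model, replacing vcv with bmod$VCV gives the posterior of the correlation; posterior.mode(rg) and HPDinterval(rg) should then summarise it, and plot(rg) should show the trace and density.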

7 thoughts on “Coming out of the (Bayesian) closet: multivariate version”

this is fascinating stuff, thanks for putting up a worked example! I have lots of questions, heh, but I'll start with the main one (and one I guess many people interested in this may have): say you were putting together a mixed model in which you were interested in the effect of a predictor variable as a fixed effect (say, for your example, tree height or root density). How would the coefficient of that effect come out, and how would you interpret it? Would it be a matrix describing the effect on the multivariate normal mean, or a single coefficient that relates in some way to it?

It is interesting that you ask about the fixed effects, which I often don't care about except to account for population mean differences in genetic evaluation. I think we would specify the model as something like this: cbind(veloc, bden) ~ trait - 1 + trait:FixedEff, random = ~…
which would fit FixedEff separately for each trait and store the samples of the posterior under Sol (for solutions). Thus, if we run something like posterior.mode(bmod$Sol) we would get a vector of posterior modes for each of the terms of the model: first the fixed effects (trait 1 mode, trait 2 mode, levels of FixedEff) and then the random effects (Block and Family levels). We can also get their credible intervals using HPDinterval(bmod$Sol).
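To see what "separately for each trait" means, here is a rough base R sketch of the fixed-effects design that such a formula implies once the two responses are stacked under a trait factor. The data frame trees and the covariate height are invented for the example, and I am assuming a numeric covariate rather than a factor:

```r
# Toy stacked data: each tree contributes one row per trait.
# 'trees' and 'height' are hypothetical names, not from the original data set.
trees <- data.frame(
  trait  = factor(rep(c("veloc", "bden"), times = 3),
                  levels = c("veloc", "bden")),
  height = rep(c(10.2, 11.5, 9.8), each = 2)
)

# No general intercept: one mean per trait, plus one slope per trait
X <- model.matrix(~ trait - 1 + trait:height, data = trees)

colnames(X)  # one column per fixed-effect coefficient
X
```

Each column of X corresponds to one coefficient, so there is a trait mean and a height slope for veloc, and another pair for bden, rather than a single shared coefficient or a matrix.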

I'm not sure if I am answering your question, but we can keep on chatting.

Thanks for the detailed explanation! So in a sense, it wouldn't be that different to conducting separate univariate analyses (with the exception of the random effect partitioning) for the two response variables – i.e. it doesn't explore the covariance properties of them in estimating the fixed effects?

My speculation: I think the key part is that we are simultaneously estimating both fixed and random effects, so any flow of information from one trait to the other (via covariances of random effects) will have an impact on the estimation of fixed effects. I'm not totally sure how it works in a Bayesian setting, but I'll be developing some small calculations soon to show the effect using a toy example.

To the issue at hand: I don’t quite understand the parameterization of the prior(s). I didn’t in the Course Notes, and I don’t in your example. Would you be willing to explain this a bit more?

The priors are specified in nested lists. At the top level we have one element for R and one for G; the one for G happens to be a list too, with one element for each random factor.

For each random factor we have V, the prior guess for the (co)variance matrix, and n, an indication of the strength of belief. For the bivariate case V is a matrix; assuming a variance for each trait and zero covariance, it can be specified as a diagonal matrix. Somewhere in the manual there is a comment that n should be the number of traits in the analysis.

So we have:
* Prior for R
* Prior for G, which involves:
** Prior for G1 (first random effect in the model).
** Prior for G2 (second random effect in the model).
** etc.
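The nesting above can be sketched in plain R as follows. The helper one_prior is hypothetical, the identity matrix is only a placeholder for V, and I am calling the degree of belief nu here, which is my understanding of the name in recent MCMCglmm versions (the post's code uses n):

```r
n_traits <- 2  # veloc and bden
n_random <- 2  # Block and Family

# Hypothetical helper: prior for one covariance structure.
# diag(k) is a placeholder; the post uses diag(c(0.007, 260)).
one_prior <- function(k) list(V = diag(k), nu = k)

bp <- list(
  R = one_prior(n_traits),  # residual structure
  G = setNames(             # one element per random term: G1, G2, ...
    replicate(n_random, one_prior(n_traits), simplify = FALSE),
    paste0("G", seq_len(n_random))
  )
)

str(bp, max.level = 2)

# The number of G elements should match the number of random terms
length(bp$G) == n_random
```

The point of the final check is that G needs one entry per term in the random formula, each with a V of dimension equal to the number of traits.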

Thank you, that was very helpful. Unfortunately, I still don’t seem to get things working. My ordinal, multivariate model with separate regressions from x1 and x2 on each response, correlated subject effects and correlated observation effects is this:

1) prior=list(R=list(V=1, nu=0.002),G=list(V=1, nu=0.002))
and
2) prior=list(R=list(V=1, nu=0.002),G=list(G1=list(V=1, nu=0.002)))
do not work, nor do G lists with two or three Gs
("prior$G has the wrong number of structures").