Tuesday, April 1, 2014

More miscellaneous communications with readers. (I have omitted all the salutations and pleasantries to save space here, but most correspondents do begin and end their messages with introductions and salutations that I truly appreciate.) Again, apologies to those I don't reply to -- too many!

Topics:

Adapt ANOVA-style models for mixed between- within- designs?

glm() in R, with a poem

Metropolis algorithm

I have some data, what should I do?

Could you look over my conference paper?

Time series models?

How to write up a Bayesian analysis

Trying to adapt programs, getting errors

Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups

Sequential updating (including more data) when using MCMC

How to take the difference between two parameter distributions

What book should be studied after Doing Bayesian Data Analysis?

Exercise 13.4

March 9: Adapt ANOVA-style models for mixed between- within- designs?

My data set comprises two populations: patients and control subjects.
All subjects were repeatedly assessed - albeit in different conditions, because the treatment manipulation of the patients is not applicable for controls.
While our main interest pertains the patients' treatment conditions
(4 levels), I'd like to compare them with the "natural variation"
observed in the control group, who just repeated our experiments twice.
Thus abstractly, the patient group is split by a within-factor of 4
levels P1-P4, and the controls are split by a 2-level within factor, C1
and C2. What's the best method to compare these, assuming normal
distributed response variables?

Currently I use one-way BANOVAs, extended by a subject factor to
account for repeated measures. But with these models (4 and 2 levels,
respectively), I think I can only model the two population groups
separately in different models, which hinders direct comparisons between
parameter estimates.

My questions in particular:
1) Can/should I extend the BANOVA model to include both populations?
2) Or is it possible to compare the parameters of different BANOVA models? If so, how?
3) Does this kind of design have a specific name? This might be handy to know to search for modeling examples.
Thank you very much in advance!
I talked to a colleague, who was of the opinion that I can directly
compare posteriors, provided that they are in the same units and play
the same "role" in the model.
In my example, it would be admissible to compare the b0 parameters,
i.e. the BANOVA baselines from different models. Using that comparison, I
would at least be able to state whether the groups are different from
each other on average.

Specifically, the comparison has to be directly computed from
the MCMC chains. For each pair of values, randomly sampled from the
chains, I would compute the difference, and thus build up a distribution
of differences. This then may be inspected with the usual methods, e.g.
using its 95% HDI.

While this sounds correct to me, I'd like to know some more
details. How many samples should/can I draw from the chains? Should I
inspect the autocorrelation? Any help and/or pointers to literature are
greatly appreciated!

Regarding the model structure, yes, you should be able modify the ANOVA-style
models to suit the structure of your data. This is not to say it is
necessarily trivial or obvious how to do that, especially when trying to
impose the sum-to-zero constraints (if you choose to do so). For
example, see this blog post about a split-plot design, which is related
to the sort of design you have:http://doingbayesiandataanalysis.blogspot.com/2012/05/split-plot-design-in-jags-revised.html

Regarding computing
differences: At every step in the MCMC chain, you can compute
differences or ratios or other transformations of parameters in whatever
way is meaningful. At each step in the chain, the parameters are
*jointly* representative of the posterior distribution.

If
you have split your data into two parts that are modeled separately,
then the parameter values are, by definition, independent. You can then
set the two chains side-by-side and take differences between them as if
they were in a joint space.

March 11: glm() in R, with a poem

n R question that I cannot get to the bottom of. I was hoping that
you might be able to shed some light on it, as everyone I ask is drawing
a blank.

It is rather simple actually, I have two factors, and I want to do a logistic regression against a (0,1) target using glm in R.

I have a 0,1 target and then use some explanatory variables which are factors
with levels e.g. ... When
I try and perform the logistic regression it refuses to use the first
level of each factor, as you can see from the screen cap jk1.png
attached to this e-mail. ... Is this something
you have experience of? Am I overlooking something trivial?! Apologies
if so!

As your question is about glm() in R, and not about Bayesian analysis per
se, I will defer to the R experts. You might try posting your question
on http://stats.stackexchange.com/ or related groups.
Best wishes.

In my desperation
To elucidate the facts
I disturbed you unnecessarily
When I should have consulted stacks.

You're welcome, it's a superb book, and has helped me a great deal.
Kind regards

March 1: Metropolis algorithm

I am currently doing my statistics thesis on ...
which requires quite a great knowledge of Bayesian Theory especially
MCMC methods. However I have some minor problems regarding the Metropolis algorithm:

Any distribution can be attributed to the proposal density function?

Also, the initial values given to the parameters (i.e. the first step of
the Metropolis algorithm) are random or there must be some thinking
behind the choice of values?

Thank you very much

If you are programming your own Metropolis sampler, you should
study the general Metropolis-Hastings algorithm and learn about methods
for "tuning" the proposal distribution during an adaptation phase. A
great thing about JAGS/BUGS is that they do it for you.

The initial values can be anywhere that has non-zero prior and
non-zero likelihood (actually, even that is okay - the real constraint is that the prior times likelihood needs to be defined at the initial values), but initial values near the mode of the posterior
make burn-in a lot faster. In either case, you need to monitor
convergence to be sure that the chains have lost any bias from their
starting positions.

Good luck with your work.

March 14: I have some data, what should I do?

I'm contacting you to ask for your advise on the following problem.

I have 500 samples, each sample being ... [extended description omitted].
Using Bayesian analysis I want to determine the error rate of ... .
Below is an exemplary of the data set. [data omitted] I have never used Bayesian Analysis before so I dont really know how
to proceed. I'm thinking that Bayesian analysis is suitable to answer
the research question.
-The idea is to use a non informative prior, so the sample can be positive or negative.
-Not sure what distribution to use (binomial, bernoulli, multinomial,). not sure.
-The rest, I do not know.

I look forward to your reply.
Thanks very much.

Thanks for you interest in Bayesian analysis. I hope you do choose to pursue it.

The
first thing you need to analyze your data is a descriptive model. The
goal of the analysis is to estimate parameter values that are meaningful
descriptions of the data. Once you have a model, Bayesian analysis
provides a distribution of credible values of the parameters, given the
data.

As your field of research is not my specialization, I'm not in a
position to prescribe the model you should use. But, whatever it is, I
am confident that Bayesian analysis is a richly informative method for
analyzing your data.

Best wishes,

March 16: Could you look over my conference paper?

... I will be presenting a paper at the annual meeting of the ... Association. My paper is
based on your BEST software, and I was wondering if you would be willing
to take a look at it to point out any major errors I may have committed
and save me from potential embarrassment.

Thanks for thinking of me and showing me a draft of your paper. [And I meant that sincerely!]
Unfortunately I don't have time to study its 20 pages. If you have a
specific question about some topic, however, I might be able to provide a
quick answer. Best of luck with your presentation!

March 16:
Time series models?

Do you think it is possible to apply Bayesian data methods to time
series? What I would hope to avoid is trying to splice GARCH methodology
with the Bayesian approach. GARCH is lumbering and clumsy enough in
itself. For high frequency financial data, it must surely be possible to
assume distribution stability, until evidence begins to emerge that the
distribution has broken down, and needs to be reformulated? I would
greatly value your advice.

I have not ventured deeply into time-series models, so I have no personal
endorsements of particular resources. But Bayesian methods can (in
principle) be applied to any model that has parameters, even so-called
"non-parametric" models (which have up to an infinite number of
parameters). For example, there is a 2011 book titled "Bayesian time
series models" that collects recent advanced applications. Although it's
not a beginner's tutorial, it might provide useful pointers.

Feb. 25:
How to write up a Bayesian analysis

Sir, thanks for Bayesian Estimation Supersedes the t-Test, I can make the plots, sorry but how to analyse the plots and write a paper. Will you please put up a model paper with such analyses. [Here] we have been brought up on a strict diet of ANOVA-RBD/CRD, and t tests with CV values (Analysis of Variance - Randomized Block Design / Completely Randomized design). How can I do these analyses / similar in Bayesian methods,

Reply from Mike Meredith:

I can't point you to papers describing a Bayesian analysis of ANOVA-type
models, but I will indicate sources with advice and pass this on to
John Kruschke who may have more references to hand.

The main points are summarised in Kruschke, J.K., Aguinis, H., &
Joo, H. (2012) The time has come: Bayesian methods for data analysis in
the organizational sciences. Organizational Research Methods, 15,
722-752 in the section "Recommendations and Illustration of How to
Report Bayesian Analysis Results", where they also give an example of
writing up a regression analysis.

Also Kruschke, J.K. (2013) Bayesian estimation supersedes the t test.
*Journal of Experimental Psychology: General*, 142, 573-603 has a
section on "Reporting the results of a Bayesian analysis". You would
need to cite that paper as the description of the methods for robust
Bayesian analysis.

In my own field (wildlife ecology) we can rarely do experiments, so
hypothesis testing is scarcely relevant anyway. The focus is on
estimation (of density, abundance, occupancy, survival, etc) or on
modelling. In the latter context, information theoretic approaches
(using Akaike's Information Criterion, AIC) have been the norm, and
moving from there to a Bayesian approach is relatively painless. So I
can't point you to papers reporting a Bayesian analysis of ANOVA-type
data; Marc Kery's book, An Introduction to WinBUGS for Ecologists
(Academic Press, 2010), shows how to do this, but not how to write it
up.

April 1: Trying to adapt programs, getting errors

I have been looking through your textbook Doing Bayesian Data Analysis and specifically have been looking at how to do the Bayesian ANOVA in Chapter 18. I have been running into many problems, and I was wondering if you would have time to take a quick look over my code to see what's going wrong. I have no familiarity with R prior to this book and no experience programming, so I am not sure where my errors are. Generally speaking, the problem is that objects and functions are not being found. I have attached a the .csv of my imaginary data set and a .txt of the code I am running in RStudio in case you are able to take a look (I don't mean to presume you will, it's just quicker to already have the materials attached). I do load the .csv into RStudio before running the code.
I suppose my core problem here is that I don't know how to take the code from the book using the examples there and apply the code to my own data sets for personal use.
Thank you for your time,

Thanks very much for your interest in the book.
To have any chance to help you, I need to know what error messages are being shown. Could you please run your script and then copy and paste the R console, with error messages, into an email? Especially important is to identify the very first error that occurs.

Then I run the code, which I have attempted (unsuccessfully) to adapt from what is in the textbook for a Bayesian ANOVA.

After
running the code, this is what I get. It seems like it's failing to
create several objects that it needs to perform the analysis, or is not
finding the functions it needs to do it. I'm not really sure where the
problem begins. I know in programming code a single error can lead to
other errors, but this is my first exposure to R programming and the
code in the book is for specific practice examples and I'm just not sure
if I am adapting it to my pretend data set properly (which I'm most
likely not given the errors).

I also get several windows where graphs are supposed to be, but
nothing is in them. I colored the errors red to make them easier to
find.

is that R cannot find BRugs. I think you need to first say
library(BRugs)

or whatever (I've forgotten since it's been so long that I've used BRugs).

* Your second error is that datarecord() is reading in model.txt, not data.

Hope that helps.

March 25:
Modifying FilconJags.R and FilconJagsPower.R for two groups instead of four groups

The R code assumes the existence of four conditions (consistent with the ongoing example), but the data file has data for only two, so the code fails when it attempts to access the non-existent data. I can remove the code for handling the two conditions not in the data, but I assume that the intent was to have all four in the Rdata file. Is there a version of FilconJagsMuKappa.Rdata that has the expected four conditions?

If I understand your question correctly, what you are trying to do
is apply the FilconJags.R and FilconJagsPower.R programs to data with
only 2 groups instead of 4 groups. I think the programs should run
without complaint for just 2 groups, although there won't be much
benefit (i.e., shrinkage) from the hierarchical structure when there are
only 2 groups.

Try this:

In FilconJags.R, include only the first 2 groups. Find the data section and insert the lines shown here:

# THE DATA.
# For each subject, specify the condition s/he was in,# the number of trials s/he experienced, and the number correct.# (These lines intentionally exceed the margins so that they don't take up# excessive space on the printed page.)
cond = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4)
N = c(64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64,64)
z = c(45,63,58,64,58,63,51,60,59,47,63,61,60,51,59,45,61,59,60,58,63,56,63,64,64,60,64,62,49,64,64,58,64,52,64,64,64,62,64,61,59,59,55,62,51,58,55,54,59,57,58,60,54,42,59,57,59,53,53,42,59,57,29,36,51,64,60,54,54,38,61,60,61,60,62,55,38,43,58,60,44,44,32,56,43,36,38,48,32,40,40,34,45,42,41,32,48,36,29,37,53,55,50,47,46,44,50,56,58,42,58,54,57,54,51,49,52,51,49,51,46,46,42,49,46,56,42,53,55,51,55,49,53,55,40,46,56,47,54,54,42,34,35,41,48,46,39,55,30,49,27,51,41,36,45,41,53,32,43,33)

Then source
FilconJags.R. This will save the MCMC chain for the 2-group model. It
will also try to produce graphs, but because the graphs assume four
groups, the program will balk. But the posterior MCMC is saved before
that happens.

FilconJagsPower.R will now run as is, except
for two things. First, its graphics assume 4 groups. Therefore you can
make this change:

I'm sorry to bother you, but I have a question for which I did a lot of researches, but I couldn't get any result.
Basically, I implemented a Hierarchal Bayesian model in R and it works fine. However, what I will like to do now is to apply the model on a trial by trial basis (instead of applying it on the whole set of trials) and get the parameters estimations after each trial to use them both as "prior" parameters on the next trial and to track the evolution of these parameters. I do not want to use sequential methods, I just want to apply the model on a trial-by-trial methods. Do you think you can give some idea?

If you are using MCMC methods, and you really want to use the posterior from one run as the prior for the next, it is not directly possible. This is because the posterior distribution from the first run will usually be complex (e.g., curved and correlated in parameter space, with possibly skewed tails and so on) and not possible to be exactly expressed mathematically in the prior specification. (The beauty of using conjugate priors is that the posterior has the same mathematical form as the prior, so incremental updating is elegantly easy. For complex models, there are no [known] conjugate priors.)
Some people attempt to get around that problem by approximating the posterior of the first run with some simpler mathematical form, and using that mathematical re-description as the prior for the next run. But this method is only as good (or bad) as the approximation, and I wouldn't recommend it unless you have really cogent reasons to think that the approximation is excellent.
If you are merely illustrating how the posterior changes with incremental updating, then you can simply do the data analysis using incremental data sets. Do an analysis with {x1,...xN}, then an analysis using the original prior with {x1,...,xN,xN+1}, then an analysis using the original prior with {x1,...xN,xN+1,xN+2}, and so on.

March 28:
How to take the difference between two parameter distributions

one thing that has me perplexed right now is how you take the difference between two posterior distributions to see if there is a change. One very common application to us is to have similar data on a specific metric before and after a ... change. Naturally, the intent of our analysts is to understand if there has been a difference after that ... change. Our first step is to calculate the posteriors of the two datasets (before and after). Then I assumed that the posteriors are a random variable so they are compared by forming the convolution of the two distributions. However, if you just visualize two normal distributions (the posteriors) where one has a mean near zero and the other is shifted (say, to 5), the convolution will still cover zero. So although these two datasets are clearly different and the software change has had an impact, the HDI will still straddle zero and the conclusion will be that there was no change. Is this method of using convolution how you did your poster comparisons in the textbook? If not, I would very much like to know the methods for taking this difference like you did with respect to recall and classical music in your text (chapter 12). Thanks very much for you time,

At every step in the MCMC chain, the parameter values at that step are jointly representative of the multidimensional posterior distribution. Therefore the difference of relevant parameter values at each step is a representative difference.

I think the first example of this in the book occurs at the top of p. 176.

Because posterior distributions can have strange shapes, it is important to compute the difference at each step, maintaining the joint parameter values. See Fig. 12.3, p. 300.

March 27:
What book should be studied after Doing Bayesian Data Analysis?

What do you recommend as a good second book [after Doing Bayesian Data Analysis]? I’m a long time R user and my problem space is in engineering.

None of those is specifically engineering, but Ntzoufras and Lee & Wagenmakers give lots of BUGS/JAGS examples and Gelman et al. is a great authoritative resource.

March 27:
Exercise 13.4

Exercise 13.4 refers to Kruscke1996CSbugs.R; I don’t see it (or the Jags version) in the collection of R programs and data. The associated data file is there (Kruschke1996CSdatsum.Rdata), however. Is it not there on purpose (i.e., the student is expected to write the code from scratch)?

The program for that exercise was deliberately excluded from the posted programs because it was supposed to be a programming exercise, not merely "click the 'source' button and watch what happens." I should not have mentioned the program name in the exercise description, but I did because that was how I got my word processing software (LaTeX) to include code snippets. The program I have for that exercise is old and not prepared for release, so I would prefer not to circulate it. I apologize that this is not the answer you were hoping for!