“The crux of the situation is that we lack theoretical insight into even quite basic questions about what is going on. More particularly, we cannot sayy anything about the limiting posterior marginal distribution of α compared to the prior marginal distribution of α.” (p.142)

Bayesian inference for partially identified models is a recent CRC Press book by Paul Gustafson that I received for a review in CHANCE with keen interest! If only because the concept of unidentifiability has always puzzled me. And that I have never fully understood what I felt was a sort of joker card that a Bayesian model was the easy solution to the problem since the prior was compensating for the components of the parameter not identified by the data. As defended by Dennis Lindley that “unidentifiability causes no real difficulties in the Bayesian approach”. However, after reading the book, I am less excited in that I do not feel it answers this type of questions about non-identifiable models and that it is exclusively centred on the [undoubtedly long-term and multifaceted] research of the author on the topic.

“Without Bayes, the feeling is that all the data can do is locate the identification region, without conveying any sense that some values in the region are more plausible than others.” (p.47)

Overall, the book is pleasant to read, with a light and witty style. The notational conventions are somewhat unconventional but well explained, to distinguish θ from θ* from θ†. The format of the chapters is quite similar with a definition of the partially identified model, an exhibition of the transparent reparameterisation, the computation of the limiting posterior distribution [of the non-identified part], a demonstration [which it took me several iterations as the English exhibition rather than the French proof, pardon my French!]. Chapter titles suffer from an excess of the “further” denomination… The models themselves are mostly of one kind, namely binary observables and non-observables leading to partially observed multinomials with some non-identifiable probabilities. As in missing-at-random models (Chapter 3). In my opinion, it is only in the final chapters that the important questions are spelled-out, not always faced with a definitive answer. In essence, I did not get from the book (i) a characterisation of the non-identifiable parts of a model, of the identifiability of unidentifiability, and of the universality of the transparent reparameterisation, (ii) a tool to assess the impact of a particular prior and possibly to set it aside, and (iii) a limitation to the amount of unidentifiability still allowing for coherent inference. Hence, when closing the book, I still remain in the dark (or at least in the grey) on how to handle partially identified models. The author convincingly argues that there is no special advantage to using a misspecified if identifiable model to a partially identified model, for this imbues false confidence (p.162), however we also need the toolbox to verify this is indeed the case.

“Given the data we can turn the Bayesian computational crank nonetheless and see what comes out.” (p.xix)

“It is this author’s contention that computation with partially identified models is a “bottleneck” issue.” (p.141)

Bayesian inference for partially identified models is particularly concerned about computational issues and rightly so. It is however unclear to me (without more time to invest investigating the topic) why the “use of general-purpose software is limited to the [original] parametrisation” (p.24) and why importance sampling would do better than MCMC on a general basis. I would definitely have liked more details on this aspect. There is a computational considerations section at the end of the book, but it remains too allusive for my taste. My naïve intuition would be that the lack of identifiability leads to flatter posterior and hence to easier MCMC moves, but Paul Gustafson reports instead bad mixing from standard MCMC schemes (like WinBUGS).

In conclusion, the book opens a new perspective on the relevance of partially identifiable models, trying to lift the stigma associated with them, and calls for further theory and methodology to deal with those. Here are the author’s final points (p.162):

“Identification is nuanced. Its absence does not preclude a parameter being well estimated, not its presence guarantee a parameter can be well estimated.”

“If we really took limitations of study designs and data quality seriously, then partially identifiable models would crop up all the time in a variety of scientific fields.”

“Making modeling assumptions for the sole purpose of gaining full identification can be a mug’s game (…)”

“If we accept partial identifiability, then consequently we need to regard sample size differently. There are profound implications of posterior variance tending to a positive limit as the sample size grows.”

While there are already several books about BUGS and WinBUGS on the market, e.g. the one by Ioannis Ntzoufras I reviewed a while ago, I was quite pleased to discover in the mail today that CRC Press had sent me a copy of The BUGS Book, written by no-one else but the parents of the BUGS software themselves! As I was not aware the book had been published a month or so ago… Before anyone send me an email or a comment requesting the book for review in CHANCE, it has already been sent to a knowledgeable BUGSpert to whose detailed analysis I am looking forward. (I still had a quick look at the book and noticed a reference to our PNAS paper on ABC model choice, yay!)

Last week in Aachen was the 3rd Edition of the Bayes(Pharma) workshop. Its specificity: half-and-half industry/academic participants and speakers, all in Pharmaceutical statistics, with a great care to welcome newcomers to Bayes, so as to spread as much as possible the love where it will actually be used. First things first: all the slides are available online, thanks to the speakers for sharing those. Full disclaimer: being part of the scientific committee of the workshop, I had a strong subjective prior.

3 days, 70 participants, we were fully booked, and even regretfully had to refuse inscriptions due to lack of room-space (!! German regulations are quite… enforced). Time to size it up for next year, maybe?

My most vivid impression overall: I was struck by the interactivity of the questions/answers after each talk. Rarely fewer than 5 questions per talk (come on, we’ve all attended sessions where the chairman is forced to ask the lone question — no such thing here!), on all points of each talk, with cross-references from one question to the other, even from one *talk* to the other! Seeing so much interaction and discussion in spite of (or, probably, thanks to ?) the diversity of the audience was a real treat: not only did the questions bring up additional details about the talk, they were, more importantly, bringing very precious highlight on the questioners’ mindsets, their practical concerns and needs. Both academics and industrials were learning on all counts — and, for having sometimes seen failed marriages of the kind in the past (either a French round-table degenerating in nasty polemic on “research-induced tax credit”, or just plain mismatch of interests), I was quite impressed that we were purely and simply all interested in multiple facets of the very same thing: the interface between pharma and stats.

As is now a tradition, the first day was a short course, this time by Pr. Emmanuel Lessaffre: based on his upcoming book on Bayesian Biostatistics (Xian, maybe a review someday?), it was meant to be introductory for newcomers to Bayes, but was still packed with enough “tricks of the trades” that even seasoned Bayesians could get something out of it. I very much appreciated the pedagogy in the “live” examples, with clear convergence caveats based on traceplots of common software (WinBUGS). The most vivid memory: his strong spotlight on INLA as “the future of Bayesian computation”. Although my research is mostly on MCMC/SMC, I’m now damn curious to give it a serious try — this was further reinforced by late evening discussions with Gianluca BaioM, who revealed that all his results that were all obtained in seconds of INLA computing.

Day 2 and half-day 3 were invited and contributed talks, all motivated by top-level applications. No convergence theorems here, but practical issues, with constraints that theoreticians (including myself!) would hardly guess exist: very small sample sizes, regulatory issues, concurrence with legacy methodology with only seconds-long runtime (impossible to run 1 million MCMC steps!), and sometimes even imposed software due to validation processes! Again, as stated above, the number and quality of questions is really what I will keep from those 2 days.

If I had to state one regret, maybe, it would be this unsatisfactory feeling that, for many newcomers, MCMC = WinBUGS — with its obvious restrictions. The lesson I learned: all the great methodological advances of the last 10 years, especially in Adaptive MCMC, have not yet reached most practitioners yet, since they need *tools* they can use. It may be a sign that, as methodological researchers, we should maybe put a stronger emphasis on bringing software packages forward (for R, of course, but also for JAGS or OpenBUGS!); not only a zip-file with our article’s codes, but a full-fledged package, with ongoing support, maintenance, and forum. That’s a tough balance to find, since the time maintaining a package does not count in the holy-bibliometry… but doesn’t it have more actual impact? Besides, more packages = less papers but also = more citations of the corresponding paper. Some do take this road (Robert Gramacy’s packages were cited last week as examples of great support, and Andy Gelman and Matt Hoffman are working on the much-expected STAN, and I mentioned above Havard Rue’s R-INLA), but I don’t think it is yet considered “best practices”.

As a conclusion, this Bayes-Pharma 2012 workshop reminded me a lot of the SAMSI 2010 Summer Program: while Bayes-Pharma aims to be much more introductory, they had in common this same success in blending pharma-industry and academy. Could it be a specificity of pharma? In which case, I’m looking very much forward opening ISBA’s Specialized Section on Biostat/Pharmastat that a few colleagues and I are currently working on (more on this here soon). With such a crowd on both sides of the Atlantic, and a looming Bayes 2013 in the Netherlands, that will be exciting.

“As history has proved, the main reason why Bayesian theory was unable to establish a foothold as a well accepted quantitative approach for data analysis was the intractability involved in the calculation of the posterior distribution.” Chap. 1, p.1

The book launches into a very quick introduction to Bayesian analysis, since, by page 15, we are “done” with linear regression and conjugate priors. This is somehow softened by the inclusion at the end of the chapter of a few examples, including one on the Greek football team in Euro 2004, but nothing comparable with Christensen et al.’s initial chapter of motivating examples. Chapter 2 on MCMC methods follows the same pattern: a quick and dense introduction in about ten pages, followed by 40 pages of illuminating examples, worked out in full detail. CODA is described in an Appendix. Compared with Bayesian ideas and data analysis, Bayesian modeling using WinBUGS spends time introducing WinBUGS and Chapter 3 acts like a 20 page user manual, while Chapter 4 corresponds to the WinBUGS example manual. Chapter 5 gets back to a more statistical aspect, the processing of regression models (including Zellner’s g-prior). up to ANOVA. Chapter 6 extends the previous chapter to categorical variables and the ANCOVA model, as well as the 2006-2007 English premier league. Chapter 7 moves to the standard generalised linear models, with an extension in Chapter 8 to count data, zero inflated models, and survival data. Chapter 9 covers hierarchical models, with mixed models, longitudinal data, and the water polo World Cup 2000. Continue reading →

The Parco Nazionale Gran Paradiso and the Università di Pavia are organising a summer school on “Advances in species distribution modelling in ecological studies and conservation” in Pavia and Cogne, 12-18 September 2011. This school includes R and Winbugs tutorials, regular classes, plus a field trip to the park, so this sounds quite exciting (at least if you are interested in statistical ecology or ecological statistics). Following my great experience there two years ago, I wish I could get back to Aosta to attend (and try once again to climb La Grivola!) but this is the start of the semester… (The deadline for registration is July 31.)

A Two-Day Course, April 12.- 13. 2010, University of Southern Denmark
This course is designed to provide an introduction to the area of Bayesian disease mapping in applications to Public Health and Epidemiology:

2) “*Advanced Bayesian Disease Mapping*”

A Two-Day Course, April 15. – 16. 2010, University of Southern Denmark, Copenhagen, Denmark
This course is designed to provide advanced coverage of Bayesian disease mapping topics in applications to Public Health and Epidemiology: It is intended as an extension to the course: *An Introduction to Bayesian Disease Mapping*. Emphasis on the course is placed on spatial and spatio-temporal Bayesian modeling issues, and some knowledge of Bayesian computation and WinBUGS is assumed.