Sophie Donnet pointed out to me this arXived paper by Tianxi Li, Elizaveta Levina, and Ji Zhu, on a network resampling strategy for X validation, where I appear as a datapoint rather than as a [direct] citation! Which reminded me of the “where you are the hero” gamebooks with which my kids briefly played, before computer games took over. The model selection method is illustrated on a dataset made of X citations [reduced to 706 authors] in all papers published between 2003 and 2012 in the Annals of Statistics, Biometrika, JASA, and JRSS Series B. With the outcome being the determination of a number of communities, 20, which the authors labelled as they wanted, based on 10 authors with the largest number of citations in the category. As it happens, I appear in the list, within the “mixed (causality + theory + Bayesian)” category (!), along with Jamie Robbins, Paul Fearnhead, Gilles Blanchard, Zhiqiang Tan, Stijn Vansteelandt, Nancy Reid, Jae Kwang Kim, Tyler VanderWeele, and Scott Sisson, which is somewhat mind-boggling in that I am pretty sure I never quoted six of these authors [although I find it hilarious that Jamie appears in the category, given that we almost got into a car crash together, at one of the Valencià meetings!].

Just received the great news for the turn of the year that our paper on ABC using Wasserstein distance was accepted in Series B! Inference in generative models using the Wasserstein distance, written by Espen Bernton, Pierre Jacob, Mathieu Gerber, and myself, bypasses the (nasty) selection of summary statistics in ABC by considering the Wasserstein distance between observed and simulated samples. It focuses in particular on non-iid cases like time series in what I find fairly innovative ways. I am thus very glad the paper is going to appear in JRSS B, as it has methodological consequences that should appeal to the community at large.

I recently read a fairly interesting paper by Daniel Yekutieli on a Bayesian perspective for parameters selected after viewing the data, published in Series B in 2012. (Disclaimer: I was not involved in processing this paper!)

The first example is to differentiate the Normal-Normal mean posterior when θ is N(0,1) and x is N(θ,1) from the restricted posterior when θ is N(0,1) and x is N(θ,1) truncated to (0,∞). By restating the later as the repeated generation from the joint until x>0. This does not sound particularly controversial, except for the notion of selecting the parameter after viewing the data. That the posterior support may depend on the data is not that surprising..!

“The observation that selection affects Bayesian inference carries the important implication that in Bayesian analysis of large data sets, for each potential parameter, it is necessary to explicitly specify a selection rule that determines when inference is provided for the parameter and provide inference that is based on the selection-adjusted posterior distribution of the parameter.” (p.31)

The more interesting distinction is between “fixed” and “random” parameters (Section 2.1), which separate cases where the data is from a truncated distribution (given the parameter) and cases where the joint distribution is truncated but misses the normalising constant (function of θ) for the truncated sampling distribution. The “mixed” case introduces an hyperparameter λ and the normalising constant integrates out θ and depends on λ. Which amounts to switching to another (marginal) prior on θ. This is quite interesting even though one can debate of the very notions of “random” and “mixed” “parameters”, which are those where the posterior most often changes, as true parameters. Take for instance Stephen Senn’s example (p.6) of the mean associated with the largest observation in a Normal mean sample, with distinct means. When accounting for the distribution of the largest variate, this random variable is no longer a Normal variate with a single unknown mean but it instead depends on all the means of the sample. Speaking of the largest observation mean is therefore misleading in that it is neither the mean of the largest observation, nor a parameter per se since the index [of the largest observation] is a random variable induced by the observed sample.

In conclusion, a very original article, if difficult to assess as it can be argued that selection models other than the “random” case result from an intentional modelling choice of the joint distribution.

As advertised and re-discussed by Dan Simpson on the Statistical Modeling, &tc. blog he shares with Andrew and a few others, the paper Visualization in Bayesian workflow he wrote with Jonah Gabry, Aki Vehtari, Michael Betancourt and Andrew Gelman was one of three discussed at the RSS conference in Cardiff, last week month, as a Read Paper for Series A. I had stored the paper when it came out towards reading and discussing it, but as often this good intention led to no concrete ending. [Except concrete as in concrete shoes…] Hence a few notes rather than a discussion in Series B A.

Exploratory data analysis goes beyond just plotting the data, which should sound reasonable to all modeling readers.

Fake data [not fake news!] can be almost [more!] as valuable as real data for building your model, oh yes!, this is the message I am always trying to convey to my first year students, when arguing about the connection between models and simulation, as well as a defense of ABC methods. And more globally of the very idea of statistical modelling. While indeed “Bayesian models with proper priors are generative models”, I am not particularly fan of using the prior predictive [or the evidence] to assess the prior as it may end up in a classification of more or less all but terrible priors, meaning that all give very little weight to neighbourhoods of high likelihood values. Still, in a discussion of a TAS paper by Seaman et al. on the role of prior, Kaniav Kamary and I produced prior assessments that were similar to the comparison illustrated in Figure 4. (And this makes me wondering which point we missed in this discussion, according to Dan.) Unhappy am I with the weakly informative prior illustration (and concept) as the amount of fudging and calibrating to move from the immensely vague choice of N(0,100) to the fairly tight choice of N(0,1) or N(1,1) is not provided. The paper reads like these priors were the obvious and first choice of the authors. I completely agree with the warning that “the utility of the the prior predictive distribution to evaluate the model does not extend to utility in selecting between models”.

MCMC diagnostics, beyond trace plots, yes again, but this recommendation sounds a wee bit outdated. (As our 1998 reviewww!) Figure 5(b) links different parameters of the model with lines, which does not clearly relate to a better understanding of convergence. Figure 5(a) does not tell much either since the green (divergent) dots stand within the black dots, at least in the projected 2D plot (and how can one reach beyond 2D?) Feels like I need to rtfm..!

“Posterior predictive checks are vital for model evaluation”, to wit that I find Figure 6 much more to my liking and closer to my practice. There could have been a reference to Ratmann et al. for ABC where graphical measures of discrepancy were used in conjunction with ABC output as direct tools for model assessment and comparison. Essentially predicting a zero error with the ABC posterior predictive. And of course “posterior predictive checking makes use of the data twice, once for the fitting and once for the checking.” Which means one should either resort to loo solutions (as mentioned in the paper) or call for calibration of the double-use by re-simulating pseudo-datasets from the posterior predictive. I find the suggestion that “it is a good idea to choose statistics that are orthogonal to the model parameters” somewhat antiquated, in that this sounds like rephrasing the primeval call to ancillary statistics for model assessment (Kiefer, 1975), while pretty hard to implement in modern complex models.

Reposting an email I received from the Royal Statistical Society, this is to announce a discussion session on three papers on Data visualization in Cardiff City Hall next September 5, as a free part of the RSS annual conference. (But the conference team must be told in advance.)

Today is the last and final day of Series B’log as David Dunson, Piotr Fryzlewicz and myself have decided to stop the experiment, faute de combattants. (As we say in French.) The authors nicely contributed long abstracts of their papers, for which I am grateful, but with a single exception, no one came out with comments or criticisms, and the idea to turn some Series B papers into discussion papers does not seem to appeal, at least in this format. Maybe the concept will be rekindled in another form in the near future, but for now we let it lay down. So be it!

Since the above announcement in the RSS newsletter a few months ago, about the Series B’log coming to life, I have received exactly zero comments from readers, despite several authors kindly contributing an extended abstract of their paper. And announcements to various societies…

Hence I now seriously wonder at the survival probability of the blog, given this collective lack of interest. It may be that the information did not reach enough people (despite my mentioning its existence on each talk I give abroad). It may be that the blog still sounds like “under construction”, in which case I’d like to hear suggestions to make it look more definitive! But overall I remain fairly pessimistic [even conditional on my Gallic gloom] about our chances of success with this experiment which could have turned every Series B paper into a potential discussion paper!