There are exciting openings for several PhD positions at Warwick, in the departments of Statistics and of Mathematics, as part of the Centre for Doctoral Training in Mathematics and Statistics newly created by the University. CDT studentships are funded for four years and funding is open to students from the European Union without restrictions. (No Brexit!) Funding includes a stipend at UK/RI rates and tuition fees at UK/EU rates. Applications are made via the University of Warwick Online Application Portal and should be submitted as quickly as possible, since the funding will be allocated on a first come, first served basis. For more details, contact the CDT director, Martyn Plummer. I cannot but strongly encourage interested students to apply, as this is a great opportunity to start a research career in a fantastic department!

Another presentation by our OxWaSP students introduced me to the notion of distributed posteriors, following a 2018 paper by Botond Szabó and Harry van Zanten. Which corresponds to the construction of posteriors when conducting a divide & conquer strategy. The authors show that an adaptation of the prior to the division of the sample is necessary to recover the (minimax) convergence rate obtained in the non-distributed case. This is somewhat annoying, except that the adaptation amounts to raising the original prior to the power 1/m, where m is the number of divisions. They further show that when the regularity (parameter) of the model is unknown, the optimal rate cannot be recovered unless stronger assumptions are made on the non-zero parameters of the model.
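To make the tempering trick concrete, here is a minimal sketch in a toy conjugate Gaussian model (my own illustration, not the authors' code): each of the m machines analyses its shard under the prior raised to the power 1/m, which for a Gaussian prior simply means inflating the prior variance by a factor m, and the product of the m subposteriors then recovers the full-data posterior exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, mu0, tau2 = 1.0, 0.0, 4.0   # known noise variance, N(mu0, tau2) prior
n, m = 1000, 10                     # sample size and number of machines
x = rng.normal(0.7, np.sqrt(sigma2), n)

# full-data posterior (standard conjugate update)
prec_full = n / sigma2 + 1 / tau2
mean_full = (x.sum() / sigma2 + mu0 / tau2) / prec_full

# each machine uses the tempered prior, prior^(1/m) = N(mu0, m * tau2)
shards = np.array_split(x, m)
prec_j = np.array([s.size / sigma2 + 1 / (m * tau2) for s in shards])
mean_j = np.array([(s.sum() / sigma2 + mu0 / (m * tau2)) / p
                   for s, p in zip(shards, prec_j)])

# product of the m Gaussian subposteriors: precisions add up
prec_comb = prec_j.sum()
mean_comb = (prec_j * mean_j).sum() / prec_comb
assert np.allclose([prec_comb, mean_comb], [prec_full, mean_full])
```

Outside conjugate settings, the product of subposteriors is of course no longer available in closed form, which is where the communication constraints studied in the paper start to bite.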

“First of all, we show that depending on the communication budget, it might be advantageous to group local machines and let different groups work on different aspects of the high-dimensional object of interest. Secondly, we show that it is possible to have adaptation in communication restricted distributed settings, i.e. to have data-driven tuning that automatically achieves the correct bias-variance trade-off.”

I find the paper of considerable interest for scalable MCMC methods, even though the setting may sound overly formal, because the study incorporates parallel computing constraints. (Although I did not investigate the more theoretical aspects of the paper.)

In one of the presentations by the last cohort of OxWaSP students, the group decided to implement an ABC model choice strategy based on sequential ABC, inspired by Toni et al. (2008), and this made me reconsider this approach (disclaimer: no criticism of the students implied in the following!). Indeed, the outcome of the simulation led to the ultimate selection of a single model, exclusive of all other models, corresponding to a posterior probability of one in favour of this model. Which sounds like a drawback of the ABC-SMC model choice approach in this setting, namely that it is quite prone to degeneracy, much more so than standard SMC, since once a model vanishes from the list, it can never reappear in the following iterations, if I am reading the algorithm correctly (see the toy sketch below). To avoid this degeneracy, one would need to keep a population of particles of a given size for each model, towards using it as a pool for moves at following iterations… Which also means that running in parallel as many ABC-SMC filters as there are models would be equally or more efficient, a wee bit like parallel MCMC chains may prove more efficient than reversible jump for model comparison. (On the trivial side, the OxWaSP seminar on the same day was briefly interrupted by water leakage caused by Storm Eric and poor workmanship on the new building!)
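To check my reading of the algorithm, here is a stripped-down toy version of ABC-SMC model choice (in the spirit of Toni et al., with the importance weights and model-perturbation kernel omitted for brevity, and all names and settings my own): the model index is resampled from the previous population along with the parameter, so a model with no surviving particles is extinct for good.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.3, 1.0, 50)       # observed data, mildly favouring M1
s_obs = y.mean()                   # summary statistic

N = 200                            # particles per population
eps = [1.0, 0.5, 0.25, 0.1]        # decreasing tolerance schedule

# M0: x ~ N(0, 1), no parameter;  M1: x ~ N(theta, 1), theta ~ N(0, 2)
models = rng.integers(0, 2, N)                 # initial model indices
thetas = rng.normal(0, np.sqrt(2), N)          # only used when model == 1

for e in eps:
    new_m, new_t = [], []
    while len(new_m) < N:
        i = rng.integers(N)          # resample from previous population:
        m, t = models[i], thetas[i]  # an extinct model can never be drawn
        if m == 1:
            t = t + rng.normal(0, 0.2)         # parameter perturbation
        z = rng.normal(t if m == 1 else 0.0, 1.0, 50)
        if abs(z.mean() - s_obs) < e:          # ABC acceptance step
            new_m.append(m); new_t.append(t)
    models = np.array(new_m); thetas = np.array(new_t)
    print(f"eps={e}: fraction of M1 particles = {models.mean():.2f}")
```

In repeated runs, the fraction of M1 particles typically drifts towards 0 or 1 and, once there, stays put, which is the absorbing behaviour bemoaned above.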

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992 since he was sharing an office with my friend Costas Goutis], and Jim Zidek. Paper that also appears in my (recent) slides as an exercise. And has been discussed many times on this ‘Og.
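For readers who have not met the paradox, a schematic version (in generic notation of my own, not the paper's) goes as follows: the data x have parameter (θ, ζ), the marginal posterior of ζ under an improper prior happens to depend on x only through a statistic z, and the sampling distribution of z depends only on ζ, yet no genuine prior on ζ can reproduce that marginal posterior via Bayes' theorem on the reduced model.

```latex
% Bayesian B2 uses an improper prior \pi(\theta,\zeta) on the full model
% and marginalises the posterior, which turns out to depend on z alone:
\[
  \pi_2(\zeta \mid x) \;=\; \int \pi(\theta,\zeta \mid x)\,\mathrm d\theta
  \;=\; q(\zeta \mid z).
\]
% Bayesian B1 observes only z, whose distribution is free of \theta,
% and applies Bayes' theorem on the reduced model:
\[
  p(z \mid \theta,\zeta) \;=\; p(z \mid \zeta)
  \qquad\Longrightarrow\qquad
  \pi_1(\zeta \mid z) \;\propto\; p(z \mid \zeta)\,\pi(\zeta).
\]
% The paradox: in the examples of the paper, no prior \pi(\zeta) makes
% \pi_1 agree with q(\cdot \mid z), although both Bayesians appear to
% follow Bayes' rule to the letter.
```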

Reading the paper in the train to Oxford was quite pleasant, with a few discoveries, like an interesting dig at Fraser’s structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Muller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400-word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although concluding about the consequences for standard statistics, since they allow for admissible procedures that are associated with improper priors. If the latter must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard, since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold of having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. This reflects the then-intense opposition between Bayesians and fiducialist Fisherians. (Funnily enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is, however, mentioned in the reply by the authors.)

As in every year since 2014, I am spending a few days in Oxford to teach a module on Bayesian Statistics to our Oxford-Warwick PhD students. This time I was a wee bit under the weather, due to a mild case of food poisoning, and I can only hope that my more than sedate delivery did not definitively turn the students away from Bayesian pursuits!

The above picture is at St. Hugh’s College, where I was staying. Or should it be Saint Hughes, since this 12th-century bishop was a pre-Brexit European worker from Avalon, France… (This college was created in 1886 for young women of poorer background, and only opened to male students a century later. The 1924 rules posted in one corridor show how these women were considered so “dangerous” by the institution that they had to be kept segregated from men (except their brothers!) at all times…)

The reason for my short visit to Berlin last week was an OxWaSP (Oxford and Warwick Statistics Program) workshop hosted by Amazon Berlin, with talks between statistics and machine learning, plus posters from our second-year students. While the workshop was quite intense, I enjoyed very much the atmosphere and the variety of talks there. (Just sorry that I left too early to enjoy the social programme at a local brewery, Brauhaus Lemke, and the natural history museum. But still managed nice runs east and west!) One thing I found most interesting (if obvious in retrospect) was the different focus of academic and production talks, where the latter do not aim at full generality or at a guaranteed improvement over existing solutions, as long as the new methodology provides a gain in efficiency.

This connected nicely with me reading several Nature articles on quantum computing during that trip, where researchers from Google predict commercial products appearing in the coming five years, even though the technology is far from perfect and the outcome qubits error-prone. Among the examples they provided were quantum simulation (not meaning what I consider to be simulation!), quantum optimisation (as a way to overcome multimodality), and quantum sampling (targeting given probability distributions). I find the inclusion of the last one puzzling, in that simulation (in that sense) shows very little tolerance for errors, especially systematic bias. It may be that specific quantum architectures can be designed for specific probability distributions, just like some are already conceived for optimisation. (It may even be the case that quantum solutions are (just next to) available for intractable constants as in Ising or Potts models!)
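As a back-of-the-envelope illustration of that intolerance (a toy of mine, nothing quantum about it): a systematic bias in the sampler does not average out with more draws, while Monte Carlo noise does.

```python
import numpy as np

rng = np.random.default_rng(2)
# target: E[X] = 0 under N(0,1); the "faulty" sampler draws from N(0.05,1)
for n in (10**3, 10**5, 10**7):
    noise_err = abs(rng.normal(0.00, 1.0, n).mean())  # shrinks like 1/sqrt(n)
    bias_err = abs(rng.normal(0.05, 1.0, n).mean())   # stalls near 0.05
    print(f"n={n:>8}:  unbiased sampler {noise_err:.4f}   "
          f"biased sampler {bias_err:.4f}")
```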