Statistical dinosaurs and other creatures

Evolution is change in the heritable traits of biological populations over successive generations. Through these developments new species are formed, others continue through gradual adaptation, and others altogether disappear. Natural selection is typically the mechanism that intervenes in a species’ survival or extinction, but perhaps not the only one. A bolide with a diameter of a few kilometers hitting the earth or a massive climate-changing volcanic explosion may well explain why some species have kissed their… ice goodbye.1

Whether the cause is survival of the fittest or a catastrophic event, we wondered what a process analogous to extinction would look like in statistics: are there any ideas running through statistical theory and methods which were once widely used or debated and have since vanished from scientific practice, or which never took off? Needless to say, this quest is undertaken purely for fun – scholars more serious than us may study this interesting area of the history of statistics in a more rigorous way.

The moment of extinction is generally considered to be the death of the last individual of the species. It is difficult to establish this moment for seemingly forgotten statistical ideas, since species presumed to be extinct sometimes reappear (e.g. Lazarus taxa). In any case, statistical methods never die: like old soldiers, they simply fade away.2

We considered three non-exhaustive categories of extinct (or almost) statistical creatures:

those which roamed the Earth a long time ago and are now extinct (Tyrannosaurus rex);

those which are now virtually extinct but were among us until recently (Dodo); and

those which are endangered and may well be extinct in the near future or perhaps reappear (Coelacanth).

For instance, the “method of situation” appeared in geodetic studies of the 18th century, when the problem of determining the ellipticity of the earth had attracted the interest of scholars with remarkable statistical aptitude. This method was essentially an algorithm for fitting the bivariate model y = a + bx from measurements of arc length (y) at given locations and the squared sine of the corresponding latitude (x). Originally developed by Boscovich and then studied by Laplace, the method of situation was a combination of the least squares and absolute deviation methods: the fitted line was made to pass through the means of x and y, so the intercept a was estimated as a mean, while the slope b was estimated as a median. As a result of natural selection (or perhaps failed interbreeding), we can put this creature in the first category. SM Stigler describes the historical context and the technical details of this method.3 See also Koenker and Bassett for an exquisite work of paleontological investigation.4
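For readers who would like to see the creature move, here is a minimal sketch of the method of situation in Python. It assumes the usual modern reading of Boscovich's criterion: minimise the sum of absolute residuals subject to the residuals summing to zero, which reduces to a weighted median of slopes through the centroid. The function name `method_of_situation` and the use of NumPy are our own choices for illustration, not a reconstruction of the historical hand calculation.

```python
import numpy as np

def method_of_situation(x, y):
    """Fit y = a + b*x by a Boscovich-type criterion (illustrative sketch).

    Assumption: the line passes through (mean(x), mean(y)), so the residuals
    sum to zero, and b minimises the sum of absolute residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    xc, yc = x - x_bar, y - y_bar

    # Points sitting exactly at the mean of x carry no information on the slope.
    keep = xc != 0
    if not keep.any():
        raise ValueError("x must contain at least two distinct values")

    # Candidate slopes through the centroid, each weighted by |x_i - mean(x)|.
    slopes = yc[keep] / xc[keep]
    weights = np.abs(xc[keep])

    # Weighted median: the first sorted slope at which the cumulative
    # weight reaches half of the total weight.
    order = np.argsort(slopes)
    slopes, weights = slopes[order], weights[order]
    cum = np.cumsum(weights)
    b = slopes[np.searchsorted(cum, 0.5 * cum[-1])]

    # The intercept follows from forcing the line through the means.
    a = y_bar - b * x_bar
    return a, b

# Usage with made-up numbers lying exactly on y = 1 + 2x.
a, b = method_of_situation([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(a, b)
```

The weighted median is what makes the slope an absolute-deviation estimate, while the centroid constraint is the part shared with least squares.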

An example of a dodo-like statistical creature is RA Fisher’s fiducial inference. First published in the 1930s, the fiducial argument quickly attracted much criticism.5 A brief description by Fisher himself appears in the fifth chapter of his Statistical Methods and Scientific Inference. The central idea of the argument is to express inferential statements about future observations when no prior distribution can be specified. Kendall and Stuart provide a clear critique of the fiducial argument in the context of comparing confidence intervals, Bayesian intervals and fiducial intervals.6 In the fiducial case, inference is made in terms of the parameter’s fiducial distribution, which is not a probability distribution in the classical sense of the term but, as Kendall and Stuart (1973, p. 142) state, “a new concept, expressing our belief in the various possible values of a parameter”. A review paper7 simply states that Fisher “never gave an acceptable general definition of fiducial probability”, and another8 starts thus: “Among R.A. Fisher's many crucial contributions to statistics the fiducial argument has had a very limited success and is now essentially dead.” A more personal view of this argument can be found in the Fisher Memorial Lecture delivered by Anthony Edwards in 1994.9 Certainly there have been very few statistical analyses carried out on this inferential basis, and almost all papers published on this topic deal with historical or philosophical developments. Apart from the philosophical issues arising from the definition of the fiducial method, Fisher’s examples of its application concern inference for a single parameter, and very few attempts have been made to generalise it to two or more parameters. With the exponential increase in analyses performed within the Bayesian framework, the relevance of fiducial inference seems weaker than ever.

The third category of statistical creatures has interesting examples of ideas which, if not quite extinct, are certainly very old-fashioned. Examples include systems of continuous distributions arising from (i) a differential equation leading to characterisations based on the moments of order 3 and 4 (Pearson system), and (ii) approximations based on moments (e.g. Gram-Charlier expansion) or cumulants (Edgeworth expansion). An important purpose of these systems of distributions was to graduate (i.e. to smooth) data available only as grouped frequencies, for instance actuarial mortality tables. Nonparametric regression models, e.g. based on splines,10 and better computing and surveillance facilities have taken over the role these systems had in applied statistics. However, they are still widely used in theoretical work and the Pearson system is having a bit of a revival: for instance, Sir William Elderton’s classic book Frequency curves and correlation, first published in 1906, was reprinted in 2011,11 and the R package PearsonDS12 fits such probability models.
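As a small illustration of what an approximation based on moments looks like, the sketch below evaluates a Gram-Charlier (type A) density truncated after the fourth-order term: a normal density corrected by Hermite polynomials weighted by the skewness and the excess kurtosis. The function name `gram_charlier_pdf` and its arguments are hypothetical, chosen only for this example, and the truncation at order 4 is our assumption rather than anything prescribed by the systems discussed above.

```python
import numpy as np

def gram_charlier_pdf(x, mean, sd, skew, excess_kurt):
    """Gram-Charlier type A density, truncated after the 4th-order term.

    Illustrative sketch: a standard normal density multiplied by a
    correction built from the Hermite polynomials He3 and He4.
    """
    z = (np.asarray(x, dtype=float) - mean) / sd
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)  # standard normal density
    he3 = z**3 - 3.0 * z                              # Hermite polynomial He3
    he4 = z**4 - 6.0 * z**2 + 3.0                     # Hermite polynomial He4
    correction = 1.0 + skew / 6.0 * he3 + excess_kurt / 24.0 * he4
    return phi / sd * correction

# Usage with made-up moments: a mildly skewed, slightly heavy-tailed curve.
grid = np.linspace(-4.0, 4.0, 9)
print(gram_charlier_pdf(grid, mean=0.0, sd=1.0, skew=0.5, excess_kurt=0.3))
```

One known weakness, and perhaps one reason for the creature's decline, is that the truncated series is not guaranteed to be non-negative, so for strongly non-normal moments the "density" can dip below zero in the tails.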

Of course, these are not the only examples: we have created an online survey and look forward to reading any suggestions you may have to populate this virtual museum.

Marco Geraci is associate professor of biostatistics at the Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, and a member of the Significance editorial board.

Mario Cortina Borja is chairman of the Significance editorial board, and professor of biostatistics in the Population Policy and Practice Programme, Institute of Child Health, University College London.