Abstract: In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they can be reproducible in a statistical sense. Current day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments of the two types of models. The key to untangling them is the realization that behind every substantive model there is a statistical model that pertains exclusively to the probabilistic assumptions imposed on the data. It is not that the methodology determines whether to be a realist about entities and processes in a substantive field. It is rather that the substantive and statistical models refer to different entities and processes, and therefore call for different criteria of adequacy.

There is much in the paper with which I agree, and there is also much
with which I disagree. On the agreement side, the authors emphasize that
models are approximations and that they are adequate rather than true,
as in `statistically adequate’. On the disagreement side, their vocabulary
contains words associated with a concept of truth such as `actual’ as in
`{\it actual} error probabilities’ and `wrong’ as in `wrong
likelihood’. I have no idea what an actual error probability is
unless the concept is restricted to simulations. Does the use of `wrong
likelihood’ mean that there is some `correct likelihood’ and, if so,
how are we to recognize it when we see it (or them)? The disagreement is
about substantial matters which are reflected in the vocabulary.

The authors' concept of statistical adequacy relies on the
ability to simulate data sets under the model and to compare these
simulated data sets with the real data. This is to be applauded, but
unfortunately the form of comparison is never made precise. Here is
how it is done in `Data Analysis and Approximate Models’. A model is a
fully specified probability measure, that is, all parameters have
explicit values as they must have if the model is to be used for
simulations. The next step is to decide which features of the data
set are to be replicated by the model. Suppose for the sake of
argument the model is that of i.i.d. Gaussian random variables and
the features of interest are (i) shape as measured by the Kolmogorov
distance between the empirical and model distributions
$T_1=d_{ko}(\ep_n, N(\mu,\sigma^2))$ and (ii) the
lack of outliers as measured by $T_2=\max_i \vert X_i-\mu\vert/\sigma$.
These play the role of the mis-specification tests of the authors.
One now generates data under the model $N(\mu,\sigma^2)$ and calculates,
say, the 0.975-quantiles of $T_1$ and $T_2$, denoted $q_1(0.975)$ and
$q_2(0.975)$ respectively. Given data $x_1,\ldots,x_n$, the set of
adequate Gaussian models consists of those $N(\mu,\sigma^2)$ for which
$d_{ko}(\ep_n, N(\mu,\sigma^2))\le q_1(0.975)$ and $\max_i \vert
x_i-\mu\vert/\sigma\le q_2(0.975)$.
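
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the code behind `Data Analysis and Approximate Models’. The function names, the number of simulations and the seed are assumptions made for the example. The sketch uses the fact that, under $N(\mu,\sigma^2)$, the distributions of $T_1$ and $T_2$ do not depend on $\mu$ and $\sigma$, so the quantiles can be simulated once under $N(0,1)$ for a given sample size $n$.

```python
import math
import random
from statistics import NormalDist


def kolmogorov_distance(xs, mu, sigma):
    """T_1: sup_x |F_n(x) - F(x)|, where F is the N(mu, sigma^2) cdf."""
    model = NormalDist(mu, sigma)
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model.cdf(x)
        # The empirical cdf jumps from i/n to (i+1)/n at the i-th order statistic.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def max_abs_standardized(xs, mu, sigma):
    """T_2: max_i |x_i - mu| / sigma."""
    return max(abs(x - mu) for x in xs) / sigma


def simulated_quantiles(n, level=0.975, n_sim=2000, seed=0):
    """Monte Carlo quantiles of T_1 and T_2 for samples of size n.

    Simulating under N(0, 1) suffices because both statistics are
    invariant under location-scale changes of the model.
    """
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t1.append(kolmogorov_distance(z, 0.0, 1.0))
        t2.append(max_abs_standardized(z, 0.0, 1.0))
    k = min(n_sim - 1, math.ceil(level * n_sim) - 1)
    return sorted(t1)[k], sorted(t2)[k]


def is_adequate(xs, mu, sigma, q1, q2):
    """True if N(mu, sigma^2) replicates both chosen features of the data."""
    return (kolmogorov_distance(xs, mu, sigma) <= q1
            and max_abs_standardized(xs, mu, sigma) <= q2)
```

Given data `xs`, one would call `simulated_quantiles(len(xs))` once and then evaluate `is_adequate` over a grid of candidate $(\mu,\sigma)$ values. The set of pairs for which it returns `True` is the set of adequate Gaussian models in the above sense.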

Note that this concept of adequacy specifies the parameter
values. Maximum likelihood has nothing to add although one can include
the behaviour of the mean and standard deviation in the features to be
replicated. One can define mis-specification tests without specifying
parameters. Thus $T_1$ can be replaced by $T_3=\inf_{\mu,\sigma}
d_{ko}(\ep_n, N(\mu,\sigma^2))$ and $T_2$ by $T_4=\max_i \vert
x_i-\mathrm{mean}(x)\vert/\mathrm{sd}(x)$ where $\mathrm{mean}(x)$ and $\mathrm{sd}(x)$ are the mean and
standard deviation of the data. Now a model can be declared adequate
without specifying any parameter values. This leaves the statistician
free to use, say, maximum likelihood in the interests of efficiency
and severity. This can, however, go completely wrong, as the maximum
likelihood estimate can yield parameter values for which the
resulting model is an arbitrarily poor approximation to the data.
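
As a small illustration of a parameter-free statistic, here is a sketch of $T_4$ and of its simulated quantile. The function names are hypothetical, and the choice of the sample standard deviation with divisor $n-1$ is an assumption, since the text does not say which version is meant. $T_3$ is not sketched because it requires a numerical minimization over $(\mu,\sigma)$.

```python
import math
import random
from statistics import mean, stdev


def t4(xs):
    """T_4: max_i |x_i - mean(x)| / sd(x), with sd using the n-1 divisor (assumed)."""
    m = mean(xs)
    s = stdev(xs)
    return max(abs(x - m) for x in xs) / s


def t4_quantile(n, level=0.975, n_sim=2000, seed=0):
    """Monte Carlo quantile of T_4 for i.i.d. Gaussian samples of size n.

    T_4 is unchanged by shifting and rescaling the data, so its
    distribution under N(mu, sigma^2) is the same for every (mu, sigma)
    and can be simulated under N(0, 1).
    """
    rng = random.Random(seed)
    values = sorted(
        t4([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_sim)
    )
    k = min(n_sim - 1, math.ceil(level * n_sim) - 1)
    return values[k]
```

The data pass this check when `t4(xs) <= t4_quantile(len(xs))`. No value of $\mu$ or $\sigma$ enters the decision, which is what leaves the choice of parameter values open afterwards.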

Finally a comment on severity. Suppose the model is $N(\mu,\sigma^2)$
and the null hypothesis is $H_0: \mu=0$. Presumably a severe test will
be based on the mean of the sample. However, the careful statistician
decides first to check the adequacy of the model using some
mis-specification tests. The data pass the mis-specification tests and the null
hypothesis is accepted. Suppose we now consider all symmetric
location/scale models which pass the mis-specification tests and then
use maximum likelihood to define a severe test of $H_0$. It turns out
that the test using the Gaussian model is the least severe of all the
tests. The moral is that severity depends not only on the data but on
the model and that severity can be imported from the model. Tukey
calls this a free lunch. In mathematical terms, testing $H_0$ is an
ill-posed problem if the model can also be chosen. The problem needs
regularizing, and one way of doing this is to use minimum Fisher
information models. The Gaussian model is one such. The test based on
the mean is the severest test using the least severe model, that is,
the model which does not introduce spurious severity. I miss a
discussion of this problem in the paper.
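
The role of Fisher information can be illustrated numerically. The sketch below assumes that `minimum Fisher information’ refers to the Fisher information for location, $I(f)=\int (f'(x)/f(x))^2 f(x)\,dx$, compared across densities scaled to the same variance. The three densities, the integration range and the step sizes are choices made for the example only. Since the asymptotic variance of the maximum likelihood location estimate is $1/(nI(f))$, a larger $I(f)$ corresponds to an apparently more precise, and in that sense more severe, test.

```python
import math


def fisher_information_location(logpdf, lo=-40.0, hi=40.0, steps=40000, h=1e-4):
    """Approximate I(f) = integral of (d/dx log f)^2 * f by a midpoint sum.

    The score d/dx log f is obtained by a central difference of logpdf.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        score = (logpdf(x + h) - logpdf(x - h)) / (2.0 * h)
        total += score * score * math.exp(logpdf(x)) * dx
    return total


def gaussian_logpdf(x):
    """Log density of N(0, 1), variance 1."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


S_LOGISTIC = math.sqrt(3.0) / math.pi  # scale giving the logistic variance 1


def logistic_logpdf(x):
    """Log density of the logistic distribution scaled to variance 1."""
    z = abs(x) / S_LOGISTIC  # the density is symmetric about 0
    return -z - 2.0 * math.log1p(math.exp(-z)) - math.log(S_LOGISTIC)


NU = 5.0
S_T = math.sqrt((NU - 2.0) / NU)  # scale giving Student's t_5 variance 1


def student_t_logpdf(x):
    """Log density of Student's t with 5 degrees of freedom, scaled to variance 1."""
    z = x / S_T
    const = (math.lgamma((NU + 1.0) / 2.0) - math.lgamma(NU / 2.0)
             - 0.5 * math.log(NU * math.pi) - math.log(S_T))
    return const - 0.5 * (NU + 1.0) * math.log1p(z * z / NU)
```

Calling `fisher_information_location` on the three log densities gives approximately 1 for the Gaussian, $\pi^2/9\approx 1.097$ for the logistic and 1.25 for the scaled $t_5$, which agrees with the known closed forms. Among these unit-variance models the Gaussian carries the least Fisher information, so replacing it by one of the others would import precision from the model rather than from the data.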

Unauthorized use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Deborah G. Mayo and Error Statistics Philosophy with appropriate and specific direction to the original content.