Statistics: Its use may require a doctor’s advice

Recently, a very interesting article appeared in Nature1. It is mainly aimed at non-specialists, to help them interpret scientific results with care, drawing the appropriate conclusions and, sometimes even more important, avoiding the wrong ones. The twenty tips can be applied directly whenever results from a statistical analysis are reported. Very often, a phenomenon of interest in some population is studied through a given sample. Statistics is the basic field that develops the tools needed to summarize information and extract possible patterns from the available data, of course in the proper way. As a great statistician and colleague, W. Stute(*), says, Statistics is the art of weighting information. The main tasks of statistics and related fields are the selection and estimation of appropriate models and the study of the conditions under which the results obtained for the sample can be extrapolated to the whole population.

However, statistical methods are sometimes applied in a manner that leads to wrong conclusions. We can find studies where the quantitative analysis does not support the conclusions inferred by the authors and/or the readers of the study.

Apart from the general tips for readers, it is helpful to consider some other important issues to be aware of before drawing conclusions.

1.- Sampling: Think carefully about the population of interest and about the sample you have. If you want to draw conclusions about a population, you need a sample randomly selected from that population. Sometimes it is difficult, if not impossible, to obtain a sample that perfectly represents the population you have in mind. Therefore, you must take into account not only the usual statistical error due to the sample size, but also the selection bias.

A very famous example of this bias is the Hite report2. The Hite report had a shocking impact on American society in the seventies and eighties. There is no doubt of the qualitative importance of such pioneering work on sexual relations. However, the quantitative results were completely distorted. For instance, the claim that 70% of American women married for five years or more had sexual relations outside the marriage was simply not reliable. The questionnaire was sent by post to one hundred thousand women, and only 4.5% of them were posted back. Therefore, extrapolating the statistical results from the sample to the total population of women was a clear mistake.

It is important to remark that 4,500 is, in principle, a large enough sample size: 4,500 questionnaires from a well-randomized sample are enough to do inference. But ensuring that the sample statistics are good approximations of the population statistics is the first aspect to take care of, and in the Hite report the questionnaires were not a randomized sample from the population.
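A small simulation makes the point concrete. The numbers below are entirely hypothetical (they are not Hite's data): suppose 20% of a population has some trait, but people with the trait are five times more likely to return the questionnaire. Even with thousands of responses, the sample rate is badly distorted.

```python
import random

random.seed(0)

# Hypothetical population of 100,000: 20% have the trait being surveyed.
population = [random.random() < 0.20 for _ in range(100_000)]

# Non-random response: people with the trait are five times more likely
# to return the questionnaire (15% vs 3% response rates).
respondents = [has_trait for has_trait in population
               if random.random() < (0.15 if has_trait else 0.03)]

true_rate = sum(population) / len(population)
sample_rate = sum(respondents) / len(respondents)
print(f"true rate in population: {true_rate:.2f}")
print(f"rate among respondents:  {sample_rate:.2f}")
```

The respondent sample here has thousands of questionnaires, yet the rate among respondents is more than double the true population rate; a large sample size does not repair a biased selection.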

2.- The model: It is common to use some independent variables to estimate the value of others, the dependent variables. To establish the relation between them, a model is proposed and then estimated using the data from the sample. Sometimes the model comes from the theory of the field we are working in; in other cases the relation is established using an ad-hoc model, which becomes a dangerous framework. The first situation is the most recommendable one: we have a theory (which enables us to formulate a reasonable null hypothesis) and the data can be used to test the model. Remember that whenever we test a hypothesis, rejection means that we have found empirical evidence against it. The lack of rejection does not mean that the model is correct, no matter how small the error is when the model is fitted to the sample data. Further work should be done to support an ad-hoc model, such as out-of-sample tests or checks of parameter stability, among others. Never forget that it is not difficult to find a model that performs reasonably well when we adapt it to a particular set of data. But it is also possible that such a model, if it is not adequately formulated, does not answer any general question.
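A toy experiment with made-up data (using NumPy's polynomial fitting, not any model from the studies discussed here) shows why a good in-sample fit proves little: a flexible ad-hoc model fitted to pure noise can look good on the data it was estimated on, yet an out-of-sample test exposes it.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pure noise: y has no real relation with x at all.
x = np.linspace(-1, 1, 30)
y = rng.normal(size=30)

# Ad-hoc model: a degree-9 polynomial, chosen only because it fits well.
coeffs = np.polyfit(x, y, deg=9)
in_sample_err = np.mean((np.polyval(coeffs, x) - y) ** 2)

# Out-of-sample test: fresh draws from the same (relation-free) process.
y_new = rng.normal(size=30)
out_sample_err = np.mean((np.polyval(coeffs, x) - y_new) ** 2)

print(f"in-sample MSE:     {in_sample_err:.3f}")
print(f"out-of-sample MSE: {out_sample_err:.3f}")
```

The fitted polynomial "explains" part of the noise it was adapted to, so its in-sample error shrinks; on new data from the same process the error grows again, revealing that the model answers no general question.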

A recent example of the dangerous conclusions that come from ad-hoc models appears in the Journal of Obesity3. In this study, the authors develop a model to estimate a new body mass index. The model is justified only by its good fit to the data, among the set of models considered. The conclusion is that the body mass index can be estimated from age, height and weight. The linear dependence on age means that, for instance, an 80-year-old woman who is 170 cm tall and weighs 50 kg is considered overweight!
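The mechanics of the problem can be shown with purely hypothetical numbers (this is not the formula from the Journal of Obesity paper, just an illustrative linear-in-age index with an invented coefficient): an index that grows linearly with age will eventually push any fixed weight and height past the threshold.

```python
# Hypothetical index for illustration only: the usual BMI plus a linear
# age term with an invented coefficient a = 0.1 points per year.
def hypothetical_index(weight_kg, height_m, age_years, a=0.1):
    return weight_kg / height_m**2 + a * age_years

# A lean 80-year-old: classic BMI = 50 / 1.70**2, about 17.3, which is in
# the underweight range. The age term alone adds 0.1 * 80 = 8 points,
# carrying the index past the usual overweight threshold of 25.
print(hypothetical_index(50, 1.70, 80))
```

Whatever the coefficient, a strictly linear age term has this property: it converts age itself into "weight", which is exactly the kind of artefact an out-of-sample or sanity check on extreme cases would catch.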

3.- Latent variables and causality: In many studies, it is of interest to test whether a variable is significant or not in explaining some results. In medicine, the effect of a drug is studied using control groups in order to isolate the pure effect of the drug. In general, if the experiment can be designed by the researcher, the effects of the variables can be controlled and the results can lead to conclusions about the effect of the particular variable under study. Randomized experiments allow the researcher to establish causal relationships. Unfortunately, randomized experiments are often unfeasible for many reasons; in the social sciences, in fact, this is a major problem. Establishing connections between variables that lead to causality must be done with special care. Otherwise, we may wrongly attribute to one variable an effect that should be attributed to another (or others).

Prejudices and ideologies may support quick conclusions that pretend to be based on “scientific” studies.

An example is the still open question of how to promote equality between women and men, and whether it should be promoted at all, which receives controversial answers. The empirical studies are consistent in finding gaps between the sexes. However, the causes of that gap are attributed to many different circumstances. The answer is not trivial, and any study focused on finding differences between human groups must be done with great care. Even more careful study is needed when one of the groups has been historically discriminated against, since the differences found between groups may be due to other latent variables. Whenever we find differences between blacks and whites or between men and women, a deeper study must be done to isolate the pure effect of the group.

Thus, when a study states that a segregated education system is better than a non-segregated one, the results must be checked with the proper tools to verify that segregation actually “causes” better results.

Finding causal relationships when no randomized experiment can be made requires tools that estimate what the results for a control group would have been. One technique is to construct a synthetic pattern. This is the tool used by Abadie and Gardeazabal4. They find that GDP in the Basque Country declined by about 10 percentage points relative to a synthetic control region without terrorism.
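A minimal sketch of the synthetic-control idea, with made-up data (not the Basque Country figures): find non-negative weights, summing to one, on untreated regions so that their weighted combination tracks the treated region before the treatment; the post-treatment gap between the treated region and this synthetic combination then estimates the effect. Here the weights are found by a coarse grid search rather than the constrained optimization of the original method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up pre-treatment outcomes (say, yearly GDP growth) for 3 control
# regions over 10 pre-treatment years, around different mean levels.
controls = rng.normal([2.0, 3.0, 4.0], 0.3, size=(10, 3))

# The treated region (before treatment) happens to behave like an equal
# mix of controls 0 and 2, plus small noise.
treated_pre = 0.5 * controls[:, 0] + 0.5 * controls[:, 2] + rng.normal(0, 0.1, 10)

# Synthetic control: weights on the simplex (non-negative, sum to 1)
# minimizing the pre-treatment mismatch; a coarse grid search suffices here.
best_w, best_err = None, np.inf
grid = np.linspace(0, 1, 101)
for w0 in grid:
    for w1 in grid:
        if w0 + w1 > 1:
            continue
        w = np.array([w0, w1, 1 - w0 - w1])
        err = np.mean((controls @ w - treated_pre) ** 2)
        if err < best_err:
            best_w, best_err = w, err

print("weights:", best_w.round(2))
print("pre-treatment mismatch (MSE):", round(best_err, 4))

# Post-treatment, the gap between the treated region's observed outcomes
# and (post_treatment_controls @ best_w) estimates the treatment effect.
```

The search recovers weights concentrated on the regions that actually resemble the treated one before treatment, which is what makes the synthetic region a credible counterfactual.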

When these tools for finding causality are applied to segregated education, Halpern et al.5 claim:

There is no well-designed research showing that single-sex (SS) education improves students’ academic performance, but there is evidence that sex segregation increases gender stereotyping and legitimizes institutional sexism

In summary, even when using statistics in the proper way, we know that we make mistakes. The hard task is to learn how to use the data properly and to be able to say something with high confidence of being close to the truth.

(*) Winfried Stute is a well-known statistician at the University of Giessen, Germany.

Written by

Eva Ferreira is a Full Professor at the Department of Econometrics and Statistics at the University of the Basque Country (UPV/EHU). She graduated in Mathematics there, obtained her master's in Probability and Statistics at the Courant Institute (New York University) and earned her PhD at UPV/EHU. Her main research interests are related to stochastic processes, both theory and applications.