Tag Archives: Data Analysis

Reproducible research is important for three main reasons. Firstly, it makes it much easier to revisit a project a few months down the line, for example when making revisions to a paper which has been through peer review.

Secondly, it allows the reader of a published article to scrutinise your results more easily – meaning it is easier to show their validity. For this reason, some journals and reviewers are starting to ask authors to provide their code.

Thirdly, having clean and reproducible code available can encourage greater uptake of new methods. It’s much easier for users to replicate, apply and improve on methods if the code is reproducible and widely available

Throughout my PhD and Postdoctoral research, I have aimed to ensure that I use a reproducible workflow and this generally saves me time and helps to avoid errors. Along the way I’ve learned a lot through the advice of others, and trial and error. In this post I have set out a guide to creating a reproducible workflow and provided some useful tips. Continue reading →

At the last ISEC, in Montpellier in 2014, an informal survey suggested that Methods in Ecology and Evolution was the most cited journal in talks. This reflects the importance of statistical methods in ecology and it is one reason for the success of the journal. For this year’s International Statistcal Ecology Conference in Seattle we have produced a virtual issue that presents some of our best recent papers which cross the divide between statistics and ecology. They range over most of the topics covered at ISEC, from statistical theory to abundance estimation and distance sampling.

We hope that Methods in Ecology and Evolution will be equally well represented in talks in Seattle, and also – just as in Montpellier – some of the work presented will find its way into the pages of the journal in the future.

Last week the Center for Open Science held a meeting with the aim of improving inference in ecology and evolution. The organisers (Tim Parker, Jessica Gurevitch & Shinichi Nakagawa) brought together the Editors-in-chief of many journals to try to build a consensus on how improvements could be made. I was brought in due to my interest in statistics and type I errors – be warned, my summary of the meeting is unlikely to be 100% objective.

True Positives and False Positives

The majority of findings in psychology and cancer biology cannot be replicated in repeat experiments. As evolutionary ecologists we might be tempted to dismiss this because psychology is often seen as a “soft science” that lacks rigour and cancer biologists are competitive and unscrupulous. Luckily, we as evolutionary biologists and ecologists have that perfect blend of intellect and integrity. This argument is wrong for an obvious reason and a not so obvious reason.

We tend to concentrate on significant findings, and with good reason: a true positive is usually more informative than a true negative. However, of all the published positives what fraction are true positives rather than false positives? The knee-jerk response to this question is 95%. However, the probability of a false positive (the significance threshold, alpha) is usually set to 0.05, and the probability of a true positive (the power, beta) in ecological studies is generally less than 0.5 for moderate sized effects. The probability that a published positive is true is therefore 0.5/(0.5+0.05) =91%. Not so bad. But, this assumes that the hypotheses and the null hypothesis are equally likely. If that were true, rejecting the null would give us very little information about the world (a single bit actually) and is unlikely to be published in a widely read journal. A hypothesis that had a plausibility of 1 in 25 prior to testing would, if true, be more informative, but then the true positive rate would be down to (1/25)*0.5/((1/25)*0.5+(24/25)*0.05) =29%. So we can see that high false positive rates aren’t always the result of sloppiness or misplaced ambition, but an inevitable consequence of doing interesting science with a rather lenient significance threshold. Continue reading →