March 25, 2014

How should scientists, and reporters, discuss work that has failed to replicate? The original Barr and colleagues article remains in the scientific literature; failed replication alone is not grounds for retraction.

He’s right, of course: we certainly don’t want to retract every paper whose conclusions can’t be replicated, for all sorts of reasons. The conclusions may subsequently be replicated after all; the paper may contain other useful information even if the experiment in question was flawed; the replication studies themselves probably rely on the original’s Methods section; and authors should not be punished for unfortunate outcomes unless those outcomes were fraudulently obtained.

What we want is for that Barr et al paper, whenever anyone looks at it, to be displayed with a prominent header that says “The following studies attempted to replicate this finding but failed:”, and a list of references/links. And, for that matter, another header saying that the following other studies did replicate it.

For websites to produce that kind of annotation automatically, articles that cite the original need to include an additional piece of metadata, alongside the author/year/title/journal/etc. metadata that identifies the cited paper. That additional ingredient is the citation’s type, which should take one of a small set of defined values.

What values are relevant? I won’t try to come up with an exhaustive list at this point, but obvious ones include:

Replicates — the current paper replicates work done in the cited paper (and so provides evidence, though not proof, that the cited paper’s conclusion is correct).

FailsToReplicate — the current paper attempts to replicate work done in the cited paper, but fails (and so provides evidence that the cited paper is mistaken).

Falsifies — the current paper shows definitively that the cited paper is wrong. This is a stronger statement than FailsToReplicate, and would be used, for example, when the new work shows conclusively that the experimental protocol of the original was critically flawed.

DependsOn — the current paper depends on information from the cited paper, such as the phylogeny that it proposes or the vertebral formula that it gives. For these purposes, the cited paper is treated as an authoritative source.

Acknowledges — the current paper uses ideas proposed in the cited paper, and gives credit to the original.
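To make the idea concrete, the taxonomy above could travel as one extra field on each entry in a reference list. Here is a minimal sketch in Python; the field names, DOIs, and record shape are all made up for illustration, not any existing standard:

```python
from dataclasses import dataclass
from enum import Enum

# The five citation types proposed above (names are illustrative).
class CitationType(Enum):
    REPLICATES = "Replicates"
    FAILS_TO_REPLICATE = "FailsToReplicate"
    FALSIFIES = "Falsifies"
    DEPENDS_ON = "DependsOn"
    ACKNOWLEDGES = "Acknowledges"

@dataclass
class Citation:
    """One entry in a paper's reference list, plus its type."""
    citing_doi: str
    cited_doi: str
    citation_type: CitationType

# A hypothetical replication study citing a hypothetical original.
c = Citation(
    citing_doi="10.1234/replication.2014",  # made-up DOI
    cited_doi="10.1234/original.2010",      # made-up DOI
    citation_type=CitationType.FAILS_TO_REPLICATE,
)
print(c.citation_type.value)  # FailsToReplicate
```

The point of the closed enumeration is exactly the “small set of defined values” above: free-text labels would fragment immediately, whereas a fixed vocabulary is what lets tools aggregate across journals.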

There are all sorts of practical issues that will impede the adoption of this idea (not least the idiot fact that the citation graph is a trade secret rather than a freely available database), but let’s ignore those for now, and figure out what taxonomy of citation-types we want.

If standard citation types were implemented widely, I can imagine all kinds of interesting analytic tools you could apply to a scientific corpus.
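For instance, the replication banner described earlier could be generated mechanically for any paper, given a corpus of typed citations. A sketch in Python, with made-up DOIs and type labels standing in for whatever vocabulary ends up being standardised:

```python
# Each tuple: (citing paper, cited paper, citation type).
# All DOIs and labels here are invented for illustration.
citations = [
    ("10.1234/study-a", "10.1234/original", "Replicates"),
    ("10.1234/study-b", "10.1234/original", "FailsToReplicate"),
    ("10.1234/study-c", "10.1234/original", "FailsToReplicate"),
    ("10.1234/study-d", "10.1234/original", "DependsOn"),
]

def replication_banner(doi, citations):
    """Summarise replication attempts recorded against one paper."""
    succeeded = [citing for citing, cited, ctype in citations
                 if cited == doi and ctype == "Replicates"]
    failed = [citing for citing, cited, ctype in citations
              if cited == doi and ctype == "FailsToReplicate"]
    lines = []
    if failed:
        lines.append("The following studies attempted to replicate "
                     "this finding but failed: " + ", ".join(failed))
    if succeeded:
        lines.append("The following studies replicated this finding: "
                     + ", ".join(succeeded))
    return "\n".join(lines)

print(replication_banner("10.1234/original", citations))
```

The same data would support corpus-wide questions too: which fields replicate most of what they cite, which heavily cited papers have never been replicated at all, and so on.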

I gather that legal researchers have long used a similar (though I believe proprietary) set of typed citations to see whether a court decision has been reused in precedents, overturned, expanded on, and so forth. See the explanation of “Shepard Signals” in the Lexis-Nexis documentation.

There is CiTO along these lines: http://www.jbiomedsem.com/content/1/S1/S6

I think it would be fantastic. First, I think we’d need to stop using the dreaded word processors for manuscripts and start using HTML editors or similar.

Either the Acknowledges definition needs to be looser, or we additionally need an AcknowledgesExistenceOf type.

There are many related-work sections that just list a bunch of papers in the general area that have no direct relation to the paper itself. And often with zero explanation of how similar or different the work is…

I think an issue is that most papers are extremely complex, with numerous nested proposals. It’s rare to find the stereotypical experimental paper of “does X happen? yes/no”, at least in biology. Instead we have anatomical interpretation, phylogeny reconstruction, behavioral conjecture, mechanical function, etc. Most phylogenetic analyses both replicate and fail to replicate nodes from previous analyses, for instance, and neither the exact same topology nor the exact same content is necessary for a node to qualify as being ‘replicated’.

Another problem is that conclusions are often probabilistic, so falsification as such isn’t really possible. Yet this isn’t a mere failure to replicate the old result, as new papers often provide some amount of evidence in favor of the new position. Even then, sure, a new result may be more parsimonious than an old result, but by how many steps, under what assumptions, with what differences in character or taxon inclusion, what coding differences, etc.? And even if we agree a result is strong enough to count as a falsification, we all know of examples where new discoveries tipped the scales again, so that the old result was obviously never truly falsified after all.

Finally, I think there’s too much qualitative disagreement in many areas of science to be usefully objective when it comes to what is supported or not. Feduccia and colleagues would say their papers falsify all of those others that propose birds are dinosaurs, but to us mainstream dinosaur workers the idea of having a database of “falsified by Feduccia, 1996; falsified by Martin, 2001; falsified by Czerkas, 2002” appended to our papers is laughable at best. Sure they’re cranks, but debates are usually based on data which is far too numerous and esoteric for any independent party to truly evaluate, and if left to the authors themselves, the Dunning-Kruger effect would mean that the worst papers would be catalogued as falsifying the most successfully.

1. This is a massive amount of work for the writer, who has to tag each and every reference by hand according to its relationship to the manuscript. Citation-management tools can’t do it either, since the same citation’s relationship will differ from one manuscript to the next. It thus represents much more work than it provides benefit.

2. Whom, exactly, would this process benefit? A metadata collection analysis? Another way to measure the value of your paper? Another way to determine how someone else views your work other than reading the paper? As Mickey also wrote, we run into the problem of tagging differently from those we disagree with, meaning this work becomes inconsistent with the larger body of work on the subject.

3. And what would you do when a single paper satisfies multiple criteria at once? Say you are using one result it presents, try but fail to find support for another, additionally find a result you do support, and are expanding on or relying on a methodology developed within it? Is each citation in the manuscript itself to have a meta tag for its function? To copy Lab Lemming, above, isn’t that just reading?