Replication of results

by Henry on April 20, 2006

David Glenn has another article of topical interest today; the best write-up so far of the recent twists and turns in l’affaire Lott.

Last week Mr. Lott filed a defamation lawsuit against Steven D. Levitt, a professor of economics at the University of Chicago … Mr. Lott charges that in the book and in private e-mail messages, Mr. Levitt spread lies about the quality and integrity of Mr. Lott’s work (The Chronicle, April 13). Much will hinge on exactly what Mr. Levitt meant by the words “replicate” and “peer refereed.” … Mr. Lott’s lawsuit comes at a time when Mr. Levitt is riding high; Freakonomics has sold more than one million copies. Mr. Lott, meanwhile, is in transition; on April 3, one week before he filed the lawsuit, he abruptly left his position as a resident scholar at the American Enterprise Institute, where he had worked since 2001. He did not answer The Chronicle’s inquiries about where he would go next. A representative of the institute declined to characterize the reasons for Mr. Lott’s departure, citing a policy against discussing personnel matters. … “The term ‘replicate’ has an objective and factual meaning in the world of academic research and scholarship,” the lawsuit reads. “When Levitt and Dubner allege that ‘other scholars have tried to replicate his results,’ the clear and unambiguous meaning is that ‘other scholars’ have analyzed the identical data that Lott analyzed and analyzed it the way Lott did in order to determine whether they can reach the same result.” … It is far from clear, however, that “replicate” is in fact consistently used by social scientists in the way Mr. Lott and Mr. Moody say it is used.

There’s also a second allegation that Levitt, in a private email, said that a special issue edited by Lott wasn’t peer-reviewed – but it’s hard for me to imagine how this allegation could be libellous. And on the question of the meaning of replication – I’ve always understood it in the wider, more ambiguous sense in which Levitt appears to have been using it. That said, I’m a political scientist (one of the economists quoted in the Chronicle piece says that political scientists and economists use the term in different ways). I’d be astonished if this ever gets to trial, but if by some bizarre chance it does, it should make for some entertaining arguments about the nuances of the social science lexicon.


To take the same data, and run the same calculations on it, in the anticipation of possibly getting a different result, would be the very definition of insanity, and economics journals don’t require referees to be insane (or at least, not in this particular way). I have no idea why anyone thought they were going to get away with this.

Freakonomics was written for a lay audience. No lay audience will have in mind as the “clear and unambiguous” meaning of “replicate” the preferred usage of some subgroup of specialists. I can’t wait to see Lott’s lawyers try to scare up seemingly disinterested people (even from among specialists) who will say under oath that they read the text and came away with the meaning Lott’s lawyers attach to it.

“It is far from clear, however, that “replicate” is in fact consistently used by social scientists in the way Mr. Lott and Mr. Moody say it is used.”

Hmm, I suppose I am with Lott on this. Replication means the same data and the same methods. You’d be surprised at the number of papers that can’t be replicated if you try to use the same data and the same methods. That’s the point of trying to replicate others’ research – often you can’t, sadly.

Oops. A typical use for ‘replicate’ is in biology, where, say, a group publishes a protocol that produces some specific product that has some specific property. A second group replicates the result when using the same inputs and the same protocol yields the same product with the same property. Looks like Levitt slipped.

In economics “replicate” can mean running the same data the same way, but this is usually only done as a learning exercise – i.e., by someone trying to fully understand a given paper. Replicate also means same data set, different assumptions; or different data, same assumptions; or different data, different assumptions. Testing robustness of results is a huge issue (and usually addressed in a given paper).

“To take the same data, and run the same calculations on it, in the anticipation of possibly getting a different result, would” have been a very useful thing to do in the Bellesiles case, don’t you think? It’s not so useless as you claim, even in economics.

Didn’t Martin Feldstein famously screw up the code for his work showing that social security lowered private saving, which was caught by the simplest possible replication exercise i.e. same data & model and same (corrected) methods? Of course he then did the classic “well, I’ve redone the results a different way and my findings still stand.”

Brett,
Did Bellesiles have “calculations” as such? Wasn’t he (supposedly) just reporting what he found? It seems he made it up, but I don’t think there were calculations involved. He was a historian, not an economist, after all. As some have pointed out above, for a result to be interesting it has to be robust, and for it to be robust means that it stands up when done in somewhat different ways. Lott’s results could not so stand up. This made them seem like artifacts of his data, though in the end even that seems too generous.

My reading of most of this is that the definition of “replication” is pretty strict – same data, same techniques – but there’s a lot of literature out there in which people attempt to use different data sets (e.g., different parts of a time series, longer or shorter time series) or different techniques to test for robustness of results.

My sense is that Levitt meant that Lott’s results were not robust, and he said not replicable. Sloppy language – defamation, though, is a legal term, and I’ll defer to lawyers on whether sloppy language = defamation.

matt – as far as I recall, Bellesiles didn’t have any calculations more complicated than simple addition and subtraction in his book. But as Brett illustrates, that’s not the point. The point is that every mention of John Lott has to be accompanied by an equal and counterbalancing mention of Michael Bellesiles, however silly and irrelevant, lest the fundamental equilibrium of the universe be forever shattered. It’s like those particles that spontaneously appear in pairs and immediately cancel each other out of existence. Understand this simple principle and its many corollaries, and you’ll be well on the way to being able to appreciate Brett Bellmore’s unique contribution to discussions on this blog.

Donald – replicate usually has a much looser meaning in my experience, as Charles King, who’s quoted as an authority in the APSA-CP piece acknowledges in Glenn’s article. But more to the point, the “first witness for the defense”:http://scienceblogs.com/deltoid/2006/04/what_does_replicate_mean.php on different meanings of the word replication is John Lott.

I always felt replication means “does the finding hold up elsewhere”, not “did the authors get their maths right in the original study”.

In clinical trials there’s a distinction made between internal and external validity. A trial can be internally valid for subjects within the trial (your maths is right, and the effect is real for your subjects); but not externally valid (the effect isn’t true for the general population, and isn’t found in other trials with other subjects).

In epidemiology when people speak of results not being “replicated”, they mean the effect isn’t found elsewhere. But it doesn’t mean the analysis on the original study was wrong. On that view, if people rerun your maths and it is wrong the problem isn’t that your result hasn’t been “replicated”, it’s that your result was false (and there was nothing to “replicate”).

Data archives and checking people’s maths is great and useful, but it’s not replication as I understand it. (Except perhaps where the same analysis is run with another pseudo-random number generator, where you could get a different result, but that’s getting a little obscure.)

I am an economist, and I have used the expression “replicate someone’s results” in many papers. Not a single time did I mean “with the exact same data and same methodology”. I have tried to replicate empirical results from the prior literature using different data sets, different time periods, different methodologies – that’s pretty much the point of most of my (and of most empirical) papers.

“To take the same data, and run the same calculations on it, in the anticipation of possibly getting a different result, would be the very definition of insanity”

Hmmm, just read a short paper in AER (2002) about redoing/replicating your regressions on different software packages, particularly if you’re doing MLE. The convergence mechanism used differs by package and in some cases can get crap results. It can get you minimums, saddle points, and apparently some will even tell you the process converged when in fact there ain’t no maximum. Of course I don’t think that’s what Levitt meant.
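That convergence worry is easy to illustrate with a toy sketch (entirely hypothetical – this is not Lott’s model or anything from the AER paper). A naive gradient-based maximiser that declares success whenever the gradient is near zero can “converge” at a stationary point that isn’t a maximum at all, and different starting values – a stand-in for different packages’ defaults – land on different answers:

```python
def grad(x):
    """Derivative of the toy 'log-likelihood' g(x) = -(x**2 - 1)**2,
    which has genuine maxima at x = +1 and x = -1 and a stationary
    point at x = 0 that is not a maximum."""
    return -4.0 * x * (x * x - 1.0)

def climb(x, lr=0.01, tol=1e-8, max_iter=100_000):
    """Naive gradient ascent: reports 'converged' whenever |gradient| < tol."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, True   # declared success -- not necessarily at a maximum
        x += lr * g
    return x, False

# Three "packages" differing only in their default starting value:
print(climb(0.5))    # ends near +1, a real maximum
print(climb(-0.5))   # ends near -1, the other maximum
print(climb(0.0))    # "converges" immediately at 0, which is no maximum at all
```

The third call is the pathological case the comment describes: the gradient is exactly zero at the start, so the routine happily reports convergence at a point where the “likelihood” is locally minimal along the axis.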

Lott’s version of replication sounds like reanalyzing the data. Surely, replication is a way of validating an experiment at a different time and location, necessarily generating a new data set. Experimental protocols or models must be the same. If the protocols are changed, it’s a different experiment. If the model is changed, it’s a different analysis.

Defamation is about damage to reputation. This means that Levitt’s attorneys would (if this turkey ever goes to trial) have the pleasure of parading Lott’s entire dubious history before the jury. Can you imagine Lott being cross-examined under oath regarding Mary Rosh?

Given all the self-inflicted damage to Lott’s reputation, how could “replicate,” even if used falsely and maliciously, hurt Lott’s reputation further?

nik, the terms internal and external validity are used in exactly the same way in the labour market program evaluation literature (not surprising as they’ve pinched lots of techniques from biometrics).

But clearly this fool Lott just keeps digging himself deeper and deeper.

In #4, Matt wrote: A typical use for ‘replicate’ is in biology[…] a second group replicates the result when using the same inputs and the same protocol yields the same product with the same property. Looks like Levitt slipped.

Hmmm. In that case, the method is the same but the data are collected independently. Lott claims that both the data and the method must be the same.

Hmmm, just read a short paper in AER (2002) about redoing/replicating your regressions on different software packages, particularly if you’re doing MLE

ahhh I see what you mean, although given the crap that Lott has produced with simple linear regressions, the idea of him getting access to maximum likelihood estimation fills me with terror. I seem to remember he did actually develop an enthusiasm for the twice-censored tobit at one point, although not in the guns n crime work.

I can’t see Lott having any objection if people went out and collected the same data he did, instead of trusting his collection of it.

However, his objection to people using *different* data sets appears to be that he went out and collected the largest possible data set, all 50 states, and that anybody who is using a *different* set of data can only be using a *subset* of the data he used.

“However, his objection to people using different data sets appears to be that he went out and collected the largest possible data set, all 50 states, and that anybody who is using a different set of data can only be using a subset of the data he used.

He’s claiming that they’re cherry picking, in other words.”

Nope. The “failure to replicate” statement came with a citation to an article that used Lott’s methodology and dataset but that corrected some of the coding errors he made. If “replicate” unambiguously means “run the analysis with the same method on the same data, even if these contain errors,” then Lott has a case.

If “replicate” means run the analysis with the same method on the same data, then the term is essentially useless in the social sciences.

For example, assume that the data was manifestly flawed. Assume that the mathematical model was equally flawed. Obviously, the product of such a model would be crap. And yet you could “replicate” the analysis by applying the flawed model on flawed data.

If scientific “replication” requires accepting manifestly flawed data and methodologies, then the whole enterprise is worthless.

I am an economist, and I have used the expression “replicate someone’s results” in many papers. Not a single time did I mean “with the exact same data and same methodology”. I have tried to replicate empirical results from the prior literature using different data sets, different time periods, different methodologies – that’s pretty much the point of most of my (and of most empirical) papers.

I am also an economist, and that would be my understanding of “replication” as well. My take on the issue is that this is yet another example of John Lott attempting to stifle any and all legitimate criticism of his research.

“To take the same data, and run the same calculations on it, in the anticipation of possibly getting a different result, would be the very definition of insanity”

Unless there was fraud in the original work. To say that results cannot be replicated in the strictest possible sense is an accusation of fraud, to say that they are not robust to different models or slightly different data sets is another thing.

I work in medical genetics. The meaning of “replicate” in a genetic study is the same as in comment 28 above by an economist. It means to show the same effect but on different data. “Failure to replicate” may indicate a problem with the earlier study, or bad statistical luck, but is in no way considered an allegation of fraud. Any issue of “the American Journal of Human Genetics” will demonstrate this usage. So Levitt’s usage follows standard scientific practice.