Tuesday, October 03, 2006

Chopped liver

In 1999, Voros McCracken discovered what has come to be known as the “DIPS” theory – that pitchers have only a small influence on whether a ball in play becomes a hit. Since then, and in the standard fashion of scientific enquiry, there have been frequent studies testing McCracken’s hypothesis. The consensus, seven years later, after countless hours of research, analysis, and dialogue, is that the theory is generally true – pitchers have much less control over batted balls than previously believed, and any material deviation from league average is almost always just luck.
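The intuition behind "it's almost always just luck" can be illustrated with a toy simulation. This is my own sketch, not McCracken's data or method: the talent spread (standard deviation 0.010 around a 0.300 BABIP) and the 500 balls in play per season are illustrative assumptions. The point is that when true-talent differences are small, binomial luck swamps them, and observed BABIP barely correlates from one year to the next:

```python
import random

random.seed(42)

# Illustrative assumptions, not actual MLB figures:
N_PITCHERS = 1000
BIP_PER_SEASON = 500

def season_babip(true_rate, n=BIP_PER_SEASON):
    """Observed BABIP for one season: hits on balls in play / balls in play."""
    hits = sum(random.random() < true_rate for _ in range(n))
    return hits / n

# Each pitcher gets a "true" BABIP talent from a narrow distribution,
# then we observe two independent seasons.
talents = [random.gauss(0.300, 0.010) for _ in range(N_PITCHERS)]
year1 = [season_babip(t) for t in talents]
year2 = [season_babip(t) for t in talents]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(year1, year2)
print(f"year-to-year BABIP correlation: {r:.2f}")  # low, despite real talent
```

With these numbers, the expected correlation is roughly talent variance over total variance, about 0.2 — a small enough signal that a single season of results tells you very little about a pitcher's true ball-in-play skill.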

As Dr. Bradbury writes, all that research was done only by the “analytical baseball community” – not by reputable Ph.D.s in economics. It has not undergone “formal peer review.” Further, “it has not been tested with sufficient statistical rigor” and “has undergone very little formal scrutiny.” It does not use “proper econometric techniques.”

And so Dr. Bradbury sets out to correct this. How? Not by reviewing the existing research, and validating it academically. Not by finding those studies which have “insufficient statistical rigor” and analyzing them statistically. Not by summarizing what’s already out there and criticizing it.

No, Dr. Bradbury’s paper ignores it. Completely. He mentions none of it in his article, not even in the bibliography. Instead, Dr. Bradbury explains his own study as if it’s the first and only test of the DIPS hypothesis.

This happens all the time in academic studies involving baseball. Years of sabermetric advances, as valid as anything in the journals, are dismissed out of hand because of a kind of academic credentialism, an assumption that only formal academic treatment entitles a body of knowledge to be considered, and the presumption that only the kinds of methods that econometricians use are worthy of acknowledgement.

The truth is, there’s pretty decent statistical rigor in some of what we amateurs have done. In “Solving DIPS,” a bunch of really smart people, statistically literate, probably no less intelligent than academic economists and much better versed in sabermetrics, do an awesome and groundbreaking job of determining the causes of variation in the results of balls in play. There’s Tom Tippett’s famous study that showed that power pitchers performed better than projected by DIPS -- the same conclusion that Dr. Bradbury reaches. (Tom didn’t do any significance testing, though, which I guess makes his excellent analysis unworthy of citation.) The May 2001 issue of “By the Numbers” had three articles on DIPS, one of which (mine) did roughly the same kind of regressions (although admittedly much less thorough) as Dr. Bradbury does in his own paper.

Dr. Bradbury knows about our previous work. In this article from last year, he mentions the Tippett article and a few others. He knows what’s out there. He must know that all this research has been peer reviewed, albeit informally, in hundreds, or even thousands, of blog postings and message boards. He must understand that the network of sabermetricians is perfectly respectful of the scientific method and understanding of the way theories must be tested and revised in the face of contradictory evidence. He’s got to know that our method of peer review, while informal, is much, much more likely to expose flaws in the theory than submitting a paper to a couple of economics referees who know little about sabermetrics.

And he’s got to be aware that in the field of sabermetrics, the achievements of non-academics, starting with Bill James and Pete Palmer, have been orders of magnitude above anything that’s come out of academia. Think of all the discoveries that have changed the way we look at baseball – DIPS, runs created, linear weights, the clutch hitting myth, catcher ERA, major league equivalencies, and so forth. Of the most important principles of sabermetrics, how many of those ideas were developed in academia? If the answer isn’t zero, it’s pretty close.

I feel bad singling out Dr. Bradbury. He does some good work on his blog, and, outside of this paper, seems very supportive and appreciative of the work we do. He seems like a nice guy. And ignoring non-academic research is pandemic when professors venture into sabermetrics – “The Wages of Wins” being the most prominent recent example.

So perhaps I’m being unfair. Maybe it’s one of the realities of academic life that to get an article published, you have to ignore non-academic work. Maybe the professorial culture requires that you presume no knowledge is real until it’s been published in peer-reviewed journals. Maybe Dr. Bradbury is right that all the previous results are questionable unless he uses the exact technique that he does. And maybe I’m just overreacting to a couple of throwaway sentences intended only to get his paper past the referees.

But still, there is a basic tradition among scientists of acknowledging that we stand on the shoulders of giants. When it comes to sabermetrics, academia repeatedly pretends that it hoisted itself up on its own bootstraps.

I posted my comments about Phil's blog entry and J.C.'s paper on J.C.'s blog, which I'll reprint here: http://www.sabernomics.com/sabernomics/index.php/2006/10/in-case-youre-interested

"I only have two issues with the presentation of the paper.

The first is: “However, this finding, though widely accepted, has not been tested with sufficient statistical rigor”.

I don’t know how widely accepted it is, and I’m not sure that “sufficient statistical rigor” is a necessary requirement.

The “widely accepted” is a line thrown in, without basis. I wouldn’t claim it’s widely accepted by the mainstream, nor by the analysts. Maybe it is, but unless you substantiate it, it shouldn’t be mentioned.

Likely the most rigorous of these studies is the Allen/Hsu paper called Solving DIPS, hosted on my site. Even if it doesn’t meet whatever thresholds are required to make J.C.’s statement true, it is easily the most compelling statement of the anti-strong-DIPS position out there. That it doesn’t meet those thresholds doesn’t mean that it gets to be dismissed or rejected. The Allen/Hsu paper is a prime example of smart guys who love baseball spending tons of their time doing things that didn’t require formal training. The results, had they been subjected to sufficient statistical rigor, would likely have yielded the same conclusions.

The second: to the extent that a bibliography is created, other DIPS-related research should have been included.

The unfortunate part of the Birnbaum criticism is that he wanted to take exception to academia, and focused his thoughts on J.C.’s paper in particular. Other than the line I quoted, which many may read as being dismissive of otherwise quality work, the rest of the paper stands well on its own."

My comment was preceded by a comment from another reader, who was supportive of J.C.'s paper.

J.C. responded with: "I don’t think there is much useful in the Solving DIPS discussion. I agree with Walt Davis’s characterization in the BTF thread.

I have sufficiently addressed the critique that I “under-cited” the sabermetric literature. I’m not going to comment on it any further. I didn’t do anything even borderline inappropriate. It’s nit picking. I am shutting down comments on this subject. If you want to write or talk about what an awful person I am, you are free do so in another forum." (emphasis mine)

Seeing that only two people commented on his blog, and the other was definitely pro, I can only presume that the bold comment was directed at mine. And that is rather shocking. To characterize anything I said about his paper as anything close to inappropriate is to completely misread my comment. In fact, my second point, that he should have had a fuller DIPS bibliography, was conceded by Sean Forman at BBTF as a possibility.

Unfortunately, J.C. shut down commentary on his site following my comments, and therefore that blog entry must stand on its own, without an agreeable resolution.

"The unfortunate part of the Birnbaum criticism is that he wanted to take exception to academia, and focused his thoughts on J.C.’s paper in particular."

J.C.'s paper is more illustrative of the point than any other academic paper I've seen, because

1. Unlike other papers I've seen, J.C. *explicitly stated* that non-academic research is not up to standard, and

2. J.C. takes active part in our non-academic sabermetric community, and is aware of the large body of research on the topic in his paper, research he chose not to cite.

In that light, I think using J.C.'s paper as an example was certainly appropriate.

Having said that, I concede that Tango may be right. It is entirely possible that my criticism would have been better had it explicitly focused on academia in general a bit more, and on J.C.'s paper a bit less.

Others should concede that they should have taken your remarks for what they were: honest and appropriate. It was a typical Phil blog entry, certainly not the kind of remarks that require others to stand up for J.C. and shoot down what you wrote.

Roger Ebert is tougher on filmmakers he gives 3 1/2 stars to than Phil was on J.C.

What should count are the individual points that Phil brings up, rightly or wrongly, and not an overall summary judgement as to whether Phil did the right thing or not.