Saturday, October 07, 2006

Did the baseball salary market anticipate DIPS?

According to the famous Voros McCracken DIPS hypothesis, pitchers have little control over what happens once the ball is hit off them. As long as they stay in the park, balls hit off bad pitchers are no more likely to drop for hits than balls hit off good pitchers. No matter who the pitcher is, his batting average on balls in play (BABIP) should be roughly the same.

If that’s true, then teams shouldn’t be evaluating pitchers based on their BABIP, since it’s not evidence of skill. And, therefore, they shouldn’t be paying players based on BABIP.

The paper starts off by verifying whether the DIPS hypothesis holds. That part of the paper is technical and hard to summarize concisely, so I’ll skip the details and run it down in one paragraph. First, Bradbury shows that if you choose the right combination of variables to predict this year’s ERA, adding last year’s ERA and BABIP doesn’t improve the prediction. Then he runs a second regression to see which other statistics correlate with BABIP, and the answer, it turns out, is strikeouts and home runs.

One important point Bradbury makes is that even if last year’s BABIP doesn’t help other stats to predict this year’s BABIP, that doesn’t by itself imply that pitchers have no control over BABIP. It could be that pitchers do have BABIP skill, but that skill correlates perfectly with the other stats he considered. He finds a high correlation with strikeouts, but not a perfect one.

In summary, Bradbury writes,

“… it appears that pitchers do have some minor control over hits on balls in play; but, this influence is small … this skill just happens to be captured in strikeouts.”

Having concluded that BABIP isn’t much of a skill, Bradbury now checks whether players are nonetheless being paid based on it. He regresses (the logarithm of) salary on various measures of the pitcher’s performance in his previous year – strikeouts, walks, home runs, hit batters, and BABIP. He takes every year separately from 1985-2004.

His findings:

-- strikeouts are a significant predictor of salary 16 out of 20 seasons;
-- walks are significant 11 out of 20 seasons;
-- home runs allowed are significant 8 out of 20 seasons;
-- HBP are significant 0 out of 20 seasons;
-- BABIP is significant 4 out of 20 seasons.

Bradbury concludes that because BABIP comes out significant in so many fewer seasons than some of the other factors, GMs were less likely to base decisions on it. In effect, they (or rather, the market) knew about DIPS before even Voros did:

“… the market seems to have solved the problem of estimating pitcher MRPs [marginal revenue products – the benefit of signing a pitcher versus not signing him] well before McCracken published his findings in 2001.”

This is not as farfetched as it sounds – economists are fond of finding ways in which markets are capable of appearing to “figure out” things that no individual knows. For instance, even though every individual has a different idea of what a stock may be worth, the market “figures it out” so well that it’s very, very difficult for any individual to outinvest any other. (And here is another possible example of the market “knowing” something about sports that most of its participants do not.)

So it’s certainly possible. But I think the evidence points the other way.

Bradbury’s conclusion, that GMs aren’t paying for BABIP, is based only on the number of years that came out statistically significant. The other seasons, the ones that do not show significance, are treated as if they show no effect at all that year. But, taken all together, they show obvious significance.

A look at the study’s Table 6 shows that from 1985-1999, the direction of the relationship between BABIP and salary is almost exactly as you’d expect: all negative but one. (Negative, because higher BABIP means lower salary.) Plus, the positive one is only 0.02, and only two of the other t-statistics are smaller than 0.5 in absolute value. Clearly, this is a very strong negative relationship between BABIP and salary.

Here, I’ll list those 15 scores so you can see for yourself. Remember, if there were no effect, they should be centered around zero:

Only the three highest of those are individually significant, but taken as a whole, there’s no doubt. The chance of getting fourteen or more negative t-statistics out of fifteen is 1 in 2048. (Of course, that’s not a proper test of significance because I chose the criterion after I saw the data. But still…)
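The 1-in-2048 figure can be checked with a quick binomial calculation. This is only a sketch, assuming each year's sign is an independent coin flip under the no-effect hypothesis; the function name `tail_at_least` is mine, not anything from the paper.

```python
from math import comb

def tail_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 14 or more negative signs in 15 seasons if each sign is a fair coin flip
prob = tail_at_least(14, 15)
print(prob, 1 / prob)
```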

If you combined all fifteen years into one regression (you’d have to adjust for salary inflation, of course), you’d wind up with a massive t-statistic. It’s the data being broken up into small samples that hides the substantial significance.

If you look at an entire season, Rod Carew would easily rate as a statistically significantly better hitter than Mario Mendoza. But if you carved their seasons into individual weeks, not that many of those weeks would show a statistically significant difference.

In terms of actual salary impact, the numbers show a great deal of baseball significance. Take 1990, which is pretty close to average (z=-1.04). If your BABIP in 1989 was .320, you would earn about 5.4% less money in 1990 than if your BABIP was only .300. (Assuming I’ve done the math right.) That’s a reasonable difference, 5% of salary for only a moderate increase in BABIP.
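For anyone who wants to redo that arithmetic: because the regression is on the logarithm of salary, a coefficient b on BABIP turns a change of d in BABIP into a salary change of exp(b*d) - 1. The sketch below assumes BABIP enters the regression linearly in its raw units; the coefficient it prints is back-solved from the 5.4% figure above, not read off the paper's tables, so treat it as hypothetical.

```python
from math import exp, log

delta_babip = 0.320 - 0.300
pct_change_quoted = -0.054  # the "about 5.4% less" figure quoted above

# Hypothetical coefficient implied by the quoted figure (NOT taken from the paper)
implied_coef = log(1 + pct_change_quoted) / delta_babip

def salary_pct_change(coef, delta):
    """Fractional salary change in a log-salary regression for a BABIP change of delta."""
    return exp(coef * delta) - 1

print(round(implied_coef, 2))
print(round(salary_pct_change(implied_coef, delta_babip), 3))
```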

Furthermore, the real-life effect is almost certainly higher than that. Players don’t sign new contracts every year, so there’s a time lag between performance and salary. Suppose, on average, half of pitchers sign a new contract in any given year. The 5% difference overall must then be a 10% difference on the pitchers who actually sign (to counterbalance the 0% effect on pitchers whose salary didn’t change).

And still further, pitchers aren’t evaluated on just their most recent season. Suppose that GMs intuitively give the most recent season only 50% weight in their forecast of what the pitcher will do for them. Again, that makes the 5% difference really a 10% difference in the GMs’ evaluations.

Combine the two adjustments, and now you’re talking real money.
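Here is the back-of-the-envelope version of those two adjustments. Both 50% figures are the guesses made above, not estimates from the paper, and stacking the two corrections multiplicatively is my reading of "combine", so the result is only a rough illustration.

```python
observed_effect = 0.05     # rounded salary difference from the 1990 regression
share_signing = 0.5        # guess: half of pitchers sign a new contract in a given year
weight_last_season = 0.5   # guess: weight GMs put on the most recent season

# Each adjustment alone scales the observed effect up by 1 / (its share)
effect_on_signers = observed_effect / share_signing
effect_on_evaluation = observed_effect / weight_last_season

# Both together, assuming the two corrections simply multiply
combined = observed_effect / (share_signing * weight_last_season)
print(effect_on_signers, effect_on_evaluation, combined)
```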

So, in terms of both statistical significance and baseball significance, it seems pretty solid to me that the market for pitchers does consider BABIP to be significant in determining pitcher skill.

But there’s still something interesting in the data. From 2000 to 2004, the years McCracken’s original DIPS study was in the public domain, the numbers become less consistent:

-0.64, +0.10, -1.40, +0.57, -0.05

Two of the five years have the correlation between BABIP and salary going the wrong way. One of the others is close to zero, and the numbers seem to be jumping up and down a bit more. All this might be because the number of pitchers in the sample is a bit lower – but it also might be that GMs are catching on.
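One rough way to look at those five numbers as a group is to add them up and divide by the square root of how many there are (sometimes called Stouffer's method). This is a sketch only: it treats each year's t-statistic as an approximately standard normal score, assumes the years are independent, and weights them equally, none of which the paper claims.

```python
from math import sqrt

# The five 2000-2004 scores listed above
scores_2000_2004 = [-0.64, 0.10, -1.40, 0.57, -0.05]

def combined_z(scores):
    """Equal-weight combined score: sum of z-scores over sqrt(count)."""
    return sum(scores) / sqrt(len(scores))

negatives = sum(1 for s in scores_2000_2004 if s < 0)
z = combined_z(scores_2000_2004)
print(negatives, round(z, 2))
```

A combined score that small is well short of the usual significance thresholds, which fits the idea that the relationship got weaker, though five seasons is not much to go on.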

It’s weak, but it’s something. This study provides the first hint I’ve seen that baseball’s labor market might actually have learned something about DIPS. It’ll be very interesting to extend the table ten years from now and see if that’s really true.

8 Comments:

[Sorry, mistakenly posted this at previous entry.] Good catch on the pattern of results on BABIP. It does look like there is an impact. However, the study's flawed methodology of using the prior year's performance data, rather than something like career performance prior to signing of last contract (and annualized value of current contract as dependent variable), makes it hard to even infer the size of the relationship, and I don't think you can draw any conclusions about the last few years' coefficients.

Finding a relationship also doesn't necessarily mean the market was "wrong." The market should give some weight to BABIP, but not a lot. For example, I'd argue that Zito will, and should, be rewarded this winter for his career .269 BABIP. It's part of his skill set. However, paying more for a pitcher because of one or two years of low BABIP, w/o any other evidence it was a real skill, would clearly be a mistake.

One quibble: you describe the difference between a BABIP of .320 vs. .300 as "only a moderate increase in BABIP." That's actually a very substantial difference: it translates into an ERA difference of about 0.50. If that were a true talent difference, 20 points of BABIP would separate an average pitcher from a star, or a star from a HOFer. That's important to keep in mind as you review the rest of the DIPS literature. A common error is to dismiss BABIP differences as "small" because they are small as a percentage of the mean, compared to a stat like K/9 (where a player can commonly be 50% above average). But what matters is RUNS, and superficially small changes in BABIP have a large impact on runs.

I agree with you that a 20-point difference is large in terms of *talent*. But it's "moderate" in terms of observed values.

For instance Barry Zito gives up about 750 BIP per year. The standard deviation of his BABIP is therefore about .017. So .020 is about 1 SD away from the mean, and I think it's reasonable to call it "moderate".
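That standard deviation comes from treating each ball in play as an independent trial with the same chance of becoming a hit. A minimal sketch, assuming a true BABIP of .300 (my choice for illustration; plugging in Zito's .269 gives a slightly smaller number):

```python
from math import sqrt

def babip_sd(p, n):
    """Binomial standard deviation of observed BABIP over n balls in play."""
    return sqrt(p * (1 - p) / n)

sd = babip_sd(0.300, 750)
print(round(sd, 4))          # one-season SD of BABIP from luck alone
print(round(0.020 / sd, 2))  # how many SDs a 20-point gap represents
```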

I agree that the market isn't necessarily "wrong" -- my feeling is that BABIP is indeed a skill, and like any other skill, players should be paid based on it. But I was judging the results on the author's assumption about it, which is that BABIP skill barely exists outside of what's captured in strikeouts.

I agree with you that the .269 is significant for a career. But J.C.'s paper was based on *one season*, and I'm calculating based on only one season as J.C. would.

That is, IF you accept J.C.'s conclusion that all the difference in DIPS is captured in strikeouts (and I know you don't, but bear with me), THEN the difference between .300 and .320, all things being equal, is due to luck.

But since the salary difference between .300 and .320 is significant in a baseball sense, teams must be basing their decisions on it, contrary to J.C.'s conclusion.

I agree with you and Guy that the difference between .300 and .320 is quite material (ERA difference of .50) to winning ballgames.

But I wasn't addressing that. I was making the point that if J.C. is correct that all the difference in DIPS is captured in strikeouts, a 20-point difference between two identical pitchers is "moderate" in the sense that it will happen fairly often by luck.

I haven't read all of the materials very carefully or rigorously - my apologies - but I wonder about the fundamental relevance of the question of whether or not teams ('the market') had 'solved' the BABIP question prior to DIPS theory.

Teams choose their players on the basis of expected performance; that's a given. But were teams even necessarily looking predominantly at previous results to determine player value in the future? Voros' DIPS theory was a corrective among those trying to project pitcher ERA *mathematically*. How many teams were even trying to do that in 2000 and before?

Teams, much more than any of McCracken's audience, base their evaluations largely on scouting information, or at least I've been led to believe that that's the case. Well, if the spread in BABIP is insignificant relative to the spread in K and BB, you would expect that the scouting info wouldn't reflect BABIP too strongly, right?

Sure, scouting info and stats are heavily cross-contaminated, and much of the scouting info that makes its way to the fans is predicated on explaining results moreso than on giving objective analysis. But, as a WAG, I'd expect that the effect of that will never come close to vaulting BABIP near K and BB in terms of statistical significance vis-a-vis player salary.

So my hypothesis, which I hope someone here can speak to, is that this is not at all similar to the supposed "market inefficiency" wrt OBP, because the inefficient 'market' that DIPS was correcting (i.e., statheads' projections) was almost entirely separate from the market which may or may not have 'solved' that question prior to DIPS.

And I would have to wrap my head around it much more before feeling confident in this final point, but here goes: given that BABIP determines, to varying degrees, the key factors that determine(d) player salaries in the time span in question (Wins, ERA, scouting info), we would expect BABIP to influence salaries at the margins and shouldn't expect it to be statistically significant wrt player salary.

There's a good chance I'm screwing my thinking up with gross negligence, but I wanted to throw that out there.