Sabermetric Research

Phil Birnbaum

Thursday, March 27, 2008

Players being "clutch" when targeting 20 wins -- a study

In my previous post, I speculated about the anomaly, discovered by Bill James, that there are more 20-game winners than 19-game winners in the major leagues. That is the only case, between 0 and 30, where a higher-win season happens more frequently than a lower-win season.

Here, once again, are some of the win frequencies. For instance, there were 123 seasons of exactly 19 wins since 1940. (All numbers in this study are 1940-2007.)

16 wins: 311
17 wins: 221
18 wins: 185
19 wins: 123
20 wins: 144
21 wins: 92
22 wins: 54

In the previous post, I suggested that the bulge at twenty wins appears to be about 29 "too high". So we'll proceed as if there are an extra 29 twenty-win seasons to be explained.

I did a little digging to see if I could figure out what caused this to happen. I think I have an answer, and it's a bit of a surprise.

1. Extra Starts

The first thing I looked at was whether pitchers with 18 or 19 wins late in the season would be given an extra start near the end of the season to try to hit the 20. So for each group of pitchers, I checked what percentage of their starts came in September or later:

16-win pitchers: 17.53% of starts in September
17-win pitchers: 17.77% of starts in September
18-win pitchers: 18.36% of starts in September
19-win pitchers: 18.49% of starts in September
20-win pitchers: 18.47% of starts in September
21-win pitchers: 18.15% of starts in September
22+ win pitchers: 18.18% of starts in September

So it looks like there's a positive relationship between September starts and eventual wins, and a little bulge in the 18-20 range. Maybe those pitchers are getting extra starts, or, as Greg Spira suggested, perhaps the other pitchers miss a start in favor of a minor-league callup, while the pitchers with a shot at 20 are given all their usual starts. The bulge appears to be about a quarter of a percentage point.

If we assume that, without the special treatment, the 19- and 20-win pitchers' September percentage would have been 0.25 points lower, that's 23 of their 9,229 combined starts. If half those starts turned a 19-game winner into a 20-game winner, that's an extra 11 pitchers in the twenty-win column. That seems reasonable – it's less than 29, anyway, which is the number we're trying to explain.
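The arithmetic above can be sketched quickly (the 0.25-point bulge and the 50% conversion rate are the assumptions just stated):

```python
# Rough arithmetic for the "extra start" effect, using the figures above.
combined_starts = 9229   # total starts by the 19- and 20-win groups
bulge = 0.0025           # assumed extra share of starts in September

extra_starts = round(combined_starts * bulge)  # starts attributable to targeting
extra_20_win_seasons = extra_starts // 2       # assume half convert a 19 into a 20

print(extra_starts, extra_20_win_seasons)  # 23 11
```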

2. Relief Appearances

Greg also suggested, in the previous post, that pitchers with 19 wins may be given an extra late-season relief appearance to try to get their twentieth win. I checked Retrosheet game logs, and Greg is right – there has been some of that going on.

I found all September relief appearances for eventual 20-game winners, where they had at least 18 wins at the time of the relief appearance, and they got a decision (Retrosheet game logs won't list a reliever unless he wins or loses, but that doesn't matter for this study). Here they are:

So it looks like pitchers do get extra relief appearances in pursuit of high win totals (or *did* -- most of these guys were pre-1970). There were four 19-game winners created this way; seven 20-game winners; and six 21-game winners.

The difference between 19 and 20, here, is three players – a lot fewer than I would have thought. But 3 is something, especially when there are only 29 to explain.

3. Clutch Pitching

Maybe, when going for his 20th win, a pitcher will bear down and pitch better than usual. I found all starting pitchers with exactly 19 wins, and looked at how they did in the start(s) that would give them their 20th win.

In 704 such starts, they went 367-193 (.655). I couldn't get their ERA or runs allowed from the Retrosheet game logs, but I did get the average number of runs their *team* allowed in those games. It was 3.54.

That doesn't mean much without context. Here are the results for some other win totals:

17 wins: 3.72 runs allowed, .658, 670-348 in 1385 starts
18 wins: 3.54 runs allowed, .652, 487-260 in 982 starts
19 wins: 3.54 runs allowed, .655, 367-193 in 704 starts
20 wins: 3.62 runs allowed, .615, 227-142 in 490 starts
21 wins: 3.53 runs allowed, .676, 138-66 in 273 starts
22 wins: 3.34 runs allowed, .774, 82-24 in 148 starts

(Each line shows starts made while the pitcher had exactly that many wins.)

Now, we have something: immediately after hitting the 20-win mark, the starters suddenly became a lot less likely to win. Instead of a winning percentage of maybe .660, which you would have expected (remember that the more wins, the better the pitchers, so the winning percentage should increase down the list), they wound up only .615. That's .045 points in 369 decisions, or about 17 wins – more than half of the 29 wins we're trying to explain!
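The 17-win shortfall can be checked directly (the .660 baseline is the eyeballed expectation from the table):

```python
# How many wins does a .045 shortfall cost over 369 decisions?
decisions = 227 + 142          # the 20-win group's decisions
actual_pct = 227 / decisions   # .615
expected_pct = 0.660           # rough interpolation from the table above

missing_wins = (expected_pct - actual_pct) * decisions
print(round(missing_wins))     # about 17
```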

By this measure, it looks like this half of the anomaly is not too many 20-game winners relative to 19-game winners, but that poor performance at 20 causes a logjam keeping the 20s from getting to 21.

But: if you look at runs allowed, the performance at 20 wins doesn't seem all that bad. It should be around 3.54, and it's at 3.62. That's .08 runs for each of their 490 starts -- about 40 runs. How did these pitchers win 17 fewer games while allowing only 40 extra runs? Forty runs is 4 games, not 17 games.

The answer: run support. Here's the pitchers' run support for each category:

The 4.05 is not a typo. When starting a game with 20 wins, pitchers got four-tenths of a run less support than they should have. That's huge. Over 490 games, it's almost 200 runs. That wipes out 20 wins, which keeps twenty 20-win pitchers from getting to 21 wins.
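Converting both deficits to wins, using the conventional rule of thumb of roughly ten runs per win (the rule of thumb is my assumption; the per-start deficits are the post's numbers):

```python
starts = 490
runs_per_win = 10.0                    # conventional rule of thumb

pitching_deficit = 0.08 * starts       # ~40 extra runs allowed by the pitchers
support_deficit = 0.40 * starts        # ~196 runs of missing offensive support

print(round(pitching_deficit / runs_per_win))  # ~4 wins lost to pitching
print(round(support_deficit / runs_per_win))   # ~20 wins lost to run support
```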

I have no idea why this should happen. I suppose it's possible that, seeing how the ace already has 20 wins, the manager might play his bench for this meaningless September game. But how often would that happen? No way it would be enough for 0.4 runs per game, would it?

By the way, it looks like these 20-game winners beat Pythagoras in these starts. They finished only 17 games below expectation, while losing 240 runs (40 pitching, 200 hitting). Assigning blame in proportion over those 17 extra games, we'll say that 3 of the extra losses came from pitching, and 14 from run support.

I find it something of a relief that it was run support, and not (positive) clutch performance on the part of their pitchers, that caused the effect – it wasn't the case that they pitched better when close to a (selfish) goal. Going for their 20th win, pitchers did not appear to do any better or worse than when going for their 18th, 19th, or 22nd wins. And they pitched only marginally better than when going for their 21st.

It's human nature that pitchers want to win 20 for personal reasons, but at least the evidence is that they try just as hard every other game of the year.

4. Concluding

Summarizing these results, we were looking for 29 "extra" 20-game seasons. We got:

-- 11 from extra starts
-- 3 from extra relief appearances
-- 3 from pitchers' own poorer performance in subsequent games
-- 14 from poor run support from their teammates in subsequent games.

That adds up to 31 games, which is close enough to our original estimate of 29.

It's interesting that about half the effect comes from 19-game winners getting extra chances to hit 20, and the other half comes from 20-game winners being unable to rise to 21.

And, to me, the biggest surprise is that almost 40% of the 20-game-winner effect came from that huge hole in run support. In other words, a big part of the surplus of 20-game pitchers is probably just random luck.

------

UPDATE (12/2014, almost seven years later): Bill James points out that the results don't quite work. I've updated the analysis in a new post here.

For pitcher wins, Bill found a similar exception that's even more striking. More pitchers win 0 games than 1. More pitchers win 1 game than 2. More pitchers win 2 games than 3. And so on, all the way up to 30 wins. But there's one exception – 20. Significantly more pitchers finish with 20 wins than with 19.

"[Brooks Robinson] had a miserable year in 1963, and went into his last at bat of the season hitting exactly .250—147 for 588. If he made an out, he wound up the season hitting under .250—but he got a hit, and wound up at .251. He said it was the only hit he got all season in a pressure situation. ... "[P]layers WANT to wind up the season hitting .250, rather than in the .240s. They tend to make it happen."

The implication is that there's a kind of clutch effect happening here, where the player somehow gets better when the target is near. But if that's true, wouldn't that point to baseball players as selfish? Studies have shown very little evidence for clutch hitting when the *game* is on the line. If players care more about hitting .300 than winning the game, that doesn't say much for their priorities.

(Although, in fairness, it should be acknowledged that the opposition is probably trying harder to stop Brooks Robinson from driving in the game-winning run than it is to keep him from getting to .250. For the record, Robinson's final 1963 hit drove in the third run in the ninth inning of a 7-3 loss to the Tigers.)

The study also finds that while this kind of targeting happens for batting average, RBIs, wins, and (pitcher) strikeouts, there's no evidence for targeting in SLG, OBP, OPS, saves, or runs scored. For ERA, there's some evidence of targeting, but not enough to say for sure.

Also, Bill finds that targeting seems to have started around 1940. He argues that's the same time as a jump in fan interest in players' statistical accomplishments.

These are very interesting findings, and I wouldn't have expected as much targeting as seems to have actually occurred. But I'm a bit skeptical about clutchness, and whether players really can boost their performance in target-near situations. I wondered if, instead of clutch performance, it might be something else. Maybe, if a player is close to his goal, he is given additional playing time in support of reaching the target.

That is, if a pitcher has 19 wins late in the season, perhaps the manager will squeeze in an extra start for him. Or if a player is hitting .298, maybe they'll let him play every day until he gets to .300, instead of resting him in favor of the September callup. If and when he reaches .300, then they could sit him (as, I think I remember reading, Bobby Mattick did for Alvis Woods in 1980).

To test the "extra start theory," I looked at pitchers since 1940, grouping them by number of wins. I then looked at their winning percentage, number of starts, and the number of seasons in the group:

So, reading one line of the chart, 20-win pitchers had a .667 winning percentage and an average of 34.9 starts that year. There were 144 seasons in the group.

Looking at the numbers, we do see a bit of an anomaly. More wins normally means more starts, except that pitchers with 20 wins had more starts than pitchers with 21 wins. And, there's a big jump between 18 and 19, more than you'd expect based on the other gaps in that win range.

Suppose we wanted to smooth out the "number of starts" column. We might adjust them like this:

Wins  Starts  Smoothed
17    33.0
18    33.3
19    34.4    33.8
20    34.9    34.3
21    34.7
22    35.9

Now we have a smooth increasing trend. To get it, we had to remove 0.6 starts from each of the 19- and 20-win groups.

One possible interpretation: when a pitcher has 19 wins near the end of the season, they give him 1.2 extra starts. Half the time, that gives him an extra win, and he goes to 20 (which now shows 0.6 extra starts). The other half, he fails to get the win, and stays at 19 (which also shows 0.6 extra starts).

Another way to look at this is in the "win percentage" column: pitchers with 19 wins have almost the same winning percentage as the 18-win guys, which means more losses. And the 20-win guys, at .667, are only .006 away from the 21-win pitchers, which suggests more wins. That's exactly what happens if you take a bunch of 19-win guys, give them an extra start, and reclassify them.

So what do you think of this as an explanation? Does the *average* 19-win late-September pitcher really get 1.2 extra starts? That seems too high to me, although I don't really know. And some of the effect might not be extra starts, but leaving the pitcher in the game longer when he's losing or tied, long enough for his offense to bail him out and give him the win.

Now look at the last column, the number of seasons. If we were to smooth out that column, we might do it this way:

Wins  Seasons  Smoothed
17     221
18     185
19     123     151
20     144     115
21      92
22      54
23      34

The difference is 29 pitchers in the nineteen-win row, and 29 pitchers in the twenty-win row. Assume those 29 pitchers moved from 19 to 20 because of the extra start. If you figure that these pitchers generally win half their starts, that means about 58 pitchers were given that one extra shot.
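The implied frequency of the special treatment works out like this (the 50% conversion rate is the assumption above):

```python
moved = 29          # pitchers bumped from 19 wins to 20 by the extra start
win_rate = 0.5      # a pitcher wins about half his starts
seasons = 68        # 1940 through 2007

pitchers_given_extra_start = moved / win_rate    # 58
per_season = pitchers_given_extra_start / seasons
print(per_season)   # ~0.85 -- a little less than one per year
```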

58 pitchers in the 68 baseball seasons since 1940 means a little less than one pitcher a year getting that extra start. There are normally only about two 19-win pitchers a year, so that means about half of them would have to get the special treatment.

Again, that seems high. However, in support of this theory, the effect diminishes after 1980. In fact, there are now *fewer* pitchers winning 20 than 19:

17 wins: 97
18 wins: 84
19 wins: 43
20 wins: 41
21 wins: 25
22 wins: 12

There's still a bit of an effect, but not as much – in line with Bill's idea that, these days, managers are less likely to pitch an ace on short rest (or leave him in longer in a tie game) just to help him reach a personal goal.

There are probably other things that might be causing this, that I haven't thought of.

In any case, it wouldn't be too hard to figure out a decent answer: just head to Retrosheet and look at 19- and 20-game winners. See if their days of rest varied late in the season, which would mean the "extra start" theory is correct. Check whether they were left in the game longer than normal. And check whether they pitched better in late-season games, which would mean the "clutch" theory is correct.

And you can do the same thing for hitting, for players around .300. Is it just a matter of opportunities, or is there some clutchness too? If the latter, that would be a very significant finding. It would suggest, perhaps, that

(a) clutch hitting does exist, and either

(b1) it only shows up for personal goals, or
(b2) it only shows up when the situation is not clutch for the other team.

Maybe I'll do this myself, if nobody else does ...

UPDATE: Part II is here.

Monday, March 24, 2008

Would you pay 61,000,000,000,000% interest on a loan?

Every so often, "payday loan" operators come under attack for the high fees they charge their customers, most of whom are not well off. A couple of years ago, one of the local papers, the Ottawa Citizen, ran a bunch of stories on the issue. If I recall correctly (the articles are not online), they sent a reporter out to get a loan, and it wound up costing $75 for a three-day loan of $300.

What's the annual compound interest rate on the loan? The charge was 25% of the principal; to get an annual rate, you'd figure there are 121.667 three-day periods in a year, and calculate

Interest Rate in % = 100 * [ (1.25)^121.667 – 1 ]

Do that arithmetic, and you get an annual interest rate of over 61 trillion percent.
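The calculation, spelled out (numbers from the Citizen's example):

```python
# Annualize a $75 fee on a 3-day, $300 loan by compounding.
fee_rate = 75 / 300            # 25% per three-day period
periods_per_year = 365 / 3     # ~121.667

annual_pct = 100 * ((1 + fee_rate) ** periods_per_year - 1)
print(f"{annual_pct:.3g}")     # ~6.18e13, i.e., over 61 trillion percent
```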

The Citizen took this number and went to town with it. The legal limit on interest in Ontario is 60%, and 61,000,000,000,000% is higher than that. The payday loan people actually charge 60% interest plus a fixed service charge, and argue that the service charge shouldn't count as interest. Advocates opposed to payday loans, of course, argue that it should.

Anyway, after a couple of days of this, I wrote the following and sent it to the editor of the paper's Op-Ed page. They didn't run it.

-----

In his breathless expose of the payday loan industry [Dec. 3, 2005], the Citizen asks, "would you pay 61 trillion percent interest on a loan?" Well, in some cases, yes, I would, and you probably would too.

I'm in line at the coffee shop at work, and realize I forgot to bring my wallet. I turn to my co-worker. If he can lend me $20, I tell him, I'll buy him a $2 coffee. He agrees. The next morning, after I've been to the bank machine, I pay him back his $20.

Clearly, my friend is an exploitative loan shark. With compounding, the twenty-four-hour loan cost me an annual interest rate of about 128 quadrillion percent -- 128,330,558,031,335,269, to be more exact. That's 2,000 times higher than even the payday loan operators.

And it's a good thing I didn't stop by the bank machine until the next day. There's a bank machine between the coffee shop and my desk. If I had paid back the loan in five minutes, rather than 24 hours, the effective interest rate would be much higher. Much, much higher -- it would have 4,354 digits!

Here's a real-life example. Most of the major banks charge a service fee of $2.50 for a credit card cash advance (in addition to the annual interest of 21% or so). Having forgotten my cash at home, I stop at the bank for an advance of $50. I pay it back the next day. My total cost is $2.50 for the service fee, and three cents in interest, for a 24-hour total of $2.53. Compounded, that's an annual rate of 6,678,050,678 percent.
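For comparison, the coffee-loan and cash-advance rates come out of the same compounding arithmetic (the helper function is my own):

```python
def annual_rate_pct(fee, principal, days):
    """Annual compound rate implied by a flat fee on a short loan."""
    return 100 * ((1 + fee / principal) ** (365 / days) - 1)

coffee = annual_rate_pct(2.00, 20, 1)    # the $2 coffee on a $20 overnight loan
advance = annual_rate_pct(2.53, 50, 1)   # the $2.53 cost of a $50 cash advance

print(f"{coffee:.3g}")   # ~1.28e17 -- about 128 quadrillion percent
print(f"{advance:.3g}")  # ~6.68e9  -- about 6.7 billion percent
```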

Should I feel ripped off that the bank charged me over six billion percent on my loan? Actually, it was worth it! The total cost to get my $50 was only $2.53. Had I used the same bank machine to draw money from my savings account, it would have cost me $3 in ATM fees. Accepting an interest rate in the billions actually saved me 47 cents!

The moral: when loan periods are very short, and compounding does not actually occur, astronomically high compound rates grossly misrepresent the nature of the actual transaction. Reasonable people should agree that both my transactions were quite acceptable – when you come down to it, they cost me only around $2 each.

Which brings us to the payday loan services.

In their calculations, the Citizen assumes that service charges must legally be included in the interest rate charged. (The payday services disagree on this legal question, and it will be up to the courts to decide, of course.) But it must be noted, first, that the actual service does have a real cost, and, second, that including those costs in the interest rate cap would make it impossible to provide these services without going bankrupt.

The Citizen's experience was a charge of $75 on a three-day loan of $300, which worked out to 61 trillion percent annually. At a capped annual rate of only 60 percent, the lender could have charged no more than about $1.50 (including both interest and service charges) for that same loan.
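That $1.50 ceiling can be reproduced (simple interest matches the figure; compounding would make the ceiling even lower):

```python
# Maximum charge on a $300, 3-day loan at a 60% annual cap.
principal, annual_cap, days = 300, 0.60, 3

simple_max = principal * annual_cap * days / 365
compound_max = principal * ((1 + annual_cap) ** (days / 365) - 1)

print(f"${simple_max:.2f}")    # ~$1.48
print(f"${compound_max:.2f}")  # ~$1.16
```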

But there's no way $1.50 will cover costs. It probably costs more than that in wages just to process the loan, or even just to pay the teller to help the borrower fill out the form. Add the cost of storefront rent, advertising, credit checking, and security (these stores are open 24 hours), and it becomes overwhelmingly obvious that it is not possible to provide short term loans at $1.50 each. The banks, with full credit checks, preprocessed applications, and automated machines, charge $2.50. How can a small storefront operator lend to total strangers, in cash and in person, at 3 am, for only $1.50?

Now, an alternative argument could be made that even if $1.50 is too low, $75 is too high. Suppose we can agree that $10 is a reasonable profit on a loan of this type. A $10 charge on a three-day loan still works out to an interest rate of 5,402 percent, which is obviously much higher than the limit. This $10 doesn't even include the overhead cost of bad loans – and when you lend to strangers in the middle of the night on the basis of a pay stub, there is bound to be a lot of fraud.

Every loan requires some administrative overhead. For long-term loans, like mortgages, the administrative cost is trivial compared to the interest on the transaction. But for short-term loans, administration is the lion's share of the cost. In these cases, to lump administration in with interest is disingenuous. More importantly, to insist on enforcement of the 60-percent limit is dishonest. For three-day loans, a 60% limit is the equivalent of a ban.

If a ban is what advocates want, they should come right out and say so -- instead of pretending the limit is workable, running scary but irrelevant numbers, and insisting that willing customers are being "exploited."

Thursday, March 20, 2008

Does a "hot hand" improve a team's March Madness chances?

When seeding the teams, NCAA organizers include a measure called "L12," which represents the team's record in the previous twelve games. It's just one of many factors that go into the rankings. The idea is that if a team has played well in the recent past, it might be on a roll, and more worthy of inclusion or an improved ranking.

Glockner thinks the L12 rating should be dropped because it doesn’t have any predictive validity. This is in keeping, I think (and Alan can correct me) with current "hot hand" thinking, which holds that streaks are mostly just random. So a .600 team that got hot lately is probably no better than a .600 team with a more even sequence of wins and losses.

But the evidence that Glockner uses to prove his point is, I think, not relevant. Not just because the sample is very small (as Glockner acknowledges), but because his conclusions don't match the evidence.

Glockner compared low-ranked teams that won their first March Madness game to teams that lost. He then looked at the L12 record of the two sets of teams, and found they were pretty much identical:

7.7-4.3 -- L12 record of teams that won
7.6-4.4 -- L12 record of teams that lost

Since the records are roughly equal, Glockner argues that L12 is "non-predictive."

But these results are exactly what you'd expect to see if L12 is a legitimate factor! Remember, L12 is one of many criteria used to create the rankings. So a team that gets in with a good L12 record is probably worse in other ways than a team that gets in with a poor one. (That is, a team with a *bad* L12 record has to be a better team overall, or it wouldn't have made it in against the teams with good L12 records.)

So if L12 was useless, you'd see the winners being the better teams, which got in with worse L12 records. And so you'd see winning teams have a *worse* L12 record than losing teams – NOT an identical record.

(An easier way to see why this is true: imagine that "amount of money used to bribe the NCAA" was also one of the criteria. Only bad teams would need to hand out big bribes to get in, so you'd expect losing teams to have given disproportionately large bribes.)
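The selection effect is easy to demonstrate with a toy simulation (an entirely made-up model and parameters, just to show the mechanism): make L12 pure noise, have the committee admit teams on quality-plus-L12, and decide games on quality alone. Winners then show a *worse* L12 than losers, exactly as argued above.

```python
import random

random.seed(1)

# Admit teams when (true quality + L12) clears a bar.
admitted = []
while len(admitted) < 20000:
    quality = random.gauss(0, 1)
    l12 = random.gauss(0, 1)          # deliberately carries no information
    if quality + l12 > 1.5:
        admitted.append((quality, l12))

# Pair admitted teams off; the better team wins.
winner_l12, loser_l12 = [], []
for i in range(0, 20000, 2):
    a, b = admitted[i], admitted[i + 1]
    win, lose = (a, b) if a[0] > b[0] else (b, a)
    winner_l12.append(win[1])
    loser_l12.append(lose[1])

avg = lambda xs: sum(xs) / len(xs)
print(avg(winner_l12) < avg(loser_l12))   # True: winners have the worse L12
```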

So I think Glockner has it backwards. The fact that the winners are equal to the losers in terms of L12 is evidence – very weak evidence, but evidence nonetheless -- that the committee got it right.

Wednesday, March 19, 2008

Guest post: Don Coffin on golf performance measures

(This is Phil. In response to a previous post on golf scores, Don Coffin, an economist and sabermetrician from Indiana University Northwest, did a little extra research. Here's Don:)

--------

I [Don] have done some research on the relationship between various measures of golfer performance and overall performance (strokes per round; prize money), but have found it difficult to know exactly where to go with it. Here's how I have approached it.

Overall performance, measured as strokes per round, has three components:

1. Shots off the tee. There is essentially no variation in this measure of performance. Everyone has essentially one tee shot per hole, or 18 per round, and the standard deviation is almost zero.

2. Putts. There is some variation here, but less than one would expect. In 2007, according to data at PGATour.com, the average number of putts per round (averaged across golfers, so this is an average of averages) was 29.30, with a standard deviation of 0.52. The coefficient of variation was 1.77%.

3. All other shots. There's a little more variation here; again, using 2007 data, the average was 23.98 “other” shots per round, with a standard deviation of 0.63, and a coefficient of variation of 2.62%.

Overall, golfers in 2007 averaged 71.28 strokes per round, with a standard deviation of 0.59 strokes per round, for a coefficient of variation of only 0.83%. Overall performance was, then, less variable than either of its variable components.
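The coefficients of variation quoted above are just standard deviation over mean (computed here from the rounded figures, so last digits can differ slightly):

```python
def cv_pct(sd, mean):
    """Coefficient of variation, as a percentage."""
    return 100 * sd / mean

putts = cv_pct(0.52, 29.30)     # ~1.77%
others = cv_pct(0.63, 23.98)    # ~2.6%
overall = cv_pct(0.59, 71.28)   # ~0.83%
print(round(putts, 2), round(others, 2), round(overall, 2))
```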

The PGA reports a number of what it calls “skill statistics;” all of these are reported in Table 1. (Putts per round shows up in the “skill statistics;” “Other Shots” is Strokes per Round, minus Putts per Round, minus 18). If our objective is to explain overall performance, as measured by Strokes per Round, then we have to select explanatory variables from among the available performance measures. For 2007, the PGA reported all the “skill statistics” data for 196 golfers.

I believe it is inappropriate to use Putts per Round as an explanatory variable. If we could control adequately for other performance measures, then the (expected) coefficient on Putts (in a multiple regression) would be 1—each additional putt would raise Strokes per Round by 1. What would be useful, however, would be to find explanatory factors for the components of Strokes per Round—Putts, and Other Shots. ( ... continued)

An NCAA basketball pool for experts

It's an entertaining exercise, but I wouldn't make too much of the results.

For one thing, as Tom Federico says in the article, there's a lot of luck involved in any small sample, and the ultimate winner would probably have been aided more by good fortune than by ability. Second, the sample is in a sense actually "smaller" than it looks at first glance, because the results of one round affect the next. So a single upset could actually cost you several games' worth of losses.

I'd call this more a publicity exercise than a true test of an expert's abilities. The ultimate test of a forecaster is still how well he does against the Vegas betting market – the best aggregate predictor of sporting events known to humankind.

Sunday, March 16, 2008

Long tee shots: how much do they improve PGA golf scores?

PGA golfers are hitting for distance better than ever before. Is that contributing to an improvement in their scores? Or, by going for the long drive, are they losing so much accuracy that the increased distance doesn't help?

An article (fortunately available online) in the new issue of Chance Magazine (published by the American Statistical Association) tries to answer that question. It's called "Today's PGA Tour Pro: Long but Not so Straight," by Erik L. Heiny.

Heiny starts by showing us that average PGA driving distance has increased substantially in recent years. Between 1992 and 2003 (these seasons are the ones used throughout the paper), Figure 1 shows an increasing trend from 260 yards to 287. Driving accuracy, though, is flatter; Figure 19 shows a relatively stable trend up to 2001, at which point accuracy suddenly drops from 68% in 2001 to 66% in 2003.

Then, Heiny runs some simple regressions between various aspects of golf performance, and gives us the year-to-year correlations. Unfortunately, he doesn't give us the regression equations, which is where most of the knowledge is. For instance, driving distance is positively correlated with score. But what's the size of the effect? If Phil Mickelson increases his distance by 5 yards, what can he expect as an improvement? Half a stroke? One stroke? Two strokes? This is important and useful information, but Heiny doesn't tell us.

I found the correlations for the different variables much more interesting than the year-to-year trends, which are mostly flat, with occasional exceptions in 2003. Given that 2003 is also the year driving accuracy dropped, you've got to wonder if something specific happened that year to make all these things happen at once.

(I should give you the definitions for some of the less obvious variables. "Driving accuracy" is percentage of drives (excluding par 3s) that landed on the fairway. "Greens in regulation" is percentage of holes in which the green was reached in (par – 2) strokes or fewer. "Sand saves" is percentage of balls in sand traps holed in two shots or fewer (I think). "Scrambling" is how often a par (or better) was made, as a percentage of holes where the green was *not* reached in regulation. And "bounce back" is the percentage of times a player gets a birdie or better after a hole of bogey or worse.)

I was surprised at some of the correlations. Which of the seven factors above would you think had the biggest influence on score? I would have thought "greens in regulation" and "putts per round." I was half right. Here are the approximate numbers, as I eyeballed them from the graphs:

(The author also repeats these correlations for money instead of scoring, but, for the most part, the conclusions are the same, so I won't mention them further.)

From these correlations, Heiny draws some tentative conclusions about how driving distance has affected play. For instance, in 2003, the correlation between driving accuracy and scoring decreased from 0.2 to 0.0. Heiny writes that this suggests that

"... with the driver going so far, it just didn’t matter whether the player was in the fairway. With longer drives, players could get close enough to the green to play short irons or wedges. Even from rough, they could control the shot into the green."

I'm not sure I'd agree with that. Driving distance went up only 6 yards that year, and I'd be more inclined to view the big drop in correlation as random. Indeed, in the first ten years of the study, distance increased 20 yards, with little change in accuracy or correlation.

In any case, the author proceeds to multiple regressions, where he predicts a player's score based on the seven variables above. He runs one regression for each year.

Again, we don't get any equations, just significance levels. Summarizing the 12 regressions, here's what the article shows:

Strangely enough (to me), a couple of results were slightly different when predicting (the logarithm of) money winnings instead of score: drive accuracy went from 3/12 to 7/12, and scrambling went from 12/12 (all of which were < .0001) to 2/12 (many of which were greater than 0.5, which means *negative* correlation!). I can think of a few reasons why this might occur (high money might be correlated with a longer course, which means higher scores; high money might mean better caliber golfers, who might have different characteristics; and so on).

In the money regressions, driving distance was extremely significant (p < .0001) up to 2000, when it started becoming less important, hitting p = .0545 (not significant at 5%) in 2003. Also, driving accuracy seemed to be more significant in the early years than the later years. This prompts Heiny to say that

"it seems to be evident that accuracy off the tee is becoming less important as driving distances increase."

But it seems to me that you can't really draw any conclusions like that from the regression, because of the choice of variables.

For instance: how would an increase in driving accuracy improve scoring? It would do so by increasing the chances of landing on the green in regulation. But "greens in regulation" is another variable being used in the regression! And so even if driving accuracy doesn't come out as significant, that might be because most of its influence is on the "greens in regulation" variable.

We know that if you're a pitcher, giving up a lot of hits increases your chances of losing the game, right? But if we run a regression on losses, and include hits *and runs*, hits will come out as completely insignificant. Why? Because your chance of winning depends on how many runs you give up – but, if you give up four runs, *it doesn't matter* how many hits you allowed in producing those four runs. Hits and runs don't independently cause wins: hits cause runs, and runs cause wins. Hits are irrelevant if you know runs.

Hits -----> Runs -----> Wins

The same kind of thing holds for driving accuracy:

Accuracy -----> Greens in Regulation -----> Score

If Vijay Singh hits 72% of greens in regulation, it doesn't matter much how he got to that 72%: his score will be roughly the same if he got it by inaccurate drives with good recoveries from the rough, or accurate drives with average approach shots. And so, just like hits are irrelevant if you know runs, accuracy is irrelevant if you know GIR.

The analogy isn't perfect: accurate tee shots affect more than just GIR. They may lead to a better position on the green, which will affect putts per round (which is also a variable in this regression). In cases where you miss the green, a more accurate tee shot might lead to an easier scramble (also a variable), or a better lie in the sand (again a variable).

Since all those other things are accounted for in the regression, the surprise is that accuracy would ever be significant at all! There must be other ways in which accuracy improves scores. What might those be?

We can start by noting that score can be computed *exactly* from these four variables:

A = Percentage of GIR
B = Percentage of greens in one stroke less than regulation
C = Number of putts taken per hole
D = On missed GIRs, average number of extra strokes taken to get on to the green

So if you ran a regression on B, C, D, and D×A, you would predict score perfectly -- you'd get a correlation of exactly 1. (If you ran it on A, B, C and D instead, you'd come close to 1, but wouldn't hit it exactly, because there's an interaction between D and A that you wouldn't capture.) And adding other variables to the regression – such as accuracy – would do almost nothing, because accuracy "works" by causing a change in one or more of A, B, C, or D.
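Here's a sketch of that identity (my reading of it, under two assumptions: A, like standard GIR, includes greens hit in *better* than regulation, and I'm ignoring rarities like holing out from off the green):

```python
def avg_score_per_hole(A, B, C, D, avg_par=4.0):
    """A, B are fractions of holes; C is putts per hole; D is average
    extra strokes to reach the green on missed GIRs."""
    # Strokes to reach the green: (par - 2) in regulation, one fewer
    # on the B holes, D extra on the (1 - A) missed-GIR holes.
    strokes_to_green = (avg_par - 2) - B + D * (1 - A)
    return strokes_to_green + C  # then add the putts

# Illustrative (made-up) tour-like numbers:
print(18 * avg_score_per_hole(A=0.65, B=0.02, C=1.75, D=1.10))  # ~74
```

Expanding D × (1 − A) gives D − D·A, which is exactly why a regression on B, C, D, and D×A alone would fit score perfectly.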

Now, Heiny didn't actually include all of A, B, C and D in his regression. But he came close. His regression did include:

-- A
-- C
-- Scrambles, which is significantly correlated with D and C.

So what's left that he didn't include? B, and part of D. So when one of the other variables, such as driving accuracy, comes out significant, it must be because of its effect on B or D:

-- it increases the chances of getting to the green in less than regulation (B); and
-- it increases the chance that if you miss the green in regulation, you'll get back on in few strokes (D).

Looked at in that light, it's easy to see why driving distance is significant: to get yourself a B, you need to hit a par-5 green in two strokes, so you better be hitting long. And it's easy to see why accuracy is significant – landing on the fairway means you avoided losing your ball or hitting in the water, saving yourself a lot of D.

(By the way, could it be that "sand saves" comes out insignificant because all sand saves are actually scrambles? Or does the definition of "scramble" explicitly exclude bunker shots?)

But those minor results are not what the study was looking for. The point of the study was to see if distance and accuracy caused scores to drop in general, not to see if it caused scores to drop only because of "greens in better than regulation" and "water hazards avoided".

And if you want the overall effects of distance and accuracy, you can't include any variables that are *also caused* by distance and accuracy. Run your regression on distance and accuracy only, and see what you get.

Wednesday, March 12, 2008

Do golfers give less effort when they're playing against Tiger?

According to this golf study by Jennifer Brown, the best professional golfers don't play as well in tournaments in which they're competing against Tiger Woods. Apparently, they have a pretty good idea that they won't beat Tiger, and so they somehow don't try as hard.

Brown ran a regression to predict a golfer's tournament scores. She considered course length, whether Tiger was playing, whether the golfer was "exempt" (golfers who were high achievers in the past are granted exemptions) or not, whether the event was a major, and so forth. It turned out that when Tiger was playing, the exempt golfers' scores were about 0.8 strokes higher (over 72 holes) than when Tiger sat the week out. Non-exempt golfers, on the other hand, were much less affected by Tiger's participation – only 0.3 strokes. Brown suggests that the difference is that the non-exempt golfers know they can't beat Tiger anyway, so his presence doesn't deter them from playing their best.

Also, when Tiger was on a hot streak – his scores in the previous month were much better than other exempt golfers – the effect increases. Now, instead of being just 0.8 strokes worse than usual, the exempt golfers are 1.8 strokes worse. During a Tiger "slump," the exempt golfers are so pumped by their chance of winning that they're *better* than usual, instead of worse – by 0.4 strokes.

The result seems reasonable – the less chance you have of winning, the less effort you give. But 0.8 strokes seems like a lot. That's especially true when you consider that it's not *every* time that Tiger Woods runs away with the lead. It might be 0 strokes when Tiger is struggling, but 1.6 strokes when it's obvious that it's a lost cause this week. And how does Phil Mickelson lower his scores by 0.8 just by trying harder? Is it more practice? Is it setting up better? Is it spending more time reading the green? What actually is it?

(UPDATE: a couple of commenters noted that with Tiger in the field, certain opponents might change their style of play to take more risks to beat him, and that might cause the scoring difference. However, the study checked for that, by looking at the 72 individual holes, and it found no difference in the variance based on whether or not Tiger was playing.)

There's something else that's confusing me, and that's one of the actual regression results in the paper. Brown has a dummy variable for (among other things) every player, every golf course, and whether or not the tournament is a major. The coefficient for the major is huge: around 17 strokes.

That doesn't make sense to me, because that's after adjusting for the player and the course. It says that if Phil Mickelson plays Pinehurst #2 twice, with the same course length and the same wind and temperature, but one time he's playing a major and the other he's not, *his score will be 17 strokes higher in the major*. That just doesn't sound right to me. Why does calling a tournament a "major" make every golfer 17 strokes worse? (See tables 4 and 5 of the study.)

(Also, how can you divorce the course from the major? As far as I know, the Augusta National Golf Club, which hosts the Masters (a major), doesn't host any other PGA tournaments. So what happens when you run the regression, and the Augusta dummy is always the same as the Masters dummy? I'm no expert after my one course – but doesn't that cause some kind of matrix problem [perfect collinearity] in the regression?)
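The matrix problem is real. Two dummy variables that always agree make the design matrix rank-deficient, and ordinary least squares can't separate their coefficients. A minimal sketch (hypothetical data, mine):

```python
import numpy as np

# If the Augusta course dummy always equals the Masters dummy, the
# design matrix contains two identical columns.
augusta = np.array([1, 1, 0, 0, 0, 0])
masters = np.array([1, 1, 0, 0, 0, 0])  # identical by construction
other_major = np.array([0, 0, 1, 1, 0, 0])

X = np.column_stack([np.ones(6), augusta, masters, other_major])
rank = np.linalg.matrix_rank(X)
print(rank)  # 3, not 4: perfect collinearity, no unique OLS solution
```

Statistics packages typically resolve this by silently dropping one of the two columns, so the reported "Masters" coefficient may really be a blended Augusta-plus-Masters effect.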

I wouldn't expect a coefficient of +17 even if you didn't adjust for the course at all. Well, maybe if you look at last year's Masters. At Augusta in 2007, every golfer finished over par. But in the 2008 Buick Invitational (which is not a major), 34 players finished at or below par, and the winner (Tiger) was at –19.

But that's an exception. Eyeballing the four majors on Wikipedia, the Masters and PGA Championship usually feature a winner around –8. The British Open looks a little easier, and the US Open a little tougher. Eyeballing the non-majors in 2007, the typical winning score looks to be somewhere in the mid-teens.

So the difference is probably around 7 strokes or so. After correcting for the higher caliber of golfer in the majors, you might get to, what, 12 or 13? You're still short of the 17 the regression found. And, again – that's BEFORE adjusting for the difficulty of the course! So I just don't understand.

The paper, by Coate, is only two pages long; it lists Rose's bets, both on the Reds and on other teams, from April 8 to May 12, 1987. Rose placed 190 bets, typically of $2,000 each. He lost $4,200 on the Reds (27 bets), $36,000 on other NL teams (72 bets), and $7,000 on his AL wagers (89 bets), for total losses of $47,200. (My numbers add up to 188 bets instead of 190 because I probably counted them wrong from the table.)

Coate describes Rose's bookie's odds as even money, with a 10% fee payable on losses. This is equivalent to 10:11 odds. He says the losses "include about $20,000 to $25,000 in transaction fees," so even if Rose had been able to get even money, he still would have lost.

From this, Coate argues that these results are "consistent with an informational efficient [betting] market," and says that Rose's expertise "was not an advantage."

I'd agree that it's interesting that Rose didn't win, and I do think the betting market is very efficient, but I think that looking at Rose's bets is a very weak way of testing for strong market efficiency.

First, why presume Rose is an expert? It's pretty well established that most professional baseball managers work from their gut, not from science or empirical evidence. Rose may have inside information, but professional gamblers have analyzed the market, and, in some sense, have a lot more information impacting their bets than Rose does.

Second, during the period in question, Rose bet on every Reds game except one. It's not very plausible to assume that he would have had inside information impacting every game, is it? Even if he did, somehow, know that over the entire month, the Reds were better than could be predicted from publicly-available information, that would have become apparent as they started winning games and the odds adjusted.

Third, any advantage you'd gain by having inside information isn't worth that much. For instance, suppose Rose knew that Eric Davis would be out of the lineup. If Davis was 50 runs better than his replacement over a season, that's 0.3 runs over a game. That's 0.03 wins, or one extra win every 33 games. Knowing about Davis's absence is a fairly big piece of insider information, and even if Rose had one of those every game he bet on, he'd only win one extra game out of 33 (perhaps going 17-15 with a rainout instead of 16-16).

Suppose, on average, Rose had this kind of insider information one out of every three games (which still seems like a lot, especially considering he bet on so many different teams). In that case, instead of going 95-95, he'd be expected to go 97-93. Rose's results aren't significantly different from either of those numbers, so if you say there's no evidence that Rose could beat the market, you have to also admit that there's no evidence that he didn't have insider information and *could* beat the market. Rose's bets just aren't a powerful enough test to tell us much about market efficiency.
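To see just how weak a test 190 bets is, here's a quick exact binomial check (my sketch; the 95-95 and 97-93 records are the illustrative numbers from the paragraph above):

```python
from math import comb

def two_sided_binom_p(wins, games, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes no more likely than the observed one."""
    pmf = [comb(games, k) * p**k * (1 - p)**(games - k)
           for k in range(games + 1)]
    return sum(q for q in pmf if q <= pmf[wins] + 1e-12)

# A true 97-93 talent is statistically indistinguishable from a
# coin-flip 95-95 over 190 bets...
print(two_sided_binom_p(97, 190))   # well above 0.05
# ...whereas only a huge edge would show up in a sample this small:
print(two_sided_binom_p(130, 190))  # far below 0.05
```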

Monday, March 03, 2008

Bill James on Bert Blyleven

Should Bert Blyleven be in the Hall of Fame? The main reason he's not is that he didn't win 300 games; his record was 287-250. However, even his critics will acknowledge that Blyleven's other stats are certainly HOF quality – 685 starts, 3701 strikeouts, and a 3.31 ERA.

So the question becomes: is Blyleven's W-L record his "fault"?

On his pay website, Bill James analyzes Blyleven's record quite thoroughly and entertainingly (using Retrosheet data). The Blyleven study is actually available in a free preview – go here and click on "Blyleven."

Bill argues, quite reasonably, that there are two reasons Blyleven might have lost a few wins off his record:

1. His teams might have given him poor run support;
2. He might have failed to "match the effort" of his teammates.

Number two means that, even though Blyleven pitched well, he might have saved his best outings for when it still wasn't good enough; giving up three runs when his team only scored two, for instance. If true, that would have cost him a bunch of wins, and, in some eyes, would be enough to keep him out of the hall.

Bill starts by looking at run support. Throughout the essay, he compares Blyleven to six other similar pitchers. Those others, like Blyleven, had long careers and ERAs ranging from 3.22 to 3.45.

It turns out that Blyleven had poor run support compared to those other guys.

From here, let me tell you what I would have done. Then, I'll show you what Bill did, which is much more thorough.

Blyleven had 685 starts, and got about a tenth of a run less support than average. That's about 70 runs. That means that run support cost him about 7 wins, still not enough for 300. Of course, if he had had Fergie Jenkins' support, that would be 14 wins, which does take him over the 300 mark.

As for the timing of runs in games, you can use Pythagoras for that. Blyleven's ERA was 3.31; including unearned runs, his "RA" was 3.65.

A team that scores 4.19 runs per game while giving up 3.65 should have a winning percentage of .563 (using exponent 1.83). Blyleven had 537 decisions, so he should have gone 302-235: 15 games better than his actual record.

However, Blyleven pitched 7.25 innings per start, not nine. Since Bert was an above-average pitcher, the bullpen would have cost a few extra runs. Assuming Blyleven's relievers would have given up (say) 4.25 runs per 9 innings, that would have been about 80 additional runs over Blyleven's 3.65. That's 8 wins. So Blyleven was really only 7 wins worse than he "should have" been due to run timing.
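The back-of-envelope arithmetic above can be sketched like this (my code, using the figures from the text, with 4.25 runs per 9 as the assumed reliever rate):

```python
def pythag_pct(rs, ra, exp=1.83):
    """Pythagorean winning percentage from runs scored and allowed."""
    return rs**exp / (rs**exp + ra**exp)

# Blyleven: 4.19 runs of support, 3.65 allowed, 537 decisions.
pct = pythag_pct(4.19, 3.65)
print(round(pct, 3))      # 0.563
print(round(pct * 537))   # 302 expected wins (vs. 287 actual)

# Bullpen adjustment: 685 starts x 1.75 innings he didn't pitch,
# with relievers allowing (say) 4.25 runs per 9 vs. his 3.65.
relief_innings = 685 * (9 - 7.25)
extra_runs = (4.25 - 3.65) / 9 * relief_innings
print(round(extra_runs))  # ~80 runs, or about 8 wins, at 10 runs per win
```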

So I'd conclude: run timing cost Blyleven 7 wins, and run support another 7.

Now, here's what Bill did. Actually, this is his main method; he has a couple of other methods, and some interesting observations (When given three runs of support, Don Sutton was 52-33 – Blyleven was only 29-48 !!!). You should definitely read Bill's study in its entirety, because I'm only going to describe one of his methods here.

Instead of resorting to Pythagoras, Bill looked at those six comparison pitchers, and figured out the records of their teams when they scored 0 runs of support, 1 run, 2 runs, and so on. He then counted how many times each of those scores happened in a Blyleven start, and computed an "expected" number of wins.

The expected record was 371.5-313.5. The actual record of Blyleven's teams was 364-321. That's 7.5 wins, almost exactly what the Pythagoras method found.
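Bill's counting method is simple enough to sketch in miniature (all the numbers below are made up for illustration – see his study for the real ones): take the comparison pitchers' team winning percentage at each run-support level, and weight by how often Blyleven's teams scored that much in his starts.

```python
# Team winning percentage at each level of run support, from the
# comparison pitchers (illustrative values, not Bill's actual table).
win_pct_by_support = {0: 0.00, 1: 0.15, 2: 0.30, 3: 0.45,
                      4: 0.60, 5: 0.72, 6: 0.82, 7: 0.90}

# How often Blyleven's teams scored each total in his 685 starts
# (again illustrative; chosen to sum to 685).
blyleven_starts = {0: 40, 1: 70, 2: 95, 3: 110,
                   4: 115, 5: 100, 6: 85, 7: 70}

expected_wins = sum(win_pct_by_support[r] * n
                    for r, n in blyleven_starts.items())
print(round(expected_wins, 1))
```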

I like Bill's method better than the Pythagorean one because it doesn't just satisfy sabermetricians, but it's able to convince non-sabermetricians as well. To a columnist who is openly hostile to sabermetrics, Bill's method is one that can't be dismissed out of hand. At least not as easily.

And, by the way, for those of you who want to hold Blyleven responsible for his 7 missed "timing" wins, Bill writes,

"Suppose that Blyleven has a seven-game stretch during which he wins games 13-0 and 5-2, but then loses 3-2, 4-3, 3-2, 7-4 and 3-2. Those are the actual scores of Blyleven’s games from May 3 to June 4, 1977. Blyleven was supported by 4.43 runs per game during that stretch and allowed 3.14, but he lost five of the seven games.

"One can look at that and say that Blyleven failed to match his efforts to the runs he had to work with—but why is that all Blyleven’s fault? Isn’t it equally true that his offense failed to match their efforts to Bert’s better games? It seems to me that it is.

"So why do we hold Blyleven wholly responsible for this? Wouldn’t it be equally logical, at least, to say that this was half Blyleven’s fault, and half his team’s fault?"

Sunday, March 02, 2008

Defensive rebounding stats: barely meaningful?

Back about a year ago, I suggested, following similar arguments by King Kaufman and others, that The Wages of Wins "Win Score" overvalues rebounds by assigning 100% of their value to the specific players who grab them. I argued that rebounds are the result of team play and positioning, and you shouldn't just credit the player who grabs the ball. As Guy put it in the comments to that post, first basemen make many, many more outs in the field than third basemen, but that doesn't mean they're better fielders. They just happen to be assigned to the spot where the ball often goes.

Since then, there has been some additional research to support that idea. Here are a couple of findings that I came across.

First, a couple of months ago, Guy posted these results on the "Wages of Wins" site:

-- the correlation between RB/G for a team’s top rebounder and RB/G for the other players on that team is -.76. Clearly, many of the RBs by the top rebounders are just taken from other players. (As TG notes, we don’t see this in baseball.)

-- the SD for RB/48min at the player level is around 3.8, but at the team level is just 1.4. It’s actually lower at the team level! (If each player’s rebound opportunities were independent of his teammates’, then the team SD would be about 8.5, or 6 times as big.)
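Guy's point is easy to reproduce with a toy simulation (mine, with made-up numbers): if five players split a nearly fixed pool of rebounds, the team SD comes out *smaller* than the per-player SD – nothing like the sqrt(5)-times-larger figure that independence would predict.

```python
import numpy as np

rng = np.random.default_rng(1)
games = 20000

# Toy model: the team's defensive-rebound pool is nearly fixed
# (28-34 per game), and five players split it in unequal shares.
team_rb = rng.integers(28, 35, games)
shares = [0.30, 0.25, 0.20, 0.15, 0.10]
player_rb = np.array([rng.multinomial(n, shares) for n in team_rb])

player_sd = player_rb.std(axis=0).mean()  # average per-player SD
team_sd = team_rb.std()
indep_sd = np.sqrt(5) * player_sd  # what independence would predict

# With a shared pool, team SD falls below the per-player SD, and far
# below the independent prediction -- the same pattern Guy found.
print(player_sd, team_sd, indep_sd)
```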

Clearly, these results show that the NBA's top rebounders are simply taking opportunities that other players would be getting in their absence. I think Guy's findings are pretty convincing.

If you want more, check out these two blog posts by Eli W., of "Count the Basket." In the second post, Eli looks at the empirical data a different way. He finds cases where the five men on the court have, overall, above-average rebounding stats, and cases where they have below-average rebounding stats. If players weren't taking opportunities from their teammates, you'd expect that when five "20% better than average" rebounders are on the court, they should still each be grabbing 20% more defensive rebounds.

It turns out that they don't. Only a very small part of their statistical profile remains when you put them together. Looking at Eli's graph, it appears that when, based on their individual stats, you expect 80% rebounding, you get only 75% rebounding. When you expect 61% rebounding, you get 70% rebounding. No matter how good (or bad) your players look on paper, based on their individual defensive rebounding stats, when you put five of them together, you get much closer to average rebounding than the naive observer would expect. Again, this suggests that individual statistics in this category depend more on opportunities than on skill.

What's interesting is that this doesn't apply to *offensive* rebounding. On the offensive side, there is a bit of an effect of diminishing returns, but not very much: if your five players snag offensive rebounds at a combined rate of 32% (rather than the league average of somewhere in the high 20s), they won't actually hit that 32% -- but they do come in at about 31%.

So it looks to me like if you want to credit individual players for their offensive rebounds, you won't be too far wrong. But if you do the same for *defensive* rebounds, you're going to be very inaccurate.