Sabermetric Research

Phil Birnbaum

Tuesday, February 26, 2008

Bill James website and "cursed" teams

The "Bill James Online" website is now up and running. It's got about 50 articles by Bill James, and a bunch of statistics. The statistics aren't the usual numbers, but, rather, more interesting Jamesian stuff. For instance, there's something called "Ghost Runs," which is when a runner is out on a fielder's choice or forceout, but the runner who takes his place later scores.

The organization of the website, unfortunately, makes it difficult to download large quantities of this data, but screen-scraping techniques might help.

Anyway, my main interest isn't the stats, but the articles – many of them are full sabermetric studies, of the usual high Jamesian quality. It's $9 for a three-month subscription, which, in my opinion, is well worth the price: assuming another article a week, that's about seven for a dollar. I'd pay several times that price; there are few things in life I enjoy more than a Bill James study, and those cost a lot more than 15 cents.

I'll review a few of the articles as I read them. Here's one now.

In the study called "Curses" (subscription required), Bill figures out that the chance of a team winning the World Series is almost exactly proportional to the cube of its wins minus its losses. So if the Yankees go 100-62 (38 games over .500), while the Red Sox go 97-65 (32 games over .500), New York has about a two-thirds higher chance to win the World Series than Boston does (38 cubed divided by 32 cubed is about 1.67).

Of course, that doesn't mean *in the same season*. Obviously, if the Yankees and Red Sox had those records in the old two-division AL, the Red Sox would have *zero* chance. Rather, what it means is that if those teams had those records in two separate seasons, the Yankees would be 50% likelier to win their WS than the Red Sox would be to win theirs.

Bill nonetheless uses this method to figure the chances in the same season. He lists all the teams with winning records from 2003, and gives them "claim points" based on the cubes of their games over .500. The Yankees were 40 games over, so get 64,000 points. The Blue Jays were 10 games over, and get 1,000 points. Therefore, the Yankees have 64 times the chance the Blue Jays have.

The sum of all the teams' cubes is 272,767. So the Yankees' chances of winning the WS are 64,000 divided by 272,767, which is about 23%. The Blue Jays' chance is 1/64 of that, about 0.4%.
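The claim-points calculation is simple enough to sketch. The 2003 records below (101-61 and 86-76) are my assumption, consistent with the "40 games over" and "10 games over" figures in the text; 272,767 is the published total.

```python
# A sketch of the "claim points" method described above. The 2003 records
# (101-61, 86-76) are assumed, consistent with the "40 over" and
# "10 over" figures in the text; 272,767 is the published total.
def claim_points(wins, losses):
    """Cube of games over .500."""
    return (wins - losses) ** 3

TOTAL_2003 = 272_767

yankees = claim_points(101, 61)    # 40 ** 3 = 64,000
blue_jays = claim_points(86, 76)   # 10 ** 3 = 1,000

print(round(100 * yankees / TOTAL_2003, 1))    # 23.5 -> about 23%
print(round(100 * blue_jays / TOTAL_2003, 1))  # 0.4
```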

It seems weird, but it also seems to work. Bill writes,

-- Of the teams which were estimated to have a 40 to 50% chance to win the World Championship, 48% actually did.
-- Of the teams which were estimated to have a 30 to 40% chance to win the World Championship, 33% actually did.
-- Of the teams which were estimated to have a 20 to 30% chance to win the World Championship, 22% actually did.
-- Of the teams which were estimated to have a 10 to 20% chance to win the World Championship, 17% actually did.
-- Of the teams which were estimated to have a 5 to 10% chance to win the World Championship, 7% actually did.
-- Of the teams which were estimated to have a 1 to 5% chance to win the World Championship, 3% actually did.
-- Of the teams which were estimated to have less than a 1% chance to win the World Championship, one of 339 actually did.

So now we have the ability to estimate how many Series teams should have won based on their historical W-L records. Between 1920 and 2003, six teams were at least one championship below expectation:

Overall, the Yankees were the least-cursed team in history: expected to win 16.7 World Series, they actually won 26.

I'm not completely sure what I think of this method, but the fact that it's so accurate certainly impresses me. When you use it to measure how much a team is "cursed," you're limiting it to one specific sense: having a good year, but failing to win the World Series despite your good year. If there's a curse that keeps your team at 79-83, this method won't tell you.

Straight-up picks can't distinguish good pundits from bad

The easy method is dubbed the Isaacson-Tarbell postulate, after the two readers who proposed it. Pick the team with the better record; if the two teams have the same record, choose the home team. According to ESPN.com's Gregg Easterbrook, no pundit was able to beat Isaacson-Tarbell. Only one was able to tie.

While normally I love to join the "Super Crunchers"-esque refrain that formulas often know better than "experts," I don't think it really applies in this case, where you have to pick winners straight-up.

NFL matchups are often lopsided. If an .800 team is playing a .300 team, it's obvious that you have to pick the .800 team. No matter how expert you are, no matter how much insider knowledge you have, you simply aren't going to be able to know that the .300 team is better, because it *isn't* better. The same is true for a .700/.400 matchup, or even a .650/.450 matchup. You may be more expert than the rest of the world, but the rest of the world isn't dumb. Everyone picks the .650 team over the .450 team, so your insider knowledge doesn't do you any good. The best you can do is to *tie* the rest of the dumb-but-not-that-dumb punditocracy.

It's only when it's a close matchup that expertise can come into play. Suppose that two evenly-matched teams are going at it, and most experts think team A has a 52% chance of winning. So, they pick team A. For you to outpredict those guys, you have to have insider knowledge, and that knowledge has to be in the direction that leads you to believe that team A has *less than a 50% chance*. That's the only way you'll predict team B will win, and the only way you'll beat the rest of the (A-picking) experts.

How many close matchups are there in a season? Maybe one or two a week? Suppose there are 30 close games a season. In those, the best predictor might be more accurate than the pack, say, 50% of the time (to be generous). That's 15 games. In half of those 15 games, the extra accuracy will go the "wrong" way -- that is, the additional expertise will confirm the pack's pick, not contradict it. That leaves 7.5 games. Again to be generous, call it 8.

So, in eight games a season, the expert predicts a different team than the pack. But those eight games are already pretty close, almost 50/50. It's probably the case that the pack thinks they have a .520 pick, but they only have a .480 pick. So the expert has a .040 edge for eight games a year. Over 256 games, the best, most expert pundit has an advantage of about a third of a game. Is it any wonder you can't find out who the experts are by picking straight-up?
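The arithmetic in the last two paragraphs can be sketched directly; every input below is one of the post's own generous assumptions.

```python
# Back-of-envelope version of the argument above: how much does the best
# expert gain over the pack from straight-up picks?
close_games = 30       # close matchups in a season (assumed in the text)
expert_sharper = 0.5   # fraction where the expert reads the game better
flips_pick = 0.5       # fraction of those where the edge flips the pick

disagreements = close_games * expert_sharper * flips_pick  # 7.5, call it 8
edge_per_game = 0.520 - 0.480  # pack holds a .480 pick it believes is .520

extra_wins = 8 * edge_per_game
print(round(extra_wins, 2))  # 0.32 -> about a third of a game per season
```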

If you want to evaluate the experts, just have them pick against the spread. Now *all* the games are close to 50/50, not just 30 of them. Now, the expert has a fighting chance to emerge from the pack.

Most of the touts I've heard of *do* pick against the spread. I haven't seen how they've performed, long-term, but I bet most of them are pretty close to 50%. And, if so, *then* you can conclude that those so-called insiders can't beat a simple algorithm.

But looking at straight-up picks? That's like trying to find the best mathematician in a crowd by asking them what 6 times 7 is. Under those circumstances, the Ph.D. will do about as well as a sixth-grader. It doesn't mean the guy with the doctorate in mathematics doesn’t know more than the eleven-year-old. It just means you asked the wrong question.

Monday, February 18, 2008

Tango on the 1992-94 home run explosion

There was a large increase in major-league baseball hitting in 1993 and 1994, one that continues today. In 1992, there were 0.721 home runs per game. In 1994, there were 1.033 home runs per game. It wasn't a one-time increase: the 1994 rate has pretty much stayed with us to the present day.

What happened? There are various theories. One says that the 1993 expansion brought in a bunch of inferior pitchers, and the dilution of talent caused the numbers to jump. Another theory says that it's the ballparks – Coors Field entered the National League in 1993. A third theory says it's the ball: it was juiced up around that time, and remains juiced today.

In a post about a year ago, I argued that expansion couldn't have caused an effect as big as the one we saw. With reasonable assumptions, you can show that expanding by two teams should cause home runs to increase by only 3%. And you could also do the same for ballparks – even with Coors Field now in the equation, and even combining that with the 3% for expansion, there's still no way you can explain the 40% increase in home runs.

But for those who don't follow that logic, Tom Tango has an excellent study that should now win over the unconvinced.

Tango looked at the 1993 season, and compared it to 1992. But he stripped 1993 down, considering only players who played in 1992, in parks that were in existence in 1992. And he adjusted all the players' stats to give them equal playing time in 1992 and 1993.

The results: even among the incumbent players and parks, there was an 18% increase in home runs.

Repeating the study for 1993/1994, Tango found a second increase, this time of 20%.

Over two years, even without the effects of expansion, and the effects of new parks, there was a 42% jump in home runs.
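The two-year figure comes from compounding the two single-season increases, not adding them:

```python
# The 42% jump is the two single-season increases compounded:
inc_93 = 1.18  # +18% among incumbent players and parks, 1992 -> 1993
inc_94 = 1.20  # +20%, 1993 -> 1994
print(round((inc_93 * inc_94 - 1) * 100))  # 42
```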

So it can't be the parks. And it can't be expansion.

I don't think this finding is a big surprise, but it's so thorough, and so understandable, that even non-sabermetric fans should be convinced. You'd hope.

Anyway, if it wasn't the parks or expansion, was it the ball? Tango presents convincing evidence to say it was.

According to Dr. James Sherwood, MLB's ball tester, minor-league balls travel 391.8 feet under the same conditions that major-league balls travel 400.5 feet, for a difference of 8.7 feet. And, according to Greg Rybarczyk of HitTracker Online, if you eliminated all home runs in 2006 that cleared the wall by less than 8.7 feet … you'd have roughly the same home run rate as before the jump.

Tuesday, February 12, 2008

If it's peer reviewed, journalists won't question it

Does the media lose its sense of balance when dealing with peer-reviewed studies? Apparently, some won't even ask mildly skeptical questions, even on politically-sensitive issues. One newspaper editor says,

"We are dealing with a peer-reviewed journal study, and I don't feel at all comfortable going beyond what they are publishing. That is not our role."

This is on a study about marijuana smoking possibly causing gum disease. If even the science beat won't question academic studies, what hope is there that sports editors will?

Monday, February 11, 2008

The Wharton "Clemens Report" criticism -- Part II

In yesterday's article, Wolfers (and his three Wharton co-authors) showed that Roger Clemens' career trajectory is very different from the average veteran pitcher's. Today, he shows all 31 curves instead of just Clemens and the average:

Looking more closely at the methodology convinces me even more that the article's conclusions are inappropriate. For one thing, there are a few lines that are pretty close to Clemens'. For another thing, extrapolating the lines shows that many are a very poor fit -- Nolan Ryan, for instance, looks like he can pitch effectively at least into his 80s. And if you look at what the regressions are actually doing, it turns out the curves can't have much to do with the effects of steroids at all.

The methodology that created the curves is such that, when you fit a line to a career, it has to be quadratic, which means the line has to be symmetrically U-shaped. (The U can be right-side up, or upside down, but must be symmetrical.) Therefore, there is an implicit assumption built in: that the slope of the player's improvement as he approaches his peak (or, in Clemens' case, the slope of his decline to the trough) has to equal the slope of his decline after the peak (in Clemens' case, the slope of his improvement after the trough).

That is: if a player's peak is (say) 32, the model insists that his numbers at 31 equal his numbers at 33; that his numbers at 25 equal his numbers at 39; and so on. (That's why Nolan Ryan's curve looks like he can pitch forever.)
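The symmetry constraint can be seen directly: a fitted quadratic is mirror-symmetric about its vertex, so the fitted value k years before the peak always equals the fitted value k years after it. A minimal sketch with invented WHIP-by-age data (purely illustrative, not anyone's actual numbers), using numpy:

```python
import numpy as np

# Invented WHIP-by-age data (purely illustrative), fit with a quadratic
# as in the Wharton model.
rng = np.random.default_rng(0)
ages = np.arange(24, 41)
whip = 0.002 * (ages - 32) ** 2 + 1.20 + rng.normal(0, 0.01, ages.size)

a, b, c = np.polyfit(ages, whip, 2)  # a*age^2 + b*age + c
peak_age = -b / (2 * a)              # vertex of the parabola
fitted = np.poly1d([a, b, c])

# The model forces f(peak - k) == f(peak + k) for every k:
for k in (3, 7):
    assert abs(fitted(peak_age - k) - fitted(peak_age + k)) < 1e-9
```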

What this means is that the shape of the curve depends, equally, on both ends of the player's career. What the Wharton curve is doing is not evaluating the player's old-age performance, but, rather, comparing the middle of the pitcher's career *to both ends*.

Now, of the 31 pitchers in the curve, many of them probably had sub-par starts to their careers. Take, for instance, Nolan Ryan. His first five seasons were all above his career average in WHIP. This helps keep his curve concave. If you look at the later part of his career, from 1979 to 1991, he was godlike – but those early years keep his curve from looking like that of Roger Clemens.

For his part, Clemens started out well: his first five seasons, as a whole, were roughly in line with his career. So he doesn't get that initial downhill momentum that would lift the right end of his curve in symmetry.

Which brings up another point: Clemens' "right end" is also excellent. Eventually he will age, and it will decline. Even if he now retires, is there any doubt that, if he kept pitching, he would *eventually* decline? Give him a few more years of pitching, and he'll look like other pitchers who were effective from the beginning but faltered with age – and his trajectory will look more like the others.

Most excellent pitchers nonetheless start out simply average, and end with a few mediocre seasons. Clemens started out well, and hasn't hit his decline phase yet. His curve is flatter than the other pitchers' because he is the only one who:

(1) Started out pretty well;
(2) Hasn't had many mediocre career-ending years yet;
(3) Happened to have his two worst years right in the middle of his career.

If you believe the Wolfers curve indicates steroids, then you have to believe that the above three points also indicate steroids.

But (1) has nothing to do with steroids, and (3) simply has to do with the timing of the study. So you're left with (2). That has very little value as evidence; and, in any case, it doesn't require the fitting of quadratic curves.

So the Wharton study doesn't really tell us much of anything.

And, when you think about it, how can a career curve tell you much about steroids anyway? If steroids make you better, they'll let you play longer before the inevitable decline. That will stretch out your career trajectory, but not change its basic convex shape.

Sunday, February 10, 2008

"Clemens Report" criticism misses the point

A couple of weeks ago, Hendricks Sports Management (HSM), Roger Clemens' agents, put together a document purporting to show that Clemens' late-career effectiveness was not unusual, compared to certain other great pitchers with long careers. While the report doesn't mention steroids at all, the intent of the report is clear: to show that you can't conclude any illegal behavior on Clemens' part simply by the fact that he remained effective late in his career.

BJWW criticize the Clemens Report on the main grounds that if you want to see if Clemens' career trajectory is unusual, he should be compared to *all* "durable" pitchers, not just the three pitchers (Randy Johnson, Curt Schilling, Nolan Ryan) that Clemens' defenders chose.

So they found the 31 pitchers since 1968 with at least 15 seasons of 10 starts and 3000 IP over their careers. They plotted Clemens' career trajectory against the average of the group of 31. Here's the chart (am I allowed to show it here under fair use laws? Hope so.)

Clemens is markedly different: the average pitcher shows a U-shaped curve: an improvement up to about age 31, then a decline to the end of his career. Clemens, on the other hand, shows a straight line with a slight decline (for ERA), and an *opposite* U-shaped curve for WHIP: getting worse up to about age 37, then improving after that.

Therefore, the authors say, Clemens really IS unusual. His "statisticians-for-hire" agents are guilty of selection bias. "A careful analysis, and a better informed public, are the best defense against such smoke and mirrors."

Well, I don’t agree. I think BJWW should also have done a more careful analysis, and thought about their conclusions a bit more.

First: is this group of 31 pitchers (which, by the way, BJWW don't list) really the best control group to use? It is well-known among sabermetricians, since Bill James discovered it back in the 1980s, that power pitchers have much longer career expectations than control pitchers. Comparing Clemens to a mix of power- and control-pitchers would bias the group against him.

In their article, BJWW conclude that the graphs show Clemens to be "unusual" compared to the other pitchers. Well, of course he's unusual compared to most pitchers: he is an extreme power pitcher, of a type that was shown, over 20 years ago, to have significantly longer careers than others! The Times authors think they have evidence that Clemens is on steroids, but what they've probably found is just evidence that Clemens is a power pitcher!

And this is the *less* important criticism of the Times article.

The second, and absolutely the most important, point is that the authors are attacking a straw man. Clemens' agents are NOT saying that his career is *usual* – they are saying his career is *not unprecedented by a non-steroid user*. There's a big difference there, and it's not one of statistics or regressions or comparisons – it's one of common logic.

The public was saying, "look – Clemens' longevity is unusual – therefore he's probably taking steroids." HSM is replying, "Clemens' career is unusual, but not THAT unusual. Indeed, here are three pitchers with similar career trajectories, and nobody is saying *they* took steroids."

That's a convincing reply. To rebut it, it's not enough to show that Clemens' career is even farther from the average than HSM said – because even if that's true, it's irrelevant. The HSM argument doesn't depend on the average – it depends on the extremes. What HSM is saying is, "look, you have to understand, there is a certain type of pitcher, very atypical, who has this kind of career. It's not an outlier, it's not that rare, Clemens fits right in to that group, and it has nothing to do with steroids."

Look at it this way: suppose that five years ago, your neighbor Clem, down the street, comes into some money and builds a big extension on his house and buys a Ferrari. People think he robbed a bank or something. Subpoenaed to appear before a congressional investigation, he denies that he stole the money.

But the public still thinks Clem is a thief. Clem hires a lawyer to rebut the charges. The lawyer says, look, Clem won the lottery in 2003, that's how he got rich. There's no theft at all. In fact, here are three other well-regarded rich guys who also won the lottery – Ryan, Schilling, and Johnson. They're rich too, and nobody thinks THEY stole anything! See, it's quite possible to get rich without robbing a bank, so lay off my client!

Then, four reporters, in a New York Times investigative article, say, well, why the heck should we compare Clem to only these three guys, cherry picked by Clem's lawyer? We should compare him to *everyone* who made a million dollars ever! They do, and find that, of everyone who made a million dollars in 2003, most of them were CEOs, and made similar amounts in 2004, 2005, and 2006. But Clem didn't make anything in those years – his career earnings trajectory is very different from the average million-dollar earner. See? We *should* be suspicious that Clem robbed a bank! His agents are full of crap!

Well, that argument is obviously silly -- but it's exactly the argument the Times authors make.

Even if the statistical analysis is correct, it simply doesn't matter whether Clem's earnings vary from CEOs'. What matters is whether other people have won the lottery, and whether it's reasonable to think that Clem did too. The relevant baseball question is not "how far is Roger Clemens from the norm?" The question is: "If a player is as far from the norm as Roger Clemens, what is the chance that he took steroids?"

And the answer is: if you acknowledge that Schilling, Ryan, and Johnson have roughly a similar career trajectory as Clemens, and you believe that none of them took steroids, then, from the statistical evidence alone, your first estimate of the probability Clemens cheated should be approximately *zero*.

Wednesday, February 06, 2008

Did Jose Canseco teach his teammates to use steroids?

Here's a study that claims to have evidence that Jose Canseco influenced his teammates to take steroids, by finding that those teammates hit more home runs after Canseco left the team.

It's called "Learning Unethical Practices from a Co-worker: The Peer Effect of Jose Canseco," by Eric D. Gould and Todd R. Kaplan.

Here's what Gould and Kaplan did. They took all hitters since 1970 in the "power positions" (C, DH, OF and 1B) and examined how they did before playing with Canseco, while playing with Canseco, and after (but no longer) playing with Canseco. They controlled for a bunch of factors: year, manager's record, etc., and, most importantly, they controlled for the player himself, so that they are comparing parts of his career to other parts of his own career.

They found that there was indeed an effect. While playing with Canseco, the average player hit almost 1.1 home runs more than before Jose was on his team. And later, once separated from Canseco, the average player hit an additional 2.9 home runs! The 1.1 figure is almost statistically significant (about 1.7 SD), while the 2.9 is extremely significant (4.5 SD).

However, those numbers don't control for playing time. It turns out that while playing with Canseco, players had an average of 16 more AB than before. 1.1 HR in 16 AB is about 34 HR over a full season of 500 AB – more than average, even for a player in a power position, but really not that big a deal. And in the "after Canseco" years, the players had 53 extra AB in which to hit their 2.9 HR, which is 27 HR per 500 AB – probably still a little more than what you'd expect.
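The playing-time adjustment above is just a rate conversion; a quick sketch using the figures from the text:

```python
# Rate conversion used above: extra home runs per extra AB, scaled to a
# 500-AB season (figures from the text).
def per_500_ab(extra_hr, extra_ab):
    return extra_hr / extra_ab * 500

print(round(per_500_ab(1.1, 16), 1))  # 34.4 -> the "with Canseco" pace
print(round(per_500_ab(2.9, 53), 1))  # 27.4 -> the "after Canseco" pace
```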

So far, it's a bit more power than you would have figured, but nothing too serious, and certainly not enough to persuasively show a "peer effect" of Canseco on his teammates.

Also, keep in mind that the study didn't control for the age of the players. Obviously, they'd be older in their "after Canseco" years than in their "before Canseco" years, and power generally increases with age. Taking that into account, wouldn't you now think that these numbers are pretty much as expected? Players get older, and, as they age, they get more playing time and hit for a little more power. I don't really see what the big deal is.

There is one caveat: the authors repeated their study for players other than Jose Canseco -- Rafael Palmeiro, Jason Giambi, Mark McGwire, Juan Gonzalez, Ivan Rodriguez, Dave Martinez, Ken Caminiti, Ken Griffey Jr, Ryne Sandberg, and Cecil Fielder. (The authors say they checked 30 players, but they only show the AB results for these 10.) It turned out that of all these players, Canseco showed the strongest effect in raw AB numbers. In home runs, the most comparable player was Ryne Sandberg. After playing with Sandberg, players had 1.6 more home runs per season than before Sandberg. (Recall that after Canseco, they showed 2.9.) After Canseco, players had 53 more AB. But after Sandberg, they had 24 *fewer* at-bats. That's a bigger increase in home run percentage on Sandberg's part. From these numbers, it would seem to me that Sandberg would be a better candidate as a bad steroid influence than Canseco.

In any case, the authors do acknowledge that the raw home run numbers are not very meaningful without also taking into account the changes in AB. So they ran another Canseco regression, this time including a variable for AB. Now, instead of 3 extra home runs, these players hit only 1 extra home run. That still comes out statistically significant, at 2.2 SD.

But again, this regression doesn't include player age. For players in the sample, the mean number of AB was 310. Would an increase of 1 HR per 300 AB not be consistent with normal patterns of aging for players in "power positions"? And that's not a 1 HR increase every year – it's the average increase between the several years the hitter played with Canseco, and the several years following. Again, it seems reasonable to me.

We'd have a better idea if the authors repeated this analysis for the other 10 hitters, controlling HR for AB there also. But they didn't, so we don't know if Canseco is a special case in this regard as well.

Finally, Gould and Kaplan dropped six players from the sample, the players that Canseco claimed to have personally injected with steroids. (Those were Palmeiro, Giambi, McGwire, Gonzalez, Rodriguez, and Martinez.) Without those six, the "after Canseco" increase in HRs dropped from 2.9 to 1.5. That's still statistically significant. But there's a corresponding increase in strikeouts (6.8) and walks (4.7), while batting average and slugging percentage barely change (.002 and .003 respectively). That suggests that, again, it's an increase in AB that's responsible for all these increases.

The question we're left with, then, is: why do Canseco's teammates increase their AB once he's gone? It could just be the situations, or the managers. Suppose Canseco tended to be signed by teams that were trying to win it all this year. Those teams wouldn't be playing a lot of rookies. But once they dropped back in the standings, they would tend to trade Canseco to a contender, and give their young players more at-bats.

Does that sound plausible? Canseco played on a lot more teams than most of the other ten players in the comparison set. That would suggest that he would have left a lot of teams in a rebuilding stage, wouldn't it?

Or maybe the types of managers who pushed for Canseco to be signed are the same types of managers that like to keep lots of guys on the bench. Those bench players get 75 AB while Canseco's on the team, and when they wind up as regulars a couple of years later, those 75 at-bats give them a data point in this study (the authors used a minimum of 50 AB).

Anyway, I'm mostly just thinking out loud here; I have no idea if this is the correct answer or not. The fact remains that "post-Canseco" players did tend to increase their AB. But the significance level in the regression has a hidden assumption, that the players on Canseco's teams were random. And they're not – the patterns of player AB have a lot to do with manager and GM tendencies, which means the variances of the observations are underestimated, and the significance overestimated. That's probably why, of the 11 regressions, three of their "post" AB numbers are statistically significant at the 5% level.

Indeed, look at the "change in AB" numbers for the other 10 players:

-33, 10, 20, 1, 32, -15, -22, -46, -24, 0

Compared to these, +53 isn't *that* out of place, is it? And this hardly looks like a normal distribution, which again suggests team tendencies are at work.

My best guess is the fact that players had more AB after Canseco left has more to do with circumstances than with Canseco. And the fact that players had more HR after Canseco left is almost perfectly explained by the increase in AB, and normal aging.

Canseco may very well have influenced his teammates to use steroids -- but the evidence contained in this paper does not substantiate that hypothesis very much, if at all.

(Thanks to John Matthew for the link.)

------

UPDATE: while it is correct that the regression didn't control for age, I just noticed it *did* control for years of experience. That means that the one-home-run-per-300-AB figure is more significant than I originally thought.

However, the regression assumes that power increases over the years are constant and linear, which might not be the case. For instance, if power increases faster when the player is young, the regression might underestimate power for (say) 29-year-olds, and overestimate it for (say) 39-year-olds. My gut says it's still very possible that a long-term increase of 1 HR per 300 AB, even after taking tenure into account, could have many other causes, such as management decisions on one or more of Canseco's teams. And it still could be that the entire effect is caused by the players (of Canseco's six) that are generally acknowledged to have juiced.

I wish the authors had run more regressions that controlled for playing time, so that we'd have more useful data.

Tuesday, February 05, 2008

The clutch hitting bet, part III

A few weeks ago, I offered to bet against anyone who believed there was such a thing as clutch hitters who could be identified in advance. I got a total of about three bets. That's out of, probably, a few hundred people who read about my offer.

My conclusions, after reading some of the comments on other blogs that mentioned the bet:

1. People may say they believe some players are clutch hitters, but they don't really believe it. If they did, they'd bet.

A common excuse that some people give is that they don't bet, as a matter of principle. This is fine if it comes from one person, but when two hundred people all refuse to bet, it's gotta be more than principle.

Similarly are the people who argued that, if I believed the odds were close to even, I should offer better odds than 2:3. That's a silly excuse. If I had decided that Juan Pierre had a 50/50 chance of outhomering Ryan Howard, and offered the same 2:3 bet, nobody would have time to haggle over the odds – everyone would be too busy rushing to get their bets in.

2. People don't even know what they mean when they talk about clutch hitting. There were a few commenters (on other blogs) who said that, yes, *of course* there are clutch hitters, but you don't measure their clutchness by comparing their regular output to their clutch output.

But if clutch doesn't mean being BETTER in the clutch, what does it mean? As Tango and/or MGL have pointed out many times, if you just mean that they're good in the clutch, doesn't that just mean they're good players in general? I mean, of course you'd rather have David Ortiz up in an important situation than Neifi Perez. But you'd rather have David Ortiz up in ANY situation than Neifi Perez. So what's the point of calling him clutch?

Perhaps many clutch advocates have only a fuzzy idea of what clutch means, and get confused when you try to pin them down on it. It seems like they just never thought farther than their initial happy, fuzzy reaction.

3. As mgl said in comments on several blogs: if you are unable to pick any player, or any combination of players, whom you believe have even a 60% chance of being clutch (when the average is 50%), then you are pretty much admitting that clutch hitting talent isn't a very strong factor in baseball. To quote mgl,

"[If you don't think the odds are good enough,] you are essentially agreeing that a clutch player is virtually indistinguishable from a choke player before the fact! If you can't beat 55-50 over an entire season with a bunch of so-called clutch players versus choke players, you have no right to talk about who is clutch and who is not!"

Put another way: by complaining about the odds, you are admitting that you don't know who the clutch players are, or that they are not significantly better than other players. In that case, if you still talk about clutch hitters as if they are a significant force in baseball – if, for instance, you continue to refer to David Ortiz as a clutch hitter as if that skill forms a measurable part of his value – you are either bullshitting, or have compartmentalized your brain into believing two contradictory things at once.

4. Speaking of believing two contradictory things at once: there is ample precedent for that sort of thing. People will tell you how great heaven is, and how it's everlasting bliss and contentment, and how they will be reunited with their deceased relatives that they miss so much, and that heaven is the ultimate reward. But they act as if death were the worst thing that could possibly happen to them.

It makes people feel good to think that clutch exists, and so they'll keep talking about it, and pseudo-believing in it, and ducking and weaving when we try to pin them down.

If you are a clutch-hitting advocate who believes this is unfair, you may be right. Maybe it is unfair, applied to you. To prove it, show us your belief is rational and open to new evidence. The best way is to accept my bet. If you say you have moral scruples against betting real money – you never even buy raffle tickets, and you believe even betting for charity is immoral – well, I'm reluctant to believe you. But try this instead.

Write down your best prediction about clutch hitting in 2008. That is, decide what you are most sure of about what will happen in the clutch this year, something you believe that would not be true if clutch were close to random. Write it down, and stick it on your fridge.

And assure yourself that if your prediction does NOT come true, you will publicly re-evaluate your belief in clutch hitting talent. If you can't tell yourself that – if your belief in clutch hitting is immune to your absolute BEST prediction in it failing to come true – then either you believe in a very, very weak form of clutch hitting, or your belief in it is faith-based and unfalsifiable.

Friday, February 01, 2008

An inconsistency in Tiger Woods betting markets?

There are many PGA golf tournaments throughout the year. A specific four of those tournaments are called "majors."

What are the chances that Tiger Woods will win all four majors?

You can start by assuming that Tiger has the same chance of winning any of the four. This probably isn't true, but it's a good start. A friend of mine notes that at his online bookmaker, the current odds (expressed as probabilities) of Tiger winning are:

42% Masters
44% US Open
37% Open Championship
37% PGA Championship

Taking an average of about 40%, we can calculate that the chance of Tiger winning all four events is 0.4 to the fourth power, which is 2.56%. Without showing my work, here are the binomial odds of Tiger winning various numbers of majors:

4 majors: 2.6%
3 majors: 15%
2 majors: 35%
1 major: 35%
0 majors: 13%
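Those binomial odds are easy to reproduce. Here's a minimal sketch in Python (the 40% figure is the rounded average of the four market probabilities; the function name is mine):

```python
from math import comb

def major_odds(p, n=4):
    """Binomial probabilities of winning exactly k of n tournaments,
    each won independently with probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Assuming a 40% chance in each major; the sweep comes out to 0.4^4, about 2.6%.
for k, prob in enumerate(major_odds(0.40)):
    print(f"{k} majors: {prob:.1%}")
```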

However: at TradeSports.com (choose "Golf" from the left menu, then "Tiger Woods Props"), the current odds are very different. Here they are, taking the halfway point between the bid and ask; the key numbers are 8.5% for Tiger sweeping all four majors, 12.7% for exactly three, and 22.5% for zero. (My friend's bookie's odds for these bets are almost the same.)

There appears to be a mismatch between these odds and the individual tournament odds. It could just be that TradeSports bettors are assuming a different win probability than 40%. What would it take to give Tiger an 8.5% chance of winning all four majors, as Tradesports estimates? He'd have to have a 54% chance of winning each major (.54 to the fourth power is about .085). But if he DID have a 54% chance, the other odds should be different. Here's the full table assuming 54%:

4 majors: 8.5%
3 majors: 29%
2 majors: 37%
1 major: 21%
0 majors: 4%

These are very, very different from the posted odds. If you really thought Tiger had a 54% chance of winning each major, you should also believe he has only a 4% chance of losing them all. You should sell the 22.5% contract for $2.25, knowing it's only worth 40 cents, and make lots of money.

But what if you don't believe the "4 majors" odds of 8.5%? What if you believe that the "0 majors" odds of 22.5% is the correct number?

In that case, you'd have to assume Tiger had a 31% chance of winning each tournament and a 69% chance of losing, since .69 to the fourth power equals 22.5%. Based on that assumption, the true odds are:

4 majors: 1%
3 majors: 8%
2 majors: 27%
1 major: 41%
0 majors: 23%

In this case, you'd believe that Tradesports is hugely overestimating the "4-majors" odds. You'd be happy to sell a "4 majors" contract at 8.5%, knowing the true odds are really only about 1%. You'd be making, on average, a 700% markup!

No matter what probability you assume for a Tiger win, the odds just don't seem to make sense. It looks like there are some serious opportunities here to make money. If you believe Tiger is a 54% winner, lay odds on "0 majors." If you believe Tiger is a 31% winner, lay odds on "4 majors." And so on. Whatever you believe Tiger's correct odds are, there's an advantageous bet for you.

And not just a *slight* advantage – a HUGE one! If you go with the 54% estimate, and lay odds on "0 majors," you're getting a 500% markup. If you go with the 31% estimate, and lay odds on "4 majors," you're getting that 700% markup.

Indeed, there is no single-tournament probability that causes the posted odds to make enough sense that there isn't a huge opportunity one way or another.
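You can check that inconsistency directly: inverting the "4 majors" and "0 majors" prices quoted above gives two incompatible per-tournament probabilities. A quick sketch (the prices are the market numbers from the text; the variable names are mine):

```python
# If a single per-tournament probability p explained both contract prices,
# these two calculations would agree. They don't.
p_from_sweep = 0.085 ** 0.25        # P(4 majors) = p^4, so p is about 0.54
p_from_shutout = 1 - 0.225 ** 0.25  # P(0 majors) = (1-p)^4, so p is about 0.31

print(f"p implied by the 4-majors price: {p_from_sweep:.2f}")
print(f"p implied by the 0-majors price: {p_from_shutout:.2f}")
```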

That confused me. Aren't markets supposed to be efficient? How can these probabilities be so far out of line?

Then it occurred to me: in the arguments above, which bet was advantageous depended on Tiger's true odds of winning. But what if that itself is unknown? Not just unknown, but REALLY unknown – random, in the sense that no amount of study can figure it out. Maybe Tiger has off-years and on-years, and there is absolutely no way of knowing which it's going to be. In fact, Tiger's skill level might not yet be set – it might depend on his practice level, or his health, or his personal life. Suppose it would be randomly decided just before the Masters?

More specifically, let's suppose that Tiger's single-tournament probability might be 15%, or 20%, or 25%, or so on, up to a maximum of 70%. And suppose each of those 12 probabilities is equally likely. If we do the math based on that assumption, we come up with:

On a more practical level, the theoretical odds now come reasonably close to the market odds. Not perfect, but much better than any of the scenarios based on a single, fixed probability. Indeed, there might be a model that comes even closer: maybe, for instance, if Tiger has a 50% chance of being way off (a 10% probability per tournament) and a 50% chance of being on (a 70% probability), we get the TradeSports odds exactly. I haven't tried that, or any of the infinity of other possibilities. The closest I came, through trial and error, is the chart above. But there might be a way to solve this mathematically, instead of hacking away like I did.

(Another interesting thing about this method is that the assumptions have Tiger's expected single-tournament odds at 42.5% (the sliding scale from 15% to 70% averages 42.5%). The chance of winning four straight tournaments at 42.5% is about 3.3%. At first glance, that might lead you to believe that the odds of "4 majors" should be 3.3%. But, because the probability of a sweep isn't linear in the single-tournament probability, it doesn't work that way. The actual chance is 8%, not 3.3%. This could explain why my friend's bookie has each tournament at 40%, but the parlay of all four tournaments at much more than 40% to the fourth power.)

In any case, I originally thought there was a seriously advantageous betting opportunity. There's less than I thought. However: in everything I tried, the "3 majors" number was significantly higher than the "4 majors" number. It's only 50% higher at TradeSports. I think money can be made by selling the "4 majors" contract at 8.5%, and buying the "3 majors" contract at 12.7%. It's not a sure thing, but I think the expectation has got to be positive.
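That nonlinearity is easy to demonstrate: averaging p to the fourth over a spread of probabilities gives more than the average p raised to the fourth. Here's a sketch using the uniform 15%-to-70% grid described above (the exact sweep probability depends on the grid assumed, so this won't reproduce the trial-and-error chart precisely, but the direction of the effect is the same):

```python
from math import comb

def major_odds(p, n=4):
    """Binomial probabilities of winning exactly k of n majors at probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Assume Tiger's true per-tournament probability is equally likely
# to be any of 15%, 20%, ..., 70% (twelve values).
grid = [0.15 + 0.05 * i for i in range(12)]

# Average the twelve binomial distributions to get the mixture odds.
mixed = [sum(major_odds(p)[k] for p in grid) / len(grid) for k in range(5)]

mean_p = sum(grid) / len(grid)  # 42.5%, the expected single-tournament odds
naive_sweep = mean_p ** 4       # about 3.3%, the "first glance" number

# The mixture's sweep probability comes out well above the naive figure.
print(f"mixture P(4 majors) = {mixed[4]:.1%}, naive (mean p)^4 = {naive_sweep:.1%}")
```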

Here's a challenge: can you come up with some plausible set of assumptions under which the "3 majors" probability is only 50% higher than the "4 majors" probability? My gut says you can't – that this is one of those examples of "favorite-longshot bias," and one that you can easily take advantage of right now.

But I'm not certain of this, and markets are a lot smarter than I am. There might be something I'm missing. Don’t blame me if Tiger hits the grand slam and you lose your shirt.