Sabermetric Research

Phil Birnbaum

Monday, August 31, 2009

Is a Hamilton NHL team worth as little as Andrew Zimbalist thinks?

I've just finished reading Andrew Zimbalist's sworn submission to the courts with regard to the Phoenix Coyotes situation (hat tip to blogger James Mirtle for posting the entire legal document), and there's lots in it I don't agree with. It could be that I don't understand the antitrust or economics issues, which, of course, are Zimbalist's specialties. I'll list my issues and maybe someone can explain.

First, a quick summary of the situation, as I understand it.

The Phoenix Coyotes are bankrupt. Jim Balsillie wants to buy the team and move it to Hamilton, Ontario, a hockey-mad city 45 miles from Toronto. The NHL doesn't like the idea. First, it believes that it, not the courts, has the right to decide where a team plays. Second, it seems to want to protect the Toronto Maple Leafs from competition. And, third, it doesn't like Balsillie, who has been combative with the NHL rather than cooperative.

Zimbalist's written testimony, written at the request of the Balsillie team, argues that

(a) a franchise in Hamilton is worth only $12 million more than if the bankrupt franchise was left in Phoenix, at $175 million versus $163 million;

(b) the effect on the Toronto Maple Leafs would be minimal;

(c) the price Balsillie is offering to pay for the team, $212 million, is therefore more than the team is worth, and the difference is "Picasso value," the price Balsillie is willing to pay for the consumption pleasure of owning the team;

(d) the Hamilton expansion opportunity does not "belong" to the NHL.

Maybe there's something about the economics I don't understand, but I don't see it the same way. I'll deal with (b) and (d) in a future post, but for now, let me concentrate on (a) and (c). I think the team is worth substantially more than $175 million, and I think the "Picasso value" is huge, much more than the $37 million that Zimbalist thinks it is.

First, doesn't it seem strange that a hockey team in Hamilton, so close to the best hockey market in the world, would be worth only 7 percent more than the same, bankrupt team in a non-hockey market in the desert? The way Zimbalist gets his numbers is to multiply gross revenue by 2.4. That's based on Forbes Magazine's estimates of team revenue and market value (Zimbalist doesn't justify the 2.4 figure separately).

That seems strange to me, valuing a team by its revenues rather than its profits. It would kind of make sense in comparing "normal" businesses, companies of different sizes in the same industry. Suppose you have two widget manufacturers; Acme sells $10 million worth of widgets a year, and Consolidated sells $100 million worth. You'd expect Consolidated to be worth about 10 times as much as Acme. After all, Consolidated probably has 10 times as many employees, and 10 times as many machines, and 10 times the bill for raw materials, and 10 times the shipping costs, and so on. All else being equal, Consolidated should make 10 times the profit.

But that's not the case in the NHL. With the salary cap, you could argue that team expenses are roughly the same, whether the team is in Glendale or Hamilton. Most of the expense is salaries, and those are now fixed in the range of $41 to $57 million. Forbes has the Coyotes at revenue of $68 million, meaning that if they paid $50 million in player salaries, that would leave only $18 million for other expenses and profit. On the other hand, a team like Vancouver, with $107 million in revenue, has $57 million left for other expenses and profit.

For both teams, it looks like those "other expenses" are around $30 million, because Forbes has Vancouver turning a profit of $19 million, whereas Phoenix *lost* $10 million. Vancouver is a profitable enterprise, whereas Phoenix would struggle just to break even. Profits are much less proportional to revenues in hockey than they are in a "regular" business. So why use revenues as your measure?
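Here's that back-of-the-envelope arithmetic as a sketch. The revenue figures are the Forbes estimates quoted above; the $50 million salary and $30 million "other expenses" figures are the round numbers used in the text, so the results land only roughly near Forbes' reported profits:

```python
def rough_profit(revenue_m, salaries_m=50, other_expenses_m=30):
    """Back-of-the-envelope profit under the cap, in $ millions:
    revenue minus (roughly fixed) salaries and other expenses."""
    return revenue_m - salaries_m - other_expenses_m

print(rough_profit(68))   # Phoenix: -12, a loss (Forbes says -10)
print(rough_profit(107))  # Vancouver: 27, a healthy profit (Forbes says 19)
```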

As a verification, I ran a regression to predict team value based on revenues. The results:

Market Value = 3.2 * annual revenue - $73 million

or, rephrased,

Market Value = 3.2 * (annual revenue - $22.8 million)

The correlation coefficient was .965.

So the value of a team isn't a multiple of revenues: it's a multiple of revenues *above $22.8 million*. Suppose the Coyotes take in $68 million of revenue, and the Hamiltons twice that. Hamilton won't be worth twice Phoenix, then: it'll be worth two-and-a-half times as much. Apparently a team needs about $22.8 million in revenue just to be worth anything at all; only revenue beyond that translates into market value.
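As a sanity check on that rephrased formula, here it is applied to the Coyotes' revenue and to a hypothetical Hamilton team with double that revenue (the coefficients are from the regression above; the doubling is purely illustrative):

```python
def market_value(revenue_m):
    """Fitted line from the revenue regression, in $ millions:
    3.2 * (revenue - 22.8)."""
    return 3.2 * (revenue_m - 22.8)

phoenix = market_value(68)       # ~144.6
hamilton = market_value(2 * 68)  # ~362.2
print(round(hamilton / phoenix, 2))  # 2.5 -- two-and-a-half times, not two
```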

If you look at Forbes' chart, you can see that: the top six teams have a little less than twice the revenues of the Coyotes, and they're worth a little less than 250% as much.

Anyway, if you use this formula for the Coyotes instead of just 2.4 times revenue, you get $145 million, not $163 million. That makes sense, since the regression was based on Forbes data, which values the Coyotes at $142 million. However, Zimbalist did consider subsidies from the city of Glendale, which might make up part of the difference.

As for Zimbalist's Hamilton estimate ... well, he takes Balsillie's own estimate, which assumes revenues would be $73 million. That, to me, seems *way* too low. It would put Hamilton behind every other Canadian team.

I think an estimate of $100 million would be much more appropriate, given the size of the market. Based on the results of the regression, that would make the new Hamilton franchise worth $247 million -- not $175 million.

---------

Now, Zimbalist also calculates team value another way, a better way: by estimating actual future profits, and calculating their present value. He doesn't do that for Phoenix, I think because he can't project the Coyotes ever making a profit (in which case, shouldn't their value by this method be zero? But I digress). However, he does it for the proposed Hamilton franchise. Here's how: he starts with Balsillie's own projections of earnings for the first five years of the franchise. Then, he assumes earnings will grow steadily for the next 25 years. He then discounts all 30 years' profits into today's dollars.

Zimbalist performs this calculation for five different 25-year growth rates (from 3 to 7 percent) and for three different discount rates (from 8 to 12 percent). He winds up with a franchise value ranging from $70 million to $177 million, with a typical value of $150 million.
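The mechanics of that calculation can be sketched like this. The five-year profit projections here are placeholders, since Balsillie's actual numbers aren't reproduced in this post; `growth` and `discount` are the rates Zimbalist varies:

```python
def franchise_value(first_five, growth, discount, years=30):
    """Present value of a profit stream ($ millions): five explicit
    years of projections, then steady growth for the remaining years."""
    profits = list(first_five)
    while len(profits) < years:
        profits.append(profits[-1] * (1 + growth))
    return sum(p / (1 + discount) ** (t + 1) for t, p in enumerate(profits))

# Placeholder projections; Zimbalist's corner cases were 3% growth with a
# 12% discount (pessimistic) and 7% growth with an 8% discount (optimistic):
first_five = [2, 5, 8, 10, 11]
low = franchise_value(first_five, growth=0.03, discount=0.12)
high = franchise_value(first_five, growth=0.07, discount=0.08)
```

With real projections in place of the placeholders, the pessimistic and optimistic corners of that grid are what produce the $70 million to $177 million range.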

This is all quite reasonable, although you have to keep in mind that Balsillie is probably being very conservative in his earnings projections in order to keep his price down. Still, it doesn't seem like this is how other teams are valued, probably because of the "Picasso factor." Looking at the Forbes chart, the market value of teams is much, much flatter than their earnings. The top three teams (Leafs, Rangers, Habs) make an average of about $45 million a year, and their market value is about $400 million -- an earnings/price ratio of about 11%. But the teams in the middle, who look like they make an average of about $3 million a year, are worth about $200 million -- an earnings/price ratio of about 1.5%. And the teams at the bottom are all losing money -- but their market values are still around $160 million.

Why are the values so flat relative to profits, where a team that makes $1 million a year is worth almost half as much as a team that makes $40 million a year? It could be the Picasso effect. I ran a regression to predict market value based on earnings. The results, rounded:

Market value = $200 million + 4 times annual earnings

The correlation coefficient? 0.88. Not as high as for revenues, but still huge.

What that tells us is that, regardless of earnings, there's a value of $200 million to owning a team, even if it only breaks even every year. That might be Picasso value. Or, it might partially reflect the value of the right to move the team if it starts losing money. It might reflect the fact that owners think that earnings will jump soon -- maybe they think a new TV contract will someday be worth a present value of $30 million each, and that's part of the $200 million. But I think it's consumption value, Picasso value.

$200 million does seem reasonable in terms of consumption value. At today's low interest rates of (say) 4%, the opportunity cost of locking up $200 million is only $8 million a year. Most of these owners are billionaires -- what's a tiny $8 million a year? Jim Balsillie's own willingness to pay is no doubt much more than $8 million. He likes publicity. He's making much of the fact that he wants to bring the NHL to more Canadian cities, making him something of a hero in some circles. He might have ambitions beyond NHL owner, ambitions which being in the limelight will further.

Using the regression results puts the Coyotes at $161 million, which is about where Zimbalist has them in his revenue model (he can't use the earnings model because the Coyotes have negative earnings).
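For concreteness, here's that fitted line as a small sketch (the earnings inputs are the Forbes and Balsillie figures quoted in the text):

```python
def value_from_earnings(earnings_m):
    """Fitted line from the earnings regression, in $ millions: a $200
    million base (the "Picasso value") plus 4 times annual earnings."""
    return 200 + 4 * earnings_m

print(value_from_earnings(-10))   # Coyotes, losing $10M a year: 160
print(value_from_earnings(11))    # Balsillie's year-five projection: 244
print(value_from_earnings(19.2))  # Canucks-level earnings: ~277
```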

So, let's say we use this same regression equation to value the Hamilton team. Balsillie claims that five years from now, the team will be making $11 million. That means it'll be worth about $244 million then. Discounting that to today's dollars, at 4%, gives $209 million today. Adding in $35 million of Picasso value ($40 million discounted) for the next five years takes us back to $244 million.

And, again, that's conservative because it uses Balsillie's own estimates of his profits. Here are the earnings of the six Canadian teams last year, according to Forbes:

$66.4 million (Leafs)
$39.6 million (Canadiens)
$19.2 million (Canucks)
$ 4.7 million (Senators)
$ 7.4 million (Flames)
$11.8 million (Oilers)

Judging by this, I'd say that, for a Hamilton franchise, $11 million five years from now is pretty conservative. Even so, the Picasso value drives so much of franchise valuation that it doesn't matter much: even if the Hamiltons made as much as the Canucks, it would only raise the franchise value from $244 million to $276 million.

So I think the team in Hamilton is worth about $250 million. Not only is this substantially higher than its worth in Phoenix, but it's even more than Balsillie has offered. So I bet Balsillie is willing to spend a whole lot more than his $212 million offer, if necessary, to achieve his dream of a team in Hamilton.

Saturday, August 22, 2009

Changing my mind on "The Book" and clutch hitting

My last post talked about the clutch study in "The Book." It turns out that study was written by Andy Dolphin, who responds in a comment at "The Book" blog here, as does co-author Mitchel Lichtman (mgl). The comments are definitely worth reading.

I had two arguments, one about statistical significance, and one about walks. To summarize them (perhaps more clearly than in the original post):

-- previous studies found no evidence of clutch hitting talent.
-- Andy's study found evidence of clutch hitting (OBA) talent with variance .008.
-- The .008 is statistically significant only at p=.14 (14%, rather than the traditional 5%). It therefore constitutes fairly weak evidence.
-- Combine that weak evidence of .008 with the previous studies that found zero, and there's still a fair bit of doubt about whether clutch hitting exists.

And also:

-- if you include intentional walks, it seems obvious that the best hitters will appear to be "clutch".
-- there is such a thing as a "semi-intentional walk".
-- generally, the players who receive IBBs will be the same ones who receive "semi" IBBs.
-- so it seems like the best hitters will appear to be "clutch" just because of those semi-intentional walks.
-- but extra semi-intentional walks are not what "clutch hitting" traditionally means;
-- and so Andy's study may not answer the same question that's being asked.

To clarify: I have no objection to anything in Andy's study itself, just to the conclusions you can draw from its results.

Anyway, I've changed my mind; I now think that we can draw firmer conclusions from Andy's study, and lean towards his result that clutch hitting exists. I stand by my original logic; but I did another simulation, and my view of the facts has changed.

Specifically: I no longer believe that the previous studies necessarily found evidence of zero clutch hitting. I thought they did, but, on further examination, I think the Tom Ruane study gives results that are perfectly consistent with what Andy found: clutch hitting variance of .008 points of OBA.

Here's what Tom did. He found 727 players who met his cutoff for plate appearances. For each player, he found the difference between the player's "clutch" BPS (batting average plus slugging) and his "non-clutch" BPS. Then, he broke the 727 differences into categories -- 0 to 15 points clutch, 45 to 60 points choke, and so on.

Then, he did the same thing, but, instead of using "clutch" and "non-clutch" AB, he divided the AB in each group randomly. And so, if there is no such thing as clutch talent -- if clutch hitting is, in effect, random -- the two groups should break down exactly the same.

Taking groups G to J, which comprise players who were at least 105 points better in the clutch, we see that there were 15 in real life, and 13 random. On the choke side, groups -G to -J, there were again 15 in real life, and 13 random.

Here's where I made my wrong assumption: I figured that if there were any real difference between the two groups, even a small one, we'd see a much larger dispersion in the "real" row. I thought we'd see a lot more extreme values -- more than a ratio of 15:13.

I was wrong. I ran a simulation: I generated a "fake" row, then added extra variance of .006 points (which is what Andy found for wOBA) to simulate what the "real" row would look like if Andy's number were right. The results were indistinguishable. Indeed, I think you could add a lot more than .006 and still not be able to see any difference between the two rows. There is just so much randomness there that any difference in talent gets washed out in this kind of comparison.

Also, Tom's results are consistent with my simulation of Andy's result. My simulation found a p-value of .14. Tom's study found that the "real" data were at the 11th percentile of the distribution of "fake" data -- a p of .11. So it seems that Tom and Andy are consistent with each other. That makes sense, because some of the data they used overlapped. Also, Tom's data didn't include walks, which calls into question my argument that the walks might be causing a large proportion of the effect.

So here's where we stand now:

-- Andy found an effect of .008 of OBA;
-- that's completely consistent with Tom Ruane's study;
-- I think it's also consistent with other studies I've seen;
-- so maybe .008 should indeed be our best estimate of the variance of clutch talent, given all the available evidence.

I have to say, though, that I'm still not completely satisfied about the walks thing. In his reply, Andy said he checked the results without including walks, and he got approximately the same result. I guess this should satisfy me, but I'm still a bit dubious, perhaps irrationally. I wouldn't mind, as commenter Guy suggested, checking whether the clutch hitters also tended to be the better hitters (or the guys who draw the most IBBs). That would help me feel better about the walk issue.

Guy also points out that Andy's .008 result doesn't actually represent the variance of talent alone -- rather, it represents all the variance other than luck. The implicit assumption in Andy's study is that it's all talent; but some of it might be other factors: park, non-random distribution of pitchers, etc. Guy suggests running the same study, but dividing the AB by day of the month instead of clutch. Assuming that day of the month is irrelevant to hitting, if we get the same .008 result, that would suggest that what Andy found was something other than clutch talent. More likely, we'd get something between .000 and .008, and we could calculate how much of the .008 is really clutch talent, and how much is other, random, things.

Both those tests would make me happier. But, until then, I guess I have to agree that the current state of the evidence is that the most reasonable estimate for the extent of clutch talent is closer to .008 than to .000.

Monday, August 17, 2009

Did "The Book" really find evidence for clutch hitting?

For a long time, the most thorough sabermetric studies showed no evidence for the idea that "clutch hitting" exists -- that some players can "turn it on" more than others when the situation is particularly important. Dick Cramer's 1977 study, which compared batters' 1969 clutch performances to those in 1970, found only a very slight tendency for clutch hitters to repeat. That conclusion was criticized by Bill James in his recent "Underestimating the Fog," but better analyses have existed for many years. Pete Palmer's study in 1990 (.pdf, page 6) compared the actual distribution of players' clutch stats to what would be observed if clutchiness were completely random; it found almost an exact match. Then, in 2005, Tom Ruane did the same thing, but for a much larger population of batters, and came up with a similar result.

But three years ago, in "The Book," authors Tom Tango, Mitchel Lichtman, and Andy Dolphin used a different technique (and, I think, even more data), and came up with a different answer. They found that a tendency to clutch hitting does exist, and has a standard deviation of .008 points of OBA. That is, one out of every six batters will hit more than .008 (8 points) better in the clutch than overall; and, by symmetry, one in six players will hit 8 points *worse* in the clutch.

As far as I know, the authors never published their study in full, and their book gives only an outline of how they did it. But, still, I think I was able to figure out their method -- or at least a method that's probably close to what they did -- and I don't have the same confidence in their conclusions that their book does.

I have two disagreements with their study. First, they used OBA instead of batting average; second, and more seriously, their result of .008 is statistically significant only at the 14% level, which is only moderate evidence against the competing view that clutch talent does not exist.

First, OBA. The difference between OBA and BA is mostly a matter of including walks. Walks are certainly important, and if you're trying to measure a player's ability or performance, on-base percentage is a much better measure than batting average. But when it comes to clutch, the traditional question is about *hitting* in the clutch, not *walking* in the clutch.

To my knowledge, ability to draw a base on balls in clutch situations has not been studied. But, unlike hitting, it wouldn't be surprising to find that some players are "better" at it than others. Take Barry Bonds, for example. In clutch situations, Bonds was more likely to be walked. (Here are his career splits.)

Of course, Bonds' walks were mostly intentional, and "The Book" omitted the IBB from its totals. But, still, if Bonds was much more likely to be walked, you'd think he'd also have been more likely to be pitched around; and so he'd draw more unintentional walks in clutch situations as well. Maybe there weren't as many "semi-intentional" bases on balls as intentional ones, but, still, a small number would be enough to account for a chunk of a standard deviation of .008.

For instance: suppose on every team the best hitter increases his OBA by about 17 points (.017) in the clutch, because of the semi-intentional walk, and the worst hitter decreases his OBA by the same 17 points. If the other 7 batters are exactly the same in clutch situations, and only these two are different, that's enough to give you an SD of almost exactly .008.
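That arithmetic is easy to verify. Here's the nine-hitter scenario as a sketch: one hitter gains 17 points of OBA in the clutch, one loses 17, and the other seven are unchanged:

```python
import statistics

# Clutch OBA shifts for the nine hitters in the lineup:
clutch_shifts = [0.017, -0.017] + [0.0] * 7

# Population SD across the nine hitters:
print(round(statistics.pstdev(clutch_shifts), 4))  # 0.008
```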

What's 17 points in practice? Over 600 PA, it's an increase of about 10 walks. And if a typical hitter gets 60 clutch PA a season, you're talking about one extra walk for one player on the team, and one fewer walk for a second player. That difference of about two walks total is enough to give you the SD of .008 that the authors found.

That seems pretty realistic, and reasonable, doesn't it? Well, maybe not; I've artificially decided that only two players on the team are affected, which makes the variance move a lot more per walk than it would if every player had some tendency. But, still, intuitively, it does seem like a small effect for walks could explain the whole thing.

And that means:

-- several studies have found no clutch ability in batting average;
-- "The Book" found clutch ability in on-base percentage;
-- intuitively, "clutch walking" would seem to be able to account for everything "The Book" found.

So, with that being the state of the evidence, I am inclined to believe that the evidence still suggests that clutch hitting skill doesn't exist, but "clutch walking" skill does.

----------

But even if the authors had used batting average instead of OBP, and got the same result, the result isn't statistically significant. That's not just my conclusion, but also theirs; they say, on page 102,

"... we can merely state that there is a 68% probability that [the clutch talent SD] is between 3 and 12 points."

Since a 68% probability is 1 SD each way, the authors seem to be implying a standard error of about 4.5 points. That means a 95% confidence interval is about 9 points either way -- which includes zero.

Actually, I get an even wider confidence interval using my method (which might actually be the same as theirs). Let me go through it. For those of you who don't care about the math, you can skip this smaller print.

-- Math/details start here --

The study said that it included 848 players, with an average 2450 PA in non-clutch situations, and 200 in clutch situations. So I created 848 identical players with those numbers, and gave each player exactly zero clutch ability. Every player had an OBA of .340.

From the binomial distribution, the SD of each player's OBA over the non-clutch 2450 PA is .00957. The SD of each player's OBA over the clutch 200 PA is .0335. The SD of the difference between the two is the square root of the sums of the squares, which is .03484. That's 34.84 points of OBA.
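Those binomial figures can be reproduced directly; the .340 OBA and the PA counts are the study's numbers as described above:

```python
import math

OBA, NONCLUTCH_PA, CLUTCH_PA = 0.340, 2450, 200

sd_nonclutch = math.sqrt(OBA * (1 - OBA) / NONCLUTCH_PA)  # ~.00957
sd_clutch = math.sqrt(OBA * (1 - OBA) / CLUTCH_PA)        # ~.0335
sd_difference = math.hypot(sd_nonclutch, sd_clutch)       # root-sum-of-squares

print(round(sd_difference * 1000, 2), "points of OBA")    # 34.84 points of OBA
```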

That's the spread due only to randomness, or luck. If there truly is variance in players' *talent* for clutch hitting, the observed variance would be higher. How much higher? Well, if you assume that talent and luck are independent, then, as the authors often point out on their blog,

Variance (observed) = variance (talent) + variance (luck)

Since the authors concluded a talent variance of 8 points squared, we can assume that

Variance (observed) = 8 points squared + 34.84 points squared

Which means that

Variance (observed) = 35.75 points squared

Since the SD is the square root of the variance, we get

SD(observed) = 35.75 points

So, presumably, in their population of 848 players, the authors observed the SD of the clutch difference was 35.75 points.

Now, if there really was no such thing as clutch ability, how often would we observe an SD of more than 35.75 points due to luck alone, when the expected number is only 34.84? To check, I ran a simulation, and the answer was: about 14% of the time.
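My simulation was along these lines. This sketch uses a normal approximation to each player's binomial luck (the .03484 and .03575 thresholds come from the calculation above), rather than redrawing every plate appearance:

```python
import random
import statistics

LUCK_SD = 0.03484      # SD of each player's clutch-minus-nonclutch difference
OBSERVED_SD = 0.03575  # what the study's numbers imply was actually observed

def one_run(n_players=848):
    """SD across players of pure-luck differences (zero clutch talent)."""
    diffs = [random.gauss(0, LUCK_SD) for _ in range(n_players)]
    return statistics.pstdev(diffs)

runs = [one_run() for _ in range(2000)]
p = sum(sd >= OBSERVED_SD for sd in runs) / len(runs)
print(round(p, 2))  # comes out near .14
```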

Fourteen percent is obviously not significant.

Another way to check: the SD of the simulated SDs was about .88 of a point. The difference between 35.75 and 34.84 is about .91 of a point. So the observed difference was almost exactly 1 SD from zero. Again, that's not significant.

If we look for a 68% confidence interval like the authors had, 1 SD on each side, we get (34.87, 36.63). That means a 68% confidence interval for clutch talent is 0.1 to 11.3 points. That's different from what the authors gave -- 3 to 12 points -- but I'm not sure why.

Either way, the observed effect is certainly not statistically significant.

-- math/details end here --

To restate my conclusions for those who skipped the math:

The effect "The Book" found is about 1 SD from zero, which is certainly not statistically significant: it's at the 14% level, not the traditional 5%. This doesn't mean it can be ignored, but it constitutes fairly weak evidence.

-------

So, to sum up:

-- two previous studies found no evidence of clutch talent in batting average;

-- Tango/mgl/Dolphin found a small measure of clutch talent, but it wasn't statistically significant.

From that alone, I'd say our conclusion still has to be: not enough evidence to assume clutch talent. But if you add:

-- Tango/mgl/Dolphin's non-significant result included clutch walks, which common sense strongly suggests *do* vary by player,

Then, to me, that removes most of the last bit of doubt. I think that even if the effect they found is real, there's a really good chance it's caused by walks.

Hey, guys, how about running the study again using batting average?

(UPDATE: some statements on statistical significance replaced by something more accurate.)

Last year, Tim Donaghy, the NBA referee convicted of betting on games, suggested that there is a conspiracy between the referees and the NBA. The league, Donaghy alleged, wants large-market teams to advance in the playoffs, and it wants series to go the maximum number of games to maximize excitement and TV revenue. He accused certain referees, "company men," of calling the critical games differently to try to achieve the league's desired result.

In this study, Zimmer and Kuethe attempt to look at the evidence for Donaghy's charge. Is there a "big city" bias in playoff series? And do the underdog teams have an increased chance of winning games when they are behind in the series?

The authors ran a regression, trying to predict margin of victory based on (a) what game in the series it was, (b) the difference in conference seed position between the two teams (so that the #2 team playing the #7 team would be a 5-seed difference), (c) which team was at home, and (d) a couple of other factors that proved unimportant.

They ran the regression only on the first three rounds of the playoffs; they ignored the finals, due to concerns that "seeding" didn't really make sense when the teams are from different conferences. The regression covered 2003 to 2008; I have no idea why they chose to use only six seasons, when there's so much more data available and they could have got much more reliable results. (Gratuitous link to basketball-reference.com.)

Anyway, the results were that the stronger team's margin of victory is roughly:

-4.94 points
plus 1.55 points * the seed difference
plus 10.19 points if they're at home
plus 0.02 points for every extra 100,000 population
minus 2.67 points if it's game 1
minus 1.48 points if it's game 2
minus 4.68 points if it's game 3
minus 0.00 points if it's game 4
minus 4.04 points if it's game 5
minus 0.25 points if it's game 6
minus 1.59 points if it's game 7

The seed and the home field advantage were significant, as you would expect. But so was the population difference (about 2 SDs), and the Game 3 difference (2.6 SDs).
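To make the fitted model concrete, here's the regression equation as a function. The coefficients are as reported above (Game 4 is the reference game); the sample matchup at the bottom is my own illustration:

```python
# Game-by-game adjustments from the regression (game 4 is the reference):
GAME = {1: -2.67, 2: -1.48, 3: -4.68, 4: 0.0, 5: -4.04, 6: -0.25, 7: -1.59}

def predicted_margin(seed_diff, favorite_home, pop_diff_100k, game):
    """Expected margin of victory for the stronger (better-seeded) team."""
    return (-4.94 + 1.55 * seed_diff + (10.19 if favorite_home else 0.0)
            + 0.02 * pop_diff_100k + GAME[game])

# e.g. a #2 seed hosting a #7 seed in game 3, equal market sizes:
print(round(predicted_margin(5, True, 0, 3), 2))  # 8.32
```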

The authors conclude that there is some evidence for Donaghy's claims; the large-market teams have an advantage, and the significantly increased performance of the underdog in Game 3 shows something funny is going on there.

I don't think that's right, in either case.

First, for population: Zimmer and Kuethe figured the difference in team quality only by standings position; so the #1 team facing the #8 team was scored as 7, no matter how good those teams actually were. But isn't it possible that when a #1 team is a big-market team, they're a better team than a typical #1 team from a smaller market? It seems likely to me. The authors acknowledge that big cities might have better teams than small cities, but they argue that

"If large-market teams attract better players, either through pay or lifestyle, the regular-season winning percentage will reflect this disparity in pay."

Yes, but not all of it: the authors don't use winning percentage -- they use standings position. When the highest-paid resident of your street is a CEO, he's probably going to make more money than if the highest-paid resident is just a professor. Looking at the ranking captures a lot of the information about salary, or team quality, but not all of it. I'd bet that what's being measured by the regression is just that leftover: #1 teams from large markets are just better than #1 teams from small markets.

I haven't done any work to prove that. But still, I wonder why the authors chose standings rank instead of season wins, when wins are just as easy to collect and would likely be more accurate.

Also, I'm not sure if it's reasonable to calculate the standard error the way the authors did, as if every observation is independent. Suppose one large-market team has an unlucky season, finishing (say) fifth in its conference when its talent was really good enough for second. If that team goes tearing through three playoff rounds, winning all three as the apparent underdog, those results are certainly not independent. And so the SE of the "population" coefficient is understated; since it was only 2 standard deviations from zero anyway, it's very likely that it's no longer significant once you adjust for the fact that teams with "inaccurate" seedings would be more likely to appear in subsequent rounds.

So, I don't think the population results mean much. What about the significant Game 3 result?

First, the result of a 4.68 point differential isn't relative to every other game; it's only relative to Game 4. That is, the underdog performed 4.68 points better in Game 3 than in Game 4. Is that consistent with referee bias? I suppose it could be; when the favorite goes 2-0, the referees try to have them lose Game 3, for a longer series. But why not have them lose Game 4 instead? If the idea is to prolong the series without affecting who wins, going from 3-0 to 3-1 is much safer than from 2-0 to 2-1.

But you can't predict the NBA's methods of cheating, I suppose, so let's assume that they do shoot for a Game 3 underdog win. Still, the favorite is going to win some of those games anyway, and go 3-0. Wouldn't the NBA want to see the underdog win Game 4, then? You'd think so: but Game 4 actually shows the *best* performance by the favorite; every other game coefficient is negative, meaning the favorite loses points in those games relative to the fourth game. So why would that be? Why would the underdog do best in Game 3, but worst in Game 4, if the NBA is trying to orchestrate a longer series? That doesn't make a lot of sense to me.

Second: If you take the results as presented, Game 3 is the only one of the six games that shows statistically significant results. But it's only 2.6 SDs away. That's significant at almost exactly the 1% level (assuming a two-tailed test, 2.6 SDs on either end of the curve). The chance that at least one of six variables would show that kind of 1% significance is ... about 6%. So, really, unless you had good reason to suspect Game 3 in the first place, the result isn't really significant enough by the typical 5% standard for these sorts of things.

It's even less significant when you look a little deeper. Game 3 is significant only when compared to Game 4 -- and Game 4 just happens to be the most extreme observation in the other direction! So you're looking at the difference of the two extremes out of seven.

The chance that the *most positive* of seven normal variables will be more than 2 SDs (of itself) away from the *least positive* of seven normal variables is pretty high. There are actually 21 pairs of the seven variables; if every pair has a 1% chance of showing a result, then, even though the 21 pairs aren't independent, on average you'll find 0.21 apparently significant results. That is: if you run the experiment 100 times, with different sets of data, you'll find 21 significant results. It's therefore not all that surprising -- and certainly not statistically significant at any reasonable level -- when this study finds exactly 1.
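That claim is easy to check by simulation: generate seven independent game coefficients under the null, and ask how often *some* pairwise difference clears the 1% bar. This sketch assumes equal, independent unit standard errors for the coefficients, which the real regression wouldn't exactly have:

```python
import random

def any_pair_significant():
    """Seven iid null coefficients; does the widest pairwise gap look
    significant at the 1% (two-tailed) level?"""
    g = [random.gauss(0, 1) for _ in range(7)]
    # the difference of two unit-SD coefficients has SD sqrt(2)
    return (max(g) - min(g)) > 2.576 * 2 ** 0.5

p = sum(any_pair_significant() for _ in range(20000)) / 20000
print(p)  # well above the 1% level of any single pre-chosen comparison
```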

It *looks* significant, sure, but that's because the authors of the study happened, luckily, to have randomly chosen Game 4 as their reference point. Had they chosen, say, Game 1, they would have found no significant-looking effect at all.

So, in summary:

-- the population effect is probably (in my judgment) due to the study not adjusting for the fact that big-market teams are better than small-market teams;

-- even if that turns out not to be the case, the effect found is probably not significant anyway, due to underestimation of the SE;

-- the Game 3 effect is significant ONLY when compared to Game 4;

-- there are 21 possible significant game vs. game effects, so the fact that exactly one of those 21 was found to be 2.6 SDs away from zero is not a very low-probability event;

-- the observation that Game 3 most favors the underdog and Game 4 most favors the favorite does not, on its face, appear to be very consistent with Donaghy's conspiracy theory.

So I don't think there's much here at all. Of course, the authors only analyzed six seasons, so there's lots more data if someone wants to investigate further.

Thursday, August 06, 2009

Evidence on whether teams "own" other teams

Here's some new evidence on momentum, as well as on whether teams "own" certain other teams.

For instance: as of right now, the Yankees are 0-8 against the Red Sox this season. Does that mean they should be expected to continue losing to Boston, at least more than you'd expect from the teams' relative talent levels? To check, I looked at past season series in which one team won the first eight head-to-head games. Here's what I found:

-- the team with the 8 wins went about .530 that season against other teams.

-- the team with the 0 wins went about .450 that season against other teams.

-- in all remaining games that season between the two teams (of which there were 545 total), the "8" team went about .600 against the "0" team.

What does that mean? Well, the .530 team probably played about .560 for the season when you include the missing games (the eight consecutive games it won, plus the additional games against that team where it went .600). The .450 team probably went around .420.

Regressing to the mean a bit, the .560 team is probably truly .545 or something. The .420 team is probably around .440.

How often will a .545 team beat a .440 team? I'm guessing about 61% of the time -- pretty close to the 60% observed.
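My 61% guess can be checked with Bill James's log5 formula. This is my reconstruction -- the post doesn't say exactly how the estimate was made:

```python
def log5(p_a, p_b):
    """Log5: probability a team of talent p_a beats a team of talent p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

print(round(log5(0.545, 0.440), 3))  # 0.604 -- close to the observed .600
```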

Again, my calculations are only as good as my estimates, and don't take all factors into account (for instance: you'd expect the 8-0 team to have had an above-average number of home games out of those eight, since they wound up winning them all. That means you'd expect more road games in the remaining head-to-head matchups, which should reduce the .610 estimate a bit). Still, I'm confident it's all close enough that if you studied the issue in more detail, the results wouldn't be much different.

Conclusion: no evidence of streakiness, momentum, or "owning" another team.

Sunday, August 02, 2009

Pitchers targeting 20 wins -- followup and slides

Last year, I ran a study on why there are more pitchers who win 20 games in a season than 19. I updated that study slightly for my presentation at last week's SABR convention, and the Powerpoint slides (.ppt) are now available on my website, or by direct click here.