Sabermetric Research

Phil Birnbaum

Wednesday, November 29, 2006

Does "Win Score" overvalue rebounds?

In the past little while, there's been a debate about a basketball statistic from "The Wages of Wins" called "Win Score." The statistic, invented by authors Berri, Schmidt, and Brook, attempts to calculate how many wins each player contributed to the team. One of its forms is: Points + Rebounds + Steals + ½Blocks + ½Assists – Field Goal Attempts – ½Free Throw Attempts – Turnovers – ½Personal Fouls.
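That tally is easy to compute from a box-score line. Here's a minimal sketch (my code, not the book's) of the simple published form, which credits points, rebounds, and steals in full, gives half weight to blocks and assists, and debits field-goal attempts and turnovers in full and free-throw attempts and personal fouls at half weight:

```python
def win_score(pts, reb, stl, blk, ast, fga, fta, tov, pf):
    """Simple-form Win Score: positive box-score events minus the
    resources (shot attempts, turnovers, fouls) used to produce them."""
    return (pts + reb + stl + 0.5 * blk + 0.5 * ast
            - fga - 0.5 * fta - tov - 0.5 * pf)

# e.g. a 20-point, 10-rebound game on 15 shots:
example = win_score(pts=20, reb=10, stl=2, blk=1, ast=4,
                    fga=15, fta=6, tov=3, pf=2)   # 12.5
```

Note that a rebound counts fully, the same as a point, which is exactly the weighting at issue below.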

First, the data shows that not every player has the same opportunity to try for a rebound. After a missed shot, only about 30% of rebounds are secured by the offense; the other 70% by the defense. (I got that 30% figure from this comment.)

Obviously, the circumstances in which players find themselves have a bearing on who gets the rebound. Otherwise, the breakdown would be 50-50, not 70-30.

So players have different chances of rebounding that are related to positioning rather than raw skill. Crediting a player for plays he makes only because of his position tends to overrate him. I don't know enough about basketball to say whether, or how, certain players are set up for more rebounds – but to the extent that it happens, rebounds will be overvalued in those players' accounts. Just as cleanup hitters get more RBI opportunities purely from circumstance, some players may get more rebounding opportunities purely from circumstance. The 70-30 split shows there is certainly some of that going on. And the more it's circumstance, the less it's skill on the part of the player.

To see why, consider a more extreme example. Imagine that the NBA institutes a new rule: the offense is prohibited from touching a rebound until it has bounced three times on the floor.

That rule change will do nothing to affect TWOW's regression or logic. A defensive rebound still constitutes a change of possession, and is therefore still worth exactly the same number of wins as before. But now, instead of 70% of rebounds going to the defense, the number is 99%. Dennis Rodman might still snag a large proportion of rebounds, but now, instead of having to run and jump and position himself and maybe fight off an opposing player, he can just jog to where the ball is and pick it up.

Given that there is now no skill at all, doesn't it overrate Rodman to give him credit for those rebounds? Obviously, any excess rebounds picked up by Rodman, instead of his teammates, are positioning, luck, or opportunities given him by his coach and team. Even a caveman could get them.

The argument for 99% also applies to 70%, but to a lesser extent. Some, but not all, of Rodman's rebounds are, in effect, his team "letting him" have the ball more. Those are perhaps better classified as team rebounds, rather than individual rebounds. Since they aren't, Rodman winds up overrated.

That's opportunities. But there's a second reason rebounds are overrated, a much more important reason, and it has to do with the construction of Win Score itself.

It's the reason John Hollinger gives, the one TWOW disputes in the above links. That argument is that part (or even most) of the credit for a rebound should go to the other members of the team, for making the rebound possible. As Hollinger writes here, "missed shots can be rebounded while turnovers can't, and ... a defensive rebound is merely the completing piece of a sequence that began by forcing a missed shot."

To see what's going on, consider a football analogy. Suppose the NFL makes a rule change. Starting immediately, a touchdown is worth zero points instead of six – but, to compensate, the convert [extra point] is now worth seven points instead of one. A touchdown and convert is still worth seven points total. And since almost all converts are good, this doesn't change scoring in the NFL very much.

But now, running a regression assigns the entire seven points to the kicker. So suddenly, kickers are overrated, because they get credit for seven points instead of one! There's a 90-yard drive... the quarterback takes the team down the field, the receivers make some great catches, the running back drags two defenders three yards down the field for a third-down conversion, and they finally get the ball into the end zone. But, if you do a regression, it's the kicker that gets all the points ... the rest of the players come out at zero!

And the regression is absolutely correct – all things being equal, only the kicker matters. It's the interpretation of the regression that's questionable.

Really, the touchdown drive and the kick are one unit. No matter how good the kicker is, the only way he can get an opportunity to try for seven points is to have the rest of the team score a touchdown first. We know in our gut that it's really the touchdown that's worth the seven points, not the kick, because that's where the important skills came out. But the regression has no idea where the skills lie. It has no idea about what really caused the points, in the human sense. It sees when a kick is good, that's seven points. When a kick is bad, it's zero points. And everything else is irrelevant.

A similar situation happens for rebounds in basketball. To get the opportunity for a defensive rebound (convert), the defense must first force the opposition to miss (touchdown). The defensive rebound is a combination of the two acts: good defense for up to 24 seconds, and one grab of the ball. Crediting the rebounder with the full value of the defensive play is like crediting the kicker with all seven points of the touchdown.

And to get the opportunity for an offensive rebound, the shooter must have missed a field goal attempt. Win Score sees the two events superficially – the missed field goal is a turnover, and gets scored as such, and the offensive rebound is treated like a steal back from the defense. The shooter is charged with minus one possession, and the rebounder is credited with plus one possession.

But that's the wrong weighting. Any field goal attempt has, built into it, the embedded feature that a missed shot carries a 30% chance of getting the ball back. The miss includes a consolation prize, a lottery ticket with a 30% chance of winning back the possession. The shooter figured that into his decision about whether to take the shot. That 30% chance belongs to the shooter. In effect, he hasn't wasted a whole possession with his miss; he's only wasted 70% of a possession. Remember Hollinger's point – a missed shot gives the team a chance to recover, but a turnover doesn't. Obviously, the shooter should be debited less for getting a shot away than for letting the shot clock expire.

I think the correct way to handle rebounds in a stat like Win Score is to start by ignoring them. Take the league average rebounding stats, and give the entire contribution to the shooter and defense.

For offensive rebounds, note that a missed field goal causes no damage 30% of the time. And so give the shooter back his 30% and charge him with only 70% of a turnover.

For defensive rebounds, note that they are the statistically average outcome of a defense good enough to force a missed shot. And so give all the credit for defensive rebounds – 70% of opposition missed shots – to the defense, and ignore the rebounder.

(Remember that assigning values this way is completely compatible with the empirical data. If you were to run a regression that leaves out rebounds entirely, those are the weights you'd get – 70% of a turnover for a missed shot by either team.)
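In code, the proposed accounting is just two constants (a sketch of the idea, using the post's league-average 30% offensive-rebound rate):

```python
OFF_REB_RATE = 0.30  # league share of missed shots recovered by the offense

def shooter_charge_on_miss():
    # A miss wastes only the fraction of a possession the offense
    # does NOT recover on average: -0.70, vs -1.0 for a dead turnover.
    return -(1.0 - OFF_REB_RATE)

def defense_credit_on_forced_miss():
    # The defense collectively earns the average outcome of forcing
    # a miss: 70% of a possession gained, regardless of who grabs it.
    return 1.0 - OFF_REB_RATE
```

The individual rebounder gets credited or debited only to the extent his team's recovery rate ends up different from the league's 70/30 split.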

After all that, if the team turns out to be different from average, we can figure out how much different, and assign the credit or debit it to the players in proportion to what we think their contribution is. The hard part is figuring that out. Is Dennis Rodman a great rebounder with average opportunities, or an average rebounder with lots of opportunities? That's something you have to analyze properly, or you'll get bad results.

How much can the TWOW method overrate a rebounder? Let's take Kevin Garnett as an example. In 2005, the Timberwolves had 947 offensive rebounds and 3527 defensive rebounds. Garnett was responsible for about 16% of the team's playing time. If rebounding were exactly proportional to playing time, Garnett would have come in at 150 offensive rebounds and 559 defensive. His actual numbers were 247 and 861. Garnett got to 399 more rebounds than average, or about 56% more than expected.
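The arithmetic, using the numbers from the paragraph above:

```python
# Garnett, 2005 Timberwolves (figures as given in the post)
actual_off, actual_def = 247, 861        # Garnett's rebounds
expected_off, expected_def = 150, 559    # ~16% of the team's 947 and 3527

extra = (actual_off - expected_off) + (actual_def - expected_def)
pct_above = extra / (expected_off + expected_def)
# extra = 399 rebounds; pct_above is about 0.56
```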

Is that difference a matter of skill, or opportunity? It's hard to argue that it's completely a matter of skill. The average team gets 70% of defensive rebounds. If Garnett is 56% better, a team of five Garnetts would get 109% of defensive rebounds! Now, you could argue that the five Garnetts would get in each other's way and take rebounds away from each other – there's only one ball, after all. But if you argue that five Garnetts would take rebounds away from each other, then you have to admit that there are times when two players both have a chance to make the play. And, therefore, there must be cases where Garnett takes rebounds away from his existing teammates! And so we have deduced that not all of that 56% can be simply Garnett's exceptional skill, because some of his rebounds would be snagged by a teammate if he weren't there. There must be at least some effect of opportunity there, and possibly a lot.

Now take the other extreme -- suppose Garnett is just an average rebounder, and his numbers are completely the result of opportunity. Then Garnett is being credited with wins that should really be going to the defense (for defensive rebounds) and the shooters who missed (for offensive rebounds). Those 399 extra rebounds are worth 14 wins. When we reallocate the defensive-rebound wins among all the players, about two will come back to Garnett. If we reallocate the offensive-rebound wins among shooters who missed, maybe half a win will come back to Garnett. Call it three wins total.

So if Garnett's rebounding is simply a matter of other players deferring to him, Garnett would be overrated by 11 wins. That's huge. Instead of being responsible for 30 wins out of his team's 45, he'd be responsible for only 19.

The correct number is somewhere between 19 and 30. Logic and evidence suggest that it has to be at least somewhat lower than 30. And so I think Hollinger and Kaufman are right -- and that TWOW's Win Points do indeed seriously overvalue rebounders.

Friday, November 24, 2006

Why do NHL teams make so little money?

In 2004-05, the Toronto Maple Leafs had the highest operating income -- $41.5 million US -- while the New Jersey Devils were the biggest money losers, at negative $6.7 million. These are all Forbes estimates; I believe that the teams don't release their financials publicly. (And for those scoring at home, earnings figures are EBITDA.)

Also interesting are Forbes estimates of what each team is worth. The Leafs again top the list with an enterprise value of $332 million, and the Washington Capitals are at the bottom, at $127 million. The median is $153 million. The Buffalo Sabres were the biggest gainer over the last two years, going from $104 million to $149 million. Forbes attributes the increase to Thomas Golisano, the Sabres' new owner, who cut front-office jobs, reduced ticket prices, and improved his team by telling the coaches to make the players practice shooting more. (Seriously, that's what the article says.)

What strikes me about the numbers is the crappy rate of return the owners are getting on their investment. The average team is worth $180 million, but earned only $4.2 million. That's an "enterprise multiple" of about 43. That's huge. By comparison, Home Depot trades at about 7, IBM is at 9, McDonald's is at 10, and Coca-Cola is at 14.

Put another way, the owner who invested $180 million and made $4.2 million earned a return of only 2.3%. He could have earned 4%, risk-free, by selling the team and putting the $180 million into government bonds. Or, roughly speaking, he could have bought $180 million worth of IBM stock, and earned 11% instead of 2.3%.
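The comparison is simple division, using the Forbes averages quoted above:

```python
team_value, team_earnings = 180.0, 4.2   # $US millions, Forbes averages

multiple = team_value / team_earnings    # enterprise multiple, ~43
cash_return = team_earnings / team_value # owner's earnings yield, ~2.3%
bond_income = 0.04 * team_value          # risk-free 4% alternative, ~$7.2M
```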

According to standard economic theory, a return of only 2.3% can't persist in the long term, at least if owners are rational. So what's going on? One possibility is that last year's NHL income could have been abnormally low – the league was having a bad year, and owners (and potential buyers of their teams) expect higher earnings in the years ahead. That doesn't sound plausible to me, especially because last year's earnings already include the effects of the salary cap. Also, to pull even with other investments, team earnings would have to at least triple. I don't see that happening, but, then again, I don’t know all that much about the business of sports, and I might be misunderstanding all these numbers.

Another possibility is that Forbes has overestimated team values. But their numbers seem to be close to what teams have sold for recently. For instance, Eugene Melnyk bought the Ottawa Senators in 2003 for $127.5 million Canadian, and Forbes says the team is now worth $159 million US. The Devils were sold for $175 million in 2000, and now Forbes has them at $148 million. (However, Forbes' values include debt, while the sale prices quoted in the press may not.) My feeling is that the numbers are correct – after all, Forbes knows accounting, and I don't.

But I think sports franchises are always going to earn less money than other businesses. Why? Because the people who buy sports teams aren't doing it just for investment purposes; they're doing it for ego and status and fun. A hockey team isn't something you buy and forget about, like a share of General Motors stock. It's partly a consumer good, like an antique car or a Honus Wagner hockey card. A large part of its value is the benefits other than cash earnings.

If you were a billionaire, how would you spend your vast wealth? The truth is, you couldn't. Even if you invested everything in government bonds at 4%, you'd earn forty million dollars a year. That's $109,000 a day, every day, even before touching the principal. If you absolutely had to get rid of that much money, you'd have to spend it on exotic stuff, like trips into space, or Van Goghs, or huge diamonds.

Or a sports team.

Suppose you sold some of your investments to buy the Ottawa Senators at the Forbes price of $159 million. You'd be giving up about $10-$15 million in earnings from your investments. The Senators would earn you only about $4 million. So the cost of owning the team for a season is about $8 million.
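As a rough check (the 7.5% alternative return is my assumption, picked to fall inside the $10-$15 million range above):

```python
price = 159.0        # Forbes value of the Senators, $US millions
alt_return = 0.075   # assumed return on the investments you'd sell
team_earnings = 4.0  # roughly the Senators' operating income

# what a season of ownership costs you in forgone earnings, ~$8M
annual_cost = alt_return * price - team_earnings
```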

The Senators are actually owned by a man named Eugene Melnyk. Eight million dollars is about one-half of one percent of his net worth. For that, what does Melnyk get? A lot. He gets fame – everyone in Ottawa knows him now. He gets his name in the papers, and his face on TV. He gets respect and admiration. He gets the best seat in the house for games. He gets to run a hockey team, or at least select the people who will. He gets to decide how much to spend on players, and maybe even input on who to sign and for how much. Basically, he gets to own a fantasy league team, except that it's no fantasy.

All that is a huge bargain at $8 million a year. Think about it. If you had so much money that $8 million was a drop in the bucket, wouldn't you want to own a sports franchise? I sure would. As soon as I make my first couple of billion, I'm making an offer for my beloved Toronto Maple Leafs, opportunity cost be damned.

If this theory is true – if sports teams weren't just profit-making institutions, but also consumer goods for people who are extraordinarily rich – what should we expect to see?

1. Teams would be owned by individuals, rather than corporations, because corporations don't have egos and care only about the bottom line.

2. Where rule number 1 doesn't hold, and teams *are* owned by corporations, it would be those where a single individual or family owns most of the voting shares, and where an individual from that family is the face of ownership.

3. Where rule number 2 doesn't hold, and teams *are* owned by widely-held corporations, it will be mostly teams that are profitable, and earn a reasonable return on the market value of the franchise.

4. Sports will have a larger proportion of egoist owners than other corporate fields. Owners will have a larger presence in the community than owners of other businesses. Absentee or reclusive owners will be rare.

5. Different owners will have different priorities. Some owners will concentrate more on making money and less on winning, while others will concentrate more on winning, even taking substantial financial hits to do so.

6. Losses will be widespread and returns on investment will be low. Some owners will make decisions that do not appear to make sense financially, in pursuit of something other than just profits.

7. Because the supply of billionaires increases faster than the supply of sports teams, demand will rapidly bid up the market value of a franchise, even past the point of profitability.

And all these things are roughly true, I think.

So, does this mean that sports teams are a bad investment? Not necessarily. It does mean that teams, as a whole, will never show a profit as good as other investments with equal market value. But it is quite possible that demand for teams is increasing so fast that there's a lot of money to be made buying a team, holding it for a few years, and flipping it to the next bored billionaire. It's kind of like buying the Mona Lisa for a billion dollars. You won't earn much charging admission to see it, but that's OK, because, unlike IBM, its value isn't based on its income stream. Eventually, as society gets richer and richer but Mona Lisas stay rare, the price gets bid up until someone will pay more than you will.

I don't think that sports teams will ever earn a full economic profit. They are mostly expensive toys.

------

UPDATE: Tango coined the phrase "Picasso Effect" to describe this phenomenon. Subsequent posts on this topic can be found by searching for "picasso".

Tuesday, November 21, 2006

Is hitting important in the NHL? A study and a puzzle

Do hits – in the sense of players physically checking other players' bodies -- contribute to winning hockey games?

Here's a study from "On the Forecheck" that finds that they do. For the teams in the 2005-06 NHL, blogger "The Forechecker" ran a regression on hits inflicted vs. goals for and against, and found close to zero correlation. But then, he considered hits taken, this time regressing the ratio of hits inflicted divided by hits taken, on the ratio of goals for/goals against. This time, the correlation was significant, at -0.4. In chart form:

The more a team gets hit, the better it turns out to be. The more a team hits, the worse it turns out to be.

Why would this happen? A commenter on Forechecker's post suggested that because a player can be legally hit only when in possession of the puck, a low hit ratio may be implying a high possession ratio, which in turn implies a good team.

That makes sense, and I'd bet he/she was right. Except that, if hits inflicted have nothing to do with goals, but hit ratio does, it must be hits taken that makes the difference. Using Forechecker's data, I ran a couple more regressions:

So the bottom line appears to be that hits taken is very important, but hits inflicted is only a tiny bit important.

Why would that be? Obviously, a hit taken by team A is a hit inflicted by team B, so there's a symmetry there. If team A somehow wins games in connection with being hit by team B, shouldn't team B lose games in connection with team A being hit?

The SD of hits taken is almost the same as the SD of hits inflicted, so it can't be compression of the data.
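For what it's worth, here's the kind of single-variable check involved. The pearson_r helper is just the standard formula; the team figures below are made-up placeholders (arranged the way the study found: good teams take more hits than they dish out), not Forechecker's actual data.

```python
import math

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical teams: (goals for/against ratio, hits taken, hits inflicted)
teams = [(1.25, 1900, 1500), (1.05, 1750, 1700),
         (0.95, 1600, 1800), (0.80, 1500, 1950)]
goal_ratio = [t[0] for t in teams]
hits_taken = [t[1] for t in teams]
hits_inflicted = [t[2] for t in teams]

r_taken = pearson_r(goal_ratio, hits_taken)        # positive here
r_inflicted = pearson_r(goal_ratio, hits_inflicted)  # negative here
```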

Monday, November 20, 2006

Increasing NBA competitive balance

Today, on the "Wages of Wins" blog, David Berri reprises an argument from the book, that NBA competitive balance is low because of "the short supply of tall people." He writes,

"Given the supply of talent the NBA employs, there is very little the league can do to achieve the levels of competitive balance we see in soccer or American football."

I disagree. There are many ways the NBA could substantially increase competitive balance. Here are a few. Some are more realistic than others, of course:

-- add a 4-point, 5-point, and 6-point line behind the 3-point line.
-- make the 3-point shot worth 5 points.
-- make a "nothing but net" shot worth an extra point.
-- make the hoop a few inches smaller.
-- make the shot clock 12 seconds instead of 24.
-- overinflate the ball, like they do in carnival games.
-- make games 20 minutes in length instead of 48.
-- adjust the draft rules so that the worst teams get even more draft choices and the best teams get even fewer, thus more quickly evening out team talent over time.
-- institute a "talent cap" instead of a salary cap, so that teams with too many good players have to get rid of some.
-- count young players at estimated free agent value towards the salary cap, so that teams can't dominate just because of drafting ability.
-- divide the game into seven "quarters" instead of four. Whoever wins 4 out of 7 quarters is declared the winner of the game.
-- like in baseball, allow each player to take only about 1/9 of his team's offensive opportunities.
-- like in soccer and hockey, allow goaltending.
-- like in football, give teams points only when they complete a long sequence of successful plays – for instance, give them a seven-point "touchdown" when they score on six consecutive possessions, or a three-point "field goal" when they hit two three-pointers in a row.

(Just for one example, making the hoop smaller would help substantially. I simulated a simplified 100-possession game between a team that shoots field goals at 50% vs. a team that scores at 48%. The first team's record was .620. Then, I changed the probabilities from 50%/48% to 40%/38.4%. The first team's record dropped to .590.)
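Here's a sketch of that simulation (a reconstruction, not the exact program I ran; it assumes every possession is a single shot worth the same number of points, with tied games counting as half a win):

```python
import random

random.seed(2006)

def win_pct(p1, p2, possessions=100, games=20000):
    """Record of a team shooting p1 against a team shooting p2, where
    each team gets `possessions` one-shot possessions per game."""
    wins = 0.0
    for _ in range(games):
        made1 = sum(random.random() < p1 for _ in range(possessions))
        made2 = sum(random.random() < p2 for _ in range(possessions))
        if made1 > made2:
            wins += 1.0
        elif made1 == made2:
            wins += 0.5   # split ties
    return wins / games

big_hoop = win_pct(0.50, 0.48)     # the post's run came out .620
small_hoop = win_pct(0.40, 0.384)  # the post's run came out .590
```

Lower shooting percentages mean more game-to-game randomness relative to the talent gap, so the better team wins less often.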

The point, of course, is that the rules of the game are at least as important as the supply of talent. For a full exposition of this argument, see Roland Beech's review here. TWOW's recent rebuttal to Beech is here. (My own argument is on page 3 here.)

Sunday, November 19, 2006

Can money buy wins? Team correlation alone can't tell you

In the previous post, I linked to the Wages of Wins study that found a correlation of less than 0.4 between single-season NBA team payroll and wins. The authors have argued that because the correlation is low, we can conclude that money can't buy wins.

That got me thinking ... is that really true? It occurred to me that if the team payrolls vary in a narrow range, that should reduce the correlation, because there's less room for the relationship to make itself evident. To check that, I ran an experiment. I set up a situation where payroll was 100% correlated with talent. Then, I simulated 30 independent seasons of 82 games, first, where the salary distribution was wide, and, then, where the salary distribution was narrow.

First, the wide distribution. Teams vary between .300 and .700 in a distribution shaped like the roof of a house. (Technically, talent was taken as .300 + (rnd/5) + (rnd/5), where "rnd" is uniform in (0,1).) Here are five correlation coefficients from successive runs. (These are r's; square them to get r-squareds.)

.78 .76 .86 .90 .84

Now here's the narrow distribution. Teams vary from .450 to .550:

.36 .32 .35 .07 .11
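The experiment can be sketched like this (a reconstruction; since payroll buys talent with 100% correlation here, team talent simply stands in for payroll):

```python
import math
import random

random.seed(1)

def pearson_r(xs, ys):
    """Standard Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def season_r(talents, games=82):
    # each team's wins are binomial in its true talent
    wins = [sum(random.random() < t for _ in range(games)) for t in talents]
    return pearson_r(talents, wins)

# wide, roof-shaped distribution: .300 to .700
wide = [.300 + random.random() / 5 + random.random() / 5 for _ in range(30)]
# narrow distribution: .450 to .550
narrow = [.450 + random.random() / 20 + random.random() / 20 for _ in range(30)]

r_wide, r_narrow = season_r(wide), season_r(narrow)
```

With the narrow distribution, the binomial luck of an 82-game season swamps the small real differences in talent, and the correlation collapses.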

Clearly, the variability of payroll makes a huge difference in the correlation.

We can make the correlation come out as low as we want, just by reducing the teams' spending variance. If a salary cap and floor forced every team to spend within, say, 1% of each other, the correlation would probably be very close to zero.

And, therefore, the conclusion that low correlation implies inability to price talent is just not true. Here, payroll buys talent with 100% correlation, and there is no doubt that a team that chooses to spend more will win more games. And that's true whether the payroll/wins correlation is .7, or .3, or .1, or even zero. It sounds illogical, but it's true: the correlation between team spending and wins, taken alone, is not enough to tell you anything about whether team spending can buy wins.

Saturday, November 18, 2006

Payroll vs. wins for basketball, football

On the "Wages of Wins" blog, David Berri now posts the results of a salary vs. performance regression in basketball, and another in football. For basketball, they find 15% of wins "explained" by salary, and only 5% in the NFL. I assume this means r-squared; the r's would be .39 and .22 respectively. (For baseball, they had previously found r-squared of 18%, or r of 43%.)

For basketball, Berri takes the analysis a step further, and shows the average record for each of the five NBA salary quintiles:

One surprising thing about this breakdown is that if you do a regression on just the quintiles, you get a negative correlation between salary and wins – and it's minus 39%, exactly equal in magnitude to the positive correlation Berri got for the regression on the full league! I don't think that really means anything important, although it's an interesting coincidence that it worked out that way. And it does mean that the relationship between salary and wins within each quintile must be exceptionally high, in order to cancel out the negative relationship between quintiles.

This does show a strong relationship between payroll and success. Which means that when Berri says he "really, really believe[s] that money cannot buy love in baseball," he presumably is arguing that for money to buy success, a strong relationship in a chart like the above is necessary but not sufficient.

Wednesday, November 15, 2006

Alan Ryder on first and second assists

I just found out that hockey sabermetrician Alan Ryder, of hockeyanalytics.com, has been writing an online column for The Globe and Mail newspaper for several months now. I don't think he appears in the printed newspaper, but an archive of his columns can be found here.

This column, from October, is on assists. Ryder notes that the NHL data divides assists into "first assist" and "second assist." The first assist is the player who presumably passed the puck to the scorer, and the second is the player who passed the puck to the first assister. Ryder argues that the first assist is more important than the second, and is a better indicator of actual skill. That's because the first assist contributed a more crucial task – passing the puck to a sniper in position to score. The second assist, on the other hand, could have been a routine pass through the neutral zone, or some such.

Last season, 72% of assists leader Joe Thornton's assists were of the "first" variety – 69 of his 96 assists. That compares favorably to the league average of around 60%. (Actually, as Javageek and "The Puck Stops Here" wrote in comments here, there were 1.73 assists per goal last year. That means first assists must be 1/1.73, or 57.8%, of all assists.)

Of the top 30 assists leaders, the Stars' Brenden Morrow had the highest percentage with 83% first assists. Jason Spezza of the Senators was the lowest, with 52%.

In this subsequent column, Ryder creates something called "Goals Created." He assigns 30% of the goal to the first assister (if there is one), 20% to the second assister (if there is one), and the remainder (56% on average) to the goal scorer. On that basis, Jaromir Jagr led the league last year with 48 goals created.
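The split, as the column describes it, works out like this (my sketch of the scheme, not Ryder's code):

```python
def goal_credit(has_first, has_second):
    """Ryder's Goals Created split of a single goal: 30% to the first
    assister (if any), 20% to the second (if any), the rest to the scorer."""
    first = 0.30 if has_first else 0.0
    second = 0.20 if has_second else 0.0
    scorer = 1.0 - first - second
    return scorer, first, second

# scorer's share: 50% with both assists, 70% with only a first
# assist, 100% on an unassisted goal -- hence ~56% on average
```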

I'm not sure how you would go about validating the percentages Ryder assigns, by correlating them to team wins or such. But regardless, it's fun to look at his list of leaders.

It's too bad Ryder doesn't appear in the newspaper itself ... he'd have a large and interested audience (the Globe and Mail is Canada-wide) and his work is great.

Monday, November 13, 2006

Perhaps fighting penalties *don't* help NHL teams win

Earlier today, I wrote about a story in today's National Post that quoted a study saying major penalties help a team win. Alas, a look at the study's results shows that isn't true. Here's the paper again. It's by John Herald Heyne, Aju Fenn, and Stacey Brook, and it's called "NHL Team Production."

The method suffers from the same problem that I previously wrote about here (in reviewing a paper also co-written by one of the same authors): namely, that team performance is determined directly by goals scored and goals allowed (assuming timing is random, as is normally assumed for baseball), and the other variables are expected to impact on goals, not on wins directly. For instance, if a team scores 300 goals and allows 280, why would you expect its winning percentage to depend on how many assists it accumulated in scoring those 300 goals? Or why would it matter how many faceoffs it lost on the way to giving up those 280 opposition scores?

Also, the study includes several sets of variables that measure almost the same thing. For instance, it includes team plus/minus. That statistic is exactly equivalent to even-strength goal differential multiplied by the average number of skaters on the ice (say, 4.6, to take 4-on-4s into account). Because of that, it measures almost the same thing as even strength goal differential.

In his "Win Probabilities" study (see page 3), Alan Ryder shows that historically, each NHL goal has been worth .1452 wins. If the difference between a win and a loss equals two points, a goal is .2904 points. But from 2000-2004, the seasons the new study covers, teams who lost in overtime got one standings point. That happened in about 12% of losses. And so a win is worth only 1.88 points more than a loss, not 2.00 points. Adjusting for that turns the .2904 into .2730.
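The overtime adjustment can be checked directly:

```python
points_per_goal_2pt = 0.2904  # goal value when a win beats a loss by 2 points
ot_loss_share = 0.12          # losses that still earned a point, 2000-2004

win_minus_loss = 2.0 - ot_loss_share * 1.0        # 1.88 points
adjusted = points_per_goal_2pt * win_minus_loss / 2.0  # ~.2730 points per goal
```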

That is, Ryder's work shows that

Points = league average + .2730 (GF – GA)

How do this paper's results compare to these? Let's concentrate, for now, only on those variables that have to do with goals.

Now, according to this website, there are about 1.55 assists per goal. From a quick check of nhl.com, I estimate there are 4.6 skaters on the ice for the average even-strength goal. Also, a lazy estimate is that 60% of goals are scored even strength, 37% on the power play, and 3% shorthanded. (Can't manage to get a permalink – go to nhl.com and do "Stats," then choose "report view – goals for" under "team comparison reports.")

From all that, we can do a bit of algebra and reduce the paper's goal-related variables to goals scored and allowed. For instance, .0935(assists) = .0935 * 1.55 (goals) = .145(goals). After all the simplification, if I've done it right, the paper's six goal coefficients collapse down to:

+0.2591 goals for
-0.2817 goals allowed

which compares well to the previous Ryder numbers:

+0.2730 goals for
-0.2730 goals allowed

So far, the study has only duplicated Ryder's results, using a more complicated method.

And so the significance of other variables puzzles me. After looking at goals for and goals allowed, we'd expect none of the other variables to affect winning. After all, if you lose 4-3, it shouldn't matter if you took ten penalties or none – any power play goals against are already accounted for in the score.

In that case, why is the coefficient for penalties significant? Or the coefficient for faceoffs won and lost? Or for major penalties? Or for shooting percentage and saves?

I'm at a loss. The only thing I can think of is this: after accounting for goals for and against, all that's left is timing of goals. These other variables may have to do with timing. For instance, more overtime games equals more points, even for identical goal differentials. The more overtime games, the more minutes; the more minutes, the more faceoffs. So that's one way faceoffs could affect points. However, the same would be true for faceoffs lost, but that coefficient goes the other way! So that theory is out.

Another theory is that not every team gets 1.55 assists per goal. Some teams might get only, say, 1.45 assists, because of their style of play. That would tend to underestimate their points. In that case, other determinants of the team's quality would tend to fill the gap. That would be faceoff performance and shooting percentage for the offence, and faceoff and goalie performance for the defense. That could be the answer, but it doesn't have to be.

Regardless, most of the authors' conclusions aren't justified by the data. For instance, they write that

"PIM implies that a team … is playing a man down, which is a huge disadvantage. Therefore, the more penalties a team takes, the harder it is going to be for them to win games."

This cannot be the correct explanation, for reasons already stated. In the regression, the coefficient for PIM implies that total team points go down with PIM, but *only when all other variables are held constant*. Since goals against and shorthanded goals for are two of those other variables, and those have to be held the same, the regression actually implies that a team will gather fewer standings points *when a penalty is killed without any goals being scored*. This is much harder to explain. The same is true for faceoffs, and save percentage, and all the other variables.

Also, as the National Post article highlighted, the authors say that the more major penalties, the better the team's position in the standings. They conclude that major penalties spark the team.

But I don't think that's true – they simply misinterpret the results of the regression again. The regression shows that major penalties result in more points *keeping all other variables constant*. PIM is one of those variables. To keep total penalty minutes constant while increasing major penalties by one, you have to eliminate 2.5 additional minor penalties. That's a trade where you gain a penalty that is likely offset by an opponent's fighting penalty, and you lose five minutes of being shorthanded. Obviously, that's a good thing, and that's why the coefficient is positive.

To see the effect of an additional major alone, you want to both (a) add one major, and (b) add five minutes to the PIM total. If you use the coefficients to compute the sum of both changes, the effect is now very close to zero, and probably not even statistically significant. Thus, there is no evidence at all that major penalties help the team.
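To make that concrete, here's a sketch of the two calculations with made-up coefficients (the paper's actual values aren't fully given here, so these numbers are purely illustrative):

```python
# Hypothetical regression coefficients, for illustration only
coef_major = 0.36    # points per major penalty, holding PIM constant
coef_pim = -0.07     # points per penalty minute, holding majors constant

# The authors' reading: add one major, hold PIM fixed
# (which implicitly removes 2.5 minors' worth of shorthanded time)
authors_effect = coef_major * 1
# looks like a solid positive effect for majors

# The effect of an actual extra major: one major AND five extra PIM
real_effect = coef_major * 1 + coef_pim * 5
# 0.36 - 0.35 = 0.01 points -- essentially zero
```

With any plausible pair of coefficients, holding PIM constant manufactures the "majors help" result; letting PIM rise with the major makes it vanish.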

I suspect most of the other results in the paper are also artifacts, due to the use of proxy variables for goals for and against. I'd bet that if you just used plain old Goals For and Goals Allowed, instead of all those other proxies like assists and PPG and plus/minus, you'd get all the other variables suddenly becoming a lot less significant.

The bottom line is that, unfortunately, because of the way the study was structured, none of the results is convincing – and the featured conclusion, the one about major penalties, is not confirmed by the evidence at all.

Do fights provide an emotional lift to NHL teams?

There's a new study that shows that fighting majors increase an NHL team's success on the ice. That's according to this story in today's National Post.

The preliminary paper, called "NHL Team Production," is by John Herald Heyne and Aju Fenn, two Colorado College sports economists, and Stacey Brook (of "The Wages of Wins"). From the story:

An intensive numbers-crunching of five years of statistics shows major penalties … increases the total [standings?] points of the offending player's team and decreases the number of goals scored by their opponents.

For each [major] penalty minute served, a team accrued 0.07 points and decreased their opposition's scoring by 0.24 goals.

"It is clearly not the act of laying a guy out that is going to help your team win, but it spurs the team on, it rallies your teammates and prompts them to dig deeper," [Fenn] said in an interview.

This confuses me a bit … within a game, the number of fights is almost always the same for both teams. Shouldn't the teams get the same emotional lift? Why would the number of fights in games other than this one produce a different mental effect for each team? (Unless, of course, the "rallies the teammates" effect carries into subsequent games.)

Anyway, I shouldn't be speculating before I read the full paper. The article didn't give a link, or even a title, but I'll look for it online.

Sunday, November 12, 2006

Kickoff distance more reliable than field goal percentage

Nice NY Times article today on NFL kickers by Aaron Schatz, of Football Outsiders. (Get it here quick before it becomes pay-only.)

Schatz notes that kickers are inconsistent in field goal success, but quite consistent in kickoff yardage. For instance, Mike Vanderjagt's field goal percentage for the last five years has been (2002 to 2006, respectively) a bouncy 74, 100, 80, 92, and 77. But his kickoff yardages from 2002-2004 were a steady 59, 60, and 58. (This year, he's at 57.3, the only kicker in the league below 60.)

Schatz concludes that teams would be better off concentrating on the kickoff stat when making personnel decisions, since the field goal percentage is such a poor indicator of the player's actual talent.

A couple of comments:

1. Kickoffs are exactly the same every time, while field goal attempts are from different yardages. Perhaps much of the apparent variance in FG percentage will disappear after adjusting for distance? In any case, the FG percentage is binomial in, say, 35 attempts, while the kickoff distance is (presumably) normal in about 60 (?) attempts, so you could probably figure out some variance details if you wanted.

2. Assuming a good kicker can net an extra two yards on a kickoff, and 60 kickoffs a year, that's 120 yards. At 12 yards per point, that's 10 extra points. 10 extra points is about 3 missed field goals, which is about the difference between an 85% kicker and a 75% kicker. Schatz is implying that kickers are so inconsistent that you can't really tell who's the 85% and who's the 75% kicker, so concentrate on the kickoffs. Fair enough.

3. If kickers are inconsistent, doesn't that imply that much of their apparent performance is due to luck? Therefore, you have to regress them to the mean quite a bit. That means the range of field goal skill could be very narrow, so narrow that you should almost ignore field goal percentage altogether.

4. Kicking two yards deeper means that the kicking team has to run two yards farther to get to the kick returner. Shouldn't that mean the returning team has a bit of extra time to run the ball back a little farther? Or does the extra time the ball is in the air negate that?

5. Can't they just give the guy a test before hiring him? Make him kick field goals from various distances every half hour for a couple of weeks. That should give you enough information to make a decision. Sure, there's no pressure, and no defense bearing down on him, and no crowd noise, and all that, but isn't that still better than nothing? Do teams already do this? Is there a radar gun for kickers?
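Points 1 and 2 above can be put in rough numbers (the 80% success rate, the 5-yard per-kick spread, and the attempt counts are my assumptions, not from the article):

```python
from math import sqrt

# Point 1: a true 80% field goal kicker on 35 attempts has a
# binomial standard error of about 7 percentage points...
p, n_fg = 0.80, 35
se_fg = sqrt(p * (1 - p) / n_fg)     # ≈ 0.068

# ...while an average over ~60 kickoffs, assuming a 5-yard SD
# per kick, has a standard error well under a yard
sd_kick, n_kick = 5.0, 60
se_kick = sd_kick / sqrt(n_kick)     # ≈ 0.65 yards

# Point 2: two extra net yards per kickoff, over a season
extra_points = (2 * 60) / 12         # 120 yards at 12 yards/point = 10 points
fg_equivalent = extra_points / 3     # ≈ 3.3 field goals
pct_gap = 3.5 / 35                   # 10 percentage points of FG accuracy
```

So a "true" 80% kicker will routinely post seasons anywhere from the low 70s to the 90s just from binomial luck, while his kickoff average barely moves.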

Some of this stuff might already have been studied … I just got my copy of Schatz et al's Pro Football Prospectus a few days ago, so maybe I should read up.

How does a player's team affect his output?

Does what team a batter is on affect his performance? We know there are some factors, such as park effects, in which the characteristics of the team affect a hitter. Are there others? And how important are they?

In "A Variance Decomposition of Individual Offensive Baseball Performance," David Kaplan tries to find out. I didn't understand the full details of his statistical methodology, but he starts off by using a statistical technique called "analysis of variance." I studied this a long time ago, and have since forgotten most of it. But if I understand it correctly, it examines the variance of the performance of players within teams, and then the variance of the team means themselves, to figure out how important teams are relative to players.

For instance, suppose there are three teams, each with three players, and their batting averages are:

Browns: .280, .280, .280
Pilots: .260, .260, .260
Spiders: .240, .240, .240

In this case, we can conclude that 100% of the variance comes from the teams, and zero percent is inherent in the players. Perhaps the Browns' manager is great at bringing out the best in his hitters, or their home park is very hitter-friendly, or perhaps the Browns can afford to sign all the .280 hitters while the other teams can't.

In another example, where the variance among the team means is the same as the variance among the players within each team, we can probably say the variance is decomposed 50-50 between players and teams.

And, of course, in real life, the percentage can be anywhere between 0 and 100% -- not just at round numbers like 100%, 0%, or 50% as in the examples above.

So what did Kaplan find? Here are a few of his numbers. He did 2000 and 2003 separately; I've taken the simple average of those two percentages. What's shown is how much of the variance can be attributed to the team:

Total bases: 28%
Runs created: 12%
Slugging percentage: 3%
Sacrifice hits: 0.5%

What to make of this? I have no idea. I'm not sure why teams would be responsible for 28% of the variance in total bases, but only 12% of the variance in runs created.

However: Kaplan appears to have used actual counts, not rates, for all the counting stats. And he considered everyone with at least 200 plate appearances. What that means is that the variances are very heavily influenced by how the manager used his players. For instance, suppose the Browns and Pilots have identical players, but the Browns platoon and the Pilots don't. Then their hits might look like this:

Pilots: 120, 140, 160
Browns: 60, 60, 70, 70, 80, 80

This makes it look like there are big differences between the teams, since Browns players average 70 hits and Pilots players average 140. Furthermore, the Pilots' within-team variance is four times as high as the Browns'! I'd guess that player usage patterns are the reason for some of the numbers, such as between-team effects being 28% of the variance of total bases, but only 3% of slugging percentage, when, really, those two statistics measure the same skill.
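Here's that decomposition computed directly from the hypothetical hit totals above:

```python
from statistics import mean, pvariance

pilots = [120, 140, 160]
browns = [60, 60, 70, 70, 80, 80]

# Identical players, but the Browns platoon their six part-timers:
# the team means look wildly different
pilots_mean = mean(pilots)        # 140
browns_mean = mean(browns)        # 70

# Within-team (population) variance: the Pilots' is four times the Browns'
pilots_var = pvariance(pilots)    # 800/3 ≈ 266.7
browns_var = pvariance(browns)    # 400/6 ≈ 66.7
ratio = pilots_var / browns_var   # 4.0
```

Nothing about the players differs between the two rosters; every bit of the apparent team effect is an artifact of playing time.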

Also, the percentages result from a large muddle of a bunch of factors:

-- how much a team spreads out its at-bats among players;
-- park effects;
-- whether teams that have good players are more likely to have other good players, because they can afford to spend more;
-- whether teams that have good players are less likely to have other good players, because there's effectively a salary limit on how many good players they can afford;
-- whether GMs concentrate on certain types of players, such as Billy Beane buying up a lot of OBP;
-- whether managers are more likely to give playing time to certain types of players;
-- managers' decisions on elective strategies like bunts and steals;
-- and lots more that you can probably think of.

(As for that point about elective strategies: it's safe to assume that variation in the rate of sacrifice hits should be very strongly team-related. That's because the bunt is a strategy held in different levels of esteem by different managers, especially in the American League. In the 2003 AL, sac hits ranged from 11 (Blue Jays) to 65 (Tigers). The fact that this study didn't show any reasonable effect – it showed that only 0.5% of bunt variance was team-related – suggests that the methodology is flawed, or at least not powerful enough for its intended purpose.)

Because the methodology tangles so many causes together, I don't know what this study tells us. I have no idea, absolutely none, of what any of Kaplan's numbers might mean in the baseball sense. Kaplan doesn't make any suggestions either. Could it be there's nothing we can conclude? Is it possible that the figures in the chart are no better than random numbers? Or am I missing something important?

Koppett: Allan Roth is the new Bill James

"Self-starters like Bill James expanded the scope of [statistical] activity. But Allan Roth, hired by Branch Rickey for Brooklyn in the 1940s, is the real developer of detailed, beyond-standard stats ... the Bill James approach, of cloaking totally subjective views … in some sort of asserted 'statistical evidence,' is divorced from reality."

Benjamin Alamar is the editor of the on-line JQAS, and, based on the evidence provided in this article, undoubtedly a very competent researcher. That said, this effort is a textbook case of what happens when very competent researchers with little understanding of advances in statistical baseball research attempt to contribute to our knowledge. The authors make two fundamental errors: first, they ignore past research on the topic, and second, they use ill-chosen performance measures in their estimations.

The issue at stake here is, quoting the abstract, “to determine the percentage of the outcome of an at bat that is controlled by a pitcher and the percentage that is controlled by the batter.” They seem to start off well enough. First, using play-by-play data for 2001 to 2003 obtained from STATS, they estimated expected run values for all base-out situations, and then determined through multiple regression which game-situation variables had an impact on scoring; the significant variables were out, base, league, ninth inning, extra inning, batter lineup position, and park effect. Using the regression coefficient for each of these variables allowed a more precise final determination of expected run values. Second, to increase accuracy for expected run values on plays ending with batted balls other than home runs, they combined this information with hit location data, specifically the odds of getting a hit based on where the ball was batted. Third, the results of all this were regressed on pitcher and batter indices to see which accounted for more variance.

It is in their choice of these indices that the study goes to pieces. For batters, they chose strikeouts per plate appearance and home runs per plate appearance. These are useful measures, but seem incomplete without some further ways of distinguishing batters (both Babe Ruth and Dave Kingman struck out a lot and hit a lot of homers, but to say the least they had a few differences). For pitchers, they chose strikeouts per home run allowed and outs per bases allowed. The latter is relevant but biased by team defense; the former makes absolutely no sense to me at all. At no point do the authors provide a rationale for these choices. Further, the whole exercise ignores bases on balls as a predictor. Anyway, the authors conclude that batters are responsible for 62% of expected run value and pitchers for the remaining 38%.

Time to editorialize. First, statistical baseball research as a discipline has a lot to learn from other sciences about the cumulative nature of knowledge. Far too many studies are performed in a historical vacuum, and as a consequence even well-done efforts are too often wasted in constantly reinventing the wheel. Researchers are either ignorant of or unwilling to cite previous relevant work (case in point: as Phil Birnbaum noted in his review in a recent "By the Numbers", most of the research in the Baseball Prospectus folks’ Baseball Between the Numbers is well-done wheel reinvention). Alamar et al. are academics who should know better, but they have cited no past efforts to compare the impact of pitching and hitting on team performance. I am aware of two studies in the academic literature that attempted what Alamar et al. tried.

In An Empirical Estimation of a Production Function: The Case of Major League Baseball, American Economist, Volume 25 Number 2, Fall 1981, pages 19-23, Charles E. Zech used the following indices to measure player performance: batting average and home runs to represent batting, stolen bases to stand in for speed, total fielding chances as an indicator of fielding, strikeout to walk ratio for pitching, and career manager won-loss record and years of experience to measure managing. Zech determined that batting accounted for 6 times more variance in team won-loss record than pitching; fielding and managing had no significant impact.

Also, there's An Actuarial Analysis of the Production Function of Major League Baseball, Journal of Sport Behavior, Volume 11 Number 2, 1988, pages 99-112, in which Michael D. Akers and Thomas E. Buttross purposely patterned their study after Zech’s, although they replaced total fielding chances with fielding average. They discovered that both hitting, as measured by batting average alone, and managing, as measured by manager’s career won-loss record only, were better predictors than pitching, as measured by strikeout to walk ratio. Again, managing and fielding impacts were small.

Given the similar findings of hitting being more important than pitching, Alamar et al.’s research might have some value as a replication of these past efforts with different measurement indices and data sets, increasing our confidence in the validity of the conclusion. But this brings us to the second problem: the choice of indices to represent batting and pitching. At the time of Zech’s work, all that researchers had to work with were the standard measures of pitching, batting, and fielding that predated sabermetric work. Even by the time of Akers/Buttross, all that was available other than the standard measures was a couple of years of raw Project Scoresheet data. But even then, we knew that on-base average was a better measure than batting average for getting on base, and slugging average better than home runs alone for power, and that we could do a better job measuring fielding with some measure of plays made than with fielding average. Strikeout to walk ratio is actually a pretty good index for representing pitching, but now we know we can do better by adding home runs allowed to the mix. The point is, Alamar et al. are responsible for knowing this. Their choice of measurement indices is indefensibly ignorant.

In conclusion, Alamar et al. is a lot of sound and fury signifying nothing.

-- Charlie Pavitt

Charlie Pavitt writes reviews of sabermetric studies for "By the Numbers." (Click here, scroll down for current and back issues.) He also maintains a sabermetric bibliography.

Tuesday, November 07, 2006

"The Wages of Wins" on r and r-squared

In "The Wages of Wins," the authors regressed wins against salary, found an r-squared of .18, and concluded that, because .18 is low, there is a very weak relationship between payroll and performance, and therefore "teams can’t buy the fan’s love."

I have posted a few times (like here) disagreeing, and arguing that the “r” gives you more useful information than the r-squared. In that regression, the r is about .42, which is high enough to be significant in a baseball sense. Now, Dr. Stacey Brook has responded in a post:

"Recently, some individuals who claim to have knowledge about statistics have questioned [our] conclusion. Specifically … these individuals have suggested that using the correlation coefficient – otherwise known as r – is a more “real-life” statistic to use in looking at how payroll and wins are related in Major League Baseball. As you can guess, we disagree."

(By the way, I'm not sure Dr. Brook is addressing my argument specifically. For all I know, it's someone else's entirely. There’s no link in his post, and his use of the plural – "some individuals" – suggests it's more than just one person.)

Why do Dr. Brook and his colleagues disagree? They say that using r "exaggerates the relationship" between the two variables. They quote a professor who agrees. They say that r-squared says that "18% of the variance" in performance is explained by salary, and the percentage of variance is the appropriate measurement to consider.

The last claim sounds reasonable, until you realize that "variance" is not being used in the normal English sense of the word. It's a technical, statistical term meaning the square of the standard deviation.

Variances are unintuitive. If the standard deviation of weight is 30 pounds, the variance of weight is 900 square pounds. If the standard deviation of professors' salaries is $10,000, the variance of professors' salaries is 100 million square dollars. And if the standard deviation of team wins is 11.6 wins, the variance of team wins is 136 square wins.

The 18% makes sense only in the context of the squares of what you're actually trying to measure. If salary explains 42% of performance, then salary explains 18% of performance squared. But we sabermetricians don't care about performance squared; we care about performance. And that's why the .42 is more meaningful than the .18.
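To see the two summaries side by side, here's a quick sketch using the 11.6-win standard deviation quoted above (the "about 4.5 wins" later in the post evidently comes from slightly different inputs):

```python
r = 0.42
r_squared = r ** 2         # 0.1764 -- the "18% of variance explained"

# The r translation: one extra SD of payroll buys r SDs of wins
sd_wins = 11.6
extra_wins = r * sd_wins   # ≈ 4.9 wins

# r-squared answers a question about wins *squared* (136 "square wins"
# of variance); r answers the question a GM actually asks.
```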

In the early 20th century, economist Alfred Marshall famously explained how to study economics:

"(1) Use mathematics as shorthand language, rather than as an engine of inquiry. (2) Keep to them till you have done. (3) Translate into English. (4) Then illustrate by examples that are important in real life. (5) Burn the mathematics."

If we concentrate on the r = 0.42, we can follow Marshall's advice. Translating into English, and using an example that's important in real life, we can say,

"If a team spends one extra standard deviation (in 2006, about $25 million) in salaries, it should have expect to improve its performance by 0.42 standard deviation of wins (about 4.5 wins)."

If you want to burn even more of the mathematics, and you make a few additional assumptions (for instance, that wins and salary are both normally distributed), then I think you can even say,

"If a team becomes the Nth highest-spending team in the league, it will, on average, be 42% as many wins above or below .500 as the Nth winningest team in the league."

That last sentence follows Marshall's prescription; it has no math, it's significant in real life, it's in English, and it's understandable to any GM, whether he knows statistics or not.

Friday, November 03, 2006

Money *does* buy wins

In “The Wages of Wins,” the authors determined that the correlation between payroll and wins was about 0.4. I interpreted that to mean that for every dollar a team spends on free agent signings, 40% of that money can be expected to translate into wins. But I was wrong – 40% is way too low.

Suppose 30 boys each have some marbles. They all have different numbers of marbles, collected over a childhood – some they got for their birthday, some were hand-me-downs from their siblings, some they found in the schoolyard. Some of the kids have only 10 marbles, but some have as many as 40.

Now, each boy is given $2, and let loose in the toy store, where he can buy toy store marbles at 10 cents each. Some won’t buy any toy store marbles, while the ones who really like marbles might blow the entire $2 and buy twenty of them.

Suppose that now, economists run a regression, to try to predict the number of marbles the child has based on how much he spent at the toy store. There is some correlation, because the kids who bought marbles will, obviously, tend to have more of them after the purchase. But the relationship isn’t perfect, because the original distribution of marbles was pretty much random. So they might get a correlation coefficient of, say, 0.4.

They conclude that such a small correlation means that “money can’t buy marbles.”

But that’s wrong, of course. Money can, with certainty, buy marbles, at the rate of 10 cents each. While the correlation between the money you spend and the marbles you *have* is 0.4, the correlation between the money you spend and the marbles you *buy* is 1.0.

Because some of the marbles arrived from sources other than money, they act as random noise in the regression. They make it appear, at first, that money only buys marbles at a 40% rate, when, really, the rate is 100%.
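The marbles story is easy to simulate; here's a minimal sketch (the counts, the 10-cent price, and the random seed are all arbitrary):

```python
import random

random.seed(1)
n = 30
childhood = [random.randint(10, 40) for _ in range(n)]  # birthdays, siblings, schoolyard
bought = [random.randint(0, 20) for _ in range(n)]      # toy-store marbles, 10 cents each
spent = [10 * b for b in bought]                        # cents spent at the store
total = [c + b for c, b in zip(childhood, bought)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r_bought = pearson(spent, bought)  # exactly 1.0: money buys marbles at a fixed rate
r_total = pearson(spent, total)    # well below 1.0: childhood marbles act as noise
```

The regression on total marbles understates the purchasing relationship for exactly the reason in the text: the noise isn't in the buying, it's in the starting inventory.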

You probably see where this is going.

The boys are baseball teams. The marbles are wins. The toy store marbles are wins from signing free agents. And the legacy marbles the boys had are players not yet eligible for free agency.

The new argument goes something like this:

Team payroll is highly correlated with player ability only in the case of free agents. For young players, and non-free-agent players, salary is based more on years of experience than on performance – and besides, those salaries are very small compared to the amount of money spent on free agents. For instance, Albert Pujols made only $700,000 in 2003 not because that’s all he was worth, but because he was only in his second major-league season.

It turns out that the correlation between team payroll and wins is 0.4. But the correlation between non-free-agent payroll and wins is probably close to zero. Therefore, to make the overall correlation between payroll and wins come out to 0.4, the correlation between free-agent payroll and wins must be significantly higher than 0.4.

That free-agent correlation – call it X – must be way higher than 0.4 for this to work. (A naive estimate might be 0.8, which is probably too high. But it’s got to be higher than 0.4.)

So, of course money can buy wins. Not with a correlation of 1.0 like for marbles – marbles, of course, don’t get injured or have off years. But, yes, if you spend a bunch of money on free agents, you’re going to improve your team substantially, more substantially than the 0.4 of the simple regression suggests. Money does buy wins.

Hockey's CBA, salaries, and competitive balance

Brinkman starts out by explaining the revenue sharing process, by which small-market teams with low revenues get monetary transfers from the other teams. The algorithm involves a bunch of different slush funds from different sources (television revenues, playoff tickets, etc.) that get allocated in certain ways. One thing that surprised me is that playoff teams are taxed from 30 to 50 percent of the total value of tickets for playoff games in their arena. The bottom teams get some of that back in transfers, of course, but it still seems to me that taxing teams for succeeding in the playoffs is the wrong kind of incentive.

There’s also a bit of a disincentive to improve your team in the off-season. Because teams must pay players at least 54 percent of revenues, there’s effectively a 54% tax on hiring better players. To quote Brinkman’s example, suppose the Carolina Hurricanes figure that signing Erik Cole will gain them $2 million in additional revenues. Since they will eventually have to pay an extra $1.08 million in salaries because of that additional revenue, that means their gain will be only $920,000, and they can’t offer Cole any more than that. Presumably this doesn’t bother the Hurricanes too much, because the overall effect is to depress salaries in general, which can only help their bottom line.
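Brinkman's Erik Cole example, as arithmetic:

```python
revenue_gain = 2_000_000   # extra revenue Carolina expects from signing Cole
players_share = 0.54       # players must receive 54% of revenues

extra_salary_owed = players_share * revenue_gain   # about $1.08 million
max_offer = revenue_gain - extra_salary_owed       # about $920,000
```

In effect, the 54% salary floor works like a tax on the marginal revenue a new signing brings in.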

As for competitive balance, Brinkman concludes that the salary cap won’t necessarily help. Teams receiving transfer payments aren’t required to spend that extra money on salaries – all they need to do is sell 13,000 tickets per game and grow revenues faster than the league average. One way to do that is to improve the caliber of play by signing players, but if a team doesn’t find that necessary to meet the conditions, it can simply pocket the cash.

The conclusion, therefore, is that if competitive balance does increase, it will be due to the salary cap acting on the richest teams, rather than because of transfer payments to the rest. However, there’s always the possibility of an owner who doesn’t care as much about losing money. “It is impossible to predict the behavior of owners in the real world,” Brinkman writes. “[Some] surely prefer to maximize wins [instead of profits].”

There’s a bit of sabermetrics in the paper, and, although the numbers only indirectly impact the argument, they’re useful results.

First, there’s a regression on salary versus standings points for the years 1994-2003. It turns out that the correlation is .388, which is almost exactly what “The Wages of Wins” reported for baseball. Like the “Wages” authors, Brinkman concentrates on the r-squared of .150, pronounces it small, and argues that “a team’s ability to compete on the ice had relatively little to do with how liberal they were with the pocketbook.”

But as I repeatedly and tiringly argue (example), the .388 means that 38.8% of a team’s salary spending goes directly to the win column -- and that's actually quite significant.

Second, there’s a chart of NHL competitive balance from 1994-95 to 2003-04. The standard deviation of winning percentage seems fairly constant in the 0.100 range, and there doesn’t seem to be much of a trend. Here are the numbers so you can judge for yourself:

.121, .111, .074, .097, .092, .108, .111, .095, .097, .099

If all teams were exactly equal in ability, Brinkman says, the SD would be .060.
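As a rough check on that figure: if all teams were equal and every game a coin flip, the binomial SD of winning percentage over an 82-game season comes out a bit lower (this sketch ignores ties and overtime points, which presumably accounts for the difference from Brinkman's .060):

```python
from math import sqrt

# Each game a coin flip: winning percentage over a season is
# binomial with p = 0.5
games = 82
sd_if_equal = sqrt(0.5 * 0.5 / games)   # ≈ 0.055
```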

If you’re interested in hockey at all, this is a great paper – it’s very well written and not too full of economic jargon. You can tell by the way Brinkman explains his reasoning that he knows hockey and understands the issues. His explanations make sense, and the conclusions seem credible to me.