Sabermetric Research

Phil Birnbaum

Monday, December 29, 2008

How much is an NHL power play worth?

Here's a piece by hockey analyst Alan Ryder, who tallies up some statistics on NHL power plays.

Ryder looks at teams' conversion rates with the man advantage, and finds that a full two-minute power play results in about .27 of a goal. That works out to 7.9 goals per 60 minutes, "the usual expression of scoring rates" when analyzing hockey. That 7.9 figure is "more than 300% the rate of scoring during even-handed play."

(I wish Ryder had also told us the even-handed rate directly.)
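For what it's worth, the conversion between the two ways of expressing the rate is easy to sketch. The snippet below uses only the 7.9 figure and the "more than 300%" quote; the variable names are my own, and the even-handed number it prints is just an upper bound implied by the quote, not a figure Ryder reported.

```python
# Convert Ryder's per-60 power-play rate back to goals per two-minute penalty,
# and back out what "more than 300%" implies about the even-handed rate.
GOALS_PER_60_PP = 7.9   # power-play scoring rate quoted in the post
PP_LENGTH_MIN = 2.0     # a full minor penalty

goals_per_pp = GOALS_PER_60_PP * PP_LENGTH_MIN / 60.0

# If the PP rate is more than 3x the even-handed rate, the even-handed rate
# must be below one third of it. This is a bound, not a measured value.
even_rate_upper_bound = GOALS_PER_60_PP / 3.0

print(f"goals per full power play: {goals_per_pp:.3f}")
print(f"even-handed rate is below: {even_rate_upper_bound:.2f} goals per 60")
```

The first number comes out a shade under .27, which is consistent with "about .27."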

With a two-man advantage, there's less data, so Ryder had to smooth out the curve. His chart shows a 50% conversion rate after 80 seconds of 5-on-3 play (compared to about 17% for 4-on-3). So the second penalty, on a second-by-second basis, is worth twice as much as the first penalty. Of course, the two penalties are seldom simultaneous. If the second infraction comes a minute after the first infraction, it would be worth one-and-a-half times as much; that would be one minute of 4-on-3 followed by another minute of 5-on-3.

Ryder estimates that a two-man advantage results in about 25 goals per sixty minutes. Yesterday, in the World Juniors, Canada beat Kazakhstan 15-0. That's approximately the equivalent of playing with a two-man advantage for two periods.
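The "two periods" comparison is a one-line calculation. Here's a rough sketch, assuming 20-minute periods and a constant scoring rate for the whole stretch (the names are mine):

```python
# Expected goals from two full periods of 5-on-3 at Ryder's estimated rate.
TWO_MAN_GOALS_PER_60 = 25.0  # from the post
PERIOD_MIN = 20.0            # assumed period length

expected_goals = TWO_MAN_GOALS_PER_60 * (2 * PERIOD_MIN) / 60.0
print(f"expected goals over two periods of 5-on-3: {expected_goals:.1f}")
```

That's a little under 17 goals, so 15 is in the right neighborhood.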

If a (single man) power play is worth .27 goals, and that's three times the normal rate, then the one-man advantage is worth about .18 goals. However, add in the fact that the other team probably won't score, and you have to add back in the .09 defensive savings. That means the PP is indeed worth the full .27. At least approximately – this doesn't take into account that the average power play doesn't go the full two minutes, so maybe the .27 should really be only .25 or something. And there are shorthanded goals, so maybe it's now .24. But these are just estimates.
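Here's that arithmetic as a small sketch. It treats "three times the normal rate" as exactly three, and it ignores shortened power plays and shorthanded goals, so it reproduces the .27 and not the rougher .24 or .25 guesses.

```python
# Net value of one full power play, using the post's round numbers.
pp_goals = 0.27                # goals scored during a full two-minute PP
normal_goals = pp_goals / 3.0  # what either team scores in two even-handed minutes

offensive_gain = pp_goals - normal_goals  # extra goals for: about .18
defensive_saving = normal_goals           # opponent's goals prevented: about .09
net_value = offensive_gain + defensive_saving

print(f"offensive gain:   {offensive_gain:.2f}")
print(f"defensive saving: {defensive_saving:.2f}")
print(f"net value of PP:  {net_value:.2f}")
```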

Back 20 years ago in the days of the Hockey Compendium, I think Klein and Reif adjusted players' points for power play goals against while they were in the box. It might be interesting to figure out how to do that once again. I'd use *expected* PP goals against rather than actual, and I'm not sure how I'd make the adjustment. But I'm curious as to whether players are costing their teams a lot of goals this way.

Thursday, December 18, 2008

Who is a sabermetrician?

"... I shouldn’t be considered a sabermetrician. I have never claimed to be a member of this community. What I do is apply my knowledge from my economics training and experience with analyzing data to issues in baseball."

I find this kind of confusing – if JC analyzes issues in baseball, statistically, then *of course* he's a sabermetrician. Perhaps there's a confusion with regard to the definition – does sabermetrician mean "one who does sabermetrics," or "one who is expert in the field of sabermetrics"? I suppose JC could be the one but not the other – if I do some analysis using economic reasoning, I'm doing economics, but that doesn't mean I'm an economist.

In any case, where all this came from is a post by Rob Neyer, pointing out that JC and Tom Tango are in strong disagreement on the Raul Ibanez signing. JC has Ibanez being worth $46 million over three years, while Tango has him worth only about $10 million. That is, to say the least, a substantial difference.

Rob uses this example to show how there are, and always will be, major sources of disagreement in the sabermetric community. I'm not completely with Rob on this one – I think conflicts of this magnitude are fairly rare. My take is that JC has a methodology for valuing players that most other sabermetricians think is incorrect (I, myself, disagreed with him here). As Mitchel Lichtman says in one of the comments,

"There are plenty of experts with respect to projections, and I don't think - in fact I know - that any of them will project his value at anything close to 14mm per year."

If that's the case, that JC's opinion is an outlier, then it might not be that there's long-term disagreement, but simply that somebody – perhaps JC, perhaps everyone else – is just plain wrong. And, in legitimate science, when someone is wrong, it usually gets corrected pretty quick.

-----

JC's post also goes on to argue that academic research in sabermetrics is more reliable than informal online discussions:

"... much what passes for research within [the sabermetric] community is not sufficiently rigorous to reach the conclusions often claimed. There are many academic researchers from a variety of fields who have significantly advanced the understanding of baseball that receive scant mention in the sabermetric community. For example, Michael Schell’s Baseball’s All-Time Best Sluggers is the most thorough treatise on hitting ever written; yet, few individuals mention his work or attempt to replicate his methods. You rarely see economists Gerald Scully or Tony Krautmann mentioned when attempting to value players, despite the fact that their methods were published in reputable peer-reviewed economics journals, where established experts vetted their work. Academics are not always right, but I believe the checks ensure they are more likely to reach correct conclusions than informal online discussions."

Not to beat a dead horse, but, as I have written before, I'm not very impressed with the output of the academic community, when it comes to sabermetrics. I think the peer review to which JC refers is often not capable of spotting flawed analysis. For my own part, I would certainly prefer to have a paper of my own vetted by Tango or MGL than by an anonymous academic economist who is unlikely to have a strong grounding in sabermetrics. I think experienced amateur sabermetricians are much more likely to find legitimate flaws in my work than professional economists.

However, I haven't seen the academic examples JC mentions. It's certainly possible that Schell, Scully and Krautmann have done work that we non-academics are ignoring. Are any readers familiar with Schell's book, or the Scully and Krautmann valuation methods? If someone can point me to the relevant journal articles, I'll post a review.

Friday, December 12, 2008

Do NBA general managers outperform sabermetric stats?

Last December, David Lewin and Dan Rosenbaum released a working version of a fascinating APBRmetrics paper. It's called "The Pot Calling the Kettle Black – Are NBA Statistical Models More Irrational than "Irrational" Decision Makers?"

I can't find the paper online any more (anyone know where it is?), but a long message-board discussion is here, and some presentation slides are here.

Basically, the paper compares the state-of-the-art basketball sabermetric statistics to more naive factors that, presumably, uneducated GMs use when deciding who to sign and how much to pay. You'd expect that the sabermetric stats should correlate much better with winning. But, as it turns out, they don't, at least not by much.

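Here's what the authors did. For each team, they computed various statistics, sabermetric and otherwise, for each player. Let me list them:

-- Minutes per game

-- Points per game

-- NBA Efficiency

-- Player Efficiency Rating

-- Wins Produced

-- Alternate Win Score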

Except for the first two, the study adjusted each stat by normalizing it to the position average. More importantly, they normalized all six stats so that a team's individual stats would sum to the team's actual efficiency (points per possession minus points against per possession).

That team adjustment is important. I can phrase the result of that adjustment a different way: the authors took the team's performance (as measured by efficiency) and *allocated it among the players* six different ways, corresponding to the six different statistics listed above. (Kind of like six different versions of Win Shares, I guess.)

For the most "naive" stat, minutes per game: suppose the team, on average, scored five more points per 100 possessions than its opponent. And suppose one player played half-time (24 minutes per game). That means he's responsible for one-tenth of his team's minutes, so he'd be credited with 0.5 points per game.
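To make the allocation concrete, here's a minimal sketch of the minutes-based version. The function name and the 48-minute, five-man defaults are my own assumptions for illustration; I haven't seen the authors' actual code, and their position adjustments aren't modeled here.

```python
def allocate_by_minutes(team_margin, player_mpg, game_minutes=48.0, players_on_court=5):
    """Credit a player with a share of the team's margin proportional to his minutes.

    Hypothetical helper: a team has game_minutes * players_on_court
    player-minutes to hand out each game (240 with the defaults).
    """
    team_player_minutes = game_minutes * players_on_court
    return team_margin * player_mpg / team_player_minutes

# The example from the post: a +5 team, and a player who plays 24 minutes a game.
credit = allocate_by_minutes(team_margin=5.0, player_mpg=24.0)
print(f"credit for the half-time player: {credit:.2f}")
```

Summing the credit over a full roster whose minutes add up to 240 gives back the team's +5, which is the whole point of the adjustment.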

So the authors did this for all six statistics. Now, for the current season, all the teams will sum to their actual efficiency, because that's the way the stats were engineered. So you can't learn anything about the relative worth of the stats by using the current season.

But what if you use *next* season? Now, you can evaluate how well stats predict wins. That's because some players will have moved around. Suppose a team loses players A, B, and C in the off-season, but signs players X, Y, and Z.

Using minutes per game, maybe A, B, and C were worth +1 win, and players X, Y and Z were worth +2 wins. In that case, the team "should" – if minutes per game is a good stat – gain one win over last year.

But, using Wins Produced, maybe A, B and C were worth 1.5 wins, and X, Y and Z are also worth 1.5 wins. Then, if Wins Produced is accurate, the team should finish the same as last year.

By running this analysis on all six stats, and all teams, you should be able to figure out which stat is best. And you'd expect that the naive methods should be the worst – if sabermetric analysis is worth anything, wouldn't you think it should be able to improve on "minutes per game" in telling us which players are best?
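The roster-turnover logic can be sketched in a few lines. The group totals (+1 and +2 wins under minutes per game, 1.5 and 1.5 under Wins Produced) are the ones from the example above; the split among individual players is invented purely so there's something to add up, and the function name is hypothetical.

```python
def predicted_change(departed, arrived):
    """Predicted change in team wins: value of arrivals minus value of departures."""
    return sum(arrived.values()) - sum(departed.values())

# Per-player values are made up; only the group totals come from the post.
minutes_based = {
    "departed": {"A": 0.5, "B": 0.25, "C": 0.25},  # totals +1 win
    "arrived": {"X": 1.0, "Y": 0.5, "Z": 0.5},     # totals +2 wins
}
wins_produced = {
    "departed": {"A": 0.5, "B": 0.5, "C": 0.5},    # totals 1.5 wins
    "arrived": {"X": 0.75, "Y": 0.5, "Z": 0.25},   # totals 1.5 wins
}

change_by_minutes = predicted_change(**minutes_based)
change_by_wp = predicted_change(**wins_produced)
print(f"minutes per game predicts: {change_by_minutes:+.1f} wins")
print(f"Wins Produced predicts:    {change_by_wp:+.1f} wins")
```

Whichever stat's predictions line up better with what teams actually do the following year is, by this test, the better stat.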

But, surprisingly, the naive methods weren't that much worse than the sabermetric ones. Lewin and Rosenbaum regressed last year's player stats on this year's wins, and here are the correlation coefficients (r) they got:

0.823 -- Minutes per game

0.817 -- Points per game

0.820 -- NBA Efficiency

0.805 -- Player Efficiency Rating

0.803 -- Wins Produced

0.829 -- Alternate Win Score

It turns out that the method you'd think was least effective – minutes per game – outperformed almost all the other stats. The worst predictor was "Wins Produced," the carefully derived stat featured in "The Wages of Wins." (BTW, not all the differences in correlations were statistically significant, but the more extreme ones were.)

And repeating the analysis on teams two years forward, and three years forward, the authors find the results to be very similar.

I agree. But I don't think it's because GMs are omniscient – I think it's because even the best statistics measure only part of the game.

All of the above measures are based on "box score statistics" – things that are actually counted during the game. And there are more things counted on offense than on defense. For instance, shooting percentage factors into most of the above stats, but what about *opponent's* shooting percentage? That isn't considered at all, but we could all agree that forcing your opponent to take low-percentage shots is a major part of the defense's job. That's factored into the player ratings as part of the team adjustment, but all players get equal credit for it.

So: if coaches and general managers know how good a player is on defense (which presumably they do), and Wins Produced doesn't, then it's no surprise that GMs outperform stats.

-----

Take a baseball analogy. In the National League, what correlates better to team wins summed at the player level – wOBA, or GM's evaluations? It would definitely be GM's evaluations. Why? Because of pitching. The GM would certainly take pitching into account, but wOBA doesn't. That doesn't mean that wOBA is a bad stat, just that it doesn’t measure *total* goodness, just hitting goodness.

Another one, and more subtle: what would correlate better with wins – wOBA or At Bats? It could go either way, again because of pitching. Better pitchers have more playing time, and therefore more AB, so good pitching correlates with AB (albeit weakly). But good pitchers don't necessarily have a better wOBA. So AB would be better for measuring pitching prowess (although, of course, it would still be a very poor measure).

That means that if you run a regression using AB, you get a worse measure for hitters, and a better measure for pitchers. If you use wOBA, you get a better measure for hitters, but a worse measure for pitchers. Which would give you a better correlation with wins? We can't tell without trying.

-----

What Lewin and Rosenbaum are saying is that, in basketball right now, sabermetric measures aren't good enough to compete with the judgments of GMs, and that APBRmetricians' confidence in their methods is unjustified. I agree. However, I'd argue that it's not that the new statistical methods are completely wrong, or bad, just that they don't measure enough of what needs to be measured.

If I wanted to reliably evaluate basketball players, I'd start with the most reliable of the above six sabermetric measures – Alternate Win Score. Then, I'd list all the areas of the game that AWS doesn't measure, like various aspects of defensive play. I'd ask the GMs, or knowledgeable fans, to rate each player in each of those areas. Then, after combining those evaluations with the results of AWS, I'd bet I'd wind up with a rating that kicks the other methods' respective butts.

But, until then, I have to agree with the authors of this paper – the pot is indeed calling the kettle black. It looks like humans *are* better at evaluating talent than any of the widely available stats.

Monday, December 08, 2008

Has Consumer Reports discovered a case of price gouging? A quiz

In the latest issue of Consumer Reports, CR makes accusations of price gouging against a particular industry.

Here's a quiz. One of the excerpts below is the real CR article. The other I just made up. Can you tell which is which?

---

1. Why texting is too $$$

Jerry Sobieski's teenagers are forever sending text messages to friends on their cell phones instead of calling them. What baffles and infuriates Sobieski, a computer network engineer from Woodbine, Md., are the rates wireless carriers charge their customers to send and receive text messages.

"Text messages take up almost nothing on their networks, but the carriers are charging much more for them than they do for phone calls, which use up a heck of a lot more space," he says. "The rates for texting are completely outrageous."

Three years ago, those rates were 10 cents per text at the nation's four big wireless carriers: AT&T, Sprint, T-Mobile, and Verizon Wireless. Each company raised rates to 15 cents, then to 20 cents.

Text files are small and cost carriers very little to transmit, so texting is exponentially more expensive to consumers in terms of network space than other cell services. Five hundred text messages contain less data than a 1-minute voice transmission.

Sen. Herb Kohl, D-Wis., also questions text-messaging rates. In letters to the above companies, Kohl, chairman of the Senate's antitrust subcommittee, demanded that wireless carriers explain why they've doubled the cost to customers in near lockstep. Kohl says he's particularly concerned that the rate jump appears not to be based on a rise in the cost of transmitting text messages.

Some carriers say that texting rates are actually lower now because most customers buy monthly buckets of messages. Consumers Union isn't swayed. Not all customers send or receive enough texts in a month to warrant a plan. Those who don't must shell out for messages they won't use or must pay the higher per-text rate.

In recent years, industry consolidation has greatly reduced the number of wireless carriers. That's troubling, particularly when carriers engage in anti-consumer behavior. CU has asked federal officials to investigate text-messaging rates in this supposedly competitive industry. They should ask tough questions about the actual costs to carriers, so that consumers aren't saddled with an unwarranted expense.

-----

2. Why the Consumer Reports Website is too $$$

Jerry Sobieski's family is forever going to the Consumer Reports website to look up which products are best. What baffles and infuriates Sobieski, a computer network engineer from Woodbine, Md., is the rate CR charges its customers to research products on its site.

"CR web traffic takes up almost nothing on the internet, but CR is charging much more for the website than they do the magazine, even though the magazine uses up a heck of a lot more physical resources," he says. "The rates for the CR website are completely outrageous."

It costs $5.95 to access the CR website for one month.

Internet data costs CR almost nothing to transmit, so the website is exponentially more expensive to consumers in terms of actual resources used. The bandwidth for five hundred website hits costs CR less to provide than a single print issue of CR.

Sen. Herb Kohl, D-Wis., also questions CR's pricing policy. In letters to Consumers' Union, Kohl, chairman of the Senate's antitrust subcommittee, demanded that CR explain its cost to consumers. Kohl says he's particularly concerned that the price appears not to be based on the cost of maintaining CR's internet connection.

CR says website rates are actually lower now because most customers buy an annual subscription to the website, which costs $26. Consumers Union isn't swayed. Not all customers research enough products in a year to warrant a subscription. Those who don't must shell out for access they won't use or must pay the higher per-month rate.

In recent decades, industry changes have greatly reduced the number of not-for-profit consumer magazines with websites. That's troubling, particularly when the only such US organization engages in anti-consumer behavior. CU has asked federal officials to investigate web subscription rates in this supposedly competitive industry. They should ask tough questions about the actual costs to CR, so that subscribers aren't saddled with an unwarranted expense.

-----

So did you guess which is Consumer Reports' real complaint? For the record, the answer is here.

In team sports, it's not as obvious when athletes improve, mostly because there's an offense and a defense. When Wayne Gretzky scored 92 goals back in 1981-82, was he really that much better than Rocket Richard, who scored 50 in 1944-45? Not necessarily: even apart from the fact that Gretzky played 80 games to Richard's 50, there could be many other factors operating:

-- expansion could have diluted the talent of the goalies and defensemen Gretzky played against;

-- a different style of play in the 80s might have caused a shift in the balance between offense and defense, so that more goals were scored in general;

-- Gretzky may have been given more playing time than Richard;

-- Gretzky might have been allowed to handle the puck more than Richard, giving him more opportunities;

And so on.

So I find it interesting whenever someone discovers a method of showing actual improvement in a team sport. And that's what Eddy Elfenbein has done for the NFL here.

In the middle of Elfenbein's post is a historical graph of how often kickers missed the point after touchdown (PAT). This year, it's less than 1%, but that's exceptional – in the past few years, it's been between 1 and 2 percent. But what's amazing is that misses were so much more frequent in what is pretty recent history. In the 80s, it was 4%, and back in the mid-70s, it was as high as 8%. Here, let me copy in Elfenbein's graph, if it's not a violation of blogger etiquette:

Elfenbein notes that some of the improvement comes from rule changes (notably one in 1984), so it's not all improved ability. Still, that can't be all of it, or even most of it, since the graph is so smooth.

I would never have believed the change could be that dramatic. And check out Elfenbein's entire post, where he advocates moving the two-point conversion line from the two-yard line to the one, to make it more competitive with the virtually assured single point kick.
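To see why the kick crowds out the two-point try, here's a rough break-even sketch. It looks only at expected points and ignores score and clock, and it uses the approximate miss rates quoted above. The function name is mine, and the post doesn't give actual two-point success rates, so none are assumed.

```python
def break_even_two_point_rate(pat_miss_rate):
    """Two-point success rate needed to match the kick's expected points.

    Expected points from the kick are (1 - miss rate) * 1; a two-point try
    succeeding with probability p is worth 2 * p, so break-even is half the
    kick's success rate.
    """
    return (1.0 - pat_miss_rate) / 2.0

for era, miss_rate in [("mid-1970s", 0.08), ("1980s", 0.04), ("this year, roughly", 0.01)]:
    needed = break_even_two_point_rate(miss_rate)
    print(f"{era}: PAT miss rate {miss_rate:.0%}, two-point try must succeed {needed:.1%}")
```

As kickers have improved, the bar for going for two has crept up from 46% toward 50%.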

Tuesday, December 02, 2008

How do coaches improve their teams?

How much do coaches improve player performance? According to Dave Berri, of "The Wages of Wins," not much at all. According to a study coauthored by Berri (as quoted in this Ryan McCarthy article in Slate), only eight out of 19 coaches studied had any "statistically discernible" effect on performance. And only the difference between the very best and very worst coaches was statistically significant.

I haven't seen the numbers, because the study is unpublished. However, at a 5% significance level, you'd expect about one out of the nineteen coaches to be significantly different from zero, so the fact that the top coach was significantly better than the bottom coach is to be expected.
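The "about one out of nineteen" figure is just 19 times 5%. A quick sketch, assuming (my assumption, not the study's) that no coach has any real effect and that the 19 tests are independent:

```python
# Expected number of false positives among 19 coaches tested at the 5% level.
N_COACHES = 19
ALPHA = 0.05

expected_false_positives = N_COACHES * ALPHA

# Chance that at least one coach looks "significant" by luck alone,
# assuming independent tests.
p_at_least_one = 1.0 - (1.0 - ALPHA) ** N_COACHES

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"chance of at least one:   {p_at_least_one:.2f}")
```

So even if coaching did nothing at all, there would be a better than even chance of seeing at least one "significant" coach.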

Anyway, the study quotes a Dean Oliver study as coming to a different conclusion. That analysis is in Oliver's book "Basketball on Paper"; I, um, haven't got to that chapter yet, but I really should do that soon. McCarthy says that Oliver shows

Coaches like Phil Jackson can be worth up to an additional 12 wins per year.

I gotta read that chapter: 12 wins sounds like a huge number, and I'm now very curious.

In any case, my gut leans to Berri's finding, that coaches don't affect a player's performance all that much. But, then again, I've always thought that there's no way a baseball manager could consistently cause a team to beat its Runs Created estimate, but some managers do appear to be able to do that. So I may be wrong.

My thinking is that coaches and managers *can* make a big difference, but that comes about in the choice of who to play. I remember back in the early 1980s, when Tony Fernandez, by all accounts ready for the major leagues, sat in the minors while the Blue Jays stuck with the mediocre Alfredo Griffin. Glancing at the Baseball Reference pages for those two guys, we find that, in 1984, Griffin had –3.2 batting wins, while Fernandez had –0.6. So if manager Bobby Cox had played Tony and benched Alfredo, the Jays would have won almost three extra games – which is huge. (It wouldn't have affected that particular pennant race – the Jays won 89 games in 1984, but that was the year the Tigers ran away with it at 104-58.)
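The "almost three games" is a simple subtraction of the two batting-wins totals. As a sketch, with the loud caveat that it takes the 1984 totals at face value: it assumes Fernandez would have posted his total in Griffin's role, and it ignores differences in playing time and fielding.

```python
# Batting wins for 1984, as quoted in the post from Baseball Reference.
griffin_batting_wins = -3.2
fernandez_batting_wins = -0.6

# Swapping Fernandez in for Griffin, taking each total at face value.
estimated_gain = fernandez_batting_wins - griffin_batting_wins
print(f"estimated gain from playing Fernandez: {estimated_gain:.1f} wins")
```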

I'm sure you can think of other examples ... I remember Bill James once writing that Dick Williams had a perverse fascination with Rodney Scott, keeping him on the roster when he obviously wasn't of major-league caliber. And in his Managers book, James also called Cito Gaston "inert," because he used his bench so little. However, Cito did win two consecutive World Series. And when you think about it, why would you bench Roberto Alomar, even for a few games a year, to play Luis Sojo? Unless you think the days off would *really* help Alomar, you're just throwing away runs.

What I'd really like to see is a study that somehow figures out what managers made the best use of the talent they had available. I'm sure every manager has, at some point, played a mediocre established player instead of an unknown but more-talented one. But which ones do it the most? Who has a penchant for leaving Tony Fernandezes languishing while Alfredo Griffins get playing time? Who doesn't bother giving 150 plate appearances to some crappy 35-year-old first baseman with no power? Who knows which players are good enough to be given a shot?

It would be a tough study to do: you'd have to estimate what all the bench players were capable of, figure out who was in the minors and how good they were, and compare all those possibilities to the guys who actually played. But I'd bet that if you found a way to do that, you'd discover some managers are legitimately worth an extra game or two a year, just from their personnel decisions.