Sabermetric Research

Friday, September 29, 2006

Michael Lewis and "The Blind Side" -- are left tackles really that valuable?

A couple of weeks ago, Sports Illustrated ran an excerpt (print subscription required) from “The Blind Side,” Michael Lewis’s new book on football due for release this coming Monday.

The article is fantastic. I normally don’t have much patience for narrative articles that profile players, but Lewis talks about players only in the context of what they do and what it means for the game. I didn’t know much about football, and knowledgeable fans might see nothing new here (just as experienced sabermetricians already knew most of the revelations in “Moneyball”). But the casual fan will learn a lot from Lewis’s eight-page piece, as I did.

Here’s my summary of what Lewis tells us:

-----

When a right-handed quarterback looks to pass, his body is turned slightly to the right, which means he can’t see any defender coming at him from the left – the “blind side” of the title. On the offensive line, it’s the position of left tackle that has to protect the quarterback from a blind sack. This is a crucial task, because a surprise tackle is exceptionally likely to injure the quarterback – after all, he can’t see it coming to brace himself. So the left tackle fills an exceptionally important role in a team’s offense.

Up until the NFL got free agency, the members of the offensive line, left tackle included, were largely anonymous. Their salaries were modest. In 1987, with quarterbacks earning some $2 million per year, Bengals left tackle Anthony Munoz, whom “people were saying … might just be the greatest offensive lineman in history,” asked for $500,000. He was told “that there was no lineman alive who was worth that much.”

Then, after the 1992 season, free agency came to the NFL. Immediately, several linemen were signed for $1.5 million to $2 million, players “no one had ever heard of.” NFL insiders were “baffled” and thought the linemen weren’t worth the money. Then, “the only free agent A-list left tackle,” Will Wolford, signed a deal with the Colts that guaranteed him that for the duration of the three-year contract, he’d be the highest paid player on the team.

Why? Bill Polian, who later became Colts GM, says it was “for the simple reason that he shut down Lawrence Taylor in the Super Bowl.” Taylor was the feared Giants linebacker who ended Joe Theismann’s career with a leg-breaking blind-side sack in 1985.

By 2005, the left tackle was the second-highest paid position in the NFL, after quarterback.

-----

That’s what the article (and presumably the book) is all about: the transformation in the NFL that turned left tackles into rich men.

Which brings up a few interesting questions:

Back when left tackles were underpaid, was it because GMs took advantage of the cultural norm that said they weren’t worth much?

It’s quite possible that teams couldn’t escape paying top quarterbacks a lot of money, because they were recognized by fans and journalists as being worthy of it. But, if few people are recognizing the excellence and importance of the left tackle, there’s no pressure anywhere to pay them what they’re worth to the team. It wouldn't have to be a conscious decision on the part of the GM, just a cultural practice. (Where I used to work, the best programmers were paid less than the worst project leaders, even though the best programmers were much more important to the project.)

On the other hand, was it because GMs didn’t realize the value of those players? In Moneyball, Lewis showed us how baseball decision-makers undervalued walks – perhaps their football counterparts undervalued blind-side protection?

Lewis’s narrative seems to suggest the first explanation is more likely; he writes that when Will Wolford became the first left-tackle free agent, at least five teams were willing to pay him at least $7.65 million over three years. Which means at least five teams knew his intrinsic value.

Are there any decent estimates of the actual value of the left tackle? Lewis writes that “nothing had changed in the game to make the left tackle position more valuable … There was no new data to enable teams to value left tackles more precisely.” But don’t teams routinely use game tapes to evaluate every lineman’s performance on every play? You’d think that the teams have an excellent idea of every lineman’s value. Does anyone else? It doesn’t seem like it would be that hard to figure out – assuming you had game tapes covering that position for every play.

Could a left tackle actually be worth that much? “The Hidden Game of Football” (p. 104) estimates that the difference between a good quarterback and an average one is four completions per game. An eyeballing of 2005 NFL sack statistics shows that the difference between a good individual sack total and an average one is only about four sacks per season. Four completions a game is about 64 plays over a 16-game season, against only four sacks – a difference of 1500 percent in favor of the quarterback.
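To spell out that arithmetic (the 16-game season is my assumption; the four-completions and four-sacks figures are from the sources above):

```python
# Comparing the two "good vs. average" gaps on a per-season basis.
completions_per_game = 4      # good QB vs. average, per Hidden Game
season_games = 16
qb_gap = completions_per_game * season_games   # 64 plays per season
sack_gap = 4                                   # good vs. average sack total
pct_difference = (qb_gap - sack_gap) / sack_gap * 100  # 1500 percent
```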

Of course, a good left tackle does more than prevent sacks – he prevents the QB from having to hurry a throw, or throw the ball away. And, of course, preventing injury to the quarterback is a huge factor too. But there must be some way of quantifying all this sabermetrically to see if left tackles might not be quite as important as their salaries suggest.

Here’s one possible way: find all quarterbacks who switched teams while their old offensive line (and especially their left tackle) stayed put. Which was more consistent with the previous year’s QB rating: the same QB on his new team, or the new QB behind the old line? If they’re about equally consistent, that would be evidence that the QB and the offensive line are about equally important.

But the left tackle isn’t the entire offensive line. If the offensive line as a whole is only as important as the QB, the left tackle alone must be less important. You’d need to find that the offensive line is, say, twice as important as the QB before you could argue that the left tackle by himself is the QB’s equal.

For that reason, I’m skeptical that the left tackle is that important. But I’m keeping an open mind. My logic may be wrong (please correct me in the comments), or there may be other factors that Lewis will cover in his book. I’m off to order it right now.

Monday, September 18, 2006

Study: "protection" and "clutch pitching" exist

Conventional wisdom says that a batter will see good pitches if he has the “protection” of a good on-deck hitter. With a superstar on deck, the pitcher is less likely to walk the current hitter, and more likely to give him a pitch he can drive. On the other hand, if the on-deck hitter is mediocre, the opposition should be more likely to walk the batter, or at least not give him anything good to hit.

Therefore, with Barry Bonds on deck, the current hitter should (a) have his walk rate reduced, and (b) have his batting average go up.

But in this academic study, J.C. Bradbury and Doug Drinen come up with a surprising result: with a good hitter due up, batting averages actually go down.

Bradbury and Drinen ran a regression on batting average against a whole bunch of other variables. There were a few variables for the quality of the batter and pitcher, the score, the number of outs, and so on. And there were at least 50 dummy variables, including nine variables for inning, thirty or so for park, and so on.

What Bradbury and Drinen found is that, holding all those variables constant, a one-SD increase in the OPS of the on-deck batter was associated with a 1% drop (about .003) in the current hitter’s batting average. It also led to a 2.6% drop in walk rate, a 3.7% drop in extra base hit rate, and a 3% drop in home run rate. (From these rates, it looks like there may have been about a 5% drop in doubles and triples, but barely any drop in singles.)

Why does this happen? If there’s a drop in walks, shouldn’t we expect an increase in other offensive stats, since the batter is getting more good pitches to hit?

Bradbury and Drinen’s answer is that with Barry Bonds on deck, it becomes exceptionally important to keep the batter off base. Therefore, the pitcher puts extra “effort” into that particular at-bat, throwing his best stuff. He can’t do that every time, because he can only throw hard for so long, and he has to pace himself. But at this crucial point in the game, that’s the time that he needs to turn it on. So he does.

In effect, the authors are arguing that there is such a thing as “clutch pitching” (even if there’s no such thing as clutch hitting).

Could that be true? It does make sense to me.

First, clutch pitching is different from clutch hitting – it seems plausible that pitchers can throw harder or softer, can’t throw hard indefinitely, and would save their best pitching for the most important times. For hitters, though, how would you play more or less clutch if you wanted to? There’s no real physical limitation on the batter – unlike the pitcher, he can play at 100% every at-bat without wearing himself out.

Second, the magnitudes of the findings are about what I’d expect … just a little bit of an edge when it really matters. If the finding was very large – say, 25 points in batting average instead of three – I’d be skeptical, on the grounds that someone would have noticed how much better pitchers do in certain situations.

Having said that, it’s possible that the study wasn’t perfect, and found an effect where none existed because of the variables they chose and the structure of their study. And you could probably come up with certain nitpicks (for instance, the authors used separate variables for bases and outs – if there’s an interaction between them, it might appear as a small false protection effect). My gut feeling is that correcting for those kinds of objections wouldn’t change the results much.

But the most interesting thing to me is the idea that pitchers turn it on or off at various points during a game. It does make sense, but I’ve never seen it talked about. And how would you design a study to find it? There are so many different variables to consider, as Bradbury and Drinen noted, that it might not be easy. Maybe you’d have to start by looking at pitch selection and velocity. Does anyone know of any such study?

Friday, September 15, 2006

NBA time of possession doesn't matter much

Here’s a nice basketball study from 82games.com … as it turns out, it doesn’t tell us all that much, but it’s interesting nonetheless.

In their “Random Stat” column, the (anonymous) author charts how time of possession relates to winning. It turns out there’s a small positive correlation of .13, meaning that teams who hold the ball longer have a slight tendency to be more successful. (But there’s a huge confidence interval around that .13, from –.24 to +.47, “so it could be anything basically.”) And the three teams with the lowest possession time all made the playoffs, which reinforces the idea that the stat doesn’t matter that much.

The most extreme team was Phoenix, with a possession time of 46.94%. That means that instead of having the ball for 24 minutes, they had possession for only about 22:32 – about one and a half minutes less than average. Assuming about 105 possessions per game, it means that every time they had the ball, they held it 0.84 seconds less than an average team.
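The arithmetic works out like this (48-minute games, and the 105-possessions-per-game assumption from above):

```python
# Phoenix's time-of-possession deficit, per game and per possession.
poss_share = 0.4694                     # Phoenix's share of possession time
minutes_held = poss_share * 48          # about 22.53 minutes, i.e. ~22:32
deficit_seconds = (24 - minutes_held) * 60   # about 88 seconds per game
per_possession = deficit_seconds / 105       # about 0.84 seconds
```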

Perhaps the reason time of possession doesn’t help much is that there are offsetting reasons why a team could be holding the ball less. On the positive side, they may be able to score faster. On the negative side, they may turn the ball over more. And the opposite applies to the other team’s time of possession; a good defense might keep the other team in possession for the full 24 seconds as they try (and perhaps fail) to get open for a decent scoring chance.

Another “Random Stat” study on steals shows how this might apply. Phoenix’s Steve Nash led the NBA last year by having 137 of his passes stolen. Assuming that the Suns would have kept the ball ten seconds longer if the pass wasn’t stolen, that’s 1370 seconds of possession lost, or about 17 seconds a game – all due to just the one cause, Steve Nash getting intercepted.
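The ten-seconds-per-stolen-pass figure is a rough assumption; granting it, the per-game cost is:

```python
# Possession time lost to Nash's stolen passes, spread over a season.
passes_stolen = 137
seconds_lost_each = 10      # assumed cost of one stolen pass
games = 82
seconds_per_game = passes_stolen * seconds_lost_each / games  # ~16.7
```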

But Nash’s teammate, Shawn Marion, led the league the other way, by stealing 121 opponents' passes -- so it’s a bit of a wash. As a team, the Suns were fourth in the league in stealing passes, which increases their time of possession. Offsetting that, they were dead last in steals off the dribble or loose balls – they stole only 122 balls that way, while the league leader, Charlotte, stole 292.

Basically, what the correlation of .13 for time of possession tells us (assuming the .13 is close to the real value, and not just random luck) is that, in general, the good reasons for holding the ball longer tend to occur a bit more frequently than the bad reasons. Which doesn’t tell us a whole lot about strategy, or even about the tendencies of any given team. But it’s kind of fun anyway.

Wednesday, September 13, 2006

Fielding is only a small part of BABIP variance

Measures of a quarterback’s performance vary quite a bit from year to year. “The Wages of Wins” argues that’s because the results of a play depend on many of the players on the field, not just the one – the offensive line, the defensive line, the receiver and his defender, and so forth.

Today, in a blog post, David Berri argues that quarterback theory is similar to Voros McCracken’s “DIPS”, which is based on the theory that the batting average on balls in play (BABIP) for a pitcher doesn’t depend on who the pitcher is.

Berri writes,

“Why are hits per balls in play not consistent across time for pitchers? It is because how many hits a pitcher allows depends upon the ability of the eight players surrounding the pitcher on defense.”

But I don't think that's true. How many hits a pitcher allows depends only slightly on the ability of the defense. Most of the reason that BABIP is inconsistent is just random chance.

Here’s the math – hope I did it right.

The variance of team BABIP for a season is about .01^2 (or at least it was for the 1985 NL; I was too lazy to check any more than that).

The binomial variance caused by luck in 4000 balls in play (about a season’s worth), each of which has a 30% chance of being a hit, would be .0072^2.

Since luck alone accounts for (.0072/.0100)², or about 52 percent, of the observed variance, fielding and pitching combined explain less than half of it. It’s hard to agree with Berri that teammates are the cause of the inconsistency. Even with the same teammates, there’s only an 11% year-to-year correlation, which is pretty inconsistent. Variance in BABIP is caused mainly by luck.
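Here’s the math in compact form – the .010 team SD is the 1985 NL figure from above, and the luck term is plain binomial variance:

```python
# How much of team BABIP variance is just binomial luck?
p = 0.30                      # roughly league-average BABIP
n = 4000                      # balls in play in a team season
total_var = 0.010 ** 2        # observed variance of team BABIP
luck_var = p * (1 - p) / n    # binomial variance, about .0072 squared
luck_share = luck_var / total_var   # a bit over half
```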

Monday, September 11, 2006

NFL turnovers are inconsistent from year to year

Are turnovers in the NFL largely random?

Apparently so. The Wages of Wins blog did a year-to-year regression, and came up with a correlation coefficient of about .15. That means only 15% of a team’s turnover tendencies carry over to the next season. (They do report the r this time, not just the r-squared.)

They also report that teams with more than 10 net turnovers (turnovers forced minus turnovers committed) averaged 11 wins, while teams worse than –10 net turnovers averaged only 6.

Which kind of makes sense. According to “The Hidden Game of Football” (pp. 143-144), a turnover is worth four points. And it takes 34 points to turn a loss into a win. So it takes roughly eight turnovers to make up a win.

The study doesn't give an actual number of net turnovers for the two groups, but it's got to be over 10. Let's assume it's 16. That would be two wins. That works out to 10-6, or 6-10, roughly what the study found.
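A quick sketch of that back-of-envelope calculation (the 16 net turnovers is the assumed figure from above):

```python
# Turning net turnovers into expected extra wins, using the Hidden Game
# values: 4 points per turnover, about 34 points per win.
points_per_turnover = 4
points_per_win = 34
net_turnovers = 16            # assumed average for the +10 group
extra_wins = net_turnovers * points_per_turnover / points_per_win  # ~2
```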

Which is again evidence of turnovers being largely random. If they weren’t, you might expect turnovers to be an indication of a team’s quality in other ways. If that were true, turnovers would correlate higher with quality, and teams with lots of turnovers would tend to be even worse than the expected 6-10.

Friday, September 08, 2006

Payroll vs. wins -- still significant in 2006

First, the news story. Its prime exhibit is a chart of “dollars spent per win,” which is exactly what it sounds like: payroll divided by wins. It shows the Marlins leading, at $220,000 per win (as of this week), and the Yankees trailing, at $2.4 million per win. Ranked by dollars per win, the chart seems a nearly-random mix of good teams and poor teams, which writer Bill Freehling uses to imply that salaries don’t matter all that much.

However, dollars per win is a poor measure of payroll efficiency, because the relationship between pay and performance isn't linear. A team could, in theory, have 25 players making the minimum $327,000, which would make its payroll about $8.2 million. That team could probably still win a few games. Even if it only won 20 games out of 140, that would still be only $410,000 per win, which would appear to lead the league.

For the Yankees to match that kind of efficiency with their $195 million payroll, they’d have to win 475 games out of 140. That would be difficult, even if Derek Jeter's defense improved substantially.
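For what it’s worth, here’s the arithmetic behind those two figures:

```python
# A minimum-salary roster's cost per win, versus what the Yankees would
# need to win to match that efficiency.
min_salary = 327_000
min_payroll = 25 * min_salary            # about $8.2 million
cost_per_win = min_payroll / 20          # about $410,000 per win
yankee_payroll = 195_000_000
wins_needed = yankee_payroll / cost_per_win   # roughly 475 wins
```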

So, clearly, the relationship isn’t linear. Some have suggested “marginal dollars per marginal win” as an appropriate measure. Even that one isn’t perfect, because of diminishing returns on increasingly high salaries. But at least it’s better. In any case, the chart isn’t very informative.

Freehling is on more solid ground when he points out that the teams in “the top-third [of payroll] spent about 117 percent more than the bottom-third but had won just 14 percent more games.”

But even that statistic doesn’t mean much out of context. Is 14% a lot? A little? What about 117%? What kind of relationship do these numbers imply, and do they really mean that salary doesn’t matter much?

14% more wins, in a 162-game season, is about 11 games in the standings. That’s a fair bit of difference between the high-spending teams and the low-spending teams. Not as much as one would think, perhaps, but it’s still 1.5 standard deviations from equal – a low-payroll team has a long way to go to catch up to the richer clubs.

Second, the blog entry itself. David Berri uses the article to confirm “The Wages of Wins’” claim that payroll is not an economically significant factor in predicting wins, because the r-squared of the relationship is only 17%. (As I argued here, the important number is the r, not the r-squared. An r-squared of 17% is an r of 41%, which implies that, in a certain sense, payroll actually explains 41% of wins.)

Berri notes that so far in 2006, the r-squared is 24%, which is statistically significant but doesn’t carry much “oomph.” But again, I argue that makes the r around .49, which is pretty significant.
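The conversion is just a square root:

```python
# r versus r-squared for the two figures discussed above.
r_2006 = 0.24 ** 0.5    # about .49
r_wow = 0.17 ** 0.5     # about .41
```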

How significant? Here’s another way of looking at it.

Start by observing that payroll doesn’t buy wins directly – it can only buy talent. Do a “what if.” Suppose that GMs could evaluate talent perfectly, and always spent exactly the correct amount for the player’s ability. Suppose we also were able to find a function to translate increased payroll into increased ability, whatever that function might be.

In that case, the correlation between salary and wins would be exactly the same as the correlation between ability and wins. So what would the correlation be between ability and wins?

It wouldn't be 100%, because teams don’t always win exactly in accordance with their ability. Just as a fair coin might have more or less than 50% heads, just by luck, a .500 team might go 90-72 or 77-85 just by chance alone. So r would be less than 1.

How much less? Tangotiger tells us here (see comments) that the variance of team ability is .06^2. Based on that, I ran a simulation, and came up with a wins/talent correlation around .83.

(This number happens to be SD(ability)/SD(wins) … which might not be a coincidence. A result from Tangotiger tells us that var(ability)/var(wins) is the correlation between two independent sets of wins with the same ability distribution … this might be a variation of that result.)

So even if payroll completely determined talent, the best we could get would be an r of 0.83.
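Here’s a version of that simulation – a sketch, using Tangotiger’s .06 SD of talent and a 162-game binomial season; the exact r will wobble a little from run to run:

```python
import random

# Simulate team seasons: true talent ~ Normal(.500, .06), wins binomial
# over 162 games. The talent/winning-percentage correlation should come
# out near SD(talent) / SD(win pct), about .83.
random.seed(0)

def wins_talent_correlation(n_teams=20_000, games=162, sd=0.06):
    talent, winpct = [], []
    for _ in range(n_teams):
        t = min(max(random.gauss(0.500, sd), 0.0), 1.0)
        w = sum(random.random() < t for _ in range(games))
        talent.append(t)
        winpct.append(w / games)
    # Pearson correlation, computed by hand to stay self-contained.
    n = len(talent)
    mt, mw = sum(talent) / n, sum(winpct) / n
    cov = sum((a - mt) * (b - mw) for a, b in zip(talent, winpct)) / n
    vt = sum((a - mt) ** 2 for a in talent) / n
    vw = sum((b - mw) ** 2 for b in winpct) / n
    return cov / (vt * vw) ** 0.5

r = wins_talent_correlation()
```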

Stated in point form:

--If payroll had no relationship to talent, we’d get r=0.--If payroll had 100% relationship to talent, we’d get r=0.83.--For 2006, we actually get r=0.49.

Or, put into baseball terms:

-- Suppose there were a perfect relationship between payroll and ability. You’d find that a one SD increase in payroll led to a .83 SD increase in wins.--In real life, a one SD increase in payroll leads to a .49 SD increase in wins.

All things considered, the relationship between payroll and wins seems pretty oomphy to me.

Wednesday, September 06, 2006

Run estimator accuracy

This paper from the Retrosheet research page introduces a statistic that’s a “dramatic improvement” over OPS.

Gary M. Hardegree’s article (and his stat) is called “Base-Advance Average.” Here’s what you do: you consider how many total bases were available to be advanced; then you count how many of those bases the batter actually caused to happen.

So suppose there’s a runner on second. There are six total potential bases – two for the runner advancing home, and four for the hitter advancing home. If the batter then doubles, four of the six bases were advanced, and so the Base-Advance Average (I’ll call it BAA) for that plate appearance would be 4/6.
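A minimal sketch of that bookkeeping (my own encoding, not Hardegree’s actual method):

```python
# Base-Advance Average for a single plate appearance: bases actually
# advanced, divided by bases potentially available.
def potential_bases(runners):
    # Each runner could come all the way home; the batter could advance
    # four bases. 'runners' lists the occupied bases, e.g. [2].
    return sum(4 - base for base in runners) + 4

def baa(bases_advanced, runners):
    return bases_advanced / potential_bases(runners)

# Runner on second, batter doubles: the runner scores (2 bases) and the
# batter reaches second (2 bases) -- 4 of the 6 available bases.
pa_baa = baa(4, [2])
```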

Hardegree figures that 84% of games were won by the team with the higher OPS. But 95.5% were won by the team with the higher BAA. Therefore, BAA is substantially better than OPS.

Which, of course, it is. But that’s because it uses a lot more information than other stats.

OPS is based on only the “baseball card” stats – the player’s basic batting line. BAA requires much more information than that – the number of bases available and the number advanced. The increase in accuracy is therefore not surprising -- by considering situational information, you can get as accurate a stat as you want.

Indeed, we could improve BAA further by adding all kinds of variables to it. For instance, by somehow including the number of outs, we could get even better accuracy – weight the on-base portion of BAA higher if there’s nobody out, and the bring-runners-around portion higher if there are two outs. Or add an adjustment based on who’s coming to bat next.

To take this to an extreme, there was a statistic floated around a few years ago, I think on the SABR mailing list, that predicted runs almost perfectly. If I recall correctly, it started like BAA, by counting all bases advanced and subtracting out bases lost via caught stealing and the like. But it went one step further – it subtracted out “bases left on” at the end of an inning – so if the last out happened with a runner on third, the stat would subtract three bases.

What was left was exceedingly accurate – because, if you divided it by four, it was almost exactly identical to runs scored, by definition! Take bases advanced, subtract out bases given up, and you’re left only with bases belonging to runners who scored. The method was like the old joke about figuring out how many sheep are in a field – count the legs and divide by four.
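A toy inning shows the identity – one run scores, and the net base count comes out to exactly four times the runs:

```python
# Leadoff single (batter advances 1 base); then a double that scores the
# runner from first (batter 2 bases + runner 3 bases); then three outs
# strand the second batter at second base (2 bases left on).
bases_advanced = 1 + (2 + 3)
bases_left_on = 2
runs = (bases_advanced - bases_left_on) / 4   # exactly 1 run
```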

So inventing a statistic that’s substantially better than OPS isn’t very difficult, so long as you’re willing to throw some play-by-play information into the formula.

Why, then, do we stick with OPS and Runs Created and such, when we can so easily be more accurate using BAA and other such stats? Two reasons.

First, OPS and BAA answer different explicit questions. OPS asks, “how can we determine offensive performance using only standard hitting stats?” BAA asks, “how can we determine offensive performance using detailed information on how many bases were advanced?”

The first question is probably more important, if only because we don’t really have play-by-play data for lots of players, such as Babe Ruth or some minor league callup.

Second, and much more significantly, the point of OPS is to take evidence of a player’s skill and translate that into a measure of his value. What’s better evidence of skill and talent – batting line, or BAA?

Suppose Player A bats always with the bases empty, and player B hits only with a runner on third. Each gets a single one out of every four times, and strikes out the rest.

A will wind up with a BA of .250. B will wind up with a BA of .250. But:

A will wind up with a BAA of .063. B will wind up with a BAA of .100.
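Checking those two numbers:

```python
# Each player singles once per four PA and strikes out otherwise.
# A bats with the bases empty: 4 potential bases per PA, and the single
# advances 1. B bats with a runner on third: 5 potential bases per PA,
# and the single advances 2 (runner scores, batter to first).
baa_A = 1 / (4 * 4)    # 1 base out of 16 potential = .0625, i.e. .063
baa_B = 2 / (4 * 5)    # 2 bases out of 20 potential = .100
```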

The two measures give different estimates of ability. One implicitly argues that BA is the true measure of talent, and B appearing to have better BAA skill than A is an illusion caused by the situation. The other implicitly argues that BAA is the true measure of talent, and B and A appearing equal in BA talent is an illusion caused by the situation. [Does this remind you of the grue-bleen paradox?]

This gives us two different theories of what happens if both batters now hit with the bases empty:

1. If the true skill is BA, A and B will both continue to hit .250 (which means that both will now have a BAA of .063).

2. If the true skill is BAA, A will continue to have a BAA of .063 (which means he will now hit .250), and B will continue to have a BAA of .100 (which means he will now hit .400).

Which will actually happen? Could anyone really argue number 2 in good faith?

I may be wrong, but I would argue that we’ve probably come close to the limit of accuracy for batting statistics that use a basic batting line. The same batting line can and will lead to a different number of runs scored, if the events happen in different orders (such as if the hits are scattered, or all come in the same inning). No statistic can do better than this natural limit of accuracy, and I suspect that the traditional batting line stats (like Runs Created, Linear Weights, and Base Runs) are coming pretty close.

If that’s true, then any new, more accurate stat would have to use situational information for its improvement. And, given the evidence that clutch hitting is essentially random, those new stats would basically be adjusting for dice rolls. Which isn’t terribly useful.

Do batters learn during a game?

David Smith analyzed 12 years’ worth of “Baseball Workshop” data (Retrosheet didn’t yet have those seasons at the time, I guess), and split batter performance by whether it was the batter’s first, second, third, or fourth plate appearance against the starter. For batting average, the results ran .259 the first time, .268 the second, .272 the third, and .276 the fourth; the OBA and SA results were similar.

So it does appear that batters “learn” to hit pitchers better as the game goes on. However, because of a bias in the data, the effect is probably stronger than the statistics show.

To stay in the game long enough to face a hitter for the fourth time, the pitcher has to have been pitching a very good game. More likely then, the pitchers in the fourth group are better than average pitchers. So for the batters, in the aggregate, to still improve the fourth time, they have to overcome an extra obstacle – that the fourth group of pitchers is actually of better quality than the other three groups. The .259 in the first group is obtained against all starters – the .276 in the fourth group is obtained against better starters. The value of “learning,” then, is actually greater than the 17-point difference in the data.

Notice that the PA barely change between the first and second group, indicating this is still largely the same group of pitchers. The difference there is nine points. But the second difference is four points when the PAs drop (meaning the pitchers have improved), and the third difference is again only four points when the pitchers have improved a lot. This is consistent with the hypothesis that the batters’ learning is constant, but the pitchers are getting better.

You could get a more accurate estimate of batter learning by looking at the caliber of pitcher. That’s what Tom Tango, Mitchel Lichtman, and Andy Dolphin did in “The Book.” They did it a little differently. David Smith’s study included only PAs where the batter was in the starting lineup. The Book’s study included all batters. It found that the quality of batters increased as the game progressed (presumably because of pinch hitting or platooning), so you can’t go by the raw batting stats – you have to adjust for the batters. They did.

Their results are expressed in “wOBA,” their measure of offensive effectiveness, as the difference between what you’d expect the batter to do against that pitcher and what actually happened.

So, the first time through the order, the pitcher was eight points better than you’d expect. The batters then take the advantage for the rest of the game.

To convert wOBA to runs per game, you divide by 1.15 and multiply by the number of PA in a game (call it 40). That means that the first time through the order, the pitcher’s ERA is about 0.25 lower than average. By the third time through, it’s about 0.25 higher than average. That’s a pretty big jump, from (say) 4.25 to 4.75.
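The conversion, in code:

```python
# Eight points of wOBA, converted to runs per game (divide by 1.15,
# multiply by about 40 PA per game, both figures from above).
woba_gap = 0.008
runs_per_game = woba_gap / 1.15 * 40    # about 0.28 runs, or ~0.25 ERA
```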

You could save a tenth of a run per game by taking out your starter after 18 PA and bringing in someone new. That’s assuming the increase is caused by the pitcher, and not by the batters simply being warmed up and better against any pitcher. But the strategy probably wouldn’t work in real life – even with a 0.5-run disadvantage, your ace starter is probably still better than your middle reliever. And the ensuing revolt by the starters would probably negate that tenth of a run pretty quickly.

Saturday, September 02, 2006

OPS is kind of like Linear Weights

Here, Dan Fox does a bunch of algebra on the OPS formula and discovers that it’s similar in form and results to a Linear Weights function. So now we have a better idea why OPS works – because it’s a lot like Linear Weights.