December 17, 2007

It’s taking longer than I anticipated to compile and analyze the context-dependency of various player stats by the method I outlined in my last post, so in the meantime I would like to shift gears and introduce a method that uses team stats to try to understand whether the offensive or defensive team controls various aspects of the game.

There’s an old saying in baseball that “good pitching always beats good hitting.” I want to examine what a claim like this is trying to get at, look at a method that attempts to objectively analyze whether it’s true, and then apply that method to many areas of basketball and see what we can learn.

What does it mean?

“Good pitching always beats good hitting.” Maybe this is saying that when good pitchers match up with good hitters, good pitchers always (or at least usually) come out on top. But what does “come out on top” mean in this context? It can’t just mean that the good hitters will hit lower than their average, because of course we’d expect this to happen to some degree when they face good pitchers (since their “average” is obtained by both facing good pitchers and fattening up on bad pitchers). Similarly, it can’t just mean that the good pitchers will hold the good batters to a lower average than they typically hold batters to (because again the “typical” is a mix of facing good and bad batters). But what if we assume that the range between the best and worst pitchers is larger than the range between the best and worst hitters? Taken to an extreme, what if all batters were of equal ability but pitchers varied in their ability? Then good pitching would always beat good (=average) hitting, and bad pitching would always lose to bad (=average) hitting. So maybe what the saying is getting at is that the distribution of talent among pitchers is more spread out than it is among hitters.

At the core of the idea is the fact that in every pitcher-batter matchup there is one side doing all it can to get a hit, and one side doing all it can to prevent the other side from getting a hit. But this doesn’t necessarily mean each side has an equal influence over the eventual result. The claim that “good pitching always beats good hitting” is expressing the view that when it comes down to it, it’s the pitcher who holds more control or influence over what happens in an at-bat. The pitcher is the one who imposes his will on the batter, not the other way around. Even the best batters don’t have the ability to overcome the best pitchers and get a hit (and, by the same logic, even the worst batters are going to get some hits against the worst pitchers, because it is the pitcher who controls the matchup).

How can we measure it?

In the introduction to his 1986 Baseball Abstract, Bill James addressed the claim that “baseball is 75% pitching,” which one could take to be a close cousin to “good pitching always beats good hitting.” To rebut the claim, James went through a list of specific statements that would seem to follow if the claim were correct. Here is one of the specific statements and how James dealt with it:

The standard deviation of runs allowed would be larger than the standard deviation of runs scored. If pitching were 75% of the game, then one would expect the differences between teams created by pitching to be larger than those created by hitting. But they aren’t; the standard deviation of runs scored and runs allowed are almost the same.

The idea here is that if pitching really had all the control, then the results in a particular matchup would be determined by the ability (or lack of ability) of the pitcher. Batters would basically be passive participants whose performance in a given matchup would not be determined by skill or lack of skill on their part but by the quality of the pitcher they were facing. So every player’s batting average (or a team’s runs scored) would basically just be an average of the abilities of the pitchers they faced, which would mean there would be little variation between players (or teams) in batting average (or runs scored) assuming they all tended to face the same mix of opposing pitchers. On the other hand, some pitchers would be better than others, and they would consistently perform better because they would always be facing the same level of batters. There would be much greater variation in pitcher ability than hitter ability, which would lead to much greater variation in team runs allowed than runs scored (unless, in spite of the variation in pitching ability among players, it so happened that each team had the same mix of high and low quality pitchers and thus averaged around the same runs allowed).
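To make this thought experiment concrete, here is a small simulation sketch in Python. Everything in it is an assumption made up for illustration: the ten-team league, the `pitching` values, and the rule that a team's runs in a game depend only on the opposing pitching plus random noise. It is not based on real baseball data.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the toy example is repeatable
n_teams = 10

# Assumption: pitching alone sets the opponent's expected runs per game,
# ranging evenly from 3.0 (best staff) to 6.0 (worst staff).
pitching = [3.0 + 3.0 * i / (n_teams - 1) for i in range(n_teams)]

scored = [[] for _ in range(n_teams)]
allowed = [[] for _ in range(n_teams)]

for batting_team in range(n_teams):
    for pitching_team in range(n_teams):
        if batting_team == pitching_team:
            continue
        for _ in range(10):  # ten games batting against each opponent
            # Toy model: batters add nothing, only noise (runs may go negative).
            runs = pitching[pitching_team] + rng.gauss(0, 1)
            scored[batting_team].append(runs)
            allowed[pitching_team].append(runs)

# Spread across teams of per-game averages on each side of the ball.
sd_scored = pstdev([mean(games) for games in scored])
sd_allowed = pstdev([mean(games) for games in allowed])

print(f"SD of runs scored:  {sd_scored:.2f}")
print(f"SD of runs allowed: {sd_allowed:.2f}")
```

In this pitcher-controlled toy world the spread in runs allowed comes out several times larger than the spread in runs scored, which is the pattern James says we would see (and don't) if pitching really dominated.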

How can we apply this method to basketball?

The only time I’ve seen this method applied to basketball was a brief mention in Kevin Pelton’s 2003 HoopsWorld article, Moving Beyond ‘Defense Wins Championships’. Pelton applied the method to offensive and defensive efficiency.

I think we have to break things down further to gain a real understanding of what’s going on. Many things go into a team’s offense (or defense) - there’s surely a mix of aspects more controlled by one side of the ball and aspects more controlled by the other. Put them all together and they’ll tend to cancel each other out, leading to the unsurprising result that basketball is basically 50% offense and 50% defense. What I think is much more interesting is separating out these different parts of the game and seeing how much each one is controlled by the offense or defense. So that is what I will attempt to do.

One difference in basketball from the pitcher/hitter example in baseball is that many basketball stats don’t have both offensive and defensive components on the individual player level. On a team level we can look at field-goal percentage and field-goal percentage allowed, but only the first of these has a counterpart on the player level. Similarly, by looking at team stats we can see how many shots a team blocked and how many times they had their shots blocked, but only the defensive side of blocks is available on the player level (though this will no longer be the case going forward thanks to the addition of Blocked Attempts to boxscores). 82games does a good job of tracking some of these missing player stats, but they don’t cover all of them. So at least initially I will just be looking at stats on the team level.

My method

I looked at all the team stats that I could think of. I used stats from boxscore data, 82games, NBA.com HotZones, and Harvey Pollack’s yearbooks. One team stat that can’t be analyzed with this method is possessions per game or pace. This is because in any given game both teams by definition have the same number of possessions.

I looked at each season separately. For a given season, I calculated how each team fared in the offensive side of the stat, and took the standard deviation of all the teams’ rates. I then did the same for the defensive side of the stat. I wanted to compare these two SD’s but I didn’t want a ratio (of offensive SD to defensive SD) for a variety of reasons, so I divided the offensive SD by the sum of the two SD’s. This way the resulting stat ranges from 0 to 1, with a higher figure indicating more offensive control, and .5 indicating equal 50% control by the offense and 50% by the defense. I’m not sure what the best name for this measure is. Offensive Control Rate or OCR is the first thing I thought of, but I’m open to other suggestions.

OCR = SD(tmoSTAT)/(SD(tmoSTAT) + SD(tmdSTAT))
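As a minimal sketch, the formula can be written in Python as follows. The function name and the four-team numbers are made up for illustration. I use the population standard deviation here, but that is an assumption; because both lists cover the same set of teams, the sample version would give the same ratio.

```python
from statistics import pstdev

def offensive_control_rate(offense_values, defense_values):
    """OCR = SD(tmoSTAT) / (SD(tmoSTAT) + SD(tmdSTAT)) for one season."""
    sd_off = pstdev(offense_values)
    sd_def = pstdev(defense_values)
    total = sd_off + sd_def
    if total == 0:
        raise ValueError("both sides have zero spread; OCR is undefined")
    return sd_off / total

# Made-up rates for a four-team league (not real data).
tmo_stat = [0.70, 0.74, 0.78, 0.82]  # each team's own rate
tmd_stat = [0.75, 0.76, 0.76, 0.77]  # the rate each team allows

ocr = offensive_control_rate(tmo_stat, tmd_stat)
print(round(ocr, 2))
```

Here the teams differ a lot in their own rates and hardly at all in the rates they allow, so the result lands well above .5.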

An extreme case - free-throw percentage

Maybe the first part of the game that comes to mind when one starts talking about stats being controlled by either the offense or defense is free-throw shooting, which seems to be almost completely controlled by the offensive team. No matter how well a team plays defense, they can’t guard an opponent when he’s on the foul line. A team’s free-throw percentage is likely to be determined by the free-throw shooting skills of their players (and how frequently those players get to the line relative to one another). It’s not going to be determined by which defense they’re facing on a given night. Because of that, one would expect all teams to have around the same free-throw percentage allowed on the season (which would mean a low standard deviation in tmdFT%), while there would be greater variation in teams’ free-throw percentages depending on how well their players shot from the line (which would mean a higher standard deviation in tmoFT%). This is a slight simplification because defenses may be able to control who they foul, and thus some teams may disproportionately send their opponents’ lesser foul shooters to the line, and vice versa. This would lead to some variation in free-throw percentage allowed, but still one would think that the variation would not be nearly as great as it would be for free-throw percentage. Free-throw percentage just doesn’t have a defensive side to it like many other stats do (such as field-goal percentage).

Here’s an example of how I applied the method to free-throw percentage. For a given season I calculated each team’s free-throw percentage and then took the standard deviation of those percentages (SD(tmoFT%)). Then I took the standard deviation of team free-throw percentages allowed (SD(tmdFT%)). From these I calculated the Offensive Control Rate (OCR) for free-throw percentage by dividing the standard deviation of tmoFT% by the sum of the standard deviation of tmoFT% plus the standard deviation of tmdFT%.

FT% OCR = SD(tmoFT%)/(SD(tmoFT%) + SD(tmdFT%))

I then averaged each season’s league-wide free-throw percentage OCR. Over the 73-74 to 06-07 seasons, the average free-throw percentage OCR was .69, or 69% offensively-controlled. I think this provides a useful ceiling to compare other stats’ OCR’s to. If a stat has an OCR near 69%, it’s pretty strongly controlled by the offensive team.
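The season-by-season averaging step could be sketched like this in Python. The season labels and percentages below are invented placeholders, not the data behind the .69 figure, and the dictionary layout is just one convenient assumption.

```python
from statistics import mean, pstdev

def season_ocr(tmo_values, tmd_values):
    """OCR for a single season from team rates and team rates allowed."""
    sd_off = pstdev(tmo_values)
    sd_def = pstdev(tmd_values)
    return sd_off / (sd_off + sd_def)

# Hypothetical layout: season label -> (team FT%, team FT% allowed).
seasons = {
    "season_a": ([0.72, 0.75, 0.78, 0.81], [0.75, 0.76, 0.77, 0.78]),
    "season_b": ([0.70, 0.76, 0.77, 0.79], [0.74, 0.75, 0.76, 0.77]),
}

# Compute the OCR within each season first, then average across seasons.
per_season = {label: season_ocr(tmo, tmd) for label, (tmo, tmd) in seasons.items()}
average_ocr = mean(list(per_season.values()))

print({label: round(value, 2) for label, value in per_season.items()})
print(round(average_ocr, 2))
```

Computing the standard deviations within each season, rather than pooling all seasons together, keeps league-wide shifts in free-throw percentage over the years from inflating either side's spread.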

THE RESULTS

Below is a link to a spreadsheet on Google Docs that lists all the stats I looked at, their OCR’s, their definitions/formulas, the source of the data, and the years for which I had data. I would have posted the table directly but I looked at almost 70 different team stats (though many of them are related). I will try to highlight some of the more interesting results below, but it may help to first take a glance at the table of all the results to get a feel for the numbers.

The top two stats in OCR were 82games stats indicating the percentage of a team’s field-goal attempts that came in the first 10 seconds of the shot clock, and the percentage that came with 16-20 seconds elapsed off the clock. Both had OCR’s of 0.73. Not far behind were the percentage taken with 21 or more seconds elapsed (OCR of 0.72) and the percentage taken with 11-15 seconds elapsed (OCR of 0.67). So the offensive team strongly controls when it takes its shots. Defenses generally can’t force teams to take quick shots, or prevent them from being able to get a shot off until the end of the shot clock.

Pace

Although we can’t look at pace in terms of possessions per game with this method, we can look at other proxies for pace like the percent of shots taken in the first ten seconds of the shot clock, and fastbreak points per minute. As we saw above, the first of those proxies for pace has an OCR of 0.73. And fastbreak points per minute also has a very high OCR of 0.70. Both of these suggest that offensive teams control the pace they play at. Teams push the ball or bring it up slowly, but they generally don’t force their opponents to push it or force their opponents to bring it up slowly.

Where teams shoot from

Another interesting high-OCR stat is the percentage of a team’s shots that are taken from behind the three-point line, with an OCR of 0.72. This suggests that the offense controls where it shoots from (at least in terms of threes vs. twos), and that the idea that defenses can keep their opponents off the three-point line may not hold water.

Looking at 82games and NBA HotZones stats we can see that even when shooting zones are further broken down, it generally is the case that the offensive team is controlling where it shoots from. The 82games stats %Dunk, %Tips, %2PJump and %3PJump all have OCR’s in the 0.65 to 0.69 range. %Close is lower, at 0.60. Similarly, the HotZones stats %08_16, %16_24, and %24_Plus all have OCR’s between 0.68 and 0.69. But again, %00_08 is less controlled by the offense, with an OCR of 0.58.

Mixed bags:

Assists

Looking at some 82games stats, we can see that the percentages of a team’s dunks and close shots that are assisted are mostly controlled by the offense (OCR’s of 0.60 and 0.62), but the defense seems to have more control over the percent of jumpers that are assisted (OCR of 0.53). I’m not sure how to interpret this.

Rebounds

Rebounding appears to be more controlled by the offensive team than the defensive team (the OCR of offensive rebounding percentage is 0.56, meaning tmoORB% has a greater standard deviation than tmdORB% or, equivalently, tmoDRB%, since a team’s defensive rebounding percentage is just one minus the offensive rebounding percentage it allows). In other words, teams (and players?) separate themselves in their differing offensive rebounding abilities, while defensive rebounding is more the default and doesn’t vary as much throughout the league. However, the OCR of 0.56 isn’t that extreme compared to some other stats, which means things aren’t completely one-sided. One possible theory for why rebounding is more controlled by the offense is that coaches make strategic decisions to either have their players crash the offensive glass or to have them instead get back on defense. Those differing strategies from team to team would increase the variation in offensive rebounding percentages, but of course it’s also possible that coaches could have different strategies in terms of sending players to the defensive glass versus having them leak out on the break, which would increase the variation in defensive rebounding percentages.

What areas of the game are controlled by the defense?

Shooting percentages by distance

We already saw that the defense seems to have some control over the percentage of their opponents’ shots that are taken near the basket (at least more than they do over how many threes their opponent takes). This pattern extends to field-goal percentages - FG% for shots from 8-16 feet has an OCR of 0.58, 16-24 ft FG% has an OCR of 0.55, and 24+ ft FG% has an OCR of 0.55. But for shots from 0-8 feet, the OCR is 0.48, which means that it is the defense that is primarily in control (82games’ close-shot FG% also has an OCR of 0.48). This is very close to 50%, so it’s not like the defense is in total control, but relative to the OCR’s for other stats, this is one of the areas where a defense can have its greatest impact.

Fouling

The defense also exhibits a lot of control over whether players are fouled on shots. The OCR for shooting fouls drawn per two-point shot (taken from 82games) is 0.51, which is low compared to most other stats. If we look at an estimate of free throw trips per two-point shot (which allows us to use many more seasons’ worth of data), the OCR is 0.46. These numbers suggest that while players may have the ability to draw shooting fouls, there is also a large degree to which defenses are more or less prone to committing shooting fouls.

A similar story holds for other types of fouls. The OCR for personal fouls drawn per play is 0.46, and for offensive fouls drawn per play it is 0.41. This last figure is one of the very lowest OCR’s that I found, suggesting that charge-drawing may be more of a controllable skill than offensive-foul-proneness is (the OCR is below 50% even with illegal screens included among the offensive fouls, which one would assume are more offensively controlled).

Steals and turnovers

Steals are more controlled by the defense, with turnovers from steals per play having an OCR of 0.45. Bad-pass turnovers per play has an OCR of 0.50, and ball-handling turnovers per play has an OCR of 0.51, indicating that the defense has a good deal of control over both. However, some bad-pass and ball-handling turnovers are also steals for the opponent, and some aren’t. If we look at non-stolen, non-offensive-foul turnovers (one could call these “unforced” turnovers) we get an OCR of 0.58, which as we would have expected suggests that those types of turnovers are more controlled by the offensive team than the defensive team. Team turnovers, which consist of inbound, backcourt, and shot clock violations, have an OCR of 0.59, perhaps somewhat surprisingly suggesting that the offense mainly controls these types of turnovers.

Blocks

Finally, we get to the stat (along with offensive fouls) that is most controlled by the defense - blocked shots. The percentage of a team’s two-point shots that were blocked had an OCR of 0.41. This isn’t a surprise, as shot blocking has usually been thought of as a skill that different players have to greatly different degrees, more so than the tendency to have your own shots blocked.

Compound stats

We can also look at compound stats that combine the basic stats discussed above into more comprehensive measures. Generally I don’t think this is as useful as looking at the isolated components, but there may be something to be gained from it nonetheless.

First let’s look at Dean Oliver’s Four Factors - shooting, turnovers, rebounding, and getting to the foul line. There are a variety of ways to measure each factor, but here’s one breakdown (which actually uses two compound stats and two basic stats):

Factor OCR
------- ---
eFG% .53
TO/Play .48
ORB% .56
FTM/FGA .48

Of course, as with all the stats we’ve looked at, even if one side of the ball doesn’t have much control over a stat, that stat could be very important to winning, and thus even a slight gain in it could pay large dividends. So while defenses may have more control over getting to the foul line than shooting, the importance of this may be outweighed by the fact that Oliver and others have found shooting to be a much more vital factor to winning than getting to the foul line.

Now, finally, we can look at the ultimate compound stats - offensive and defensive efficiency, or points per 100 possessions. The average OCR for offensive efficiency from the 73-74 to 06-07 seasons was 0.52. This suggests that overall, both sides of the ball have a lot of influence over how many points a team scores on a given possession. We can also look at some trends over time:

Which areas should players and teams focus on improving in practice, and which should they put less time into because not much improvement is likely in that area anyway? This is something that OCR’s can help answer. Similarly, OCR’s can help determine which general strategies coaches should try to employ to improve either their team’s offense or defense. This applies to player acquisition as well, in terms of which areas of a team one can most hope to improve by bringing in players that excel in that area.

Identifying flukes and anticipating regression to the mean

If early in a season a team’s free-throw percentage allowed is very high or low relative to the league average, knowing that free-throw percentage is strongly controlled by the offense we would suspect that their current figure is something of a fluke and that it will move toward the league average as the season progresses. OCR’s can be used as a way of estimating regression to the mean on the team level. Though really you would want to go beyond OCR’s and look at the variation in the offensive and defensive sides of a stat separately (two stats could both have OCR’s of 0.50 but one could have high standard deviations for both offense and defense while the other could have low standard deviations in both).

Predicting matchups

In trying to predict what will happen in various areas of the game when two teams meet in a game or series, we can do better than just averaging one team’s offensive stats with the other’s defensive stats. For example, given the high OCR of free-throw percentage, we would weight the offensive team’s season free-throw percentage higher than the defensive team’s season free-throw percentage allowed in predicting how the offensive team would do from the foul line. Knowing whether good offense beats good defense in various stats is especially important in the playoffs, when the matchups are typically between two teams that are both good in many statistical categories.
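One simple way to turn this idea into a number is to use the OCR itself as the weight on the offensive team's figure. That particular weighting is my assumption for illustration, not a tested formula, and the function name and inputs below are hypothetical.

```python
def predict_matchup_rate(off_rate, def_rate_allowed, ocr):
    """Blend the offense's season rate with the defense's season rate allowed.

    Assumption: the OCR is used directly as the weight on the offense.
    With ocr = 0.5 this reduces to a plain average of the two rates.
    """
    if not 0.0 <= ocr <= 1.0:
        raise ValueError("ocr must be between 0 and 1")
    return ocr * off_rate + (1.0 - ocr) * def_rate_allowed

# Made-up example: an 80% free-throw shooting team against a defense whose
# opponents have shot 75%, using the .69 OCR found for free-throw percentage.
ft_prediction = predict_matchup_rate(0.80, 0.75, 0.69)
plain_average = predict_matchup_rate(0.80, 0.75, 0.50)

print(f"OCR-weighted: {ft_prediction:.4f}")
print(f"plain average: {plain_average:.4f}")
```

The weighted estimate sits closer to the offensive team's own percentage than the plain average does, which is the direction a high OCR argues for. Whether the OCR is the right size of weight is an empirical question.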

Potential follow-up research:

Trends and changes over time

Instead of looking at the average OCR over a number of seasons for a stat, one could look at how the OCR has changed from year to year. From this one could look for general trends over time as to which areas have been controlled by the offense or defense. One could also see how rule changes have affected things.

Player-level stats

The idea of comparing the standard deviations of the offensive and defensive sides of various stats doesn’t have to be limited to team stats. It can also be applied to the player level. It won’t work for stats like field-goal percentage that don’t have a defensive side for players, but it will for things like rebounding. Where the method can be applied, it can potentially give us a fuller picture than we can get from just looking at team stats, where other factors can cloud things (such as the earlier baseball example of variation in pitching ability being hidden on the team level by teams having the same mix of talents and thus the same average runs allowed).

Year-to-year correlations

Another way we can look at controllable aspects of the game is to see whether teams’ performances are consistent from year to year. If the OCR of the percentage of shots taken from three-point range is high, this suggests that defenses have little control over this area of the game. But isn’t one aspect of the Spurs’ defensive success that they consistently keep their opponents off the three-point line? By looking at year-to-year correlations we can better understand these types of cases.
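A year-to-year check could be sketched as a plain Pearson correlation between each team's figure in one season and the next. The two lists below are invented values for the share of opponents' shots taken from three, lined up team by team; they are not real data.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of team values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Made-up opponent three-point attempt shares for five teams, two seasons.
year_1 = [0.15, 0.18, 0.20, 0.22, 0.25]
year_2 = [0.16, 0.17, 0.21, 0.21, 0.24]

r = pearson_r(year_1, year_2)
print(round(r, 2))
```

A high correlation on the defensive side of a stat would suggest that teams do have a repeatable influence there, even if the spread between teams (and so the OCR contribution) is small.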

Game-by-game matchups

In the discussion above about using OCR’s to predict matchups, it was suggested that one could come up with something better than just averaging Team A’s offensive rebounding percentage with Team B’s defensive rebounding percentage to predict what Team A’s ORB% will be when they match up. By using game-by-game data we can test this out - the OCR suggests that ORB% should be weighted more than DRB%, but does past data bear this out?
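One way such a test might look is to fit a single weight w by least squares, so that each game's actual ORB% is approximated by w times the offensive team's season ORB% plus (1 - w) times the ORB% the defensive team has allowed on the season. The model, the function name, and the synthetic games below are all assumptions for illustration.

```python
def fit_offense_weight(games):
    """Least-squares weight w for: actual ~ w * off + (1 - w) * allowed.

    games: iterable of (off_season_rate, def_season_rate_allowed, actual_rate).
    Rearranged, (actual - allowed) ~ w * (off - allowed), so the best w is
    sum(gap * (actual - allowed)) / sum(gap ** 2), where gap = off - allowed.
    """
    numerator = 0.0
    denominator = 0.0
    for off, allowed, actual in games:
        gap = off - allowed
        numerator += gap * (actual - allowed)
        denominator += gap * gap
    if denominator == 0:
        raise ValueError("offense and defense rates never differ; w is undefined")
    return numerator / denominator

# Synthetic games built with a true weight of 0.6 and no noise (not real data).
true_w = 0.6
matchups = [(0.30, 0.26), (0.24, 0.29), (0.28, 0.25), (0.33, 0.27)]
games = [(off, dfn, true_w * off + (1 - true_w) * dfn) for off, dfn in matchups]

fitted_w = fit_offense_weight(games)
print(round(fitted_w, 3))
```

With real game-by-game data the fitted weight could then be compared with the 0.56 OCR for offensive rebounding percentage to see whether the two agree.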

Team-specific game-by-game variation

Another approach would be to look at specific teams and try to see how much variation they have in different statistical areas from game to game. Those areas in which they show greater variation can be seen as areas in which their opponents have been able to exert their will against them, whereas areas where there is little variation are those in which no matter what their opponents do, the team fares the same. I believe an approach like this lies behind Dean Oliver’s Roboscout program, though that’s just based on the few somewhat-dated public articles I have read about it.

