Saturday, August 01, 2015

New Study of NBA 3-PT Contest Heats Up Hot-Hand Debates

A new study of NBA All-Star Weekend three-point shooting contests by Joshua Miller and Adam Sanjurjo, posted to the Social Science Research Network (link), has re-ignited debates over the magnitude of hot-hand effects on basketball shooting. Miller and Sanjurjo have identified a bias in certain types of hot-hand calculations that appears to have led to underestimation of hot-hand effects in previous studies. While there appears to be a broad consensus (including the present writer) on the validity of Miller and Sanjurjo's point, numerous other issues are being debated among the lead writers and commentators on various sports blogs.

First off, let's review the aforementioned bias. Miller and Sanjurjo, as have others, compared basketball shooters' hit rates when hot (in this case, following three straight made shots) to their hit rates following three-shot sequences other than three straight hits (when players are less hot or even cold). The authors' SSRN paper notes that the distortion stems from the fact that "conditioning on a streak of three or more hits creates a selection bias in which these hits are removed from the sample, leaving a smaller fraction of hits, thus driving conditional performance on the subsequent shot below the base rate" (p. 9). Here's a concrete illustration. Using part of an example Miller shared in an e-mail, where H = hit and M = miss, the sequence [HHHMHHHM] would yield the not-so-hot result that the player was 0-for-2 on shots following three straight hits, even though the player's overall shooting (6-of-8) was very hot. Further, with a correction formula devised by Miller and Sanjurjo, hot-hand effects now appear to be larger than previously thought (at least within this type of analysis).
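The selection bias can also be verified by brute force. The short Python sketch below (my own illustration, not code from Miller and Sanjurjo's paper) enumerates every equally likely hit/miss sequence for a hypothetical 50% shooter and shows that the average hit rate on shots immediately following three straight hits falls below the .500 base rate; it also reproduces the 0-for-2 result for the [HHHMHHHM] example.

```python
import itertools

def conditional_rate_after_streak(seq, k=3):
    """Hit rate on shots immediately following k consecutive hits (None if no such shots)."""
    hits = misses = 0
    for i in range(k, len(seq)):
        if all(seq[i - k:i]):      # the previous k shots were all hits
            if seq[i]:
                hits += 1
            else:
                misses += 1
    if hits + misses == 0:
        return None
    return hits / (hits + misses)

# The [HHHMHHHM] example from the text: 0-for-2 after three straight hits.
print(conditional_rate_after_streak([1, 1, 1, 0, 1, 1, 1, 0]))  # 0.0

# Average the conditional rate over every equally likely 8-shot sequence
# of a fair 50% shooter (sequences with no qualifying shot are dropped,
# mirroring how the empirical calculation discards such players).
n, k = 8, 3
rates = [r for seq in itertools.product([0, 1], repeat=n)
         if (r := conditional_rate_after_streak(seq, k)) is not None]
avg = sum(rates) / len(rates)
print(f"Expected hit rate after {k} straight hits: {avg:.3f}")  # well below .500
```

The gap between this average and .500 is exactly the bias the correction formula is meant to undo.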

This finding has sent statistically oriented bloggers to their keyboards with great urgency. Columbia University statistics professor Andrew Gelman headlined his July 9 posting "Hey -- guess what? There really is a hot hand!" The last I checked, Gelman's piece had received 105 comments! Then, on July 21, all-around sabermetrician Phil Birnbaum weighed in on his blog with a posting entitled "A 'hot hand' is found in the NBA three-point contest."

Though some of the commenters on these blogs have gone back and forth over the proper magnitude of the correction for the aforementioned bias and other methodological issues, I think it's easiest to take Miller and Sanjurjo's findings at face value. Table 1 of their paper is very informative, presenting results for 33 players, with and without bias-correction. The authors correctly note that, when averaging over players, those with negative results (shooting worse after a hot streak) can cancel out positive results. A simple look at the frequencies of different results therefore seems warranted, so I have summarized the results concisely from Miller and Sanjurjo's more-elaborate table. Even with the bias-correction (which enhances how streaky a player looks), here's how many players show different increases in shooting percentage conditional on making three straight shots:

Accuracy Gain

After 3 Straight

Hits ("Hotness")

No. of

Contestants

.34

1

.20 to .22

3

.14 to .18

4

.11 to .12

5

.05 to .07

5

.01 to .04

10

Negative

5

Miller and Sanjurjo's claim that some players exhibit quite appreciable streakiness is well-supported. What about the "typical" or "average" performance? The median for all players (which is unaffected by how far in a positive or negative direction the most extreme values sit) is a .05 or 5% improvement after making three straight shots (16 players above .05, 2 at .05, and 15 below it). This is indeed stronger evidence for basketball-shooting streakiness than we've seen before. For example, a Harvard study (Bocskocsky, Ezekowitz & Stein, 2014) found approximately a 2% hot-hand increase for NBA in-game shooting, using SportVU tracking-camera technology (here and here).
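The median claim can be checked from the binned counts alone. This small Python sketch (my own check, using the counts from the summary table above; the exact within-bin values are unknown) locates the bin containing the median of the 33 players:

```python
# Bin counts from the summary table, in ascending order of accuracy gain.
bins = [("negative", 5), (".01 to .04", 10), (".05 to .07", 5),
        (".11 to .12", 5), (".14 to .18", 4), (".20 to .22", 3), (".34", 1)]

total = sum(n for _, n in bins)     # 33 players in all
median_rank = (total + 1) // 2      # the 17th player when sorted

cum = 0
for label, n in bins:
    cum += n
    if cum >= median_rank:
        print(f"Median player falls in the '{label}' bin")
        break
```

The 17th-ranked player lands in the .05 to .07 bin, consistent with the .05 median reported above.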

There are a couple of possible reasons why Miller and Sanjurjo's findings may overstate hot-hand effects. As Birnbaum notes, within the 25-shot sequence of the NBA three-point contest, there are five locations, from each of which the player attempts five straight shots. Thus, if a shooter hits his first shot from a given location, he can rely on the same motor/muscle memory in launching the next four shots.

Further, players invited to the NBA three-point-shooting contest are known to be great outside shooters, and players with high base rates of success appear more likely than those with lower base rates to go on hot streaks. Therefore, it would be interesting to see what would happen with a more representative cross-section of NBA players. (Both the motor/muscle-memory and base-rate issues are discussed in my book.)

Even if Miller and Sanjurjo's 5% median hot-hand effect is not inflated, it is still probably a smaller magnitude than most fans would associate with the term "hot hand," as the authors appear to acknowledge. The double-digit percentage-point increases some shooters exhibit after three straight hits, on the other hand, would seem to be closer to a lay characterization of a hot hand.

In addition, Miller, in comments on the Gelman blog, holds the Harvard study to a very high level of scrutiny, in my view. Arguably, too high. For example, Miller notes that it omitted some possible control variables, including "the quality and identity of the defender." However, the Harvard study did control for "Distance of Closest Defender, Angle of Closest Defender, Shooter-Defender Height Difference, and [whether the shooter was] Double Covered." Once all these facets of the defense are accounted for, I don't know how much incremental knowledge we gain from knowing the defensive-efficiency of the player guarding the shooter. I therefore take the Harvard study's 2% estimate of a hot-hand magnitude as having probative value.

In the end, I come down closer to Birnbaum's relatively skeptical view -- including his point that Miller and Sanjurjo's finding should be described as "a" hot hand, rather than "the" hot hand, because, like all studies, it is context-dependent -- than Gelman's more accepting position. Miller and Sanjurjo's hotness magnitudes for the hottest-shooting players are higher than I would have guessed. But the magnitudes for median shooters are only slightly higher than what I would have imagined.

8 comments:

Dear Alan Reifman,

Thank you for taking the time to comment on the June 11th version of our three-point paper. In reading your post, it appears that your comments are based exclusively on your reading of our three-point paper, our brief discussion over email explaining the bias that we discovered, and Phil Birnbaum's July 21 blog post on the three-point paper, as well as our comment on his post (and perhaps a few of the comments on Gelman's post?).

While your comments are restricted to the evidence presented in our June 11 three-point paper, we have two other papers that are directly relevant to the issues you discuss, and it appears that you may not yet have had the opportunity to read through these. These papers---the 2014 "Cold Shower" paper in particular---contain three critical elements that we have not seen presented in any other work: (1) evidence of the hot hand in all extant controlled shooting studies (including the original study), and evidence on expert beliefs; (2) a technical discussion of the subtleties of measuring and testing for the hot hand; and (3) a comprehensive and technical discussion of the hot hand literature up to 2014. In our reply to your comments below, there are references to these two papers, and to the comments on Gelman's blog, to save some time.

Before responding directly to your points, it is important to provide some context for the June 11th three-point paper, context that was not included in your post. In our 2014 "Cold Shower" paper (http://ssrn.com/abstract=2450479) we corrected for a subtle bias in the original hot hand study of Gilovich, Vallone & Tversky (1985) that no one had noticed before; we found strong evidence of the hot hand using their data. The Koehler & Conley (2003) paper, which analyzes four years of NBA Three-Point Contest data (1994-1997), has been viewed by many as the replication of the original GVT study.
In your book "Hot Hand: The Statistics Behind Sports' Greatest Streaks" you present a positive view of this data and of Koehler & Conley's 2003 analysis: you note that this data is important because shot distance is held constant, and the atmosphere approximates NBA game action because, as you quote from Koehler & Conley, the data includes "professional players, competition, high stakes, professional court, and a large crowd and television audience." Others have concurred, including Thaler and Sunstein (2008), who describe it as "an ideal situation in which to study the hot hand." In your book you also mention that you have unpublished studies of the NBA Three-Point Contest from 1998, 2000, and 2002, which confirm Koehler & Conley's conclusions.

It turns out that this is an important issue here; there simply is not sufficient statistical power to detect a hot hand with just three or four years of three-point contest data, as we have demonstrated in the 2014 "Cold Shower" paper (appendix). For the three-point paper we coded sequential shot data from videos of 29 years of the NBA Three-Point Contest to achieve the necessary power, we fixed a bias in the analysis of Koehler and Conley, and now the evidence is strongly in favor of the hot hand.

In your blog post you have a few specific points, but there are two general themes (responses to your specific points are further below):

(a) You suggest that we are overstating the size of the hot hand effect. In fact, with our methods, it is quite the opposite: for three-point contest data we are likely *understating* the hot hand effect, due to measurement error and the pooling of heterogeneous responses. These two issues are critical and cannot be ignored (even more so for game data). In the 2014 "Cold Shower" paper we cover these two issues, which we haven't seen discussed by any other hot hand researcher, with the exception of Dan Stone and Jeremy Arkes, who discuss measurement error (see our comments on Gelman's blog). In the "Cold Shower" paper we illustrate the measurement error issue with a stark example (page 63): for a regime-shift model in which a player has a "normal" state and a "hot" state, with a 20 percentage point boost in field goal percentage in the hot state, you will only ever detect a 4 percentage point boost, no matter how many shots you observe from a player (we did not cherry-pick parameters, and we are happy to send you the code so that you can convince yourself this is an issue). In terms of the second issue, the heterogeneity in effect size means that if you want to know how "big" the hot hand effect is, you should not look at the mean or median, because pooling attenuates the effect. Why not look at how many more players have big deviations in the direction of hot hand shooting than one would expect from consistent shooting? We find many more than expected.

(b) You suggest that the evidence we find, like any single study, is "context dependent." This is not correct. The evidence we have presented across our papers is not equivalent to a single study, and it is not from a single context either.
In fact, we have analyzed every controlled shooting data-set that exists (at least those with reliable data and authors willing to share), and what we find is that the existence of the hot hand does not depend on the specific context of the Three-Point Contest and its rules. The hot hand also appears with comparable or greater magnitudes when (non-selected) players attempt many shots from the same position (Jagacinski, Newell & Isaac [1979] and Miller & Sanjurjo [2014]), as well as when players change position (angles) after each shot (Gilovich, Vallone & Tversky [1985]).
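The measurement-error point in (a) is easy to demonstrate by simulation. The sketch below (my own construction; the 20-percentage-point state boost matches the example quoted above, but the switching probabilities are illustrative guesses, not the exact parameters from the "Cold Shower" paper) simulates a hidden two-state shooter and shows that the boost measured via three-hit streaks is far smaller than the true 20-point state boost, because a streak is only a noisy proxy for the hot state.

```python
import random

random.seed(1)

# Hidden two-state ("normal" vs. "hot") shooter. True boost in the hot
# state is 20 percentage points; switching probabilities are illustrative.
P_NORMAL, P_HOT = 0.50, 0.70
P_ENTER_HOT, P_STAY_HOT = 0.10, 0.70

def simulate_shots(n):
    shots, hot = [], False
    for _ in range(n):
        hot = random.random() < (P_STAY_HOT if hot else P_ENTER_HOT)
        shots.append(random.random() < (P_HOT if hot else P_NORMAL))
    return shots

shots = simulate_shots(1_000_000)

# Compare hit rates after three straight hits vs. after any other history.
after_streak = [shots[i] for i in range(3, len(shots)) if all(shots[i-3:i])]
other = [shots[i] for i in range(3, len(shots)) if not all(shots[i-3:i])]

boost = sum(after_streak) / len(after_streak) - sum(other) / len(other)
print(f"True state boost: 0.20; boost measured via 3-hit streaks: {boost:.3f}")
```

Even with a million simulated shots, the streak-based estimate recovers only a fraction of the true state-to-state boost, which is the attenuation M&S describe.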

Now on to your specific points:

(1) In your view the hot hand effect is at most modest in the three-point contest because the median is small.

We have to decide what we mean by the hot hand. In the original GVT paper, and to this day, it has been defined as being in the zone (an elevated performance state), and for some, hitting a few shots in a row is a good signal of that. We want to know if there is a big shift in a player's probability of success when he/she is in the hot hand state.

It doesn't seem like you are arguing against the idea that some players may have large hot hand effects, which happens to be in accordance with the comments of Phil Birnbaum, whom you reference. Birnbaum is willing to acknowledge that an individual player's *probability* of success may change a great deal when he/she is hot in a game. On the other hand, he argues that this won't have an appreciable impact on field goal percentages in a game, which is another matter. Instead you argue that the median (or average) player doesn't have a large hot hand effect. First, while the median is likely understated in the Three-Point Contest for the reasons we outline in (a), the median effect size is still appreciable when you consider the degree of heterogeneity across players in the hot hand effect that we discover. Also, keep in mind that the difference between the median and the very best three-point shooter was around 10 percentage points (2013-2014 NBA season). Second, and more importantly, is the median player really what fans and players get excited about when they talk about the hot hand? Usually it's the player or two on each team that is known for being streaky, and this is the player you want to identify.
In fact, the existence of heterogeneity is precisely what gives you an incentive to discriminate between players who get hot and players who don't.

(2) You echo Phil Birnbaum's concern about the Three-Point Contest involving 5 shots being taken from a fixed location before switching to another location, and thus suggest that the hot hand effect that we discover may be an artifact of this shooting environment.

As mentioned in (b), we have found significant and substantial hot hand shooting in all of the shooting environments that we have studied, which are numerous and vary considerably in terms of shot conditions. What are the ideal conditions to test for the hot hand? We have tested for it in controlled environments in which the shooter always shoots from the same position (Miller and Sanjurjo, 2014; Jagacinski et al., 1979), in which the shooter must move between each shot (Gilovich, Vallone, and Tversky, 1985), and in which 5 shots are taken at each of five positions (Miller and Sanjurjo, 2015a). Regarding the motor/muscle memory argument, while this is outside our domain of expertise, the underlying concern would seem to be allayed by testing for the hot hand in an environment in which the shooter either has to move after each shot, or always has to shoot from the same position (rather than switching on occasion). As discussed, we also find strong evidence of hot hand shooting in both of these environments.

(3) You note that NBA 3 point contest shooters are not representative of your typical shooter, and therefore their performance cannot be taken as representative.

It is true that the best three-point shooters in the NBA are not representative of the typical shooter, but again, the debate has not revolved around the typical shooter. Recall the debate over Larkey, Smith & Kadane (1989), which revolved around the importance of establishing whether or not the hot hand existed in a single (selected) player---Vinnie "The Microwave" Johnson. Tversky & Gilovich's (1989) response, "The 'Hot Hand': Statistical Reality or Cognitive Illusion?", shows that from the very beginning the central issue has been whether or not observers are suffering from an illusion when they feel that an individual player's ability has suddenly shifted. In any event, if you want to go beyond the best NBA shooters, you can still look at the other controlled shooting studies, which likely have a more representative selection of players, and the hot hand effect is alive and well there.

One additional thing: there is possibly a bit of ambiguity in what you say about shooters with higher base rates being more likely to go on streaks; on one reading it sounds like we have not controlled for this, when in fact we have. The other reading, which seems the more reasonable one, is that shooters with high base rates are more likely to exceed their already higher chance of going on streaks (because of the higher base rate). A high base rate may signal a better ability to focus or sustain attention, which is the same ability one might need to sustain a streak. Not sure if this is what you meant.

(4) You mention that the beliefs of observers don't match the magnitude of the hot hand effect.

First, as a theoretical point: if you were asked to estimate Stephen Curry's probability of success when he is in the zone, could we ever test whether you provided an over-estimate? No, we cannot. As statisticians we do not know when a player is in the hot state (or how exactly to characterize the hot state), so we must take a *proxy* for the hot state. With any proxy, measurement error is likely to lead to a vast understatement of the true probability (for example, consider what happens in the simple regime-switch model mentioned above).

Now let's discuss the actual evidence that exists for people over-stating the hot hand effect, and we will see that people's beliefs look pretty reasonable. It is useful to divide these people into two categories: (1) expert participants (players and coaches), and (2) spectators (fans and announcers). If you are interested in seeing whether there is evidence of the hot hand fallacy (or bias), it is more interesting to look at (1), because players and coaches actually make costly decisions based on their beliefs. Fans and announcers don't have an incentive to be accurate; in fact, announcers have an incentive to describe what they just saw in a more exciting way. Further, for both fans and announcers it is not clear what they mean when they say hot---is it a prediction or an ex-post statement?

When considering the beliefs of expert participants, we find in the 2014 paper that expert shooters do a remarkable job of identifying which players have a tendency to get hot, and which do not. This ability is valuable, as one needs to make important ball allocation decisions in a game.

What did the original GVT study have to say about expert beliefs?
The 76ers players were found to be suffering from a cognitive illusion because of their responses to the following questions---responses which now appear pretty reasonable given the new evidence: (1) most players report that on occasion they feel that after having made a few shots in a row they "know" they are going to make their next shot---that they "almost can't miss"; (2) most of them believe that a player has a better chance of making a shot after having just made his last two or three shots than he does after having just missed his last two or three shots; (3) most players report that after having made a series of shots in a row, they "tend to take more shots than they normally would"; (4) it is important for the players on a team to pass the ball to someone who has just made several (two, three, or four) shots in a row. Notice that all these questions are about ex-ante "probability," not ex-post field-goal percentage, and the responses are not at odds with the evidence. The only "unreasonable" response is to an unnatural question: (5) the coach and five players estimated their field goal percentage for shots taken after a hit (mean: 62.5%) to be higher than their percentage for shots taken after a miss (mean: 49.5%), though it might be mentioned that the difference between 62.5 and 49.5 is roughly the strength of the average bias-corrected hot hand effect that we find in GVT's controlled shooting study (see Table 3 in http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627354). In any event, do players and coaches really make decisions based on estimated field goal percentage; is this an important measure?

(5) You mention that we hold the Bocskocsky, Ezekowitz and Stein (2014) paper to a high level of scrutiny, perhaps too high.

Actually, we state clearly that the Bocskocsky et al. paper is the state-of-the-art for studying game data, and if anything, we are saying their results likely understate the true hot hand effect size in games. The thing we scrutinize is not their paper per se, but what can be concluded about the size of the hot hand effect from studying in-game data, and we do not think that they would disagree with us here. Further, the potential for omitted variable bias, which you claim that we over-state, is not the first-order issue we mention. We mention measurement error, which is more severe in game data, in which shots are often taken tens of minutes apart. Further, looking at averages across players can only attenuate the effect further. Now, returning to omitted variable bias: Bocskocsky et al. address this as well as anyone can expect to, yet they still acknowledge, as anyone would, that their results may be sensitive to specifications that include variables they do not measure. One example of a variable that we cannot currently track, but which might matter quite a bit: players all know that getting a clean look at the basket at a key moment in the shot is important. This has been verified with controlled studies (please see Joan Vickers's work on the quiet eye, as well as the work of Oudejans, Oliveira and colleagues). There is every reason to expect this type of interference to increase with defensive adjustment, and SportVU cannot yet track whether a player has a clean look at the basket (let alone defensive player appendages, which Bocskocsky et al. note). To give just one other example, the identity of the defender is also not tracked, and we should expect better defenders to be re-assigned to a player who is truly hot.

In summary, the science now says that the hot hand effect is large and robust across all controlled shooting environments (including the NBA's Three-Point Contest).
There is no science to support the view that the hot hand effect is small in games. Based on the available evidence it is not unreasonable to infer that the hot hand effect is big in games. Phil Birnbaum has recently written up his thoughts on our three-point paper under a telling title: "Has the hot hand effect finally been proven?" Science is not about proof; it is about measurement, control, and inference. We do not currently have the right tools to measure how much Stephen Curry's probability of success changes when he is in the zone during a game. What are we to believe in the interim? We say go with the theory and evidence we have, rather than the priors that were given to us by a 1985 study that has now been invalidated.

I thank Josh Miller and Adam Sanjurjo (M&S) for their detailed response to my posting, including a close reading of my book. :) Let’s take stock of where we are:

The recent M&S study of NBA three-point-shooting contests and the Harvard study (Bocskocsky, Ezekowitz, & Stein, 2014) of in-game shooting have been discussed here and on other sports and statistics blogs. Arguments have been proffered for why certain features of the studies may lead to under- or over-estimations of hot-hand effects. M&S have also provided links to additional studies they’ve conducted.

The numbers in the major studies are what they are. Readers can read these studies (and associated blog commentary) and now decide for themselves if the median or the upper end of the distribution of results (or both) from M&S’s NBA study are most informative, as well as the merits of the other points raised on the various blogs. In the remainder of this reply, I just want to clarify my arguments in four areas where I differ somewhat from M&S’s above comments.

1. When I said M&S’s NBA-contest results were “context-dependent,” I meant it in a different way than how M&S seemed to interpret it. M&S are correct that the only study of theirs that I had read was the one on the NBA contest. They have brought to my (and other readers’) attention that three-point-shooting hot hands have been detected not only in their study of NBA contests, in which players move to a new spot on the floor every five shots, but also in studies that involved shooters moving around after each shot and remaining stationary throughout. In that sense, hot-hand effects appear to transcend particular three-point-shooting contexts/paradigms/procedures and are not context-dependent. However, I wasn’t referring to different three-point-shooting contexts. I was referring to hot-hand contexts much more broadly. Not only have hot-hand studies been conducted in many different sports, but also with multiple paradigms within a single sport (e.g., golf studies using actual tournament data, as well as controlled studies with putting greens constructed in laboratories). Within the full spectrum of hot-hand research, from archery to baseball, and golf to horseshoes, basketball-shooting studies (even with procedural diversity among themselves) are just one context, in my view.

2. While acknowledging good points about the Harvard study, M&S stand by their earlier statements that the in-game defensive metrics available to Bocskocsky and colleagues (and for which they controlled) are not complete. I don’t know that any study would be able to control for all conceivably relevant variables and, accordingly, I’m satisfied with the Harvard study. M&S note that the Harvard study lacked an explicit measure of whether a shooter got a clear look at the basket. I would reply that, if a shooter/ballhandler is double-teamed by players who are only inches away and the defenders have a height advantage over a shooter (all variables that were in the Harvard study), the shooter is less likely to have had an open look than if the shooter were clear of all defenders by several feet.

3. I don’t know what to make of M&S’s claim that, “There is no science to support the view that the hot hand effect is small in games.” The Harvard study found roughly a 2% hot-hand magnitude in games. One can raise questions of whether the Harvard study underestimates the hot hand, but it certainly seems to be scientific! The Harvard study indeed includes three ingredients M&S propose for scientific research: "measurement, control, and inference."

4. I noted in my main posting that, because there appears to be a positive correlation, across sports, between overall success in a sport or task and propensity to streakiness, the fact that the NBA All-Star Weekend three-point contests invite players thought to be the league’s premier long-distance shooters should be taken into account. I want to elaborate on this point a bit.

There’s a famous quote from the late Harvard paleontologist (and sometimes statistical writer) Stephen Jay Gould that, “Long streaks always are, and must be, a matter of extraordinary luck imposed upon great skill.” (http://www.nybooks.com/articles/archives/1988/aug/18/the-streak-of-streaks/)

A list of the most famous U.S. sports streaks would include Joe DiMaggio getting at least one hit in 56 consecutive baseball games in 1941; Coach John Wooden’s UCLA men’s basketball teams winning 88 straight games in the 1970s, only to be eclipsed by Geno Auriemma’s UConn women, who won 90 straight a few years ago; and Tiger Woods making the cut in 142 straight pro golf tournaments. Even without their respective streaks, DiMaggio, UCLA/Wooden, UConn/Auriemma, and Woods would be considered among the finest practitioners of their sports.

Regarding baseball hitting streaks, it is not just DiMaggio (who compiled season-specific batting averages between .323-.381 from 1936-1940) whose general skill with the bat presaged a long hitting streak. Wee Willie Keeler, owner of the second-longest hitting streak (45 games, 1896-97), batted .371 and .377 in the two prior seasons. And Pete Rose, owner of the third-longest streak (44, 1978), batted .300 or higher in 12 of 13 seasons before his streak.

M&S suggested a mechanism for the (apparent) correlation between general success and streakiness, namely that, “A high base rate may signal a better ability to focus or sustain attention, which is the same ability one might need to sustain a streak.” This topic is certainly worthy of further investigation.