Sunday, September 12, 2010

A study on how travel affects NBA teams

Something I didn't know: NBA teams score more points in the second half of the season than in the first half.

From the 1990-91 to the 2006-07 seasons, both the home and visiting teams scored almost a point more later in the season than earlier. Specifically, the average score in the first half (that is, where both teams are playing games 1-41) was 100.0 to 96.5. In the second half, it was 100.9 to 97.4. (The home team's winning percentage was .608 in both cases.)

The difference of 0.9 points is statistically significant, more than 4.7 SDs away from zero. (The SD of home team points in a game was 12.9 in the first half, 12.7 in the second half. There were 9,206 games total in each half.)
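As a rough check on that significance figure, here is a minimal sketch of the arithmetic. It assumes the two halves can be treated as independent samples and uses only the rounded numbers quoted above; the function name `two_sample_z` is mine, not the study's.

```python
import math

def two_sample_z(diff, sd1, sd2, n1, n2):
    """Return (z, standard error) for a difference between two sample means.

    Assumes the two samples are independent, which is a simplification here.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff / se, se

# Rounded figures quoted in the post: a 0.9-point difference in home scoring,
# SDs of 12.9 and 12.7, and 9,206 games in each half.
z, se = two_sample_z(0.9, 12.9, 12.7, 9206, 9206)
print(f"standard error = {se:.3f} points, z = {z:.2f}")
```

With about 9,200 games in each half, the standard error of the difference is under a fifth of a point, which is why a gap of less than one point can still be many SDs from zero.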

Of all the findings in the study, I think that one is the most interesting. Anyone have any ideas why this happens, why scoring changes significantly as the season goes on? The study doesn't speculate. As you can tell by the title, its main concern is how travel affects team performance, so it concentrates on that issue.

The study's main regression predicts home team winning percentage based on a bunch of variables, repeated for home and visiting team. There are actually two identical regressions -- one for first half of the season, and one for second half. The variables were:

-- travel distance since last game
-- total travel distance in last 7 days (excluding since last game)
-- total travel distance in previous days 8-14
-- total travel distance in previous days 15-28
-- total travel distance more than 28 days ago (log)
-- days since last game
-- games in last 7 days
-- games 8-14 days ago
-- games 15-28 days ago
-- total games so far this season (log)
-- one of two dummies for this team (one for home and one for road).

As you might expect, the fatigue variables were significant, at least in the first half of the season. The more rest since last game, the better the chance of winning. But the visiting team effect was almost three times as large as the home team effect (.029 winning percentage points per rest day for the visitors, as compared to .011 for the home side).

Strangely, the number of games in the last seven days had the reverse effect -- the more recent games you played, the *better* your chances of winning this one. It's statistically significant for the home team, but almost zero for the visiting team.

Why would that be? I think maybe there's confounding with the "rest since last game" variable. Is there any reason to think that having four days rest before this game is twice as good as having two days rest? It's probably not. So maybe the "four days rest" estimate is too high. But, if you've had four days rest, you probably haven't played a lot of games in the last week. And so, the coefficient for that might come out negative, to offset the high "four days rest" estimate.

I bet that would disappear if you used dummies for days of rest, instead of the actual number.

But ... in the second half of the season, the direction is reversed. The more games in the last week, the *worse* the performance this game. Moreover, now the effect seems to have moved from the home team to the visiting team: the home team coefficient is not significant, but the visiting team is (.021).

Another puzzler: in the first half of the season, the more total games already played, the worse the performance -- an effect that's statistically significant at more than 3 SDs. But, again, the effect reverses in the second half! That puzzled me a bit. I suspect it's because the study actually uses the logarithm of games: that means the numerical distance between games 1 and 2 is the same as between games 32 and 64. In the first half of the season, there's a wide variation in log(games) -- from 0 (game 1) to 3.7 (game 41). In the second half, it runs only from 3.7 to 4.4. So if the effect exists but isn't logarithmic at all, that might be causing funny things to happen.
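To make the log-scale point concrete, here is a small sketch. It assumes the study uses the natural logarithm (which matches the 3.7 and 4.4 figures above) and an 82-game season; `log_gap` is a hypothetical helper of mine.

```python
import math

def log_gap(game_a, game_b):
    """Distance between two game numbers on a natural-log scale."""
    return math.log(game_b) - math.log(game_a)

# Doubling the game number always moves the log by the same amount.
early_doubling = log_gap(1, 2)
late_doubling = log_gap(32, 64)

# Spread of log(games) within each half of an 82-game season.
first_half_range = log_gap(1, 41)
second_half_range = log_gap(41, 82)

print(f"games 1 -> 2:   {early_doubling:.3f}")
print(f"games 32 -> 64: {late_doubling:.3f}")
print(f"first-half range:  {first_half_range:.2f}")
print(f"second-half range: {second_half_range:.2f}")
```

On this scale the first half spans more than five times the range of the second half, so a fitted coefficient on log(games) is working with very different amounts of variation in the two regressions.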

Other than those, there are a few other variables that come up as barely significant in one half, but not the other. This suggests to me that they're random noise -- with 20 variables in each half, you're bound to wind up with some of that.

One such variable is "log of the number of miles flown by the team as of a month ago." You'd think that when a variable like that comes up significant only for the first half of the season, and only at the 10% level, you'd just dismiss it. Instead, the author writes, "This relationship indicates a very substantial lag of distance travelled in the win production function."

-----

A second regression examines the effects of time zone change on the probability of winning.

However, there's a problem. The study uses a signed value that indicates the direction of travel -- East Coast to West Coast is -3, but West to East is +3. That means things will just wash out.

Suppose jet lag causes bad play whichever way you move. That means that -3 will be bad, and +3 will be bad, and 0 will be good. That creates a U-shaped curve. But the regression is looking for a straight line -- and the best-fit straight line through a U is simply horizontal!
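Here is a toy illustration of that wash-out, using made-up numbers rather than anything from the study. It assumes a visiting team loses .020 of winning percentage per time zone crossed in either direction, and that games are spread evenly across shifts from -3 to +3; `ols_slope` is a plain least-squares slope written out by hand.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: signed time-zone shift and the visiting team's
# winning percentage, with a symmetric penalty for travel either way.
shifts = [-3, -2, -1, 0, 1, 2, 3]
penalty_per_zone = 0.020  # invented for illustration
win_pct = [0.400 - penalty_per_zone * abs(s) for s in shifts]

signed_slope = ols_slope(shifts, win_pct)
abs_slope = ols_slope([abs(s) for s in shifts], win_pct)

print(f"slope on signed shift:   {signed_slope:+.4f}")
print(f"slope on absolute shift: {abs_slope:+.4f}")
```

The signed variable gets a slope of essentially zero even though the effect is built into the data, while regressing on the absolute number of zones crossed recovers the full penalty. Real schedules aren't perfectly symmetric, so an actual regression wouldn't cancel exactly, but the direction of the problem is the same.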

And, indeed, the regression comes up with a near-zero coefficient for time zone change. But we can't tell whether that's because (a) there truly is no effect, or (b) there *is* an effect, but it's the same no matter which way you're travelling.

-----

Finally, an apparent time-zone anomaly. When teams from the West play in the Eastern time zone, they get better. But when teams from the East play in a Western time zone, they get worse. This happens in both halves of the season, and both halves generally have the same pattern -- the more time zones difference, the larger the effect. For instance, here are the numbers for visiting teams in the first half:

I can't explain that, but, again, it could be some confounding. The regression includes the log of cumulative time-zone changes so far this season, and that might be less than linear (in the sense that twice the travel should give you less than twice the effect). So that variable might overstate the cumulative travel effect for west-coast teams (which travel more), and the "moving east" variable favors them in order to compensate.

-----

My overall impression of the paper is that there are too many similar variables confounding things, and so it's hard to get a true idea of whether or not there's anything real happening here. But I still wonder why teams score more in the second half of the season.

7 Comments:

A thought on 'moving west' being more harmful than 'moving east'. Most NBA games are played in the evening, and take about 2.5 hours. If a Celtic is used to playing from 8 pm to 10:30 or so, he has to play in Los Angeles from 11:00 to 1:30 am according to his body time. A Laker traveling to Boston just has to be able to play at 5:00.

I would posit that although you may think of 'jet lag' being equivalent if the issue is caused by total time spent in the air, having to perform at a high level when your body thinks it's 1:00 in the morning is going to be harder than having to perform after eating a slightly early dinner.

Just a guess: scoring increases later in the season because players get worn down by the length of the season, and playing good defense is more physically exhausting than offense. Most players are more interested in offense than defense, so they don't exert as much effort on defense when they are tired.

Another possibility is that there is a "practice effect", where the players' shooting percentage increases through the season from playing actual games. We expect baseball players to play better a month or two into the season than they do right after spring training, don't we?