Wednesday, September 21, 2011

There is much talk in sports statistics circles about pace-adjusted statistics. As Wikipedia puts it:

A key tenet for many modern basketball analysts is that basketball is best evaluated at the level of possessions.

The notion here is that because teams play at different paces, game-level statistics can be misleading. A team that averages 95 points per game is not necessarily better than one that averages 78 points per game. The higher-scoring team may simply be playing at a much faster pace. We can account for this by measuring statistics per possession rather than per game.
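The per-possession conversion can be sketched with the standard box-score possession estimate (hypothetical helper names; the free-throw weight varies between about 0.44 and 0.475 depending on the source):

```python
def estimate_possessions(fga, orb, to, fta, ft_weight=0.475):
    """Standard box-score estimate of possessions:
    FGA - ORB + TO + k * FTA, where k accounts for the fraction
    of free-throw trips that end a possession."""
    return fga - orb + to + ft_weight * fta

def per_possession(stat, possessions):
    """Convert a per-game total into a per-possession rate."""
    return stat / possessions

# Example: 60 FGA, 12 ORB, 14 TO, 20 FTA -> 71.5 possessions
poss = estimate_possessions(60, 12, 14, 20)
pts_per_poss = per_possession(70, poss)  # 70 points -> ~0.98 per possession
```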

While this makes a lot of intuitive sense, I always like to test my intuitions. So I took the same set of statistics used in this posting and re-calculated them as per-possession statistics. (See here for how to estimate the number of possessions in a game.) Then I ran the prediction model using the per-possession statistics. (Obviously some statistics, like "Field Goal Shooting Percentage" are not calculated on a per-game basis, so those don't get pace-adjusted.) Here is the performance comparison:

Predictor                                       % Correct   MOV Error
Govan + Averaging                                   73.5%       10.80
Statistical prediction (per-game stats)             72.2%       11.09
Statistical prediction (per-possession stats)       72.2%       11.10

As you can see, the two approaches were essentially indistinguishable: not only was performance nearly identical, but the regression selected the same statistics for both prediction models. So at least for this case, it doesn't appear that adjusting for pace improves performance.

My guess is that the relative unimportance of pace is due to the shot clock and the copycat nature of coaching. There probably isn't enough pace variation across teams to make it a significant factor.

If you search around for "pace-adjusted statistics" you'll eventually stumble across Ken Pomeroy's Four Factors page. The four factors are derived statistics that are intended to give additional insight into how teams play. The factors are:

Effective field goal percentage

Turnover percentage

Offensive rebounding percentage

Free throw rate

(Definitions can be found on Ken Pomeroy's page.)

"Effective FG%" is not of interest to me because the linear regression can adjust the relative importance of field goals versus three-point attempts on its own. "Turnover %" is turnovers per possession; that's one of the statistics I calculated as part of the per-possession experiment above. (It had no value in the predictor, fwiw.) "Offensive rebounding %" is a more interesting statistic, and since offensive rebounds are used by the statistical prediction model, it seems worthwhile to investigate. "Free throw rate" seems to capture some notion of how often a team draws a foul. I suspect that's already captured elsewhere, but it isn't difficult to generate this statistic.
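For reference, the four factors are usually defined along these lines (a sketch using the commonly cited formulas; Pomeroy's exact definitions may differ in small details):

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Effective FG%: credits made threes an extra half-make,
    since they're worth an extra point."""
    return (fgm + 0.5 * fg3m) / fga

def turnover_pct(to, possessions):
    """Turnovers committed per possession."""
    return to / possessions

def off_rebound_pct(orb, opp_drb):
    """Share of the rebounds available after a team's own misses
    that its offense recovers."""
    return orb / (orb + opp_drb)

def free_throw_rate(fta, fga):
    """How often a team gets to the line, relative to its shot attempts."""
    return fta / fga
```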

If I generate these two new statistics and run the prediction model, I find that performance remains the same, but the "Offensive rebounding %" statistic replaces the per-game and per-possession offensive rebounding statistics. ("Free throw rate" has no predictive value and is eliminated by the linear regression.)

Since three-point shooting percentages are used in the predictor, I decided to define a new statistic to capture how much a team relies on the three-point shot (and how it impacts its opponents' use of the three-point shot). I defined this as:

Offensive Balance = (# 3 Pt Attempts) / (# FG Attempts)

and re-ran the predictor. The new statistic has no predictive value. An alternative formulation is to look at the made 3 pointers versus the made field goals:

Offensive Balance = 3*(# 3 Pt Made) / 2*(# FG Made)

but again, this statistic has no predictive value.
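Both formulations can be sketched quickly (hypothetical helper names; I read the second ratio as comparing made threes to made field goals, per the description above):

```python
def balance_by_attempts(fg3_att, fg_att):
    """First formulation: fraction of field goal attempts that are threes."""
    return fg3_att / fg_att

def balance_by_points(fg3_made, fg_made):
    """Second formulation: points scored on threes relative to
    twice the made field goals."""
    return (3 * fg3_made) / (2 * fg_made)
```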

I'm open to suggestions if anyone out there has any thoughts on similar "derived statistics" that might be of value in prediction.

Monday, September 19, 2011

One thing we want to consider in doing statistical prediction (or any sort of prediction with a variety of dissimilar inputs) is normalizing our inputs. The purpose of this is to make inputs with different scales comparable. For example, in my data set, home team scoring average varies from 43 to 102, while "steals by the away team" varies from 0 to 13, so it's hard to compare those two numbers directly. And we don't want our prediction model to favor one input over another just because it has a bigger absolute value. To address this we can "normalize" our data to similar scales.

I mentioned here that Brady West normalizes all the input data to his model by subtracting the mean and dividing by the standard deviation -- this is called "standard score." Instead of knowing that the home team scored 108 points, you'd know that they scored 2.38 standard deviations above the mean. That sounds like a fine approach to me. RapidMiner (the tool I'm using to build the predictive models) doesn't offer "standard score" by that name, but it does offer a z-transformation, which rescales the data to a mean of zero and a standard deviation of 1 -- the same operation. If we apply that to all of our inputs, we'll have more of an apples-to-apples comparison. For example, the normalized home scoring average ends up ranging from -9.96 to 3.99, while the away team's FT percentage varies from -14.34 to 4.87 -- giving you some sense that there is more variance in FT shooting percentage.
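A minimal sketch of that transformation, using only the standard library (the variable names are my illustration):

```python
import statistics

def z_transform(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

scoring_avgs = [95.0, 78.0, 102.0, 88.0, 70.0]
normalized = z_transform(scoring_avgs)
# After the transform, values from different statistics share a common scale
```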

If we apply the z-transformation to our inputs, there is no change in performance for the model that takes only scoring averages. That's reasonable, since the scoring averages are all basically on the same scale anyway. But when we throw in a second data point with a different scale, the difference becomes apparent:

Predictor                                   % Correct   MOV Error
Govan + Averaging                               73.5%       10.80
Scoring averages                                72.1%       11.18
Scoring + 3 pt % -- Without normalization       72.1%       11.18
Scoring + 3 pt % -- With normalization          72.1%       11.09

So as a matter of course I'll perform a normalization step as part of the prediction workflow. (In this case, it doesn't improve our best performance by much.)

It's also interesting to compare the coefficients in our linear regression. This is what we see if we look at the coefficients for the various scoring averages:

Datum                                   Coefficient
Home Team Scoring Average                     5.886
Away Team's Opponent Scoring Average         -4.447
Away Team Scoring Average                    -5.686
Home Team's Opponent Scoring Average          4.793

Naively, you might want to predict a team's score as exactly halfway between what the team usually scores (offense) and what the other team usually gives up (defense); but what this shows is that the best estimate actually weights offense slightly more -- 57% for the home team, 54% for the away team.
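Those weights fall straight out of the coefficients when you pair each offense with the opposing defense (a quick arithmetic check):

```python
# Home team's predicted score: pair the home offense coefficient with the
# away team's opponent-scoring (defense) coefficient, using magnitudes.
home_off, away_def = 5.886, 4.447
home_offense_weight = home_off / (home_off + away_def)  # ~0.57

# Away team's predicted score: pair away offense with home defense.
away_off, home_def = 5.686, 4.793
away_offense_weight = away_off / (away_off + home_def)  # ~0.54
```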

Friday, September 16, 2011

With this post, I'm going to start taking a look at predicting game outcomes based upon team-level statistical measures other than won-loss or MOV, i.e., measures like "team scoring average," "average number of offensive rebounds per game," etc.

There are a number of ways to slice & dice these statistics, but the most straightforward approach is to use season-to-date averages. So, when I'm trying to predict the Illinois-Purdue game on 2/15, I'll be looking at the statistics for those two teams averaged over all the games for that season before 2/15. And I also want to include average statistics for a team's opponents. So I want to know both Purdue's scoring average for all of its previous games, and also the scoring average of its opponents in those games. For every game, I'll typically have four values for a statistic: the home team's average, the home team's opponents' average, the away team's average, and the away team's opponents' average.
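Assuming a flat list of game records (a hypothetical schema, purely for illustration), the season-to-date averaging looks something like this:

```python
def season_to_date(games, team, cutoff_date):
    """Average a team's points scored and allowed over all of its games
    strictly before cutoff_date. Each game record is a dict with
    'date' (ISO string), 'team', 'points_for', and 'points_against'."""
    prior = [g for g in games if g['team'] == team and g['date'] < cutoff_date]
    if not prior:
        return None  # no games played yet this season
    n = len(prior)
    return {
        'scoring_avg': sum(g['points_for'] for g in prior) / n,
        'opp_scoring_avg': sum(g['points_against'] for g in prior) / n,
    }
```

The same loop, run over the opponent lists, produces the opponents' averages.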

To begin with, let's look at how well we can predict games using the most obvious statistic: the scoring average. Using just the (four) scoring average statistics, and the usual methodology, here's our performance:

Predictor           % Correct   MOV Error
Govan + Averaging       73.5%       10.80
Scoring averages        72.1%       11.18

That's pretty encouraging. Just using the scoring averages delivers performance comparable with some of our better W-L and MOV-based predictors. The bad news is that this is still highly correlated with our best other predictors (around 96%), meaning that it probably can't be used in an ensemble to improve our overall predictive performance.

If we look at adding other statistics we find (as would be expected from the literature) that they offer little improvement. The best combination I could find (in order of importance) was (1) scoring, (2) 3 pt percentage, and (3) opponent's average offensive rebounding:

Predictor                                      % Correct   MOV Error
Govan + Averaging                                  73.5%       10.80
Scoring averages                                   72.1%       11.18
Scoring + 3 pt % + Opponent's off rebounding       72.2%       11.09

As you can see, the improvement was not huge. The inclusion of "average number of offensive rebounds by opponents" is interesting because it is not scoring-related. That statistic would seem to capture some aspect of a team's defensive performance -- a team that gives up a lot of offensive rebounds to its opponents is probably doing something wrong at the defensive end of the court. That suggests that we might want to think about a better measure of defensive performance -- for example, we might want to look at offensive rebounding percentage rather than just the raw total.
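Viewed from the defensive side, that rate-based measure would look something like this (a hypothetical helper name):

```python
def opp_orb_pct_allowed(opp_orb, team_drb):
    """Fraction of the rebounds available after opponents' misses that
    the opponents recover -- a rate-based measure of defensive
    rebounding weakness, rather than a raw per-game total."""
    return opp_orb / (opp_orb + team_drb)
```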

[Gill 2008] "Assessing Methods for College Football Rankings," JQAS 2008

Summary: This paper purports to "...consider several mathematical methods for ranking college football teams based on point differential... [and] assess the predictive performance of these models using leave-one-out cross validation." The models considered are variants of least-squares fitting of rating values to point differential. Variants include different fitting methods (e.g., weighted least squares) and methods for limiting the impact of blowouts (e.g., cutting off the point differential at 14 or 28 points). Predictive performance is used to assess cutoff values for blowouts.

Comment: A disappointing paper for me; from the title and abstract I had hoped that this paper would analyze some set of football ranking approaches for their predictive value. Instead, the main conclusion of the paper seems to be that one can construct a rating system that emphasizes nearly any aspect of competition by selecting the right approach and tuning constants.

Summary: This paper describes a method for ranking college football teams. The method uses (potentially) score, location (home or away) and time of season to create an initial value for each game, and then iteratively re-rates games until equilibrium is achieved. The method has a number of parameter/options, and the paper evaluates the performance of several combinations. Performance is measured by the % of correct predictions for bowl games. Over 9 seasons, the best combination predicts about 59% of the total bowl games correctly, and about 63% of the total BCS bowl games. In contrast, over the same span the BCS computer rankings have predicted about 57% of the BCS games correctly.

Comment: A fairly interesting paper, and apparently the work mostly done by an undergraduate. The approach is at least somewhat novel -- it involves creating a graph where the nodes are teams and the links are games between teams, and then summing all the simple paths originating from a team and going out "K" links. (Where K is a parameter, but K=4 was the best performing.) There's no intuitive (to me at least) meaning for doing this, but to some extent it captures the strength of opposition, the same way RPI uses OWP, OOWP, etc. I'd like to implement and test this system, but the naive implementation for calculating all the simple paths is likely going to be very slow, and if there's a clever matrix implementation it doesn't occur to me. I've put a question in to the authors asking about their implementation.
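The naive enumeration can be sketched as a depth-limited DFS (a hypothetical representation of the game graph; note that simple paths, unlike walks, exclude revisited nodes, which is why plain matrix powers don't apply directly):

```python
def simple_path_weight_sum(adj, start, k):
    """Sum the weights of all simple paths of exactly k links starting
    from `start`. `adj` maps a team to a list of (opponent, edge_weight)
    pairs; a path's weight is the product of its edge weights."""
    total = 0.0

    def dfs(node, depth, weight, visited):
        nonlocal total
        if depth == k:
            total += weight
            return
        for nxt, w in adj.get(node, []):
            if nxt not in visited:  # simple paths: no revisits
                dfs(nxt, depth + 1, weight * w, visited | {nxt})

    dfs(start, 0, 1.0, frozenset([start]))
    return total
```

The branching factor makes this expensive for K of any size, which is consistent with the naive implementation being slow.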

[Loeffelholz 2009] "Predicting NBA Games Using Neural Networks," JQAS

Summary: An ensemble of several different neural networks fed with team statistics was used to predict NBA games. Performance was assessed using "% Correct" and compared to consensus picks from five experts published in USA Today. The ensemble methods did not improve upon the best included baseline predictor. The best predictor (feed-forward NN) predicted 74% of the test games correctly (compared to 69% for the human experts).

Comment: There are a number of interesting results in this paper. First, the authors looked at both (1) splitting team statistics based on home/away, and (2) using only the most recent 5 games, and in both cases found no value. This agrees with my own experiments with similar approaches. Second, the authors experimented with various combinations of statistics and had the best performance using only FG% and FT% for each team.

Summary: This paper describes an effort to use various machine learning techniques to predict NBA game outcomes (as well as some related tasks). Inputs to the learning process were 62 features for each game -- most features were averages for the current season and the previous season of team statistics such as rebounding, shooting percentage, etc. The most effective technique was linear regression, which predicted about 70% of games correctly -- comparable to human experts. The most important statistics were team winning percentage in the previous season, and (in decreasing importance) defensive rebounds, points made by opposing team, number of blocks and assists made by opposing team.

Comment: A fairly straightforward attempt to predict NBA games based upon team statistics. Prediction accuracy is in line with similar work (although below Loeffelholz) -- around 70% seems to be fairly easy to achieve for NBA games. There's no attempt to predict MOV.

Comment: The methodology here is similar to my methodology -- the research uses cross-validation on the entire NBA season. However, there is one very important distinction. This research uses the entire season's data to predict the held-out games -- not just the season up to the time of the predicted game. This makes a huge difference in prediction performance, so take the authors' result of 76% accuracy with a grain of salt. It's likely that the accuracy using season-to-date data would be 15-20% lower.

Summary: This paper compares a number of simple systems for predicting college football bowl games.

Comment: This is a difficult paper to analyze. It is written in a very colloquial, unorganized manner and lacks a clear purpose. The systems analyzed are described in vague terms that make it difficult to understand the computational implementation, or even to attribute authorship of the systems. All that said, at least one system described has out-performed the Las Vegas line (by 1 game) over a 7 year period.

[West 2008] "A New Application of Linear Modeling in the Prediction of College Football Bowl Outcomes and the Development of Team Ratings," JQAS 2008

Summary: This paper uses linear regression to build a predictive model for college football bowl games. The inputs to the model are average statistical measures (e.g., "Offensive yardage accumulated per game"). The model predicted 19 of 32 bowl games correctly (59.4%).

Comment: This paper is of particular interest to me at the moment because I've also turned to looking at prediction using statistical team measures. This work seems to agree with my result that only a few measures (mostly related to scoring) have significance in the final model. Also of interest here is that West pre-conditions his statistical measures by expressing all of them in units of "standard deviations from the mean."