Monday, April 18, 2011

RPI Distribution

In our previous ill-fated excursion triggered by Dick Vitale, we mentioned briefly the formula for combining the terms of RPI:

RPI = (WP * 0.25) + (OWP * 0.50) + (OOWP * 0.25)
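As a quick illustration, the combination above can be written as a small function with the weights exposed as a parameter. This is only a sketch: the function name `rpi` and the `weights` argument are my own, chosen for illustration, and the weights are given in the same (WP, OWP, OOWP) order as the formula.

```python
def rpi(wp, owp, oowp, weights=(0.25, 0.50, 0.25)):
    """Combine the three RPI terms using (WP, OWP, OOWP) weights."""
    w_wp, w_owp, w_oowp = weights
    return wp * w_wp + owp * w_owp + oowp * w_oowp


# A hypothetical team that wins 80% of its games, whose opponents win 60%,
# and whose opponents' opponents win 50%.
example = rpi(0.8, 0.6, 0.5)
```

Passing a different `weights` tuple is all it takes to try an alternative distribution.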

This formula is pleasingly symmetrical but it isn't obvious at a glance why the terms are weighted as they are. Andrew Dolphin provides a (possible) explanation on his web page: Essentially, the numbers are chosen to make the formula the best approximation for an "ideal" RPI that went to an infinite depth, i.e., included terms for OOOWP, OOOOWP, etc. The "proper" weightings are determined by the ratio of conference games to non-conference games, and for basketball Dolphin gives the following ideal formulas:

The first is very close to the actual RPI formula, so it's possible that the NCAA chose that weighting intentionally. As an experiment we can try the second alternative to see if that provides better performance:

| Predictor | % Correct | MOV Error |
| --- | --- | --- |
| 1-Bit | 62.6% | 14.17 |
| RPI (unweighted) | 74.6% | 11.53 |
| RPI (unweighted, 23+23+54) | 75% | 11.37 |

And indeed it does, improving in both metrics over the unweighted RPI. A quick hill-climbing experiment with other values for the distribution hits upon this alternative:

| Predictor | % Correct | MOV Error |
| --- | --- | --- |
| 1-Bit | 62.6% | 14.17 |
| RPI (unweighted) | 74.6% | 11.53 |
| RPI (unweighted, 15+15+70) | 75.4% | 11.49 |

This improves the percentage of correct predictions even further (75.4% versus 75% for the 23+23+54 weighting), at the cost of a slight degradation in MOV error (11.49 versus 11.37). At this point, I'm favoring improvements in % Correct over improvements in MOV Error, so we'll take this as our new high-water mark.
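For readers who want to try a similar search, here is a minimal hill-climbing sketch. It makes several assumptions that are not from the experiment above: it scores a weighting by simply picking the team with the higher RPI (a stand-in for the real predictor), it optimizes % Correct only, and the names `pct_correct` and `hill_climb` and the `(home, away, margin)` game format are hypothetical.

```python
from itertools import permutations


def rpi(components, weights):
    """Weighted sum of the three RPI terms."""
    return sum(w * c for w, c in zip(weights, components))


def pct_correct(games, weights):
    """Fraction of games where the team with the higher RPI won.

    Each game is (home_terms, away_terms, margin), with margin > 0
    meaning a home win. This is a stand-in scoring rule (assumption).
    """
    hits = 0
    for home, away, margin in games:
        predicted_home_win = rpi(home, weights) > rpi(away, weights)
        hits += predicted_home_win == (margin > 0)
    return hits / len(games)


def hill_climb(games, start=(25, 50, 25), step=1):
    """Greedy search over integer percentage weights that sum to 100."""
    best = start
    best_score = pct_correct(games, [w / 100 for w in best])
    improved = True
    while improved:
        improved = False
        # Each move shifts `step` points from one term to another.
        for src, dst in permutations(range(3), 2):
            cand = list(best)
            cand[src] -= step
            cand[dst] += step
            if min(cand) < 0:
                continue
            score = pct_correct(games, [w / 100 for w in cand])
            if score > best_score:
                best, best_score, improved = tuple(cand), score, True
    return best, best_score
```

Because the search only accepts strict improvements, it stops at the first local optimum it reaches, so different starting points can end at different weightings.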

Of course, since we're using the RPIs of both teams as inputs to a predictor, there's no reason that we have to combine the elements of the RPI at all. We can feed the elements of the RPI (i.e., the home team's WP, OWP and OOWP as well as the away team's WP, OWP and OOWP) directly into our predictor and let it choose the best weightings. This also has the advantage of not requiring the same weightings for both home and away. As it turns out, in this case that doesn't result in significantly better performance -- it drives the MOV error down slightly at the cost of % Correct -- but it's an option we can keep in our back pocket.
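To make the idea concrete, here is a rough sketch of fitting a predictor on the six raw terms. It assumes an ordinary least-squares linear model of margin of victory, which may not match the predictor actually used here, and the helper names `features`, `fit_linear` and `predict` are hypothetical.

```python
def features(home, away):
    """Concatenate home (WP, OWP, OOWP) and away (WP, OWP, OOWP)."""
    return tuple(home) + tuple(away)


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefs, row):
    """Predicted margin of victory: intercept plus weighted terms."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
```

Calling `fit_linear` on one `features(home, away)` row per game, with the home team's margin of victory as the target, yields an intercept plus six coefficients, so the home and away terms are free to receive different weights. The fit needs more games than coefficients and will fail on degenerate (singular) data.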