Wednesday, June 8, 2011

Iterative Strength of Victory (ISOV)

The next MOV-based rating system we'll look at is "Iterative Strength of Victory" or ISOV. ISOV is the invention of Paul Kislanko. ISOV is based upon the Iterative Strength Rating (ISR). Recall that ISR is defined as an iterative calculation:

Begin with all teams set to an even rating -- 100 in this case. Then, for each game played, give each team the value of their opponent's rating plus or minus a factor for winning or losing the game -- 25 in this case. Total all of a team's results, divide by the number of games played, and that's the end of a cycle. Then use those numbers as the start of the next cycle until you get the same results for each team for two consecutive cycles.
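The cycle described above can be sketched in a few lines of Python. This is only an illustration, not Kislanko's code: the function name `isr_ratings`, the `(winner, loser)` game format, the convergence tolerance, and the cap on the number of cycles are all assumptions made for the example. The cap is there because some very sparse schedules never settle (a single game between two teams just flips back and forth).

```python
def isr_ratings(games, base=100.0, bonus=25.0, tol=1e-6, max_cycles=10_000):
    """Iterate ISR-style ratings; games is a list of (winner, loser) pairs."""
    teams = {team for game in games for team in game}
    ratings = {team: base for team in teams}
    for _ in range(max_cycles):
        totals = {team: 0.0 for team in teams}
        counts = {team: 0 for team in teams}
        for winner, loser in games:
            # Each team gets its opponent's rating plus/minus the bonus.
            totals[winner] += ratings[loser] + bonus
            totals[loser] += ratings[winner] - bonus
            counts[winner] += 1
            counts[loser] += 1
        new_ratings = {team: totals[team] / counts[team] for team in teams}
        # Stop once a cycle no longer changes any rating (within tol).
        if max(abs(new_ratings[t] - ratings[t]) for t in teams) < tol:
            return new_ratings
        ratings = new_ratings
    return ratings


# Hypothetical three-team round robin: A beat B, B beat C, A beat C.
example = isr_ratings([("A", "B"), ("B", "C"), ("A", "C")])
print(sorted(example.items()))
```

In this toy schedule A ends up above 100, B stays at 100, and C ends up below 100, which is the ordering we'd expect.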

ISOV replaces the 25 point factor for winning (or losing) a game with a factor that is based upon the Margin of Victory (MOV). The factor is calculated in two steps. First, we calculate a "strength of victory":

SOVgame = (winning score - losing score) / (winning score + losing score)

This will be a percentage from 0 (in the case of a tie) to 1 (in the case of a shutout), although obviously both of those results are unlikely/impossible in college basketball. The idea here is to capture the intuitive notion that a 42-34 win is stronger than an 82-74 win, even though in both cases MOV=8 (the SOV is 8/76 ≈ 0.105 for the first game versus 8/156 ≈ 0.051 for the second). (I'm not sure I entirely agree with this, by the way. Is a 42-40 win really twice as good -- or even any better -- than an 82-80 win?) The next step is scaling the SOV:

SOVscaled = SOVgame / (average SOV for all games)

The notion here is to convert our SOV percentage to a measure that is scaled around the average -- an average win will be worth 1, a less than average win less, and a more than average win more.

Finally, we'll multiply the scaled SOV by the average points per game:

Factor = (average points per game per team) × SOVscaled     [1]

This step seems arbitrary -- there's no particular reason to scale by the average points per game -- particularly since the scaled SOV isn't related to scoring. Most Elo-type systems (like this one) try to have a Factor of about 1/4 of the base weight (e.g., ISR uses a base weight of 100 and a factor of 25). However, because this is an iterative system, it isn't particularly sensitive to the size of the factor (small factors just lead to more iterations), and testing ISOV with different values for the left-hand side of this equation shows little impact. But at any rate this is how Kislanko defines ISOV, so we'll take this as a starting point.
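Putting the steps together, here is a minimal sketch of the Factor calculation. The function names are made up for illustration, and the default values (0.0866 for the average SOV and 68 for the average points per game per team) are the 2010 season figures quoted in this post; any other season would need its own averages.

```python
def sov(winning_score, losing_score):
    """Strength of victory: margin as a share of total scoring."""
    return (winning_score - losing_score) / (winning_score + losing_score)


def isov_factor(winning_score, losing_score, avg_sov=0.0866, avg_points=68.0):
    """Equation [1]: average points per game per team times the scaled SOV."""
    sov_scaled = sov(winning_score, losing_score) / avg_sov
    return avg_points * sov_scaled


# Two games with the same 8-point margin but different total scoring.
for winner_pts, loser_pts in [(42, 34), (82, 74)]:
    print(winner_pts, loser_pts, round(isov_factor(winner_pts, loser_pts), 1))
```

With these defaults the 42-34 win earns a Factor of roughly 83 and the 82-74 win roughly 40, and a game with exactly average SOV earns 68.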

Looking at the 2010 basketball season, the "average points per game per team" was about 68 points, and the "average SOV for all games" was 0.0866. Plugging these numbers into the formula(s) above and testing shows this performance:

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| ISOV | 72.7% | 11.09 |

This is pretty good -- the best of the MOV-based ratings so far and competitive with the best of the RPI-like ratings.

There are several variants/experiments we can try. As mentioned above, changing the left-hand side of [1] has little impact on performance. We can also try changing the right-hand side to eliminate the second step, where we scale SOV to the average, and instead use just the raw SOV (which is the MOV as a percentage of the total scoring in the game):

Factor = Weight × SOVgame

And we can choose "Weight" to give us an average Factor close to 25 (for the reasons expressed above):

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| ISOV | 72.7% | 11.09 |
| ISOV (no scaling) | 72.6% | 11.10 |

This is only slightly worse than using the scaled SOV, so apparently scaling the SOV does not have a huge benefit. We can go a step further and eliminate SOV entirely, and just use the actual MOV:

Factor = Weight × MOVgame

Again, we choose Weight to make the average Factor about 25. Over the last three seasons the average MOV was 11.5 points, so a Weight of ~2 will work:

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| ISOV | 72.7% | 11.09 |
| ISOV (MOV) | 72.7% | 11.05 |

Interestingly, this actually provides slightly better performance than SOV. This implies (as I speculated above) that a two-point win may be equally good evidence that one team is better than another, whether it comes in a 42-40 contest or an 82-80 contest.
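To make the 42-40 versus 82-80 comparison concrete, here is a small sketch of the two unscaled Factor variants. The MOV weight of 2 comes from the text above; the raw-SOV weight of 25 / 0.0866 is my own back-of-the-envelope choice to make the average Factor come out near 25, and the function names are invented for the example.

```python
AVG_SOV_2010 = 0.0866   # average SOV for the 2010 season, from this post
TARGET_FACTOR = 25.0    # desired average Factor


def factor_raw_sov(winning_score, losing_score, weight=TARGET_FACTOR / AVG_SOV_2010):
    """Factor = Weight x SOVgame (no scaling to the average)."""
    return weight * (winning_score - losing_score) / (winning_score + losing_score)


def factor_mov(winning_score, losing_score, weight=2.0):
    """Factor = Weight x MOVgame (total scoring is ignored)."""
    return weight * (winning_score - losing_score)


for winner_pts, loser_pts in [(42, 40), (82, 80)]:
    print(
        f"{winner_pts}-{loser_pts}:",
        f"raw SOV factor = {factor_raw_sov(winner_pts, loser_pts):.2f},",
        f"MOV factor = {factor_mov(winner_pts, loser_pts):.2f}",
    )
```

The raw-SOV version rates the 42-40 win about twice as highly as the 82-80 win (roughly 7.0 versus 3.6), while the MOV version gives both a Factor of 4.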

As long as we're playing with the Factor function, we can try compressing it (to reduce the impact of winning by a greater margin) or expanding it (to increase the impact of winning by a greater margin). A simple way to do this is to take the square root (or log, or square) of our previous Factor function. (I used the equation from [1] for this experiment.)
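A sketch of what these transformations might look like follows. The names are hypothetical, and one detail is an assumption on my part: the log variant below uses log(1 + Factor) so that it stays defined for a Factor of zero.

```python
import math

# Candidate reshapings of the bonus Factor (names are illustrative only).
TRANSFORMS = {
    "identity": lambda factor: factor,
    "sqrt": math.sqrt,                      # compresses large margins
    "log": math.log1p,                      # log(1 + factor); compresses more
    "square": lambda factor: factor * factor,  # expands large margins
}


def transform_factor(factor, mode="identity"):
    """Apply one of the compress/expand transforms to a non-negative Factor."""
    return TRANSFORMS[mode](factor)


# How much more is a Factor of 100 worth than a Factor of 25 under each mode?
for mode in TRANSFORMS:
    ratio = transform_factor(100.0, mode) / transform_factor(25.0, mode)
    print(f"{mode}: {ratio:.2f}")
```

Untransformed, the larger Factor is worth 4 times the smaller one; the square root cuts that ratio to 2 and squaring raises it to 16.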

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| ISOV | 72.7% | 11.09 |
| ISOV (sqrt) | 72.9% | 11.19 |
| ISOV (log) | 71.8% | 11.27 |
| ISOV (square) | 71.4% | 11.48 |

Compressing/expanding the Factor function doesn't improve our results. Another interesting experiment is to crossbreed with LRMC. We can borrow the "RH" functions from LRMC and with a little tweaking use them to generate the bonus Factor for use in ISOV:

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| ISOV | 72.7% | 11.09 |
| ISOV (RH [2006]) | 71.4% | 11.37 |
| ISOV (RH [2010]) | 71.2% | 11.45 |

Both the "RH" functions perform more poorly than ISOV. Interestingly, the newer RH function -- which performs better in the LRMC rating -- performs more poorly when plugged into the ISOV rating. (I'm a little disappointed in the LRMC results, and I will probably revisit them to make sure that my understanding/implementation of LRMC is correct.)

Returning to the version of ISOV that uses the raw MOV as the bonus function (which we'll call IMOV for now), we can speculate on some possible improvements. The first thought is to treat games below some cutoff (e.g., games won by fewer than 4 points) as ties, on the notion that such games may not tell us anything about which team is better. So let's have a look:

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| IMOV | 72.7% | 11.05 |
| IMOV (mov-cutoff=2.0) | 72.3% | 11.07 |
| IMOV (mov-cutoff=4.0) | 72.1% | 11.13 |

Not much to work with there; any cutoff seems to worsen performance. We can also take a look at the other end of the MOV spectrum, and restrict the benefit of blowout wins. There are two thoughts here -- first, that winning by 40 is no better indicator than winning by 30, and second, that rating systems should discourage "Running Up The Score" (RUTS). We don't care about the latter, but perhaps the former makes this a good idea. We can test this by clamping all margins above a threshold to the threshold (i.e., we treat every MOV above +30 as 30). Let's see:

| Predictor | % Correct | MOV Error |
|---|---|---|
| TrueSkill + iRPI | 72.9% | 11.01 |
| IMOV | 72.7% | 11.05 |
| IMOV (clamp = 30) | 71.6% | 11.59 |
| IMOV (clamp = 40) | 72.7% | 11.05 |

There are very few games with MOV > 40, so there is little change at that level. Any stronger level of clamping reduces performance.
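For reference, both adjustments (the close-game cutoff and the blowout clamp) amount to a small preprocessing step on the MOV before it is turned into a Factor. A minimal sketch, with a hypothetical function name and parameter names:

```python
def adjust_mov(mov, cutoff=0.0, clamp=None):
    """Return the winner's margin after applying the cutoff and the clamp.

    mov    -- the winner's (non-negative) margin of victory
    cutoff -- margins below this are treated as ties (0 disables it)
    clamp  -- margins above this are reduced to it (None disables it)
    """
    if mov < cutoff:
        return 0.0
    if clamp is not None and mov > clamp:
        return float(clamp)
    return float(mov)


# A 3-point win with a 4-point cutoff counts as a tie;
# a 45-point win with a clamp of 30 counts as a 30-point win.
print(adjust_mov(3, cutoff=4.0))
print(adjust_mov(45, clamp=30))
print(adjust_mov(12, cutoff=4.0, clamp=30))
```

The adjusted margin would then be multiplied by the Weight (about 2) exactly as in the plain IMOV Factor.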

If IMOV ends up being the "best of breed" for MOV-based ratings I may revisit it with some additional tweaks, but for now we'll stop here. A couple of interesting results from these experiments: (1) the iterative framework seems to outperform the "Random Walker" framework regardless of the evaluation function (on second thought, let's wait on that conclusion...), (2) pruning the training data to eliminate close games, reduce the effects of blowouts, etc., almost never improves performance, and (3) scaling MOV in the various ways offered by ISOV does not improve performance.

The details of the testing methodology can be found here (http://netprophetblog.blogspot.com/2011/04/testing-methodology-redux.html) but the general idea is that for each game we have a record which has the rating for each team as well as the margin of victory (MOV). I pull out about 80% of those games and train a linear regression using those examples. The linear regression is basically an equation that uses the ratings of two teams to predict the MOV, e.g., it will be something like:

12*Home Team's IMOV - 10*Away Team's IMOV + 6 = Predicted MOV

Then that equation is tested on the remaining 20% of the games -- the equation predicts the outcome and then it is checked against the actual outcome. The MOV Error is the square root of the mean of the squares of all those errors (i.e., the root-mean-square error). RapidMiner handles all of that automatically.
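RapidMiner does the real work, but the scoring step can be sketched by hand, assuming the MOV Error is a root-mean-square error and that a prediction counts as correct when it picks the right winner. The coefficients below are just the illustrative ones from the example equation above, and the ratings in the sample games are made up.

```python
import math


def predict_mov(home_rating, away_rating,
                home_coef=12.0, away_coef=-10.0, intercept=6.0):
    """Linear model: predicted MOV from the home team's point of view."""
    return home_coef * home_rating + away_coef * away_rating + intercept


def evaluate(test_games):
    """test_games: list of (home_rating, away_rating, actual_mov) records.

    Returns (fraction of winners picked correctly, RMSE of the MOV).
    """
    squared_errors = []
    correct = 0
    for home_rating, away_rating, actual_mov in test_games:
        predicted = predict_mov(home_rating, away_rating)
        squared_errors.append((predicted - actual_mov) ** 2)
        # Correct if predicted and actual MOV agree on who won.
        if (predicted > 0) == (actual_mov > 0):
            correct += 1
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return correct / len(test_games), rmse


# Two hypothetical held-out games.
pct_correct, mov_error = evaluate([(1.0, 0.5, 13.0), (0.5, 1.0, -2.0)])
print(pct_correct, mov_error)
```

Here the first game is predicted exactly and the second is predicted as a 2-point home win when the home team actually lost by 2, so half the picks are correct and the MOV Error is the square root of 8.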

I've found that home/away tweaking doesn't usually improve performance because the linear regression equation (see example above) derives a constant term to add or subtract (see the +6 above), and that always turns out to be more accurate than anything I can do a priori. And more complex home/away models (see http://netprophetblog.blogspot.com/2011/04/prophet-dick-vitale.html) don't add accuracy, either -- a tentative conclusion is that home court advantage doesn't vary much from team to team.