Can recruiting rankings predict performance?

We spend so much time on college football recruiting. Ace works on it full time, and the Mathlete uses it in his models. If you don't believe how much MGoBlog readers care, check out the comments on Ace's post today.

But how far can we get with just recruiting rankings? For example, if we just had the last 4 years of Rivals recruiting data, could we predict how Michigan would do next season?

Recruiting Analytics.

In a recent SI.com article, I used linear regression to answer this question. This method finds the best-fitting linear relationship between the predictor variables and an outcome variable. For recruiting, the predictor variables are a 4-year window of team recruiting ratings, while the outcome variable is the next year's team performance as measured by The Power Rank algorithm. This ranking system assigns each team a rating that gives its expected margin of victory against an average team. (In 2012, Michigan had a rating of 8.7.)

The linear regression method gives a weight to each of the last 4 years of recruiting ratings. These weights imply a team rating for the next year, and these team ratings are sorted to give preseason rankings. To test this recruiting model, I compared whether these rankings or the preseason AP poll better predicted the final AP poll. (Yes, there are many problems with using the final AP poll as a measurement of team strength, but it's a starting point.)
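A minimal sketch of the regression described above, assuming NumPy and assuming the Rivals points are arranged with one row per team-season (oldest class first). The function names are hypothetical, and this is plain ordinary least squares with an intercept; the post doesn't say whether the original fit used an intercept or any other adjustment.

```python
import numpy as np

def fit_recruiting_weights(class_points, next_ratings):
    """Ordinary least squares: rating ~ b0 + w1*x1 + w2*x2 + w3*x3 + w4*x4.

    class_points: (n_team_seasons, 4) array of Rivals team points for the
                  4 classes in each window (oldest class first).
    next_ratings: (n_team_seasons,) array of the following season's team
                  rating (expected margin vs. an average team).
    Returns the array [b0, w1, w2, w3, w4].
    """
    X = np.asarray(class_points, dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = np.asarray(next_ratings, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def preseason_ranking(teams, class_points, coef):
    """Predict each team's rating from its current 4-class window, then sort."""
    X = np.asarray(class_points, dtype=float)
    predicted = coef[0] + X @ coef[1:]
    order = np.argsort(-predicted)  # highest predicted rating first
    return [(teams[i], float(predicted[i])) for i in order]
```

The weights come out of the fit; sorting the predicted ratings is all it takes to turn them into a preseason ranking.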

For the last 100 teams in the preseason AP top 25, the recruiting model did as well as or better than the preseason AP poll on 46 of them. This is remarkable given how little information the recruiting model has. It knows nothing about winning tradition, players lost to graduation or the NFL, or any number of other factors used by the writers who vote in the AP poll.
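One way to score that comparison, as a sketch only: the post doesn't say how "as well as or better" was measured, so this assumes it means a smaller or equal absolute difference from the final AP rank, and it treats a team that drops out of the final poll as rank 26. Both conventions and the function name are hypothetical.

```python
def count_model_as_good_or_better(teams, model_rank, preseason_ap, final_ap, unranked=26):
    """Count teams whose recruiting-model rank was at least as close to their
    final AP rank as their preseason AP rank was.

    model_rank, preseason_ap, final_ap: dicts mapping team -> rank.
    ASSUMPTION: "closer" means smaller absolute rank difference, and a team
    missing from the final poll is treated as rank `unranked`.
    """
    count = 0
    for team in teams:
        final = final_ap.get(team, unranked)
        model_err = abs(model_rank[team] - final)
        ap_err = abs(preseason_ap[team] - final)
        if model_err <= ap_err:  # ties go to the model ("as good or better")
            count += 1
    return count
```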

What About 2013?

The good news: Michigan is 9th. Hoke has recruited top 10 classes the last two years according to Rivals. This followed two classes that didn't break the top 20.

The bad news: Ohio State is 3rd. Starting in 2013 and going back, they've had the 2nd, 4th, 11th and 25th ranked classes by Rivals.

Nebraska leads the rest of the Big Ten at 20th. Notre Dame checks in at 4th with their big recruiting year.

Don't take the rankings too seriously. They consider a very limited amount of information for next season. However, they can be useful for navigating expectations.

To see the Top 25 college football teams for 2013 by recruiting rankings (which includes a link to the SI.com article), click here.

I'm having a little trouble figuring out what's going on without seeing your model or results, but there are some pretty serious potential sources of bias. For example, programs that recruit the highest rated players probably tend to have better coaches, better facilities, better training staffs, more intimidating home stadium environments, more institutional emphasis on winning football games, etc. Those things should produce more wins, which would lead to your model overstating the impact of getting good recruits. Getting more talented recruits helps, of course, but it's hard to estimate that kind of thing too precisely with all of the potential confounding variables.

Those factors, such as better coaches and better facilities, certainly affect the type of recruits a school gets. Think of the recruiting ratings as an aggregate indicator of all these factors.

In the big picture, I wasn't trying to isolate the effect of getting better recruits. That's best left to the Mathlete. I just wanted to ask how far one could get based on the number of points Rivals assigned each team each year.

Sure they can, but only to a certain degree. You can recruit an entire team of 5-stars, but if they're not properly coached, all that talent goes to waste. Recruiting rankings are a small piece of the puzzle, and oftentimes they don't even matter. Dantonio recruits a bunch of nobodies that bigger schools usually pass on and turns them into a formidable defense with NFL potential. Johnny Football was an unknown three-star who always wanted to play at UT, but they obviously passed on him. Now look at him. There are also facilities, overall support from the school and alumni, the football culture a school has, etc.


Second, the method. I took a 4-year window of Rivals ratings and used regression to predict the next season's rating from my algorithm. For example, the 2009 through 2012 recruiting ratings are fit to the team rating for the 2012 season. The regression finds the best fit to team performance over the 2005 through 2012 seasons.
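The windowing can be sketched as follows. The nested-dict data layout and the function name are hypothetical; the only rule taken from the text is that classes Y-3 through Y are paired with the rating for season Y.

```python
def build_training_rows(class_points, season_ratings, first_season, last_season, window=4):
    """Pair each team-season with the recruiting classes that feed it.

    For season Y the features are classes Y-window+1 .. Y (oldest first),
    so with window=4 the 2009-2012 classes pair with the 2012 rating.
    class_points:   {team: {class_year: rivals_points}}   (hypothetical layout)
    season_ratings: {team: {season: team_rating}}
    Team-seasons with a missing class or a missing rating are skipped.
    """
    X, y, keys = [], [], []
    for team, ratings in season_ratings.items():
        classes = class_points.get(team, {})
        for season in range(first_season, last_season + 1):
            years = range(season - window + 1, season + 1)
            if season not in ratings or any(yr not in classes for yr in years):
                continue
            X.append([classes[yr] for yr in years])
            y.append(ratings[season])
            keys.append((team, season))
    return X, y, keys
```

Fitting seasons 2005 through 2012 this way needs recruiting classes going back to 2002, since the 2005 window starts three classes earlier.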

It's difficult to say much without seeing your data, but have you considered doing a correlational study with the same data? If you wanted to limit it to, say, the top 25, you would probably first have to determine which schools had the highest overall ranking over that period, or you could even take a random sample of Division I.

You could take those ratings and try to relate them to win percentage over the same period. I did a quick version with the Big Ten for last year and got R = 0.78, so there may be something there (that's a small sample, so it might turn out to be an anomaly, which would also be interesting). The only stumbling block would be what others have mentioned: accounting for potential bias in the ratings themselves (if the R-value is consistently high, that might actually provide evidence of the bias, depending on how you approach it).
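Assuming the R here is the Pearson correlation coefficient, it can be computed with the standard library as below. As a rough guide, r = 0.78 corresponds to r² of about 0.61 in a simple linear fit. The function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences,
    e.g. recruiting points vs. win percentage for the same teams."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```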


I stayed away from win percentage because it doesn't account for strength of schedule. A 7-6 Big Ten team is different from a 7-6 Big East team. The team ratings I used account for strength of schedule. You could probably use any decent ranking algorithm though.
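To make the strength-of-schedule point concrete, here is a sketch of a basic schedule-adjusted rating in the spirit of the "simple rating system": each team's rating is its average scoring margin plus the average rating of its opponents, solved by damped iteration and centered so an average team sits at 0. This is a swapped-in technique, not The Power Rank's algorithm; it ignores home-field advantage, and the damping factor and iteration count are assumptions.

```python
from collections import defaultdict

def simple_ratings(games, iterations=2000):
    """games: list of (team_a, team_b, margin) with margin = points_a - points_b.

    Solves rating[t] = avg_margin[t] + avg(rating of t's opponents),
    then shifts all ratings so their mean is 0 (an average team).
    """
    margins = defaultdict(list)    # team -> its own margin in each game
    opponents = defaultdict(list)  # team -> opponent in each game
    for a, b, m in games:
        margins[a].append(m)
        opponents[a].append(b)
        margins[b].append(-m)
        opponents[b].append(a)
    teams = list(margins)
    avg_margin = {t: sum(margins[t]) / len(margins[t]) for t in teams}
    r = {t: 0.0 for t in teams}
    for _ in range(iterations):
        target = {t: avg_margin[t] + sum(r[o] for o in opponents[t]) / len(opponents[t])
                  for t in teams}
        # Damping (half old, half new) avoids the back-and-forth oscillation
        # the undamped update shows on some schedules, e.g. two teams.
        r = {t: 0.5 * r[t] + 0.5 * target[t] for t in teams}
    mean = sum(r.values()) / len(r)
    return {t: r[t] - mean for t in teams}
```

Two teams with identical records end up with different ratings when their opponents' ratings differ, which is exactly what win percentage misses.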

I find this very interesting and quite impressive (assuming the math works, which I'm not checking). A single variable predicting preseason team rankings as well as or better than experts using all the information at hand, almost half the time: I find that impressive. He's not claiming that recruiting rankings are the only variable that should be used, or even the best variable for ranking teams, just that this one variable is almost as good as the experts. Sure, recruiting has a lot of factors baked into the ranking, but so do most variables (ranking coaches takes into account a school's financial resources and commitment to football, the number of returning starters reflects coaching, etc.).

I'm not suggesting, and I don't believe thepowerrank is either, that this one variable is the only lens through which to view teams in 2013. It's another way to view teams headed into next year. (The offseason sucks!)

For the nitpickers, there are probably ways to improve the analysis: study this over multiple years, weight the recruiting classes so the freshman class has less impact, rate the players on the roster, rate only the two-deep or starters, exclude players who have left the team and include incoming transfers, etc.

The bottom line, which few would argue with, is that recruiting better players is likely to result in a better team.