Coach evaluation is a difficult task to approach analytically. I’m currently working on a paper on NBA coaching with other HSAC members, and it’s been very difficult work. Coaches can’t be held entirely responsible for the action on the field, and the limited influence they do have is exerted through many different channels. It’s a situation where you have to find a quantity of indeterminate size in an indeterminate number of places. Not fun.

But that doesn’t mean we should forget it altogether. Over the past week and a half, I’ve been developing a simple, intuitive coach evaluation system for the NCAA Football Bowl Subdivision (formerly Division I-A). It’s called RAP, which stands for Resource-Adjusted Performance. Learn about the methodology and the results, after the jump.

The idea behind RAP is that college coaches have a lot more control over their team than their professional counterparts. Not only do they make in-game decisions and playing time calls, but they’re also responsible for the composition of the team through recruiting, and their role in developing athletes is more important than at a professional level. In turn, they receive a lot more credit or blame for the team’s performance.

However, coaches can’t control the resources that are given to them. Southern Mississippi’s football expenses are less than 1/10th of Ohio State’s, and even if they spent the same amount, Southern Miss still wouldn’t have the prestige that makes recruiting star prospects a lot easier. With that in mind, RAP is essentially the following formula:

(Team Performance Points) + (Team Resource Points)

RAP isn’t a statistic so much as a framework: there are a lot of ways to represent team performance and team resources, so it’d be dumb if I didn’t let you choose your own. However, for the sake of comparison, there are a few necessary components:

The points have to be based on ranking: The worst-performing team gets 1 point, and so on until the best team gets n points, where n is the total number of teams. Similarly, the team with the most resources available gets 1 point, and so on until the team with the least gets n points.

Team resource points have to be scaled down. Once the resource point total is calculated, it has to be multiplied by the coefficient of determination, R² (which will be between 0 and 1), between performance rank and resource rank.

Finally, the sum of the two components has to be scaled so that the average RAP is 1.
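The three requirements above can be sketched in a few lines of Python. This is purely illustrative: the `rank_points` helper, the toy team values, and the `r_squared` argument are my own stand-ins, not part of any official RAP implementation.

```python
def rank_points(values, ascending=True):
    """Assign 1..n points by rank. With ascending=True, a larger value
    earns more points (use for performance); with ascending=False, a
    smaller value earns more points (use for resources)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    points = [0] * len(values)
    for pts, i in enumerate(order, start=1):
        points[i] = pts
    return points

def rap(performance, resources, r_squared):
    """RAP per team: performance points plus R^2-scaled resource points,
    normalized so the average RAP across all teams equals 1."""
    perf_pts = rank_points(performance, ascending=True)   # best team -> n points
    res_pts = rank_points(resources, ascending=False)     # biggest budget -> 1 point
    raw = [p + r_squared * r for p, r in zip(perf_pts, res_pts)]
    mean = sum(raw) / len(raw)
    return [x / mean for x in raw]

# Hypothetical three-team league; the ratings and budgets are invented.
scores = rap([0.30, -0.05, 0.12], [32e6, 9e6, 18e6], r_squared=0.45)
```

In this toy example the best-performing team also has the biggest budget, so its resource bonus is small, while the low-budget team gets the largest (R²-dampened) boost.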

The cool thing about RAP is that coaches can be compared across any level of competition or even any sport. There’s no reason you couldn’t separately calculate RAP for FCS (Division I-AA) or Division I Basketball, and the scale would be the same. If resources are more of a factor in some sports than in others, the R² adjustment would account for that.

I computed RAP for the FBS using the Fremeau Efficiency Index for team performance points, and each program’s 2008 fiscal year expenses for team resource points. If you haven’t heard of FEI, check it out: unlike other computer NCAA rankings, FEI analyzes every single possession, and it not only accounts for strength of schedule but also weights those possessions for importance. I’m convinced it’s better than any of the computer rankings that go into the BCS. The 2008 expenses come from this fantastic site from the Department of Education, which includes data for all college sports programs except those of the Armed Forces; that meant I had to exclude Army, Navy, and Air Force. Unfortunately, it does not yet have data past 2008.

The R² between FEI rank and 2008 expense rank was .45. So my specific formula was:

(FEI Rank Points) + .45 × (Expense Rank Points), scaled so that the average RAP across all teams is 1.
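For anyone who wants to reproduce the scaling coefficient, the R² here is just the squared Pearson correlation between the two rank lists. A minimal sketch, with invented ranks standing in for the real FEI and expense data:

```python
def r_squared(xs, ys):
    """Coefficient of determination between two equal-length lists,
    computed as the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Invented ranks for six teams; the real figure (.45) used every FBS team.
fei_rank = [1, 2, 3, 4, 5, 6]
expense_rank = [2, 1, 4, 3, 6, 5]
coefficient = r_squared(fei_rank, expense_rank)
```

Since both inputs are rankings, this is equivalent to squaring the Spearman rank correlation between performance and resources.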

The top 10 coaches all had teams that performed well this year, but to varying degrees: Butch Jones and Houston Nutt coached teams that didn’t even finish in the top 25. And the budgets obviously varied quite a bit as well, although no one in the top 10 in budget appeared in the top 10 in RAP (Mack Brown and the Longhorns were the first of those to appear, at 16th).

RAP is not intended to be perfect, and it certainly is not. There is likely a lot of noise that I’m attributing to the coach’s skill (the previous coach’s prowess, luck, freak injuries, etc.), which is unfair. The calculations I did were based only on 2009 performance, and would likely be more accurate with a larger sample. Additionally, some coaches can affect the resources at their disposal through success and longevity (Joe Paterno and Bobby Bowden being the best examples). The two statistics I chose in this case are also problematic: FEI is a proprietary stat, and it’d be nice to have an open-source number for a lot of reasons. And the expense figures are already a bit out of date. The list goes on.

I want RAP to add to the conversation, not to replace it. By having a number that represents coaching skill, we can do some studies and calculations a little bit better and easier. The first one will be on Monday. Stay tuned.

Joe Tiller at Wyoming? Is that the son of the “original” Joe Tiller that left Wyoming for Purdue? Because *that* Joe Tiller retired from Purdue last year, and I’m pretty sure he didn’t go back to Wyoming.

Absolutely right. I compiled the coach affiliations from a single, older source and then corrected one by one as I went through. He’s way off, though. Must have just missed that one. I’ll fix it on the chart, thanks for picking it up.

I’m flattered by the use of FEI as your performance baseline. I certainly like the idea of this, conceptually. As you mentioned, certain aspects of team performances shouldn’t all be attributed to the current coach, but as a broad brush, it works well.

I’m not as keen on using the ordinal ranking of a team for performance instead of its rating, though, as it appears you do. I’d also be curious about using a team’s performance vs. its own historical program expectations as a baseline. Program FEI is one such tool we use to smooth out year-to-year fluctuations.

How valid is it to have a “linear” difference (i.e. 1 point) between each team in performance and resources? Would it make any additional sense to “scale” the points credited to each based on another metric? I feel like there’s likely to be “clumps” of both performance and resources (say, 25 programs have budgets within 1 million dollars of each other, or, oppositely, 2 programs ordered consecutively in resources differ by 20 million) that would pose a large difference between a linear allocation of resource points and a scaled allocation.

I’ve exchanged some notes with Brian via the comments section over at FO, and I’m pleased that he seems very open-minded toward trying to make a comprehensive attempt to rank teams even better.

I’m concerned that a system that truly reflects reality hasn’t been invented yet. Strengths of schedule vary widely. Attempts I’ve seen to reflect that don’t seem to touch on the multi-game challenge of facing several top teams in a row…or a schedule heavy in “hammer” type teams rather than “nails.” Boise State faced 1 team in the Sagarin top 50 this year, for example. TCU was at 3. Anyone in the SEC was well over twice that, some three times that.

It’s a different type of challenge…dealing with the physicality, attrition, fatigue that the power conferences face. Trade Arkansas for Boise State, and Arkansas crushes the WAC while Boise deals with attrition and struggles to contend in the SEC. Give Boise a six-game run of Georgia, Alabama, Texas A&M, Auburn, Florida, Ole Miss without a week off. Are we talking about a 12-0 team then? Will the FEI rating that reflects the new record (and perhaps additional poor results later because of injuries) showcase Boise in a way that will celebrate Petersen’s use of resources?

FEI only had one Pac 10 team in the top 25 last year in the final ratings. That league would go on to sweep the bowls 5-0. The Pac 10 is 10-2 the last three years in bowl games vs. only BCS conference opponents. It’s unlikely only one Pac 10 team was truly top 25 last year.

FEI only had 2 SEC teams in the top 22 last year at the end. The SEC has been dominant in heads-up competition vs. other conferences the last few years…and is 14-6 vs. BCS leagues in the postseason. Only 2 in the top 22?

FEI had 6 ACC teams in the top 18 of its final rankings. The ACC would go 2-6 against BCS conference opponents in bowls last year, and is 6-15 against them the last three years.

I agree that it’s interesting to evaluate coaches in light of their resources. I think the stathead nation isn’t really in the neighborhood yet of measuring how Division I teams really rank in terms of quality on a fair scale…other than just variations of “ranking the records” the polls and most simplified computer programs are already doing. Chris’s game attempts to rank everyone still aren’t capturing what happens when the top conferences face each other, or when players from the top conferences advance to the next level.

The SEC is owning postseason play (3 national championships the last 3 years and a shot at a fourth straight). They’re sending more guys to the NFL (highest per conference average of anyone on current pro rosters). Make THAT the threshold for any team under consideration. Where would they finish in the SEC, playing an SEC schedule? Would TCU, Cincinnati, and Boise remain unscathed? Who would be more of an 8-4 type team? 7-5? Where would Auburn or Georgia finish in the WAC, Mountain West, or Big East if they played conference schedules in those weaker leagues?

I think FEI and RAP will pop some great insights once that reality is better captured.

For now, I’m concerned that RAP will more reflect the “illusions coaches outside the real power conferences create in their best seasons” rather than sustained quality adjusted for resources. What did Petersen’s predecessors at Boise State do at Arizona State and Colorado when they were given more resources? They surely would have ranked great in RAP at Boise, and lousy at ASU and Colorado. Same guys.