This is a guest post from Ed Feng. Ed founded The Power Rank to bring more analytics and visualization to sports, football in particular. It all started when he got inspired to apply his Ph.D. research in physics to ranking sports teams. Now he hopes to get young people interested in math through sports.

How good is an NFL coach?

How do you compare established coaches like Bill Belichick with upstarts like Jim Harbaugh?

As a reader of Advanced NFL Stats, you know the answer is not Super Bowl rings. In the playoffs, a team plays at most 4 games, a small sample size.

The best coaches will not always win the Super Bowl. Most who saw the two Super Bowl wins by the New York Giants or last year's run by Baltimore would agree.

Instead, let's look at winning percentage in the regular season to evaluate coaches. However, sample size is still an issue.

Let me share a story about why.

How to make a stat head puke

In 2009, Josh McDaniels got off to a 6-0 start as the new head coach of the Denver Broncos. If you remember any game from this streak, it was their win over Cincinnati in the opener. Late in the 4th quarter, Brandon Stokley caught a tipped pass and rumbled into the end zone for a game-winning touchdown. The play was immortalized as the Immaculate Deflection.

After his 6-0 start, ESPN's Tom Jackson proclaimed McDaniels "one of the great ones." I nearly soiled the carpet with vomit.

Denver would win only 2 more games the rest of the season. They missed the playoffs in the last game of the season by losing to Kansas City, a team that finished 4-12.

Denver started the next season 3-9. Coach McDaniels was fired.

Never judge a coach after 6 games.

Coach Average

Let's look at why no one should judge a coach after 6 games.

Consider the fortunes of Coach Average. He coaches in a league with such parity that the outcome of each game is like flipping a coin. Coach Average has a 50% chance to win each game.

Coin flipping isn't a bad model for NFL games in the salary cap era. Later, we'll look at which coaches do better than random coin flips.

But first, let's look at some simulation results. The visual shows the first 50 games of Coach Average's career.

Just for the record, I was committed to only generating this sequence once. In no way did I ask Python to perform this calculation numerous times in hopes of finding a streak of 6 consecutive wins.

However, there are clearly streaks of wins and losses. In fact, Coach Average goes on a 10 game tear starting in his 19th game. It's easy to see order (or coaching skill) in randomness.
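For readers who want to try this themselves, here is a minimal sketch of a coin-flip career. It is not the code behind the visual; the function names and the seed are made up for illustration, and a different seed will give a different sequence of wins and losses.

```python
import random


def simulate_career(n_games, win_prob=0.5, seed=None):
    """Return a list of booleans, True for a win, one per game."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    return [rng.random() < win_prob for _ in range(n_games)]


def longest_win_streak(results):
    """Length of the longest run of consecutive wins."""
    best = current = 0
    for won in results:
        current = current + 1 if won else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    career = simulate_career(50, seed=2013)
    print("".join("W" if won else "L" for won in career))
    print("wins:", sum(career))
    print("longest win streak:", longest_win_streak(career))
```

Running it with a handful of seeds is a quick way to see how often a purely average coach strings together an impressive run.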

How the law of large numbers applies to football

Overall, Coach Average wins 31 of his first 50 games for a win percentage of 62%. Since 1991, only Tony Dungy, Bill Belichick and Bill Cowher have coached 200 or more games with that high a win rate. Tom Jackson would start Coach Average's petition for the Hall of Fame.

Since I didn't know how many games would fit in the visual, I simulated 200 coin flips for Coach Average. He won 106 of those 200 games for a win percentage of 53%. As the number of flips increases, the win rate converges towards 50%, the true win rate in the model.

This is an example of the law of large numbers, the reason why no one should judge a coach after 6 games.
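The convergence can be watched directly by tracking the win percentage after every game. The sketch below assumes the same coin-flip model; the helper name and the seed are illustrative, so the printed numbers will not match the 31-of-50 and 106-of-200 career described above.

```python
import random


def running_win_pct(results):
    """Win percentage after each game, as a list the same length as results."""
    wins = 0
    pcts = []
    for games_played, won in enumerate(results, start=1):
        wins += won  # True counts as 1, False as 0
        pcts.append(wins / games_played)
    return pcts


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed
    flips = [rng.random() < 0.5 for _ in range(200)]
    pcts = running_win_pct(flips)
    for n in (6, 16, 50, 200):
        print(f"after {n:3d} games: {pcts[n - 1]:.3f}")
```

Early values tend to swing widely, while later ones settle closer to 0.5.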

A look at great coaches

The coin flipping model provides a baseline to compare coaches.

In the visual, the black line represents one standard deviation away from the expected 50% win probability. For example, the standard deviation is 12.5% for one 16 game regular season.
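The 12.5% figure comes from the standard deviation of a binomial proportion, sqrt(p(1-p)/n). A short sketch (the function name is just for illustration) shows how the band narrows as the number of games grows:

```python
import math


def win_pct_std(n_games, win_prob=0.5):
    """Standard deviation of the win percentage over n_games coin flips."""
    return math.sqrt(win_prob * (1 - win_prob) / n_games)


if __name__ == "__main__":
    for n in (6, 16, 50, 200):
        print(f"{n:3d} games: {win_pct_std(n):.3f}")
```

Because the standard deviation shrinks like one over the square root of the number of games, quadrupling the sample only halves the width of the band.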

Let's call a set of 16 coin flips a season. If we simulate a season a million times, 2 out of every 3 seasons would have a win percentage within 12.5% of the expected 50% for Coach Average.
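The two-in-three figure is what the normal approximation gives. With only 17 possible win totals in a 16 game season, the exact share depends on whether the boundary records (6-10 and 10-6, exactly 12.5% from 50%) count as inside the band. A sketch of the exact calculation, with a made-up helper name, avoids the need for a million simulations:

```python
from math import comb


def prob_wins_between(n_games, low, high):
    """Exact P(low <= wins <= high) when every game is a fair coin flip."""
    favorable = sum(comb(n_games, k) for k in range(low, high + 1))
    return favorable / 2 ** n_games


if __name__ == "__main__":
    # 12.5% of 16 games is 2 games, so the band runs from 6 to 10 wins.
    print("6 to 10 wins:", round(prob_wins_between(16, 6, 10), 3))
    print("7 to 9 wins: ", round(prob_wins_between(16, 7, 9), 3))
```

Counting the boundary records gives about 79%; leaving them out gives about 55%. The two-in-three rule of thumb sits between those values.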

The visual also shows the regular season win percentage for every coach since 1994, the year the NFL imposed a salary cap. I also included Bill Belichick and Bill Cowher, even though their tenures as head coach started a few years before 1994. (Cowher is the unlabeled data point above Andy Reid; Tom Coughlin is the other coach with over 200 games that I couldn't label.)

The visual lets you see the greatness of coaches like Tony Dungy and Bill Belichick. No tests of statistical significance required.

Three other coaches stand out in this visual.

Andy Reid. Despite a 4-12 record in 2012, Reid did a remarkable job, winning more than 58% of his games in Philadelphia. Eagles fans never gave him enough respect because he never won a Super Bowl. The visual shows his win percentage is more than 2 standard deviations better than 50%.

Jim Harbaugh. The San Francisco coach is off to a fast start, winning 24 of 32 games after two seasons. But judging from the elite coaches with more than 200 games under their belt, Harbaugh will not continue to win in excess of 76% of his games.

Mike Smith. The Atlanta coach finally won his first playoff game over Seattle last season. However, his regular season record has been stellar, posting a win percentage of over 70% for 5 seasons. Keep that guy around.
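To put a number on "standard deviations better than 50%", one can compute a z-score for any record under the coin-flip model. The sketch below uses two records quoted in this post, Harbaugh's 24 wins in 32 games and Coach Average's 31 wins in 50; it ignores ties, which is a simplifying assumption, and the function name is illustrative.

```python
import math


def z_score(wins, games, win_prob=0.5):
    """How many standard deviations a win total sits from the coin-flip mean."""
    expected = games * win_prob
    std = math.sqrt(games * win_prob * (1 - win_prob))
    return (wins - expected) / std


if __name__ == "__main__":
    print("Harbaugh, 24 of 32:     ", round(z_score(24, 32), 2))
    print("Coach Average, 31 of 50:", round(z_score(31, 50), 2))
```

Harbaugh comes out around 2.8 standard deviations above average. Coach Average's lucky first 50 games come out around 1.7, a reminder that pure chance can look respectable over a few seasons.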

In a league as competitive as the NFL, random coin flips are not a bad model for game outcomes. Coaches that outperform this model should be kept, and data visualization captures the big picture.

It would be interesting to see how well (or poorly) coaches/GMs do in terms of record improvement as a function of the % change in their starting roster, with a change in team obviously representing 100%.

Could you further weight the wins by multiplying them by the opponent's end-of-season record, so that wins against good teams matter more? I'm thinking about things like Dungy's tenure in a division with the Jags, Texans, and Titans. Were any of those teams consistently above .500?

Building on Anonymous's comment, it would be interesting to see the performance of the great, long-time coaches (Belichick, Fisher, Dungy, etc.) over time. For example, a line representing each coach's cumulative win % throughout his career.

That's a fun chart, but it seems like other personnel are a big factor. Consider for example that Bill Belichick and Tony Dungy are both excellent coaches who got average(ish) performance until they teamed up with Tom Brady and Peyton Manning, respectively. Jeff Fisher's record - minus the Steve McNair years - is also pretty average, and Andy Reid's tenure is largely also Donovan McNabb's.

The other two points in that top right quadrant are probably Bill Cowher and Tom Coughlin, whose careers are a bit more interesting - Coughlin's good seasons in Jacksonville don't line up with Brunell's stint well, and Cowher had success with several different QBs.

As an illustration of this point, my stats prof did a cool experiment. He asked us to go home and flip a coin 100 times and write down the results. We were 4th year math majors, so you can imagine we thought we were above this child's play. He quipped under his breath that we could just quickly make them up if we wanted.

Turns out the point was he could tell by looking who made them up and who actually flipped. The forgers never put in long enough streaks. The chances of doing 100 flips *without* a streak of 7 is just ridiculously small; of course I forget how to actually do that calculation!

> The chances of doing 100 flips *without* a streak of 7 is just ridiculously small; of course I forget how to actually do that calculation!

There are 94 consecutive groups of 7 flips, and each of those has a 1 in 64 chance to be a series of 7 (either all heads or all tails). If we treat those groups as independent, the chance of getting no streaks is (63/64)^94 = .2275, a bit better than 1 in 5. By the same reasoning, it's 1 in 20 to get no streaks of 6, and 2 in 1000 for no streaks of 5.
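The overlapping groups of 7 share flips, so they are not actually independent, and the exact answer can differ from (63/64)^94. One way to get the exact value is to track the length of the current run flip by flip. The sketch below is one such calculation, not the commenter's method; the function name is made up.

```python
def prob_no_streak(n_flips, streak_len):
    """Exact probability that n_flips fair coin flips contain no run of
    streak_len or more identical outcomes (heads or tails)."""
    if n_flips < streak_len:
        return 1.0  # not enough flips to form a streak
    if streak_len <= 1:
        return 0.0  # any single flip is already a streak of 1
    # state[r] = probability that the current run has length r and no
    # streak of streak_len has appeared so far (r = 1 .. streak_len - 1)
    state = [0.0] * streak_len
    state[1] = 1.0  # after the first flip the run length is 1
    for _ in range(n_flips - 1):
        new = [0.0] * streak_len
        new[1] = 0.5 * sum(state)  # next flip differs: run resets to 1
        for r in range(1, streak_len - 1):
            new[r + 1] = 0.5 * state[r]  # next flip matches: run grows
        # runs that would reach streak_len are dropped (a streak occurred)
        state = new
    return sum(state)


if __name__ == "__main__":
    for k in (5, 6, 7):
        exact = prob_no_streak(100, k)
        approx = ((2 ** (k - 1) - 1) / 2 ** (k - 1)) ** (100 - k + 1)
        print(f"no streak of {k}: exact {exact:.4f}, independent approx {approx:.4f}")
```

Streaks tend to cluster, since a run of 8 contains two overlapping runs of 7, so the exact chance of seeing no streak at all comes out higher than the independence approximation suggests.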

It would be interesting to see the graph updated with the bound for two standard deviations as well. If I remember my statistics right, Coach Average has about a 30% chance to fall outside of one standard deviation, but only about a 5% chance to fall outside of two standard deviations. This would be a far better indicator for a coach performing above or below an average level. Of course it might turn out that all coaches, even the consensus great ones, fall within two standard deviations.

Interesting that we see so many coaches grouped around 2-3 SDs below the mean in 50 games or less. Is this the opposite of survivor bias? Coaches unlucky enough to be a couple of standard deviations below the mean after 3 or 4 seasons get fired and never coach again, so they never get a chance to regress back to the mean. And coaches that are 2-3 SDs above the mean after 50 or so games keep their jobs, so they show up further to the right on the chart (and most regress back to the mean).

I'm going to play devil's advocate and point out the obvious flaw in the data set. NFL coaches are a prime example of survival bias because losing coaches usually get fired at the end of the season. You can't use the coin flip model to evaluate a coach's entire career because the events aren't independent. Long running head coaches in the NFL are going to have better winning records even if there is no skill and the entire game is random because, like Andy Reid, those that are unlucky in a season will often get fired.

Yes, you can certainly weight wins for strength of opposition. In fact, you might as well go all the way and rank teams like Brian or I do. Then looking at average rating is a better way to evaluate how a coach did in a particular season.

Thomas:

Great story, and certainly a very clever test by your stats prof.

LamKram:

I think you're right on about how the plot looks. If you have really bad luck your first 4 seasons, you don't get a chance to regress towards the mean.

An interesting graph, but I'm not convinced of its utility for evaluating coaches. Is Bill Belichick so far above the mean because he's a great coach? Or is it because he coached Tom Brady for so many seasons? Is Romeo Crennel so far below the mean because he's a terrible coach, or because he had poor talent?

You can't dismiss good coaches just because they had good players in their tenure as head coach. It is their job to not only make game time decisions but also to select personnel and players to give their team the best chance of winning. That is the very reason all of the great coaches have good QBs, for example.

I'm not dismissing any great coaches. Dismissing would be if I said something like "Belichick isn't a great coach, he just had Tom Brady". What I am doing though is asking the question: how much of a head coach's W/L record is due to his performance, and how much is due to factors outside of his control? Even a successful head coach only has a career of about 20 years, so a couple of fortuitous windfalls (like a great QB) could have a big impact on a time scale like that.

I think you are giving coaches way too much credit for their role in roster selection. For starters, most teams have GMs who ultimately make the final decision. Second, there is a large amount of luck involved in the player acquisition process. To continue with Tom Brady, he's a famous example, as every team in the NFL passed on him multiple times, including the Patriots.

Many of the "bad" coaches on this graph though have it even worse, as during their short tenures they had to deal with talent that was largely in place before they arrived. Marty Mornhinweg is a pretty good example of this, his successors didn't have much more luck with Joey Harrington and company than he did.

Adding more to my reply (the second), I believe that the best coaches wouldn't lose OR GAIN wins when their rosters change because a bad coach would start winning a lot more if he suddenly had good players, whereas a good coach would have been winning anyhow.

Good Article Ed. FYI though, the Chiefs record in 2009 was 4-12. It was 2008 when they were 2-14. And to Anon, at least in Kansas City Romeo Crennel had plenty of talent to work with (as evidenced by the 6 Pro-Bowlers and one alternate on the Chiefs last year). He was just an absolutely terrible game manager and disciplinarian.

P.S. Why are people so quick to dismiss Tony Dungy? Peyton Manning, while IMO the best QB of all time, didn't go out and play by himself. He was a major factor, but it's silly to say he caused a coach to be 4 standard deviations better than 50-50. Dungy was a great coach.

If I coach on a team with a great front office that supplies me with great players I will look better than I really am. Same for a coach on a team with a weak front office, that coach has less to work with.

Anon, if you had read the article you'd know who the dots are by Reid and Fisher: "(Cowher is the unlabeled data point above Andy Reid; Tom Coughlin is the other coach with over 200 games that I couldn't label.)"

If you can assume that some coaches are more talented than others, then wouldn't we expect a cluster of lower left quadrant stats? The average entrant to the coaching ranks should be less talented than the average coach already in the ranks.

Yes, dumb luck should put other coaches there as well, but anti-survivor bias shouldn't be the only thing that swells that quadrant.

I think a record can be very related to a team's talent, more than the coach of the team itself. Personally, I don't like Bill Belichick, even as a Patriots fan. He has only gotten lucky with team talent (Tom Brady). Agree or not, I think this data, although very interesting, is somewhat misleading.

I completely refute the idea that a team's talent is a major factor in a winning record. All NFL teams have comparable levels of raw talent. If they don't, that is the fault of the organization. Parity in talent is the natural outcome of selecting a handful of players each year from a pool of thousands of talented candidates coupled with the 'parity' rules of drafting and salary caps. Evidence abounds. Witness the 2008 Patriots who lost Brady for the entire season in the opening game against K.C. They finished 11-5 with quarterback Matt Cassel, who hadn't started a football game since high school. Witness this season in which they lost their top 3 receivers, and then their top 3 defensive players, and they are still sitting at 7-2. Counter examples also abound, in which the raw talent of a team is irrefutable yet they don't win in proportion. I would argue that the success of an NFL team is foremost based on organizational effectiveness, in which the head coach is - while not the only piece - the most important piece.
