Monday, October 31, 2011

A rudimentary NFL season simulation

Following a post by Tango a couple of weeks ago on the playoff systems of the various sports, I thought I'd try writing a simulation. This is an update of that work in progress. Actually, I've only started on the NFL, and I haven't even done playoffs yet, just the regular season. But I thought I'd at least share what I've got so far.

In the simulation, each of the 32 teams was assigned a "true talent," from a normal distribution with mean .500 and standard deviation .143. No team was allowed to have talent higher than .900 or lower than .100; if they did, they were moved to .900 or .100. Then, all 32 teams were moved the same amount (arithmetically) in the same direction to get the overall talent to average exactly .500. (I think this method actually reduces the expected SD below .143, but I didn't bother fixing that.)
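For anyone who wants to follow along, the talent step can be sketched in a few lines of Python. This is only an illustration of the draw, clamp, and re-centre procedure described above, not the program actually used for these runs; the function name `draw_talents` and its parameters are made up for the example.

```python
import random

def draw_talents(n_teams=32, mean=0.500, sd=0.143, lo=0.100, hi=0.900, rng=None):
    """Draw one season's true talents: normal draw, clamp, then re-centre.

    Hypothetical helper; the name and defaults are illustrative only.
    """
    rng = rng or random.Random()
    # Draw from a normal distribution and move out-of-range values to the limits.
    talents = [min(hi, max(lo, rng.gauss(mean, sd))) for _ in range(n_teams)]
    # Shift every team by the same amount so the league averages exactly `mean`.
    shift = mean - sum(talents) / n_teams
    return [t + shift for t in talents]

talents = draw_talents(rng=random.Random(2011))
print(f"league average: {sum(talents) / len(talents):.3f}")
```

Because the shift is applied after the clamp, a team can end up slightly outside the .100 to .900 range.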

The 16-game schedule is random, rather than unbalanced like the real one (with the restriction that a team can't face any other team more than twice). There is no home field advantage (although that would be easy to add in). The chance of winning each game is determined by the log5 method. There are no ties in games. Ties in the standings (division or wild card) are broken randomly.
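As a rough sketch of the game step, here is the standard log5 formula and a single coin-flip game in Python. The helper names `log5` and `play_game` are just for illustration, and the 16 games at the end use one fixed opponent rather than a real schedule.

```python
import random

def log5(a, b):
    """Chance that a team of talent `a` beats a team of talent `b` (log5)."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

def play_game(a, b, rng):
    """Return True if the team with talent `a` wins; no ties, no home field."""
    return rng.random() < log5(a, b)

# Illustration only: a .600 team playing 16 games against a .400 team.
rng = random.Random(1)
wins = sum(play_game(0.600, 0.400, rng) for _ in range(16))
print(f"single-game chance: {log5(0.600, 0.400):.3f}, wins in 16: {wins}")
```

One handy property: against a .500 opponent, log5 returns the team's own talent unchanged.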

As I said, I stopped there for now; haven't done playoffs yet. That's the next step, along with home field advantage.

Anyway, here are some results. Each result is out of 100,000 seasons. Every result came from a different run of the simulation. Results varied a fair bit per run, but I think everything is reasonably typical.

--------

I checked for all teams out of 3,200,000 (32 teams, 100,000 seasons) that finished more than 8 games above or below their talent. That's hard to do, obviously. Also, the worse or better you are, the harder it is. It's (relatively) easier for an 8-8 team to go 16-0 than for a 3-13 team to go 11-5. Amplifying that is the fact that there are a lot more 8-8 teams than 3-13 teams. However, offsetting that, a little bit, is the fact that the 3-13 team can also go 12-4 or 13-3 or better.

In any case ... there were 43 cases where a team differed from its talent by 8 games or more. Of those, 26 were teams that outperformed, and 17 were teams that underperformed.

The biggest differential was in season 98,534, where the Broncos, a team with talent of 4.56 wins (out of 16), went 14-2, for a differential of 9.44 games. That was the only team with a differential of 9 or more. Part of the reason it did so well was that it faced inferior opponents. You'd expect any given team's opponents to average 8.00 games of talent. But in that season, the Broncos' opponents' talent was only 7.45 games. Not a huge difference, but still.

Actually, when it comes to extreme events, a small difference in opponents makes a big difference in probability. Of the 43 teams in the sample, 38 of them had records that went in the direction "aided" by the opposition (in the sense that the underperforming teams played better-than-expected opponents, and vice versa). That's 38-5 in favor.

The worst team in the sample was the season 63,924 Jets, a 3.03 team that went 12-4 (playing 7.11-win opponents). The best team in the sample was a Bucs team that was expected to win 12.04 games, but instead went 4-12 (playing 8.58-win opponents).
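The check itself is simple. Here is a small Python sketch of it, using the three teams mentioned above plus one made-up row for contrast. The function names and the tuple layout are assumptions for the example, not the actual code.

```python
def differential(wins, talent_wins):
    """Actual wins minus talent, with talent expressed in wins per 16 games."""
    return wins - talent_wins

def flag_outliers(team_seasons, threshold=8.0):
    """Keep the (label, talent_wins, wins) rows whose gap is at least `threshold`."""
    return [row for row in team_seasons
            if abs(differential(row[2], row[1])) >= threshold]

sample = [
    ("Broncos, season 98,534", 4.56, 14),
    ("Jets, season 63,924", 3.03, 12),
    ("Bucs", 12.04, 4),
    ("hypothetical average team", 8.00, 9),  # made-up row, not from the sim
]
for label, talent_wins, wins in flag_outliers(sample):
    print(f"{label}: {differential(wins, talent_wins):+.2f}")
```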

--------

I also took a look at teams that went 0-16.

Those results probably aren't as realistic, because they're heavily dependent on the shape of the tail of the talent distribution ... and we really don't know what that is. Recall that we chose a normal distribution that gets truncated at .100 (1.6 wins). Both those choices -- normal, and truncated -- are arbitrary and probably not close enough to real life. (Also, teams could drop below .100 in talent from the adjustment that sets all league-years to .500.)

In addition, the other shortcuts in the simulation probably skew the results too. The mainstream results are probably right, but the extremes are extremely sensitive to some of the assumptions.

With those caveats: there were 5,663 of those 0-16 teams out of 3.2 million, and their average talent was .181, which is just under 3-13. I suspect the talent of actual flesh-and-blood 0-16 teams is higher than that, but I really don't know.
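As a rough sanity check on how talent drives the 0-16 odds, here is a back-of-the-envelope Python sketch. It assumes every opponent is exactly .500, so that log5 gives each team its own talent as its chance of winning, and that games are independent. The simulation doesn't work that way (opponents vary), so treat this as a ballpark only; `prob_winless` is a made-up helper name.

```python
def prob_winless(talent, games=16):
    """Chance of losing every game, assuming each opponent is exactly .500."""
    return (1 - talent) ** games

for talent in (0.100, 0.181, 0.300, 0.500):
    print(f"talent {talent:.3f}: P(0-16) = {prob_winless(talent):.5f}")

# Share of team-seasons that went 0-16 in the run described above.
observed_rate = 5663 / 3_200_000
print(f"observed 0-16 rate: {observed_rate:.5f}")
```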

16-0 should be exactly symmetrical, so I won't show that separately.

--------

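I checked for four-way ties, where every team in a division finishes with the same record. That happened 878 times out of 800,000 division-seasons (8 divisions, 100,000 seasons), which works out to about once every century somewhere in the league.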
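Here is the arithmetic behind "once every century," along with a sketch of the tie test, in Python. The 800,000 is 8 divisions times 100,000 seasons; `is_four_way_tie` is an illustrative name, not the actual code.

```python
def is_four_way_tie(division_wins):
    """True when every team in the division finishes with the same win total."""
    return len(set(division_wins)) == 1

ties, division_seasons, divisions_per_season = 878, 800_000, 8

per_division = ties / division_seasons
# Expected number of four-way ties per season, summed over the 8 divisions.
per_season = per_division * divisions_per_season
seasons_between = 1 / per_season

print(is_four_way_tie([8, 8, 8, 8]), is_four_way_tie([9, 8, 8, 7]))
print(f"about one four-way tie every {seasons_between:.0f} seasons")
```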

There were five seasons out of 100,000 where two divisions had a four-way tie. Actually, that might be a little high ... the test runs had only 1 or 2 such seasons.

--------

Anyway, before I start on the playoffs and repeat this for other sports leagues, I'm looking for feedback on what I've got so far. Any suggestions?

And, if you want me to run the sim and check for something in particular, let me know in the comments. It's real easy to add a couple of lines of code to check for something specific.

16 Comments:

Interesting. But do you think you could outline what it is you are trying to prove or disprove in this exercise? It is not really laid out here or in Tango's article. It is difficult to give feedback without knowing the purpose of such a simulation. Thanks. vr, Xei

Eventually, I think, the idea is to compare playoff odds among sports, so we can see if the chance of a (say) .550 team winning a championship is similar.

But, that's part of my question: what else can this do that's useful? One obvious thing is, if it turns out there's a four-way tie in an NFL division this year, we can simulate the odds of that happening, within a reasonable margin.

My first thought is that you are going to need to have the teams play a "real" NFL type of schedule, not one where they play against a random team. Better teams "usually" have tougher schedules out of their division.

And when you say you are assigning team true talent levels based on a normal distribution, how does this work?

I think you need to implement HFA if you want to make this realistic. Find out what the average HFA is in an NFL football game and add that to the home team's true talent level and subtract that from the away team's. This and the other suggestion would give you more realistic results on the distribution of final season records.

To assign a talent level, I just find a random point on the normal curve, and figure out what it is, assuming mean .500 and SD .143.

For HFA, I'll assign half to each team. The problem is, HFA is largest for a .500 team, and smaller for extreme teams. Gotta figure out what to do about that. For now, I'll just assign them all equally.

Do you have a link that explains how better teams usually have a harder schedule? Maybe I can try simulating that.

Phil, NFL scheduling follows a template. And since the schedule is unbalanced outside of the division, you might see something like NFC East First Place plays against NFC West First Place and NFC East 4th Place plays against NFC West 4th Place. Teams that finish low in the standings in their division face more teams that finished low in the standings in the other divisions, with the added caveat that some of the good teams could get worse the next season and some of the bad teams better. But for the most part it equates to the weaker teams having easier schedules than the tougher teams.

For the HFA, use your Log5 on a .700 team vs. a .500 team; whatever the answer is (Log5 gives .700 for that matchup), just add/subtract the HFA factor from it to come up with the new win expectancy. If the HFA is 0.050 and the .700 team is at home, its win expectancy at home becomes .750 vs. the .500 team.
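A minimal Python sketch of that additive adjustment follows. The 0.050 is just the illustrative figure from this comment, not a measured NFL value, and clamping the result to the 0-to-1 range is an added assumption so extreme matchups can't go past certainty. For reference, log5 on its own returns .700 for a .700 team against a .500 team, so the home figure here comes out at .750.

```python
def log5(a, b):
    """Chance that a team of talent `a` beats a team of talent `b` (log5)."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

def home_win_prob(home, away, hfa=0.050):
    """Log5 chance for the home team, shifted up by a flat home-field bonus."""
    p = log5(home, away) + hfa
    return min(1.0, max(0.0, p))  # clamp is an assumption, not from the comment

print(f"{home_win_prob(0.700, 0.500):.3f}")   # .700 team at home
print(f"{home_win_prob(0.500, 0.700):.3f}")   # .500 team at home
```

A flat bonus like this gives extreme teams the same boost as average ones, which is the issue Phil raises about HFA being largest for a .500 team.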

The NFL schedule goes like this. There are 32 teams. 2 conferences and each conference has 4 divisions. Each division has 4 teams.

Teams play other teams in their division twice a year, one at home, one on the road. That makes up 6 games.

Teams play every team in a division in the other conference. This rotates every four years. This makes up 4 games.

Teams play every team in a division in their own conference. This rotates every three years. This makes up 4 games.

Teams play the "equivalent teams" in the other two divisions in their own conference. "Equivalent teams" are defined as the same order of finish in the previous season. This makes up 2 games.

Example: The Patriots won the AFC East last year. They play six games against teams in their own division (Dolphins, Jets and Bills). This year, the AFC East plays the NFC East and the AFC West. And lastly, the Patriots play the 2010 winners of the AFC North and AFC South - the Steelers and the Colts.
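The template above (6 + 4 + 4 + 2 games) can be sketched in Python like this. The function name `build_opponents` and its arguments are made up for the illustration, and the AFC West and NFC East rosters are filled in from the 2011 divisional alignment rather than from the comment itself.

```python
def build_opponents(division_rivals, intra_conf_division,
                    inter_conf_division, same_place_teams):
    """List a team's 16 opponents under the template described above."""
    opponents = []
    opponents += list(division_rivals) * 2   # 6 games: home and away vs. 3 rivals
    opponents += list(intra_conf_division)   # 4 games: a division in own conference
    opponents += list(inter_conf_division)   # 4 games: a division in other conference
    opponents += list(same_place_teams)      # 2 games: same finish, other 2 divisions
    if len(opponents) != 16:
        raise ValueError(f"expected 16 games, got {len(opponents)}")
    return opponents

# The Patriots example from this comment.
patriots_2011 = build_opponents(
    division_rivals=["Dolphins", "Jets", "Bills"],
    intra_conf_division=["Broncos", "Chiefs", "Raiders", "Chargers"],  # AFC West
    inter_conf_division=["Cowboys", "Giants", "Eagles", "Redskins"],   # NFC East
    same_place_teams=["Steelers", "Colts"],
)
print(len(patriots_2011), patriots_2011.count("Jets"))
```

Home and away assignments are left out; they would only matter once HFA is added.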

Effectively, the only difference in the schedule these days is the two games between same-standing teams between same-conference divisions and their own talent level. The Patriots get a weaker schedule than the Dolphins because the Patriots get to play the Dolphins, and the Dolphins can't play themselves.

I think if you wanted to do a simulation that reflects reality, you will have to do multiple-season simulations so it accounts for the change in scheduling. I'm guessing this is not a big issue in other major sports because they play so many more games. But in a 16-game season, scheduling can make a big difference. The Patriots and the Colts were two of the best teams in the AFC over the last few years and they got to play each other almost every year. Without that scheduling difference, both would have had easier schedules.

Apologies if this has already been suggested. As mentioned above, the NFL season follows the exact same template each year. You can take this season's schedule (or any recent season's) and use it for every year in your sim.

If you assign team strengths randomly in each simulated season, it doesn't matter if you use 2010's schedule or 2008's or 20XX's. (With one exception: the 2 'strength of schedule' games each team has will correlate with their team strength, because there is a correlation between last year's team strength and this year's. In other words, better teams will likely have better opponents for 2 games out of 16.)

You can rename the teams TEAM1 TEAM2 instead of ARI, ATL, etc. if it helps conceptualize.

Brian: right, that's exactly what I was going to do. Actually, I had used team numbers instead of names but found it was more interesting for me if I used real team names. So I added them to my program.

The two games against correlated teams probably aren't a huge deal, because it's by actual wins instead of talent ... And, moreover, it's standings position and not even actual wins. But it's still significant enough that I'll try to deal with it.

Couldn't you use your simulations to estimate how much better than chance experts should perform in picking NFL (or baseball) playoff winners?

A 27 Sep post is concerned with a Freakonomics claim that experts are not picking winners very well (see below). Your simulations might be used to show that they are doing as well as can be expected since they are basing their predictions on team strength. The random component is the only factor that keeps them from doing better.

All you would have to do would be to count how often the teams with the highest ratings in your simulations qualified for the playoffs.
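A very stripped-down Python sketch of that kind of count is below. It looks at one four-team division at a time and asks how often the team with the highest talent finishes first. Everything here is a simplifying assumption for illustration: talents are drawn as in the post but not re-centred, each team plays 16 independent games against .500 opponents instead of a real schedule, and first-place ties are broken at random. The function name is made up.

```python
import random

def top_talent_division_rate(n_seasons=10_000, sd=0.143, games=16, rng=None):
    """Share of simulated divisions won by the team with the highest talent."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_seasons):
        # Four talents per division, clamped to .100-.900 as in the post.
        talents = [min(0.9, max(0.1, rng.gauss(0.5, sd))) for _ in range(4)]
        # Against a .500 opponent, log5 gives each team its own talent.
        wins = [sum(rng.random() < t for _ in range(games)) for t in talents]
        best = max(wins)
        # Break ties for first place at random.
        winner = rng.choice([i for i, w in enumerate(wins) if w == best])
        if winner == talents.index(max(talents)):
            hits += 1
    return hits / n_seasons

rate = top_talent_division_rate(n_seasons=2_000, rng=random.Random(0))
print(f"best team won its division in {rate:.1%} of simulated seasons")
```

Whatever figure comes out is a ceiling for this toy setup only; a fair comparison with the experts would need the full schedule and the real talent spread.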

--- quoted from 27 Sep post ---
According to this Freakonomics post, the experts who make NFL predictions aren't very good at it. Freakonomist Hayes Davenport checked the past three years' worth of predictions from USA Today, Sports Illustrated, and ESPN. He found that the prognosticators correctly picked only 36% of the NFL division winners.

"The biggest differential was in season 98,534, where the Broncos a team that had talent of 4.56 wins (out of 16), but went 14-2, for a differential of 9.44 games. That was the only team with a differential of 9 or more."