Gold Mining Explained

We wanted to provide a more useful way of understanding how NFL players are projected to perform week in and week out in Fantasy Football. Wisdom of the crowd is an underlying principle leveraged by many within the statistics community to bridge the gap between pundits’ intuition and statistical forecasts. There is value in examining both perspectives. Examining expert projections can reveal biases, uncertainty, and ultimately opportunity.

R Script

Gold Mining: What are we doing?

At its core, we’re aggregating “experts’” actual projections of players’ weekly performance. “Experts” range from well-known pundits to companies’ statistical forecasting algorithms. In the end, the difference between the two is negligible, as the information we seek comes from aggregating these sources.

What’s different about this?

Many sites and people that aggregate fantasy football statistics aggregate rankings. While aggregating rankings can be helpful, the reader loses the granularity of the differences between ranks.

Simple Example:

Calvin Johnson is unanimously ranked #1

AJ Green is unanimously ranked #2

Jordy Nelson is unanimously ranked #3

If we simply left our analysis at this point, we would clearly understand the pecking order established: Johnson > Green > Nelson. However, imagine the alternative:

Calvin Johnson is projected 25 points

AJ Green is projected 16 points

Jordy Nelson is projected 15 points

Knowing that Nelson is the #3 best WR is hardly relevant when you consider he’s only 1 point behind Green. As a manager, understanding these differences is vital to get a sense of upside and ultimately value to your team.

One step deeper

We’re not satisfied with just knowing the aggregated projection as a single point estimate. The human mind tends to prefer thinking about the world in specific numbers rather than confidence intervals, but it’s also important to think of these projections in terms of the variation (uncertainty) each aggregation possesses.

Consider our previous example now with confidence intervals.

Simple Example 2.0:

Calvin Johnson is projected between 20-30 points

AJ Green is projected between 14-18 points

Jordy Nelson is projected between 5-30 points

Had we left our analysis at aggregation of projections, we may not have discovered Jordy Nelson’s extremely high ceiling. For a manager projected to lose but in desperate need of a win, starting Nelson over Green would be recommended because Nelson has a higher ceiling than Green. Something about his team, matchup, etc. is giving him a high variation which could result in big points in a big game. It is also possible, however, that Nelson could bust because he also has a low floor. Thus, you should pick the right players to start based on your team’s needs.
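The start/sit reasoning above can be sketched in a few lines of Python. This is only an illustration using the ranges from Simple Example 2.0; the function name pick_starter and the rule "highest floor when favored, highest ceiling when an underdog" are our simplification, not a complete decision procedure.

```python
# Floor/ceiling ranges (points) taken from Simple Example 2.0.
ranges = {
    "Calvin Johnson": (20, 30),
    "AJ Green": (14, 18),
    "Jordy Nelson": (5, 30),
}

def pick_starter(candidates, projected_to_win):
    """Hypothetical helper: prefer the highest floor when projected to win
    (play it safe), otherwise prefer the highest ceiling (chase upside)."""
    index = 0 if projected_to_win else 1  # 0 = floor, 1 = ceiling
    return max(candidates, key=lambda name: ranges[name][index])

choices = ["AJ Green", "Jordy Nelson"]
underdog_pick = pick_starter(choices, projected_to_win=False)
favorite_pick = pick_starter(choices, projected_to_win=True)

print(underdog_pick)  # Jordy Nelson (ceiling 30 vs. 18)
print(favorite_pick)  # AJ Green (floor 14 vs. 5)
```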

How can you pick the right players to start? See the graph below:

What’s our graph showing?

We’ve discussed the value of understanding point projections over rank projections, but getting a sense for ranks can also give us insight into players that may be incorrectly ranked from a statistical standpoint. You’ll notice that some players have higher point projections than their peers within their tier. We show different tiers in the graph using different colors; players with the same color are considered to be in the same tier. Tiers were calculated statistically with a cluster analysis using the mclust package in R. The number inside the tier is the player’s robust average across sources of projected points (using the Hodges-Lehmann estimator, also known as the pseudo-median, calculated with the wilcox.test function in R). This value can be considered the “most likely” number of points the player is projected to score. The line reflects the 10th (floor) and 90th (ceiling) percentiles of a player’s projected points across analysts.
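To make the pseudo-median and the floor/ceiling concrete, here is a minimal Python sketch. It is not the R script used for the graph: it computes the one-sample Hodges-Lehmann estimate directly as the median of all pairwise averages, whereas wilcox.test in R may give slightly different values in some cases. The projection numbers and the percentile interpolation method are assumptions for illustration, and the mclust tiering step is not reproduced.

```python
from itertools import combinations_with_replacement
from statistics import mean, median, quantiles

def hodges_lehmann(values):
    """One-sample Hodges-Lehmann estimator (pseudo-median): the median of
    all pairwise averages (x_i + x_j) / 2, including each value with itself."""
    pairwise = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(pairwise)

def floor_and_ceiling(values):
    """10th and 90th percentiles of the projections across sources.
    The 'inclusive' interpolation method is an assumption of this sketch."""
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[0], deciles[-1]

# Hypothetical projections for one player from six sources (one outlier).
projections = [12.0, 14.5, 15.0, 15.5, 16.0, 24.0]

center = hodges_lehmann(projections)
floor, ceiling = floor_and_ceiling(projections)

print(f"mean          = {mean(projections):.2f}")
print(f"pseudo-median = {center:.2f}")
print(f"floor/ceiling = {floor:.2f} / {ceiling:.2f}")
```

With these made-up numbers the pseudo-median sits below the plain mean, because the single 24-point projection pulls the mean up more than it pulls the pseudo-median.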

What can I do with all of this information?

At the end of the day, the best way to use this information depends on your situation (e.g., picking players with high ceilings when you are projected to lose and picking players with high floors when projected to win). Typically it works well to identify waiver wire pickups that have a high variance (tons of upside) or perhaps you can identify players that are sure bets week in and week out. The key is to look for players that tend to “break the mold” of the data. These players tend to have the most interesting stories.

Aggressive managers tend to try to find players that have high variation with low averages (which means other managers/experts might undervalue them).

Conservative managers tend to find solid week in and week out contributors.

I don’t get it

Ask your questions below in the comment section. We’re happy to help with more specific questions as they arise! Good luck everyone!

Great post. I totally understand the point about point projections being more valuable than rank projections, but what if I just need a rank order for the week and don’t care about the point differential between players? What methodology would you suggest using to aggregate rankings across multiple experts to determine one common ranking? Thanks.

I would just calculate an average of the projections for each player across sources. Ignoring risk level, weekly rankings would simply be based on the number of projected points for each player (#1 = player projected to score the most points, #2 = player projected to score the 2nd most points, etc.). You could adjust this if you value higher (or lower) risk players. For calculating rankings from season-long projections (as opposed to weekly projections), you would want to use the Value-Over-Replacement (VOR) for comparing across positions. For more info on the benefits of using projections instead of rankings, see here: http://fantasyfootballanalytics.net/2014/08/use-projections-not-rankings.html
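A minimal Python sketch of that approach, averaging each player’s projections across sources and ranking by the result. The per-source numbers are made up for illustration, and risk adjustment and VOR are left out.

```python
from statistics import mean

# Hypothetical weekly projections (points) from three sources per player.
projections = {
    "AJ Green": [15.0, 17.0, 16.0],
    "Jordy Nelson": [14.0, 16.0, 15.0],
    "Calvin Johnson": [26.0, 24.0, 25.0],
}

# Average across sources, then sort from most to fewest projected points.
averages = {player: mean(points) for player, points in projections.items()}
ranked = sorted(averages, key=averages.get, reverse=True)

for rank, player in enumerate(ranked, start=1):
    print(f"#{rank} {player}: {averages[player]:.1f}")
```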

We hope to eventually provide gold mining for all positions, but our current focus is on developing draft tools such as the Auction tool so they are ready for draft season. We will focus more on gold-mining when the season is in progress.

So I was reading over the “gold mining”…. Actually I have enjoyed reading all of your work and give thanks for the insight you have provided. I am a rookie using R but learning every day. Getting back to gold mining, you mention how this is calculated and list the cluster pkg, an estimator and test function. If I wanted to create these same tiers can I capture script already written? I am a bit in the dark how I can make this happen. Also, are the graphics provided or is there script for that too? Wondering if you might point me in the right direction. Again, thank you for all you do!

Hi, I have been fooling with your scripts for most of the week. I was finally able to get some charts when I realized I’m pulling data from Week 11 of last season. Any idea how I can get more recent weekly projections?

I am new to R as well and like Don I used your sample data to replicate your results from week 11 from last year. When you say use the link to the current weekly projections, do you mean I need to select the download button, choose raw data and save? Would this replace the ffa data using the script example? Sorry new to this 🙂

One way to determine whether players are on track is to use the weekly projected points as a way to track the average performance.

For instance, Jamaal Charles is projected to have 234.7 points this season. He scored 15 points in Week 1. If points are calculated with the regular season in mind (16 games, or 17 weeks if you count the bye), that pace would be equivalent to 240 points. Then basically take an EMA of both values (238 points) and use that to see if players are on track or not.
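The arithmetic in this comment can be sketched in Python as follows. The comment does not say which smoothing weight produces its 238 figure, so the weight of 0.5 below is an assumption (it gives roughly 237.35), and the function names are hypothetical.

```python
def season_pace(points_so_far, games_played, games_in_season=16):
    """Extrapolate points scored so far to a full regular season."""
    return points_so_far / games_played * games_in_season

def blend(preseason_projection, pace, weight_on_pace):
    """Weighted blend of the preseason projection and the current pace.
    weight_on_pace is an assumed smoothing weight between 0 and 1."""
    return weight_on_pace * pace + (1 - weight_on_pace) * preseason_projection

pace = season_pace(points_so_far=15, games_played=1)  # 15 * 16 = 240
blended = blend(234.7, pace, weight_on_pace=0.5)      # assumed weight

print(f"pace    = {pace:.1f}")
print(f"blended = {blended:.2f}")
```

A larger weight on the pace moves the blended value closer to 240; a smaller one keeps it closer to the preseason 234.7.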

The other thing is determining which QBs to play. Basically, take the stats of the QB and the opposing defense and determine which QB would be the best fit to play. It’s more involved than that, but it’d be interesting to see if matching defenses to QBs could account for some predictability in determining whether a QB will make his weekly projected points.

I have looked through various pages of your site and picked up what I could, but I was wondering about researching sports that don’t have detailed stats available.

As a hypothetical example, pretend there is something called ‘Super Volleyball’ and it has a virtual league going, but there aren’t player rankings such as those found for the NFL, NBA, etc.; rather, there are just the general stats relevant to the game.

At this point, what would you suggest in terms of player selection methods? Start to try to create regression models and see which variables are important?

The stats, rankings, etc. available for the NBA, NFL, and the like seem to be so far ahead of those for other sports.

Just found the site this morning, so if I am asking questions that have been answered ad nauseam, please forgive me.

I have two questions/comments:

1.) Is there a way to get the raw data on the distribution of points? It seems like this would be an application that would benefit from running Monte Carlo simulations to compare and optimize your lineup vs. your opponent’s lineup on a weekly basis.

2.) It seems like the data is strictly based on projections from experts. Is there any correlation analysis done to look at players on the same team? Seems like there should be some positive correlation in historical points between QB/WR and negative correlation between WR/RB (i.e., not enough yards and touchdowns to go around, or focusing on the running game vs. the passing game, etc.). Do you start a QB and WR on the same team to get big upside in a good game? Do you avoid starting a WR and RB from the same team (e.g., Amari Cooper and Latavius Murray in Week 3?) to avoid cannibalization of points between players?
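As a rough illustration of the Monte Carlo idea in question 1, here is a minimal Python sketch. It assumes each player’s score follows a triangular distribution bounded by the floor and ceiling with its peak at the most likely value, and that players score independently of one another (which question 2 rightly doubts). The lineups and the function name are hypothetical.

```python
import random

def simulate_win_probability(my_lineup, opp_lineup, trials=10_000, seed=0):
    """Estimate how often my lineup outscores the opponent's lineup.
    Each player is a (floor, most_likely, ceiling) tuple of points.
    Scores are drawn independently from triangular distributions,
    which is a simplifying assumption of this sketch."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        mine = sum(rng.triangular(lo, hi, mode) for lo, mode, hi in my_lineup)
        theirs = sum(rng.triangular(lo, hi, mode) for lo, mode, hi in opp_lineup)
        if mine > theirs:
            wins += 1
    return wins / trials

# Hypothetical two-player lineups: (floor, most likely, ceiling).
my_lineup = [(20, 25, 30), (5, 15, 30)]
opp_lineup = [(14, 16, 18), (22, 26, 31)]

win_probability = simulate_win_probability(my_lineup, opp_lineup)
print(f"Estimated chance of winning: {win_probability:.1%}")
```

Swapping a player in and out of my_lineup and rerunning the simulation gives a simple way to compare lineup options under these assumptions.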

Do you have any insight into estimating the confidence interval for the actual outcome (as opposed to a confidence interval for the mean of the projections)? I would presume the former is wider than the latter, so the question would be: how much wider?

For the ceiling and floor of seasonal projections, we use the 10th and 90th percentiles of projections, which results in a much wider interval. I’d like to do this for gold mining, as well. Another approach would be, for each player, to consider their historical variability in forecasting their interval.

Wow! I love your work! I am a bit of a modeler myself and looking to build some of my own tools. Can I ask, though, what algorithm (or process) you use to come up with your lineup optimizers for points on the app page? Thanks!

A second question: can you tell me a bit more about, or point me to where it is explained, where your raw data comes from? Is it just the averages projected from the multiple websites (with the weights applied) for each stat (i.e., pass yds, pass TDs, rush yds, rush TDs, etc.) per player? Then are the projections just the summation of the points per stat times the stat? Thanks!

Hello, I have a normal 14-team league and have already drafted using this website’s projections with the custom scoring settings of our league. My question is, do you have a tool I can use (a lineup optimizer) that selects the best combination of the players I drafted to start in Week 1 and moving forward? I looked at the lineup optimizer, and it seems like it’s for websites like FanDuel and for auction leagues.