Since analytics entered the mainstream, there have been two competing camps in player evaluation: the stats-based, box-score analysts and the film-study analysts. The best evaluators blend both approaches so that each builds on the other.

Although PFF is first and foremost a data company, one that helped meet the growing demand for more worthwhile stats beyond the box score, we still hold to many of the core tenets of film study. All of our stats and data are derived from a team of film-study analysts charting and grading every player on every play of every game. Still, from the outside, people point to a disconnect when the stats don’t match what their eyes are seeing on the field.

And there is some truth to this skepticism: outside of tracking data, a purely stats-based approach can miss subtle nuances that can only be picked up through the keen eye of a veteran scout grinding film. At its best, this information is documented in written form to communicate findings to others, much as box-score stats can paint a picture of a prospect’s potential. Phrases like “prototypical size,” “body control,” “effortless motion” and “understands timing and touch” are descriptors that, based on film, paint a picture of how a prospect could perform at the next level. No publicly available stats currently account for these subtle indicators observed on film.

In PFF’s evaluation of NFL prospects, we like to leave no stone unturned. That is why we are always looking to blend all schools of thought to most accurately predict how well a prospect will perform at the next level. Thanks to math and feature engineering, we can use some of these language descriptors to fit prospects from various years into tighter groupings, or clusters. We can then tie in advanced descriptive stats that we have built previously to gauge how well prospects who fit a certain mold have performed in the NFL.

We took prospect write-ups from one of the best film analysts out there, Dane Brugler of The Athletic, over the past six seasons (including 2020) and used latent semantic analysis (LSA) to derive similarity scores between the text of prospects’ scouting reports. With a dataset spanning six seasons, we can create a prospect’s score in a number of ways; we decided to use a weighted average of similar players’ WAR, using those similarity scores as the weights.
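
To make the two steps concrete, here is a minimal sketch of the general technique, not PFF's actual pipeline. It assumes raw word counts as the text features, a truncated SVD with two latent dimensions, cosine similarity as the similarity score, and negative similarities clipped to zero before weighting; none of those choices are specified in the article. The player names, report text, WAR values and function names are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: names, report text, and WAR values are invented
# for illustration and do not come from any real scouting report.
past_reports = {
    "QB A": "prototypical size strong arm effortless motion",
    "QB B": "prototypical size strong arm slow release",
    "QB C": "undersized scrambler quick release improvises",
    "QB D": "undersized scrambler improvises erratic touch",
}
past_war = {"QB A": 2.0, "QB B": 1.0, "QB C": 0.5, "QB D": -0.2}
prospect_name = "Prospect X"
prospect_report = "prototypical size strong arm quick release"


def lsa_doc_vectors(docs, k=2):
    """Build a document-term count matrix and keep the top-k SVD dimensions."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    col = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for word in doc.split():
            counts[row, col[word]] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]  # one k-dimensional vector per document


def cosine_similarities(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid dividing by zero for empty documents
    unit = vectors / norms
    return unit @ unit.T


def weighted_war(sim_row, names, war):
    """Similarity-weighted average of WAR over players with a known WAR."""
    num, den = 0.0, 0.0
    for name, sim in zip(names, sim_row):
        if name not in war:
            continue  # skips the prospect, who has no NFL WAR yet
        weight = max(float(sim), 0.0)  # assumption: ignore negative similarity
        num += weight * war[name]
        den += weight
    return num / den if den > 0 else float("nan")


names = list(past_reports) + [prospect_name]
docs = list(past_reports.values()) + [prospect_report]
sims = cosine_similarities(lsa_doc_vectors(docs, k=2))

prospect_sims = sims[-1]  # the prospect is the last document
score = weighted_war(prospect_sims, names, past_war)
top_comp = max(past_war, key=lambda n: prospect_sims[names.index(n)])
print(f"{prospect_name}: closest comp {top_comp}, score {score:.2f}")
```

A production version would more likely use TF-IDF weighting, far more latent dimensions and a much larger corpus of reports. Clipping negative weights keeps the score within the range of the comparison players' WAR.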

Using the analyses above, we can look at 2020 prospects in a couple of ways. First, we can look at player comps of interest for some notable prospects. Second, we can rank the players in each position group by the score derived above. These scores have correlated well with draft position and future WAR generated at the NFL level, although a more robust analysis using more seasons and sources of data is beyond the scope of this article.

In this installment, we look at the quarterback position: