The 2017 QB Prospect Model: Draft Deshaun Watson at Your Own Peril

Last year I created a model to predict which quarterback prospects would succeed at the NFL level. The model was useful in that it helped identify traits that transfer to the NFL, but it was certainly overfit and didn’t bring in enough data. I’ve updated the model for 2017, and the improvements paint a much clearer picture of each QB’s NFL prospects. The results will probably surprise you.
Like last year, I’m going to define success by a QB throwing for an AYA of 7.0 or higher for at least one season in his career, while also starting 8-plus games in that season.
To refresh your memories, this criterion is useful for the following reasons:

The data was easy to get.

It leaves out rushing statistics, so we’re focusing only on passing success.

AYA incorporates touchdowns and interceptions along with yards, all of which are fantasy-relevant stats.
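For reference, AYA here is the standard Adjusted Yards per Attempt formula, which credits each touchdown pass with a 20-yard bonus and debits each interception 45 yards. A minimal sketch of the success criterion (function names are mine, for illustration):

```python
def aya(pass_yards, pass_tds, ints, attempts):
    """Adjusted Yards per Attempt: TDs are worth a 20-yard bonus,
    INTs a 45-yard penalty, all divided by pass attempts."""
    return (pass_yards + 20 * pass_tds - 45 * ints) / attempts

def hit_success_criterion(pass_yards, pass_tds, ints, attempts, games_started):
    # Success = at least one season with an AYA of 7.0 or higher
    # while also starting 8-plus games in that season.
    return games_started >= 8 and aya(pass_yards, pass_tds, ints, attempts) >= 7.0
```

So a 4,000-yard, 30-TD, 10-INT season on 500 attempts works out to an 8.3 AYA, comfortably clearing the bar.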

The data set I used for ball velocity comes from Ben Allbright’s combine data on every thrower since 2008. I then grabbed other combine results and player stats to use when training and testing my model. In the few cases where combine data wasn’t available, I used pro-day numbers. This gave me a data set of 103 full records with no missing values.
There are plenty of quarterbacks with incomplete data whom I left out of this model. Quite often it’s because a QB didn’t throw at the combine, or at least didn’t have a ball velocity recorded, which is unfortunate because velocity is predictive of NFL success. For example, I don’t have ball velocity data on Matt Ryan, so he was not used in building the model.
In the end, that’s okay, because there are six QBs in the 2017 class, including the big four, who have complete data sets, so we’re still comparing apples to apples.

Building the Model

To build the model, I took the complete data set, which included height, weight, age, 3-cone, shuttle, hand size, throwing velocity, and a film grade, and added final college year AYA, draft position,1 and a bin for the number of years out of college.
I’ll spare you the details, but the most statistically significant split for the out-of-college bin was between four and five years removed from college. In other words, players who have completed four seasons post-college fall in one bin, and players going into their first through fourth years out of college fall in the other. This matters because it’s much easier to hit the criterion of one season with a 7.0 AYA if you’ve played several years than if you’ve played only one or two.
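The split above reduces to a single indicator variable. A tiny sketch (the encoding is my assumption; the article refers to this variable as ExpBin):

```python
def experience_bin(years_out_of_college):
    """Encode the experience split described above: players five or
    more years removed from college land in one bin (1); players in
    their first through fourth years out land in the other (0)."""
    return 1 if years_out_of_college >= 5 else 0
```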
For the 2017 QB class, I imputed their draft position by using the RotoViz Scouting Index and comparing that to where past prospects were drafted. That gave me the following average picks:

| QB | Avg. Draft Pos. |
| --- | --- |
| Deshaun Watson | 5.6 |
| Mitch Trubisky | 15.5 |
| DeShone Kizer | 20.4 |
| Pat Mahomes | 68.6 |
| Davis Webb | 167.9 |
| Nathan Peterman | 189.0 |
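The article doesn’t spell out the exact imputation, but one simple way to map a Scouting Index position to an expected draft slot is to average the actual picks of the most comparable past prospects. This is a hypothetical sketch; `impute_draft_pos` and its nearest-neighbor averaging are my assumptions, not the author’s method:

```python
def impute_draft_pos(index_score, history, k=5):
    """Estimate a prospect's draft position from past prospects.

    history: list of (scouting_index_score, actual_draft_pick) pairs.
    Takes the k past prospects whose index scores are closest to this
    prospect's and averages their actual draft picks.
    """
    nearest = sorted(history, key=lambda pair: abs(pair[0] - index_score))[:k]
    return sum(pick for _, pick in nearest) / len(nearest)
```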

From there, I took all these data points, and ran them through a feature selection process — which is a fancy name for finding the best predictors for a model. Here were the results of that process:
You probably have no idea what you’re looking at. This is just 15 different models with different combinations of predictors, with the best model listed at the top.2 If the box is filled in, that means the variable was used in the model. So the best model overall had years out of college (ExpBin), hand size (Hand), throwing velocity (Vel), logarithm of draft position (Log.D.Pos), final college year AYA (Last.AYA), and the film grade (Film) as inputs.
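A quick illustration of why draft position enters as a logarithm (my reading of the motivation, not stated in the article): pick value changes much faster at the top of the draft than at the bottom, and the log transform captures that compression.

```python
import math

# The gap between picks 1 and 10 is huge in log space, while the gap
# between picks 191 and 200 is nearly nothing, matching how draft
# capital actually behaves.
early_gap = math.log(10) - math.log(1)    # top-of-draft difference
late_gap = math.log(200) - math.log(191)  # late-round difference
```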
I validated the top model candidates against a randomly withheld test data set, and sure enough, the same six variables from the feature selection process also gave the best out-of-sample results in the test set with multiple metrics for accuracy.3 I did this with multiple randomly withheld subsets, and almost every single time the same six variables gave the best out-of-sample results, confirming this was the best model to use.
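The selection-plus-validation loop described above can be sketched as an exhaustive subset search scored on repeated random holdouts. Everything here is illustrative: the stand-in one-rule classifier is mine (the author’s actual model isn’t specified in this excerpt), and the function names are assumptions.

```python
import itertools
import random

def train_test_split(rows, test_frac, rng):
    # Randomly withhold a test set, as described in the article.
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]

def score_subset(features, train, test):
    """Stand-in scorer: a one-rule classifier that predicts 'success'
    when the average of the selected features exceeds the training-set
    mean. A real run would fit a proper model here instead."""
    def avg(row):
        return sum(row[f] for f in features) / len(features)
    threshold = sum(avg(r) for r in train) / len(train)
    correct = sum((avg(r) > threshold) == r["success"] for r in test)
    return correct / len(test)

def best_subset(all_features, rows, n_splits=20, seed=0):
    # Try every combination of predictors, score each on the same
    # repeated random holdouts, and keep the best average accuracy.
    rng = random.Random(seed)
    splits = [train_test_split(rows, 0.3, rng) for _ in range(n_splits)]
    best = None
    for k in range(1, len(all_features) + 1):
        for subset in itertools.combinations(all_features, k):
            mean_acc = sum(score_subset(subset, tr, te)
                           for tr, te in splits) / n_splits
            if best is None or mean_acc > best[1]:
                best = (subset, mean_acc)
    return best
```

On a toy data set where one feature perfectly separates the successes, the search correctly settles on that single feature rather than a larger, noisier subset.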
From there, I rebuilt the model without holding back any data, except the data we were trying to predict: the 2017 draft prospects, along with past prospects who hadn’t yet hit the required four years out of college, because it’s still possible they hit the success criterion at least once in their remaining seasons.
It’s interesting to note that the 3-cone did not pop up as helpful, like it did last year. There are probably two reasons for that. First, I’m using a more sophisticated model. Second, the 3-cone is possibly baked into some of the other metrics, like velocity, draft position, and film grade.