Using R to Forecast the NFL Draft: What Combine Results Tell Us About Future Success | MicroStrategy

The NFL Scouting Combine is an annual event that brings together hundreds of college football stars and NFL hopefuls. At the Combine, players showcase their athletic abilities across a variety of physical events and receive a huge amount of media coverage, with every 40 time and cone drill relentlessly dissected (even hand size is agonized over: see Culpepper, Daunte).

It’s also hugely popular among football fans because it provides a single set of familiar benchmarks for them to compare the new class of college players with current stars and historic greats (who’s faster, Saquon Barkley or Ezekiel Elliott? Don’t know? Check out their “simulcam” 40 times).

Nonetheless, many analysts are critical of Combine events (often derided as the “Underwear Olympics” because of the athletes’ attire) as predictors of future NFL success. They note that in a game, players are unlikely to run 40 yards in a straight line (the 40-yard dash), repeatedly lift 225 lbs. while lying flat on their back (the bench press), or stand completely still before jumping as high as they can (the vertical leap). That raises the question: just how well does Combine performance predict NFL success?

To test the usefulness of using only Combine results for forecasting a player’s future performance, we’ve created two models to predict two different measures of players’ football “quality”: Approximate Value (AV score) and draft quintile. To keep the analysis simple and to ensure that Combine results still reflect an athlete’s current athleticism, we use only data from 2017 rookies in the AV analysis.

AV score is an attempt to assign a numeric value to a player’s performance for a given season, which allows for comparisons across positions (for more information on the methodology behind AV, check out this blog post). For our purposes, we’ve placed AV scores into five groups, plus a zero label for players without a score: players with no AV score are assigned a 0, players with an AV score of either 0 or 1 are assigned a 1, and players with AV scores higher than 9 are given a 5, with roughly equal groups between these values.
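The grouping rule above can be sketched in a few lines. This is an illustration in Python, not the R code behind the analysis. Only the "no score", "0 or 1", and "higher than 9" rules come from the post; the cut points for groups 2 through 4 are assumptions, as are the names `av_group` and `ASSUMED_UPPER_BOUNDS`.

```python
from bisect import bisect_left
from typing import Optional, Sequence

# Inclusive upper bounds for groups 1-4. Only the first (1) and last (9)
# bounds are stated in the post; the middle values 3 and 6 are ASSUMED.
ASSUMED_UPPER_BOUNDS = (1, 3, 6, 9)


def av_group(av: Optional[int],
             upper_bounds: Sequence[int] = ASSUMED_UPPER_BOUNDS) -> int:
    """Map a season AV score to a group label from 0 to 5."""
    if av is None:
        # Player never recorded an AV score.
        return 0
    # bisect_left counts the bounds strictly below `av`, so a score equal
    # to a bound stays in that bound's group; scores above 9 land in 5.
    return bisect_left(upper_bounds, av) + 1


if __name__ == "__main__":
    for score in (None, 0, 1, 4, 9, 12):
        print(score, "->", av_group(score))
```

Passing a different `upper_bounds` tuple is enough to try other splits of the middle groups.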

Draft quintile divides drafted players (there were 253 of them in 2017) into 5 roughly equal-sized bins, meaning picks 1 through 50 are categorized as draft quintile 1, and picks 201 through 253 are draft quintile 5. We labeled players who participated in the Combine but were not drafted with a zero.
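As a rough Python sketch of that labeling (not the original R code): the post gives only the first bin (picks 1 through 50) and the last (201 through 253), so the 50-pick width for the middle bins is an assumption implied by those endpoints. The function name `draft_quintile` and its parameters are hypothetical.

```python
from typing import Optional


def draft_quintile(pick: Optional[int],
                   bin_size: int = 50,
                   n_bins: int = 5) -> int:
    """Return 0 for undrafted players, otherwise a quintile from 1 to 5."""
    if pick is None:
        # Attended the Combine but was not drafted.
        return 0
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    # Ceiling division assigns picks 1-50 to bin 1, 51-100 to bin 2, etc.
    # (ASSUMED width); min() lets the last bin absorb picks 201-253.
    return min((pick + bin_size - 1) // bin_size, n_bins)


if __name__ == "__main__":
    for p in (None, 1, 50, 51, 200, 201, 253):
        print(p, "->", draft_quintile(p))
```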

To make sure the models run smoothly, we needed to clean the data to ensure that all players have results for each Combine evaluation. As players do not always participate in all events because of injury, we created multivariate linear models to predict results for missing data using their results in other events. By leveraging this complete dataset, we can build out our first model—a k-nearest neighbors model to predict the AV range for each player.
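The imputation step might look something like the following. It is a minimal Python sketch, not the authors' R code: it fits an ordinary least-squares model on players with complete results and uses it to fill a missing event. The helper names (`solve`, `fit_ols`, `impute_column`) and the column layout are hypothetical, and a real analysis would normally rely on a library regression routine.

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def impute_column(rows: Sequence[Row], target: int,
                  predictors: Sequence[int]) -> List[Row]:
    """Fill None values in column `target` from a linear fit on `predictors`."""
    def has_predictors(row: Row) -> bool:
        return all(row[p] is not None for p in predictors)

    complete = [r for r in rows if r[target] is not None and has_predictors(r)]
    coefs = fit_ols([[r[p] for p in predictors] for r in complete],
                    [r[target] for r in complete])
    filled = []
    for row in rows:
        new_row = list(row)
        if new_row[target] is None and has_predictors(new_row):
            new_row[target] = coefs[0] + sum(
                c * new_row[p] for c, p in zip(coefs[1:], predictors))
        filled.append(new_row)
    return filled
```

Rows that are also missing one of the predictor events are left unchanged here; handling them would need a second model built on whichever events are available.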

K-nearest neighbor models classify an observation by looking at the “k” observations most similar to it. Our model uses a majority vote among the classes of the 4 closest observations to predict a particular player’s grouped AV score. We can then compare the predicted AV group to the actual AV group to test what our model believes the Combine says about each player’s football quality. How does it do? Poorly: our model predicted only 45% of the AV scores correctly.
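To make the voting rule concrete, here is a small Python sketch of a 4-nearest-neighbor classifier and the accuracy calculation. It is not the R model from the analysis. The use of Euclidean distance, the tie-breaking rule, and the names `knn_predict` and `accuracy` are assumptions; the post does not say how distance was measured or how ties were resolved.

```python
from collections import Counter
from math import dist
from typing import Hashable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]  # (combine results, AV group)


def knn_predict(train: Sequence[Example], query: Sequence[float],
                k: int = 4) -> Hashable:
    """Predict a class by majority vote among the k closest training rows."""
    # Euclidean distance is an ASSUMPTION; sorted() is stable, so equally
    # distant rows keep their original order.
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # max() returns the first maximal key, and Counter keeps insertion
    # order, so a tied vote goes to the class with the closest neighbor.
    return max(votes, key=votes.get)


def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Share of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Because the events are measured in different units (seconds, inches, repetitions), features would usually be standardized before computing distances; that step is omitted here.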

Maybe the Combine results are useful for forecasting another variable, like where a player will be selected in the draft. For this analysis, we’ll use the draft and Combine results from 2017 to fit a logistic regression model, and then use that model to forecast the draft quintile for each player who participated in the 2018 Combine. A logistic model is used to classify observations based on the probability that they belong to one of two or more categorical outcomes—in this case the probability of a player being drafted in each of the draft quintiles.
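The idea of fitting on one year and forecasting the next can be sketched with a small multinomial (softmax) logistic regression. This is a from-scratch Python illustration trained by plain gradient descent, not the R model used for the predictions. The function names, learning rate, and epoch count are all assumptions chosen for the toy case.

```python
from math import exp
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(weights: Sequence[Sequence[float]],
                        x: Sequence[float]) -> List[float]:
    """Probability of each class for one player's feature vector."""
    feats = [1.0] + list(x)  # leading 1.0 is the intercept term
    return softmax([sum(w * f for w, f in zip(row, feats)) for row in weights])


def fit_multinomial(xs: Sequence[Sequence[float]], ys: Sequence[int],
                    n_classes: int, lr: float = 0.2,
                    epochs: int = 5000) -> List[List[float]]:
    """Fit one weight row per class by batch gradient descent on log loss."""
    n_feats = len(xs[0]) + 1
    weights = [[0.0] * n_feats for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * n_feats for _ in range(n_classes)]
        for x, y in zip(xs, ys):
            probs = class_probabilities(weights, x)
            feats = [1.0] + list(x)
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j, f in enumerate(feats):
                    grads[c][j] += err * f
        for c in range(n_classes):
            for j in range(n_feats):
                weights[c][j] -= lr * grads[c][j] / len(xs)
    return weights


def predict_class(weights: Sequence[Sequence[float]],
                  x: Sequence[float]) -> int:
    """Return the class with the highest predicted probability."""
    probs = class_probabilities(weights, x)
    return max(range(len(probs)), key=probs.__getitem__)
```

In the setting described above, `xs` and `ys` would hold the 2017 Combine results and draft quintile labels (0 through 5, so `n_classes=6`), and `predict_class` would then be applied to each 2018 participant.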

Using MicroStrategy’s native API ecosystem, we can easily pull our predictions from R into MicroStrategy via the Intelligence Server to visualize our results. Check out the dossier and see our predictions.

It looks like the Combine might not be the best predictor of NFL success, but it’s one of the major data points that we have to work with leading up to the NFL draft—and it turns out the draft is a much better predictor of a player’s future success than their performance at the Combine. Clearly, GMs know something we don’t. Be sure to check back after the draft, when we’ll see how our predictions do (we aren’t betting on them), and develop another model that predicts rookie performance based on their draft position!

Interested in learning more about using MicroStrategy with R? Check out the MicroStrategy R Integration Pack on GitHub and download MicroStrategy Desktop today to get started.
