Unsupervised methods for identifying pass coverage among defensive backs with NFL player tracking data

Abstract

Analysis of player tracking data for American football is in its infancy: the National Football League (NFL) first released its Next Gen Stats tracking data publicly in December 2018. While tracking datasets in other sports often contain detailed annotations of on-field events, annotations in the NFL’s tracking data are limited. Methods for creating these annotations typically require extensive human labeling, which is difficult and expensive. We begin tackling this class of problems by annotating the pass coverage types played by defensive backs using unsupervised learning techniques, which require no manual labeling or human oversight. We define a set of features from the NFL’s tracking data that help distinguish between zone and man coverage. We use Gaussian mixture modeling and hierarchical clustering to create clusters corresponding to each coverage type, and we assign the appropriate type of coverage to each cluster through qualitative analysis of the plays in each cluster. We find that the mixture model’s soft cluster assignments allow for more flexibility when identifying coverage types. Our work opens several avenues of future NFL research, and we provide a basic exploration of these in this paper.
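
To illustrate what soft cluster assignments mean in practice, the following is a minimal sketch rather than the method used in this paper. It fits a two-component, one-dimensional Gaussian mixture by expectation-maximization to synthetic values of a single hypothetical feature. The feature values, the helper names `fit_gmm_1d` and `responsibilities`, and the choice of two components are all assumptions made for illustration; no real tracking data is used.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]


def fit_gmm_1d(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(values)
    grand_mean = sum(values) / n
    grand_var = sum((x - grand_mean) ** 2 for x in values) / n
    weights = [0.5, 0.5]
    means = [min(values), max(values)]  # crude but deterministic starting point
    variances = [grand_var, grand_var]
    for _ in range(n_iter):
        # E-step: soft assignment of every observation to both components.
        resp = [responsibilities(x, weights, means, variances) for x in values]
        # M-step: re-estimate each component from its weighted observations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic stand-in for one hypothetical per-play feature (not real NFL data).
random.seed(0)
feature = ([random.gauss(2.0, 0.6) for _ in range(100)]
           + [random.gauss(6.0, 1.0) for _ in range(100)])

weights, means, variances = fit_gmm_1d(feature)

# Soft assignments: each value receives a probability for both clusters.
for x in (2.0, 3.5, 6.0):
    p0, p1 = responsibilities(x, weights, means, variances)
    print(f"feature={x:.1f}  P(cluster 0)={p0:.2f}  P(cluster 1)={p1:.2f}")
```

Values near a component mean are assigned almost entirely to that cluster, while values between the two means receive a split probability. A hard clustering such as a cut hierarchical tree would place each value in exactly one cluster. Deciding which cluster corresponds to man or zone coverage would still require inspecting the plays in each cluster.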