Introduction

Ultimate Frisbee (ultimate) is a non-contact sport played with a flying disc that combines elements of American football, soccer, and basketball. The goal of the game is to score points by catching the frisbee in the opposing team’s end zone. The Ultimate Frisbee Association (UFA) was founded in 2012 and is the premier professional ultimate league in the US. Long and short passes are the two main types of passes that players use in ultimate. Long passes often span almost the entire field, while short passes cover only a few yards. In this report, we aim to predict whether a pass will be completed based on pass distance and team/thrower. This question matters because it helps UFA teams assess the viability of different offensive strategies and choose their best plan on game day. Teams may use a combination of long and short passes to move the frisbee down the field and score. Our goal is to help coaches and players understand what works best, based on game data from the UFA itself, and how they could optimize scoring through pass completion.

Data

The data for this report was acquired from the Ultimate Frisbee Association Application Programming Interface (UFA API), which is documented online at https://www.docs.audlstats.com/. We first used Python to collect GameEvents from UFA games from 2021 to 2023 through the API and saved them to a CSV file called gameEvents.csv. This original CSV file contained 356,792 rows and 18 columns. We then filtered gameEvents to include only passes/throwaways and games from 2023. Next, we added columns such as completed_pass (whether a throw was completed), yardsGained (the vertical distance of a pass), passDistance (the total distance of a pass), and xDistance (the horizontal distance of a pass). In the end, there were 84,595 rows and 27 columns. The key variables in our filtered data include thrower, team (home or away), completed pass, pass distance, and yards gained.
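As a rough illustration of how the derived distance columns could be computed from thrower and receiver coordinates, here is a minimal Python sketch. The keys throwerX, receiverX, and receiverY and the sample coordinates are assumptions made for illustration; the actual API field names and our processing script may differ.

```python
import math

def add_pass_features(event):
    """Return a copy of one pass event with derived distance columns.

    Assumes (hypothetically) that each event stores thrower and receiver
    coordinates in yards under the keys used below.
    """
    out = dict(event)
    # Vertical (downfield) change: positive means toward the attacking end zone.
    out["yardsGained"] = event["receiverY"] - event["throwerY"]
    # Horizontal change (signed here; an absolute value would also be reasonable).
    out["xDistance"] = event["receiverX"] - event["throwerX"]
    # Total straight-line distance of the pass.
    out["passDistance"] = math.hypot(out["xDistance"], out["yardsGained"])
    return out

# Illustrative event, not taken from the real data.
example = {"throwerX": 0.0, "throwerY": 40.0, "receiverX": 3.0, "receiverY": 44.0}
features = add_pass_features(example)
```

In this toy event the pass gains 4 yards downfield and 3 yards sideways, so the total pass distance is 5 yards.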

Exploratory Data Analysis

Most passes in the data are completed: fewer than 15,000 passes, roughly 6% of the total, were missed.

Most passes are thrown forward, in the direction of the end zone, which is represented by a positive value for yards gained. Around 60,000 passes, or 25%, are thrown backwards, away from the end zone, which is represented by a negative value for yards gained.

Completed passes tend to be shorter, with most under 30 yards, while the plot of missed passes shows a larger share of long passes and a spike at 0, indicating that the pass was never initiated, perhaps because the disc was dropped.

The thrower heat map shows that the most frequent starting point is X coordinate 0 and Y coordinate 40. This may look odd at first: (0, 40) accounts for 1883 observations, while the next most frequent point, (0, 100), has 484 observations, about a quarter as many. The reason is that the brick mark is located at (0, 40). When the game starts, the opposing team has the opportunity to pull, or throw the disc as far as they can away from the offensive team’s end zone. If this pull lands out of bounds, the offensive team starts the throw from the brick mark, which explains why this fixed coordinate has the highest frequency in the dataset.

If the pull lands in bounds, the thrower starts from that point, which explains the hot spots behind the brick mark. The receiver heat map looks similar, with most passes going back and forth along the X coordinate of zero. The passes in the end zone seem to go to the extreme sides of the field, possibly as an offensive strategy to avoid an interception by the opposing team.

Using a simple logistic pass completion model with the XY coordinates of the intended receiver and the yards gained on the pass, we identified the 10 throwers with the highest CPOE (completion percentage over expectation). All of these throwers beat their expected completion percentage by about 3%.
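To make the CPOE calculation concrete, the sketch below computes it as each thrower's actual completion rate minus the mean of the model's predicted completion probabilities. The function name, the tuple layout, and the sample values are hypothetical; this is one common way to define CPOE, not necessarily the exact code behind our results.

```python
from collections import defaultdict

def cpoe_by_thrower(passes):
    """Completion percentage over expectation (CPOE) per thrower.

    `passes` is an iterable of (thrower, completed, expected_prob) tuples,
    where `completed` is 0 or 1 and `expected_prob` comes from some
    completion model. Returns {thrower: actual rate - mean expected rate}.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # completions, expected sum, attempts
    for thrower, completed, expected_prob in passes:
        t = totals[thrower]
        t[0] += completed
        t[1] += expected_prob
        t[2] += 1
    return {name: (c - e) / n for name, (c, e, n) in totals.items()}

# Made-up throws for two hypothetical throwers.
sample = [
    ("A", 1, 0.90), ("A", 1, 0.80), ("A", 1, 0.95), ("A", 1, 0.85),
    ("B", 1, 0.95), ("B", 0, 0.90),
]
cpoe = cpoe_by_thrower(sample)
```

A positive value means the thrower completed more passes than the model expected given the difficulty of their throws; a negative value means fewer.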

Methods

We utilized 3 different methods to create our expected completion model.

Our first model was a generalized linear model that used logistic regression. Our response was whether or not a pass was completed, and our features included home, throwerY (thrower Y location), endOfPeriod, yardsGained, passDistance, and an interaction between passDistance and yardsGained.

Thrower Y is important because completions get harder as a team moves toward the red zone, where there is less space to work with. EndOfPeriod indicates whether a throw was made in the last 30 seconds of a period, which could signal added pressure or a buzzer-beater attempt. The interaction between yardsGained and passDistance is also important because it can help differentiate between types of throws: in ultimate, a long horizontal pass generally faces less pressure than a long vertical pass. Finally, longer passes are more likely to be inaccurate and give defenders more time to catch up, which is why we included yardsGained and passDistance in our model.

Our second model used the same features but also included a random intercept for the thrower. This helps the model differentiate between better and worse throwers.
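Written out, a random-intercept logistic model of this kind could take the following form, where i indexes passes and j indexes throwers. The symbols are notation introduced here for illustration, and the normal distribution for the thrower effect is the usual default assumption for such models.

```latex
\begin{aligned}
\operatorname{logit} P(\text{completed}_{ij} = 1)
  &= \beta_0 + \beta_1\,\text{home}_{ij} + \beta_2\,\text{throwerY}_{ij}
   + \beta_3\,\text{endOfPeriod}_{ij} \\
  &\quad + \beta_4\,\text{yardsGained}_{ij} + \beta_5\,\text{passDistance}_{ij}
   + \beta_6\,(\text{yardsGained}_{ij} \times \text{passDistance}_{ij}) + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2)
\end{aligned}
```

Setting every u_j to zero recovers the first model, so the thrower effect u_j can be read as a shift in log odds relative to an average thrower.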

When testing our model assumptions, yardsGained appears to have a linear relationship with the empirical logits. However, horizontal yards (xDistance) and thrower location (throwerY) appear to have nonlinear relationships.
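For readers unfamiliar with this check, the sketch below shows one way to compute empirical logits: sort the observations by a predictor, split them into bins, and take the log odds of completion within each bin. The function name, the number of bins, the 0.5 adjustment, and the toy data are all assumptions for illustration.

```python
import math

def empirical_logits(values, outcomes, n_bins=5):
    """Bin a numeric predictor and return (bin mean, empirical logit) pairs.

    Observations are sorted by `values` and split into `n_bins` groups of
    near-equal size. A 0.5 adjustment keeps the logit finite when a bin
    has only completions or only incompletions.
    """
    pairs = sorted(zip(values, outcomes))
    size = len(pairs) / n_bins
    result = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if not chunk:
            continue
        successes = sum(y for _, y in chunk)
        failures = len(chunk) - successes
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        logit = math.log((successes + 0.5) / (failures + 0.5))
        result.append((mean_x, logit))
    return result

# Toy data: completion becomes less likely as the predictor grows.
xs = list(range(20))
ys = [1] * 12 + [0, 1, 1, 0, 1, 0, 0, 0]
points = empirical_logits(xs, ys, n_bins=4)
```

Plotting the returned pairs gives a quick visual test: a roughly straight line supports a linear term in the logistic model, while clear curvature suggests a smooth or transformed term.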

Variables vs Empirical Odds

This led us to our third model, a generalized additive model. Its features included home and endOfPeriod as linear terms, along with yardsGained, xDistance, and throwerY, each with its own smoothing function.
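In equation form, a generalized additive model with these features could be sketched as follows. The symbols are illustrative notation, and the f terms stand for smooth functions estimated from the data; the specific smoother and basis are not specified here.

```latex
\operatorname{logit} P(\text{completed}_i = 1)
  = \beta_0 + \beta_1\,\text{home}_i + \beta_2\,\text{endOfPeriod}_i
  + f_1(\text{yardsGained}_i) + f_2(\text{xDistance}_i) + f_3(\text{throwerY}_i)
```

Replacing a linear term with a smooth function lets the model follow the nonlinear patterns seen in the empirical logits without committing to a particular curve shape in advance.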

Model

To compare our models, we used 5-fold cross-validation and compared their Brier scores with standard errors. The baseline (which predicts that every pass is completed) had a Brier score interval of (.0572, .0590). The GLM logistic regression model's interval was (.0484, .0500), the GAM's was (.0476, .0497), and the multilevel GLM's was (.0477, .0493).
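The following sketch illustrates how a Brier score, the all-completed baseline, and an interval from per-fold scores can be computed. The outcomes and fold scores are made up, and the interval uses one standard error on each side, which is an assumption for illustration.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def mean_and_se(scores):
    """Mean of per-fold scores and the standard error of that mean."""
    k = len(scores)
    mean = sum(scores) / k
    var = sum((s - mean) ** 2 for s in scores) / (k - 1)
    return mean, math.sqrt(var / k)

# Toy outcomes: 1 = completed, 0 = incomplete.
outcomes = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# Baseline: always predict "completed".
baseline = brier_score([1.0] * len(outcomes), outcomes)

# Hypothetical per-fold Brier scores from 5-fold cross-validation.
fold_scores = [0.048, 0.050, 0.049, 0.047, 0.051]
mean, se = mean_and_se(fold_scores)
interval = (mean - se, mean + se)
```

Because the baseline predicts 1 for every pass, its Brier score equals the share of incomplete passes, which is why the baseline score sits close to the incompletion rate in the data.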

Uncertainty

Uncertainty was quantified using the standard errors for the coefficients. The multilevel model also has standard errors for the thrower random effects.

From the plot above we can see the players with the top 10 best estimates and top 10 worst estimates. We can see there are quite large standard errors for each of the throwers. However, we can still differentiate between the best and worst players.

Results

Comparing the Brier scores from cross-validation, we find that the GAM and the multilevel model performed best, with the GLM logistic model following close behind.

Looking at the coefficients, we can see that in all the models the log odds of a completed pass decrease as the distance of a throw increases. We can also see that home and time pressure have a statistically significant effect on pass completion despite their relatively large standard errors.

We moved forward with the multilevel model because it includes the thrower. To calculate the probability of scoring as a function of the number of passes, we multiplied the completion probabilities of each throw in a possession, where the yards gained across the throws sum to 80 yards (the length from end zone to end zone). Within a possession, every throw is of the same length and includes a small random amount of horizontal distance.
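The calculation described above can be sketched as follows. The completion model here is a stand-in with made-up coefficients, not our fitted multilevel model, and the range of the random horizontal distance is likewise an assumption, so the numbers it produces should not be read as our results.

```python
import math
import random

def completion_prob(yards_gained, x_distance, thrower_effect=0.0):
    """Hypothetical logistic completion model; coefficients are made up."""
    pass_distance = math.hypot(yards_gained, x_distance)
    log_odds = 4.0 - 0.04 * pass_distance + thrower_effect
    return 1.0 / (1.0 + math.exp(-log_odds))

def scoring_prob(n_passes, total_yards=80.0, thrower_effect=0.0, rng=None):
    """Probability that n equal-length passes covering total_yards all connect."""
    rng = rng or random.Random(0)  # fixed seed so repeated calls are comparable
    yards_each = total_yards / n_passes
    prob = 1.0
    for _ in range(n_passes):
        x_distance = rng.uniform(-5.0, 5.0)  # small random horizontal component
        prob *= completion_prob(yards_each, x_distance, thrower_effect)
    return prob

# Scoring probability for 1 to 10 passes, average vs. a better thrower.
average = {n: scoring_prob(n) for n in range(1, 11)}
better = {n: scoring_prob(n, thrower_effect=0.5) for n in range(1, 11)}
```

Multiplying the per-throw probabilities treats the throws in a possession as independent, and a positive thrower effect raises the scoring probability for every number of passes.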

The overall trend implies that around 5 or 6 passes maximizes the probability of the team scoring during its possession of the frisbee. For the best thrower, this probability is about 80%, compared with approximately 70% for the average thrower. This makes sense: fewer passes mean fewer chances for a turnover in a given possession, but they also require longer passes, which are less likely to be completed.

This analysis is somewhat naive in calculating probability of scoring considering each throw is of equal length by the same thrower. Furthermore, considering the standard errors of our player effect estimates, the probabilities between the best and average thrower can be much closer than indicated in the plot.

Discussion

Through our analysis of this ultimate frisbee data, we have come to realize the importance of offensive strategy as it relates to completed passes. Using the Brier score, we saw that each of our models outperforms the baseline, since the Brier score intervals of our models do not overlap with the baseline's interval. Our best expected completion models were the GAM and the multilevel model, which accounts for random effects of the thrower on attempted passes, ahead of the logistic regression model and the baseline. For the multilevel model, this is likely because the variance between throwers is high enough to affect expected completion. By capturing these random effects, we more effectively modeled the differences between teams as they relate to their throwers’ capabilities. We then used the multilevel model to determine the ideal number of passes in an ultimate frisbee possession. Overall, we determined the optimal number of passes to be around 5 or 6.

However, there are limitations to this analysis. The most prominent is the lack of receiver data for incomplete passes. Every pass has an intended receiver, and a random effect for the receiver on incomplete passes might yield a more complete model of expected completion. Additionally, it would be interesting to analyze how the time remaining, or the psychological and physical state of a player over the course of a game, affects expected completion, as these factors could be used to model the pressure on the player to perform. Furthermore, no weather data was captured alongside the data analyzed, so the analysis does not examine how rainy weather might change the ideal number of passes and the expected completion of a pass compared with fair weather.

The next steps for this analysis include looking into these additional factors as they seem important towards better predicting expected completion. First, we must determine the best way to capture these additional factors. Finally, we must look at a more comprehensive model that incorporates these factors and see how they compare to the multilevel model present in the analysis here.