Introduction

Over the past few years, Major League Baseball has seen substantial shifts in which pitches are considered the most dominant. Earlier this year, Mets ace Justin Verlander remarked that sliders haven’t been performing as well as in the past; in fact, OPS on sliders is “the highest it’s been in the pitch tracking years” [1]. The introduction of the sweeper, an attempt to distinguish between two types of sliders, and of new evaluation metrics such as Stuff+, Location+, and Pitching+ has added dimension to the evaluation of pitch types and indicates a growing focus on individual pitch characteristics. While the discrete qualities of a pitch (such as movement, velocity, and spin rate) are undoubtedly useful in evaluating and explaining elite pitchers, the effects that sequencing and in-game pitching strategy have on outcomes cannot be overstated. Integrating inter-pitch dynamics with discrete pitch characteristics might therefore model a pitcher’s effectiveness more accurately. FanGraphs’ Stuff+ metric does a satisfactory job of evaluating the efficacy of a pitch based on discrete characteristics, but it does not consider the art of pitch sequencing. Our motivation for this project is both to assess current discrete pitch evaluation metrics and to consider how inter-pitch dynamics can enhance existing models.

Data

The data for our project were compiled from FanGraphs and Baseball Savant. All of the data used in our research are season statistics separated by year, pitcher, and pitch type from the 2020-2022 seasons, restricted to cases where the specified pitch was thrown 250 or more times in the given season. The Stuff+ data, both pitcher-level and pitch-specific values, are from FanGraphs. All other data, including movement and pitch metric data (e.g., spin rate, xwOBA, pitches thrown), were collected from Baseball Savant.

Exploratory Data Analysis (EDA)

Note on recategorization:
“Sliders” includes Slurves and Sweepers
“Curveballs” includes Knuckle Curves
“Changeup” includes Splitters
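The recategorization and the 250-pitch threshold can be sketched as follows. This is a minimal illustration, not the code used for the project: the row layout (`year`, `pitcher`, `pitch_type`, `pitches`) and the function names are hypothetical, and applying the threshold to each row before regrouping is an assumption.

```python
# Hypothetical mapping from raw pitch labels to the grouped labels listed above.
RECATEGORIZE = {
    "Slurve": "Slider",
    "Sweeper": "Slider",
    "Knuckle Curve": "Curveball",
    "Splitter": "Changeup",
}

MIN_PITCHES = 250  # threshold stated in the Data section


def recategorize(pitch_type):
    """Return the grouped label, leaving unlisted pitch types unchanged."""
    return RECATEGORIZE.get(pitch_type, pitch_type)


def prepare(rows):
    """Keep rows with at least MIN_PITCHES pitches and relabel their pitch type.

    Assumption: the threshold is checked on each original row, before regrouping.
    """
    kept = []
    for row in rows:
        if row["pitches"] >= MIN_PITCHES:
            kept.append({**row, "pitch_type": recategorize(row["pitch_type"])})
    return kept


# Made-up rows, for illustration only.
example = [
    {"year": 2022, "pitcher": "A", "pitch_type": "Sweeper", "pitches": 400},
    {"year": 2022, "pitcher": "A", "pitch_type": "Splitter", "pitches": 120},
]
print(prepare(example))
```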

Our initial exploratory data analysis revolved around understanding pitch characteristics such as movement, velocity and spin. We began by charting some average values by pitch type: vertical and horizontal break, the proportion of time that they’re thrown, spin rate in revolutions per minute, and speed.


Pitch       Horizontal Break   Vertical Break   Pitch Proportion   Spin Rate (rpm)   Speed (mph)
4-Seamer    7.45               14.86            0.46               2285.29           93.93
Sinker      15.00              22.89            0.39               2127.16           93.21
Cutter      2.88               25.97            0.34               2380.57           88.97
Splitter    11.71              33.09            0.28               1459.77           86.37
Slider      6.42               36.28            0.34               2432.44           84.94
Changeup    14.03              32.27            0.26               1754.87           84.59
Curveball   9.45               53.35            0.26               2572.18           79.64

Next, we turned our attention to Stuff+, a popular metric used to evaluate the quality of pitches. Stuff+ factors in the discrete characteristics we’ve explored, and more, to evaluate pitchers. Our question was: do pitchers know what they do well? Can Stuff+ actually predict usage? We wanted to see if pitchers “knew” which of their pitches were most effective and threw those pitches in higher proportions.

Our graph shows that as Stuff+ increases, pitch usage tends to increase as well, indicating that pitchers do in fact throw their best pitches (as determined by Stuff+) more frequently. The graph is colored by weighted on-base average (wOBA) to further show that pitches with high Stuff+ ratings are associated with lower opposing wOBA.
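One simple way to quantify the trend in such a graph is a Pearson correlation between Stuff+ and usage. The sketch below is illustrative only: the function name is ours, the numbers are made up rather than taken from our data, and a correlation coefficient is a swapped-in summary, not the analysis behind the graph.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values, for illustration only (not from the project data).
stuff_plus = [85, 95, 100, 110, 125]
usage = [0.10, 0.18, 0.22, 0.30, 0.41]
print(round(pearson(stuff_plus, usage), 3))
```

A positive coefficient would agree with the pattern described above; it says nothing about whether the relationship is linear across the whole Stuff+ range.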

Pitch-by-Pitch Breakdowns:

Next, we wanted to know how each of these discrete characteristics might carry a different level of importance depending on the pitch type. Is vertical break more important than speed for curveballs, and if so, how much more important is it? To explore these questions, we created variable importance plots for each of our variables of interest across all pitch types. Our outcome variables (wOBA, xwOBA, and run value per 100) were selected because they are tangible, results-based metrics that point toward the effectiveness of a pitch.

Intra-Pitch Data Analysis

Variable Importance Plots

Sinkers, Cutters, and Four-Seam Fastballs have been aggregated into a singular “Fastball” category.

[Figures: variable importance plots for wOBA, xwOBA, and Run Value / 100, each with panels for Fastballs, Sliders, Curveballs, and Change-Ups.]

Importance values should be compared only within a single plot, not across plots. From these plots, we can see which variables have similar importance levels within a particular pitch type and how variable importance changes with the pitch type and the outcome variable.

Pitch Characteristics

[Figures: scatter plots of pitch characteristics against wOBA, xwOBA, and Run Value / 100, each with panels for Four-Seam, Cutters, Sinkers, Sliders, Curveballs, and Change-Ups.]

These scatter plots build upon our variable importance plots: they show the relationships between our four discrete pitch characteristics and our outcome variables by pitch type. These plots inform our overarching interest in the relationship between intra-pitch characteristics and effectiveness. Note that the y-axes on these plots are inverted to show that a lower xwOBA is “better” for pitchers. These plots show significant correlations between discrete pitch characteristics and outcome as measured through wOBA, xwOBA, and run value per 100.

Pitch Movement

[Figures: scatter plots of pitch movement against wOBA, xwOBA, and Run Value / 100, each with panels for Four-Seam, Cutters, Sinkers, Sliders, Curveballs, and Change-Ups.]

Similarly, these scatter plots specifically explore movement as it relates to our outcome variables. They show how both vertical and horizontal movement correlate with outcome differently based on the pitch type.

This next grouping of plots moves into one of the two primary models we use in this report: generalized additive models (GAMs). These GAMs take in only one explanatory variable: Stuff+. We use Stuff+ because it is a broad, all-encompassing metric that takes into account discrete pitch characteristics (with the exception of speed and movement differentials relative to the fastball). Stuff+ provides a single-number summary of intra-pitch characteristics that can then be used to predict outcome via xwOBA. We used xwOBA as our response variable because it is the least noisy of the three outcome variables. These plots show the relationship between our model’s predicted xwOBA values using just Stuff+ and the observed xwOBA values.
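As a sketch of the model form described above, a GAM with Stuff+ as its only explanatory variable can be written as follows. The identity link and Gaussian errors are assumptions (the link and error distribution are not stated here); $f$ is a smooth function estimated from the data, and $i$ indexes the pitcher-season rows for one pitch type.

```latex
\mathrm{xwOBA}_i = \beta_0 + f(\text{Stuff+}_i) + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```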

Intra-Pitch xwOBA GAM

[Figures: predicted vs. observed xwOBA from the Stuff+-only GAM, with panels for Four-Seam, Cutters, Sinkers, Sliders, Curveballs, and Change-Ups.]

We used 5-fold cross validation to obtain our RMSE values in the above GAMs and in the following GAMs and random forests. For each pitch type, we assigned pitchers to a fold and ran the cross validation.
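The fold assignment and RMSE calculation can be sketched as below. This is a minimal illustration under stated assumptions, not our actual code: the function names and the `(pitcher, x, y)` row layout are hypothetical, pitchers are assigned to folds at random, and the RMSE is pooled over all held-out predictions rather than averaged per fold. The intercept-only model is included as the simplest possible `fit` function.

```python
import random
from math import sqrt


def assign_folds(pitchers, k=5, seed=0):
    """Map each unique pitcher to one of k folds, so a pitcher never spans folds."""
    unique = sorted(set(pitchers))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return {pitcher: i % k for i, pitcher in enumerate(unique)}


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def cross_validated_rmse(rows, fit, k=5, seed=0):
    """rows: list of (pitcher, x, y); fit(train_rows) returns a predict(x) function."""
    folds = assign_folds([row[0] for row in rows], k, seed)
    observed, predicted = [], []
    for fold in range(k):
        train = [row for row in rows if folds[row[0]] != fold]
        held_out = [row for row in rows if folds[row[0]] == fold]
        if not train or not held_out:
            continue  # skip empty folds when there are very few pitchers
        predict = fit(train)
        for _, x, y in held_out:
            observed.append(y)
            predicted.append(predict(x))
    return rmse(observed, predicted)


def fit_intercept(train_rows):
    """Baseline model: always predict the mean training response."""
    mean_y = sum(y for _, _, y in train_rows) / len(train_rows)
    return lambda x: mean_y
```

Grouping by pitcher keeps all of a pitcher's rows on the same side of each train/test split, so a model is never evaluated on a pitcher it was trained on.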

Inter-Pitch Data Analysis

The next phase of our project revolves around inter-pitch interactions. We’re curious to see how differentials in speed, movement, and release point, when factored in with the discrete metric Stuff+, improve our predictions on the success of a pitch. To do this, we used both generalized additive models (GAMs) and random forests. The purpose of using both was to assess whether or not these input variables interact significantly with each other, since random forests capture these interactions while GAMs do not. We are essentially “adding on” to the GAMs that previously only took into account Stuff+ in order to see how much better they do with inter-pitch characteristics. Again, our plots show how our predicted xwOBA values do against the observed values.
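The differentials themselves can be sketched as simple subtractions within one pitcher's arsenal. This is an illustrative sketch, not our feature pipeline: the feature names, the `add_differentials` function, and the choice of the four-seam fastball as the reference pitch are all assumptions, and the numbers are made up.

```python
# Hypothetical feature names; the real column names may differ.
FEATURES = ("speed", "horizontal_break", "vertical_break", "release_x", "release_z")


def add_differentials(arsenal, reference="4-Seamer"):
    """Compute each pitch's difference from the reference pitch.

    arsenal: dict mapping pitch type -> dict of feature values for one
    pitcher-season. Assumption: the reference pitch is the four-seam fastball.
    """
    base = arsenal[reference]
    diffs = {}
    for pitch, feats in arsenal.items():
        if pitch == reference:
            continue
        diffs[pitch] = {name + "_diff": feats[name] - base[name] for name in FEATURES}
    return diffs


# Made-up values for one pitcher-season, for illustration only.
arsenal = {
    "4-Seamer": {"speed": 94.0, "horizontal_break": 7.0, "vertical_break": 15.0,
                 "release_x": -2.0, "release_z": 6.0},
    "Slider": {"speed": 85.0, "horizontal_break": 6.0, "vertical_break": 36.0,
               "release_x": -2.5, "release_z": 5.5},
}
print(add_differentials(arsenal)["Slider"])
```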

Inter-Pitch xwOBA Models

[Figures: predicted vs. observed xwOBA from the inter-pitch GAM and the inter-pitch random forest, each with panels for Four-Seams, Cutters, Sinkers, Sliders, Curveballs, and Change-Ups.]

Comparative

We now compare the various models we’ve used. Both the intra-pitch (based on Stuff+) and inter-pitch (based on differentials) models were generally effective at predicting xwOBA, but the central question of our project is whether inter-pitch metrics create an edge in overall pitch evaluation. To answer it, we compared the root mean squared error (RMSE) of our models. For each model, we calculated the RMSE by pitch type to see which pitches it performed best on. We then compared the RMSEs across models with a box plot.

Model Accuracy Comparison

RMSE Comparison

Pitch       Intercept   Intra-Pitch GAM   Inter-Pitch GAM   Inter-Pitch RF
Four-Seam   0.0520      0.0465            0.0483            0.0484
Cutter      0.0409      0.0389            0.0612            0.0406
Sinker      0.0420      0.0394            0.0465            0.0383
Slider      0.0491      0.0532            0.0567            0.0461
Curveball   0.0480      0.0480            0.0555            0.0441
Change-Up   0.0443      0.0421            0.0496            0.0420
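The table can be read programmatically with a short sketch like the one below, which finds the lowest-RMSE model for each pitch and each model's relative improvement over the intercept-only baseline. The RMSE values are copied from the table; the function names are ours and purely illustrative.

```python
# RMSE values copied from the table above.
RMSE = {
    "Four-Seam": {"Intercept": 0.0520, "Intra-Pitch GAM": 0.0465,
                  "Inter-Pitch GAM": 0.0483, "Inter-Pitch RF": 0.0484},
    "Cutter": {"Intercept": 0.0409, "Intra-Pitch GAM": 0.0389,
               "Inter-Pitch GAM": 0.0612, "Inter-Pitch RF": 0.0406},
    "Sinker": {"Intercept": 0.0420, "Intra-Pitch GAM": 0.0394,
               "Inter-Pitch GAM": 0.0465, "Inter-Pitch RF": 0.0383},
    "Slider": {"Intercept": 0.0491, "Intra-Pitch GAM": 0.0532,
               "Inter-Pitch GAM": 0.0567, "Inter-Pitch RF": 0.0461},
    "Curveball": {"Intercept": 0.0480, "Intra-Pitch GAM": 0.0480,
                  "Inter-Pitch GAM": 0.0555, "Inter-Pitch RF": 0.0441},
    "Change-Up": {"Intercept": 0.0443, "Intra-Pitch GAM": 0.0421,
                  "Inter-Pitch GAM": 0.0496, "Inter-Pitch RF": 0.0420},
}


def best_model(scores):
    """Name of the model with the lowest RMSE for one pitch type."""
    return min(scores, key=scores.get)


def relative_improvement(scores, model):
    """Fractional RMSE reduction versus the intercept-only baseline.

    Negative values mean the model did worse than the baseline.
    """
    return (scores["Intercept"] - scores[model]) / scores["Intercept"]


for pitch, scores in RMSE.items():
    print(pitch, best_model(scores),
          round(relative_improvement(scores, "Inter-Pitch RF"), 3))
```

Read this way, the random forest has the lowest RMSE for sinkers, sliders, curveballs, and change-ups, while the Stuff+-only GAM has the lowest RMSE for four-seam fastballs and cutters.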

[Figure: box plot of RMSE by model: Intra-Pitch GAM, Inter-Pitch GAM, and Inter-Pitch Random Forest.]


Results Summary

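The comparison of our models suggests that combining variable interactions with inter-pitch dynamics can improve the accuracy of xwOBA predictions. The inter-pitch random forest had a lower RMSE than the Stuff+-only GAM for sinkers, sliders, curveballs, and change-ups, whereas the inter-pitch GAM did not improve on the Stuff+-only GAM for any pitch type. While it is difficult to improve upon Stuff+, a metric that has been developed and fine-tuned for years, our layered inter-pitch models were comparable to and, in some cases, more effective than GAMs that only took Stuff+ into account.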

Looking Ahead

Our next step is to take into account the rates of pitch-pair sequences and use them as features in our models. Doing so would further integrate inter-pitch dynamics and the effect that pitch sequencing has on outcome. In addition, there were extreme outliers in our models that affected our resulting RMSE and led to unrealistic xwOBA values. Looking into why these outliers exist and adjusting our predictions accordingly should lead to more consistently accurate xwOBA values. Finally, we would explore models other than GAMs, since inter-pitch dynamics appear to have more complex relationships with xwOBA than GAMs can capture. Further adjusting the tuning parameters of our models would add to their sophistication.

Acknowledgements

We would like to thank Meg Ellingwood and Shamindra Shrotriya, the leaders of the Carnegie Mellon Summer Undergraduate Research Experience in Statistics, for their invaluable knowledge, guidance, and instruction throughout this research experience. This project would not have been possible without the help of Dr. Ron Yurko and Sean Ahmed, Pirates Director of R&D. We would like to thank the entire Carnegie Mellon Sports Analytics Camp teaching team for their support and guidance during this project.

Citations

[1] Sammon, W., & Sarris, E. (2023, July 7). Fall of the slider: Why are hitters feasting on MLB’s once-deadly breaking ball? The Athletic. https://theathletic.com/4671150/2023/07/07/mlb-sliders-hitters-success/

Major League Leaderboards “2020-2022” Pitchers. FanGraphs Baseball. (n.d.). https://www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=y&type=36&season=2022&month=0&season1=2020&ind=1&team=0&rost=0&age=0&filter=&players=0&startdate=&enddate=

Jldbc. (n.d.). JLDBC/Pybaseball: Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, fangraphs). GitHub. https://github.com/jldbc/pybaseball

Statcast Pitch Arsenal Stats Leaderboard. baseballsavant. (n.d.). https://baseballsavant.mlb.com/leaderboard/pitch-arsenal-stats

Statcast Pitch Movement Leaderboard. baseballsavant. (n.d.). https://baseballsavant.mlb.com/leaderboard/pitch-movement


  1. Ethan Park, University of Southern California

  2. Evan Wu, Elon University

  3. Priyanka Kaul, Harvard University