The Wrong Stuff

Authors

Isabelle Schmidt

Liam Jennings

Tiger Teng

Published

July 26, 2024


Introduction

In the fast-paced environment of Major League Baseball, Stuff+ serves as a vital metric for evaluating the ‘nastiness’ of a pitch. The metric analyzes the physical characteristics of a pitch, including velocity, spin, extension, and movement. A Stuff+ of 100 represents a league-average pitch; values above 100 are better than average, and values below 100 are worse than average.

With the emergence of Stuff+ as a common metric in pitcher evaluation, we wanted to evaluate how accurately the model predicted a pitcher’s success and how appropriately it weighted the factors that go into it. We were curious whether the pitches that Stuff+ tended to over- or undervalue shared common traits, which would indicate that the model was not accounting for variables that affect a pitcher’s effectiveness.

Our goal is to provide insights that improve the Stuff+ model so it becomes a more reliable metric for player evaluation. Understanding which physical pitch characteristics are associated with successful outcomes can also help pitchers develop more effective pitches.

Our initial analysis is concentrated on the four-seam fastball, with plans to extend this research to other types of pitches in future studies.

Data

The data we used come from FanGraphs and Statcast for the 2021 to 2023 seasons. Although FanGraphs has Stuff+ data for 2020 and 2024, we excluded those seasons to limit bias: the 2020 season was shortened by COVID-19 and the 2024 season was still in progress, so both contain far less data. We scraped the FanGraphs data with the baseballr R package and used a web-scraping function to gather Statcast data from Baseball Savant.

FanGraphs Data:

The FanGraphs data provides each MLB pitcher’s Stuff+ for each pitch they throw in a given season. Note: throughout this paper, pitch refers to a pitch type (e.g., curveball, slider) unless specified otherwise. The FanGraphs data was originally formatted with one row per pitcher per season. We reshaped it so that each row represents one pitch type for a pitcher in a given season. For example, instead of one row holding all of Paul Skenes’ pitches from the 2024 season, one row holds his 2024 sinker and another holds his 2024 four-seam fastball.

| Player name | Season | Stuff+ (changeup) | Stuff+ (four-seam fastball) | Stuff+ (cutter) | Stuff+ (slider) | Stuff+ (curveball) | Stuff+ (sinker) |
|---|---|---|---|---|---|---|---|
| Ron Marinaccio | 2022 | 96.94 | 122.13 | NA | 129.17 | NA | NA |
| Chris Devenski | 2022 | 131.77 | 69.70 | NA | 135.88 | NA | NA |
| Hunter Strickland | 2022 | 69.30 | 89.00 | NA | 94.59 | NA | 84.01 |
| Caleb Baragar | 2021 | NA | 112.99 | NA | 110.61 | 75.97 | NA |

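
The reshape from one row per pitcher-season to one row per pitcher-season-pitch can be sketched in base R. This is a minimal sketch, not the authors' code: it rebuilds the four example rows above with hypothetical column names (stuff_plus_changeup, stuff_plus_fourseam, and so on) and uses stats::reshape(); the real FanGraphs export uses its own column names.

```r
# Wide layout: one row per pitcher-season, one Stuff+ column per pitch type.
# Values come from the example table; column names are hypothetical.
wide <- data.frame(
  player_name          = c("Ron Marinaccio", "Chris Devenski", "Hunter Strickland", "Caleb Baragar"),
  season               = c(2022, 2022, 2022, 2021),
  stuff_plus_changeup  = c(96.94, 131.77, 69.30, NA),
  stuff_plus_fourseam  = c(122.13, 69.70, 89.00, 112.99),
  stuff_plus_cutter    = rep(NA_real_, 4),
  stuff_plus_slider    = c(129.17, 135.88, 94.59, 110.61),
  stuff_plus_curveball = c(NA, NA, NA, 75.97),
  stuff_plus_sinker    = c(NA, NA, 84.01, NA)
)

pitch_cols <- grep("^stuff_plus_", names(wide), value = TRUE)

# Long layout: one row per pitcher-season-pitch type.
long <- reshape(
  wide,
  direction = "long",
  varying   = pitch_cols,
  v.names   = "stuff_plus",
  timevar   = "pitch_name",
  times     = sub("^stuff_plus_", "", pitch_cols)
)
long$id <- NULL                          # drop the row id that reshape() adds
long <- long[!is.na(long$stuff_plus), ]  # keep only pitches the pitcher threw
rownames(long) <- NULL

long[long$player_name == "Hunter Strickland", ]
```
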
Statcast Data:

Statcast data is pitch-by-pitch data that describes every individual pitch thrown. First, we created swing and miss indicators for each pitch (e.g., a swinging strike is coded as swing = 1 and miss = 1). We also combined certain pitch types because Stuff+ does not have a separate model for each one (e.g., FanGraphs treats sliders and sweepers as the same pitch). We then grouped the data by season, pitcher, and pitch type to sum the number of pitches, swings, misses, and total run value, and to calculate the mean of each physical characteristic and of xwOBA. From these summaries we computed whiff percentage, swing percentage, swing-and-miss percentage, and run value per 100 pitches.

| Pitcher ID | Pitch name | Season | Velocity (mph) | Horizontal break (ft) | Induced vertical break (ft) | Spin (rpm) | Extension (ft) | Outcome description |
|---|---|---|---|---|---|---|---|---|
| 543339 | 4-Seam Fastball | 2021 | 97.5 | -0.67 | 1.36 | 2171 | 6.4 | hit_into_play |
| 656302 | 4-Seam Fastball | 2023 | 95.6 | -0.16 | 1.58 | 2364 | 6.6 | ball |
| 676684 | 4-Seam Fastball | 2023 | 95.7 | -0.80 | 1.23 | 2151 | 6.1 | foul |
| 668676 | 4-Seam Fastball | 2021 | 93.1 | -0.76 | 1.50 | 2131 | 6.4 | foul |

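
The per-pitch indicators and the season-level rates described above can be sketched in base R. This is a minimal sketch under stated assumptions: the pitch-level rows are hypothetical toy values, the column names (pitcher, game_year, description, release_speed, pfx_x, pfx_z, delta_run_exp) follow the Baseball Savant CSV export as we understand it, the lists of descriptions counted as swings and misses are a simplified mapping rather than the authors' exact one, and the rate formulas are the usual definitions since the paper does not spell them out. Statcast reports movement in feet, so the sketch multiplies by 12 to get the inches used in the combined tables.

```r
# Hypothetical pitch-level rows with Baseball Savant-style column names.
pitches <- data.frame(
  pitcher       = c(1, 1, 1, 1, 2, 2, 2, 2),
  game_year     = 2023,
  pitch_name    = "4-Seam Fastball",
  description   = c("swinging_strike", "foul", "ball", "hit_into_play",
                    "called_strike", "swinging_strike_blocked", "foul", "ball"),
  release_speed = c(95.0, 95.4, 94.8, 95.2, 92.1, 92.5, 91.9, 92.3),
  pfx_x         = c(-0.70, -0.65, -0.72, -0.68, -1.20, -1.25, -1.18, -1.22),  # feet
  pfx_z         = c(1.40, 1.38, 1.42, 1.36, 1.10, 1.15, 1.12, 1.08),          # feet
  delta_run_exp = c(-0.10, -0.05, 0.04, 0.30, -0.06, -0.09, -0.04, 0.05)
)

# Simplified description-to-indicator mapping (an assumption, not the authors' list).
swing_desc <- c("swinging_strike", "swinging_strike_blocked", "foul", "foul_tip",
                "hit_into_play", "foul_bunt", "missed_bunt")
miss_desc  <- c("swinging_strike", "swinging_strike_blocked", "missed_bunt")

pitches$pitch  <- 1L
pitches$swing  <- as.integer(pitches$description %in% swing_desc)
pitches$miss   <- as.integer(pitches$description %in% miss_desc)
pitches$hb_in  <- 12 * pitches$pfx_x  # feet -> inches
pitches$ivb_in <- 12 * pitches$pfx_z  # feet -> inches

# Group by season, pitcher, and pitch type: sums for counts, means for characteristics.
keys   <- pitches[, c("game_year", "pitcher", "pitch_name")]
counts <- aggregate(pitches[, c("pitch", "swing", "miss", "delta_run_exp")],
                    by = keys, FUN = sum)
means  <- aggregate(pitches[, c("release_speed", "hb_in", "ivb_in")],
                    by = keys, FUN = mean)
summary_df <- merge(counts, means, by = c("game_year", "pitcher", "pitch_name"))

# Rates (usual definitions, assumed here).
summary_df$whiff_pct  <- 100 * summary_df$miss / summary_df$swing   # misses per swing
summary_df$swing_pct  <- 100 * summary_df$swing / summary_df$pitch  # swings per pitch
summary_df$swstr_pct  <- 100 * summary_df$miss / summary_df$pitch   # misses per pitch
summary_df$rv_per_100 <- 100 * summary_df$delta_run_exp / summary_df$pitch

summary_df
```
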
Combining the Data:

Once all these values were calculated, we joined the FanGraphs dataset with the Statcast dataset so we could evaluate the metrics together. To limit the influence of small samples, we only included pitchers who threw the given pitch at least 100 times.

| Player name | Pitch name | Season | Velocity (mph) | Extension (ft) | Induced vertical break (in) | Horizontal break (in) | Spin (rpm) | Stuff+ | Whiff % | xwOBA |
|---|---|---|---|---|---|---|---|---|---|---|
| Josh Winckowski | 4-Seam Fastball | 2022 | 93.7 | 6.5 | 14.9 | -2.8 | 2201 | 72.2 | 16.4 | 0.283 |
| Logan Gilbert | 4-Seam Fastball | 2022 | 96.1 | 7.4 | 17.8 | -6.4 | 2135 | 101.6 | 23.7 | 0.338 |
| Louie Varland | 4-Seam Fastball | 2023 | 95.3 | 6.9 | 16.3 | -7.7 | 2220 | 106.7 | 23.7 | 0.315 |
| Chasen Shreve | 4-Seam Fastball | 2023 | 90.5 | 6.3 | 16.0 | -11.2 | 2287 | 79.0 | 22.4 | 0.389 |

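
The join and the 100-pitch minimum can be sketched with base R's merge(). The rows below are hypothetical, and joining on player name, season, and pitch name is an assumption: the paper does not say which keys were used, and Statcast identifies pitchers by ID, so in practice a name-to-ID crosswalk would be needed first.

```r
# Hypothetical season-level summaries from each source.
fangraphs <- data.frame(
  player_name = c("Pitcher A", "Pitcher B", "Pitcher C"),
  season      = c(2022, 2022, 2023),
  pitch_name  = "4-Seam Fastball",
  stuff_plus  = c(95.0, 110.0, 102.0)
)
statcast <- data.frame(
  player_name = c("Pitcher A", "Pitcher B", "Pitcher C", "Pitcher D"),
  season      = c(2022, 2022, 2023, 2023),
  pitch_name  = "4-Seam Fastball",
  n_pitches   = c(640, 85, 1210, 300),
  whiff_pct   = c(21.0, 25.5, 19.8, 23.1)
)

MIN_PITCHES <- 100  # the paper's minimum number of the given pitch

# Inner join keeps only pitcher-season-pitch rows present in both sources,
# then the threshold drops small samples.
combined <- merge(fangraphs, statcast, by = c("player_name", "season", "pitch_name"))
combined <- combined[combined$n_pitches >= MIN_PITCHES, ]
combined
```

Pitcher D is dropped by the inner join (no FanGraphs row) and Pitcher B by the 100-pitch threshold.
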
Player Examples:

The two players we examine are starting pitcher Andrew Heaney and relief pitcher Julian Merryweather. Since 2021, Heaney has pitched for the Los Angeles Angels, New York Yankees, Los Angeles Dodgers, and Texas Rangers. He has been an above-average pitcher by ERA+ over the last two years despite allowing fly balls at an above-average rate. Heaney doesn’t fit the mold of the fireballing, high-fastball hurler: his four-seam fastball has below-average velocity, extension, and induced vertical break but above-average horizontal break and spin rate. Its whiff percentage and xwOBA are better than average, yet its Stuff+ is only average. Based on the whiff percentage and xwOBA, we believe Heaney’s four-seam fastball should have an above-average Stuff+.

Julian Merryweather has pitched for the Toronto Blue Jays and Chicago Cubs since 2021. His only above-average season by ERA+ was 2023. Yet his four-seam fastball has better-than-average velocity, extension, induced vertical break, and spin; horizontal break is its only below-average physical characteristic. This profile produces below-par whiff percentages and xwOBA but a well above-average Stuff+ (almost 30% above an average pitch).

We believe these two pitchers are misrepresented by Stuff+.

Andrew Heaney & Julian Merryweather 4-Seam Fastball

| Player name | Season | Velocity (mph) | Extension (ft) | Induced vertical break (in) | Horizontal break (in) | Spin (rpm) | Stuff+ | Whiff % | xwOBA |
|---|---|---|---|---|---|---|---|---|---|
| Andrew Heaney | 2021 | 92.0 | 6.2 | 15.2 | -15.3 | 2443 | 100.0 | 27.8 | 0.328 |
| Andrew Heaney | 2022 | 92.9 | 6.2 | 14.3 | -14.8 | 2441 | 94.6 | 31.1 | 0.324 |
| Andrew Heaney | 2023 | 92.5 | 6.5 | 13.9 | -15.9 | 2413 | 106.6 | 25.6 | 0.336 |
| Julian Merryweather | 2021 | 97.5 | 6.8 | 17.5 | -7.7 | 2262 | 129.4 | 19.1 | 0.429 |
| Julian Merryweather | 2022 | 97.3 | 6.6 | 16.9 | -6.5 | 2286 | 127.0 | 14.6 | 0.493 |
| Julian Merryweather | 2023 | 98.1 | 6.5 | 16.6 | -6.3 | 2342 | 128.4 | 20.6 | 0.386 |
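
The contrast between the two pitchers can be checked directly from the table above. This short base R sketch takes unweighted three-season means for each pitcher; weighting every season equally, regardless of how many fastballs were thrown, is our simplification.

```r
# Four-seam fastball rows copied from the table above (2021-2023).
fb <- data.frame(
  player_name = rep(c("Andrew Heaney", "Julian Merryweather"), each = 3),
  season      = rep(2021:2023, times = 2),
  stuff_plus  = c(100.0, 94.6, 106.6, 129.4, 127.0, 128.4),
  whiff_pct   = c(27.8, 31.1, 25.6, 19.1, 14.6, 20.6),
  xwoba       = c(0.328, 0.324, 0.336, 0.429, 0.493, 0.386)
)

# Unweighted three-season means: every season counts equally.
avg <- aggregate(cbind(stuff_plus, whiff_pct, xwoba) ~ player_name, data = fb, FUN = mean)
avg
```

On these numbers, Heaney's fastball averages a Stuff+ of about 100 with a whiff rate near 28% and an xwOBA near .329, while Merryweather's averages about 128 with a whiff rate near 18% and an xwOBA near .436: the higher Stuff+ goes with the worse outcomes.
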

Exploratory Data Analysis (EDA)

In the initial analysis, we focused on the four-seam fastball because it is the most common pitch in baseball, and its usage has grown noticeably during the pitch-tracking era. Before Statcast, savvy teams (such as the Pittsburgh Pirates) generated strong results with the low sinker, and batters routinely grounded out into shifted infields. However, the emergence of advanced pitch-tracking technology led more teams to flock to high-spin fastballs up in the zone. These pitches generated high whiff rates and limited hard contact, which made sinkers look obsolete.

As the high four-seam fastball became more common, we wanted to know how the pitch’s effectiveness was changing from year to year. Intuitively, the pitch should become less effective over time as batters see it more frequently and adjust. To test this, we examined the relationship between each physical characteristic and each outcome variable individually.

Our group decided to focus primarily on whiff percentage and xwOBA as our outcome variables because they had the highest correlation with Stuff+. We liked that whiff percentage gave us an idea of whether or not batters were making contact with the pitch while xwOBA gave us an idea of (a) batters’ ability to get on base against the pitch and (b) how well batters were making contact with the pitch.
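
Choosing outcome variables by their correlation with Stuff+ can be sketched as follows. The data here are simulated, and the candidate names and effect sizes are assumptions rather than our results; the point is the mechanics. Because a better pitch should lower xwOBA, this sketch ranks candidates by the absolute value of the correlation so that a strong negative relationship counts as much as a strong positive one.

```r
set.seed(2024)
n <- 300
stuff_plus <- rnorm(n, mean = 100, sd = 10)

# Simulated candidate outcomes: two depend on Stuff+, one does not.
candidates <- data.frame(
  whiff_pct = 22 + 0.4 * (stuff_plus - 100) + rnorm(n, sd = 3),
  xwoba     = 0.340 - 0.002 * (stuff_plus - 100) + rnorm(n, sd = 0.02),
  swing_pct = 48 + rnorm(n, sd = 4)
)

# Pearson correlation of each candidate with Stuff+, ranked by magnitude.
cors   <- sapply(candidates, function(y) cor(stuff_plus, y))
ranked <- cors[order(abs(cors), decreasing = TRUE)]
round(ranked, 2)
```
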

Key Takeaways:

Given the hype around the rising fastball, we expected induced vertical break to play a large role in whiff percentage and xwOBA. To our surprise, the change in whiff percentage and xwOBA per inch of induced vertical break was much smaller than expected, while the change in both outcome variables per inch of horizontal break was larger than expected.

Further Analysis:

This piqued our interest in how movement related to both the outcome variables and the other physical characteristics, and in how those relationships varied from year to year. To dig deeper, we made plots showing how vertical and horizontal movement together relate to these variables.

The y-axis shows vertical movement, measured as induced vertical break: the vertical movement a pitch gets from its spin and velocity, with the effect of gravity removed. A pitch cannot defy gravity and rise; however, some four-seam fastballs combine enough spin and velocity to generate the vertical movement that creates the optical illusion of rising. Two labels indicate whether the horizontal break is toward the glove side or the arm side. The plot is drawn from the point of view of a right-handed pitcher facing home plate, so horizontal movement is oriented differently here than in our other plots. The thick black lines mark 0 inches of movement on each axis; a pitch at 0 inches on both axes would have no spin-induced movement in either direction. The dashed red lines mark the average movement over the three seasons: the vertical dashed line is the average horizontal break, and the horizontal dashed line is the average induced vertical break.

Hexbin Plots’ Key Takeaways:

Before making these visualizations, we expected that most, if not all, pitches with above-average vertical movement would excel in whiff percentage, xwOBA, and Stuff+. However, pitches with above-average horizontal movement have generated better-than-average whiff rates and xwOBA each year. Specifically, the fourth quadrant (the bottom right of above-average horizontal movement and below-average vertical movement) generates the best results of the four quadrants.
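
Assigning each pitch to a quadrant relative to the average-movement lines, and then summarizing an outcome by quadrant, can be sketched in base R. The helper quadrant_of() and the toy values are hypothetical, and the sketch assumes horizontal break has already been oriented so that larger values mean more horizontal movement, as on the plots' right-handed-pitcher view. Quadrants follow the usual numbering, so quadrant IV is the bottom right: above-average horizontal break with below-average induced vertical break.

```r
# Label quadrants relative to reference lines (default: the sample means,
# mirroring the dashed average-movement lines on the plots).
quadrant_of <- function(hb, ivb, hb_ref = mean(hb), ivb_ref = mean(ivb)) {
  right <- hb >= hb_ref    # above-average horizontal break
  up    <- ivb >= ivb_ref  # above-average induced vertical break
  ifelse(up & right, "I",
         ifelse(up, "II",
                ifelse(!right, "III", "IV")))
}

# Toy four-seam fastballs: movement in inches, whiff rate in percent (hypothetical).
fb <- data.frame(
  hb        = c(6, 14, 8, 15, 7, 13),
  ivb       = c(18, 17, 12, 12, 19, 13),
  whiff_pct = c(24, 26, 20, 28, 23, 27)
)
fb$quadrant <- quadrant_of(fb$hb, fb$ivb)

# Mean whiff rate per quadrant.
by_quadrant <- aggregate(whiff_pct ~ quadrant, data = fb, FUN = mean)
by_quadrant
```
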

Through the relationship and hexbin plots, we saw that the importance of each physical characteristic for a four-seam fastball’s effectiveness changed from year to year. This made us wonder whether the Stuff+ model was adjusting as well. To find out, we examined the relationship between each physical characteristic and Stuff+. As with the outcome variables, we first looked at each characteristic individually and then at vertical and horizontal movement together.

Key Takeaways:

Stuff+ weights the various physical characteristics of a pitch differently each year, particularly velocity. However, the year-to-year change appears smaller for Stuff+ than for whiff percentage and expected weighted on-base average (xwOBA), which suggests some bias in the way Stuff+ is measured.

Movement & Stuff+

Key Takeaways:

Despite our previous observation that pitches with above-average horizontal movement generated better-than-average whiff rates and xwOBA each year, Stuff+ still ranks the pitches with excellent vertical movement as the “best” fastballs and rates the fourth quadrant as mostly average or below average. Visually, there appears to be some bias in Stuff+.

Important Question

From both types of plots, we concluded that the Stuff+ model was adjusting how it valued each physical characteristic of a pitch from year to year. The question then became: how well is the model adjusting?

Methods

Linear Models

We addressed the question by fitting linear models for each outcome variable: Stuff+, whiff percentage, and xwOBA. Specifically, we modeled each outcome as a function of the season and one physical characteristic at a time, including an interaction term between the season and that characteristic. For example, our models took the form lm(stuff_plus ~ season + horizontal_break + season:horizontal_break), which R also accepts in the shorter form lm(stuff_plus ~ season * horizontal_break); we fit the same form for the other physical characteristics and outcome variables.

This approach allowed us to understand how pitch characteristics influenced Stuff+, whiff percentage, and xwOBA in each season. Including the interaction term was crucial because it shows how the relationship between a pitch characteristic and an outcome varies from year to year, rather than only the overall trend.

Once our linear models were fitted, we plotted the coefficients for each physical characteristic in 2021, 2022, and 2023 to compare how each additional unit of the characteristic affected Stuff+ versus whiff percentage and expected weighted on-base average.
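
Reading a per-season coefficient off an interaction model can be sketched in base R. This sketch uses noise-free toy data with hypothetical slopes and treats season as a factor, which is an assumption since the paper does not say whether season was numeric or categorical. With R's default treatment contrasts, the ivb coefficient is the 2021 slope, and each later season's slope is that coefficient plus its season<year>:ivb interaction term.

```r
# Toy data: Stuff+ as an exact linear function of induced vertical break (inches),
# with a different hypothetical slope and intercept in each season.
ivb        <- seq(12, 20, by = 0.5)
slopes     <- c("2021" = 2.0, "2022" = 1.8, "2023" = 1.2)
intercepts <- c("2021" = 70,  "2022" = 72,  "2023" = 80)

df <- do.call(rbind, lapply(names(slopes), function(s) {
  data.frame(season = s, ivb = ivb, stuff_plus = intercepts[[s]] + slopes[[s]] * ivb)
}))
df$season <- factor(df$season)

# season * ivb expands to season + ivb + season:ivb.
fit <- lm(stuff_plus ~ season * ivb, data = df)
cf  <- coef(fit)

# The baseline season's slope is the ivb coefficient; later seasons add their interaction.
base <- levels(df$season)[1]
season_slopes <- sapply(levels(df$season), function(s) {
  if (s == base) cf[["ivb"]] else cf[["ivb"]] + cf[[paste0("season", s, ":ivb")]]
})
season_slopes
```

Because every term interacts with season, these point estimates match fitting a separate regression for each season; the pooled fit simply makes the year-to-year comparison explicit in one model.
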

Coefficient Plot Key Takeaways:

The vertical break plots provided the most valuable information. From them we learned that Stuff+ started valuing induced vertical break less in 2023. We also learned that in 2021 and 2022, xwOBA increased as induced vertical break increased. Because a lower xwOBA is better for pitchers, we inferred that more induced vertical break is not necessarily good. This was reinforced by the fact that in all three years, whiff percentage also went down as induced vertical break increased.

Random Forest

To further evaluate the Stuff+ metric, which is created using a Random Forest model, we modeled Stuff+ as a function of the physical characteristics to determine the importance of each variable by year. We chose a Random Forest because it handles non-linear relationships well and does not require normally distributed variables. We trained the model with velocity, extension, induced vertical break, horizontal break, and spin rate as predictors of Stuff+, then extracted the variable importance of each predictor to assess how much it influences Stuff+.
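
Variable importance can be computed several ways, and the paper does not say which measure it used. One common choice is permutation importance, the idea behind the randomForest package's type = 1 importance (%IncMSE), which is computed on out-of-bag samples: shuffle one predictor, re-predict, and see how much the error grows. The sketch below implements that idea in base R around an lm fit so it runs without extra packages; the data are simulated with hypothetical effect sizes, and permutation_importance() is our own helper, not a library function.

```r
# Average increase in RMSE when one predictor is shuffled (in-sample, for simplicity).
permutation_importance <- function(model, data, response, predictors, n_rep = 20) {
  rmse <- function(d) sqrt(mean((d[[response]] - predict(model, newdata = d))^2))
  base <- rmse(data)
  sapply(predictors, function(p) {
    mean(replicate(n_rep, {
      shuffled <- data
      shuffled[[p]] <- sample(shuffled[[p]])  # break the link between p and the response
      rmse(shuffled) - base
    }))
  })
}

set.seed(42)
n <- 500
sim <- data.frame(
  velocity  = rnorm(n, 94, 2),
  extension = rnorm(n, 6.5, 0.3),
  ivb       = rnorm(n, 16, 2),
  hb        = rnorm(n, 8, 3),
  spin      = rnorm(n, 2300, 100)
)
# By construction: velocity matters most, extension and horizontal break not at all.
sim$stuff_plus <- 100 + 4 * (sim$velocity - 94) + 1.5 * (sim$ivb - 16) +
  0.02 * (sim$spin - 2300) + rnorm(n, 0, 3)

fit <- lm(stuff_plus ~ velocity + extension + ivb + hb + spin, data = sim)
imp <- permutation_importance(fit, sim, "stuff_plus",
                              c("velocity", "extension", "ivb", "hb", "spin"))
sort(imp, decreasing = TRUE)
```

With a fitted Random Forest in place of the linear model, the same helper applies, since predict() also accepts newdata for randomForest objects.
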

In addition to the Random Forest model, we explored Generalized Additive Models (GAMs) to understand the relationships between physical characteristics and Stuff+. GAMs capture non-linear relationships by letting the data determine the shape of the relationship between each predictor and the response, which makes them useful for modeling complex relationships. We fit a GAM for Stuff+ with velocity, extension, induced vertical break, horizontal break, and spin rate as predictors and compared it with the Random Forest using k-fold cross-validation to determine which approach had better predictive accuracy. The Random Forest outperformed the GAM, with a lower RMSE, so we used the Random Forest for the rest of the analysis.

We fit three Random Forest models predicting Stuff+, one for each season. We then modeled each season’s xwOBA with Random Forests to compare against the Stuff+ models and assess how well Stuff+ reflects actual at-bat outcomes. Finally, we examined the variable importance of each predictor in every model and tracked how those values changed from season to season, comparing the predictors’ relative importance over time. This allowed us to evaluate which physical characteristics drive pitch effectiveness in the xwOBA models versus the Stuff+ models.
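
The model comparison by k-fold cross-validated RMSE can be sketched in base R. Because GAMs and Random Forests need add-on packages (for example mgcv and randomForest), this sketch compares two lm specifications as stand-ins, a straight-line and a quadratic velocity term, on simulated data; kfold_rmse() is a hypothetical helper. Both models are scored on the same folds (same seed), so the comparison is paired, and the model with the lower mean RMSE is preferred, which is the criterion the paper used to pick the Random Forest over the GAM.

```r
# Mean RMSE over k folds for an lm formula; the same seed gives the same folds.
kfold_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)
  folds    <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_rmse <- sapply(seq_len(k), function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)
    sqrt(mean((test[[response]] - predict(fit, newdata = test))^2))
  })
  mean(fold_rmse)
}

# Simulated data with a curved velocity effect (hypothetical).
set.seed(7)
sim <- data.frame(velocity = rnorm(400, mean = 94, sd = 2))
sim$stuff_plus <- 100 + 3 * (sim$velocity - 94) + 0.8 * (sim$velocity - 94)^2 +
  rnorm(400, sd = 2)

rmse_linear    <- kfold_rmse(stuff_plus ~ velocity, sim)
rmse_quadratic <- kfold_rmse(stuff_plus ~ velocity + I(velocity^2), sim)
c(linear = rmse_linear, quadratic = rmse_quadratic)
```
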

Results

Variable importance is a measure used in Random Forest models to quantify the contribution of each predictor variable to the model’s predictive power. It helps identify which features have the most significant impact on the model’s predictions. Higher variable importance values indicate a greater influence on the outcome variable. The variable importance plots for Stuff+ and xwOBA across the 2021-2023 seasons are shown in the figure. These plots illustrate how the importance of different physical characteristics varied over time for each metric.

Key Takeaways:

The Stuff+ Variable Importance plot shows:

  • Velocity consistently holds the highest importance across all three years, indicating that velocity is a critical factor in determining the Stuff+ value.

  • Spin rate shows a high level of importance, though it decreases slightly in 2022 before rising again in 2023. This suggests that while spin remains a vital factor, its relative importance can fluctuate.

  • Induced vertical break maintains moderate and relatively stable importance throughout the years, indicating its consistent role in influencing Stuff+.

  • Horizontal break and extension have the least importance, with minimal variation across the seasons, highlighting their lesser impact compared to other characteristics.

Key Takeaways:

The xwOBA Variable Importance plot reveals more dynamic changes:

  • Extension is the most important predictor in 2021 and 2023, indicating its significant role in at-bat outcomes; however, it becomes the least important in 2022.

  • Spin rate increases sharply in importance in 2022, indicating a high impact that year, and then decreases in 2023, though it remains an important factor.

  • Velocity decreases sharply in importance by 2022 and remains low in 2023, suggesting it becomes the least influential characteristic for batting outcomes.

  • Horizontal break decreases in importance in 2022 and then increases in 2023, indicating that the effect of horizontal movement on at-bat outcomes changes from year to year.

  • Induced vertical break shows relatively low importance with minimal fluctuation across the years, indicating that it is less influential on xwOBA than most other characteristics.

Interpretation of Results

Our results highlight key differences in the importance of vertical break, horizontal break, and velocity between the Stuff+ and xwOBA models, which is critical for assessing if Stuff+ accurately reflects real pitch quality:

  • Velocity is consistently the most important characteristic in the Stuff+ model across all years, underscoring its critical role in determining Stuff+. In contrast, in the xwOBA model, velocity’s importance drops sharply by 2022 and it becomes the least important characteristic by 2023. This suggests that while velocity is heavily weighted in Stuff+, its actual impact on at-bat outcomes (xwOBA) diminishes over time. Therefore, the Stuff+ model’s reliance on velocity may not capture real pitch effectiveness.

  • Horizontal break is relatively unimportant but stable in the Stuff+ model over the three seasons; in the xwOBA model, however, its importance decreases in 2022 and increases again in 2023. This indicates that horizontal movement has a varying impact on at-bat outcomes and became more significant in the most recent full season. The Stuff+ model may underestimate the importance of horizontal break in predicting actual performance.

  • Induced vertical break maintains moderate, stable importance in the Stuff+ model, reflecting its consistent role in pitch evaluation. In the xwOBA model, it shows relatively low importance with minimal fluctuation, indicating it is less influential on at-bat outcomes than other characteristics. This suggests that the Stuff+ model might overestimate how essential induced vertical break is to pitch effectiveness.

Discussion

These insights suggest that the Stuff+ model does not model outcomes well and appears biased: it does not account for the changing importance of certain physical characteristics of four-seam fastballs. Understanding these differences is essential for refining the Stuff+ metric to better align with practical performance indicators.

Limitations

Aggregated Data: The primary limitation of our analysis is the absence of pitch-by-pitch Stuff+ data; we only have access to season-level aggregates. This limits our ability to capture variation within individual games and events, so important short-term trends and deviations in pitch effectiveness may be smoothed over by season averages.

Limited Pitch Types: The FanGraphs data we used covers a limited set of Stuff+ pitch types; for example, it does not treat the sweeper as a separate pitch type. As a result, we may not be able to fully analyze certain pitch types, and important insights may be left out.

Temporal Scope: Our dataset covers only the 2021 to 2023 seasons. This short period may not provide enough historical context to understand long-term trends in pitch effectiveness.

Future Work

In future work, we aim to extend our analysis to pitch types beyond the four-seam fastball. By exploring a wider array of pitches, we can understand how different physical attributes influence effectiveness across pitch types. We can also examine the differences between similar pitch types (e.g., sliders and sweepers). This will help refine the Stuff+ metric further and ensure it accurately reflects the performance of all pitches, making it a more powerful tool for evaluating pitchers.

Acknowledgments

We appreciate Sean Ahmed from the Pittsburgh Pirates for being an exceptional mentor. Sean’s extensive experience working in baseball and his creative thinking were instrumental in shaping the direction of this research. We are grateful to Dr. Ron Yurko and Quang Nguyen for their invaluable guidance and encouragement throughout this project.