The Wrong Stuff

Introduction

In the fast-paced environment of Major League Baseball, Stuff+ serves as a vital metric for evaluating the ‘nastiness’ of a pitch. The metric analyzes the physical characteristics of a pitch, including velocity, spin, extension, and movement. A Stuff+ of 100 represents a league-average pitch; values above or below 100 indicate above- or below-average stuff, respectively.

As Stuff+ has become a common metric in pitcher evaluation, we wanted to assess how accurately the model predicts a pitcher’s success and how appropriately it weights the factors that go into it. In particular, we wanted to know whether the pitches Stuff+ tends to over- or undervalue share common traits, which would suggest the model is not accounting for variables that affect a pitcher’s effectiveness.

Our goal is to provide insights that improve the Stuff+ model so it becomes a more reliable metric for player evaluation. Understanding which physical pitch characteristics are associated with successful outcomes can also help pitchers develop more effective pitches.

Our initial analysis focuses on the four-seam fastball; we plan to extend this research to other pitch types in future studies.

Data

We used FanGraphs and Statcast data from the 2021 through 2023 seasons. Although FanGraphs also publishes Stuff+ for the 2020 and 2024 seasons, we excluded them to limit bias: both offer far less data, because 2020 was shortened by COVID-19 and the 2024 season was incomplete. We scraped the FanGraphs data with the baseballr R package and used a web-scraping function to gather Statcast data from Baseball Savant.

FanGraphs Data:

The FanGraphs data provides each MLB pitcher’s Stuff+ for each pitch they throw in a given season. Note: throughout this paper, pitch refers to a pitch type (e.g., curveball, slider) unless specified otherwise. The FanGraphs data was originally formatted with one row per pitcher per season. We reshaped it so that each row represents a pitcher, a season, and a specific pitch. For example, rather than a single row holding all of Paul Skenes’ pitches from the 2024 season, one row holds his sinker for 2024 and another holds his four-seam fastball for 2024.

Player name	Season	Stuff+ changeup	Stuff+ four-seam fastball	Stuff+ cutter	Stuff+ slider	Stuff+ curveball	Stuff+ sinker
Ron Marinaccio	2022	96.94	122.13	NA	129.17	NA	NA
Chris Devenski	2022	131.77	69.70	NA	135.88	NA	NA
Hunter Strickland	2022	69.30	89.00	NA	94.59	NA	84.01
Caleb Baragar	2021	NA	112.99	NA	110.61	75.97	NA
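The reshaping step can be sketched in Python as follows. This is an illustration, not the code behind this analysis (which used R and baseballr); column names such as `stuff_plus_ff` are hypothetical placeholders for whatever the scraped FanGraphs table contains, and missing (NA) values are represented as `None`.

```python
# Hypothetical column names for the scraped FanGraphs Stuff+ columns.
WIDE_PITCH_COLUMNS = {
    "stuff_plus_ch": "Changeup",
    "stuff_plus_ff": "4-Seam Fastball",
    "stuff_plus_fc": "Cutter",
    "stuff_plus_sl": "Slider",
    "stuff_plus_cu": "Curveball",
    "stuff_plus_si": "Sinker",
}


def wide_to_long(rows):
    """Turn one row per pitcher-season into one row per pitcher-season-pitch."""
    long_rows = []
    for row in rows:
        for column, pitch_name in WIDE_PITCH_COLUMNS.items():
            value = row.get(column)
            if value is None:  # NA: the pitcher did not throw this pitch that season
                continue
            long_rows.append({
                "player_name": row["player_name"],
                "season": row["season"],
                "pitch_name": pitch_name,
                "stuff_plus": value,
            })
    return long_rows


# Usage: the Ron Marinaccio row from the table above.
marinaccio = {
    "player_name": "Ron Marinaccio", "season": 2022,
    "stuff_plus_ch": 96.94, "stuff_plus_ff": 122.13, "stuff_plus_fc": None,
    "stuff_plus_sl": 129.17, "stuff_plus_cu": None, "stuff_plus_si": None,
}
for record in wide_to_long([marinaccio]):
    print(record["pitch_name"], record["stuff_plus"])
# prints three rows: Changeup 96.94, 4-Seam Fastball 122.13, Slider 129.17
```

In pandas, `DataFrame.melt` followed by dropping missing values performs the same wide-to-long reshape; the plain-Python version simply makes each step explicit.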

Statcast Data:

Statcast provides pitch-by-pitch data covering every individual pitch thrown. First, we created swing and miss indicators for each pitch (e.g., a swing that missed is coded swing = 1 and miss = 1). We also combined certain pitch types, because Stuff+ does not have a separate model for every pitch type (e.g., FanGraphs treats sliders and sweepers as the same pitch). We then grouped by season, pitcher, and pitch type to count pitches, swings, and misses, sum the run value, and average each physical characteristic and xwOBA. From these summaries we computed whiff percentage, swing percentage, swing-and-miss percentage, and run value per 100 pitches.

pitcher (MLBAM ID)	pitch name	season	velocity (mph)	horizontal break (ft)	induced vertical break (ft)	spin (rpm)	extension (ft)	outcome description
543339	4-Seam Fastball	2021	97.5	-0.67	1.36	2171	6.4	hit_into_play
656302	4-Seam Fastball	2023	95.6	-0.16	1.58	2364	6.6	ball
676684	4-Seam Fastball	2023	95.7	-0.80	1.23	2151	6.1	foul
668676	4-Seam Fastball	2021	93.1	-0.76	1.50	2131	6.4	foul
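A minimal Python sketch of the indicator and aggregation steps, assuming pitch-level rows shaped like the table above plus a hypothetical per-pitch `run_value` field. The mapping from Statcast description codes to swings and misses, the single sweeper-to-slider merge, and the rate definitions (whiff percentage as misses per swing, swing-and-miss percentage as misses per pitch) are assumptions for illustration; averaging xwOBA is left out for brevity. Movement is multiplied by 12 because Statcast reports it in feet, while the combined tables below use inches.

```python
from collections import defaultdict

# ASSUMPTION: Statcast `description` codes treated as swings and as misses.
# Foul tips are counted as contact here; adjust if your definition differs.
SWING_CODES = {
    "swinging_strike", "swinging_strike_blocked", "foul", "foul_tip",
    "hit_into_play", "foul_bunt", "missed_bunt",
}
MISS_CODES = {"swinging_strike", "swinging_strike_blocked", "missed_bunt"}

# Pitch names merged because FanGraphs grades them with one Stuff+ model.
PITCH_MERGE = {"Sweeper": "Slider"}

# Physical characteristics to average; movement arrives in feet and is reported in inches.
MEAN_FIELDS = ("velocity", "horizontal_break", "induced_vertical_break", "spin", "extension")
FEET_TO_INCHES = {"horizontal_break", "induced_vertical_break"}


def summarize(pitches):
    """Collapse pitch-level rows into one row per (season, pitcher, pitch name)."""
    groups = defaultdict(list)
    for p in pitches:
        name = PITCH_MERGE.get(p["pitch_name"], p["pitch_name"])
        groups[(p["season"], p["pitcher"], name)].append(p)

    summary = []
    for (season, pitcher, name), rows in groups.items():
        n = len(rows)
        swings = sum(r["description"] in SWING_CODES for r in rows)  # swing indicator
        misses = sum(r["description"] in MISS_CODES for r in rows)   # miss indicator
        out = {"season": season, "pitcher": pitcher, "pitch_name": name, "pitches": n}
        for field in MEAN_FIELDS:
            scale = 12 if field in FEET_TO_INCHES else 1
            out[field] = scale * sum(r[field] for r in rows) / n
        out["swing_pct"] = 100 * swings / n
        out["whiff_pct"] = 100 * misses / swings if swings else None  # misses per swing
        out["swstr_pct"] = 100 * misses / n                            # misses per pitch
        out["run_value_per_100"] = 100 * sum(r["run_value"] for r in rows) / n
        summary.append(out)
    return summary
```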

Combining the Data:

Once these values were calculated, we joined the FanGraphs and Statcast datasets so we could evaluate the metrics side by side. To limit noise from small samples, we included only pitchers who threw the given pitch at least 100 times.

player name	pitch name	season	velocity (mph)	extension (ft)	induced vertical break (in)	horizontal break (in)	spin (rpm)	Stuff+	whiff %	xwOBA
Josh Winckowski	4-Seam Fastball	2022	93.7	6.5	14.9	-2.8	2201	72.2	16.4	0.283
Logan Gilbert	4-Seam Fastball	2022	96.1	7.4	17.8	-6.4	2135	101.6	23.7	0.338
Louie Varland	4-Seam Fastball	2023	95.3	6.9	16.3	-7.7	2220	106.7	23.7	0.315
Chasen Shreve	4-Seam Fastball	2023	90.5	6.3	16.0	-11.2	2287	79.0	22.4	0.389
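The join and the 100-pitch filter might look like the Python sketch below. It assumes both datasets already share a `player_id` key; in practice FanGraphs and Statcast use different player ID systems, so a crosswalk between them would be needed first. All field and function names here are hypothetical.

```python
MIN_PITCHES = 100  # minimum sample stated in the text ("at least 100 times")


def join_and_filter(fangraphs_rows, statcast_rows, min_pitches=MIN_PITCHES):
    """Inner-join on (player_id, season, pitch_name), keeping pitches thrown >= min_pitches times."""
    # Index Stuff+ by the shared key so each Statcast row needs one lookup.
    stuff_lookup = {
        (r["player_id"], r["season"], r["pitch_name"]): r["stuff_plus"]
        for r in fangraphs_rows
    }
    joined = []
    for row in statcast_rows:
        key = (row["player_id"], row["season"], row["pitch_name"])
        if row["pitches"] < min_pitches or key not in stuff_lookup:
            continue  # too few pitches, or no matching Stuff+ grade
        joined.append({**row, "stuff_plus": stuff_lookup[key]})
    return joined
```

Using an inner join means a pitch appears in the combined data only if it has both a Stuff+ grade and enough Statcast pitches to summarize.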

Player Examples:

The two players we examine are starting pitcher Andrew Heaney and relief pitcher Julian Merryweather. Since 2021, Andrew Heaney has pitched for the Los Angeles Angels, New York Yankees, Los Angeles Dodgers, and Texas Rangers. Although he has been an above-average pitcher (by ERA+) over the last two years, he allows fly balls at an above-average rate. Heaney doesn’t fit the mold of the fireballing, high-fastball hurler: he has below-average velocity, extension, and induced vertical break but above-average horizontal break and spin rate. His whiff percentage and xwOBA are better than average despite a roughly average Stuff+. Based on his whiff percentage and xwOBA, we believe Heaney’s four-seam fastball should have an above-average Stuff+.

Julian Merryweather has pitched for the Toronto Blue Jays and Chicago Cubs since 2021. His only above-average year (by ERA+) was 2023. However, his fastball has better-than-average velocity, extension, induced vertical break, and spin, with horizontal break its only below-average physical characteristic. Despite these traits, it produces below-par whiff percentages and xwOBA, even though its Stuff+ is well above average (almost 30% above an average pitch).

We believe these two pitchers are misrepresented by Stuff+.

Andrew Heaney & Julian Merryweather 4-Seam Fastball
player name	season	velocity (mph)	extension (ft)	induced vertical break (in)	horizontal break (in)	spin (rpm)	Stuff+	whiff %	xwOBA
Andrew Heaney	2021	92.0	6.2	15.2	-15.3	2443	100.0	27.8	0.328
Andrew Heaney	2022	92.9	6.2	14.3	-14.8	2441	94.6	31.1	0.324
Andrew Heaney	2023	92.5	6.5	13.9	-15.9	2413	106.6	25.6	0.336
Julian Merryweather	2021	97.5	6.8	17.5	-7.7	2262	129.4	19.1	0.429
Julian Merryweather	2022	97.3	6.6	16.9	-6.5	2286	127.0	14.6	0.493
Julian Merryweather	2023	98.1	6.5	16.6	-6.3	2342	128.4	20.6	0.386

Exploratory Data Analysis (EDA)

In our initial analysis, we focused on the four-seam fastball because it is the most common pitch in baseball, and its usage has noticeably increased during the pitch-tracking era. Before Statcast, savvy teams (such as the Pittsburgh Pirates) generated strong results with the low sinker, and batters routinely grounded out to shifted infields. However, the emergence of advanced pitch-tracking technology led more teams to embrace high-spin fastballs up in the zone. These pitches generated high whiff rates and limited hard contact, which made sinkers look obsolete.

Given the continued prevalence of the high four-seam fastball, we wanted to know how the pitch’s effectiveness was changing from year to year. Intuitively, the pitch should become less effective over time as batters see it more often and adjust. To test this, we examined the relationship between each physical characteristic of the pitch and an outcome variable, one characteristic at a time.

Our group decided to focus primarily on whiff percentage and xwOBA as our outcome variables because they had the highest correlation with Stuff+. We liked that whiff percentage shows how often batters fail to make contact with the pitch when they swing, while xwOBA reflects both (a) batters’ ability to get on base against the pitch and (b) the quality of the contact they make.

Key Takeaways:

Given the hype around the rising fastball, we expected induced vertical break to play a large role in whiff percentage and xwOBA. To our surprise, the change in whiff percentage and xwOBA per inch of induced vertical break was much smaller than expected, while the change in both outcome variables per inch of horizontal break was larger than we expected.

Further Analysis:

This piqued our interest in how movement related to both the outcome variables and the other physical characteristics, and in how these relationships varied from year to year. To explore this, we made hexbin plots showing how vertical and horizontal movement together relate to these variables.

The y-axis, vertical movement, is the induced vertical break: the vertical movement a pitch gets from its spin and velocity, excluding the effect of gravity. A pitch cannot defy gravity and rise; however, some four-seam fastballs combine enough spin and velocity to create the optical illusion of rising. Two labels indicate the direction of horizontal break (glove side or arm side).

The plots are drawn from the point of view of a right-handed pitcher facing home plate, so horizontal movement is oriented differently here than in our other plots. The thick black lines mark 0 inches of movement on each axis; a pitch at 0 inches on both axes would have no induced movement in either direction. The dashed red lines mark the averages over the three seasons: the vertical dashed line is the average horizontal break, and the horizontal dashed line is the average induced vertical break.

Hexbin Plots’ Key Takeaways:

Before making these visualizations, we expected that most, if not all, pitches with above-average vertical movement would excel in whiff percentage, xwOBA, and Stuff+. Instead, pitches with above-average horizontal movement generated better-than-average whiff rates and xwOBA every year. Specifically, the fourth quadrant (bottom right: above-average horizontal movement and below-average vertical movement) produced the best results of the four quadrants.
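The quadrant comparison can be made concrete with a short Python sketch that splits pitches at the average horizontal and induced vertical break and averages an outcome within each quadrant. It assumes horizontal break is compared by magnitude (`abs`), so a larger value means more horizontal movement; that choice, the field names, and the counterclockwise quadrant numbering (I top right, II top left, III bottom left, IV bottom right) are assumptions rather than details taken from our plots.

```python
from statistics import fmean


def quadrant(hb, ivb, avg_hb, avg_ivb):
    """Quadrant relative to the average lines; pitches exactly at an average go left/bottom."""
    right = hb > avg_hb   # above-average horizontal movement
    top = ivb > avg_ivb   # above-average induced vertical break
    if top:
        return "I" if right else "II"
    return "IV" if right else "III"


def mean_by_quadrant(rows, metric):
    """Average `metric` (e.g., whiff_pct or xwOBA) within each movement quadrant."""
    # ASSUMPTION: horizontal movement measured as magnitude of break in inches.
    hbs = [abs(r["horizontal_break"]) for r in rows]
    avg_hb = fmean(hbs)
    avg_ivb = fmean(r["induced_vertical_break"] for r in rows)
    groups = {}
    for hb, r in zip(hbs, rows):
        q = quadrant(hb, r["induced_vertical_break"], avg_hb, avg_ivb)
        groups.setdefault(q, []).append(r[metric])
    return {q: fmean(values) for q, values in sorted(groups.items())}
```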

The relationship and hexbin plots showed that the importance of each physical characteristic for a four-seam fastball’s effectiveness changed from year to year. This made us wonder whether the Stuff+ model was adjusting as well. To find out, we repeated the analysis with Stuff+ as the outcome: as with whiff percentage and xwOBA, we first plotted each physical characteristic individually against Stuff+ and then looked at vertical and horizontal movement together.

Key Takeaways:

Stuff+ weights the physical characteristics of a pitch differently each year, most noticeably velocity. However, the year-to-year change appears smaller for Stuff+ than for whiff percentage and xwOBA. This suggests that Stuff+ adjusts more slowly than the outcomes it aims to predict, which introduces some bias into how it grades pitches.

Movement & Stuff+

Key Takeaways:

Although pitches with above-average horizontal movement generated better-than-average whiff rates and xwOBA every year, Stuff+ still rates the pitches with excellent vertical movement as the “best” fastballs, while grading the fourth quadrant as mostly average or below average. Visually, Stuff+ appears biased toward vertical movement at the expense of horizontal movement.

Important Question

From both types of plots, we concluded that the Stuff+ model adjusts how it values each physical characteristic of a pitch from year to year. The question then became: “How well is the model adjusting?”

Methods

Linear Models

We addressed this question by fitting linear models for each outcome variable: Stuff+, whiff percentage, and xwOBA. Specifically, we modeled each outcome as a function of the season and one physical characteristic, including an interaction term between the season and that characteristic. For example, the model for Stuff+ and horizontal break took the form lm(stuff_plus ~ season + horizontal_break + season:horizontal_break), which R also accepts in the shorthand lm(stuff_plus ~ season * horizontal_break); we fit analogous models for the other physical characteristics and outcome variables.

This approach let us see how pitch characteristics related to Stuff+, whiff percentage, and xwOBA in each season. The interaction term was crucial: it allows the relationship between a pitch characteristic and an outcome to differ by season, rather than forcing a single overall trend across all three years.
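Written out, the horizontal-break model for Stuff+ has the form below, where x_i is a pitch’s horizontal break. Differentiating with respect to x shows why the interaction term matters: it makes the per-inch effect of horizontal break depend on the season.

```latex
\begin{aligned}
\mathrm{StuffPlus}_i &= \beta_0 + \beta_1\,\mathrm{season}_i + \beta_2\,x_i + \beta_3\,(\mathrm{season}_i \times x_i) + \varepsilon_i \\
\frac{\partial\,\widehat{\mathrm{StuffPlus}}}{\partial x} &= \beta_2 + \beta_3\,\mathrm{season}
\end{aligned}
```

With season entered as a number, the per-inch effect is β₂ + 2021β₃ in 2021, β₂ + 2022β₃ in 2022, and β₂ + 2023β₃ in 2023, so the model constrains that effect to change by the same amount (β₃) each year. If season is instead entered as a factor, each season gets its own slope, and those slopes equal what separate within-season regressions would give.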

After fitting the models, we plotted the coefficient for each pitch physical characteristic in 2021, 2022, and 2023 to compare how each additional unit of that characteristic changed Stuff+ versus whiff percentage and expected weighted on-base average (xwOBA).