
About The Conference

The Carnegie Mellon Sports Analytics Conference is an annual event dedicated to highlighting the latest sports research from the statistics and data science community.


Stay tuned for more information about the upcoming #CMSAC. Check out our 2019, 2020, 2021, and 2022 conferences.


Registration

Register now!

You can register to attend #CMSAC in person or virtually. Virtual attendees will be able to attend the workshop and speaker events via a Zoom webinar. While virtual attendees will be able to ask questions, priority will be given to in-person attendees. Additionally, in-person attendees will have access to the poster session and networking opportunities throughout the conference. We will provide printed name tags for in-person participants who register before Nov 3rd (handwritten name tags will be available at check-in). Click the button above to register and see below for pricing.

In-Person Registration (SOLD OUT 10/27)

  • High School / Undergrad / Grad students Conference: $15 (with school ID)
  • High School / Undergrad / Grad students Workshop: $10 (with school ID)
  • High School / Undergrad / Grad students Conference + Workshop: $20 (with school ID)
  • Non-students Conference: $25
  • Non-students Workshop: $20
  • Non-students Conference + Workshop: $30

Virtual Attendance Registration (limited access to networking opportunities)

  • High School / Undergrad / Grad students Conference + Workshop: FREE (with school ID)
  • Non-students Conference: $5
  • Non-students Workshop: $5
  • Non-students Conference + Workshop: $8

Registering indicates agreement to abide by the Code of Conduct.

Hotel information

We have room blocks with Hilton Garden Inn and The Oaklander Hotel (reserve now with the respective links).

Conference Location

Carnegie Mellon University
Giant Eagle Auditorium
4909 Frew St, Pittsburgh, PA 15213
From PIT Airport

1. Head northeast on Airport Blvd
2. Keep left to stay on Airport Blvd - 0.6 mi
3. Keep left to stay on Airport Blvd - 0.7 mi
4. Continue straight to stay on Airport Blvd - 0.2 mi
5. Keep left at the fork, follow signs for I-376 E/I-79 E/Pittsburgh/Pennsylvania Turnpike E and merge onto I-376 E - 0.6 mi
6. Merge onto I-376 E - 16.4 mi
7. Keep right to stay on I-376 E - 2.1 mi
8. Take exit 72A to merge onto Forbes Ave toward Oakland - 0.3 mi
9. Merge onto Forbes Ave - 1.0 mi
10. Turn right onto Schenley Drive Extension - 449 ft
11. Turn left onto Schenley Drive - 0.2 mi
12. Turn left onto Frew St - 0.2 mi
13. Destination will be on the left


Schedule Details

Big Data Bowl Workshop in Giant Eagle Auditorium, led by Quang Nguyen

  • 4:30 PM

    Check-in and food (pizza to be provided)

  • 5 PM

    Access slides for workshop here

Conference sessions in Giant Eagle Auditorium (times are subject to change)

  • 8:00 AM

    Registration / check-in

  • 8:50 AM

    Welcome and Opening Remarks

    CMU Statistics & Data Science
  • 9:00 AM

    Keynote Address: Sarah Rudd

    src | ftbl
  • 9:45 AM

    Coffee Break

  • 10:00 AM

    Mathematical Models of Player Contract Valuation

    Albert Cohen, Michigan State University
  • 10:30 AM

    Sources of bias in diving competitions: prevalence and adjustments

    Monnie McGee, Southern Methodist University
  • 11:00 AM

    Coffee Break

  • 11:20 AM

    Undergrad SMT Data Challenge Finalist:

    Evaluating Player Relationships in Stolen Base Defense

    Jackson Balch, Isaac Blumhoefer, Kai Franke, Jack Rogers
  • 11:35 AM

    Undergrad SMT Data Challenge Finalist:

    Optimal Approach for Infielders on Stolen Base Attempts

    Jonah Lubin
  • 11:50 AM

    Undergrad SMT Data Challenge Finalist:

    A Bottom-Up Approach to Evaluating Infielder Range on Ground Balls

    Ethan Park
  • 12:05 PM

    Poster Previews

    Poster Presenters
  • 12:15 PM

    Lunch and Poster Session

  • 1:20 PM

    CMSACamp Students Talk:

    Clustering Racehorse Movement Profiles to Discover Trends in Injured Horses

    Sara Colando, Jonathan Pipping, Kristopher Wilson
  • 1:35 PM

    CMSACamp Students Talk:

    To Bet or Not To Bet?

    Fungai Jani, Maria Tsakalakos, Tseegi Nyamdorj
  • 1:50 PM

    CMSACamp Students Talk:

    Deal or No Deal? An NBA Recommender System for Team Composition and Salary Optimization

    Mathew Chandy, Leo Cheng, Lauren Okamoto
  • 2:05 PM

    Break

  • 2:20 PM

    CMSACamp Students Talk:

    The Art of Sequencing: Utilizing Inter-Pitch Dynamics to Enhance Pitch Evaluation in Major League Baseball

    Ethan Park, Evan Wu, Priyanka Kaul
  • 2:35 PM

    CMSACamp Students Talk:

    Examining the Decline of Save Percentage in the NHL

    Quinn Robnett, Luke Welsh
  • 2:50 PM

    CMSACamp Students Talk:

    Draw 2: Identifying Key Players in Drawing NHL Power Plays

    Katherine Gong, Bethany Gonzalez
  • 3:05 PM

    Break

  • 3:15 PM

    From Fixed to Fluid: A Model for Frame-by-Frame Player Role Classification

    Devin Pleuler, Maple Leaf Sports & Entertainment Partnership
  • 3:45 PM

    Break

  • 3:55 PM

    Grad SMT Data Challenge Finalist:

    Analyzing Arm Strength’s Impact on Minor-League Position Player Promotion

    Dakota Olson
  • 4:10 PM

    Grad SMT Data Challenge Finalist:

    Vacuums and Stone Hands: An Exploration of First Baseman Receiving

    Matt Nicholson
  • 4:25 PM

    Grad SMT Data Challenge Finalist:

    This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs

    David Awosoga, Daniel Hocevar, Jaden Majumdar, Aaron White
  • 4:40 PM

    Break / Voting

  • 4:55 PM

    SCORE Network Overview

    Rebecca Nugent
  • 5:10 PM

    Closing Keynote Address: Andrew Patton

    National Football League
  • 5:55 PM

    Awards and Closing Remarks

  • 6:15 to 7:30 PM

    Networking Reception (Baker Patio and coffee lounge)


Keynote Address

Sarah Rudd

CTO and co-founder, src | ftbl

Biography:

Sarah is a pioneer of modern football analytics. After publishing one of the first papers on a pass value model, she joined StatDNA to head their analytics and software development. StatDNA was acquired by Arsenal in 2012, and Sarah spent the next decade leading Arsenal’s analytics department, one of the most advanced in world football. She worked for Microsoft as a software developer before entering football. Sarah studied environmental science and computer science at Columbia University and earned an MBA from the University of Washington. She is currently CTO and co-founder of src | ftbl.

Closing Keynote

Andrew Patton

Director of Player Health and Safety Data Science, NFL

Biography:

Andrew Patton is the Director of Player Health and Safety Data Science for the National Football League. He works with stakeholders and vendors across the league to conduct data-driven research in a variety of domains to develop an ecosystem that improves decision-making to reduce injuries. Prior to the NFL, Andrew spent time working at the Centers for Disease Control and Prevention (CDC) and Stats Perform in addition to time in healthcare and scientific consulting. In addition, he has significant experience working in human performance/strength and conditioning in collegiate athletics. Andrew’s public portfolio includes front-end work for DARKO and several other NBA-focused web apps as well as articles in The Athletic and Nylon Calculus.


Andrew received his B.S. in Molecular Toxicology from UC Berkeley, his M.S. in Environmental Management from the University of San Francisco, and his PhD in Exposure Science and Environmental Epidemiology from Johns Hopkins Bloomberg School of Public Health. He remains active in academia as an Adjunct Professor at the University of San Francisco and as an advisor to the Yuma Center for Excellence in Desert Agriculture at the University of Arizona.

Invited Talks

Albert Cohen

Michigan State University

Albert Cohen is the interim Director of the Actuarial Science Program at Michigan State University, in addition to directing MSU’s Graduate Certificate in Sports Analytics. Albert received his Ph.D. from Carnegie Mellon University in 2007 under the supervision of David Kinderlehrer, and his research focuses on the interaction between probability and science, sports, and financial economics.


This focus began with his work on a stochastic approach to the coarsening of cellular networks, and has since branched out to develop new risk measures using stochastic optimal control. This approach has also led to models of online auction behavior, the structural modeling of default bond pricing under the incorporation of recovery processes, and (recently) the use of tools commonly found in risk management to analyze challenges in player contract valuation.

Mathematical Models of Player Contract Valuation

How does a team properly compensate a player for their on-field efforts? In this brief talk, we will outline foundational tools first proposed by Scully (with the onset of MLB free agency in the 1970s) and expanded upon by Rockerbie and others. We will carry out two pricing examples, and no previous quantitative finance or actuarial tools are needed. In fact, a little bit of multivariable calculus and a desire to apply it to sports is all that is needed to follow this talk. We hope that audience members will want to follow up the presentation with further work in contract pricing within sports analytics.

Monnie McGee

Southern Methodist University

Monnie McGee is an Associate Professor in biostatistics and bioinformatics in the Department of Statistics and Data Science at SMU in Dallas, Texas. She earned a B.A. in Mathematics and English from Austin College in Sherman, Texas, and M.A. and Ph.D. degrees in Statistics from Rice University. She specializes in analysis of high-throughput biological data and is an expert in background correction, normalization, and hypothesis testing for these techniques. She is also interested in sports analytics, particularly analysis of “individual-team” sports like track & field, swimming & diving, and gymnastics. For all the applications previously mentioned, she has applied and developed methods for compositional data analysis, particularly those methods using the Nested Dirichlet Distribution and related distributions. Dr. McGee has taught and developed many courses in statistics at the graduate and undergraduate levels, including proposing several full programs at all academic levels.

Sources of bias in diving competitions: prevalence and adjustments

Sports such as diving, gymnastics, and ice skating rely on expert judges to score athletic performance during competition. To assure that scores are fair and comparable, judges are trained to follow a set of standards for various components of a routine and align their scores to these standards. However, eliminating subjectivity is impossible even for the most conscientious and experienced judges. Nationalistic bias, where a judge from a particular country favors participants from his or her country (while disfavoring others), is the most infamous type of bias, but it is not the only kind of bias. Other more subtle forms of bias include difficulty bias, where athletes with more difficult routines receive better scores, and round bias, where scores increase or decrease during the course of a competition. In order to determine the prevalence of various types of bias in diving competitions, we have assembled results from divemeets.com, a database of most diving competitions, from youth to college, in the US. In this talk, we present the results of our analysis and provide adjustments for such types of bias when necessary.

Devin Pleuler

Maple Leaf Sports & Entertainment Partnership

Devin is the Senior Director of Research and Development for Team Operations at MLSE, with leadership roles in both the Sport Performance Lab and SportsX programs. These groups are tasked with executing strategic research initiatives to develop competitive advantage for MLSE teams on the court, field, and ice. Prior to this cross-sport role, Devin led the analytics department at Toronto FC for 8 seasons, which included winning an MLS Championship in 2017. With Toronto FC, Devin led the integration of analytics and data science into opposition analysis, player recruitment, and performance forecasting. Before joining Toronto FC, Devin worked for Opta Sports, Major League Soccer, and as a soccer coach. Devin has a technical background in both Computer Science and Goalkeeping and is a regular contributor to the sports analytics academic community.

From Fixed to Fluid: A Model for Frame-by-Frame Player Role Classification

Traditional perspectives on soccer formations evoke images of rigid, symmetrical lines of players categorized into different nominal positions, much like a chessboard. However, the contemporary demands of the sport necessitate that players dynamically adapt, often assuming multiple roles within the span of a single game. This phenomenon is accentuated by an emerging trend of teams deploying asymmetrical formations, with player rotations built directly into the game plan. A case in point: a left defender may occasionally be required to position themselves as a left winger when the left winger pinches inward to behave as an attacking central midfielder. In sophisticated systems, these sorts of adjustments often trigger a cascading re-categorization across the field. This rapid positional interchanging, taking place within a dynamically evolving setting, poses challenges for cataloging and analysis. Gaining insights into such rotations is critical, especially for opposition analysis and in-game tactical adaptations. Currently, this is primarily achieved through traditional video analysis. Harnessing the potential of player tracking data, we have engineered a model that achieves frame-by-frame role classification for individual players. This approach not only discerns the frequency at which a player adheres to their assigned position but also deciphers the plethora of roles they transition into amidst the game's inherent chaos.

Big Data Bowl Workshop Speaker

Quang Nguyen

Carnegie Mellon University

Biography:

Quang Nguyen is a second-year PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. His current research is on network analysis, and he is also interested in applications of statistics and machine learning in sports with a focus on player tracking data. Quang previously received his MS in Applied Statistics from Loyola University Chicago and BS in Mathematics and Data Science from Wittenberg University in Springfield, Ohio. He is an avid supporter of Manchester United.

CMSACamp 2023 Student Speakers


Sara Colando

Pomona College

Sara is a senior at Pomona College majoring in Mathematics and Philosophy, specifically focusing on Statistics and Ethics. She is deeply interested in how we can combine statistics and ethics to help inspire good decision-making practices. After graduation, she plans to pursue a Ph.D. in Statistics and supplemental graduate coursework in Ethics. Outside of academics, she also enjoys running and is a member of Pomona-Pitzer's Varsity Cross Country and Track and Field teams.

Clustering Racehorse Movement Profiles to Discover Trends in Injured Horses

Between 2009 and 2021, over 7,200 horses died or were euthanized due to racing-related injuries (Fobar, 2023). Using horse profile data and horse tracking data from the NYRA, we identified horses who under-raced between 2019 and 2021 and clustered movement profiles for horses who raced in 2019 New York races to discover if certain profiles were more associated with injured horses. By fitting a negative binomial model on horse profile data and performing residual analysis, we discovered that at least 251 horses under-raced between 2019 and 2021. Additionally, clustering horse movement profiles revealed that a horse’s speed profile is most associated with its injury status; specifically, greater variation in speed is more associated with injury.

Jonathan Pipping

University of Florida

Jonathan Pipping is a senior at the University of Florida studying Statistics and Economics. In addition to his coursework, Jonathan serves as the VP of Projects for the university's Statistics Club and is actively working towards two undergraduate theses in his fields of interest. In his free time, he enjoys playing music, working on data challenges, and watching football with friends. After graduation, he plans to complete a Ph.D. in Statistics and Data Science before pursuing a career in sports, biomedical, or economic research.

Clustering Racehorse Movement Profiles to Discover Trends in Injured Horses

Kristopher Wilson

North Carolina State University

Kristopher Wilson is a senior at North Carolina State University (NC State) where he is majoring in Statistics with a minor in Industrial Engineering. His current research interests include social and behavioral science, sports analytics, and high-dimensional data, among others. At NC State, he is an officer of the Sports Analytics Club, and an active participant in the Grappling Club. After graduation, Kris plans to pursue a Ph.D. in Statistics. Outside of the classroom, he is an avid Atlanta sports fan and music enthusiast.

Clustering Racehorse Movement Profiles to Discover Trends in Injured Horses

Fungai Jani

College of Wooster

I, Fungai Jani, am a senior at the College of Wooster, double majoring in data science and business economics. I enjoy working on data science, specifically sports visualizations and machine learning projects. I am deeply passionate about sports and all the insights that can be uncovered from data to help a franchise reach the next level. After graduation, I plan to pursue a career in data science. I am an avid basketball and soccer fan. I enjoy watching and playing any sport in my free time.

To Bet or Not To Bet?

In recent years, there has been an increase in bets placed on sporting events, especially in soccer. Soccer is the biggest sport in the world, attracting global attention, which puts pressure on betting companies. To build upon the already existing foundation of betting odds, we enriched our predictive models by adding two key components: player evaluation metrics and the teams' recent form based on their performances over the last five matches. Our primary predictive models initially rely on betting odds to forecast match results, constrained by the nature of the dependent variable, an ordered categorical variable representing match outcomes. This leads us to employ Generalized Additive Models (GAMs), Multinomial regressions, and Random Forest algorithms. Subsequently, we augment these base models with four independent variables: streak difference, expected goals (XGs) for the home team, XGs difference, and XGs difference based on defense and offense ratings of home/away teams. The incorporation of team ratings derived from player evaluation metrics forms a central objective of our project. To evaluate the accuracy of our predictive models, we employ the Brier score, akin to mean squared error for predicted probabilities. Our findings reveal that the GAMs model, particularly when incorporating XGs based on defense and offense ratings, consistently yields the lowest Brier scores, indicating superior performance. Conversely, models of the Random Forest type exhibit notably higher Brier scores. The results showcase significant improvements over standard betting probabilities, illuminating the potential of player and team evaluation metrics. Nevertheless, inherent limitations persist, stemming from the unpredictability of soccer matches, including factors such as last-minute injuries, weather, and unforeseen tactical adjustments. Moreover, the data's specificity to a single season underscores the need for caution when applying these findings to future seasons with potentially distinct trends and dynamics.
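
For readers unfamiliar with the Brier score mentioned in this abstract, the sketch below shows one common multi-class form: the squared difference between each predicted probability and the 0/1 outcome indicator, summed over classes and averaged over matches. It is an illustration only, not the presenters' code. The function name `brier_score`, the three-outcome ordering (home win, draw, away win), and the sample probabilities are all assumptions made for this example.

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Multi-class Brier score (hypothetical helper, not from the talk).

    probs[i][k] is the predicted probability of class k for match i.
    outcomes[i] is the index of the class that actually occurred.
    Lower is better; 0.0 means every forecast was perfectly confident and correct.
    """
    if len(probs) == 0 or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")

    total = 0.0
    for p, y in zip(probs, outcomes):
        # Squared error against the one-hot encoding of the observed outcome.
        total += sum((p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p))
    return total / len(probs)


if __name__ == "__main__":
    # Assumed class order: 0 = home win, 1 = draw, 2 = away win.
    # The probabilities below are made up for illustration.
    forecasts = [
        [0.60, 0.25, 0.15],
        [0.30, 0.30, 0.40],
    ]
    results = [0, 2]
    print(brier_score(forecasts, results))
```

Under this definition a forecaster who always predicts equal probabilities for three outcomes scores 2/3, which gives a rough baseline when comparing models.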

Maria Tsakalakos

Emory University

Maria Tsakalakos is a junior at Emory University, where she is currently enrolled in the Goizueta Business School. She is concentrating her studies in Information Systems and Operational Management, with a minor in Data Science. Maria is highly engaged in sports both on and off campus. She is an active member of the Emory Women's Club soccer team and is also formally associated with the Emory Oxford varsity soccer team. Data science is a newly discovered passion for her, and after graduation Maria aspires to work as a financial or data analyst in the sports industry.

To Bet or Not To Bet?

Tseegi Nyamdorj

Smith College

Tseegi is a junior at Smith College double majoring in Mathematical Statistics and Computer Science. She is passionate about developing statistical inferences and machine learning to tackle real-life problems. Her interests in community development, sports analytics, and computer science have culminated in various types of projects and experiences. After graduation, she wishes to work in either sports analytics companies/teams or non-governmental organizations.

To Bet or Not To Bet?

Mathew Chandy

University of Connecticut

Mathew Chandy is a senior at the University of Connecticut, where he is double majoring in Statistics and Statistical Data Science, and minoring in Computer Science and Economics. His research interests include methods research, econometrics, and of course, sports analytics. He provides web support for the Connecticut Data Science Lab, which organizes the UConn Sports Analytics Symposium. Additionally, he is a co-founder of the UConn Joint Statistical Club. After graduating, he hopes to pursue a PhD in Statistics. He is very passionate about sports, and he especially enjoys watching and playing basketball.

Deal or No Deal? An NBA Recommender System for Team Composition and Salary Optimization

In the dynamic ecosystem of professional basketball, the ability to discern the true value of a player often lies at the intersection of performance on the court and the economic dynamics of the sport. By venturing beyond traditional evaluation metrics, we seek to shed light on the nuances of the game and offer insights that could redefine team strategies. Our investigation delves into whether player compensation aligns with their performance, revealing potential hidden gems. Recognizing that basketball is a team sport, we also aim to identify optimal player combinations for a hypothetical team. This challenge has been tackled by previous methods, but we seek to combine the two team goals of positive surplus value and complementary playstyles into a single Shiny App. Through this project, we hope to offer a demonstration of a salary surplus model for an individual and an expected points model for a five-man lineup. We believe this demonstration can help teams make decisions about team-building and salary allocation to maximize wins.

Leo Cheng

Carnegie Mellon University

Leo Cheng is a junior at Carnegie Mellon University, majoring in Economics and Statistics with a minor in Computer Science. He is passionate about drawing insight from data, and his academic interest is in developing statistics, data science methodology, and computational simulations in data analytics. Beyond academia, he is a big fan of basketball and soccer. This enthusiasm for sports motivates his research in team composition and salary optimization at the NBA level.

Deal or No Deal? An NBA Recommender System for Team Composition and Salary Optimization

Lauren Okamoto

UC Berkeley

Lauren Okamoto is a junior at UC Berkeley studying Data Science and Cognitive Science. She is passionate about deriving meaning from data and using these insights to shape decision making. Her research interests include computational modeling of the mind, sports analytics, and the usage of generative AI in various applications. CMSAC along with her participation in Berkeley’s Sports Analytics Club has furthered her passion for sports analytics. Currently, she is interning for the United Soccer League and hopes to continue work in the sports industry following graduation.

Deal or No Deal? An NBA Recommender System for Team Composition and Salary Optimization

Ethan Park

University of Southern California

Ethan Park is a senior at the University of Southern California studying Computer Science and Data Science. Ethan is highly passionate about the application of data science in sports and works as a Data Analyst for the USC Men's Baseball Team and the USC Sports Business Association's Analytics Division, alongside various personal projects. After graduation, he plans on pursuing a career in sports analytics for a front office.

The Art of Sequencing: Utilizing Inter-Pitch Dynamics to Enhance Pitch Evaluation in Major League Baseball

Pitch evaluation metrics such as Stuff+, Location+, and Pitching+ have added dimension to the evaluation of pitch types and are indicative of a growing focus on individual pitch characteristics. While the discrete qualities of a pitch (such as movement, velocity, and spin rate) are undoubtedly useful in evaluating and explaining elite pitchers, the effects that sequencing, tunneling, and in-game pitching strategy have on outcomes cannot be overstated. Therefore, integrating inter-pitch dynamics with discrete pitch characteristics might more accurately model a pitcher’s effectiveness. Our motivation behind this project is to evaluate current pitch metrics and consider how inter-pitch dynamics can enhance existing models.

Evan Wu

Elon University

I am from Pasadena, California and go to Elon University in North Carolina where I am majoring in Statistics. I have loved baseball my whole life and want to use numbers to help people better understand my favorite sport.

The Art of Sequencing: Utilizing Inter-Pitch Dynamics to Enhance Pitch Evaluation in Major League Baseball

Pitch evaluation metrics such as Stuff+, Location+, and Pitching+ have added dimension to the evaluation of pitch types and are indicative of a growing focus on individual pitch characteristics. While the discrete qualities of a pitch (such as movement, velocity, and spin rate) are undoubtedly useful in evaluating and explaining elite pitchers, the effects that sequencing, tunneling, and in-game pitching strategy have on outcomes cannot be overstated. Therefore, integrating inter-pitch dynamics with discrete pitch characteristics might more accurately model a pitcher’s effectiveness. Our motivation behind this project is to evaluate current pitch metrics and consider how inter-pitch dynamics can enhance existing models.

Priyanka Kaul

Harvard University

Priyanka is a junior at Harvard University studying Applied Mathematics on the Computer Science track with a minor in Mind, Brain, and Behavior. She is an infielder on the Harvard Softball team and has always been curious about the intersection of athletics and analytics as an athlete. Priyanka is involved in Harvard Undergraduate Women in Business and works as a calculus course assistant. She is a diehard Yankees fan and enjoys watching college softball, baseball and football.

The Art of Sequencing: Utilizing Inter-Pitch Dynamics to Enhance Pitch Evaluation in Major League Baseball

Pitch evaluation metrics such as Stuff+, Location+, and Pitching+ have added dimension to the evaluation of pitch types and are indicative of a growing focus on individual pitch characteristics. While the discrete qualities of a pitch (such as movement, velocity, and spin rate) are undoubtedly useful in evaluating and explaining elite pitchers, the effects that sequencing, tunneling, and in-game pitching strategy have on outcomes cannot be overstated. Therefore, integrating inter-pitch dynamics with discrete pitch characteristics might more accurately model a pitcher’s effectiveness. Our motivation behind this project is to evaluate current pitch metrics and consider how inter-pitch dynamics can enhance existing models.

Quinn Robnett

Syracuse University

Quinn Robnett is a senior at Syracuse University majoring in Sport Analytics and minoring in Information Management and Technology. At Syracuse, he is the Director of Media for the Hockey Analytics Club and previously worked as an Analytics Intern for the DI Field Hockey team. After completing his undergraduate degree, he plans to pursue a Master’s in Applied Data Science at Syracuse. He is motivated by a deep love for hockey and data, and his long-term career goal is to work in the analytics department of an NHL team.

Examining the Decline of Save Percentage in the NHL

Goaltending is of vital importance in the National Hockey League (NHL). A goaltender's job is simple: to stop the other team from scoring. A recent trend observed by many in the hockey community is that shots have been saved at declining rates over the most recent 7 seasons. It was hard to believe that collective goaltender talent had declined across the league, so we began to search for the true causes. Through analysis of the league's play-by-play data, which is available dating back to the 2011-12 season, we attributed the change to three factors. First, the best goaltenders on teams were playing fewer minutes in more recent seasons, a trend referred to as "load management". Second, shots were being taken more often from closer and more straight-on angles. Lastly, by creating expected goal logistic regression models, we found that situational descriptions of shots were influencing shooters' success rates differently across the years.

Luke Welsh

University of Wisconsin - Madison

Luke Welsh is a senior at the University of Wisconsin - Madison, double majoring in Data Science and Economics. At Wisconsin, he is the founder of the sports analytics club, where he shares his passion for the topic with his classmates. In his free time, he likes to keep busy hopping in to play any sport he can, as well as participating in his school's track, swing dancing, and log rolling clubs. After graduation, he is excited to begin his career working in data science, with aspirations to work for a sports team.

Examining the Decline of Save Percentage in the NHL

Goaltending is of vital importance in the National Hockey League (NHL). A goaltender's job is simple: to stop the other team from scoring. A recent trend observed by many in the hockey community is that shots have been saved at declining rates over the most recent 7 seasons. It was hard to believe that collective goaltender talent had declined across the league, so we began to search for the true causes. Through analysis of the league's play-by-play data, which is available dating back to the 2011-12 season, we attributed the change to three factors. First, the best goaltenders on teams were playing fewer minutes in more recent seasons, a trend referred to as "load management". Second, shots were being taken more often from closer and more straight-on angles. Lastly, by creating expected goal logistic regression models, we found that situational descriptions of shots were influencing shooters' success rates differently across the years.

Katherine Gong

Mount Holyoke College

Katherine (Shaojun) Gong is a senior at Mount Holyoke College, pursuing a double major in Statistics and Critical Social Thought, with an anticipated graduation this December. Passionate about employing her analytical prowess and dedication to data-driven research, Katherine aspires to effectuate profound social change within vibrant, high-impact professional settings. Beyond her academic pursuits, she delights in watching figure skating and shines as an avid dancer. Ultimately, Katherine envisions herself contributing to the R&D department of a professional sports team, although she is also contemplating advancing her studies with a master's program in data science.

Draw 2: Identifying Key Players in Drawing NHL Power Plays

Penalties in hockey are crucial for game strategy and player assessments. However, traditional penalty-related metrics do not consider offsetting penalties that have no impact on team strength, nor do they accurately depict the link between these neutralizing penalties and power plays. Using play-by-play data, we proposed Power Play/60, a metric for evaluating a player's performance in drawing power plays, thereby enhancing the team's goal-scoring opportunities. We analyzed the past ten NHL seasons and identified the top players in drawing power plays. Furthermore, our study delves into a position-based and team-wise evaluation of power play drawing capability. These findings are presented through the Shiny App we developed, which provides an interactive interface for exploring our results. In our talk, we will discuss our summer work and share our updated research, including the evaluation of the types of penalties that lead to power plays and the temporal distribution of power-play goals.

Bethany Gonzalez

University of Indianapolis

Bethany Gonzalez is a senior at the University of Indianapolis majoring in Mathematics with a minor in Data Science, anticipating graduation in December. She is a member of the Rho chapter of the Sigma Zeta national honor society at her university. Bethany is driven by her desire to explore and share the stories that data reveals to us and delights in applying this skill in the field of sports analytics. Following her graduation, she hopes to establish a career in data science and analytics, ideally one that integrates sports into her professional journey.

Draw 2: Identifying Key Players in Drawing NHL Power Plays

Penalties in hockey are crucial for game strategy and player assessments. However, traditional penalty-related metrics do not consider offsetting penalties that have no impact on team strength, nor do they accurately depict the link between these neutralizing penalties and power plays. Using play-by-play data, we proposed Power Play/60, a metric for evaluating a player's performance in drawing power plays, thereby enhancing the team's goal-scoring opportunities. We analyzed the past ten NHL seasons and identified the top players in drawing power plays. Furthermore, our study delves into a position-based and team-wise evaluation of power play drawing capability. These findings are presented through the Shiny App we developed, which provides an interactive interface for exploring our results. In our talk, we will discuss our summer work and share our updated research, including the evaluation of the types of penalties that lead to power plays and the temporal distribution of power-play goals.


SMT Data Challenge 2023

Thank you to all of the SMT Data Challenge Judges for their assistance in evaluating submissions in this year's competition! See below for information about the finalists in this year's competition:

Undergraduate Division Finalists

Jonah Lubin

Rice University

Jonah is currently a junior at Rice University. He is double majoring in Sport Analytics and Sport Law, while double minoring in Data Science and Business Management. Jonah currently serves as the Student Director of Analytics for Rice Football, where he helps lead the program's analytics initiative and covers a wide range of projects, including scouting reports, self-scouting analysis, and transfer portal analysis. Jonah is looking to work in the Analytics Department of an NFL team or for a Sports Analytics Company in the near future.

OPTIMAL APPROACH FOR INFIELDERS ON STOLEN BASE ATTEMPTS

This paper will cover the optimal approach that the shortstop and second baseman should take when a runner from first base steals second base. It will cover which infielder should cover second base, as well as what positioning the other infielder should take. The optimal approach will ultimately be determined by run expectancy saved and allowed.

Ethan Park

University of Southern California

Ethan Park is a senior at the University of Southern California studying Computer Science and Data Science. Ethan is highly passionate about the application of data science in sports and works as a Data Analyst for the USC Men's Baseball Team and the USC Sports Business Association's Analytics Division, alongside various personal projects. After graduation, he plans on pursuing a career in sports analytics for a front office.

A BOTTOM-UP APPROACH TO EVALUATING INFIELDER RANGE ON GROUND BALLS

The most important aspect of an infielder’s defensive play is their ability to make outs on ground balls. I have created a method to capture this ability (which I simply call their "range") through the meticulous deconstruction of ground ball outs into three discrete phases: (1) contact-to-glove, (2) transfer and (3) throw, along with a method to estimate whether an infielder can reach a ground ball. My approach is interpretable and effective with clear applications in the evaluation of individual defensive ability. However, the most useful and unique application is for optimally positioning infielders based on the interaction of infielders’ ranges and batter tendencies.

Isaac Blumhoefer

University of Minnesota

Isaac Blumhoefer is a junior at the University of Minnesota where he is double majoring in Statistics and Computer Science. He is passionate about the intersection of statistical modeling and machine learning with baseball to produce new insights into the sport. At Minnesota, he is an officer in the Sports Analytics Club and loves to participate in pickleball, tennis, and soccer intramurals. After graduation, he plans to pursue a career in data science.

Evaluating Player Relationships in Stolen Base Defense

Catcher defense has been a widely researched topic in the baseball analytics community over the past few years. However, stolen base relationships and the ability of a catcher to throw out potential base stealers are difficult to study without player tracking data. This project delves into the relationship between the catcher, pitcher, and middle infielder in generating an efficient defense against base stealers. A GLM was used to predict the probability of a stolen base at four different stages: pitch release, catcher retrieval, catcher release, and middle infielder retrieval. These probabilities can then be used to tell the story of how defensive players combine in their attempt to throw out baserunners. An app was also made for coaches and players to view these relationships in action. Understanding this information can lead not only to better evaluation of the players themselves, but also to a deeper understanding of the mechanics of stolen base defense and how player chemistry helps in these situations.

Kai Franke

University of Minnesota

Kai, a junior at the University of Minnesota, is studying Statistical Science and actively contributes to the Golden Gopher baseball team as a student analyst. He works on projects like player dashboards, pitching/hitting models, and analytics interpretation. Kai also serves as Vice President of the Sports Analytics Club. Recently, he and Jack Rogers presented research on catcher blocking at Saberseminar 2023, and he is a finalist in the SMT Data Challenge for the second time alongside Jack, Isaac Blumhoefer, and Jackson Balch. Kai's sports research focuses on baseball and hockey, and he's set to work as an R&D Intern with the Tampa Bay Rays next summer.

Evaluating Player Relationships in Stolen Base Defense

Catcher defense has been a widely researched topic in the baseball analytics community over the past few years. However, stolen base relationships and the ability of a catcher to throw out potential base stealers are difficult to study without player tracking data. This project delves into the relationship between the catcher, pitcher, and middle infielder in generating an efficient defense against base stealers. A GLM was used to predict the probability of a stolen base at four different stages: pitch release, catcher retrieval, catcher release, and middle infielder retrieval. These probabilities can then be used to tell the story of how defensive players combine in their attempt to throw out baserunners. An app was also made for coaches and players to view these relationships in action. Understanding this information can lead not only to better evaluation of the players themselves, but also to a deeper understanding of the mechanics of stolen base defense and how player chemistry helps in these situations.

Jack Rogers

University of Minnesota

Jack Rogers is a senior in the honors program at the University of Minnesota studying Data Science with a minor in Geography. He is also the president of the UMN sports analytics club where students collaborate on research projects in the field. His research using open-source sports data has been presented at Saberseminar, The Ohio State Sports Analytics Conference, and UConn’s Sports Analytics Symposium. Last summer, Jack interned in the R&D department of the Washington Nationals and is currently seeking post-graduate opportunities in sports.

Evaluating Player Relationships in Stolen Base Defense

Catcher defense has been a widely researched topic in the baseball analytics community over the past few years. However, stolen base relationships and the ability of a catcher to throw out potential base stealers are difficult to study without player tracking data. This project delves into the relationship between the catcher, pitcher, and middle infielder in generating an efficient defense against base stealers. A GLM was used to predict the probability of a stolen base at four different stages: pitch release, catcher retrieval, catcher release, and middle infielder retrieval. These probabilities can then be used to tell the story of how defensive players combine in their attempt to throw out baserunners. An app was also made for coaches and players to view these relationships in action. Understanding this information can lead not only to better evaluation of the players themselves, but also to a deeper understanding of the mechanics of stolen base defense and how player chemistry helps in these situations.

Jackson Balch

University of Minnesota

Jackson Balch is a junior at the University of Minnesota studying Statistical Sciences and Computer Science. He is deeply passionate about using data to drive decision-making and tell stories. After graduation, he plans to pursue either a Master's degree or a career in data science. Outside of class, Jackson works as a research assistant making public health, census, and survey data more accessible. In his free time, he is a founding member of the Sports Analytics Club, a solid intramural hockey defenseman, and an avid fan of football, hockey, and music.

Evaluating Player Relationships in Stolen Base Defense

Catcher defense has been a widely researched topic in the baseball analytics community over the past few years. However, stolen base relationships and the ability of a catcher to throw out potential base stealers are difficult to study without player tracking data. This project delves into the relationship between the catcher, pitcher, and middle infielder in generating an efficient defense against base stealers. A GLM was used to predict the probability of a stolen base at four different stages: pitch release, catcher retrieval, catcher release, and middle infielder retrieval. These probabilities can then be used to tell the story of how defensive players combine in their attempt to throw out baserunners. An app was also made for coaches and players to view these relationships in action. Understanding this information can lead not only to better evaluation of the players themselves, but also to a deeper understanding of the mechanics of stolen base defense and how player chemistry helps in these situations.

Graduate Division Finalists

Dakota Olson

Arizona State University

Dakota Olson is a graduate student at Arizona State University in the Master of Computer Science program with a concentration in Big Data Systems and will graduate in December 2023. Dakota received his B.S. in Computer Science from the University of Minnesota - Twin Cities in 2020, where he conducted research under Professor Jaideep Srivastava. That work, which modeled infectious disease spread with a flexible simulation system using agent-based modeling and a stochastic network approach, was published in the journal ACM Transactions on Management Information Systems. Dakota is looking to start his career in sports analytics and hopes to be a successful contributor to winning championships.

Analyzing Arm Strength’s Impact on Minor-League Position Player Promotion

Arm strength is considered one of the five tools that make a position player a useful contributor to their team, but how much does the skill factor into the real minor league management decisions an organization makes? This analysis examines the impact arm strength had on position players being promoted in an organization’s minor league system. Estimated throwing speeds of 1,601 throws made by position players in the same organization were calculated using baseball position data collected during the throw with a simulation and optimization approach. In this analysis, the maximum estimated throwing speed was determined and used to represent the arm strength of 117 players in the same organization by position, year, and minor-league level to determine the impact arm strength had on promotion. Results indicated that arm strength had little to no correlation with a position player being promoted or demoted regardless of position as many players with the strongest arm strengths remained at the same minor-league level or appeared to have left the organization, while several players with the weakest arm strengths at their positions were promoted. This indicates that despite being an important skill and one of the five tools, the arm strength skill is being overlooked when determining which players to promote in the organization being analyzed.

Matt Nicholson

University of Colorado Boulder

Matt Nicholson is a second-year PhD student in Information Science at the University of Colorado Boulder. His research focuses on how online communities coordinate and govern within themselves and between each other. Matt earned a BS in Computer Science and Mathematical Methods in the Social Sciences from Northwestern University, and prior to his time at CU, he worked as a business analyst. In his free time, Matt enjoys long-distance running, the television show Survivor and hosting dinner parties where the meal centers around a bad pun.

Vacuums and Stone Hands: An Exploration of First Baseman Receiving

In an era where the game is increasingly quantified, the receiving aspect of first baseman defense has been largely ignored. While particularly skilled (or lacking) play contributes to a fielder’s reputation, even advanced defensive metrics fail to explicitly consider the role of the first baseman in the assist. In this analysis, I describe what makes an out at first from the perspective of the first baseman. I find that bounced and offline throws are less likely to be outs, and that some players seem to be better at fielding these than others, lending credence to the conventional wisdom that some possess this skill. While the data are insufficient to compute full player-level first baseman receiving rankings, I demonstrate a viable framework for its evaluation. Finally, I discuss the role that first baseman receiving can take in player development and valuation, and suggest future extensions of this approach.

David Awosoga

University of Waterloo

David Awosoga is a graduate student at the University of Waterloo pursuing a Master of Mathematics in Data Science. Passionate about the intersection between athletics and analytics, he takes great interest in deriving insights from data and presenting them in a way that is digestible and actionable for coaches, athletes, and fans. His current research aims to leverage spatiotemporal player tracking data to improve athlete performance evaluation metrics. Before transitioning to sports analytics, David co-authored several research publications in complex network analysis and graph algorithms in computer science. He has previously held data science and analytics positions at Canadian Sport Institute Ontario and the Edmonton Stingers of the Canadian Elite Basketball League, and actively contributes to open-source pages to give fans easier access to advanced metrics.

This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs

Player and ball tracking technologies such as Statcast [1] have greatly improved our ability to measure the defensive proficiency of fielders. However, certain positions have historically been harder to assess than others, necessitating continued development of new statistics. We propose that shortcomings in these evaluative frameworks come as a result of failing to acknowledge differences in the composition of each position’s defensive touches. Current methods focus overwhelmingly on independent batted ball acquisition and throwing abilities, but limited efforts have been made to evaluate a player’s ball-receiving ability from throws, aside from catchers. This is notable because over 80% of a first baseman’s total defensive touches, for example, come from catching the ball. Therefore, this paper will introduce a ball capture metric intended to improve defensive analysis of first basemen and be used as a stepping stone to quantify credit assignment on collaborative plays between fielders. This will augment existing metrics that evaluate fielding performance and be an improved discriminator of defensive ability, allowing us to make more comprehensive assessments of player value.

Daniel Hocevar

University of Toronto

Daniel Hocevar is a fourth-year student in Computer Science at the University of Toronto. With a background in leading and developing early-stage technology startups, Daniel brings a unique product-development perspective to the world of sports analytics. He is the co-author of the paper which won the 2023 NFL Big Data Bowl, and another which won the undergraduate division of the 2022 Stathletes Big Data Cup. Daniel got his start in sports research when he introduced the first widely accessible analytics tool for curling coaches in 2018. Currently, he serves as an executive with the rapidly growing University of Toronto Sports Analytics Student Group, playing a pivotal role in establishing the club as a leader in sports analytics research. In addition to his academic pursuits, Daniel is also the captain of the University of Toronto varsity men’s curling team, and currently competes as an athlete on the World Curling Tour.

This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs

Player and ball tracking technologies such as Statcast [1] have greatly improved our ability to measure the defensive proficiency of fielders. However, certain positions have historically been harder to assess than others, necessitating continued development of new statistics. We propose that shortcomings in these evaluative frameworks come as a result of failing to acknowledge differences in the composition of each position’s defensive touches. Current methods focus overwhelmingly on independent batted ball acquisition and throwing abilities, but limited efforts have been made to evaluate a player’s ball-receiving ability from throws, aside from catchers. This is notable because over 80% of a first baseman’s total defensive touches, for example, come from catching the ball. Therefore, this paper will introduce a ball capture metric intended to improve defensive analysis of first basemen and be used as a stepping stone to quantify credit assignment on collaborative plays between fielders. This will augment existing metrics that evaluate fielding performance and be an improved discriminator of defensive ability, allowing us to make more comprehensive assessments of player value.

Jaden Majumdar

University of Toronto

Jaden Majumdar is a third-year Engineering Science student at the University of Toronto majoring in Energy Systems. He is an avid sports fan and an executive member of his school’s sports analytics club, UTSPAN. Jaden was previously a finalist for the LINHAC student competition and hopes to apply his engineering knowledge and problem-solving skills to improve the interpretability of sports analytics metrics, enhance athlete performance, and mitigate injury risk.

This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs

Player and ball tracking technologies such as Statcast [1] have greatly improved our ability to measure the defensive proficiency of fielders. However, certain positions have historically been harder to assess than others, necessitating continued development of new statistics. We propose that shortcomings in these evaluative frameworks come as a result of failing to acknowledge differences in the composition of each position’s defensive touches. Current methods focus overwhelmingly on independent batted ball acquisition and throwing abilities, but limited efforts have been made to evaluate a player’s ball-receiving ability from throws, aside from catchers. This is notable because over 80% of a first baseman’s total defensive touches, for example, come from catching the ball. Therefore, this paper will introduce a ball capture metric intended to improve defensive analysis of first basemen and be used as a stepping stone to quantify credit assignment on collaborative plays between fielders. This will augment existing metrics that evaluate fielding performance and be an improved discriminator of defensive ability, allowing us to make more comprehensive assessments of player value.

Aaron White

University of Toronto

Aaron is a third-year undergraduate student at the University of Toronto majoring in statistics with minors in mathematics and GIS. At UofT, he is the vice president of the Sports Analytics Student Group (UTSPAN). He was part of the championship-winning teams for the 2023 NFL Big Data Bowl and 2022 Big Data Cup. He plans to pursue a career in sports analytics, although he is also considering graduate studies in the field of data science. Also, Aaron is a superfan of Toronto’s sports teams.

This Isn’t a Stretch: Quantifying Ball Acquisition Proficiency to Evaluate Fielders on Assisted Put-Outs


Poster Abstract Submission



CALL FOR POSTER ABSTRACTS

In an effort to foster intellectual growth and discovery among the statistics and data science community, we gladly welcome research submissions from the public.

To present your work as a poster, submit your abstract using the form by October 6th. Space is limited, and poster abstracts will be accepted on a rolling basis until all slots are filled. Final acceptance notifications will be sent out by mid-October.

Here's a recap of important dates and requirements to remember:

- October 6th: Abstract submission deadline.
- Abstracts will be selected on a rolling basis, with final notifications sent by mid-October.
- Authors of accepted abstracts will present during the poster session on Saturday, Nov 11th, and will be entered into the Poster Competition (cash prize awards).

Our Sponsors

Networking Sponsor

AvenueFour

Workshop Sponsor

Sumer

Coffee Break Sponsor

Brewers

Coffee Break Sponsor

Penguins

Lunch Break Sponsor

Pirates

Coffee Break Sponsor

Dodgers

Coffee Break Sponsor

Reds

Poster Competition Sponsor and Support for Student Speaker Travel

Zelus

Supporting Sponsor

Astros

Supporting Sponsor

Panthers

Please see the sponsorship form for more information about sponsorship opportunities, and contact cmsac@stat.cmu.edu if you have any questions.


Contact Us

The Carnegie Mellon Sports Analytics Conference is proudly hosted by the Department of Statistics & Data Science.


CMSAC Program Committee:

Carnegie Mellon Sports Analytics Club Executives
  • Mihir Mathur (President)
  • Josh Winick (Operations)
  • Lay Len Ching (Marketing)
  • Miranda Herrera (Outreach)
Questions can be directed to cmsac@stat.cmu.edu.

CMSAC Activities Conduct Policy

(modeled on the ASA Activities Conduct Policy approved November 30, 2018 by the American Statistical Association Board of Directors)

The Carnegie Mellon Sports Analytics Conference (CMSAC) is committed to providing an atmosphere in which personal respect and intellectual growth are valued and the free expression and exchange of ideas are encouraged. Consistent with this commitment, it is CMSAC policy that all participants in CMSAC activities enjoy a welcoming environment free from unlawful discrimination, harassment, and retaliation. We strive to be a community that welcomes and supports people of all backgrounds and identities. This includes, but is not limited to, members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.

All CMSAC participants in the conference or any other related activity, whether official or unofficial, agree to comply with all rules and conditions of the activities. This includes, but is not limited to, attendees, statisticians, data scientists, sports analysts, students, registered guests, staff, contractors, sponsors, exhibitors, and volunteers. Your registration for or attendance at the 2023 Carnegie Mellon Sports Analytics Conference indicates your agreement to abide by this policy and its terms.


Expected Behavior

- Model and support the norms of professional respect necessary to promote the conditions for healthy exchange of scientific ideas.

- Speak and conduct yourself professionally; do not insult or disparage other participants.

- Be conscious of hierarchical structures in the sports analytics and/or broader statistics/data science community, specifically the existence of stark power differentials among students, junior analysts/statisticians, and senior analysts/statisticians—noting that fear of retaliation from those in senior-level positions can make it difficult for students or those in junior level positions to express discomfort, rebuff unwelcome advances, and report violations of the conduct policy.

- Be sensitive to body language and other non-verbal signals and respond respectfully.


Unacceptable Behavior

- Violent threats or language directed against another person

- Discriminatory jokes and language

- Inclusion of unnecessary sexually explicit, violent, or otherwise sensitive materials in presentations

- Posting (or threatening to post), without permission, other people’s personally identifying information online, including on social networking sites

- Personal insults including, but not limited to, those using racist, sexist, homophobic, or xenophobic terms

- Unwelcome solicitation of emotional or physical intimacy such as sexual advances; propositions; sexual flirtations; sexually-related touching; and graphic gestures or comments about sex or another person’s dress, body, or sexual activities

- Advocating for, encouraging, or dismissing the severity of any of the above behaviors.


Consequences of Unacceptable Behavior

At the sole discretion of the CMSAC Program Committee, unacceptable behavior may result in removal from or denial of access to meeting facilities or activities, without refund of any applicable registration fees or costs. In addition, the CMSAC reserves the right to report violations to an individual’s employer or institution or to a law-enforcement agency. Those engaging in unacceptable behavior may also be banned from future CMSAC activities or face additional penalties.


What to Do if You Witness or Are Subject to Unacceptable Behavior

If you are being harassed, notice that someone else is being harassed, or have any other concerns relating to harassment, please contact a member of the CMSAC program committee either in person or at cmsac@stat.cmu.edu. If you witness potential harm to a conference participant, be proactive in helping to mitigate or avoid that harm; if you see or hear something that concerns you, please say something.


Process for Adjudicating Reports of Misconduct

The CMSAC will contract with an independent entity to manage and adjudicate reported violations of the conduct policy.


Note: This Code of Conduct may be revised at any time by the Carnegie Mellon Sports Analytics Conference. Questions, concerns, or comments should be directed to cmsac@stat.cmu.edu.