CMSAC : Home

About The Conference

Now in its fourth year, the Carnegie Mellon Sports Analytics Conference is dedicated to highlighting the latest sports research from the statistics and data science community.

Interested in presenting your research at CMSAC? Submit an abstract using the form below! And if you are using publicly available data then consider entering our third annual Reproducible Research Competition!

Stay tuned for more information about the upcoming #CMSAC20. Check out our 2017, 2018, and 2019 conferences.

Registration is sold out!

Registration (until Oct 24th)

Students: $20
Non-students: $50

Registering indicates agreement to abide by the Code of Conduct.

Schedule Details

All times displayed are in EDT.

11:00 AM
Welcome and Opening Remarks
Rebecca Nugent and Carnegie Mellon Sports Analytics Club

11:05 AM
Keynote Address: Patrick Lucey
Stats Perform

11:50 AM
Break

12:00 PM
Player Chemistry: Striving for a Perfectly Balanced Soccer Team
Lotte Bransen

12:25 PM
Points, rating & ranking systems in professional tennis
Paul van Staden

12:50 PM
Virtual Poster Session

2:15 PM
Measuring Spatial Allocative Efficiency in Basketball
Nathan Sandholtz

2:40 PM
Racial Bias in Drafting and Development: The NHL’s Black Quarterback Problem
Chris Watkins

3:05 PM
Break

3:15 PM
Workshop: Introduction to machine learning with the tidyverse and nflfastR data
Tom Mock

4:45 PM
Live Podcast: Chilling With Charlie with Doug Fearing

All times displayed are in EDT.

11:00 AM
Welcome and Opening Remarks
Rebecca Nugent and Carnegie Mellon Sports Analytics Club

11:05 AM
Keynote Address: Christie Aschwanden

11:50 AM
Break

12:00 PM
Redefining the Penalty Kick: Does the Punishment Fit the Crime?
CMSACamp: Bria Cratty and Jack de la Parra

12:10 PM
Evaluating Parametric Methods for Modeling European Soccer Team Goals
CMSACamp: Thea Sukianto and Zhiwei Xiao

12:20 PM
Quantifying Passing: Using NBA Tracking Data to Create an Expected Assist Model
CMSACamp: Raj Dasani, James Hyman, Alex LaGarde, and Caleb Peña

12:30 PM
High Anticipation: Exploring Trends Between Public Perception and Player Value
CMSACamp: Alana Willis, Fiona Dunn, and Sahana Rayan

12:40 PM
A Puck Above the Rest: Exploring the Effects of New Data on 2020 NHL Draft Decisions
CMSACamp: Ashley Mullan and Lucy Ward

12:50 PM
Draft Decisions in Uncertain Times: Valuing and Simulating NHL Draft Picks
CMSACamp: Jill Reiner and Meg Ellingwood

1:00 PM
Virtual Poster Session

2:00 PM
Live Podcast: Too Many Men with Alexandra Mandrycky

2:45 PM
Break

3:00 PM
Reproducible Research Competition Finalists

3:00 PM
Comparing Free-Throw Forms Among NBA Players Through 3D Similarity Measures
Student track finalist: Paul Ibrahim

3:20 PM
Enhancing Public Data Availability and Analysis of Olympic Sports: The Case of College Swimming
Student track finalist: Matthew Flancer

3:40 PM
Grinding the Bayes: A Hierarchical Modeling Approach to Predicting the NFL Draft
Open track finalist: Benjamin Robinson

4:00 PM
Bang the Can Slowly: An Investigation into the 2017 Houston Astros
Open track finalist: Gregory Matthews

4:30 PM
Workshop: Leveling Up With The Tidyverse (And Hockey Data)
Meghan Hall

Reproducible Research Competition

Open Track Finalists

Gregory Matthews (view paper)

Loyola University Chicago

Biography

Dr. Gregory J. Matthews is an associate professor of statistics and director of the data science program at Loyola University Chicago. He received his Ph.D. in statistics from the University of Connecticut in 2011 and completed a postdoctoral fellowship in the School of Public Health at the University of Massachusetts Amherst in 2014. In 2016, he, along with Ben Baumer and Shane Jensen, won the SABR Conference Research Award for Contemporary Baseball Analysis for his work on openWAR, and in 2014 he won the March Machine Learning Mania Kaggle contest as part of a team with Mike Lopez.

Abstract

Bang the Can Slowly: An Investigation into the 2017 Houston Astros

This manuscript is a statistical investigation into the 2017 Major League Baseball scandal involving the Houston Astros, who won the World Series that same year. The Astros were alleged to have stolen their opponents' pitching signs in order to provide their batters with a potentially unfair advantage. This work finds compelling evidence that the Astros' on-field performance was significantly affected by their sign-stealing ploy and quantifies the effects. The three main findings in the manuscript are: 1) the Astros' odds of swinging at a pitch were reduced by approximately 27% (OR: 0.725, 95% CI: (0.618, 0.850)) when the sign was stolen, 2) when an Astros player swung, the odds of making contact with the ball increased roughly 80% (OR: 1.805, 95% CI: (1.342, 2.675)) on non-fastball pitches, and 3) when the Astros made contact with a ball on a pitch in which the sign was known, the ball's exit velocity (launch speed) increased on average by 2.386 (95% CI: (0.334, 4.451)) miles per hour.
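
As a reading aid for the odds ratios quoted above, the percent change in odds follows from (OR - 1) × 100. The snippet below is a minimal sketch of that arithmetic only, not code from the paper; the function name `percent_change_in_odds` is hypothetical, and the inputs are the point estimates stated in the abstract.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Convert an odds ratio into a percent change in odds.

    An odds ratio below 1 gives a negative value (a reduction in odds);
    an odds ratio above 1 gives a positive value (an increase in odds).
    """
    return (odds_ratio - 1.0) * 100.0


# Point estimates quoted in the abstract (assumed inputs for illustration).
swing_change = percent_change_in_odds(0.725)    # odds of swinging, sign stolen
contact_change = percent_change_in_odds(1.805)  # odds of contact, non-fastballs

print(f"Change in odds of swinging: {swing_change:.1f}%")
print(f"Change in odds of contact: {contact_change:.1f}%")
```

These values line up with the "approximately 27%" reduction and "roughly 80%" increase described in the abstract. Note that they describe changes in odds, not in probabilities.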

Benjamin Robinson (view paper)

Grinding the Mocks

Biography

Benjamin Robinson is a data scientist living in Washington, DC and the creator of Grinding the Mocks, a project that tracks how NFL prospects fare in mock drafts. You can follow him on Twitter @benj_robinson and find the Grinding the Mocks project at grindingthemocks.com.

Abstract

Grinding the Bayes: A Hierarchical Modeling Approach to Predicting the NFL Draft

Using the 2018 NFL Draft as a case study, this paper extends my initial work (Robinson 2020) on the efficacy of using mock draft data to forecast player-level draft outcomes. By applying a more rigorous analytical approach (Bayesian hierarchical modeling with Markov Chain Monte Carlo simulations) to the same data, I develop methods that allow NFL decision makers to gauge more accurately when a prospect is likely to be selected in the draft. The resulting log-adjusted measure, Expected Draft Position (EDP), explains about 85 percent of the variance in actual log-adjusted draft outcomes. Finally, I discuss the implications of using these metrics to inform draft strategy and compare how EDP relates to on-field production.

Student Track Finalists

Matthew Flancer (view paper)

University of Pittsburgh

Abstract

Enhancing Public Data Availability and Analysis of Olympic Sports: The Case of College Swimming

Over the last several years, popular team sports have seen growth in publicly available data and analysis, but this is not the case for Olympic sports. While national Olympic committees are reportedly using data to make decisions, “public analytics” have not followed suit. Part of the reason is the lack of readily available open datasets for these sports. This work aims to fill this gap by developing an open source application for downloading and analyzing data from college swimming. More specifically, the application obtains data from swimcloud.com and processes them into a machine-readable format. Furthermore, we provide an interactive application for visualizing and analyzing the data, with a focus on two specific applications: (a) swimmer progression across seasons, and (b) tapering during the season to achieve optimal performance in the respective conference finals. We hope that this work will lead to more public interest and analysis in swimming and Olympic sports in general.

Paul Ibrahim (view paper)

Cary Academy

Biography

Paul Ibrahim is a high school senior at Cary Academy in Cary, North Carolina. His areas of interest are game theory, information theory, and analytics across sports, though his research principally focuses on applications of NBA tracking data.

Abstract

Comparing Free-Throw Forms Among NBA Players Through 3D Similarity Measures

In this paper, we propose a method to compare the free-throw forms of various NBA players. Using publicly available SportVU tracking data from the 2015-16 NBA season [1], we identify instances of free-throw attempts and track the ball’s motion while it is in the player’s hands in order to isolate the given player’s shot form from the data. To characterize each player’s form, we apply multivariate kernel density estimation to the sample of the player’s free-throw attempts. Furthermore, using k-means clustering, we attempt to categorize the multivariate kernel density estimates across the sample of players, characterizing each cluster by the cluster mean. We then apply a variety of three-dimensional similarity measures to the clustered kernel density estimates, thereby providing a variety of metrics by which we can assess free-throw form similarity among NBA players.

CALL FOR ABSTRACTS

In an effort to foster intellectual growth and discovery among the statistics and data science community, we gladly welcome research submissions from the public.

Submit your research project using the form by Sept 28th, indicating whether or not you want your submission considered for a contributed talk and/or poster. Note that there are limited spaces available, and abstracts for talks and posters will be accepted on a rolling basis until slots are filled. Final acceptance notifications will be sent out by early October.

Here's a recap of important dates and requirements to remember:

- Sept 28th: Abstract submission deadline.

- Abstracts will be selected on a rolling basis, with final notification by early October 2020.

NOTE: Submissions through this form are not entered into the Reproducible Research Competition, so they do not require publicly available data or shared code (and are not eligible for cash prizes).

Contact Us

The Carnegie Mellon Sports Analytics Conference is proudly hosted by the Department of Statistics & Data Science and the Carnegie Mellon Sports Analytics Club.

CMSAC Program Committee:

Rebecca Nugent
Sam Ventura
Ron Yurko

Carnegie Mellon Sports Analytics Club Executives

Shravan Ramamurthy
Toby Junker

Questions can be directed to cmsac@stat.cmu.edu.

CMSAC Activities Conduct Policy

(modeled on the ASA Activities Conduct Policy approved November 30, 2018 by the American Statistical Association Board of Directors)

The Carnegie Mellon Sports Analytics Conference (CMSAC) is committed to providing an atmosphere in which personal respect and intellectual growth are valued and the free expression and exchange of ideas are encouraged. Consistent with this commitment, it is CMSAC policy that all participants in CMSAC activities enjoy a welcoming environment free from unlawful discrimination, harassment, and retaliation. We strive to be a community that welcomes and supports people of all backgrounds and identities. This includes, but is not limited to, members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.

All CMSAC participants (including, but not limited to, attendees, statisticians, data scientists, sports analysts, students, registered guests, staff, contractors, sponsors, exhibitors, and volunteers) in the conference or any other related activity, whether official or unofficial, agree to comply with all rules and conditions of the activities. Your registration for or attendance at the 2020 Carnegie Mellon Sports Analytics Conference indicates your agreement to abide by this policy and its terms.

Expected Behavior

- Model and support the norms of professional respect necessary to promote the conditions for healthy exchange of scientific ideas.

- Speak and conduct yourself professionally; do not insult or disparage other participants.

- Be conscious of hierarchical structures in the sports analytics and/or broader statistics/data science community, specifically the existence of stark power differentials among students, junior analysts/statisticians, and senior analysts/statisticians—noting that fear of retaliation from those in senior-level positions can make it difficult for students or those in junior level positions to express discomfort, rebuff unwelcome advances, and report violations of the conduct policy.

- Be sensitive to body language and other non-verbal signals and respond respectfully.

Unacceptable Behavior

- Violent threats or language directed against another person

- Discriminatory jokes and language

- Inclusion of unnecessary sexually explicit, violent, or otherwise sensitive materials in presentations

- Posting (or threatening to post), without permission, other people’s personally identifying information online, including on social networking sites

- Personal insults including, but not limited to, those using racist, sexist, homophobic, or xenophobic terms

- Unwelcome solicitation of emotional or physical intimacy such as sexual advances; propositions; sexual flirtations; sexually-related touching; and graphic gestures or comments about sex or another person’s dress, body, or sexual activities

- Advocating for, encouraging, or dismissing the severity of any of the above behaviors.

Consequences of Unacceptable Behavior

At the sole discretion of the CMSAC Program Committee, unacceptable behavior may result in removal from or denial of access to meeting facilities or activities, without refund of any applicable registration fees or costs. In addition, the CMSAC reserves the right to report violations to an individual’s employer or institution or to a law-enforcement agency. Those engaging in unacceptable behavior may also be banned from future CMSAC activities or face additional penalties.

What to Do if You Witness or Are Subject to Unacceptable Behavior

If you are being harassed, notice that someone else is being harassed, or have any other concerns relating to harassment, please contact a member of the CMSAC program committee either in person or at cmsac@stat.cmu.edu. If you witness potential harm to a conference participant, be proactive in helping to mitigate or avoid that harm; if you see or hear something that concerns you, please say something.

Process for Adjudicating Reports of Misconduct

The CMSAC will contract with an independent entity to manage and adjudicate reported violations of the conduct policy.

Note: This Code of Conduct may be revised at any time by the Carnegie Mellon Sports Analytics Conference. Questions, concerns, or comments should be directed to cmsac@stat.cmu.edu.

Conference Keynotes

Patrick Lucey

Depth vs Coverage: Maximizing the Value of Tracking and Event Data for Better Recruitment

Christie Aschwanden

Telling Stories with Data

How data can inform journalism

Reading List from Christie Aschwanden

Conference Speakers

Lotte Bransen

Player Chemistry: Striving for a Perfectly Balanced Soccer Team

Nathan Sandholtz

Measuring Spatial Allocative Efficiency in Basketball

Paul van Staden

Points, rating & ranking systems in professional tennis

Chris Watkins

Racial Bias in Drafting and Development: The NHL’s Black Quarterback Problem

Workshops

Meghan Hall

Leveling Up With The Tidyverse (And Hockey Data)

Tom Mock

Introduction to machine learning with the tidyverse and nflfastR data

Live Interview: Too Many Men with Alexandra Mandrycky

Alexandra Mandrycky

Alison Lukan

Sara Civian

Shayna Goldman

Live Interview: Chilling with Charlie with Doug Fearing

Doug Fearing

Robert Nguyen

CMSACamp 2020 Student Speakers

Bria Cratty (view slides)

Redefining the Penalty Box: Does the Punishment Fit the Crime?

Raj Dasani (view slides)

Quantifying Passing: Using NBA Tracking Data to Create an Expected Assist Model

Fiona Dunn (view slides)

High Anticipation: Exploring Trends Between Public Perception and Player Value

Meg Ellingwood (view slides)

Draft Decisions in Uncertain Times: Valuing and Simulating NHL Draft Picks

James Hyman (view slides)

Quantifying Passing: Using NBA Tracking Data to Create an Expected Assist Model

Alex Lagarde (view slides)

Quantifying Passing: Using NBA Tracking Data to Create an Expected Assist Model

Ashley Mullan (view slides)

A Puck Above the Rest: Exploring the Effects of New Data on 2020 NHL Draft Decisions

Jack de la Parra (view slides)

Redefining the Penalty Box: Does the Punishment Fit the Crime?

Caleb Peña (view slides)

Quantifying Passing: Using NBA Tracking Data to Create an Expected Assist Model

Sahana Rayan (view slides)

High Anticipation: Exploring Trends Between Public Perception and Player Value

Jill Reiner (view slides)

Draft Decisions in Uncertain Times: Valuing and Simulating NHL Draft Picks

Thea Sukianto (view slides)

Evaluating Parametric Methods for Modeling European Soccer Team Goals

Lucy Ward (view slides)

A Puck Above the Rest: Exploring the Effects of New Data on 2020 NHL Draft Decisions
