Dani Chu is a second-year master's student in statistics at Simon Fraser University. He recently completed an internship with the NBA's Department of Basketball Strategy and Analytics. At SFU, he is co-president of the SFU Sports Analytics Club with Lucas Wu and Matthew Reyers. Along with Lucas, Matt, and James Thomson, he won the College Division of the 2019 NFL Big Data Bowl and the 2018 Sacramento Kings Case Competition. Dani has also interned as a statistician at Best Buy Canada and the Fraser Health Authority.
About The Conference
Now in its third year, the Carnegie Mellon Sports Analytics Conference is dedicated to highlighting the latest sports research from the statistics and data science community.
Interested in presenting your research at CMSAC? Submit an abstract using the form below! And if you are using publicly available data then consider entering our second annual Reproducible Research Competition!
Registration is sold out!
Early Bird Registration (until Oct 15th)
- High School students – FREE (with school ID)
- Undergrad/Grad students Conference: $20 (with school ID)
- Undergrad/Grad students Workshop: $10 (with school ID)
- Undergrad/Grad students Conference + Workshop: $25 (with school ID)
- Non-students Conference: $50
- Non-students Workshop: $20
- Non-students Conference + Workshop: $60
Regular Registration (Oct 16th - Nov 1st)
- High School students – FREE (with school ID)
- Undergrad/Grad students Conference: $25 (with school ID)
- Undergrad/Grad students Workshop: $10 (with school ID)
- Undergrad/Grad students Conference + Workshop: $30 (with school ID)
- Non-students Conference: $75
- Non-students Workshop: $20
- Non-students Conference + Workshop: $85
Registering indicates agreement to abide by the Code of Conduct.
Hotel information
We have a room block with The Oaklander Hotel (reserve now with this link), or make reservations by calling 877-829-2429 and asking for the CMU Sports Analysis Block.
Carnegie Mellon University
Porter Hall 100
4909 Frew St, Pittsburgh, PA 15213
From PIT Airport
1. Head northeast on Airport Blvd
2. Keep left to stay on Airport Blvd - 0.6 mi
3. Keep left to stay on Airport Blvd - 0.7 mi
4. Continue straight to stay on Airport Blvd - 0.2 mi
5. Keep left at the fork, follow signs for I-376 E/I-79 E/Pittsburgh/Pennsylvania Turnpike E, and merge onto I-376 E - 0.6 mi
6. Merge onto I-376 E - 16.4 mi
7. Keep right to stay on I-376 E - 2.1 mi
8. Take exit 72A to merge onto Forbes Ave toward Oakland - 0.3 mi
9. Merge onto Forbes Ave - 1.0 mi
10. Turn right onto Schenley Drive Extension - 449 ft
11. Turn left onto Schenley Drive - 0.2 mi
12. Turn left onto Frew St - 0.2 mi
13. Destination will be on the left
Football Analytics Workshop in Baker Hall A51, Giant Eagle Auditorium
Led by Ron Yurko, the CMSAC football analytics workshop is a three-hour event (5 to 8 PM). The first hour introduces attendees to reading, wrangling, and visualizing publicly available NFL data with the R statistical programming language, specifically using the tidyverse. The third hour covers the basics of using R to generate Elo ratings for NFL teams, a popular rating system featured on websites such as FiveThirtyEight. The middle hour features our keynote speaker, Michael Lopez, who will discuss his work as the NFL's Director of Football Data and Analytics. No prior programming experience is required; more information will be available soon.
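The Elo update covered in the third hour is compact enough to sketch here. Below is a minimal Python version (the workshop itself uses R); the constants — K = 20 and the 400-point logistic scale — are the conventional choices, not values taken from the workshop materials:

```python
def elo_update(rating_a, rating_b, score_a, k=20):
    """One Elo rating update after a game between teams A and B.

    score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a tie.
    Returns the post-game ratings for A and B."""
    # Expected score for A under the standard 400-point logistic curve.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # A gains (or loses) in proportion to how surprising the result was;
    # B's change is the mirror image, so total rating points are conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two evenly matched 1500-rated teams: the winner gains exactly k/2 points.
new_a, new_b = elo_update(1500, 1500, 1.0)
print(new_a, new_b)  # 1510.0 1490.0
```

Applied game by game over a season of results, this single function is enough to produce a running team rating table.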
Into the tidyverse with NFL data
By Ron Yurko
Keynote Speaker: Michael Lopez
Director of Football Data and Analytics, NFL
Introduction to NFL Elo ratings
By Ron Yurko
Conference sessions in Porter Hall 100
Welcome and Opening Remarks
Rebecca Nugent and Carnegie Mellon Sports Analytics Club
Keynote Address: Cade Massey
The Wharton School of the University of Pennsylvania
Scouting and Scoring: How We Know What We Know About Baseball
Christopher Phillips
The 2019 Home Run Surge: A Whole New Ballgame (Again)
Meredith Wills
Expected Hypothetical Completion Probability: An Analysis from the 2019 NFL Big Data Bowl
Sameer Deshpande
Estimating Player Value in Football Using Plus-Minus Models
Paul Sabin
CMSACamp Spotlight
Rebecca Nugent
Poster Previews
Poster Presenters
Lunch and Poster Session
Vamos! Estimating Shot Value In Tennis with a Functional Spatiotemporal Gaussian Mixture Model
Stephanie Kovalchik
From Grapes and Prunes to Apples and Apples: Using Matched Methods to Estimate Optimal Zone Entry Decision-Making in the National Hockey League
Asmae Toumi
Growth Curves for Predicting Athlete Ratings
Katy McKeough
A Bayesian hierarchical regression-based metric for NBA players
Brian MacDonald
Running out of time: A hierarchical model for estimating foul trouble in the NBA
Dani Chu
The causal effect of a timeout at stopping an opposing run in the NBA
Connor Gibbs
Reproducible Research Competition Final Four Presenters and Awards
Closing Remarks
Rebecca Nugent
Conference Keynote Speaker
Cade Massey is a Practice Professor in the Wharton School’s Operations, Information and Decisions Department. He received his PhD from the University of Chicago and taught at Duke University and Yale University before moving to Penn in 2012. Massey’s research focuses on judgment under uncertainty – how well people predict what will happen in the future – and especially processes that blend experts and algorithms. His work draws on experimental and “real world” data such as employee stock options, 401k savings, the National Football League draft, and graduate school admissions. His research has led to long-time collaborations with Google, Merck and multiple professional sports franchises. Massey is faculty co-director of Wharton People Analytics, co-host of “Wharton Moneyball” on SiriusXM Business Radio, and co-creator of the Massey-Peabody NFL Power Rankings for the Wall Street Journal and Washington Post.
Workshop Keynote Speaker
Mike Lopez is the Director of Football Data and Analytics with the NFL, and an adjunct professor and research associate at Skidmore College. He received his PhD in Biostatistics from Brown University in 2010. His research spans causal inference – with a specific focus on causal inference methods for multiple exposures or multiple exposure doses – and the application of statistics to sports.
Running out of time: A hierarchical model for estimating foul trouble in the NBA
NBA coaches are tasked with managing the playing time of each of their players. Throughout the season they develop lineups and rotations that give players the rest they need while maximizing the team's chance of winning. But how should coaches adapt when a player is in foul trouble, threatening the established game plan? Moreover, how do we quantify when a player is at risk of not playing his usual minutes? Typically, coaches use the Q+1 rule, which says that a player is at risk of fouling out of a game when he has one more foul than the current quarter. This generic foul trouble calculation treats all players equally, despite many players having known tendencies with respect to foul acquisition. Our objective is to outline personalized guidelines a coaching staff can use for in-game foul management. To do so, we fit a hierarchical Bayesian model to NBA play-by-play data to estimate the distribution of time to foul out for a given player with a given number of fouls. We build this into a framework that can later incorporate factors such as fatigue, defensive responsibilities, restrained performance, and game situation. We find that this personalized, data-driven method of understanding when a player will foul out is more useful than a catchall approach and lays the groundwork for a more nuanced discussion of player rotations.
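For reference, the generic Q+1 baseline that the abstract argues against is a one-line rule. A minimal sketch (the function name is mine, not the authors'):

```python
def q_plus_one_at_risk(fouls, quarter):
    """Generic Q+1 rule: a player is considered in foul trouble when he
    has at least one more personal foul than the current quarter number."""
    return fouls >= quarter + 1

# Three fouls in the 2nd quarter trips the rule; two fouls does not.
print(q_plus_one_at_risk(3, 2))  # True
print(q_plus_one_at_risk(2, 2))  # False
```

The paper's point is that this threshold ignores player-specific foul tendencies, which the hierarchical model is designed to capture.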
Sameer is a post-doctoral researcher at MIT's Computer Science and Artificial Intelligence Laboratory. He received his Ph.D. in Statistics from the Wharton School at the University of Pennsylvania in 2018. His methodological research primarily focuses on Bayesian model selection, clustering, and model averaging. In addition to studying the effects of playing football, he has previously worked on estimating how NBA players help their teams win games and on quantifying the uncertainty about the value of pitch framing in baseball. Outside of statistics, he is an avid sports fan, with particular affection for Dallas-based teams.
Expected Hypothetical Completion Probability: An Analysis from the 2019 NFL Big Data Bowl
Using high-resolution player tracking data made available by the National Football League (NFL) for their 2019 Big Data Bowl competition, we introduce Expected Hypothetical Completion Probability (EHCP), an objective framework for evaluating plays. At the heart of EHCP is the question: on a given passing play, did the quarterback throw the pass to the receiver who was most likely to catch it? To answer this question, we first built a Bayesian non-parametric catch probability model that automatically accounts for complex interactions between inputs like the receiver's speed and distances to the ball and nearest defender. While building such a model is, in principle, straightforward, using it to reason about a hypothetical pass is challenging because many of the model inputs corresponding to a hypothetical are necessarily unobserved. To wit, it is impossible to observe how close an un-targeted receiver would be to his nearest defender had the pass been thrown to him instead of the receiver who was actually targeted. To overcome this fundamental difficulty, we propose imputing the unobservable inputs and averaging our model predictions across these imputations to derive EHCP. In this way, EHCP can track how the completion probability evolves for each receiver over the course of a play in a way that accounts for the uncertainty about the missing inputs.
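The imputation-averaging step can be pictured as a simple Monte Carlo average. The sketch below is schematic, not the authors' code; the toy catch-probability model and the sampler for the unobserved separation are hypothetical stand-ins:

```python
import random

def ehcp_estimate(catch_prob, observed, sample_unobserved, n_draws=500):
    """Average a catch-probability model over imputed draws of the inputs
    that cannot be observed for an un-targeted receiver."""
    total = 0.0
    for _ in range(n_draws):
        # Complete the input vector with one imputed draw, then score it.
        inputs = {**observed, **sample_unobserved()}
        total += catch_prob(inputs)
    return total / n_draws

# Toy stand-ins for illustration only: catch probability rises with the
# receiver's separation, and separation is imputed uniformly on [0, 6] yards.
random.seed(42)
toy_model = lambda x: max(0.0, min(1.0, 0.2 + 0.1 * x["sep_yards"]))
toy_sampler = lambda: {"sep_yards": random.uniform(0.0, 6.0)}
p = ehcp_estimate(toy_model, {"receiver_speed": 6.1}, toy_sampler)
# p lands near 0.5, the model's value at the mean imputed separation.
```

Repeating this average at each frame of a play yields the per-receiver completion-probability curves the abstract describes.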
Connor Gibbs is a second year PhD student in the department of Statistics at Colorado State University. He graduated summa cum laude from the University of Georgia in May of 2018 with a B.S. in Mathematics and a B.S. in Statistics. Apart from his work in causal inference, Connor enjoys hiking and traveling.
The causal effect of a timeout at stopping an opposing run in the NBA
In the summer of 2017, the NBA reduced the total number of timeouts, among other rule changes, to regulate the flow of the game. With these rule changes, it becomes increasingly important for coaches to effectively manage their timeouts. Understanding the utility of a timeout under various game scenarios, e.g. during an opposing team’s run, is of the utmost importance. There are two schools of thought when the opposition is on a run: (1) call a timeout and allow your team to rest and regroup, or (2) save a timeout and hope your team can make corrections on the fly. This talk investigates the credence of these tenets using the Rubin causal model framework to quantify the causal effect of a timeout in the presence of an opposing team’s run and provides coaches with the needed analytic justification for better in-game decision making.
Tennis Australia and Institute for Health and Sport at Victoria University in Melbourne
Dr. Stephanie Kovalchik is a senior data scientist in the Game Insight Group at Tennis Australia and is currently a Research Fellow at the Institute for Health and Sport at Victoria University in Melbourne, Australia. Dr. Kovalchik earned her PhD and MS in Biostatistics from UCLA and her BS in Biology from Caltech. Her current research focuses on the use of statistical methods to understand performance, game strategy, and mentality in high-performance tennis.
Vamos! Estimating Shot Value In Tennis with a Functional Spatiotemporal Gaussian Mixture Model
Every point in tennis is a battle for control over the space and time of a tennis ball. Despite this, performance statistics in tennis largely ignore the spatio-temporal features of points. Taking inspiration from several high-resolution models that have been developed to estimate the value of actions in professional team sports, the present work develops a framework for modeling the space-time evolution of shots that allows us to estimate expected shot value (ESV) continuously throughout a tennis point. Key to our approach is the encoding of shot and player trajectories into a lower-dimensional representation that retains all available space-time information. An infinite Bayesian Gaussian mixture model provides a generative distribution for the 3D ball and 2D player trajectories that is in good agreement with real tracking data. In this talk, I will describe this fundamental building block of the ESV approach and show how ESV can be used to gain deeper insights into player skill and decision-making.
Brian is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, an Associate Professor in the Department of Mathematical Sciences at West Point, an Adjunct Professor in the Department of Management Science at the University of Miami, and an Adjunct Professor in Sports Analytics in the College of Business at Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.
A Bayesian hierarchical regression-based metric for NBA players
We present a Bayesian hierarchical regression model that estimates the value of box score statistics and player coefficients simultaneously, and provides an estimate of an NBA player’s contribution to his team’s on-court performance. We discuss how our approach differs from other regression-based metrics, provide visualizations of those differences over time as a way to highlight the characteristics of each, and discuss how this approach could be used in hockey, soccer, football, or eSports.
Harvard University Statistics Department
Katy McKeough is a fifth-year Ph.D. student in the Harvard University Statistics Department. Her research involves using advanced statistical models in applied settings including sports analytics and astrostatistics. She graduated from Carnegie Mellon University in 2015 with a degree in physics with a secondary major of statistics. She is a member of both the CHASC: Astrostatistics Group and the Sports Analytics Lab at Harvard.
Growth Curves for Predicting Athlete Ratings
It is often the goal of sports analysts, coaches, and fans to predict athlete performance over time. Methods such as Elo, Glicko, and Plackett-Luce based ratings measure athlete skill from the results of competitions over time but have limited predictive strength on their own. Growth curves are often applied in sports to predict future ability, but these curves are too simple to account for complex career trajectories. We propose a mixture of non-linear, mixed-effects growth curves to model ratings as a function of athlete age and time. The mixture of growth curves allows for flexibility in the estimated shape of career trajectories both between athletes and between sports. We use the fitted growth curves to make predictions about the future career trajectory of an athlete. We apply this method to men's slalom results, but it can be generalized to other sports.
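The mixture idea can be illustrated with simple parametric career curves. The Gaussian-shaped curve and the numbers below are illustrative stand-ins, not the paper's actual model or estimates:

```python
import math

def growth_curve(age, peak_age, peak_rating, width):
    """A single-peak career curve: rating rises to a peak and declines.
    A hypothetical stand-in for one non-linear component in the mixture."""
    return peak_rating * math.exp(-((age - peak_age) / width) ** 2)

def mixture_rating(age, components):
    """Weighted mixture of curves: components are (weight, peak_age,
    peak_rating, width) tuples, with weights summing to one."""
    return sum(w * growth_curve(age, a, r, s) for w, a, r, s in components)

# A hypothetical athlete: mostly an early-peaking type, partly late-peaking.
comps = [(0.7, 26, 2000, 8), (0.3, 31, 2000, 8)]
print(round(mixture_rating(27, comps)))  # near the blended peak
```

The mixture weights play the role of soft assignments of an athlete to trajectory types, which is what lets the fitted curves flex across athletes and sports.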
Christopher J. Phillips is Associate Professor of History at Carnegie Mellon University. He received his PhD in History of Science from Harvard University and also taught at New York University before coming to CMU in 2015. Phillips’s research focuses on the history of statistics, and in particular, the supposed benefits of introducing numbers and analytics into new fields. He is the author of Scouting and Scoring: How We Know What We Know About Baseball (Princeton University Press) and also serves as an Associate Editor for the Harvard Data Science Review.
Scouting and Scoring: How We Know What We Know About Baseball
Amateur scouting and data-driven analytics are presumed to be two very different ways of measuring quality and success in baseball, one focused on qualitative and subjective evaluation of prospects’ bodies, the other on quantitative and objective evaluation of playing statistics. The two camps—scouts and scorers—are sometimes portrayed as opposed or irreconcilable. Historically, however, their practices were far more similar than different. Both relied on numbers, bureaucratic management, and human labor to make reliable judgments. By unearthing the history of official scorers and database creators, scouting reports and bureaus, we can come to understand a new history of data analytics and scouting in baseball.
Paul Sabin is a Senior Sports Analytics Specialist at ESPN. He received his PhD in Statistics from Virginia Tech after receiving an MS in Statistics and a BS in Statistics and French at BYU. Paul has built several publicly facing ESPN metrics, including the suite of CBB metrics (BPI, SOR), NBA Draft Prospect Projections, soccer fantasy projections, the Allstate Playoff Predictor (CFB), and the PlayStation Player Impact Rating (CFB). He is also a contributor to ESPN.com. Outside of ESPN, Paul researches hierarchical, multiscale, and Dirichlet Bayesian methods and TV ratings in sports. Paul is forever a hopeless BYU and DC sports fan, plus a hopeful fan of all teams featuring Kylian Mbappe, whose play converted him to soccer fandom.
Estimating Player Value in Football Using Plus-Minus Models
To date, calculating the value of football players' on-field performance has been limited to scouting methods and to quarterbacks. A popular approach to calculating player value in other sports is the Adjusted Plus-Minus (APM) model. Such models have long been used in other sports, most notably basketball (Rosenbaum (2004), Kubatko et al. (2007), Winston (2009), Sill (2010)), to estimate each player's value by accounting for those in the game at the same time. Football is the least amenable major American sport to APM models due to its few scoring events, few lineup changes, restrictive positioning, and small number of games relative to the number of teams. More recent methods have found ways to apply plus-minus models in sports such as hockey (Macdonald (2011)) and soccer (Schultze and Wellbrock (2018) and Matano et al. (2018)). These models are especially useful for producing results-oriented estimates of each player's value. In American football, it is difficult to estimate every player's value since many positions, such as offensive linemen, have no recorded statistics. While player-tracking data in the NFL is enabling new analysis, such data does not exist at other levels of football such as the NCAA. Using player participation data available for college football and the NFL, I provide a model framework that solves many of the traditional issues APM models face in football. Results for these APM models are provided for both collegiate and NFL football players. Additionally, this methodology allows the models to estimate the value of each position at each level of the sport.
As editor-in-chief of Hockey-Graphs, Asmae manages the day-to-day operations and oversees the editorial and creative process. In her role, she spearheaded a mentorship program pairing NHL data scientists and executives with underrepresented persons. She has hosted multiple workshops on data visualization and modelling at the MIT Sloan Sports Analytics Conference. She also works as a Data Analyst for the Massachusetts General Hospital Institute for Technology Assessment.
From Grapes and Prunes to Apples and Apples: Using Matched Methods to Estimate Optimal Zone Entry Decision-Making in the National Hockey League
Previous research in the National Hockey League has suggested that gaining the offensive zone with puck possession ("carry-ins") is preferable to dumping the puck in and chasing after it ("dump-ins"). However, standard comparisons of zone entry strategy are confounded by factors such as offensive and defensive talent, location on the ice, and shift time, each of which impacts player choice. Indeed, contrasting carry-ins with dump-ins isn't exactly an apples-to-apples comparison; instead, it is more like studying grapes versus prunes. Using two matching methods – propensity score matching and Bayesian additive regression trees – we leverage player-tracking data to estimate the causal benefits of zone-entry decisions. Both approaches better account for the variables that affect entry choice. We also highlight the wide-ranging potential of the causal inference framework with player tracking data in sports while emphasizing the challenges of using standard statistical methods to inform decision-making in the presence of substantial confounding.
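In its simplest 1:1 form, the propensity-score-matching step reduces to pairing each carry-in with the dump-in whose estimated propensity is closest. A greedy sketch, with hypothetical entries (real analyses would also estimate the propensities, apply calipers, and check covariate balance):

```python
def nearest_neighbor_match(treated, control):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.
    treated / control: lists of (unit_id, propensity) pairs.
    Returns matched (treated_id, control_id) pairs; each control
    unit is used at most once (matching without replacement)."""
    pairs, pool = [], dict(control)
    for uid, p in treated:
        if not pool:
            break  # ran out of controls to match against
        best = min(pool, key=lambda c: abs(pool[c] - p))
        pairs.append((uid, best))
        del pool[best]
    return pairs

# Hypothetical zone entries: carry-ins (treated) matched to dump-ins (control).
carry = [("c1", 0.62), ("c2", 0.35)]
dump = [("d1", 0.60), ("d2", 0.30), ("d3", 0.90)]
print(nearest_neighbor_match(carry, dump))  # [('c1', 'd1'), ('c2', 'd2')]
```

Comparing outcomes within the matched pairs is what turns the confounded raw contrast into something closer to an apples-to-apples comparison.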
Meredith J. Wills, Ph.D., is a Data Scientist for SportsMEDIA Technology (SMT). She has a B.A. in Astronomy & Astrophysics from Harvard University, and an M.S. and Ph.D. in Physics from Montana State University—Bozeman. Dr. Wills joined SMT in 2018, where she works primarily with FIELDf/x, a baseball ball- and player-tracking system used by Minor League affiliates of Major League ballclubs, international leagues, and NCAA. She also writes for The Athletic, and her best-known independent research involves MLB baseball construction and its effect on the game.
The 2019 Home Run Surge: A Whole New Ballgame (Again)
In 2017, Major League Baseball saw an unprecedented increase in home runs. It was determined that this Home Run Surge was caused by a physical change to the ball. By disassembling a sample of baseballs and studying their construction, I found that the introduction of thicker laces ultimately produced a more aerodynamic ball. This past season, MLB’s home run rate soared even farther, and was once again related to changes in baseball construction. Using similar methods, I disassembled a sample of 2019 baseballs and compared their properties to those of earlier populations. This time, my findings showed that multiple aspects of the ball had changed, and that these differences could account for lower drag and a higher home run rate. Evidence suggests that the changes were due to manufacturing process modifications and better quality control, and that the extent of the ball’s aerodynamic improvement—while perhaps not unwelcome—was likely unexpected.
Reproducible Research Competition Finalists
(alphabetical order by first author)
Heejong Bong, Wanshan Li, Shamindra Shrotriya (view paper)
Department of Statistics & Data Science, Carnegie Mellon University
Efficient Estimation of Distribution-Free Dynamics in the Bradley-Terry Model
We propose a time-varying generalization of the original Bradley-Terry model. Our model directly captures the temporal dependence structure of the pairwise comparison data to model time-varying global rankings of N distinct objects. The convex formulation enables efficient analysis of sparse time-varying pairwise comparison data. Furthermore, depending on the choice of penalization norm, our model effectively provides control over the degree of smoothing in the time-varying global rankings. We also prove that a relatively weak condition is necessary and sufficient to guarantee the existence and uniqueness of the solution of our model; this condition is the weakest in the literature to date. We implement various optimization algorithms to solve the model efficiently. We test the practical effectiveness of our model by separately ranking five seasons of publicly available National Football League (NFL) team data from 2011-2015 and NASCAR racing data from 2002. In particular, our ranking results on the NFL data compare favourably to the well-accepted and feature-rich NFL Elo ratings system. We thus view our time-varying Bradley-Terry model as a useful benchmarking tool for other feature-rich time-varying ranking models, since it relies only on the minimal time-varying pairwise comparison results for modeling.
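The static Bradley-Terry likelihood that the paper's time-varying model generalizes can be fit with the classic minorization-maximization (MM) updates. A minimal sketch on toy data (not the paper's method or its NFL results):

```python
def bradley_terry(wins, n_iter=200):
    """Fit static Bradley-Terry strengths via the standard MM updates.
    wins[i][j] = number of times item i beat item j.
    Returns strengths normalized to sum to the number of items."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins by item i
            # MM denominator: games against each opponent j, weighted
            # by the current combined strength p_i + p_j.
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new.append(w_i / denom if denom > 0 else p[i])
        s = sum(new)
        p = [x * n / s for x in new]  # normalize for identifiability
    return p

# Toy round-robin where every team has at least one win and one loss,
# so the maximum-likelihood strengths exist and are finite.
wins = [[0, 3, 2],   # team 0 beat team 1 three times, team 2 twice
        [1, 0, 3],
        [1, 1, 0]]
p = bradley_terry(wins)
print([round(x, 2) for x in p])  # strengths ordered team 0 > 1 > 2
```

The time-varying model in the paper replaces this single strength vector with a penalized sequence of them, one per time point, which is what the convex formulation optimizes.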
Jacob Danovitch (view paper)
Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports
In baseball, a scouting report profiles a player’s characteristics and traits, usually intended for use in player valuation. This work presents a first-of-its-kind dataset of almost 10,000 scouting reports for minor league, international, and draft prospects. Compiled from articles posted to MLB.com and Fangraphs.com, each report consists of a written description of the player, numerical grades for several skills, and unique IDs to reference their profiles on popular resources like MLB.com, FanGraphs, and Baseball-Reference. With this dataset, we employ several deep neural networks to predict if minor league players will make the MLB given their scouting report. We open-source this data to share with the community, and present a web application demonstrating language variations in the reports of successful and unsuccessful prospects.
Jacob Richey (view paper)
The Wharton School of the University of Pennsylvania
With the sabermetric revolution in MLB, a plethora of new statistics has come into the mainstream, and a growing number of fantasy owners, ballclubs, and regular fans are turning to these new statistical methods for player analysis. However, I propose that even advanced metrics such as wOBA, FIP, xwOBA, xFIP, and wRC+ are all missing a crucial element needed to accurately represent player performance thus far. The playerElo system is able to reveal in aggregate the effects of previously unconsidered aspects of the game. Using an Elo ranking system determined by run-value calculations for all major league baseball players, the model incorporates context-dependent analysis and quality of competition to produce a proper evaluation of batters and pitchers. This enables playerElo to appropriately credit pitchers, especially relievers, for their true impact on the game, particularly when called upon in disadvantageous situations. Additionally, playerElo does not allow relative team strength, which confounds common counting statistics, to influence the evaluation of a player. The model is a holistic approach to the assessment of major league players and has significant ramifications for player projections during free agency and player acquisition.
Shane Sanders¹, Joel Potter², Justin Ehrlich¹, Justin Perline¹ (view paper)
¹ Syracuse University, ² University of North Georgia
Wins Above Replacement and the MLB MVP Vote: A Natural Experiment
Wins above replacement (WAR) is an objective measure of a player's on-field value in MLB. The measure was created in 2004 and subsequently popularized on leading MLB data sites such as Baseball Prospectus, Baseball Reference, ESPN, and Fangraphs. The creation of WAR provides a natural experimental setting. Before 2004, Baseball Writers' Association of America members cast MLB MVP ballots absent a comprehensive player (win contribution) value measure. This was a daunting task given the apples-to-oranges nature of cross-positional comparisons in baseball. Although WAR was not available to MVP voters before 2004, it was retroactively calculated throughout MLB history using data from sources such as retrosheet.org (such that sabermetricians and baseball historians can and do fuel their GOAT arguments with actual player value measurement). We use these retroactive calculations to estimate the relationship between WAR-estimated player value and MVP voting before 2004. We also estimate this relationship for votes that occurred from 2004 through 2017. Across sets of both fixed-effects negative binomial and neural network regression models, we find significant and substantial evidence that the effect of WAR on MVP vote points was stronger from 2004-2017 than from 1980-2003. A unit of additional WAR was worth an additional 37 (45) vote points, in expectation, from 1980-2003 (2004-2017). Further, we present evidence that this shift in voting behavior was not a gradual response to the sabermetric era in general but rather a specific response to the creation of WAR. Namely, voting behavior from 1992-2003 was almost identical to that from 1980-1991. Following the advent of WAR, informed voters were more likely to select the most qualified players.