The Tartan Data Science Cup is a series of Kaggle-like data analysis competitions exclusively for CMU undergraduates and local high school students. Each competition will have a different theme, research scenario, goals, and solutions. The problem description, research question, and data sets will not be released until the specified dates and times. Students will submit their answers by a given deadline; selected finalists will present in front of a panel of judges.
Winning teams will receive cash prizes, (temporary ownership of) the Tartan Data Science Cup, and glory. So much glory.
All currently enrolled high school students and Carnegie Mellon University undergraduate students on the Pittsburgh campus are eligible to participate. Teams can consist of 1-3 students; students can only participate on one team. All student names and Andrew IDs/email addresses must be included when registering. Registration must also include a (non-identifying) team name.
To register, click here.
The data set and variable descriptions will be available on Friday evening but without details about the specific competition questions. Participants should try to do some exploratory data analysis prior to the competition in order to focus their efforts on Sunday.
THE DATA ARE LIVE!! Click here
There will be a demo session using the data sets during the Carnegie Mellon Sports Analytics Conference at 5:15pm in Baker Hall A51 (Giant Eagle Auditorium). Topics will include loading the data sets, variable discussion, different visualizations of the features, and brainstorming interesting metrics. Students in the Tartan Data Science Cup are welcome to attend this demo session even if they are not attending the entire Carnegie Mellon Sports Analytics Conference.
The research problem and competition question(s) will be released on this website at 9am. Students are welcome to work anywhere, but the Giant Eagle Auditorium will be open all day as the TDSC Homebase. TDSC organizers will also be available during the day to answer questions.
Lunch will be provided for participants in the TDSC Homebase.
Submissions are due. Each team should submit a 2-3 page report describing the key results and methods used to analyze the data (made as or converted to a .pdf file).
DUE AT 6PM: up to 3 slides for a 5-minute research presentation (made as or converted to a .pdf file)
DUE AT 6PM: all (well-documented!) code used to analyze the data, obtain results, create graphics, etc. (any programming language/software is acceptable)
Submission links will be open closer to the deadline.
Submission constitutes permission to post winning team entries online (under non-identifying team name).
Entries will be evaluated by a panel of judges from Statistics & Data Science and sports analytics. The judges will review the reports from 5-7pm and then watch the slide presentations at 7pm. Students are encouraged to practice their presentations over the 5-7pm dinner break.
The top 6-8 teams will be given five minutes to present their methods and results to the judges, the other teams, and anyone else who wishes to attend. Teams can have up to three slides, but be careful -- you will be cut off after exactly five minutes! Teams outside of the top 8 are still eligible to win other prizes and are encouraged to stay and watch the final presentations.
The judging criteria include:
1st Place Team: $500
2nd Place Team: $300
3rd Place Team: $200
Additionally, the 1st place team will receive the Tartan Data Science Cup. After each competition, the Cup is presented to the winning team, who are allowed to keep the Cup and gloat for a short period of time. Members of the winning team will have their names engraved onto the Cup.
Rebecca Nugent (rnugent@stat.cmu.edu), Christopher Peter Makris.