General information

Course overview

Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but also to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to statistically oriented programming, targeted at statistics majors, and does not assume an extensive programming background.

Students will learn the core ideas of programming (data structures, functions, iteration, input and output, debugging, logical design, and abstraction) through writing code to assist in statistical analyses. Students will learn how to write maintainable code, as well as how to debug and test code for correctness. They will learn how to set up and run stochastic simulations, how to fit basic statistical models and assess the results, and how to work with and filter large data sets. Since code is an important form of communication among scientists, students will also learn how to comment and organize code.

The class will be taught in the R programming language.

Course website

The course website is http://www.stat.cmu.edu/~ryantibs/statcomp-S18/. The course schedule, lecture notes, labs, homework, etc., will be posted there.

Pre-requisites

This is an introduction to programming for statistics students. Prior exposure to statistical thinking, to data analysis, and to basic probability concepts is essential. Previous programming experience is not assumed, but familiarity with the computing system is. Formally, the pre-requisites are “Computing at Carnegie Mellon”, 36-202 or 36-208, and 36-225.

Course mechanics

Each week, a lecture will be given for roughly the first 40 minutes of class on Tuesday. The rest of class on Tuesday and the whole class period on Thursday will be lab sessions, in which students work through a set of exercises. Each week's lab is due at 10pm on Thursday, on Canvas. There will also be a homework assignment each week, due at 10pm on Sunday, on Canvas. Lastly, there will be a take-home final exam.

Grading

Grades will be calculated as follows:

  • Labs: 30%
  • Homework: 50%
  • Final exam: 20%

Here are the cutoffs for letter grades, based on total percentages:

  • A: 90% or higher
  • B: 80% to 89%
  • C: 70% to 79%
  • D: 60% to 69%
  • R: 59% or lower, on a case-by-case basis

The Instructors may adjust these cutoffs, but only in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
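
As a rough illustration of how the weights and cutoffs above combine, here is a minimal sketch in R. The function names total_percent and letter_grade and the example scores are hypothetical, made up for illustration. The sketch uses the unadjusted cutoffs and assumes that a total falling between two listed ranges (say, 89.5%) receives the lower letter; the Instructors' actual calculation may differ.

```r
# Component weights as listed above (they sum to 1).
weights <- c(labs = 0.30, homework = 0.50, final = 0.20)

# Hypothetical helper: weighted total percentage from component percentages.
total_percent <- function(scores, w = weights) {
  stopifnot(setequal(names(scores), names(w)))
  sum(scores[names(w)] * w)
}

# Hypothetical helper: letter grade under the unadjusted cutoffs.
# Assumption: a total below a cutoff gets the next letter down.
letter_grade <- function(total) {
  cutoffs <- c(A = 90, B = 80, C = 70, D = 60)
  earned <- names(cutoffs)[total >= cutoffs]
  if (length(earned) > 0) earned[1] else "R"
}

# Made-up component percentages for one student.
scores <- c(labs = 95, homework = 88, final = 80)
total <- total_percent(scores)  # 0.30 * 95 + 0.50 * 88 + 0.20 * 80
letter_grade(total)
```

Because homework carries half of the total weight, a given change in the homework percentage moves the total more than the same change in the lab or final exam percentage.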

R and R Studio

R is a free, open-source programming language for statistical computing. Almost all of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, then let the Instructors or TAs know right away.

R Studio is a free, open-source R programming environment. It contains a built-in code editor, offers many features that make working with R easier, and works the same way across different operating systems. Most importantly, it integrates R Markdown seamlessly. You will use R Studio for the labs, homework, and final.

Getting help

Office hours

Office hours times are spread out over the week. The exact times and locations can be found on the course website.

Piazza

Piazza will be used for questions and discussion on the class contents. Class announcements will also be made through Piazza. The link for the Piazza group is: http://www.piazza.com/cmu/spring2018/36350.

Piazza can be a very successful medium for helpful, class-wide discussions, but without rules, discussions can also quickly get out of hand. Here are the rules for our Piazza group:

  1. Be considerate to others (respectful language, no sarcasm).
  2. Before posting a question, check that it (or a related question) has not already been posted. If it has, then use the existing thread for further questions or discussion.
  3. For questions about the labs or homework, “What is wrong with this code?” is not an acceptable question. Code that is part of your lab or homework solution cannot be posted to Piazza.
  4. Along with your posted question, explain step-by-step what you’ve tried to answer your own question (without posting your solution code).
  5. Avoid private questions on Piazza (an option for questions that only Instructors and TAs can see), since they might not be answered in a reliable/timely manner.

Content deemed inappropriate—by the above rules and otherwise—will be taken down by the Instructors or TAs.

Email

Email will be used for questions on class administration (class policies, exceptional circumstances, etc.), rather than class contents. Please direct such inquiries to Associate Instructor Kevin Lin. The subject line of all emails should begin with “[36-350]”. Professor Tibshirani is available but only for issues that cannot be resolved first with the Associate Instructor.

Assignments

Submissions

All assignments (labs, homework, final take-home exam) must be turned in electronically, through Canvas.

All assignments must be completed in R Markdown format (file extension Rmd). Assignments will involve a combination of code and written prose, and the R Markdown format is crucial because it allows the two to be combined in one document. All assignments must be submitted only in HTML format, the result of calling “Knit HTML” from R Studio on your R Markdown document. Be careful that you do this, because work submitted in any other format will receive a grade of 0, without exception.

Note also, all code used to produce your results must be shown in your HTML file (e.g., do not use echo=FALSE or include=FALSE as options anywhere).
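
One possible way to guard against accidentally hiding code is to state the chunk defaults explicitly near the top of your R Markdown document. The sketch below assumes the knitr package, which R Studio uses to knit R Markdown files; it is a suggestion, not a course requirement.

```r
# Place in a chunk near the top of the Rmd file.
# Assumes knitr is installed (it is what "Knit HTML" runs).
# echo = TRUE shows the code; include = TRUE keeps code and output in the HTML.
knitr::opts_chunk$set(echo = TRUE, include = TRUE)
```

These are already the knitr defaults, so the call mainly serves as a reminder. Options set on an individual chunk still override it, so check each chunk header as well. Note that the setup chunk in R Studio's default R Markdown template is itself marked include=FALSE, which you would need to remove.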

Labs

Labs will occupy about 120 minutes of class time each week (160 total minutes, minus about 40 minutes for lecture). The grading breakdown for labs: 20% for attendance on Tuesday, 20% for attendance on Thursday, and 60% for completion of the lab questions. Attendance will be checked by random sampling on each day.

The lab each week is due at 10pm on Thursday, on Canvas. Students may choose to work with friends on the lab, but should carefully read the collaboration policy below.

Homework

There will be a homework assignment nearly every week, due at 10pm on Sunday, on Canvas. Students may choose to collaborate with friends on the homework, but read carefully the collaboration policy below.

Final exam

In place of an in-class final exam, there will be a take-home exam. It will be essentially like a homework, but no collaboration with peers is allowed. More details to come.

Late work

You have a total of 5 late days in the semester, to use between the labs and the homework. For example, you may apply all 5 late days to Homework 2; or you may apply 2 late days to Lab 1, 2 late days to Homework 4, and 1 late day to Homework 6; etc. After these 5 late days are used up, no late work will be accepted.

In case of truly exceptional situations, such as family emergencies or illness, the Instructors can make exceptions and allow late work. If you think your situation is truly exceptional but is not an emergency, then you must notify the Associate Instructor of your situation at least 2 full days before the particular assignment (lab or homework) is due.

Collaboration, copying, and plagiarism

You are encouraged to discuss course material (especially lab work, but also homework assignments) with your classmates. All work you turn in, however, must be your own. This includes both written explanations and code. Copying from other students, books, websites, or solutions from previous versions of the class (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences for you, as outlined in the university’s policy on cheating and plagiarism. If, after reading the policy, you are unclear on what is acceptable, please ask the Instructors.

Take care of yourself

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life threatening, call the police:

If you have questions about this, then please let the Professor know.