General information

Course overview

Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but also to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to statistically oriented programming, targeted at statistics majors, without assuming extensive programming background.

Students will learn the core ideas of programming (functions, objects, data structures, input and output, debugging, logical design and abstraction) through writing code to assist in numerical and graphical statistical analyses. Students will learn how to write maintainable code, as well as debug and test code for correctness. They will learn how to set up and run stochastic simulations, how to fit basic statistical models and assess the results, and how to work with and filter large data sets. Since code is an important form of communication among scientists, students will also learn how to comment and organize code.

The class will be taught in the R programming language.

Course website

The course website is http://www.stat.cmu.edu/~ryantibs/statcomp-F16/. The course schedule, lecture notes, labs, etc., will be posted there.

Pre-requisites

This is an introduction to programming for statistics students. Prior exposure to statistical thinking, to data analysis, and to basic probability concepts is essential. Previous programming experience is not assumed, but familiarity with the computing system is. Formally, the pre-requisites are “Computing at Carnegie Mellon” (or consent of instructor), plus either 36-202 or 36-208, with 36-225 as either a pre-requisite (preferable) or co-requisite (if need be).

Furthermore, a short online introductory course on the basics of R, created by the Statistics department, must be completed during the first week of the semester. It is due in place of the first week's homework, at the usual homework deadline on Sunday evening (there is no homework the first week).

Course mechanics

There will be mini-lectures on Mondays, Wednesdays, and Fridays (except holidays, of course). Each mini-lecture is 10 minutes long and covers a single topic. Each class period begins with at most 2 mini-lectures (usually just 1). The rest of the class period is then devoted to a lab session, in which students work through a set of practice exercises. These are to be completed and submitted on Blackboard by 11:59pm on the same day.

There will also be a homework each week, due at 6pm on Sunday, on Blackboard. Lastly, there will be a final project.

Grading

Grades will be calculated as follows:

  • Labs: 30%
  • Homework: 50%
  • Final project: 20%

Here are the cutoffs for letter grades, based on total percentages:

  • A: 90% or higher
  • B: 80% to 89%
  • C: 70% to 79%
  • D: 60% to 69%
  • R: 59% or lower, on a case by case basis

The Professor may adjust these cutoffs, but only in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
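
As a rough illustration of how the weights and cutoffs above combine, here is a minimal sketch in R. The function names course_total and letter_grade and the example scores are hypothetical. The sketch assumes each component is already expressed as a percentage, and it assumes that totals falling between two cutoffs (e.g., 89.5%) are not rounded up, which this syllabus does not specify. It uses the unadjusted cutoffs, which the Professor may lower.

```r
# Hypothetical helper: weighted course total from component percentages (0-100).
# Weights follow the grading breakdown: labs 30%, homework 50%, final project 20%.
course_total <- function(labs, homework, project) {
  0.30 * labs + 0.50 * homework + 0.20 * project
}

# Hypothetical helper: map a total percentage to a letter grade using the
# unadjusted cutoffs. Assumption: totals are not rounded before comparison.
letter_grade <- function(total) {
  if (total >= 90) {
    "A"
  } else if (total >= 80) {
    "B"
  } else if (total >= 70) {
    "C"
  } else if (total >= 60) {
    "D"
  } else {
    "R"
  }
}

# Made-up example scores, for illustration only.
total <- course_total(labs = 95, homework = 84, project = 90)
print(total)                # weighted total, in percent
print(letter_grade(total))  # letter grade under the unadjusted cutoffs
```

With these made-up scores, the weighted total is 0.30 × 95 + 0.50 × 84 + 0.20 × 90 = 88.5%, which falls in the “B” range.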

R and R Studio

R is a free, open-source programming language for statistical computing. Almost all of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, then let the Professor or TAs know right away.

R Studio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Most importantly, it integrates R Markdown seamlessly. Use of R Studio is required for the labs and homework, and strongly recommended in general.

Textbook

There is no required textbook. Optional readings and supplementary materials can be found on the course website.

Getting help

Office hours

There will be 4 office hours each week, spread across the week. The times and locations can be found on the course website.

Piazza

Piazza will be used for class discussions. We highly encourage you to sign up; the signup link is on the course website.

Piazza can be a very successful medium for helpful, class-wide discussions, but without rules, discussions can also quickly get out of hand. Here are the rules for our Piazza group:

  • Be considerate to others (respectful language, no sarcasm).
  • When it comes to questions about the homework, “What is wrong with this code?” is not an acceptable question. Questions must be sufficiently generalized/modified/abstracted so that it is not possible to directly construct parts or all of the solutions from them.
  • Read the existing posts before you create your own, as often somebody else will have already asked the same question that you want to ask (or a very similar one).
  • Content deemed inappropriate—by the above rules and otherwise—will be taken down by the TAs or Professor.
  • Questions should be placed in the right folder (e.g., hw1, lab1, general).
  • Private questions on Piazza (an option for questions that only TAs and Professor can see) are not explicitly disallowed, but are discouraged, because the TAs and Professor may not be able to answer private questions in a timely manner.

Private emails

Private emails to the Professor about truly private matters (e.g., a request for an extension due to a family emergency) are of course OK. However, private emails that ask questions about course materials are discouraged. In seeking help, please use the Piazza discussion group and/or office hours.

Assignments

Submission format

All assignments (labs, homework, final project report) must be turned in electronically, through Blackboard.

All assignments must be completed in R Markdown format (file extension Rmd). Since assignments involve a combination of code and written prose, the R Markdown format is crucial, as it allows the two to be combined in one document. Each lab must be submitted in Rmd format; this is just your plain R Markdown document. Each homework must be submitted in HTML format; this is the result of calling “Knit HTML” from R Studio on your R Markdown document.

Work submitted that does not obey the appropriate format will receive an automatic grade of 0, without exceptions.

Labs

There will be either a 40-minute or a 30-minute lab session every class period, depending on whether there are 1 or 2 mini-lectures. Each lab is worth 9 points: 4 points for attendance, and 5 points for completion. Students may choose to work with a friend on the lab, but should first read the collaboration policy below carefully.

Unless otherwise noted, all labs are due at 11:59pm on the day they are released, submitted to Blackboard. Each student must submit their own write-up. No late lab work will be accepted.

Homework

There will be a homework assignment nearly every week. Each homework will be worth 45 points, though only a random subset of questions will be graded. Students may choose to collaborate with a friend on the homework, but should first read the collaboration policy below carefully.

Unless otherwise noted, all homework is due at 6pm on Sunday, submitted on Blackboard. Each student must submit their own write-up. No late homework will be accepted, but the lowest homework score of the semester will be dropped.
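
The drop-the-lowest rule can be sketched in R as follows. This is only an illustration: the function name homework_percent and the scores are made up, and the sketch assumes that every homework is scored out of 45 points, that a homework not submitted on time counts as 0, and that the homework component is the average of the remaining scores. None of these details is spelled out in this syllabus.

```r
# Hypothetical helper: homework percentage after dropping the lowest score.
# Assumptions: each homework is out of 45 points; a missed homework counts
# as 0; the homework component is the mean of the remaining scores.
homework_percent <- function(scores, max_points = 45) {
  stopifnot(length(scores) >= 2)
  kept <- sort(scores)[-1]  # sort ascending, then remove the single lowest score
  100 * mean(kept) / max_points
}

# Made-up scores out of 45; the 0 stands for one missed homework.
scores <- c(40, 45, 0, 36, 41)
print(homework_percent(scores))
```

Here the 0 is dropped, and the remaining scores (36, 40, 41, 45) average 40.5 out of 45, or 90%.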

Final project

In place of an in-class final exam, there will be a programming project. More details to come.

Collaboration, copying, and plagiarism

You are encouraged to discuss course material (especially lab work, but also homework assignments) with your classmates. All work you turn in, however, must be your own. This includes both written explanations and code. Copying from other students, books, websites, or solutions from previous versions of the class (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences for you, as outlined in the university’s policy on cheating and plagiarism. If, after reading the policy, you are unclear on what is acceptable, please ask the instructor.

(Note: the final project will operate a little differently, since this is explicitly done in a group, and each group will submit a single write-up. But for this project, the above still applies to copying material from books, websites, or solutions from previous versions of the class.)

Take care of yourself

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life threatening, call the police:

If you have questions about this or your coursework, please let me know.