Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to statistically-oriented programming, targeted at statistics majors, without assuming extensive programming background.
Students will learn the core ideas of programming (data structures, functions, iteration, debugging, logical design, and abstraction) through writing code to assist in statistical analyses. Students will learn how to write maintainable code, as well as debug and test code for correctness. They will learn how to set up and run stochastic simulations, how to fit basic statistical models and assess the results, and how to work with and filter large data sets. Since code is an important form of communication among scientists, students will also learn how to comment and organize code.
The class will be taught in the R programming language.
The course website is http://www.stat.cmu.edu/~ryantibs/statcomp-F19/. The course schedule, lecture notes, labs, supplementary materials, etc., will be posted there.
This is an introduction to programming for statistics students. Prior exposure to statistical thinking, to data analysis, and to basic probability concepts is essential. Previous programming experience is not assumed. Formally, the prerequisites are “Computing at Carnegie Mellon”, 36-202 or 36-208, and 36-225.
This class will be run in a flipped format. Instead of having regular lectures Monday, Wednesday, and Friday (our scheduled class times), the week will be structured as follows.
Grades will be calculated as follows:
Here are the cutoffs for letter grades, based on total percentages:
The Professor may adjust these cutoffs, but only in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
R is a free, open-source programming language for statistical computing. All of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, then let the Professor or TAs know right away.
RStudio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Most importantly, it integrates R Markdown seamlessly. You will use RStudio for the labs and final.
Coming to labs is the best way to get help. You will be able to ask questions of the Professor and TAs for the entire time.
Office hours will be held by the Professor and TAs, and the times will be spread out over the week. The times and locations can be found on the course website.
Piazza will be used for questions and discussion on the class contents. Class announcements will also be made through Piazza. The link for the Piazza group is given on the course website.
Piazza can be a very successful medium for helpful, class-wide discussions, but without rules, discussions can also quickly get out of hand. Here are the rules for our Piazza group:
Rule #2 above is highlighted because it is important and, in our experience, it is usually the first rule to be forgotten. Read Piazza first, then post! Duplicated posts can snowball and then Piazza can quickly become ineffective.
Content deemed inappropriate—by the above rules and otherwise—will be taken down by the Professor or TAs.
Email will be used for questions on class administration (class policies, exceptional circumstances, etc.), rather than class contents. Please direct such inquiries to the Head TA. The subject line of all emails should begin with “[36-350]”. The Professor will be available for issues that cannot be resolved first with the Head TA.
Quizzes will be short (about 8-10 questions), and consist of true/false and multiple choice questions. They will be completed on Canvas, due 11:59pm on Tuesday each week, with the links given on the course website. Quizzes are supposed to be a relatively easy recap of the material covered in the week’s lecture materials. After you submit the quiz, you will immediately see your score, and the correct answers. The system allows you to retake the quiz, and then receive an average of your two quiz scores as your final quiz score. So the worst you can do is to get half credit on any given quiz (get all questions wrong the first time, and all questions right the second time).
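As a rough illustration of the retake averaging described above, here is a minimal R sketch. The function name final_quiz_score and the 0-100 score scale are assumptions made for this example; they are not part of Canvas or the course materials. The sketch also assumes that the first score stands when no retake is taken.

```r
# Hypothetical helper (an assumption for illustration, not a Canvas feature):
# the final quiz score is the average of the first attempt and the retake.
final_quiz_score <- function(first, retake = NULL) {
  if (is.null(retake)) {
    return(first)  # assumed: with no retake, the first score stands
  }
  mean(c(first, retake))
}

# Everything wrong the first time, everything right on the retake:
worst_case <- final_quiz_score(0, 100)

# A partially correct first attempt followed by a perfect retake:
typical_case <- final_quiz_score(70, 100)
```

Under these assumptions, worst_case is half credit, which matches the "worst you can do" scenario in the quiz policy.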
Labs will be completed in R Markdown format (file extension Rmd). They will involve writing a combination of code and written prose, and the R Markdown format is crucial since it allows for a combination of the two. Labs will be turned in through Canvas, due 11:59pm on Sunday each week, and they must be submitted only in HTML format, the result of calling “Knit HTML” from RStudio on your R Markdown document. Be careful that you do this, because work submitted in any other format will receive a grade of 0, without exception.
Note also: all code used to produce your results must be shown in your HTML file (e.g., do not use echo=FALSE or include=FALSE as options anywhere).
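One possible way to keep code visible throughout a lab is to state the chunk options once in a setup chunk near the top of the R Markdown file. The sketch below assumes the knitr package (which RStudio uses when knitting R Markdown) is installed. Both echo = TRUE and include = TRUE are already knitr's defaults, so this line only makes the requirement explicit; options set on an individual chunk would still override it.

```r
# Place inside the first (setup) chunk of the .Rmd file.
# Makes the knitr defaults explicit: show the code of every chunk
# and keep each chunk's code and output in the knitted HTML.
knitr::opts_chunk$set(echo = TRUE, include = TRUE)
```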
Students may choose to collaborate with friends on the labs, but must indicate with whom they collaborated. Also, be sure to carefully read the collaboration policy below.
There will be a final in-class exam. It will be mostly similar in format to the quizzes (true/false and multiple choice questions), and will be comprehensive.
In general, no late days will be accepted. Instead, your lowest lab score and lowest quiz score will be dropped at the end of the semester. In case of truly exceptional situations, such as family emergencies or illness, the Head TA can make exceptions and allow late work (labs or quizzes).
You are encouraged to discuss course material with your classmates. All work you turn in, however, must be your own. This includes both written explanations, and code. Copying from other students, books, websites, or solutions from previous versions of the class, (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences for you, as outlined in the university’s policy on cheating and plagiarism. If, after reading the policy, you are unclear on what is acceptable, please ask the Professor.
If you have a disability and are registered with the Office of Disability Resources, please use their online system to notify us of your accommodations and discuss with us your needs as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, consider contacting them at access@andrew.cmu.edu.
Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. You are not alone. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.
If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
If the situation is life threatening, call the police:
If you have questions about this, then please let us know.