Data Mining: Spring 2013

Statistics 36-462/36-662

Instructor: Ryan Tibshirani (ryantibs at cmu dot edu)

Teaching assistants:
Li Liu (lliu1 at andrew dot cmu dot edu)
Cong Lu (congl at andrew dot cmu dot edu)
Jack Rae (jwr at andrew dot cmu dot edu)
Michael Vespe (mvespe at andrew dot cmu dot edu)

Lecture times: Tuesdays and Thursdays 1:30-2:50pm, Porter Hall 125C

Recitation times: Wednesdays 5-6pm, Porter Hall 125C

Office hours:
RT: Tuesdays 3-4pm, Baker 229B
LL: Wednesdays 4-5pm, FMS 320
CL: Fridays 11am-12pm, Wean 8110
JR: Wednesdays 11am-12pm, Wean 8110
MV: Mondays 5-6pm, FMS 320

Course syllabus: PDF


Lecture notes

  1. Introduction to data mining
  2. Information retrieval
  3. PageRank
  4. Clustering 1: K-means and K-medoids
  5. Clustering 2: Hierarchical clustering
  6. Clustering 3: Hierarchical clustering (continued); choosing the number of clusters
  7. Dimension reduction 1: Principal component analysis
  8. Dimension reduction 2: Principal component analysis (continued)
  9. Dimension reduction 3: Nonlinear dimension reduction
  10. Correlation analysis 1: Canonical correlation analysis
  11. Correlation analysis 2: Measures of correlation
  12. Correlation analysis 3: Measures of correlation (continued)
  13. Regression 1: Different perspectives
  14. Regression 2: More perspectives, shortcomings
  15. Regression 3: More perspectives, shortcomings (continued)
  16. Modern regression 1: Ridge regression
  17. Modern regression 2: The lasso
  18. Model selection and validation 1: Cross-validation
  19. Model selection and validation 2: Model assessment, more cross-validation
  20. Classification 1: Linear regression of indicators, linear discriminant analysis
  21. Classification 2: Linear discriminant analysis (continued); logistic regression
  22. Classification 3: Logistic regression (continued); model-free classification
  23. Tree-based methods for classification and regression
  24. Bagging
  25. Boosting

Recitations

Assignments

Schedule

Here is the estimated class schedule. It is subject to change, depending on time and class interests.

Tues Jan 15 1. Introduction to data mining
Thurs Jan 17 2. Information retrieval
Tues Jan 22 3. PageRank Hw 1 out
Thurs Jan 24 4. Clustering 1
Tues Jan 29 5. Clustering 2
Thurs Jan 31 6. Clustering 3
Tues Feb 5 7. Dimension reduction 1 Hw 1 in, Hw 2 out
Thurs Feb 7 8. Dimension reduction 2
Tues Feb 12 9. Dimension reduction 3
Thurs Feb 14 10. Correlation analysis 1
Tues Feb 19 11. Correlation analysis 2 Hw 2 in, Hw 3 out
Thurs Feb 21 12. Correlation analysis 3
Tues Feb 26 Midterm 1
Thurs Feb 28 13. Regression 1
Tues Mar 5 14. Regression 2
Thurs Mar 7 15. Regression 3 Hw 3 in, Hw 4 out
Tues Mar 12 (Spring break, no class)
Thurs Mar 14 (Spring break, no class)
Tues Mar 19 16. Regularized regression 1
Thurs Mar 21 17. Regularized regression 2
Tues Mar 26 18. Model selection and validation 1
Thurs Mar 28 19. Model selection and validation 2 Hw 4 in, Hw 5 out
Tues Apr 2 20. Classification 1
Thurs Apr 4 21. Classification 2
Tues Apr 9 22. Classification 3
Thurs Apr 11 23. Trees and boosting 1 Hw 5 in, Hw 6 out
Tues Apr 16 Midterm 2
Thurs Apr 18 (Spring carnival, no class)
Tues Apr 23 24. Trees and boosting 2
Thurs Apr 25 25. Trees and boosting 3 Hw 6 in
Tues Apr 30 Work on final projects
Thurs May 2 Work on final projects
Fri May 10, 5:30-8:30pm Final presentations Final project in

Examples for extra credit

We are trying something new. At the start of class, a student volunteer can give a very short presentation (at most 4 minutes!) showing a cool example of something we learned in class. This can be an example you found in the news or in the literature, or something you thought of yourself. Whatever it is, you will explain it to us clearly, and you will get extra credit for doing so.

Click here to sign up for a slot at the start of lecture. When choosing a slot, please keep in mind that examples related to the material we are currently covering are preferred; e.g., if we are in the middle of our clustering sequence of lectures, examples about clustering are highly encouraged.
