36-462/36-662, Data Mining

Cosma Shalizi

Lecture 1, 14 January 2020 — Welcome to the course

Welcome!

Agenda for today

What is data mining?

What are we going to learn about?

So many things!

Nearest neighbors

Prediction and decision trees

Nonlinear features and kernels

Information measures

Dimension reduction

Clustering

Checking our guesses

Applications

Information retrieval

Recommendation engines

Fairness in prediction

Waste, fraud and abuse

Where did this come from?

Where did this really come from?

What will you need to know?

Course mechanics

Class meetings

In-class exercises

Reading

Reading: Textbook

Principles of Data Mining

Reading: Textbook

The Ethical Algorithm: The Science of Socially Aware Algorithm Design

Homework

Grading

Time expectations

Cheating, collaboration & plagiarism

Homework format

Switch to RStudio

Specifically, welcome.Rmd

Some lessons from the demo

Next time: The truth about linear regression