Predictions and Decision Theory

36-465/665, Conceptual Foundations of Statistical Learning

4 February 2021 (Lecture 2)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \]

Previously

Prediction

Good and bad predictions

The elements of a decision problem

  1. Possible actions \(A\)
  2. Information \(X\), which we get to see before taking an action
  3. States \(Y\) picked by Nature
  4. A strategy \(s\) is a function from \(X\) (information) to \(A\) (action)
    • There is usually some class of strategies \(S\) available
  5. A loss function \(\Loss(y,a)\): how much it hurts to take action \(a\) when the state turns out to be \(y\)
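
To make the five elements concrete, here is a minimal sketch of a toy decision problem in Python. Everything specific in it is an assumption made up for illustration: the umbrella setting, the joint distribution `JOINT`, the loss table `LOSS`, and the helper names `expected_loss` and `all_strategies`. It lists every strategy (every function from information to actions), scores each by its expected loss under the assumed distribution, and picks the smallest.

```python
from itertools import product

# Hypothetical toy problem; all names and numbers are illustrative assumptions.
ACTIONS = ("umbrella", "no umbrella")   # possible actions A
INFO = ("cloudy", "clear")              # values the information X can take
STATES = ("rain", "dry")                # values the state Y can take

# Assumed joint distribution P(X = x, Y = y); the four entries sum to 1.
JOINT = {
    ("cloudy", "rain"): 0.30,
    ("cloudy", "dry"): 0.20,
    ("clear", "rain"): 0.02,
    ("clear", "dry"): 0.48,
}

# Assumed loss function loss(y, a): how much it hurts to take a when the state is y.
LOSS = {
    ("rain", "umbrella"): 1.0,
    ("rain", "no umbrella"): 10.0,
    ("dry", "umbrella"): 1.0,
    ("dry", "no umbrella"): 0.0,
}


def expected_loss(strategy):
    """Expected loss of a strategy (a dict from information to action) under JOINT."""
    return sum(p * LOSS[(y, strategy[x])] for (x, y), p in JOINT.items())


def all_strategies():
    """Yield every function from INFO to ACTIONS, represented as a dict."""
    for choice in product(ACTIONS, repeat=len(INFO)):
        yield dict(zip(INFO, choice))


# The class of strategies S here is all four functions from X to A.
best = min(all_strategies(), key=expected_loss)
print(best, expected_loss(best))
```

With these assumed numbers, the best of the four strategies is to carry the umbrella only when it is cloudy. The class \(S\) here is every function from \(X\) to \(A\); restricting `all_strategies` to a subset would model a smaller class.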

The risk of a strategy

Risk minimization

Minimizing the conditional risk really is optimal

Minimizing the risk in a class of strategies

The approximation-estimation trade-off

Back to prediction problems

  1. Actions = predictions
  2. Information = covariates, regressors, features (etc.)
  3. States = the target variable we’re trying to predict
  4. Strategy = prediction rule = function from information to actions
  5. Loss function = ?

Regression, for example

  1. Actions = predictions = real numbers = guesses at the regressand
  2. Information = vectors of real numbers = covariates, regressors (“independent variables”)
  3. States = “dependent variable”, “regressand”
  4. Strategy = prediction rule = regression function
  5. Loss function = ?
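
The choice of loss function matters, as a small sketch can show. The Python below compares two common candidate losses, squared error and absolute error. The slide leaves the loss open, so these are examples rather than the answer. The data `ys`, the grid `candidates`, and the helper `average_loss` are assumptions made up for illustration, and the prediction rule is simplified to a single constant guess that ignores the covariates.

```python
def squared_loss(y, a):
    """Squared-error loss for predicting a when the regressand is y."""
    return (y - a) ** 2


def absolute_loss(y, a):
    """Absolute-error loss for predicting a when the regressand is y."""
    return abs(y - a)


def average_loss(loss, ys, a):
    """Average loss of the constant prediction a over the observed values ys."""
    return sum(loss(y, a) for y in ys) / len(ys)


# Made-up sample of the regressand, with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate constant predictions: a coarse grid, purely for illustration.
candidates = [float(a) for a in range(0, 101)]

best_squared = min(candidates, key=lambda a: average_loss(squared_loss, ys, a))
best_absolute = min(candidates, key=lambda a: average_loss(absolute_loss, ys, a))

print(best_squared, best_absolute)
```

On this sample the squared-error choice lands on the sample mean (22) and the absolute-error choice on the sample median (3). The same states and actions give different best predictions once the loss changes.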

Linear regression, for example

Alternative loss functions

Some losses for classifiers

0-1 loss vs. log loss

Connecting to data

Back-up: Alternatives to minimizing risk

Back-up: Why decision theory?