Optimism and Over-Fitting

36-465/665, Spring 2021

11 February 2021

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{r}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \]

Housekeeping

In our last episode

Reminders about “the usual asymptotics”

\[\begin{eqnarray} \hat{\theta} & \rightarrow & \theta^* \\ \hat{\theta} & \approx & \theta^* - \mathbf{k}^{-1} \nabla \EmpRisk(\theta^*)\\ \mathbf{k} & \equiv & \nabla \nabla \Risk(\theta^*)\\ \Var{\hat{\theta}} & \approx & n^{-1} \mathbf{k}^{-1} \mathbf{j} \mathbf{k}^{-1}\\ \mathbf{j} & \equiv & \Var{\nabla \Loss(Y, s(X;\theta^*))}\\ \hat{\theta} & \rightsquigarrow & \mathcal{N}(\theta^*, n^{-1} \mathbf{k}^{-1} \mathbf{j} \mathbf{k}^{-1}) \end{eqnarray}\]
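To make the sandwich formula concrete, here is a minimal sketch that plugs sample estimates of \(\mathbf{k}\) and \(\mathbf{j}\) into \(n^{-1} \mathbf{k}^{-1} \mathbf{j} \mathbf{k}^{-1}\). It assumes a linear model \(s(x;\theta) = x \cdot \theta\) with squared-error loss, which is an illustrative choice rather than anything specific to these slides. The function name `sandwich_variance`, the simulated data, and the seed are all hypothetical.

```python
import numpy as np

def sandwich_variance(X, y):
    """Plug-in estimate of n^{-1} k^{-1} j k^{-1} for squared-error linear regression.

    Assumes loss(y, x.theta) = (y - x.theta)^2, so the per-observation gradient
    is -2 (y - x.theta) x and the per-observation Hessian is 2 x x^T.
    """
    n, p = X.shape
    # Empirical risk minimizer for squared error = ordinary least squares
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_hat
    # Per-observation loss gradients, evaluated at theta_hat (n x p)
    grads = -2.0 * resid[:, None] * X
    # k: Hessian of the risk, estimated by the Hessian of the empirical risk
    k_hat = 2.0 * (X.T @ X) / n
    # j: variance of the loss gradient, estimated by the sample covariance
    j_hat = np.cov(grads, rowvar=False)
    k_inv = np.linalg.inv(k_hat)
    return theta_hat, k_inv @ j_hat @ k_inv / n

# Simulated data (assumed for illustration): homoskedastic Gaussian noise
rng = np.random.default_rng(465)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
theta_star = np.array([1.0, -2.0, 0.5])
noise_sd = 1.5
y = X @ theta_star + rng.normal(scale=noise_sd, size=n)

theta_hat, V = sandwich_variance(X, y)
# With constant noise variance the sandwich should roughly agree with
# the classical OLS formula sigma^2 (X^T X)^{-1}
classical = noise_sd**2 * np.linalg.inv(X.T @ X)
print(np.sqrt(np.diag(V)))
print(np.sqrt(np.diag(classical)))
```

In this well-specified, constant-variance setting the two sets of standard errors should be close; the sandwich form is the one that remains valid when the noise variance changes with \(X\) or the model is mis-specified.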

How do we find those magic matrices?

But what about the predictions?

But what about the predictions? (2)

What about the risk?

Being more precise about the risk

Empirical risk minimization is optimistic

A toy example (1)

A toy example (2)

A toy example (3)

A toy example (4)

A toy example (5)

A toy example (6)

Estimating the true risk from the empirical risk

Estimating the true risk from the empirical risk (2)

Estimating the true risk from the empirical risk (3)

\[ \Risk(\hat{\theta}) \approx \EmpRisk(\hat{\theta}) + n^{-1}\tr{\mathbf{j}\mathbf{k}^{-1}} \]
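The optimism term can be checked by simulation. The sketch below again assumes squared-error linear regression, here with standard Gaussian features and Gaussian noise, so that the true risk of any \(\hat{\theta}\) is available in closed form and \(n^{-1}\tr{\mathbf{j}\mathbf{k}^{-1}}\) reduces to \(2\sigma^2 p/n\). The function `optimism_experiment`, its default arguments, and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def optimism_experiment(n=200, p=5, sigma=1.0, reps=2000, seed=36465):
    """Compare the average gap r(theta_hat) - r_hat(theta_hat) with n^{-1} tr(j k^{-1})."""
    rng = np.random.default_rng(seed)
    theta_star = np.ones(p)
    gaps, plugins = [], []
    for _ in range(reps):
        X = rng.normal(size=(n, p))          # features with E[X X^T] = identity
        y = X @ theta_star + rng.normal(scale=sigma, size=n)
        theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ theta_hat
        emp_risk = np.mean(resid**2)         # in-sample mean squared error
        # True risk on a fresh (X, Y): sigma^2 + (theta_hat - theta*)^T I (theta_hat - theta*)
        true_risk = sigma**2 + np.sum((theta_hat - theta_star) ** 2)
        gaps.append(true_risk - emp_risk)
        # Plug-in estimate of the optimism from this sample alone
        grads = -2.0 * resid[:, None] * X
        j_hat = np.cov(grads, rowvar=False)
        k_hat = 2.0 * (X.T @ X) / n
        plugins.append(np.trace(j_hat @ np.linalg.inv(k_hat)) / n)
    # For this model k = 2 I and j = 4 sigma^2 I, so tr(j k^{-1}) = 2 sigma^2 p
    theory = 2.0 * sigma**2 * p / n
    return float(np.mean(gaps)), float(np.mean(plugins)), theory

mean_gap, mean_plugin, theory = optimism_experiment()
print("average true-minus-empirical risk:", mean_gap)
print("average plug-in optimism estimate:", mean_plugin)
print("asymptotic formula 2*sigma^2*p/n: ", theory)
```

The formula is an asymptotic approximation, so the three numbers should agree only roughly at finite \(n\), and the agreement should tighten as \(n\) grows relative to \(p\).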

The moral of all this math: approximation-estimation trade-off

Over-fitting

Avoiding over-fitting

Why the optimism is not the end of the story

\[ \Risk(\hat{\theta}) \approx \EmpRisk(\hat{\theta}) + n^{-1}\tr{\mathbf{j}\mathbf{k}^{-1}} \]

The goal: generalization error bounds

Summing up

Backup: Expectations of quadratic forms, and the optimism formula
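
A sketch of how the optimism formula can be recovered from the expectation of a quadratic form, using only the asymptotic approximations stated earlier. The random vector \(Z\) and the fixed matrix \(\mathbf{a}\) are notation introduced here for illustration, and every Taylor expansion is truncated at second order. For any random vector \(Z\) with finite variance and any fixed square matrix \(\mathbf{a}\),

\[ \Expect{Z^T \mathbf{a} Z} = \tr\left(\mathbf{a} \Var{Z}\right) + \Expect{Z}^T \mathbf{a} \Expect{Z} \]

Expand the risk around \(\theta^*\), where \(\nabla \Risk(\theta^*) = 0\), and the empirical risk around \(\hat{\theta}\), where \(\nabla \EmpRisk(\hat{\theta}) = 0\), approximating the Hessian of \(\EmpRisk\) by \(\mathbf{k}\):

\[\begin{eqnarray} \Risk(\hat{\theta}) & \approx & \Risk(\theta^*) + \frac{1}{2}(\hat{\theta}-\theta^*)^T \mathbf{k} (\hat{\theta}-\theta^*)\\ \EmpRisk(\theta^*) & \approx & \EmpRisk(\hat{\theta}) + \frac{1}{2}(\hat{\theta}-\theta^*)^T \mathbf{k} (\hat{\theta}-\theta^*) \end{eqnarray}\]

Since \(\Expect{\EmpRisk(\theta^*)} = \Risk(\theta^*)\), adding the two lines and taking expectations, then applying the quadratic-form identity with \(Z = \hat{\theta}-\theta^*\), \(\Expect{Z} \approx 0\) and \(\Var{Z} \approx n^{-1} \mathbf{k}^{-1} \mathbf{j} \mathbf{k}^{-1}\), gives

\[\begin{eqnarray} \Expect{\Risk(\hat{\theta}) - \EmpRisk(\hat{\theta})} & \approx & \Expect{(\hat{\theta}-\theta^*)^T \mathbf{k} (\hat{\theta}-\theta^*)}\\ & \approx & \tr\left(\mathbf{k}\, n^{-1} \mathbf{k}^{-1} \mathbf{j} \mathbf{k}^{-1}\right)\\ & = & n^{-1} \tr\left(\mathbf{j} \mathbf{k}^{-1}\right) \end{eqnarray}\]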