Deviation Bounds I: Markov's Inequality, etc.

36-465/665, Spring 2021

16 February 2021 (Lecture 5)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{r}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \newcommand{\Indicator}[1]{\mathbb{1}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \]

Previously

What we’re building towards

If we have \(n\) samples, then with probability at least \(1-\alpha\), \[ \Risk(\hat{s}) \leq \EmpRisk(\hat{s}) + g(n,\alpha) \] for some function \(g\) we can calculate

If we have \(n\) samples, then with probability at least \(1-\alpha\), \[ \Risk(\hat{s}) \leq \Risk(s^*) + h(n,\alpha) \] for some function \(h\) we can calculate

Why we’ll detour through probability theory

Markov’s inequality

Markov’s inequality (2)

\[ \Prob{Z \geq \epsilon} \leq \frac{\Expect{Z}}{\epsilon} \]
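
As a quick sanity check (a sketch added here, not from the original slides), recall that Markov's inequality needs \(Z \geq 0\) and \(\epsilon > 0\). The snippet below draws non-negative random variables and confirms that the empirical tail probability never exceeds \(\Expect{Z}/\epsilon\); the exponential distribution and the sample size are arbitrary choices.

```python
# Minimal numerical check of Markov's inequality (illustrative sketch):
# for non-negative Z and eps > 0, P(Z >= eps) <= E[Z]/eps.
import numpy as np

rng = np.random.default_rng(465)             # arbitrary seed
Z = rng.exponential(scale=1.0, size=10**6)   # non-negative draws, E[Z] = 1

for eps in (0.5, 1.0, 2.0, 5.0):
    tail = np.mean(Z >= eps)                 # empirical P(Z >= eps)
    bound = Z.mean() / eps                   # Markov bound E[Z]/eps
    print(f"eps={eps:3.1f}  P(Z >= eps) = {tail:.4f}  <=  E[Z]/eps = {bound:.4f}")
```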

New inequalities from old

\[ \Prob{f(Z) \geq \epsilon} \leq \frac{\Expect{f(Z)}}{\epsilon} \]

From Markov to Chebyshev
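
A sketch of the standard derivation, filled in here for reference: apply the generalized Markov inequality above with \(f(z) = (z - \Expect{Z})^2\) and threshold \(\epsilon^2\),

\[ \Prob{\left|Z - \Expect{Z}\right| \geq \epsilon} = \Prob{(Z - \Expect{Z})^2 \geq \epsilon^2} \leq \frac{\Expect{(Z - \Expect{Z})^2}}{\epsilon^2} = \frac{\Var{Z}}{\epsilon^2} \]

which is Chebyshev's inequality.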

Our first deviation inequality

How Chebyshev proved the law of large numbers

How good is Chebyshev?
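
To see how loose the bound can be, here is a small numerical comparison (an illustrative sketch, not a figure from the slides): for a standard Gaussian, Chebyshev gives \(\Prob{|Z| \geq \epsilon} \leq 1/\epsilon^2\), while the exact tail probability shrinks like \(e^{-\epsilon^2/2}\).

```python
# Chebyshev bound vs. exact two-sided Gaussian tail for Z ~ N(0, 1):
# Chebyshev gives P(|Z| >= eps) <= 1/eps^2, while the true tail decays
# like exp(-eps^2 / 2), so the bound becomes increasingly loose.
from scipy.stats import norm

for eps in (1.0, 2.0, 3.0, 4.0, 5.0):
    exact = 2 * norm.sf(eps)     # true P(|Z| >= eps) for a standard Gaussian
    chebyshev = 1.0 / eps**2     # Var[Z]/eps^2 with Var[Z] = 1
    print(f"eps={eps:.0f}  exact={exact:.2e}  Chebyshev={chebyshev:.2e}  ratio={chebyshev/exact:.1f}")
```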

Why it matters that the Chebyshev bound is so loose for Gaussians

A very hand-wavy introduction to large deviations

(This conclusion is correct but there’s a missing assumption we should be explicit about: what? [see backup])

The exponential Markov inequality

Exponential Markov inequalities / Chernoff Bounds

\[\begin{eqnarray} \Prob{X \geq \epsilon} & \leq & \min_{t \geq 0}{e^{-t\epsilon}\Expect{e^{tX}}}\\ \Prob{X \leq \epsilon} & \leq & \min_{t > 0}{e^{t\epsilon}\Expect{e^{-tX}}} \end{eqnarray}\]
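
To make this concrete (an illustrative sketch; the optimizing \(t\) is found here by a crude grid search): for \(X \sim N(0,1)\) the moment generating function is \(\Expect{e^{tX}} = e^{t^2/2}\), so the upper-tail Chernoff bound becomes \(\min_{t \geq 0}{e^{-t\epsilon + t^2/2}} = e^{-\epsilon^2/2}\), attained at \(t = \epsilon\).

```python
# Chernoff bound for the upper tail of X ~ N(0, 1):
# P(X >= eps) <= min_{t >= 0} exp(-t*eps) * E[exp(t*X)], with E[exp(t*X)] = exp(t^2/2).
# A crude grid search over t recovers the optimum t = eps and bound exp(-eps^2/2).
import numpy as np
from scipy.stats import norm

eps = 3.0
t_grid = np.linspace(0.0, 10.0, 10001)               # candidate values of t
bounds = np.exp(-t_grid * eps + t_grid**2 / 2.0)     # e^{-t eps} * MGF(t)
t_star = t_grid[np.argmin(bounds)]

print(f"optimal t ~ {t_star:.3f} (theory: t = eps = {eps})")
print(f"Chernoff bound: {bounds.min():.3e} (theory: exp(-eps^2/2) = {np.exp(-eps**2/2):.3e})")
print(f"exact tail P(X >= {eps}): {norm.sf(eps):.3e}")
```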

Moment generating functions (1)

\[\begin{eqnarray} e^u & = & \sum_{k=0}^{\infty}{\frac{u^k}{k!}}\\ \Expect{e^{tX}} & = & \sum_{k=0}^{\infty}{\frac{t^k}{k!}\Expect{X^k}} \end{eqnarray}\]
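
As a concrete example (a standard fact, worked in here rather than taken from the slide): if \(X \sim N(\mu, \sigma^2)\), then

\[ \Expect{e^{tX}} = \myexp{\mu t + \frac{\sigma^2 t^2}{2}} \]

so the moment generating function is finite, and the series above converges, for every \(t\).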

Moment generating functions (2)

Exponential bounds

Exponential bounds (2)

“Sub-Gaussian” distributions
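
For reference, the usual definition (the slide presumably states some version of this): a mean-zero random variable \(X\) is sub-Gaussian with variance proxy \(\sigma^2\) if

\[ \Expect{e^{tX}} \leq \myexp{\frac{\sigma^2 t^2}{2}} ~ \text{for all } t \in \mathbb{R} \]

in which case the Chernoff argument above gives \(\Prob{X \geq \epsilon} \leq \myexp{-\epsilon^2 / 2\sigma^2}\).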

Summing up

Backup: Lower bounds on deviation probabilities

Backup: An ungraded, character-building exercise

Suppose the \(Z_n\) are random variables with a common mean \(\mu\):

  1. Use Chebyshev to show that if \(\Var{Z_n} \rightarrow 0\), then \(\Prob{(Z_n -\mu)^2 \geq \epsilon} \rightarrow 0\) (no matter how small \(\epsilon\) is)
  2. Can you use Paley-Zygmund to get a lower bound on \(\Prob{(Z_n -\mu)^2 \geq \epsilon}\), just assuming \(\Var{Z_n}\) is finite? If not, what else do you need to assume about the distribution of the \(Z_n\)?
  3. Now assume that \(\Var{Z_n} \rightarrow \sigma^2 > 0\). Can you use Paley-Zygmund to show that \(\Prob{(Z_n - \mu)^2 \geq \epsilon} \not\rightarrow 0\)? If not, what more do you need to assume about the distribution of the \(Z_n\)?

Backup: Large deviations theory

Why there aren’t exponential Chebyshev inequalities

Backup: Central limit theorem and moment generating functions

Backup: Moment generating functions vs. cumulant generating functions

Backup: The implicit assumption on slide 14