From Rademacher complexity to VC dimension

36-465/665, Spring 2021

4 March 2021 (Lecture 9)

Previously

Three ways forward from Rademacher complexity

  1. Theory: calculations involving the distribution and the models
  2. Data-dependence: looking at how the models actually perform on the data
  3. Distribution-independence: using properties of the models that hold for any distribution or data set

Theory

Suppose \(\mathbb{P}\left( \|X\| \leq r \right) = 1\), and we’re using linear models, so \(s(x) = x\cdot \beta\) with \(\| \beta \| \leq b\). Then \(\hat{\mathcal{R}}_n \leq \frac{rb}{\sqrt{n}}\) for every sample, so, taking expectations, the same bound holds for \(\mathcal{R}_n = \mathbb{E}\left[ \hat{\mathcal{R}}_n \right]\)
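
A minimal simulation sketch of this bound in R, using the fact that \(\sup_{\|\beta\| \leq b} \beta \cdot v = b \|v\|\), so that \(\hat{\mathcal{R}}_n = \frac{b}{n} \mathbb{E}_{\sigma}\left[ \left\| \sum_{i} \sigma_i x_i \right\| \right]\); the constants \(n\), \(p\), \(r\), \(b\), and the seed are arbitrary illustrative choices:

```r
# Monte Carlo estimate of the empirical Rademacher complexity of
# {x . beta : ||beta|| <= b}, compared to the bound r*b/sqrt(n)
set.seed(9)                        # arbitrary seed
n <- 500; p <- 3; r <- 2; b <- 1   # illustrative constants

# n points drawn uniformly from the ball of radius r in R^p
u <- matrix(rnorm(n * p), nrow = n)
x <- (r * runif(n)^(1/p)) * u / sqrt(rowSums(u^2))

# each draw: sup over ||beta|| <= b of (1/n) sum_i sigma_i (x_i . beta)
#          = (b/n) * || sum_i sigma_i x_i ||
rad.hat <- replicate(2000, {
  sigma <- sample(c(-1, 1), size = n, replace = TRUE)
  (b / n) * sqrt(sum(colSums(sigma * x)^2))
})
mean(rad.hat)    # Monte Carlo estimate of the empirical Rademacher complexity
r * b / sqrt(n)  # the theoretical bound; the estimate should come in below it
```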

Data-dependent bounds

Distribution-free bounds

Getting a distribution-free bound for classification

Growth function

The growth function upper-bounds the Rademacher complexity

It really matters whether the growth function stays exponential

Shattering and the VC dimension

Some examples of VC dimension

Some examples of VC dimension (2)

Finite VC dimension \(\Rightarrow\) distribution-free bounds

If \(\mathrm{VCdim}(S) = d < \infty\), then for \(n \geq d\), \[ \Pi_{S}(n) \leq \left( \frac{en}{d} \right)^d = O(n^d) \] while if \(\mathrm{VCdim}(S) = \infty\), then \(\Pi_{S}(n) = 2^n\) for all \(n\)
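
To get a concrete sense of that gap, here is a minimal sketch in R (with \(d = 5\) as an arbitrary illustrative choice) tabulating the combinatorial Sauer-Shelah bound \(\sum_{i=0}^{d} \binom{n}{i}\), the relaxation \(\left( \frac{en}{d} \right)^d\), and the \(2^n\) labelings that would be achievable with infinite VC dimension:

```r
# Polynomial vs. exponential growth of the number of achievable labelings
d <- 5                        # illustrative VC dimension
n <- c(5, 10, 20, 50, 100)    # sample sizes, all >= d
cbind(n,
      sauer.sum   = sapply(n, function(m) sum(choose(m, 0:d))),  # sum_{i=0}^{d} C(n, i)
      sauer.bound = (exp(1) * n / d)^d,                          # (en/d)^d
      labelings   = 2^n)                                         # unrestricted count
```

Even at \(n = 100\), \(\left( \frac{en}{d} \right)^d\) is only about \(5 \times 10^{8}\), while \(2^{100} \approx 1.3 \times 10^{30}\)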

Note 1: What makes this a “dimension”?

Note 2: VC dimension and falsifiability

VC dimension and uniform convergence

Beyond binary classifiers with 0-1 loss

Trade-offs

Between different bounds

Between different model classes

Summing up

References

(Popper’s book is actually from 1934, but R Markdown’s bibliography processor doesn’t understand how to handle translated works)

Anthony, Martin, and Peter L. Bartlett. 1999. Neural Network Learning: Theoretical Foundations. Cambridge, England: Cambridge University Press.

Lunde, Robert, and Cosma Rohilla Shalizi. 2017. “Bootstrapping Generalization Error Bounds for Time Series.” arXiv:1711.02834. https://arxiv.org/abs/1711.02834.

Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning. Cambridge, Massachusetts: MIT Press.

Popper, Karl R. n.d. The Logic of Scientific Discovery. London: Hutchinson.

Vidyasagar, Mathukumalli. 2003. Learning and Generalization: With Applications to Neural Networks. 2nd ed. Berlin: Springer-Verlag.