Model Averaging

36-465/665, Spring 2021

1 April 2021 (Lecture 17)

Why are we picking one model class at all?

Model averaging

Why might averaging models help?

Model averaging uses diversity to lower risk

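\[ (\mu-\overline{s})^2 = \frac{1}{q}\sum_{i=1}^{q}{(s_i - \mu)^2} - V \]

where \(\overline{s} = \frac{1}{q}\sum_{i=1}^{q}{s_i}\) is the average of the \(q\) individual predictions and \(V = \frac{1}{q}\sum_{i=1}^{q}{(s_i - \overline{s})^2}\) is their variance around that average.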
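The identity can be checked numerically. The following is a minimal sketch, assuming that V is the variance of the individual predictions around their own average (the only reading under which the equation holds exactly). The target value `mu`, the number of models and the simulated predictions are all made up for illustration.

```python
import random

rng = random.Random(1)

mu = 2.0  # hypothetical target value we are trying to predict
# q = 10 hypothetical individual predictions, each biased and noisy
s = [mu + rng.gauss(0.5, 1.0) for _ in range(10)]
q = len(s)

s_bar = sum(s) / q                                     # ensemble (average) prediction
avg_individual = sum((si - mu) ** 2 for si in s) / q   # average individual squared error
V = sum((si - s_bar) ** 2 for si in s) / q             # spread of predictions around s_bar
ensemble_err = (mu - s_bar) ** 2                       # squared error of the average

print(ensemble_err, avg_individual - V)  # the two numbers agree up to rounding
```

Because V can never be negative, the averaged prediction is never worse than the average individual prediction, and it is strictly better whenever the predictions disagree at all.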

The math generalizes

Upshot of this math

\[ (\text{risk of ensemble}) = (\text{average individual risk}) - (\text{ensemble diversity}) \]

How do we get many diverse models?

How do we get many diverse models? (2)

What about model complexity?

\[ \overline{s}(x) = \sum_{i=1}^{q}{w_i s_i(x)} \]

(or similar for other kinds of weighted combination)
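A weighted combination of this form might be sketched as follows. The helper name `combine`, the three toy models and the weights are hypothetical. The sketch also assumes the weights are non-negative and sum to 1, which the formula itself does not require but which is needed for the weighted version of the risk-minus-diversity identity to hold.

```python
def combine(models, weights):
    """Return the function s_bar(x) = sum_i w_i * s_i(x)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")

    def s_bar(x):
        return sum(w * s(x) for w, s in zip(weights, models))

    return s_bar


# Three toy "models" and weights that sum to 1 (both are assumptions)
models = [lambda x: 1.0, lambda x: x, lambda x: x * x]
weights = [0.2, 0.5, 0.3]
s_bar = combine(models, weights)

x, mu = 2.0, 3.0  # a hypothetical input and its true value
weighted_risk = sum(w * (s(x) - mu) ** 2 for w, s in zip(weights, models))
weighted_diversity = sum(w * (s(x) - s_bar(x)) ** 2 for w, s in zip(weights, models))
ensemble_err = (s_bar(x) - mu) ** 2

print(s_bar(x), ensemble_err, weighted_risk - weighted_diversity)
```

With weights that sum to 1, the squared error of the combination again equals the weighted average individual error minus the weighted diversity.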

Why (sensible) model averaging doesn’t massively overfit

Real-data example using bagging of decision trees
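For readers who want to experiment, here is a minimal sketch of bagging on synthetic data. It is not the lecture's real-data example: the data-generating function, the noise level, the sample size and the number of resamples are all invented, and each "tree" is simplified to a one-split regression stump so that the code needs only the standard library.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimizing squared error."""
    best = None
    values = sorted(set(xs))
    for lo in values[:-1]:  # candidate split: x <= lo versus x > lo
        left = [y for x, y in zip(xs, ys) if x <= lo]
        right = [y for x, y in zip(xs, ys) if x > lo]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, lo, ml, mr)
    if best is None:  # every x identical in this resample: predict the mean
        m = sum(ys) / len(ys)
        return lambda x: m
    _, cut, ml, mr = best
    return lambda x: ml if x <= cut else mr


def bagged_stumps(xs, ys, q, rng):
    """Fit q stumps, each on a bootstrap resample of the training data."""
    n = len(xs)
    fitted = []
    for _ in range(q):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        fitted.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return fitted


def truth(x):
    return x * x  # hypothetical true regression function


rng = random.Random(0)
train_x = [rng.uniform(-1, 1) for _ in range(60)]
train_y = [truth(x) + rng.gauss(0, 0.1) for x in train_x]
test_x = [i / 50 - 1 for i in range(101)]  # evenly spaced grid on [-1, 1]

models = bagged_stumps(train_x, train_y, q=25, rng=rng)


def ensemble(x):
    return sum(m(x) for m in models) / len(models)


def mse(predict):
    return sum((predict(x) - truth(x)) ** 2 for x in test_x) / len(test_x)


avg_individual_mse = sum(mse(m) for m in models) / len(models)
ensemble_mse = mse(ensemble)
print(avg_individual_mse, ensemble_mse)
```

By the risk-minus-diversity identity, the ensemble's error against the true function cannot exceed the average error of the individual stumps. How large the gap is depends on how much the bootstrap resamples make the stumps disagree.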

Drawbacks to model averaging

Summing up

