Regularizing Optimization with Penalties and Constraints

36-465/665, Spring 2021

16 March 2021 (Lecture 12)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{r}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \newcommand{\Indicator}[1]{\mathbb{1}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \newcommand{\eqdist}{\stackrel{d}{=}} \newcommand{\Rademacher}{\mathcal{R}} \newcommand{\EmpRademacher}{\hat{\Rademacher}} \newcommand{\Growth}{\Pi} \newcommand{\VCD}{\mathrm{VCdim}} \newcommand{\OptDomain}{\Theta} \newcommand{\OptDim}{p} \newcommand{\optimand}{\theta} \newcommand{\altoptimand}{\optimand^{\prime}} \newcommand{\ObjFunc}{M} \newcommand{\outputoptimand}{\optimand_{\mathrm{out}}} \newcommand{\Hessian}{\mathbf{h}} \newcommand{\Penalty}{\Omega} \newcommand{\Lagrangian}{\mathcal{L}} \]

Previously

Thinking about ordinary least squares

Thinking about ordinary least squares (2)

\[ \hat{\beta} = (\mathbf{x}^T\mathbf{x})^{-1} \mathbf{x}^T \mathbf{y} \]
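As a minimal sketch of the formula above, assuming NumPy and a small simulated data set, the estimate can be computed by solving the normal equations \(\mathbf{x}^T\mathbf{x}\,\hat{\beta} = \mathbf{x}^T\mathbf{y}\) rather than forming the inverse explicitly. The function name `ols`, the variable names, and the simulated data are illustrative and not from the lecture.

```python
import numpy as np

def ols(x, y):
    """Return beta_hat solving (x^T x) beta = x^T y.

    Assumes x has full column rank, so that x^T x is invertible.
    """
    return np.linalg.solve(x.T @ x, x.T @ y)

# Simulated data (illustrative only): n observations, p predictors
rng = np.random.default_rng(0)
n, p = 200, 3
x = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = x @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = ols(x, y)
print(beta_hat)
```

Solving the linear system is algebraically the same as applying \((\mathbf{x}^T\mathbf{x})^{-1}\), but it is usually more stable numerically. When \(\mathbf{x}^T\mathbf{x}\) is singular or nearly so, the solve step fails or becomes unreliable.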

Thinking about ordinary least squares (3)

Thinking about ordinary least squares (4)

Thinking about ordinary least squares (5)

Penalties

Some pictures

Some pictures (2)

Some pictures (3)

What does the penalty do?

What specifically does the \(L_2\) penalty do?

What about \(L_1\)?

What about \(L_1\)? (2)

Penalties \(\Leftrightarrow\) Constraints

Constrained optimization in general

  1. Use the constraint equation \(\Penalty(\optimand) = c\) to eliminate a degree of freedom
    • i.e., write one coordinate in \(\optimand\) as a function of the others and of \(c\)
    • Do unconstrained optimization over the remaining degrees of freedom
    • What about the \(\leq\) case?!?
  2. Add a new variable and do unconstrained optimization over a larger problem
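The two strategies in the list above can be compared on a toy problem. The sketch below is illustrative rather than taken from the lecture: it assumes the objective \(\ObjFunc(\optimand) = (\optimand_1 - 3)^2 + (\optimand_2 - 1)^2\), the linear constraint function \(\Penalty(\optimand) = \optimand_1 + \optimand_2\), the level \(c = 2\), and the sign convention \(\Lagrangian = \ObjFunc(\optimand) - \lambda(\Penalty(\optimand) - c)\). These choices keep both strategies within linear algebra.

```python
import numpy as np

c = 2.0  # constraint level (illustrative assumption)

def objective(theta):
    """Toy objective M(theta) = (theta_1 - 3)^2 + (theta_2 - 1)^2."""
    return (theta[0] - 3.0) ** 2 + (theta[1] - 1.0) ** 2

# Strategy 1: use theta_1 + theta_2 = c to eliminate theta_2 = c - theta_1.
# The reduced objective g(t) = (t - 3)^2 + (c - t - 1)^2 has derivative
# g'(t) = 4t - 2c - 4, which vanishes at t = (c + 2) / 2.
theta1 = (c + 2.0) / 2.0
theta_elim = np.array([theta1, c - theta1])

# Strategy 2: add a new variable lambda and find a stationary point of
# L(theta, lambda) = M(theta) - lambda * (theta_1 + theta_2 - c).
# Setting all three partial derivatives to zero gives a linear system:
#   2*theta_1           - lambda = 6
#             2*theta_2 - lambda = 2
#   theta_1 + theta_2            = c
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0, 0.0]])
b = np.array([6.0, 2.0, c])
theta1_l, theta2_l, lam = np.linalg.solve(A, b)
theta_lagr = np.array([theta1_l, theta2_l])

print(theta_elim, theta_lagr, lam, objective(theta_lagr))
```

Both routes give the same constrained minimizer. Under the sign convention assumed here, the multiplier equals the derivative of the optimal objective value with respect to \(c\): the optimal value is \((c-4)^2/2\), whose derivative \(c - 4\) matches \(\lambda\).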

Lagrange multipliers

Lagrange multipliers (2)

Lagrange multipliers are prices

Lagrange multipliers vs. penalties

Many constraints

Inequality constraints

Summing up on constraints and Lagrange multipliers

Mathematical programming

Mathematical programming (2)

What do constraints/penalties do to learning and risk?

Summing up

Backup: “Comrades, let’s optimize!”

References

Dorfman, Robert, Paul A. Samuelson, and Robert M. Solow. 1958. Linear Programming and Economic Analysis. New York: McGraw-Hill.

Kantorovich, L. V. 1965. The Best Use of Economic Resources. Cambridge, Massachusetts: Harvard University Press.

Spufford, Francis. 2010. Red Plenty. London: Faber and Faber.