Low-Regret Learning II

36-465/665, Spring 2021

29 April 2021 (Lecture 24)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{\Risk}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \newcommand{\Indicator}[1]{\mathbb{1}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \newcommand{\eqdist}{\stackrel{d}{=}} \newcommand{\Rademacher}{\mathcal{R}} \newcommand{\EmpRademacher}{\hat{\Rademacher}} \newcommand{\Growth}{\Pi} \newcommand{\VCD}{\mathrm{VCdim}} \newcommand{\OptDomain}{\Theta} \newcommand{\OptDim}{p} \newcommand{\optimand}{\theta} \newcommand{\altoptimand}{\optimand^{\prime}} \newcommand{\ObjFunc}{M} \newcommand{\outputoptimand}{\optimand_{\mathrm{out}}} \newcommand{\Hessian}{\mathbf{h}} \newcommand{\Penalty}{\Omega} \newcommand{\Lagrangian}{\mathcal{L}} \newcommand{\HoldoutRisk}{\tilde{\Risk}} \DeclareMathOperator{\sgn}{sgn} \newcommand{\Margin}{M} \newcommand{\CumLoss}{L} \newcommand{\EnsembleAction}{\overline{a}} \newcommand{\CumEnsembleLoss}{\overline{\CumLoss}} \newcommand{\Regret}{R} \newcommand{\MetaExpert}{\mathcal{M}} \]

Previously

Agenda for today

Risk and multiplicative weight training

Risk and multiplicative weight training (2)

Risk and multiplicative weight training (3)

Risk and multiplicative weight training: summing up

Competing with sequences of experts

Sequences of experts can be treated like experts

Counting sequences of experts

Restricting sequences of experts

Not explicitly representing sequences of experts

What fixed shares does and why it works

\[\begin{eqnarray} v_{i,t} & = & w_{i, t-1} \myexp{-\beta \Loss(y_t, s_i(t))}\\ w_{i,t} & = & (1-\alpha) v_{i,t} + \alpha \frac{\sum_{j=1}^{q}{v_{j,t}}}{q} \end{eqnarray}\]
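The two-step update above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name and the final normalization are my additions (the update is linear in the weights, so normalizing each round leaves the relative weights unchanged and only guards against underflow).

```python
from math import exp

def fixed_share_update(w, losses, beta, alpha):
    """One round of the fixed-share update for q experts.

    w      : current weight vector (length q)
    losses : this round's loss for each expert
    beta   : learning rate
    alpha  : share rate (fraction of weight redistributed uniformly)
    """
    q = len(w)
    # Multiplicative-weights step: shrink each weight by its expert's loss.
    v = [wi * exp(-beta * li) for wi, li in zip(w, losses)]
    # Share step: mix in a uniform redistribution of the total weight,
    # so an expert that did badly in the past can recover quickly.
    pool = sum(v) / q
    w_new = [(1 - alpha) * vi + alpha * pool for vi in v]
    # Normalize for numerical stability (does not change relative weights).
    total = sum(w_new)
    return [wi / total for wi in w_new]
```

Setting \(\alpha = 0\) recovers plain multiplicative weights; setting \(\alpha = 1\) resets to the uniform distribution every round.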

Growing ensembles

Summing up

Backup: Further topics in low-regret learning

References

Cesa-Bianchi, Nicolò, and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge, England: Cambridge University Press.

Herbster, Mark, and Manfred Warmuth. 1998. “Tracking the Best Expert.” Machine Learning 32:151–78.

Rakhlin, Alexander, Karthik Sridharan, and Ambuj Tewari. 2010. “Online Learning: Random Averages, Combinatorial Parameters, and Learnability.” In Advances in Neural Information Processing Systems 23 [NIPS 2010], edited by John Lafferty, C. K. I. Williams, John Shawe-Taylor, Richard S. Zemel, and A. Culotta, 1984–92. Cambridge, Massachusetts: MIT Press. http://arxiv.org/abs/1006.1138.

———. 2011. “Online Learning: Stochastic and Constrained Adversaries.” In Advances in Neural Information Processing Systems 24 [NIPS 2011], edited by John Shawe-Taylor, Richard S. Zemel, Peter Bartlett, Fernando C. N. Pereira, and Kilian Q. Weinberger, 1764–72. http://arxiv.org/abs/1104.5070.

Shalizi, Cosma Rohilla, Abigail Z. Jacobs, Kristina Lisa Klinkner, and Aaron Clauset. 2011. “Adapting to Non-Stationarity with Growing Expert Ensembles.” Statistics Department, CMU. http://arxiv.org/abs/1103.0949.


  1. A “one-armed bandit” is another name for a “slot machine”: a gambling device where you put a coin in a slot and then pull a lever, or arm, which generates a random pay-off, usually zero. It’s called a “bandit” because it takes your money (with high probability). Some machines have two arms, one on each side, often with one arm giving a higher probability of small rewards and the other a lower probability of larger rewards. (Both arms have negative expected rewards, because the owner of the slot machine wants to make money.) The “two-armed bandit” statistical problem is to figure out, from observation, which arm has the higher expected reward. This has many real-world applications (which medical treatment or educational technique works better, on average?). Calling this a “bandit problem” is yet another example, like “bootstrap” or “Monte Carlo”, of a phrase that began as a joke and hardened into obscure jargon.
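The footnote's two-armed bandit estimation problem can be sketched directly: simulate two arms with unknown pay-off probabilities, pull each one repeatedly, and compare the sample means. This is a hypothetical illustration of the "figure out which arm is better from observation" idea only, not an algorithm from the lecture; the function name and Bernoulli pay-off model are my assumptions.

```python
import random

def estimate_better_arm(p_left, p_right, pulls_per_arm=1000, seed=None):
    """Pull each arm of a two-armed bandit many times and report which
    one looks better.  Each arm pays 1 with its given probability and 0
    otherwise (a real slot machine would also charge for each pull)."""
    rng = random.Random(seed)
    mean_left = sum(rng.random() < p_left
                    for _ in range(pulls_per_arm)) / pulls_per_arm
    mean_right = sum(rng.random() < p_right
                     for _ in range(pulls_per_arm)) / pulls_per_arm
    return "left" if mean_left > mean_right else "right"
```

Sampling both arms equally like this is the crudest possible strategy; the interesting versions of the problem trade off pulling the arm that currently looks best against gathering more information about the other one.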