Low-Regret Learning II

36-465/665, Spring 2021

29 April 2021 (Lecture 24)

Agenda for today

Risk and multiplicative weight training

Risk and multiplicative weight training (2)

Risk and multiplicative weight training (3)

Risk and multiplicative weight training: summing up

Competing with sequences of experts

Sequences of experts can be treated like experts

Counting sequences of experts

Restricting sequences of experts

Not explicitly representing sequences of experts

What fixed shares does and why it works

\[\begin{eqnarray} v_{i,t} & = & w_{i, t-1} \myexp{-\beta \Loss(y_t, s_i(t))}\\ w_{i,t} & = & (1-\alpha) v_{i,t} + \alpha \frac{\sum_{j=1}^{q}{v_{j,t}}}{q} \end{eqnarray}\]
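
A minimal sketch of one round of the update above, assuming NumPy and assuming the per-expert losses \(\Loss(y_t, s_i(t))\) have already been computed and passed in as an array. The function name `fixed_share_update` and its argument names are illustrative, not from the lecture.

```python
import numpy as np

def fixed_share_update(w_prev, losses, beta, alpha):
    """One round of the fixed-share weight update (illustrative sketch).

    w_prev : weights w_{i,t-1}, one per expert, shape (q,)
    losses : loss of each expert's prediction at time t, shape (q,)
    beta   : learning rate, beta > 0
    alpha  : share rate, 0 <= alpha <= 1
    """
    w_prev = np.asarray(w_prev, dtype=float)
    losses = np.asarray(losses, dtype=float)
    q = w_prev.shape[0]
    # Multiplicative-weights step: v_{i,t} = w_{i,t-1} * exp(-beta * loss_i)
    v = w_prev * np.exp(-beta * losses)
    # Sharing step: each expert keeps (1 - alpha) of its own weight
    # and receives an equal 1/q share of alpha times the total weight
    w = (1.0 - alpha) * v + alpha * v.sum() / q
    return w

# Example: three experts with equal starting weights; expert 0 has the smallest loss
w = fixed_share_update(np.ones(3) / 3, [0.0, 1.0, 1.0], beta=1.0, alpha=0.1)
```

Setting `alpha = 0` recovers plain multiplicative weights, while `alpha = 1` resets every expert to the same weight each round. For any `alpha`, the sharing step leaves the total weight unchanged: it only redistributes weight, so an expert that did badly in the past always keeps at least an \(\alpha/q\) share of the total and can recover quickly if it starts doing well.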

Growing ensembles

Summing up

Backup: Further topics in low-regret learning


  1. A “one-armed bandit” is another name for a “slot machine”, a gambling device where you put a coin in a slot, and then get to pull a lever or arm which generates a random pay-off — which is usually zero. It’s called a “bandit” because it takes your money (with high probability). Some machines have two arms, one on each side, often where one arm has a higher probability of small rewards, and the other a lower probability of larger rewards. (Both arms will have negative expected rewards, because the owner of the slot machine wants to make money.) The idea of the “two-armed bandit” statistical problem is to try to figure out, from observation, which arm has higher expected rewards. This has a lot of real-world applications (which medical treatment / educational technique works better, on average?). Calling this a “bandit problem” is yet another example, like “bootstrap” or “Monte Carlo”, of a phrase that began as a joke, and hardened into obscure jargon.