Hidden Markov Models and State Estimation

36-467/36-667

19 November 2020 (Lecture 23)

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \]

Housekeeping

Previously

Markov Property vs. Measurement Noise

A Concrete Example

The Current State “Screens Off” the Past from the Future

\(\Prob{X(t+1)=x(t+1)|S(t)=s, X(1:t)=x(1:t)}\) \[\begin{eqnarray} & = & \sum_{r}{\Prob{X(t+1)=x(t+1)|S(t+1)=r, S(t)=s, X(1:t)=x(1:t)}\Prob{S(t+1)=r|S(t)=s,X(1:t)=x(1:t)}}\\ & = & \sum_{r}{\Prob{X(t+1)=x(t+1)|S(t+1)=r}\Prob{S(t+1)=r|S(t)=s}} \end{eqnarray}\]

Relating \(X(t)\) to Its Past through the State

\(\Prob{X(t+1)=x(t+1)|X(1:t)=x(1:t)}\) \[\begin{eqnarray} & = & \sum_{s}{\Prob{X(t+1)=x(t+1)|S(t)=s,X(1:t)=x(1:t)}\Prob{S(t)=s|X(1:t)=x(1:t)}}\\ & = & \sum_{s}{\left(\sum_{r}{\Prob{X(t+1)=x(t+1)|S(t+1)=r}\Prob{S(t+1)=r|S(t)=s}}\right) \Prob{S(t)=s|X(1:t)=x(1:t)}} \end{eqnarray}\]

\(\Rightarrow\) Given \(X(t)\), the only way \(X(t-1)\) can matter for \(X(t+1)\) is by changing the conditional distribution of \(S(t)\)

Relating \(X(t)\) to \(S(t)\)

History Matters for \(X(t)\)

Hidden Markov Models / State-Space Models

(for continuous variables, replace PMFs by pdfs as needed)
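An HMM is easy to simulate directly from its two ingredients, the state transitions and the observation mechanism. A minimal sketch in R (the name `rhmm` and its arguments are illustrative, not the lecture's code):

```r
# Minimal HMM simulator (illustrative sketch, not the lecture's code)
# rinitial(): draws S(1); rtransition(s): draws S(t+1) given S(t) = s;
# robs(s): draws X(t) given S(t) = s
rhmm <- function(n, rinitial, rtransition, robs) {
    s <- numeric(n)
    s[1] <- rinitial()
    for (t in 2:n) {
        s[t] <- rtransition(s[t - 1])
    }
    x <- sapply(s, robs)
    return(list(s = s, x = x))
}

# A two-state chain like the running demo: states +/-1, stay with
# probability 0.75, Gaussian observation noise with sd 1
toy.sim <- rhmm(n = 100,
                rinitial = function() { sample(c(-1, +1), size = 1) },
                rtransition = function(s) { if (runif(1) < 0.75) { s } else { -s } },
                robs = function(s) { rnorm(1, mean = s, sd = 1) })
```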

Some Basic Calculations for HMMs

Why We Want the Probability of an Observation Sequence

\[ \Prob{X(1:n)=x(1:n)} = \sum_{s(1:n)}{ \Prob{S(1)=s(1)} \prod_{t=2}^{n}{q(s(t-1), s(t))}\prod_{t=1}^{n}{g(s(t), x(t))}} \]
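For a finite state space, this sum can be evaluated literally, though the number of state sequences grows like \(|\mathcal{S}|^n\). A brute-force sketch (function and argument names are mine, not the lecture's):

```r
# Brute-force evaluation of P(X(1:n) = x(1:n)) by summing over all state
# sequences; feasible only for tiny n, which motivates the forward algorithm
# states: vector of possible states; p1: initial probabilities (same order);
# q(s, r): transition probability; g(s, x): observation density/PMF
brute.force.likelihood <- function(x, states, p1, q, g) {
    n <- length(x)
    seqs <- expand.grid(rep(list(states), n))  # all |states|^n sequences
    total <- 0
    for (i in 1:nrow(seqs)) {
        s <- unlist(seqs[i, ])
        prob <- p1[match(s[1], states)]
        if (n > 1) { for (t in 2:n) { prob <- prob * q(s[t - 1], s[t]) } }
        for (t in 1:n) { prob <- prob * g(s[t], x[t]) }
        total <- total + prob
    }
    return(as.numeric(total))
}
```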

A Recursion Trick (a.k.a. “The Forward Algorithm”) [1]

A Recursion Trick (a.k.a. “The Forward Algorithm”) [2]

A Recursion Trick (a.k.a. “The Forward Algorithm”) [3]

The Forward Algorithm
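The recursion computes \(\alpha_t(s) \equiv \Prob{X(1:t)=x(1:t), S(t)=s}\), via \(\alpha_1(s) = \Prob{S(1)=s} g(s, x(1))\) and \(\alpha_{t+1}(r) = g(r, x(t+1)) \sum_{s}{\alpha_t(s) q(s,r)}\), so the likelihood is \(\sum_{s}{\alpha_n(s)}\). A minimal R sketch for a finite state space, with states indexed \(1, \ldots, k\) (the names `p1`, `Q` and the matrix representation are my assumptions, not the lecture's code):

```r
# Forward algorithm sketch: p1 = initial state probabilities, Q[s, r] =
# transition probability from s to r, g(s, x) = observation density
forward.algorithm <- function(x, p1, Q, g) {
    n <- length(x)
    k <- length(p1)
    alpha <- matrix(0, nrow = k, ncol = n)
    alpha[, 1] <- p1 * sapply(1:k, function(s) { g(s, x[1]) })
    if (n > 1) {
        for (t in 2:n) {
            predicted <- as.vector(t(Q) %*% alpha[, t - 1])  # sum_s alpha[s] Q[s, r]
            alpha[, t] <- predicted * sapply(1:k, function(r) { g(r, x[t]) })
        }
    }
    return(list(alpha = alpha, likelihood = sum(alpha[, n])))
}
```

In practice the columns of `alpha` are rescaled at each step (accumulating the log of the scaling factors) to avoid numerical underflow on long series.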

Back to the Likelihood

Actually Doing the Calculations

The Particle Filter: Words

The Particle Filter

  1. At \(t=1\), make \(m\) independent draws \(S^*_1, S^*_2, \ldots S^*_m\) from \(\Prob{S(1)}\)
    • These are the particles
  2. For each \(S^*_i\), calculate \(g(S^*_i, x(t)) \equiv w_{i}\)
    • \(\Prob{X(t)=x(t)|X(1:t-1)=x(1:t-1)} \approx m^{-1} \sum_{i=1}^{m}{w_i}\)
  3. Resample the particles: draw \(m\) of them, with replacement, from \(S^*\), with probabilities \(\propto w_i\); call these \(\tilde{S}_1, \tilde{S}_2, \ldots \tilde{S}_m\)
  4. Set \(\hat{F}_t\) to the sample distribution of the \(\tilde{S}\)
    • The distribution of the resampled particles approximates \(\Prob{S(t)|X(1:t)=x(1:t)}\)
  5. Increment \(t\) by 1, and evolve the particles forward in time: for each \(\tilde{S}_i\), draw a new \(S^*_i\) from \(q(\tilde{S}_i, \cdot)\)
    • Distribution of the new \(S^*\) approximates \(\Prob{S(t+1)|X(1:t)=x(1:t)}\).
  6. Go to (2)

The Particle Filter: Code

particle.filter <- function(x, rinitial, rtransition, dobs, m) {
    n <- length(x)
    # initial particles: m independent draws from the distribution of S(1)
    particles <- replicate(m, rinitial())
    particle.history <- matrix(0, nrow = m, ncol = n)
    for (t in 1:n) {
        # weight each particle by the likelihood of the current observation
        weights <- dobs(state = particles, observation = x[t])
        # resample with replacement, probabilities proportional to the weights
        particles <- sample(particles, size = m, replace = TRUE, prob = weights)
        particle.history[, t] <- particles
        # evolve each particle forward under the transition dynamics
        particles <- sapply(particles, rtransition)
    }
    return(particle.history)
}

The Particle Filter: For Our Little Demo

sim.pf <- particle.filter(x=sim$x,
                          rinitial=function() { sample(c(-1,+1), size=1) },
                          rtransition=function (x) {
                              stay <- sample(c(TRUE, FALSE), size=1, prob=c(0.75, 0.25))
                              if (stay) { return(x) } else { return(-x) }
                          },
                          dobs=function(state, observation) {
                              dnorm(x=observation, mean=state, sd=1)
                          },
                          m=100)
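Once `sim.pf` is in hand, the filtering probabilities can be read off from the particle history: the fraction of particles in a given state at time \(t\) approximates \(\Prob{S(t)=s|X(1:t)=x(1:t)}\). A small helper for the two-state demo (the function name is mine, not the lecture's):

```r
# Fraction of particles equal to +1 in each column of the particle history,
# approximating P(S(t) = +1 | X(1:t) = x(1:t)) for the two-state demo
filtered.prob.plus <- function(particle.history) {
    colMeans(particle.history == +1)
}
# e.g. plot(filtered.prob.plus(sim.pf), type = "l", xlab = "t",
#           ylab = "Pr(S(t) = +1 | data)")
```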

The Particle Filter: Run on Our Little Demo

Issues with the Particle Filter

  1. Monte Carlo error: really want \(m\) to be as large as possible
    • But it costs computing time
    • Error is \(O(1/\sqrt{m})\) so diminishing returns
  2. “Particle collapse”: if one \(w_i \gg w_j\) for all the other \(j\), then (almost) all new particles are copies of \(S^*_i\) and the approximation is bad
    • Lots of hacks around this
  3. What if we don’t know transitions \(q\) or observation probabilities \(g\)?
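A standard diagnostic for issue (2) is the effective sample size of the normalized weights: it equals \(m\) when all weights are equal and approaches 1 as a single weight dominates. (A sketch; resampling only when the ESS falls below some threshold, e.g. \(m/2\), is one common hack, though the threshold is a convention, not from the lecture.)

```r
# Effective sample size of a weight vector: m for equal weights,
# near 1 under particle collapse
effective.sample.size <- function(weights) {
    w <- weights / sum(weights)
    return(1 / sum(w^2))
}
```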

Parameter Estimation

What About Space? What About Time?

Summing Up

Many more details in the handout on the website

Backup: Applying Indirect Inference to Our Demo

Backup: Finding the Auxiliary Parameter Estimates

# Auxiliary statistics: coefficients of an AR(2) fitted to a series
aux.estimator <- function(x) {
    ar2.fit <- ar.ols(x, aic = FALSE, order.max = 2)
    return(ar2.fit$ar)
}
# Average the auxiliary estimates over s simulations of the generative model
aux.from.sim <- function(q, s, n = length(sim$x)) {
    estimates <- replicate(s, aux.estimator(noisy.rpmchain(n = n, q = q)$x))
    return(rowMeans(estimates))
}

Backup: Can We Go From Auxiliary Parameters to Generative Parameters?

Backup: Optimizing the Generative Model so Simulation-Auxiliaries Match Data-Auxiliaries

toy.indirect.inference <- function(x, starting.q, s) {
    n <- length(x)
    aux.data <- aux.estimator(x)
    # squared distance between data and simulation auxiliary estimates
    discrepancy <- function(param) {
        q <- param
        aux.mean <- aux.from.sim(q = q, s = s, n = n)
        return(sum((aux.data - aux.mean)^2))
    }
    fit <- optim(par = starting.q, fn = discrepancy, method = "Brent", lower = 0, 
        upper = 1)
    return(fit$par)
}
(toy.ii <- toy.indirect.inference(x = sim$x, starting.q = 0.5, s = 500))
## [1] 0.7092326
summary(replicate(30, toy.indirect.inference(x = sim$x, starting.q = 0.5, s = 500)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6929  0.7037  0.7108  0.7110  0.7167  0.7332
  • All the variation here is from simulation to simulation