Simulation for Inference II — Matching Simulations to Data

36-467/36-667

8 November 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Probwrt}[2]{\mathbb{P}_{#1}\left( #2 \right)} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Expectwrt}[2]{\mathbb{E}_{#1}\left[ #2 \right]} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \]

In our last episodes

Agenda

How we estimate, in general

\[ \hat{\theta}_n = \argmin_{\theta}{M_n(\theta)} \]

If \(M_n(\theta) \rightarrow m(\theta)\) as \(n\rightarrow\infty\)

and \(m(\theta)\) has a unique minimum at the true \(\theta^*\)

then (generally) \(\hat{\theta}_n \rightarrow \theta^*\) (consistency)

If we’re dealing with well-behaved interior minima, \[ \hat{\theta}_n \approx \theta^* - (\nabla \nabla M_n(\theta^*))^{-1} \nabla M_n(\theta^*) \] and \[ \Var{\hat{\theta}_n} \approx (\nabla\nabla m(\theta^*))^{-1} \Var{\nabla M_n(\theta^*)} (\nabla\nabla m(\theta^*))^{-1} \]
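Where do these approximations come from? Set the gradient to zero at the minimum and Taylor-expand around \(\theta^*\) (a one-step sketch):

\[ 0 = \nabla M_n(\hat{\theta}_n) \approx \nabla M_n(\theta^*) + \nabla \nabla M_n(\theta^*) (\hat{\theta}_n - \theta^*) \]

Solving for \(\hat{\theta}_n\) gives the first display; taking the variance of both sides, with \(\nabla\nabla M_n(\theta^*) \approx \nabla\nabla m(\theta^*)\) for large \(n\), gives the sandwich formula.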

The Progress of Statistical Methods

  1. Calculate likelihood, solve explicitly for MLE
  2. Can’t solve for MLE but can still write down likelihood, calculate it, and maximize numerically
  3. Even calculating the likelihood is intractable

Why Finding the Likelihood Becomes Hard

Method of simulated moments

Assume: the data really came from our simulation model, at some true value \(\theta^*\) of the adjustable parameters \(\theta\)

  1. Pick your favorite \(q\)-dimensional vector of statistics \(B\) (“generalized moments”)
  2. Calculate it from the data: \(B(obs) = B(X)\)
  3. Pick a starting parameter value \(\theta\)
    1. simulate multiple times, say \(s\), getting \(\tilde{X}_1, \ldots \tilde{X}_s\)
    2. calculate average of \(B\), \(\overline{B}(\theta, n, s) \equiv \frac{1}{s}\sum_{i=1}^{s}{B(\tilde{X}_i)}\)
    3. For large \(n\), \(\overline{B}(\theta, n, s) \approx \Expectwrt{\theta}{B}\)
  4. Adjust \(\theta\) so the simulated expectations are close to \(B(obs)\) (a generic sketch of this loop follows below)
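To make the recipe concrete, here is a minimal generic sketch in R. The names msm.estimate, simulator and B are ours (hypothetical), it assumes a one-dimensional parameter \(\theta\), and the logistic-map implementation later in these slides instantiates the same pattern.

# Generic method-of-simulated-moments sketch (our illustration, not from
# the slides); simulator(n, theta) returns one simulated data set of size n,
# and B() computes the chosen vector of generalized moments
msm.estimate <- function(x, simulator, B, s=10, lower, upper) {
  n <- length(x)
  B.obs <- B(x)
  discrepancy <- function(theta) {
    # s simulation runs; matrix() keeps this working even when B is scalar
    sims <- matrix(replicate(s, B(simulator(n, theta))), ncol=s)
    B.sim <- rowMeans(sims) # average each moment across the s runs
    return(sum((B.obs - B.sim)^2))
  }
  # One-dimensional theta for simplicity; use optim() for p > 1
  return(optimize(f=discrepancy, lower=lower, upper=upper)$minimum)
}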

A challenging example

The logistic map in R

# Do one step of the logistic map, $x(t+1) = 4rx(t)(1-x(t))$
# Inputs: initial condition (x)
  # logistic parameter (r)
# Output: next value of the logistic map
# Presumes: x is a single numerical value in [0,1]
  # r is a single numerical value in [0,1]
logistic.map <- function(x,r) {
  return(4*r*x*(1-x))
}
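As a quick sanity check (our example, not in the slides): at \(x=0.5\), \(r=0.9\), the map gives \(4 \times 0.9 \times 0.5 \times 0.5 = 0.9\).

logistic.map(x=0.5, r=0.9) # should return 0.9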

# Generate a time series from the logistic map, $x(t+1)=4rx(t)(1-x(t))$
# Inputs: length of desired time series (timelength)
  # logistic parameter (r)
  # value of x(1) (initial.cond, uniformly randomly generated if omitted)
# Output: vector of length timelength
# Presumes: r is a single numerical value in [0,1]
  # timelength is a positive integer
  # initial.cond is a single numerical value in [0,1], or omitted
  # logistic.map() is defined as above
logistic.map.ts <- function(timelength, r, initial.cond=NULL) {
  x <- vector(mode="numeric", length=timelength)
  if(is.null(initial.cond)) {
    x[1] <- runif(1)
  } else {
    x[1] <- initial.cond
  }
  for (t in 2:timelength) {
    x[t] <- logistic.map(x[t-1],r)
  }
  return(x)
}

Sensitive dependence on initial conditions

traj1 <- logistic.map.ts(1000, r=0.9, initial.cond=0.5001)
traj2 <- logistic.map.ts(1000, r=0.9, initial.cond=0.4998)
plot(1:100, traj1[1:100], ylim=c(0,1), xlab="t", ylab="X(t)", type="b")
points(1:100, traj2[1:100], pch=2, col="blue", type="b")

Sensitive dependence on initial conditions

plot(traj1, traj2)
rug(x=traj1, side=1, col="black", ticksize=-0.01) # negative sizes for pointing outward
rug(x=traj2, side=2, col="blue", ticksize=-0.01)

Long-run stability

par(mfrow=c(1,2))
hist(traj1)
hist(traj2)

# Long-run mean of the logistic map, averaged over s runs of length n
mean.logistic <- function(r, n=1e3, s=10) {
    mean(replicate(s, mean(logistic.map.ts(n,r))))
}

# Vectorized-in-r wrapper around mean.logistic(), convenient for plotting
mean.logistic.plottable <- function(r, n=1e3, s=10) {
    sapply(r, mean.logistic, n=n, s=s)
}

# Long-run variance of the logistic map, averaged over s runs of length n
var.logistic <- function(r, n=1e3, s=10) {
    mean(replicate(s, var(logistic.map.ts(n,r))))
}

# Vectorized-in-r wrapper around var.logistic()
var.logistic.plottable <- function(r, n=1e3, s=10) {
    sapply(r, var.logistic, n=n, s=s)
}

Using MSM on the Logistic Map

# Estimate the logistic map using the method of simulated moments
  # With the moments being the mean and variance
# Inputs: a time series (x)
  # Number of simulations to run (s)
# Output: estimated value of the logistic parameter
# Presumes: x was generated by the logistic map
msm.logistic.est <- function(x, s=10) {
    n <- length(x)
    moments <- c(mean(x), var(x))
    # Define a function to minimize, namely the discrepancy between
    # the moments implied by a given value of the logistic parameter r,
    # and the moments found from the data
    moment.discrep <- function(r) {
        # create an n*s array storing s simulation runs
        sims <- replicate(s, logistic.map.ts(n,r))
        # calculate mean and variance within each run, and average
        # across runs
        moments.from.sims <- c(mean(apply(sims,2,mean)),
                               mean(apply(sims,2,var)))
        # Return the sum of squared differences in moments
        return(sum((moments-moments.from.sims)^2))
    }
    # Return the logistic parameter that minimizes the discrepancy in
    # the moments
      # see help(optimize) for this 1D optimization routine
    return(optimize(f=moment.discrep,lower=0,upper=1)$minimum)
}

What’s the sampling distribution of \(\widehat{r}_{MSM}\)?
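The plot below presumes a collection msm.estimates of such estimates, which the slides don't show being built; one reconstruction consistent with the parameters in the plot's subtitle (\(r=0.9\), \(n=100\), \(s=10\)) would be:

# Our reconstruction (not shown in the slides): re-simulate and re-estimate
# many times to approximate the sampling distribution of the MSM estimator
msm.estimates <- replicate(100, msm.logistic.est(logistic.map.ts(100, r=0.9), s=10))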

plot(density(msm.estimates),
     main="Sampling distribution of MSM estimates for logistic map",
     sub=expression(r==0.9, n==100, s==10))

Can we identify \(r\) from these moments?

In-class exercise

\(\mathrm{dim}(\theta) = p\), \(\mathrm{dim}(B) = q\)

\[\begin{eqnarray} M_n(\theta) & \equiv & \| \overline{B}(\theta,n,s) - B(obs)\|^2\\ \hat{\theta}_{MSM} & \equiv & \argmin_{\theta}{M_n(\theta)}\\ \overline{B}(\theta, n, s) &\rightarrow & b(\theta)\\ B(obs) & \rightarrow & b(\theta^*)\\ M_n(\theta) \rightarrow m(\theta) & \equiv & \| b(\theta) - b(\theta^*) \|^2 \end{eqnarray}\]

Show that: \[\begin{eqnarray} \frac{\partial M_n}{\partial \theta_i}(\theta^*) & \rightarrow & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial b_k}{\partial \theta_i}(\theta^*)}\\ \frac{\partial^2 m}{\partial \theta_i \partial \theta_j}(\theta^*) & = & 2\sum_{k=1}^{q}{\frac{\partial b_k}{\partial \theta_i}(\theta^*)\frac{\partial b_k}{\partial \theta_j}(\theta^*)} \end{eqnarray}\]

Solution

The chain rule is our friend:

\[\begin{eqnarray} \frac{\partial M_n}{\partial \theta_i}(\theta^*) & = & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial \overline{B}_k(\theta,n,s)}{\partial \theta_i}(\theta^*)}\\ & \rightarrow & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial b_k}{\partial \theta_i}(\theta^*)}\\ \frac{\partial^2 m}{\partial \theta_i \partial \theta_j}(\theta^*) & = & 2\sum_{k=1}^{q}{ \frac{\partial}{\partial \theta_i}\left( (b_k(\theta) - b_k(\theta^*)) \frac{\partial b_k(\theta)}{\partial \theta_j}(\theta^*)\right)}\\ & = & 2\sum_{k=1}^{q}{(b_k(\theta) - b_k(\theta^*)) \frac{\partial^2 b_k(\theta)}{\partial \theta_i \partial \theta_j}(\theta^*) + \frac{\partial b_k(\theta)}{\partial \theta_i}(\theta^*) \frac{\partial b_k(\theta)}{\partial \theta_j}(\theta^*)}\\ & = & 2\sum_{k=1}^{q}{\frac{\partial b_k}{\partial \theta_i}(\theta^*)\frac{\partial b_k}{\partial \theta_j}(\theta^*)} \end{eqnarray}\]

(In the last step, the term with the second derivative of \(b_k\) vanishes, because \(b_k(\theta) - b_k(\theta^*) = 0\) when we evaluate at \(\theta = \theta^*\).)

What’s the moral of this calculus?

Indirect Inference

What’s going on here?

A More Formal Statement

Indirect inference is consistent, if the auxiliary model isn’t too bad

Assume:

  1. As \(n \rightarrow \infty\), \(U_{n,s,\theta}(\beta) \rightarrow u(\beta,\theta)\), uniformly in \(\beta\) and \(\theta\).
  2. For each \(\theta\), \(u(\beta, \theta)\) has a unique optimum in \(\beta\), say \(b(\theta)\).
  3. As \(n \rightarrow \infty\), \(\widehat{\beta}_n \rightarrow b(\theta^*)\).
  4. The equation \(\beta = b(\theta)\) has a unique solution, i.e., \(b^{-1}\) is well-defined.

then as \(n \rightarrow \infty\), \[ \widehat{\theta}_{II} \rightarrow \theta^* \]

Asymptotic Distribution of Indirect Estimates

Checking Indirect Inference

Given real and auxiliary model, will indirect inference work, i.e., be consistent?

What auxiliary models?

Example: Logistic Map + Noise

# Generate a logistic-map time series observed with Gaussian noise
# Inputs: as logistic.map.ts(), plus the noise standard deviation (noise.sd)
# Output: vector of length timelength
logistic.noisy.ts <- function(timelength,r,
                              initial.cond=NULL,noise.sd=0.1) {
  x <- logistic.map.ts(timelength,r,initial.cond)
  return(x+rnorm(timelength,0,noise.sd))
}
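A quick look at one noisy trajectory (a usage sketch of ours, not from the slides):

plot(logistic.noisy.ts(100, r=0.9), type="l", xlab="t", ylab="Y(t)",
     main="Noisy logistic map, r=0.9")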

Indirect inference for the noisy logistic map

  1. Fix \(p\) for the auxiliary AR model
  2. Fit AR(\(p\)) to data, get \(\widehat{\beta} = (\widehat{\beta}_1, \ldots \widehat{\beta}_p)\)
  3. Simulate \(s\) sample trajectories with parameter \(r\), calculate \((\widehat{\beta}_1, \ldots \widehat{\beta}_p)\) for each, average over trajectories to get \(\overline{\beta}(r,n,s)\)
  4. Minimize \(\|\widehat{\beta} - \overline{\beta}(r,n,s)\|\)

# Indirect inference for the noisy logistic map, with an AR(order) model
# as the auxiliary model
# Inputs: a time series (y)
  # order of the auxiliary AR model (order)
  # number of simulations (S)
# Output: optimize()'s return value; $minimum is the estimate of r
logistic.map.II <- function(y, order=2, S=10) {
  T <- length(y)
  # Auxiliary-model estimator: coefficients of a fitted AR(order) model
  ar.fit <- function(x) {
    return(ar(x,aic=FALSE,order.max=order)$ar)
  }
  beta.data <- ar.fit(y)
  # Discrepancy between the data's AR coefficients and the average AR
  # coefficients from S simulations at parameter r
  beta.discrep <- function(r) {
    # rowMeans() averages each coefficient separately across the S runs
    # (a plain mean() would collapse all the coefficients to one number)
    beta.S <- rowMeans(replicate(S,ar.fit(logistic.noisy.ts(T,r))))
    return(sum((beta.data - beta.S)^2))
  }
  return(optimize(beta.discrep,lower=0.75,upper=1))
}

Does it work \((r=0.8)\)?

To see how well this does, simulate it:

x <- logistic.noisy.ts(1e3,0.8)
plot(density(replicate(100,
                       logistic.map.II(x,order=2)$minimum)),
     main="Density of indirect estimates")

Does it work \((r=0.9)\)?

At \(r=0.8\) the map is periodic; what about chaos, say \(r=0.9\)?

plot(density(replicate(30,
     logistic.map.II(logistic.noisy.ts(1e3,r=0.9),
                     order=2)$minimum)),
     main="Density of indirect estimates, r=0.9")

Does it work \((n\rightarrow\infty)\)?

I promised to check that the inference is working by seeing that the errors are shrinking:

# Mean squared error of the indirect-inference estimate of r, over
# repeated simulations of length n
mse.logistic.II <- function(n, r=0.9, reps=300, order=2, S=10) {
  II <- replicate(reps, logistic.map.II(logistic.noisy.ts(n,r),
                  order=order, S=S)$minimum)
  II.error <- II - r # Uses recycling
  return(mean(II.error^2))
}
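To see the errors shrink, evaluate this at increasing sample sizes; a usage sketch (ours, not from the slides, with reps reduced to keep the runtime tolerable):

ns <- c(100, 1000, 1e4)
mses <- sapply(ns, mse.logistic.II, reps=30)
plot(ns, mses, log="xy", xlab="n", ylab="MSE of indirect estimate", type="b")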

Model checking for simulation models

Summary

Backup: Asymptotic gory details for the method of simulated moments

Backup: Asymptotic gory details for indirect inference

Backup: Further reading

References

Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66:8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.

Gouriéroux, Christian, and Alain Monfort. 1996. Simulation-Based Econometric Methods. Oxford, England: Oxford University Press.

Gouriéroux, Christian, Alain Monfort, and E. Renault. 1993. “Indirect Inference.” Journal of Applied Econometrics 8:S85–S118. http://www.jstor.org/pss/2285076.

Kendall, Bruce E., Stephen P. Ellner, Edward Mccauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch, and Peter Turchin. 2005. “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs 75:259–76. https://doi.org/10.1890/03-4056.

Neal, Radford M., and Geoffrey E. Hinton. 1998. “A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants.” In Learning in Graphical Models, edited by Michael I. Jordan, 355–68. Dordrecht: Kluwer Academic. http://www.cs.toronto.edu/~radford/em.abstract.html.

Smith, Anthony A., Jr. n.d. “Indirect Inference.” In New Palgrave Dictionary of Economics, edited by Stephen Durlauf and Lawrence Blume, 2nd ed. London: Palgrave Macmillan. http://www.econ.yale.edu/smith/palgrave7.pdf.

Zhao, Linqiao. 2010. “A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure.” PhD thesis, Carnegie Mellon University. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2067&rep=rep1&type=pdf.