Simulation for Inference II — Matching Simulations to Data

36-467/667

27 October 2020 (Lecture 17)

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Probwrt}[2]{\mathbb{P}_{#1}\left( #2 \right)} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Expectwrt}[2]{\mathbb{E}_{#1}\left[ #2 \right]} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \]

May their memory be a blessing

In our last episodes

Agenda

How we estimate, in general

\[ \hat{\theta}_n = \argmin_{\theta}{M_n(\theta)} \]

If \(M_n(\theta) \rightarrow m(\theta)\) as \(n\rightarrow\infty\)

and \(m(\theta)\) has a unique minimum at the true \(\theta^*\)

then (generally) \(\hat{\theta}_n \rightarrow \theta^*\) (consistency)

If \(m\) has a well-behaved interior minimum, \[ \hat{\theta}_n \approx \theta^* - (\nabla \nabla M_n(\theta^*))^{-1} \nabla M_n(\theta^*) \] and so \[ \Var{\hat{\theta}_n} \approx (\nabla\nabla m(\theta^*))^{-1} \Var{\nabla M_n(\theta^*)} (\nabla\nabla m(\theta^*))^{-1} \]
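
For example (a standard special case, not spelled out on the slide): when \(M_n\) is the average negative log-likelihood of \(n\) IID observations, \(\nabla\nabla m(\theta^*)\) is the per-observation Fisher information \(\mathbf{I}(\theta^*)\), and \(\Var{\nabla M_n(\theta^*)} = \mathbf{I}(\theta^*)/n\), so the sandwich collapses to the usual MLE asymptotics:

\[ \Var{\hat{\theta}_n} \approx \mathbf{I}^{-1}(\theta^*) \frac{\mathbf{I}(\theta^*)}{n} \mathbf{I}^{-1}(\theta^*) = \frac{\mathbf{I}^{-1}(\theta^*)}{n} \]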

The Progress of Statistical Methods

  1. Calculate likelihood, solve explicitly for MLE
  2. Can’t solve for the MLE explicitly, but can still write down the likelihood, calculate it, and maximize it numerically (see the sketch after this list)
  3. Even calculating the likelihood is intractable
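
As a minimal sketch of stage 2 (an illustrative example of mine, not the slides'): the Gamma distribution has no closed-form MLE for its shape parameter, but the log-likelihood is easy to compute and hand to a numerical optimizer.

# Stage 2 in miniature: no closed-form MLE, but the likelihood is
# computable, so we maximize it numerically (illustrative example)
set.seed(17)
x <- rgamma(100, shape=2, scale=1)
negloglik <- function(theta) {
  -sum(dgamma(x, shape=theta[1], scale=theta[2], log=TRUE))
}
# Bounded optimization keeps the search away from impossible
# (non-positive) parameter values
optim(par=c(1,1), fn=negloglik, method="L-BFGS-B",
      lower=c(1e-6, 1e-6))$par # should be near c(2,1)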

Why Finding the Likelihood Becomes Hard

Method of simulated moments

Assume: the data really came from our simulation model, at some true value \(\theta^*\) of the adjustable parameters \(\theta\)

  1. Pick your favorite \(q\)-dimensional vector of statistics \(B\) (“generalized moments”)
  2. Calculate it from the data, \(B(obs) = B(X)\)
  3. Pick a starting parameter value \(\theta\)
    1. simulate multiple times, say \(s\), getting \(\tilde{X}_1, \ldots, \tilde{X}_s\)
    2. calculate the average of \(B\), \(\overline{B}(\theta, n, s) \equiv \frac{1}{s}\sum_{i=1}^{s}{B(\tilde{X}_i)}\)
    3. for large \(n\), \(\overline{B}(\theta, n, s) \approx \Expectwrt{\theta}{B}\)
  4. Adjust \(\theta\) so the simulated averages are close to \(B(obs)\) (a generic sketch follows this list; the concrete implementation for our example, msm.logistic.est(), appears below)
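
The recipe in code form (a generic sketch using hypothetical helpers of mine: simulator(theta, n) returns one simulated data set of size \(n\), and B() returns the \(q\)-vector of generalized moments, with \(q \geq 2\)):

# Generic MSM sketch; simulator() and B() are hypothetical helpers
msm.est <- function(x, simulator, B, theta0, s=10) {
  n <- length(x)
  B.obs <- B(x)
  discrep <- function(theta) {
    # replicate() gives a q*s matrix; average each moment across runs
    B.bar <- rowMeans(replicate(s, B(simulator(theta, n))))
    return(sum((B.obs - B.bar)^2))
  }
  return(optim(par=theta0, fn=discrep)$par)
}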


A challenging example

The logistic map in R

# Do one step of the logistic map, $x(t+1) = 4rx(t)(1-x(t))$
# Inputs: initial condition (x)
  # logistic parameter (r)
# Output: next value of the logistic map
# Presumes: x is a single numerical value in [0,1]
  # r is a single numerical value in [0,1]
logistic.map <- function(x,r) {
  return(4*r*x*(1-x))
}
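
A one-line sanity check (mine, not the slide's): the map sends the midpoint \(x = 1/2\) to \(r\) itself, since \(4r \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = r\); this is also the map's maximum, which is why trajectories stay in \([0,1]\) when \(r \in [0,1]\).

logistic.map(0.5, r=0.9) # returns 0.9, the maximum of the map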

# Generate a time series from the logistic map, $x(t+1)=4rx(t)(1-x(t))$
# Inputs: length of desired time series (timelength)
  # logistic parameter (r)
  # value of x(1) (initial.cond, uniformly randomly generated if omitted)
# Output: vector of length timelength
# Presumes: r is a single numerical value in [0,1]
  # timelength is a positive integer
  # initial.cond is a single numerical value in [0,1], or omitted
  # logistic.map() is defined as above
logistic.map.ts <- function(timelength, r, initial.cond=NULL) {
  x <- vector(mode="numeric", length=timelength)
  if (is.null(initial.cond)) {
    x[1] <- runif(1)
  } else {
    x[1] <- initial.cond
  }
  for (t in 2:timelength) {
    x[t] <- logistic.map(x[t-1], r)
  }
  return(x)
}

Sensitive dependence on initial conditions

traj1 <- logistic.map.ts(1000, r=0.9, initial.cond=0.5001)
traj2 <- logistic.map.ts(1000, r=0.9, initial.cond=0.4998)
plot(1:100, traj1[1:100], ylim=c(0,1), xlab="t", ylab="X(t)", type="b")
points(1:100, traj2[1:100], pch=2, col="blue", type="b")

Sensitive dependence on initial conditions

plot(traj1, traj2)
rug(x=traj1, side=1, col="black", ticksize=-0.01) # negative sizes for pointing outward
rug(x=traj2, side=2, col="blue", ticksize=-0.01)

Long-run stability

par(mfrow=c(1,2))
hist(traj1)
hist(traj2)

Using MSM on the Logistic Map

(Dashed lines = values from our two trajectories illustrating sensitive dependence on initial conditions)

# Estimate the logistic map using the method of simulated moments
  # With the moments being the mean and variance
# Inputs: a time series (x)
  # Number of simulations to run (s)
# Output: estimated value of the logistic parameter
# Presumes: x was generated by the logistic map
msm.logistic.est <- function(x, s=10) {
  n <- length(x)
  moments <- c(mean(x), var(x))
  # Function to minimize: the discrepancy between the moments implied
  # by a given value of the logistic parameter r and the moments found
  # from the data
  moment.discrep <- function(r) {
    # n*s array storing s simulation runs
    sims <- replicate(s, logistic.map.ts(n, r))
    # mean and variance within each run, averaged across runs
    moments.from.sims <- c(mean(apply(sims, 2, mean)),
                           mean(apply(sims, 2, var)))
    # sum of squared differences in moments
    return(sum((moments - moments.from.sims)^2))
  }
  # Return the logistic parameter that minimizes the moment discrepancy
  # (see help(optimize) for this 1D optimization routine)
  return(optimize(f=moment.discrep, lower=0, upper=1)$minimum)
}

What’s the sampling distribution of \(\widehat{r}_{MSM}\)?
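
The object msm.estimates is not built on the slide; a minimal sketch consistent with the plot's subtitle (\(r=0.9\), \(n=100\), \(s=10\)) would be:

# Sampling distribution sketch: re-estimate r on 100 independent
# realizations of the map (assumed construction; not shown on the slide)
msm.estimates <- replicate(100,
  msm.logistic.est(logistic.map.ts(100, r=0.9), s=10))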

plot(density(msm.estimates),
     main="Sampling distribution of MSM estimates for logistic map",
     sub=expression(r==0.9, n==100, s==10))

Can we identify \(r\) from these moments?

Sensitivity of the objective function to the parameters

\(\mathrm{dim}(\theta) = p\), \(\mathrm{dim}(B) = q\)

\[\begin{eqnarray} M_n(\theta) & \equiv & \| \overline{B}(\theta,n,s) - B(obs)\|^2\\ \hat{\theta}_{MSM} & \equiv & \argmin_{\theta}{M_n(\theta)}\\ \overline{B}(\theta, n, s) &\rightarrow & b(\theta)\\ B(obs) & \rightarrow & b(\theta^*)\\ M_n(\theta) \rightarrow m(\theta) & \equiv & \| b(\theta) - b(\theta^*) \|^2 \end{eqnarray}\]
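
One payoff of passing to the limit (making explicit a step the slide leaves implicit): since

\[ m(\theta) = \| b(\theta) - b(\theta^*) \|^2 = 0 \iff b(\theta) = b(\theta^*) \]

\(\theta^*\) is the unique minimizer of \(m\) exactly when \(b\) is one-to-one; because \(b\) maps the \(p\)-dimensional parameter into the \(q\)-dimensional moment space, a necessary condition is \(q \geq p\), i.e., at least as many moments as parameters.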

What’s the point of this calculus?

Indirect Inference

What’s going on here?

A More Formal Statement

Indirect inference is consistent, if the auxiliary model isn’t too bad

  1. As \(n \rightarrow \infty\), \(U_{n,s,\theta}(\beta) \rightarrow u(\beta,\theta)\), uniformly in \(\beta\) and \(\theta\) (here \(U_{n,s,\theta}\) is the auxiliary model’s fitting objective, computed on simulations at \(\theta\)).
  2. For each \(\theta\), \(u(\beta, \theta)\) has a unique optimum in \(\beta\), say \(b(\theta)\).
    • This is called the binding function
  3. As \(n \rightarrow \infty\), \(\widehat{\beta}_n \rightarrow b(\theta^*)\).
  4. The equation \(\beta = b(\theta)\) has a unique solution, i.e., \(b^{-1}\) is well-defined.

then as \(n \rightarrow \infty\), \[ \widehat{\theta}_{II} \rightarrow \theta^* \]
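
Compressing the argument (my paraphrase of why conditions 1–4 suffice): by 1–3, the auxiliary estimate from the data tends to \(b(\theta^*)\), while the averaged auxiliary estimates from simulations at \(\theta\) tend to \(b(\theta)\); minimizing the distance between them forces \(b(\widehat{\theta}_{II}) \rightarrow b(\theta^*)\), and invertibility (condition 4) then gives \[ \widehat{\theta}_{II} \rightarrow b^{-1}(b(\theta^*)) = \theta^* \]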

Asymptotic Distribution of Indirect Estimates

Checking Indirect Inference

Fixing the real and auxiliary models, will indirect inference work, i.e., be consistent and converge on the truth?

What auxiliary models?

Example: Logistic Map + Noise

# Generate a logistic-map time series observed with Gaussian noise
# Inputs: as logistic.map.ts(), plus noise standard deviation (noise.sd)
# Output: vector of length timelength
logistic.noisy.ts <- function(timelength, r,
                              initial.cond=NULL, noise.sd=0.1) {
  x <- logistic.map.ts(timelength, r, initial.cond)
  return(x + rnorm(timelength, 0, noise.sd))
}
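
Before assembling the full estimator, we can sanity-check the auxiliary-model step used below (a sketch of mine): fit an AR(2) to one noisy trajectory and inspect its coefficients.

# Auxiliary-model step in isolation: the estimated AR coefficients beta
y <- logistic.noisy.ts(1000, r=0.9)
ar(y, aic=FALSE, order.max=2)$ar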

Indirect inference for the noisy logistic map

  1. Fix \(p\) for AR model
  2. Fit AR(\(p\)) to data, get \(\widehat{\beta} = (\widehat{\beta}_1, \ldots \widehat{\beta}_p)\)
  3. Simulate \(s\) sample trajectories with parameter \(r\), calculate \((\widehat{\beta}_1, \ldots \widehat{\beta}_p)\) for each, average over trajectories to get \(\overline{\beta}(r,n,s)\)
  4. Minimize \(\|\widehat{\beta} - \overline{\beta}(r,n,s)\|\)

logistic.map.II <- function(y, order=2, S=10) {
  n <- length(y)
  # Auxiliary model: fit an AR(order) and return its coefficient vector
  ar.fit <- function(x) {
    return(ar(x, aic=FALSE, order.max=order)$ar)
  }
  beta.data <- ar.fit(y)
  # Discrepancy between the auxiliary estimates from the data and the
  # average of the auxiliary estimates over S simulations at parameter r
  beta.discrep <- function(r) {
    # replicate() returns an order*S matrix; average each coefficient
    # across runs (rowMeans, not mean, which would pool all coefficients)
    beta.S <- rowMeans(replicate(S, ar.fit(logistic.noisy.ts(n, r))))
    return(sum((beta.data - beta.S)^2))
  }
  return(optimize(beta.discrep, lower=0.75, upper=1))
}

Does it work \((r=0.8)\)?

To see how well this does, simulate it:

x <- logistic.noisy.ts(1e3,0.8)
plot(density(replicate(100,
                       logistic.map.II(x,order=2)$minimum)),
     main="Density of indirect estimates")

Does it work \((r=0.9)\)?

\(r=0.8\) gives periodic dynamics; what about chaos, say \(r=0.9\)?

plot(density(replicate(30,
     logistic.map.II(logistic.noisy.ts(1e3,r=0.9),
                     order=2)$minimum)),
     main="Density of indirect estimates, r=0.9")

Does it work \((n\rightarrow\infty)\)?

I promised to check that the inference is working by seeing that the errors shrink:

# Mean squared error of the indirect-inference estimate of r
# Inputs: trajectory length (n), true parameter (r), number of replications (reps)
  # AR order for the auxiliary model (order), simulations per estimate (S)
# Output: mean squared error of the estimates across replications
mse.logistic.II <- function(n, r=0.9, reps=300, order=2, S=10) {
  II <- replicate(reps, logistic.map.II(logistic.noisy.ts(n, r),
                                        order=order, S=S)$minimum)
  II.error <- II - r # uses recycling
  return(mean(II.error^2))
}
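
A usage sketch (mine), checking that the MSE shrinks as \(n\) grows:

# Hypothetical check: MSE should shrink as n grows (slow to run,
# since each call repeats the whole estimation reps times)
n.grid <- c(100, 300, 1000)
mses <- sapply(n.grid, mse.logistic.II, reps=30)
plot(n.grid, mses, log="xy", xlab="n", ylab="MSE", type="b")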

Does it work \((n\rightarrow\infty)\)?

Model checking for simulation models

Summary

Backup: Asymptotic gory details for the method of simulated moments

Backup: Asymptotic gory details for indirect inference

Backup: Further reading

References

Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66:8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.

Gouriéroux, Christian, and Alain Monfort. 1996. Simulation-Based Econometric Methods. Oxford, England: Oxford University Press.

Gouriéroux, Christian, Alain Monfort, and Eric Renault. 1993. “Indirect Inference.” Journal of Applied Econometrics 8:S85–S118. http://www.jstor.org/stable/2285076.

Kendall, Bruce E., Stephen P. Ellner, Edward Mccauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch, and Peter Turchin. 2005. “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs 75:259–76. https://doi.org/10.1890/03-4056.

Neal, Radford M., and Geoffrey E. Hinton. 1998. “A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants.” In Learning in Graphical Models, edited by Michael I. Jordan, 355–68. Dordrecht: Kluwer Academic. http://www.cs.toronto.edu/~radford/em.abstract.html.

Ripley, Brian D. 1981. Spatial Statistics. New York: Wiley. https://doi.org/10.1002/0471725218.

Smith, Anthony A., Jr. 2008. “Indirect Inference.” In New Palgrave Dictionary of Economics, edited by Stephen Durlauf and Lawrence Blume, Second. London: Palgrave Macmillan. http://www.econ.yale.edu/smith/palgrave7.pdf.

Zhao, Linqiao. 2010. “A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure.” PhD thesis, Carnegie Mellon University. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2067&rep=rep1&type=pdf.