Simulation for Inference II — Matching Simulations to Data

36-467/36-667

8 November 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Probwrt}[2]{\mathbb{P}_{#1}\left( #2 \right)} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Expectwrt}[2]{\mathbb{E}_{#1}\left[ #2 \right]} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator*{\argmax}{argmax} \]

In our last episodes

Agenda

How we estimate, in general

\[ \hat{\theta}_n = \argmin_{\theta}{M_n(\theta)} \]

If \(M_n(\theta) \rightarrow m(\theta)\) as \(n\rightarrow\infty\)

and \(m(\theta)\) has a unique minimum at the true \(\theta^*\)

then (generally) \(\hat{\theta}_n \rightarrow \theta^*\) (consistency)

If we’re dealing with well-behaved interior minima, \[ \hat{\theta}_n \approx \theta^* - (\nabla \nabla M_n(\theta^*))^{-1} \nabla M_n(\theta^*) \] and \[ \Var{\hat{\theta}_n} \approx (\nabla\nabla m(\theta^*))^{-1} \Var{\nabla M_n(\theta^*)} (\nabla\nabla m(\theta^*))^{-1} \]
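Where do these approximations come from? Set the gradient to zero at the minimum and Taylor-expand around \(\theta^*\) (a one-step sketch):

\[ 0 = \nabla M_n(\hat{\theta}_n) \approx \nabla M_n(\theta^*) + \nabla \nabla M_n(\theta^*) (\hat{\theta}_n - \theta^*) \]

Solving for \(\hat{\theta}_n\) gives the first display; taking the variance of both sides, with \(\nabla\nabla M_n(\theta^*) \approx \nabla\nabla m(\theta^*)\) for large \(n\), gives the sandwich formula.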

The Progress of Statistical Methods

  1. Calculate likelihood, solve explicitly for MLE
  2. Can’t solve for MLE but can still write down likelihood, calculate it, and maximize numerically
  3. Even calculating the likelihood is intractable

Why Finding the Likelihood Becomes Hard

Method of simulated moments

Assume: the data really came from our simulation model, at some true value \(\theta^*\) of the adjustable parameters \(\theta\)

  1. Pick your favorite \(q\)-dimensional vector of statistics \(B\) (“generalized moments”)
  2. Calculate it from the data: \(B(obs) = B(X)\)
  3. Pick a starting parameter value \(\theta\)
    1. simulate multiple times, say \(s\), getting \(\tilde{X}_1, \ldots \tilde{X}_s\)
    2. calculate average of \(B\), \(\overline{B}(\theta, n, s) \equiv \frac{1}{s}\sum_{i=1}^{s}{B(\tilde{X}_i)}\)
    3. For large \(n\), \(\overline{B}(\theta, n, s) \approx \Expectwrt{\theta}{B}\)
  4. Adjust \(\theta\) so the simulated expectations are close to \(B(obs)\) (a generic sketch of this loop follows below)
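To make the recipe concrete, here is a minimal generic sketch in R. The names msm.estimate, simulator and B are ours (hypothetical), it assumes a one-dimensional parameter \(\theta\), and the logistic-map implementation later in these slides instantiates the same pattern.

# Generic method-of-simulated-moments sketch (our illustration, not from
# the slides); simulator(n, theta) returns one simulated data set of size n,
# and B() computes the chosen vector of generalized moments
msm.estimate <- function(x, simulator, B, s=10, lower, upper) {
  n <- length(x)
  B.obs <- B(x)
  discrepancy <- function(theta) {
    # s simulation runs; matrix() keeps this working even when B is scalar
    sims <- matrix(replicate(s, B(simulator(n, theta))), ncol=s)
    B.sim <- rowMeans(sims) # average each moment across the s runs
    return(sum((B.obs - B.sim)^2))
  }
  # One-dimensional theta for simplicity; use optim() for p > 1
  return(optimize(f=discrepancy, lower=lower, upper=upper)$minimum)
}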

A challenging example

The logistic map in R

# Do one step of the logistic map, $x(t+1) = 4rx(t)(1-x(t))$
# Inputs: initial condition (x)
  # logistic parameter (r)
# Output: next value of the logistic map
# Presumes: x is a single numerical value in [0,1]
  # r is a single numerical value in [0,1]
logistic.map <- function(x,r) {
  return(4*r*x*(1-x))
}
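As a quick sanity check (our example, not in the slides): at \(x=0.5\), \(r=0.9\), the map gives \(4 \times 0.9 \times 0.5 \times 0.5 = 0.9\).

logistic.map(x=0.5, r=0.9) # should return 0.9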

# Generate a time series from the logistic map, $x(t+1)=4rx(t)(1-x(t))$
# Inputs: length of desired time series (timelength)
  # logistic parameter (r)
  # value of x(1) (initial.cond, uniformly randomly generated if omitted)
# Output: vector of length timelength
# Presumes: r is a single numerical value in [0,1]
  # timelength is a positive integer
  # initial.cond is a single numerical value in [0,1], or omitted
  # logistic.map() is defined as above
logistic.map.ts <- function(timelength, r, initial.cond=NULL) {
  x <- vector(mode="numeric", length=timelength)
  if(is.null(initial.cond)) {
    x[1] <- runif(1)
  } else {
    x[1] <- initial.cond
  }
  for (t in 2:timelength) {
    x[t] <- logistic.map(x[t-1],r)
  }
  return(x)
}

Sensitive dependence on initial conditions

traj1 <- logistic.map.ts(1000, r=0.9, initial.cond=0.5001)
traj2 <- logistic.map.ts(1000, r=0.9, initial.cond=0.4998)
plot(1:100, traj1[1:100], ylim=c(0,1), xlab="t", ylab="X(t)", type="b")
points(1:100, traj2[1:100], pch=2, col="blue", type="b")

Sensitive dependence on initial conditions

plot(traj1, traj2)
rug(x=traj1, side=1, col="black", ticksize=-0.01) # negative sizes for pointing outward
rug(x=traj2, side=2, col="blue", ticksize=-0.01)

Long-run stability

par(mfrow=c(1,2))
hist(traj1)
hist(traj2)

# Long-run mean of the logistic map, averaged over s runs of length n
mean.logistic <- function(r, n=1e3, s=10) {
    mean(replicate(s, mean(logistic.map.ts(n,r))))
}

# Vectorized-in-r wrapper around mean.logistic(), convenient for plotting
mean.logistic.plottable <- function(r, n=1e3, s=10) {
    sapply(r, mean.logistic, n=n, s=s)
}

# Long-run variance of the logistic map, averaged over s runs of length n
var.logistic <- function(r, n=1e3, s=10) {
    mean(replicate(s, var(logistic.map.ts(n,r))))
}

# Vectorized-in-r wrapper around var.logistic()
var.logistic.plottable <- function(r, n=1e3, s=10) {
    sapply(r, var.logistic, n=n, s=s)
}

Using MSM on the Logistic Map

# Estimate the logistic map using the method of simulated moments
  # With the moments being the mean and variance
# Inputs: a time series (x)
  # Number of simulations to run (s)
# Output: estimated value of the logistic parameter
# Presumes: x was generated by the logistic map
msm.logistic.est <- function(x, s=10) {
    n <- length(x)
    moments <- c(mean(x), var(x))
    # Define a function to minimize, namely the discrepancy between
    # the moments implied by a given value of the logistic parameter r,
    # and the moments found from the data
    moment.discrep <- function(r) {
        # create an n*s array storing s simulation runs
        sims <- replicate(s, logistic.map.ts(n,r))
        # calculate mean and variance within each run, and average
        # across runs
        moments.from.sims <- c(mean(apply(sims,2,mean)),
                               mean(apply(sims,2,var)))
        # Return the sum of squared differences in moments
        return(sum((moments-moments.from.sims)^2))
    }
    # Return the logistic parameter that minimizes the discrepancy in
    # the moments
      # see help(optimize) for this 1D optimization routine
    return(optimize(f=moment.discrep,lower=0,upper=1)$minimum)
}

What’s the sampling distribution of \(\widehat{r}_{MSM}\)?
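The plot below presumes a collection msm.estimates of such estimates, which the slides don't show being built; one reconstruction consistent with the parameters in the plot's subtitle (\(r=0.9\), \(n=100\), \(s=10\)) would be:

# Our reconstruction (not shown in the slides): re-simulate and re-estimate
# many times to approximate the sampling distribution of the MSM estimator
msm.estimates <- replicate(100, msm.logistic.est(logistic.map.ts(100, r=0.9), s=10))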

plot(density(msm.estimates),
     main="Sampling distribution of MSM estimates for logistic map",
     sub=expression(r==0.9, n==100, s==10))

Can we identify \(r\) from these moments?

In-class exercise

\(\mathrm{dim}(\theta) = p\), \(\mathrm{dim}(B) = q\)

\[\begin{eqnarray} M_n(\theta) & \equiv & \| \overline{B}(\theta,n,s) - B(obs)\|^2\\ \hat{\theta}_{MSM} & \equiv & \argmin_{\theta}{M_n(\theta)}\\ \overline{B}(\theta, n, s) &\rightarrow & b(\theta)\\ B(obs) & \rightarrow & b(\theta^*)\\ M_n(\theta) \rightarrow m(\theta) & \equiv & \| b(\theta) - b(\theta^*) \|^2 \end{eqnarray}\]

Show that: \[\begin{eqnarray} \frac{\partial M_n}{\partial \theta_i}(\theta^*) & \rightarrow & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial b_k}{\partial \theta_i}(\theta^*)}\\ \frac{\partial^2 m}{\partial \theta_i \partial \theta_j}(\theta^*) & = & 2\sum_{k=1}^{q}{\frac{\partial b_k}{\partial \theta_i}(\theta^*)\frac{\partial b_k}{\partial \theta_j}(\theta^*)} \end{eqnarray}\]

Solution

The chain rule is our friend:

\[\begin{eqnarray} \frac{\partial M_n}{\partial \theta_i}(\theta^*) & = & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial \overline{B}_k(\theta,n,s)}{\partial \theta_i}(\theta^*)}\\ & \rightarrow & 2\sum_{k=1}^{q}{(\overline{B}_k(\theta,n,s) - B_k(obs))\frac{\partial b_k}{\partial \theta_i}(\theta^*)}\\ \frac{\partial^2 m}{\partial \theta_i \partial \theta_j}(\theta^*) & = & 2\sum_{k=1}^{q}{ \frac{\partial}{\partial \theta_i}\left( (b_k(\theta) - b_k(\theta^*)) \frac{\partial b_k(\theta)}{\partial \theta_j}(\theta^*)\right)}\\ & = & 2\sum_{k=1}^{q}{(b_k(\theta) - b_k(\theta^*)) \frac{\partial^2 b_k(\theta)}{\partial \theta_i \partial \theta_j}(\theta^*) + \frac{\partial b_k(\theta)}{\partial \theta_i}(\theta^*) \frac{\partial b_k(\theta)}{\partial \theta_j}(\theta^*)}\\ & = & 2\sum_{k=1}^{q}{\frac{\partial b_k}{\partial \theta_i}(\theta^*)\frac{\partial b_k}{\partial \theta_j}(\theta^*)} \end{eqnarray}\]

(In the last step, the term with the second derivative of \(b_k\) vanishes, because \(b_k(\theta) - b_k(\theta^*) = 0\) when we evaluate at \(\theta = \theta^*\).)

What’s the moral of this calculus?

Indirect Inference

What’s going on here?

A More Formal Statement

Indirect inference is consistent, if the auxiliary model isn’t too bad

Assume:

  1. As \(n \rightarrow \infty\), \(U_{n,s,\theta}(\beta) \rightarrow u(\beta,\theta)\), uniformly in \(\beta\) and \(\theta\).
  2. For each \(\theta\), \(u(\beta, \theta)\) has a unique optimum in \(\beta\), say \(b(\theta)\).
  3. As \(n \rightarrow \infty\), \(\widehat{\beta}_n \rightarrow b(\theta^*)\).
  4. The equation \(\beta = b(\theta)\) has a unique solution, i.e., \(b^{-1}\) is well-defined.

then as \(n \rightarrow \infty\), \[ \widehat{\theta}_{II} \rightarrow \theta^* \]

Asymptotic Distribution of Indirect Estimates

Checking Indirect Inference

Given real and auxiliary model, will indirect inference work, i.e., be consistent?

What auxiliary models?

Example: Logistic Map + Noise

# Generate a logistic-map time series observed with Gaussian noise
# Inputs: as logistic.map.ts(), plus the noise standard deviation (noise.sd)
# Output: vector of length timelength
logistic.noisy.ts <- function(timelength,r,
                              initial.cond=NULL,noise.sd=0.1) {
  x <- logistic.map.ts(timelength,r,initial.cond)
  return(x+rnorm(timelength,0,noise.sd))
}
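A quick look at one noisy trajectory (a usage sketch of ours, not from the slides):

plot(logistic.noisy.ts(100, r=0.9), type="l", xlab="t", ylab="Y(t)",
     main="Noisy logistic map, r=0.9")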

Indirect inference for the noisy logistic map

  1. Fix \(p\) for the auxiliary AR model
  2. Fit AR(\(p\)) to data, get \(\widehat{\beta} = (\widehat{\beta}_1, \ldots \widehat{\beta}_p)\)
  3. Simulate \(s\) sample trajectories with parameter \(r\), calculate \((\widehat{\beta}_1, \ldots \widehat{\beta}_p)\) for each, average over trajectories to get \(\overline{\beta}(r,n,s)\)
  4. Minimize \(\|\widehat{\beta} - \overline{\beta}(r,n,s)\|\)

# Indirect inference for the noisy logistic map, with an AR(order) model
# as the auxiliary model
# Inputs: a time series (y)
  # order of the auxiliary AR model (order)
  # number of simulations (S)
# Output: optimize()'s return value; $minimum is the estimate of r
logistic.map.II <- function(y, order=2, S=10) {
  T <- length(y)
  # Auxiliary-model estimator: coefficients of a fitted AR(order) model
  ar.fit <- function(x) {
    return(ar(x,aic=FALSE,order.max=order)$ar)
  }
  beta.data <- ar.fit(y)
  # Discrepancy between the data's AR coefficients and the average AR
  # coefficients from S simulations at parameter r
  beta.discrep <- function(r) {
    # rowMeans() averages each coefficient separately across the S runs
    # (a plain mean() would collapse all the coefficients to one number)
    beta.S <- rowMeans(replicate(S,ar.fit(logistic.noisy.ts(T,r))))
    return(sum((beta.data - beta.S)^2))
  }
  return(optimize(beta.discrep,lower=0.75,upper=1))
}

Does it work \((r=0.8)\)?

To see how well this does, simulate it:

x <- logistic.noisy.ts(1e3,0.8)
plot(density(replicate(100,
                       logistic.map.II(x,order=2)$minimum)),
     main="Density of indirect estimates")

Does it work \((r=0.9)\)?

At \(r=0.8\) the map is periodic; what about chaos, say \(r=0.9\)?

plot(density(replicate(30,
     logistic.map.II(logistic.noisy.ts(1e3,r=0.9),
                     order=2)$minimum)),
     main="Density of indirect estimates, r=0.9")

Does it work \((n\rightarrow\infty)\)?

I promised to check that the inference is working by seeing that the errors are shrinking:

# Mean squared error of the indirect-inference estimate of r, over
# repeated simulations of length n
mse.logistic.II <- function(n, r=0.9, reps=300, order=2, S=10) {
  II <- replicate(reps, logistic.map.II(logistic.noisy.ts(n,r),
                  order=order, S=S)$minimum)
  II.error <- II - r # Uses recycling
  return(mean(II.error^2))
}
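To see the errors shrink, evaluate this at increasing sample sizes; a usage sketch (ours, not from the slides, with reps reduced to keep the runtime tolerable):

ns <- c(100, 1000, 1e4)
mses <- sapply(ns, mse.logistic.II, reps=30)
plot(ns, mses, log="xy", xlab="n", ylab="MSE of indirect estimate", type="b")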

Model checking for simulation models

Summary

Backup: Asymptotic gory details for the method of simulated moments

Backup: Asymptotic gory details for indirect inference

Backup: Further reading

References

Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66:8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.

Gouriéroux, Christian, and Alain Monfort. 1996. Simulation-Based Econometric Methods. Oxford, England: Oxford University Press.

Gouriéroux, Christian, Alain Monfort, and E. Renault. 1993. “Indirect Inference.” Journal of Applied Econometrics 8:S85–S118. http://www.jstor.org/pss/2285076.

Kendall, Bruce E., Stephen P. Ellner, Edward Mccauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch, and Peter Turchin. 2005. “Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses.” Ecological Monographs 75:259–76. https://doi.org/10.1890/03-4056.

Neal, Radford M., and Geoffrey E. Hinton. 1998. “A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants.” In Learning in Graphical Models, edited by Michael I. Jordan, 355–68. Dordrecht: Kluwer Academic. http://www.cs.toronto.edu/~radford/em.abstract.html.

Smith, Anthony A., Jr. n.d. “Indirect Inference.” In New Palgrave Dictionary of Economics, edited by Stephen Durlauf and Lawrence Blume, 2nd ed. London: Palgrave Macmillan. http://www.econ.yale.edu/smith/palgrave7.pdf.

Zhao, Linqiao. 2010. “A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure.” PhD thesis, Carnegie Mellon University. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2067&rep=rep1&type=pdf.