36-467/667
20 October 2020 (Lecture 15)
\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \]
(The Gibbs sampler is a way of adapting this strategy to variables which don’t have a natural order)
\[ X(t) = \alpha + \beta X(t-1) + \epsilon(t) \]
# Simulate a time series of length n from a Gaussian AR(1) model with given
# coefficients, noise level and initial condition
# Based on Lecture 11 code
# Inputs: length of output time series (n)
#   intercept of AR(1) (a)
#   coefficient of AR(1) (b)
#   standard deviation of the innovations (sd.innovation)
#   initial value of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
#   a, b, sd.innovation, x.start are all length-1 numerical values
ar1.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start # Because R enumerates from 1, not 0
  for (t in 2:n) {
    x[t] <- a + b*x[t-1] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
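As a quick sanity check (not part of the lecture code): when \(|b| < 1\) the AR(1) process is stationary with long-run mean \(a/(1-b)\), so the average of a long simulated series should land near that value.

```r
# Sanity check: a stationary AR(1) (|b| < 1) has long-run mean a/(1-b)
# Here a = 2, b = 0.5, so the mean of a long series should be near 4
x <- ar1.sim(n=1e5, a=2, b=0.5, sd.innovation=1, x.start=0)
mean(x) # should be close to a/(1-b) = 4
```

If the simulated mean were far from \(a/(1-b)\), that would flag a bug in either the simulator or our algebra, which is one of the main uses of simulation.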
# Simulate a time series of length n from a Gaussian AR(2) model with given
# coefficients, noise level and initial condition
# Derived from ar1.sim above
# Inputs: length of output time series (n)
#   intercept of AR(2) (a)
#   vector of coefficients of AR(2) (b)
#   standard deviation of the innovations (sd.innovation)
#   initial two values of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
#   a and sd.innovation are length-1 numerical values
#   b and x.start are length-2 numerical vectors
#   b has coefficient on X(t-1) first
#   x.start has oldest value first
ar2.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start[1] # Because R enumerates from 1, not 0
  x[2] <- x.start[2]
  for (t in 3:n) {
    x[t] <- a + b[1]*x[t-1] + b[2]*x[t-2] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
For you to think through offline: How would you make this work for an AR\((p)\)?
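If you want to check your answer afterwards, here is one possible sketch (the function name `arp.sim` and its argument conventions are my choices, following the AR(1) and AR(2) conventions above):

```r
# Simulate a time series of length n from a Gaussian AR(p) model
# b is a length-p vector of coefficients, coefficient on X(t-1) first
# x.start is a length-p vector of initial values, oldest first
arp.sim <- function(n, a, b, sd.innovation, x.start) {
  p <- length(b)
  stopifnot(length(x.start) == p) # initial values must match the order p
  x <- vector(length=n)
  x[1:p] <- x.start
  for (t in (p+1):n) {
    # x[(t-1):(t-p)] runs backwards in time, so b[1] multiplies x[t-1],
    # b[2] multiplies x[t-2], and so on, matching the AR(2) convention
    x[t] <- a + sum(b * x[(t-1):(t-p)]) + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```

With `p = 2` this reproduces `ar2.sim` above, which is a useful check on the generalization.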
Three big reasons to simulate a model:
What’s the sample correlation between two independent random walks of length 100?
x <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
z <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
cor(x, z)
## [1] 0.2034217
cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
    ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))
## [1] -0.6896914
var(replicate(1000, cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
                        ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))))
## [1] 0.2340391
var(replicate(1000, cor(ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1),
                        ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1))))
## [1] 0.2462083
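For contrast (my addition, not in the lecture output): the failure of the variance to shrink with \(n\) is special to the random-walk case \(b = 1\). Repeating the experiment with a stationary AR(1), say \(b = 0.5\), the sampling variance of the cross-correlation between two independent series does shrink as the series get longer, roughly in proportion to \(1/n\).

```r
# Same experiment, but with stationary AR(1) series (b = 0.5 instead of b = 1)
# The variance of the spurious correlation should now shrink as n grows
var(replicate(1000, cor(ar1.sim(n=100, a=0, b=0.5, x.start=0, sd.innovation=1),
                        ar1.sim(n=100, a=0, b=0.5, x.start=0, sd.innovation=1))))
var(replicate(1000, cor(ar1.sim(n=1e4, a=0, b=0.5, x.start=0, sd.innovation=1),
                        ar1.sim(n=1e4, a=0, b=0.5, x.start=0, sd.innovation=1))))
```

This is the heart of Yule's "nonsense correlation": with random walks, even arbitrarily long independent series stay noticeably correlated in sample.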
Ernst, Philip A., Larry A. Shepp, and Abraham J. Wyner. 2017. “Yule’s ‘Nonsense Correlation’ Solved!” Annals of Statistics 45:1789–1809. https://doi.org/10.1214/16-AOS1509.
Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” Journal of Chemical Physics 21:1087–92. https://doi.org/10.1063/1.1699114.
Schwartz, David N. 2017. The Last Man Who Knew Everything: The Life and Times of Enrico Fermi, Father of the Nuclear Age. New York: Basic Books.
Yule, G. Udny. 1926. “Why Do We Sometimes Get Nonsense-Correlations Between Time-Series? — a Study in Sampling and the Nature of Time-Series.” Journal of the Royal Statistical Society 89:1–63. https://doi.org/10.2307/2341482.