Simulation

36-467/36-667

1 November 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \]

Previously

Agenda

Simulation

The general strategy

The general strategy applied to AR(1)

\[ X(t) = \alpha + \beta X(t-1) + \epsilon(t) \]

It’s easy to turn this into code

# Simulate a time series of length n from a Gaussian AR(1) model with given
# coefficients, noise level and initial condition
  # From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
  # intercept of AR(1) (a)
  # coefficient of AR(1) (b)
  # standard deviation of the innovations (sd.innovation)
  # initial value of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
  # a, b, sd.innovation, x.start are all length-1 numerical values
ar1.sim <- function(n, a, b, sd.innovation, x.start) {
    x <- vector(length=n)
    x[1] <- x.start # R indexes from 1, so the initial condition goes in x[1]
    for (t in 1:(n-1)) {
        x[t+1] <- a+b*x[t] + rnorm(1, sd=sd.innovation)
    }
    return(x)
}
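Before relying on the simulator, it is worth checking it against something we can calculate. For a stationary AR(1) (\(|\beta| < 1\)), the stationary mean is \(\alpha/(1-\beta)\) and the stationary variance is \(\sigma^2/(1-\beta^2)\). A quick sanity-check sketch (the parameter values here are my own choices, and ar1.sim is repeated so the chunk runs on its own):

```r
# Sanity check of ar1.sim against the stationary mean and variance
# (ar1.sim repeated from above so this chunk is self-contained)
ar1.sim <- function(n, a, b, sd.innovation, x.start) {
    x <- vector(length=n)
    x[1] <- x.start
    for (t in 1:(n-1)) {
        x[t+1] <- a + b*x[t] + rnorm(1, sd=sd.innovation)
    }
    return(x)
}
set.seed(2018)
a <- 2; b <- 0.5
x <- ar1.sim(n=1e4, a=a, b=b, sd.innovation=1, x.start=a/(1-b))
mean(x) # should be close to the stationary mean a/(1-b) = 4
var(x)  # should be close to the stationary variance 1/(1-b^2) = 4/3
```

Agreement with the theoretical values is reassuring, though of course it only checks the cases where we can already do the math.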

Exercise

Write out the code for a Gaussian AR(2)

Exercise: Solution

# Simulate a time series of length n from a Gaussian AR(2) model with given
# coefficients, noise level and initial condition
  # From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
  # intercept of AR(2) (a)
  # vector of coefficients of AR(2) (b)
  # standard deviation of the innovations (sd.innovation)
  # initial two values of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
  # a and sd.innovation are length-1 numerical values
  # b and x.start are length-2 numerical vectors
  # b has coefficient on X(t-1) first
  # x.start has first value first
ar2.sim <- function(n, a, b, sd.innovation, x.start) {
    x <- vector(length=n)
    x[1] <- x.start[1] # R indexes from 1, so the initial conditions go first
    x[2] <- x.start[2]
    for (t in 1:(n-2)) {
        x[t+2] <- a+b[1]*x[t+1] + b[2]*x[t] + rnorm(1, sd=sd.innovation)
    }
    return(x)
}

For you to think through offline: How would you make this work for an AR\((p)\)?
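One hedged sketch of an answer, for the model \[ X(t) = \alpha + \sum_{j=1}^{p} \beta_j X(t-j) + \epsilon(t), \] using my own conventions (b and x.start are length-\(p\) vectors, with b[1] the coefficient on \(X(t-1)\)):

```r
# Sketch of an AR(p) simulator, generalizing ar1.sim and ar2.sim
# Inputs: length of output time series (n)
  # intercept (a)
  # vector of p coefficients, b[1] on X(t-1) (b)
  # standard deviation of the innovations (sd.innovation)
  # vector of p initial values, earliest first (x.start)
# Output: vector of length n
# Presumes: n > p; a, sd.innovation are length-1; b, x.start are length-p
arp.sim <- function(n, a, b, sd.innovation, x.start) {
    p <- length(b)
    stopifnot(length(x.start) == p)
    x <- vector(length=n)
    x[1:p] <- x.start
    for (t in (p+1):n) {
        # x[(t-1):(t-p)] runs backwards in time, so it lines up with b:
        # b[1] multiplies x[t-1], b[2] multiplies x[t-2], etc.
        x[t] <- a + sum(b * x[(t-1):(t-p)]) + rnorm(1, sd=sd.innovation)
    }
    return(x)
}
```

Setting p = 1 or p = 2 recovers the earlier simulators; the only real work is lining up the coefficient vector with the right lags.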

“Math is hard; let’s go simulate”

Three big reasons to simulate a model:

  1. We can’t easily see what it’s going to do;
  2. We can’t easily calculate its exact behavior;
  3. We can’t easily see what would happen if something changed.

See what it’s going to do

Replacing calculations with simulations

“The Monte Carlo method”
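The recipe behind the name: to approximate \(\Expect{f(X)}\), average \(f\) over many simulated draws of \(X\), since by the law of large numbers \(\frac{1}{m}\sum_{i=1}^{m} f(X_i) \rightarrow \Expect{f(X)}\). A minimal illustration (the target probability here is my own choice, picked because we know the exact answer): estimate \(\Prob{|Z| > 1.96}\) for \(Z \sim N(0,1)\), which is about 0.05.

```r
# Monte Carlo estimate of P(|Z| > 1.96) for Z ~ N(0,1)
# True value: 2*pnorm(-1.96), about 0.05
set.seed(2018)
n.sims <- 1e5
z <- rnorm(n.sims)
mean(abs(z) > 1.96)               # Monte Carlo estimate; should be near 0.05
sd(abs(z) > 1.96)/sqrt(n.sims)    # Monte Carlo standard error, ~ 1/sqrt(n.sims)
```

The estimate's standard error shrinks like \(1/\sqrt{m}\) in the number of simulations, regardless of how complicated \(f\) or the model is; that trade of computing time for calculus is the whole method.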

The power of Monte Carlo

What’s the sample correlation between two independent random walks of length 100?

x <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
z <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
cor(x, z)
## [1] 0.7201643
plot(x, ylab="X or Z", xlab="t", ylim=range(c(x,z)), pch=1)
points(1:length(z), z, pch=2)
legend("topleft", legend=c("X","Z"), pch=1:2)

We can do the simulation over (and get a different answer):

cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
    ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))
## [1] 0.1754642

We can replicate the simulation many times, and look at the variance of the answers:

var(replicate(1000, cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
    ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))))
## [1] 0.2404867

We can change the length of each simulation:

var(replicate(1000, cor(ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1),
    ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1))))
## [1] 0.2423016

Why is this happening?

What good is the math then?

What if?

Summary

Backup: The origin of Monte Carlo

Backup: Stanislaw Ulam holding the Fermiac


References

Ernst, Philip A., Larry A. Shepp, and Abraham J. Wyner. 2017. “Yule’s ‘Nonsense Correlation’ Solved!” Annals of Statistics 45:1789–1809. https://doi.org/10.1214/16-AOS1509.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. “Equations of State Calculations by Fast Computing Machines.” Journal of Chemical Physics 21:1087–92. https://doi.org/10.1063/1.1699114.

Schwartz, David N. 2017. The Last Man Who Knew Everything: The Life and Times of Enrico Fermi, Father of the Nuclear Age. New York: Basic Books.

Yule, G. Udny. 1926. “Why Do We Sometimes Get Nonsense-Correlations Between Time-Series? — a Study in Sampling and the Nature of Time-Series.” Journal of the Royal Statistical Society 89:1–63. https://doi.org/10.2307/2341482.