Simulation Basics

Statistical Computing, 36-350

Wednesday October 5, 2016

Why simulate?

Compared to other languages, R gives us unusually convenient access to great simulation tools. Why simulate? Welcome to the 21st century! Two reasons:

Random number generation

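Already, we’ve simulated random numbers in R according to various distributions. For the normal distribution, we have the utility functions rnorm() (random draws), pnorm() (distribution function), dnorm() (density function), and qnorm() (quantile function).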

Replace “norm” with the name of another distribution, and all the same functions apply, e.g., “t”, “exp”, “gamma”, “chisq”, “binom”, “pois”, etc.
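
As a quick sketch of this naming convention, the prefixes d, p, q, r give the density, distribution function, quantile function, and random draws. The particular arguments below, and the choice of the exponential distribution with rate 2, are only illustrative assumptions.

```r
# Normal distribution: the four utility functions
dnorm(0)           # density at 0, equal to 1/sqrt(2*pi)
pnorm(1.96)        # distribution function: P(Z <= 1.96), about 0.975
qnorm(0.975)       # quantile function (inverse of pnorm), about 1.96
rnorm(3)           # three random draws (different on every call)

# Same pattern with "exp" in place of "norm" (rate=2 is an arbitrary choice)
dexp(1, rate=2)    # density at 1
pexp(1, rate=2)    # P(X <= 1), equal to 1 - exp(-2)
qexp(0.5, rate=2)  # median, equal to log(2)/2
rexp(3, rate=2)    # three random draws
```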

Random number examples

Standard normal random variables (mean 0 and variance 1)

n = 1000
z = rnorm(n, mean=0, sd=1) # These are the defaults for mean, sd
mean(z)  # Check: sample mean is approximately 0
## [1] -0.01977815
var(z)   # Check: sample variance is approximately 1
## [1] 0.9958435

(Continued)

Normal distribution and density functions

x = seq(-3,3,length=100)
plot(ecdf(z), ylab="Distribution", main="Empirical distribution",
     lwd=2, col="red")
lines(x, pnorm(x), lwd=2)
legend("topleft", legend=c("Empirical distribution", "Actual distribution"),
       lwd=2, col=c("red","black"))

hist(z, breaks=30, main="Histogram", col="pink", 
     prob=TRUE)
lines(x, dnorm(x), lwd=2)
legend("topleft", legend=c("Histogram", "Actual density"),
       lwd=2, col=c("pink","black"))

(Interesting statistical fact: in general, not just for the normal distribution, the empirical distribution function is typically quite close to the actual distribution function, even for moderate sample sizes. This is not true on the density scale: the histogram typically converges to the actual density much more slowly.)
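
One rough way to look at this numerically is to track, for growing sample sizes, the largest gap between the empirical and actual distribution functions and the largest gap between the histogram heights and the actual density. This is only a sketch: the sample sizes, the grid, and the names m, draws, grid, cdf.err, dens.err are assumptions made for illustration, and the two errors live on different scales, so they should be compared by how fast each one shrinks.

```r
grid = seq(-3, 3, length=100)
for (m in c(100, 1000, 10000)) {
  draws = rnorm(m)
  # Largest gap between empirical and actual distribution function on the grid
  cdf.err = max(abs(ecdf(draws)(grid) - pnorm(grid)))
  # Largest gap between histogram heights and actual density at bin midpoints
  h = hist(draws, breaks=30, plot=FALSE)
  dens.err = max(abs(h$density - dnorm(h$mids)))
  cat("n =", m, " distribution error =", round(cdf.err, 4),
      " density error =", round(dens.err, 4), "\n")
}
```

Because the draws are random, the printed errors change from run to run.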

Same function call, different results

Not surprisingly, we get different draws each time we call rnorm():

mean(rnorm(n))
## [1] -0.02321726
mean(rnorm(n))
## [1] 0.06506526
mean(rnorm(n))
## [1] 0.003378405
mean(rnorm(n))
## [1] 0.003923676
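
To get a feel for how much the sample mean moves from call to call, we can repeat the call many times and look at the spread. This is a sketch: n.sims = 500 is an arbitrary choice, and n.sims and means are names introduced here for illustration. For n independent standard normals, the standard deviation of the sample mean is 1/sqrt(n).

```r
n = 1000       # same sample size as in the examples above
n.sims = 500   # number of repeated calls (arbitrary choice)
means = replicate(n.sims, mean(rnorm(n)))
sd(means)      # observed spread of the sample means across calls
1/sqrt(n)      # theoretical standard deviation of the sample mean
```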

Is it really random?

The numbers generated in R (or in any language) are not “truly” random; they are what is called pseudorandom

Setting the random seed

All pseudorandom number generators depend on what is called a seed value
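
To make the role of the seed concrete, here is a toy linear congruential generator. It is only a sketch for illustration and is not the generator R uses (R's default, the Mersenne Twister, is far more sophisticated). The function name toy.runif and the constants are assumptions chosen for the example. The point is that each number is a deterministic function of the previous state, so the seed fixes the whole sequence.

```r
# Toy linear congruential generator: NOT the generator R uses
toy.runif = function(k, seed) {
  a = 1664525; inc = 1013904223; m = 2^32  # illustrative constants
  state = seed
  out = numeric(k)
  for (i in 1:k) {
    state = (a * state + inc) %% m  # deterministic update of the state
    out[i] = state / m              # rescale to the interval [0, 1)
  }
  out
}
toy.runif(3, seed=0)
toy.runif(3, seed=0)  # same seed, same numbers
toy.runif(3, seed=1)  # different seed, different numbers
```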

Seed examples

# Getting the same 5 random normals over and over
set.seed(0); rnorm(5)
## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414
set.seed(0); rnorm(5)
## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414
set.seed(0); rnorm(5)
## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414

(Continued)

# Different seeds, different numbers
set.seed(1); rnorm(5)
## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
set.seed(2); rnorm(5)
## [1] -0.89691455  0.18484918  1.58784533 -1.13037567 -0.08025176
set.seed(3); rnorm(5)
## [1] -0.9619334 -0.2925257  0.2587882 -1.1521319  0.1957828

(Continued)

# Each time the seed is set, the same sequence follows (indefinitely)
set.seed(0); rnorm(3); rnorm(2); rnorm(1)
## [1]  1.2629543 -0.3262334  1.3297993
## [1] 1.2724293 0.4146414
## [1] -1.53995
set.seed(0); rnorm(3); rnorm(2); rnorm(1)
## [1]  1.2629543 -0.3262334  1.3297993
## [1] 1.2724293 0.4146414
## [1] -1.53995
set.seed(0); rnorm(3); rnorm(2); rnorm(1)
## [1]  1.2629543 -0.3262334  1.3297993
## [1] 1.2724293 0.4146414
## [1] -1.53995
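
In practice, the main use of the seed is to make a simulation reproducible. A minimal sketch, where sim.mean is a hypothetical helper written for this example:

```r
# Hypothetical helper: one small simulation, reproducible via its seed argument
sim.mean = function(n, seed) {
  set.seed(seed)
  mean(rnorm(n))
}
a = sim.mean(1000, seed=0)
b = sim.mean(1000, seed=0)
identical(a, b)    # TRUE: same seed, same draws, same result

# The seed governs other generators too, not just rnorm()
set.seed(0); u1 = runif(3)
set.seed(0); u2 = runif(3)
identical(u1, u2)  # TRUE
```

Note that calling set.seed() inside a function resets the random number stream for everything that runs afterwards, so it is often cleaner to set the seed once at the top of a script.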