Statistical Computing, 36-350
Wednesday October 5, 2016
R gives us unique access to great simulation tools (unique compared to other languages). Why simulate? Welcome to the 21st century! Two reasons:
Already, we’ve simulated random numbers in R according to various distributions. For the normal distribution, we have the utility functions:
rnorm(): generate normal random variablespnorm(): normal distribution function, \(\Phi(x)=P(Z \leq x)\)dnorm(): normal density function, \(\phi(x)= \Phi'(x)\)qnorm(): normal quantile function, \(q(y)=\Phi^{-1}(y)\), i.e., \(\Phi(q(y))=y\)Replace “norm” with the name of another distribution, all the same functions apply. E.g., “t”, “exp”, “gamma”, “chisq”, “binom”, “pois”, etc.
Standard normal random variables (mean 0 and variance 1)
n = 1000
z = rnorm(n, mean=0, sd=1) # These are the defaults for mean, sd
mean(z)  # Check: sample mean is approximately 0## [1] -0.01977815var(z)   # Check: sample variance is approximately 1## [1] 0.9958435Normal distribution and density functions
x = seq(-3,3,length=100)
plot(ecdf(z), ylab="Distribution", main="Empirical distribution",
     lwd=2, col="red")
lines(x, pnorm(x), lwd=2)
legend("topleft", legend=c("Empirical distribution", "Actual distribution"),
       lwd=2, col=c("red","black"))hist(z, breaks=30, main="Histogram", col="pink", 
     prob=TRUE)
lines(x, dnorm(x), lwd=2)
legend("topleft", legend=c("Histogram", "Actual density"),
       lwd=2, col=c("pink","black"))(Interesting statistical fact: in general—not just for the normal distribution—the empirical distribution function is pretty much always quite close to the actual distribution function. This is not true on the density scale, i.e., the histogram typically converges much more slowly)
Not surprisingly, we get different draws each time we call rnorm()
mean(rnorm(n))## [1] -0.02321726mean(rnorm(n))## [1] 0.06506526mean(rnorm(n))## [1] 0.003378405mean(rnorm(n))## [1] 0.003923676The number generated in R (in any language) are not “truly” random; they are what is called pseudorandom
?Random in your R console to read more about this (and to read how to change the algorithm used for pseudorandom number generation, which you should never really have to do, by the way)All pseudorandom number generators depend on what is called a seed value
set.seed()# Getting the same 5 random normals over and over
set.seed(0); rnorm(5)## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414set.seed(0); rnorm(5)## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414set.seed(0); rnorm(5)## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414# Different seeds, different numbers
set.seed(1); rnorm(5)## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078set.seed(2); rnorm(5)## [1] -0.89691455  0.18484918  1.58784533 -1.13037567 -0.08025176set.seed(3); rnorm(5)## [1] -0.9619334 -0.2925257  0.2587882 -1.1521319  0.1957828# Each time the seed is set, the same sequence follows (indefinitely)
set.seed(0); rnorm(3); rnorm(2); rnorm(1)## [1]  1.2629543 -0.3262334  1.3297993## [1] 1.2724293 0.4146414## [1] -1.53995set.seed(0); rnorm(3); rnorm(2); rnorm(1)## [1]  1.2629543 -0.3262334  1.3297993## [1] 1.2724293 0.4146414## [1] -1.53995set.seed(0); rnorm(3); rnorm(2); rnorm(1)## [1]  1.2629543 -0.3262334  1.3297993## [1] 1.2724293 0.4146414## [1] -1.53995