Bootstrap

36-462/662, Spring 2020

2 April 2020 (Lecture 21)

The Big Picture

\[ \newcommand{\Expect}[1]{\mathbf{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \newcommand{\Probwrt}[2]{\mathrm{Pr}_{#2}\left( #1 \right)} \]

  1. Knowing the sampling distribution of a statistic tells us about statistical uncertainty (standard errors, biases, confidence sets)
  2. The bootstrap principle: approximate the sampling distribution by simulating from a good model of the data, and treating the simulated data just like the real data
  3. Sometimes we simulate from the model we’re estimating (model-based or “parametric” bootstrap)
  4. Sometimes we simulate by re-sampling the original data (resampling or “nonparametric” bootstrap)
  5. Stronger assumptions \(\Rightarrow\) less uncertainty if we’re right

Statistical Uncertainty

Measures of Uncertainty

The Sampling Distribution Is the Source of All Knowledge

The Difficulties

The Solutions

The Monte Carlo Principle
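
A minimal illustration, assuming (purely hypothetically) that we knew the true data-generating distribution: draw many data sets from it, apply the statistic to each, and use the spread of the simulated values in place of the exact sampling distribution.

# Hypothetical example: data are 100 draws from an Exponential(rate=1);
# approximate the sampling distribution of the median by brute-force simulation
tau <- median                              # the statistic of interest
sim.T <- replicate(1e4, tau(rexp(n=100, rate=1)))
sd(sim.T)                                  # approximate standard error
quantile(sim.T, c(0.025, 0.975))           # approximate 95% probability interval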

The Bootstrap Principle

  1. Find a good estimate \(\hat{P}\) for \(P_{X}\)
  2. Generate a simulation \(\tilde{X}\) from \(\hat{P}\), set \(\tilde{T} = \tau(\tilde{X})\)
  3. Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\) (see the code sketch after this list)
    • “Pull yourself up by your bootstraps”: use \(\hat{P}\) to get at uncertainty in itself
    • Invented by Bradley Efron in the 1970s
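
A minimal R sketch of the three steps; fit.model, simulate.from, and tau are placeholder names of mine, not functions from the slides:

# Schematic bootstrap: estimate P.hat, simulate from it, re-apply the statistic
bootstrap <- function(x, fit.model, simulate.from, tau, B=1000) {
  P.hat <- fit.model(x)                      # step 1: a good estimate of P_X
  replicate(B, tau(simulate.from(P.hat)))    # steps 2 and 3: simulate, get T-tilde
}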

Model-based Bootstrap

If we are using a model, our best guess at \(P_{X}\) is \(P_{X,\hat{\theta}}\), with our best estimate \(\hat{\theta}\) of the parameters
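
A concrete sketch, assuming a Gaussian family purely for illustration (the extended example in the backup slides does the same thing for a quantile):

# Model-based ("parametric") bootstrap of the sample mean under a fitted Gaussian
parametric.boot.mean <- function(x, B=1000) {
  m <- mean(x); s <- sd(x)                            # fit the model
  replicate(B, mean(rnorm(length(x), mean=m, sd=s)))  # simulate, re-estimate
}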

The Model-based Bootstrap

Resampling

The Resampling (“Nonparametric”) Bootstrap
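
Here \(\hat{P}\) is the empirical distribution of the data, so simulating from \(\hat{P}\) means drawing \(n\) points from the data with replacement. A minimal sketch (the backup example below uses the same resample helper):

# Nonparametric bootstrap: simulate by resampling the data with replacement
resample <- function(x) sample(x, size=length(x), replace=TRUE)
np.boot <- function(x, tau, B=1000) replicate(B, tau(resample(x)))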

Bootstrapping Regressions

Different Regression Bootstraps

What About Classification?

Using the Bootstrap to Improve Prediction
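
The standard instance is bagging, or bootstrap aggregation (Hastie, Tibshirani, and Friedman 2009): fit the same predictor to many resampled data sets and average their predictions. A minimal sketch, with lm standing in for an arbitrary learner (unstable learners, like trees, gain the most from bagging):

# Bagging sketch: average predictions from models fit to case-resampled data
bag.predict <- function(form, train, newdata, B=100) {
  preds <- replicate(B, {
    boot.rows <- sample(nrow(train), replace=TRUE)   # resample cases
    fit <- lm(form, data=train[boot.rows, ])         # re-fit the learner
    predict(fit, newdata=newdata)
  })
  rowMeans(matrix(preds, ncol=B))                    # aggregate by averaging
}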

Bootstrap Estimates of Risk
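
One simple scheme along these lines (a sketch under assumptions of my own, not necessarily the estimator from this slide): fit to a resampled data set, measure the loss on the original data, and average over many resamples.

# Sketch of a bootstrap risk estimate for a regression, with squared-error loss
boot.risk <- function(form, df, B=200) {
  y <- model.response(model.frame(form, df))         # the observed responses
  mean(replicate(B, {
    fit <- lm(form, data=df[sample(nrow(df), replace=TRUE), ])
    mean((y - predict(fit, newdata=df))^2)           # loss on the original data
  }))
}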

Yet Another Bootstrap Scheme

(After Lunde (2018); Lunde and Shalizi (2017))

Sources of Error in Bootstrapping

Summing Up

Backup: Improving on the Crude Confidence Interval

The crude CI uses the distribution of \(\tilde{\theta}\) under \(\hat{\theta}\)

But what we really want is the distribution of \(\hat{\theta}\) under the true \(\theta_0\)

Mathematical observation: generally speaking, the distribution of \(\tilde{\theta} - \hat{\theta}\) is closer to that of \(\hat{\theta}-\theta_0\) than the distribution of \(\tilde{\theta}\) is to that of \(\hat{\theta}\)

(“centering” or “pivoting”)

Backup: The Basic, Pivotal CI

Let \(q_{\alpha/2}\) and \(q_{1-\alpha/2}\) be the \(\alpha/2\) and \(1-\alpha/2\) quantiles of \(\tilde{\theta}\):

\[ \begin{eqnarray*} 1-\alpha & = & \Probwrt{q_{\alpha/2} \leq \tilde{\theta} \leq q_{1-\alpha/2}}{\hat{\theta}} \\ & = & \Probwrt{q_{\alpha/2} - \hat{\theta} \leq \tilde{\theta} - \hat{\theta} \leq q_{1-\alpha/2} - \hat{\theta}}{\hat{\theta}} \\ & \approx & \Probwrt{q_{\alpha/2} - \hat{\theta} \leq \hat{\theta} - \theta_0 \leq q_{1-\alpha/2} - \hat{\theta}}{\theta_0}\\ & = & \Probwrt{q_{\alpha/2} - 2\hat{\theta} \leq -\theta_0 \leq q_{1-\alpha/2} - 2\hat{\theta}}{\theta_0}\\ & = & \Probwrt{2\hat{\theta} - q_{1-\alpha/2} \leq \theta_0 \leq 2\hat{\theta}-q_{\alpha/2}}{\theta_0} \end{eqnarray*} \]

Basically: re-center the simulations around the empirical data
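
In code, assuming a point estimate theta.hat and a vector of bootstrap re-estimates theta.tilde (the names are mine):

# Basic/pivotal CI, following the derivation above:
# lower = 2*theta.hat - q(1-alpha/2), upper = 2*theta.hat - q(alpha/2)
pivotal.ci <- function(theta.hat, theta.tilde, alpha=0.05) {
  q <- quantile(theta.tilde, c(alpha/2, 1-alpha/2))
  c(lower = 2*theta.hat - q[[2]], upper = 2*theta.hat - q[[1]])
}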

Backup: Extended example of model-based bootstrapping, with cats

Example: Is Karakedi overweight?

Example (cont’d.)

library(MASS); data(cats); summary(cats)
##  Sex         Bwt             Hwt       
##  F:47   Min.   :2.000   Min.   : 6.30  
##  M:97   1st Qu.:2.300   1st Qu.: 8.95  
##         Median :2.700   Median :10.10  
##         Mean   :2.724   Mean   :10.63  
##         3rd Qu.:3.025   3rd Qu.:12.12  
##         Max.   :3.900   Max.   :20.50
# The fitted Gaussian's estimate of the 95th percentile of body weight
(q95.gaussian <- qnorm(0.95,mean=mean(cats$Bwt),sd=sd(cats$Bwt)))
## [1] 3.521869

Example (cont’d.)

Simulate from the fitted Gaussian; bundle the simulation and the estimation of the 95th percentile into functions

# Simulate one data set of the same size as cats from the fitted Gaussian
rcats.gaussian <- function() {
  rnorm(n=nrow(cats),
        mean=mean(cats$Bwt),
        sd=sd(cats$Bwt))
}

# Re-fit the Gaussian to a (simulated) data set, return its 95th percentile
est.q95.gaussian <- function(x) {
  m <- mean(x)
  s <- sd(x)
  return(qnorm(0.95,mean=m,sd=s))
}

Example (cont’d.)

Simulate repeatedly, then plot the resulting sampling distribution

# 1000 simulations from the fitted model, re-estimating the percentile on each
sampling.dist.gaussian <- replicate(1000, est.q95.gaussian(rcats.gaussian()))
plot(hist(sampling.dist.gaussian,breaks=50, plot=FALSE), freq=FALSE)
lines(density(sampling.dist.gaussian),lwd=2)   # smoothed sampling distribution
abline(v=q95.gaussian,lty="dashed",lwd=4)      # estimate from the real data

Example (cont’d.)

Find standard error and a (crude) confidence interval

sd(sampling.dist.gaussian)
## [1] 0.06010144
quantile(sampling.dist.gaussian,c(0.025,0.975))
##     2.5%    97.5% 
## 3.399080 3.628943
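
For comparison, the pivotal correction from the backup section can be applied to the same simulations (a usage sketch, not a line from the slides):

# Basic/pivotal 95% CI: re-center the bootstrap quantiles around q95.gaussian
2*q95.gaussian - quantile(sampling.dist.gaussian, c(0.975, 0.025))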

Model Checking

As always, if the model isn’t right, relying on the model is asking for trouble

Is the Gaussian a good model for cats’ weights?

Example (re-cont’d.)

Compare histogram to fitted Gaussian density and to a smooth density estimate
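
One way to draw that comparison (a reconstruction; the slide shows the resulting figure):

# Histogram of the data, fitted Gaussian density (dashed), smooth density estimate
hist(cats$Bwt, breaks=20, freq=FALSE, xlab="Body weight (kg)", main="")
curve(dnorm(x, mean=mean(cats$Bwt), sd=sd(cats$Bwt)), add=TRUE, lty="dashed")
lines(density(cats$Bwt), lwd=2)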

Example, Take 2

Model-free estimate of the 95th percentile = 95th percentile of the data

(q95.np <- quantile(cats$Bwt,0.95))
## 95% 
## 3.6

How precise is that?

Example, Take 2

Resampling, re-estimating, and finding the sampling distribution

# Draw a sample of the same size as x, from x, with replacement
resample <- function(x) {
  sample(x,size=length(x),replace=TRUE)
}

# The model-free estimator: the sample 95th percentile
est.q95.np <- function(x) { quantile(x,0.95) }

Example, Take 2

sampling.dist.np <- replicate(1000, est.q95.np(resample(cats$Bwt)))
plot(density(sampling.dist.np), main="", xlab="Body weight (kg)")
abline(v=q95.np,lty=2)   # estimate from the original data

Example, Take 2 (cont’d)

Standard error, bias, and a (crude) confidence interval:

sd(sampling.dist.np)
## [1] 0.07919035
mean(sampling.dist.np - q95.np)
## [1] -0.020185
quantile(sampling.dist.np,c(0.025,0.975))
##  2.5% 97.5% 
## 3.400 3.785

Cats’ Hearts

The cats data frame records the weights of cats’ hearts, as well as of their bodies

(Much cuter than any photo of real cats’ hearts)

How does heart weight relate to body weight?

(Useful if Kara’s vet wants to know how much heart medicine to prescribe)

Cats’ Hearts (cont’d)

plot(Hwt~Bwt, data=cats, xlab="Body weight (kg)", ylab="Heart weight (g)")
cats.lm <- lm(Hwt ~ Bwt, data=cats)  # regression of heart weight on body weight
abline(cats.lm)                      # add the fitted line to the scatterplot

Cats’ Hearts (cont’d)

coefficients(cats.lm)
## (Intercept)         Bwt 
##  -0.3566624   4.0340627
confint(cats.lm)
##                 2.5 %   97.5 %
## (Intercept) -1.725163 1.011838
## Bwt          3.539343 4.528782

Cats’ Hearts (cont’d)

The residuals don’t look very Gaussian:
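
A diagnostic along those lines (a reconstruction; the slide shows the resulting figure):

# Smooth density of the residuals vs. a Gaussian of matching spread (dashed)
plot(density(residuals(cats.lm)), main="", xlab="Residuals of cats.lm")
curve(dnorm(x, mean=0, sd=sd(residuals(cats.lm))), add=TRUE, lty="dashed")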

Cats’ Hearts (cont’d)

Resample residuals:

# Simulated data sets keep the real body weights, but get new heart weights
# from the fitted values plus resampled residuals
sim.cats.resids <- function() {
  new.cats <- cats
  noise <- resample(residuals(cats.lm))
  new.cats$Hwt <- fitted(cats.lm) + noise
  return(new.cats)
}

Re-estimate on new data:

# Re-fit the linear model to a new (simulated or resampled) data frame
coefs.cats.lm <- function(df) {
  fit <- lm(Hwt~Bwt,data=df)
  return(coefficients(fit))
}

Cats’ Hearts (cont’d)

Re-sample to get CIs:

# 1000 replicates: simulate by resampling residuals, re-fit, keep coefficients
cats.lm.samp.dist.resids <- replicate(1000,
   coefs.cats.lm(sim.cats.resids()))
# Bootstrap 95% interval for each coefficient
t(apply(cats.lm.samp.dist.resids,1,quantile,c(0.025,0.975)))
##                  2.5%     97.5%
## (Intercept) -1.709555 0.9892904
## Bwt          3.572479 4.5158109

Cats’ Hearts (cont’d)

Try resampling whole rows:

# Resample whole rows (cases) of a data frame, with replacement
resample.data.frame <- function(df) {
  return(df[resample(1:nrow(df)),])
}
# Re-fit the model to 1000 case-resampled data sets
cats.lm.samp.dist.cases <- replicate(1000,
  coefs.cats.lm(resample.data.frame(cats)))
# Bootstrap 95% interval for each coefficient
t(apply(cats.lm.samp.dist.cases,1,quantile,c(0.025,0.975)))
##                  2.5%    97.5%
## (Intercept) -1.866811 1.219944
## Bwt          3.455298 4.598331

Exercise

Compare the three sets of 95% confidence intervals for the same regression:

  1. “Conventional”/Gaussian:
##                 2.5 %   97.5 %
## (Intercept) -1.725163 1.011838
## Bwt          3.539343 4.528782
  2. By resampling residuals:
##                  2.5%     97.5%
## (Intercept) -1.709555 0.9892904
## Bwt          3.572479 4.5158109
  3. By resampling rows/cases:
##                  2.5%    97.5%
## (Intercept) -1.866811 1.219944
## Bwt          3.455298 4.598331

References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Berlin: Springer. http://www-stat.stanford.edu/~tibs/ElemStatLearn/.

Lunde, Robert. 2018. “Bootstrapping and Sample Splitting Under Weak Dependence.” PhD thesis, Carnegie Mellon University.

Lunde, Robert, and Cosma Rohilla Shalizi. 2017. “Bootstrapping Generalization Error Bounds for Time Series.” arxiv:1711.02834. https://arxiv.org/abs/1711.02834.