The Big Picture

\[ \newcommand{\Expect}[1]{\mathbf{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \newcommand{\Probwrt}[2]{\mathrm{Pr}_{#2}\left( #1 \right)} \]

Knowing the sampling distribution of a statistic tells us about statistical uncertainty
- standard error
- bias
- confidence sets
- \(p\) values
The bootstrap principle: approximate the sampling distribution by simulating from a good model of the data, and treating the simulated data just like the real data
Sometimes we simulate from the model we’re estimating
- (model-based or “parametric” bootstrap)
Sometimes we simulate by re-sampling the original data
- (resampling or “nonparametric” bootstrap)
Stronger assumptions \(\Rightarrow\) less uncertainty if we’re right

Statistical Uncertainty

Re-run the experiment (survey, census, …) and get different data
\(\therefore\) everything we calculate from data would change from run to run
- estimates
- test statistics
- \(p\)-values
- predicted values
- policy recommendations
This variability = statistical uncertainty
Quantifying statistical uncertainty = honesty about what we actually know

Measures of Uncertainty

Standard error = standard deviation of an estimator
- (could instead use median absolute deviation, etc.)
\(p\)-value = Probability we’d see a signal this big if there was just noise
Confidence region = All the parameter values we can’t reject at low error rates
- Either the true parameter is in the confidence region
- or we are very unlucky
- (or our model is wrong)
etc., etc.

Statistical Uncertainty Comes from the Sampling Distribution

Data \(X \sim P_X\) for some unknown true distribution \(P_X\)
We calculate a statistic \(T = \tau(X)\) so it has distribution \(P_{T}\), its sampling distribution
If we knew \(P_{T}\), we could calculate
- \(\Var{T}\) (and so standard error \(=\sqrt{\Var{T}}\))
- \(\Expect{T}\) (and so bias)
- quantiles (and so confidence intervals or \(p\)-values), etc.

The Difficulties

Difficulty 1: Most of the time, \(P_{X}\) is very complicated
Difficulty 2: Most of the time, \(\tau\) is a very complicated function
\(\therefore\) We couldn’t solve for \(P_T\)
Difficulty 3: Actually, we don’t know \(P_X\)
Upshot: We really don’t know \(P_{T}\) and can’t use it to calculate anything

The Solutions

Classically (\(\approx 1900\)–\(\approx 1975\)): Restrict the model and the statistic until you can calculate the sampling distribution, at least for very large \(n\)
- Paradigm: \(t\) distribution for linear models with nice Gaussian noise
Modern (\(\approx 1975\)–): Use complex models and statistics, but simulate calculating the statistic on the model

The Monte Carlo Principle

Generate / simulate \(\tilde{X}\) from \(P_X\)
Set \(\tilde{T} = \tau(\tilde{X})\)
Repeat many times
Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\)
- (As a general method, invented by Enrico Fermi in the 1930s, spread through the Manhattan Project)
Still needs \(P_X\)
- Works in HW 3 because we’re testing a fixed model

The Bootstrap Principle

Find a good estimate \(\hat{P}\) for \(P_{X}\)
Simulate \(\tilde{X}\) from \(\hat{P}\), set \(\tilde{T} = \tau(\tilde{X})\)
Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\)
- “Pull yourself up by your bootstraps”: use \(\hat{P}\) to get at uncertainty in itself
- Invented by Bradley Efron in the 1970s

First step: find a good estimate \(\hat{P}\) for \(P_{X}\)

Model-based Bootstrap

If we are using a model, our best guess at \(P_{X}\) is \(P_{X,\hat{\theta}}\), with our best estimate \(\hat{\theta}\) of the parameters

The Model-based Bootstrap

Get data \(X\), estimate \(\hat{\theta}\) from \(X\)
Repeat \(b\) times:
- Simulate \(\tilde{X}\) from \(P_{X,\hat{\theta}}\) (simulate data of same size/“shape” as real data)
- Calculate \(\tilde{T} = \tau(\tilde{X}\)) (treat simulated data the same as real data)
Use simulated distribution of \(\tilde{T}\) as \(P_{T}\)

Example: Is Karakedi overweight?

Concretely: Is she over the 95th percentile of body mass for adult cats?

Example (cont’d.)

library(MASS); data(cats); summary(cats)

##  Sex         Bwt             Hwt       
##  F:47   Min.   :2.000   Min.   : 6.30  
##  M:97   1st Qu.:2.300   1st Qu.: 8.95  
##         Median :2.700   Median :10.10  
##         Mean   :2.724   Mean   :10.63  
##         3rd Qu.:3.025   3rd Qu.:12.12  
##         Max.   :3.900   Max.   :20.50

(q95.gaussian <- qnorm(0.95, mean=mean(cats$Bwt), sd=sd(cats$Bwt)))

## [1] 3.521869

Example (cont’d.)

Simulate from fitted Gaussian; bundle up estimating 95th percentile into a function

rcats.gaussian <- function() {
  rnorm(n=nrow(cats),
        mean=mean(cats$Bwt),
        sd=sd(cats$Bwt))
}

est.q95.gaussian <- function(x) {
  m <- mean(x)
  s <- sd(x)
  return(qnorm(0.95,mean=m,sd=s))
}

Example (cont’d.)

Simulate, plot the sampling distribution from the simulations

sampling.dist.gaussian <- replicate(1000, est.q95.gaussian(rcats.gaussian()))
plot(hist(sampling.dist.gaussian,breaks=50, plot=FALSE), freq=FALSE)
lines(density(sampling.dist.gaussian),lwd=2)
abline(v=q95.gaussian, lty="dashed", lwd=4)

Example (cont’d.)

Find standard error and a (crude) confidence interval

sd(sampling.dist.gaussian)

## [1] 0.06396329

quantile(sampling.dist.gaussian,c(0.025,0.975))

##     2.5%    97.5% 
## 3.390592 3.637970

Model Checking

As always, if the model isn’t right, relying on the model is asking for trouble
Is the Gaussian a good model for cats’ weights?

Example (re-cont’d.)

Compare histogram to fitted Gaussian density and to a smooth density estimate

Resampling

Difficulty: We might not have a trust-worthy model

Resource: We do have data, which tells us a lot about the distribution

Solution: Resampling, treat the sample like a whole population

The Resampling (“Nonparameteric”) Bootstrap

Get data \(X\), containing \(n\) samples, find \(T=\tau(X)\)
Repeat \(b\) times:
- Generate \(\tilde{X}\) by drawing \(n\) samples from \(X\) with replacement (resample the data)
- Calculate \(\tilde{T} = \tau(\tilde{X})\) (treat simulated data the same as real data)
Use simulated distribution of \(\tilde{T}\) as \(P_{T}\)

(See backup for \(\hat{P}\) implicit here)

Example, Take 2

Model-free estimate of the 95th percentile = 95th percentile of the data

(q95.np <- quantile(cats$Bwt,0.95))

## 95% 
## 3.6

How precise is that?

Example, Take 2

Resampling, re-estimating, and finding sampling distribution

resample <- function(x) {
  sample(x,size=length(x),replace=TRUE)
}

est.q95.np <- function(x) { quantile(x,0.95) }

Example, Take 2

sampling.dist.np <- replicate(1000, est.q95.np(resample(cats$Bwt)))
plot(density(sampling.dist.np), main="", xlab="Body weight (kg)")
abline(v=q95.np,lty=2)

Example, Take 2 (cont’d)

standard error, bias, CI

sd(sampling.dist.np)

## [1] 0.08150953

mean(sampling.dist.np - q95.np)

## [1] -0.022125

quantile(sampling.dist.np,c(0.025,0.975))

##  2.5% 97.5% 
## 3.400 3.785

Bootstrapping Regressions

A regression is a model for \(Y\) conditional on \(X\) \[ Y= \mu(X) + \epsilon, ~ \Expect{\epsilon|X} = 0 \]
Silent on the distribution of \(X\), so how do we simulate?
Options, putting less and less trust in the model:
- Hold \(x_i\) fixed, set \(\tilde{y}_i = \hat{\mu}(x_i) + \tilde{\epsilon}_i\) from model’s estimated noise distribution (e.g., Gaussian or Poisson)
- Hold \(x_i\) fixed, set \(\tilde{y}_i = \hat{\mu}(x_i) + \tilde{\epsilon}_i\), noise resampled from the residuals
- Resample \((x_i, y_i)\) pairs (resample data-points or resample cases)

Different Regression Bootstraps

Resampling residuals works as long as the noise is IID
- Noise could be Gaussian…
- Or noise could be very non-Gaussian
- Noise does need have same distribution everywhere
- Dubious unless the regression model is right
Resampling whole cases works as long as observations are IID
- noise needn’t be independent of \(X\)
- needn’t be Gaussian
- regression model needn’t be right

Cats’ Hearts

cats has weights for cats’ hearts, as well as bodies

(Much cuter than any photo of real cats’ hearts)

How does heart weight relate to body weight?

(Useful when Kara’s vet needed to know how much heart medicine to prescribe)

Cats’ Hearts (cont’d)

plot(Hwt~Bwt, data=cats, xlab="Body weight (kg)", ylab="Heart weight (g)")
cats.lm <- lm(Hwt ~ Bwt, data=cats)
abline(cats.lm)

Cats’ Hearts (cont’d)

Coefficients and standard confidence intervals:

coefficients(cats.lm)

## (Intercept)         Bwt 
##  -0.3566624   4.0340627

confint(cats.lm)

##                 2.5 %   97.5 %
## (Intercept) -1.725163 1.011838
## Bwt          3.539343 4.528782

These confidence intervals assume IID Gaussian noise

Cats’ Hearts (cont’d)

The residuals don’t look that Gaussian:

Cats’ Hearts (cont’d)

Resample residuals:

sim.cats.resids <- function() {
  new.cats <- cats
  noise <- resample(residuals(cats.lm))
  new.cats$Hwt <- fitted(cats.lm) + noise
  return(new.cats)
}

Re-estimate on new data:

coefs.cats.lm <- function(df) {
  fit <- lm(Hwt~Bwt, data=df)
  return(coefficients(fit))
}

Cats’ Hearts (cont’d)

Re-sample to get CIs:

cats.lm.samp.dist.resids <- replicate(1000,
                                      coefs.cats.lm(sim.cats.resids()))
t(apply(cats.lm.samp.dist.resids, 1, quantile, c(0.025,0.975)))

##                  2.5%     97.5%
## (Intercept) -1.736062 0.9288667
## Bwt          3.555308 4.5194927

Cats’ Hearts (cont’d)

Try resampling whole rows:

resample.data.frame <- function(df) {
  return(df[resample(1:nrow(df)),])
}

cats.lm.samp.dist.cases <- replicate(1000,
  coefs.cats.lm(resample.data.frame(cats)))
t(apply(cats.lm.samp.dist.cases,1,quantile,c(0.025,0.975)))

##                  2.5%    97.5%
## (Intercept) -1.857572 1.148128
## Bwt          3.472462 4.618870

Comparison

We now have three sets of confidence intervals

“Conventional”/Gaussian:

##                 2.5 %   97.5 %
## (Intercept) -1.725163 1.011838
## Bwt          3.539343 4.528782

By resampling residuals:

##                  2.5%     97.5%
## (Intercept) -1.736062 0.9288667
## Bwt          3.555308 4.5194927

By resampling rows/cases:

##                  2.5%    97.5%
## (Intercept) -1.857572 1.148128
## Bwt          3.472462 4.618870

Why do the intervals keep getting wider?
Which of these is most reliable?

Sources of Error in Bootstrapping

Simulation Using only \(b\) bootstrap replicates
- Make this small by letting \(b\rightarrow\infty\)
- Costs computing time
- Diminishing returns: error is generally \(\propto 1/\sqrt{b}\)
- There are tricks for speeding up simulations, re-using parts of them, etc.
Approximation Using \(\hat{P}\) instead of \(P_{X}\)
- Make this small by careful statistical modeling
Estimation Only a finite number of samples
- Make this small by being careful about what we simulate (example in the backup slides)
For fixed \(n\), resampling generally shows more uncertainty than model-based bootstrap
Resampling is less vulnerable to modeling mistakes

Summing Up

We need to know the sampling distribution of our statistic \(T = \tau(X)\)
If we could simulate \(\tilde{X}\) from the true distribution of \(X\), we’d use the distribution of \(\tau(\tilde{X})\) as the sampling distribution (“Monte Carlo”)
We simulate from an estimate of the distribution of \(X\) instead (“bootstrap”)
- The simulation could be based on a model
- Or on re-sampling the observations
- Or some combination

Backup: What’s \(\hat{P}\) in Resampling?

(After discussion in class)

\(\hat{P}\) says:

Put probability \(1/n\) on each observed value of \(X\)
and new values are independent and identically distributed (IID)

Symbolically:

\[ \hat{P} = \frac{1}{n}\sum_{i=1}^{n}{\delta(x-x_i) \]

where the delta distribution or delta function puts probability 1 on \(0\) (and probability 0 on every other value)

(Think of an infinitely tall, infinitely narrow Gaussian centered at \(0\))

Backup: Improving on the Crude Confidence Interval

Crude CI use distribution of \(\tilde{\theta}\) under \(\hat{\theta}\)

But really we want the distribution of \(\hat{\theta}\) under \(\theta\)

Mathematical observation: Generally speaking, (distributions of) \(\tilde{\theta} - \hat{\theta}\) is closer to \(\hat{\theta}-\theta_0\) than \(\tilde{\theta}\) is to \(\hat{\theta}\)

(“centering” or “pivoting”)

Backup: The Basic, Pivotal CI

quantiles of \(\tilde{\theta}\) = \(q_{\alpha/2}, q_{1-\alpha/2}\)

\[ \begin{eqnarray*} 1-\alpha & = & \Probwrt{q_{\alpha/2} \leq \tilde{\theta} \leq q_{1-\alpha/2}}{\hat{\theta}} \\ & = & \Probwrt{q_{\alpha/2} - \hat{\theta} \leq \tilde{\theta} - \hat{\theta} \leq q_{1-\alpha/2} - \hat{\theta}}{\hat{\theta}} \\ & \approx & \Probwrt{q_{\alpha/2} - \hat{\theta} \leq \hat{\theta} - \theta_0 \leq q_{1-\alpha/2} - \hat{\theta}}{\theta_0}\\ & = & \Probwrt{q_{\alpha/2} - 2\hat{\theta} \leq -\theta_0 \leq q_{1-\alpha/2} - 2\hat{\theta}}{\theta_0}\\ & = & \Probwrt{2\hat{\theta} - q_{1-\alpha/2} \leq \theta_0 \leq 2\hat{\theta}-q_{\alpha/2}}{\theta_0} \end{eqnarray*} \]

Basically: re-center the simulations around the empirical data

The Bootstrap

Last Week

The Big Picture

Statistical Uncertainty

Measures of Uncertainty

Statistical Uncertainty Comes from the Sampling Distribution

The Difficulties

The Solutions

The Monte Carlo Principle

The Bootstrap Principle

Model-based Bootstrap

The Model-based Bootstrap

Example: Is Karakedi overweight?

Example (cont’d.)

Example (cont’d.)

Example (cont’d.)

Example (cont’d.)

Model Checking

Example (re-cont’d.)

Resampling

The Resampling (“Nonparameteric”) Bootstrap

Example, Take 2

Example, Take 2

Example, Take 2

Example, Take 2 (cont’d)

Bootstrapping Regressions

Different Regression Bootstraps

Cats’ Hearts

Cats’ Hearts (cont’d)

Cats’ Hearts (cont’d)

Cats’ Hearts (cont’d)

Cats’ Hearts (cont’d)

Cats’ Hearts (cont’d)

Cats’ Hearts (cont’d)

Comparison

Sources of Error in Bootstrapping

Summing Up

Backup: What’s \(\hat{P}\) in Resampling?

Backup: Improving on the Crude Confidence Interval

Backup: The Basic, Pivotal CI