Testing a parametric regression model
21 February 2019
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\]
The general problem
- You have a parametric regression model for your data
- e.g., linear with such-and-such variables
- You are worried that it might be misspecified, that the true \(\mu(x)\) isn’t in the model
- Now that we know nonparametric regression, we can test this
First case: linear specification is wrong
What this looks like
Plot it, plus the true regression curve
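The data-generating code isn't shown here; a reconstruction consistent with the rest of these notes (true curve log(1+x), 300 points, noise sd about 0.15 inferred from the residual standard error below; the range of x is an assumption) would be:
x <- sort(runif(300, 0, 3))  # sorted so later lines() calls trace a curve; the range is an assumption
yg <- log(1+x) + rnorm(length(x), 0, 0.15)  # true regression curve plus Gaussian noise
gframe <- data.frame(x=x, y=yg)  # data frame used by lm() and gam() below
plot(x, yg, xlab="x", ylab="y")
curve(log(1+x), col="grey", add=TRUE, lwd=4)  # overlay the true regression curve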
What linear fit do we get?
Fit the linear model
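The fitting call isn't shown; given the object name used later (glinfit), it would be:
glinfit <- lm(y ~ x, data=gframe)  # simple linear regression, the model under test
summary(glinfit)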
##
## Call:
## lm(formula = y ~ x, data = gframe)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
##  -0.571  -0.098   0.011   0.105   0.401
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.228      0.018      13   <2e-16
## x              0.428      0.011      40   <2e-16
##
## Residual standard error: 0.16 on 298 degrees of freedom
## Multiple R-squared: 0.84, Adjusted R-squared: 0.84
## F-statistic: 1.6e+03 on 1 and 298 DF, p-value: <2e-16
(N.B.: everything is crazily significant, but the linear model is wrong.)
What the mis-specified linear fit looks like
plot(x, yg, xlab="x", ylab="y")
curve(log(1+x), col="grey", add=TRUE, lwd=4)
abline(glinfit, lwd=4)
legend("bottomright", legend=c("Truth", "Linear"),
col=c("grey", "black"), lwd=4)
MSE of linear model, in-sample: 0.0251.
We’ll need to do that a lot, so make it a function:
## function(model) { mean(residuals(model)^2) }
- OFFLINE: Write comments giving the inputs and outputs of the function ms.residuals
- OFFLINE: Check that the function works (how?)
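For reference, the printed definition with one possible set of input/output comments (a sketch, not the only reasonable answer):
# ms.residuals: mean squared residual of an estimated regression model
# Input: a fitted model object with a residuals() method
# Output: a single number, the in-sample MSE
ms.residuals <- function(model) { mean(residuals(model)^2) }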
Fit the non-parametric alternative; use splines as implemented in gam():
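The fitting code is omitted; assuming gam() comes from the mgcv package (as the s() and te() syntax later suggests), it would be something like:
library(mgcv)  # provides gam() with spline smooths s()
gnpr <- gam(y ~ s(x), data=gframe)  # nonparametric (spline) alternative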
Add the fitted values from the spline to the plot:
plot(x,yg,xlab="x",ylab="y")
curve(log(1+x),col="grey",add=TRUE,lwd=4)
abline(glinfit, lwd=4)
lines(x,fitted(gnpr),col="blue",lwd=4)
legend("bottomright", legend=c("Truth", "Linear", "Spline"),
col=c("grey", "black", "blue"), lwd=4)
Calculate the difference in MSEs:
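The calculation isn't shown; a sketch using the helper above (the name d.hat is assumed here, for reuse below):
d.hat <- ms.residuals(glinfit) - ms.residuals(gnpr)  # in-sample MSE gap, linear minus spline
d.hat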
## [1] 0.003589289
- OFFLINE: Why do we believe this code works?
How do we test the linear model?
- \(H_0\): \(\mu(x)\) is a linear function vs. \(H_A\): \(\mu(x)\) is not linear
- Test statistic: Difference in MSEs \(=D\)
- If the linear model is right, it will predict at least as well as a non-parametric model
- If the linear model is wrong, eventually the non-parametric model will beat it
- Need the sampling distribution of \(D\) under the null hypothesis
- Need to simulate the linear model
- And need to calculate \(D\) over and over
Simulate the parametric model
Simulate from the parametric model, assuming Gaussian noise
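The body of sim.lm isn't shown; a sketch consistent with how it is used below (take a fitted lm and a vector of x values, return a simulated data frame) might be:
# sim.lm: simulate a new data set from a fitted linear model with Gaussian noise
# Inputs: linfit, a fitted lm object; test.x, the x values to simulate at
# Output: a data frame with columns x and y, where y = fitted line + Gaussian noise
sim.lm <- function(linfit, test.x) {
  sim.frame <- data.frame(x=test.x)
  sigma <- summary(linfit)$sigma  # plug-in estimate of the noise sd
  sim.frame$y <- predict(linfit, newdata=sim.frame) + rnorm(length(test.x), 0, sigma)
  return(sim.frame)
}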
- OFFLINE: Write comments describing input and output of the sim.lm function
- OFFLINE: How can we check that sim.lm is working?
- OFFLINE: How can we switch to non-Gaussian noise?
Calculate the test statistic
Calculate difference in MSEs between parametric and nonparametric models on a data frame:
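The function itself is omitted; a sketch (the name calc.D is an assumption) that refits both models to a data frame and returns the MSE gap:
# calc.D: difference in in-sample MSEs between the linear and spline fits
# Input: a data frame with columns x and y
# Output: MSE(linear fit) - MSE(spline fit), both estimated on that data frame
calc.D <- function(data) {
  MSE.lm <- ms.residuals(lm(y ~ x, data=data))
  MSE.gam <- ms.residuals(gam(y ~ s(x), data=data))
  return(MSE.lm - MSE.gam)
}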
- OFFLINE: Comments once again (with feeling this time)
- OFFLINE: Check that this function works properly (how?)
Put the pieces together
Calculate the MSE difference on one simulation run:
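Using the sketched helpers, one draw from the null would look like:
calc.D(sim.lm(glinfit, x))  # simulate from the fitted linear model, then compute the gap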
## [1] 0.0001258959
Calculate the MSE difference on 200 simulation runs, so we get a sample from the null hypothesis:
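With the same (assumed) helpers:
null.samples.D <- replicate(200, calc.D(sim.lm(glinfit, x)))  # 200 draws of D under the null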
How often does the simulation produce gaps bigger than what we really saw?
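That is, the p-value:
mean(null.samples.D > d.hat)  # fraction of simulated gaps bigger than the observed one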
## [1] 0
Look at the distribution
Plot histogram of the sampling distribution, and the observed value:
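A plotting sketch; the xlim is widened so the observed value, far to the right of the null draws, stays visible:
hist(null.samples.D, breaks=30, probability=TRUE,
     xlim=c(min(0, null.samples.D), 1.05*d.hat),
     xlab="Difference in MSEs", main="")
abline(v=d.hat, col="red", lwd=2)  # observed value of the test statistic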
Second case: linear model is properly specified
Deliberately left uncommented so that you can explore
y2 <- 0.2+0.5*x + rnorm(length(x),0,0.15)
# Why don't I have to make up x values again?
y2.frame <- data.frame(x=x,y=y2)
plot(x,y2,xlab="x",ylab="y",pch=16)
abline(0.2,0.5,col="grey",lwd=4)
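The code behind the two numbers below is omitted; assuming the same workflow and the helpers sketched earlier, it would be roughly:
y2.fit <- lm(y ~ x, data=y2.frame)  # linear fit to the second data set (reused further below)
(y2.D.hat <- calc.D(y2.frame))  # observed MSE gap: essentially zero
null.samples.D.y2 <- replicate(200, calc.D(sim.lm(y2.fit, x)))
mean(null.samples.D.y2 > y2.D.hat)  # p-value: nothing suspicious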
## [1] 7.64804e-11
## [1] 0.42
Extensions/lessons
- Don’t need to guess at specific nonlinearities; let the smoother figure it out
- Extends to testing a linear model with multiple predictors easily
- Choice of whether the alternative is then an additive model or a general nonparametric regression
- Additive: more power to detect additive nonlinearities, but can easily miss nonlinear interactions
- General model: sensitive to more mis-specifications but less power overall (curse of dimensionality again)
- Nothing special about linear models — the same idea works for testing any parametric regression model
Another approach, based on residuals
\[
Y = \mu(X) + \epsilon, ~ \Expect{\epsilon|X} = 0
\]
- \(\Rightarrow\) if we’ve got the right model, \(\Expect{Y-\mu(X)|X=x} = 0\) for all \(x\)
- \(\Rightarrow\) smooth the residuals and see if we get (close to) 0
par(mfrow=c(1,2))
plot(gam(residuals(glinfit) ~ s(x)),residuals=TRUE,
lwd=4,ylab="Residuals of the linear model", main="Reality nonlinear")
abline(h=0,col="grey")
plot(gam(residuals(y2.fit) ~ s(x)), residuals=TRUE,
lwd=4, ylab="Residuals of the linear model", main="Reality linear")
abline(h=0,col="grey")
Smoothing residuals, cont’d
- We’d need to calculate how far the smoothed-residuals curve is from 0
- One common choice is the \(L_2\) norm, \(\|r\|^2_2 \equiv \int{r^2(x) dx}\), or \(\int{r^2(x) f(x) dx}\) to weight by the pdf \(f(x)\)
- Could approximate the weighted version by \(\frac{1}{n}\sum_{i=1}^{n}{r^2(x_i)}\) (why?)
- Then simulate to get a null distribution for that test statistic (see the sketch below)
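A rough sketch of such a statistic for the mis-specified fit from the first example (the object names here are assumptions, not the lecture's code):
resid.smooth <- gam(residuals(glinfit) ~ s(x))  # smooth the linear model's residuals against x
T.obs <- mean(fitted(resid.smooth)^2)  # approximate density-weighted L2 norm of the smoothed curve
T.obs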
Backup: Interactions in gam()
- Can extend the additive model to include interaction terms: \[
Y = \alpha + \sum_{j=1}^{p}{f_j(x_j)} + \sum_{j=1}^{p}{\sum_{k=j+1}^{p}{f_{jk}(x_j, x_k)}} + \epsilon
\]
- Estimate \(f_{jk}\) by taking partial residuals and doing a 2D smooth against \(x_j\) and \(x_k\)
- How do we do this in R? (see the sketch after this list)
- s(xj, xk) fits a 2D ("thin plate") spline, good if xj and xk are on the same scale
- te(xj, xk) fits a "tensor-product" spline, good if xj and xk are measured on different scales
- When do you want interactions?
- When it makes sense (e.g., location, as an interaction of latitude and longitude)
- Diagnostic: scatter-plot residuals vs. xj and xk, look for patterns
- Reading: textbook, section 8.5
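A minimal sketch with mgcv; df, xj, xk, and y are stand-ins, not objects from the running example:
library(mgcv)
fit.tp <- gam(y ~ s(xj, xk), data=df)   # thin-plate spline surface: xj and xk on a common scale
fit.te <- gam(y ~ te(xj, xk), data=df)  # tensor-product spline: xj and xk on different scales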
Convergence when we add interactions to a GAM
- Convergence rate for a \(k\)-dimensional smoother is \(O(n^{-4/(4+k)})\)
- MSE of true regression function \(\mu\) is \(\sigma^2\)
- For the additive model (w/ no interactions), \[
MSE = \sigma^2 + (\text{approximation bias of best additive model})^2 + O(pn^{-4/5})
\]
- For the model with all two-way interactions, \[
MSE = \sigma^2 + (\text{approximation bias of best two-way model})^2 + O(p^2 n^{-4/6})
\]
- There are \({p \choose 2} = O(p^2)\) possible interactions, each converging at rate \(O(n^{-4/6})\), and these dominate the \(O(n^{-4/5})\) of the additive terms
- For the model with all \(k\)-way interactions, \[
MSE = \sigma^2 + (\text{approximation bias})^2 + O\left({p \choose k} n^{-4/(4+k)}\right)
\]