Additive Models

36-402, Section A, Spring 2019

19 February 2019

\[ \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\mu}} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathbb{V}\left[ #1 \right]} \DeclareMathOperator*{\argmin}{argmin} % thanks, wikipedia! \]

Previously:

Kernel regression estimates \(\TrueRegFunc(x) = \Expect{Y | X = x}\) nonparametrically, by locally averaging the \(y_i\) whose \(x_i\) lie within a bandwidth \(h\) of \(x\), with \(p\) predictor variables.

Previously (cont’d):

The approximate MSE of kernel regression at a point, as a function of the bandwidth, is \[ MSE(h) \approx \sigma^2_x + b h^4 + \frac{v}{n h^p} \] (irreducible noise, plus squared bias, plus variance), with constants \(b\) and \(v\) depending on the true regression function and the noise.

EXERCISE

  1. What’s the best bandwidth? How does it change with \(n\) and \(p\)?
  2. Does the bias at the best bandwidth \(\rightarrow 0\) as \(n\rightarrow\infty\)?
  3. Does the variance at the best bandwidth \(\rightarrow 0\) as \(n\rightarrow\infty\)?
  4. Does \(\EstRegFunc \rightarrow \TrueRegFunc\) as \(n\rightarrow\infty\)?

SOLUTION

Part 1:

\[\begin{eqnarray} \left.\frac{dMSE}{dh}\right|_{h=h_{opt}} & = & 0 = 0 + 4bh_{opt}^3 - \frac{vp}{n}h_{opt}^{-p-1}\\ h_{opt}^{4+p} & = & \frac{vp}{4b}n^{-1}\\ h_{opt} & = & \left(\frac{vp}{4b}\right)^{1/(4+p)} n^{-\frac{1}{4+p}}\\ h_{opt} & = & O(n^{-\frac{1}{4+p}}) \end{eqnarray}\]
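
As a numerical sanity check (a sketch, not part of the lecture): pick illustrative constants \(b = v = 1\) and \(p = 2\), minimize the approximate MSE \(b h^4 + v/(n h^p)\) over \(h\) for several sample sizes, and check that the log-log slope of \(h_{opt}\) against \(n\) is close to \(-1/(4+p)\).

b <- 1; v <- 1; p <- 2                          # illustrative constants, not estimates
approx.mse <- function(h, n) { b*h^4 + v/(n*h^p) }
n.grid <- 10^(2:6)
h.opt <- sapply(n.grid, function(n) {
  optimize(approx.mse, interval=c(1e-6, 10), n=n)$minimum
})
coef(lm(log(h.opt) ~ log(n.grid)))              # slope should be about -1/(4+p) = -1/6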

Part 2:

\[\begin{eqnarray} \mathrm{bias} & = & O(h_{opt}^2)\\ & = & O(n^{-\frac{2}{4+p}}) \rightarrow 0 \end{eqnarray}\]

Part 3:

\[\begin{eqnarray} \mathrm{variance} & = & O(n^{-1} h_{opt}^{-p})\\ & = & O(n^{-1} n^{\frac{p}{4+p}})\\ & = & O(n^{-\frac{4}{4+p}}) \rightarrow 0 \end{eqnarray}\]

Part 4:

Yes: if the bias and the variance both \(\rightarrow 0\), then the MSE \(\rightarrow 0\), and so \(\EstRegFunc \rightarrow \TrueRegFunc\).

Kernel regression is universally consistent

No matter what the true regression function \(\TrueRegFunc\) is, \(\EstRegFunc \rightarrow \TrueRegFunc\) as \(n \rightarrow \infty\); no parametric assumptions are needed.

Kernel regression can converge very slowly
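
How slowly? A back-of-the-envelope calculation (constants ignored, so purely illustrative) using only the \(O(n^{-4/(4+p)})\) rate derived above: to match the accuracy that \(n_1\) points give in one dimension, we need roughly \(n = n_1^{(4+p)/5}\) points in \(p\) dimensions.

n1 <- 100                          # reference sample size in 1 dimension
p <- c(1, 2, 5, 10, 20)
n.needed <- n1^((4 + p)/5)         # sample size matching n1^{-4/5} accuracy at rate n^{-4/(4+p)}
signif(n.needed, 3)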

The problem isn’t kernel regression

The slow rate comes from trying to learn a completely arbitrary smooth function of \(p\) variables: this is the curse of dimensionality, and it hits any method which assumes only smoothness, not kernel regression in particular.

Lifting the curse

To escape the curse, assume more structure than smoothness alone, trading some flexibility for a much faster rate of convergence.

Additive models

Assume the regression function is a sum of smooth one-dimensional partial response functions, \[ \TrueRegFunc(x) = \alpha + \sum_{j=1}^{p}{f_j(x_j)} \] with each \(f_j\) centered, \(\Expect{f_j(X_j)} = 0\), so that the intercept \(\alpha\) is identified.

So how do we estimate an additive model?

Backfitting (or Gauss-Seidel)

  1. Start with \(\alpha = \overline{y}\), and \(f_1 = f_2 = \ldots = f_p = 0\)
  2. For \(j \in 1:p\):
    1. \(r_i \leftarrow y_i - (\alpha + \sum_{k\neq j}{f_k(x_{ik})})\) (partial residual)
    2. Use your favorite smoother to regress \(r\)’s on \(x_j\) (1D smoothing), get function \(s_j\)
    3. \(f_j \leftarrow s_j - \overline{s_j}\) (keep functions centered at 0)
  3. Repeat step (2) until the estimated functions stop changing (a minimal implementation sketch follows below)
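
Here is a minimal backfitting sketch, assuming smooth.spline() from base R as the one-dimensional smoother; the function name backfit, the tolerance and the iteration cap are made up for illustration, and this is not what mgcv does internally (mgcv fits penalized regression splines instead).

backfit <- function(x, y, tol=1e-6, max.iter=100) {
  p <- ncol(x); n <- nrow(x)
  alpha <- mean(y)                               # step 1: intercept = sample mean
  fhat <- matrix(0, nrow=n, ncol=p)              # step 1: all f_j start at 0
  for (iter in 1:max.iter) {
    old <- fhat
    for (j in 1:p) {                             # step 2: cycle over predictors
      r <- y - alpha - rowSums(fhat[, -j, drop=FALSE])   # partial residuals
      sj <- smooth.spline(x[, j], r)             # 1D smoothing of r on x_j
      fhat[, j] <- predict(sj, x[, j])$y
      fhat[, j] <- fhat[, j] - mean(fhat[, j])   # keep f_j centered at 0
    }
    if (max(abs(fhat - old)) < tol) break        # step 3: stop when nothing changes
  }
  list(alpha=alpha, f=fhat)
}

For example, backfit(as.matrix(df[, c("x1", "x2")]), df$y) could be run on the simulated data from the next slide.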

In R: mgcv

library(mgcv)
# Simulate data with an additive truth: 500 points, two predictors uniform on (-pi, pi)
x <- matrix(runif(1000, min=-pi, max=pi), ncol=2)
y <- 0.5*x[,1] + sin(x[,2]) + rnorm(n=nrow(x), sd=0.1)
df <- data.frame(y=y, x1=x[,1], x2=x[,2])
babys.first.am <- gam(y ~ s(x1)+s(x2), data=df)   # one smooth term per predictor
par(mfrow=c(1,2)); plot(babys.first.am)           # plot the two estimated partial response functions
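
Since we simulated the data, we can check the fit against the noiseless part of the truth (a quick sketch, not in the original example):

plot(0.5*df$x1 + sin(df$x2), fitted(babys.first.am),
     xlab="true regression function", ylab="fitted values")
abline(0, 1, col="grey")           # points should hug this line if the fit is good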

s()

s() marks a term that gam() should fit with a smooth function (a spline whose wiggliness mgcv chooses automatically), rather than entering it linearly.
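
A few illustrative uses of s() (the particular k and bs values here are arbitrary choices, not recommendations):

gam(y ~ s(x1, k=20) + s(x2, bs="cr"), data=df)   # k = basis dimension, bs = type of spline basis
gam(y ~ x1 + s(x2), data=df)                     # terms without s() enter linearly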

Methods for mgcv::gam

Fitted gam objects work with the usual regression generics: print(), summary(), plot(), predict(), fitted(), residuals(), coefficients(), etc.
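
For instance, with the fitted model from the simulation above:

summary(babys.first.am)               # approximate significance and effective df of each smooth
predict(babys.first.am, newdata=df)   # predictions for (possibly new) data
fitted(babys.first.am)                # in-sample fitted values
residuals(babys.first.am)             # residuals
plot(babys.first.am)                  # one panel per partial response function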

A small real-data example

library(gamair)   # provides the chicago data set
data(chicago)     # daily deaths, temperature (tmpd) and particulate pollution (pm10median) in Chicago
chicago.am <- gam(death ~ s(tmpd) + s(pm10median), data=chicago)

A small real-data example (cont’d)

par(mfrow=c(1,2)); plot(chicago.am)   # partial response functions for temperature and PM10

A small real-data example (cont’d)

# Handle missing values by passing NAs through: rows with missing predictors
# get NA predictions instead of being dropped
am.fits <- predict(chicago.am, newdata=chicago, na.action=na.pass)
plot(death ~ time, data=chicago)
points(chicago$time, am.fits, pch="x", col="red")   # overlay the additive-model fits

But what about the curse?

Backfitting only ever smooths in one dimension at a time, so each \(\widehat{f}_j\) can be estimated at the one-dimensional rate, no matter how large \(p\) is; the price is bias if the true regression function isn’t really additive.
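
To make the contrast concrete (a sketch using only the rate formulas above, constants ignored): plugging \(p = 1\) into \(O(n^{-4/(4+p)})\) gives the one-dimensional rate \(n^{-4/5}\), which is what each one-dimensional smooth inside backfitting can attain, versus the full \(p\)-dimensional rate for kernel regression.

p <- c(1, 2, 5, 10, 20)
kernel.exponent <- -4/(4 + p)              # full p-dimensional smoothing: MSE = O(n^exponent)
additive.exponent <- rep(-4/5, length(p))  # each 1D component of an additive model
data.frame(p, kernel.exponent, additive.exponent)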

Summing up

Additive models keep the flexibility of one-dimensional smoothing while avoiding the curse of dimensionality; they are easy to fit (backfitting, or gam() in mgcv) and easy to interpret through the partial response functions, at the cost of some bias when the truth isn’t additive.