Linear Prediction for Time Series
36-467/36-667
24 September 2020
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\]
In our last episode
- General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
- Prediction is also \(\Expect{Y} + \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y} \cdot (\vec{Z} - \Expect{\vec{Z}})\)
- Today: application to time series
- What can this do for us?
- How do we find the covariances?
Optimal linear prediction for time series
- Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
- Desired: prediction of \(X(t_0)\)
\[\begin{eqnarray}
m(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\
\alpha & = & \Expect{X(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\
\vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\
\Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\
\vdots & \vdots & \ldots & \vdots\\
\Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{X(t_0), X(t_1)}\\
\Cov{X(t_0), X(t_2)}\\ \vdots \\ \Cov{X(t_0), X(t_n)}\end{array}\right]
\end{eqnarray}\]
Interpolation
- Time series often have gaps
- Instruments fail, people mess up, circumstances…
- What happened at the times in between the observations?
Back to Kyoto
- A lot of what we see in this plot is just made up
What we didn’t tell R to make up
When did the cherries flower in 1015?
- We need \(\Cov{X(1015), X(t_i)}\) for every year \(t_i\) where we have data
- We need \(\Expect{X(t_i)}\), \(\Var{X(t_i)}\) and \(\Cov{X(t_i), X(t_j)}\) ditto
- We need \(\Expect{X(1015)}\)
Getting the expectations and covariances
- We only see each \(X(t_i)\) once
- Maybe gives us an idea of \(\Expect{X(t_i)}\)
- but not \(\Var{X(t_i)}\) or \(\Cov{X(t_i), X(t_j)}\)
- let alone \(\Cov{X(t_0), X(t_i)}\)
- We could repeat the experiment many times
- We could make assumptions
Repeating the experiment
\[\begin{eqnarray}
\overline{x}(t) \equiv \frac{1}{n}\sum_{i=1}^{n}{x^{(i)}(t)} & \rightarrow & \Expect{X(t)}\\
\frac{1}{n}\sum_{i=1}^{n}{(x^{(i)}(t) - \overline{x}(t)) (x^{(i)}(s) - \overline{x}(s))} & \rightarrow & \Cov{X(t), X(s)}
\end{eqnarray}\]
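As a minimal sketch of this pooling across replicates in R (the matrix x of replicates below is simulated, purely for illustration):
# Sketch: estimating E[X(t)] and Cov[X(t), X(s)] from replicated series
# x is a hypothetical matrix with one replicate per row, one time point per column
n.reps <- 200
n.times <- 50
x <- matrix(rnorm(n.reps * n.times), nrow = n.reps)  # stand-in replicates
xbar <- colMeans(x)  # estimates E[X(t)] at each time point
# Average of products of centered values, matching the formula above
Sigma.hat <- crossprod(sweep(x, 2, xbar)) / n.reps
Sigma.hat[1, 2]  # estimate of Cov[X(t_1), X(t_2)]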
Making assumptions
- Assume some covariances (and expectations) are equal
- Weaker: assume some of them are similar
Stationarity (assuming covariances are equal)
- A time series is weakly stationary when \(\Expect{X(t)} = \Expect{X(s)}\) (constant in time) and \(\Cov{X(t), X(s)} = \gamma(|t-s|)\)
- \(\gamma(h)\) is the autocovariance function at lag \(h\)
- this is also called “second-order” or “wide-sense” stationarity
- We’ll cover strong / full / strict stationarity later
- Weak stationarity implies all the expectations are the same
- Weak stationarity implies \(\Cov{X(t), X(t+h)} = \Cov{X(s), X(s+h)}\) so lots of covariances are the same
- Weak stationarity lets us pool information: \[\begin{eqnarray}
\overline{x} \equiv \frac{1}{n}\sum_{i=1}^{n}{x(t_i)} & \rightarrow & \Expect{X(0)}\\
\frac{1}{n}\sum_{i=1}^{n}{(x(t_i) - \overline{x})(x(t_i + h) - \overline{x})} & \rightarrow & \gamma(h)
\end{eqnarray}\]
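A minimal sketch of this pooled estimate at a single lag, using a simulated, equally-spaced series with no gaps (not the Kyoto data):
# Sketch: pooled (stationary) estimate of gamma(h) for one lag, by hand
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))  # stand-in series
xbar <- mean(x)
gamma.hat <- function(h) {
    n <- length(x)
    sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
}
gamma.hat(2)
# acf() does the same averaging at every lag up to lag.max:
acf(x, lag.max = 2, type = "covariance", plot = FALSE)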
The autocovariance function
- \(\gamma(0) = \Var{X(t)}\) (constant in \(t\))
- \(\rho(h) \equiv \frac{\gamma(h)}{\gamma(0)} =\) autocorrelation function
Assuming stationarity…
\[\begin{eqnarray}
\mathbf{v} & \equiv & \left[\begin{array}{cccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\
\gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\
\vdots & \vdots & \ldots & \vdots\\
\gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]\\
\mathbf{c} & = & \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots \\ \gamma(|t_0 -t_n|)\end{array} \right]\\
\vec{\beta} & = & \mathbf{v}^{-1}\mathbf{c}\\
m(t_0) & = & \overline{x} + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) -\overline{x}\\
X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right] = \overline{x} + \mathbf{c}^{T}\mathbf{v}^{-1} \left[\begin{array}{c} X(t_1) -\overline{x}\\
X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]\\
\Var{X(t_0) - m(t_0)} & = & \gamma(0) - \mathbf{c}^T\mathbf{v}^{-1} \mathbf{c}
\end{eqnarray}\]
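A minimal sketch of this linear algebra, using a made-up exponential autocovariance rather than anything estimated from data (every number here is illustrative):
# Sketch of the calculation above with an assumed autocovariance function
gamma <- function(h) { exp(-abs(h) / 10) }  # made-up ACF, not from data
t.obs <- c(1, 3, 4, 8, 9)  # times with observations
t0 <- 6  # time to predict
v <- outer(t.obs, t.obs, function(t, s) gamma(t - s))  # gamma(|t_i - t_j|)
c.vec <- gamma(t0 - t.obs)  # gamma(|t_0 - t_i|)
beta <- solve(v, c.vec)  # equivalent to solve(v) %*% c.vec
x.obs <- rnorm(length(t.obs))  # stand-in, already-centered observations
m.t0 <- sum(beta * x.obs)  # prediction of the (centered) X(t_0)
pred.var <- gamma(0) - sum(c.vec * beta)  # Var[X(t_0) - m(t_0)]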
In R
acf(x, lag.max, type, plot, na.action, ...)
- x = a time series (e.g., a vector) or a data frame of multiple time series
- lag.max = maximum value of the lag \(h\)
  - Calculates \(\gamma(0), \gamma(1), \ldots \gamma(h)\) and stops there
- type = correlation (default) or covariance?
- plot = make a plot? (default)
- na.action = how to handle NAs? The default is to give up; na.pass will use complete pairs
In R
kyoto.acf <- acf(kyoto$Flowering.DOY, lag.max = 100, type = "covariance", na.action = na.pass)
In R
kyoto.acf[0:5]
##
## Autocovariances of series 'kyoto$Flowering.DOY', by lag
##
## 0 1 2 3 4 5
## 41.54 8.71 8.30 8.46 6.69 6.78
kyoto.acf[0:5]$acf
## , , 1
##
## [,1]
## [1,] 41.540228
## [2,] 8.706940
## [3,] 8.304284
## [4,] 8.464531
## [5,] 6.692514
## [6,] 6.780411
When was the flowering in 1015?
# Years near 1015 where we actually have flowering dates
available.years <- with(na.omit(kyoto), Year.AD[Year.AD > 1015 - 49 & Year.AD < 1015 + 49])
historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
# Covariance of X(1015) with each of those years, read off the ACF by lag
CovYZ <- matrix(kyoto.acf[abs(available.years - 1015)]$acf, ncol = 1)
# Variance matrix of the conditioning years: autocovariance at every pairwise lag
year.lags <- outer(available.years, available.years, "-")
year.lags <- abs(year.lags)
VarZ <- kyoto.acf[year.lags]$acf
VarZ <- matrix(VarZ, ncol = length(available.years), byrow = FALSE)
Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
# Optimal linear prediction: mean + beta . (Z - mean), with beta = VarZ^{-1} CovYZ
(fitted.value <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% CovYZ)
## [,1]
## [1,] 106.2388
# Standard error of the prediction: sqrt(gamma(0) - c^T v^{-1} c)
(fitted.value.se <- sqrt(as.matrix(kyoto.acf[0]$acf) - t(CovYZ) %*% solve(VarZ) %*% CovYZ))
## [,1]
## [1,] 5.87276
When was the flowering in 1015?
This is a lot of work…
…and we’d need to re-do most of it for every other year
… so we should write a function
kyoto.prediction <- function(times) {
    historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
    historical.variance <- as.matrix(kyoto.acf[0]$acf)
    fits <- matrix(0, nrow = length(times), ncol = 3)
    colnames(fits) <- c("time", "fit", "se")
    fits[, "time"] <- times
    rownames(fits) <- paste(times)
    for (t in times) {
        # Years with data within roughly +/- 48 years of t, excluding t itself
        available.years <- with(na.omit(kyoto), Year.AD[Year.AD > t - 49 & Year.AD < t + 49])
        available.years <- setdiff(available.years, t)
        # Covariance of X(t) with the conditioning years, and their variance matrix
        CovYZ <- matrix(kyoto.acf[abs(available.years - t)]$acf, ncol = 1)
        year.lags <- outer(available.years, available.years, "-")
        year.lags <- abs(year.lags)
        VarZ <- kyoto.acf[year.lags]$acf
        VarZ <- matrix(VarZ, ncol = length(available.years), byrow = FALSE)
        Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
        # Optimal linear prediction and its standard error
        fits[paste(t), "fit"] <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% CovYZ
        fits[paste(t), "se"] <- sqrt(max(0, historical.variance - t(CovYZ) %*% solve(VarZ) %*% CovYZ))
    }
    return(data.frame(fits))
}
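The interpolation plots on the next slides, and the all.fits object used in the backup slides, presumably come from a call along these lines (the exact arguments are my guess):
# Presumed call producing fits for every year in the record
all.fits <- kyoto.prediction(kyoto$Year.AD)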
Finally, some interpolation: close up
Finally, some interpolation: zoom out
Stationarity and trend-plus-fluctuation
- We can always say \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\Expect{X(t)} =\TrueRegFunc(t)\) and \(\Expect{\TrueNoise(t)} = 0\)
- If \(X\) is (weakly) stationary, then
- \(\TrueRegFunc(t) = \mu(0)\), a constant
- \(\Cov{\TrueNoise(t), \TrueNoise(s)} = \Cov{X(t), X(s)} = \gamma(|t-s|)\)
- Assuming a flat trend is pretty strong…
Stationary fluctuations around a trend
- If \(\TrueNoise\) is weakly stationary, then \(X - \TrueRegFunc\) is weakly stationary
- so \(X - \EstRegFunc\) should be (approximately) weakly stationary
- Recipe:
- Use smoothing to estimate \(\Expect{X(t)}\) as \(\EstRegFunc(t)\)
- Find ACF from \(\EstNoise(t) = X(t) - \EstRegFunc(t)\)
- Warning: Yule-Slutsky says there are bound to be some correlations in \(\EstNoise\)…
- Now get the coefficients and predict
Detrending the cherry blossoms with a spline
kyoto.spline <- with(na.omit(kyoto), smooth.spline(x = Year.AD, y = Flowering.DOY))
# Residuals exist only for years with data; pad with NAs for the missing years
residuals.with.NAs <- rep(NA, times = nrow(kyoto))
residuals.with.NAs[!is.na(kyoto$Flowering.DOY)] <- na.omit(kyoto$Flowering.DOY) - kyoto.spline$y
kyoto$fluctuation <- residuals.with.NAs
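The next step of the recipe would be to estimate the ACF from these fluctuations; a sketch, reusing the earlier acf() settings (the lag.max choice is mine):
# ACF of the detrended fluctuations, for use in place of kyoto.acf above
fluctuation.acf <- acf(kyoto$fluctuation, lag.max = 100, type = "covariance", na.action = na.pass)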
Detrending the cherry blossoms with a spline
Covariance estimation
- acf uses the sample covariance
- Estimates covariance at each lag separately
- Sometimes inefficient
- Covariances might have known form, e.g., \(\gamma(h) = \gamma(0)e^{-h/\tau}\)
- We’ll look more at that next time, in the spatial context
- Even if we don’t believe something like that, we might believe that \(\gamma(h)\) should be close to \(\gamma(h-1)\) and \(\gamma(h+1)\) \(\Rightarrow\) use smoothing
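A sketch of that smoothing idea, applied to the lag-by-lag sample autocovariances already stored in kyoto.acf (the choice of smoother is arbitrary):
# Smooth the sample autocovariances across lags
lags <- 0:100  # matches lag.max used for kyoto.acf
gamma.sample <- as.vector(kyoto.acf$acf)  # sample autocovariances at lags 0, ..., 100
ok <- !is.na(gamma.sample)
gamma.smooth <- smooth.spline(x = lags[ok], y = gamma.sample[ok])
plot(lags, gamma.sample, xlab = "lag h", ylab = "autocovariance")
lines(gamma.smooth)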
Checking stationarity
- Divide the data into intervals
- Re-estimate ACF on each interval
- ACF shouldn’t change much
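A minimal sketch of the kind of computation behind the next two plots (the original code isn't shown; the number of intervals and the plotting details are guesses):
# Re-estimate the ACF on successive blocks of years and overlay the results
blocks <- split(kyoto$Flowering.DOY, cut(kyoto$Year.AD, breaks = 4))
block.acfs <- lapply(blocks, acf, lag.max = 50, plot = FALSE, na.action = na.pass)
plot(0:50, block.acfs[[1]]$acf, type = "l", ylim = c(-1, 1), xlab = "lag h", ylab = "ACF")
for (b in 2:length(block.acfs)) {
    lines(0:50, block.acfs[[b]]$acf, col = b)
}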
Checking stationarity: raw data
Checking stationarity: after spline detrending
Checking stationarity
- Divide the data into intervals
- Re-estimate ACF on each interval
- Shouldn’t change much
- How much difference is too much? (Will answer later with simulation)
- Some inherent pitfalls:
- Very slight non-stationarity looks pretty stationary
- Stationary but with \(\gamma(h) \rightarrow 0\) as \(h\rightarrow\infty\) very slowly looks pretty non-stationary
- “Long-range correlations”, “slowly-decaying correlations”
- Trends show up as slowly-decaying correlations
Multiple time series
- \(X\) and \(U\) are jointly weakly stationary when
- \(X\) is weakly stationary, with ACF \(\gamma_X\)
- \(U\) is weakly stationary, with ACF \(\gamma_U\)
- \(\Cov{X(t), U(s)} = \gamma_{UX}(|t-s|)\)
- ccf in R will calculate the cross-covariance (or cross-correlation) function
- So will acf if it's given a matrix or data frame
- Plug and chug: if predicting \(X\) from \(U\)
- \(\gamma_U\) goes into the variance matrix \(\mathbf{v}\)
- \(\gamma_{UX}\) goes into the covariance matrix \(\mathbf{c}\)
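A small sketch with simulated series, just to show the bookkeeping (u and x here are made up; ccf() and acf() do the estimation):
# Cross- and auto-covariances for two jointly stationary series (simulated)
u <- as.numeric(arima.sim(model = list(ar = 0.7), n = 400))
x <- 0.5 * u + rnorm(400)
joint.acf <- acf(cbind(u, x), lag.max = 20, type = "covariance", plot = FALSE)
joint.acf$acf[1, , ]  # lag-0 variance/covariance matrix of (U, X)
ccf(u, x, lag.max = 20, type = "covariance", plot = FALSE)  # the cross-covariances alone
# To predict X(t_0) from U(t_1), ..., U(t_n): fill v with gamma_U(|t_i - t_j|),
# fill c with gamma_UX(|t_0 - t_i|), then beta <- solve(v, c) as before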
Where did all this come from?
- Norbert Wiener (1894–1964)
- One of the great mathematicians of the 20th century
- If you think you’ve got issues with parents pushing you to succeed in school, read Wiener (1953)
- Worked very closely with engineers
- His ideas are why we talk about “feedback”, “information”, “cyber-”, etc. (Wiener 1961, 1954)
- (Photo from Kline (2017))
Wiener’s research in 1942
- Anti-aircraft fire control
- Aim at where the target aircraft will be
- The target moves erratically
- \(\Rightarrow\) Need to extrapolate a random process
- Wiener’s solution: basically what we’ve just done
- Plus the continuous-time version
- Plus implementation using 1940s electronics
- Declassified after the war as Wiener (1949)
- Parallel work by Kolmogorov (1941)
- Wiener and Kolmogorov’s big idea was to realize that optimal linear prediction didn’t need the usual regression assumptions (Gaussian noise, independent variables, etc.)
Summing up
- Optimal linear prediction needs to know the trend and the autocovariance function
- Estimating them from one time series needs assumptions
- Remove trend
- Find ACF from residuals / estimated fluctuations
- Once we have the ACF, finding coefficients is just linear algebra
Backup: Not inverting the variance matrix
- \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
- But inverting an \(n\times n\) matrix takes \(O(n^3)\) operations
- Then matrix-multiplying \(\Var{\vec{Z}}^{-1}\) by \(\Cov{\vec{Z}, Y}\) is \(O(n^2)\), and multiplying \(\vec{\beta}\) by \(\vec{Z}\) is \(O(n)\); since we have to do all of this \(n\) times, the overall time complexity of getting \(n\) fits is \(n(O(n^3) + O(n^2) + O(n)) = O(n^4)\)
- Strictly speaking we don’t need to invert \(\Var{\vec{Z}}\); we just need to find the \(\vec{\beta}\) which solves the equation \(\Var{\vec{Z}} \vec{\beta} = \Cov{\vec{Z}, Y}\)
- There are algorithms for solving a linear system of equations like this which take only \(O(n^2)\) time
- So getting all \(n\) predictions is just \(O(n^3)\)
- May not sound impressive but if \(n=1000\) you’ll be very glad of that factor of \(n\)
- Details of how to solve without inverting are left to numerical linear algebra texts
- Programming exercise: re-do my code for the optimal linear predictor so it doesn't invert the variance matrix but does solve() for the optimal coefficients
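The key difference, on a toy system (not the kyoto objects):
# solve(A, b) solves A %*% beta == b directly, without forming A^{-1}
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
solve(A) %*% b  # inverts, then multiplies
solve(A, b)  # solves the system directly: same answer, no explicit inverse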
Backup: How good is the optimal linear predictor?
all.fits$std.residuals <- (kyoto$Flowering.DOY - all.fits$fit)/all.fits$se
mean(all.fits$std.residuals, na.rm = TRUE)
## [1] -0.008035619
sd(all.fits$std.residuals, na.rm = TRUE)
## [1] 1.007411
Ideally: mean 0, standard deviation 1
Backup: How good is the optimal linear predictor?
This is a pretty good random scatter of points
Backup: fancier covariance estimation
- Two (equivalent) definitions of the covariance: \[
\Cov{X(t), X(s)} = \Expect{X(t) X(s)} - \Expect{X(t)}\Expect{X(s)} = \Expect{(X(t) - \Expect{X(t)}) (X(s) - \Expect{X(s)})}
\]
- Use the 2nd form: define \[
\Gamma(t,s) \equiv (X(t) - \Expect{X(t)})(X(s) - \Expect{X(s)})
\] so \(\Cov{X(t), X(s)} = \Expect{\Gamma(t,s)}\)
- Now assume constant mean so we can estimate \[
\widehat{\Gamma}(t,s) = (X(t) - \overline{x}) (X(s) - \overline{x})
\]
- Now assume stationarity so \(\Cov{X(t), X(s)} = \Expect{\Gamma(t,s)} = \gamma(|t-s|)\)
- Finally assume \(\gamma(h)\) changes slowly in \(h\)
- We can estimate \(\gamma(h)\) by finding pairs \(t, s\) with \(|t-s| \approx h\) and averaging \(\widehat{\Gamma}(t,s)\)
- Or kernel smoothing or spline smoothing or…
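A sketch of that last idea, reusing the spline fluctuations from earlier (the 100-year lag cutoff and the spline smoother are my choices):
# Pairwise products Gamma-hat(t, s), grouped by lag and smoothed
x <- kyoto$fluctuation  # detrended fluctuations, roughly constant mean
xbar <- mean(x, na.rm = TRUE)
Gamma.hat <- outer(x - xbar, x - xbar)  # (X(t) - xbar)(X(s) - xbar) for all pairs
lag <- abs(outer(kyoto$Year.AD, kyoto$Year.AD, "-"))
keep <- !is.na(Gamma.hat) & lag <= 100
gamma.curve <- smooth.spline(x = lag[keep], y = Gamma.hat[keep])
predict(gamma.curve, x = 0:10)$y  # smoothed estimates of gamma(0), ..., gamma(10)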
Backup: stationary linear predictor vs. spline
References
Kline, Ronald R. 2017. The Cybernetics Moment: Or Why We Call Our Age the Information Age. Baltimore, Maryland: Johns Hopkins University Press.
Wiener, Norbert. 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. Cambridge, Massachusetts: The Technology Press of the Massachusetts Institute of Technology.
———. 1953. Ex-Prodigy: My Childhood and Youth. New York: Simon & Schuster.
———. 1954. The Human Use of Human Beings: Cybernetics and Society. 2nd ed. Garden City, New York: Doubleday.
———. 1961. Cybernetics: Or, Control and Communication in the Animal and the Machine. 2nd ed. Cambridge, Massachusetts: MIT Press.