\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \]
For a univariate time-series \(X(0), X(1), \ldots X(t), \ldots\), the AR\((p)\) model is \[ X(t) = a + \sum_{j=1}^{p}{b_j X(t-j)} + \epsilon(t) \] where \(\epsilon(t)\) are uncorrelated (and usually IID) innovations, with common variance \(\tau^2\). We also need to specify distributions for \(X(0), X(1), \ldots X(p-1)\).
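As a concrete illustration, here is a minimal simulation sketch in Python (numpy); the function name `simulate_ar` and the choice of Gaussian innovations are mine, not part of the model:

```python
import numpy as np

def simulate_ar(a, b, tau, n, x0, rng=None):
    """Simulate n values of an AR(p): X(t) = a + sum_j b_j X(t-j) + epsilon(t),
    with b = (b_1, ..., b_p), IID N(0, tau^2) innovations, and initial
    values x0 = (X(0), ..., X(p-1))."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(b)
    x = np.empty(n)
    x[:p] = x0
    for t in range(p, n):
        lags = x[t-p:t][::-1]            # (X(t-1), X(t-2), ..., X(t-p))
        x[t] = a + np.dot(b, lags) + rng.normal(0.0, tau)
    return x

# Example: 500 values from an AR(2) with a = 0.5, b = (0.6, 0.2), tau = 1
x = simulate_ar(0.5, np.array([0.6, 0.2]), 1.0, 500, x0=[0.0, 0.0])
```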
On the other hand, for a multivariate, i.e., vector, time-series \(\vec{Y}(0), \vec{Y}(1), \ldots \vec{Y}(t), \ldots\), the VAR(1) model is \[ \vec{Y}(t) = \vec{\alpha} + \mathbf{\beta}\vec{Y}(t-1) + \vec{\epsilon}(t) \] where now the innovations are uncorrelated across time, and have a common variance-covariance matrix, \(\Var{\vec{\epsilon}(t)} = \mathbf{T}\). As a first-order model, this also needs a distribution for \(\vec{Y}(0)\). I have switched to Greek letters for the VAR, to help keep it clear that these aren’t the same coefficients as in the univariate model.
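A matching sketch for the VAR(1) (again Python/numpy; the example coefficients and the multivariate-Gaussian innovations are illustrative assumptions, not anything fixed by the model):

```python
import numpy as np

def simulate_var1(alpha, beta, T, n, y0, rng=None):
    """Simulate n steps of a VAR(1): Y(t) = alpha + beta Y(t-1) + eps(t),
    with eps(t) IID N(0, T).  Returns an (n, k) array whose rows are Y(0..n-1)."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(alpha)
    Y = np.empty((n, k))
    Y[0] = y0
    for t in range(1, n):
        Y[t] = alpha + beta @ Y[t-1] + rng.multivariate_normal(np.zeros(k), T)
    return Y

# Example: a bivariate VAR(1)
alpha = np.array([0.1, -0.2])
beta = np.array([[0.5, 0.1],
                 [0.2, 0.3]])
T = np.array([[1.0, 0.3],
              [0.3, 1.0]])
Y = simulate_var1(alpha, beta, T, 500, y0=np.zeros(2))
```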
That said, any AR\((p)\) can be turned into a VAR(1), as follows. Let \[ \mathbf{\beta} = \left[\begin{array}{ccccc} b_1 & b_2 & \ldots & b_{p-1} & b_p\\ 1 & 0 & \ldots & 0 & 0\\ 0 & 1 & \ldots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \ldots & 1 & 0 \end{array}\right] \] That is, the first row of \(\mathbf{\beta}\) contains the \(b_j\) coefficients of the AR\((p)\), and each subsequent row has a single 1 (in column \(i-1\) for row \(i\)) and all other entries 0. Let \[ \vec{\alpha} = \left[\begin{array}{c} a\\ 0 \\ \vdots \\ 0\end{array}\right] \] and let \(\mathbf{T}_{11} = \tau^2\), while otherwise \(\mathbf{T}_{ij} = 0\). Finally, let \[ \vec{Y}(0) = \left[\begin{array}{c} X(p-1)\\ X(p-2)\\ \vdots \\ X(0)\end{array}\right] \]
What will happen when we evolve this VAR(1)? The next value of the state vector will clearly be \[ \vec{Y}(1) = \left[\begin{array}{c} a + \sum_{j=1}^{p}{b_j X(p-j)}+\epsilon_1(1)\\ X(p-1)\\ X(p-2) \\ \vdots \\ X(1)\end{array}\right] \] The first coordinate of the innovation vector, \(\epsilon_1(1)\), plays the role of the AR innovation \(\epsilon(p)\), so the first coordinate of \(\vec{Y}(1)\) is \(X(p)\) from the univariate time-series. Iterating, the first coordinate of \(\vec{Y}(2)\) will be \(X(p+1)\), and so forth. In general, the first coordinate of \(\vec{Y}(t)\) is \(X(t+p-1)\), the second coordinate is \(X(t+p-2)\), etc.
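Here is a sketch of the whole construction, checking numerically that the first coordinate of the companion-form VAR(1) reproduces the original AR\((p)\) when both are driven by the same innovations (the helper `ar_to_var1` and the example coefficients are mine, purely for illustration):

```python
import numpy as np

def ar_to_var1(a, b, tau2):
    """Companion-form (alpha, beta, T) for an AR(p) with intercept a,
    coefficients b = (b_1, ..., b_p), and innovation variance tau2."""
    p = len(b)
    beta = np.zeros((p, p))
    beta[0, :] = b                      # first row: the AR coefficients
    beta[1:, :-1] = np.eye(p - 1)       # a single 1 just below the diagonal in later rows
    alpha = np.zeros(p)
    alpha[0] = a
    T = np.zeros((p, p))
    T[0, 0] = tau2
    return alpha, beta, T

# An AR(3), driven by a fixed sequence of innovations
a, b, tau2 = 0.5, np.array([0.3, 0.2, 0.1]), 1.0
p, n = len(b), 200
rng = np.random.default_rng(17)
eps = rng.normal(0.0, np.sqrt(tau2), n)

# Direct AR(p) recursion, with X(0) = ... = X(p-1) = 0
x = np.zeros(n)
for t in range(p, n):
    x[t] = a + np.dot(b, x[t-p:t][::-1]) + eps[t]

# Companion-form VAR(1), started from Y(0) = (X(p-1), ..., X(0))
alpha, beta, T = ar_to_var1(a, b, tau2)
y = x[p-1::-1].copy()
for t in range(p, n):
    e = np.zeros(p)
    e[0] = eps[t]                       # innovation vector (epsilon(t), 0, ..., 0)
    y = alpha + beta @ y + e
    assert np.isclose(y[0], x[t])       # first coordinate reproduces X(t)
```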
A VAR(1) can be stationary only if its variance can be constant over time. Whether constant variance is possible, in turn, depends on the eigenvalues of \(\mathbf{\beta}\).
To see this, look at the equation for the variance of \(\vec{Y}(t)\): \[\begin{eqnarray} \Var{\vec{Y}(t)} & = & \Var{\vec{\alpha} + \mathbf{\beta}\vec{Y}(t-1) + \vec{\epsilon}(t)}\\ & = & \Var{\mathbf{\beta}\vec{Y}(t-1) + \vec{\epsilon}(t)}\\ & = & \Var{\mathbf{\beta}\vec{Y}(t-1)} + \Var{\vec{\epsilon}(t)}\\ & = & \mathbf{\beta}\Var{\vec{Y}(t-1)}\mathbf{\beta}^T + \mathbf{T} \end{eqnarray}\] (the third line uses the fact that \(\vec{\epsilon}(t)\) is uncorrelated with \(\vec{Y}(t-1)\)). The process can only be stationary if \(\Var{\vec{Y}(t)} = \Var{\vec{Y}(t-1)}\), say \(\mathbf{\Sigma}\), so we need solutions to the equation \[ \mathbf{\Sigma} = \mathbf{\beta}\mathbf{\Sigma}\mathbf{\beta}^T + \mathbf{T} \] But this implies that \[ \mathbf{\Sigma} - \mathbf{\beta}\mathbf{\Sigma}\mathbf{\beta}^T = \mathbf{T} \] Since \(\mathbf{T}\) is a covariance matrix it is positive semi-definite, and we will assume it is in fact positive definite, meaning that for any vector \(\vec{v} \neq 0\), \(\vec{v}^T \mathbf{T}\vec{v} > 0\). Thus \[ \vec{v}^T\mathbf{\Sigma}\vec{v} > \vec{v}^T(\mathbf{\beta}\mathbf{\Sigma}\mathbf{\beta}^T)\vec{v} \] Suppose \(\vec{v}\) is an eigenvector of \(\mathbf{\beta}^T\) with eigenvalue \(\lambda\). Then \[\begin{eqnarray} \vec{v}^T(\mathbf{\beta}\mathbf{\Sigma}\mathbf{\beta}^T)\vec{v} & = & \lambda \vec{v}^T \mathbf{\Sigma} \vec{v} \lambda\\ & = & |\lambda|^2 \vec{v}^T \mathbf{\Sigma} \vec{v} \end{eqnarray}\] so \(\vec{v}^T\mathbf{\Sigma}\vec{v} > |\lambda|^2 \vec{v}^T\mathbf{\Sigma}\vec{v} \geq 0\), which is possible only if \(|\lambda|^2 < 1\). Hence \(|\lambda| < 1\).
To sum up, a VAR(1) can be stationary only if the eigenvalues of its slope matrix \(\mathbf{\beta}\) are all inside the unit circle. Conversely, if all the eigenvalues are inside the unit circle, there is some invariant covariance matrix \(\mathbf{\Sigma}\) for any innovation-variance matrix \(\mathbf{T}\).
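To check this numerically, we can verify that the eigenvalues of \(\mathbf{\beta}\) are inside the unit circle and then find \(\mathbf{\Sigma}\), either by iterating the variance recursion from above or by handing the fixed-point equation to scipy's discrete-Lyapunov solver, which solves equations of exactly the form \(\mathbf{\Sigma} = \mathbf{\beta}\mathbf{\Sigma}\mathbf{\beta}^T + \mathbf{T}\). A sketch only; the example matrices are arbitrary:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

beta = np.array([[0.5, 0.1],
                 [0.2, 0.3]])
T = np.array([[1.0, 0.3],
              [0.3, 1.0]])

# Stationarity check: all eigenvalues of beta strictly inside the unit circle
assert np.all(np.abs(np.linalg.eigvals(beta)) < 1)

# Iterate Var[Y(t)] = beta Var[Y(t-1)] beta^T + T until it converges
Sigma = np.zeros_like(T)
for _ in range(1000):
    Sigma = beta @ Sigma @ beta.T + T

# Direct solution of Sigma = beta Sigma beta^T + T
Sigma_direct = solve_discrete_lyapunov(beta, T)
assert np.allclose(Sigma, Sigma_direct)
```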
We’ve seen how to represent AR\((p)\) processes as VAR(1) processes on a higher-dimensional state space. The representation makes it clear that either both the AR\((p)\) and the VAR(1) are stationary, or neither is. So a necessary condition for an AR\((p)\) to be stationary is that all the eigenvalues of the corresponding VAR(1)’s \(\mathbf{\beta}\) matrix be inside the unit circle. What does this mean in terms of the original coefficients of the AR\((p)\) model?
Recall that eigenvalues are solutions to \[ |\mathbf{\beta} - \lambda \mathbf{I}| = 0 \] In this case, that amounts to \[ \left|\begin{array}{ccccc} b_1-\lambda & b_2 & \ldots & b_{p-1} & b_p\\ 1 & -\lambda & \ldots & 0 & 0\\ 0 & 1 & \ldots & 0 & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \ldots & 1 & -\lambda \end{array}\right| = 0 \]
If \(p=3\) we can evaluate this explicitly: \[ (b_1 - \lambda)\lambda^2 + b_2 \lambda + b_3 = 0 \] or, rearranging, \(\lambda^3 = b_1 \lambda^2 + b_2 \lambda + b_3\).
The general pattern continues to hold for higher-order models: the eigenvalues \(\lambda\) solve the equation \[ \left(\sum_{j=1}^{p}{b_j \lambda^{p-j}}\right) - \lambda^p = 0 \] The VAR(1), and hence the corresponding AR\((p)\), can be stationary only if all the solutions \(\lambda\) to this equation lie inside the unit circle, i.e., only if \(|\lambda_i| < 1\) for all \(i \in 1:p\).
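As a quick sanity check (Python/numpy; the AR(3) coefficients are arbitrary), the roots of this polynomial coincide with the eigenvalues of the companion matrix \(\mathbf{\beta}\), and either can be used to test the stationarity condition:

```python
import numpy as np

b = np.array([0.3, 0.2, 0.1])           # AR(3) coefficients b_1, b_2, b_3
p = len(b)

# Roots of lambda^p - b_1 lambda^{p-1} - ... - b_p
# (np.roots takes coefficients from the highest degree down)
roots = np.roots(np.concatenate(([1.0], -b)))

# Eigenvalues of the companion matrix beta
beta = np.zeros((p, p))
beta[0, :] = b
beta[1:, :-1] = np.eye(p - 1)
eigvals = np.linalg.eigvals(beta)

# Same set of numbers, up to ordering and round-off
assert np.allclose(np.sort(np.abs(roots)), np.sort(np.abs(eigvals)))
print("moduli:", np.sort(np.abs(roots)))
print("stationary?", bool(np.all(np.abs(roots) < 1)))
```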
If you open up a typical reference on time series, you will see the assertion that an AR\((p)\) model is stationary only if all the roots of the polynomial \[ z^p - \sum_{j=1}^{p}{b_j z^{p-j}} \] lie inside the unit circle. Since the roots of a polynomial are the values of \(z\) which make the polynomial equal to 0, this is exactly the same criterion we just got from the VAR(1) representation.
In some parts of the time-series literature, especially in time-series econometrics, people talk, and think, about checking stationarity by checking for “unit roots”. What they have in mind is seeing whether any of the eigenvalues of the \(\mathbf{\beta}\) matrix are on the unit circle (a “unit root” of the characteristic polynomial) or outside it. This is common, but inexact in two important ways: