11.3 Vector autoregressions

One limitation of the models we have considered so far is that they impose a unidirectional relationship: the forecast variable is influenced by the predictor variables, but not vice versa. However, there are many cases where the reverse should also be allowed for, where all variables affect each other. In Chapter 9, the changes in personal consumption expenditure (\(C_t\)) were forecast based on the changes in personal disposable income (\(I_t\)). However, in this case a bi-directional relationship may be more suitable: an increase in \(I_t\) will lead to an increase in \(C_t\), and vice versa.

An example of such a situation occurred in Australia during the Global Financial Crisis of 2008–2009. The Australian government issued stimulus packages that included cash payments in December 2008, just in time for Christmas spending. As a result, retailers reported strong sales and the economy was stimulated. Consequently, incomes increased.

Such feedback relationships are allowed for in the vector autoregressive (VAR) framework. In this framework, all variables are treated symmetrically. They are all modelled as if they influence each other equally. In more formal terminology, all variables are now treated as “endogenous”. To signify this we now change the notation and write all variables as \(y\)s: \(y_{1,t}\) denotes the \(t\)th observation of variable \(y_1\), \(y_{2,t}\) denotes the \(t\)th observation of variable \(y_2\), and so on.

A VAR model is a generalisation of the univariate autoregressive model for forecasting a collection of variables; that is, a vector of time series.23 It comprises one equation per variable considered in the system. The right-hand side of each equation includes a constant and lags of all the variables in the system. To keep it simple, we will consider a two-variable VAR with one lag. We write a 2-dimensional VAR(1) as \[\begin{align} \label{var1a} y_{1,t} &= c_1+\phi_{11,1}y_{1,t-1}+\phi_{12,1}y_{2,t-1}+e_{1,t} \tag{11.1}\\ y_{2,t} &= c_2+\phi_{21,1}y_{1,t-1}+\phi_{22,1}y_{2,t-1}+e_{2,t} \tag{11.2} \end{align}\] where \(e_{1,t}\) and \(e_{2,t}\) are white noise processes that may be contemporaneously correlated. The coefficient \(\phi_{ii,\ell}\) captures the influence of the \(\ell\)th lag of variable \(y_i\) on itself, while the coefficient \(\phi_{ij,\ell}\) captures the influence of the \(\ell\)th lag of variable \(y_j\) on \(y_i\).
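
To make the notation concrete, the following minimal sketch simulates data from the 2-dimensional VAR(1) in equations (11.1)–(11.2); all coefficient values are chosen purely for illustration, and the two error series are kept independent for simplicity.

# Simulate from the 2-dimensional VAR(1) of equations (11.1)-(11.2).
# The coefficient values below are illustrative, not estimated from data.
set.seed(123)
n <- 100
y1 <- y2 <- numeric(n)
for (t in 2:n) {
  y1[t] <- 0.3 + 0.5 * y1[t-1] + 0.2 * y2[t-1] + rnorm(1)  # c1, phi11, phi12
  y2[t] <- 0.1 + 0.1 * y1[t-1] + 0.4 * y2[t-1] + rnorm(1)  # c2, phi21, phi22
}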

If the series modelled are stationary, we forecast them by fitting a VAR to the data directly (known as a “VAR in levels”). If the series are non-stationary, we take differences to make them stationary and then fit a VAR model (known as a “VAR in differences”). In both cases, the models are estimated equation by equation using the principle of least squares. For each equation, the parameters are estimated by minimising the sum of squared \(e_{i,t}\) values.
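
Because every equation has the same right-hand-side variables, least squares can be applied one equation at a time. A minimal sketch using the simulated y1 and y2 from above (the lag1 helper is ours, defined only for this illustration):

# Equation-by-equation least squares for the VAR(1): regress each
# variable on the first lag of every variable in the system.
lag1 <- function(x) c(NA, x[-length(x)])
eq1 <- lm(y1 ~ lag1(y1) + lag1(y2))  # estimates c1, phi11, phi12
eq2 <- lm(y2 ~ lag1(y1) + lag1(y2))  # estimates c2, phi21, phi22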

The other possibility, which is beyond the scope of this book and therefore not explored here, is that the series may be non-stationary but cointegrated, which means that there exists a linear combination of them that is stationary. In this case, a VAR specification that includes an error correction mechanism (usually referred to as a vector error correction model) should be used, and alternative estimation methods to least squares should be applied.24

Forecasts are generated from a VAR in a recursive manner. The VAR generates forecasts for each variable included in the system. To illustrate the process, assume that we have fitted the 2-dimensional VAR(1) described in equations (11.1)–(11.2), for all observations up to time \(T\). Then the one-step-ahead forecasts are generated by \[\begin{align*} \hat y_{1,T+1|T} &=\hat{c}_1+\hat\phi_{11,1}y_{1,T}+\hat\phi_{12,1}y_{2,T} \\ \hat y_{2,T+1|T} &=\hat{c}_2+\hat\phi_{21,1}y_{1,T}+\hat\phi_{22,1}y_{2,T}. \end{align*}\]

This is the same form as (11.1)–(11.2), except that the errors have been set to zero and the parameters have been replaced with their estimates. For \(h=2\), the forecasts are given by \[\begin{align*} \hat y_{1,T+2|T} &=\hat{c}_1+\hat\phi_{11,1}\hat y_{1,T+1|T}+\hat\phi_{12,1}\hat y_{2,T+1|T}\\ \hat y_{2,T+2|T}&=\hat{c}_2+\hat\phi_{21,1}\hat y_{1,T+1|T}+\hat\phi_{22,1}\hat y_{2,T+1|T}. \end{align*}\] Again, this is the same form as (11.1)–(11.2), except that the errors have been set to zero, the parameters have been replaced with their estimates, and the unknown values of \(y_1\) and \(y_2\) have been replaced with their forecasts. The process can be iterated in this manner for all future time periods.
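
The recursion is straightforward to write out in code. A sketch, using the equation-by-equation estimates eq1 and eq2 from the earlier illustration:

# Recursive forecasts from the estimated VAR(1).
b1 <- coef(eq1); b2 <- coef(eq2)  # each is (intercept, phi_i1, phi_i2)
# h = 1: plug in the last observed values y_{1,T} and y_{2,T}.
f1 <- c(b1[1] + b1[2] * y1[n] + b1[3] * y2[n],
        b2[1] + b2[2] * y1[n] + b2[3] * y2[n])
# h = 2: replace the unknown future values with the h = 1 forecasts.
f2 <- c(b1[1] + b1[2] * f1[1] + b1[3] * f1[2],
        b2[1] + b2[2] * f1[1] + b2[3] * f1[2])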

There are two decisions one has to make when using a VAR to forecast: how many variables (denoted by \(K\)) and how many lags (denoted by \(p\)) should be included in the system. The number of coefficients to be estimated in a VAR is equal to \(K+pK^2\) (or \(1+pK\) per equation). For example, for a VAR with \(K=5\) variables and \(p=3\) lags, there are 16 coefficients per equation, making a total of 80 coefficients to be estimated. The more coefficients that need to be estimated, the larger the estimation error entering the forecast.

In practice, it is usual to keep \(K\) small and include only variables that are correlated with each other, and therefore useful in forecasting each other. Information criteria are commonly used to select the number of lags to be included.

VARs are implemented in the vars package in R. It contains a function VARselect() to choose the number of lags \(p\) using four different information criteria: AIC, HQ, SC and FPE. We have met the AIC before, and SC is simply another name for the BIC (SC stands for Schwarz Criterion, after Gideon Schwarz who proposed it). HQ is the Hannan–Quinn criterion, and FPE is the “Final Prediction Error” criterion.25 Care should be taken when using the AIC, as it tends to choose large numbers of lags. Instead, for VAR models, we prefer to use the BIC.

A criticism VARs face is that they are atheoretical. They are not built on some economic theory that imposes a theoretical structure on the equations. Every variable is assumed to influence every other variable in the system, which makes direct interpretation of the estimated coefficients very difficult. Despite this, VARs are useful in several contexts:

  • forecasting a collection of related variables where no explicit interpretation is required;
  • testing whether one variable is useful in forecasting another (the basis of Granger causality tests);
  • impulse response analysis, where the response of one variable to a sudden but temporary change in another variable is analysed;
  • forecast error variance decomposition, where the proportion of the forecast variance of one variable is attributed to the effects of the other variables (the last three of these uses are illustrated in the sketch after this list).
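
The last three of these uses map directly onto functions in the vars package. A brief sketch, where fit is our name for a VAR fitted to the US consumption data used in the example below:

library(vars)
fit <- VAR(uschange[,1:2], p=3, type="const")
causality(fit, cause="Income")$Granger                    # Granger causality test
plot(irf(fit, impulse="Income", response="Consumption"))  # impulse response analysis
fevd(fit, n.ahead=8)                          # forecast error variance decomposition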

Example: A VAR model for forecasting US consumption

library(vars)
VARselect(uschange[,1:2], lag.max=8, type="const")[["selection"]]
#> AIC(n)  HQ(n)  SC(n) FPE(n) 
#>      5      1      1      5

var <- VAR(uschange[,1:2], p=3, type="const")
serial.test(var, lags.pt=10, type="PT.asymptotic")
#> 
#>  Portmanteau Test (asymptotic)
#> 
#> data:  Residuals of VAR object var
#> Chi-squared = 34, df = 28, p-value = 0.2

summary(var)
#> 
#> VAR Estimation Results:
#> ========================= 
#> Endogenous variables: Consumption, Income 
#> Deterministic variables: const 
#> Sample size: 184 
#> Log Likelihood: -380.155 
#> Roots of the characteristic polynomial:
#> 0.782 0.498 0.498 0.444 0.286 0.286
#> Call:
#> VAR(y = uschange[, 1:2], p = 3, type = "const")
#> 
#> 
#> Estimation results for equation Consumption: 
#> ============================================ 
#> Consumption = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
#> 
#>                Estimate Std. Error t value Pr(>|t|)    
#> Consumption.l1   0.1910     0.0796    2.40  0.01752 *  
#> Income.l1        0.0784     0.0545    1.44  0.15245    
#> Consumption.l2   0.1595     0.0825    1.93  0.05479 .  
#> Income.l2       -0.0271     0.0572   -0.47  0.63683    
#> Consumption.l3   0.2265     0.0797    2.84  0.00502 ** 
#> Income.l3       -0.0145     0.0543   -0.27  0.78906    
#> const            0.2908     0.0830    3.50  0.00058 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Residual standard error: 0.595 on 177 degrees of freedom
#> Multiple R-Squared: 0.213,   Adjusted R-squared: 0.187 
#> F-statistic: 7.99 on 6 and 177 DF,  p-value: 1.22e-07 
#> 
#> 
#> Estimation results for equation Income: 
#> ======================================= 
#> Income = Consumption.l1 + Income.l1 + Consumption.l2 + Income.l2 + Consumption.l3 + Income.l3 + const 
#> 
#>                Estimate Std. Error t value Pr(>|t|)    
#> Consumption.l1   0.4535     0.1156    3.92  0.00013 ***
#> Income.l1       -0.2730     0.0791   -3.45  0.00070 ***
#> Consumption.l2   0.0217     0.1198    0.18  0.85666    
#> Income.l2       -0.0900     0.0831   -1.08  0.27979    
#> Consumption.l3   0.3538     0.1157    3.06  0.00257 ** 
#> Income.l3       -0.0538     0.0787   -0.68  0.49570    
#> const            0.3875     0.1205    3.22  0.00154 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Residual standard error: 0.864 on 177 degrees of freedom
#> Multiple R-Squared: 0.175,   Adjusted R-squared: 0.148 
#> F-statistic: 6.28 on 6 and 177 DF,  p-value: 5.31e-06 
#> 
#> 
#> 
#> Covariance matrix of residuals:
#>             Consumption Income
#> Consumption       0.355  0.185
#> Income            0.185  0.747
#> 
#> Correlation matrix of residuals:
#>             Consumption Income
#> Consumption       1.000  0.359
#> Income            0.359  1.000

The R output above shows the lag length selected by each of the information criteria available in the vars package. There is a large discrepancy between the VAR(5) selected by the AIC and the VAR(1) selected by the BIC. This is not unusual. As a result, we first fit a VAR(1), as selected by the BIC. In similar fashion to the univariate ARIMA methodology, we test that the residuals are uncorrelated using a Portmanteau test.26 The null hypothesis of no serial correlation in the residuals is rejected for both a VAR(1) and a VAR(2), so we fit a VAR(3), for which the null is not rejected.
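
The residual checks for the two smaller models follow the same pattern as the VAR(3) test shown above; a sketch:

# Portmanteau tests for the residuals of the VAR(1) and VAR(2);
# both reject the null of no serial correlation, unlike the VAR(3).
var1 <- VAR(uschange[,1:2], p=1, type="const")
serial.test(var1, lags.pt=10, type="PT.asymptotic")
var2 <- VAR(uschange[,1:2], p=2, type="const")
serial.test(var2, lags.pt=10, type="PT.asymptotic")

The forecasts generated by the VAR(3) are then produced and plotted as follows.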

forecast(var) %>%
  autoplot() + xlab("Year")

  23. A more flexible generalisation would be a Vector ARMA process. However, the relative simplicity of VARs has led to their dominance in forecasting. Interested readers may refer to Athanasopoulos, Poskitt, and Vahid (2012).

  24. Interested readers should refer to Hamilton (1994) and Lütkepohl (2007).

  25. For a detailed comparison of these criteria, see Chapter 4.3 of Lütkepohl (2005).

  26. The tests for serial correlation in the “vars” package are multivariate generalisations of the tests presented in Section 3.3.