
PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, ECONOMICS AND FINANCE (oxfordre.com/economics). (c) Oxford University Press USA, 2019. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 15 September 2019

Bayesian Vector Autoregressions: Applications

Summary and Keywords

Bayesian vector autoregressions (BVARs) are standard multivariate autoregressive models routinely used in empirical macroeconomics and finance for structural analysis, forecasting, and scenario analysis in an ever-growing number of applications.

A preeminent field of application of BVARs is forecasting. BVARs with informative priors have often proved to be superior tools compared to standard frequentist/flat-prior VARs. In fact, VARs are highly parametrized autoregressive models, whose number of parameters grows with the square of the number of variables times the number of lags included. Prior information, in the form of prior distributions on the model parameters, helps in forming sharper posterior distributions of parameters, conditional on an observed sample. Hence, BVARs can be effective in reducing parameter uncertainty and improving forecast accuracy compared to standard frequentist/flat-prior VARs.

This feature in particular has favored the use of Bayesian techniques to address “big data” problems, in what is arguably one of the most active frontiers in the BVAR literature. Large-information BVARs have in fact proven to be valuable tools to handle empirical analysis in data-rich environments.

BVARs are also routinely employed to produce conditional forecasts and scenario analysis. Of particular interest for policy institutions, these applications permit evaluating the “counterfactual” time evolution of the variables of interest conditional on a pre-determined path for some other variables, such as the path of interest rates over a certain horizon.

The “structural interpretation” of estimated VARs as the data generating process of the observed data requires the adoption of strict “identifying restrictions.” From a Bayesian perspective, such restrictions can be seen as dogmatic prior beliefs about some regions of the parameter space that determine the contemporaneous interactions among variables and for which the data are uninformative. More generally, Bayesian techniques offer a framework for structural analysis through priors that incorporate uncertainty about the identifying assumptions themselves.

Bayesian vector autoregressions (BVARs) have been applied to an increasingly large number of empirical problems. This article reviews some of the most common applications of BVARs in macroeconomics and finance such as structural identification, forecasting, and scenario analysis. A companion article surveys Bayesian inference methods for vector autoregression models, commonly used priors for economic and financial variables, and introduces the notation used throughout this paper (Miranda-Agrippino & Ricco, 2018).

Forecasting has featured predominantly in the development of BVARs. In this context, BVARs with informative priors have often proved to be superior tools compared to standard frequentist/flat-prior VARs. VARs are highly parametrized autoregressive models, whose number of parameters grows with the square of the number of variables times the number of lags included. Given the limited length of standard macroeconomic datasets—which usually involve monthly, quarterly, or even annual observations—such overparametrization makes the estimation of VARs with standard (frequentist) techniques impossible, even for relatively small sets of variables. This is known in the literature as the “curse of dimensionality.” BVARs efficiently deal with the problem of over-parametrization through the use of prior information about the model coefficients. The general idea is to use informative priors that shrink the unrestricted model toward a parsimonious naïve benchmark, thereby reducing parameter uncertainty and improving forecast accuracy. Section “Forecasting with BVARs” discusses forecasting with BVARs, while section “Conditional Forecasts and Scenario Analysis” focuses on conditional forecasts and scenario analysis.

Another important area of application is the study of causal relationships among economic variables with structural (B)VARs (Sims & Zha, 1998). It is common practice to present results from SVARs in the form of impulse response functions—that is, causal responses over time of a given variable of interest to an “identified” economic shock—together with bands that characterize the shape of the posterior distribution of the model (see Sims & Zha, 1999).1 Section “Structural VARs” reviews Bayesian techniques in SVARs.

The application of Bayesian techniques to “big data” problems is one of the most active frontiers in the BVAR literature. Indeed, because they can efficiently deal with parameters proliferation, large BVARs are valuable tools for handling empirical analysis in data-rich environments (Bańbura, Giannone, & Reichlin, 2010). Important applications in this case also concern forecasting and structural analysis, where large-information BVARs can efficiently address issues related to misspecification and non-fundamentalness. De Mol, Giannone, and Reichlin (2008) have discussed the connection between BVARs and factor models, another popular way to handle large datasets. We review large BVARs in section “Large Bayesian VARs.”

Forecasting With BVARs

Reduced-form Bayesian vector autoregressions, which are written as

yt=A1yt−1+…+Apyt−p+c+ut,

(1)

when estimated with informative priors over the autoregressive parameters {A1,…,Ap} and the variance-covariance of the errors Σ‎, usually outperform VARs estimated with frequentist techniques (or flat priors). Using the frequentist terminology, reasonably specified priors reduce the variance of the estimated parameters and hence improve forecast accuracy, at the cost of introducing relatively small biases. From a more Bayesian perspective, prior information that may not be apparent in short samples—as, for example, the long-run properties of economic variables captured by the Minnesota priors—helps in forming sharper posterior distributions for the VAR parameters, conditional on an observed sample (see, e.g., Todd, 1984, for an early treatment of forecasting with BVARs).

Bayesian Forecasting

The fundamental object in Bayesian forecasting is the posterior predictive density.2 That is, the distribution of future data points yT+1:T+H=[yT+1',…,yT+H']', conditional on past data y1–p:T. Choosing a particular forecast ℱ—for example, the mode or median of the predictive distribution, alongside appropriate probability intervals—is essentially a decision problem, given a specified loss function ℒ(⋅). The Bayesian decision corresponds to choosing the forecast that minimizes the expected loss, conditional on past data

E[L(F,yT+1:T+H|y1−p:T)]=∫L(F,yT+1:T+H)p(yT+1:T+H|y1−p:T)dyT+1:T+H.

(2)

For a given loss function, the solution to the minimization problem is a function of the data, that is, F(y1-p:T). For example, with quadratic loss function ℒ(ℱ,yT+1:T+H|y1−p:T)=(ℱ−yT+1:T+H)′(ℱ−yT+1:T+H), the solution is the conditional expectation ℱ(y1−p:T)=E[yT+1:T+H|y1−p:T]. The predictive density is given by

p(yT+1:T+H|y1−p:T)=∫p(yT+1:T+H|y1−p:T,θ)p(θ|y1−p:T)dθ,

(3)

where θ‎ is the vector collecting all the VAR parameters, that is A and Σ‎, p(θ‎|y1–p:T) is the posterior distribution of the parameters, and p(yT+1:T+H|y1–p:T, θ‎) is the likelihood of future data. Eq. (3) highlights how Bayesian forecasts account for both the uncertainty related to future events via p(yT+1:T+H|y1–p:T, θ‎), and that related to parameters values via p(θ‎|y1–p:T).

The posterior predictive density for h > 1 is not given by any standard density function. However, if it is possible to sample directly from the posterior probability for the parameters, Eq. (3) provides an easy way to generate draws from this predictive density.

Algorithm 1: Sampling from the Posterior Predictive Density.

For s = 1,. . .,nsim:

1. Draw θ‎(s) from the posterior p(θ‎|y1–p:T).

2. Generate uT+1(s),…,uT+H(s) from the distribution of the errors and calculate recursively y~T+1(s),…,y~T+H(s) from the VAR equations with parameters A(s).

The set {y~T+1(s),…,y~T+H(s)}s=1nsim is a sample of independent draws from the joint predictive distribution.
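Algorithm 1 can be sketched in a few lines of code. The snippet below is a minimal illustration, not an implementation from the literature: the posterior for (A, Σ) is replaced by a stylized sampler around fixed values, and all coefficients and data are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stylized "posterior": for illustration we perturb fixed values; in practice
# the draws of (A, Sigma) come from the actual posterior sampler.
n, p, H, nsim = 2, 1, 4, 500
A_mean = np.array([[0.5, 0.1], [0.0, 0.4]])   # VAR(1) coefficients (assumed)
Sigma = 0.1 * np.eye(n)                       # error covariance (assumed)

y_hist = np.zeros((p, n))                     # last p observations (toy data)
y_hist[-1] = [1.0, -0.5]

draws = np.empty((nsim, H, n))
for s in range(nsim):
    # Step 1: draw parameters from the (stylized) posterior.
    A_s = A_mean + 0.01 * rng.standard_normal((n, n))
    # Step 2: draw future errors and iterate the VAR forward.
    y_prev = y_hist[-1]
    for h in range(H):
        u = rng.multivariate_normal(np.zeros(n), Sigma)
        y_prev = A_s @ y_prev + u
        draws[s, h] = y_prev

# Point forecast under quadratic loss: the mean of the predictive draws.
point_forecast = draws.mean(axis=0)
print(point_forecast.shape)
```

Each slice `draws[s]` is one draw from the joint predictive distribution; averaging across draws approximates the conditional expectation that solves the quadratic-loss problem in Eq. (2).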

Kadiyala and Karlsson (1993) analyzed the forecasting performance of different priors and found that those that induce correlation among the VAR coefficients, for example, the sums-of-coefficients priors (Doan et al., 1984) and the co-persistence prior (Sims, 1993), tend to do better.

Carriero et al. (2015a) conducted an extensive assessment of Bayesian VARs under different specifications and evaluated the relative merits of different model specifications and treatments of the data. In particular, starting from a benchmark VAR in levels with NIW, sums-of-coefficients, and co-persistence priors, they evaluate the effects of (1) the optimal choice of the tightness hyperparameters; (2) the lag length; (3) modeling in levels or growth rates; (4) direct, iterated, and pseudo-iterated h-step-ahead forecasts; (5) the treatment of the error variance Σ‎; and (6) cross-variable shrinkage f(ℓ). They find that, in general, simpler specifications tend to be effective.3

Bayesian Model Averaging and Prediction Pools

Bayesian analysis offers a straightforward way to deal with model uncertainty. Consider for instance two competing models ℳ1 and ℳ2 with likelihoods p(y|θ1,ℳ1,y1−p:0) and p(y|θ2,ℳ2,y1−p:0), and priors p(θ1|ℳ1) and p(θ2|ℳ2), respectively. Bayesian model averaging (BMA) obtains the marginalized (with respect to the models) predictive distribution as

p(yT+1:T+H|y)=p(yT+1:T+H|y,ℳ1)p(ℳ1|y)+p(yT+1:T+H|y,ℳ2)p(ℳ2|y),

(4)

where p(ℳj|y) is the posterior probability of model ℳj, proportional to the product of the model's marginal likelihood and its prior probability p(ℳj), and p(yT+1:T+H|y,ℳj) is the model-specific predictive density. Eq. (4) can be extended to allow for M different models. This can be seen as a generalization of the predictive distribution in Eq. (3): instead of conditioning on a single model, M different models are considered. BMA was introduced in economic forecasting by the seminal work of Geweke (1999), and its applications in the context of forecast combinations and pooling have been numerous. Notable extensions to BMA include the Linear Optimal Prediction Pools of Geweke and Amisano (2011, 2012) and the Dynamic Prediction Pools of Del Negro et al. (2016).4 Earlier reviews of BMA and forecast combinations are in Geweke and Whiteman (2006) and Timmermann (2006). The evolution of forecast density combinations is discussed in detail in Aastveit et al. (2018) in their chapter in this collection; we refer the reader to it for further details.
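Eq. (4) is a finite mixture of model-specific predictive densities. The sketch below is a hedged numerical illustration with two Gaussian predictive densities and assumed log marginal likelihoods; every number is a toy choice.

```python
import numpy as np

def normal_pdf(y, mu, sig):
    """Density of N(mu, sig^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

# Two models' predictive densities for y_{T+1} (illustrative numbers):
# model 1 predicts N(1.0, 0.5^2), model 2 predicts N(2.0, 1.0^2).
mu, sig = np.array([1.0, 2.0]), np.array([0.5, 1.0])

# Assumed log marginal likelihoods of the observed sample; combined with
# equal prior model probabilities they give posterior model weights.
log_ml = np.array([-100.0, -101.5])
prior = np.array([0.5, 0.5])
w = prior * np.exp(log_ml - log_ml.max())
w /= w.sum()

# BMA predictive density: the weighted mixture in Eq. (4).
grid = np.linspace(-2, 5, 701)
p_bma = sum(w[j] * normal_pdf(grid, mu[j], sig[j]) for j in range(2))

# The mixture still integrates to (approximately) one over the grid.
integral = p_bma.sum() * (grid[1] - grid[0])
print(w, integral)
```

Note that working with log marginal likelihoods and subtracting the maximum before exponentiating avoids numerical underflow, which matters because marginal likelihoods of realistic samples are astronomically small.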

Conditional Forecasts and Scenario Analysis

Forecasts that condition on a specific path for one of the variables, such as, for example, a preferred path for the policy interest rate, are of particular interest to central banks. An early treatment of such forecasts, also referred to as scenario analysis, is in Doan, Litterman, and Sims (1984), who noted that a conditional forecast is equivalent to imposing restrictions on the disturbances uT+1,…,uT+H. Waggoner and Zha (2012) suggested a way to compute conditional forecasts that does not condition on specific parameter values (e.g., the posterior means) and produces minimum squared forecast errors conditional on the restrictions. Moreover, it yields posterior distributions for the parameters that are consistent with the constrained paths. Let

RyT+1:T+H=r

(5)

denote the desired restrictions on the future path of some of the variables in yt. These can be rewritten as restrictions on the future disturbances. Writing yT+1:T+H=E(yT+1:T+H|y1−p:T,θ)+C′uT+1:T+H, where C′ maps the future disturbances into the future observations through the moving-average representation of the VAR, and defining G ≡ RC′ and g≡r−RE(yT+1:T+H|y1−p:T,θ), the restrictions in Eq. (5) become GuT+1:T+H=g (Eq. 9). Noting that uT+1:T+H~N(0,ΣH), with ΣH≡IH⊗Σ, one obtains the conditional distribution of uT+1:T+H as

uT+1:T+H|(GuT+1:T+H=g)~N(ΣHG′(GΣHG′)−1g,ΣH−ΣHG′(GΣHG′)−1GΣH)

(10)

that can be used to draw from the predictive distribution. In order to ensure consistency of the posterior distribution with the restriction in Eq. (9), Waggoner and Zha (2012) suggested treating yT+1:T+H as latent variables and simulating the joint posterior of the parameters and the future observations using the following MCMC sampler.
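Eq. (10) is the standard formula for the conditional distribution of a Gaussian vector subject to a linear equality. The sketch below illustrates it with ΣH ≡ IH⊗Σ and a single illustrative restriction; all matrices are toy values, not an implementation from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes: n = 2 variables, H = 3 forecast steps.
n, H = 2, 3
Sigma = np.array([[0.2, 0.05], [0.05, 0.1]])
Sigma_H = np.kron(np.eye(H), Sigma)          # Var(u_{T+1:T+H}) = I_H (x) Sigma

# Illustrative restriction G u = g on the stacked future disturbances;
# in the text G = RC' and g = r - R E[y_{T+1:T+H} | data, theta].
G = np.zeros((1, n * H)); G[0, 0] = 1.0      # pin down the first shock
g = np.array([0.3])

# Conditional moments of a Gaussian vector under a linear equality (Eq. 10).
K = Sigma_H @ G.T @ np.linalg.inv(G @ Sigma_H @ G.T)
cond_mean = K @ g
cond_var = Sigma_H - K @ G @ Sigma_H

# Draw restricted disturbances via the eigendecomposition of the (singular)
# conditional variance; every draw satisfies the restriction exactly.
vals, vecs = np.linalg.eigh(cond_var)
u = cond_mean + vecs @ (np.sqrt(np.clip(vals, 0, None)) * rng.standard_normal(n * H))
print(G @ u)   # approximately equal to g
```

The eigendecomposition is used instead of a Cholesky factor because the conditional variance is singular in the restricted directions, which is exactly what forces every draw to satisfy the constraint.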

Algorithm 2: MCMC Sampler for VAR with restrictions on yT+1:T+H.

Given restrictions as in Eq. (9), select starting values for A(0) and Σ‎(0) using e.g. simulation on historical data. For s = 1,…, nsim:

1. Draw uT+1:T+H from the distribution in Eq. (10) and recursively calculate

yT+h(s)′=∑j=1h−1yT+h−j(s)′Aj(s−1)+∑j=hpyT+h−j′Aj(s−1)+uT+h(s)′.

2. Augment y1–p:T with yT+1:T+h(s) and draw A(s) and Σ‎(s) from the full conditional posteriors

Σ(s)|y1−p:T,yT+1:T+h(s),A(s−1), and A(s)|y1−p:T,yT+1:T+h(s),Σ(s),

using an appropriate sampling given the chosen VAR specification and priors.

3. Discard the parameters to obtain a draw {yT+1(s),…,yT+h(s)} from the joint predictive density consistent with the restrictions in Eq. (9).

Jarociński (2010) suggested an efficient way to sample uT+1:T+H that reduces the computational burden of the algorithm discussed above. An extension to this method is in Andersson et al. (2010), who restrict the forecasts yT+1:T+H to lie in a specified region S⊂ℝnH. This is a case of “soft” restrictions, as opposed to the “hard” restrictions in Eq. (9). Robertson et al. (2005) followed a different approach and proposed exponential tilting as a way to enforce moment conditions on the path of future yt. This is also the approach implemented in Cogley et al. (2005). These methods are typically used in conjunction with small VARs and quickly become computationally cumbersome as the system’s dimension increases.

Bańbura et al. (2015) proposed instead a Kalman filter–based algorithm to produce conditional forecasts in large systems that admit a state-space representation, such as large Bayesian VARs and factor models. Intuitively, this method improves computational efficiency thanks to the recursive nature of filtering techniques, which allow the problem to be tackled period by period.

Antolin-Diaz et al. (2018) proposed a method to conduct “structural scenario analysis” that can be supported by economic interpretation by choosing which structural shock is responsible for the conditioning path.

Structural VARs

Reduced form VARs can capture the autocovariance properties of multiple time series. However, their “structural interpretation” as the data generating process of the observed data, and of their one-step-ahead forecast errors in terms of economically meaningful shocks, requires additional identifying restrictions.

A VAR in structural form (SVAR) can be written as

B0yt=B1yt−1+⋯+Bpyt−p+Bc+et,et~i.i.d.N(0,In),

(11)

where B0 is a matrix of contemporaneous (causal) relationships among the variables, and et is a vector of structural shocks that are mutually uncorrelated and have an economic interpretation. All structural shocks are generally assumed to be of unitary variance. This does not imply a loss of generality, however, since the diagonal elements of B0 are unrestricted. In the structural representation, the coefficients have a direct behavioral interpretation, and it is possible to provide a causal assessment of the effects of economic shocks on variables—for example, the effect of a monetary policy shock on prices and output. Pre-multiplying the SVAR in Eq. (11) by B0–1 yields its reduced-form representation: that is, the VAR in Eq. (1). Comparing the two representations one obtains that Ai = B0–1Bi, i = 1,…,p, and ut = B0–1et. The variance of the reduced form forecast errors, ut is

Σ=B0−1(B0−1)′.

(12)

Since Σ‎ is symmetric, it has only n(n + 1)/2 independent parameters. This implies that the data can provide information to uniquely identify only n(n + 1)/2 out of the n2 parameters in B0. In fact, given a positive definite matrix Σ‎, it is possible to write B0 as the product of an orthogonal matrix Q times the inverse of the unique lower triangular Cholesky factor of Σ‎ (Σ‎ = Σ‎Chol Σ‎′Chol)

B0=QΣChol−1.

(13)

From this decomposition it is clear that while Σ‎Chol is uniquely determined for a given Σ‎, the remaining n(n – 1)/2 unrestricted parameters span the O(n) group of n × n orthogonal matrices. The central question in structural identification is how to recover the elements of B0 given the variance-covariance matrix of the one-step-ahead forecast errors, Σ‎. That is, how to choose Q out of the many possible n-dimensional orthogonal matrices.5
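This invariance can be verified numerically. The sketch below, using the convention Σ = B0−1(B0−1)′ that follows from ut = B0−1et with unit-variance structural shocks, checks that combining the Cholesky factor of an illustrative 2 × 2 Σ with an arbitrary orthogonal Q leaves the implied reduced-form covariance unchanged; all numbers are toy values.

```python
import numpy as np

# A positive definite reduced-form covariance Sigma (toy numbers).
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)                 # lower triangular, Sigma = L L'

# Any orthogonal Q gives a candidate B0 = Q L^{-1} consistent with the same
# Sigma, since then B0^{-1} = L Q' and B0^{-1}(B0^{-1})' = L Q'Q L' = Sigma.
theta = 0.7                                   # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B0 = Q @ np.linalg.inv(L)

B0_inv = np.linalg.inv(B0)
print(np.allclose(B0_inv @ B0_inv.T, Sigma))  # True: Sigma does not pin down Q
```

Because the check passes for every rotation angle, the data (which only determine Σ) cannot distinguish between the continuum of structural models indexed by Q, which is precisely the identification problem.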

From a Bayesian perspective, the issue is that since yt depends only on Σ‎ and not on its specific factorization, the conditional distribution of the parameter Q does not get updated by the information provided in the data, that is

p(Q|Y,A,Σ)=p(Q|A,Σ).

(14)

For some regions of the parameter space, posterior inference will be determined purely by prior beliefs even if the sample size is infinite, since the data are uninformative. This is a standard property of Bayesian inference in partially identified models, as discussed for example in Kadane (1975), Poirier (1998), and Moon and Schorfheide (2012).

Much of the ingenuity and creativity in the SVAR literature has been devoted to providing arguments—or “identification schemes”—about the appropriate choice of p(Q|A, Σ‎).6 These arguments translate into what can be viewed as Bayesian inference with dogmatic prior beliefs—or distributions with singularities—about the conditional distribution of Q, given the reduced form parameters. For example, the commonly applied recursive identification amounts, from a Bayesian perspective, to assuming with dogmatic certainty that all of the upper diagonal elements of B0 are zero, while no information is available on the other elements of B0. Equivalently, it assumes with certainty that Q=In. Similarly, other commonly used identifications—for example, long-run, medium-run, sign restrictions, etc.—can be expressed in terms of probabilistic a priori statements about the parameters in B0.

Once a B0 matrix is selected, dynamic causal effects of the identified structural shocks on the variables in yt are usually summarized by the structural impulse response functions (IRFs). In a VAR(p), they can be recursively calculated as

IRFh=ΘhB0−1, h=0,…,H,

(15)

where

Θh=∑τ=1hΘh−τAτ, h=1,…,H,

(16)

Θ0=In, and Aτ‎ are the reduced form autoregressive coefficients of Eq. (1), with Aτ‎ = 0 for τ‎ > p. The (i, j) element of IRFh denotes the response of variable i to shock j at horizon h. Uncertainty about dynamic responses to identified structural shocks is typically reported in the Bayesian literature as point-wise coverage sets around the posterior mean or median IRFs at each horizon—that is, as the appropriate quantiles of the IRFs’ posterior distribution. For example, 68% coverage intervals can be reported as two lines representing the posterior 16th and 84th percentiles of the distribution of the IRFs. Such credible sets need to be interpreted as point-wise (i.e., as credible sets for the response of a specific variable to a specific shock, at a given horizon). However, point-wise bands effectively ignore the existing correlation between responses at different horizons. To account for the time (horizon) dependence, Sims and Zha (1999) suggested using the first principal components of the covariance matrix of the IRFs.
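The recursion in Eqs. (15)–(16) is straightforward to implement. Below is a minimal sketch with illustrative VAR(2) coefficients and an assumed impact matrix B0−1; all values are toy numbers.

```python
import numpy as np

# Reduced-form VAR(2) coefficients (illustrative values).
n, p, H = 2, 2, 6
A = [np.array([[0.5, 0.1], [0.0, 0.4]]),
     np.array([[0.1, 0.0], [0.05, 0.1]])]
B0_inv = np.array([[0.5, 0.0], [0.2, 0.4]])   # assumed impact matrix B0^{-1}

# Wold coefficients: Theta_0 = I_n and, for h >= 1,
# Theta_h = sum_{tau=1}^{h} Theta_{h-tau} A_tau, with A_tau = 0 for tau > p.
Theta = [np.eye(n)]
for h in range(1, H + 1):
    Theta.append(sum(Theta[h - tau] @ A[tau - 1]
                     for tau in range(1, min(h, p) + 1)))

# Structural IRFs: IRF_h = Theta_h B0^{-1}; element (i, j) is the response
# of variable i to shock j at horizon h.
irf = [Th @ B0_inv for Th in Theta]
print(np.round(irf[1], 3))
```

At horizon 0 the IRF is just the impact matrix B0−1, and at horizon 1 it equals A1B0−1, which provides a quick sanity check on any implementation.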

Sims and Zha (1998) discussed a general framework for Bayesian inference on the structural representation in Eq. (11). Rewrite the SVAR as

yB0=xB+e,

(17)

where the T × n matrices y and e and the T × k matrix x are defined as

y=[y1′⋮yT′],x=[x1′⋮xT′],e=[e1′⋮eT′],

(18)

and B = [B1,…, Bp,Bc]. The likelihood can be written as

p(y|B0,B)∝|B0|Texp{−12tr[(yB0−xB)′(yB0−xB)]},

(19)

where |B0| is the determinant of B0 (and the Jacobian of the transformation of e in y). Conditional on B0, the likelihood function is a normal distribution in B. Define β‎ ≡ vec(B) and β‎0 ≡ vec(B0). A prior for the SVAR coefficients can be conveniently factorized as

p(β0,β)=p(β|β0)p(β0),

(20)

where p(β‎0) is the marginal distribution for β‎0, and can include singularities generated by, for example, zero restrictions. The (conditional) prior for β‎ can be chosen to be a normal p.d.f.7

β|β0~N(β_0,λ−1In⊗Γ_β0).

(21)

The posterior distribution of β‎ is hence of the standard form

β|β0,y~N(β¯0,In⊗Γ¯β0),

(22)

where the posterior moments are updated as in the standard VAR with Normal-Inverse Wishart priors (see, e.g., Kadiyala & Karlsson, 1997). The posterior for β‎0 will depend on the assumed prior.8

Baumeister and Hamilton (2015) applied a streamlined version of this framework to provide an analytical characterization of the informative prior distributions for impulse-response functions that are implicit in a commonly used algorithm for sign restrictions. Sign restrictions are a popular identification scheme, pioneered in a Bayesian framework by Canova and De Nicolo (2002) and Uhlig (2005). The scheme selects sets of models whose B0 comply with restrictions on the sign of the responses of variables of interest over a given horizon. Bayesian SVARs with sign restrictions are typically estimated using algorithms such as in Rubio-Ramírez, Waggoner, and Zha (2010), where a uniform (or Haar) prior is assumed for the orthogonal matrix. Operationally, an n × n matrix X of independent N(0,1) values is generated and decomposed using a QR decomposition, where Q is the orthogonal factor and R is upper triangular. The orthogonal factor is used as candidate rotation Q, and the signs of the responses of the variables at the horizons of interest are assessed against the desired sign restrictions. Baumeister and Hamilton (2015) showed that this procedure implies informative distributions on the structural objects of interest. In fact, it implies that the impact of a one standard-deviation structural shock is regarded (before seeing the data) as coming from a distribution with more mass around zero when the number of variables n in the VAR is greater than 3 (and with more mass at large values when n = 2). It also implies Cauchy priors for structural parameters such as elasticities. The influence of these priors does not vanish even asymptotically, since the data do not contain information about Q.
In fact, as the sample size goes to infinity, the height of the posterior distribution for the impact parameters is proportional to that of the prior distribution for all the points in the parameter space for which the structural coefficients satisfy the set restrictions that orthogonalize the true variance-covariance matrix.
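The Haar draw at the heart of this algorithm is easy to reproduce. The sketch below is an illustrative implementation (toy Σ and a single arbitrary sign restriction), not code from the cited papers; the sign normalization of R's diagonal is the standard adjustment that makes the QR-based draw exactly uniform over the orthogonal group.

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_haar_orthogonal(n, rng):
    """Draw Q uniformly (Haar) from O(n) via the QR of a Gaussian matrix."""
    X = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(X)
    # Normalizing the signs of R's diagonal makes the distribution exactly Haar.
    return Q * np.sign(np.diag(R))

n = 3
Sigma = np.array([[1.0, 0.2, 0.1], [0.2, 0.8, 0.0], [0.1, 0.0, 0.5]])
L = np.linalg.cholesky(Sigma)

# Candidate impact matrix for a sign-restriction check: rotate the Cholesky
# factor and keep the draw only if the restricted response has the right sign.
Q = draw_haar_orthogonal(n, rng)
candidate_impact = L @ Q
accepted = candidate_impact[0, 0] > 0        # e.g., shock 1 raises variable 1
print(np.allclose(candidate_impact @ candidate_impact.T, Sigma), accepted)
```

In an accept/reject loop over many such draws, every accepted candidate reproduces the same Σ, which is why the prior on Q, and not the data, shapes the resulting distribution of impact effects.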

Giacomini and Kitagawa (2015) suggested the use of “ambiguous” priors for the structural rotation matrix in order to account for the uncertainty about the structural parameters in all under-identified SVARs. The methodology consists of formally incorporating into the inference the class of all priors for the structural rotation matrix that are consistent with the a priori “dogmatic” restrictions. In a similar vein, Baumeister and Hamilton (2017) discussed how to generalize priors on B0 to a less restrictive formulation that incorporates uncertainty about the identifying assumptions themselves, and used this approach to study the importance of shocks to oil supply and demand.

Large Bayesian VARs

The size of the VARs typically used in empirical applications ranges from three to a dozen variables. VARs with larger sets of variables are impossible to estimate with standard techniques due to the “curse of dimensionality” induced by the densely parametrized structure of the model.9 However, in many applications there may be concerns that the omission of potentially relevant economic indicators affects both structural analysis and forecasting.10 Additionally, big datasets are increasingly important in economics to study phenomena in a connected and globalized world, where economic developments in one region can propagate and affect others.11

VARs involving tens or even hundreds of variables have become increasingly popular following the work of Bańbura et al. (2010), which showed that standard macroeconomic priors—Minnesota and sums-of-coefficients—with a careful setting of the tightness parameters make it possible to incorporate large sets of endogenous variables effectively. Indeed, a stream of papers has found large VARs to forecast well (see, e.g., Bańbura et al., 2010; Carriero, Clark, & Marcellino, 2015a; Carriero, Kapetanios, & Marcellino, 2009; Giannone, Lenza, Momferatou, & Onorante, 2014; Koop, 2013).

Early examples of higher-dimensional VARs are Panel VARs, where small country-specific VARs are interacted to allow for international spillovers (see, e.g., Canova & Ciccarelli, 2004, 2009). These models can be seen as large-scale models that impose more structure on the system of equations. Koop and Korobilis (2015) studied methods for high-dimensional panel VARs. In the study of international spillovers, an alternative to Panel VARs are Global VARs (Pesaran et al., 2004). A Bayesian treatment to G-VARs is in Cuaresma, Feldkircher, and Huber (2016).

A recent development in this literature has been the inclusion of stochastic volatility in Large BVAR models. Carriero et al. (2016a) assumed a factor structure in the stochastic volatility of macroeconomic and financial variables in Large BVARs. In Carriero et al. (2016b), stochastic volatility and asymmetric priors for large n are instead handled using a triangularization method that allows the conditional-mean coefficients of the VAR to be simulated equation by equation. Chan et al. (2017) propose composite likelihood methods for large BVARs with multivariate stochastic volatility, which involve estimating large numbers of parsimonious submodels and then taking a weighted average across them. Koop, Korobilis, and Pettenuzo (2016) discuss large Bayesian VARMAs. Koop (2017) reviews the applications of big data in macroeconomics.

Bayesian VARs and Dynamic Factor Models

Research started with Bańbura et al. (2010) has shown that large BVARs are competitive models in dealing with large-n problems in empirical macroeconomics, along with factor models (see Forni, Hallin, Lippi, & Reichlin, 2000; Stock & Watson, 2002) and factor-augmented VARs (FAVARs; see Bernanke, Boivin, & Eliasz, 2005). Indeed, Bayesian VARs are strictly connected to factor models, as shown by De Mol et al. (2008) and Bańbura et al. (2015).

The link can be better understood in terms of data that have been transformed to achieve stationarity, Δ‎yt, and that have been standardized to have zero mean and unit variance. A VAR in first differences can be written as

Δyt=Φ1Δyt−1+⋯+ΦpΔyt−p+νt.

(23)

Imposing the requirement that the level of each variable yt must follow an independent random walk process is equivalent to requiring its first difference Δ‎yt to follow an independent white noise process. Hence, the prior on the autoregressive coefficients in Eq. (23) is centered at zero, with a common prior variance λ‎1 for each coefficient, that is, a uniform degree of shrinkage across all coefficients (Eq. 24).

The covariance between coefficients at different lags is set to zero. Since the variables have been rescaled to have the same variance, we can set Σ=σIn, where Σ=E[νtνt′].

Denote the eigenvalues of the variance-covariance matrix of the standardized data by ζ‎j, and the associated eigenvectors by υ‎j, for j = 1, . . . , n, i.e.

[1T∑t=1TΔytΔyt′]υj=υjζj,

(25)

where υi′υj=1 if i = j and zero otherwise. We assume an ordering such that ζ1≥ζ2≥⋯≥ζn. The sample principal components of Δ‎yt are defined as

zt=[υ1/√ζ1⋯υn/√ζn]′Δyt≡WΔyt.

(26)

The principal components transform correlated data, Δ‎yt, into linear combinations which are cross-sectionally uncorrelated and have unit variance, that is T−1∑t=1Tztzt′=In. The principal components can be ordered according to their ability to explain the variability in the data, as the total variance explained by each principal component is equal to ζ‎j.
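The construction in Eqs. (25)–(26) can be illustrated as follows; the data-generating process (a one-factor model with idiosyncratic noise) and all dimensions are toy choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated stationary data with a one-factor structure, then standardized.
T, n = 500, 5
f = rng.standard_normal(T)                       # common factor
dy = np.outer(f, rng.standard_normal(n)) + 0.5 * rng.standard_normal((T, n))
dy = (dy - dy.mean(0)) / dy.std(0)               # zero mean, unit variance

S = dy.T @ dy / T                                # sample covariance matrix
zeta, V = np.linalg.eigh(S)                      # eigh returns ascending order
order = np.argsort(zeta)[::-1]                   # sort zeta_1 >= ... >= zeta_n
zeta, V = zeta[order], V[:, order]

# Principal components scaled to unit variance: z_t = W dy_t,
# with the j-th row of W equal to v_j' / sqrt(zeta_j).
W = (V / np.sqrt(zeta)).T
z = dy @ W.T
print(np.round(z.T @ z / T, 2))                  # close to the identity matrix
```

With a genuine factor structure, the first eigenvalue dominates the others, which is the property the shrinkage argument below exploits.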

Rewrite the model in Eq. (23) in terms of the ordered principal components, as

Δyt=Φ1W−1zt−1+⋯+ΦpW−1zt−p+νt.

(27)

The priors that impose a uniform shrinkage on the parameters in Eq. (24) map into a non-uniform shrinkage on the parameters in Eq. (27). Importantly, the prior variance for the coefficients on the j-th principal component is proportional to its share of explained variance of the data, ζ‎j.

If the data are characterized by a factor structure, then, as n and T increase, ζ‎j will go to infinity at a rate n for j = 1, . . . , r, where r is the number of common factors. Conversely, ζr+1,…,ζn will grow at a slower rate, which cannot be faster than n/T. If λ‎1 is set such that it converges to zero at a rate that is faster than that for the smaller eigenvalues and slower than that for the largest eigenvalues, for example λ1∝(T/n)(1/Tϱ) with 0 < ϱ‎ < 1/2, then λ‎1 ζ‎j will go to infinity for j = 1,…,r and the prior on the coefficients associated with the first r principal components will become flat (see Bańbura et al., 2015). Conversely, the coefficients related to the principal components associated with the bounded eigenvalues will be shrunk to zero, since λ‎1 ζ‎j will go to zero for j > r.

De Mol et al. (2008) show that if the data are generated by a factor model and λ‎1 is set according to the rate described above, the point forecasts obtained by using shrinkage estimators converge to the unfeasible optimal forecasts that would be obtained if the common factors were observed.

Large SVARs, Non-Fundamentalness

One of the open problems in SVARs is the potential “non-fundamentalness” of structural shocks for commonly employed VARs (a review on this issue is in Alessi, Barigozzi, & Capasso, 2011). Non-fundamentalness implies that the true structural shocks (i.e., et in Eq. 11) cannot be retrieved from current and past forecast errors of the VARs of choice (see Hansen & Sargent, 1980; Lippi & Reichlin, 1994). This situation arises when, for example, the econometrician does not have all the information available to economic agents, such as news about future policy actions. This is notoriously the case for fiscal shocks, as explained in Leeper, Walker, and Yang (2013). In this case, economic agents’ expectations may not be based only on the current and past yt, implying that the residuals of the reduced-form model (i.e., ut in Eq. 1) are not the agents’ expectation/forecast errors. As a consequence, the shocks of interest may not be retrieved from the forecast errors and may be non-fundamental. A possible solution is to allow for noninvertible moving average (MA) components. A different strategy is to view non-fundamentalness as an omitted variables problem. In this respect BVARs (and factor models) can offer a solution to the incorporation of larger information sets. For example, Ellahie and Ricco (2017) discussed the use of large BVARs to study the propagation of government purchases shocks, while controlling for potential non-fundamentalness of shocks in small VARs.12

Forecasting in Data-Rich Environments

A research frontier is the application of Bayesian VARs to forecasting in data-rich environments, where the predictive content of large datasets (typically counting 100 or more variables) is exploited to forecast variables of interest. A recent survey is in Bok et al. (2017).

Bańbura et al. (2010) studied the forecasting performance of large Bayesian VARs. They found that while forecast accuracy increases with model size—provided that the shrinkage is appropriately chosen as a function of n—most of the gains are in fact achieved by a 20-variable VAR. Evaluation of the forecasting performance of medium and large Bayesian VARs is also provided in Koop (2013). Carriero et al. (2011) evaluated the forecasting accuracy of reduced-rank Bayesian VARs in large datasets. The reduced-rank model adopted has an underlying factor-model structure, with factors that evolve following a VAR. Korobilis (2013) extended the framework to allow for time-varying parameters. Giannone, Lenza, and Primiceri (2017) argued in favor of dense (as opposed to sparse) representations of predictive models for economic forecasting and use a “spike-and-slab” prior that allows for both variable selection and shrinkage.

BVARs are also a valuable tool for real-time forecasting and nowcasting with mixed-frequency datasets. In fact, they can be cast in state-space form, and filtering techniques can easily handle missing observations, data in real time, and data sampled at different frequencies. Ghysels (2016) introduced a class of mixed-frequency VAR models that incorporate data sampled at different frequencies and discussed Bayesian approaches to their estimation. Recent examples of these applications include Korobilis (2013); Schorfheide and Song (2015); Carriero et al. (2015b); Brave, Butters, and Justiniano (2016); Clark (2011); Giannone et al. (2014); and McCracken, Owyang, and Sehkposyan (2015).
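The state-space treatment of ragged-edge and mixed-frequency data can be sketched as follows: a generic Kalman filter that simply restricts the update step to the observed elements of each data vector. The function name and the plain predict-update structure are illustrative, not taken from the papers cited above:

```python
import numpy as np

def kalman_filter_missing(y, Z, T, H, Q, a0, P0):
    """Kalman filter for y_t = Z a_t + e_t, a_t = T a_{t-1} + u_t,
    where rows of y may contain NaNs (unreleased or lower-frequency
    observations). Only the observed elements of y_t are used."""
    a, P = a0.copy(), P0.copy()
    filtered = []
    for yt in y:
        # Prediction step
        a = T @ a
        P = T @ P @ T.T + Q
        obs = ~np.isnan(yt)                 # mask of available data
        if obs.any():
            Zo = Z[obs]
            Ho = H[np.ix_(obs, obs)]
            v = yt[obs] - Zo @ a            # innovation on observed part
            F = Zo @ P @ Zo.T + Ho
            K = P @ Zo.T @ np.linalg.inv(F)
            a = a + K @ v                   # update step
            P = P - K @ Zo @ P
        filtered.append(a.copy())
    return np.array(filtered)
```

When an entire row of `y` is missing (e.g., a quarterly series in a monthly model), the filter simply propagates the prediction, which is exactly why the state-space form handles real-time and mixed-frequency data so naturally.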

Koop, Korobilis, and Pettenuzzo (2016) proposed the use of Bayesian compressed VARs for high-dimensional forecasting problems and found that these tend to outperform both factor models and large VARs with prior shrinkage. More recently, Kastner and Huber (2017) developed BVARs that can handle vast-dimensional information sets and also allow for changes in the volatility of the errors. This is done by assuming that the reduced-form residuals have a factor stochastic volatility structure (which allows for conditional equation-by-equation estimation) and by applying a Dirichlet-Laplace prior (Bhattacharya, Pati, Pillai, & Dunson, 2015) to the VAR coefficients that heavily shrinks the coefficients toward zero while still allowing for some non-zero parameters. Kastner and Huber (2017) provided MCMC-based algorithms to sample from the posterior distributions and showed that their proposed model typically outperforms simpler nested alternatives in forecasting output, inflation, and the interest rate.
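The compression idea can be sketched in a few lines: project the high-dimensional regressor matrix onto a small random subspace and estimate there. This is a stylized Gaussian-projection, least-squares version; Koop, Korobilis, and Pettenuzzo additionally draw sparse projection matrices and average forecasts across many draws:

```python
import numpy as np

def compressed_regression(Y, X, m, seed=0):
    """Compress a high-dimensional regressor matrix X (T, k) down to
    m << k columns via a random projection, then estimate by least
    squares in the compressed space (the core idea behind compressed
    VARs, here without the Bayesian model-averaging step)."""
    rng = np.random.default_rng(seed)
    T, k = X.shape
    Phi = rng.normal(size=(k, m)) / np.sqrt(m)   # random projection
    Xc = X @ Phi                                  # compressed regressors
    Bc, *_ = np.linalg.lstsq(Xc, Y, rcond=None)   # small-scale estimation
    return Phi @ Bc                               # map back to (k, n) space
```

Estimation only ever involves an m-dimensional regression regardless of k, which is what makes the approach feasible for vast information sets.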

Acknowledgments

We thank the editors of the Oxford Research Encyclopedia of Economics and Finance and two anonymous referees for useful comments and suggestions. We are also grateful to Fabio Canova, Andrea Carriero, Matteo Ciccarelli, Domenico Giannone, Marek Jarociński, Marco Del Negro, Massimiliano Marcellino, Giorgio Primiceri, Lucrezia Reichlin, and Frank Schorfheide for helpful comments and discussions. The views expressed in this paper are those of the authors and do not necessarily reflect those of the Bank of England or any of its committees.


Notes:

(1.)
An extreme version of lack of sample information arises in this context. In fact, structural VARs can be parametrized in terms of reduced-form VARs that capture the joint dynamics of economic variables, and an “impact matrix” describing the causal connection between stochastic disturbances and economic variables. This matrix is not uniquely identified by sample information, and hence the investigator has to elicit prior beliefs on it (see Sims & Zha, 1998; Baumeister & Hamilton, 2015).

(2.)
The exposition in this section follows Karlsson (2013). See also Geweke and Whiteman (2006).

(3.)
Since the work of Sims et al. (1990), it is common practice to keep variables in (log-)levels. However, whether to employ variables in growth rates, log-levels, or levels in VARs remains an empirically important question. In the context of forecasting, for example, Carriero et al. (2015a) recommended the use of differenced data. They also found that, overall, the differences between iterated and direct forecasts are small, but that there are large gains from the direct forecast for some of the variables, presumably because the direct forecast is more robust to misspecification.

(5.)
It is assumed that the information in the history of yt is sufficient to recover the structural shocks et, that is, it is possible to write the structural shocks as a linear combination of the reduced form innovations ut. In this case, it is said that the shocks are fundamental for yt. Departures from this case are discussed in section “Large Bayesian VARs.” Relevant references are provided therein.

(6.)
A survey of the identification schemes proposed in the literature goes beyond the scope of this article. A recent textbook treatment on the subject is in Kilian and Lütkepohl (2017).

(7.)
As is usually done in the literature, Sims and Zha (1998) suggested preserving the Kronecker structure of the likelihood to avoid the inversion of nk×nk matrices and gain computational speed.

(8.)
Canova and Pérez Forero (2015) provided a general procedure to estimate structural VARs also in the case of overidentified systems where identification restrictions are of linear or of nonlinear form.

(9.)
The number of parameters to be estimated in an unrestricted VAR increases with the square of n, the number of variables in yt. Even when estimation is mechanically feasible (that is, when the number of available data points allows one to produce point estimates for the parameters of interest), the small number of available degrees of freedom implies that parameters are estimated with substantial uncertainty, typically yielding imprecise out-of-sample forecasts.
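To make the magnitudes concrete, a small illustrative helper (hypothetical, not part of any cited work) counting the conditional-mean parameters of an unrestricted VAR:

```python
def var_param_count(n, p, intercept=True):
    """Conditional-mean parameters in an unrestricted VAR(p) with n
    variables: n coefficients per equation per lag, for n equations,
    plus n intercepts."""
    return n * n * p + (n if intercept else 0)

# A 20-variable monthly VAR with 13 lags already carries
# var_param_count(20, 13) = 5220 mean parameters, against roughly
# 700 monthly observations available since 1960.
```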

(10.)
A standard example of this has been the debate about the so-called price puzzle (a positive reaction of prices in response to a monetary tightening) that is often found in small-scale VARs (see, e.g., Christiano, Eichenbaum, & Evans, 1999). The literature has often interpreted such a puzzling result as an artifact of the omission of forward-looking variables such as the commodity price index. In fact, one of the first instances of VARs incorporating more than a few variables was the 19-variable BVAR in Leeper, Sims, and Zha (1996), used to study the effects of monetary policy shocks.

(11.)
Large datasets of macroeconomic and financial variables are increasingly common. For example, in the United States, the Federal Reserve Bank of St. Louis maintains the FRED-MD monthly database of well over 100 macroeconomic variables from 1960 to the present (see McCracken & Ng, 2015), and several other countries and economic areas have similarly sized datasets.

(12.)
Lütkepohl (2014) observed that while large-information techniques can help in dealing with the problem, they are bound to distort the parameter estimates and also the estimated impulse responses; hence, results have to be taken with some caution.