Granger causality is a statistical concept of causality that is based on prediction. According to Granger causality, if a signal X1 "Granger-causes" (or "G-causes") a signal X2, then past values of X1 should contain information that helps predict X2 above and beyond the information contained in past values of X2 alone. Its mathematical formulation is based on linear regression modeling of stochastic processes (Granger 1969). More complex extensions to nonlinear cases exist; however, these extensions are often more difficult to apply in practice.

Granger causality (or "G-causality") was developed in the 1960s and has been widely used in economics ever since. However, it is only within the last few years that applications in neuroscience have become popular.

Personal account by Clive Granger

The following is a personal account of the development of Granger causality kindly provided by Professor Clive Granger (Figure 1). Please do not edit this section.

The topic of how to define causality has kept philosophers busy for over two thousand years and has yet to be resolved. It is a deep convoluted question with many possible answers which do not satisfy everyone, and yet it remains of some importance. Investigators would like to think that they have found a "cause", which is a deep fundamental relationship and possibly potentially useful.

In the early 1960's I was considering a pair of related stochastic processes which were clearly inter-related and I wanted to know if this relationship could be broken down into a pair of one way relationships. It was suggested to me to look at a definition of causality proposed by a very famous mathematician, Norbert Weiner, so I adapted this definition (Wiener 1956) into a practical form and discussed it.

Applied economists found the definition understandable and useable and applications of it started to appear. However, several writers stated that "of course, this is not real causality, it is only Granger causality." Thus, from the beginning, applications used this term to distinguish it from other possible definitions.

The basic "Granger Causality" definition is quite simple. Suppose that we have three terms, \(X_t\ ,\) \(Y_t\ ,\) and \(W_t\ ,\) and that we first attempt to forecast \(X_{t+1}\) using past terms of \(X_t\) and \(W_t\ .\) We then try to forecast \(X_{t+1}\) using past terms of \(X_t\ ,\) \(Y_t\ ,\) and \(W_t\ .\) If the second forecast is found to be more successful, according to standard cost functions, then the past of \(Y\) appears to contain information helping in forecasting \(X_{t+1}\) that is not in past \(X_t\) or \(W_t\ .\) In particular, \(W_t\) could be a vector of possible explanatory variables. Thus, \(Y_t\) would "Granger cause" \(X_{t+1}\) if (a) \(Y_t\) occurs before \(X_{t+1}\ ;\) and (b) it contains information useful in forecasting \(X_{t+1}\) that is not found in a group of other appropriate variables.

Naturally, the larger \(W_t\) is, and the more carefully its contents are selected, the more stringent a criterion \(Y_t\) is passing. Eventually, \(Y_t\) might seem to contain unique information about \(X_{t+1}\) that is not found in other variables which is why the "causality" label is perhaps appropriate.

The definition leans heavily on the idea that the cause occurs before the effect, which is the basis of most, but not all, causality definitions. Some implications are that it is possible for \(Y_t\) to cause \(X_{t+1}\) and for \(X_t\) to cause \(Y_{t+1}\ ,\) a feedback stochastic system. However, it is not possible for a determinate process, such as an exponential trend, to be a cause or to be caused by another variable.

It is possible to formulate statistical tests for which I now designate as G-causality, and many are available and are described in some econometric textbooks (see also the following section and the #references). The definition has been widely cited and applied because it is pragmatic, easy to understand, and to apply. It is generally agreed that it does not capture all aspects of causality, but enough to be worth considering in an empirical test.

There are now a number of alternative definitions in economics, but they are little used as they are less easy to implement.

Further references for this personal account are (Granger 1980; Granger 2001).

Mathematical formulation

G-causality is normally tested in the context of linear regression models. For illustration, consider a bivariate linear autoregressive model of two variables \(X_1\) and \(X_2\ :\)

\[
\begin{aligned}
X_1(t) &= \sum_{k=1}^{p} A_{11,k}\, X_1(t-k) + \sum_{k=1}^{p} A_{12,k}\, X_2(t-k) + E_1(t) \\
X_2(t) &= \sum_{k=1}^{p} A_{21,k}\, X_1(t-k) + \sum_{k=1}^{p} A_{22,k}\, X_2(t-k) + E_2(t)
\end{aligned}
\tag{1}
\]

where \(p\) is the maximum number of lagged observations included in the model (the model order), the matrix \(A\) contains the coefficients of the model (i.e., the contributions of each lagged observation to the predicted values of \(X_1(t)\) and \(X_2(t)\)), and \(E_1\) and \(E_2\) are residuals (prediction errors) for each time series. If the variance of \(E_1\) (or \(E_2\)) is reduced by the inclusion of the \(X_2\) (or \(X_1\)) terms in the first (or second) equation, then it is said that \(X_2\) (or \(X_1\)) Granger-(G)-causes \(X_1\) (or \(X_2\)). In other words, \(X_2\) G-causes \(X_1\) if the coefficients in \(A_{12}\) are jointly significantly different from zero. This can be tested by performing an F-test of the null hypothesis that \(A_{12}\) = 0, given assumptions of covariance stationarity on \(X_1\) and \(X_2\ .\) The magnitude of a G-causality interaction can be estimated by the logarithm of the corresponding F-statistic (Geweke 1982). Note that model selection criteria, such as the Bayesian Information Criterion (BIC; Schwarz 1978) or the Akaike Information Criterion (AIC; Akaike 1974), can be used to determine the appropriate model order \(p\ .\)
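The F-test described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it uses only the standard library, fits the restricted and full regressions by ordinary least squares, and runs on simulated data in which \(X_2\) drives \(X_1\) but not the reverse. The function names (`ols_rss`, `granger_f`, `simulate`) and all simulation parameters are assumptions made for this example.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * y for r, y in zip(rows, targets))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(k):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [u - factor * v for u, v in zip(m[i], m[c])]
    beta = [m[a][k] / m[a][a] for a in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f(target, source, p):
    """F-statistic for H0: the p lags of `source` do not help predict `target`."""
    ts = range(p, len(target))
    y = [target[t] for t in ts]
    # Restricted model: intercept plus the target's own past.
    restricted = [[1.0] + [target[t - k] for k in range(1, p + 1)] for t in ts]
    # Full model: additionally includes the source's past.
    full = [row + [source[t - k] for k in range(1, p + 1)]
            for row, t in zip(restricted, ts)]
    rss_r = ols_rss(restricted, y)
    rss_f = ols_rss(full, y)
    dof = len(y) - (2 * p + 1)
    return ((rss_r - rss_f) / p) / (rss_f / dof)

def simulate(n, seed=0):
    """Hypothetical test system: X2 drives X1 with a one-step delay."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for t in range(1, n):
        x2.append(0.5 * x2[t - 1] + rng.gauss(0, 1))
        x1.append(0.5 * x1[t - 1] + 0.4 * x2[t - 1] + rng.gauss(0, 1))
    return x1, x2

x1, x2 = simulate(2000)
print("F for X2 -> X1:", granger_f(x1, x2, p=2))
print("F for X1 -> X2:", granger_f(x2, x1, p=2))
```

With two lags and a long series, the 5% critical value of the F distribution is roughly 3.0, so the first statistic should fall far above it while the second should typically stay near 1. A p-value would require the F distribution's tail function, which is left out here to keep the sketch dependency-free.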

As mentioned in the previous section, G-causality can be readily extended to the \(n\) variable case, where \(n>2\ ,\) by estimating an \(n\) variable autoregressive model. In this case, \(X_2\) G-causes \(X_1\) if lagged observations of \(X_2\) help predict \(X_1\) when lagged observations of all other variables \(X_3 \ldots X_n\) are also taken into account. (Here, \(X_3 \ldots X_n\) correspond to the variables in the set \(W\) in the previous section; see also Boudjellaba et al. (1992) for an interpretation using autoregressive moving average (ARMA) models.) This multivariate extension, sometimes referred to as ‘conditional’ G-causality (Ding et al. 2006), is extremely useful because repeated pairwise analyses among multiple variables can sometimes give misleading results. For example, repeated bivariate analyses would be unable to disambiguate the two connectivity patterns in Figure 2. By contrast, a conditional/multivariate analysis would infer a causal connection from \(X\) to \(Y\) only if past information in \(X\) helped predict future \(Y\) above and beyond those signals mediated by \(Z\ .\) Another instance in which conditional G-causality is valuable is when a single source drives two outputs with different time delays. A bivariate analysis, but not a multivariate analysis, would falsely infer a causal connection from the output with the shorter delay to the output with the longer delay.
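The common-source case with different delays can be illustrated with a small simulation. In the sketch below, a hypothetical source \(Z\) drives \(X\) after one step and \(Y\) after two steps, with no direct connection between \(X\) and \(Y\ .\) The bivariate test is expected to report a strong influence from \(X\) to \(Y\ ,\) whereas conditioning on \(Z\) should remove it. All names and parameters are assumptions for this example, and the regression helper is a plain standard-library least-squares routine.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * y for r, y in zip(rows, targets))]
         for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(k):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [u - factor * v for u, v in zip(m[i], m[c])]
    beta = [m[a][k] / m[a][a] for a in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def lags(series, t, p):
    """Return [s(t-1), ..., s(t-p)]."""
    return [series[t - k] for k in range(1, p + 1)]

def conditional_granger_f(target, source, others, p):
    """F-statistic for source -> target, conditioned on the series in `others`."""
    ts = range(p, len(target))
    y = [target[t] for t in ts]
    # Restricted model: past of the target and of every conditioning variable.
    restricted = [[1.0] + lags(target, t, p)
                  + [v for w in others for v in lags(w, t, p)] for t in ts]
    # Full model: additionally includes the past of the candidate source.
    full = [row + lags(source, t, p) for row, t in zip(restricted, ts)]
    rss_r = ols_rss(restricted, y)
    rss_f = ols_rss(full, y)
    dof = len(y) - len(full[0])
    return ((rss_r - rss_f) / p) / (rss_f / dof)

def simulate_common_driver(n, seed=0):
    """Z drives X with delay 1 and Y with delay 2; X and Y are not connected."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.0, 0.0] + [z[t - 1] + 0.5 * rng.gauss(0, 1) for t in range(2, n)]
    y = [0.0, 0.0] + [z[t - 2] + 0.5 * rng.gauss(0, 1) for t in range(2, n)]
    return x, y, z

x, y, z = simulate_common_driver(2000)
print("bivariate   X -> Y:", conditional_granger_f(y, x, [], p=2))
print("conditional X -> Y | Z:", conditional_granger_f(y, x, [z], p=2))
```

Passing an empty list for `others` reduces the function to the ordinary bivariate test, so the two printed values compare the same pair of signals with and without the conditioning variable.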

Application of the above formulation of G-causality makes two important assumptions about the data: (i) that it is covariance stationary (i.e., the mean and variance of each time series do not change over time), and (ii) that it can be adequately described by a linear model. The #Limitations and extensions section will describe recent extensions that attempt to overcome these limitations.

Spectral G-causality

By using Fourier methods it is possible to examine G-causality in the spectral domain (Geweke 1982; Kaminski et al. 2001). This can be very useful for neurophysiological signals, where frequency decompositions are often of interest. Intuitively, spectral G-causality from \(X_1\) to \(X_2\) measures the fraction of the total power at frequency \(f\) of \(X_2\) that is contributed by \(X_1\ .\) For examples of the advantages of working in the frequency domain see #Application in neuroscience.

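For completeness, we give below the mathematical details of spectral G-causality. The Fourier transform of (1) gives

\[
\begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}
\begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix}
=
\begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix}
\]

in which the components of \(A(f)\) are

\[
A_{lm}(f) = \delta_{lm} - \sum_{k=1}^{p} A_{lm,k}\, e^{-2\pi \mathrm{i} f k}
\]

where \(\delta_{lm}\) is the Kronecker delta, \(\mathrm{i}\) is the imaginary unit, and \(A_{lm,k}\) is the coefficient of \(X_m(t-k)\) in the equation for \(X_l(t)\ .\) Rewriting this as \(X(f) = H(f)E(f)\) with \(H(f) = A(f)^{-1}\ ,\) the spectral matrix is

\[
S(f) = H(f)\, \Sigma\, H^*(f)
\]

in which the asterisk denotes matrix transposition and complex conjugation, \(\Sigma\) is the covariance matrix of the residuals \(E(t)\ ,\) and \(H\) is the transfer matrix. The spectral G-causality from \(j\) to \(i\) is then

\[
I_{j \to i}(f) = -\ln\left(1 - \frac{\left(\Sigma_{jj} - \Sigma_{ij}^2 / \Sigma_{ii}\right) |H_{ij}(f)|^2}{S_{ii}(f)}\right)
\]

in which \(S_{ii}(f)\) is the power spectrum of variable \(i\) at frequency \(f\ .\) (This analysis was adapted from Brovelli et al. (2004) and Kaminski et al. (2001).)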
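As a rough numerical illustration, the sketch below evaluates a bivariate spectral G-causality from known model coefficients. It assumes the standard Geweke-type construction: \(A(f)\) is the identity minus the Fourier transform of the lag coefficient matrices, \(H(f)\) is its inverse, \(S(f) = H \Sigma H^*\ ,\) and the causality from \(j\) to \(i\) is \(-\ln\left(1 - (\Sigma_{jj} - \Sigma_{ij}^2/\Sigma_{ii})|H_{ij}(f)|^2 / S_{ii}(f)\right)\ .\) The function name, the argument layout, and the example coefficients are assumptions for this example, and frequency is expressed in cycles per sample.

```python
import cmath
import math

def spectral_gc(coeffs, sigma, f, src, dst):
    """Spectral G-causality from variable `src` to `dst` (0-based) at frequency f.

    coeffs: list of 2x2 lag matrices [A_1, ..., A_p]; coeffs[k-1][l][m] is the
            weight of X_m(t-k) in the equation for X_l(t).
    sigma:  2x2 covariance matrix of the residuals.
    f:      frequency in cycles per sample (0 <= f <= 0.5).
    """
    # A(f) = I - sum_k A_k * exp(-2*pi*i*f*k)
    a = [[(1.0 if l == m else 0.0)
          - sum(ak[l][m] * cmath.exp(-2j * math.pi * f * (k + 1))
                for k, ak in enumerate(coeffs))
          for m in range(2)] for l in range(2)]
    # Transfer matrix H(f) = A(f)^-1 via the closed-form 2x2 inverse.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    h = [[a[1][1] / det, -a[0][1] / det],
         [-a[1][0] / det, a[0][0] / det]]
    # Power spectrum of the destination: the (dst, dst) entry of H Sigma H*.
    s_dd = sum(h[dst][m] * sigma[m][n] * h[dst][n].conjugate()
               for m in range(2) for n in range(2)).real
    causal_power = ((sigma[src][src] - sigma[dst][src] ** 2 / sigma[dst][dst])
                    * abs(h[dst][src]) ** 2)
    return -math.log(1.0 - causal_power / s_dd)

# Hypothetical first-order model in which X2 (index 1) drives X1 (index 0).
coeffs = [[[0.5, 0.4],
           [0.0, 0.5]]]
sigma = [[1.0, 0.0],
         [0.0, 1.0]]

for f in (0.0, 0.1, 0.25, 0.5):
    print(f, spectral_gc(coeffs, sigma, f, src=1, dst=0),
          spectral_gc(coeffs, sigma, f, src=0, dst=1))
```

Because the coefficient coupling \(X_1\) into \(X_2\) is zero in this example, the transfer function entry for that direction vanishes and the corresponding causality is zero at every frequency. In practice the coefficients and \(\Sigma\) would come from a fitted autoregressive model rather than being specified by hand.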

Recent work by Chen et al. (2006) indicates that application of Geweke’s spectral G-causality to multivariate (>2) neurophysiological time series sometimes results in negative causality at certain frequencies, an outcome which evades physical interpretation. They have suggested a revised, conditional version of Geweke's measure which may overcome this problem by using a partition matrix method. Other variations of spectral G-causality are discussed by Breitung and Candelon (2006) and Hosoya (1991).

Two alternative measures which are closely related to spectral G-causality are partial directed coherence (Baccala & Sameshima 2001) and the directed transfer function (Kaminski et al. 2001; note these authors showed an equivalence between the directed transfer function and spectral G-causality). For comparative results among these methods see Baccala and Sameshima (2001), Gourevitch et al. (2006), and Pereda et al. (2005). Unlike the original time-domain formulation of G-causality, the statistical properties of these spectral measures have yet to be fully elucidated. This means that significance testing often relies on surrogate data, and the influence of signal pre-processing (e.g., smoothing, filtering) on measured causality remains unclear.

Limitations and extensions

Linearity

The original formulation of G-causality can only give information about linear features of signals. Extensions to nonlinear cases now exist; however, these extensions can be more difficult to use in practice and their statistical properties are less well understood. In the approach of Freiwald et al. (1999), the globally nonlinear data is divided into locally linear neighborhoods (see also Chen et al. 2004), whereas Ancona et al. (2004) used a radial basis function method to perform a global nonlinear regression.

Stationarity

The application of G-causality assumes that the analyzed signals are covariance stationary. Non-stationary data can be treated by using a windowing technique (Hesse et al. 2003) assuming that sufficiently short windows of a non-stationary signal are locally stationary. A related approach takes advantage of the trial-by-trial nature of many neurophysiological experiments (Ding et al. 2000). In this approach, time series from different trials are treated as separate realizations of a non-stationary stochastic process with locally stationary segments.
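The windowing idea can be sketched in its simplest form: the time-domain F-statistic is recomputed in successive short windows, each treated as locally stationary. This is only an illustration of the general principle and not the specific procedure of Hesse et al. (2003). The simulated system, in which \(X_2\) begins to drive \(X_1\) halfway through the recording, and all function names and parameters are assumptions for this example.

```python
import random

def ols_rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    k = len(rows[0])
    m = [[sum(r[a] * r[b] for r in rows) for b in range(k)]
         + [sum(r[a] * y for r, y in zip(rows, targets))]
         for a in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda i: abs(m[i][c]))
        m[c], m[piv] = m[piv], m[c]
        for i in range(k):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [u - factor * v for u, v in zip(m[i], m[c])]
    beta = [m[a][k] / m[a][a] for a in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f(target, source, p):
    """F-statistic for H0: the p lags of `source` do not help predict `target`."""
    ts = range(p, len(target))
    y = [target[t] for t in ts]
    restricted = [[1.0] + [target[t - k] for k in range(1, p + 1)] for t in ts]
    full = [row + [source[t - k] for k in range(1, p + 1)]
            for row, t in zip(restricted, ts)]
    rss_r = ols_rss(restricted, y)
    rss_f = ols_rss(full, y)
    dof = len(y) - (2 * p + 1)
    return ((rss_r - rss_f) / p) / (rss_f / dof)

def windowed_granger_f(target, source, p, width, step):
    """(start, F) for source -> target in each window [start, start + width)."""
    return [(start, granger_f(target[start:start + width],
                              source[start:start + width], p))
            for start in range(0, len(target) - width + 1, step)]

def simulate_switching(n, seed=0):
    """X2 starts driving X1 only in the second half of the recording."""
    rng = random.Random(seed)
    x1, x2 = [0.0], [0.0]
    for t in range(1, n):
        coupling = 0.5 if t >= n // 2 else 0.0
        x2.append(0.5 * x2[t - 1] + rng.gauss(0, 1))
        x1.append(0.5 * x1[t - 1] + coupling * x2[t - 1] + rng.gauss(0, 1))
    return x1, x2

x1, x2 = simulate_switching(2000)
for start, f_stat in windowed_granger_f(x1, x2, p=2, width=500, step=500):
    print(start, round(f_stat, 1))
```

The window width trades temporal resolution against estimation accuracy: shorter windows track changes more closely but leave fewer observations for fitting the model in each window.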

Dependence on observed variables

A general comment about all implementations of G-causality is that they depend entirely on the appropriate selection of variables. Obviously, causal factors that are not incorporated into the regression model cannot be represented in the output. Thus, G-causality should not be interpreted as directly reflecting physical causal chains (see Personal account).

Application in neuroscience

Over recent years there has been growing interest in the use of G-causality to identify causal interactions in neural data. For example, Bernasconi and Konig (1999) applied Geweke’s spectral measures to describe causal interactions among different areas in the cat visual cortex, Liang et al. (2000) used a time-varying spectral technique to differentiate feedforward, feedback, and lateral dynamical influences in monkey ventral visual cortex during pattern discrimination, and Kaminski et al. (2001) noted increasing anterior to posterior causal influences during the transition from waking to sleep by analysis of EEG signals (fig 2). More recently, Brovelli et al. (2004) identified causal influences from primary somatosensory cortex to motor cortex in the beta-frequency range (15-30 Hz) during lever pressing by awake monkeys. In the domain of functional MRI, Roebroeck et al. (2005) applied G-causality to data acquired during a complex visuomotor task, and Sato et al. (2006) used a wavelet variation of G-causality to identify time-varying causal influences. G-causality has also been applied to simulated neural systems in order to probe the relationship between neuroanatomy, network dynamics, and behavior (Seth 2005; Seth & Edelman 2007).

A recurring theme in the application of G-causality to empirical data is whether the aim is to recover the underlying structural connectivity, or to supply a description of network dynamics which may differ from the structure. In the former case, G-causality and its variants are successful to the extent that they can account for indirect interactions (using conditional G-causality) and unobserved variables (an unsolved problem). In the latter case, while accounting for these factors is still important, network dynamics is seen as a joint product of network structure and the dynamical processes operating on that structure, which may be modulated by environment and context. Spectral G-causality is a good example of a causal description which goes beyond inferring structural connectivity. Other examples are provided by modeling work which shows how the same network structure can generate different causal networks depending on context (Seth 2005; Seth & Edelman 2007).

Alternative techniques

Information theory

Schreiber (2000) introduced the concept of transfer entropy, which is a version of mutual information operating on conditional probabilities. It is designed to detect the directed exchange of information between two variables, conditioned on common history and inputs. For a comparison of transfer entropy with other causal measures, including various implementations of G-causality, see Lungarella et al. (in press). An advantage of information theoretic measures, as compared to standard G-causality, is that they are sensitive to nonlinear signal properties. A limitation of transfer entropy, as compared to G-causality, is that it is currently restricted to bivariate situations. Also, information theoretic measures often require substantially more data than regression methods such as G-causality (Pereda et al. 2005).
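For discrete-valued signals, transfer entropy can be estimated directly from observed frequencies. The sketch below is a simple plug-in estimate with a history length of one sample, i.e. the sum over observed triples of \(p(x_{t+1}, x_t, y_t) \log_2 \left[ p(x_{t+1} \mid x_t, y_t) / p(x_{t+1} \mid x_t) \right]\ .\) The history length, the binary test signals, and the function names are assumptions for this example. Continuous data would first need to be discretized or handled with a different estimator.

```python
import math
import random
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in transfer entropy (bits) from `source` to `target`, history length 1."""
    nxt, cur, src = target[1:], target[:-1], source[:-1]
    n = len(nxt)
    triples = Counter(zip(nxt, cur, src))   # counts of (x_{t+1}, x_t, y_t)
    cur_src = Counter(zip(cur, src))        # counts of (x_t, y_t)
    nxt_cur = Counter(zip(nxt, cur))        # counts of (x_{t+1}, x_t)
    cur_only = Counter(cur)                 # counts of x_t
    te = 0.0
    for (x1, x0, y0), count in triples.items():
        p_given_both = count / cur_src[(x0, y0)]          # p(x_{t+1} | x_t, y_t)
        p_given_self = nxt_cur[(x1, x0)] / cur_only[x0]   # p(x_{t+1} | x_t)
        te += (count / n) * math.log2(p_given_both / p_given_self)
    return te

def simulate_binary(n, seed=0):
    """Y is a fair coin; X copies the previous Y with probability 0.9."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    x = [0] + [y[t - 1] if rng.random() < 0.9 else 1 - y[t - 1]
               for t in range(1, n)]
    return x, y

x, y = simulate_binary(5000)
print("TE Y -> X:", transfer_entropy(y, x))
print("TE X -> Y:", transfer_entropy(x, y))
```

In this toy system the estimate from \(Y\) to \(X\) should be clearly positive, while the estimate in the reverse direction should be close to zero apart from a small positive bias that shrinks as the sample grows.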

Maximum likelihood models

For neural data consisting of spike trains, it is possible to use a combination of point process theory and maximum likelihood modeling to detect causal interactions within an ensemble of neurons (Chornoboy et al. 1988; Okatan et al. 2005).

Alternative frameworks

The above techniques (including G-causality) are data-driven inasmuch as causal interactions are inferred directly from simultaneously recorded time series. A different, model-driven framework for analyzing causality is given by structural equation modeling (Kline 2005). In this approach, a causal model is first hypothesized, free parameters are then estimated, and only then is the fit of the model to the data assessed. While this method makes more efficient use of data than standard G-causality, it does so at the expense of constraining the repertoire of possible causal descriptions.

Another alternative framework for determining causality is to measure the effects of selective perturbations or lesions (Keinan et al. 2004; Tononi & Sporns 2003). While this method can in principle deliver unambiguous information about physical causal chains, perturbing or lesioning a system is not always possible or desirable and may disrupt the natural behavior of the system.

Concluding remarks

Time series analysis methods are becoming increasingly prominent in attempts to understand the relationships between network structure and network dynamics in neuroscience settings. Many linear and nonlinear methods now exist, based on a variety of techniques including regression modeling, information theory, and dynamical systems theory (Gourevitch et al. 2006; Pereda et al. 2005). G-causality provides one method that, in its most basic form (see #Mathematical formulation), is easy to implement and rests on a firm statistical foundation. More complex extensions to the frequency domain and to nonlinear data now exist and continue to be developed. However, in the analysis of neurophysiological signals it may be that simple, linear methods should be tried first before moving on to more complicated alternatives.

Resources

A detailed review of the theory and application of G-causality can be found in Ding et al. (2006).

James P. LeSage has provided a toolbox that includes G-causality analysis among a wide selection of econometric routines. This toolbox, designed for MATLAB (Mathworks, Natick, MA), can be downloaded from www.spatial-econometrics.com.

Anil K. Seth has developed a toolbox that focuses on time-domain G-causality and which includes methods for graph-theoretic analysis of G-causality interactions. This toolbox, also designed for MATLAB, can be downloaded from Anil K Seth's website.

Links to many other MATLAB resources for a variety of nonlinear time series analysis methods are provided in the Appendix of Pereda et al. (2005).