Pages

Friday, April 29, 2011

Testing for Granger Causality

Several people have asked me for more details about testing for Granger (non-) causality in the context of non-stationary data. This was prompted by my brief description of some testing that I did in my "C to Shining C" posting of 21 March this year. I have an example to go through here that will illustrate the steps that I usually take when testing for causality, and I'll use it to explain some of the pitfalls to avoid. If you're an EViews user, then I can also show you a little trick to help you go about things in an appropriate way with minimal effort.

In my earlier posting, I mentioned that I had followed the Toda and Yamamoto (1995) procedure to test for Granger causality. If you check out this reference, you'll find that you really only need to read the excellent abstract to get the message for practitioners. In that sense, it's a rare paper!

It's important to note that there are other approaches that can be taken to make sure that your causality testing is done properly when the time-series you're using are non-stationary (& possibly cointegrated). For instance, see Lütkepohl (2006, Ch. 7).

The first thing that has to be emphasised is the following:

If you are using a Wald test to test linear restrictions on the parameters of a VAR model, and (some of) the data are non-stationary, then the Wald test statistic does not follow its usual asymptotic chi-square distribution under the null.

In fact, if you just apply the test in the usual way, the test statistic's asymptotic distribution involves 'nuisance parameters' that you can't observe, and so it is totally non-standard. It would be very unwise to just apply the test, and hope for the best on the grounds that you have a large sample size.

Of course, testing for Granger (non-) causality is just a specific example of testing some zero restrictions on certain of the parameters in a VAR model, so the warning given above applies here. (Parenthetically, you can't get around the problem by using an LM test or an LR test, either.)

What I'm going to do is:

Remind you of what we mean by Granger non-causality testing.

Spell out the steps that are involved in applying the Toda-Yamamoto (T-Y) procedure.

Illustrate the analysis with a simple example, including some screen-shots from EViews.

List a few things that you should not do when testing for causality.

First, a simple definition of Granger Causality, in the case of two time-series variables, X and Y:

"X is said to Granger-cause Y if Y can be better predicted using the histories of both X and Y than it can by using the history of Y alone."

We can test for the absence of Granger causality by estimating the following VAR model:
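For two time-series, the model in question is a levels VAR(p), which (in my notation) can be written as:

```latex
\begin{aligned}
y_t &= a_0 + \sum_{i=1}^{p} a_{1i}\, y_{t-i} + \sum_{i=1}^{p} a_{2i}\, x_{t-i} + u_t \\
x_t &= b_0 + \sum_{i=1}^{p} b_{1i}\, x_{t-i} + \sum_{i=1}^{p} b_{2i}\, y_{t-i} + v_t
\end{aligned}
```

The null hypothesis that X does not Granger-cause Y is that a_{21} = a_{22} = ... = a_{2p} = 0 in the first equation; the null that Y does not Granger-cause X is the analogous zero restriction on the b_{2i} coefficients in the second equation.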

In each case, a rejection of the null implies there is Granger causality.

Note that in what follows I'll often refer to the 'levels' of the data. This simply means that the data have not been differenced. The series may be in the original units, or logarithms may have been taken (e.g., to linearize a trend). In either case, I'll talk about the 'levels'.

Now, here are the basic steps for the T-Y procedure:

1. Test each of the time-series to determine their order of integration. Ideally, this should involve using a test (such as the ADF test) for which the null hypothesis is non-stationarity; as well as a test (such as the KPSS test) for which the null is stationarity. It's good to have a cross-check.

2. Let the maximum order of integration for the group of time-series be m. So, if there are two time-series and one is found to be I(1) and the other is I(2), then m = 2. If one is I(0) and the other is I(1), then m = 1, etc.

3. Set up a VAR model in the levels of the data, regardless of the orders of integration of the various time-series. Most importantly, you must not difference the data, no matter what you found at Step 1.

4. Determine the appropriate maximum lag length for the variables in the VAR, say p, using the usual methods. Specifically, base the choice of p on the usual information criteria, such as AIC and SIC.

5. Make sure that the VAR is well-specified. For example, ensure that there is no serial correlation in the residuals. If need be, increase p until any autocorrelation issues are resolved.

6. If two or more of the time-series have the same order of integration, at Step 1, then test to see if they are cointegrated, preferably using Johansen's methodology (based on your VAR) for a reliable result.

7. No matter what you conclude about cointegration at Step 6, this is not going to affect what follows. It just provides a possible cross-check on the validity of your results at the very end of the analysis.

8. Now take the preferred VAR model and add in m additional lags of each of the variables into each of the equations.

9. Test for Granger non-causality as follows. For expository purposes, suppose that the VAR has two equations, one for X and one for Y. Test the hypothesis that the coefficients of (only) the first p lagged values of X are zero in the Y equation, using a standard Wald test. Then do the same thing for the coefficients of the lagged values of Y in the X equation.

10. It's essential that you don't include the coefficients for the 'extra' m lags when you perform the Wald tests. They are there just to fix up the asymptotics.

11. The Wald test statistics will be asymptotically chi-square distributed with p d.o.f., under the null.

12. Rejection of the null implies a rejection of Granger non-causality. That is, a rejection supports the presence of Granger causality.

Finally, look back at what you concluded in Step 6 about cointegration.

"If two or more time-series are cointegrated, then there must be Granger causality between them - either one-way or in both directions. However, the converse is not true."

So, if your data are cointegrated but you don't find any evidence of causality, you have a conflict in your results. (This might occur if your sample size is too small to satisfy the asymptotics that the cointegration and causality tests rely on.) If you have cointegration and find one-way causality, everything is fine. (You may still be wrong about there being no causality in the other direction.) If your data are not cointegrated, then you have no cross-check on your causality results.

Now it's time for our example. As usual, the data are available on the Data page that goes with this blog, and there is an EViews workfile on the Code page. We're going to take a look at the world prices of Arabica and Robusta coffees. Here's a plot of the monthly data from January 1960 to March 2011 - a nice long time series with lots of observations:

It looks as if there may be a structural break in the form of a shift in the levels of the series in 1975. We know that this will affect our unit root and cointegration tests, and it will also have implications for the specification of our VAR model and causality tests. This can all be handled, of course, but rather than getting side-tracked by these extra details, I'll focus on the main issue here, and we'll shorten the sample as follows:

Now let's go through the various steps for the T-Y causality testing procedure. The results to back up what I conclude along the way are in the EViews file, which contains a 'Read_me' text object that gives more explanation.

1. Both of the series are I(1) when we apply the ADF and KPSS tests, allowing for a drift and trend in each series.

2. So, m = 1.

3. We set up a 2-equation VAR model in the levels of the data, including an intercept in each equation.

4. The various information criteria suggest that we should have a maximum lag length of 3 for each variable:

5. However, when we then examine the residuals and apply the LM test for serial independence against the alternative of AR(k)/MA(k), for k = 1, ..., 12, we find that there are problems. This serial correlation is removed (at least at the 5% sig. level) if we increase the maximum lag length to p = 6:

This estimated model is also 'dynamically stable':

6. Johansen's Trace Test and Max. Eigenvalue Test both indicate the presence of cointegration between the 2 series, at the 10% significance level:

7. This last result is not going to affect anything we do.

8. As m = 1, we now re-estimate the levels VAR with one extra lag of each variable in each equation.

Here is where we need to be careful if we're going to "trick" EViews into doing what we want when we test for causality shortly. Rather than declare the lag interval for the 2 endogenous variables to be from 1 to 7 (the latter being p + m), I'm going to leave the interval at 1 to 6, and declare the extra (7th.) lag of each variable to be an "exogenous" variable. The coefficients of these extra lags will then not be included when the subsequent Wald tests are conducted. If I just specified the lag interval to be from 1 to 7, then the coefficients of all seven lags would be included in the Wald tests, and this would be incorrect. If I did that, the Wald test statistic would not have its usual asymptotic chi-square null distribution.

9. & 10. Now we can undertake the Granger non-causality testing:

11. Note that the degrees of freedom are 6 in each part of the above image - that's correct: p = 6. The extra 7th. lag has not been included in the tests.
12. From the upper panel of results, we see that we cannot reject the null of no causality from Robusta to Arabica. From the lower panel we see that we can reject the null of no causality from Arabica to Robusta, at the 10% significance level, and virtually at the 5% significance level as well.

In summary, we have reasonable evidence of Granger causality from the price of Arabica coffee to the price of Robusta coffee, but not vice versa.

Some things to watch out for:

Don't fit the VAR in the differences of the data when testing for Granger non-causality.

If you are using a VAR model for other purposes, then you would use differenced data if the series are I(1), but not cointegrated.

If you are using a VAR model for purposes other than testing for Granger non-causality and the series are found to be cointegrated, then you would estimate a VECM model.

The usual F-test for linear restrictions is not valid when testing for Granger causality, given the lags of the dependent variables that enter the model as regressors.

Don't use t-tests to select the maximum lag for the VAR model - these test statistics won't even be asymptotically std. normal if the data are non-stationary, and there are also pre-testing issues that affect the true significance levels.

If you fail to use the T-Y approach (adding, but not testing, the 'extra' m lags), or some equivalent procedure, and just use the usual Wald test, your causality test results will be meaningless, even asymptotically.

If all of the time-series are stationary, m = 0, and you would (correctly) just test for non-causality in the 'old-fashioned' way: estimate a levels VAR and apply the Wald test to the relevant coefficients.

The current Wikipedia entry for Granger Causality has lots of things wrong with it. In particular, see the 'Method' and 'Mathematical Statement' sections of that entry.

Finally, if you want to check things out some more, I've put a second EViews workfile, relating to the prices of natural gas in Europe and the U.S., on this blog's Code page. In that file you'll find a "Read_Me" object that will tell you what's going on.

Note: The links to the following references will be helpful only if your computer's IP address gives you access to the electronic versions of the publications in question. That's why a written References section is provided.

References

Lütkepohl, H. (2006). New Introduction to Multiple Time Series Analysis. Springer, Berlin.

Toda, H. Y. and T. Yamamoto (1995). Statistical inference in vector autoregressions with possibly integrated processes. Journal of Econometrics, 66, 225-250.

368 comments:

Very interesting and thorough explanation. Do you know, and can you elaborate on, how the Group Statistics/Granger Causality Test command differs from the above procedure, and whether it is safe to use? I tried a bit and get different results with both methods. Marvin

Marvin - thanks for your comment and question. Using the commands you asked about, the extra "m" lags don't get included in the VAR model. If you use that approach and specify, say, p = 4, then 4 lags of each variable get included in each equation of the VAR, but ALL 4 of them then get tested for G-causality. This is OK if every variable in the model is stationary, but not otherwise. I hope this helps. DG

Dear Professor: If my time series are I(0) and I(1), is it correct to use the levels of the data when testing for Granger non-causality? Is there no need to difference the I(1) data? Is my understanding correct or not?

Anonymous - that's right. You use the levels of BOTH variables. But you MUST then follow the Toda-Yamamoto procedure by adding ONE extra lag of both variables in both equations, but you DON'T include this extra lag in the causality tests. Now, this is all to do with causality testing. If you were wanting to estimate a VAR for some other purpose, such as forecasting, then you would difference the I(1) variable. For overall consistency in this case you'd probably want to difference the I(0) variable too. If you difference an I(0) variable it is still stationary. The risk is that you may introduce (negative) autocorrelation into the errors because of over-differencing one of the variables. But you can easily test for this, and you can usually get rid of it by just adding one or more extra lags of one or both variables. I hope this helps! DG
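The over-differencing point in this reply is easy to verify numerically: first-differencing a series that is already I(0) induces an MA(1) error with a theoretical lag-one autocorrelation of -0.5. A quick simulation sketch:

```python
import numpy as np

# White noise is already I(0); differencing it anyway creates the
# over-differenced series d_t = e_t - e_{t-1}, an MA(1) process
rng = np.random.default_rng(123)
e = rng.standard_normal(100_000)
d = np.diff(e)

# Sample lag-one autocorrelation of the differenced series,
# which should sit close to the theoretical value of -0.5
lag1_corr = np.corrcoef(d[:-1], d[1:])[0, 1]
```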

Hi Professor, this is Muhammad Meraj from Karachi, Pakistan. I need your help regarding Granger causality testing: if all of my variables are I(1), should I still use the above-mentioned procedure, as suggested by Toda and Yamamoto? Please explain. Regards, Muhammad Meraj

Muhammad: Yes, if all of the variables are I(1), and whether or not they are cointegrated, you need to use the Toda-Yamamoto procedure (or some equivalent one, such as that proposed by Helmut Lutkepohl). Take a look at the T & Y paper - just read the abstract - it is very clear and easy to follow. DG

Dear Sir, just a brief comment. If I am not mistaken, with only two variables (and provided that they are cointegrated) Granger causality testing could be done by transforming the VAR in levels to its ECM representation and using a Wald test for the joint significance of the EC term and the lagged differenced variables. However, when one is considering larger systems, T-Y should be used.

Anonymous - Thanks for the comment. Actually, it's got nothing to do with the number of variables in the system. You can transform a multivariate VAR into a VECM. The trouble is that the limit distribution of the Wald test statistic will not be chi-square if ANY variable in the VAR is I(1), whether or not any of the variables are cointegrated. The test statistic's limit distribution has nuisance parameters in it.

In addition to this main point, consider the following. What if one variable is I(1) and one is I(0), in which case they can't be cointegrated? What if there is some uncertainty about the outcome of the cointegration tests? In these cases the T-Y (or equivalent) methodology is the way to go. All you need to know is the maximum order of integration among the variables in question. The point is to alter the problem in a way that ensures that the Wald statistic has its usual asymptotic distribution. DG

Thank you very much for the perfect instruction! I hope you will answer my question.

If there is a structural break like in your data before you have cut them how it changes the T-Y results? Unfortunately I could not cut my data because I have sample only from 2006 and there is a structural break due to the crisis.

I know that the ADF test has lower power, but as far as I understand, in the T-Y procedure one should find the largest possible order of integration. So low power is not a drawback.

Thank you for your response. As I remember, there is a paper by Toda and Phillips about this issue, in which they talk about a "sufficient cointegration condition" that makes the usual distribution valid. Also, for two variables it is easy to test the condition. That is what my first post was about (I apologise for not making it clear).

In addition, when using the two-step OLS procedure (and not Johansen ML), in which the EC term estimated in the first step is included in the short-run equation, the usual F-test or Wald test should in my opinion be valid, because the testing now involves only stationary variables (again assuming that cointegration holds). Looking forward to your comment. Thank you for your time.

Nataliya: If you have a short sample with a structural break, the ADF test has 2 problems: low power due to the small sample size; and a tendency to "discover" unit roots that aren't there, due to the structural break. Both of these will lead you in the direction of concluding that the series is I(1), whether it is or not. Then you would use the T-Y procedure. The only thing I would change is to include a dummy variable (or variables) for the break in the equation in the VAR that "explains" the variable with the break. I hope this helps.

Goran - thanks for the comment. You are right about the Toda-Phillips result.

However, things don't work for the Wald test in the case of a VEC or VECM model unless you take special steps - equivalent to T-Y. Just because every variable is stationary, this doesn't guarantee that the limit distribution of the usual Wald test statistic is chi-square. Take a look at the paper by Dolado and Lutkepohl in 1996 "Econometric Reviews", for example. The following link may also be helpful: http://www.jmulti.de/download/help/vecm.pdf

Some have argued that the T-Y approach is less powerful than the Toda and Phillips approach, and that it is also inefficient, as the order of the VAR is intentionally set too large. So, other than the T-Y procedure, is there any new way to conduct a Granger causality test in a non-stationary VAR?

Thank you for the valuable info provided on your blog. I want to apply a Granger causality test to volatility index data, in which the volatility index is the independent variable and is I(0), and the stock market index is I(1). What would the appropriate procedure be? Kindly reply.

Harjevan: Consider the hypothesis, "GNP does not cause MCAP". The p-value is 61.9%, which is very large. It means that the probability of seeing a value for the test statistic of 0.96081 (your value), or larger, if the hypothesis is true, is 61.9%. So, what you have observed is quite a likely event, if the null hypothesis is true. Accordingly, I would NOT reject the hypothesis of "no causality".

In all of the other cases the p-values are essentially zero. You have observed events (values for the test statistics) that are very rare if the null hypothesis is true. But you HAVE observed them! So, in all likelihood the null hypothesis is not true - I'd REJECT the hypothesis of "no causality" in each of these other cases.

So there IS causality from MCAP to GNP (for example), but not the reverse.

Ben: Helmut Lutkepohl has an alternative method to T-Y. You might want to take a look at his book "New Introduction to Multiple Time Series Analysis", and check his website at http://www.eui.eu/Personal/Luetkepohl/

Dear Professor Giles, first, congratulations on your inspiring thoughts and useful blog! My name is Peter and I am a PhD student in political economy; my thesis subject is "determinants of bank loan supply and demand in Bulgaria for the period 2000-2010". I am running a regression using a VECM (all my data are time series, i.e., nonstationary in levels and stationary in first differences, guided by ADF and PP tests). As expected, most of my variables (GDP, Gross Value Added, gross investments, CPI, GDP deflator, salaries, loans stock, loans new business volume, deposits, bank balance sheet data, interest rates, etc.) are trending and nonstationary in levels. Next I test for cointegration using the Johansen cointegration test embedded in EViews 5.0. I am assuming all my demand and supply determinants, loan variables included, to be endogenous, and I am experimenting with different lags and combinations (keeping the economic logic of the signs of the coefficients). So I am stuck with the following three problems:

- The Johansen cointegration test shows, for example, that there are three cointegrating vectors (rank = 3). Can I run a VECM with only one error correction equation, in which the credit variable is explained by the other 4 variables in the regression, skipping the fact that 3 cointegrating vectors are indicated by the Johansen test? Concerning the error correction term, I know there are interactions between the endogenous variables, but since I am interested only in loans as the dependent variable in the long term, can I omit the other two cointegrating vectors, in which loans are not included?

- If the Johansen cointegration test shows me 5 cointegrating vectors for a 5-variable test (rank = 5, having 5 variables), does this signal a spurious regression and misspecification?

- Assuming that everything is OK with the cointegrating equations, it happens that in the short-term model the lagged variables change the sign of the coefficient for the same variable (the t-stats are high, signalling that the coefficients are different from zero and the lagged variables cannot be omitted). For example, loan demand is positively related to lnGDP in the cointegrating equation and in the first and second lags of the short-term model, but the third and fourth lags of lnGDP have negative signs, and still high t-stats. How is this interpreted?

Thanks for your time and consideration, Peter


Dear Professor Giles, thanks a lot for this interesting blog! Regarding your coffee example, I was wondering about one step in your procedure: you showed that the inverse roots are inside the unit circle, which implies stability of the model. But I am not sure what this fact should tell me. Is there then a contradiction with the unit root tests at the beginning (i.e., can the model be stable when the series are all I(1))? Thanks a lot! Best regards, Paul

Paul: Thanks for the interesting comment. I don't think there's any conflict here. In the case of the unit root testing, and the finding that the data are I(1), the underlying model is an AR(1) model, and we find that we can't reject the hypothesis that the autocorrelation coefficient is unity. When we get to the VAR model we have a much more complex underlying process. We now have a bivariate AR(6) process, and when we estimate it this model is found to be dynamically stable.

To me, the explanation lies in the fact that we have two completely different models. If you use my EViews code and estimate a 2-equation model with lags of length one in each equation, the inverse roots are 0.991 and 0.983. Indeed, the estimated coefficient on the own-lag for Arabica is 0.994 (s.e. = 0.0215), so the t-statistic for testing that the coefficient is unity is -0.28. We can't reject a unit root. In the case of Robusta, the corresponding numbers are 0.9796 (0.0164), t = -1.24, giving the same conclusion.

Thanks for the quick response and the good explanation. Am I right in assuming that stability or instability of our model does not make any difference with regard to the G-causality tests? In other words, if it turns out that my model is unstable, should I nevertheless proceed as usual?

I think in most papers stability is not checked at all, as only stationarity of the process matters - or?

Paul: You are correct that a lot of people don't check the dynamic stability of the model in this particular context. It is obviously something that is crucial if, say, your objective in estimating the VAR was to generate forecasts, or to look at impulse response functions associated with policy shocks.

Strictly speaking, the proof of the Toda and Yamamoto result does not rely on the VAR being dynamically stable, so yes, you could still go ahead as described in the event that it was not. However, personally, I still like to check this out, for the following reason. If there are inverse roots outside the unit circle then this suggests that the VAR is in some sense mis-specified, and I don't like to apply the test in the context of such a model. I find, invariably, that the issue of unstable roots can be resolved by adjusting the maximum lag length in the model.

First of all many thanks for the clear explanation of the workings of Granger causality. I am currently working on a VECM for my thesis in which I study the linkages between energy consumption and a number of economic indicators. I have two questions:

1) A number of similar studies report the sum of the lagged coefficients of the VECM as the sign of the Granger causality (calculated with Joint Wald Chi-square). What does the sign of the causality imply w.r.t. the relationship between the variables? Does the sign of the Granger causality even matter at all?

2) I would like to perform impulse response analysis. However, Eviews does not provide confidence intervals. How can I obtain p-values or confidence intervals to show the significance of the impulse responses?

Question 1: It's not clear to me that the sum of the coefficients really tells us the "sign" of the causality. There are all of the dynamic effects between the equations that have to be taken into account, and that's precisely what an impulse response function does. If the IRF is positive for all periods, fading away to zero, I'd say that's a positive "sign" for the causality. If it is positive, then negative, and then dampens down, I'd say that the "sign" depends on the time-horizon. Whether or not the sign matters for the causality depends on the context, I think. If we have a 2-equation model for income and consumption, and the IRF for consumption responding to a shock in income is not positive everywhere, I'd be a bit worried, personally, about the specification of the lags in the model, etc. In other situations the "direction" of the causality may be all that is of interest.

Regarding your second question - you're right. EViews does this for the VAR impulse responses, but not the VECM ones. Grrrr!

You're not the only one to be asking. See http://forums.eviews.com/viewtopic.php?f=5&t=4952

My best answer is to bootstrap them. This is what is done in Helmut Lutkepohl's software, JMulTi. See: http://www.jmulti.de/download/help/vecm.pdf

Nick: A follow-up: there is a step-by-step description of bootstrapping confidence intervals for IRFs from VECMs in the following paper: A. Benkwitz & H. Lutkepohl, "Comparison of Bootstrap Confidence Intervals for Impulse Responses of German Monetary Systems", Macroeconomic Dynamics, 2001, 5, 81-100.

Jasmin: Because the highest order of integration among the series is I(1), we need to add one more lag of each variable, beyond the 6 lags that we've already decided upon. It's CRUCIAL that the coefficient on this extra lag is NOT included in the Wald test for non-causality. (See steps 8 & 9 in the post.)

Now, this poses no problem. However, if you want to use the "built-in" Granger causality test in EViews, you have to use a "trick" to ensure that only 6 lag coefficients are included in the test, and not all 7. The way to do this is to say you are using lags 1 to 6 in the "lag length" box, and then add the 7th lags in the extra "exogenous variables" box.

This is an EViews-specific situation. You could, of course, fit the VAR with 7 lags, and then select "VIEW", "Coefficient tests", "Wald Test", and specify the six coefficients that you want to test. This would take a bit more work, but gives identical answers. Doing it the way I suggested gives you ALL of the causality tests in one hit.

Many thanks for the quick response and the Lütkepohl references. I don't quite 'get it', though, how to perform the bootstrapping of the confidence intervals in EViews. I guess I should settle for the Wald chi-square tests for Granger causality (I can explain the majority of the results on the basis of economic reasoning), and merely use the IRFs as a point of reference for the 'sign' of the relationship. Is it right for me to use the IRFs in such a manner? Or would you suggest not discussing the IRFs at all, seeing as I cannot provide confidence intervals/significance levels (thus no empirical evidence)?

Nick: I'd definitely include the IRFs, even without the confidence intervals.

To construct the intervals you'll have to write an EViews program to go through the steps I referred to previously. You certainly can't "trick" EViews into doing it. I'm afraid I haven't written a program myself - I've never had the need to date.

Well unfortunately programming isn't really my forte, nor is econometrics to be honest. It took me quite some time to get where I'm at right now in terms of understanding the workings of VAR models and cointegrated data. I will include the IRFs in the study, since they do provide useful information. I would like to thank you for being actively involved with solving my issues! And if, by any chance, you might find a solution to our IRF confidence interval issue I am looking forward to reading about it on your blog.

First of all, thank you for your helpful blog. Second, I want to investigate the relationship between the exchange rate and the stock market index in Malaysia, using daily time series from 2005 to 2011. I have three time-series variables, which I transformed into logs: the stock index, the exchange rate, and the gold price. I used the ADF and KPSS tests in EViews, and the results showed that they are integrated of order one, I(1). Then I applied the Johansen cointegration test. The VAR lag-length criteria showed AIC = 3 and LM = 6 for maxlag = 12, so I used 3 lags but got no cointegration. I read somewhere that if your equation has a break it might give you faulty results, so I want to know how to test my cointegration test for a structural break in EViews, and whether my lag selection is correct. Regards

Dear Paul: The software that you mentioned was useful - thank you so much for that. I have another favor to ask. I have done the test as you told me and I have the results, but I am having a hard time interpreting them. I was wondering if you have time to take a look at them. I don't know if my VECM has a break or not, and if it has a break, then at which observation? I have posted my results at the link below: http://madrebuy.blogspot.com/2011/11/results-of-jmulti-vecm-chow-test.html

Hello Dave, I'd like to commend you on the excellent explanation of the VAR. Question: while checking for serial independence, how did you settle on 6 lags using the LM statistics? I tried using 5, 4, 3 lags, but the p-values were not consistent in all cases. How did you arrive at 6? Thanks, and keep doing the great job you are doing!

@Anonymous: Thanks for the comment! When I look back at what happens with 3, 4, or 5 lags, there are always some very small p-values for the LM test at low-order lags (of the autocorrelation function). I went to 6 lags to be conservative. I'd rather over-fit the model than under-fit it. Hope that helps.

@Anonymous: That's the thing with p-values - the choice is subjective. A value close to zero implies there is a very low probability of observing the actually observed value of the test statistic, if the null is true. But we HAVE observed it, so we then reject the null (of independence, in this case).

I focussed on the short lags in the autocorrelation function - very small p-values when the alternative is autocorrelation of orders one, two, three, ... suggest model mis-specification (e.g., through the omission of variables - lagged values, in the case of a VAR).

Very informative piece on VAR. I have a simple question: I want to construct an unrestricted VAR on 3 variables: hunger incidence (an indicator of food security), rice price (a measure of access) and rice yield (a measure of productivity). If you construct a correlation matrix, the value for rice price and yield is 0.6. Does correlation even matter in a VAR framework?

I'm back again with a new question about the interpretation of the VECM estimates. I'll try to keep it short.

1) As previously described on this blog I use Wald Chi-square to test for short-run (Granger) causality between 6 endogenous variables (in VECM context). 2) I test for long-run causality by testing the adjustment coefficients of the error correction terms (ECT, four of them to be precise).

This is where it gets tricky; I have 4 ECTs and 6 simultaneous equations. Does an ECT have any indicative value if its adjustment parameter is insignificant? I am trying to figure out the interaction between long-run causality and long-run equilibrium relationships, but I have to admit that I'm quite puzzled.

Thanks for the thorough explanation on the causality test in nonstationary framework. I see that you used EViews to demonstrate your method. Can you also demonstrate it using R? It will be very helpful. Thank you.

Razi: Thanks for the comment. No, I wouldn't be worrying about stability after steps 8, 9 and 10. The T&Y approach requires that the model be "properly specified" in the levels before you add the extra lag(s) to allow for the unit roots. That's all. I hope that helps.

Anonymous: Any positive value for the ADF statistic leads to NON-rejection of the null hypothesis that there is a unit root. It's very common. There is nothing to "solve". The data are simply non-stationary. First-difference the data and in all likelihood the series will then be I(0), implying that the original series was I(1).
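To see informally why first-differencing typically does the trick, here's a tiny hypothetical Python illustration with a simulated random walk (a real analysis would, of course, re-run the ADF test on the differenced series rather than eyeball variances):

```python
import random

# Hypothetical illustration: a simulated random walk (I(1)) and its first
# difference (I(0)). The sample variance of the levels dwarfs that of the
# differences, which is the informal signature of an integrated series.
random.seed(0)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0, 1))

diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

def variance(s):
    m = sum(s) / len(s)
    return sum((v - m) ** 2 for v in s) / (len(s) - 1)

print(round(variance(walk), 1), round(variance(diffs), 1))
```

The levels wander without a fixed mean, while the differences fluctuate around zero with stable variance.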

Thanks for the response relating to the positive ADF values. When checking whether to reject the null or not, we are checking values in absolute terms, right? E.g., we can reject the null that there is a unit root if a series has a t-statistic of 5.091214 (with p-value 1.000) where the critical values are -4.2349, -3.5403, -3.202445.

Dear Professor Giles: Thank you for the excellent information; very helpful. However, I have a question about the number of lags in the Johansen cointegration test. Suppose that I tested for cointegration between two series that have structural breaks, without considering the breaks, and determined the number of lags to be, for example, 5. When considering the breaks, do I have to go back and re-determine the number of lags? In other words, would including the breaks affect the number of lags, or should I be using the same number of lags as in the case without breaks, that is, 5?

Dear Mr. Giles, I am Aditya Bhan. I am doing my post-grad in quantitative economics. In the case of a VECM, the significance of the error correction term helps us to conclude upon long-run causation. Could you please outline the procedure for inferring long-run causation in the case of an unrestricted VAR model?

Dear Professor Giles: Thanks for the helpful comments. For the world prices of Arabica and Robusta coffees example that you illustrated, if you used the full sample from January 1960 to March 2011 to test for Granger causality, do we include a dummy variable (D = 0 from January 1960 to December 1975, and 1 from January 1976 to March 2011) for the break in the "exogenous variables" box, as: C Arabica(-7) Robusta(-7) D

Dear Professor Giles, thanks for your excellent explanation of the Granger causality test. I am going to test the causality between two variables during economic recessions. I have data for a long period, including the recession periods. Please let me know how I can use all my sample data and test causality just for the recession periods. Many thanks in advance.

Dear Professor Giles, thanks for your assistance. Please, how can I use the Autoregressive Distributed Lag (ARDL) bounds testing approach to investigate the existence of a cointegrating relationship among variables? Dele

Dear Prof. Giles, I was wondering if it is possible to demonstrate with an example how to carry out a non-linear Granger causality test between two variables. I do have some thoughts but I am not sure whether they're correct. Bierens (1997) argues that the presence of structural breaks might imply broken deterministic trends, which is a particular case of a non-linear time trend. He suggests approximating broken time trends by non-linear trends. Based on this, I was wondering if adding dummy variables (to account for structural breaks) in the "Exogenous Variables" box in the VAR specification in EViews, and then carrying out the Granger causality test, would be considered a non-linear test in Bierens' sense. Could you please advise? Reference: Bierens, H. (1997). Testing the unit root with drift hypothesis against nonlinear trend stationarity, with an application to the U.S. price level and interest rate. Journal of Econometrics, 81, 29-64.

Dear Professor Giles, I am from the Philippines and am currently in my undergraduate studies in economics. I am doing a thesis using time-series data, and I would like to ask you some questions about the Johansen cointegration test. It was not thoroughly discussed in our courses, and I'm having a hard time conducting the test in EViews 4. We weren't advised to use the Engle-Granger cointegration test. Is it possible for you to give me some pointers as to how to conduct the test in EViews, and how I might be able to interpret it? It will be very helpful for my study. I admire how concise and specific you are in explaining econometric methodology. This will be deeply appreciated. Thank you.

Say you want to test whether the nominal USD/EUR exchange rate Granger-causes the nominal oil price (WTI). The time series are of course I(1), and they're cointegrated.

How can you perform a Granger test on these data? The T&Y method you've described here is a little bit too complicated for my work. Is there any way to test Granger causality with the usual F-statistics? If so, should you test in levels or dlogs? And how do I know which lags to include?

Richard - thanks for the comment. If the data are integrated or cointegrated then there are no short-cuts. The usual F-test will fail, even asymptotically. That's precisely why you need to use something like the T-Y procedure. Choosing the lag length by minimizing the Schwarz criterion is simple, and it is "consistent": it will choose the correct lag length with probability one if you have a large enough sample size. You can't use the t-statistics on the lag coefficients to select the lag length, for the same reason that the F-test fails.
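The Schwarz-criterion rule described above can be sketched in a few lines of Python for a hypothetical univariate AR example (the data-generating process and all names are illustrative assumptions; in a VAR you would minimize the multivariate analogue over the lag order):

```python
import math
import random

def ols_ssr(y, X):
    """Sum of squared residuals from OLS of y on the columns of X, via the
    normal equations solved by Gaussian elimination (fine for small systems)."""
    k = len(X[0])
    A = [[sum(X[t][i] * X[t][j] for t in range(len(y))) for j in range(k)]
         for i in range(k)]
    b = [sum(X[t][i] * y[t] for t in range(len(y))) for i in range(k)]
    for i in range(k):                       # elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return sum((y[t] - sum(X[t][j] * beta[j] for j in range(k))) ** 2
               for t in range(len(y)))

def sic_for_lag(series, p, max_p):
    """Schwarz criterion for an AR(p) with intercept, on a common sample."""
    y = series[max_p:]
    X = [[1.0] + [series[t - j] for j in range(1, p + 1)]
         for t in range(max_p, len(series))]
    T = len(y)
    return math.log(ols_ssr(y, X) / T) + (p + 1) * math.log(T) / T

# Hypothetical example: data generated as an AR(2) (illustrative, not from the post)
random.seed(1)
s = [0.0, 0.0]
for _ in range(400):
    s.append(0.6 * s[-1] - 0.3 * s[-2] + random.gauss(0, 1))

best_p = min(range(1, 7), key=lambda p: sic_for_lag(s, p, 6))
print(best_p)
```

The key point is that every candidate lag length is evaluated on the same estimation sample, so the criteria are comparable; the chosen `p` is then the starting point before any T-Y augmentation.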

Thank you very much for the quick answer! I'm currently using the OxMetrics program "PcGive" - I don't know if you're familiar with it, but I have not found a way to compute the Schwarz/AIC criteria for a VAR model.

Is there any other way to determine the lags, then manually create the model in the program, and then use the chi-square table to test at the chosen significance level?

And one more question, if I may: say you want to test Granger causality on two cointegrated time series which happen NOT to be in levels (i.e., you want to test for Granger causality between two variables that are in %-changes (dlogs)). Isn't this possible then?

Richard: To answer the last part of your question - no, this is NOT O.K. You still need to use the T-Y procedure (or its equivalent), and this requires that you fit the VAR in the levels for the causality-testing exercise. Otherwise the usual Wald (chi-square) test won't be asymptotically chi-square distributed.
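As a rough illustration of fitting the levels model with the extra T-Y lag and Wald-testing only the original p lags, here is a hypothetical Python sketch for a single equation of a bivariate system with p = 1 and d = 1 (simulated data; the coefficients, sample size, and setup are illustrative assumptions, not the post's example):

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

# Simulated I(1) data in which x Granger-causes y by construction
random.seed(7)
n = 300
x, y = [0.0, 0.0], [0.0, 0.0]
for t in range(2, n):
    x.append(x[-1] + random.gauss(0, 1))                    # x is a random walk
    y.append(y[-1] + 0.5 * (x[-2] - x[-3]) + random.gauss(0, 1))

# y-equation of the levels VAR(p + d), p = 1, d = 1:
# regress y_t on [const, y_{t-1}, y_{t-2}, x_{t-1}, x_{t-2}]
rows = [[1.0, y[t - 1], y[t - 2], x[t - 1], x[t - 2]] for t in range(2, n)]
ys = [y[t] for t in range(2, n)]
k = 5
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yt for r, yt in zip(rows, ys)) for i in range(k)]
beta = solve(XtX, Xty)
ssr = sum((yt - sum(r[j] * beta[j] for j in range(k))) ** 2
          for r, yt in zip(rows, ys))
s2 = ssr / (len(ys) - k)

# Wald test of H0: the coefficient on x_{t-1} (index 3) is zero. Only the
# first p = 1 lag of x is tested; the augmentation lag x_{t-2} is NOT tested.
col3 = solve(XtX, [1.0 if i == 3 else 0.0 for i in range(k)])  # (X'X)^{-1} e_3
wald = beta[3] ** 2 / (s2 * col3[3])
print(wald > 3.84)  # 3.84 is the 5% critical value of chi-square(1)
```

With more than one tested lag the restriction is joint and the Wald statistic uses the full restricted covariance block, but the logic is the same: estimate in levels with d extra lags, test only the first p.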

1. There are papers using the T-Y procedure that perform post-VAR(p+d) diagnostic tests such as adjusted R-squared, the B-G test, the Ramsey test, etc. I am curious about that. I am using STATA to carry out a T-Y type VAR(p+d) test, but I don't do any post-estimation tests, because the tests (varnorm, varstable, varlmar - in STATA) suggest VAR(p) instead of VAR(p+d). I don't know how to perform those tests (B-G, etc.) in STATA after the VAR, unless I regress each equation in the VAR separately. What do you think about this?

2. For the Wald test, I use the 'test' command after the VAR(p+d), e.g. for VAR(2+1): test [depvar]l1.indvar [depvar]l2.indvar = 0. This test gives me the p-value. I hope this is correct.

3. Now, recent papers have used generalized IRFs. Could you suggest any software to perform this? Or are there any tricks?

ADIB: Thanks for your comment. If you're doing T-Y in EViews, all of the usual diagnostic tests are fully available, so that's easy. I'm not a STATA user, so I can't help you there, or with question 2, I'm afraid.

And I'm afraid I don't have any tricks up my sleeve with respect to your last question.

Thank you for the interesting information. I have a question, and am looking for your kind help. I'm using T-Y for my paper. Normally in empirical studies we transform the series into logs and take the first difference of the logs to get the growth rate of the data. However, the series I have, which are trade balances, are negative in many years, so I cannot take logs. Is it possible for me to just enter the data in levels (without logs) and take their first difference to achieve stationary data to enter into my models? I know it's unusual in published articles, but is it possible to do that?

Thanks for the comment. There is actually nothing unusual in using the original data rather than their logarithms. We would often do this with interest rates, for example. One reason for taking logs is often just to linearize the upward trend in the data.

Thank you very much for your prompt reply, Prof. Dave. I think the main reason for taking the log of the series is that, if they are non-stationary, we take the first difference of the logged series (the usual case in papers in my research field) to enter them in VAR models. If we do so, we can then interpret the coefficients as elasticities, which is more economically meaningful. If we just keep the original series and take first differences to get a stationary process, and enter the first-differenced data into VAR models, then the coefficients are meaningless. And to check whether the results are economically plausible, it would be necessary not only to check which causality direction is statistically significant, but also to check whether the coefficients are reasonable. Kindly advise me if I'm right. Thank you.

Actually, I'm doing the T-Y test for annual relationships among the variables (I use monthly data and would like to see how the relationship changes year over year). Thus I divide the whole sample (13 years) into 13 sub-samples, one for each year. Please kindly take a look at these short questions of mine: (i) If the series is I(0) (or I(1)) for the whole sample, does that imply the series is also I(0) (or I(1)) for any sub-sample (i.e., any year)? (ii) To find the maximum order of integration (step 2), can I use the result for the whole sample and apply it to each sub-sample? (iii) Basically, for all the steps in the T-Y procedure that you described, can I use the results for the whole sample when testing the sub-samples, or do I have to redo them for each sub-sample?

Could you kindly advise me pls?

I'm sorry for disturbing you too much but your reply is greatly appreciated.

Anonymous - you really need to do the analysis for each sub-sample (assuming you have enough observations, of course). Regarding logarithms vs. levels of the data: it's not really a matter of convenience - e.g., to get elasticities easily. It's a matter of whether the data are additively integrated or multiplicatively integrated. I think I should prepare a separate post on this!

Good question! You have to be careful. The real issues are: (i) whether the data are additively integrated or multiplicatively integrated; and (ii) how robust is the test you use to mis-specification between these two forms.

The following paper gives a good overview and provides references to the earlier literature: https://eldorado.tu-dortmund.de/bitstream/2003/5035/1/2000_30.pdf

First, thanks for this very clear and interesting blog: it's very helpful, and something pretty scarce in the econometrics field.

Regarding Granger causality tests associated with cointegration models: some authors analyse short-run as well as long-run causality between the set of endogenous variables. I wonder how they can perform both tests using EViews? I guess that long-run causality corresponds to a Granger test performed on the VAR model, and the short-run one is the same test using the VECM part on the differenced series, but I'm not sure. Thank you very much in advance if you could explain this point. Best.

I'm employing the T-Y procedure for my paper, and I use a data series which is likely to have a structural break.

You said in your post, under the practical example that:"It looks as if there may be a structural break in the form of a shift in the levels of the series in 1975. We know that this will affect our unit root and cointegration tests, and it will also have implications for the specification of our VAR model and causality tests. This can all be handled, of course,..."

Could you please illustrate in a bit more detail? I know how to handle the unit root tests (I use Zivot-Andrews) and the cointegration tests (I use Gregory and Hansen, 1996) - actually, you have already explained how to deal with this issue in another post. But what about the VAR modification and the T-Y procedure? Could you please elaborate a bit on what to do (especially with T-Y) when the series has breaks, or suggest some references?

Thank you for your generous teachings. On an off topic: may I know the ways or steps in conducting estimation for the random walk hypothesis? And I am curious about the interpretation too. Once again, thank you, and may God bless you.

Hi, Prof. Giles, thanks for your interesting blog. Your explanations are really helpful to my understanding. I have a few questions I would like to ask. 1. You mentioned that we must not difference the data in the VAR when we want to use the TYDL procedure. What about a VECM? Assuming all my variables are I(1), can I use differenced variables (such as dlnint, dlngdp) as the endogenous variables to find the lag-length criteria in the VAR model before estimating the VECM? 2. If there is no cointegration in the VECM and we want to find the short-run relationship in a VAR model, and assuming all the variables are I(1), do we set up the VAR using the first-differenced variables (dlnint, dlngdp) or the level form of the data (int, gdp)? 3. For some of my variables, the KPSS, PP and ADF tests conflict. For example, for one variable the PP and ADF tests suggest it is I(1), while the KPSS test suggests it is I(0). Is my result acceptable? Which result should I take? Thanks. Your reply is highly appreciated. Have a nice day!!!

Dear Prof. Giles, I am studying the relationship between credit default swap (CDS) spreads and credit ratings. I want to check whether ratings have an impact on CDS spreads over a long period of time. The problem is that there may be other things affecting CDS spreads besides ratings. There is also the possibility that, because of a certain pattern in CDS spreads, an issuer may be rated high or low. In such a scenario, what could be the best way to analyze this? I am thinking of the Granger test - is this appropriate? What do I need to keep in mind while doing such an analysis on time-series data, and how do I make sure that I get robust results? Thanks in advance.

There's not much more I can really add to the detailed discussion in the post. If you are working with quarterly or monthly data, be aware of the possibility of seasonal unit roots and/or cointegration. Plot your data and look carefully for any signs of structural breaks. If you have sufficiently long time-series, then you might test the "robustness" of your results by performing the causality tests using different sub-samples.

Dear Prof, I'm a beginner in econometrics. I'm interested to know what theoretical reference(s) in econometrics support the fact that the Wald test statistic does not follow its usual asymptotic chi-square distribution under the null.

Dear Prof, I am having problems with some data I'm working with. I am trying to construct a VECM. The issue is that one of the variables is stationary when converted to natural logarithms, while the others are non-stationary. Is it possible to go on constructing a VECM with these data, and what should be my next course of action?

Hi Prof. Giles, thank you for such an informative and generous blog. Although your steps are very detailed, I can't help but wonder, in regard to the Johansen test: is there any formal approach to the specification of the deterministic components, i.e., a test or steps to determine which model to use (linear unrestricted vs. linear restricted)?

Thanks for the clear explanations. I am trying to use the T-Y procedure to study the interdependencies between the Russian stock index and macroeconomic factors, using monthly time series. I discovered that there is serial correlation in the VAR model's residuals at the seasonal lag, i.e., 12. Even after I use p = 12, serial correlation remains in the residuals. My first question is how to avoid the problem of serial correlation. I also found a problem of multicollinearity: specifically, GDP and the oil price are highly correlated, and hence the inclusion of both distorts the coefficients in the VAR equations. Should I exclude one of these variables? Another surprise is that in the end I obtained results that contradict economic sense: the Wald tests show that the Russian stock index Granger-causes oil prices (actually, it is reasonable to assume that oil prices Granger-cause the Russian stock index). At the same time, they show that oil prices Granger-cause Russian GDP, which makes economic sense. My third question is how to interpret economically nonsensical results?

Respected Prof. Giles, I am using 54 observations to test for a unit root with one structural break, by the Lee-Strazicich method, in RATS. For the general-to-specific procedure, what maximum lag should I consider? How can the maximum lag be determined? Regards, Bipradas

I am trying to find the direction of causality between bilateral aid and bilateral trade for one country. It is a panel data as it is from 1987 to 2010 (annual) and each year has around 180 aid recipients. I was wondering how to run a granger causality test. I am having trouble finding the appropriate lag lengths as depending on the lag length, the result changes.

Regarding the output from EViews under Step 6 (Johansen's Trace test and Max. Eigenvalue test both indicate the presence of cointegration between the 2 series): I do not understand why the output says "Lags interval (for first differences): 1 to 5", because, as you mention in Step 5, the max lag length is p = 6.

Is it because we need to reduce the lag length?

I have run the same T-Y steps for my project, and I get p = 7 in order to remove serial correlation. When I run with "1 to 7", I get different numbers of cointegrating relationships - for example, the Trace test gives 2 cointegrating vectors and the Max. Eigenvalue test gives 1. But when I run with "1 to 6", I get the same number of cointegrating vectors from both the Trace and Max. Eigenvalue tests.

Can you explain to me which one I should run for my project? And if I need to run "1 to 6", what reasons need to be addressed?

I am investigating the causality between media attention (A) and terrorism (T) for the period 1970-2010. I have set up a VAR model, for which the optimal lag is calculated to be between 2 (SC) and 14 (AIC). I have decided to go with the SC criterion.

2nd question - this is a common problem, especially with moderate-sized samples. Make sure that you have allowed properly for any structural breaks - this can add to the problem of conflicting results.

Bottom line - ask yourself, "which is the more costly mistake to make? Concluding that the series is stationary, when really it is non-stationary? Or concluding that it is non-stationary, when really it is stationary?"

Usually, the first type of error is more costly - you'll end up with a meaningless model. In the other case, you may end up being conservative and unnecessarily "difference" data that are stationary. This over-differencing results in a series that is still stationary (although not I(0)) - that's not such a bad thing, in general.

I'm testing whether the different segments of my VAR-model are well specified.

If the lag order is high enough serial correlation can be eliminated. However, JB-test shows that the residuals are not normally distributed. In addition, the Harrison-McCabe test shows that heteroscedasticity is present. Is this a serious issue, or is mentioning it enough?

The normality of the errors isn't needed when testing for Granger non-causality. The heteroskedasticity is worth mentioning, but is not really a serious problem. The usual non-standard asymptotic results associated with integrated and cointegrated data hold when the generating process is mildly heteroskedastic.

Yes, bootstrapping is always a good option - it's likely to be a bit tedious in the context you are talking about, though.

Hello Professor, I'm an Economics student from the Philippines and I'm having a hard time determining what to do, especially using EViews. :( Our study is about the tourism-led growth hypothesis, and whether it is applicable to the Philippines. How many observations should there be for Granger causality?

Dear Professor Giles, first of all, thanks a lot for your informative blog. I need a little clarification here: if both series (say X and Y) are I(0), will performing Granger causality testing in the usual way and following the T-Y procedure give different results? If yes, then please clarify how to go about the usual Granger causality test in this case, as I have learned how to perform T-Y from your blog.

Nain: Thanks for your comment. If each series is I(0), then you should just estimate a VAR in the levels (or log-levels), not the differences, of the data. You would choose the maximum lag length in the usual way (AIC, SIC, etc.). Then test for Granger non-causality in the usual way.

Because the data are stationary, the Wald test statistic will have its usual asymptotic chi-square distribution.

You shouldn't add any "extra" lags, as would be the case with the T-Y method.

Just found your blog and have read much of it already - fantastic work! I am quite a novice at estimating VARs, however, and while working through my data with the help of your notes, I have a brief query. The initial lag-length selection choice (say [m], not [m+p]), in my data as well as in your example, seems extremely subjective. For example, different information criteria give different recommendations, and whichever of these choices is made, the residuals are still serially correlated, heteroskedastic and non-normal.

To rectify this problem, the lag length of my bivariate and trivariate VARs has to be increased up to ~20 periods to get 'well behaved' residuals. If the choice of the initial lag length [m] (as in your example, where you jump from 2 to 6) is, as mentioned, so subjective, doesn't the addition of an augmented lag as per T-Y [p] seem almost trivial, given that [m] is so arbitrary?

Thanks again for all your great work on this blog by the way! Apologies if my naivety in your profession offends!

Why do you not consider the normality of the error terms in the unaugmented VAR? Isn't this condition required to ensure that the distribution of the chi-square statistic in the augmented VAR reaches its asymptotic form?

Dear Professor Giles, thanks for sharing this information and for the perfect instructions! Thank God for a generous Professor like you. I've conducted the ARDL bounds testing for my current study. Now I'm thinking of conducting the T-Y causality test too. Is it appropriate to compare the results from ARDL and T-Y in an article? I'm using multivariate time series with a small sample size.

Zai - thanks for the kind comment. Keep in mind that the ARDL test is a test for cointegration, while the TY test is a test for Granger non-causality. You can do both with the same data-set, but you are testing for different things. You'll also have to be very careful if you have a small sample size, as the results associated with both tests are valid only asymptotically.

Your instructions are very, very useful. I have two time series with 30 observations (quarterly data, I(1)), and I want to explore causality. Is it correct to use the Toda-Yamamoto procedure in that case? If not, what should the minimum sample size be? Can you propose another method for testing causality?

First, if you have a small sample, for the ADF test you should be using the usual MacKinnon response-surface critical values. It may appear that his critical values are only for cointegration testing, but for his N=2 case they are for the ADF test for a unit root. If you're using EViews, the exact finite-sample critical values are automatically used in computing the p-values.

In the case of the KPSS test, several authors have published finite-sample critical values, including Hornok and Larsson (Econometrics Journal, 2000). In EViews, only the asymptotic critical values are used, which is a pity.

You'll probably also find the following paper helpful: http://www.nek.lu.se/publications/workpap/Papers/WP06_23.pdf

Dear Sir, Q1. If my time series are cointegrated, but there is no Granger causality in either direction, does this signify that the series have some long-run relationship but no short-run one, due to the small sample size?

Q2. If my time series are integrated of different orders, can I use the Granger causality test, or do I need an unrestricted VECM to find the long-run and short-run causality between the variables/series?

First of all, thanks a lot for your valuable blog, which is of great interest and help. I have a question concerning the T-Y test:

I have a VAR consisting of 5 series having (very) similar trends. They are all not trend-stationary, but I(1). A test for cointegration arrives at the result that there is 1 cointegrating vector (with a restricted linear trend in this vector). This sounds plausible. All variables, however, appear to be weakly exogenous. That means - as far as I know - that the long-run relationship does not provide any information in the EC model. How is this result to be interpreted?

Next, I did the T-Y test to look for Granger causality between the series. I found some significant relations, which is, I think, consistent with what you wrote in point 13 of your original contribution. But my problem is that I want to show that there is indeed a long-run relationship (a common trend) between the 5 series, but no "contagion" in the narrow sense. Is it possible to include a time trend in the VAR to "account" for the common trend in the series? In this case, all significant Granger causalities disappear when using the T-Y procedure. May I conclude from that result that there is no short-run influence between the series?

Dear Professor, I am doing a dissertation entitled "The relationship between economic growth and the current account balance, 1990-2010 (the Zimbabwean case)", using annual data. I have a problem whereby, in EViews 3.1, the unit root tests show that both variables are stationary, i.e., I(0), but I have been told to do a cointegration test using the Johansen test. I am not able to understand the results, as most previous studies by other scholars have used the Johansen test when they have at least one variable that is I(1) or above. So what should I do?

I have got a question regarding the Johansen test for cointegration in Eviews. If the Johansen test is performed using five variables, according to the output obtained, the following number of cointegration relationships is possible: none, at most 1, at most 2, at most 3, at most 4. If there is an asterisk (*) behind any of those options (none*, at most 1*, at most 2*, at most 3*, at most 4*) there is the following text below this listing: “Trace/Max-eigenvalue test indicates 5 cointegrating eqn(s) at the 0.05 level.”

Now my question is: In case of five variables, can there be ‘at most’ 4 or 5 cointegration equations? The reason behind this question is the following: If I try to estimate a VEC model with 5 research variables in Eviews and enter ‘cointegrating rank 5’ in the cointegration section, I get the following error message: “Invalid specification of number of cointegrating equations.”

Maybe you can help me with that issue. Thank you very much in advance!

Jan - regardless of the context, the maximum number of cointegrating relationships is always one less than the number of variables under consideration. If there are just two I(1) variables, you either have cointegration or you don't. In this case the maximum number of cointegrating relationships is just one. Also, in this case of 2 variables, if a cointegrating relationship exists, it is unique. (This is NOT the case when there are 3 or more I(1) variables.)

Dear Professor Giles, thank you very much for your quick reply! Unfortunately, I am still a little bit confused when it comes to the EViews Johansen output. For a case with five I(1) variables, the maximum number of cointegrating relationships would then be four. Is the interpretation of the Johansen output below correct? If so, how should I interpret the last case? If not, where's my mistake?

None / At most 1 / At most 2 / At most 3 / At most 4 => no cointegrating relationship (cointegrating rank 0)

None* / At most 1 / At most 2 / At most 3 / At most 4 => 1 cointegrating relationship (cointegrating rank 1)

None* / At most 1* / At most 2 / At most 3 / At most 4 => 2 cointegrating relationships (cointegrating rank 2)

None* / At most 1* / At most 2* / At most 3 / At most 4 => 3 cointegrating relationships (cointegrating rank 3)

None* / At most 1* / At most 2* / At most 3* / At most 4 => 4 cointegrating relationships (cointegrating rank 4)

This is not an original output from any econometric software. I've tried to set up an example, but that might have been confusing. The output below is from EViews and has been produced for five variables which are all I(1):

My question with regard to the output above is as follows: if there can only be a maximum of 4 cointegrating relations, why does the output include the following remark: “Trace test indicates 5 cointegrating eqn(s) at the 0.05 level”? In other words: why does the test indicate 5 cointegrating equations if there can only be a maximum of 4?

If you check p. 367 of Vol. 2 of the User's Guide for EViews 6, you'll find the following. The case of k cointegrating relations (k = 5 in your case) is taken to correspond to the situation where all of the series are in fact stationary - none of them has a unit root. So, you have a conflict between the results of your unit root testing and those of your cointegration testing. There could be several reasons for this - e.g., structural breaks in the data, a very small sample size, etc.

Dear Professor Giles, thank you for your answers earlier. I have another question. In the case of the Toda-Yamamoto procedure, when I have time series that are I(0) and I(1), how can I know whether earlier values of the series Y have a positive or negative impact on the current value of Y?

I just had a referee report concerning a paper submission. I followed your methodology for Granger causality. I am focused on the relationship between 2 variables for the US and EU and have a number of other variables as controls.

I presented the results of the procedure you described concerning Granger-causality on pairwise tests. However the referee states that "it is known that it is not optimal to test for causality in a bivariate model, particularly if there is an auxiliary variable that influences the two variables in the bivariate system".

I am a bit surprised by this comment, as I am not trying to test for causality per se, but actually checking whether Y has information with respect to X.

Also, it is a purely forecasting paper, with no structural model behind it. Following the referee's reasoning, there is no reason why I should exclude any variables a priori from the testing. Should I end up having to estimate a VAR with pairwise testing and all other variables as exogenous variables? My data set consists of 11 variables for the EU and 11 variables for the US. Would anything change in the testing procedure above?

Thanks a lot for any feedback you may provide. And congratulations on your service to the community with this blog. I wish more people would follow your example :)

Sorry to hear about your rejection. :-( First of all, if there are additional variables that might cause X or Y, or if X might cause Y indirectly, through some other variable Z, then this really should be taken into account.

One thing I'm not clear about from your description: are you interested in testing for causality between one variable in the US and one variable in the EU, with lots of control variables in the picture? Or are you interested in testing, pairwise (11 times), between a US variable and its counterpart in the EU?

Depending on which you're interested in, this will affect how you should set up the model and proceed from there.

For example, if it's the first of these two cases, then you'd presumably estimate a 2-equation VAR for the US and EU variables of interest, with all of the other covariates added into each of the equations. These additional exogenous variables might enter with or without lags. Then you'd undertake the usual T-Y testing procedure.

Perhaps you could elaborate a little on my questions above, or email me directly at dgiles@uvic.ca

Dear Professor, I'm unsure if you've already answered this (it's hard to go through so many comments), but I was wondering whether you have to check that the VAR residuals fall inside the confidence interval when you graph them, to assess whether the model is properly specified.

Would you mind telling me more about how to include a multiplicative dummy variable in the Granger causality test? I'm not quite sure how to place it on the dependent or independent side, since I'm testing a VAR.

I think a good next post on this topic would be "instantaneous causality" within the T-Y framework. I see that it hasn't been covered yet on your blog and receives only a mystifying treatment in "New Time Series Analysis".

I want to examine the long-run relationships between 14 stock market indices through JJ cointegration. However, I found that all the series are stationary, I(0). I have run a VAR system. So how should I proceed?

Thanks for the comment. It really doesn't matter from the viewpoint of the Granger causality testing. The important thing is that you test with the variable in the form that is consistent with the economic hypothesis that you are interested in.