The Dickey-Fuller test is used to check for a unit root. It can be used as part of the general Engle-Granger two-step method (although it isn't the only option).

In this case, while the two assets themselves are not stationary, you can test whether the residuals from a regression of one asset on the other are stationary. Most people prefer another approach, the Johansen test, which is based on a vector error correction model (VECM).
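As a rough illustration of the Engle-Granger two-step idea, here is a minimal sketch in plain Python. The helper names (`ols`, `df_tstat`, `engle_granger`), the synthetic data, and the simplified Dickey-Fuller regression (no constant, no lagged differences) are all assumptions made for the example. The critical value is only an approximate asymptotic 5% figure for two series with a constant, so in practice you would rely on a statistics package instead.

```python
import math
import random


def ols(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def df_tstat(e):
    """Dickey-Fuller t-statistic: regress e_t - e_{t-1} on e_{t-1} (no constant, no lags)."""
    lagged = e[:-1]
    diffs = [cur - prev for prev, cur in zip(e[:-1], e[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    resid = [d - rho * l for l, d in zip(lagged, diffs)]
    sigma2 = sum(r * r for r in resid) / (len(diffs) - 1)
    return rho / math.sqrt(sigma2 / sxx)


def engle_granger(x, y):
    """Step 1: regress y on x. Step 2: Dickey-Fuller statistic on the residuals."""
    intercept, slope = ols(x, y)
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return slope, df_tstat(resid)


# Synthetic data (an assumption for illustration): both series load on one random walk.
random.seed(0)
trend, x, y = 0.0, [], []
for _ in range(1000):
    trend += random.gauss(0, 1)
    x.append(trend + random.gauss(0, 1))
    y.append(2.0 * trend + random.gauss(0, 1))

slope, t_stat = engle_granger(x, y)
APPROX_CRIT_5PCT = -3.34  # approximate Engle-Granger 5% value (two series, constant)
print(f"estimated slope = {slope:.2f}, DF t-stat on residuals = {t_stat:.1f}")
print("reject 'no cointegration'" if t_stat < APPROX_CRIT_5PCT else "cannot reject")
```

Note that the residual-based statistic is compared against a cointegration critical value, not the ordinary Dickey-Fuller table, because the residuals come from an estimated regression.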

The intuition behind pairs trading is that two cointegrated instruments will follow the same long-run path (since they presumably share some common factor, for example both being oil companies that are heavily influenced by the price of oil), and any deviations will ultimately revert to the mean. Needless to say, pairs trading (or any other form of statistical arbitrage) is still a risky endeavor, as should be clear from the performance of arbitrage funds.

Okay... Maybe a little bit more technical, please! Does it mean that in cointegration one is concerned about the difference between two price time series?
– user40, Feb 7 '11 at 16:23


I find it unfair to downvote this answer. The question asked for a non-technical explanation! This is one an investor will understand! The same association came to Shane's mind...
– vonjd, Feb 7 '11 at 16:27


I agree completely and gave you a +1. Your answer was spot on for the question.
– Shane, Feb 7 '11 at 16:29

Two time series $X_1$ and $X_2$ are cointegrated if, for some $a$ and $b$, the linear combination $aX_1+bX_2$ is stationary, i.e. it has constant mean, standard deviation and autocorrelation function. In other words, the two series never stray very far from one another.
Cointegration might provide a more robust measure of the linkage between two financial quantities than correlation, which is very unstable in practice.

I have borrowed the following two examples from Wilmott's Frequently Asked Questions in Quantitative Finance. The first may be typical for a hedge fund trader, and the second illustrates the job of a mutual fund manager.

A. Suppose you have two stocks $S_1$ and $S_2$ and you find that $S_1 - 3 S_2$ is stationary, so that this combination never strays too far from its mean. If one day this ‘spread’ is particularly large then you would have sound statistical reasons for thinking the spread might shortly reduce, giving you a possible source of statistical arbitrage profit. This can be the basis for pairs trading.
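Example A can be turned into a toy signal along the following lines. This is only a sketch: the prices are made up, the 2-standard-deviation entry threshold and the function names are hypothetical choices, and the mean and standard deviation are computed over the full sample (which a live strategy could not do).

```python
import statistics


def spread_zscores(s1, s2, ratio=3.0):
    """Z-scores of the spread S1 - ratio * S2, using full-sample mean and stdev."""
    spread = [a - ratio * b for a, b in zip(s1, s2)]
    mean = statistics.fmean(spread)
    stdev = statistics.stdev(spread)
    return [(v - mean) / stdev for v in spread]


def signal(z, entry=2.0):
    """Bet on the spread reverting once it is more than `entry` stdevs from its mean."""
    if z > entry:
        return "short the spread"   # sell S1, buy 3 units of S2
    if z < -entry:
        return "long the spread"    # buy S1, sell 3 units of S2
    return "flat"


# Made-up prices: the spread hovers around zero, then jumps on the last day.
s2 = [10, 11, 12, 11, 10, 12, 11, 10]
s1 = [30, 34, 35, 33, 31, 35, 33, 38]

signals = [signal(z) for z in spread_zscores(s1, s2)]
print(signals)
```

The ratio of 3 is taken from the example; with real data it would have to be estimated, and it may not stay constant over time.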

B. Suppose we find that the S&P500 index is cointegrated with a portfolio of 15 stocks. We can then use these fifteen stocks to track the index. The error in this tracking portfolio will have constant mean and standard deviation, so should not wander too far from its average. This is clearly easier than using all 500 stocks for the tracking (when, of course, the tracking error would be zero).

In this case you have two assets that are essentially the same but differ in a few details. The buying and selling of these assets will make the prices fluctuate relative to each other. However, they are unlikely to stray too far from each other because there will be arbitrageurs who bring the prices back together. Arbitrage is the leash in the human-canine analogy.

But there is a difference between cointegration and high correlation. I'm guessing that a lot of pairs trading based on "cointegration" is actually based on high correlation. The difference is risk: if two assets are truly cointegrated, then they will eventually snap back towards each other; two assets that have a history of high correlation need not snap back together.
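A small simulation can make the distinction concrete. The sketch below uses an assumed data-generating process, not real prices: pair A has highly correlated price changes but idiosyncratic shocks that accumulate, while pair B shares one stochastic trend plus noise that does not accumulate. All names and parameters are illustrative.

```python
import math
import random


def corr(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def changes(path):
    """First differences of a price path."""
    return [cur - prev for prev, cur in zip(path[:-1], path[1:])]


random.seed(42)
common = 0.0
a1 = a2 = 0.0
a1_path, a2_path, b1_path, b2_path = [], [], [], []
for _ in range(2000):
    shock = random.gauss(0, 1)
    common += shock
    # Pair A: correlated changes only; idiosyncratic shocks are permanent.
    a1 += shock + random.gauss(0, 0.3)
    a2 += shock + random.gauss(0, 0.3)
    a1_path.append(a1)
    a2_path.append(a2)
    # Pair B: cointegrated; idiosyncratic noise is transitory.
    b1_path.append(common + random.gauss(0, 0.3))
    b2_path.append(common + random.gauss(0, 0.3))

corr_a = corr(changes(a1_path), changes(a2_path))
corr_b = corr(changes(b1_path), changes(b2_path))
max_gap_a = max(abs(u - v) for u, v in zip(a1_path, a2_path))
max_gap_b = max(abs(u - v) for u, v in zip(b1_path, b2_path))

print(f"pair A: change correlation {corr_a:.2f}, largest price gap {max_gap_a:.1f}")
print(f"pair B: change correlation {corr_b:.2f}, largest price gap {max_gap_b:.1f}")
```

Both pairs show high correlation in their changes, yet only in pair B is the gap between the two prices bounded in any practical sense. In pair A the gap is itself a random walk, so nothing pulls it back.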

Then both $X_t$ and $Y_t$ are non-stationary because they are linear functions of the non-stationary variable $u_t$ (the stochastic trend).

However,
$$\beta X_t - \alpha Y_t = \beta \nu_t - \alpha \eta_t$$
is a linear combination of the stationary disturbances and is therefore stationary. When this happens, $X_t$ and $Y_t$ are said to be cointegrated, and they are said to contain the same stochastic trend.
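For readers who want the intermediate step, here is a short worked derivation. The defining equations are an assumption: they are the usual common-trend setup consistent with the symbols used here, with $u_t$ a random walk, $\nu_t$ and $\eta_t$ stationary disturbances, and $\varepsilon_t$ the shock driving the trend.

$$X_t = \alpha u_t + \nu_t, \qquad Y_t = \beta u_t + \eta_t, \qquad u_t = u_{t-1} + \varepsilon_t$$

$$\beta X_t - \alpha Y_t = \alpha\beta u_t + \beta \nu_t - \alpha\beta u_t - \alpha \eta_t = \beta \nu_t - \alpha \eta_t$$

The stochastic trend $u_t$ enters both series and cancels exactly in this combination, which is why only the ratio of $\alpha$ and $\beta$ matters.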

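The idea behind using the Dickey-Fuller test here is to run a regression that estimates the ratio of $\alpha$ and $\beta$, and then to test whether the estimated residuals are stationary. Because the residuals are estimated, the test statistic does not follow a standard distribution, so the usual Dickey-Fuller critical values do not apply directly.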

If both series are stationary, then model $Y_t$ and $X_t$ in levels (nothing is wrong with doing so).

If one of the two is $I(1)$ (integrated of order one, i.e. stationary after differencing once), then take first differences to ensure stationarity.

If they are both non-stationary, and hence $I(1)$, then test for co-integration:

if the residuals of the regression $Y_t = \beta_0 + \beta_1 X_t + \eta_t$ are $I(0)$, then cointegration is present. In that case, estimate an error correction model (ECM): obtain $\hat{\beta}_0$ and $\hat{\beta}_1$ from that regression and use them in $\Delta Y_t = \Delta X_t'\phi - \psi(Y_{t-1}-\hat{\beta}_0 - \hat{\beta}_1 X_{t-1}) + \varepsilon_t$. When $\varepsilon_t \sim N(0,1)$, inference on both $\psi$ and $\phi$ is asymptotically valid.

if the residuals are $I(1)$, then the regression is spurious. In that case you should model both variables in first differences.
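The cointegrated branch of this procedure can be sketched in plain Python as follows. The data are simulated under an assumed process with true values $\beta_0 = 1$, $\beta_1 = 2$, $\phi = 2$ and $\psi = 0.5$, and the helper names are hypothetical. The sketch skips the stationarity tests themselves and only shows the two estimation steps.

```python
import random


def ols_simple(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ols_two(x1, x2, y):
    """OLS of y on x1 and x2 without an intercept, via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def diff(series):
    """First differences of a series."""
    return [cur - prev for prev, cur in zip(series[:-1], series[1:])]


# Simulated data (assumption): X is a random walk, Y = 1 + 2 X + eta,
# where eta is a stationary AR(1) process with coefficient 0.5.
random.seed(1)
level, eta, x, y = 0.0, 0.0, [], []
for _ in range(2000):
    level += random.gauss(0, 1)
    eta = 0.5 * eta + random.gauss(0, 1)
    x.append(level)
    y.append(1.0 + 2.0 * level + eta)

# Step 1: long-run regression in levels.
b0, b1 = ols_simple(x, y)
ec = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # error-correction term

# Step 2: regress dY_t on dX_t and the lagged error-correction term.
phi, minus_psi = ols_two(diff(x), ec[:-1], diff(y))
psi = -minus_psi

print(f"long run: b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"ECM: phi = {phi:.2f}, psi = {psi:.2f}")
```

A positive $\psi$ means that part of last period's deviation from the long-run relation is corrected in the current period, which is the "snap back" behavior that cointegration implies.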

That is true, but I think the strategy is useful in answering (in part) the two questions, right?
– JohnAndrews, Apr 13 '12 at 2:36

No, don't do that. The spam filters went off and alerted me (a moderator) that you had done this. I also notice this post is identical to this one.
– chrisaycock, Apr 13 '12 at 2:49

OK, I will try to customize my answers more in the future...
– JohnAndrews, Apr 13 '12 at 2:51