
PITFALLS IN TIME SERIES ANALYSIS
Cliff Hurvich
Stern School, NYU

The t-Test

If $x_1, \ldots, x_n$ are independent and identically distributed with mean 0, and n is not too small, then
$$t = \frac{\bar{x} - 0}{s/\sqrt{n}}$$
has approximately a standard normal distribution, where $\bar{x}$ is the sample mean and s is the sample standard deviation. But if the true mean $\mu$ is (say) positive, then t will typically be large, in the right tail of the standard normal distribution. If, for example, the t-statistic is 3, we would have strong evidence that the true population mean is not zero. Indeed, the probability that a standard normal exceeds 3 is just 0.00135. So by looking at t-statistics, we can draw conclusions from the data while controlling the error rates (false positives and false negatives).

Consider a data set of monthly global temperatures (n = 1632). Is the plot sloping up (global warming), or is it just an illusion?
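As a minimal sketch of the t-test described above (Python, standard library only), the following computes the one-sample t-statistic and the standard normal tail probability $P(Z > 3)$. The function names are hypothetical and the samples are simulated purely for illustration.

```python
import math
import random
import statistics


def t_statistic(xs, mu0=0.0):
    """One-sample t-statistic (x_bar - mu0) / (s / sqrt(n))."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    return (x_bar - mu0) / (s / math.sqrt(n))


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(round(normal_upper_tail(3.0), 5))  # 0.00135

# Hypothetical iid samples: one with true mean 0, one with true mean 0.3.
rng = random.Random(1)
null_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted_sample = [rng.gauss(0.3, 1.0) for _ in range(500)]
print(t_statistic(null_sample), t_statistic(shifted_sample))
```

With iid data and a nonzero true mean, the t-statistic grows roughly like $\sqrt{n}$ times the standardized mean, which is why it lands far in the right tail.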

A simple approach to this: look at the monthly changes in temperature and test whether these changes have a zero population mean. The sample mean of the changes, $\bar{x}$ (in degrees C per month), has a small t-statistic: no evidence of global warming.

Another way to approach the problem: run a simple linear regression of the temperatures on a time variable. The estimated slope $\hat{\beta}$ (in degrees C per month) has a t-statistic of t = 22.2. Now we get strong evidence of global warming!

There's something strange here, since two apparently reasonable methods give completely different results. What's the problem?
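The disagreement can be reproduced on simulated data. The sketch below assumes a hypothetical series made of a small linear trend plus AR(1) noise (the trend of 0.0005 per month and $\rho = 0.8$ are arbitrary choices, not estimates from the temperature record) and computes both t-statistics the naive way, treating the observations as independent. It also checks one useful identity: the mean of the monthly changes telescopes to $(x_n - x_1)/(n-1)$, so it depends only on the first and last observations.

```python
import math
import random
import statistics


def one_sample_t(xs):
    """t-statistic for H0: mean = 0, assuming iid observations."""
    return statistics.fmean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def ols_slope_t(x, y):
    """OLS slope of y on x (with intercept) and its naive iid-error t-statistic."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, b / math.sqrt(rss / (n - 2) / sxx)


# Hypothetical monthly series (NOT the real temperature data):
# a small linear trend plus AR(1) noise with assumed parameters.
rng = random.Random(7)
n, trend, rho = 1632, 0.0005, 0.8
noise, temps = 0.0, []
for t in range(n):
    noise = rho * noise + rng.gauss(0.0, 0.1)
    temps.append(trend * t + noise)

# Approach 1: test whether the monthly changes have zero mean.
changes = [temps[t] - temps[t - 1] for t in range(1, n)]
# The mean change telescopes, so it depends only on the first and last values.
assert math.isclose(statistics.fmean(changes), (temps[-1] - temps[0]) / (n - 1),
                    rel_tol=1e-9, abs_tol=1e-12)
t_changes = one_sample_t(changes)

# Approach 2: regress temperature on time and test the slope (naively).
slope, t_slope = ols_slope_t(list(range(n)), temps)
print(f"t (mean change) = {t_changes:.2f}, naive t (slope on time) = {t_slope:.1f}")
```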

Regression is also used for prediction. Let's try predicting this month's stock return ($y_t$) based on three logged financial ratios from the previous month (time $t-1$), using NYSE data ending in December 1994 (n = 385). The t-statistics for the least-squares coefficients of log dividend yield, log book-to-market ratio and log earnings-to-price ratio are 3.02, 2.40 and 2.43, respectively. So we have strong evidence of predictability of stock returns based on past financial ratios.

Now, let's see if the current stock price can be predicted from the past stock price. Consider the Russell 2000 stock index. The slope in the linear regression of today's price on yesterday's price is $\hat{\beta} = 0.994$, with a t-statistic of t = 260. So price is highly predictable from past prices.

Of course, to make money, we have to predict returns. The scatterplot indicates that returns are not very predictable. A linear regression of today's return on yesterday's return yields a t-statistic for the slope of just t = 0.07: no evidence of predictability of stock returns based on past returns.
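A random walk, in which the changes are unpredictable by construction, produces the same pattern as the two regressions above. This sketch assumes Gaussian increments and treats the price changes as "returns"; the seed, starting level, and sample size are arbitrary.

```python
import math
import random
import statistics


def ols_slope_t(x, y):
    """OLS slope of y on x (with intercept) and its naive t-statistic for slope = 0."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, b / math.sqrt(rss / (n - 2) / sxx)


# Hypothetical prices: a Gaussian random walk starting at 1000 (not Russell 2000 data).
# "Returns" here are simply the price changes, which are iid by construction.
rng = random.Random(11)
returns = [rng.gauss(0.0, 1.0) for _ in range(2000)]
prices = [1000.0]
for r in returns:
    prices.append(prices[-1] + r)

b_price, t_price = ols_slope_t(prices[:-1], prices[1:])  # today's price on yesterday's
b_ret, t_ret = ols_slope_t(returns[:-1], returns[1:])    # today's return on yesterday's
print(f"prices:  slope = {b_price:.3f}, t = {t_price:.0f}")
print(f"returns: slope = {b_ret:.3f}, t = {t_ret:.2f}")
```

The huge price t-statistic only tests whether the slope is zero; it says nothing about whether returns can be forecast.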

Another useful statistical tool is correlation. Consider daily US and UK bond yields (n = 960). The Pearson correlation between the yields is 0.317, which is highly statistically significant, with a very small p-value. We could also try regressing the UK yield on the US yield. The slope is $\hat{\beta} = 0.3709$, again with a large t-statistic. This slope is close to the correlation in this case. The two yields seem to be significantly linked.
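For a simple regression, the slope's t-statistic and the t-statistic for the correlation are algebraically identical, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, and the slope equals $r$ times the ratio of standard deviations $s_y/s_x$. The sketch below checks both facts on hypothetical iid data (the coefficient 0.35 and noise scale are arbitrary). What fails for random-walk-like series such as bond yields is not this algebra but the normal reference distribution used to judge $t$.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope_t(x, y):
    """OLS slope of y on x (with intercept) and its t-statistic for slope = 0."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y)) / sxx
    a0 = y_bar - b * x_bar
    rss = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    return b, b / math.sqrt(rss / (n - 2) / sxx)


def t_from_correlation(r, n):
    """t-statistic for zero correlation; equals the regression-slope t exactly."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical iid stand-ins for two yield series (real yields are far from iid).
rng = random.Random(3)
n = 960
us = [rng.gauss(0.0, 1.0) for _ in range(n)]
uk = [0.35 * u + rng.gauss(0.0, 1.5) for u in us]

r = pearson_r(us, uk)
slope, t_slope = ols_slope_t(us, uk)
# The slope is r rescaled by s_y / s_x, so slope and r coincide only for equal spreads.
assert math.isclose(slope, r * statistics.stdev(uk) / statistics.stdev(us))
print(r, slope, t_slope, t_from_correlation(r, n))
```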

The problem: none of our conclusions above can be trusted, because the t-statistic does not behave in the usual way in these situations. In time series, we cannot assume that the observations are independent! Dependence often changes the distribution of the t-statistic and invalidates the usual inferences.

Plan for the rest of the talk:
- Discuss correlation.
- Describe the autoregressive model for time series.
- Explain why the above analyses were flawed.
- Discuss cointegration as a way to measure the co-movement of two or more series.

The Autoregressive Model

Let $\{x_t\}$ be a time series, i.e., a sequence of random variables. A very useful model for $\{x_t\}$ is the first-order autoregressive (AR(1)) model:
$$x_t = \rho x_{t-1} + \varepsilon_t, \qquad -1 < \rho < 1,$$
where the $\{\varepsilon_t\}$ are independent normal with constant mean (say, zero) and constant variance.

Autocorrelation describes the correlations between the series and its time-lagged values. We could plot $x_t$ versus $x_{t-1}$ and estimate the slope; the estimated and true slopes represent the sample and population autocorrelations at lag 1. We could do the same thing for any lag. So we get a sample and a population autocorrelation sequence, $\{\hat{\rho}_r\}$ and $\{\rho_r\}$, for $r = 0, 1, 2, \ldots$ For the AR(1) model, $\rho_r = \rho^r$.
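The sketch below simulates an AR(1) series and compares the sample autocorrelations $\hat{\rho}_r$ with $\rho^r$. It uses the usual sample autocorrelation estimator (lag-$r$ cross products divided by the lag-0 sum of squares) rather than the lag-regression slope described above; the two are close in large samples. The value $\rho = 0.7$, the sample size, and the seed are illustrative assumptions.

```python
import random
import statistics


def simulate_ar1(n, rho, sigma=1.0, seed=0):
    """Simulate x_t = rho * x_{t-1} + eps_t with iid N(0, sigma^2) errors, x_0 = 0."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def sample_acf(xs, max_lag):
    """Sample autocorrelations rho_hat_r for r = 0..max_lag (divisor n convention)."""
    n = len(xs)
    m = statistics.fmean(xs)
    c0 = sum((v - m) ** 2 for v in xs) / n
    return [sum((xs[t] - m) * (xs[t - r] - m) for t in range(r, n)) / n / c0
            for r in range(max_lag + 1)]


rho = 0.7
xs = simulate_ar1(5000, rho, seed=42)
acf = sample_acf(xs, 5)
for r, value in enumerate(acf):
    print(r, round(value, 3), round(rho ** r, 3))  # sample vs. population
```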

The AR(1) process is mean reverting: the next value is expected to be closer to the mean (zero) than the current value, since the conditional mean of $x_{t+1}$ given $x_t$ is $\rho x_t$ and $|\rho| < 1$. The autocorrelation leads to predictability: as long as $\rho \neq 0$, the process is predictable, and the best (minimum mean squared error) predictor of $x_{t+1}$ is $\rho x_t$.

However, there is a downside to correlation: it typically invalidates the standard methods of statistical inference. In the global temperatures example, the temperatures show autocorrelation (potentially with a trend added). When you adequately account for the autocorrelation, the t-statistic for global warming based on a regression on time becomes much smaller than the value t = 22.2 we got earlier assuming no autocorrelation, but it still provides moderately strong evidence of global warming. The autocorrelation also affects the variance of the sample mean, thereby invalidating the corresponding t-statistic.
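To quantify the effect on the sample mean: for a stationary AR(1), $n\operatorname{Var}(\bar{x}) \to \operatorname{Var}(x_t)\,(1+\rho)/(1-\rho)$ as $n$ grows, so the iid formula understates the variance by roughly that factor (4 when $\rho = 0.6$). The Monte Carlo sketch below, with assumed values of $\rho$, n, and the number of replications, compares the simulated variance of $\bar{x}$ with the iid value.

```python
import math
import random
import statistics


def ar1_mean_variance_factor(rho):
    """Large-n inflation of Var(x_bar) relative to the iid value: (1 + rho) / (1 - rho)."""
    return (1 + rho) / (1 - rho)


def simulate_ar1(n, rho, rng):
    """Stationary AR(1) path with N(0, 1) innovations."""
    x = rng.gauss(0.0, 1.0) / math.sqrt(1 - rho ** 2)  # start in the stationary distribution
    out = []
    for _ in range(n):
        out.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return out


rho, n, reps = 0.6, 200, 2000
rng = random.Random(5)
means = [statistics.fmean(simulate_ar1(n, rho, rng)) for _ in range(reps)]
var_x = 1 / (1 - rho ** 2)      # stationary variance of x_t
naive = var_x / n               # the quantity the iid formula s^2 / n targets
mc = statistics.variance(means) # simulated variance of the sample mean
print(mc / naive, ar1_mean_variance_factor(rho))
```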

In the example on predicting stock returns from financial ratios, it turns out that the financial ratios show strong autocorrelation. If we use an AR(1) model for the ratios, together with a regression model for the stock returns, there will be a correlation between the errors in the two models. As a result, the least-squares coefficients are biased (on average, they estimate the wrong thing), and the t-statistics are not valid. When we correctly account for these problems, the t-statistics on the financial ratios become 1.96, 1.31 and 1.25, compared with the original (incorrect) values of 3.02, 2.40 and 2.43. So the evidence for predictability of stock returns based on financial ratios is actually quite marginal, and far weaker than it seemed before.
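The bias mechanism can be illustrated with a small Monte Carlo sketch under assumed parameters (n = 100, $\rho = 0.95$ for the predictor, correlation $-0.9$ between the two error terms, and a true slope of zero); these are illustrative choices, not estimates from the NYSE data. Because the return error is correlated with the predictor's innovation, the least-squares slope inherits the small-sample bias of $\hat{\rho}$, scaled by the error covariance: $E[\hat{\beta} - \beta] \approx (\sigma_{uv}/\sigma_v^2)\,E[\hat{\rho} - \rho]$. The code compares the average slope with this approximation, using Kendall's first-order formula $E[\hat{\rho} - \rho] \approx -(1+3\rho)/n$.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """OLS slope of y on x, with intercept."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx


def one_replication(n, rho, phi, rng):
    """y_t = 0 * x_{t-1} + u_t and x_t = rho * x_{t-1} + v_t, with corr(u_t, v_t) = phi."""
    x_prev = rng.gauss(0.0, 1.0) / math.sqrt(1 - rho ** 2)  # stationary start
    lagged_x, y = [], []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)
        u = phi * v + math.sqrt(1 - phi ** 2) * rng.gauss(0.0, 1.0)
        lagged_x.append(x_prev)
        y.append(u)                  # the true slope on x_{t-1} is zero
        x_prev = rho * x_prev + v
    return ols_slope(lagged_x, y)


n, rho, phi, reps = 100, 0.95, -0.9, 2000
rng = random.Random(2024)
slopes = [one_replication(n, rho, phi, rng) for _ in range(reps)]
mean_slope = statistics.fmean(slopes)
# First-order approximation with sigma_u = sigma_v = 1, so sigma_uv / sigma_v^2 = phi.
approx_bias = -phi * (1 + 3 * rho) / n
print(f"average slope = {mean_slope:.4f}, approximate bias = {approx_bias:.4f}")
```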

The Random Walk

In the AR(1) model, as $\rho$ approaches 1, the mean reversion becomes weaker: we get longer excursions from zero. For an AR(1) model,
$$\operatorname{Var}(x_t) = \frac{\operatorname{Var}(\varepsilon_t)}{1 - \rho^2}.$$
As $\rho$ approaches 1, $\operatorname{Var}(x_t)$ goes to $\infty$. When $\rho$ becomes exactly equal to 1, we get the random walk,
$$x_t = x_{t-1} + \varepsilon_t.$$
The random walk is not stationary and has an infinite variance (for a walk started at zero, $\operatorname{Var}(x_t) = t \operatorname{Var}(\varepsilon_t)$ grows without bound). In a random walk, the expected waiting time to get back to the current value is infinite (extremely long excursions!). In a random walk starting from zero, the path is much more likely to spend almost all of its time above zero than to spend about 50% of its time above zero.
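Two facts from this paragraph are easy to check numerically: the stationary AR(1) variance $\operatorname{Var}(\varepsilon_t)/(1-\rho^2)$ blows up as $\rho$ approaches 1, and the fraction of time a random walk spends above zero follows the arcsine law, whose limiting distribution function is $(2/\pi)\arcsin(\sqrt{a})$. The sketch assumes Gaussian increments; the path length, number of paths, and seed are arbitrary.

```python
import math
import random


def ar1_variance(rho, var_eps=1.0):
    """Stationary variance of an AR(1): Var(eps) / (1 - rho^2), valid for |rho| < 1."""
    return var_eps / (1 - rho ** 2)


def fraction_positive(n_steps, rng):
    """Fraction of steps at which a Gaussian random walk started at zero is above zero."""
    x, above = 0.0, 0
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
        above += x > 0
    return above / n_steps


def arcsine_cdf(a):
    """Limiting P(fraction of time above zero <= a) for a random walk (arcsine law)."""
    return 2 / math.pi * math.asin(math.sqrt(a))


for rho in (0.5, 0.9, 0.99):
    print(rho, ar1_variance(rho))

rng = random.Random(99)
fracs = [fraction_positive(500, rng) for _ in range(2000)]
p_mostly_above = sum(f > 0.9 for f in fracs) / len(fracs)
p_about_half = sum(0.45 < f < 0.55 for f in fracs) / len(fracs)
print(p_mostly_above, 1 - arcsine_cdf(0.9))                   # simulated vs. limit
print(p_about_half, arcsine_cdf(0.55) - arcsine_cdf(0.45))    # simulated vs. limit
```

In the limit, the chance of spending more than 90% of the time above zero is about 0.20, more than three times the chance of spending between 45% and 55% of the time above zero.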

Stock prices follow a random walk, as long as markets are efficient. If price changes were predictable, investors would quickly figure this out, thereby removing the predictability. In an efficient market, the best forecast of the future price is the current price, and the best forecast of the future return is zero. Since the variance of a random walk is infinite, it makes no sense to talk about the correlation between stock prices (assuming that prices follow a random walk, or simply that prices have an infinite variance).

(Figure: Two independent random walks, $x_t$ plotted against time index, with their estimated correlation.)

It can be shown that if we take two random walks that are completely independent of each other, there is a very high probability of finding a (spuriously) high correlation coefficient between them. (This may explain the bond yield example.) This underscores the futility of looking at correlations between two price series. In the regression of one independent random walk on the other, the t-statistic goes to infinity in absolute value as the sample size increases. So even though there is no relationship between the two series, we are virtually certain to declare (wrongly) that there is a relationship if we use naive regression methods and the sample size is large enough. My two simulated independent random walks seem to move together, but it's just an illusion. The Pearson correlation is 0.53, and the estimated regression coefficient is 0.74, with a seemingly significant t-statistic. All of this "structure" is spurious!
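The following sketch repeats the spurious-regression experiment: it regresses one simulated random walk on another, independent one, and records the naive t-statistic for the slope. The sample sizes, replication count, and seed are illustrative assumptions. Consistent with the claim above, the typical $|t|$ keeps growing with n (roughly in proportion to $\sqrt{n}$), and the naive 5% test rejects far more often than 5% of the time.

```python
import math
import random
import statistics


def ols_slope_t(x, y):
    """OLS slope of y on x (with intercept) and its naive iid-error t-statistic."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y)) / sxx
    a0 = y_bar - b * x_bar
    rss = sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))
    return b, b / math.sqrt(rss / (n - 2) / sxx)


def random_walk(n, rng):
    """Gaussian random walk x_t = x_{t-1} + eps_t started at zero."""
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


rng = random.Random(8)
reps = 200
median_abs_t, reject_rate = {}, {}
for n in (100, 400, 1600):
    # Regress one random walk on another that was generated independently.
    abs_t = [abs(ols_slope_t(random_walk(n, rng), random_walk(n, rng))[1])
             for _ in range(reps)]
    median_abs_t[n] = statistics.median(abs_t)
    reject_rate[n] = sum(t > 1.96 for t in abs_t) / reps  # naive two-sided 5% test
print(median_abs_t)
print(reject_rate)
```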

Unit Root Tests

The random walk nature of prices also invalidates the t-statistic in the regression of current price on past price. To try to determine whether our price data came from a random walk, we can test whether the true slope is 1. But the t-statistic for this hypothesis does not have an approximately standard normal distribution, even if we really have a random walk. Fortunately, the distribution of this t-statistic has been determined (Dickey and Fuller), and tables are available. The result is a unit root test. In the unit root test, we test the null hypothesis that the series is a random walk against the alternative hypothesis that it is an AR(1) with $|\rho| < 1$. Note that under the alternative hypothesis the series is stationary, and therefore mean reverting, while under the null hypothesis it is nonstationary.
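The sketch below simulates the null distribution of the Dickey-Fuller t-statistic in its simplest form (no constant, no trend) and reports its empirical 5% quantile, which is typically near $-1.95$ rather than the standard normal value $-1.645$. Applications to prices usually include a constant or a trend, and those versions have different, more negative critical values (asymptotically about $-2.86$ and $-3.41$ at 5%); the simple form here is an assumption made to keep the code short.

```python
import math
import random


def random_walk(n, rng):
    """Gaussian random walk x_t = x_{t-1} + eps_t started at zero."""
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def df_t_stat(x):
    """Dickey-Fuller t-statistic (no constant, no trend).

    Regress dx_t = x_t - x_{t-1} on x_{t-1} without an intercept. The t-statistic
    for that slope (rho - 1) equals the t-statistic for H0: rho = 1 in the
    regression of x_t on x_{t-1}.
    """
    lagged = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(v * d for v, d in zip(lagged, dx)) / sxx
    rss = sum((d - gamma * v) ** 2 for v, d in zip(lagged, dx))
    se = math.sqrt(rss / (len(dx) - 1) / sxx)
    return gamma / se


# Simulate the statistic when the data really are a random walk (the null hypothesis).
rng = random.Random(123)
stats = sorted(df_t_stat(random_walk(250, rng)) for _ in range(2000))
q05 = stats[int(0.05 * len(stats))]  # empirical 5% quantile
print(f"simulated 5% quantile: {q05:.2f} (standard normal: -1.645)")
```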

Cointegration

Suppose we have two nonstationary series $\{x_t\}$ and $\{y_t\}$, both (approximately) random walks. How do we measure their tendency to move together? Correlation is meaningless here: both series wander all over the place, since they are nonstationary. Instead of looking at how they wander from a particular point (such as zero), let's look at how they wander from each other. Maybe the "spread" $\{y_t - x_t\}$ is stationary. Then even though both series wander all over the place separately, they are tied to each other in that the spread between them is mean reverting. So we can make bets on the reversion of this spread. More generally, maybe there is a $\beta$ such that the linear combination $\{y_t - \beta x_t\}$ is stationary. If so, we say that $\{x_t\}$ and $\{y_t\}$ are cointegrated.

A simple approach to cointegration is first to do unit root tests on $\{x_t\}$ and $\{y_t\}$ separately. Next, estimate $\beta$ by an ordinary regression of $\{y_t\}$ on $\{x_t\}$, and finally do a unit root test on the residuals $\{y_t - \hat{\beta} x_t\}$. If the tests indicate that $\{x_t\}$ and $\{y_t\}$ are nonstationary but $\{y_t - \hat{\beta} x_t\}$ is stationary, then we declare that $\{x_t\}$ and $\{y_t\}$ are cointegrated, with cointegrating parameter $\hat{\beta}$.
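A minimal sketch of these three steps (often called the Engle-Granger two-step procedure) on a hypothetical cointegrated pair, where $y_t = 2x_t$ plus stationary AR(1) noise. It uses a no-constant Dickey-Fuller statistic and includes an intercept in the regression of $\{y_t\}$ on $\{x_t\}$; both are simplifying assumptions. Because $\hat{\beta}$ is estimated, the residual-based test needs its own critical values rather than the ordinary Dickey-Fuller ones.

```python
import math
import random
import statistics


def df_t_stat(x):
    """Dickey-Fuller t-statistic with no constant: regress dx_t on x_{t-1}."""
    lagged = x[:-1]
    dx = [b - a for a, b in zip(x[:-1], x[1:])]
    sxx = sum(v * v for v in lagged)
    gamma = sum(v * d for v, d in zip(lagged, dx)) / sxx
    rss = sum((d - gamma * v) ** 2 for v, d in zip(lagged, dx))
    return gamma / math.sqrt(rss / (len(dx) - 1) / sxx)


def ols(x, y):
    """Intercept and slope from the ordinary least-squares regression of y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


# Hypothetical cointegrated pair: x is a random walk, y = 2 x + stationary AR(1) noise.
rng = random.Random(31)
n, beta_true = 1000, 2.0
x, y, level, noise = [], [], 0.0, 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    noise = 0.5 * noise + rng.gauss(0.0, 1.0)
    x.append(level)
    y.append(beta_true * level + noise)

# Step 1: unit root tests on each series separately (typically no rejection).
print("DF on x:", round(df_t_stat(x), 2), " DF on y:", round(df_t_stat(y), 2))

# Step 2: estimate beta by an ordinary regression of y on x.
alpha_hat, beta_hat = ols(x, y)

# Step 3: unit root test on the residuals (here strongly negative, i.e. stationary).
# Residual-based tests need Engle-Granger / MacKinnon critical values, not ordinary DF ones.
resid = [c - alpha_hat - beta_hat * a for a, c in zip(x, y)]
print("beta_hat:", round(beta_hat, 4), " DF on residuals:", round(df_t_stat(resid), 2))
```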
