I have read conflicting opinions about including lagged dependent variables in a model, and I guess it is partly up to the researcher and depends on the scope and goal of the research.

I'm currently modeling the liquidity of German stocks with a panel data regression (time fixed effects). My independent variables are price (logged), free-float number of shares (logged) and book-to-market value.

Using EViews, my results are OK, except for a Durbin-Watson value of around 1.5.

Assuming Durbin-Watson is valid for panel data (and for the separate stocks DW is also too low), we have positive autocorrelation in the errors.
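For reference, here is a minimal sketch of how the DW statistic is computed from residuals, and why a value of 1.5 points to mild positive autocorrelation (DW is roughly 2(1 - rho), where rho is the first-order autocorrelation of the residuals). The simulated AR(1) series and the function names are illustrative assumptions, not my actual data or what EViews does internally. The panel variant simply avoids differencing across stock boundaries.

```python
import random

def durbin_watson(residuals):
    """Sum of squared first differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def panel_durbin_watson(residuals_by_stock):
    """Pooled variant: differences are taken within each stock only."""
    num = 0.0
    den = 0.0
    for e in residuals_by_stock.values():
        num += sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
        den += sum(v * v for v in e)
    return num / den

# Hypothetical residuals: AR(1) with rho = 0.25 (an assumption for illustration).
random.seed(42)
rho = 0.25
e = [random.gauss(0, 1)]
for _ in range(4999):
    e.append(rho * e[-1] + random.gauss(0, 1))

dw_sim = durbin_watson(e)
print(round(dw_sim, 2))  # should land near 2 * (1 - rho) = 1.5
```

So a DW of about 1.5 corresponds to a residual autocorrelation of roughly 0.25.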

This is a problem because:

(i) Estimates of the regression coefficients are inefficient.
(ii) Forecasts based on the regression equations are sub-optimal.
(iii) The usual significance tests on the coefficients are invalid.
[source: Granger]

Including a lagged dependent variable, i.e. liquidity from the day before, solves this issue and, as expected, increases the R^2 a bit more. But I am not really sure this is the way to go: it means modeling liquidity where the liquidity of the previous day is the most important factor. Another option would be that I'm missing an independent variable?
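To make the effect concrete, here is a rough sketch on simulated data (not my actual panel). It assumes the true process really is dynamic, with a hypothetical persistence of `phi = 0.6` and a single i.i.d. regressor `x`; the static fit then shows a low DW, and adding the lag moves it back towards 2. The small `solve` and `ols` helpers are written out by hand only to keep the example dependency-free.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """OLS via the normal equations; returns (coefficients, residuals)."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid

def durbin_watson(e):
    return sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e))) / sum(v * v for v in e)

# Simulated dynamic process (all parameter values are illustrative assumptions).
random.seed(0)
n, phi = 2000, 0.6
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(1.0 + 0.5 * x[t] + phi * y[t - 1] + random.gauss(0, 1))

y_dep = y[1:]
X_static = [[1.0, x[t]] for t in range(1, n)]           # no lag
X_ldv = [[1.0, x[t], y[t - 1]] for t in range(1, n)]    # with lagged dependent variable

beta_static, res_static = ols(X_static, y_dep)
beta_ldv, res_ldv = ols(X_ldv, y_dep)
dw_static = durbin_watson(res_static)
dw_ldv = durbin_watson(res_ldv)
print(round(dw_static, 2), round(dw_ldv, 2), round(beta_ldv[2], 2))
```

One caveat: once a lagged dependent variable is in the regression, the DW statistic is biased towards 2, so a DW near 2 in the second model is weaker evidence of clean residuals than it looks.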

Specifically, the papers of Wilkins (To Lag or Not to Lag? Re-evaluating the Use of Lagged Dependent Variables in Regression Analysis) and Achen (Why Lagged Dependent Variables Can Suppress the Explanatory Power of Other Independent Variables) discuss these issues.

I would strongly recommend adding such a lagged variable. If liquidity today indeed has a lot of predictive power for liquidity tomorrow, then you should of course include it in your model. I do not see a reason not to. The rest of the market has all past information at its disposal, and you are missing an important input by not including it. I get the impression you are potentially making your life much harder than it has to be...
– Matt Wolf, Jan 7 '14 at 15:04

1 Answer

If there is autocorrelation, then you need to add the lagged dependent variable (LDV). By not including it, your regression suffers from omitted variable bias. You say that by doing this you will be "modelling liquidity where liquidity of the previous day is the most important factor", but since your regression "demands" adding the LDV (due to the autocorrelation), this period's liquidity most likely depends strongly on last period's liquidity.
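The omitted-variable argument can be sketched as follows, under the assumption (not established in the question) that liquidity $L_{it}$ truly follows a dynamic model with a single regressor $x_{it}$; the symbols $\alpha$, $\beta$, $\phi$, $\varepsilon_{it}$ and $u_{it}$ are introduced here for illustration only.

$$
L_{it} = \alpha + \beta x_{it} + \phi L_{i,t-1} + \varepsilon_{it}
$$

If the static model is estimated instead, the lag is absorbed into the error term:

$$
L_{it} = \alpha + \beta x_{it} + u_{it}, \qquad u_{it} = \phi L_{i,t-1} + \varepsilon_{it}
$$

Because $L_{i,t-1}$ is itself persistent, $u_{it}$ is autocorrelated whenever $\phi \neq 0$, which is what a low Durbin-Watson value picks up. The slope estimate is also biased if $x_{it}$ is correlated with the omitted lag:

$$
\operatorname{plim} \hat{\beta} = \beta + \phi \, \frac{\operatorname{Cov}(x_{it}, L_{i,t-1})}{\operatorname{Var}(x_{it})}
$$

With slow-moving regressors such as price and free float, that covariance is unlikely to be zero.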