I am doing some time series analysis on 5-minute bar stock data. If I want to focus only on data from regular trading hours and ignore pre- and post-market activity, how can I effectively conduct a time series analysis on a disjoint set of data?

For instance, if I look at trading data for SPY, how do I deal with the overnight gap, i.e. the jump from 4:00 pm to 9:30 am, in the time series?

3 Answers

It depends on how you're actually interacting with your data in practice.

If you hold no overnight positions and forcibly liquidate at the end of the bar 10 minutes before the market close, then drop the last three returns, since you can't actually act on them.
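A minimal sketch of the truncation in plain Python, assuming bars arrive as `(datetime, close)` tuples stamped with each bar's closing time and that the liquidation bar is the one stamped 15:50. The names `actionable_returns` and `LIQUIDATION_CUTOFF` are hypothetical, and the prices are toy values. Under this stamping convention, the dropped returns are the ones ending at 15:55 and 16:00 plus the overnight return.

```python
from datetime import datetime, time

# Assumption: bars are stamped with their closing time and positions are
# flat after the bar stamped 15:50 (10 minutes before the 16:00 close).
LIQUIDATION_CUTOFF = time(15, 50)


def actionable_returns(bars, cutoff=LIQUIDATION_CUTOFF):
    """Simple returns that a flat-overnight strategy could actually trade.

    bars: list of (datetime, close) tuples sorted by time.
    Drops returns after the liquidation bar and the overnight return.
    """
    out = []
    for (t0, p0), (t1, p1) in zip(bars, bars[1:]):
        if t1.date() != t0.date():
            continue  # overnight return: no position is held across it
        if t1.time() > cutoff:
            continue  # already liquidated, so this return cannot be acted on
        out.append((t1, p1 / p0 - 1.0))
    return out


# Toy prices, for illustration only.
bars = [
    (datetime(2017, 1, 3, 15, 45), 225.0),
    (datetime(2017, 1, 3, 15, 50), 225.5),
    (datetime(2017, 1, 3, 15, 55), 225.2),
    (datetime(2017, 1, 3, 16, 0), 225.4),
    (datetime(2017, 1, 4, 9, 30), 226.1),
    (datetime(2017, 1, 4, 9, 35), 226.3),
]
for ts, r in actionable_returns(bars):
    print(ts, round(r, 6))
```

If your data stamps bars by their opening time instead, the cutoff has to shift by one bar.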

If you hold overnight positions and the market trades continuously outside US business hours, then this is merely a data-acquisition problem, because those samples actually exist. In this case you may want to address the seasonal (time-of-day) regime shift in the data.

If you hold overnight positions and the market halts trading between two sessions for a nontrivial amount of time ($\gg 5$ min), but you aren't actually trying to predict the jumps, you can simply leave those data points out.
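One way to leave the gap out, sketched under the assumption that a gap is any step longer than one 5-minute bar. The helper `returns_without_gaps` is hypothetical and the prices are toy values. Detecting gaps by elapsed time rather than by calendar date also drops returns across intraday halts or missing bars, which may or may not be what you want.

```python
from datetime import datetime, timedelta

BAR = timedelta(minutes=5)


def returns_without_gaps(bars, max_step=BAR):
    """Keep only returns between bars at most one bar interval apart.

    bars: list of (datetime, close) tuples sorted by time.
    Any longer step (the 16:00 -> 09:30 overnight gap, a halt, or a
    missing bar) is left out rather than modeled.
    """
    out = []
    for (t0, p0), (t1, p1) in zip(bars, bars[1:]):
        if t1 - t0 > max_step:
            continue  # spans a gap: drop this return
        out.append((t1, p1 / p0 - 1.0))
    return out


# Toy prices, for illustration only.
bars = [
    (datetime(2017, 1, 3, 15, 55), 225.0),
    (datetime(2017, 1, 3, 16, 0), 225.4),
    (datetime(2017, 1, 4, 9, 30), 226.1),
    (datetime(2017, 1, 4, 9, 35), 226.3),
]
for ts, r in returns_without_gaps(bars):
    print(ts, round(r, 6))
```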

If you hold overnight positions and the market halts trading as above, but you are actually trying to predict the gap, it is reasonable to model the gap separately.
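Modeling the gap separately starts with splitting the returns into two samples. This sketch assumes bars arrive as `(datetime, close)` tuples and uses a normal distribution per sample purely as the simplest placeholder model; `split_log_returns` and `fit_normal` are hypothetical names, and the prices are toy values.

```python
import math
from datetime import datetime, timedelta
from statistics import mean, stdev

BAR = timedelta(minutes=5)


def split_log_returns(bars, bar=BAR):
    """Split log returns into regular 5-minute steps and gap steps."""
    intraday, gaps = [], []
    for (t0, p0), (t1, p1) in zip(bars, bars[1:]):
        r = math.log(p1 / p0)
        if t1 - t0 <= bar:
            intraday.append(r)
        else:
            gaps.append(r)  # e.g. previous 16:00 close -> next 09:30 bar
    return intraday, gaps


def fit_normal(returns):
    """Placeholder model: sample mean and standard deviation (needs >= 2 points)."""
    return mean(returns), stdev(returns)


# Toy prices, for illustration only.
bars = [
    (datetime(2017, 1, 3, 15, 50), 225.0),
    (datetime(2017, 1, 3, 15, 55), 225.3),
    (datetime(2017, 1, 3, 16, 0), 225.1),
    (datetime(2017, 1, 4, 9, 30), 226.0),
    (datetime(2017, 1, 4, 9, 35), 225.8),
    (datetime(2017, 1, 4, 9, 40), 226.2),
]
intraday, gaps = split_log_returns(bars)
print(len(intraday), len(gaps))
print(fit_normal(intraday))
```

There is only one overnight return per session, so the gap sample grows far more slowly than the intraday one and needs many sessions before its fitted parameters mean much.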

It is best to think of your data as two-dimensional: the rows are labeled with successive dates (20170102, 20170103, ...) and the columns with bar times (930, 935, 940, ..., 1600). This structure lets you analyze the data along either axis, e.g. across days at a fixed time of day, or through a single session without any overnight gap.
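A minimal sketch of that layout with the standard library, assuming the values are 5-minute returns keyed by bar timestamp; `to_grid` and `column_means` are hypothetical helpers, and the returns are toy values (in pandas, `DataFrame.pivot` does the same reshaping). Averaging down a column gives a time-of-day profile, while a single row is one session with no overnight gap inside it.

```python
from collections import defaultdict
from datetime import datetime


def to_grid(series):
    """Pivot (datetime, value) pairs into {YYYYMMDD: {HHMM: value}}."""
    grid = defaultdict(dict)
    for ts, value in series:
        row = ts.year * 10000 + ts.month * 100 + ts.day  # e.g. 20170103
        col = ts.hour * 100 + ts.minute                  # e.g. 935
        grid[row][col] = value
    return dict(grid)


def column_means(grid):
    """Average each time-of-day column across dates (intraday seasonality)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in grid.values():
        for col, value in row.items():
            sums[col] += value
            counts[col] += 1
    return {col: sums[col] / counts[col] for col in sorted(sums)}


# Toy 5-minute returns, for illustration only.
returns = [
    (datetime(2017, 1, 3, 9, 35), 0.001),
    (datetime(2017, 1, 3, 9, 40), -0.002),
    (datetime(2017, 1, 4, 9, 35), 0.003),
    (datetime(2017, 1, 4, 9, 40), 0.000),
]
grid = to_grid(returns)
print(sorted(grid))
print(column_means(grid))
```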

A Bayesian method would let you partition the data so that the time elapsed between trades can enter the likelihood function. If you hold $\beta$ constant except for the elapsed time, you could use $\beta^{\Delta t/5}$ in the likelihood function, where $\Delta t$ is the elapsed time in minutes. Alternatively, you could treat the final trades of the session as distinct because of the change in liquidity, and partition the Bayesian likelihood function into two mutually exclusive sets. You should probably do both, with separate parameters for model one and model two, and then compare the posterior model probabilities $\Pr(\text{Model one}\mid\text{data})$ and $\Pr(\text{Model two}\mid\text{data})$.
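A rough sketch of that comparison, resting on several simplifying assumptions that are not part of the answer: the series is an AR(1) process with known noise scale `SIGMA`, $\beta$ is restricted to $(0, 1)$ so that $\beta^{\Delta t/5}$ stays real, each coefficient gets a uniform prior on a grid (so the marginal likelihood is the grid-averaged likelihood), and both models get equal prior probability. `dt[t]` is the elapsed minutes between observations `t-1` and `t`, and `special[t]` flags the steps in the second set of the partition. All names and the toy series are hypothetical.

```python
import math

SIGMA = 1.0                            # assumed known noise scale
GRID = [i / 50 for i in range(1, 50)]  # uniform prior grid for beta on (0, 1)


def gauss_logpdf(x, mean, sigma=SIGMA):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mean) ** 2 / (2 * sigma ** 2)


def loglik_time_scaled(x, dt, beta):
    """Model one: x[t] ~ N(beta ** (dt[t] / 5) * x[t-1], SIGMA**2)."""
    return sum(gauss_logpdf(x[t], beta ** (dt[t] / 5) * x[t - 1])
               for t in range(1, len(x)))


def loglik_partitioned(x, special, beta_a, beta_b):
    """Model two: coefficient beta_b on steps flagged in `special`, beta_a elsewhere."""
    return sum(gauss_logpdf(x[t], (beta_b if special[t] else beta_a) * x[t - 1])
               for t in range(1, len(x)))


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def posterior_model_one(x, dt, special):
    """Pr(model one | data) with equal prior odds and grid-averaged marginal likelihoods."""
    log_ml_one = log_mean_exp([loglik_time_scaled(x, dt, b) for b in GRID])
    log_ml_two = log_mean_exp([loglik_partitioned(x, special, a, b)
                               for a in GRID for b in GRID])
    return 1.0 / (1.0 + math.exp(log_ml_two - log_ml_one))


# Toy series: the fourth observation arrives after the 16:00 -> 09:30 gap.
x = [0.5, 0.4, 0.3, -0.2, -0.1]
dt = [0, 5, 5, 1050, 5]           # minutes since the previous observation
special = [d > 5 for d in dt]     # second set: steps that span a gap
print(posterior_model_one(x, dt, special))
```

Here `special` marks the steps that cross the overnight gap; flagging the last bars of each session instead would match the liquidity-based partition described above.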