I'm having trouble understanding what it means for a time series to be covariance stationary. Specifically, I'm struggling with the third condition: for any $t, h$, $\operatorname{Cov}(x_t, x_{t+h})$ depends only on $h$ and not on $t$.

Would anyone have any examples of how a time series might be covariance stationary or any examples of non-covariance stationary time series?

Would you be able to maybe give an example or two of how this might come up in applications, rather than more of a math explanation?
– M. Damon Sep 19 '18 at 16:31


Hi: Any stationary ARIMA model is covariance stationary. Take an ARIMA(2,0,0), calculate the covariance, and you'll see that it's a function of $h$ only. The same holds for any of them, as long as they are stationary.
– mark leeds Sep 19 '18 at 17:56

@markleeds Yes, that's why I chose that as a simple example. If you want to provide an explanation for a general case, write it up as an answer, I'll vote for it!
– Giskard Sep 19 '18 at 19:01

@denesp: My apologies. I didn't read your answer carefully. I imagine that there must be a general proof, but I'm not able to come up with one, at least in some reasonable amount of time (and maybe not even in an unreasonable amount of time). I'll google and see if anything is out there.
– mark leeds Sep 20 '18 at 0:12

@denesp: I think 4.5, 4.6 and 4.7 of the link below are sort of a proof: any stationary ARIMA model can be written in the form of a Wold decomposition, and Wold says that any covariance stationary process can be written that way, so any stationary ARIMA model is covariance stationary. (But check me on that. I may be hand waving. The statement is true for sure.) sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/…
– mark leeds Sep 20 '18 at 0:27

@M. Damon: If you're just starting with time series (actually, even if you're not), I think it's best to forget about the Wold decomposition and ARIMA models and all that jibber jabber. I re-read your question, and I think the best way to think of covariance stationarity is the following.

You're sitting in the audience looking at a stage. The stage is $h$ units wide from one end of the curtain to the other. You can see the $h$ numbers (picture each number taking up one unit of space) that some stochastic process sends onto the stage during an $h$-second interval, so that they fill the stage exactly from the left side all the way to the right.

So, at time $t = 1$, $X_1$ arrives on the stage. Then $X_2$ arrives at $t = 2$, to the right of $X_1$ (from your vantage point). After $h$ seconds, you have $X_1, \ldots, X_h$ on stage, ordered from left to right.

You are sitting there at time $t = h$ (picture time stopping for a moment). Since the observations come from some stochastic process, there is a covariance structure among the $h$ elements on stage. Whatever it is doesn't matter. Maybe it's $\theta^i$, where $i$ is the lag between two elements, say $X_t$ and $X_{t-i}$.
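As one concrete instance of such a structure, here is a sketch under an assumption that is not part of the discussion above: suppose the numbers come from a stationary AR(1) process $X_t = \theta X_{t-1} + \varepsilon_t$ with $|\theta| < 1$ and uncorrelated shocks $\varepsilon_t$ of variance $\sigma^2$ (the symbols $\varepsilon_t$ and $\sigma^2$ are introduced here only for illustration). Then, for any lag $i \ge 0$,

$$\operatorname{Cov}(X_t, X_{t-i}) = \frac{\sigma^2}{1-\theta^2}\,\theta^{i}, \qquad \operatorname{Corr}(X_t, X_{t-i}) = \theta^{i}.$$

The right-hand sides contain the lag $i$ but not $t$, which is exactly the third condition. In this example it is the correlation that equals $\theta^i$; the covariance is the same thing scaled by the constant variance.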

Next, time starts again and you close your eyes for another $h$ seconds, so that a new set of $h$ elements arrives on stage and the old ones leave. Now it's time $t = 2h$ and you open your eyes. The covariance structure among the new $h$ elements on stage is the same as it was earlier. So the specific elements themselves don't matter when it comes to the covariance of any two of them. All one needs to know is the distance between two elements in order to know their covariance (or correlation). The time $2h$ plays no role. This is the practical meaning of the term "covariance stationary".
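The stage picture can be checked numerically. The following is a minimal sketch, not a definitive implementation: it assumes an AR(1) process with coefficient 0.5 as the stationary example and a random walk as the non-stationary one, and the helper names `simulate_paths` and `cov_across_paths` are made up for illustration. It simulates many independent "performances" and estimates $\operatorname{Cov}(X_t, X_{t+h})$ across them at several values of $t$.

```python
import numpy as np

def simulate_paths(n_paths, n_steps, phi, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal shocks.

    phi = 1 gives a random walk (not covariance stationary);
    |phi| < 1 gives an AR(1) that is effectively stationary after a few steps.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal((n_paths, n_steps))
    x = np.zeros((n_paths, n_steps))
    x[:, 0] = shocks[:, 0]
    for t in range(1, n_steps):
        x[:, t] = phi * x[:, t - 1] + shocks[:, t]
    return x

def cov_across_paths(x, t, h):
    """Estimate Cov(x_t, x_{t+h}) by averaging over the independent paths."""
    return np.cov(x[:, t], x[:, t + h])[0, 1]

ar1 = simulate_paths(20000, 220, phi=0.5)   # assumed stationary example
walk = simulate_paths(20000, 220, phi=1.0)  # assumed non-stationary example

h = 3
for t in (50, 100, 200):
    # For the AR(1), the estimate should barely move with t.
    # For the random walk, it should grow roughly in proportion to t.
    print(t, cov_across_paths(ar1, t, h), cov_across_paths(walk, t, h))
```

In this sketch the AR(1) column should stay near the same value whichever $t$ you pick, while the random-walk column keeps growing, so for the random walk the time at which you open your eyes does matter.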

Note that the stage could have been bigger or smaller. $h$ was just an arbitrary number.

This assumption is useful because (informally speaking) it allows one to estimate model parameters sensibly from only one observed time series: all pairs of observations that are the same distance apart share the same covariance, so they can be pooled. Without this assumption, it would be like having a sample size of $n=1$, because the whole stochastic realization is a single draw. I hope that helps.
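To make the pooling idea concrete, here is a small sketch that estimates the lag-$h$ covariance from a single series by averaging over all pairs that are $h$ apart. It assumes an AR(1) process with coefficient 0.5 and unit-variance shocks as the data source, and `sample_autocov` is a hypothetical helper name. It divides by $n$, a common convention that makes the estimate slightly biased in finite samples.

```python
import numpy as np

def sample_autocov(x, h):
    """Lag-h sample autocovariance of one series, pooling all pairs h apart."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    centered = x - x.mean()
    # Every pair (x_t, x_{t+h}) contributes, regardless of t.
    return np.sum(centered[: n - h] * centered[h:]) / n

# One single realization of an assumed AR(1) process x_t = phi * x_{t-1} + e_t.
rng = np.random.default_rng(1)
phi = 0.5
n = 50000
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # draw from the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# Theoretical lag-h covariance for this AR(1): phi**h / (1 - phi**2).
for h in (0, 1, 2, 3):
    print(h, sample_autocov(x, h), phi**h / (1 - phi**2))
```

Pooling over $t$ like this only makes sense because the covariance is assumed not to depend on $t$; for a series like a random walk the same average would not estimate any single well-defined quantity.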