Exponential smoothing models

Fortran must die

I am generating predictions of future data using common exponential smoothing models (ESM). I fit each candidate model to the first 37 months, predict a holdout set of the last 12 months, and choose the model that predicts the holdout best (as is common). The candidates are linear, damptrend, simple, and various seasonal models. The ones that work best, by a good margin, are the seasonal models. One of my coworkers argues that since we have no reason to believe our data is seasonal, we should never use seasonal models, even if they predict the holdout data set better.

I wonder what others think about this. We are trying to predict future levels of those who enter the waiting list. There is no obvious reason this should be seasonal in our case, but seasonal ESM models clearly predict better.
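The holdout comparison described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions: the series is synthetic (a made-up sine pattern, not the waiting-list data), the smoothing weights are fixed at 0.3 rather than optimized as forecasting software would do, and the function names are hypothetical. Only two of the candidate models are shown.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def ses_forecast(train, horizon, alpha=0.3):
    """Simple exponential smoothing: flat forecast at the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def seasonal_forecast(train, horizon, m=12, alpha=0.3, gamma=0.3):
    """Additive seasonal exponential smoothing without a trend term."""
    # Initialise the level and seasonal offsets from the first full cycle.
    level = sum(train[:m]) / m
    season = [y - level for y in train[:m]]
    for t in range(m, len(train)):
        y = train[t]
        new_level = alpha * (y - season[t % m]) + (1 - alpha) * level
        season[t % m] = gamma * (y - new_level) + (1 - gamma) * season[t % m]
        level = new_level
    n = len(train)
    # The forecast for time index n + h reuses the offset for that season.
    return [level + season[(n + h) % m] for h in range(horizon)]

# Synthetic monthly series, 49 points (assumption, for illustration only).
series = [100 + 10 * math.sin(2 * math.pi * t / 12) for t in range(49)]
train, holdout = series[:37], series[37:]

scores = {
    "simple": mape(holdout, ses_forecast(train, len(holdout))),
    "seasonal": mape(holdout, seasonal_forecast(train, len(holdout))),
}
best = min(scores, key=scores.get)
print(scores, best)
```

Because the synthetic series is built to be seasonal, the seasonal model wins here by construction; the sketch shows the selection mechanics, not evidence about any real data.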

Fortran must die

Thanks for your comments ondansetron. He believes that unless there is a theoretical reason for something to be true you should not model it. My guess is that someone made this point to him in the context of methods like regression (where many argue a variable should only be in a model if it makes theoretical sense) and he applies this to univariate time series.

While authors disagree, I think this concern matters less in univariate time series, be it ESM or ARIMA, which simply model what has occurred and assume that it will continue (because they assume there is an unknown generating process that produces the results and that it will not change). He is not the easiest person to discuss things with. It ends up being confrontational, and I am not good at confrontation.

Thanks for your comments. I would argue the same thing in honesty.

One thing I forgot to ask. If you are modeling monthly data (including seasonality), is it an issue if you don't have complete sets of 12? I have 49 months (not exactly 48). I have never seen guidance on whether this matters when modeling time series, just that you would want at least 48 points.

TS Contributor

Thanks for your comments ondansetron. He believes that unless there is a theoretical reason for something to be true you should not model it. My guess is that someone made this point to him in the context of methods like regression (where many argue a variable should only be in a model if it makes theoretical sense) and he applies this to univariate time series.

I agree with him on those grounds, absolutely, but that depends on the context. I think prediction is different from inferential modeling. In your case, you want practical results that equate to good predictions, right? If the toenail length of the firm's CEO allowed you to accurately predict sales, no one in their right mind would argue this is causal or even a real relationship, but if it plays out well in real life, you will probably use it based on its performance.

While authors disagree, I think this concern matters less in univariate time series, be it ESM or ARIMA, which simply model what has occurred and assume that it will continue (because they assume there is an unknown generating process that produces the results and that it will not change). He is not the easiest person to discuss things with. It ends up being confrontational, and I am not good at confrontation.

I think you can approach it from the angle that you both agree the common goal is a high-performing model in the sense of accurate predictions. Seasonality may be a proxy in some way, or may be capturing some other underlying process, but if it leads to good out-of-sample predictions, why change unless discarding seasonality results in better predictions? If you want to make inferences about the relationships with the DV, then I agree that you should remove anything that lacks substantial justification beyond a significant p-value.

One thing I forgot to ask. If you are modeling monthly data (including seasonality), is it an issue if you don't have complete sets of 12? I have 49 months (not exactly 48). I have never seen guidance on whether this matters when modeling time series, just that you would want at least 48 points.

I don't think it matters to be honest. Whatever the seasonality is (4 periods, 12, etc.) each observation will have an indicator for which "season" it is, if I remember correctly. If I seem off point, please give me an example of what your model might look like.
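To make the "indicator for each season" idea concrete, here is a small sketch for 49 monthly observations. It assumes the series starts at season index 0 and that the season of an observation is simply its position modulo 12; the variable names are hypothetical.

```python
from collections import Counter

n_obs = 49          # months of data, as in the question
period = 12         # monthly seasonality

# Season index for each observation: 0..11, repeating.
season_of = [t % period for t in range(n_obs)]

# How many observations fall in each season.
counts = Counter(season_of)
print(sorted(counts.items()))
```

With 49 points, one season ends up with five observations and the other eleven with four, so every season is still represented; the series just does not end on a cycle boundary.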

Fortran must die

Here is a more general question. Some authors in forecasting argue that you should average all the ESM models. Some say that you should use the best one, with best being determined through MAPE against a holdout data set (the last 12 months are predicted, and you pick the model that best predicted them). You then use this model and the total data to predict the future.

My own sense is that averaging all the models works best for us, because I think there is considerable variation from year to year (driven by policy change, which should not occur this year as far as I know). But I was wondering if others had seen comments on this in the literature. Is there any consensus on this issue?