Fortunately, for obvious reasons, few applications require simulating asset returns over horizons in excess of 30 years.

Nevertheless, simulations over long horizons are sometimes conducted as part of the ALM process at defined-benefit pension funds. The purpose is to understand portfolio risk and compare alternatives by making an assessment of the probability of attaining a severely under-funded status. The simulation of liabilities is based on actuarial models which carry a relatively high degree of confidence for long horizons.

The asset side is problematic. For the sake of argument, assume that investment is in the S&P 500 index and cash. Simple models like geometric Brownian motion result in excessively high dispersion of long-term returns and many absurd scenarios. This is at odds with historical evidence and what we believe is the likely volatility of future economic growth. Such simplistic models also ignore the propensity for central bank intervention through monetary policy which can induce faster drawdown recovery and (possibly) limit the formation of asset bubbles.

For the S&P 500 index over the period 1927 - 2017, a reasonable estimate of the standard deviation of annualized returns is 19% over a 1-year horizon and 2% over a 30-year horizon. Assuming independence of returns over non-overlapping periods, as with a GBM model, the standard deviation for the 30-year period would be considerably higher at $19\% \, / \, \sqrt{30} \approx 3.5\%.$ This variance reduction is commonly cited in economic research papers that argue for long-term mean reversion of equity index returns.
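As a rough check of the square-root scaling under independence, the sketch below simulates i.i.d. annual returns and measures the dispersion of the 30-year annualized return. It assumes normally distributed annual log returns and takes the annualized return to be their arithmetic average; the function name, path count, and seed are illustrative choices, not part of the original question.

```python
import numpy as np

def annualized_return_std(sigma_1y, horizon_years, n_paths=100_000, seed=0):
    """Std. dev. of the average annual return over `horizon_years`,
    assuming i.i.d. normal annual returns (the GBM-style assumption)."""
    rng = np.random.default_rng(seed)
    # The drift does not affect dispersion, so a zero mean is used.
    annual = rng.normal(0.0, sigma_1y, size=(n_paths, horizon_years))
    return annual.mean(axis=1).std()

simulated = annualized_return_std(0.19, 30)
theoretical = 0.19 / np.sqrt(30)  # about 3.47%
print(f"simulated: {simulated:.4f}, sigma / sqrt(T): {theoretical:.4f}")
```

Both numbers sit well above the 2% figure quoted for the historical 30-year horizon, which is the gap the question is about.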

We could try to fix this, for example, by imposing mean reversion in returns through AR- or ARMA-type models. Unfortunately, the desired variance reduction can only be achieved by choosing parameters that result in unrealistic and empirically unjustifiable levels of return autocorrelation -- for example, -0.5 autocorrelation of returns over non-overlapping annual periods.
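To see where a figure like -0.5 comes from, here is a minimal sketch that assumes annual returns follow a stationary AR(1) with lag-1 autocorrelation `phi` and that the annualized return is the arithmetic average of the annual returns. It solves for the `phi` that brings the 30-year standard deviation from the i.i.d. value down to 2%. The helper names and the bisection bracket are hypothetical.

```python
import math

def annualized_std_ar1(sigma, phi, horizon):
    """Std. dev. of the mean of `horizon` AR(1) returns with marginal
    std `sigma` and lag-k autocorrelation phi**k."""
    factor = 1.0 + 2.0 * sum((1.0 - k / horizon) * phi ** k
                             for k in range(1, horizon))
    return sigma * math.sqrt(factor / horizon)

def implied_phi(sigma, target, horizon, lo=-0.9, hi=0.0, tol=1e-10):
    """Bisection for the phi in [lo, hi] that matches `target`.
    Assumes the target lies between the values at the two endpoints."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if annualized_std_ar1(sigma, mid, horizon) > target:
            hi = mid   # dispersion still too high: need more negative phi
        else:
            lo = mid
    return 0.5 * (lo + hi)

phi_needed = implied_phi(sigma=0.19, target=0.02, horizon=30)
print(f"autocorrelation needed: {phi_needed:.3f}")
```

Under these assumptions the required autocorrelation comes out in the neighborhood of -0.5, consistent with the figure quoted above.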

The better approach, I believe, is a regime-switching model where mean reversion is triggered in bubbles and crashes. The question is what metric would be appropriate to define the regimes and what would be a reasonable model to incorporate the conditional mean reversion?

Can you explain this: a reasonable estimate of the standard deviation of annualized returns is 19% over a 1-year horizon?
– jd8 Aug 5 '17 at 4:06

Take the S&P 500 price index from 1927 to present. Calculate the volatility of monthly returns times $\sqrt{12}.$ Or look at $90$ one-year rolling returns and calculate the standard deviation. It is generally understood that the S&P has exhibited an annual arithmetic average price return of about 7.5% and volatility of 19% over that period. The actual numbers are not relevant. That the volatility does not scale like $\sqrt{T}$ is commonly cited in papers going back to the seminal work by Poterba and Summers as evidence of long-term mean reversion.
– RRL Aug 5 '17 at 4:16

One economic model you could look at is the Habit model of Campbell and Cochrane (1999). The basic idea is that as the consumption of the representative investor approaches the (appropriately defined) habit level of consumption, the representative investor's risk aversion spikes: discount rates increase dramatically and we see a big drop in stock prices. Discount rates recover after this shock and so do stock prices.

If you want a purely statistical model, you can see how Lu Zhang (2005, JF) writes down a statistical process for the stochastic discount factor that gives us this kind of stock return process.

Here $\gamma_0>0$, and $\gamma_1<0$, (usually 50,-1000). Think of x like consumption, M is the discount factor process (i.e. how much do we discount future cash flows on equity), and notice the asymmetric response of risk aversion to changes in consumption above and below the mean.

Campbell and Cochrane (1999) is a great paper, but I'd have some serious concerns deploying a consumption-based, macro-finance economic model to produce long-term forecasts for real-world pension fund decisions. In some sense, the purpose of the Campbell and Cochrane (1999) paper is to illustrate an economics result: that habit preferences over consumption lead to forecastable time-series variation in returns. The problem? What the true stochastic discount factor is remains a hugely controversial, unsolved question in macro-finance, and consumption-based asset pricing models tend to perform poorly.
– Matthew Gunn Aug 9 '17 at 18:22

Of course, this is just to give an idea of what one could use. I would be extremely skeptical of any model built using only returns, and an honest forecaster, I think, would include model uncertainty along with the parameter uncertainty, giving a standard error so large that the forecast may not even add to intuition.
– jd8 Aug 9 '17 at 23:08

Although I agree with jd8's answer, practical implementation issues may arise. Here I suggest a parsimonious engineering solution relying on the economic intuition of the Habit model of Campbell and Cochrane (1999).

2 – Use “drawdown” as an observable variable for measuring risk aversion.

3 – Create a methodology for mapping drawdown to mean and standard deviation.
For the simulation shown below I have used a rule-based technique calibrated to historical market behavior. Simply speaking, mean and standard deviation are proportionate to drawdown.
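The answer does not give its calibration, so the following is only a minimal sketch of the idea, not the author's rule set. It assumes monthly steps, lognormal shocks, and a drift and volatility that rise linearly with the current drawdown from the running peak. All parameter names and values (`base_mu`, `base_sigma`, `mu_slope`, `sigma_slope`) are hypothetical placeholders.

```python
import numpy as np

def simulate_drawdown_model(n_months, base_mu=0.06, base_sigma=0.15,
                            mu_slope=0.20, sigma_slope=0.40, seed=0):
    """Simulate a price path whose annualized drift and volatility
    increase with the drawdown from the running peak (assumed rule)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / 12.0
    log_price, log_peak = 0.0, 0.0
    prices = np.empty(n_months + 1)
    prices[0] = 1.0
    for t in range(1, n_months + 1):
        # Drawdown as a fraction of the peak, in [0, 1).
        drawdown = 1.0 - np.exp(log_price - log_peak)
        mu = base_mu + mu_slope * drawdown          # faster recovery when down
        sigma = base_sigma + sigma_slope * drawdown  # higher risk when down
        shock = rng.standard_normal()
        log_price += (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * shock
        log_peak = max(log_peak, log_price)
        prices[t] = np.exp(log_price)
    return prices

path = simulate_drawdown_model(360)  # one 30-year monthly path
print(f"terminal value of 1 unit invested: {path[-1]:.2f}")
```

Because the drift rises with drawdown, deep losses tend to be followed by faster recoveries, which is the conditional mean reversion the question asks for. Whether the resulting long-horizon dispersion matches the historical figure depends entirely on the calibration.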

I just wrote two papers on a related topic. Let us not use the log method right now as it was originally intended as an approximation from the time we used punch cards. You can, but we will come back to why you may not want to.

If we begin with a simple AR(1) model $$x_{t+1}=\beta{x}_t+\varepsilon_{t+1},$$ then we know that we are buying assets with the subjective intent of making money. Whether we do or not is a different question. Because of this, it is irrational for $\beta\le{1}$, asymptotically. If there were a local event where that existed, it would not be an issue unless more than half of all trades were by people who purposefully wanted to take a loss. By theorem, there does not exist a non-Bayesian solution for this problem.

You can verify this at

White, J.S. (1958) The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case. The Annals of Mathematical Statistics, 29, 1188-1197.

I show that a Bayesian solution does exist. In fact, depending on assumptions, there may be different solutions under differing cases. In general, however, this will hold. There is a different issue that the S&P 500 is constantly changing membership, and so $\beta$ is also constantly changing with each rebalance. It might be a small change at each step, but it is changing composition, and so this may be a poor model. Technically, this is not stationary since by definition $\beta$ updates quarterly.

The Bayesian solution is to solve using a likelihood of $$f(\mathbf{X}|\beta;\alpha;\sigma)=\frac{1}{\pi}\frac{\sigma}{\sigma^2+(x_{t+1}-\beta{x}_t-\alpha)^2},$$ where $\mathbf{X}$ is the matrix of data.

If $\pi(\alpha;\beta;\sigma)$ is your prior and $\pi'(\alpha;\beta;\sigma|\mathbf{x})$ is your posterior, then you solve for the posterior as $$\pi'(\alpha;\beta;\sigma|\mathbf{x})=\frac{\prod_{t=0}^{T-1}f(\mathbf{X}|\beta;\alpha;\sigma)\pi(\alpha;\beta;\sigma)}{\int\int\int\prod_{t=0}^{T-1}f(\mathbf{X}|\beta;\alpha;\sigma)\pi(\alpha;\beta;\sigma)\mathrm{d}\sigma\mathrm{d}\alpha\mathrm{d}\beta}$$

For predictive work, there is a predictive density available. Do note that the prior mass on $\beta<1$ is zero and that the prior mass on $\sigma\le{0}$ is zero. This will improve the quality of your estimate. There are time frames where the MLE is less than 1. As this is impossible, asymptotically, the spurious result of an unusual sample is avoided by the regularization of the prior.
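A brute-force way to see the posterior at work is a grid approximation. The sketch below assumes a flat prior on the admissible region (zero mass where $\beta<1$ or $\sigma\le 0$), synthetic data, and arbitrary grid ranges. It illustrates the Cauchy-likelihood posterior described above and is not the estimation procedure used in the papers mentioned.

```python
import numpy as np

def cauchy_ar1_loglik(x, alpha, beta, sigma):
    """Log-likelihood of x[t+1] = alpha + beta * x[t] + Cauchy(0, sigma) noise."""
    resid = x[1:] - beta * x[:-1] - alpha
    return np.sum(np.log(sigma) - np.log(np.pi) - np.log(sigma ** 2 + resid ** 2))

def grid_posterior(x, alphas, betas, sigmas):
    """Normalized posterior on a grid, with a flat prior restricted
    to beta >= 1 and sigma > 0 (assumed prior shape)."""
    logpost = np.full((len(alphas), len(betas), len(sigmas)), -np.inf)
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            if b < 1.0:          # zero prior mass on beta < 1
                continue
            for k, s in enumerate(sigmas):
                if s <= 0.0:     # zero prior mass on sigma <= 0
                    continue
                logpost[i, j, k] = cauchy_ar1_loglik(x, a, b, s)
    post = np.exp(logpost - logpost.max())
    return post / post.sum()

# Synthetic data from an explosive AR(1) with Cauchy shocks (illustrative values).
rng = np.random.default_rng(1)
x = np.empty(60)
x[0] = 1.0
for t in range(59):
    x[t + 1] = 0.02 + 1.05 * x[t] + 0.05 * rng.standard_cauchy()

alphas = np.linspace(-0.2, 0.2, 9)
betas = np.linspace(0.9, 1.2, 13)
sigmas = np.linspace(0.01, 0.3, 10)
post = grid_posterior(x, alphas, betas, sigmas)
beta_marginal = post.sum(axis=(0, 2))  # marginal posterior over beta
```

The marginal for $\beta$ is exactly zero below 1 by construction, which is how the prior rules out the spurious sub-unit estimates mentioned above.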

If we denote the predictive distribution for future values of $\tilde{x}_\tau$ as $\pi''(\tilde{x}_\tau|\mathbf{X})$, then we can solve for a prediction using $$\pi''(\tilde{x}_\tau|\mathbf{X})=\int\int\int{f(\tilde{x}_\tau|\beta;\alpha;\sigma)}\pi'(\alpha;\beta;\sigma|\mathbf{X})\mathrm{d}\sigma\mathrm{d}\alpha\mathrm{d}\beta$$

If you take the logarithm, the mean of the logs will overstate return by 2% per year on the disaggregated returns for all annual trades in the CRSP universe from 1925-2013 and understate risk by 4%.

If you perform spectral analysis on the S&P the period is about 40-41 years. This implies that one swing of the proverbial pendulum takes 40-41 years of data and that any other amount will generate biased return and scale estimators. The full spectrum of returns in the density covers this period.

Logarithms overstate return and understate risk because the distribution of log returns, ignoring bankruptcy and mergers, follows a hyperbolic secant distribution and the mean of the logs works out to be the median of the data. The center of location, $\mu$, however, is at the mode. The limitation on liability truncates the distribution and shifts the median 2% away from the mode.

I do not know what it does to S&P data; I only know what it does to disaggregated data. I did perform a test to determine whether standard models or this model was better and the Bayes factors exclude the standard solution.

There isn't a proper reversion to the mode concept here as capital is a source and not a sink. Nonetheless, Slutzky's paper

Slutzky, Eugen (1937) The Summation of Random Causes as the Source of Cyclic Processes. Econometrica, 5(2), 105-146.

would imply a swing around the mode. This would look like mean-reversion in log space.

Interestingly, there does not exist an admissible non-Bayesian estimator for financial returns, in the general case. There are some special cases where it would exist.

There are two serious caveats for this method using the S&P 500. The first was mentioned above: changes in composition imply that you do not know what you are really measuring. To assume that simple substitution results in no change in the slope would imply that all stocks provide the same return. The second is that even if there are no issues with composition, we do not know whether returns or the scale parameter are scale invariant.

Two minor notes. First, whereas the relationship of the standard deviation with respect to time for the normal distribution is $\sqrt{t}\sigma$, the scale grows as ${t}\sigma$ with this distribution. Second, there does exist a case where such an AR(1) process can behave close to a normal, and that is when endpoint prices are far from equilibrium. In that case, you get a bizarre distribution that has no defined mean, but is the convex combination of two densities, one with a mean and one without. As prices move far from equilibrium, the percentage composition of the one with a mean gets close to unity.
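The linear scaling can be checked numerically for the pure Cauchy case, since the sum of $t$ independent Cauchy variables with scale $\sigma$ is again Cauchy with scale $t\sigma$. The sketch below uses half the interquartile range as the scale measure because the standard deviation does not exist for the Cauchy. The sample size, horizon, and seed are arbitrary choices.

```python
import numpy as np

def half_iqr(samples):
    """Half the interquartile range; equals the scale parameter for a Cauchy."""
    q1, q3 = np.percentile(samples, [25, 75])
    return 0.5 * (q3 - q1)

rng = np.random.default_rng(0)
n_paths, t = 100_000, 16

cauchy_steps = rng.standard_cauchy(size=(n_paths, t))
normal_steps = rng.standard_normal(size=(n_paths, t))

# Ratio of the t-period scale to the one-period scale.
cauchy_ratio = half_iqr(cauchy_steps.sum(axis=1)) / half_iqr(cauchy_steps[:, 0])
normal_ratio = half_iqr(normal_steps.sum(axis=1)) / half_iqr(normal_steps[:, 0])

print(f"Cauchy: {cauchy_ratio:.2f} (theory: t = {t})")
print(f"Normal: {normal_ratio:.2f} (theory: sqrt(t) = {np.sqrt(t):.0f})")
```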

$\sigma$ is a scale parameter and not a standard deviation. No mean exists for this density. As a result, the sample variance is undefined and appears as a random number. It should look like it is heteroskedastic with volatility clusters when there are runs. It is askedastic. While returns cannot be thought of in terms of variances, prices can and the $\sigma$ in return is a measure of the heteroskedasticity in prices.

Finally, there is no squared error for this type of problem. The appropriate cost function is the linear absolute loss function.