Discussions about the testing and simulation of mechanical trading systems using historical data and other methods. Trading Blox Customers should post Trading Blox specific questions in the Customer Support forum.

A friend loaned me Taleb's book The Black Swan. It reinforces the idea that the risk in markets is much greater than a normal distribution would suggest, so I did some research to verify this myself. Using IBM daily closing prices from 1962 to 2012 from Yahoo! Finance, I first plotted the daily percentage price changes in a histogram (see attached). It looks pretty similar to a normal distribution, i.e., a bell curve, although the upward bias of the stock market shows up as many more +1% to +2% moves than -1% to -2% moves. Curiously, the average of the price changes is slightly negative, which suggests there must be a large number of changes between -1% and 0%.

I then calculated the standard deviation (sigma) of the price changes, which is almost 2%. In a normal distribution, 68.2% of observations fall within 1 sigma of the mean, so if IBM price changes were normally distributed, 68.2% of them should lie between -2% and +2%. In fact, 83.7% of IBM's price changes are within 1 sigma of the mean. This indicates that the distribution of IBM's price changes has positive excess kurtosis, which appears as a taller, narrower hump than that of the normal distribution. It also seems to indicate that IBM is in fact LESS risky than a hypothetical investment whose price changes are normally distributed.

However, when you look at much larger moves, you find that they are far more probable in IBM's price distribution than in a normal one. For example, the probability of a move greater than 5 sigma on any given day for a normally distributed hypothetical investment is 1 in 1.74 million. The historical probability of a similar move by IBM, that is, up or down more than 10% in one day? About 1 in 500. For a move greater than 6 sigma? Normal: 1 in 506 million. IBM: 1 in 900. Historically, IBM was about 562,000 times more likely to make a greater-than-6-sigma move on any given day than a normal distribution would suggest.
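A minimal Python sketch of the calculations described above, assuming you already have a list of daily closes (loading them from Yahoo! Finance or a CSV is left out; `pct_changes` and `tail_report` are hypothetical helper names). The normal tail probability P(|Z| > k) comes from the complementary error function, which reproduces the 1 in 1.74 million (5 sigma) and 1 in 506 million (6 sigma) figures.

```python
import math
import statistics


def pct_changes(closes):
    """Day-over-day percentage changes from a list of closing prices.

    Assumes split-adjusted closes; raw closes would register a stock split
    as a large one-day drop.
    """
    return [100.0 * (b - a) / a for a, b in zip(closes, closes[1:])]


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))


def tail_report(changes, ks=(1, 3, 5, 6)):
    """Compare observed k-sigma move frequencies with normal-theory odds.

    Moves are measured as deviations from the sample mean.
    Returns (mean, sigma, share within 1 sigma, [(k, observed, normal)]).
    """
    mu = statistics.fmean(changes)
    sigma = statistics.stdev(changes)
    n = len(changes)
    within_one = sum(abs(c - mu) <= sigma for c in changes) / n
    rows = []
    for k in ks:
        observed = sum(abs(c - mu) > k * sigma for c in changes) / n
        rows.append((k, observed, normal_two_sided_tail(k)))
    return mu, sigma, within_one, rows


for k in (5, 6):
    print(f"normal, beyond {k} sigma: 1 in {1 / normal_two_sided_tail(k):,.0f}")
```

With a real series you would call `tail_report(pct_changes(closes))` and divide each observed frequency by the corresponding normal probability to get ratios like the 562,000x figure above.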
This greater frequency of large moves in IBM's price distribution is described as "fat tails" on the left and right of the hump. For example, IBM closed up more than 15% nine times during the data period, a far, FAR greater frequency than expected if IBM's price changes were normally distributed. You could say that these bigger moves pose more risk for the trader, and that the fat tails in IBM's price distribution make IBM much more risky than a hypothetical, normally distributed investment.
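To put "far, FAR greater" into numbers, here is a rough Python sketch of how many days with a gain of more than 15% a normal distribution would predict. It assumes a mean of 0%, the roughly 2% sigma quoted above, and about 252 trading days a year for 50 years; those inputs are approximations, not figures taken from the data.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_count(n_days, threshold_pct, mu_pct, sigma_pct):
    """Expected number of days with a change above threshold_pct,
    if daily changes were Normal(mu_pct, sigma_pct)."""
    z = (threshold_pct - mu_pct) / sigma_pct
    return n_days * normal_upper_tail(z)


n_days = 50 * 252  # assumption: ~252 trading days per year over ~50 years
print(f"{expected_count(n_days, 15.0, 0.0, 2.0):.2e}")
```

A 15% gain is 7.5 sigma under these assumptions, so the normal model expects only a tiny fraction of a single such day, against the 9 observed.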

So how does this affect system design? For one, any statistical measurement that assumes price changes are normally distributed and assigns a probability to a particular size of move will grossly underestimate the actual risk if that move is larger than 3 sigma. More broadly, the lesson is that before using tools that assume a normal distribution, you should make sure that what you are measuring is ACTUALLY normally distributed; otherwise, take your probability estimates with a grain of salt.
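One standard way to do the "make sure it is ACTUALLY normal" check is the Jarque-Bera test, which compares sample skewness and kurtosis with the values 0 and 3 of a normal distribution. The sketch below computes it in plain Python under the usual assumption of independent observations; SciPy's `scipy.stats.jarque_bera` does the same job. The p-value uses the fact that the statistic is approximately chi-square with 2 degrees of freedom, whose survival function is exp(-x/2).

```python
import math


def jarque_bera(x):
    """Return (JB statistic, approximate p-value) for the sample x.

    JB = n/6 * (S**2 + (K - 3)**2 / 4), where S is the sample skewness and
    K is the sample kurtosis (both from population-style moments).
    Under normality JB is approximately chi-square with 2 degrees of freedom.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return jb, p_value
```

A tiny p-value means normality is rejected. With decades of daily data even modest departures get flagged, so it is also worth looking at the size of K - 3 itself.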

Still learning. Need to learn more about what tools you can use to measure probability and determine statistical significance for non-normally distributed data series.

Your studies will quickly lead you to the mighty central limit theorem, which gives the surprising result that the sum of N independent random variables (suitably centered and scaled) tends toward the normal (aka Gaussian) distribution as N gets larger and larger, regardless of the distributions of the independent random variables themselves. They could even have the (empirical) distribution you observed for IBM's stock price changes, as long as they are independent.
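To make the idea concrete, here is a small Python simulation (an illustration, not a proof). It uses Exponential(1) draws as an assumed, strongly skewed base distribution, sums N of them, centers and scales the sum, and reports the share of outcomes within 1 sigma. For a single draw that share is about 86%, far from the normal 68.3%; as N grows it approaches 68.3%.

```python
import math
import random


def standardized_sums(n_terms, n_samples, rng):
    """Draw n_samples sums of n_terms Exponential(1) variables, each
    centered and scaled to mean 0 and standard deviation 1."""
    # Exponential(1) has mean 1 and variance 1, so the sum of n_terms
    # independent draws has mean n_terms and variance n_terms.
    out = []
    for _ in range(n_samples):
        s = sum(rng.expovariate(1.0) for _ in range(n_terms))
        out.append((s - n_terms) / math.sqrt(n_terms))
    return out


def share_within_one_sigma(z):
    return sum(abs(v) <= 1.0 for v in z) / len(z)


rng = random.Random(42)
for n in (1, 5, 50):
    share = share_within_one_sigma(standardized_sums(n, 10_000, rng))
    print(f"N={n:>2}: {share:.3f} within 1 sigma (normal: 0.683)")
```

This is also what "sum" means in a trading context: for example, the log return over a week is the sum of that week's daily log returns.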

Here's a nice introduction from the University of California at Santa Cruz (pdf file). See just above equation (19), where they say:

This amazing result is the reason why the normal (Gaussian) distribution plays such an important role in statistics.

Your comments raise a few questions for me. What do you mean by the "sum" of independent random variables? Is the central limit theorem saying that any independent random variable will tend toward a normal distribution as N grows larger?

Thanks for the PDF, but unfortunately I understood very little of the math. I need to get myself some familiarity with math-speak.


It is useful to be fully aware that sums drawn from every base distribution do not necessarily converge to a normal distribution. They converge if the base distribution has finite variance, and even some rare distributions with infinite variance converge. Some of the base distributions that do not converge to a normal distribution seem to occur in finance.
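The standard counterexample is the Cauchy distribution, which has no finite variance (or even a mean). The average of N independent Cauchy draws has exactly the same Cauchy distribution as a single draw, so averaging never narrows it. The Python sketch below compares the interquartile range (IQR) of sample means of 100 draws for Cauchy and normal base distributions; the Cauchy draws come from the inverse-CDF trick tan(pi * (U - 0.5)) with U uniform on [0, 1).

```python
import math
import random
import statistics


def cauchy(rng):
    """One standard Cauchy draw via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def normal(rng):
    """One standard normal draw."""
    return rng.gauss(0.0, 1.0)


def iqr_of_means(draw, n_terms, n_samples, rng):
    """IQR of n_samples sample means, each computed from n_terms draws."""
    means = [statistics.fmean([draw(rng) for _ in range(n_terms)])
             for _ in range(n_samples)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(7)
print("Cauchy:", round(iqr_of_means(cauchy, 100, 2000, rng), 3))  # theory: 2, same as one draw
print("Normal:", round(iqr_of_means(normal, 100, 2000, rng), 3))  # theory: about 1.349 / 10
```

The standard normal IQR is about 1.349, so the normal means shrink by a factor of sqrt(100) = 10, while the Cauchy means do not shrink at all.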

Even when a base distribution does converge, the center of the distribution converges first and the tails only later. The tails can converge quite slowly, too slowly to help in practice. The width of the region that looks like a normal distribution generally grows only with the square root of N. Nassim Taleb has written copiously on this point.
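The center-versus-tails point can be checked exactly for one assumed base distribution, Exponential(1), because the sum S of N such draws is Gamma(N, 1) and, for integer N, P(S > x) = exp(-x) * sum over k < N of x^k / k!. The Python sketch below compares, after centering and scaling, the share within 1 sigma and the probability of a move beyond +3 sigma with their normal values (about 0.6827 and 0.00135). The center is already close to normal by N = 100, while the +3-sigma tail is still noticeably heavier at N = 1000.

```python
import math


def gamma_sf(n, x):
    """P(S > x) where S is the sum of n independent Exponential(1) draws.

    For integer n, S ~ Gamma(n, 1) and P(S > x) = exp(-x) * sum_{k<n} x^k / k!.
    Terms are computed in log space so large n does not overflow.
    """
    if x <= 0:
        return 1.0
    log_x = math.log(x)
    return math.fsum(math.exp(-x + k * log_x - math.lgamma(k + 1))
                     for k in range(n))


def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def standardized_probs(n):
    """(share within 1 sigma, P(beyond +3 sigma)) for the standardized sum."""
    mean, sd = n, math.sqrt(n)
    center = gamma_sf(n, mean - sd) - gamma_sf(n, mean + sd)
    tail = gamma_sf(n, mean + 3 * sd)
    return center, tail


print(f"normal:  center={1 - 2 * normal_sf(1):.4f}  tail={normal_sf(3):.5f}")
for n in (1, 10, 100, 1000):
    center, tail = standardized_probs(n)
    print(f"N={n:>4}:  center={center:.4f}  tail={tail:.5f}  "
          f"tail/normal={tail / normal_sf(3):.2f}")
```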

So just because the means and sums of independent random variables converge to a normal distribution, that doesn't justify applying statistics that rely on a normal distribution to a raw distribution that is non-normal, right? If I were trading the average or sum of prices, then maybe...

I'll add that you can also look into nonparametric (parameter-free) or empirically based modelling to reduce the Gaussian assumptions in your model building and analysis.
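One common empirical, distribution-free tool is the bootstrap: resample the observed data with replacement many times and read a confidence interval straight off the resampled statistics, with no normality assumption. Below is a minimal percentile-bootstrap sketch in Python; `bootstrap_ci` and `share_beyond` are hypothetical names. Note that this plain version still treats the observations as independent.

```python
import random


def bootstrap_ci(data, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Resamples data with replacement; treats observations as independent.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def share_beyond(threshold):
    """Statistic: fraction of observations with |x| > threshold."""
    return lambda xs: sum(abs(x) > threshold for x in xs) / len(xs)
```

For example, `bootstrap_ci(changes, share_beyond(10.0))` would give an interval for the daily frequency of moves larger than 10%, where `changes` is your list of percentage changes. With only a handful of such moves in the sample, the interval is rough.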

Regarding the CLT: it is fantastic, yes. But keep in mind the assumptions behind it, particularly the requirement that the N random variables be independent for their sum to converge to a normal distribution. In reality, they are not always so independent.

Those assumptions can and will fail at the worst time(s) in financial markets.
That being said, there is certainly a HUGE amount of wisdom gained by understanding the CLT.

I believe he's talking about something we have all observed: price changes in financial markets, particularly in times of turmoil (or great euphoria), appear to be less independent than theory suggests they should be.
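A simple first check of the independence assumption is the sample autocorrelation of returns and of absolute returns. Under independence, lag-k autocorrelations should fall roughly within +/-2/sqrt(n) of zero; volatility clustering tends to show up as clearly positive autocorrelation in absolute returns. A minimal Python sketch, with made-up toy data in which the sign of each return flips daily while the size of the moves clusters:

```python
def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    num = sum(d[t] * d[t - lag] for t in range(lag, n))
    den = sum(v * v for v in d)
    return num / den


# Hypothetical toy series: a volatile stretch followed by a calm one.
returns = [0.1, -0.1] * 10 + [0.01, -0.01] * 10
abs_returns = [abs(r) for r in returns]
band = 2 / len(returns) ** 0.5  # rough +/- band under independence
print(autocorr(returns), autocorr(abs_returns), band)
```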

sunyata wrote: "Could you expound on that a little, squaredQ? What knowledge do you think there is to gain from understanding the CLT as it relates to trading?"

Aside from the limitations mentioned above, it is also a very useful result that allows you to model and evaluate traditional statistical properties of estimators (like SE(mean(rtns)), SE(variance(rtns)), confidence intervals, etc.) that require normality in the distributions. When you draw conclusions from only one non-normal sample, as you did (in your excellent work), there are limits to what can be said statistically about those conclusions.
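For reference, here are the textbook standard errors mentioned above in a short Python sketch. SE(mean) = s / sqrt(n) needs only independence and finite variance (with the CLT supplying the normal-based confidence interval), while the usual SE(variance) = s^2 * sqrt(2 / (n - 1)) holds only for normal data. For independent data the general formula is Var(s^2) = sigma^4 * (2 / (n - 1) + (K - 3) / n), with K the kurtosis, which shows how fat tails widen it; plugging in the sample kurtosis, as below, is an approximation.

```python
import math
import statistics


def standard_errors(x):
    """Return (SE of mean, SE of variance assuming normality,
    SE of variance using the plug-in sample kurtosis)."""
    n = len(x)
    mean = statistics.fmean(x)
    s2 = statistics.variance(x)  # sample variance, n - 1 denominator
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    se_mean = math.sqrt(s2 / n)
    se_var_normal = s2 * math.sqrt(2.0 / (n - 1))
    se_var_general = s2 * math.sqrt(2.0 / (n - 1) + (kurt - 3.0) / n)
    return se_mean, se_var_normal, se_var_general
```

On fat-tailed data, `se_var_general` comes out several times larger than `se_var_normal`, so normal-based confidence intervals for volatility would be far too narrow.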

But there are other very useful consequences of the theorem as well (applied to multiple assets, trading systems, etc.) that I won't go into here. Keep in mind that there are many different versions of the CLT.