Fooled by Randomness Through Selection Bias

Some software programs let traders combine technical indicators with exit conditions to design systems that meet desired performance criteria and risk/reward objectives. In general, because of data-mining bias, it is very difficult to distinguish random systems from those that may have some genuine intelligence in pairing their trades with market returns.

Suppose you have such a program and want to use it to develop a system for SPY. After a number of iterations, manual or automatic, you get a relatively nice equity curve in both the in-sample and the out-of-sample, with a total of about 1,000 trades (trade number on the horizontal axis):

Before obtaining the above equity curve, you used an optimization feature, or ran the program in automated mode, to try many combinations of entry and exit methods. This usually means hundreds or thousands of combinations, but in some cases billions or even trillions, and it produced a large number of unacceptable equity curves. You think this is a good equity curve, but you also suspect that the system may be random because of data-mining bias, and your suspicion is correct. Before going into these issues in more detail, however, let us consider a trader who believes that increasing the number of trades minimizes the effect of randomness. This is a possible result:

In this example, the number of trades was increased by two orders of magnitude, to about 100,000, and both the in-sample and out-of-sample performance look acceptable. Does this mean that the system is now less likely to be random?

The answer is no. Both of the above equity curves were generated by tossing a coin with a payout of +1 for heads and -1 for tails. In fact, the second equity curve was obtained after only a few simulation runs. Both curves are random. You can try the simulation yourself and see how successive random runs can, at some point, generate nice-looking equity curves by luck alone.
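A minimal Python sketch of the coin-toss experiment follows. As in the text, it assumes a payout of +1 for heads and -1 for tails per trade. The function names, and the rule of keeping the curve with the highest final equity as the "selected" system, are illustrative assumptions. They are not the procedure used to produce the curves above.

```python
import random

def coin_toss_equity(n_trades, rng):
    """Equity curve of n_trades coin tosses: +1 for heads, -1 for tails."""
    equity, curve = 0, []
    for _ in range(n_trades):
        equity += 1 if rng.random() < 0.5 else -1
        curve.append(equity)
    return curve

def best_of_n(n_curves, n_trades, seed=0):
    """Generate n_curves random equity curves and return the one with the
    highest final equity (the 'selected' system) along with all curves."""
    rng = random.Random(seed)
    curves = [coin_toss_equity(n_trades, rng) for _ in range(n_curves)]
    best = max(curves, key=lambda c: c[-1])
    return best, curves

if __name__ == "__main__":
    best, curves = best_of_n(n_curves=200, n_trades=1000)
    finals = sorted(c[-1] for c in curves)
    print("median final equity:", finals[len(finals) // 2])
    print("best final equity:  ", best[-1])
```

Every curve here is pure chance, yet the one kept by the selection step typically ends well above zero. Plotting `best` shows the kind of "nice" curve the text describes.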

The correct logic here is that random processes can generate nice-looking equity curves. The inverse question is much more difficult to answer: when a nice-looking equity curve is selected from a group of less attractive ones, how can we know whether it actually represents a random process, that is, whether the underlying algorithm has no intelligence? Here are some criteria one can use to minimize the possibility of accepting a random system because of selection and data-mining biases:

(1) When a selection must be made from a group of candidates, is the underlying process that generated the equity curve deterministic, or does it involve randomness? If randomness is involved and the system with the best equity curve is different each time the process runs, then the probability that it is a random system is very high. The justification is that a market cannot contain a large number of distinct edges, so most of those systems must be flukes.
(2) If you remove the system's exit logic and apply a small profit target and a stop-loss set just outside the 1-bar volatility range, does the system remain profitable? (This test applies to short-term trading systems.) If not, then the probability that the system has no intelligence in timing its entries is very high. This is because a large class of exits, for example trailing stops, curve-fit performance to the price series. If market conditions change in the future, the system will fail.
(3) Does the generated system use indicators whose parameters were optimized, by maximizing some objective function(s), to produce the final equity curve? If there are such parameters, the dimensionality of the data-mining search grows to the power of the number of parameters involved (for example, k parameters with m candidate values each give m^k variants). This makes it extremely unlikely that the system possesses intelligence, since it may simply be fitted to the data.
(4) Does the software run only once to produce the final in-sample result, or are multiple runs involved? Similarly, did the developer test only one hypothesis, or were multiple adjustments made to an initial hypothesis? If many runs are involved, the data-mining bias is large. This bias results from the dangerous practice of reusing the in-sample data many times with many combinations of indicators and heuristic rules until an acceptable equity curve is obtained. Some of the systems that perform well in the in-sample may also perform well in the out-of-sample by chance alone.
(5) This is the most important consideration: if the results of an out-of-sample test are used to restart the software for a fresh run, or to adjust the system parameters manually, then data-snooping bias is introduced in addition to data-mining bias. In this case, cross-validation on the out-of-sample beyond the first run is useless, because the out-of-sample has effectively become part of the in-sample. If an additional forward sample is used, the problem reduces to the original in-sample design problem, and there is a high probability that good performance in the forward sample is obtained by chance.
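The selection effects described in criteria (4) and (5) can be reproduced with the same coin-toss idea. The Python sketch below assumes that every candidate "system" is a fair ±1 coin toss in both the in-sample (IS) and the out-of-sample (OOS), so none has any intelligence. `mining_bias` picks the best IS system and looks at its OOS result once. `snoop_until_pass` models criterion (5) by restarting the search whenever the selected system fails an OOS target. All names and parameter values are hypothetical.

```python
import random
import statistics

def final_equity(n_trades, rng):
    """Final equity of n_trades fair coin tosses (+1 heads, -1 tails).
    The number of heads is drawn by counting the 1-bits of a random integer."""
    heads = bin(rng.getrandbits(n_trades)).count("1")
    return 2 * heads - n_trades

def mine_once(n_systems, is_trades, oos_trades, rng):
    """One data-mining run: generate random systems, keep the best
    in-sample one and return its (in-sample, out-of-sample) final equity."""
    systems = [(final_equity(is_trades, rng), final_equity(oos_trades, rng))
               for _ in range(n_systems)]
    return max(systems, key=lambda s: s[0])

def mining_bias(reps, n_systems, is_trades, oos_trades, seed=0):
    """Average IS and OOS equity of the selected system over many runs."""
    rng = random.Random(seed)
    picks = [mine_once(n_systems, is_trades, oos_trades, rng) for _ in range(reps)]
    return (statistics.mean(p[0] for p in picks),
            statistics.mean(p[1] for p in picks))

def snoop_until_pass(oos_target, n_systems, is_trades, oos_trades,
                     max_runs=10_000, seed=0):
    """Data snooping: restart the search whenever the selected system fails
    the out-of-sample target, until one passes by chance."""
    rng = random.Random(seed)
    for run in range(1, max_runs + 1):
        is_eq, oos_eq = mine_once(n_systems, is_trades, oos_trades, rng)
        if oos_eq >= oos_target:
            return run, is_eq, oos_eq
    return None

if __name__ == "__main__":
    is_mean, oos_mean = mining_bias(300, 100, 500, 500)
    print(f"selected system: mean IS {is_mean:.1f}, mean OOS {oos_mean:.1f}")
    print("snooped (runs, IS, OOS):", snoop_until_pass(40, 100, 500, 500))
```

On average, the system chosen for its in-sample result shows a clearly positive IS equity and an OOS equity near zero, which is the data-mining bias of criterion (4). When the OOS test is allowed to trigger fresh runs, a system that looks good in both samples eventually appears by chance, as criterion (5) warns.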

The coin-toss experiment illustrates how one may be fooled by randomness when using a process that generates many equity curves, some acceptable and some unacceptable. Minimizing data-mining, data-snooping and selection bias is a very involved process that, for the most part, falls outside the capabilities of the average developer who lacks an in-depth understanding of these issues. In fact, the methods for analyzing systems for the presence of bias are in many cases more important than the methods used to generate the systems in the first place, and they are considered an integral part of a trading edge.
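As a crude illustration of analyzing a result for selection bias, the Python sketch below asks how often the best of N purely random coin-toss systems would reach an observed final equity. It rests on strong simplifying assumptions: each trade pays exactly +1 or -1, trades are independent, candidate systems are independent of one another, and the number of candidates tried is known. Real systems violate all of these, so the sketch only demonstrates the idea. The function `selection_adjusted_p` is a hypothetical name, not an established method or library call.

```python
import random

def final_equity(n_trades, rng):
    """Final equity of n_trades fair coin tosses (+1 heads, -1 tails)."""
    heads = bin(rng.getrandbits(n_trades)).count("1")
    return 2 * heads - n_trades

def selection_adjusted_p(observed, n_candidates, n_trades, trials=1000, seed=0):
    """Monte Carlo estimate of the probability that the best of n_candidates
    random (coin-toss) systems reaches a final equity >= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best = max(final_equity(n_trades, rng) for _ in range(n_candidates))
        if best >= observed:
            hits += 1
    # The +1 terms keep the Monte Carlo estimate away from exactly zero.
    return (hits + 1) / (trials + 1)

if __name__ == "__main__":
    # The same observed result, judged with and without accounting for selection.
    print("1 candidate tried:   ", selection_adjusted_p(60, 1, 1000))
    print("100 candidates tried:", selection_adjusted_p(60, 100, 1000))
```

A final equity of 60 over 1,000 trades looks unusual if only one system was tested, but it becomes an expected outcome once the search tried around a hundred random candidates. This is why the number of combinations tried matters so much.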