Despite the fact that WFO should bring verification to the development process, it instead yields more uncertainty. Simply changing the WFO in-sample and out-of-sample periods or the fitness function makes the out-of-sample results change wildly; sometimes I feel it is simply random.

I tried Pessimistic Return on Capital from Pardo, but it does not help in this regard at all.

I even tried to rank and select the best N based on fitnessA, then select the best N based on fitnessB, and so on, then intersect them to find the ones that are good enough by several fitness measures, but this did not give the consistency I hoped for.

I tried filtering out the best N trades to avoid overfitting to a few good outliers, but that did not help either.

Now I’ll try to examine the surroundings of the chosen parameter to avoid selecting a spiky result; we will see.

Did you come up with something usable for WFO? Currently it ruins most of my algs, even the live profitable ones. Maybe this is just a fact of WFO, I don’t know. What do you think?

Thanks, Z.S (name abbreviated for privacy)

First, thank you for contacting me, Z.S. I have a bit of a confession to make, as well. One of my most profitable and successful systems was backtested over a long history and did well. It gave mixed to failed results on WFA. However, it continued to perform in live trading for years and years, with results in line with the historical results.

One problem with WFA is the “selection” of the parameters, which is not at all how most developers would select them. If you think about it, most developers wouldn’t just pick whichever value happened to perform best but would look for an area of stability. They would prefer to pick the more stable value.

There are some solutions you can use to “assist” your WFA. By doing so, you will be trading off some of the validation benefits of WFA in exchange for greater intelligence. Let’s go back to the idea that markets are efficient. This means all systems are “fit” to the market to some degree or they couldn’t work. There are two ideas on this and they conflict somewhat, but I can explain where they originate. See my recent post on “Market Cognition” for more.

I suggest the following:

Threshold the WFA allowed range for your parameters more tightly. For example, if you know the optimal value for a parameter is, say, 30, then you might allow only values within 25% above and below it.
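
A minimal sketch of that restriction in Python. The function name allowed_range and the choice of whole-number steps are my assumptions for illustration; your optimizer will have its own way of accepting a candidate list.

```python
import math

def allowed_range(center, tolerance=0.25, step=1):
    """Return whole-number candidates within +/- tolerance of center.

    Hypothetical helper: rounds inward so every candidate stays
    inside the allowed band.
    """
    low = math.ceil(center * (1 - tolerance))
    high = math.floor(center * (1 + tolerance))
    return list(range(low, high + 1, step))

# With a known optimum of 30 and a 25% band, the WFA would only
# be allowed to test these values.
candidates = allowed_range(30)
print(candidates[0], candidates[-1], len(candidates))
```

You would then hand this candidate list to the walk-forward optimizer in place of the full parameter range.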

Analyze the rolling values that come up during the WFA. Lock or threshold only into stable regimes. For example, let’s say your system tends to pick either a low value for a parameter, such as 10, or a high value, such as 90. However, somewhat randomly you’ll get a middle value that outperforms. In this case, you are most likely looking at a bi-modal distribution. And you presumably understand the logic for this: just as an example, maybe your system adapts to momentum or reversion environments. You could lock the WFA to only allow picking a low or high value. There are a few ways to do this. One way would be to have an “off mode” in your system. If the chosen value is a middle value, say from 25 to 75, then you turn off trading. You could also try a custom WFA script that only tests the low and high values.
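
Both variants can be sketched in a few lines of Python. The names trading_enabled and bimodal_candidates are hypothetical, and the 25 to 75 dead zone is just the example range from above.

```python
def trading_enabled(chosen_value, dead_zone=(25, 75)):
    """'Off mode': return False when the WFA picked a middle value."""
    low, high = dead_zone
    return not (low <= chosen_value <= high)

def bimodal_candidates(values, dead_zone=(25, 75)):
    """Custom-script variant: keep only the low and high candidates."""
    return [v for v in values if trading_enabled(v, dead_zone)]

# Values the WFA picked in successive windows (made-up numbers).
for picked in (10, 50, 90):
    print(picked, "trade" if trading_enabled(picked) else "stand aside")

print(bimodal_candidates(range(5, 100, 10)))
```

The first function leaves the WFA alone and simply stands aside when it lands in the middle; the second removes the middle values before the WFA ever sees them.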

You could also count the number of times an optimal value is chosen. You threshold only on values that are chosen more than, say, 55% of the time. I think this would also provide some sanity when running the system live, because you know that whatever value it picks is one that has been tested and was historically stable.
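
Counting is straightforward with Python's collections.Counter. This is only a sketch: stable_values is a hypothetical helper, and the list of per-window picks is made-up data standing in for your own WFA log.

```python
from collections import Counter

def stable_values(chosen_per_window, min_share=0.55):
    """Return {value: share} for values picked in more than
    min_share of the walk-forward windows."""
    total = len(chosen_per_window)
    if total == 0:
        return {}
    counts = Counter(chosen_per_window)
    return {v: c / total for v, c in counts.items() if c / total > min_share}

# Optimal value chosen in each of ten WFA windows (illustrative only).
history = [10, 10, 90, 10, 10, 10, 50, 10, 90, 10]
print(stable_values(history))
```

If the result is empty, no single value was chosen often enough to be treated as stable under this threshold.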

Develop a more intelligent selection algorithm for the WFA to use. For example, instead of picking the top-performing value, you would pick the value where the average of all the values within 15% above and below it is profitable. This would help with outliers.
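
Here is one possible reading of that rule, sketched in Python. I am assuming that among the values whose neighborhood average is profitable you would take the one with the highest average; the name pick_stable_value and the profit numbers are hypothetical.

```python
def pick_stable_value(results, band=0.15):
    """results maps parameter value -> in-sample net profit.

    Scores each value by the average profit of all tested values
    within +/- band of it (itself included) and returns the value
    with the best profitable average, or None if there is none.
    """
    best_value, best_score = None, float("-inf")
    for value in results:
        lo, hi = value * (1 - band), value * (1 + band)
        neighbours = [p for v, p in results.items() if lo <= v <= hi]
        score = sum(neighbours) / len(neighbours)
        if score > 0 and score > best_score:
            best_value, best_score = value, score
    return best_value

# A broad, modest plateau around 20-26 and one spike at 40.
results = {20: 100, 22: 120, 24: 110, 26: 105, 36: -950, 40: 900, 44: -950}
print("top performer:", max(results, key=results.get))
print("stable pick:", pick_stable_value(results))
```

The plain "best value" rule chases the spike, while the neighborhood rule settles on the plateau because the spike's neighbors lose money.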

Bin your data into specific market regimes. Try testing only over the specific regimes. Try to add switching code to detect new regimes. The idea is that your WFA might be able to adjust within a regime, but you might need to reset things when the old regime gives way to a new one.
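
As a rough illustration in Python, here is one way to bin bars and flag switches. How you define a regime is up to you; using rolling volatility against a fixed threshold is purely my assumption for this sketch, as are the function names and the sample returns.

```python
from statistics import pstdev

def label_regimes(returns, window=5, threshold=0.01):
    """Label each bar 'high_vol' or 'low_vol' from the rolling
    standard deviation of the last `window` returns."""
    labels = []
    for i in range(len(returns)):
        if i + 1 < window:
            labels.append(None)  # not enough history yet
            continue
        vol = pstdev(returns[i + 1 - window : i + 1])
        labels.append("high_vol" if vol > threshold else "low_vol")
    return labels

def regime_switches(labels):
    """Return the bar indices where the regime label changes."""
    switches, previous = [], None
    for i, label in enumerate(labels):
        if label is None:
            continue
        if previous is not None and label != previous:
            switches.append(i)
        previous = label
    return switches

returns = [0.001, -0.001, 0.001, -0.001, 0.001,
           0.03, -0.03, 0.03, -0.03, 0.03]
labels = label_regimes(returns)
print(labels)
print(regime_switches(labels))
```

Each switch index is a natural point to restart the WFA window or re-run the optimization, and the labels let you run tests over one regime at a time.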

Try to employ better market cognition in your systems. You have to accept that some types of things are more likely to break. For example, there is a near-infinite number of moving averages. The chance that any given moving average event will be significant is therefore probably lower than for a system based on other sorts of things. It is not impossible per se, but even if you find such a system that works, it probably has a lower “market cognition” score. You also need a thesis for why such a system might work in the first place: trader psychology is the most probable explanation. So, you’d test the most dominant moving averages such as 5, 10, 50, 200. If these don’t show good results for moving average events, then there is no reason to try to optimize or perform WFA. If they do show good results, then you’d probably not want to optimize this parameter. You might instead optimize your profit target or exit method.

About the Author

Curtis is passionate about markets. He has developed top ranked futures strategies. His core focus is (1) applying machine learning and developing systematic strategies, and (2) solving the toughest problems of discretionary trading by applying quantitative tools, machine learning, and performance discipline. You can contact him at curtis@beyondbacktesting.com.

All content (C) BeyondBacktesting. All rights reserved. Futures trading is risky. All content represents authors personal opinion only and author is not a financial adviser. Please read the DISCLAIMER for important risk information.
