Discussions about the testing and simulation of mechanical trading systems using historical data and other methods. Trading Blox Customers should post Trading Blox specific questions in the Customer Support forum.

I am trying to come up with a hypothetical "complete" objective/bliss function for evaluating a trading system.
One characteristic I would like to measure and include in the function is the robustness of the system. However, how would one go about measuring it?...

Well, trying to define robustness first, I can see three areas in which a system needs to be robust:
- markets traded
- time periods
- system parameters

meaning that a robust system should not be impacted too much by variations in the three criteria above.

- Time-period robustness should theoretically be easier to measure for a single system, by applying some sort of variance measure to the equity curve (a straight line would be robust, whereas a system making a big zig-zag-zig would not be so robust). However, I am not sure this is relevant for LTTF, and taking that approach would imply LTCM was very robust (up to its dramatic "zag") - much like Taleb's turkey just before Thanksgiving.
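One rough way to sketch this "straight equity curve" idea is the R-squared of a linear fit to the log equity curve. This is only one possible proxy, and the function name and inputs below are my own invention, not anything standard:

```python
import numpy as np

def equity_curve_linearity(equity):
    """R-squared of a linear fit to the log equity curve.

    Values near 1.0 indicate a near-straight (log-linear) curve;
    lower values indicate bigger zigzags. As noted above, a smooth
    curve is no guarantee against an LTCM-style blow-up.
    """
    log_eq = np.log(np.asarray(equity, dtype=float))
    t = np.arange(len(log_eq))
    slope, intercept = np.polyfit(t, log_eq, 1)   # degree-1 fit
    fitted = slope * t + intercept
    ss_res = np.sum((log_eq - fitted) ** 2)
    ss_tot = np.sum((log_eq - log_eq.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# A smooth compounding curve scores near 1.0; a boom-bust curve scores lower.
smooth = [100 * 1.01 ** i for i in range(50)]
zigzag = [100, 150, 90, 160, 85, 170, 80, 180, 75, 190]
```

Fitting in log space means a system compounding at a steady rate scores highly, rather than penalizing exponential growth for not being a literal straight line.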

- System parameters: this would require analysis beyond the current system, such as measuring the performance of similar systems obtained by varying the parameters involved. For example, if measuring the robustness of a Donchian Channel Breakout system with a channel length of 20 days and an ATR-based stop at 30 days with a multiplier of 2, a robust system would exhibit very similar performance with slightly different parameters. The robustness could be quantified by measuring the overall difference in the system's performance across a set of parameters varied by around 10% from the initial values, eg a Donchian Channel Breakout system with a channel length of 22 days and an ATR-based stop at 33 days with a multiplier of 1.8.
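A minimal sketch of that perturbation grid, assuming a hypothetical user-supplied `backtest` function that maps a parameter dict to a single performance number (MAR, CAGR, whatever you prefer):

```python
import itertools
import statistics

def parameter_robustness(backtest, base_params, pct=0.10):
    """Dispersion of performance over +/-pct perturbations of each parameter.

    Runs the backtest on every combination of {-pct, 0, +pct} shifts
    applied to the base parameters and returns the coefficient of
    variation of the results. A small value suggests the performance
    surface is flat around the chosen settings.
    """
    results = []
    for deltas in itertools.product((-pct, 0.0, pct), repeat=len(base_params)):
        params = {k: v * (1 + d)
                  for (k, v), d in zip(base_params.items(), deltas)}
        results.append(backtest(params))
    mean = statistics.mean(results)
    return statistics.pstdev(results) / abs(mean) if mean else float('inf')
```

For the Donchian example above, `base_params` would be something like `{'entry_days': 20, 'exit_days': 30, 'atr_mult': 2.0}`; note the grid grows as 3^k in the number of parameters, which is another argument for few knobs and dials.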

- Markets traded: this one seems more complicated, as the market selection might actually be part of the system design. Maybe have a universe of symbols for testing and, for each one of them, find one or two correlated proxies to build an alternative universe: measure the performance of the system on both universes and compare the results.
This assumes that you can find a proxy for each instrument that you intend to trade/include in your system though...
The other option would be to split your universe of instruments into several sub-universes and measure the performance of the system on each sub-universe.

I would be interested in hearing about any other ways you might have tried to measure robustness for your systems. What do you think of the above?

The next problem to think about is how to go about _building_ a robust system - as opposed to just measuring it...

Maybe it might be a good idea to avoid the word "robustness" entirely. It seems to mean different things to different people. No one calculation can satisfy them all. Perhaps "Jez-Righteousness-Measurement", although a mouthful, is less likely to be controversial among people with different opinions (strongly held opinions!) of what robustness means.

Some people consider "robust" to be the exact opposite of "curve fitted". To them, the hallmark of a robust system is that it has extremely few adjustable knobs and dials and parameters. And what few there are, get set to values chosen by Common Sense Reasoning (yay!) rather than by Optimization (boo!).

Other people consider "robust" to mean "insensitive to small perturbations". Perturbing the parameter settings of the system by (let's say) 10%, shouldn't change the backtest results very much. Perturbing the price data history fed to the system, by adding or subtracting small random numbers (let's say about 0.1 ATRs) to the Open, High, Low, Close prices, shouldn't change the backtest results very much. Perturbing the portfolio of instruments traded by replacing (let's say) 15% of the instruments with other instruments not in the portfolio, shouldn't change the backtest results very much. Perturbing the start date and/or end date of the backtest, shouldn't change the results very much.
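The price-perturbation test in particular is easy to mechanize. The following is a sketch, not anyone's actual method: it jitters OHLC bars by roughly 0.1 ATRs, after which you would re-run the backtest on several jittered copies and compare results. The input shapes (bar tuples plus a matching ATR series) are my own assumption:

```python
import random

def jitter_prices(bars, atr_values, scale=0.1, seed=42):
    """Add small random noise (about `scale` ATRs) to OHLC bars.

    `bars` is a list of (open, high, low, close) tuples and
    `atr_values` the matching ATR per bar. The high is kept at or
    above the jittered open/close and the low at or below them, so
    the perturbed bars remain internally consistent.
    """
    rng = random.Random(seed)
    jittered = []
    for (o, h, l, c), atr in zip(bars, atr_values):
        noise = lambda: rng.uniform(-scale, scale) * atr
        o2, c2 = o + noise(), c + noise()
        h2 = max(h + noise(), o2, c2)   # keep high >= open and close
        l2 = min(l + noise(), o2, c2)   # keep low <= open and close
        jittered.append((o2, h2, l2, c2))
    return jittered

# Example: one bar, ATR of 1.0, so each price moves by at most 0.1
result = jitter_prices([(100.0, 101.0, 99.0, 100.5)], [1.0])
```

The parameter, portfolio, and date perturbations listed above follow the same pattern: perturb one input, hold everything else fixed, and look at the spread of backtest outcomes.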

Yet other people consider "robust" to mean "The system will perform about the same in live trading on future prices as it performed in backtesting on historical prices." Naturally this is tough to measure, because it's hard to get reliable data on next month's, next year's, and the next 10 years' prices.

So it seems prudent to sidestep the controversy. Go ahead and measure whatever you want, whichever way you feel like measuring it. Just don't call it "robustness" and thereby avoid dissent.

Except for the fact that "Jez-Righteousness" makes me sound like a vicar or the Pope, I get your point, Sluggo...

Thanks for the reply. It has actually pushed me back to thinking about what I really mean and want behind this word "robustness".
My main motive for a robustness/"Jez-righteousness" index is to quantify how likely a back-tested system will perform on future price data - but as you say this (future) data is pretty hard to come by...

The assumption I was making here was that if a system was robust on several counts (ie the various definitions you point out: few adjustable knobs and dials and parameters, insensitive to small perturbations, etc.), it would likely be robust in terms of backtested vs future performance as well.

Maybe I should start by trying to measure and quantify the various robustnesses I want to look at for a given system, and then try to back-test its robustness on future prices (ie with some in-sample/out-of-sample data) to see if there is some correlation between the various "areas of robustness" in a system...
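One way to formalize that check: score each candidate system on an in-sample robustness measure, score it again on out-of-sample performance, and look at the rank correlation between the two lists. A small Spearman-style sketch (assuming no tied scores; the inputs are hypothetical per-system score lists):

```python
def rank_correlation(xs, ys):
    """Spearman rank correlation between two score lists (no ties).

    xs might be in-sample robustness scores for a set of systems and
    ys their out-of-sample performance; a value near +1 would support
    the assumption that robustness measures predict future results.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry)) / n
    var = sum((a - mean) ** 2 for a in rx) / n
    return cov / var
```

With only a handful of systems the correlation estimate will be very noisy, which is the same rare-event caveat raised later in this thread.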

Sluggo beat me to the point. All of your measurements are going to be subjective. There ain't no such thing as an "objective" evaluation metric. Our biases color our choices. Sharpe, Sortino, IR, compounding, all have embedded assumptions about our preferences. Good luck. Find what YOU want, young man.

Let's assume for argument you have completed building a metric, which you're happy with, that incorporates everything EXCEPT "robustness." Your first task will be how to incorporate "robustness" into your total metric.

So let's say you have two systems which both score equally on your other metric. Then take the most robust one. That's easy. But what if your two systems are slightly unequal on your other metric, and the slightly less preferable guy is slightly more robust? You have a weighting problem. If you've ever looked behind the scenes at the indexing arguments for stocks, you know that weighting problems are fundamentally important and terribly expressive about our biases and preconceptions. To top it all off, our biases and preconceptions will change over time ...

I dunno about testing robustness statistically. Perhaps a standard deviation of performance on your other metrics when moving across markets, time periods, and parameters? Probably not raw performance but a ratio of improvement on the performance? For example, if it were a system trading S&P 500 stocks, I wouldn't want to use the raw return across different years; I would want to examine the relative outperformance of the system across different years and look for its consistency. I would also like to look for the outliers and remove them to see if the system parameters still look favorable. For example, trend followers are fond of the long-option-like positive skew of this system type, but when the return profile is based on rare events, such as the once- or twice-a-decade trade, what can we really, STATISTICALLY, say about these returns? Anyone in the biz knows that the rarer the event, the harder it is to pin down its odds ... and "outlier removed" performance may be a measure of robustness.
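The "outlier removed" idea above can be sketched as a trimmed return: drop the k best and k worst trades and recompound what is left. Comparing trimmed to untrimmed totals shows how much the system leans on its rare outlier trades (function name and inputs are illustrative, not standard):

```python
def outlier_trimmed_return(trade_returns, k=1):
    """Compounded return with the k best and k worst trades removed.

    If the trimmed return collapses relative to the full return, the
    system's results hinge on a few once-a-decade outliers, exactly
    the situation where statistics give us the least to say.
    """
    trimmed = sorted(trade_returns)[k:len(trade_returns) - k]
    total = 1.0
    for r in trimmed:
        total *= (1 + r)
    return total - 1.0
```

For a long-term trend follower one would expect the trimmed figure to drop sharply, since the positive skew comes precisely from the trades being removed; the point is to quantify by how much.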

So what if you have a bunch of non-robust systems? If you have systems that aren't robust, but their cycles of relative outperformance are uncorrelated or negatively correlated, you could trade the equity curves of your systems using moving averages and have a system of systems (which is, in fact, a system).
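A toy version of trading a system's equity curve with a moving average: take the underlying system's returns only when its own equity curve is above its trailing simple moving average, and stand aside otherwise. The window length and the stand-aside-at-zero convention are illustrative choices:

```python
def equity_curve_filter(returns, window=10):
    """Trade a system only when its equity curve is above its own SMA.

    `returns` is the underlying system's per-period return series.
    Returns the filtered series: the original return when the prior
    equity value exceeds its trailing SMA, else 0.0 (flat).
    """
    equity = [1.0]
    for r in returns:
        equity.append(equity[-1] * (1 + r))
    filtered = []
    for i, r in enumerate(returns):
        hist = equity[max(0, i - window + 1): i + 1]  # trailing window
        sma = sum(hist) / len(hist)
        filtered.append(r if equity[i] > sma else 0.0)
    return filtered
```

Running this filter across several uncorrelated non-robust systems, then combining the filtered streams, is the "system of systems" described above.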

JezLiberty wrote:My main motive for a robustness/"Jez-righteousness" index is to quantify how likely a back-tested system will perform on future price data - but as you say this (future) data is pretty hard to come by...

The assumption I was making here was that if a system was robust on several counts (ie the various definitions you point out: few adjustable knobs and dials and parameters, insensitive to small perturbations, etc.), it would likely be robust in terms of backtested vs future performance as well.

If you use few enough if/then statements in your testing, show enough consistency between the creation data and the testing data, show little change as your variables have small changes, and have shown the discipline to follow rules, then I think you have a usable model.
________________________________________

AFJG response:
I am happy to think that such a durable model may be possible. It is certainly what I have tried to design for myself and, using futures market data going back to 1970, I have made sure that my models are profitable in every time period during that time frame. In 2050 somebody trading such a system will be able to look back and verify whether or not my design (and my prediction of continued profitability) has been a success. I hope my prediction is correct and that my models remain profitable. Who can tell at this point in time? In any event, I shall continue research and development and ensure that if and when changes to the models become necessary, I make them.

Quote from ssb11:

my idea would be that while we have seen but a sliver of possible market outcomes, i would hypothesize that the outcomes that we have seen give us some idea of what the majority of future market outcomes might look like. in developing a model i am not only interested in how profitable it was in the past, but how likely that profitability will continue in the future. does the market need to do very specific things or very broad things for me to make money in the future? there are also certain facts (like the effects of costs on trading) that are not dependent on empirical evidence (that i would call an assertion or theory). by aligning the most facts, using the fewest variables, perhaps i can maximize my chances of future profitability. i would argue that any model that "stopped working" never really "worked" in the first place (arb situations aside. they can certainly become unavailable due to competition). by observing all the available data (beginning in roughly 1972) and using the same parameters (under 5) on 30 different markets across every possible "sector", one can observe over 1000 market years. this is very different than creating a model in the early 80's based on 10 futures markets or developing an equity model based on the NASDAQ from 1985-1999.
________________________________________

AFJG Response:
I would not seek to argue with your basic premise - as I stated above, I hope you are right and that an analysis of sufficient past data may give us a sporting chance of designing a simple system with few parameters which lasts well into the future. But markets of course go back far beyond 1972 and JW Henry stated at one time that they analysed price movements on commodities going back to the 19th Century and beyond. Admittedly beyond a certain date futures markets did not exist and daily prices, let alone daily OHLC, are not obtainable.

But my point is that there have been plenty of people analysing plenty of data well before we started in business - even if they did not have the computing power readily available today. Perhaps the likes of JW Henry, Dunn and Eckhart messed up their analysis; I would not know. What I do know is that some of the systems they designed seem to have perished (at least in their original form) - notably the Original Turtle Rules. JWH and Dunn looked as if they were going out of business recently and were forced to make alterations to their models - despite Dunn's reputed contention that he is trading the same way as when he started out in business. You may have a point - perhaps (as per your definition) their systems never worked in the first place. But I for one have difficulty in believing that a system can be designed to be eternally profitable. I very much hope I am wrong.

A statistically-based version of (my interpretation of) AFJG's response might be something like:

We have a sample return from a test. How representative of the universe is this sample? Is there even a consistent "universe" or is the "universe" itself dynamic?

While our other methods of portfolio analysis, such as performance metrics and tests of their robustness across different subsamples of our data, and walk-forward analysis, etc., all answer questions for us, there is another question: what do we really know about markets and how they change over time?

Once you develop a ROBUST mechanical trading system, then you should trade that system forever, unmodified, never making a single change.

It sounds like you agree ... ?

I don't. I think it far wiser to continuously add new systems to the Suite AND to delete old systems from the Suite. In the Robusti family situation, only allocate money to the N youngest brothers with the N newest systems. The older brothers are allowed (forced) to retire. This sets up a FIFO queue: first in, first out.

An obvious modification is to revise the acceptance and rejection criteria, so that you make more sophisticated decisions than mere "ranking by lack of seniority". But whether you use FIFO or something deeper, I feel it is crucially important to be willing (eager?) to trade a system for a while and then throw it away. You throw it away in the twin hopes that (a) you throw it away before its performance falls flaccid; and (b) you replace it with something as good, or better.
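The plain FIFO version of this rotation is trivial to express; this is a toy sketch of the queue described above, not a complete allocation framework:

```python
from collections import deque

def fifo_update(suite, new_system, n):
    """First In, First Out: add the newest system to the suite.

    If the suite already holds n systems, the oldest one is retired
    (thrown into the vat of boiling nitric acid) to make room.
    """
    q = deque(suite, maxlen=n)   # maxlen drops from the left on overflow
    q.append(new_system)
    return list(q)
```

The "more sophisticated acceptance and rejection criteria" mentioned above would replace the implicit age ordering here with whatever ranking you prefer.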

By "throw a system away," I don't mean re-optimize its parameters; I mean discard that system completely. Into a vat of boiling nitric acid. Fuggeddaboudit.

This methodology requires you to research new systems and new Suites, forever. Research never stops. Similar to Kaizen. Fortunately, research has one of the highest (Benefit / Cost) ratios of just about anything a systematic trader might do.

Once you develop a ROBUST mechanical trading system, then you should trade that system forever, unmodified, never making a single change.

It sounds like you agree ... ?

On the contrary I agree with you and was arguing thus on that thread in elitetrader.com. Although I do believe one can re-optimize parameters and introduce modifications to an existing system rather than simply chuck it out. I started the thread to argue against a vendor (who shall remain nameless) who made the following (in my view erroneous) comment on his website:

"Occasionally, someone trying to promote something or start a debate will argue that trend following rules must always change due to changing market conditions. This is nonsense. It is a specious argument."

I do know for certain, that we don't know for certain if any system can be robust forever. I believe that some things about the market will be constant, simply because it's a market of human individuals and human nature is constant (at least over our lifetimes). I also know for a fact, from research and experience, that some of the dynamics of the market change with time.

"MUST CHANGE?" I'm agnostic on the "must" portion of that.

N youngest and N newest? FIFO? I have to disagree on that.

My semi-literate and ill-informed, amateur opinion is that system R&D should concentrate on both improvements/modifications to existing systems, and consideration of new system ideas. Rather than FIFO, I subscribe to BIWO.

Best In, Worst Out.

As discussed above, what is "best" is, at best, subjective. But in a stable of systems, one should always try to have the N systems that one considers "best" to be available at any one time.

I don't care which member of the Robusti family developed which system how many years ago ... all I care about is keeping the best N systems in my stable at any one time, according to my own criteria of "best." BIWO.
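BIWO differs from FIFO in exactly one line of logic: rank by score rather than by age. A toy sketch, with `score` standing in for whatever subjective criterion of "best" you adopt:

```python
def biwo_update(stable, candidate, n, score):
    """Best In, Worst Out: admit the candidate, keep the top n by score.

    `score` is a function mapping a system to a number; the pool of
    existing systems plus the candidate is ranked and only the n
    highest scorers survive, regardless of seniority.
    """
    pool = stable + [candidate]
    return sorted(pool, key=score, reverse=True)[:n]
```

Note that under BIWO a weak candidate is simply rejected and the stable is unchanged, whereas under FIFO it would displace the oldest system no matter how good that system still is.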

Levi wrote:
How exhausting. Why do you subject yourself to such punishment?

I only do so when I feel bored and have time on my hands. I believe (as Mark does) that research should never stop, although I lack the energy to constantly trade new systems and chuck out old ones. From time to time I get a lull - I have one at the moment - when I can't quite pick out what I want to research next. Eventually I will pick up a new research topic and beaver away for long hours over a hot computer keyboard. And spurn elitetrader as anyone with any sense does.

In the flat periods, I occasionally spot less than helpful comments on websites which for some ridiculous reason I can't help pointing out.

You are absolutely right though - it is masochism and a complete waste of time. Although sometimes others make some comment or other which gives me pause for thought and sets me off on a new line of research.

nodoodahs wrote:I do know for certain, that we don't know for certain if any system can be robust forever. I believe that some things about the market will be constant, simply because it's a market of human individuals and human nature is constant (at least over our lifetimes). I also know for a fact, from research and experience, that some of the dynamics of the market change with time. [...] system R&D should concentrate on both improvements/modifications to existing systems, and consideration of new system ideas.

I think that the very act of developing and optimizing a trading system (trading rules) is a process of curve fitting to past price history which, undoubtedly, will not be exactly the same in the future.

So the challenge, as I see it, is coming up with rules and parameters which will increase our chances of surviving and making the most profit out of future price history. Ironically, there doesn't seem to be anything mechanical about finding the best possible solution (a compromise between curve fitting and overfitting), which comes down to our own judgment.

Good thread/link indeed alp, thanks - did look around before posting but did not see that one...

Lots of good info and avenues for further reflection on this thread: I am definitely glad I joined the TradingBlox forum community!

nodoodahs wrote:So let's say you have two systems which both score equally on your other metric. Then take the most robust one. That's easy. But what if your two systems are slightly unequal on your other metric, and the slightly less preferable guy is slightly more robust? You have a weighting problem. If you've ever looked behind the scenes at the indexing arguments for stocks, you know that weighting problems are fundamentally important and terribly expressive about our biases and preconceptions. To top it all off, our biases and preconceptions will change over time ...

Fair point, although I guess this issue is not specific to including robustness in an objective/bliss function (ie if you use more than one criterion in your function, you will have to work out how you weight them to "mingle" them together...)

I found Taleb's first book (Fooled by Randomness) valuable, useful, and thoughtful. I made it only halfway through this one (The Black Swan), since it quickly degenerated into useless, redundant tirades against one single theme, namely the misuse of the bell curve and how wrongly we all apply statistics. That made me sad, because the first book was really something. They are so different that I find it difficult to accept that both were supposedly written by the same person.

Don't forget, lots of tips on the idea of testing robustness in that response [ viewtopic.php?p=39802#39802 ] appeared after the section you quoted ...

I found Taleb's rants to be about five or ten pages of good solid material, turned into a massively boring book for some ungodly reason. Like I said, good stuff, but too stepped on for me. There's baking soda in it.

There are really nine flawed assumptions in MPT, but most focus primarily on the "normal distributions"