If it isn't already, this should be the key question in everyone's mind -- at least, everyone who employs a blend for their MI investing. I've done hours and hours of testing over the past year or so to answer this question, compiling tests that include hundreds of millions of data points. At the very core, each test is trying to determine which measure is the most predictive of future return.

How did you set up the tests?

Well, I've run hundreds and hundreds of tests, but for this post I'll share results based on using ranks 1-10 of each screen, with a backtest history from 1969 to present. From these backtests I select five screens to invest in at the beginning of each year from 1989 on, using a descending sort of each of the measures. The number of lookback years is varied from 1 to 29 in steps of two, for a total of 15 lookbacks per measure. I then take the results from this test and find the Minimum, Maximum, Median and Average value for each measure. I consider the median and maximum to be key, but I also pay attention to the spread from max to min. The measure you actually use will be applied with a fixed number of lookback years, so the maximum value is probably the one you will end up with, but we want to consider all of them to judge how robust the measure is. A vast span between min and max suggests a measure that probably just landed on such a high value by luck.
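To make the aggregation step concrete, here is a minimal sketch of the summary described above. The CAGR values are invented placeholders standing in for per-lookback backtest results; only the 1-to-29-in-steps-of-two lookback grid comes from the post itself:

```python
import statistics

def summarize_lookbacks(results):
    # Collapse one measure's per-lookback backtest results into the
    # Minimum / Maximum / Median / Average summary used in the tables.
    return {
        "min": min(results),
        "max": max(results),
        "median": statistics.median(results),
        "average": sum(results) / len(results),
    }

# 15 lookback lengths: 1, 3, ..., 29 years
lookbacks = list(range(1, 30, 2))

# Hypothetical CAGR for each lookback (illustrative numbers only)
cagrs = [22.1, 24.5, 25.0, 23.8, 26.2, 25.5, 24.9, 23.0,
         22.7, 24.1, 25.8, 24.4, 23.9, 25.1, 24.6]

summary = summarize_lookbacks(cagrs)
print(len(lookbacks), summary["min"], summary["max"], summary["median"])
```

A wide gap between `summary["min"]` and `summary["max"]` flags a fragile measure, per the robustness argument above.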

In summary, the period tested is 1989 to 11/9/2007, and the number of screens held is five, holding ranks 1-10 of each for a total hold of 50 stocks. The idea is not necessarily to hold 50 stocks, but to use this as a basis for finding the best measure. I've actually run the exact same test using ranks 1-4, and the results are not much different, although the values are all higher.

All of the tables that follow are sorted by the Median column with the best values at the top.

What is the best measure for finding the highest future CAGR? In other words, which measure is most predictive of future CAGR?

A measure that is not talked about on this board comes out on top -- Jensen. The next one down is Alpha, also not often discussed as far as using it to predict future CAGR. The one that comes out on the absolute bottom is Sharpe/GSD recently suggested by StevnFool.

When looking at these returns, keep in mind that using the Solver to optimize and select a blend at the end of each year produced these results:

A Sharpe Optimized Strategy
CAGR around 26
Sharpe around 1.20

This, then, is the benchmark that we are comparing to.

But our goal is risk-adjusted return, not raw return, so what is the best measure for finding the highest Sharpe Ratio?

Well, you'll be surprised to learn that what is at the absolute bottom of the list above, comes out at the absolute top of the list for risk-adjusted return -- StevnFool's Sharpe/GSD. BTW -- I just added this measure to my backtester after he suggested it yesterday. As a result I had to run a special test just for that indicator and add it to all my other tests.

Once again we find that Sharpe/GSD comes out on top. We also see that although Jensen is great at finding the best screens to invest in at certain times, it does so at the expense of a much higher Ulcer Index. If your goal is to find screens with just amazing forward returns then Jensen may be for you, but be forewarned, it will not be a smooth ride to the top.

In other words, which of the measures tends to choose screens that consistently produce a positive return each year? Note again which measure comes out on top! Indeed, this simple little idea that StevnFool has come up with has produced some amazing results.

When it comes to the maximum daily drawdown, which of the measures is able to avoid drawdowns the best?

Of course, here we are not suggesting that max drawdown is predictive of anything, but rather, evaluating how well a measure does at avoiding drawdowns. Once again we find that the winner is Sharpe/GSD.

Which of the measures you used actually produced a backtest with the lowest GSD?

You guessed it -- Sharpe/GSD comes out on top once again! In fact, this new measure comes out on top for every evaluation except CAGR. I sure do like the looks of those GSD values for the Sharpe/GSD strategy. A GSD around 10 is pretty much unheard of.

It seems that Sharpe is not the best measure based on your tests -- why is that?

I have no idea, but it definitely does not come out on top. In fact, here is its position for each of the evaluation tables above.

Sharpe Position

CAGR       11
GSD         5
Sharpe      6
UI          5
Win Ratio   5
Drawdown    5

So out of the nineteen measures that I used to sort on, Sharpe takes 11th place when it comes to CAGR, and sixth when it comes to Sharpe itself. In other words, there are much better measures to use than Sharpe itself for determining the best blend to invest in.

Remember that using the Solver to find a new blend each year resulted in a CAGR around 26 and a Sharpe around 1.20. Using the Sharpe/GSD above produces a Sharpe around 1.65 and a CAGR around 24. Which would you say is better?

NOTE: I actually had no intention of sharing even this much research on this theme, but decided to open the flood gates and let the generosity flow -- being that the holidays are just around the corner and all. I hope you've found it useful, and I also hope it promotes some discussion that will be a benefit to us all.

Further Research

A very interesting bit of information that comes out of this research is the fact that some of these measures do exceedingly well at finding the best screens for when the market is bullish, and others find the best when the market is bearish. If you have even a very simple way to determine the bullish and bearish periods you'll do very well switching between the measures used. For example, when the market is bullish use the Jensen -- it found five screens for a total hold of 50 stocks that produced a return of 131% in 1991 and 180% in 1999 and close to 90% in both 2003 and 2006. Pretty amazing! Now if you use a SMA on the market and switch to Sharpe/GSD during the bearish periods you'd get 45% in 2000, 27% in 2001 and 15% in 2002. The problem with Sharpe/GSD is that it really tends to under-perform when the market is super-bullish like 1991 and 1999, but does great the rest of the time. It seems to me that a very simple method can be found that would enable anyone here to switch their sorting method to fit the market. But of course, this would be timing, and there are so many strongly opposed to that no matter how sound the evidence is supporting its benefit. Right? Even when the Hindenburg Omen gives a perfect signal there will still be naysayers that will claim it is all luck and voodoo. Right? That's fine, I'm simply trying to help, and I'm sure that there are many out there that appreciate it, even if a few insist on rejecting it.

But of course, this would be timing, and there are so many strongly opposed to that no matter how sound the evidence is supporting its benefit.

Not so much a matter of being opposed as a matter of so little supporting evidence being present. Given the massive scope of what timing is trying to accomplish as well as the lack of any supportive data to suggest that it truly does work, many are naturally skeptical about it. It also goes without saying that many have tried it in the past only to have it end in disaster.

Back on the topic of the very informative research you presented, though (and thanks for sharing it), I'm curious what screens the Sharpe/GSD measure has chosen during the last 12 months in particular, considering the volatility the market has given us. As you said, a GSD that low sounds almost too good to be true, especially with the wild ride that virtually all of the screens have taken everyone on lately.

If it isn't already, this should be the key question in everyone's mind -- at least, everyone who employs a blend for their MI investing. I've done hours and hours of testing over the past year or so to answer this question, compiling tests that include hundreds of millions of data points. At the very core, each test is trying to determine which measure is the most predictive of future return.

You're right about the question, and thank you for offering an answer.

You mentioned two measures that I've never heard of - Jensen and Treynor. Could you define them or point to their definition?

Treynor Index

Treynor developed the first composite measure. This measure postulates two components of risk:
*Risk resulting from general market fluctuations
*Risk resulting from fluctuations unique to the securities in the portfolio

In order to identify risk due to market fluctuations, Treynor introduced the Characteristic Line. This line defines the relationship between rates of return for a portfolio over time and the rates of return for an appropriate market portfolio. Treynor noted that the Characteristic Line's slope measures the relative volatility of the portfolio's returns in relation to returns for the aggregate market -- the Portfolio's Beta. Deviations from the Characteristic Line indicate unique returns for the portfolio relative to the market. Treynor suggested that risk averse investors would always prefer portfolio possibility lines with larger slopes, because such high-slope lines would place investors on higher indifference curves.

What is measured by the Treynor Index?

It measures the portfolio's Risk Premium Return per Unit of Risk; all investors would prefer to maximize this value. Beta measures systematic risk, but says nothing about diversification; it implicitly assumes a completely diversified portfolio. The Treynor Index (T) is a measure of the stock's excess return per unit of risk. The excess return is defined as the difference between the stock's return and the 1-Year Treasury Bill rate over the same period of time. The risk measure here is the market risk, evaluated by the stock's beta.

How should the Treynor's values be Interpreted?

Positive T Values: The larger the T value, the more preferable the stock is for all investors, regardless of their risk preference. Normally, a stock with a negative beta should experience a rate of return below the Treasury Bill rate, so both the excess return and beta would be negative, and the T values would be positive.
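The definition above reduces to a one-line calculation. Here is a sketch with made-up inputs; the thread doesn't specify exactly how the backtester computes it:

```python
def treynor(portfolio_return, risk_free_rate, beta):
    # Treynor Index: excess return per unit of systematic (beta) risk.
    # Returns and rate are annual percentages; beta is unitless.
    return (portfolio_return - risk_free_rate) / beta

# Hypothetical screen: 18% annual return, 5% T-bill rate, beta of 1.3
t = treynor(18.0, 5.0, 1.3)
print(round(t, 2))  # 10.0
```

Two screens with the same excess return but different betas will rank differently: the lower-beta screen earns its premium with less systematic risk and gets the higher T.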

Jensen Performance Measure

What is measured by the Jensen Performance Measure?

Superior portfolio managers who accurately predict market turns or who identify undervalued investments earn higher returns; each of them has consistently positive random error terms. Jensen's alpha represents the ability of the fund manager to achieve a return above what could be expected, given the risk in the fund. It is a statistical measure of performance similar to the Sharpe Ratio. This indicator was developed to determine the success of a fund manager in stock selection by providing a basis of comparison for portfolios that have different risk exposures. It is generally used for portfolio performance measurement and to identify the part of the performance that can be attributed solely to the manager (in other words, the result of superior performance and not luck).

How should the Jensen index's values be interpreted?

Positive alphas represent good performance by the fund manager; negative alphas reflect poorly on the fund manager's performance. The alpha should be calculated over an extended period of time -- usually three years.
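As a sketch, Jensen's alpha is the actual return minus the CAPM-expected return; the numbers below are hypothetical:

```python
def jensen_alpha(portfolio_return, risk_free_rate, beta, market_return):
    # Alpha = actual return minus the CAPM-expected return.
    # All rates are annual percentages; beta is unitless.
    expected = risk_free_rate + beta * (market_return - risk_free_rate)
    return portfolio_return - expected

# Screen returned 18% while the market returned 10% and T-bills paid 5%
alpha = jensen_alpha(18.0, 5.0, 1.2, 10.0)
print(alpha)  # 7.0
```

A positive alpha says the screen beat what its beta alone would predict, which is exactly the "skill, not luck" interpretation above.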

If it's not too much trouble -- each year you found an optimal blend for the year and then calculated the returns for that blend if used for the year. Could you also indicate the expected return for the year as indicated by the blend? This would give some idea of the gap between expectations and reality.

No, the Sharpe and the GSD are both based on the same number of years. I do not have the ability to set them as different, at least not as of right now.

Zee,

In this respect, what you have done and what I suggested/tested are different, but the basic idea is the same. From your original post, you showed the table below.

Sharpe       Minimum  Maximum  Median  Average
Sharpe/GSD   1.25     1.67     1.62    1.58

If I have interpreted your testing correctly, I understand that the variations in the resulting Sharpe are due to varying the lookback period. Would you be able to post a table showing the resulting Sharpe ratio for all of the different lookbacks used for calculating Sharpe/GSD? Given that the Maximum, Median and Average are grouped relatively closely, it would seem that there is a fairly wide range of lookbacks that work quite well.

StevnFool

You have done a great job in demonstrating through testing the possible use of Sharpe/GSD as a useful prediction metric.

I have commented many times before that (CAGR-rfr)/GSD is a reasonable substitute for Sharpe when looking at our typical screens and where rfr = the average risk free rate over the backtest period. RRR4 as suggested by me a long time ago and as used by some is the version of this that uses 4% as the risk free rate.
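The substitute measure described above is simple enough to sketch directly; the screen statistics here are made up for illustration:

```python
def rrr(cagr, gsd, rfr=4.0):
    # (CAGR - rfr) / GSD as a Sharpe substitute, where rfr is the
    # average risk-free rate over the backtest period. With rfr = 4
    # this is the RRR4 variant. Inputs are annual percentages.
    return (cagr - rfr) / gsd

# Hypothetical screen: 24% CAGR with a GSD of 12.5
print(rrr(24.0, 12.5))           # RRR4 = 1.6
print(rrr(24.0, 12.5, rfr=5.0))  # 1.52 with a 5% average risk-free rate
```

Because rfr is a constant per backtest, changing it shifts every screen's score but shifts low-GSD screens by more, so the choice of rfr can matter for rankings.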

My main point here, though, is that Sharpe essentially has a CAGR component in its numerator and a GSD component in its denominator. When I originally suggested the idea of looking for low GSD screens, which ultimately led to using Sharpe/GSD, the logic was that the GSD component of the Sharpe ratio is a better predictor than the CAGR component, and thus the GSD component needs extra weight. We started doing this by selecting low GSD screens first before selecting the high Sharpe ones, but using Sharpe/GSD is just another way of overweighting the GSD, as we essentially have GSD^2 in the denominator now.

This raises the question: "What is the optimum amount by which we should overweight GSD?"

For example, which of the following is a better predictor of future Sharpe?

Sharpe/GSD
Sharpe/(GSD^X), where X could be any value > 0

A suggestion might be to try X = 0.5 and X = 2 and see how they compare with what you have already done (X=1).
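A minimal sketch of how that comparison might be set up. The screen names and trailing statistics are invented; a real test would rank on trailing values and then score the forward results as done elsewhere in this thread:

```python
def rank_screens(screens, x):
    # Rank (name, sharpe, gsd) tuples by Sharpe / GSD**x, best first.
    # x = 0 is plain Sharpe; x = 1 is Sharpe/GSD; larger x overweights GSD.
    ordered = sorted(screens, key=lambda s: s[1] / s[2] ** x, reverse=True)
    return [s[0] for s in ordered]

# Hypothetical screens: (name, trailing Sharpe, trailing GSD)
screens = [("A", 1.4, 25.0), ("B", 1.2, 14.0), ("C", 1.0, 10.0)]
print(rank_screens(screens, 0))  # ['A', 'B', 'C'] -- raw Sharpe order
print(rank_screens(screens, 2))  # ['C', 'B', 'A'] -- low GSD dominates
```

Sweeping x (say 0.5, 1, 2) and recording the forward Sharpe of each ranking would answer StevnFool's question directly.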

If you would be interested in testing this idea, I would love if you posted the results. If not, let me know and I might try it if I get time.

Since Sharpe/GSD results are still slightly better with a one-year lookback, it might be worthwhile to see if there is any improvement using a measure like Sharpe(13)/GSD(1) -- a long lookback for Sharpe and a short lookback for GSD, as StevnFool initially did (post #203911). The measure is roughly the reciprocal of a product of long-term and short-term GSD, giving some extra weight to the recent volatility of a screen.

Just a thought, I do not have the tools and capability to test this myself :-(

Since Sharpe is essentially (CAGR - C) / GSD, where C is a constant, and you are using this as a ranking indicator, the constant C isn't really changing anything. For ranking purposes Sharpe/GSD/GSD should give the same ranking as CAGR/GSD^2. Other methods that could be explored would be CAGR/GSD^x (x being a variable to solve for), ranking by GSD and taking the top x, then ranking by CAGR, etc.

This test finally completed. It is set up the same way as I did in the Sharpe/GSD^x post -- three screens are selected from VL and three from SIPRO and then blended to form a total of six screens with ranks 1-4 for a total hold of 24 stocks. The period examined is 1999 to present.

I used lookbacks from 20 to 240 market days allowing them to be different for VL and SIPRO -- the lookback days are on the left side of the table. The following very long table is sorted by the resulting Sharpe Ratio.

It appears that VL requires a short lookback, while SIPRO requires a longer one. But 40/50 comes in at position #5 in the sort, so one cannot make a very firm statement about this. I was expecting the lookback to favor something around 200 days, so these results are not what I anticipated.

At the beginning of this thread, Zee concluded by pointing out that using Sharpe/GSD could provide an even greater advantage combined with timing. Timing is not an easy thing to accomplish, as evidenced by the poor performance over the last couple of years by commercial timers like Equitrend, Intellitimer, Timing Cube, etc. Zee, on the other hand, seems to be far more successful than most.

One timing method that does seem to generally add value is trend following. Something as simple as using a 100 day SMA on the Nasdaq composite over a 45 year period produced some pretty good results.

You can get 17% v. 11% for the S&P since 1969, with a GSD identical to the S&P's and a better Sharpe, by trading monthly and putting 60% of the money into T-Bills and 40% into the 20 stocks of the RS 13 screen, without timing at all.

TMT33 stated: One timing method that does seem to generally add value is trend following. Something as simple as using a 100 day SMA on the Nasdaq composite over a 45 year period produced some pretty good results.

I wondered if anyone would ask more on this question. I took some time the other day to find a simple way to do this for the typical investor on this board.

I found that it was optimal to use the Nasdaq 100 as a proxy for the market to determine when to be invested in bullish screens and when to be in defensive screens. The primary reason for this is that there are very few signals doing it the way I found optimal -- only nine signals to be long over the period from 1989 to present. Here are the dates:

This is based on a cross of two EMAs on the Nasdaq 100 -- 40 and 140. For the test of the index itself this produces an ROI of 2067, compared to what you suggested as a 100 EMA crossing the close of the index which produces an ROI of 1000.

Now I also found a way for anyone on this board to easily track this. You just go to this site and sign up for a free subscription: http://secure.timetotrade.eu/

Once you are signed up you can log in and set up an email alert on the NDX for a 40/140 EMA crossover. Whenever there is a cross you'll receive an email alert that you can act on the next day.
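For anyone who would rather compute the signal than rely on an alert service, here is a sketch of an EMA crossover check. The EMA seeding and the toy price series are assumptions; the thread's actual signal uses a 40/140 EMA cross on NDX closes:

```python
def ema(prices, span):
    # Exponential moving average, seeded with the first price,
    # smoothing factor 2 / (span + 1).
    k = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(out[-1] + k * (p - out[-1]))
    return out

def crossover_signals(prices, fast=40, slow=140):
    # Emit ('buy', i) when the fast EMA crosses above the slow EMA,
    # and ('sell', i) when it crosses back below.
    f, s = ema(prices, fast), ema(prices, slow)
    signals = []
    for i in range(1, len(prices)):
        if f[i - 1] <= s[i - 1] and f[i] > s[i]:
            signals.append(("buy", i))
        elif f[i - 1] >= s[i - 1] and f[i] < s[i]:
            signals.append(("sell", i))
    return signals

# Toy demo with short spans on a made-up price jump
print(crossover_signals([10] * 5 + [20] * 5, fast=2, slow=5))  # [('buy', 5)]
```

Acting on a signal the day after the cross, as the email-alert workflow above implies, just means trading at index i + 1.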

The idea here would be to move between screen sorts -- using very low risk, low GSD, high Sharpe screens when the market is in bearish mode, and use higher risk, high CAGR screens when the market is in bullish mode.

The other simple timing system that could be easily used by investors like myself is Seasonality -- whether using fixed dates (bearish May through end-September or October), or switch dates modified with MACD per Sy Harding's STS.

Have you tested how Seasonality compares with the NDX 40/140 EMA crossover for bullish/bearish screen selection?

I wonder not only about CAGR and Sharpe, but also whether going bearish in summer will improve drawdowns and UI.

Well, to test that fully would take quite a bit of time, but the following tables I can produce in less than a minute. This shows the average returns for all the various tests with ranks 1-10 -- 5 screens. As you can easily see, Jensen pops out on top. The top row in each table shows the average of all the results, so about 96% of the years are positive across all results. The % POS column shows the number of years with a positive return. The High is the highest yearly return, while Low is the lowest. The StdDev is the standard deviation of all the yearly returns. The tables are sorted by the average return of all the years for each test. The most difficult period for the bullish season was Oct 2000 to May 2001; the absolute best measure for this period was UPI with a three or five year lookback.

Bullish Period

One very important thing to remember is that when the market is really tanking big time, nothing wins, no matter what measure you use. Here are the percent of the results that were positive for each year during the summer months. Note that in 2002, 1998 and 1990, nothing worked. The only thing that would have saved you during those three periods was being 100% in cash. It is also interesting to note that this year has been extremely difficult as well. One would be better served taking a super-defensive position during these times, rather than just moving into another blend. The problem is, how do you know whether it will be one of those years or not?

In order to enhance everyone's analysis of which measure is the best predictor of future risk-adjusted return, let me take a few minutes to post the exact same data as what you find in the top post of this thread but based on ranks 1-4 instead of ranks 1-10. Keep in mind that this is holding five screens for a total blend of twenty stock positions.

It used to be that we believed Sharpe was the best measure to use for determining a blend. How has that measure fared under all the testing you've done?

As I posted a day or two ago, the Sharpe does not tend to come out on top with any of the means we have to evaluate a blend. Let me share the same table again, but this time with ranks 1-4 included.

What this shows more than anything else is simply that there are better tools available for blend selection.

What is the best measure for finding the highest future CAGR? In other words, which measure is most predictive of future CAGR?

Jensen and Alpha came out on top with ranks 1-10, but this time GSD is the top CAGR finder. This is most definitely counter-intuitive, since what this is saying is that we find the best screens by sorting on GSD in descending order. In other words, the highest GSD screens produce the highest CAGR. GSD was in position #5 for ranks 1-10, but it shoots to first position when only twenty stocks are held.

I consider the UI one of the most important tools for evaluating a blend. A low value with this measure will make a huge difference in real-time use of the blend chosen, at least that has been my experience since Mungo first suggested the measure. Based on this I'd say Treynor is a very reasonable option rather than Sharpe/GSD.

This tells us what percent of the screens employed each year produced a winning return. It is pretty impressive to find a win ratio of about 85%. I doubt there are many here who have had this level of success in picking screens for their investing.

Here I've prepared a table with Sharpe, Treynor and Sharpe/GSD laid out side by side, along with their position rank among all the measures for each of the factors we use to evaluate the results. I'm sure there is a wide difference of opinion about how best to use all of this data, but I for one, based on this data alone, would lean toward Treynor as the best measure. One other reason I think so is that when you look at the annual returns, it wins every year and has an average win of over 50% CAGR. Pretty impressive! The only year where it really did not do great is 1990. In fact, when I take all the yearly returns and sort first by the percent of years that were positive, so that those with 100% winning are at the top, and then by average yearly return, Treynor takes the top twelve positions. Pretty amazing!

My default -- the S&P 500 -- but I can test it off of any index. Do you have some thought of one being better than another?

I would guess it doesn't make too much difference.

The problem I've run into with Alpha, is that backtested performance is so far above the S&P 500 that when I try to optimize for alpha I get blends that have unacceptably (to me) high GSD's (I'm talking in the 30's). But that's using only monthly data.

Another approach to your optimization is to use the Sharpe ratio, but modify the risk-free rate as a constant rather than using the treasury yield. For example, something I might call Sharpe(8) using a figure of 8% as the risk-free rate. You could vary that number and see how it affects future returns.

I've found that's actually a decent way to optimize a blend based on differential risk levels. I use a different value in optimizing, which yields blends that vary in their level of volatility.

For alpha I would definitely think it would not, but for other measures, it can make a huge difference.

The problem I've run into with Alpha, is that backtested performance is so far above the S&P 500 that when I try to optimize for alpha I get blends that have unacceptably (to me) high GSD's (I'm talking in the 30's). But that's using only monthly data.

The median GSD is 31.87 for ranks 1-4 and 28.75 for ranks 1-10. I personally would think there are other measures that work better so there is no use in using this one.

Another approach to your optimization is to use the Sharpe ratio, but modify the risk-free rate as a constant rather than using the treasury yield. For example, something I might call Sharpe(8) using a figure of 8% as the risk-free rate. You could vary that number and see how it affects future returns.

Here is a backtest of the Sharpe using various MAR values from 2% to 15%. I also have the ability to easily test the granularity of the data used, so I included that in the backtest, using daily, weekly, monthly and yearly data. As you can see with the Sharpe-based sort below, the highest Sharpe Ratio is attained using yearly granularity, but the highest CAGR is obtained by using a larger MAR and daily data. The default in the list that follows is the Sharpe with the 1-year T-bill.

Emintz wrote: Another approach to your optimization is to use the Sharpe ratio, but modify the risk-free rate as a constant rather than using the treasury yield. For example, something I might call Sharpe(8) using a figure of 8% as the risk-free rate. You could vary that number and see how it affects future returns.

No comment on the results I posted? I'd like to hear what you think about how this test turned out. Is it what you expected? What conclusions would you draw from it, if any?

No comment on the results I posted? I'd like to hear what you think about how this test turned out. Is it what you expected? What conclusions would you draw from it, if any?

I'm on the road and haven't commented because of that.

My initial comment is to ask what is meant by the column labeled "Look-back period". In the paragraph above the table you refer to the granularity of the data, but that's not what I normally think "look-back period" means. Does this mean that you are using only single start dates per period, or that you are only looking back a limited time for some calculation?

My own results, using a much more limited data set, find that as the MAR was increased, you get higher CAGR, higher GSD, and lower Sharpe, but the changes were much more dramatic than what you saw... probably because of my inclusion of simulated bond returns - with low MAR, a high percentage of the port is in bonds.

I guess I need a better understanding of what that look-back period column really means. If I had to guess, I'd think it just meant granularity of the data used and that the results (other than daily) are averages of the rolling periodic returns?

Eric asked: I guess I need a better understanding of what that look-back period column really means. If I had to guess, I'd think it just meant granularity of the data used and that the results (other than daily) are averages of the rolling periodic returns?

Yes, that is right. There are nine years of data in all cases, but the granularity is changed from Annual to Daily. It illustrates well the difference that can make.

What about the calculation of the Sharpe of the resulting optimization? Is this the Sharpe using the monthly granularity? Do you think it makes a difference if you use daily or annual? For example, does optimization using the daily values still produce worse results if you calculate the Sharpe using daily data? Or is annual granularity better at predicting annual Sharpe?

Eric asked: What about the calculation of the Sharpe of the resulting optimization? Is this the Sharpe using the monthly granularity?

No, all that I have posted to date uses daily granularity.

Do you think it makes a difference if you use daily or annual? For example, does optimization using the daily values still produce worse results if you calculate the Sharpe using daily data? Or is annual granularity better at predicting annual Sharpe?

Let me share the results of some tests I did earlier this year. Unfortunately, this means that some of the methods I'm using now are not included, and some of the measures as well are not in this test, most importantly, it does not include Sharpe/GSD.

The period tested is from 1/1/1986 to 2/14/2007, so it is three years farther back in time, which does make a big difference. These results are based on holding five screens of ranks 1-4. I show the measures in both ascending and descending sort, whereas in previous tests I was only using a descending sort. As you can see, sometimes the measure works a whole lot better in ascending sort.

This is the same as above, but in this case the date range tested is 1/1/1999 to 2/14/2007. The reason for this is that both SIPRO and VL screens are used. There are times, therefore, that SI screens alone are chosen for a particular year.

So in answer to your questions regarding the Sharpe as the basis for screen selection, based on these tests alone, it is optimal to use yearly granularity for the highest CAGR, daily granularity for the highest Sharpe, and daily/weekly for the lowest GSD. This is for the VL only test. When it comes to the test of VL-SI, the highest CAGR is obtained with yearly again, and daily data gets the best Sharpe and lowest GSD. When it comes to Sharpe it is optimal to use daily granularity, but keep in mind, that is not true for other measures.

Take note of the highest CAGR measures:

Sharpe DESC Yearly at 41.00%
Treynor DESC Yearly at 45.81%
GSD Ratio DESC Yearly at 41.57%

What about the calculation of the Sharpe of the resulting optimization? Is this the Sharpe using the monthly granularity?

No, all that I have posted to date uses daily granularity.

Maybe it is just my own denseness, but I'm not clear what the columns represent in your data. Your daily/weekly etc. columns: is this the granularity of all the calculations, or the one on which you are optimizing, or the resulting measure after optimization?

My question really is this: if there are significant differences in the result of the optimization, using daily Sharpe as the measure, regardless of whether you are optimizing on Sharpe, GSD, or anything else: then might not there also be differences depending on whether you rank the optimization by final Sharpe based on annual instead of daily data?

You may have already explained this with your data, but I'm not getting it.

Eric asked: Maybe it is just my own denseness, but I'm not clear what the columns represent in your data. Your daily/weekly etc. columns: is this the granularity of all the calculations, or the one on which you are optimizing, or the resulting measure after optimization?

Right now almost everyone on this board has one option for optimizing blends and that is using Jamie's backtester which outputs monthly granularity. Of course, I have daily data, so I can calculate off of daily, weekly, monthly and yearly, or anything else for that matter. In the table you are seeing above, I've shown the differences in the results from using these various granularity levels.
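Since most readers can't see the backtester internals, here is a sketch of how GSD might be computed from returns at a given granularity. The annualization convention shown is my assumption of how the boards define GSD, and the return stream is invented:

```python
import math
import statistics

def gsd(period_returns, periods_per_year):
    # Annualized geometric standard deviation from periodic returns:
    # stdev of log-returns, annualized by sqrt(periods per year),
    # re-expressed as a percent move (assumed convention).
    logs = [math.log(1.0 + r) for r in period_returns]
    sd = statistics.stdev(logs)
    return 100.0 * (math.exp(sd * math.sqrt(periods_per_year)) - 1.0)

# A hypothetical year sampled monthly; sampling the same stream daily
# or annually would generally give a different annualized figure,
# which is the granularity effect discussed above.
monthly = [0.02, -0.01, 0.03, 0.00, 0.04, -0.02,
           0.01, 0.02, -0.03, 0.05, 0.00, 0.01]
print(round(gsd(monthly, 12), 1))
```

The sqrt-of-periods scaling assumes independent periodic returns, which real screens violate; that is one plausible reason daily, monthly, and yearly granularity yield such different results.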

My question really is this: if there are significant differences in the result of the optimization, using daily Sharpe as the measure, regardless of whether you are optimizing on Sharpe, GSD, or anything else: then might not there also be differences depending on whether you rank the optimization by final Sharpe based on annual instead of daily data?

Yes, there most certainly are differences, as the tables above show. It can make a huge difference. Of course, the problem is that the vast majority do not have any option besides monthly or yearly data. They can't get hold of weekly or daily data, so although I find daily to be optimal, that doesn't help too many. Of course, if you can get the daily data from Robbie's backtester, then I'd guess that would resolve this problem.