Trading Metrics that Actually Matter

Traders love their performance metrics. Anyone who’s used their platform’s backtesting features has probably come across a few dozen of them, and everyone’s got their favorite. Anybody who’s anybody in the finance world has one named after them: Sharpe, Sortino, Calmar, Treynor, Gartman, etc. (OK, maybe not the last one). But which ones are the most important? There should be some kind of objective answer to this, right? If you look for the answer to this question on Google, you’ll be quickly overwhelmed. Most people want to give you 5+ different metrics you should be focusing on, all at once. But they can’t all be equally important; what we really want is a fitness function, or one statistic that can be optimized to compare all strategies against one another.

My last two articles covered position sizing from a practitioner’s perspective. In these articles, the idea of “Ideal f” (a truly uncatchy name) was introduced; this is the fraction of capital that should be allocated to an asset/trading strategy to maximize risk-adjusted compounded returns. If you haven’t read them, you should really take the time to do so now. You can read Part 1 here and Part 2 here.

Didn’t read them, did ya? That’s OK; the summary is this: Ideal f determines the fraction of staked capital that will return the highest compounded growth rate whilst constraining drawdown within the trader’s comfort level. Calculating this fraction also returns the projected geometric holding period return (GHPR) of the system at this level of leverage.
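For anyone who wants something concrete to hold on to, here is a minimal sketch of a GHPR calculation. It is not the code from the earlier articles: it assumes simple per-period returns and treats `f` as a plain multiplier on each return, which may differ from how Ideal f is actually defined there. The function name `ghpr` is just an illustrative choice.

```python
import math

def ghpr(returns, f=1.0):
    """Geometric holding period return per period with fraction f staked.

    `returns` are simple per-period returns (0.01 means +1%).
    Assumption: f scales each return linearly.
    """
    log_growth = 0.0
    for r in returns:
        growth = 1.0 + f * r
        if growth <= 0.0:
            return 0.0  # the account is wiped out at this stake
        log_growth += math.log(growth)
    # Geometric mean of the per-period growth factors
    return math.exp(log_growth / len(returns))

daily = [0.02, -0.01, 0.015, -0.03, 0.01]
print(ghpr(daily, f=1.0))
print(ghpr(daily, f=2.0))
```

A value above 1 means capital compounds upward per period at that stake; a value below 1 means it decays.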

This median GHPR, in my opinion, is the only metric that traders should really care about. The end goal is to achieve the highest possible compounded return, while ensuring that the path is smooth enough that you continue to follow the system. If you can’t stomach the system’s drawdowns, you’ll stop trading, and won’t achieve the returns. But at the end of the day, you just want to grow capital as much as possible.

However, there is one problem with using this metric: the time it takes to calculate it. One has to generate thousands of equity curves and calculate metrics based on each of them. While this is feasible to do once or a handful of times, when you want to use the metric as a criterion for an optimization that could include thousands of iterations, it quickly becomes unwieldy. What we seek instead are time-independent metrics that predict what the drawdown-constrained GHPR would be.
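To give a feel for why this gets expensive, here is a rough sketch of the kind of Monte Carlo loop involved. It is an assumption about the general shape of the procedure, not the exact method from the earlier articles: returns are resampled with replacement, `f` is a plain multiplier on each return, and the 95th-percentile drawdown is a hypothetical stand-in for "the trader's comfort level".

```python
import random
import statistics

def simulate_curve(returns, f, horizon, rng):
    """Resample returns with replacement; return (GHPR, max drawdown)."""
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for _ in range(horizon):
        # Floor the growth factor at zero so equity never goes negative
        equity *= max(1.0 + f * rng.choice(returns), 0.0)
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)
    return equity ** (1.0 / horizon), max_dd

def median_ghpr_and_drawdown(returns, f, n_curves=1000, horizon=252, seed=0):
    """Median GHPR and 95th-percentile max drawdown over simulated curves."""
    rng = random.Random(seed)
    results = [simulate_curve(returns, f, horizon, rng) for _ in range(n_curves)]
    ghprs = [g for g, _ in results]
    drawdowns = sorted(d for _, d in results)
    dd_95 = drawdowns[int(0.95 * (len(drawdowns) - 1))]
    return statistics.median(ghprs), dd_95

sample = [0.02, -0.01, 0.015, -0.03, 0.01]  # placeholder returns
print(median_ghpr_and_drawdown(sample, f=1.0, n_curves=200, horizon=100))
```

Every candidate value of `f` costs `n_curves * horizon` steps, and a search over `f` repeats that for each candidate. Run it inside an optimizer with thousands of iterations and the cost multiplies again.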

Introducing: The Metrics

So let’s look at some popular metrics that can be used to evaluate performance. Each of these can be easily applied to a distribution of trades (I prefer daily marked-to-market returns) agnostic to the order in which they occurred.
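As an illustration of what "agnostic to the order" looks like in practice, here is a sketch of a few common metrics that depend only on the distribution of returns. This is an illustrative selection, not necessarily the list used in the test, and the conventions are assumptions: a zero risk-free rate, no annualization, and a Sortino downside deviation taken over all observations.

```python
import math
import statistics

def sharpe(returns):
    """Mean return over sample standard deviation (risk-free rate assumed 0)."""
    return statistics.mean(returns) / statistics.stdev(returns)

def sortino(returns, target=0.0):
    """Mean excess return over downside deviation below `target`."""
    downside = math.sqrt(
        sum(min(r - target, 0.0) ** 2 for r in returns) / len(returns)
    )
    return (statistics.mean(returns) - target) / downside

def profit_factor(returns):
    """Sum of gains divided by the absolute sum of losses."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    return gains / losses

def win_rate(returns):
    """Fraction of periods with a positive return."""
    return sum(r > 0 for r in returns) / len(returns)

daily = [0.02, -0.01, 0.03, -0.02]
print(sharpe(daily), sortino(daily), profit_factor(daily), win_rate(daily))
```

Shuffling `daily` leaves all four numbers unchanged, which is the property we want. Note that the ratio functions raise `ZeroDivisionError` on degenerate inputs, such as a series with no losing periods.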

Metric Correlation to GHPR

Now that we have our metrics defined, we can see how they correlate with our median GHPR at Ideal f. To do so, we’ll simulate a bunch of trading system returns, record all the metrics for each, and see what patterns develop between them. For the tradeable asset universe, I’ve gathered data for five large-cap cryptocurrencies (BCH, BTC, ETH, LTC, and XRP) as well as 67 different ETFs since their inception. The returns used are the mark-to-market daily closing returns of the assets. Each “system” is a random sampling of returns assuming that the system was in the market 25% of the time. We can repeat this process about a thousand times and see what happens. If you’re following along in the notebook, now would be a great time to grab a coffee, walk the dog, call your mom (she’ll appreciate it), etc.
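The sampling step might look something like the sketch below. The interpretation is an assumption: each day the system is in the market with probability 25% and earns the asset's return, otherwise it is flat and earns zero. The asset returns here are synthetic placeholders, not the crypto and ETF data described above, and the names `random_system` and `make_systems` are hypothetical.

```python
import random

def random_system(asset_returns, exposure=0.25, rng=None):
    """Keep each daily return with probability `exposure`, else stay flat."""
    rng = rng or random.Random()
    return [r if rng.random() < exposure else 0.0 for r in asset_returns]

def make_systems(asset_returns, n_systems=1000, exposure=0.25, seed=0):
    """Generate many random 'systems' from one asset's daily returns."""
    rng = random.Random(seed)
    return [random_system(asset_returns, exposure, rng) for _ in range(n_systems)]

# Placeholder data: 500 synthetic daily returns standing in for a real asset
data_rng = random.Random(42)
asset = [data_rng.gauss(0.0005, 0.02) for _ in range(500)]

systems = make_systems(asset, n_systems=100)
print(len(systems), len(systems[0]))
```

Each resulting list can then be scored with whatever metrics are under test, alongside its simulated median GHPR.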

Results

Well, the results speak for themselves. Sharpe Ratio, the finance industry standard, is almost perfectly correlated to median GHPR. I have to admit, I was pretty surprised that such a simple metric correlated so strongly with our target. Going forward in my testing, I will probably be using Sharpe Ratio of returns as my fitness function of choice for model evaluation. The formula is easy to compute, easy to understand, and well-known by most everyone in the finance/trading industry.
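For reference, this kind of correlation can be measured with a plain Pearson coefficient between each metric and the median GHPR across all simulated systems. The sketch below assumes Pearson is the relevant measure (a rank correlation would be another reasonable choice), and the input lists in the usage comment are hypothetical names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Usage, assuming one value per simulated system in each list:
#   pearson(sharpes, median_ghprs)
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```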

Extension: Linear Regression

Most of you can probably stop reading now. The following is just an experiment to see if applying regression to all of the metrics returns a meaningful improvement over just using Sharpe Ratio by itself.
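The experiment can be sketched with ordinary least squares, assuming NumPy is available. This is not the notebook's code: the helper names are hypothetical, `X` stands for a matrix with one column per metric, and `y` for the median GHPR of each simulated system.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new feature rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check on an exactly linear relationship
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * X[:, 0]
coef = fit_ols(X, y)
print(coef, r_squared(y, predict(coef, X)))
```

For a fair comparison, fit on one subset of systems and score `r_squared` on a held-out subset. Do that once with the Sharpe column alone and once with every metric column.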

So using a linear combination of all the metrics actually degraded model accuracy. Certainly not a great argument for adding complexity. Next, we’ll see if using polynomial features improves the model at all.

Even more complexity, even worse results. I think I’ll just stop now. If anyone has a way to squeeze that last 3% out with more sophisticated models, I’d love to hear about it in the comments section!

Conclusion

It appears that in order to optimize for drawdown-constrained GHPR without generating thousands of curves, all we need to do is optimize for Sharpe Ratio. I’ve heard various arguments for why one shouldn’t use this metric, and you can find plenty of them out there on the web. However, I think this simple test has shown that there is merit to using it. Personally, I like it for its analogy to a z-score in statistics: the higher the Sharpe, the stronger the evidence that the true mean of returns is not 0.
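To make the z-score analogy concrete: under the simplifying assumptions of independent, identically distributed returns and a zero risk-free rate, the per-period Sharpe Ratio times the square root of the number of observations is the one-sample t-statistic for "the true mean return is zero". A minimal sketch, with a hypothetical function name:

```python
import math
import statistics

def sharpe_t_stat(returns):
    """t-statistic for mean(returns) == 0, written in terms of Sharpe.

    Assumes i.i.d. returns and a zero risk-free rate.
    """
    n = len(returns)
    sharpe = statistics.mean(returns) / statistics.stdev(returns)
    return sharpe * math.sqrt(n)

daily = [0.02, -0.01, 0.015, -0.03, 0.01, 0.005, -0.002, 0.012]
print(sharpe_t_stat(daily))
```

The same Sharpe measured over more observations gives a larger statistic, so the length of the return series matters as much as the ratio itself.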

I hope you found this informative and can use it to guide your testing/modeling decisions.