Many astute fund managers will tell you that any successful strategy decays over time. Market dynamics change, technologies evolve, and proprietary data eventually makes its way to the masses. At Lucena we stay ahead of the curve with new innovative techniques. Today, I’d like to talk about ensemble voting, a supervised learning technique in which a single strategy utilizes multiple independent learners to deliver investment decisions.

What is Ensemble Learning?

According to Wikipedia: “In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.”

The idea is to combine multiple independent weak models to deliver a higher quality forecast.

Image 1: Ensemble voting takes into account multiple independent “opinions,” derived from the same or different observations (data), and delivers a single combined score.

Ensemble voting is worth considering if only because, time and again, ensemble models have won prestigious machine learning competitions such as those hosted on Kaggle.
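Before digging into the theory, here is a minimal sketch of the voting idea in Python. The "buy"/"sell" calls below are hypothetical placeholders for the outputs of independent models; they are not real signals.

```python
# A minimal sketch of ensemble voting: several independent "weak" models
# each cast a vote, and the ensemble returns the majority decision.
from collections import Counter

def majority_vote(predictions):
    """Combine one prediction per model into a single decision."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model calls ("buy" / "sell") for a single asset.
votes = ["buy", "sell", "buy"]
decision = majority_vote(votes)
print(decision)  # → buy
```

If each model is independently right more often than not, the majority of an odd number of them is right more often than any single one, which is the intuition the rest of this post builds on.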

So, What is a Weak Model?

Without getting too deep into the mathematical representation of a model’s efficacy, a weak model is one in which the relationship between an observation (X) and its predicted outcome (Y) is inconsistent. Three main components contribute to the strength of a machine learning model:

Irreducible Error – Inherent noise in the data that cannot be reduced by a better algorithm. In general, all data contains some noise. If you measure the relationship between a house’s square footage and its sales price, for example, you will find that although prices are heavily influenced by square footage, it is not a perfect predictor. Using square footage to forecast a house’s price therefore carries some measurable noise.

Image 2: A Bayesian linear regression model depicting an outcome Y from an observation X. The relationship between X and Y is represented by a linear formula, and the distance above or below the perfect outcome (the red line) can be attributed to the inherent noise in the data.
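The noise floor can be illustrated numerically. The synthetic square-footage data below is invented for illustration: even when the fitted line recovers the true relationship almost exactly, the residual spread cannot drop below the noise that was baked into the prices.

```python
# Illustrative only: synthetic house prices with a known $20,000 noise term.
import numpy as np

rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=1000)
noise = rng.normal(0, 20_000, size=1000)      # irreducible noise in dollars
price = 150 * sqft + 50_000 + noise           # true generating process

# Ordinary least squares recovers the true line almost exactly...
slope, intercept = np.polyfit(sqft, price, 1)
residual_std = np.std(price - (slope * sqft + intercept))

# ...yet the residuals stay near the $20,000 noise level: no better model
# fit on this data can shrink them further.
print(round(slope), round(residual_std))
```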

Bias Error – Measures the average rate of divergence of predicted outcomes from actual outcomes. By traveling back in time and empirically comparing the model’s predictions to the actual outcomes, we can assess the model’s predictive success (also called the model’s average fit). A high bias score indicates that the model is not predictive: it is not flexible enough to capture patterns or trends and is missing important information in the data.

Variance – Measures how much the outcomes derived from similar observations differ across different training data. It is, for all intents and purposes, a measure of how overfit the model is. A model whose predictions correlate strongly with actual outcomes during the training timeframe, but measurably less so in another period, has a high variance error.
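Both error components can be measured empirically. The sketch below uses an invented linear process and polynomial fits as stand-ins for any supervised model: a constant model is weak through bias (it misses the trend), while an over-flexible degree-8 polynomial is weak through variance (its predictions swing from one training sample to the next).

```python
# Illustrative bias/variance measurement on synthetic data: repeatedly draw
# fresh training sets from y = 2x + noise, fit models of varying complexity,
# and probe their predictions at a single test point.
import numpy as np

rng = np.random.default_rng(42)
x_test, truth = 0.9, 2 * 0.9                  # true process: y = 2x (+ noise)

def predict_after_fresh_training(degree):
    x = rng.uniform(0, 1, 30)
    y = 2 * x + rng.normal(0, 0.3, 30)
    return np.polyval(np.polyfit(x, y, degree), x_test)

results = {}
for degree in (0, 1, 8):
    preds = np.array([predict_after_fresh_training(degree) for _ in range(200)])
    results[degree] = (abs(preds.mean() - truth), preds.var())

bias0, var0 = results[0]   # constant model: high bias
bias1, var1 = results[1]   # correct linear model: low bias, low variance
bias8, var8 = results[8]   # over-flexible model: inflated variance
print(f"constant bias={bias0:.2f}  linear bias={bias1:.2f}  "
      f"degree-8 variance={var8:.4f} vs linear variance={var1:.4f}")
```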

So, what does it all mean?

First, irreducible error, or noise, is a function of the data and cannot be improved with better machine-learning models. The only remedy for noisy data is a different, less noisy data set. For our discussion today, let’s assume that our data is predictive, with a reasonable measure of noise.

The ultimate goal of an ensemble learner is to train multiple weak models, whether weak due to high bias or high variance, and combine them to produce lower overall bias and variance. Low bias and low variance are the ultimate goals of a supervised ensemble learner and would most likely yield a strong predictive outcome. Unfortunately, bias and variance are both functions of a model’s complexity and are somewhat inversely correlated, so improving one normally comes at the expense of the other. The ensemble learner is ultimately tasked with finding the optimal trade-off between the two. A more complex model (one with many factors) is more likely to fall apart out of sample due to overfitting (high variance). Conversely, a model with too few factors is too generic and is more prone to bias error.
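A back-of-the-envelope check shows why combining models lowers variance. The numbers below are invented, not a market model: averaging K uncorrelated predictors, each with the same error variance, shrinks the ensemble variance roughly K-fold while leaving the bias of the individuals unchanged.

```python
# Synthetic demonstration: each of 9 models predicts truth plus its own
# independent error; the equal-weight average has ~1/9 the variance.
import numpy as np

rng = np.random.default_rng(7)
n_trials, k_models = 10_000, 9
truth = 1.0

# Each model's prediction = truth + an independent error with std 0.5.
single = truth + rng.normal(0, 0.5, size=(n_trials, k_models))
ensemble = single.mean(axis=1)          # equal-weight vote over 9 models

# Single-model variance ≈ 0.25; the 9-model average ≈ 0.25 / 9.
print(round(single[:, 0].var(), 3), round(ensemble.var(), 3))
```

This is also why correlation matters so much: if the nine error terms were identical rather than independent, averaging would reduce nothing at all.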

Image 3: Relationship between model complexity and error level. The more complex the model, the lower its bias but the more subject to overfitting (high variance) it becomes. Conversely, the simpler the model, the higher its bias. Total error is minimized at an intermediate level of complexity. Credit: Scott Fortmann-Roe, http://scott.fortmann-roe.com/docs/BiasVariance.html.

Conclusion
By combining multiple strategies into one, we reduce the likelihood of overfitting and consequently increase the model’s robustness and ultimate success. There are various ways by which an ensemble voting strategy can combine multiple scores into one. The simplest form is to consider all votes equally. More sophisticated strategies apply rules by which they give greater weight to votes that were successful historically. Some of the common scoring methods used in supervised ensemble learning are:

Bagging – Averaging the scores of similar models trained on different sets of data (or timeframes).

Boosting – A roll-forward process in which historically successful models are given higher weight in the total score.

Stacking – Creating a machine learning model from the output of various independent learners.
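The mechanics of the three methods can be sketched with toy "models" (plain functions). These are illustrative stand-ins with invented numbers, not production implementations; real libraries offer full versions of each.

```python
# Pure-Python sketches of the three combination schemes described above.
import statistics

# Bagging: average the scores of similar models (each trained elsewhere
# on a different data sample).
def bagging(models, x):
    return statistics.mean(m(x) for m in models)

# Boosting-style weighting: historically successful models count more.
def weighted_vote(models, weights, x):
    return sum(w * m(x) for m, w in zip(models, weights)) / sum(weights)

# Stacking: a second-level model consumes the first-level outputs.
def stacking(models, meta_model, x):
    return meta_model([m(x) for m in models])

# Hypothetical first-level scorers for some asset feature x.
models = [lambda x: 0.6 * x, lambda x: 0.4 * x + 0.1, lambda x: 0.5 * x]
weights = [0.5, 0.2, 0.3]            # e.g. derived from historical hit rates
meta = lambda outputs: max(outputs)  # trivially "learned" combiner

print(bagging(models, 1.0))             # mean of (0.6, 0.5, 0.5)
print(weighted_vote(models, weights, 1.0))
print(stacking(models, meta, 1.0))      # → 0.6
```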

At Lucena, we recently deployed on QuantDesk® the ability to combine multiple independent event studies into a single backtest. All models must agree on a particular asset before an entry is selected. The strength of this approach depends heavily on how different (that is, how uncorrelated) the models are. If we apply ensemble voting to similar models, the value is not much different from using a single model. However, if we can truly combine uncorrelated models that agree on an outcome, the likelihood of success increases substantially.
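The unanimity rule can be expressed as a set intersection. The ticker symbols and the interface below are hypothetical, chosen only to illustrate the rule, not QuantDesk®'s actual API.

```python
# Sketch of the "all models must agree" entry rule: an entry is taken only
# for assets flagged by every independent event-study model.
def unanimous_entries(model_signals):
    """model_signals: list of sets, each holding the assets one model flags."""
    return set.intersection(*model_signals)

signals = [
    {"AAPL", "MSFT", "XOM"},   # model 1's flagged assets (hypothetical)
    {"AAPL", "MSFT"},          # model 2
    {"AAPL", "GE"},            # model 3
]
print(unanimous_entries(signals))  # → {'AAPL'}
```

Requiring unanimity trades fewer entries for higher conviction, which is exactly where uncorrelated models pay off: agreement among correlated models is cheap, while agreement among independent ones is informative.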

Our preliminary results are rather encouraging and we are in the process of creating smart alpha and smart beta data feeds using intelligent ensemble learning techniques. Stay tuned, as more on this is soon to follow.

Image 4: Ensemble learner cross-validation backtest report. Three event study models combined into one strategy, with a crossover on 1/1/2016. As can be seen from the cone in the top image, out-of-sample performance is slightly above average.