The Role of Advanced Models in Performance Boosting

In the second part of a two-part article, David Aronson, President of Hood River
Research, examines the modelling techniques used to arrive at a valuable predictor set for boosting ‘raw’ trading model performance.

The development of a boosting model is a two-stage process. The
first stage is discovering which, if any, of the many candidate
indicators proposed for consideration are helpful in predicting
signal returns. The second is establishing the shape of the
surface that best describes the relationship between the selected
indicators and signal returns. One widely used modelling technique
is multiple linear regression, which assumes that the surface is
linear (a flat plane with no hills or valleys). Only the slope of
the surface with respect to each axis (i.e. the weight of each
indicator) is left open to discovery. However, modern data
modelling techniques make it possible to drop the constraining
assumption of a flat surface, allowing the modelling procedure to
discover the most appropriate shape for the model's hyper-surface¹.
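To make the idea of a flat surface concrete, here is a minimal sketch of fitting a multiple linear regression by ordinary least squares. It illustrates the technique only and is not the author's software. The two indicators, the synthetic noise-free returns and the helper name fit_linear are all hypothetical. Whatever data it is given, such a fit can learn only an intercept and one slope (weight) per indicator.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares fit; returns [intercept, w1, ..., wk].

    Builds the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones, and solves them by Gauss-Jordan elimination
    with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]   # design matrix with intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Hypothetical data: two indicators and noise-free signal returns on a plane.
rng = random.Random(0)
X = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
y = [0.5 + 2.0 * x1 - 1.0 * x2 for x1, x2 in X]
weights = fit_linear(X, y)   # close to [0.5, 2.0, -1.0]
```

Because the synthetic returns were generated from a plane, the fitted weights come back close to the generating values. If the true relationship curved, the fit would still return a plane.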

Advanced modelling vs. linear regression

Multiple linear regression is powerful relative to the intuitive
judgment of human experts, but greater predictive power can be
attained with more advanced modelling methods that are not
constrained by the simplifying assumption of linearity. This
creates the opportunity for more accurate predictions of signal
outcomes. Advanced methods such as kernel regression can detect
complex non-linear relationships. Figures 1 to 5 illustrate how
more sophisticated non-linear modelling differs from traditional
linear regression. For simplicity, the illustrations depict a
single indicator (Xi) on the horizontal axis and the return
earned by the signal on the vertical axis. Figure 1 shows the
true functional relationship between signal returns and Xi. In
practice this relationship is unknown: the true shape of the
function sought in any predictive modelling problem is, by
definition, unknown and must be inferred from an observed sample
of data. Note that the relationship is not linear.

In Figure 2, a sample of trading signals is shown, with each
point representing a single signal. The position with respect to
the vertical axis is the return earned by the signal and the
position with respect to the horizontal axis is the value of
indicator Xi at the time the signal was given. For purposes of
illustration, the relationship between Xi and signal return has
been made obvious. A relationship this clear is unlikely to exist
in practice.

Figure 3 shows the model surface that results from modelling this
data with linear regression. This linear model is too simple.
Because it assumes that the model surface is flat (here, a straight
line) throughout the range of the predictor variable Xi, it makes
systematic errors; that is, the model is biased. In some ranges of
Xi the model's predictions of
signal return are systematically too low, while in other ranges
the predictions are systematically too high. Systematic errors
are symptomatic of a model surface that does not accurately
represent the underlying functional relationship between the
indicators and signal return.
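The systematic nature of these errors can be checked numerically. The following minimal sketch substitutes a hypothetical curved relationship (return = Xi squared plus Gaussian noise) for the unknown true function of Figure 1. It fits a straight line and then averages the residuals (actual minus predicted) within ranges of Xi. Purely random errors would average out to about zero in every range. A biased model leaves a consistent pattern. The helper names are illustrative.

```python
import random

def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_residual_by_range(xs, ys, predict, edges):
    """Average of (actual - predicted) for points with edges[j] <= x < edges[j+1]."""
    means = []
    for lo, hi in zip(edges, edges[1:]):
        res = [y - predict(x) for x, y in zip(xs, ys) if lo <= x < hi]
        means.append(sum(res) / len(res))
    return means

# Hypothetical curved "true" relationship: return = Xi**2 plus noise.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(400)]
ys = [x * x + rng.gauss(0, 0.05) for x in xs]

a, b = fit_line(xs, ys)
bias = mean_residual_by_range(xs, ys, lambda x: a + b * x, [-1.0, -0.5, 0.5, 1.01])
# Sign pattern (+, -, +): predictions too low at both ends, too high in the middle.
```

The residuals are positive at both ends of the Xi range and negative in the middle. This is the too-low/too-high pattern described above, and no amount of extra data removes it.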

One seemingly reasonable remedy is to choose a more complex form by
visual inspection of the data, such as a parabola. However, whether
the model is linear, quadratic (parabolic), cubic (with Xi raised to
the third power) or of any other form assumed prior to analysis, it
is constrained to adopt the assumed form. This is perfectly
legitimate when there is well established theory that suggests
what the correct functional form should be.
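Fitting an assumed form is still an ordinary least-squares problem, because a parabola is just a linear regression on Xi and Xi squared. The sketch below fits a quadratic to two synthetic relationships. The noisy data and the helper name fit_polynomial are hypothetical. When the assumed form matches the data the fit is good. When it does not (here a sine wave, chosen purely for illustration), no amount of data lets the parabola bend into the right shape.

```python
import math
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial of a fixed, pre-assumed degree.

    Returns [c0, c1, ..., c_degree] by solving the normal equations with
    Gauss-Jordan elimination and partial pivoting.
    """
    k = degree + 1
    rows = [[x ** p for p in range(k)] for x in xs]      # 1, x, x^2, ...
    M = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(rows, ys))]
         for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def mse_of_fit(xs, ys, coefs):
    """Mean squared residual of the polynomial with coefficients `coefs`."""
    preds = [sum(c * x ** p for p, c in enumerate(coefs)) for x in xs]
    return sum((y - q) ** 2 for y, q in zip(ys, preds)) / len(ys)

rng = random.Random(2)
xs = [rng.uniform(-1, 1) for _ in range(400)]
noise = [rng.gauss(0, 0.05) for _ in xs]
parabola = [x * x + e for x, e in zip(xs, noise)]          # matches the assumed form
wave = [math.sin(3 * x) + e for x, e in zip(xs, noise)]   # does not match it

mse_match = mse_of_fit(xs, parabola, fit_polynomial(xs, parabola, degree=2))
mse_mismatch = mse_of_fit(xs, wave, fit_polynomial(xs, wave, degree=2))
```

For the parabola, the residual error is roughly the noise variance. For the sine wave, it stays far larger because the assumed form cannot follow the data.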

However, for many phenomena characterised by high complexity and
high randomness, such as financial markets, there is no
well-established theory to support the choice of a particular
functional form. In these situations, constraining the analysis
to an assumed functional form is too limiting. Instead, an
approach is needed that does not require an assumption about the
shape of the model's surface: in other words, non-parametric
modelling.

Adapting non-parametric models

One example of non-parametric modelling is kernel regression.
Figure 4 shows the model surface that would be obtained by
applying kernel regression to the sample data. Note that the
discovered model surface conforms closely to the true
relationship depicted in Figure 1. Kernel regression estimates
the shape of the surface by estimating the value of the
dependent variable Y (signal return) within small local ranges
along the Xi axis. The simplest approach to kernel regression
takes an average of the Y values in each local Xi neighbourhood
(typically a weighted average, so that nearby observations count
more than distant ones). This average becomes the altitude of the
model surface in that region of Xi. A more sophisticated kernel
method fits a linear model within each small Xi neighbourhood.
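Both kernel methods just described can be sketched in a few lines. This is a minimal illustration that assumes a Gaussian kernel, a bandwidth h of 0.1 and a hypothetical curved data set. The function names are illustrative. The first function is the local-average (Nadaraya-Watson) estimator. The second fits a kernel-weighted straight line around each point and returns that line's height there.

```python
import math
import random

def gaussian(u):
    """Gaussian kernel (unnormalised; the constant cancels in the ratios below)."""
    return math.exp(-0.5 * u * u)

def kernel_average(x0, xs, ys, h):
    """Local-average (Nadaraya-Watson) estimate of Y at x0 with bandwidth h."""
    w = [gaussian((x - x0) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(x0, xs, ys, h):
    """Fit a kernel-weighted straight line around x0 and return its value at x0."""
    w = [gaussian((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

# Hypothetical curved relationship: return = Xi**2 plus noise.
rng = random.Random(3)
xs = [rng.uniform(-1, 1) for _ in range(1000)]
ys = [x * x + rng.gauss(0, 0.03) for x in xs]

# Trace the model surface on a grid of Xi values.
grid = [i / 10 for i in range(-8, 9)]
surface = [kernel_average(g, xs, ys, h=0.1) for g in grid]
surface_ll = [local_linear(g, xs, ys, h=0.1) for g in grid]
```

Evaluating either function over a grid of Xi values traces out the model surface, which here follows the curve rather than a straight line. The bandwidth h sets how local each neighbourhood is. Local linear fits are often preferred because they reduce the bias that simple local averaging shows near the edges of the data.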

Kernel regression is one of a family of advanced modelling methods
that do not impose the constraint of a particular functional form.
However, although this flexibility allows the model surface to
conform to the true underlying relationships in the data, it can,
if not properly constrained, cause the model to overfit the data.
The model's surface then becomes contaminated with random effects,
describing not only the authentic relationship between signal
returns and indicator readings but also the random variation in
the particular sample of data used to develop the
performance-boosting model (see Figure 5).
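Overfitting can be reproduced with a local-average kernel model by shrinking the bandwidth. This sketch uses hypothetical data and helper names, and the two bandwidths are arbitrary choices. With a very small bandwidth, each prediction depends almost entirely on the nearest observation, so the surface chases the noise. Error on the data used to build the model falls, while error on fresh data from the same process rises.

```python
import math
import random

def kernel_predict(x0, xs, ys, h):
    """Gaussian-kernel local average of ys around x0 with bandwidth h.

    Squared scaled distances are shifted by their minimum before
    exponentiating; this leaves the weighted average unchanged but stops
    every weight underflowing to zero when h is tiny.
    """
    u2 = [((x - x0) / h) ** 2 for x in xs]
    m = min(u2)
    w = [math.exp(-0.5 * (v - m)) for v in u2]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def mse(h, fit_x, fit_y, eval_x, eval_y):
    """Mean squared error on (eval_x, eval_y) of a model built from (fit_x, fit_y)."""
    errors = [(y - kernel_predict(x, fit_x, fit_y, h)) ** 2 for x, y in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)

rng = random.Random(4)

def sample(n):
    """Hypothetical data: return = Xi squared plus Gaussian noise (sd 0.1)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [x * x + rng.gauss(0, 0.1) for x in xs]

train_x, train_y = sample(300)
fresh_x, fresh_y = sample(300)

# Moderate bandwidth: a smooth surface.
smooth_train = mse(0.1, train_x, train_y, train_x, train_y)
smooth_fresh = mse(0.1, train_x, train_y, fresh_x, fresh_y)
# Tiny bandwidth: the surface bends to almost every individual point.
wiggly_train = mse(0.001, train_x, train_y, train_x, train_y)
wiggly_fresh = mse(0.001, train_x, train_y, fresh_x, fresh_y)
```

The overfitted model looks better on the data it was built from, but it predicts fresh signals worse.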

Overfitting can be mitigated by cross-validation, which involves
dividing the historical sample into three subsets: training,
testing and evaluation. Starting with the simplest possible model,
i.e. a single indicator and a flat model surface, the modelling
procedure tests a progression of increasingly complex models that
use more indicators and surfaces that bend ever more closely to
the data. Each candidate model is fitted on the training set and
its accuracy is measured on the testing set, to discover the degree
of complexity that yields the most accurate predictions.

The training and testing subsets are used to discover which of
the candidate indicators warrant a place in the model space and
how much the model's surface should be allowed to bend in order
to fit the data without becoming overfitted. Overfitting is
detected when an increase in model complexity improves
fit on the training set, but degrades fit on the testing set.
When the model of optimal complexity has finally been discovered,
then and only then is it applied to the evaluation data. This
third subset of data did not participate in the modelling
process, so it provides an unbiased estimate of the model's true
efficacy. An illustration of these concepts is shown in Figure 6.
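The selection procedure can be sketched with the same kind of kernel model. This is a simplified illustration under several assumptions. Complexity is represented only by the kernel bandwidth of a single hypothetical indicator: a very wide bandwidth gives a nearly flat surface, and a very narrow one a very bendy surface. The data are synthetic. A single training/testing split is used, as described above, rather than rotating k-fold cross-validation. Adding or removing indicators would be a further search dimension that is not shown here.

```python
import math
import random

def kernel_predict(x0, xs, ys, h):
    """Gaussian-kernel local average at x0 (weights shifted to avoid underflow)."""
    u2 = [((x - x0) / h) ** 2 for x in xs]
    m = min(u2)
    w = [math.exp(-0.5 * (v - m)) for v in u2]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def mse(h, fit, score):
    """MSE on `score` of the model built from `fit`; both are (xs, ys) pairs."""
    (fx, fy), (sx, sy) = fit, score
    return sum((y - kernel_predict(x, fx, fy, h)) ** 2 for x, y in zip(sx, sy)) / len(sx)

rng = random.Random(5)
xs = [rng.uniform(-1, 1) for _ in range(600)]
ys = [x * x + rng.gauss(0, 0.1) for x in xs]       # hypothetical truth + noise
train = (xs[:200], ys[:200])
test = (xs[200:400], ys[200:400])
evaluation = (xs[400:], ys[400:])                   # untouched until the very end

# Bandwidths from very wide (nearly flat surface) to very narrow (very bendy).
bandwidths = [2.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.005]
test_errors = {h: mse(h, train, test) for h in bandwidths}
best_h = min(test_errors, key=test_errors.get)

# Only now is the chosen model scored on the evaluation set.
final_error = mse(best_h, train, evaluation)
```

The evaluation set is scored exactly once, after the bandwidth has been fixed, so the selection does not influence its error. Both extremes lose on the testing set: the widest bandwidth is biased, and the narrowest overfits.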

For clarity, the foregoing illustrations were confined to a
single predictor variable. In practice, the number of predictors
will be larger. Because financial markets are complex non-linear
systems, it is likely that the governing predictive relationships
will also be non-linear. Nevertheless, linear regression should
not be discarded out of hand, as it can prove valuable when
combined with non-linear methods.

Prediction model ensembles

There are numerous advanced modelling techniques available that
each utilise a distinct mathematical paradigm and search
procedure. Therefore, if various modelling methods are applied to
the same set of signals, the resulting boosting models will differ
and will tend to make different, largely uncorrelated, prediction
errors. This leads naturally to the concept of ensembles of
prediction models. Just as in investing, where it makes sense to
combine securities whose returns are uncorrelated into a portfolio,
in performance boosting it makes sense to combine the predictions
of an ensemble of models whose prediction errors are uncorrelated.
The volatility of a portfolio's returns is analogous to the size of
the prediction errors made by a combined forecast. When the
predictions are combined, the errors tend to cancel one another
out, resulting in a combined prediction with a smaller average
error.
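The error-cancelling argument has a simple quantitative form. Suppose k models have prediction errors that each have variance σ² and pairwise correlation ρ. The error variance of their simple average is then σ²(ρ + (1 - ρ)/k). When the errors are uncorrelated (ρ = 0), it falls to σ²/k. The simulation below is a sketch with five hypothetical models whose errors are independent by construction. Real boosting models will have partly correlated errors, so the gain will be smaller.

```python
import random

rng = random.Random(6)
n_signals, n_models, sigma = 10_000, 5, 1.0
truth = [rng.gauss(0, 1) for _ in range(n_signals)]   # hypothetical true returns

# Each hypothetical model = truth + its own independent error.
models = [[t + rng.gauss(0, sigma) for t in truth] for _ in range(n_models)]
ensemble = [sum(preds) / n_models for preds in zip(*models)]

def mse(pred):
    """Mean squared prediction error against the true returns."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / n_signals

individual_mse = [mse(m) for m in models]   # each close to sigma**2 = 1.0
ensemble_mse = mse(ensemble)                # close to sigma**2 / n_models = 0.2
```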

A performance boosting case study

The following is an example of applying the boosting process to a
two-factor technical strategy that takes long positions in stocks.
The strategy, called the C-K rule, is based on a stock's recent
price change normalised by the stock's volatility and on its recent
change in trading volume. It buys stocks that have recently fallen
on declining trading volume.
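The article does not give the C-K rule's parameters, so the following is only a hypothetical sketch of how its two factors might be computed. The five-bar look-back, the 20-bar volatility window, the square-root-of-time scaling, the z-score threshold of -1 and the comparison of average volume across consecutive windows are all illustrative assumptions. This is not the published rule.

```python
import statistics

def ck_signal(closes, volumes, lookback=5, vol_window=20, z_threshold=-1.0):
    """Hypothetical C-K style entry test on the most recent bar.

    Factor 1: price change over `lookback` bars, divided by the daily-return
    standard deviation scaled to that horizon (a volatility-normalised move).
    Factor 2: average volume over the last `lookback` bars vs. the bars before.
    All parameter values are illustrative assumptions.
    """
    if len(closes) < max(lookback, vol_window) + 1 or len(volumes) < 2 * lookback:
        return False
    daily = [c1 / c0 - 1 for c0, c1 in zip(closes[-vol_window - 1:-1], closes[-vol_window:])]
    sigma = statistics.stdev(daily)
    if sigma == 0:
        return False
    move = closes[-1] / closes[-1 - lookback] - 1
    z = move / (sigma * lookback ** 0.5)
    recent_volume = sum(volumes[-lookback:]) / lookback
    prior_volume = sum(volumes[-2 * lookback:-lookback]) / lookback
    return z < z_threshold and recent_volume < prior_volume

# Illustrative series: a quiet range, then five down days.
closes = [100 + (i % 2) for i in range(21)] + [98, 96, 94, 92, 90]
falling_volume = [1000] * 21 + [900, 800, 700, 600, 500]
rising_volume = [1000] * 21 + [1100, 1200, 1300, 1400, 1500]
```

The same price drop triggers a signal when volume is falling but not when it is rising, which mirrors the "fallen on declining trading volume" condition.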

Figure 7 shows the performance of the boosting model on data that
was not used in the model's development (i.e. out-of-sample
validation data). This is the acid test of a boosting model's
efficacy. The boosting model was developed on an in-sample data
set (1984-1999) and was then used to predict returns for signals
in an out-of-sample data set (2000-2004). The out-of-sample
signals were then sorted into ten groups (deciles) according to
the boosting model's predicted return. As Figure 7 shows, the
signals predicted to do best (decile 10) did indeed have the
highest average return (+1.76 per cent), well above the average
of all signals (+0.37 per cent). The signals predicted to have the
worst returns (decile 1) earned an average return of -0.64 per
cent. By taking smaller positions on the signals predicted to do
worst (or avoiding them entirely) and larger than normal positions
on those predicted to do best, the C-K rule's overall returns
would have been higher than those of a strategy that takes the
same position size on every signal.
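The decile analysis and the position-sizing idea can be sketched as follows. The data here are synthetic, generated so that the predictions carry some information. They are not the article's 2000-2004 signals. The sizing scheme is a hypothetical example of the adjustment described above: no position in decile 1, a double position in decile 10 and a normal position otherwise. Returns are averaged per unit of position size and ignore transaction costs.

```python
import random

def assign_deciles(predicted, groups=10):
    """Label each signal 1..groups by rank of predicted return (groups = best)."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    labels = [0] * n
    for g in range(groups):
        for i in order[g * n // groups:(g + 1) * n // groups]:
            labels[i] = g + 1
    return labels

def mean_by_decile(labels, actual, groups=10):
    """Average realised return of the signals in each predicted-return decile."""
    sums, counts = [0.0] * groups, [0] * groups
    for d, r in zip(labels, actual):
        sums[d - 1] += r
        counts[d - 1] += 1
    return [s / c for s, c in zip(sums, counts)]

# Synthetic out-of-sample signals (per cent returns); not the article's data.
rng = random.Random(7)
predicted = [rng.gauss(0, 1) for _ in range(1000)]
actual = [0.4 + 0.5 * p + rng.gauss(0, 2) for p in predicted]

labels = assign_deciles(predicted)
decile_returns = mean_by_decile(labels, actual)

# Hypothetical sizing: skip decile 1, double up on decile 10, 1 unit otherwise.
size = {1: 0.0, 10: 2.0}
units = [size.get(d, 1.0) for d in labels]
equal_weight_return = sum(actual) / len(actual)
boosted_return = sum(u * r for u, r in zip(units, actual)) / sum(units)
```

With equal-sized deciles, this particular scheme raises the return per unit deployed by exactly one tenth of the gap between the decile 10 and decile 1 averages. It therefore helps whenever the model ranks the best signals above the worst.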