Trading with Support Vector Machines (SVM)

Finally all the stars have aligned and I can confidently devote some time for back-testing of new trading systems, and Support Vector Machines (SVM) are the new “toy” which is going to keep me busy for a while.

SVMs are a well-known tool from the area of supervised Machine Learning, and they are used both for classification and regression. For more details refer to the literature.

It seems to me that the most intuitive application for trading is regression, so let’s start by building an SVM regression model.

Following our experience with ARMA+GARCH models, we will start by trying to forecast returns, instead of prices. Likewise, in our first tests, we will use only the returns of the previous 5 days as the features determining the return of a particular day. We will start with history of 500 days as the training set.

In more mathematical terms, for the training set we have N features, for each of them we have M samples. We also have M responses.

Given a row of feature values, the left matrix, the SVM is trained to produce the response value. In our specific example, we have five columns (features), each column corresponding to the returns with a different lag (from 1 to 5). We have 500 samples and the corresponding responses.

Once the SVM is trained on this set, we can start feeding it with sets of five features, corresponding to the returns for the five previous days, and the SVM will provide us with the response, which is the forecasted return. For example, after training the SVM on the previous 500 days, we will use the returns for days 500, 499, 498, 497 and 496 (these are ours as the input to obtain the forecasted return for day 501.

From all the packages available in R, I decided to choose the e1071 package. A close second choice was the kernlab package, which I am still planning to try in the future.

Then I tried a few strategies. First I tried something very similar to the ARMA+GARCH approach – the lagged returns from the five previous days. I was quite surprised to see this strategy performing better than the ARMA+GARCH (this is the home land of the ARMA+GARCH and I would have been quite happy just with comparable performance)!

Next, I tried to the same five features, but trying to select the best subset. The selection was done using a greedy approach, starting with 0 features, and interactively adding the feature which minimizes the error best. This approach improved things further.

Finally, I tried a different approach with about a dozen features. The features included returns over different period of time (1-day, 2-day, 5-day, etc), some statistics (mean, median, sd, etc) and volume. I used the same greedy approach to select features. This final system showed a very good performance as well, but it took a hell of a time to run.

Time to end this post, the back-testing results have to wait. Until then you can play with the full source code yourself. Here is an example of using it: