SHOGUN provides "static" interfaces to Matlab(tm), R, Python and Octave, and also provides a stand-alone command line executable. The idea behind the static interfaces is to provide a simple environment, just enough for simple experiments. For example, they allow you to train and evaluate a classifier, but not much beyond that. In case you are looking for virtually unlimited extensibility (multiple methods, such as classifiers, potentially sharing data and interacting), you may want to look at the modular interfaces instead.

In this tutorial we demonstrate how to use shogun to create a simple Gaussian-kernel-based support vector machine classifier. But first things first: let's start up R, Python, Octave or Matlab and load the shogun environment.

Starting SHOGUN

To start SHOGUN in Python, start python and type

from sg import sg

For R, issue (from within R)

library(sg)

For Octave and Matlab, just make sure sg is in the path (use addpath). For the cmdline interface, just start the shogun executable.

Now, in all languages,

sg('help')

and

help

in the cmdline interface will show the help screen. If it does not, consult Installation on how to install shogun.

Creating an SVM classifier

The rest of this tutorial assumes that the cmdline shogun executable is used (but gives hints on how things work in the other interfaces). The basic syntax is

<command> <option1> <option2> ...

where options are separated by spaces. For example,

set_kernel GAUSSIAN REAL 10 1.2

will create a Gaussian kernel that operates on real-valued features, uses a kernel cache of size 10 MB and has a kernel width of 1.2. The analogous command in the other interfaces (Python, R, ...) would look like

sg('set_kernel', 'GAUSSIAN', 'REAL', 10, 1.2)

Note that there is little difference between the interfaces: basically, strings are quoted and arguments are comma-separated.
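To make the kernel concrete: for two feature vectors x and y the Gaussian kernel computes exp(-||x - y||^2 / width). Here is a minimal numpy sketch, assuming shogun's convention of dividing the squared Euclidean distance directly by the width parameter (rather than by 2*sigma^2):

```python
import numpy as np

def gaussian_kernel(x, y, width=1.2):
    # Gaussian (RBF) kernel; the squared Euclidean distance is
    # divided directly by the width parameter (an assumption
    # about shogun's exact convention).
    return np.exp(-np.sum((x - y) ** 2) / width)

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
print(gaussian_kernel(x, y))  # exp(-2 / 1.2), roughly 0.189
```

A point compared with itself always yields kernel value 1, and the value decays towards 0 as the points move apart, at a rate controlled by the width.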

We now use samples from two random Gaussians as training data:

set_features TRAIN ../data/fm_train_real.dat

(For the other interfaces, something like

sg('set_features', 'TRAIN', [ randn(2, 100)-1, randn(2,100)+1 ])

would work).
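Without shogun at hand, equivalent toy data can be generated in plain numpy (the variable name and fixed seed are just for illustration): two clouds of 100 points each, drawn from unit-variance Gaussians centered at (-1, -1) and (+1, +1), with examples stored as columns:

```python
import numpy as np

rng = np.random.default_rng(0)
# 100 points from a Gaussian centered at (-1, -1) and
# 100 points from a Gaussian centered at (+1, +1),
# stored as one 2 x 200 matrix (one example per column).
traindata = np.hstack([rng.standard_normal((2, 100)) - 1,
                       rng.standard_normal((2, 100)) + 1])
print(traindata.shape)  # (2, 200)
```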

For training a supervised method like an SVM we need a labeling of the training data, which we set via

set_labels TRAIN ../data/label_train_twoclass.dat

(For the other interfaces, e.g. Matlab/Octave, something like

sg('set_labels', 'TRAIN', sign(randn(1, 100)))

would work)
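The matching two-class labels for the toy data described above (one label per training example, -1 for the first Gaussian cluster and +1 for the second) could be built like this in numpy:

```python
import numpy as np

# -1 for the 100 examples from the first cluster,
# +1 for the 100 examples from the second.
trainlab = np.concatenate([-np.ones(100), np.ones(100)])
print(trainlab.shape)  # (200,)
```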

Now we create an SVM and set the SVM regularization parameter C to some hopefully sane value (which in real applications needs tuning, as do the kernel parameters, here the kernel width):

new_classifier LIBSVM
c 1

We then train our SVM:

train_classifier

We can now apply our classifier to unseen test data by loading some test data and classifying the examples:

set_features TEST ../data/fm_test_real.dat
out.txt = classify
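If shogun itself is not installed, the overall flow of the experiment (generate data, train on labeled examples, classify unseen points) can still be sketched in plain numpy. Note that this uses a simple Parzen-window rule (average Gaussian-kernel similarity to each class) as a stand-in for LIBSVM, so it illustrates the workflow rather than shogun's API; all names and the seed are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
width = 1.2

# Two Gaussian clusters as training data, features as columns.
Xtrain = np.hstack([rng.standard_normal((2, 100)) - 1,
                    rng.standard_normal((2, 100)) + 1])
ytrain = np.concatenate([-np.ones(100), np.ones(100)])

# Unseen test data drawn from the same two clusters.
Xtest = np.hstack([rng.standard_normal((2, 50)) - 1,
                   rng.standard_normal((2, 50)) + 1])
ytest = np.concatenate([-np.ones(50), np.ones(50)])

def gaussian_gram(A, B, width):
    # Pairwise Gaussian kernel values between columns of A and B.
    d2 = ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)
    return np.exp(-d2 / width)

# Average kernel similarity to each class; classify a test point
# by whichever class it is more similar to (Parzen-window rule,
# standing in for the trained SVM above).
K = gaussian_gram(Xtrain, Xtest, width)          # shape (200, 100)
out = K[ytrain == 1].mean(axis=0) - K[ytrain == -1].mean(axis=0)
pred = np.sign(out)

accuracy = (pred == ytest).mean()
print("test accuracy:", accuracy)
```

Because the two clusters overlap only moderately, this simple rule already classifies most test points correctly; a trained SVM would typically do at least as well.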

In case we want to save the classifier, so that we don't have to perform the potentially time-consuming training again, we can save and load it like this:

save_classifier libsvm.model
load_classifier libsvm.model LIBSVM

Other interfaces (Python, R, ...) could use the load/save functions, but typically one manually obtains and restores the model parameters (e.g. the support vector coefficients and the bias).