% In this example a two-class linear support vector machine classifier is trained
% on a toy data set and the trained classifier is then used to predict labels of
% test examples. As the training algorithm, the Stochastic Gradient Descent (SGD)
% solver is used with the SVM regularization parameter C=0.42 and the bias term in
% the classification rule switched off. The solver is described as iterating until
% the maximal training time (max_train_time=60 seconds) is exceeded; the script
% below does not set this limit explicitly and only sets svm_epsilon to 1e-5.
%
% For more details on the SGD solver see
% L. Bottou, O. Bousquet. The tradeoff of large scale learning. In NIPS 20. MIT
% Press. 2008.
% SVMSGD
print SVMSGD
%
set_features TRAIN ../data/fm_train_sparsereal.dat
set_labels TRAIN ../data/label_train_twoclass.dat
new_classifier SVMSGD
svm_epsilon 1e-5
svm_use_bias 0
c 0.42
train_classifier
set_features TEST ../data/fm_test_sparsereal.dat
out-classifier_svmsgd.txt = classify
! rm out-classifier_svmsgd.txt
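
As a rough illustration of what an SGD solver for a linear SVM without a bias term does, the following Python sketch minimizes the regularized hinge loss. It is not shogun's SVMSGD implementation: the function names, the step-size schedule, and the mapping from C to a regularization weight (lam = 1 / (C * n)) are assumptions made for this example.

```python
import random

def sgd_linear_svm(X, y, C=0.42, epochs=50, seed=0):
    """Train w for the rule sign(w . x) (no bias) on the regularized hinge loss."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    lam = 1.0 / (C * n)              # assumed mapping from C to a regularizer
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)           # visit the examples in random order
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)    # decreasing step size (assumption)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # gradient step on the regularizer shrinks w ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and the hinge loss contributes only if the margin is violated
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    """Classify without a bias term, as with svm_use_bias 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

The labels are expected to be +1 and -1, matching a two-class problem.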

Clustering

examples/documented/cmdline_static/clustering_hierarchical.sg

% In this example an agglomerative hierarchical single linkage clustering method
% is used to cluster a given toy data set. Starting with each object assigned to
% its own cluster, clusters are merged iteratively. In each step the two clusters
% whose closest elements have the minimum distance (here measured by the
% Euclidean distance object) are merged.
% Hierarchical
print Hierarchical
set_features TRAIN ../data/fm_train_real.dat
set_distance EUCLIDEAN REAL
new_clustering HIERARCHICAL
train_clustering 3
merge_distance.txt, pairs.txt = get_clustering
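
The merging rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not shogun's code: the function name, the num_merges parameter, and the returned structures are assumptions chosen to mirror the idea of merge distances and merged pairs.

```python
import math

def single_linkage(points, num_merges):
    """Perform num_merges single linkage merges with the Euclidean distance."""
    clusters = [[i] for i in range(len(points))]   # every object starts alone
    merge_distance, pairs = [], []
    for _ in range(min(num_merges, len(points) - 1)):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the two closest elements
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merge_distance.append(d)
        pairs.append((list(clusters[a]), list(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]
    return merge_distance, pairs, clusters
```

The quadratic search over cluster pairs keeps the sketch short; it is not meant to be efficient.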

examples/documented/cmdline_static/clustering_kmeans.sg

% In this example the k-means clustering method is used to cluster a given toy
% data set. In k-means clustering one tries to partition n observations into k
% clusters in which each observation belongs to the cluster with the nearest mean.
% The algorithm class constructor takes the number of clusters and a distance to
% be used as input. The distance used in this example is Euclidean distance.
% After training one can fetch the result of clustering by obtaining the cluster
% centers and their radii.
% KMEANS
print KMeans
set_features TRAIN ../data/fm_train_real.dat
set_distance EUCLIDEAN REAL
new_clustering KMEANS
train_clustering 3 1000
radi.txt, centers.txt = get_clustering
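
The following Python sketch shows the basic k-means iteration (assign each point to the nearest mean, then recompute the means). It is a simplified illustration under stated assumptions: the initialization from the first k points and the definition of a radius as the largest member-to-center distance are choices made for this example and may differ from what shogun computes.

```python
import math

def kmeans(points, k, max_iter=1000):
    """Return (centers, radii) after Lloyd-style iterations."""
    centers = [list(p) for p in points[:k]]        # naive initialization (assumption)
    assign = None
    for _ in range(max_iter):
        # assignment step: nearest center under the Euclidean distance
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:                   # converged
            break
        assign = new_assign
        # update step: each center becomes the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    # radius of a cluster: largest distance of a member to its center (assumption)
    radii = [max((math.dist(p, centers[c])
                  for p, a in zip(points, assign) if a == c), default=0.0)
             for c in range(k)]
    return centers, radii
```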

Distance

examples/documented/cmdline_static/distance_braycurtis.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'BRAYCURTIS'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance matrix is computed by
% 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance matrix between
% these two matrices is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CBrayCurtisDistance.html.
%
% Obviously, using the Bray Curtis distance is not limited to this showcase
% example.
% BrayCurtis Distance
print BrayCurtis Distance
set_distance BRAYCURTIS REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
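
To make the TRAIN/TEST pattern concrete, here is a small Python sketch that treats each matrix column as one data point and builds a pairwise distance matrix. The Bray-Curtis formula used is the common textbook definition (sum of absolute differences divided by sum of absolute sums); the helper names are hypothetical and this is not shogun's implementation.

```python
def columns(matrix):
    """Return the columns of a row-major matrix; each column is one data point."""
    return [list(col) for col in zip(*matrix)]

def bray_curtis(x, y):
    """Textbook Bray-Curtis dissimilarity between two real-valued vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(abs(a + b) for a, b in zip(x, y))
    return num / den if den else 0.0

def distance_matrix(lhs, rhs, dist):
    """Entry [i][j] is the distance between column i of lhs and column j of rhs."""
    return [[dist(x, y) for y in columns(rhs)] for x in columns(lhs)]
```

Calling distance_matrix(train, train, bray_curtis) corresponds to the TRAIN case, and distance_matrix(train, test, bray_curtis) to the TEST case described above.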

examples/documented/cmdline_static/distance_canberra.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'CANBERRA'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance (dissimilarity ratio) matrix is
% computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance (dissimilarity ratio)
% matrix between these two data sets is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CCanberraMetric.html.
%
% Obviously, using the Canberra distance is not limited to this showcase
% example.
% Canberra Metric
print CanberraMetric
set_distance CANBERRA REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST

examples/documented/cmdline_static/distance_canberraword.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored data sets in 'STRING' representation
% (feature type 'CHAR' with alphabet 'DNA') from different files and
% initializes the distance to 'CANBERRA' with feature type 'WORD'.
%
% Data points in this example are defined by the transformation function
% 'convert' and the preprocessing step applied afterwards (defined by
% 'add_preproc' and preprocessor 'SORTWORDSTRING').
%
% The target 'TRAIN' for 'set_features' controls the binding of the given
% data points. In order to compute a pairwise distance matrix by
% 'get_distance_matrix', we have to perform two preprocessing steps for
% input data 'TRAIN'. The method 'convert' transforms the input data to
% a string representation suitable for the selected distance. The individual
% strings are sorted in ascending order after the execution of 'attach_preproc'.
% A pairwise distance matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the binding of the given
% data points 'TRAIN' and 'TEST'. In order to compute a pairwise distance
% matrix between these two data sets by 'get_distance_matrix', we have to
% perform two preprocessing steps for input data 'TEST'. The method 'convert'
% transforms the input data 'TEST' to a string representation suitable for
% the selected distance. The individual strings are sorted in ascending order
% after the execution of 'attach_preproc'. A pairwise distance matrix between
% the data sets 'TRAIN' and 'TEST' is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see
% doc/classshogun_1_1CSortWordString.html,
% doc/classshogun_1_1CPreprocessor.html,
% doc/classshogun_1_1CStringFeatures.html (method obtain_from_char_features) and
% doc/classshogun_1_1CCanberraWordDistance.html.
%
% Obviously, using the Canberra word distance is not limited to this showcase
% example.
% CanberraWord Distance
print CanberraWordDistance
set_distance CANBERRA WORD
add_preproc SORTWORDSTRING
set_features TRAIN ../data/fm_train_dna.dat DNA
convert TRAIN STRING CHAR STRING WORD 3 2 0 n
attach_preproc TRAIN
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
convert TEST STRING CHAR STRING WORD 3 2 0 n
attach_preproc TEST
dm_test.txt = get_distance_matrix TEST
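
The two preprocessing steps and the word-based distance can be illustrated with a short Python sketch. It assumes that words are overlapping 3-mers of the DNA string and that the Canberra word distance sums |c_a(w) - c_b(w)| / (c_a(w) + c_b(w)) over all words w occurring in either string; the meaning of the remaining 'convert' arguments is not modelled, and the function names are hypothetical.

```python
from collections import Counter

def sorted_words(seq, order=3):
    """Split a string into overlapping words of length `order`, sorted ascending."""
    return sorted(seq[i:i + order] for i in range(len(seq) - order + 1))

def canberra_word(a, b, order=3):
    """Canberra distance on word counts (assumed definition, see lead-in)."""
    ca = Counter(sorted_words(a, order))
    cb = Counter(sorted_words(b, order))
    # a Counter returns 0 for words that are missing from one of the strings
    return sum(abs(ca[w] - cb[w]) / (ca[w] + cb[w]) for w in set(ca) | set(cb))
```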

examples/documented/cmdline_static/distance_chebyshew.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'CHEBYSHEW'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance matrix (maximum of absolute feature
% dimension differences) is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance matrix (maximum
% of absolute feature dimension differences) between these two data sets is
% computed.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CChebyshewMetric.html.
%
% Obviously, using the Chebyshew distance is not limited to this showcase
% example.
% Chebyshew Metric
print ChebyshewMetric
set_distance CHEBYSHEW REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
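
The "maximum of absolute feature dimension differences" mentioned above amounts to the following one-liner, shown here as a plain Python sketch with a hypothetical function name.

```python
def chebyshev(x, y):
    """Largest absolute difference over all feature dimensions."""
    return max(abs(a - b) for a, b in zip(x, y))
```

For example, the vectors [1, 5, 2] and [4, 1, 2] differ by 3, 4 and 0 in their dimensions, so the distance is 4.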

examples/documented/cmdline_static/distance_chisquare.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'CHISQUARE'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance matrix is computed by
% 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance matrix between
% these two matrices is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CChiSquareDistance.html.
%
% Obviously, using the ChiSquare distance is not limited to this showcase
% example.
% ChiSquare Distance
print ChiSquareDistance
set_distance CHISQUARE REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST

examples/documented/cmdline_static/distance_cosine.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'COSINE'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance matrix is computed by
% 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance matrix between
% these two data sets is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CCosineDistance.html.
%
% Obviously, using the Cosine distance is not limited to this showcase
% example.
% Cosine Distance
print CosineDistance
set_distance COSINE REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'GEODESIC'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance (shortest path on a sphere) matrix is
% computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance (shortest path on
% a sphere) matrix between these two data sets is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CGeodesicMetric.html.
%
% Obviously, using the Geodesic distance is not limited to this showcase
% example.
% Geodesic Metric
print GeodesicMetric
set_distance GEODESIC REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
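
A "shortest path on a sphere" between two vectors can be read as the angle between them. The Python sketch below computes that angle as the arc cosine of the normalized dot product; this reading and the function name are assumptions of the example, and the sketch expects non-zero vectors.

```python
import math

def geodesic(x, y):
    """Angle (in radians) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)
```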

examples/documented/cmdline_static/distance_hammingword.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored data sets in 'STRING' representation
% (feature type 'CHAR' with alphabet 'DNA') from different files and
% initializes the distance to 'HAMMING' with feature type 'WORD'.
%
% Data points in this example are defined by the transformation function
% 'convert' and the preprocessing step applied afterwards (defined by
% 'add_preproc' and preprocessor 'SORTWORDSTRING').
%
% The target 'TRAIN' for 'set_features' controls the binding of the given
% data points. In order to compute a pairwise distance matrix by
% 'get_distance_matrix', we have to perform two preprocessing steps for
% input data 'TRAIN'. The method 'convert' transforms the input data to
% a string representation suitable for the selected distance. The individual
% strings are sorted in ascending order after the execution of 'attach_preproc'.
% A pairwise distance matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the binding of the given
% data points 'TRAIN' and 'TEST'. In order to compute a pairwise distance
% matrix between these two data sets by 'get_distance_matrix', we have to
% perform two preprocessing steps for input data 'TEST'. The method 'convert'
% transforms the input data 'TEST' to a string representation suitable for
% the selected distance. The individual strings are sorted in ascending order
% after the execution of 'attach_preproc'. A pairwise distance matrix between
% the data sets 'TRAIN' and 'TEST' is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see
% doc/classshogun_1_1CSortWordString.html,
% doc/classshogun_1_1CPreprocessor.html,
% doc/classshogun_1_1CStringFeatures.html (method obtain_from_char_features) and
% doc/classshogun_1_1CHammingWordDistance.html.
%
% Obviously, using the Hamming word distance is not limited to this showcase
% example.
% HammingWord Distance
print HammingWordDistance
set_distance HAMMING WORD
add_preproc SORTWORDSTRING
set_features TRAIN ../data/fm_train_dna.dat DNA
convert TRAIN STRING CHAR STRING WORD 3 2 0 n
attach_preproc TRAIN
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
convert TEST STRING CHAR STRING WORD 3 2 0 n
attach_preproc TEST
dm_test.txt = get_distance_matrix TEST

examples/documented/cmdline_static/distance_jensen.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'JENSEN'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance (divergence measure based on the
% Kullback-Leibler divergence) matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance (divergence measure
% based on the Kullback-Leibler divergence) matrix between these two data sets
% is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CJensenMetric.html.
%
% Obviously, using the Jensen-Shannon distance/divergence is not limited to
% this showcase example.
% Jensen Metric
print JensenMetric
set_distance JENSEN REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
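
The divergence measure based on the Kullback-Leibler divergence can be sketched as follows. Each vector is compared against the component-wise average of both vectors; terms with a zero component are skipped. Scaling conventions vary (some definitions include a factor of 1/2), so the exact constant here is an assumption, as is the function name.

```python
import math

def jensen_shannon(x, y):
    """Sum of the two KL-style terms of x and y against their average."""
    total = 0.0
    for a, b in zip(x, y):
        m = 0.5 * (a + b)            # component of the average vector
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total
```

The inputs are expected to be non-negative, for example histograms or probability vectors.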

examples/documented/cmdline_static/distance_manhatten.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'MANHATTAN'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance (sum of absolute feature
% dimension differences) matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance (sum of absolute
% feature dimension differences) matrix between these two data sets is
% computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CManhattanMetric.html.
%
% Obviously, using the Manhattan distance is not limited to this showcase
% example.
% Manhattan Metric
print ManhattanMetric
set_distance MANHATTAN REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
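
The "sum of absolute feature dimension differences" is straightforward to write down; the Python sketch below uses a hypothetical function name.

```python
def manhattan(x, y):
    """Sum of absolute differences over all feature dimensions."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

For the vectors [1, 2, 3] and [4, 0, 3] the per-dimension differences are 3, 2 and 0, giving a distance of 5.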

examples/documented/cmdline_static/distance_manhattenword.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored data sets in 'STRING' representation
% (feature type 'CHAR' with alphabet 'DNA') from different files and
% initializes the distance to 'MANHATTAN' with feature type 'WORD'.
%
% Data points in this example are defined by the transformation function
% 'convert' and the preprocessing step applied afterwards (defined by
% 'add_preproc' and preprocessor 'SORTWORDSTRING').
%
% The target 'TRAIN' for 'set_features' controls the binding of the given
% data points. In order to compute a pairwise distance matrix by
% 'get_distance_matrix', we have to perform two preprocessing steps for
% input data 'TRAIN'. The method 'convert' transforms the input data to
% a string representation suitable for the selected distance. The individual
% strings are sorted in ascending order after the execution of 'attach_preproc'.
% A pairwise distance matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the binding of the given
% data points 'TRAIN' and 'TEST'. In order to compute a pairwise distance
% matrix between these two data sets by 'get_distance_matrix', we have to
% perform two preprocessing steps for input data 'TEST'. The method 'convert'
% transforms the input data 'TEST' to a string representation suitable for
% the selected distance. The individual strings are sorted in ascending order
% after the execution of 'attach_preproc'. A pairwise distance matrix between
% the data sets 'TRAIN' and 'TEST' is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see
% doc/classshogun_1_1CSortWordString.html,
% doc/classshogun_1_1CPreprocessor.html,
% doc/classshogun_1_1CStringFeatures.html (method obtain_from_char_features) and
% doc/classshogun_1_1CManhattanWordDistance.html.
%
% Obviously, using the Manhattan word distance is not limited to this showcase
% example.
% ManhattanWord Distance
print ManhattanWordDistance
set_distance MANHATTAN WORD
add_preproc SORTWORDSTRING
set_features TRAIN ../data/fm_train_dna.dat DNA
convert TRAIN STRING CHAR STRING WORD 3 2 0 n
attach_preproc TRAIN
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
convert TEST STRING CHAR STRING WORD 3 2 0 n
attach_preproc TEST
dm_test.txt = get_distance_matrix TEST

examples/documented/cmdline_static/distance_minkowski.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'MINKOWSKI' with
% norm 'k' (here k=3). Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance matrix is computed by
% 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance matrix between
% these two data sets is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CMinkowskiMetric.html.
%
% Obviously, using the Minkowski metric is not limited to this showcase
% example.
% Minkowski Metric
print MinkowskiMetric
set_distance MINKOWSKI REAL 3
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
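
The role of the norm parameter 'k' is easiest to see in code. The Python sketch below implements the standard Minkowski distance; the function name is hypothetical and the default k=3 is chosen only to match the value passed in the script.

```python
def minkowski(x, y, k=3.0):
    """k-th root of the sum of k-th powers of the absolute differences."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)
```

With k=1 this reduces to the Manhattan distance and with k=2 to the Euclidean distance.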

examples/documented/cmdline_static/distance_tanimoto.sg

% The approach applied below shows how to process input data from a file,
% which is a crucial step when writing your own sample applications.
% This approach is just one example of what can be done using the distance
% functions provided by shogun.
%
% First, you need to determine what type your data will be, because this
% will determine the distance function you can use.
%
% This example loads two stored matrices of real values (feature type 'REAL')
% from different files and initializes the distance to 'TANIMOTO'.
% Each column of the matrices corresponds to one data point.
%
% The target 'TRAIN' for 'set_features' controls the processing of the given
% data points, where a pairwise distance (extended Jaccard coefficient)
% matrix is computed by 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix' and
% target 'TRAIN'.
%
% The target 'TEST' for 'set_features' controls the processing of the given
% data points 'TRAIN' and 'TEST', where a pairwise distance (extended
% Jaccard coefficient) matrix between these two data sets is computed by
% 'get_distance_matrix'.
%
% The resulting distance matrix can be reaccessed by 'get_distance_matrix'
% and target 'TEST'. The 'TRAIN' distance matrix no longer exists.
%
% For more details see doc/classshogun_1_1CTanimotoDistance.html.
%
% Obviously, using the Tanimoto distance/coefficient is not limited to
% this showcase example.
% Tanimoto Distance
print TanimotoDistance
set_distance TANIMOTO REAL
set_features TRAIN ../data/fm_train_real.dat
dm_train.txt = get_distance_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
dm_test.txt = get_distance_matrix TEST
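
The extended Jaccard (Tanimoto) coefficient for real-valued vectors is commonly defined as the dot product divided by the sum of the squared norms minus the dot product. The Python sketch below computes that coefficient; whether and how it is turned into a distance is not covered here, and the function name is hypothetical.

```python
def tanimoto_coefficient(x, y):
    """Extended Jaccard coefficient of two real-valued, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    return dot / (sq_x + sq_y - dot)
```

Identical vectors give a coefficient of 1 and orthogonal vectors give 0.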

% This is an example for the initialization of the diag-kernel.
% The diag kernel has all kernel matrix entries but those on
% the main diagonal set to zero.
% Diag
print Diag
set_kernel DIAG REAL 10 23.
set_features TRAIN ../data/fm_train_real.dat
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
km_test.txt = get_kernel_matrix TEST
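
A kernel matrix of this shape can be sketched in one line of Python. The example assumes that the last argument of the script (23.) is the value placed on the main diagonal; that interpretation and the function name are assumptions.

```python
def diag_kernel_matrix(n, diag=23.0):
    """n x n matrix with `diag` on the main diagonal and zeros elsewhere."""
    return [[diag if i == j else 0.0 for j in range(n)] for i in range(n)]
```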

% This is an example for the initialization of a linear kernel on string data. The
% strings are all of the same length and consist of the characters 'ACGT' corresponding
% to the DNA-alphabet. Each column of the matrices of type char corresponds to
% one training/test example.
% Linear String
print LinearString
set_kernel LINEAR CHAR 10
set_features TRAIN ../data/fm_train_dna.dat DNA
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
km_test.txt = get_kernel_matrix TEST

% This example initializes the locality improved string kernel. The locality improved string
% kernel is defined on sequences of the same length and inspects letters matching at
% corresponding positions in both sequences. The kernel sums over all matches in windows of
% length l and takes this sum to the power of 'inner_degree'. The sum over all these
% terms along the sequence is taken to the power of 'outer_degree'.
% Locality Improved String
print LocalityImprovedString
set_kernel LIK CHAR 10 5 5 7
set_features TRAIN ../data/fm_train_dna.dat DNA
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
km_test.txt = get_kernel_matrix TEST
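
The description above translates into the following simplified Python sketch. It uses plain, unweighted windows of length `length` that slide along the sequence; the actual kernel may weight or center the windows differently, and the parameter names and defaults are hypothetical rather than taken from the script arguments.

```python
def locality_improved(a, b, length=3, inner_degree=2, outer_degree=1):
    """Simplified locality improved kernel on two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    match = [1 if x == y else 0 for x, y in zip(a, b)]   # position-wise matches
    total = 0
    for start in range(len(match) - length + 1):
        window = sum(match[start:start + length])        # matches in this window
        total += window ** inner_degree
    return total ** outer_degree
```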

% This example initializes the polynomial kernel with real data.
% If variable 'inhomogene' is 'true', +1 is added to the scalar product
% before taking it to the power of 'degree'. If 'use_normalization' is
% set to 'true', the kernel matrix is normalized by the square roots
% of the diagonal entries.
% Poly
print Poly
set_kernel POLY REAL 10 4 0 1
set_features TRAIN ../data/fm_train_real.dat
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
km_test.txt = get_kernel_matrix TEST
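
The effect of 'inhomogene', 'degree' and 'use_normalization' can be sketched in Python as follows. Normalization is read here as dividing k(x, y) by sqrt(k(x, x) * k(y, y)); the function name and default values are illustrative assumptions, not a statement about how the script arguments map to parameters.

```python
import math

def poly_kernel(x, y, degree=4, inhomogene=False, use_normalization=True):
    """Polynomial kernel with optional +1 offset and diagonal normalization."""
    def raw(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return (dot + 1.0 if inhomogene else dot) ** degree
    k = raw(x, y)
    if use_normalization:
        # divide by the square roots of the corresponding diagonal entries
        k /= math.sqrt(raw(x, x) * raw(y, y))
    return k
```

With normalization switched on, every vector has a kernel value of 1 with itself.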

examples/documented/cmdline_static/kernel_polymatchstring.sg

% This is an example for the initialization of the PolyMatchString kernel on string data.
% The PolyMatchString kernel sums over the matches of two strings of the same length and
% takes the sum to the power of 'degree'. The strings consist of the characters 'ACGT' corresponding
% to the DNA-alphabet. Each column of the matrices of type char corresponds to
% one training/test example.
% Poly Match String
print PolyMatchString
set_kernel POLYMATCH CHAR 10 3 0
set_features TRAIN ../data/fm_train_dna.dat DNA
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_dna.dat DNA
km_test.txt = get_kernel_matrix TEST
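
Counting matches and raising the count to a power can be written as a short Python sketch. The function name and the default degree are assumptions for illustration; any further options of the kernel are not modelled.

```python
def poly_match(a, b, degree=3):
    """Number of matching positions of two equal-length strings, to the power `degree`."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches ** degree
```

For "ACGT" and "ACGA" three positions match, so the value for degree 3 is 27.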

examples/documented/cmdline_static/kernel_polymatchword.sg

% The PolyMatchWordString kernel is defined on strings of equal length.
% The kernel sums over the matches of two strings of the same length and
% takes the sum to the power of 'degree'. The strings in this example
% consist of the characters 'ACGT' corresponding to the DNA-alphabet. Each
% column of the matrices of type char corresponds to one training/test example.
% Poly Match Word
%print PolyMatchWord
%set_kernel POLYMATCH WORD 10 2 1 1
%set_features TRAIN ../data/fm_train_word.dat
%km_train.txt = get_kernel_matrix TRAIN
%set_features TEST ../data/fm_test_word.dat
%km_test.txt = get_kernel_matrix TEST

Preproc

examples/documented/cmdline_static/preproc_logplusone.sg

% In this example a kernel matrix is computed for a given real-valued data set.
% The kernel used is the Chi2 kernel which operates on real-valued vectors. It
% computes the chi-squared distance between sets of histograms. It is a very
% useful distance in image recognition (used to detect objects). The preprocessor
% LogPlusOne adds one to a dense real-valued vector and takes the logarithm of
% each component of it. It is most useful in situations where the inputs are
% counts: When one compares differences of small counts any difference may matter
% a lot, while small differences in large counts don't. This is what this log
% transformation controls for.
% LogPlusOne
print LogPlusOne
add_preproc LOGPLUSONE
set_kernel CHI2 REAL 10 1.4
set_features TRAIN ../data/fm_train_real.dat
attach_preproc TRAIN
km_train.txt = get_kernel_matrix TRAIN
set_features TEST ../data/fm_test_real.dat
attach_preproc TEST
km_test.txt = get_kernel_matrix TEST
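
The effect of the LogPlusOne transformation on counts can be seen in a few lines of Python. The function name is hypothetical; the sketch only covers the preprocessing step, not the Chi2 kernel.

```python
import math

def log_plus_one(vec):
    """Add one to every component and take the natural logarithm."""
    return [math.log(v + 1.0) for v in vec]
```

After the transformation, the gap between the counts 0 and 1 is much larger than the gap between 1000 and 1001, which is the behaviour described above for small versus large counts.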