Syntax

Description

fitckernel trains or cross-validates a binary Gaussian
kernel classification model for nonlinear classification.
fitckernel is more practical for big data applications that
have large training sets but can also be applied to smaller data sets that fit in
memory.

fitckernel maps data in a low-dimensional space into a
high-dimensional space, then fits a linear model in the high-dimensional space by
minimizing the regularized objective function. Obtaining the linear model in the
high-dimensional space is equivalent to applying the Gaussian kernel to the model in the
low-dimensional space. Available linear classification models include regularized
support vector machine (SVM) and logistic regression models.

To train a nonlinear SVM model for binary classification of in-memory data, see
fitcsvm.

Mdl = fitckernel(X,Y)
returns a binary Gaussian kernel classification model trained using the predictor
data in X and the corresponding class labels in
Y. The fitckernel function maps the
predictors in a low-dimensional space into a high-dimensional space, then fits a
binary SVM model to the transformed predictors and class labels. This linear model
is equivalent to the Gaussian kernel classification model in the low-dimensional
space.

Mdl = fitckernel(X,Y,Name,Value)
returns a kernel classification model with additional options specified by one or
more name-value pair arguments. For example, you can implement logistic regression,
specify the number of dimensions of the expanded space, or specify to
cross-validate.

[Mdl,FitInfo] = fitckernel(___)
also returns the fit information in the structure array FitInfo
using any of the input arguments in the previous syntaxes. You cannot request
FitInfo for cross-validated models.

[Mdl,FitInfo,HyperparameterOptimizationResults] = fitckernel(___)
also returns the hyperparameter optimization results
HyperparameterOptimizationResults when you optimize
hyperparameters by using the 'OptimizeHyperparameters'
name-value pair argument.

Examples

Train Kernel Classification Model

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere
[n,p] = size(X)

n = 351

p = 34

resp = unique(Y)

resp = 2x1 cell
{'b'}
{'g'}

Train a binary kernel classification model that identifies whether the radar return is bad ('b') or good ('g'). Extract a fit summary to determine how well the optimization algorithm fits the model to the data.
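One way to do this uses the two-output syntax described above. (The rng call is included because the random feature expansion uses random numbers.)

rng('default') % for reproducibility of the random feature expansion
[Mdl,FitInfo] = fitckernel(X,Y)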

Mdl is a ClassificationKernel model. To inspect the in-sample classification error, you can pass Mdl and the training data or new data to the loss function. Or, you can pass Mdl and new predictor data to the predict function to predict class labels for new observations. You can also pass Mdl and the training data to the resume function to continue training.

For better accuracy, you can increase the maximum number of optimization iterations ('IterationLimit') and decrease the tolerance values ('BetaTolerance' and 'GradientTolerance') by using the name-value pair arguments. Doing so can improve measures like ObjectiveValue and RelativeChangeInBeta in FitInfo. You can also optimize model parameters by using the 'OptimizeHyperparameters' name-value pair argument.
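For instance, a sketch with tightened settings (these particular values are illustrative, not recommendations):

Mdl = fitckernel(X,Y,'IterationLimit',500, ...
    'BetaTolerance',1e-6,'GradientTolerance',1e-7);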

Optimize Kernel Classifier

Load the ionosphere data set. This data set has 34 predictors and 351 binary responses for radar returns, either bad ('b') or good ('g').

load ionosphere

Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization. Specify 'OptimizeHyperparameters' as 'auto' so that fitckernel finds optimal values of the 'KernelScale' and 'Lambda' name-value pair arguments. For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
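A call consistent with this description (sketch; it combines the optimization options documented later on this page):

rng('default') % set the random seed for reproducibility
Mdl = fitckernel(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions', ...
    struct('AcquisitionFunctionName','expected-improvement-plus'))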

For big data, the optimization procedure can take a long time. If the data set is too large to run the optimization procedure, you can try to optimize the parameters using only partial data. Use the datasample function and specify 'Replace','false' to sample data without replacement.
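For example, a sketch of that subsampling step (the sample size of 200 is arbitrary):

rng('default')
[XSub,idx] = datasample(X,200,'Replace',false); % sample 200 rows without replacement
YSub = Y(idx);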

Name-Value Pair Arguments

Specify optional
comma-separated pairs of Name,Value arguments. Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN.

Example: Mdl =
fitckernel(X,Y,'Learner','logistic','NumExpansionDimensions',2^15,'KernelScale','auto')
implements logistic regression after mapping the predictor data to the
2^15 dimensional space using feature expansion with a kernel
scale parameter selected by a heuristic procedure.

Note

You cannot use any cross-validation name-value pair argument along with the
'OptimizeHyperparameters' name-value pair argument. You can modify
the cross-validation for 'OptimizeHyperparameters' only by using the
'HyperparameterOptimizationOptions' name-value pair
argument.

Number of dimensions of the expanded space, specified as the comma-separated
pair consisting of 'NumExpansionDimensions' and
'auto' or a positive integer. For
'auto', the fitckernel
function selects the number of dimensions using
2.^ceil(min(log2(p)+5,15)), where
p is the number of predictors.
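For example, for the ionosphere data with p = 34 predictors:

p = 34;
m = 2.^ceil(min(log2(p)+5,15)) % m = 2048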

Kernel scale parameter, specified as the comma-separated pair consisting of
'KernelScale' and 'auto' or a positive scalar.
The software obtains a random basis for random feature expansion by using the kernel
scale parameter. For details, see Random Feature Expansion.

If you specify 'auto', then the software selects an appropriate kernel
scale parameter using a heuristic procedure. This heuristic procedure uses subsampling,
so estimates can vary from one call to another. Therefore, to reproduce results, set a
random number seed by using rng before training.

Example: 'KernelScale','auto'

Data Types: char | string | single | double
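For example, a seeded call (sketch) makes the heuristic estimate repeatable:

rng(1) % fix the seed before training
Mdl = fitckernel(X,Y,'KernelScale','auto');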

'BoxConstraint' — Box constraint
1 (default) | positive scalar

Box constraint, specified as the comma-separated pair consisting of
'BoxConstraint' and a positive scalar.

This argument is valid only when 'Learner' is
'svm' (default) and you do not
specify a value for the regularization term strength
'Lambda'. You can specify
either 'BoxConstraint' or
'Lambda' because the box
constraint (C) and the
regularization term strength (λ)
are related by C =
1/(λn), where n is the
number of observations.
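For example, with n = 351 observations, 'BoxConstraint',1 corresponds to a regularization strength of about 0.0028:

n = 351; C = 1;
lambda = 1/(C*n) % lambda = 1/351, approximately 0.0028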

Cross-Validation Options

Flag to train a cross-validated classifier, specified as the
comma-separated pair consisting of 'Crossval' and
'on' or 'off'.

If you specify 'on', then the software trains a
cross-validated classifier with 10 folds.

You can override this cross-validation setting using the
CVPartition, Holdout,
KFold, or Leaveout
name-value pair argument. You can use only one cross-validation
name-value pair argument at a time to create a cross-validated
model.
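For example, this sketch trains a 10-fold cross-validated model and estimates the generalization error:

CVMdl = fitckernel(X,Y,'CrossVal','on'); % 10 folds by default
err = kfoldLoss(CVMdl)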

Cross-validation partition, specified as the comma-separated pair consisting of
'CVPartition' and a cvpartition partition
object created by cvpartition. The partition object
specifies the type of cross-validation and the indexing for the training and validation
sets.

To create a cross-validated model, you can use one of these four name-value pair arguments
only: CVPartition, Holdout,
KFold, or Leaveout.

Example: Suppose you create a random partition for 5-fold cross-validation on 500
observations by using cvp = cvpartition(500,'KFold',5). Then, you can
specify the cross-validated model by using
'CVPartition',cvp.
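In code, assuming X and Y contain those 500 observations:

cvp = cvpartition(500,'KFold',5);          % random 5-fold partition
CVMdl = fitckernel(X,Y,'CVPartition',cvp);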

'Holdout' — Fraction of data for holdout validation
scalar value in the range (0,1)

Fraction of the data used for holdout validation, specified as the comma-separated pair
consisting of 'Holdout' and a scalar value in the range (0,1). If you
specify 'Holdout',p, then the software completes these steps:

Randomly select and reserve p*100% of the data as
validation data, and train the model using the rest of the data.

Store the compact, trained model in the Trained
property of the cross-validated model.

To create a cross-validated model, you can use one of these
four name-value pair arguments only: CVPartition, Holdout, KFold,
or Leaveout.
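For example, a sketch that reserves 20% of the data (the fraction is illustrative):

CVMdl = fitckernel(X,Y,'Holdout',0.2);
Mdl = CVMdl.Trained{1}; % compact model trained on the other 80%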

Number of folds to use in a cross-validated model, specified as the comma-separated pair
consisting of 'KFold' and a positive integer value greater than 1. If
you specify 'KFold',k, then the software completes these steps:

Randomly partition the data into k sets.

For each set, reserve the set as validation data, and train the model
using the other k – 1 sets.

Store the k compact, trained models in the cells of a
k-by-1 cell vector in the Trained
property of the cross-validated model.

To create a cross-validated model, you can use one of these
four name-value pair arguments only: CVPartition, Holdout, KFold,
or Leaveout.
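For example, with k = 5 (an illustrative choice):

CVMdl = fitckernel(X,Y,'KFold',5);
CVMdl.Trained % 5-by-1 cell vector of compact, trained models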

Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of
'Leaveout' and 'on' or
'off'. If you specify 'Leaveout','on', then,
for each of the n observations (where n is the
number of observations excluding missing observations), the software completes these
steps:

Reserve the observation as validation data, and train the model using the
other n – 1 observations.

Store the n compact, trained models in the cells of an
n-by-1 cell vector in the Trained
property of the cross-validated model.

To create a cross-validated model, you can use one of these
four name-value pair arguments only: CVPartition, Holdout, KFold,
or Leaveout.

Other Kernel Classification Options

Maximum amount of allocated memory (in megabytes), specified as the comma-separated pair consisting of 'BlockSize' and a positive scalar.

If fitckernel requires more memory than the value of
'BlockSize' to hold the transformed predictor data, then the
software uses a block-wise strategy. For details about the block-wise strategy, see
Algorithms.

Random number stream for reproducibility of data transformation, specified as the comma-separated pair consisting of 'RandomStream' and a random stream object. For details, see Random Feature Expansion.
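For example, a sketch using an explicit stream (the generator choice here is illustrative):

s = RandStream('mlfg6331_64');          % create a reusable random stream
Mdl = fitckernel(X,Y,'RandomStream',s);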

Size of the history buffer for Hessian approximation, specified as the comma-separated pair
consisting of 'HessianHistorySize' and a positive integer. At each
iteration, fitckernel composes the Hessian approximation by using
statistics from the latest HessianHistorySize iterations.

Example: 'HessianHistorySize',10

Data Types: single | double

'Verbose' — Verbosity level
0 (default) | 1

Verbosity level, specified as the comma-separated pair consisting of
'Verbose' and either 0 or
1. Verbose controls the
display of diagnostic information at the command line.

0 — fitckernel does not display diagnostic information.

1 — fitckernel displays and stores the value of the objective function, gradient magnitude, and other diagnostic information. FitInfo.History contains the diagnostic information.

Example: 'Verbose',1

Data Types: single | double

Other Classification Options

Names of classes to use for training, specified as the comma-separated pair consisting of
'ClassNames' and a categorical, character, or string array, a
logical or numeric vector, or a cell array of character vectors.
ClassNames must have the same data type as
Y.

If ClassNames is a character array, then each element must correspond to
one row of the array.

Use 'ClassNames' to:

Order the classes during training.

Specify the order of any input or output argument dimension that
corresponds to the class order. For example, use
'ClassNames' to specify the order of the dimensions
of Cost or the column order of classification scores
returned by predict.

Select a subset of classes for training. For example, suppose that the set
of all distinct class names in Y is
{'a','b','c'}. To train the model using observations
from classes 'a' and 'c' only, specify
'ClassNames',{'a','c'}.

The default value for ClassNames is the set of all distinct class names in
Y.
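For example, for the ionosphere labels, this sketch fixes the class order:

Mdl = fitckernel(X,Y,'ClassNames',{'b','g'}); % 'b' is the first class, 'g' the second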

'Cost' — Misclassification cost
square matrix | structure array

Misclassification cost, specified as the comma-separated pair consisting of
'Cost' and a square matrix or structure.

If you specify the square matrix cost
('Cost',cost), then cost(i,j) is the
cost of classifying a point into class j if its true class is
i. That is, the rows correspond to the true class, and
the columns correspond to the predicted class. To specify the class order for
the corresponding rows and columns of cost, use the
ClassNames name-value pair argument.

If you specify the structure S
('Cost',S), then it must have two fields:

S.ClassNames, which contains the class names as
a variable of the same data type as Y

S.ClassificationCosts, which contains the cost
matrix with rows and columns ordered as in
S.ClassNames

The default value for Cost is
ones(K) –
eye(K), where K is
the number of distinct classes.

fitckernel uses Cost to adjust the prior
class probabilities specified in Prior. Then,
fitckernel uses the adjusted prior probabilities for training
and resets the cost matrix to its default.
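For example, a sketch that doubles the penalty for misclassifying a true 'b' (the values are illustrative):

cost = [0 2; 1 0]; % rows: true class ('b','g'); columns: predicted class
Mdl = fitckernel(X,Y,'ClassNames',{'b','g'},'Cost',cost);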

Prior probabilities for each class, specified as the comma-separated pair consisting
of 'Prior' and 'empirical',
'uniform', a numeric vector, or a structure array.

This table summarizes the available options for setting prior probabilities.

'empirical' — The class prior probabilities are the class relative frequencies in Y.

'uniform' — All class prior probabilities are equal to 1/K, where K is the number of classes.

numeric vector — Each element is a class prior probability. Order the elements according to their order in Y. If you specify the order using the 'ClassNames' name-value pair argument, then order the elements accordingly.

structure array — A structure S with two fields: S.ClassNames contains the class names as a variable of the same type as Y, and S.ClassProbs contains a vector of corresponding prior probabilities.
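For example, a sketch of the structure form (the probabilities are illustrative):

prior = struct('ClassNames',{{'b','g'}},'ClassProbs',[0.4 0.6]);
Mdl = fitckernel(X,Y,'Prior',prior);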

Score transformation, specified as the comma-separated pair consisting of
'ScoreTransform' and a character vector, string scalar, or
function handle.

This table summarizes the available character vectors and string scalars.

'doublelogit' — 1/(1 + e^(–2x))

'invlogit' — log(x / (1 – x))

'ismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to 0

'logit' — 1/(1 + e^(–x))

'none' or 'identity' — x (no transformation)

'sign' — –1 for x < 0; 0 for x = 0; 1 for x > 0

'symmetric' — 2x – 1

'symmetricismax' — Sets the score for the class with the largest score to 1, and sets the scores for all other classes to –1

'symmetriclogit' — 2/(1 + e^(–x)) – 1

For a MATLAB® function or a function you define, use its function handle for the score
transform. The function handle must accept a matrix (the original scores) and return a
matrix of the same size (the transformed scores).

Example: 'ScoreTransform','logit'

Data Types: char | string | function_handle
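For example, a sketch that passes a custom handle (this particular handle reimplements the 'logit' transform):

Mdl = fitckernel(X,Y,'ScoreTransform',@(x)1./(1+exp(-x)));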

'Weights' — Observation weights
positive numeric vector

Observation weights, specified as the comma-separated pair consisting of
'Weights' and a positive numeric vector of length
n, where n is the number of
observations in X. The fitckernel function
weighs the observations in X with the corresponding values in
Weights.

The default value is ones(n,1).

fitckernel normalizes Weights to sum up to
the value of the prior probability in the respective class.
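For example, a sketch that upweights one class (the factor of 2 is illustrative):

n = size(X,1);
w = ones(n,1);
w(strcmp(Y,'b')) = 2; % give 'b' observations twice the weight
Mdl = fitckernel(X,Y,'Weights',w);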

Hyperparameter Optimization Options

Parameters to optimize, specified as the comma-separated pair
consisting of 'OptimizeHyperparameters' and one of
these values:

'none' — Do not optimize.

'auto' — Use
{'KernelScale','Lambda'}.

'all' — Optimize all eligible
parameters.

Cell array of eligible parameter names.

Vector of optimizableVariable objects,
typically the output of hyperparameters.

The optimization attempts to minimize the cross-validation loss
(error) for fitckernel by varying the parameters.
To control the cross-validation type and other aspects of the
optimization, use the
HyperparameterOptimizationOptions name-value
pair argument.
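For example, a sketch of the optimizableVariable form (narrowing the Lambda search range here is illustrative):

params = hyperparameters('fitckernel',X,Y); % default eligible parameters
idx = strcmp({params.Name},'Lambda');
params(idx).Range = [1e-5,1e-1];            % tighten the Lambda range
Mdl = fitckernel(X,Y,'OptimizeHyperparameters',params);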

Note

'OptimizeHyperparameters' values override any values you set using
other name-value pair arguments. For example, setting
'OptimizeHyperparameters' to 'auto' causes the
'auto' values to apply.

By default, iterative display appears at the command line, and
plots appear according to the number of hyperparameters in the optimization. For the
optimization and plots, the objective function is log(1 + cross-validation loss) for regression and the misclassification rate for classification. To control
the iterative display, set the Verbose field of the
'HyperparameterOptimizationOptions' name-value pair argument. To
control the plots, set the ShowPlots field of the
'HyperparameterOptimizationOptions' name-value pair argument.

Options for optimization, specified as the comma-separated pair consisting of
'HyperparameterOptimizationOptions' and a structure. This
argument modifies the effect of the OptimizeHyperparameters
name-value pair argument. All fields in the structure are optional.

Optimizer — Optimization algorithm, specified as 'bayesopt' (default), 'gridsearch', or 'randomsearch'. 'gridsearch' searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command sortrows(Mdl.HyperparameterOptimizationResults). Default: 'bayesopt'

AcquisitionFunctionName — Acquisition function name, specified as one of:

'expected-improvement-per-second-plus'

'expected-improvement'

'expected-improvement-plus'

'expected-improvement-per-second'

'lower-confidence-bound'

'probability-of-improvement'

Acquisition functions whose names include per-second do not yield reproducible results because the optimization depends on the runtime of the objective function. Acquisition functions whose names include plus modify their behavior when they are overexploiting an area. For more details, see Acquisition Function Types. Default: 'expected-improvement-per-second-plus'

MaxObjectiveEvaluations — Maximum number of objective function evaluations. Default: 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'

MaxTime — Time limit, specified as a positive real. The time limit is in seconds, as measured by tic and toc. Run time can exceed MaxTime because MaxTime does not interrupt function evaluations. Default: Inf

NumGridDivisions — For 'gridsearch', the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. Default: 10

ShowPlots — Logical value indicating whether to show plots. If true, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. Default: true

SaveIntermediateResults — Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, this field overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. Default: false

UseParallel — Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. Due to the nonreproducibility of parallel timing, parallel Bayesian optimization does not necessarily yield reproducible results. For details, see Parallel Bayesian Optimization. Default: false

Repartition — Logical value indicating whether to repartition the cross-validation at every iteration. If false, the optimizer uses a single partition for the optimization. true usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, true requires at least twice as many function evaluations. Default: false
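For example, a sketch that sets a few of these fields (the values are illustrative):

opts = struct('MaxObjectiveEvaluations',15,'ShowPlots',false);
Mdl = fitckernel(X,Y,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);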

Output Arguments

If you set any of the name-value pair arguments CrossVal, CVPartition, Holdout, KFold, or Leaveout, then Mdl is a ClassificationPartitionedKernel cross-validated classifier. Otherwise, Mdl is a ClassificationKernel classifier.

To reference properties of Mdl, use dot notation. For
example, enter Mdl.NumExpansionDimensions in the Command
Window to display the number of dimensions of the expanded space.

Cross-validation optimization of hyperparameters, returned as a BayesianOptimization object or a table of hyperparameters and associated values. The output is nonempty when the value of 'OptimizeHyperparameters' is not 'none'. The output value depends on the Optimizer field value of the 'HyperparameterOptimizationOptions' name-value pair argument: if Optimizer is 'bayesopt' (default), the output is a BayesianOptimization object; if Optimizer is 'gridsearch' or 'randomsearch', the output is a table of the hyperparameters used, along with the associated objective function values.

More About

Random Feature Expansion

Random feature expansion, such as
Random Kitchen Sinks[1] and Fastfood[2],
is a scheme to approximate Gaussian kernels of the kernel classification algorithm to use
for big data in a computationally efficient way. Random feature expansion is more practical
for big data applications that have large training sets, but can also be applied to smaller
data sets that fit in memory.

The kernel classification algorithm searches for an
optimal hyperplane that separates the data into two classes after mapping features into a
high-dimensional space. Nonlinear features that are not linearly separable in a
low-dimensional space can be separable in the expanded high-dimensional space. All the
calculations for hyperplane classification use only dot products. You can obtain a nonlinear
classification model by replacing the dot product x1x2' with the nonlinear kernel function G(x1,x2)=〈φ(x1),φ(x2)〉, where xi is the
ith observation (row vector) and φ(xi) is a transformation that maps xi
to a high-dimensional space (called the “kernel trick”). However, evaluating G(x1,x2)
(Gram matrix) for each pair of observations is computationally expensive
for a large data set (large n).

The random feature expansion scheme finds a random
transformation so that its dot product approximates the Gaussian kernel. That is,

G(x1,x2) = ⟨φ(x1),φ(x2)⟩ ≈ T(x1)T(x2)',

where T(x) maps x in ℝ^p to a high-dimensional space (ℝ^m). The Random Kitchen Sinks scheme uses the random transformation

T(x) = m^(–1/2)·exp(iZx')',

where Z ∈ ℝ^(m×p) is a sample drawn from N(0,σ^(–2)) and σ^2 is a kernel scale. This scheme requires O(mp) computation and storage. The Fastfood scheme introduces another random basis V instead of Z using Hadamard matrices combined with Gaussian scaling matrices. This random basis reduces the computation cost to O(m log p) and reduces storage to O(m).

The fitckernel function uses the Fastfood scheme for random feature expansion and uses linear classification to train a Gaussian kernel classification model. Unlike solvers in the fitcsvm function, which require computation of the n-by-n Gram matrix, the solver in fitckernel only needs to form a matrix of size n-by-m, with m typically much less than n for big data.

Box Constraint

A box constraint is a parameter that controls the maximum penalty imposed on margin-violating observations, and aids in preventing overfitting (regularization). Increasing the box constraint can lead to longer training times.

The box constraint (C) and the regularization term strength (λ) are related by C = 1/(λn), where n is the number of observations.

Algorithms

fitckernel minimizes the regularized objective function using a Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) solver with ridge (L2) regularization. To find the type of LBFGS solver used for training, type FitInfo.Solver in the Command Window.

'LBFGS-fast' — LBFGS solver.

'LBFGS-blockwise' — LBFGS solver with a block-wise strategy. If fitckernel requires more memory than the value of BlockSize to hold the transformed predictor data, then it uses a block-wise strategy.

When fitckernel uses a block-wise strategy, fitckernel implements LBFGS by distributing the calculation of the loss and gradient among different parts of the data at each iteration. Also, fitckernel refines the initial estimates of the linear coefficients and the bias term by fitting the model locally to parts of the data and combining the coefficients by averaging. If you specify 'Verbose',1, then fitckernel displays diagnostic information for each data pass and stores the information in the History field of FitInfo.

When fitckernel does not use a block-wise strategy, the initial estimates are zeros. If you specify 'Verbose',1, then fitckernel displays diagnostic information for each iteration and stores the information in the History field of FitInfo.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

Some name-value pair arguments have different defaults compared to the default values
for the in-memory fitckernel function. Supported name-value pair
arguments, and any differences, are:

'Learner'

'NumExpansionDimensions'

'KernelScale'

'BoxConstraint'

'Lambda'

'BetaTolerance' — Default value is relaxed to 1e-3.

'GradientTolerance' — Default value is relaxed to 1e-5.

'IterationLimit' — Default value is relaxed to 20.

'BlockSize'

'RandomStream'

'HessianHistorySize'

'Verbose' — Default value is
1.

'ClassNames'

'Cost'

'Prior'

'ScoreTransform'

'Weights' — Value must be a tall array.

'OptimizeHyperparameters'

'HyperparameterOptimizationOptions' — For
cross-validation, tall optimization supports only 'Holdout'
validation. For example, you can specify
fitckernel(X,Y,'OptimizeHyperparameters','auto','HyperparameterOptimizationOptions',struct('Holdout',0.2)).

If 'KernelScale' is 'auto', then
fitckernel uses the random stream controlled by tallrng
for subsampling. For reproducibility, you must set a random number seed for both the
global stream and the random stream controlled by tallrng.
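A sketch of seeding both streams as the note describes:

rng('default')     % seed the global stream
tallrng('default') % seed the stream controlled by tallrng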

If 'Lambda' is 'auto', then
fitckernel might take an extra pass through the data to
calculate the number of observations in X.
