ClassificationPartitionedECOC

Description

ClassificationPartitionedECOC is a set of
error-correcting output codes (ECOC) models trained on cross-validated folds. Estimate
the quality of the cross-validated classification by using one or more
“kfold” functions: kfoldPredict, kfoldLoss, kfoldMargin, kfoldEdge, and kfoldfun.

Every “kfold” method uses models trained on training-fold (in-fold)
observations to predict the response for validation-fold (out-of-fold) observations. For
example, suppose you cross-validate using five folds. In this case, the software
randomly assigns each observation to one of five groups of roughly equal size. The
training fold contains four of the groups (roughly 4/5 of the
data), and the validation fold contains the other group (roughly
1/5 of the data). In this case, cross-validation proceeds as follows:

1. The software trains the first model (stored in
CVMdl.Trained{1}) by using the observations in the
last four groups and reserves the observations in the first group for
validation.

2. The software trains the second model (stored in
CVMdl.Trained{2}) by using the observations in the
first group and the last three groups. The software reserves the
observations in the second group for validation.

3. The software proceeds in a similar fashion for the third, fourth, and
fifth models.

If you validate by using kfoldPredict, the software computes
predictions for the observations in group i by using the
ith model. In short, the software estimates a response for every
observation by using the model trained without that observation.
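
As a minimal sketch of this mechanism, the following compares the output of kfoldPredict for one fold with predictions made directly from the corresponding trained model. It assumes the fisheriris sample data set that ships with Statistics and Machine Learning Toolbox; the variable names are illustrative.

```matlab
load fisheriris                          % meas (150-by-4 predictors), species (labels)
rng(1)                                   % fix the random fold assignment

CVMdl = fitcecoc(meas,species,'KFold',5);
oofLabel = kfoldPredict(CVMdl);          % one out-of-fold label per observation

i = 1;                                   % fold to check
idx = test(CVMdl.Partition,i);           % logical index of validation-fold observations
manualLabel = predict(CVMdl.Trained{i},meas(idx,:));

% Expected to agree if kfoldPredict scores group i with the ith model
isequal(manualLabel,oofLabel(idx))
```

The same check can be repeated for the other folds by changing i.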

Creation

You can create a ClassificationPartitionedECOC model in two
ways:

Create a cross-validated ECOC model from an ECOC model by using the crossval object function.

Create a cross-validated ECOC model by using the fitcecoc function and specifying
one of the name-value pair arguments 'CrossVal',
'CVPartition', 'Holdout',
'KFold', or 'Leaveout'.
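
Both routes can be sketched as follows, again assuming the fisheriris sample data set; CVMdl1 and CVMdl2 are illustrative names.

```matlab
load fisheriris

% Way 1: train an ECOC model, then cross-validate it
Mdl = fitcecoc(meas,species);
CVMdl1 = crossval(Mdl);                       % 10 folds by default

% Way 2: request cross-validation directly from fitcecoc
CVMdl2 = fitcecoc(meas,species,'KFold',5);

[CVMdl1.KFold CVMdl2.KFold]                   % number of folds in each model
```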

Cross-Validation Properties

CrossValidatedModel — Cross-validated model name (character vector)

KFold — Number of cross-validated folds (positive integer)

Number of cross-validated folds, specified as a positive integer.

Data Types: double

ModelParameters — Cross-validation parameter values (object)

Cross-validation parameter values, specified as an object. The
parameter values correspond to the name-value pair argument values used
to cross-validate the ECOC classifier.
ModelParameters does not contain estimated
parameters.

You can access the properties of ModelParameters
using dot notation.

NumObservations — Number of observations (positive numeric scalar)

Number of observations in the training data, specified as a positive numeric scalar.

Data Types: double

Partition — Data partition (cvpartition model)

Data partition indicating how the software splits the data into
cross-validation folds, specified as a cvpartition model.
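
The cross-validation properties described so far can be inspected with dot notation. The following is a small sketch on the fisheriris sample data set (an assumption; any cross-validated ECOC model would do), using the training and test functions of cvpartition to recover the fold membership.

```matlab
load fisheriris
CVMdl = fitcecoc(meas,species,'KFold',5);

CVMdl.KFold                          % number of folds
CVMdl.NumObservations                % number of rows in the training data
CVMdl.ModelParameters                % cross-validation settings

c = CVMdl.Partition;                 % cvpartition object
trainIdx = training(c,1);            % in-fold observations for fold 1
testIdx  = test(c,1);                % out-of-fold observations for fold 1
[nnz(trainIdx) nnz(testIdx)]         % sizes of the two sets
```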

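Y — Observed class labels (categorical array | character array | logical vector | numeric vector | cell array of character vectors)

Observed class labels used to cross-validate the model, specified as a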
categorical or character array, logical or numeric vector, or cell array
of character vectors. Y has
NumObservations elements and has the same data
type as the input argument Y that you pass to
fitcecoc to cross-validate
the model. (The software treats string arrays as cell arrays of character
vectors.)

Each row of Y represents the observed
classification of the corresponding row of
X.

BinaryLoss — Binary learner loss function (character vector)

If you train using binary learners that use different loss functions,
then the software sets BinaryLoss to
'hamming'. To potentially increase accuracy,
specify a binary loss function other than the default during a
prediction or loss computation by using the
'BinaryLoss' name-value pair argument of
kfoldPredict or kfoldLoss.

Data Types: char
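
A short sketch of overriding the binary loss at loss-computation time follows. It assumes the fisheriris sample data set and the default SVM binary learners; 'hamming' is used only as an example of a non-default value.

```matlab
load fisheriris
rng(1)                                            % fix the random fold assignment
CVMdl = fitcecoc(meas,species,'KFold',5);

CVMdl.BinaryLoss                                  % default chosen by the software
lossDefault = kfoldLoss(CVMdl);                   % classification error, default binary loss
lossHamming = kfoldLoss(CVMdl,'BinaryLoss','hamming');
[lossDefault lossHamming]
```

Whether a different binary loss improves the error depends on the data and the learners, so compare the values rather than assuming one is better.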

BinaryY — Binary learner class labels (numeric matrix | [])

Binary learner class labels, specified as a numeric matrix or
[].

If the coding matrix is the same across all folds, then
BinaryY is a
NumObservations-by-L
matrix, where L is the number of binary
learners (size(CodingMatrix,2)).

The elements of BinaryY are
-1, 0, or
1, and the values correspond to
dichotomous class assignments. This table describes how learner
j assigns observation
k to a dichotomous class corresponding to
the value of BinaryY(k,j).

Value   Dichotomous Class Assignment
-1      Learner j assigns observation k to a negative class.
0       Before training, learner j removes observation k from the data set.
1       Learner j assigns observation k to a positive class.

If the coding matrix varies across folds, then
BinaryY is empty
([]).

Data Types: double

CodingMatrix — Codes specifying class assignments (numeric matrix | [])

Codes specifying class assignments for the binary learners, specified
as a numeric matrix or [].

If the coding matrix is the same across all folds, then
CodingMatrix is a
K-by-L matrix, where
K is the number of classes and
L is the number of binary
learners.

The elements of CodingMatrix are
-1, 0, or
1, and the values correspond to
dichotomous class assignments. This table describes how learner
j assigns observations in class
i to a dichotomous class corresponding to
the value of CodingMatrix(i,j).

Value   Dichotomous Class Assignment
-1      Learner j assigns observations in class i to a negative class.
0       Before training, learner j removes observations in class i from the data set.
1       Learner j assigns observations in class i to a positive class.

If the coding matrix varies across folds, then
CodingMatrix is empty
([]). You can obtain the coding matrix
for each fold by using the Trained
property. For example,
CVMdl.Trained{1}.CodingMatrix is the
coding matrix in the first fold of the cross-validated ECOC
model CVMdl.

Data Types: double | single | int8 | int16 | int32 | int64
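
To see how CodingMatrix and BinaryY relate, the following sketch trains a one-versus-all model on the fisheriris sample data set (an assumption made for illustration) and expands the coding matrix row of each observation's class. The final comparison is a consistency check based on the descriptions above, not a documented guarantee.

```matlab
load fisheriris
CVMdl = fitcecoc(meas,species,'KFold',5,'Coding','onevsall');

M = CVMdl.CodingMatrix;                      % K-by-L when identical across folds
[K,L] = size(M)

% Map each observation to the row of M that belongs to its class
[~,classIdx] = ismember(CVMdl.Y,CVMdl.ClassNames);

% Expected to agree if BinaryY(k,:) is the coding row of observation k's class
isequal(CVMdl.BinaryY,M(classIdx,:))
```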

Other Classification Properties

CategoricalPredictors — Categorical predictor indices (vector of positive integers | [])

Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors
contains index values corresponding to the columns of the predictor data that contain
categorical predictors. If none of the predictors are categorical, then this property is empty
([]).

ClassNames — Unique class labels used in training (categorical array | character array | logical vector | numeric vector | cell array of character vectors)

Unique class labels used in training, specified as a categorical or
character array, logical or numeric vector, or cell array of
character vectors. ClassNames has the same
data type as the class labels Y.
(The software treats string arrays as cell arrays of character
vectors.) ClassNames also determines the class
order.

Data Types: categorical | char | logical | single | double | cell

Cost — Misclassification costs (square numeric matrix)

This property is read-only.

Misclassification costs, specified as a square numeric matrix. Cost has
K rows and columns, where K is the number of
classes.

Cost(i,j) is the cost of classifying a point into class
j if its true class is i. The order of the
rows and columns of Cost corresponds to the order of the classes in
ClassNames.

PredictorNames — Predictor names (cell array of character vectors)

Predictor names in order of their appearance in the predictor data
X, specified as a cell array of
character vectors. The length of
PredictorNames is equal to the
number of columns in X.

Data Types: cell

Prior — Prior class probabilities (numeric vector)

This property is read-only.

Prior class probabilities, specified as a numeric vector. Prior has as
many elements as the number of classes in
ClassNames, and the order of
the elements corresponds to the order of the classes in
ClassNames.

Speed Up Training ECOC Classifiers Using Binning and Parallel Computing

Train a one-versus-all ECOC classifier using a GentleBoost ensemble of decision trees with surrogate splits. To speed up training, bin numeric predictors and use parallel computing. Binning is valid only when fitcecoc uses a tree learner. After training, estimate the classification error using 10-fold cross-validation. Note that parallel computing requires Parallel Computing Toolbox™.

The data set contains 279 predictors, and the sample size of 452 is relatively small. Of the 16 distinct labels, only 13 are represented in the response (Y). Each label describes various degrees of arrhythmia, and 54.20% of the observations are in class 1.
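
The loading step is not shown in this text. Assuming the data is the arrhythmia sample data set that ships with Statistics and Machine Learning Toolbox (its dimensions match the figures above), a sketch of loading and inspecting it might look like this:

```matlab
load arrhythmia                  % X: predictor matrix, Y: numeric class labels
[n,p] = size(X)                  % sample size and number of predictors
nLabels = numel(unique(Y))       % number of labels present in Y
tabulate(categorical(Y))         % class frequencies
```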

Train One-Versus-All ECOC Classifier

Create an ensemble template. You must specify at least three arguments: a method, a number of learners, and the type of learner. For this example, specify 'GentleBoost' for the method, 100 for the number of learners, and a decision tree template that uses surrogate splits because there are missing observations.
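
A sketch of the template creation described above, with t as an illustrative name for the tree template:

```matlab
t = templateTree('Surrogate','on');                 % trees that use surrogate splits
tEnsemble = templateEnsemble('GentleBoost',100,t)   % method, number of learners, learner type
```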

tEnsemble is a template object. Most of its properties are empty, but the software fills them with their default values during training.

Train a one-versus-all ECOC classifier using the ensembles of decision trees as binary learners. To speed up training, use binning and parallel computing.

Binning ('NumBins',50) — When you have a large training data set, you can speed up training (at the cost of a potential decrease in accuracy) by using the 'NumBins' name-value pair argument. This argument is valid only when fitcecoc uses a tree learner. If you specify the 'NumBins' value, then the software bins every numeric predictor into the specified number of equiprobable bins, and then grows trees on the bin indices instead of the original data. You can try 'NumBins',50 first, and then change the 'NumBins' value depending on the accuracy and training speed.

Parallel computing ('Options',statset('UseParallel',true)) — With a Parallel Computing Toolbox license, you can speed up the computation by using parallel computing, which sends each binary learner to a worker in the pool. The number of workers depends on your system configuration. When you use decision trees for binary learners, fitcecoc parallelizes training using Intel® Threading Building Blocks (TBB) for dual-core systems and above. Therefore, specifying the 'UseParallel' option is not helpful on a single computer. Use this option on a cluster.

Additionally, specify that the prior probabilities are 1/K, where K = 13 is the number of distinct classes.
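
The training call is not shown in this text. The following sketch puts the options described above together; it assumes the arrhythmia sample data set and repeats the template creation so that it runs on its own. The name options is illustrative.

```matlab
load arrhythmia
t = templateTree('Surrogate','on');
tEnsemble = templateEnsemble('GentleBoost',100,t);

options = statset('UseParallel',true);       % requires Parallel Computing Toolbox
Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble, ...
    'Prior','uniform','NumBins',50,'Options',options);   % 'uniform' sets each prior to 1/K
CVMdl = crossval(Mdl,'Options',options);     % 10-fold cross-validation by default
```

Training 13 boosted ensembles for each of the 10 folds can take a while, so expect a noticeable run time.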

CVMdl is a ClassificationPartitionedECOC model. The warning indicates that some classes are not represented while the software trains at least one fold. Therefore, those folds cannot predict labels for the missing classes. You can inspect the results of a fold using cell indexing and dot notation. For example, access the results of the first fold by entering CVMdl.Trained{1}.

Use the cross-validated ECOC classifier to predict validation-fold labels. You can compute the confusion matrix by using confusionchart. Move and resize the chart by changing the inner position property to ensure that the percentages appear in the row summary.
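
A sketch of the prediction and charting step follows. The first block repeats the training so that the snippet runs on its own; the InnerPosition values are an assumption chosen to leave room for the row summary and can be adjusted.

```matlab
% Prerequisites, repeated so that the snippet runs on its own
load arrhythmia
tEnsemble = templateEnsemble('GentleBoost',100,templateTree('Surrogate','on'));
options = statset('UseParallel',true);       % requires Parallel Computing Toolbox
Mdl = fitcecoc(X,Y,'Coding','onevsall','Learners',tEnsemble, ...
    'Prior','uniform','NumBins',50,'Options',options);
CVMdl = crossval(Mdl,'Options',options);

% Out-of-fold labels and confusion matrix chart
oofLabel = kfoldPredict(CVMdl,'Options',options);
ConfMat = confusionchart(Y,oofLabel,'RowSummary','total-normalized');
ConfMat.InnerPosition = [0.10 0.12 0.85 0.85];   % move and resize the chart
```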

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.
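
The code that produces Xbinned is not shown in this text. One way to reproduce binned predictor data, sketched here under the assumption that the bin edges are read from the BinEdges property of a trained ECOC model, is to pass each numeric predictor through discretize. The example uses the fisheriris sample data set and tree learners to keep the run short.

```matlab
load fisheriris
Mdl = fitcecoc(meas,species,'Learners','tree','NumBins',10);

X = Mdl.X;                                   % predictor data stored in the model
edges = Mdl.BinEdges;                        % one cell per predictor; empty if not binned
Xbinned = zeros(size(X));
idxNumeric = find(~cellfun(@isempty,edges)); % predictors that were binned
for j = idxNumeric(:)'
    % Assign each value to a bin; NaN inputs yield NaN bin indices
    Xbinned(:,j) = discretize(X(:,j),[-inf; edges{j}(:); inf]);
end
```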
