m is returned as a numeric column vector with the same length as
Mdl.Y. The software estimates each entry of m
using the trained ECOC model Mdl, the corresponding row of
Mdl.X, and the true class label Mdl.Y.

m = resubMargin(Mdl,Name,Value)
returns the classification margins with additional options specified by one or more
name-value pair arguments. For example, you can specify a decoding scheme, binary learner
loss function, and verbosity level.

The margin distribution of FullMdl lies higher and shows less variability than the margin distribution of PartMdl. This result suggests that the model trained with all the predictors fits the training data better.

Input Arguments

Mdl — Full, trained multiclass ECOC model, specified as a ClassificationECOC model object.

Name-Value Pair Arguments

Specify optional
comma-separated pairs of Name,Value arguments. Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as
Name1,Value1,...,NameN,ValueN.

Binary learner loss function, specified as the comma-separated pair consisting of
'BinaryLoss' and a built-in loss function name or function handle.

This table describes the built-in functions, where
yj is a class label for a
particular binary learner (in the set {–1,1,0}),
sj is the score for
observation j, and
g(yj,sj)
is the binary loss formula.

Value          | Description       | Score Domain    | g(yj,sj)
---------------|-------------------|-----------------|-------------------------------
'binodeviance' | Binomial deviance | (–∞,∞)          | log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'  | Exponential       | (–∞,∞)          | exp(–yjsj)/2
'hamming'      | Hamming           | [0,1] or (–∞,∞) | [1 – sign(yjsj)]/2
'hinge'        | Hinge             | (–∞,∞)          | max(0,1 – yjsj)/2
'linear'       | Linear            | (–∞,∞)          | (1 – yjsj)/2
'logit'        | Logistic          | (–∞,∞)          | log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'    | Quadratic         | [0,1]           | [1 – yj(2sj – 1)]²/2

The software normalizes binary losses so that the loss is 0.5
when yj = 0. Also, the software
calculates the mean binary loss for each class.
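To make the normalization concrete, here is a sketch of the built-in binary loss formulas from the table above, written as plain functions of the class label y (in {–1,0,1}) and the binary-learner score s. The function names mirror the 'BinaryLoss' values; this is an illustration of the formulas, not the toolbox implementation.

```python
import math

def binodeviance(y, s):
    # log[1 + exp(-2*y*s)] / [2*log(2)]
    return math.log(1 + math.exp(-2 * y * s)) / (2 * math.log(2))

def exponential(y, s):
    # exp(-y*s) / 2
    return math.exp(-y * s) / 2

def sign(x):
    # sign function with sign(0) == 0
    return (x > 0) - (x < 0)

def hamming(y, s):
    # [1 - sign(y*s)] / 2
    return (1 - sign(y * s)) / 2

def hinge(y, s):
    # max(0, 1 - y*s) / 2
    return max(0, 1 - y * s) / 2

def linear(y, s):
    # (1 - y*s) / 2
    return (1 - y * s) / 2

def logit(y, s):
    # log[1 + exp(-y*s)] / [2*log(2)]
    return math.log(1 + math.exp(-y * s)) / (2 * math.log(2))

def quadratic(y, s):
    # [1 - y*(2*s - 1)]^2 / 2
    return (1 - y * (2 * s - 1)) ** 2 / 2

# Normalization check: every loss equals 0.5 when y == 0, regardless of s,
# so learners whose coding-matrix entry is 0 contribute a neutral value.
for g in (binodeviance, exponential, hamming, hinge, linear, logit, quadratic):
    assert abs(g(0, 0.7) - 0.5) < 1e-12
```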

If you specify a custom binary loss function by using a function handle, the function must return bLoss, the classification loss. This scalar aggregates the binary losses for every learner in a particular class. For example, you can use the mean binary loss to aggregate the loss over the learners for each class.

More About

Classification Margin

The classification margin is, for each observation,
the difference between the negative loss for the true class and the maximal negative loss
among the false classes. If the margins are on the same scale, then they serve as a
classification confidence measure. Among multiple classifiers, those that yield greater
margins are better.
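The definition above can be sketched directly: for one observation, negate the per-class losses, then subtract the best false-class value from the true-class value. The class indices and loss values below are made-up illustrations, not output of any particular model.

```python
def classification_margin(neg_losses, true_class):
    """Margin for one observation.

    neg_losses: list of negated losses, one entry per class.
    true_class: index of the observation's true class.
    Returns the negated loss for the true class minus the maximal
    negated loss among the false classes.
    """
    others = [v for k, v in enumerate(neg_losses) if k != true_class]
    return neg_losses[true_class] - max(others)

# A positive margin means the true class beats every false class.
m = classification_margin([-0.2, -0.9, -1.4], true_class=0)
```

Here m is 0.7: the true class (index 0) outscores the best false class by 0.7, so the classifier is fairly confident on this observation.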

Binary Loss

A binary loss is a function
of the class and classification score that determines how well a binary
learner classifies an observation into the class.

In loss-based decoding [Escalera et al.], the class producing the minimum sum of the binary losses over the binary learners determines the predicted class of an observation, that is,

\hat{k} = \underset{k}{\arg\min} \sum_{j=1}^{L} |m_{kj}| \, g(m_{kj}, s_j),

where m_{kj} is element (k,j) of the coding design matrix (the code for class k and binary learner j), s_j is the score of binary learner j for the observation, and L is the number of binary learners.

In loss-weighted decoding [Escalera et al.], the class producing the minimum average of the binary losses over the binary learners determines the predicted class of an observation, that is,

\hat{k} = \underset{k}{\arg\min} \frac{\sum_{j=1}^{L} |m_{kj}| \, g(m_{kj}, s_j)}{\sum_{j=1}^{L} |m_{kj}|}.

Allwein et al. suggest that loss-weighted decoding improves classification
accuracy by keeping loss values for all classes in the same dynamic range.
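The two decoding rules can be sketched as follows. The coding matrix M (rows are classes, columns are binary learners, entries in {–1,0,1}) and the score vector s below are made-up illustrations; the hinge formula from the table stands in for g.

```python
def hinge(y, s):
    # Hinge binary loss: max(0, 1 - y*s) / 2
    return max(0.0, 1 - y * s) / 2

def decode(M, s, g, weighted):
    """Predict the class index minimizing the (sum or weighted average of)
    binary losses over the learners, per the two formulas above."""
    best_k, best_val = None, float("inf")
    for k, row in enumerate(M):
        num = sum(abs(m_kj) * g(m_kj, s_j) for m_kj, s_j in zip(row, s))
        if weighted:
            num /= sum(abs(m_kj) for m_kj in row)  # loss-weighted decoding
        if num < best_val:
            best_k, best_val = k, num
    return best_k

# One-vs-one coding design for 3 classes (columns: 0v1, 0v2, 1v2).
M = [[1, 1, 0],
     [-1, 0, 1],
     [0, -1, -1]]
s = [0.8, 0.6, -0.9]  # scores that favor class 0

k_lb = decode(M, s, hinge, weighted=False)  # loss-based decoding
k_lw = decode(M, s, hinge, weighted=True)   # loss-weighted decoding
```

With these scores, both schemes pick class 0; the schemes can disagree when classes have different numbers of nonzero codes, which is why the weighted average keeps losses in a common range.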

This table summarizes the supported loss functions, where
yj is a class label for a particular binary
learner (in the set {–1,1,0}), sj is the score for
observation j, and
g(yj,sj) is the binary loss function.

Value          | Description       | Score Domain    | g(yj,sj)
---------------|-------------------|-----------------|-------------------------------
'binodeviance' | Binomial deviance | (–∞,∞)          | log[1 + exp(–2yjsj)]/[2log(2)]
'exponential'  | Exponential       | (–∞,∞)          | exp(–yjsj)/2
'hamming'      | Hamming           | [0,1] or (–∞,∞) | [1 – sign(yjsj)]/2
'hinge'        | Hinge             | (–∞,∞)          | max(0,1 – yjsj)/2
'linear'       | Linear            | (–∞,∞)          | (1 – yjsj)/2
'logit'        | Logistic          | (–∞,∞)          | log[1 + exp(–yjsj)]/[2log(2)]
'quadratic'    | Quadratic         | [0,1]           | [1 – yj(2sj – 1)]²/2

The software normalizes binary losses such that the loss is 0.5 when
yj = 0, and aggregates them using the average over the
binary learners [Allwein et al.].

Do not confuse the binary loss with the overall classification loss (specified by the
'LossFun' name-value pair argument of the loss and
predict object functions), which measures how well an ECOC classifier
performs as a whole.

Tips

To compare the margins or edges of several ECOC classifiers, use template objects to
specify a common score transform function among the classifiers during training.