com.aliasi.classify
Class ConfusionMatrix

An instance of ConfusionMatrix represents a
quantitative comparison between two classifiers over a fixed set of
categories on a number of test cases. For convenience, one
classifier is termed the "reference" and the other the
"response".

Typically the reference will be determined by a human or other
so-called "gold standard", whereas the response will be
the result of an automatic classification. This is how confusion
matrices are created from test cases in ClassifierEvaluator. With this confusion matrix implementation,
two human classifiers or two automatic classifications may also be
compared. For instance, human classifiers that label corpora for
training sets are often evaluated for inter-annotator agreement;
the usual form of reporting for this is the kappa statistic, which
is available in three varieties from the confusion matrix. A set
of systems may also be compared pairwise, such as those arising
from a competitive evaluation.

Confusion matrices may be initialized on construction; with no
matrix argument, they will be constructed with zero values in all
cells. The values can then be incremented by category name with
increment(String,String) or by category index with
increment(int,int). There is also an incrementByN(int,int,int)
method, which allows explicit control over the increment value.
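
For example, a three-category matrix could be built up
incrementally; the following is a minimal sketch, assuming the
one-argument constructor takes the category array (the category
names anticipate the wine example below):

    import com.aliasi.classify.ConfusionMatrix;

    String[] categories = { "Cabernet", "Syrah", "Pinot" };
    ConfusionMatrix matrix = new ConfusionMatrix(categories); // all cells zero

    matrix.increment("Cabernet", "Cabernet"); // by name: a correct cabernet call
    matrix.increment(0, 1);                   // by index: a cabernet called syrah
    matrix.incrementByN(2, 2, 4);             // add 4 to the (Pinot,Pinot) cell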

Consider the following confusion matrix, which reports on the
classification of 27 wines by grape variety. The reference in
this case is the true variety and the response arises from the
blind evaluation of a human judge.

Many-way Confusion Matrix

                          Response
                   Cabernet   Syrah   Pinot
    Reference
      Cabernet         9        3       0
      Syrah            3        5       1
      Pinot            1        1       4

Each row represents the results of classifying objects belonging to
the category designated by that row. For instance, the first row
is the result of 12 cabernet classifications. Reading across, 9 of
those cabernets were correctly classified as cabernets, 3 were
misclassified as syrahs, and none were misclassified as pinot noir.
In the next row are the results for 9 syrahs, 3 of which were
misclassified as cabernets and 1 of which was misclassified as a
pinot. Similarly, the six pinots being classified are represented
on the third row. In total, the classifier categorized 13 wines as
cabernets, 9 wines as syrahs, and 5 wines as pinots. The sum of
all counts in the matrix is equal to the number of trials, in this
case 27. Further note that the correct classifications are the ones
on the diagonal of the matrix. The individual entries are recoverable
using the method count(int,int). The positive and
negative counts per category may be recovered from the result of
oneVsAll(int).
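
The matrix above may equivalently be constructed in one step and
then inspected; a sketch, assuming the matrix-argument constructor
takes the category array followed by the count matrix:

    int[][] counts = { { 9, 3, 0 },    // reference Cabernet
                       { 3, 5, 1 },    // reference Syrah
                       { 1, 1, 4 } };  // reference Pinot
    ConfusionMatrix matrix
        = new ConfusionMatrix(new String[] { "Cabernet", "Syrah", "Pinot" },
                              counts);
    System.out.println(matrix.count(0,1));    // prints 3: cabernets called syrah
    System.out.println(matrix.totalCount());  // prints 27: total trials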

Collective results are either averaged per category (macro
average) or averaged per test case (micro average). The results
reported here are for a single operating point. Very often in the
research literature, results are reported for the best possible
post-hoc system settings, established either globally or per
category.

A multiple-outcome classification can be decomposed into a
number of one-versus-all classification problems. For each
category, consider a classifier that categorizes objects as either
belonging to that category or not. From an n-way classifier, a
one-versus-all classifier can be constructed automatically by
treating an object to be classified as belonging to the category if
and only if the category is the result of classifying it. For the
above three-way confusion matrix, the following three one-versus-all
matrices are returned as instances of PrecisionRecallEvaluation through the method oneVsAll(int):

Cab-vs-All

                     Response
                    Cab   Other
    Reference
      Cab            9      3
      Other          4     11

Syrah-vs-All

                     Response
                   Syrah  Other
    Reference
      Syrah          5      4
      Other          4     14

Pinot-vs-All

                     Response
                   Pinot  Other
    Reference
      Pinot          4      2
      Other          1     20

Note that each has the same true-positive number as in the
corresponding cell of the original confusion matrix. Further note
that the sum of the cells in each derived matrix is the same as in
the original matrix. Finally note that if the original
classification problem is binary (two categories), the derived
matrices will be the same as the original matrix. The results of the various
precision-recall evaluation methods for these matrices are shown
in the class documentation for PrecisionRecallEvaluation.
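
As a sketch, the Cab-vs-All cells and derived statistics may be
read from the returned evaluation (expected values from the running
example in comments):

    import com.aliasi.classify.PrecisionRecallEvaluation;

    PrecisionRecallEvaluation cabVsAll = matrix.oneVsAll(0);
    System.out.println(cabVsAll.truePositive());  // 9
    System.out.println(cabVsAll.falseNegative()); // 3
    System.out.println(cabVsAll.falsePositive()); // 4
    System.out.println(cabVsAll.trueNegative());  // 11
    System.out.println(cabVsAll.precision());     // 9/13 = 0.692...
    System.out.println(cabVsAll.recall());        // 9/12 = 0.75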

Macro-averaged results are just the average of the per-category
results. These include precision, recall, and F measure. Yule's Q
and Y statistics, along with the per-category chi-squared results,
are also computed based on the one-versus-all matrices.

Micro-averaged results are reported based on another derived
matrix: the sum of the scores in the one-versus-all matrices. For
the above case, the result given as a PrecisionRecallEvaluation
by the method microAverage() is:

Sum of One-vs-All Matrices

                     Response
                   True   False
    Reference
      True          18      9
      False          9     45

Note that the true positive cell will be the sum of the
true-positive cells of the original matrix (9+5+4=18 in the running
example). A little algebra shows that the false positive cell will
be equal to the sum of the off-diagonal elements in the original
confusion matrix (3+3+1+1+1=9); symmetry then shows that the false
negative value will be the same. Finally, the true negative cell
will bring the total up to the number of categories times the sum
of the entries in the original matrix (here 27*3-18-9-9=45); it is
also equal to two times the number of true positives plus the
number of false negatives (here 2*18+9=45). Because the summed
false positive and false negative counts are always equal, the
micro-averaged precision, recall, and F measure for one-versus-all
confusion matrices derived from many-way confusion matrices will
all be the same.
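
A sketch of reading these equal values off the micro-averaged
evaluation for the running example:

    PrecisionRecallEvaluation micro = matrix.microAverage();
    System.out.println(micro.precision()); // 18/27 = 0.666...
    System.out.println(micro.recall());    // 18/27, same as precision
    System.out.println(micro.fMeasure());  // 18/27, same again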

For the above confusion matrix and derived matrices, the
no-argument and category-indexed methods will return the values in
the following tables. The hot-linked method documentation defines
each statistic in detail.

categories

public String[] categories()

Returns the array of categories for this confusion matrix. The
order of categories here is the same as that in the matrix and
consistent with that returned by getIndex(String). For
a category c in the set of categories:

categories()[getIndex(c)].equals(c)

numCategories

public int numCategories()

Returns the number of categories for this confusion matrix.
The underlying two-dimensional matrix of counts for this
confusion matrix has dimensions equal to the number of
categories. Note that numCategories() is
guaranteed to be the same as categories().length
and thus may be used to compute iteration bounds.
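
For example, a sketch of visiting every cell of the matrix built
above, using numCategories() as the iteration bound:

    for (int i = 0; i < matrix.numCategories(); ++i)
        for (int j = 0; j < matrix.numCategories(); ++j)
            System.out.println(matrix.categories()[i]
                               + " classified as " + matrix.categories()[j]
                               + ": " + matrix.count(i,j));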

confidence

public double confidence(double z)

Returns the normal approximation of half of the binomial
confidence interval for this confusion matrix for the specified
z-score.

A z-score represents the number of standard deviations from
the mean, with the following correspondence between z-scores and
percentage confidence intervals:

      Z      Confidence +/- Z
    1.65           90%
    1.96           95%
    2.58           99%
    3.30           99.9%

Thus the z-score for a 95% confidence interval is 1.96 standard
deviations. The confidence interval is just the accuracy plus or minus
the z score times the standard deviation.
To compute the normal approximation to the deviation of the
binomial distribution, assume
p=totalAccuracy() and n=totalCount().
Then the confidence interval is defined in terms of the deviation of
binomial(p,n), which is defined by first taking
the variance of the Bernoulli (one trial) distribution with
success rate p:

variance(bernoulli(p)) = p * (1-p)

and then dividing by the number n of trials in the
binomial distribution to get the variance of the binomial
distribution:

variance(binomial(p,n)) = p * (1-p) / n

and then taking the square root to get the deviation:

dev(binomial(p,n)) = sqrt(p * (1-p) / n)

For instance, with p=totalAccuracy()=.90, and
n=totalCount()=10000:

dev(binomial(.9,10000)) = sqrt(0.9 * (1.0 - 0.9) / 10000) = 0.003

Thus to determine the 95% confidence interval, we take
z = 1.96 for a half-interval width of
1.96 * 0.003 = 0.00588. The
resulting interval is just 0.90 +/- 0.00588
or roughly (.894,.906).

Parameters:

z - The z score, or number of standard deviations.

Returns:

Half the width of the confidence interval for the specified
number of deviations.
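
For example, the 95% interval from the worked example above could
be computed as follows (a sketch, where m is a hypothetical matrix
with totalAccuracy() = 0.9 over 10,000 cases):

    double half = m.confidence(1.96);        // half-width: 1.96 * 0.003 = 0.00588
    double lower = m.totalAccuracy() - half; // 0.894...
    double upper = m.totalAccuracy() + half; // 0.905...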

referenceEntropy

public double referenceEntropy()

The entropy of the decision problem itself as defined by the
counts for the reference. The entropy of a distribution is the
average negative log probability of outcomes. For the
reference distribution, this is:
referenceEntropy() = - Σi referenceLikelihood(i) * log2 referenceLikelihood(i)

where

referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()

Returns:

The entropy of the reference distribution.
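
A sketch recomputing this value from the definition, with base-2
logs computed by change of base:

    double h = 0.0;
    for (int i = 0; i < matrix.numCategories(); ++i) {
        double p = matrix.oneVsAll(i).referenceLikelihood();
        if (p > 0.0) // by convention, 0 log2 0 = 0
            h -= p * Math.log(p) / Math.log(2.0);
    }
    // h should now equal matrix.referenceEntropy()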

responseEntropy

public double responseEntropy()

The entropy of the response distribution. The entropy of a
distribution is the average negative log probability of
outcomes. For the response distribution, this is:

responseEntropy() = - Σi responseLikelihood(i) * log2 responseLikelihood(i)

where

responseLikelihood(i) = oneVsAll(i).responseLikelihood()

crossEntropy

public double crossEntropy()

The cross-entropy of the response distribution against the
reference distribution. The cross-entropy is defined by the
negative log probabilities of the response distribution
weighted by the reference distribution:

crossEntropy() = - Σi referenceLikelihood(i) * log2 responseLikelihood(i)

Note that crossEntropy() >= referenceEntropy().
The entropy of a distribution is simply the cross-entropy of
the distribution with itself.

Low cross-entropy does not entail good classification,
though good classification entails low cross-entropy.

Returns:

The cross-entropy of the response distribution
against the reference distribution.

jointEntropy

public double jointEntropy()

Returns the entropy of the joint reference and response
distribution as defined by the underlying matrix. Joint
entropy is defined by:

jointEntropy() = - Σi Σj P'(i,j) * log2 P'(i,j)

where

P'(i,j) = count(i,j) / totalCount()

and where by convention:

0 log2 0 =def 0

Returns:

Joint entropy of this confusion matrix.

conditionalEntropy

public double conditionalEntropy(int refCategoryIndex)

Returns the entropy of the distribution of categories
in the response given that the reference category was
as specified. The conditional entropy is defined by:

conditionalEntropy(i) = - Σj P'(j|i) * log2 P'(j|i)

where

P'(j|i) = count(i,j) / referenceCount(i)

Parameters:

refCategoryIndex - Index of the reference category.

Returns:

Conditional entropy of the category with the specified
index.

conditionalEntropy

public double conditionalEntropy()

Returns the conditional entropy of the response distribution
against the reference distribution. The conditional entropy
is defined to be the sum of conditional entropies per category
weighted by the reference likelihood of the category:

conditionalEntropy() = Σi referenceLikelihood(i) * conditionalEntropy(i)
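
A sketch recomputing the overall conditional entropy from the
per-category method, following the definition above:

    double h = 0.0;
    for (int i = 0; i < matrix.numCategories(); ++i)
        h += matrix.oneVsAll(i).referenceLikelihood()
             * matrix.conditionalEntropy(i);
    // h should now equal matrix.conditionalEntropy()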

kappaUnbiased

public double kappaUnbiased()

Returns the value of the kappa statistic adjusted for bias.
The unbiased kappa value is defined in terms of total accuracy
and a slightly different computation of expected likelihood that
averages the reference and response probabilities. The exact
definition is:

kappaUnbiased() = (totalAccuracy() - e) / (1 - e)

where the expected accuracy e is computed by averaging the
reference and response likelihoods per category:

e = Σi ((referenceLikelihood(i) + responseLikelihood(i)) / 2)^2

chiSquaredDegreesOfFreedom

public int chiSquaredDegreesOfFreedom()

Return the number of degrees of freedom of this confusion
matrix for the χ^2 statistic. In general, for an
n×m matrix, the number of degrees of
freedom is equal to (n-1)*(m-1). Because a confusion matrix is
square, with both dimensions equal to the number of categories,
the result is defined to be:

chiSquaredDegreesOfFreedom() = (numCategories() - 1)^2

Returns:

The number of degrees of freedom for this confusion
matrix.

chiSquared

public double chiSquared()

Returns Pearson's C^2 independence test
statistic for this matrix. The value is asymptotically
χ^2 distributed with the number of degrees of
freedom specified by chiSquaredDegreesOfFreedom().

microAverage

public PrecisionRecallEvaluation microAverage()

Returns the micro-averaged precision-recall evaluation. This
is just the sum of the precision-recall evaluations provided
by oneVsAll(int) over all category indices. See the
class definition above for an example.

Returns:

The micro-averaged precision-recall evaluation.

macroAvgPrecision

public double macroAvgPrecision()

Returns the average precision per category. This
averaging treats each category as being equal in
weight. Macro-averaged precision is defined by:

macroAvgPrecision() = Σi precision(i) / numCategories()

where

precision(i) = oneVsAll(i).precision()

Returns:

The macro-averaged precision.
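
A sketch recomputing the macro-averaged precision from the
one-versus-all evaluations:

    double sum = 0.0;
    for (int i = 0; i < matrix.numCategories(); ++i)
        sum += matrix.oneVsAll(i).precision();
    double macroP = sum / matrix.numCategories();
    // macroP should now equal matrix.macroAvgPrecision()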

macroAvgRecall

public double macroAvgRecall()

Returns the average recall per category. This averaging
treats each category as being equal in weight. Macro-averaged
recall is defined by:

macroAvgRecall() = Σi recall(i) / numCategories()

where

recall(i) = oneVsAll(i).recall()

Returns:

The macro-averaged recall.

macroAvgFMeasure

public double macroAvgFMeasure()

Returns the average F measure per category. This averaging
treats each category as being equal in weight. Macro-averaged
F measure is defined by:

macroAvgFMeasure() = Σi fMeasure(i) / numCategories()

where

fMeasure(i) = oneVsAll(i).fMeasure()

Note that this is not necessarily the same value as results
from computing the F measure from the macro-averaged
precision and macro-averaged recall.

Returns:

The macro-averaged F measure.

lambdaA

public double lambdaA()

Returns Goodman and Kruskal's λA index
of predictive association. This is defined by:

lambdaA() = (Σj maxReferenceCount(j) - maxReferenceCount())
            / (totalCount() - maxReferenceCount())

where maxReferenceCount(j) is the maximum count
in column j of the matrix:

maxReferenceCount(j) = MAXi count(i,j)

and where maxReferenceCount() is the maximum
reference count:

maxReferenceCount() = MAXi referenceCount(i)

Note that like conditional probability and conditional
entropy, the λA statistic is
asymmetric; the measure λB
simply reverses the rows and columns. The probabilistic
interpretation of λA is like that
of λB, only with the roles of
the reference and response reversed.

Returns:

The λA statistic for this
matrix.

lambdaB

public double lambdaB()

Returns Goodman and Kruskal's λB index
of predictive association. This is defined by:

lambdaB() = (Σi maxResponseCount(i) - maxResponseCount())
            / (totalCount() - maxResponseCount())

where maxResponseCount(i) is the maximum count
in row i of the matrix:

maxResponseCount(i) = MAXj count(i,j)

and where maxResponseCount() is the maximum
response count:

maxResponseCount() = MAXj responseCount(j)

The probabilistic interpretation of
λB is the reduction in error
likelihood from knowing the reference category when
predicting the response category. It will thus take on a value
between 0.0 and 1.0, with higher values being better. Perfect
association yields a value of 1.0 and perfect independence a
value of 0.0.

Note that the λB statistic is
asymmetric; the measure λA
simply reverses the rows and columns.

Returns:

The λB statistic for this
matrix.
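
As an illustration of the definition rather than the library's
implementation, λB may be computed from a raw count matrix as
follows; for the wine matrix this yields (18 - 13) / (27 - 13) = 0.357...:

    // Goodman and Kruskal's lambda-B from raw counts,
    // following the definition above.
    static double lambdaB(int[][] count) {
        int total = 0;
        int sumRowMax = 0;                       // sum_i maxResponseCount(i)
        int[] colSum = new int[count[0].length]; // responseCount(j)
        for (int i = 0; i < count.length; ++i) {
            int rowMax = 0;
            for (int j = 0; j < count[i].length; ++j) {
                total += count[i][j];
                colSum[j] += count[i][j];
                if (count[i][j] > rowMax) rowMax = count[i][j];
            }
            sumRowMax += rowMax;
        }
        int maxCol = 0;                          // maxResponseCount()
        for (int j = 0; j < colSum.length; ++j)
            if (colSum[j] > maxCol) maxCol = colSum[j];
        return (sumRowMax - maxCol) / (double) (total - maxCol);
    }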

mutualInformation

public double mutualInformation()

Returns the mutual information between the reference and
response distributions. Mutual information is the
Kullback-Leibler divergence between the joint distribution
and the product of the reference and response marginal
distributions. Mutual information is defined as: