Multi-label classification

Multi-label classification is the task of predicting a set of labels for each sample, chosen from two or more categories.

Each sample $i$ has $l_i$ labels, where $L$ is the set of unique labels in the dataset, and $0 \leq l_i \leq |L|$.
This page focuses on evaluation of such multi-label classification problems.

Example

This page uses a toy example dataset for explanation.

Data

The following table shows example predictions for a multi-label classification task.

Suppose that animal names represent tags of blog posts and the given task is to predict tags for blog posts.
The left column shows the ground truth labels and the right column shows the labels predicted by a multi-label classifier.

| truth labels | predicted labels |
|---|---|
| cat, bird | cat, dog |
| cat, dog | cat, bird |
| cat | (no truth label) |
| bird | bird |
| bird, cat | bird, cat |
| cat, dog | cat, dog, bird |
| dog, bird | dog |
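As a rough illustration, the toy data above can be written down as pairs of Python sets, which also makes it easy to check the bound on the number of labels per sample. This is only a sketch: the names `rows` and `label_set` are made up for this example and are not part of Hivemall, and the "(no truth label)" cell is assumed to mean an empty set.

```python
# Each row is (left column, right column) of the toy table, as sets of tags.
# Assumption: the "(no truth label)" cell is represented by an empty set.
rows = [
    ({"cat", "bird"}, {"cat", "dog"}),
    ({"cat", "dog"}, {"cat", "bird"}),
    ({"cat"}, set()),
    ({"bird"}, {"bird"}),
    ({"bird", "cat"}, {"bird", "cat"}),
    ({"cat", "dog"}, {"cat", "dog", "bird"}),
    ({"dog", "bird"}, {"dog"}),
]

# L: the set of unique labels that appear anywhere in the dataset.
label_set = set()
for left, right in rows:
    label_set |= left | right

# Every sample carries between 0 and |L| labels.
sizes_ok = all(
    0 <= len(tags) <= len(label_set)
    for pair in rows
    for tags in pair
)

print(sorted(label_set))  # ['bird', 'cat', 'dog']
print(sizes_ok)           # True
```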

Evaluation metrics for multi-label classification

Hivemall provides micro F1-score and micro F-measure.

Define $L$ as the set of tags of blog posts, and $l_i$ as the tag set of the $i$-th document.
In the same manner, $p_i$ is the predicted tag set of the $i$-th document.
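With these sets, a common definition of the micro-averaged scores is precision $= \sum_i |l_i \cap p_i| / \sum_i |p_i|$, recall $= \sum_i |l_i \cap p_i| / \sum_i |l_i|$, and F1 as the harmonic mean of the two. The sketch below applies that definition to the toy data in plain Python. It is not Hivemall's implementation: the function name `micro_prf` is hypothetical, and the "(no truth label)" cell of the table is assumed to be an empty set.

```python
from typing import List, Set, Tuple


def micro_prf(pairs: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (truth, predicted) tag sets."""
    # Pool the counts over all documents before dividing (micro averaging).
    hits = sum(len(truth & pred) for truth, pred in pairs)
    n_pred = sum(len(pred) for _, pred in pairs)
    n_truth = sum(len(truth) for truth, _ in pairs)

    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_truth if n_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data in table order: (left column as l_i, right column as p_i).
pairs = [
    ({"cat", "bird"}, {"cat", "dog"}),
    ({"cat", "dog"}, {"cat", "bird"}),
    ({"cat"}, set()),  # assumption: "(no truth label)" means an empty set
    ({"bird"}, {"bird"}),
    ({"bird", "cat"}, {"bird", "cat"}),
    ({"cat", "dog"}, {"cat", "dog", "bird"}),
    ({"dog", "bird"}, {"dog"}),
]

precision, recall, f1 = micro_prf(pairs)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# precision=0.7273 recall=0.6667 f1=0.6957
```

Swapping the two columns exchanges precision and recall but leaves the micro F1 value unchanged, since F1 depends only on the pooled overlap and the two pooled set sizes.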

Caution

Hivemall also provides an f1score function, but it is an older way to obtain the F1-score, and its value is based on set operations. We therefore recommend using the fmeasure function to obtain the F1-score as described in this article.