Entropy Scorer

Scores a clustering result against a reference clustering.
Connect the table containing the reference clustering to the
first input port and the table containing the clustering
results to the second input port. Each table must contain a
column with cluster IDs; select the respective columns in the
dialog. After successful execution, the view shows entropy
values (the smaller the better) and a quality value in [0,1],
with 1 being the best possible value, as used in "Fuzzy
Clustering in Parallel Universes", section 6: "Experimental
results".

Options

Reference column

Column containing the reference clustering. This column is
provided by the first input table.

Clustering column

Column containing the cluster IDs to evaluate. This column
is provided by the second input table.

Input Ports

Table containing reference clustering.

Table containing clustering (to score).

Output Ports

Table containing entropy values for each cluster. The last row
contains statistics on the entire clustering. It corresponds to
the table shown in the Statistics View.

Views

Statistics View

Simple statistics on the clustering, such as the number of
clusters found, the number of objects in clusters, the number
of reference clusters, and the total number of objects. Further
statistics include:

Entropy: The accumulated entropy of all identified
clusters, weighted by the relative cluster size. The
entropy is not normalized and may be greater than 1.

Quality: The quality value according to the formula
referenced above. It is the sum of the weighted
qualities of the individual clusters, whereby the
quality of a single cluster is calculated as (1 -
normalized_entropy). The domain of the quality value
is [0,1].

The table at the bottom of the view provides statistics on
cluster size, cluster entropy, normalized cluster entropy and
quality. The entropy of a cluster is based on the reference
clustering (provided at the first input port), and the
normalized entropy is this value scaled to the interval [0, 1].
More precisely, it is the entropy divided by log2(number of
different clusters in the reference set). The quality value is
only available in the last row (showing the overall statistics).
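
The statistics described above can be sketched in a few lines of Python. This is a minimal illustration that follows the description on this page, not the node's actual implementation. The function name entropy_scores and the field names are hypothetical. It assumes that entropy means Shannon entropy with base-2 logarithms, that the overall quality uses the same relative-size weights as the overall entropy, and that the normalized entropy is taken as 0 when the reference set contains only one cluster (a case the description does not cover).

```python
import math
from collections import Counter, defaultdict


def entropy_scores(reference, clustering):
    """Score `clustering` against `reference`.

    Both arguments are equal-length sequences of cluster IDs, aligned so
    that position i in each refers to the same object.
    Returns (per_cluster, overall_entropy, overall_quality).
    """
    if len(reference) != len(clustering):
        raise ValueError("reference and clustering must have the same length")

    total = len(reference)
    n_reference_clusters = len(set(reference))
    # Normalization constant: log2(number of different reference clusters).
    # Assumption: with a single reference cluster the normalized entropy is 0.
    max_entropy = math.log2(n_reference_clusters) if n_reference_clusters > 1 else 0.0

    # Collect the reference IDs of the members of each cluster to evaluate.
    members = defaultdict(list)
    for ref_id, cluster_id in zip(reference, clustering):
        members[cluster_id].append(ref_id)

    per_cluster = {}
    overall_entropy = 0.0
    overall_quality = 0.0
    for cluster_id, ref_ids in members.items():
        size = len(ref_ids)
        # Shannon entropy of the reference labels inside this cluster.
        entropy = sum(
            (count / size) * math.log2(size / count)
            for count in Counter(ref_ids).values()
        )
        normalized = entropy / max_entropy if max_entropy > 0 else 0.0
        weight = size / total  # relative cluster size
        overall_entropy += weight * entropy
        overall_quality += weight * (1.0 - normalized)
        per_cluster[cluster_id] = {
            "size": size,
            "entropy": entropy,
            "normalized_entropy": normalized,
        }
    return per_cluster, overall_entropy, overall_quality


# Example: three reference clusters, two clusters to evaluate.
reference = ["a", "a", "b", "b", "c", "c"]
clustering = ["x", "x", "x", "y", "y", "y"]
per_cluster, entropy, quality = entropy_scores(reference, clustering)
print(per_cluster)
print("entropy:", entropy, "quality:", quality)
```

With three or more reference clusters, a single cluster that mixes all of them evenly has an entropy of log2(3) or more, which is why the unnormalized entropy can exceed 1 while the quality stays within [0,1].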

Installation

To use this node in KNIME, install KNIME Core from the
following update site: KNIME 4.0
