MultiClust is a workshop held in conjunction with
the 16th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD-2010),
July 25-28, 2010 in Washington, DC.

Workshop Description

Data is often multi-faceted by nature. Given a single data set,
one can interpret it in several different ways. This is particularly
true with complex data that has become prevalent in the data mining
community: text, video, images and biological data to name just a few.
Yet many data mining algorithms, and clustering algorithms in particular,
extract and present only a single clustering/summarization even though
multiple good alternatives exist. Practitioners oftentimes find that
the clustering solution provided by an algorithm is not what they are
looking for. Why limit the output to one clustering
solution? Why not provide all possible alternative and
interesting clustering solutions?

Recently, interest has emerged in discovering
multiple clustering solutions from complex data. To avoid redundancy
and an excessive burden on the data analyst, it is key to extract
clustering solutions that are informative yet not redundant with one
another. Toward this goal, important research issues include: how to
define redundancy among clusterings; whether existing algorithms can be
modified to accommodate this goal; how many solutions to extract; how
to select, among exponentially many possible solutions, which ones to
present to the data analyst; and how to most effectively help the data
analyst find what he or she is searching for.
Existing work approaches this problem by looking for non-redundant,
alternative, disparate or orthogonal clusterings. Research in this area
is developing and can benefit from well-established, closely related
areas such as ensemble clustering, constraint-based clustering, and
compression and coding theory.

In this workshop, we plan to bring together researchers from the
above research areas to discuss important issues in multiple clustering
discovery, compression and summarization. Our objectives are to:
1) further increase general interest in this important topic in the
broader research community; 2) bring together experts from closely
related areas (e.g., cluster ensembles and constraint-based clustering)
to shed light on how this emerging research direction can benefit
from other well-established areas; and 3) provide a venue for active
researchers to exchange ideas and explore important research issues in
this area.

Suggested topics:

Alternative clustering: discovering new clusterings that are
different from previously known clusterings

Algorithms for simultaneously learning multiple diverse
clusterings

Visualization of multiple clustering solutions

Interactive exploration of multiple clustering solutions

Multiple high dimensional subspace clusterings

Disparate clustering

Meta clustering

Model selection for non-redundant clustering: how many
clusterings and how many clusters?

Non-redundant frequent patterns

Non-redundant subspace clustering

Relation between cluster ensembles and disparate clustering

Constraint-based clustering for alternative clustering

Evaluation metrics for multiple clusterings

Applications

Keynote Speeches

Joydeep Ghosh (University of Texas, Austin) Multi-Clust Systems: When Many Views are Better than One
As opposed to multi-classifier systems where the primary goal is to improve classification accuracy,
multiple clusterings over a common set of objects can provide a wide range of benefits.
For example, they can facilitate detecting multi-membership objects, allow knowledge reuse,
provide alternative ways of model selection and can imbibe a variety of domain knowledge to
constrain the consensus solution. Examples from disciplines ranging from psychology to
marketing will be presented to highlight these capabilities.

James Bailey (University of Melbourne) Alternative Clusterings: Current Progress and Open Challenges
This talk will review the state of the art for discovering alternative
clusterings, identify key applications and highlight current challenges
for this important and emerging area.

Rich Caruana (Microsoft) Clustering With Side Information vs. Multi/Meta Clustering
Early clustering work usually assumed there was one true clustering of the data, and that the goal of clustering was to find a clustering as close to the correct one as possible, as efficiently as possible. It is now clear that complex data sets can be clustered in many different ways, and that different clusterings are useful for different purposes. The goal now is to efficiently find multiple, significantly different, yet high-quality clusterings, and to allow users to efficiently find among these the clustering(s) most useful to them. In this talk we'll compare two competing approaches for accomplishing this: clustering with side information and multi/meta clustering. One surprising result from our experiments is that the most useful clustering is often not a very compact one under common definitions of compactness.