While data clustering algorithms are becoming increasingly popular across
scientific, industrial and social data mining applications, model complexity
remains a major challenge. Most clustering algorithms do not incorporate a
mechanism for finding an optimal scale parameter that corresponds to an
appropriate number of clusters. We propose , a kernel-density
smoothing-based approach to data clustering. Its main ideas derive from two
unsupervised clustering approaches – kernel density estimation (KDE) and
scale-space clustering (SSC). The method first finds dense regions in the data
and then separates them using data-dependent parameter estimates. Once the
inherent number of arbitrarily shaped clusters has been detected without a priori
information, the optimal number of clusters is determined across different
levels of smoothing. We
demonstrate the applicability of the proposed method under both nested and
non-nested hierarchical clustering methodologies. Results on simulated and real
data validate the performance of the method, and repeated runs show high
accuracy and reliability.
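The general idea of combining kernel density estimation with scale-space reasoning can be illustrated with a minimal one-dimensional sketch. This is not the proposed method. It is a generic heuristic, assumed here for illustration only: evaluate a Gaussian KDE at a range of bandwidths (smoothing levels), count the density modes (dense regions) at each level, and keep the cluster count that persists over the most levels. The function names and the choice of geometrically spaced bandwidths are hypothetical. For a Gaussian kernel in one dimension the number of modes cannot increase as the bandwidth grows, so "seen at the most bandwidths" approximates the longest lifetime on a log-bandwidth axis.

```python
import math
from collections import Counter


def gaussian_kde(data, h, grid):
    """Evaluate a 1-D Gaussian kernel density estimate with bandwidth h at each grid point."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) for x in grid]


def count_modes(density):
    """Count local maxima of a sampled density; a flat top of equal samples counts once."""
    return sum(
        1
        for i in range(1, len(density) - 1)
        if density[i] > density[i - 1] and density[i] >= density[i + 1]
    )


def most_persistent_cluster_count(data, bandwidths, grid_size=1001):
    """Return the mode count seen at the most bandwidths, plus the count at each bandwidth.

    Hypothetical helper: with geometrically spaced bandwidths, the most frequent count
    approximates the count with the longest lifetime across smoothing levels.
    """
    lo, hi = min(data), max(data)
    # A Gaussian KDE is monotone outside [min(data), max(data)], so a small pad suffices.
    pad = 0.05 * (hi - lo)
    lo, hi = lo - pad, hi + pad
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    counts = [count_modes(gaussian_kde(data, h, grid)) for h in bandwidths]
    best, _ = Counter(counts).most_common(1)[0]
    return best, counts


if __name__ == "__main__":
    # Illustrative data: three tight, well-separated groups of five points each.
    data = [c + d for c in (0.0, 10.0, 20.0) for d in (-0.6, -0.3, 0.0, 0.3, 0.6)]
    bandwidths = [0.1 * 200 ** (k / 39) for k in range(40)]  # geometric, 0.1 to 20
    best, counts = most_persistent_cluster_count(data, bandwidths)
    print(best, counts)
```

At the smallest bandwidths every point forms its own mode, and at the largest all groups merge into one; the three-group structure is the count that survives over the widest band of intermediate smoothing levels. Real methods, including the one proposed here, use more careful, data-dependent criteria than simple frequency over a fixed bandwidth grid.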
Keywords: BASINS -1, data clustering, data mining, kernel density estimation,
local optimization, scale-space clustering, supervised learning, unsupervised
learning.