A Distributed Approach Towards Density Based Clustering: D-TDCT

Abstract

Data clustering is a long-established topic in computing, and in data mining in particular. In recent years, many data mining studies have applied clustering techniques under various paradigms. Clustering, the task of finding patterns in highly populated data sets, is already well served by established algorithms. The main concern now is how to achieve speedup and scaleup in such clustering tasks through parallelism and distribution. In this paper, a density-based clustering technique is examined in a distributed environment built on a shared-nothing architecture. The dataset can be divided into a number of subsets, which are fed to the nodes of the distributed setup connected via a network. The experimental results are reported to establish the superiority of this technique over the existing approaches.

Keywords

Clustering, Density-based, Distributed clustering, Parallelism

This work was carried out at the Department of Information Technology, Jadavpur University.

S. Kisilevich, F. Mansmann, D. Keim, P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos, in Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application (ACM, 2010), p. 38