Abstract

Data clustering is an important research area in data mining, and clustering techniques have been widely applied in fields such as statistical data analysis, pattern recognition, and image processing. Many clustering algorithms suited to large-scale databases have been proposed. DBSCAN is a density-based spatial clustering algorithm whose advantages include high speed, robustness to noise, and the ability to discover clusters of arbitrary shape. To address the limitations of DBSCAN in handling non-core objects, this paper proposes an improved DBSCAN algorithm based on probability distribution. The results show that the improved algorithm raises clustering quality.
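For reference, the sketch below shows the classic DBSCAN procedure, not the probability-distribution variant proposed in this paper, whose details are not given here. It illustrates the distinction between core objects (at least `min_pts` neighbours within radius `eps`) and non-core objects, which are either border points attached to a cluster or noise. The function names `dbscan` and `region_query`, the Euclidean distance, and the parameter names `eps` and `min_pts` are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple

NOISE = -1  # label for points that belong to no cluster

Point = Tuple[float, ...]


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN: returns a cluster id (0, 1, ...) or NOISE for each point."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # provisional: may later become a border point
            continue
        # points[i] is a core object: start a new cluster and expand it
        cluster_id += 1
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # non-core border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return [NOISE if lab is None else lab for lab in labels]
```

In this classic form, a non-core point is simply attached to whichever cluster first reaches it during expansion, and a point with too few neighbours that no core object reaches is labelled noise. For example, with `eps=1.5` and `min_pts=3`, the point `(2.4, 0.0)` below has only one neighbour besides itself, yet it joins the first cluster as a border point because the core point `(1.0, 0.0)` lies within `eps`.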