Authors

Document Type

Other

Publication Date

4-2009

Abstract

In diverse applications ranging from stock trading to traffic monitoring, popular data streams are typically monitored by multiple analysts for patterns of interest. These analysts may thus submit similar pattern mining requests, such as mining for clusters or outliers, yet customized with different parameter settings. In this work, we present an efficient shared execution strategy for a large number of density-based cluster detection queries with arbitrary parameter settings. Given the high algorithmic complexity of the clustering process and the real-time responsiveness required by streaming applications, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining clusters for different queries independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. To overcome this, we analyze the interrelations between the cluster sets identified by queries with different parameter settings, considering both pattern-specific and window-specific parameters. We characterize the conditions under which a proposed growth property holds among these cluster sets. By exploiting this growth property, we propose a uniform solution, called Chandi, which represents these identified cluster sets as a single compact structure and performs integrated maintenance on them, resulting in significant sharing of computational and memory resources. Our comprehensive experimental study, using real data streams from the domains of stock trading and moving object monitoring, demonstrates that Chandi is on average four times faster than the best alternative methods, while using 85% less memory space in our test cases. It also shows that Chandi scales to large numbers of queries, on the order of hundreds or even thousands, under high input data rates.