Title: Better Burst Detection
(NYU-CS-TR876)
Authors: Dennis Shasha and Xin Zhang
Abstract:
A burst is a large number of events occurring within a certain time
window. As an unusual activity, it's a noteworthy phenomenon in many
natural and social processes. Many data stream applications require
the detection of bursts across a variety of window sizes. For example,
stock traders may be interested in bursts having to do with institutional
purchases or sales that are spread out over minutes or hours. Detecting a
burst over any of $k$ window sizes, a problem we call {\em elastic burst
detection}, in a stream of length $N$ naively requires $O(kN)$ time.
Previous work \cite{DiscoveryBook03} showed that a simple Shifted Binary
Tree structure can reduce this time substantially (in very favorable cases
near to $O(N)$) by filtering away obvious non-bursts. Unfortunately, for
certain data distributions, the filter marks many windows of events as
possible bursts, even though a detailed check shows them to be non-bursts.
In this paper, we present a new algorithmic framework for elastic burst
detection: a family of data structures that generalizes the Shifted Binary
Tree. We then present a heuristic search algorithm to find an efficient
structure among the many offered by the framework, given the input. We
study how different inputs affect the desired structures. Experiments
on both synthetic and real world data show a factor of up to 35 times
improvement compared with the Shifted Binary Tree over a wide variety of
inputs, depending on the data distribution. We show an example application
that identifies interesting correlations between bursts of activity in
different stocks.