Holistic UDAFs at streaming speeds.
Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan,
Oliver Spatscheck and Divesh Srivastava.
Many algorithms have been proposed to approximate holistic
aggregates, such as quantiles and heavy hitters, over data
streams. However, little work has been done to explore what
techniques are required to incorporate these algorithms in
a data stream query processor, and to make them useful in
practice.
In this paper, we study the performance implications of using
user-defined aggregate functions (UDAFs) to incorporate
selection-based and sketch-based algorithms for holistic
aggregates into a data stream management system's query processing
architecture. We identify key performance bottlenecks and
tradeoffs, and propose novel techniques to make these holistic
UDAFs fast and space-efficient for use in high-speed data
stream applications. We evaluate performance using generated
and actual IP packet data, focusing on approximating quantiles
and heavy hitters. The best of our current implementations
can process streaming queries at OC48 speeds (2 x 2.4 Gbps).
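
To make the idea of a holistic UDAF concrete, here is a minimal sketch of a heavy-hitter aggregate written in the init/update/output shape that user-defined aggregates commonly take. It is an illustration only: the abstract does not say which algorithms or interfaces the authors use, so the Misra-Gries counter summary, the class name HeavyHitterUDAF, and the parameter k are all assumptions introduced here.

```python
class HeavyHitterUDAF:
    """Hypothetical UDAF-style wrapper around a Misra-Gries summary.

    Not the paper's implementation: the class name, method names, and
    the choice of Misra-Gries are assumptions made for illustration.
    """

    def __init__(self, k):
        # Keep at most k - 1 counters. Any item occurring more than
        # n / k times in a stream of n items is guaranteed to survive.
        self.k = k
        self.n = 0
        self.counters = {}

    def update(self, item):
        # Called once per tuple in the group.
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free counter: decrement all and drop those at zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def output(self):
        # Each count underestimates the true frequency by at most n / k.
        return dict(self.counters)


if __name__ == "__main__":
    agg = HeavyHitterUDAF(k=3)
    for src in ["a", "a", "a", "b", "c", "a", "d", "a"]:
        agg.update(src)
    print(agg.output())
```

The state is bounded by k - 1 counters regardless of stream length, which is the kind of space bound that makes an approximate holistic aggregate feasible inside a stream query processor. The cost of the decrement step is one example of a per-tuple overhead that matters at high line rates.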