Summingbird:… [VLDB 2014]

Summingbird is an open-source domain-specific language implemented in Scala and designed to integrate online and batch MapReduce computations in a single framework. Summingbird programs are written using data flow abstractions such as sources, sinks, and stores, and can run on different execution platforms: Hadoop for batch processing (via Scalding/Cascading) and Storm for online processing. Different execution modes require different bindings for the data flow abstractions (e.g., HDFS files or message queues for the source) but do not require any changes to the program logic. Furthermore, Summingbird can operate in a hybrid processing mode that transparently integrates batch and online results to efficiently generate up-to-date aggregations over long time spans. The language was designed to improve developer productivity and address pain points in building analytics solutions at Twitter, where the same code often needs to be written twice (once for batch processing and again for online processing) and then maintained in parallel indefinitely. Our key insight is that certain algebraic structures provide the theoretical foundation for integrating batch and online processing in a seamless fashion. This means that Summingbird imposes constraints on the types of aggregations that can be performed, although in practice we have not found these constraints to be overly restrictive for a broad range of analytics tasks at Twitter.
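To make the "algebraic structures" point concrete, here is a minimal sketch in plain Scala. It is not Summingbird's or Algebird's actual API: the `Monoid` trait, `HybridSketch`, `wordCount`, and `hybrid` are hypothetical names made up for illustration, and the choice of a monoid (an associative merge with an identity element) is an assumption, since the abstract does not name the structures it relies on. The idea is that if partial results can be merged associatively, a batch result over older data and an online result over recent data can be combined without changing the program logic.

```scala
// Hypothetical, minimal monoid abstraction for illustration only
// (not the Summingbird or Algebird API).
trait Monoid[T] {
  def zero: T                // identity element
  def plus(a: T, b: T): T    // associative merge
}

object HybridSketch {
  // Word counts form a monoid: the empty map is the identity, and
  // merging adds the counts key by key.
  val countMonoid: Monoid[Map[String, Long]] = new Monoid[Map[String, Long]] {
    val zero: Map[String, Long] = Map.empty
    def plus(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
      b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0L) + v) }
  }

  // The same program logic serves both execution modes; only the
  // input binding (here, a plain Seq) would differ.
  def wordCount(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").filter(_.nonEmpty))
      .map(w => Map(w -> 1L))
      .foldLeft(countMonoid.zero)(countMonoid.plus)

  // Hybrid mode, sketched: merge a batch result over older data with
  // an online result over recent data.
  def hybrid(batchLines: Seq[String], onlineLines: Seq[String]): Map[String, Long] =
    countMonoid.plus(wordCount(batchLines), wordCount(onlineLines))
}
```

Under these assumptions, `HybridSketch.hybrid(older, recent)` gives the same counts as `HybridSketch.wordCount(older ++ recent)`. An aggregation that cannot be expressed as such a merge would not fit this pattern, which is one way to read the constraint the abstract mentions.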

This entry was posted on Monday, August 4th, 2014 at 4:07 pm and is filed under Hadoop, Scala, Storm, Summingbird, Tweets.

One Response to “Summingbird:… [VLDB 2014]”

[…] are likely already aware of the VLDB proceedings but after seeing the basis for Summingbird:… [VLDB 2014], I was reminded that I should have a tickler to check updates on the VLDB proceedings every month. […]