HBaseCon 2012: A Glimpse into the Applications Track

HBaseCon 2012 is coming to San Francisco on May 22, less than 2 months away! The conference agenda continues to grow daily with exciting presentation content, which means it’s time to share a few sessions that have been added to the HBaseCon 2012 Applications Track.

Apache HBase is primarily used for real-time random read/write access to Big Data as part of the Apache Hadoop ecosystem. Applications on Apache HBase are typically built to query Big Data with extremely low latency. Sessions in the HBaseCon 2012 Applications Track will include explanations of real-world HBase use cases, where HBase fits in an organization’s entire Big Data stack, and when HBase is the “right” solution for an organization.

Applications Track Presentations

YapMap is a new kind of search platform that does multi-quanta search to better understand threaded discussions. This talk will cover how HBase made it possible for two self-funded guys to build a new kind of search platform. The presentation will discuss the YapMap data model and how YapMap uses row-based atomicity to manage parallel data integration problems. Also learn where YapMap does not use HBase and instead uses a traditional SQL-based infrastructure; the benefits of using MapReduce and HBase for index generation; the YapMap migration of tasks from a message-based queue to the Coprocessor framework; and YapMap’s future Coprocessor use cases. Lastly, learn about YapMap’s operational experience with HBase, its hardware choices, and the challenges YapMap has faced.

Adobe Systems uses “SaasBase Analytics” to incrementally process large heterogeneous data sets into pre-aggregated, indexed views, stored in HBase to be queried in real-time. The goal was to process new data in real-time (currently minutes) and have it ready for a large number of concurrent queries that execute in milliseconds. This set Adobe’s problem apart from what is traditionally solved with Hive or Pig. This talk will describe the design and the strategies (and hacks) used to achieve low latency and scalability, from theoretical model to the entire process of ETL to warehousing and queries.

This talk goes into detail about Tumblr’s experience developing Motherboy, an eventually consistent, inbox-style storage system built around HBase. The SLA, write concurrency, data volume, and failure modes for this application created a number of challenges in developing a solution. The user homing scheme introduced additional complexity that made capacity planning tricky as Tumblr tried to trade off availability and cost. Performance testing of Tumblr’s workload, and automation to support that testing, also provided a number of valuable lessons. This talk will be most useful to people considering HBase for their application, but will have enough detail to be useful to current HBase users as well.

Be sure to check the agenda in the coming weeks, as more sessions will be added soon. Remember that the Early Bird registration price expires this Friday, April 6, so register soon to take advantage of the discount.