Meet Tephra, An Open Source Transaction Engine

July 18, 2014

Gary Helmling was a software engineer at Cask and is an Apache HBase committer and Project Management Committee (PMC) member. Prior to Cask, Gary led the development of Hadoop and HBase applications at Twitter, TrendMicro, and Meetup.

Please note: Continuuity is now known as Cask, and Continuuity Reactor is now known as the Cask Data Application Platform (CDAP).

Our platform, Continuuity Reactor, uses several open source technologies in the Apache Hadoop™ ecosystem to enable any developer to build data applications. One of the major components of our platform is Apache HBase, a non-relational, massively scalable column-oriented database modeled after Google’s BigTable. We use HBase for a number of reasons, including the strong data consistency it provides. One of the limitations of HBase as a standalone system, however, is that data updates are consistent only within a single region, or a set of contiguous rows, because it is very difficult to coordinate updates across these regions in a way that maintains scalability.

As a result, HBase maintains consistency for a single row or a region of rows, but anything that spans regions or tables cannot be updated atomically, that is, with the entire transaction committed as one unit. Nor can you perform an atomic update that spans multiple remote procedure calls (RPCs). While we value what HBase provides, we believe providing globally consistent transactions simplifies application development a great deal, allowing developers to focus more on the problems and use cases they care about rather than on implementing complex data access patterns.

This is why we built Tephra, a distributed, scalable transaction engine designed for HBase and Hadoop. Tephra can also be extended to integrate with other NoSQL systems like MongoDB and LevelDB as well as traditional relational databases and data warehouses. Tephra is a powerful data management tool that makes a wide range of use cases easier to solve, especially online and OLTP applications. It utilizes the key features of HBase to make transactional capabilities available without sacrificing overall performance.

Today we’re open sourcing Tephra. Anyone can use it, because we believe the broader developer community can benefit from it, and anyone can contribute to it, because we built Tephra with extensibility in mind.

How can developers use Tephra?

One common use case is secondary indexes. Developers typically create secondary indexes on HBase by writing updates to a second table with additional rows that reference the rows in the main table based on the index values. The problem is that there isn’t consistency in operations across the two tables, so they can get out of sync. Based on their actual data access patterns and what their application cares about, developers are forced to adopt more complicated application logic to manage the data and work around the inconsistencies. In contrast, Tephra simplifies this use case by allowing updates to both tables to be performed in a single globally consistent transaction.
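To make the secondary index problem concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Tephra's API: the ToyTransaction class, the two helper functions, and the users and email_index tables are all hypothetical names chosen for illustration. Plain dictionaries stand in for the HBase tables, and a raised exception simulates a client failing between the two writes.

```python
class ToyTransaction:
    """Hypothetical stand-in for a transaction: buffers writes, applies them together."""

    def __init__(self):
        self._pending = []  # buffered (table, key, value) writes

    def put(self, table, key, value):
        # Nothing is visible in the table until commit() is called.
        self._pending.append((table, key, value))

    def commit(self):
        # Apply every buffered write; in this toy model the step cannot fail midway.
        for table, key, value in self._pending:
            table[key] = value
        self._pending.clear()

    def abort(self):
        # Discard all buffered writes, leaving every table untouched.
        self._pending.clear()


def add_user_without_txn(users, email_index, user_id, email, fail_midway=False):
    """Write the main table, then the index table, as two independent updates."""
    users[user_id] = {"email": email}
    if fail_midway:
        raise RuntimeError("simulated failure between the two writes")
    email_index[email] = user_id


def add_user_with_txn(users, email_index, user_id, email, fail_midway=False):
    """Write both tables through one transaction so they change together."""
    txn = ToyTransaction()
    try:
        txn.put(users, user_id, {"email": email})
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        txn.put(email_index, email, user_id)
        txn.commit()
    except RuntimeError:
        txn.abort()
        raise


# Without a transaction, a failure leaves the main table ahead of the index.
users, email_index = {}, {}
try:
    add_user_without_txn(users, email_index, "u1", "a@example.com", fail_midway=True)
except RuntimeError:
    pass
out_of_sync = "u1" in users and "a@example.com" not in email_index

# With the toy transaction, the same failure leaves both tables unchanged.
users_t, index_t = {}, {}
try:
    add_user_with_txn(users_t, index_t, "u1", "a@example.com", fail_midway=True)
except RuntimeError:
    pass
```

After this runs, out_of_sync is true for the first pair of tables, while users_t and index_t are both still empty. A real distributed transaction engine has to provide the same all-or-nothing behavior across servers and concurrent clients, which this single-process sketch does not attempt to model.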

Why are we open sourcing Tephra?

Many developers and companies are successfully using HBase, but there are still gaps in its accessibility to developers. Tephra takes the strong foundation that HBase has given us to build upon and enhances it by making it more developer-friendly and broadening the potential users and use cases of HBase. We are open sourcing the technology because we want to give back to the community and believe Tephra will be useful to a broad range of developers.

We are also excited to see how others will use, apply, and extend Tephra transactions in their own applications, infrastructures, and environments. We recognize that developers have specific needs, some of which we haven’t anticipated, and we look forward to Tephra growing as a project and community.

Learn more and get involved

Check out the release notes or our SlideShare for more details about Tephra. And please help us make the project better by joining our user and developer mailing list, contributing, and reporting any issues, bugs, or ideas.