What is Apache Tephra (TM)

Apache Tephra provides globally consistent transactions on top of distributed data stores such as Apache HBase. While HBase provides strong consistency with row- or region-level ACID operations, it sacrifices cross-region and cross-table consistency in favor of scalability. This trade-off requires application developers to handle the complexity of ensuring consistency when their modifications span region boundaries. By providing support for global transactions that span regions, tables, or multiple RPCs, Tephra simplifies application development on top of HBase, without a significant impact on performance or scalability for many workloads.

Tephra is also used by Apache Phoenix to add cross-row and cross-table transaction support with full ACID semantics.

TransactionProcessor Coprocessor - applies filtering to the data read (based on a given transaction’s state) and cleans up any data from old transactions that are no longer visible.

Transaction Server

A central transaction manager generates a globally unique, time-based transaction ID for each transaction that is started, and maintains the state of all in-progress and recently committed transactions for conflict detection. While multiple transaction server instances can be run concurrently for automatic failover, only one server instance is actively serving requests at a time. This is coordinated by performing leader election amongst the running instances through Apache ZooKeeper. The active transaction server instance will also register itself using a service discovery interface in ZooKeeper, allowing clients to discover the currently active server instance without additional configuration.
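The ID generation and conflict detection described above can be sketched with a small in-memory model. This is a toy illustration, not Tephra's actual implementation or API: the class name ToyTxManager, its methods, the millisecond-times-one-million ID scheme, and the rule that a commit fails when an overlapping write committed after the transaction started are all assumptions made for the example. Leader election and service discovery are left out.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.NavigableMap;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy model of a central transaction manager (not Tephra's API). */
class ToyTxManager {
    private long lastId = 0;
    // IDs of transactions that have started but not yet committed or aborted.
    private final NavigableSet<Long> inProgress = new TreeSet<>();
    // Commit ID -> keys written by that committed transaction.
    private final NavigableMap<Long, Set<String>> committed = new TreeMap<>();

    /** Produces a strictly increasing ID derived from the wall clock (assumed scheme). */
    private long nextId() {
        long candidate = System.currentTimeMillis() * 1_000_000L;
        lastId = Math.max(lastId + 1, candidate);
        return lastId;
    }

    /** Starts a transaction and records it as in progress. */
    synchronized long start() {
        long id = nextId();
        inProgress.add(id);
        return id;
    }

    /**
     * Tries to commit. Returns false if any transaction that committed after
     * txId started wrote one of the same keys; the caller would then roll back.
     */
    synchronized boolean commit(long txId, Set<String> writtenKeys) {
        if (!inProgress.remove(txId)) {
            throw new IllegalStateException("Unknown or finished transaction: " + txId);
        }
        for (Set<String> other : committed.tailMap(txId, false).values()) {
            if (!Collections.disjoint(other, writtenKeys)) {
                return false; // write-write conflict
            }
        }
        committed.put(nextId(), new HashSet<>(writtenKeys));
        return true;
    }
}
```

In this sketch, two transactions that overlap in time and write the same key cannot both commit: the first commit succeeds and the second is rejected. A real manager would also discard committed change sets once no in-progress transaction could conflict with them.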

Transaction Client

A client makes a call to the active transaction server in order to start a new transaction. This returns a new transaction instance to the client, with a unique transaction ID (used to identify writes for the transaction), as well as a list of transaction IDs to exclude for reads (from in-progress or invalidated transactions). When performing writes, the client overrides the timestamp for all modified HBase cells with the transaction ID. When reading data from HBase, the client skips cells associated with any of the excluded transaction IDs. The read exclusions are applied through a server-side filter injected by the TransactionProcessor coprocessor.
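To make the read and write rules concrete, here is a minimal sketch of a versioned store in which each cell version is stamped with the writing transaction's ID and reads skip excluded IDs. It is a self-contained toy, not HBase or Tephra code: ToyVersionedStore and its methods are hypothetical names, and the extra rule that a reader ignores versions newer than its own transaction ID is an assumption of the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical toy store: key -> (version timestamp -> value). */
class ToyVersionedStore {
    private final Map<String, NavigableMap<Long, String>> cells = new HashMap<>();

    /** Writes a value, using the transaction ID as the cell timestamp. */
    void write(long txId, String key, String value) {
        cells.computeIfAbsent(key, k -> new TreeMap<Long, String>()).put(txId, value);
    }

    /**
     * Returns the newest value visible to the reading transaction, or null.
     * Versions written by excluded transactions (in-progress or invalid) are
     * skipped, as are versions newer than the reader's own ID (assumption).
     */
    String read(long txId, Set<Long> excluded, String key) {
        NavigableMap<Long, String> versions = cells.get(key);
        if (versions == null) {
            return null;
        }
        // Walk versions from newest to oldest and return the first visible one.
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            long version = e.getKey();
            if (version > txId || excluded.contains(version)) {
                continue;
            }
            return e.getValue();
        }
        return null;
    }
}
```

For example, if transactions 100 and 200 have both written the same key and a reader's exclude list contains 200, the reader sees the value written by 100. In Tephra itself this filtering happens on the server side, as described above.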

TransactionProcessor Coprocessor

The TransactionProcessor coprocessor is loaded on all HBase tables where transactional reads and writes are performed. When clients read data, it coordinates the server-side filtering performed based on the client transaction’s snapshot. Data cells from any transactions that are currently in-progress or those that have failed and could not be rolled back (“invalid” transactions) will be skipped on these reads. In addition, the TransactionProcessor cleans up any data versions that are no longer visible to any running transactions, either because the transaction that the cell is associated with failed or a write from a newer transaction was successfully committed to the same column.
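The cleanup rule can be illustrated with a small function that decides which versions of a single column to retain. This is a sketch under stated assumptions rather than the coprocessor's real logic: ToyVersionPruner, retain, and the oldestVisibleBound parameter (the newest version that every running transaction is able to see) are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical toy pruner for the versions of one column. */
class ToyVersionPruner {
    /**
     * Returns the versions to keep, newest first. A version is dropped when
     * it was written by an invalid transaction, or when a newer valid version
     * at or below oldestVisibleBound already hides it from every running
     * transaction.
     */
    static List<Long> retain(Collection<Long> versions, Set<Long> invalid, long oldestVisibleBound) {
        List<Long> sorted = new ArrayList<>(versions);
        sorted.sort(Comparator.reverseOrder());

        List<Long> kept = new ArrayList<>();
        boolean keptVisibleToAll = false;
        for (long v : sorted) {
            if (invalid.contains(v)) {
                continue; // failed write that was never rolled back
            }
            if (v > oldestVisibleBound) {
                kept.add(v); // some running transaction may still need an older one
            } else if (!keptVisibleToAll) {
                kept.add(v); // newest version that all running transactions see
                keptVisibleToAll = true;
            }
            // Older versions at or below the bound are no longer visible to anyone.
        }
        return kept;
    }
}
```

With versions 100, 200, 300, and 400, an invalid transaction 300, and a bound of 250, this sketch keeps 400 and 200: version 300 is dropped as invalid, and version 100 is dropped because 200 supersedes it for every running transaction.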

More details on how Tephra transactions work and the interactions between these components can be found in our Presentations.

Note: The component versions shown in this table are those that we have tested and are confident are suitable and compatible. Later versions of components may work, but have not necessarily been tested or confirmed to be compatible.

Disclaimer

Apache Tephra is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Apache Tephra, Apache, the Apache feather logo,
and the Apache Tephra project logos are trademarks of The Apache Software Foundation.
All other marks mentioned may be trademarks or registered trademarks of their respective owners.