The Singularity: Apache HBase Compatibility and Extensibility

Overview

One of the major features of the upcoming Apache HBase 0.96 release is improved support for compatibility and extensibility across different HBase versions. This includes support for the following:

Upgrading with no downtime: support for a rolling upgrade across a single major version (e.g. 0.96 to 0.98). Because HBase has lacked this feature, moving to a new major version has required “lock step” upgrades: the entire cluster has to be shut down, the components upgraded, and then the cluster restarted. This has been a major source of downtime and unavailability in HBase clusters.

Accessing multiple HBase clusters running different versions: support for master/slave replication and multiple sharded clusters with individual clusters running different versions of HBase. Like upgrading, this feature is supported across a single major version.

To achieve this support, the remote procedure calls (RPCs) and persistent data formats are being converted to use Protocol Buffers (protobufs). The latest version of trunk defines 42 RPC calls and 130 data types via protobufs. These definitions represent, for the first time, a clear specification of the HBase client protocol, which should make it easier to write new clients (e.g. clients in other languages).

The conversion to protobufs also makes HBase more extensible: developers can add parameters to RPC calls and fields to the data formats without breaking existing clients. This has been a limitation of HBase in the past: bugs have lingered in older versions and improvements could not be backported, because doing so would break compatibility. See for example HBASE-5904 and HBASE-6009. Improved extensibility therefore means a quicker cadence for fixing bugs and adding new features.
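To see why adding a field does not break older peers, it helps to look at how protobuf-encoded messages are laid out on the wire: every field is prefixed with a key holding its field number and wire type, so a reader can skip any field number it does not recognize. The sketch below hand-rolls a small subset of that encoding (varint and length-delimited fields only) rather than using the protobuf library. The "get" request, its field names, and its field numbers are hypothetical and are not taken from HBase's actual .proto definitions.

```python
VARINT, LENGTH_DELIMITED = 0, 2  # the two protobuf wire types used in this sketch


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode an int as a varint field, or bytes as a length-delimited field."""
    if isinstance(value, int):
        return encode_varint((number << 3) | VARINT) + encode_varint(value)
    key = encode_varint((number << 3) | LENGTH_DELIMITED)
    return key + encode_varint(len(value)) + value


def decode_known(buf: bytes, known: dict) -> dict:
    """Decode the fields listed in `known`; silently skip all other fields."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = decode_varint(buf, pos)
        elif wire_type == LENGTH_DELIMITED:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:  # unknown field numbers are dropped, not errors
            fields[known[number]] = value
    return fields


# Hypothetical schemas: the newer one adds field 3.
OLD_SCHEMA = {1: "row", 2: "max_versions"}
NEW_SCHEMA = {1: "row", 2: "max_versions", 3: "cache_blocks"}

new_request = encode_field(1, b"row-42") + encode_field(2, 3) + encode_field(3, 1)
old_request = encode_field(1, b"row-42") + encode_field(2, 3)

seen_by_old_server = decode_known(new_request, OLD_SCHEMA)  # field 3 is skipped
seen_by_new_server = decode_known(old_request, NEW_SCHEMA)  # field 3 is absent
```

Both directions decode cleanly: the older reader ignores the extra field, and the newer reader simply finds the new field missing and can fall back to a default. This only works as long as existing field numbers keep their meaning, which is why new parameters are added under new field numbers rather than by changing existing ones.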

Michael Stack, chair of the HBase PMC, has dubbed the conversion to protobufs and the release of 0.96 as “the Singularity” because it will not be backwards compatible, but once deployed, will be forward compatible with future versions of HBase. Therefore, a “lock-step” upgrade will be necessary to move from a 0.92/0.94 release to 0.96. No additional work is necessary to migrate data to the new formats: HBase 0.96 can read the old formats and will automatically migrate them when they are first read. In addition, existing applications do not need to be rewritten or even recompiled; linking against a new version of the client will be sufficient to get applications working against 0.96.
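The "migrate when first read" idea can be illustrated with a small sketch. This is not HBase's implementation. It assumes a hypothetical legacy format (a plain "host:port" string), a hypothetical marker prefix for the new format, and JSON standing in for protobuf so the example stays self-contained. The reader accepts both formats and rewrites legacy values in the new format the first time it touches them.

```python
import json

MAGIC = b"NEWF"  # hypothetical marker for the new format; not taken from HBase


def encode_new(record: dict) -> bytes:
    """Serialize a record in the new format (JSON stands in for protobuf)."""
    return MAGIC + json.dumps(record, sort_keys=True).encode("utf-8")


def decode_legacy(raw: bytes) -> dict:
    """Parse the hypothetical legacy format: a plain 'host:port' string."""
    host, port = raw.decode("utf-8").rsplit(":", 1)
    return {"host": host, "port": int(port)}


def read_and_migrate(store: dict, key: str) -> dict:
    """Read a value in either format; rewrite legacy values on first read."""
    raw = store[key]
    if raw.startswith(MAGIC):
        return json.loads(raw[len(MAGIC):].decode("utf-8"))
    record = decode_legacy(raw)
    store[key] = encode_new(record)  # later reads take the new-format path
    return record


# A dict stands in for the persistent store.
store = {"master": b"host1.example.com:60000"}
first = read_and_migrate(store, "master")   # legacy path, triggers the rewrite
second = read_and_migrate(store, "master")  # new-format path
```

Because the conversion happens lazily, no separate offline migration step is needed in this scheme: data that is never read stays in the old format until something asks for it.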

We expect wire compatibility to be incorporated into CDH starting with CDH5.

Work so far

Jimmy Xiang and I gave an overview of the wire compatibility work at the March HBase Meetup. Slides are available here. Since then, the HBase community has been hard at work getting all the system components ready for the 0.96 release, which is currently targeted for the summer.

HBASE-5305 lists the individual subtasks. Here are some highlights of the work that has been done so far:

HBASE-5443 split the existing HRegionInterface into administrative (e.g. open/close region) and client (e.g. get/put/scan) operations and converted the RPC calls to use protobufs.

HBASE-5446 converted the data stored inside of ZooKeeper nodes to protobufs. These nodes store, among other things, the location of the -ROOT- table as well as the location of the active and backup HMaster.

HBASE-5445 (in review) and HBASE-5444 converted the RPC functions of the HMaster to protobufs. These functions include creating and deleting tables, handling RegionServer errors, and reporting cluster statistics to the client.

Remaining Work

While the majority of the RPC calls and file formats are now forward compatible, much work remains to ensure compatibility across the broad HBase feature set. Coprocessors, filters, and replication, to name a few, still require work to be ready for “the Singularity” of HBase 0.96.

Want to help improve the compatibility and extensibility of HBase? Get started by reading about Getting Involved in the HBase Reference Guide, sending an e-mail to the development mailing list, or looking into one of the subtasks of HBASE-5305. Contributions are always welcome!