HBase Read Replicas

CDH 5.4 introduces read replicas. Without read replicas, only one RegionServer services a read request from a client, regardless of whether RegionServers are
colocated with other DataNodes that have local access to the same block. This ensures consistency of the data being read. However, a RegionServer can become a bottleneck due to an underperforming
RegionServer, network problems, or other reasons that could cause slow reads.

With read replicas enabled, the HMaster distributes read-only copies of regions (replicas) to different RegionServers in the cluster. One RegionServer services the
default or primary replica, which is the only replica which can service write requests. If the RegionServer servicing the primary replica is down, writes will fail.

Other RegionServers serve the secondary replicas, follow the primary RegionServer and only see committed updates. The secondary replicas are read-only, and are
unable to service write requests. The secondary replicas can be kept up to date by reading the primary replica's HFiles at a set interval or by replication. If
they use the first approach, the secondary replicas may not reflect the most recent updates to the data when updates are made and the RegionServer has not yet flushed the memstore to HDFS. If the
client receives the read response from a secondary replica, this is indicated by marking the read as "stale". Clients can detect whether or not the read result is stale and react accordingly.

Replicas are placed on different RegionServers, and on different racks when possible. This provides a measure of high availability (HA), as far as reads are concerned. If a RegionServer
becomes unavailable, the regions it was serving can still be accessed by clients even before the region is taken over by a different RegionServer, using one of the secondary replicas. The reads may
be stale until the entire WAL is processed by the new RegionServer for a given region.

For any given read request, a client can request a faster result even if it comes from a secondary replica, or if consistency is more important than speed, it can ensure that its request
is serviced by the primary RegionServer. This allows you to decide the relative importance of consistency and availability, in terms of the CAP Theorem, in the context of your application, or individual aspects of your application, using Timeline Consistency semantics.

Timeline Consistency

Timeline Consistency is a consistency model which allows for a more flexible standard of consistency than the default HBase model of strong
consistency. A client can indicate the level of consistency it requires for a given read (Get or Scan) operation. The default consistency level is STRONG, meaning
that the read request is only sent to the RegionServer servicing the region. This is the same behavior as when read replicas are not used. The other possibility, TIMELINE, sends the request to all RegionServers with replicas, including the primary. The client accepts the first response, which includes whether it came from the primary or a
secondary RegionServer. If it came from a secondary, the client can choose to verify the read later or not to treat it as definitive.

Keeping Replicas Current

The read replica feature includes two different mechanisms for keeping replicas up to date:

Using a Timer

In this mode, replicas are refreshed at a time interval controlled by the configuration option hbase.regionserver.storefile.refresh.period. Using a timer
is supported in CDH 5.4 and higher.

Using Replication

In this mode, replicas are kept current between a source and sink cluster using HBase replication. This can potentially allow for faster synchronization than using a timer. Each time a
flush occurs on the source cluster, a notification is pushed to the sink clusters for the table. To use replication to keep replicas current, you must first set the column family attribute
REGION_MEMSTORE_REPLICATION to false, then set the HBase configuration property hbase.region.replica.replication.enabled to true.
Important: Read-replica updates using replication are not supported for the hbase:meta table. Columns of
hbase:meta must always have their REGION_MEMSTORE_REPLICATION attribute set to false.

Enabling Read Replica Support

Important:

Before you enable read-replica support, make sure to account for their increased heap memory requirements. Although no additional copies of HFile data are created, read-only replicas
regions have the same memory footprint as normal regions and need to be considered when calculating the amount of increased heap memory required. For example, if your table requires 8 GB of heap
memory, when you enable three replicas, you need about 24 GB of heap memory.

To enable support for read replicas in HBase, you must set several properties.

HBase Read Replica Properties

Property Name

Default Value

Description

hbase.region.replica.replication.enabled

false

The mechanism for refreshing the secondary replicas. If set to false, secondary replicas are not guaranteed to be consistent at the row level. Secondary
replicas are refreshed at intervals controlled by a timer (hbase.regionserver.storefile.refresh.period), and so are guaranteed to be at most that interval of
milliseconds behind the primary RegionServer. Secondary replicas read from the HFile in HDFS, and have no access to writes that have not been flushed to the HFile by the primary RegionServer.

If true, replicas are kept up to date using replication. and the column family has the attribute REGION_MEMSTORE_REPLICATION
set to false, Using replication for read replication of hbase:meta is not supported, and REGION_MEMSTORE_REPLICATION must always be set to false on the column family.

hbase.regionserver.storefile.refresh.period

0 (disabled)

The period, in milliseconds, for refreshing the store files for the secondary replicas. The default value of 0 indicates that the feature is disabled.
Secondary replicas update their store files from the primary RegionServer at this interval.

If refreshes occur too often, this can create a burden for the NameNode. If refreshes occur too infrequently, secondary replicas will be less consistent with the primary RegionServer.

hbase.master.loadbalancer.class

org.apache.hadoop.hbase.master.
balancer.StochasticLoadBalancer

(the class name is split for formatting purposes)

The Java class used for balancing the load of all HBase clients. The default implementation is the StochasticLoadBalancer, which is the only load balancer that supports reading data from secondary RegionServers.

hbase.ipc.client.allowsInterrupt

true

Whether or not to enable interruption of RPC threads at the client. The default value of true enables
primary RegionServers to access data from other regions' secondary replicas.

hbase.client.primaryCallTimeout.get

10 ms

The timeout period, in milliseconds, an HBase client's will wait for a response before the read is submitted to a secondary replica
if the read request allows timeline consistency. The default value is 10. Lower values increase the number of remote procedure calls while lowering latency.

hbase.client.primaryCallTimeout.multiget

10 ms

The timeout period, in milliseconds, before an HBase client's multi-get request, such as HTable.get(List<GET>)), is submitted to a secondary replica if the multi-get request allows timeline consistency. Lower values increase the number of remote procedure calls
while lowering latency.

Configure Read Replicas Using Cloudera Manager

Before you can use replication to keep replicas current, you must set the column attribute REGION_MEMSTORE_REPLICATION to false for the HBase table, using HBase Shell or the client API. See Activating Read Replicas On a Table.

Select Clusters > HBase.

Click the Configuration tab.

Select Scope > HBase or HBase Service-Wide.

Select Category > Advanced.

Locate the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml property or search for it by typing its name in the Search
box.

Configure Read Replicas Using the Command Line

Follow these command-line instructions on systems that do not use Cloudera Manager.

This information applies specifically to CDH 5.15.0. See Cloudera Documentation for information specific to other releases.

Before you can use replication to keep replicas current, you must set the column attribute REGION_MEMSTORE_REPLICATION to false for the HBase table, using HBase Shell or the client API. See Activating Read Replicas On a
Table.

Add the properties from HBase Read Replica Properties to hbase-site.xml on each RegionServer in your cluster, and configure each of them to a value appropriate to your needs. The following example configuration shows the syntax.

Configuring Rack Awareness for Read Replicas

Rack awareness for read replicas is modeled after the mechanism used for rack awareness in Hadoop. Its purpose is to ensure that some replicas are on a different rack than the
RegionServer servicing the table. The default implementation, which you can override by setting hbase.util.ip.to.rack.determiner, to custom implementation, is
ScriptBasedMapping, which uses a topology map and a topology script to enforce distribution of the replicas across racks. To
use the default topology map and script for CDH, setting hbase.util.ip.to.rack.determiner, to ScriptBasedMapping is sufficient. Add the
following property to HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml if you use Cloudera Manager, or to hbase-site.xml otherwise.

Creating a Topology Map

The topology map assigns hosts to racks. It is read by the topology script. A rack is a logical grouping, and does not necessarily correspond to physical hardware or location. Racks can
be nested. If a host is not in the topology map, it is assumed to be a member of the default rack. The following map uses a nested structure, with two data centers which each have two racks. All
services on a host that are rack-aware will be affected by the rack settings for the host.

If you use Cloudera Manager, do not create the map manually. Instead, go to Hosts, select the hosts to assign to a rack, and select Actions for Selected > Assign Rack.

Creating a Topology Script

The topology script determines rack topology using the topology map. By default, CDH uses /etc/hadoop/conf.cloudera.YARN-1/topology.py To use a different
script, set net.topology.script.file.name to the absolute path of the topology script.

Activating Read Replicas On a Table

After enabling read replica support on your RegionServers, configure the tables for which you want read replicas to be created. Keep in mind that each replica increases the amount of
storage used by HBase in HDFS.

At Table Creation

To create a new table with read replication capabilities enabled, set the REGION_REPLICATION property on the table. Use a command like the following, in HBase Shell:

hbase> create 'myTable', 'myCF', {REGION_REPLICATION => '3'}

By Altering an Existing Table

You can also alter an existing column family to enable or change the number of read replicas it propagates, using a command similar to the following. The change will take effect at the
next major compaction.

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.