Configuring HBase in Pseudo-Distributed Mode

Note: You can skip this section if you are already running HBase in distributed mode, or if you intend to use only standalone mode.

Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM. It differs from distributed mode in that all of the processes run on the same server rather than on multiple servers in a cluster. This section also assumes you want to store your HBase data in HDFS rather than on the local filesystem.

Note: Before you start

This section assumes you have already installed the HBase master and gone through the standalone configuration steps.

If the HBase master is already running in standalone mode, stop it as follows before continuing with pseudo-distributed configuration:

To stop the CDH 4 version:

$ sudo service hadoop-hbase-master stop

To stop the CDH 5 version, if that version is already running:

$ sudo service hbase-master stop
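If you are not sure which release's service is installed, the two stop commands can be wrapped in a small shell function. This is a sketch, not part of CDH: the function name and the INIT_DIR override (there only so the check can be pointed at another directory) are hypothetical, and it assumes each service name has a matching script in /etc/init.d.

```shell
# Hypothetical helper: stop whichever standalone HBase master service is installed.
# Assumes each service has an init script in /etc/init.d; INIT_DIR is a
# hypothetical override for that directory.
stop_standalone_master() {
  local initdir found svc
  initdir="${INIT_DIR:-/etc/init.d}"
  found=1
  for svc in hadoop-hbase-master hbase-master; do   # CDH 4 name, then CDH 5 name
    if [ -e "$initdir/$svc" ]; then
      sudo service "$svc" stop
      found=0
    fi
  done
  if [ "$found" -ne 0 ]; then
    echo "no HBase master init script found in $initdir" >&2
  fi
  return "$found"
}
```

After sourcing the function, run stop_standalone_master; it returns non-zero if neither service script is present.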

Modifying the HBase Configuration

To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and add two properties between the <configuration> and </configuration> tags: hbase.cluster.distributed, set to true, and hbase.rootdir, set to an HDFS URI such as hdfs://myhost:8020/hbase. The hbase.cluster.distributed property directs HBase to start each process in a separate JVM. The hbase.rootdir property directs HBase to store its data in HDFS rather than on the local filesystem. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).
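After saving hbase-site.xml, you may want to confirm that HBase actually reads the new values. The sketch below assumes the hbase launcher script is on your PATH and uses the org.apache.hadoop.hbase.util.HBaseConfTool class, which HBase's own start-hbase.sh script calls to read hbase.cluster.distributed; the function name check_hbase_conf is hypothetical.

```shell
# Hypothetical helper: print the effective values of the two properties and
# succeed only if distributed mode is on and the root directory is in HDFS.
check_hbase_conf() {
  local tool distributed rootdir
  tool=org.apache.hadoop.hbase.util.HBaseConfTool
  # HBaseConfTool prints the value of one configuration key; keep the first line only.
  distributed="$(hbase "$tool" hbase.cluster.distributed | head -n 1)"
  rootdir="$(hbase "$tool" hbase.rootdir | head -n 1)"
  echo "hbase.cluster.distributed=$distributed"
  echo "hbase.rootdir=$rootdir"
  [ "$distributed" = "true" ] || return 1
  case "$rootdir" in
    hdfs://*) return 0 ;;
    *) return 1 ;;
  esac
}
```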

Note: If Kerberos is enabled, do not use commands in the form sudo -u <user> hadoop <command>; they will fail with a security error. Instead, use one of the following commands:

$ kinit <user> (if you are using a password)
$ kinit -kt <keytab> <principal> (if you are using a keytab)

Then, for each command executed by this user, run:

$ <command>
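A typical command that this note applies to is creating the HBase root directory in HDFS, if it does not already exist, and giving ownership of it to the hbase user. The following sketch shows both forms. It assumes the HDFS superuser is hdfs, that HBase runs as hbase, and that the directory matches the path in hbase.rootdir (/hbase by default here); create_hbase_rootdir, HDFS_KEYTAB, and HDFS_PRINCIPAL are hypothetical names.

```shell
# Hypothetical helper: create the HBase root directory in HDFS and hand it to hbase.
# HDFS_KEYTAB and HDFS_PRINCIPAL are hypothetical variables used for Kerberos setups.
create_hbase_rootdir() {
  local rootdir
  rootdir="${1:-/hbase}"
  if [ -n "${HDFS_KEYTAB:-}" ]; then
    # Kerberos: authenticate as the HDFS principal, then run plain hadoop commands.
    if [ -z "${HDFS_PRINCIPAL:-}" ]; then
      echo "HDFS_PRINCIPAL must be set when HDFS_KEYTAB is set" >&2
      return 1
    fi
    kinit -kt "$HDFS_KEYTAB" "$HDFS_PRINCIPAL" &&
      hadoop fs -mkdir "$rootdir" &&
      hadoop fs -chown hbase "$rootdir"
  else
    # No Kerberos: run each command as the hdfs user via sudo.
    sudo -u hdfs hadoop fs -mkdir "$rootdir" &&
      sudo -u hdfs hadoop fs -chown hbase "$rootdir"
  fi
}
```

On a cluster without Kerberos, run create_hbase_rootdir /hbase. On a Kerberos-enabled cluster, export HDFS_KEYTAB and HDFS_PRINCIPAL first, then run the same command.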

Enabling Servers for Pseudo-distributed Operation

After you have configured HBase, you must enable the various servers that make up a distributed HBase cluster. HBase uses three required types of servers: a ZooKeeper server, the HBase master, and one or more RegionServers.

Installing and Starting ZooKeeper Server

HBase uses ZooKeeper Server as a highly available, central location for cluster management. For example, it allows clients to locate the servers, and ensures that only one master is active at a time. For a small cluster, running a ZooKeeper node collocated with the NameNode is recommended. For larger clusters, contact Cloudera Support for configuration help.
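A minimal sketch of installing and starting a single ZooKeeper server, assuming the CDH 5 package and service name zookeeper-server. The function name install_zookeeper_server is hypothetical, and because the init step prepares the data directory, the function is meant for a fresh installation only.

```shell
# Hypothetical helper: install, initialize, and start one ZooKeeper server.
# Argument: yum (Red Hat-compatible), apt-get (Ubuntu/Debian), or zypper (SLES).
install_zookeeper_server() {
  local mgr
  mgr="${1:-yum}"
  case "$mgr" in
    yum|apt-get|zypper) ;;
    *) echo "unsupported package manager: $mgr" >&2; return 1 ;;
  esac
  # 'init' prepares the data directory, so use this function only for a fresh install.
  sudo "$mgr" install zookeeper-server &&
    sudo service zookeeper-server init &&
    sudo service zookeeper-server start
}
```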

Starting the HBase Master

After ZooKeeper is running, you can start the HBase master:

$ sudo service hbase-master start

Starting an HBase RegionServer

The RegionServer is the HBase process that actually hosts data and processes requests. The RegionServer typically runs on all HBase nodes except the node running the HBase master.

To install the HBase RegionServer on Red Hat-compatible systems:

$ sudo yum install hbase-regionserver

To install the HBase RegionServer on Ubuntu and Debian systems:

$ sudo apt-get install hbase-regionserver

To install the HBase RegionServer on SLES systems:

$ sudo zypper install hbase-regionserver

To start the RegionServer:

$ sudo service hbase-regionserver start

Verifying the Pseudo-Distributed Operation

After you have started ZooKeeper, the Master, and a RegionServer, the pseudo-distributed cluster should be up and running. You can verify that each of the daemons is running by using the jps tool from the Oracle JDK. If you are running a pseudo-distributed HDFS installation and a pseudo-distributed HBase installation on one machine, the jps output should include the HDFS daemons (such as NameNode and DataNode) as well as QuorumPeerMain (ZooKeeper), HMaster, and HRegionServer. Run jps with sudo so that it can see JVMs that belong to other users.
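To make this check repeatable, you can pipe the jps listing through a small filter. The sketch below (check_daemons is a hypothetical name) assumes the usual jps format of a process ID followed by the main class name. It checks only for NameNode, DataNode, QuorumPeerMain, HMaster, and HRegionServer and ignores any other entries.

```shell
# Hypothetical helper: read jps output on stdin and report expected daemons
# that are missing. Returns 0 only if all of them are present.
check_daemons() {
  local listing missing name
  listing="$(cat)"
  missing=0
  for name in NameNode DataNode QuorumPeerMain HMaster HRegionServer; do
    # jps prints "<pid> <MainClass>"; compare the second field exactly.
    if ! printf '%s\n' "$listing" |
        awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, sudo jps | check_daemons prints one "missing:" line per daemon that is not running and prints nothing when all five are present.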

You should also be able to navigate to http://localhost:60010 and verify that the local RegionServer has registered with the Master.
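If you script the startup, you may want to wait until the Master web UI responds before opening it in a browser. The sketch below assumes curl is installed; wait_for_master_ui, its default of 30 attempts, and the 2-second interval are hypothetical choices. It only checks that the page answers; confirming that the RegionServer has registered still means looking at the page itself.

```shell
# Hypothetical helper: poll the HBase Master web UI until it answers.
# Arguments: URL (default http://localhost:60010) and number of attempts (default 30).
wait_for_master_ui() {
  local url tries i
  url="${1:-http://localhost:60010}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -s: silent, -f: treat HTTP error responses as failures.
    if curl -sf -o /dev/null "$url"; then
      echo "Master UI is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Master UI did not respond at $url after $tries attempts" >&2
  return 1
}
```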

Installing and Starting the HBase Thrift Server

The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and, in many situations, performs better than REST. The Thrift Server can run on the same hosts as the RegionServers, but should not run on the same host as the NameNode or the JobTracker. For more information about Thrift, visit http://thrift.apache.org/.
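A minimal sketch of installing and starting the Thrift Server on the host where it should run, assuming the CDH package and service are both named hbase-thrift. The function name install_hbase_thrift is hypothetical, and the package manager argument follows the same Red Hat, Ubuntu/Debian, and SLES split used for the RegionServer.

```shell
# Hypothetical helper: install and start the HBase Thrift Server.
# Argument: yum (Red Hat-compatible), apt-get (Ubuntu/Debian), or zypper (SLES).
install_hbase_thrift() {
  local mgr
  mgr="${1:-yum}"
  case "$mgr" in
    yum|apt-get|zypper) ;;
    *) echo "unsupported package manager: $mgr" >&2; return 1 ;;
  esac
  sudo "$mgr" install hbase-thrift &&
    sudo service hbase-thrift start
}
```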

If this documentation includes code, including but not limited to code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License, Version 2.0, is available at http://www.apache.org/licenses/LICENSE-2.0.