Configuring the Lily HBase NRT Indexer Service for Use with Cloudera Search

The Lily HBase NRT Indexer Service is a flexible, scalable, fault-tolerant, transactional, near real-time (NRT) system for processing a continuous stream of HBase cell updates into live
search indexes. Typically it takes seconds for data ingested into HBase to appear in search results; this duration is tunable. The Lily HBase Indexer uses SolrCloud to index data stored in HBase. As
HBase applies inserts, updates, and deletes to HBase table cells, the indexer keeps Solr consistent with the HBase table contents, using standard HBase replication. The indexer supports flexible
custom application-specific rules to extract, transform, and load HBase data into Solr. Solr search results can contain columnFamily:qualifier links back to the data
stored in HBase. This way, applications can use the Search result set to directly access matching raw HBase cells. Indexing and searching do not affect operational stability or write throughput of
HBase because the indexing and searching processes are separate and asynchronous to HBase.

The Lily HBase NRT Indexer Service must be deployed in an environment with a running HBase cluster, a running SolrCloud cluster, and at least one ZooKeeper cluster. This can be done with
or without Cloudera Manager. See Managing Services for more information on adding services such as the Lily
HBase Indexer Service.

Enabling Cluster-wide HBase Replication

The Lily HBase Indexer is implemented using HBase replication, presenting indexers as RegionServers of the worker cluster. This requires HBase replication on the HBase cluster, as well
as the individual tables to be indexed. An example of settings required for configuring cluster-wide HBase replication is shown in /usr/share/doc/hbase-solr-doc*/demo/hbase-site.xml. You must add these settings to all of the hbase-site.xml configuration files on the HBase
cluster, except the replication.replicationsource.implementation property. You can use the Cloudera Manager HBase Indexer Service GUI to do this. After making these
updates, restart your HBase cluster.

Pointing a Lily HBase NRT Indexer Service at an HBase Cluster that Needs to Be Indexed

Before starting Lily HBase NRT Indexer services, you must configure individual services with the location of a ZooKeeper ensemble that is used for the target HBase cluster. Add the
following property to /etc/hbase-solr/conf/hbase-indexer-site.xml. Remember to replace hbase-cluster-zookeeper with the actual ensemble
string found in the hbase-site.xml configuration file:

Configure all Lily HBase NRT Indexer Services to use a particular ZooKeeper ensemble to coordinate with one another. Add the following property to /etc/hbase-solr/conf/hbase-indexer-site.xml, and replace hbase-cluster-zookeeper:2181 with the actual ensemble string:

Configuring Lily HBase Indexer Security

Beginning with CDH 5.4 the Lily HBase Indexer includes an HTTP interface for the list-indexers, create-indexer, update-indexer, and delete-indexer commands. This interface can be configured to use Kerberos and to integrate with Sentry.

Configuring Lily HBase Indexer to Use Security

To configure the Lily HBase Indexer to use security, you must create principals and keytabs and then modify default configurations.

To create principals and keytabs

Repeat this process on all Lily HBase Indexer hosts.

Create a Lily HBase Indexer service user principal using the syntax: hbase/<fully.qualified.domain.name>@<YOUR-REALM>. This principal is used to authenticate with the Hadoop cluster. where: fully.qualified.domain.name is the host where the Lily HBase Indexer is running YOUR-REALM is the name of your Kerberos realm.

Create a HTTP service user principal using the syntax: HTTP/<fully.qualified.domain.name>@<YOUR-REALM>. This principal is used to authenticate user requests coming to the Lily HBase Indexer web-services. where: fully.qualified.domain.name is the host where the Lily HBase Indexer is running YOUR-REALM is the name of your Kerberos realm.

Set up the Java Authentication and Authorization Service (JAAS). Create a jaas.conf file in the HBase-Indexer
configuration directory containing the following settings. Make sure that you substitute a value for principal that matches your particular environment.

Then, modify hbase-indexer-env.sh in the hbase-indexer configuration directory to add the jaas configuration to the system properties.
You can do this by adding -Djava.security.auth.login.config to the HBASE_INDEXER_OPTS. For example, you might add the following:

If role has WRITE privilege for indexer1, a call to create, update, or delete indexer1 succeeds.

If role has READ privilege for indexer1, a call to list-indexers will list indexer1, if it exists. If an indexer called indexer2 exists, but the role
does not have READ privileges for it, information about indexer2 is filtered out of the response.

To configure Sentry for the Lily HBase Indexer, add the following properties to hbase-indexer-site.xml:

Note: These settings can be added using Cloudera Manager or by manually editing the hbase-indexer-site.xml
file.

Starting a Lily HBase NRT Indexer Service

You can use the Cloudera Manager GUI to start Lily HBase NRT Indexer Service on a set of machines. In non-managed deployments, you can start a Lily HBase Indexer Daemon manually on the
local host with the following command:

sudo service hbase-solr-indexer restart

After starting the Lily HBase NRT Indexer Services, verify that all daemons are running using the jps tool from the Oracle JDK, which you can obtain from the Java SE Downloads page. If
you are running a pseudo-distributed HDFS installation and a Lily HBase NRT Indexer Service installation on one machine, jps shows the following output:

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.