HBase & Solr – Near Real-Time Indexing and Search

Once the Solr server is ready, we can configure our collection (in SolrCloud), which will be linked to the HBase table.

Add below properties to hbase-site.xml file.

Add the properties below to /etc/hbase-solr/conf/hbase-indexer-site.xml. This enables the Lily indexer to reach the HBase cluster for indexing. Replace hbase-cluster-zookeeper with the ZooKeeper quorum value from hbase-site.xml; for a local environment, the value is localhost.

XML

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase-cluster-zookeeper</value>
</property>
<property>
  <name>hbaseindexer.zookeeper.connectstring</name>
  <value>hbase-cluster-zookeeper:2181</value>
</property>
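For a local setup, the two properties can be seen in the context of a complete file. The following is a minimal sketch, not the author's original file: it assumes the usual Hadoop-style `<configuration>` root element, and `ZK_HOST`, `OUT_DIR`, and `SITE_FILE` are hypothetical variable names used only for illustration. It writes to a scratch directory so the result can be reviewed before anything under /etc is touched.

```shell
#!/bin/sh
# Hypothetical variables: set ZK_HOST to the quorum value from hbase-site.xml.
ZK_HOST="${ZK_HOST:-localhost}"
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
SITE_FILE="$OUT_DIR/hbase-indexer-site.xml"

# Assumption: the file uses the standard <configuration> wrapper.
cat > "$SITE_FILE" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>${ZK_HOST}</value>
  </property>
  <property>
    <name>hbaseindexer.zookeeper.connectstring</name>
    <value>${ZK_HOST}:2181</value>
  </property>
</configuration>
EOF

echo "Wrote $SITE_FILE; review it before copying to /etc/hbase-solr/conf/"
```

If the existing file already contains other properties, merge these two into it rather than overwriting it.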

Restart the following services:

Shell

$ sudo service hbase-solr-indexer restart
$ sudo service solr-server restart

Create an HBase table with replication

Since the HBase Indexer works by acting as a Replication Sink, we need to make sure that Replication is enabled in HBase. You can activate replication using Cloudera Manager by clicking HBase Service->Configuration->Backup and ensuring “Enable HBase Replication” and “Enable Indexing” are both checked.

In addition, every column family that needs to be indexed must have replication enabled. This is done by setting the REPLICATION_SCOPE flag when the column family is created, as shown below:

Shell

hbase shell> create 'EmployeeTable', {NAME => 'data', REPLICATION_SCOPE => 1}
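If the table already exists without replication on its column family, it can be altered instead of recreated. The sketch below is one way to do that, under a few assumptions: the table and column family names are the ones used above, the `hbase` launcher is on the PATH, and the disable/enable steps (required on older HBase versions) are acceptable for your table. `SCRIPT_FILE` is a hypothetical name. The commands are saved to a file and passed to `hbase shell`, which is skipped when HBase is not installed.

```shell
#!/bin/sh
# Hypothetical location for the generated HBase shell script.
SCRIPT_FILE="${SCRIPT_FILE:-$(mktemp)}"

cat > "$SCRIPT_FILE" <<'EOF'
# Turn on replication for the existing 'data' column family.
disable 'EmployeeTable'
alter 'EmployeeTable', {NAME => 'data', REPLICATION_SCOPE => 1}
enable 'EmployeeTable'
# The description should now list REPLICATION_SCOPE => '1' for 'data'.
describe 'EmployeeTable'
exit
EOF

if command -v hbase >/dev/null 2>&1; then
  hbase shell "$SCRIPT_FILE"
else
  echo "hbase not on PATH; script saved to $SCRIPT_FILE"
fi
```

Running `describe` afterwards is a quick way to confirm that the flag was applied before registering the indexer.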

Create a SolrCloud collection

Shell

$ solrctl instancedir --generate $HOME/hbase-collection1

After running the command above, go to $HOME/hbase-collection1/conf, which contains the Solr configuration files. Edit the schema.xml file:

Shell

$ nano $HOME/hbase-collection1/conf/schema.xml

Replace the default schema with your own. For this use case, add the tag below, which corresponds to the HBase column family (data).
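The tag itself is not reproduced here, so the following is only a sketch of what such a field could look like, followed by the upload and collection-creation steps. The field name `data`, its `string` type, and the `indexed`/`stored` attributes are assumptions chosen to match the column family, not the author's original definition; `FIELD` and `COLLECTION` are hypothetical variable names. The `solrctl` calls run only when the tool is installed.

```shell
#!/bin/sh
# Assumption: a Solr field named "data" (matching the column family), typed as string.
FIELD='<field name="data" type="string" indexed="true" stored="true"/>'
echo "Add inside the <fields> section of schema.xml:"
echo "  $FIELD"

COLLECTION="hbase-collection1"

if command -v solrctl >/dev/null 2>&1; then
  # Upload the edited instance directory to ZooKeeper, then create the collection.
  solrctl instancedir --create "$COLLECTION" "$HOME/$COLLECTION"
  solrctl collection --create "$COLLECTION"
else
  echo "solrctl not on PATH; skipping upload and collection creation"
fi
```

The collection name must match the `solr.collection` value passed to the indexer later on.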

The Lily HBase Indexer service provides a command-line utility that can be used to add, list, update, and delete indexer configurations. The command shown below registers an indexer configuration with the HBase Indexer. It takes an indexer configuration XML file, the ZooKeeper ensemble information used by HBase and Solr, and the name of the Solr collection.

Shell

$ hbase-indexer add-indexer --name myNRTIndexer \
    --indexer-conf $HOME/morphline-hbase-mapper.xml \
    --connection-param solr.zk=localhost:2181/solr \
    --connection-param solr.collection=hbase-collection1 \
    --zookeeper localhost:2181
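The command above points at morphline-hbase-mapper.xml, which this walkthrough does not show. As a rough sketch, an indexer configuration for the morphline-based mapper names the HBase table and the morphline file to apply. The morphline path /etc/hbase-solr/conf/morphlines.conf is an assumption, as are the `OUT_DIR` and `MAPPER_FILE` variable names. The morphline file itself, which would typically map cells of the data column family to Solr fields with a command such as `extractHBaseCells`, is not covered here.

```shell
#!/bin/sh
# Hypothetical output location; set OUT_DIR="$HOME" to match the command above.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
MAPPER_FILE="$OUT_DIR/morphline-hbase-mapper.xml"

# Assumption: the morphline definition lives at /etc/hbase-solr/conf/morphlines.conf.
cat > "$MAPPER_FILE" <<'EOF'
<?xml version="1.0"?>
<indexer table="EmployeeTable"
         mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <param name="morphlineFile" value="/etc/hbase-solr/conf/morphlines.conf"/>
</indexer>
EOF

echo "Wrote $MAPPER_FILE"
```

The `table` attribute has to name the HBase table created earlier; otherwise the indexer registers successfully but never receives any rows.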

Verify that the indexer was successfully created as follows:

Shell

$ hbase-indexer list-indexers

Verify that indexing is working

Add rows to the indexed HBase table. For example:

Shell

$ hbase shell
hbase(main):001:0> put 'EmployeeTable', 'row1', 'data', 'value'
hbase(main):002:0> put 'EmployeeTable', 'row2', 'data', 'value2'

If the put operations succeed, wait a few seconds, then open Search in the Hue UI and query the data. The newly added rows should appear in the search results.
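The same check can be made from the command line by querying the collection through Solr's HTTP select handler. This is a minimal sketch, assuming Solr listens on localhost:8983 and the collection is named hbase-collection1 as above; `SOLR_HOST`, `COLLECTION`, and `QUERY_URL` are hypothetical variable names.

```shell
#!/bin/sh
# Assumption: Solr is reachable on its default port on the local machine.
SOLR_HOST="${SOLR_HOST:-localhost:8983}"
COLLECTION="hbase-collection1"

# Match-all query; the response should include the rows written with put.
QUERY_URL="http://$SOLR_HOST/solr/$COLLECTION/select?q=*:*&wt=json&indent=true"
echo "Querying $QUERY_URL"

if command -v curl >/dev/null 2>&1; then
  curl --silent --max-time 5 "$QUERY_URL" || echo "Solr not reachable at $SOLR_HOST"
else
  echo "curl not on PATH; open the URL in a browser instead"
fi
```

If the response contains no documents after a few seconds, checking that the indexer appears in the `hbase-indexer list-indexers` output is a reasonable first step.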

Configuring Lily HBase NRT Indexer Service for Use with Cloudera Search