Installing the Spark Indexer

The Spark indexer uses a Spark or MapReduce ETL batch job to move data from HDFS files into Apache Solr. As part of this process, the indexer uses Morphlines to extract and transform
data.

To use the Spark indexer, the solr-crunch package must be installed on every host from which you want to submit a batch indexing job.

By default, solr-crunch is installed when Cloudera Search is installed using parcels, such as in a Cloudera Manager deployment. If you are using a package installation and solr-crunch does not exist on your system, install it using the commands described in this topic.

To install solr-crunch on RHEL systems:

$ sudo yum install solr-crunch

To install solr-crunch on Ubuntu and Debian systems:

$ sudo apt-get install solr-crunch

To install solr-crunch on SLES systems:

$ sudo zypper install solr-crunch
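After installing, you can confirm that the package landed on the host. The commands below are a sketch; the query syntax depends on your package manager, and the jar location shown is an assumption based on typical CDH package layouts (parcel deployments place it under /opt/cloudera/parcels instead).

$ # On RHEL/SLES systems, query the RPM database:
$ rpm -q solr-crunch
$ # On Ubuntu/Debian systems, query dpkg:
$ dpkg -l solr-crunch
$ # Assumed package install path; adjust for your release or parcel layout:
$ ls /usr/lib/solr/contrib/crunch/*.jar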

For information on using Spark to batch index documents, see Spark Indexing.
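As a preview of what the installed tool is used for, a batch indexing job is typically submitted with spark-submit, pointing CrunchIndexerTool at a Morphlines configuration and the HDFS input files. The command below is a minimal sketch, not a complete reference: the jar path, morphline file name, ZooKeeper ensemble, and HDFS input path are all placeholders you must replace with values from your own cluster.

$ spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.solr.crunch.CrunchIndexerTool \
    /usr/lib/solr/contrib/crunch/search-crunch.jar \
    -D morphlineVariable.ZK_HOST=zk01.example.com:2181/solr \
    --morphline-file morphline.conf \
    --pipeline-type spark \
    hdfs:///user/example/indir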

If this documentation includes code, including but not limited to code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License, Version 2.0 can be found here.