Installing and Starting ZooKeeper Server

SolrCloud mode uses a ZooKeeper Service as a highly available, central location for cluster management. For a small cluster, running a ZooKeeper host collocated with the NameNode is
recommended. For larger clusters, you may want to run multiple ZooKeeper servers. For more information, see Installing the ZooKeeper Packages.

Initializing Solr

Once the ZooKeeper Service is running, configure each Solr host with the ZooKeeper Quorum address, listing one address for each ZooKeeper server. This could
be a single address in smaller deployments, or multiple addresses if you deploy additional servers.

Configure the ZooKeeper Quorum address in solr-env.sh. The file location varies by installation type. If you accepted default file locations, the
solr-env.sh file can be found in:

Parcels: /opt/cloudera/parcels/CDH-*/etc/default/solr

Packages: /etc/default/solr

Edit the SOLR_ZK_ENSEMBLE property to configure the hosts with the address of the ZooKeeper service. You must make this configuration change on every Solr Server host. The following example shows a
configuration with three ZooKeeper hosts:

SOLR_ZK_ENSEMBLE=<zkhost1>:2181,<zkhost2>:2181,<zkhost3>:2181/solr
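As a minimal sketch, the ensemble value could be assembled from a list of hostnames with a small shell helper. The function name build_zk_ensemble and the hostnames are hypothetical; port 2181 and the /solr suffix follow the example above.

```shell
# Hypothetical helper: join ZooKeeper hostnames into a SOLR_ZK_ENSEMBLE value.
# Assumes every server listens on port 2181 and the znode root is /solr.
build_zk_ensemble() {
    ensemble=""
    for host in "$@"; do
        # Append "host:2181", comma-separated after the first entry.
        ensemble="${ensemble:+$ensemble,}${host}:2181"
    done
    echo "${ensemble}/solr"
}

# Example: print the line to place in solr-env.sh.
echo "SOLR_ZK_ENSEMBLE=$(build_zk_ensemble zk1.example.com zk2.example.com zk3.example.com)"
```

Generating the value this way keeps the line identical on every Solr Server host, which the configuration requires.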

Configuring Solr for Use with HDFS

To use Solr with your established HDFS service, perform the following configurations:

Configure the HDFS URI for Solr to use as a backing store in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, edit the following property to configure the location of Solr index data in HDFS:

SOLR_HDFS_HOME=hdfs://namenodehost:8020/solr

Replace namenodehost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file). You may also need to change the port number from the default (8020). On an HA-enabled cluster, make sure
that the HDFS URI uses the name service designated for your cluster. This value appears in fs.default.name or fs.defaultFS; instead of a hostname, it
looks like hdfs://nameservice1 or something similar.
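The following sketch shows one way to derive the SOLR_HDFS_HOME value from the cluster's default file system URI. The helper name solr_hdfs_home is hypothetical. The lookup uses hdfs getconf -confKey and is guarded so that the sketch does nothing on a machine without the HDFS client.

```shell
# Hypothetical helper: derive SOLR_HDFS_HOME from the default file system URI.
# $1 is the value of fs.defaultFS, for example hdfs://namenodehost:8020
# or, on an HA-enabled cluster, hdfs://nameservice1.
solr_hdfs_home() {
    # Strip one trailing slash if present, then append the /solr directory.
    echo "${1%/}/solr"
}

# Look up fs.defaultFS only if the hdfs client is installed and the lookup works.
if command -v hdfs >/dev/null 2>&1 && default_fs="$(hdfs getconf -confKey fs.defaultFS)"; then
    echo "SOLR_HDFS_HOME=$(solr_hdfs_home "$default_fs")"
fi
```

Reading the URI from the client configuration, rather than typing a hostname, is one way to pick up the name service automatically on an HA-enabled cluster.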

In some cases, such as when configuring Solr to work with HDFS High Availability (HA), you may want to configure the Solr HDFS client by setting
the HDFS configuration directory in /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr. On every Solr Server host, locate
the appropriate HDFS configuration directory and set the following property to the absolute path of this directory:

SOLR_HDFS_CONFIG=/etc/hadoop/conf

Replace the path with the correct directory containing the proper HDFS configuration files, core-site.xml and hdfs-site.xml.
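Before setting SOLR_HDFS_CONFIG, it may help to confirm that the chosen directory actually holds both files. The check below is a sketch; the function name check_hdfs_conf_dir is hypothetical.

```shell
# Hypothetical check: succeed only if the directory in $1 contains both
# HDFS client configuration files named in the text.
check_hdfs_conf_dir() {
    for f in core-site.xml hdfs-site.xml; do
        if [ ! -f "$1/$f" ]; then
            echo "missing $1/$f" >&2
            return 1
        fi
    done
    return 0
}

# Example usage against the default path from the text.
if check_hdfs_conf_dir /etc/hadoop/conf; then
    echo "SOLR_HDFS_CONFIG=/etc/hadoop/conf"
fi
```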

Add Kerberos-related settings to /etc/default/solr or /opt/cloudera/parcels/CDH-*/etc/default/solr on every host in your cluster, substituting appropriate values. For a package-based installation, use something similar to the
following:
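The settings themselves are not reproduced in this section. As an illustration only, a package-based installation might use entries of the following shape. The variable names are the ones recalled from the Cloudera Search documentation and should be verified against your release; the keytab path, principal, and realm are placeholders.

```shell
# Assumed variable names and placeholder values; verify against your release.
SOLR_KERBEROS_ENABLED=true
SOLR_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_KERBEROS_PRINCIPAL=solr/fully.qualified.domain.name@YOUR-REALM.COM
```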

Creating the /solr Directory in HDFS

Before starting the Cloudera Search server, you need to create the /solr directory in HDFS. The Cloudera Search master runs
as solr:solr, so it does not have the required permissions to create a top-level directory.
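One way to do this is to create the directory as the HDFS superuser and then hand ownership to the solr user. The sketch below assumes the superuser account is named hdfs and that sudo is available. The run and create_solr_hdfs_dir helpers are hypothetical; setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Print a command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create /solr in HDFS and give it to the solr user.
# Assumes the HDFS superuser account is "hdfs".
create_solr_hdfs_dir() {
    run sudo -u hdfs hadoop fs -mkdir /solr
    run sudo -u hdfs hadoop fs -chown solr /solr
}

# Preview the commands without touching HDFS.
DRY_RUN=1 create_solr_hdfs_dir
```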

Initializing the ZooKeeper Namespace

Before starting the Cloudera Search server, you need to create the solr namespace in ZooKeeper:

$ solrctl init

Warning: solrctl init also takes a --force option. solrctl init --force clears the Solr data in ZooKeeper and interferes with any running hosts. If you clear Solr data from ZooKeeper to start over, stop the cluster
first.

Starting Solr

To start the cluster, start Solr Server on each host:

$ sudo service solr-server restart
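On a cluster with several hosts, the same command could be issued in a loop. The sketch below assumes passwordless SSH and sudo rights on every host. The host names and the run and start_solr_cluster helpers are hypothetical; DRY_RUN=1 prints the commands instead of running them.

```shell
# Print a command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: restart Solr Server on each host given as an argument.
# Assumes passwordless SSH and sudo rights on every host.
start_solr_cluster() {
    for host in "$@"; do
        run ssh "$host" sudo service solr-server restart
    done
}

# Preview the commands without contacting any host.
DRY_RUN=1 start_solr_cluster solr1.example.com solr2.example.com
```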

After you have started the Cloudera Search Server, the Solr server should be running. To verify that all daemons are running, use the jps tool from the
Oracle JDK, which you can obtain from the Java SE Downloads page. If you are running a
pseudo-distributed HDFS installation and a Solr search installation on one machine, jps shows the following output:

Runtime Solr Configuration

To start using Solr to index data, you must configure a collection to hold the index. A configuration for a collection requires a solrconfig.xml
file, a schema.xml file, and any helper files referenced from the XML files. The solrconfig.xml file contains
all of the Solr settings for a given collection, and the schema.xml file specifies the schema that Solr uses when indexing documents. For more details on how to
configure a collection for your data set, see http://wiki.apache.org/solr/SchemaXml.

Configuration files for a collection are managed as part of the instance directory. To generate a skeleton of the instance directory, run the following command:

$ solrctl instancedir --generate $HOME/solr_configs

You can customize the configuration by directly editing the solrconfig.xml and schema.xml files created in $HOME/solr_configs/conf.

These configuration files are compatible with the standard Solr tutorial example documents.

After configuration is complete, you can make it available to Solr by issuing the following command, which uploads the content of the entire instance directory to ZooKeeper:

$ solrctl instancedir --create collection1 $HOME/solr_configs

Use the solrctl tool to verify that your instance directory uploaded successfully and is available to ZooKeeper. List the available instance
directories as follows:

$ solrctl instancedir --list

If you used the earlier --create command to create collection1, the --list command should
return collection1.
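This check can be scripted. The sketch below assumes that solrctl instancedir --list prints one instance directory name per line; the helper name has_instancedir is hypothetical. It reads the list on standard input so that it can be tried without a running cluster.

```shell
# Hypothetical helper: read instance directory names on stdin, one per line,
# and succeed only if the name given as $1 matches a whole line exactly.
has_instancedir() {
    grep -Fxq -- "$1"
}

# Usage on a host where solrctl is installed:
#   solrctl instancedir --list | has_instancedir collection1 && echo "uploaded"
```

Matching the whole line exactly avoids a false positive when another instance directory name merely contains collection1.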

Important:

If you are familiar with Apache Solr, you might configure a collection directly in solr home: /var/lib/solr. Although this is possible, Cloudera recommends
using solrctl instead.

Creating Your First Solr Collection

By default, the Solr server comes up with no collections. Create your first collection from the instance directory that you uploaded
in the previous steps by using the same name for the collection. numOfShards is the number of SolrCloud shards across which you want to partition the collection. The number of shards
cannot exceed the total number of Solr servers in your SolrCloud cluster:

$ solrctl collection --create collection1 -s {{numOfShards}}
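The shard limit can be checked before the collection is created. The following is a minimal sketch; the function name check_shards and the example values are hypothetical, and the script prints the solrctl command rather than running it.

```shell
# Hypothetical guard: succeed only if the shard count is at least 1 and
# no greater than the number of Solr servers.
# $1 = number of shards, $2 = number of Solr servers in the cluster.
check_shards() {
    [ "$1" -ge 1 ] && [ "$1" -le "$2" ]
}

# Example with made-up values: 2 shards on a 3-server cluster.
if check_shards 2 3; then
    echo "solrctl collection --create collection1 -s 2"
else
    echo "shard count exceeds the number of Solr servers" >&2
fi
```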

After the collection is created, check that it is active. For example, for the server myhost.example.com, you should be able to browse to http://myhost.example.com:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true and receive a response from the collection. Similarly, you should be able to view the
topology of your SolrCloud using a URL similar to http://myhost.example.com:8983/solr/#/~cloud.
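The same query can be issued from the command line. The sketch below only builds the URL; the helper name select_url is hypothetical, and port 8983 and the query string are taken from the example above.

```shell
# Hypothetical helper: build the select URL for a host ($1) and collection ($2).
# Port 8983 and the query string follow the example in the text.
select_url() {
    echo "http://$1:8983/solr/$2/select?q=*%3A*&wt=json&indent=true"
}

# Usage on a machine with curl and network access to the Solr host:
#   curl -s "$(select_url myhost.example.com collection1)"
select_url myhost.example.com collection1
```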
