Saturday, June 15, 2013

Step by Step setting up solr cloud

Here is a quick step by step instruction for setting up solr cloud on few machines. We will be setting up solr cloud to match production environment - that is - there will be separate setup for zookeeper & solr will be sitting inside tomcat instead of the default jetty.

Apache zookeeper is a centralized service for maintaining configuration information. In case of solr cloud, the solr configuration information is maintained in zookeeper. Lets set up the zookeeper first.

We will be setting up solr cloud on 3 machines. Please make the following entries in /etc/host file on all machines.

172.22.67.101 solr1
172.22.67.102 solr2
172.22.67.103 solr3

Lets setup the zookeeper cloud on 2 machines

download and untar zookeeper in /opt/zookeeper directory on both servers solr1 & solr2. On both the servers do the following

root@solr1$ mkdir /opt/zookeeper/dataroot@solr1$ cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfgroot@solr1$ vim /opt/zookeeper/zoo.cfg
Make the following changes in the zoo.cfg file

Point your browser to http://172.22.67.101:8983/solr/
Click on "cloud"->"graph" on your left section and you can see that there are 2 nodes in shard 1 and 1 node in shard 2.

Lets feed some data and see how they are distributed across multiple shards.

root@solr3$ cd /opt/solr/example/exampledocsroot@solr3$ java -jar post.jar *.xml
Lets check the data as it was distributed between the two shards. Head back to to Solr admin cloud graph page at http://172.22.67.101:8983/solr/.

click on the first shard collection1, you may have 14 documents in this shard
click on the second shard collection1, you may have 18 documents in this shard
check the replicas for each shard, they should have the same counts
At this point, we can start issues some queries against the collection:

Get all documents in the collection:http://172.22.67.101:8983/solr/collection1/select?q=*:*

Get all documents in the collection belonging to shard1:http://172.22.67.101:8983/solr/collection1/select?q=*:*&shards=shard1

Get all documents in the collection belonging to shard2:http://172.22.67.101:8983/solr/collection1/select?q=*:*&shards=shard2

set the numShards variable as a part of solr starup environment variable on all machines.

root@solr1$ cat /opt/apache-tomcat-7.0.40/bin/setenv.shexport JAVA_OPTS=' -Xms4096M -Xmx8192M -DnumShards=2 '
To be on the safer side, cleanup the clusterstate.json in zookeeper. Now start tomcat on all machines and check the catalina.out file for errors if any. Once all nodes are up, you should be able to point your browser to http://172.22.67.101:8080/solr -> cloud -> graph and see the 3 nodes which form the cloud.

Lets add a new node to shard 2. It will be added as a replica of current node on shard 2.

http://172.22.67.101:8983/solr/admin/cores?action=CREATE&name=collection1_shard2_replica2&collection=collection1&shard=shard2
And lets check the clusterstate.json file now