Getting Started with Apache Zeppelin and Elassandra with Instaclustr

Instaclustr has retired support for Elassandra and now supports Elasticsearch.

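Elassandra (Elasticsearch + Cassandra) is a fork of Elasticsearch modified to run on top of Apache Cassandra, providing advanced search features on Cassandra tables. This tutorial walks you through the basic steps of setting up an Instaclustr Elassandra cluster with Zeppelin on Amazon Web Services (AWS), and shows how to query and visualize Elassandra indexes using the Elasticsearch interpreter. It covers four high-level steps: provisioning a cluster with Elassandra and Zeppelin, creating a Zeppelin notebook based on the Elasticsearch interpreter, adding data to Elassandra, and querying and searching that data from the notebook.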

Provision a cluster with Elassandra and Zeppelin

If you haven’t already signed up for an Instaclustr account, refer to our support article to sign up and create an account.

Once you have signed up for Instaclustr and verified your email, log in to the Instaclustr console and click the Create Cluster button.

On the Create Cluster page, enter an appropriate name for your cluster. Under the Applications section, select:

Elassandra 5.5.0.13 (Cassandra 3.11) (preview)

Apache Zeppelin as an Add-on

Under the Data Centre section, select:

Amazon Web Services as the Infrastructure Provider

A minimum node size of t2.medium

Leave the other options at their defaults. Accept the terms and conditions and click the Create Cluster button. The cluster will provision automatically and will be available for use once all nodes are in the running state.

Create a Zeppelin notebook based on Elasticsearch interpreter

Once all nodes in the cluster are in the running state, click the Zeppelin tab. This takes you to the Connection Details page for Zeppelin.

On the Connection Details page, make a note of the Username and Password for Zeppelin, then click the URL to log in to the Zeppelin dashboard.

On the Zeppelin dashboard, click Create new note. In the Create New Note dialog box, choose a name for the notebook, select elasticsearch as the Default Interpreter and click the Create Note button.

The notebook is preconfigured to use the Elasticsearch interpreter. Click the gear button at the top right of the notebook to see the enabled interpreters and confirm that Elasticsearch is among them.

Make sure the Elasticsearch interpreter is at the top of the list and the Cassandra interpreter is enabled. Click the Save button to save the settings.

Add data to Elassandra using Zeppelin Elasticsearch interpreter

To start off, let’s index some data into Elassandra by running the commands below, one per paragraph.

Note: if Elasticsearch is not your default interpreter, put %elasticsearch on the first line of each paragraph so that it runs with the Elasticsearch interpreter.

index twitter/user/kimchy {"name": "Shay Banon"}

Index some more data by running each of the following commands in its own notebook paragraph:

index twitter/tweet/1 {
  "postDate": "2009-11-15T13:12:00",
  "message": "Trying out Zeppelin Elasticsearch interpreter, so far so good?"
}

index twitter/tweet/2 {
  "postDate": "2009-11-15T14:12:12",
  "message": "Another tweet, will it be indexed?"
}

index twitter/tweet/3 {
  "postDate": "2009-11-15T15:12:12",
  "message": "Give me my index and no query gets hurt!"
}

index twitter/tweet/4 {
  "postDate": "2009-11-16T15:12:12",
  "message": "Index it before search it!"
}
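The four tweet paragraphs above differ only in their ID and JSON body, so for a larger data set it may be easier to generate the paragraph text than to type it. The Python sketch below is one possible way to do that. It is not part of the Instaclustr or Zeppelin tooling, and the helper name zeppelin_index_command is hypothetical. It only builds the command strings, which you would then paste into notebook paragraphs.

```python
import json


def zeppelin_index_command(index, doc_type, doc_id, document):
    """Build the text of one `index` paragraph for the Elasticsearch interpreter.

    Hypothetical helper: it mirrors the `index <index>/<type>/<id> <json>` form
    used in this tutorial and never contacts Zeppelin or Elassandra.
    """
    body = json.dumps(document, ensure_ascii=False)
    return f"index {index}/{doc_type}/{doc_id} {body}"


# The same four tweets that the tutorial indexes by hand.
tweets = {
    1: {"postDate": "2009-11-15T13:12:00",
        "message": "Trying out Zeppelin Elasticsearch interpreter, so far so good?"},
    2: {"postDate": "2009-11-15T14:12:12",
        "message": "Another tweet, will it be indexed?"},
    3: {"postDate": "2009-11-15T15:12:12",
        "message": "Give me my index and no query gets hurt!"},
    4: {"postDate": "2009-11-16T15:12:12",
        "message": "Index it before search it!"},
}

commands = [
    zeppelin_index_command("twitter", "tweet", doc_id, doc)
    for doc_id, doc in tweets.items()
]

for command in commands:
    print(command)
```

Using json.dumps to serialize each body keeps quoting and escaping consistent, which matters once messages contain quotes or non-ASCII characters.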

Query and search data via Zeppelin notebook

Once the data is in Elassandra, we can retrieve, count and search it from Zeppelin, each command in its own paragraph. For example:

get twitter/user/kimchy

count twitter/tweet

search twitter/tweet
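The search command above returns every tweet. According to the Zeppelin Elasticsearch interpreter documentation, search also accepts a query after the path, either as a JSON document or as a query string. The Python sketch below builds such a command with a match query on the message field. The helper name zeppelin_search_command is hypothetical, and the resulting command has not been verified against Elassandra 5.5 here, so treat it as a starting point.

```python
import json


def zeppelin_search_command(index, doc_type, query=None):
    """Build the text of a `search` paragraph, optionally with a JSON query.

    Hypothetical helper: it only formats a string and runs no search.
    """
    path = f"{index}/{doc_type}"
    if query is None:
        return f"search {path}"
    return f"search {path} {json.dumps(query)}"


# Assumed example query: tweets whose message mentions "index".
match_query = {"query": {"match": {"message": "index"}}}

search_all = zeppelin_search_command("twitter", "tweet")
search_filtered = zeppelin_search_command("twitter", "tweet", match_query)

print(search_all)
print(search_filtered)
```

Pasting the second printed line into a notebook paragraph should narrow the result to matching tweets, provided the interpreter version on your cluster accepts a JSON query in this position.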

The result of a search query can also be viewed graphically (histograms, pie charts, etc.) or downloaded as a CSV (comma-separated values) or TSV (tab-separated values) file by clicking the buttons marked with a blue box in the screenshot above.
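A downloaded TSV file can be processed further outside Zeppelin. The Python sketch below counts tweets per day, roughly what a histogram on postDate would show. The column names postDate and message are an assumption about the export layout, and the inline sample text stands in for a real downloaded file, so check the header row of your own export before reusing it.

```python
import csv
import io
from collections import Counter

# Stand-in for the contents of a downloaded TSV export. The column names
# are assumed; a real export may use different or additional columns.
sample_tsv = (
    "postDate\tmessage\n"
    "2009-11-15T13:12:00\tTrying out Zeppelin Elasticsearch interpreter, so far so good?\n"
    "2009-11-15T14:12:12\tAnother tweet, will it be indexed?\n"
    "2009-11-15T15:12:12\tGive me my index and no query gets hurt!\n"
    "2009-11-16T15:12:12\tIndex it before search it!\n"
)


def tweets_per_day(tsv_text):
    """Count rows per calendar day, using the date part of postDate."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["postDate"][:10] for row in reader)


per_day = tweets_per_day(sample_tsv)
for day, count in sorted(per_day.items()):
    print(day, count)
```

For a real file, the same function works on the text returned by reading the file from disk, for example with open(path, newline="", encoding="utf-8").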